Rust offers very good BNF parsers, but they are generally designed for the backend. With WebAssembly compilation, this is no longer really a problem. In this article, we build the foundations of an interactive sandbox with Leptos and Trunk to debug BNF entirely in the browser.
From regexes...
Lately, I had to parse biblical references from the content of daily publications, spanning over 15 years and dozens of contributors. For instance, “Jn 12.3-4” means verses 3 and 4 of the 12th chapter of the book of John — but so does “Joh 12.3,4” or “J12.3 and 4,” depending on the author.
As anyone would do, I started with regexes. A lot of regexes. Like this:
let main_regex = Regex::new(
r"(?:(?P<book>\d?[\wÉéèêîïâäàçôöûü]+)[ ]+)?(?P<verses>(?:(?:(?:\d+)[: ]+)?(?:(?:(?:\d+)-(?:\d+))|(?:\d+))(?:,|,[ ]|[ ]et[ ])?)+)"
)?;
let chapters_regex = Regex::new(
r"(?:(?P<chapter>\d+)[: ])(?P<verses>(?:(?:(?:(?:\d+)-(?:\d+))|(?:\d+))(?:,|,[ ]|[ ]et[ ]))*)(?P<last>(?:(?:(?:\d+)-(?:\d+))|(?:\d+))$)?",
)?;
let verse_range_regex = Regex::new(
r"(?:(?:(?:(?P<start>\d+)-(?P<end>\d+))|(?P<verse>\d+))(?:,|,[ ]|[ ]et[ ])?)"
)?;
... to Backus-Naur forms
Tired of doing this kind of monstrosities, and because of my very limited ability to debug this kind of things, even with the help of online playground, I looked at other technics.
Backus-Naur forms (BNF), are one of those. It is a formal representation of a grammar (a set of rules describing a valid or invalid input). For instance, a number would be defined as:
<number> ::= <digit> | <number> <digit>
<digit> ::= '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | '0'
Quite formal, I admit. We don't get the nice ranges and helpers from regexes. Actually, in the most formal sense, BNF describe a larger set: Context free grammars. (But in practice, most regex implementation have a few additional features making them more capable than theory).
Nicely though, we get an expressive description of what a valid sentence can be to respect a grammar. Previous example reads:
A digit is any of 1, 2, 3, 4, 5, 6, 7, 8, 9 or 0 A number is either a single digit, or a digit followed by something that is a number
This expressivity makes BNF nice to describe formal specifications.
Parsers
As you can guess, there exist some parsers for bnfs. I used for instance the bnf crate. Rust if typically a nice language to do this kind of parsing.
It works nicely, building a tree representing the result of the parsing. For instance, in the previous example, we would get
<number> ::= <number> <digit>
├── <number> ::= <digit>
│ └── <digit> ::= "1"
│ └── "1"
└── <digit> ::= "2"
└── "2"
On the Frontend
Nice, we have a way to describe grammars, we have a parser. If only we could have a nice little playground to debug that! Let's create a small server for that, and a react front...
Wait! Why?! Let's just run that in the browser!
Does it work? Perfectly!
Leptos and trunk
Leptos is a rust web framework, supporting serverside rendering, and client side rendering (using trunk). To begin, we install
cargo install trunk
cargo init epigrammars
cd epigrammars
cargo add leptos --features=csr
cargo add bnf
rustup target add wasm32-unknown-unknown
Then we can crate out app: let's just add 2 text areas. The reactivity is implemented
with signals. We connect them on the prop:value
and on:input:target
of the text fields.
Then in our view we can display the input and the grammar bellow.
use leptos::prelude::*;
fn main() {
leptos::mount::mount_to_body(App)
}
const EXAMPLE_GRAMMAR: &str = r#"<number> ::= <digit> | <number> <digit>
<digit> ::= '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | '0'
"#;
#[component]
pub fn App() -> impl IntoView {
let (grammar, set_grammar) = signal(EXAMPLE_GRAMMAR.to_string());
let (input, set_input) = signal("12".to_string());
view! {
<div>
<h2>Grammar</h2>
<textarea
on:input:target=move |ev| { set_grammar.set(ev.target().value()); }
prop:value=grammar />
</div>
<div>
<h2>Input</h2>
<textarea
on:input:target=move |ev| { set_input.set(ev.target().value()); }
prop:value=input />
</div>
<p>Input: {input}</p>
<p>Grammar: {grammar}</p>
}
}

For now, grammar is of type ReadSignal<String>
. We would like to parse to a bnf
grammar representation (and updated it if the input change. To derive new signals from
existing ones, and update only if the inputs change, we can use Memo
:
let parsed_grammar: Memo<Result<Grammar, String>> = Memo::new(
move |_| match Grammar::from_str(&grammar.get()) {
Ok(grammar) => Ok(grammar),
Err(bnf::Error::ParseError(e)) => Err(format!("Invalid grammar: {:}", e)),
Err(e) => Err(format!("Failed to parse grammar, unknown error: {:#?}", e)),
}
);
In the memo closure, we have to get the current value of the dependant signal (grammar.get()
).
Now, we can derive another signal with the parsed input:
let output = Memo::new(move |_| match parsed_grammar.get() {
Ok(grammar) => {
let binding = input.get();
grammar
.parse_input(&binding)
.map(|output| format!("{}", output))
.collect::<Vec<_>>()
.join("\n")
}
Err(message) => message,
});
And finally, print it:
<pre> {output} </pre>

Going further
This very simple example can be refined further, for instance by adding another recursive component to display the nested messages, and by adding buttons to choose the production to parse with.
