A library for parsing Backus–Naur form context-free grammars

Wikipedia page on Backus-Naur form

bnf

A library for parsing Backus–Naur form context-free grammars.

What does a parsable BNF grammar look like?

The following grammar from the exemplifies a compatible grammar. (*Note: parser allows for an optional ';' to indicate the end of a production)

 <postal-address> ::= <name-part> <street-address> <zip-part>

      <name-part> ::= <personal-part> <last-name> <opt-suffix-part> <EOL>
                    | <personal-part> <name-part>

  <personal-part> ::= <initial> "." | <first-name>

 <street-address> ::= <house-num> <street-name> <opt-apt-num> <EOL>

       <zip-part> ::= <town-name> "," <state-code> <ZIP-code> <EOL>

<opt-suffix-part> ::= "Sr." | "Jr." | <roman-numeral> | ""
    <opt-apt-num> ::= <apt-num> | ""

Output

Take the following grammar for DNA sequences to be input to this library's parse function.

<dna> ::= <base> | <base> <dna>
<base> ::= "A" | "C" | "G" | "T"

The output is a Grammar object representing a tree that looks like this:

Grammar {
    productions: [
        Production {
            lhs: Nonterminal(
                "dna"
            ),
            rhs: [
                Expression {
                    terms: [
                        Nonterminal(
                            "base"
                        )
                    ]
                },
                Expression {
                    terms: [
                        Nonterminal(
                            "base"
                        ),
                        Nonterminal(
                            "dna"
                        )
                    ]
                }
            ]
        },
        Production {
            lhs: Nonterminal(
                "base"
            ),
            rhs: [
                Expression {
                    terms: [
                        Terminal(
                            "A"
                        )
                    ]
                },
                Expression {
                    terms: [
                        Terminal(
                            "C"
                        )
                    ]
                },
                Expression {
                    terms: [
                        Terminal(
                            "G"
                        )
                    ]
                },
                Expression {
                    terms: [
                        Terminal(
                            "T"
                        )
                    ]
                }
            ]
        }
    ]
}

Once the Grammar object is populated, to generate a random sentence from it call the object's generate function. grammar.generate(). For the above grammar you could expect something like TGGC or AG.

If the generate function can't find a production for a nonterminal it tries to evaluate it will print the identifer as a nonterminal, i.e. <identifier>.

The generate function will return an error if it detects an infinite loop caused by a production such as <PATTERN> ::= <PATTERN>.

Parse Example

extern crate bnf;
use bnf::Grammar;

fn main() {
    let input =
        "<postal-address> ::= <name-part> <street-address> <zip-part>

            <name-part> ::= <personal-part> <last-name> <opt-suffix-part> <EOL>
                            | <personal-part> <name-part>

        <personal-part> ::= <initial> \".\" | <first-name>

        <street-address> ::= <house-num> <street-name> <opt-apt-num> <EOL>

            <zip-part> ::= <town-name> \",\" <state-code> <ZIP-code> <EOL>

        <opt-suffix-part> ::= \"Sr.\" | \"Jr.\" | <roman-numeral> | \"\"
            <opt-apt-num> ::= <apt-num> | \"\"";

    let grammar: Result<Grammar, _> = input.parse();
    match grammar {
        Ok(g) => println!("{:#?}", g),
        Err(e) => println!("Failed to make grammar from String: {}", e),
    }
}

Generate Example

extern crate bnf;
use bnf::Grammar;

fn main() {
    let input =
        "<dna> ::= <base> | <base> <dna>
        <base> ::= \"A\" | \"C\" | \"G\" | \"T\"";
    let grammar: Grammar = input.parse().unwrap();
    let sentence = grammar.generate();
    match sentence {
        Ok(s) => println!("random sentence: {}", s),
        Err(e) => println!("something went wrong: {}!", e)
    }
}
Issues

Collection of the latest Issues

shnewto

shnewto

question
2

I'd like to be smarter about how we generate random sentences from a given grammar.

There's a danger that a valid grammar can result in an infinite loop when randomly generating, so atm we just take an guess at if we're looping in a way that looks like it's not going to stop any time soon by checking the stack and bailing before we recurse too far.

For instance the case used in the tests is <nonterm> ::= <nonterm>. We can't get to a terminal state so we're doomed to failure if we try to generate a sentence.

I'm not sure what an ideal solution is here, for one, it'd be nice to figure something that didn't have to short circuit the potential for "maybe infinite loops".

Maybe there's a way to check for impossible grammars for the generate logic first and error early if that's the case. Then... if we have the potential for a terminal state but aren't reaching it because of random chance we need to do something sensible... maybe allow a configurable max generated length and do the work to find some terminal nodes when we get there?

If can manage a max length, I like the idea of implementing a minimum length too 🤔

hbina

hbina

3

Is your feature request related to a problem? Please describe. Most uses of BNFs out there are not formal i.e. they are meant to be human readable. As such, they contain non-standard features such as optional. For instance, consider this RFC that does not use standard BNF to describe Preferred Name Syntax for DNS

Additionally, are you interested in supporting EBNF and validating a given input.

Describe the solution you'd like I would like to have these features. I am not sure how fine the configuration can be done. Not sure entirely how this is done in Rust but if its like #ifdef in C then I think its fair to want the code to be readable.

Describe alternatives you've considered I think all of these features are expressable in terms of standard BNF. So users can always make that manual transformation.

Additional context I needed this while trying to validate (Another feature I want) a given string with the BNF from above. See Wikipedia's page for BNF Variants

  1. https://tools.ietf.org/html/rfc1035#section-2.3.1
  2. https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form#Variants
CrockAgile

CrockAgile

enhancement
1

Clippy has a lot of best practices, but BNF is in violation of a lot of them. Some would be breaking changes (like removing the non std::str::FromStr from_str public functions). Other changes require updating the parsers to nom 5 best practices.

Once that is done, would just need to revert previous attempt to re-enable clippy CI.

sergeyklay

sergeyklay

enhancement
5

Hello,

First of all wanted to say thank you for your hard work on this project.

What do you think about creating step by step example with using simple grammar? It seems to me this would help to see the usual work flow and may show general concepts better.

FYI: There is a template to make a good README.md https://gist.github.com/PurpleBooth/109311bb0361f32d87a2

shnewto

shnewto

enhancement
0

We need a way to allow incorporating comments into a grammar. It seems somewhat standard to use the ; to indicate the start of a comment and a \n to close it. It'd be a good add I think to make that work.

A couple initial questions:

  • Do we look for the ; all along the way?
  • Do we make a "first pass" that strips all comments before parsing?
CrockAgile

CrockAgile

enhancement
1

If #33 is addressed and bnf becomes a real parser generator, it could also become self-hosting, which would be pretty fun.

CrockAgile

CrockAgile

enhancement
1

core::ops allows implementing custom logic for operators. Would implementing these for Grammar/Production/Expression/Term enable any cleaner syntax or usage? If so, which of these operators would be included?

  • ops::Add (+)
    • Would this add same types (Grammar + Grammar -> Grammar)?
    • Would this add a component (Grammar + Production -> Grammar)?
    • Would this add components (Production + Production -> Grammar)?
    • Would this do all the above?
    • If this is implemented, what is the behavior or subtraction?
  • ops::AddAssign (+=)
    • Same question as Add
  • ops::BitOr ( | )
    • If Add is used only for components, could Or be used as a kind of union operator?
    • Should this only be implemented for Term | Term and Term | Expression to allow expr = Term | Term | Term?
  • ops::BitXor ( ^ )
    • If BitOr is used as a union operator, should BitXor be used for set difference?

These were the operators I identified as possible enhancements, but feel free to add more to the discussion that may be useful.

Also, as a possible alternative some of these use cases could be handled by macros to allow for special case syntax:

expression![ term1 | term2 | term3 ] -> Expression::from_parts(vec![ term1, term2, term3 ])

But that would require the user importing BNF's macros.

vityafx

vityafx

enhancement
6

When I was in the university there was a program for learning grammatics in the programming languages theory course. It was very useful for learners to see how the code was parsed in the grammar tree. Can we do the same thing? It must not be so hard, basically, we already have it while we output Grammar as a tree in the README file, the same thing is needed but with real data which could come not from the random generator but from the user input - this is how learners learn how to write grammar correctly.

shnewto

shnewto

3

Thinking about future features that could add value somewhere down the road. Initial ideas on design might be something like the following?

Alternatively we might be able to just make bnf::parse smart enough to intuit the variant used.

Versions

Find the latest versions by id

0.3.4 - Dec 20, 2021

What's Changed

New Contributors

Full Changelog: https://github.com/shnewto/bnf/compare/0.3.3...0.3.4

0.3.3 - Dec 30, 2020

Using the serde crate's Serialize/Deserialize in the derive macros on Gramamr, Production, Expression, and Term structs / enums

0.3.2 - Dec 22, 2020

  • Update nom and rand dependencies to latest.
  • Using eof parser from nom 6.0 instead of homerolled eoi
  • Increased threshold for "stack redzone" to address some issues where infinite loops could overflow the stack despite our checks to prevent it.

0.3.1 - Oct 04, 2020

Update rand dependency.

0.2.7 - Oct 31, 2019

Fixes for things that flagged us for future breakages on a crater run. Thanks @z2oh!

0.2.6 - Aug 25, 2019

  • Upgrade nom dependency
  • Get rid of a build warning.

0.2.4 - May 07, 2019

Information - Updated Jan 10, 2022

Stars: 137
Forks: 15
Issues: 13

Repositories & Extras

This is an example of a Rust server that functions as a remote schema for...

Rust + Hasura Rust server that functions as a Hasura

This is an example of a Rust server that functions as a remote schema for...

Newport Engine is a modular 2D and 3D game engine built in Rust for Rust

It is designed to be easily extendable and easy to use

Newport Engine is a modular 2D and 3D game engine built in Rust for Rust

Newport Engine is a modular 2D and 3D game engine built in Rust for Rust

It is designed to be easily extendable and easy to use

Newport Engine is a modular 2D and 3D game engine built in Rust for Rust

liboqs-rust: Rust bindings for liboqs

Qyantum Safe liboqs rust bindings

liboqs-rust: Rust bindings for liboqs

msgflo-rust: Rust participant support for MsgFlo

Flowhub visual programming IDE

msgflo-rust: Rust participant support for MsgFlo

Trojan-rust is a rust implementation for Trojan protocol that is targeted to circumvent GFW

Trojan protocol that is targeted to circumvent tokio-rs to achieve high performance async io

Trojan-rust is a rust implementation for Trojan protocol that is targeted to circumvent GFW
Actix

1.0K

How to be a full stack Rust Developer

Read Rust the Rust blog posts at Steadylearner

How to be a full stack Rust Developer

Rust library translation (rust-src/rust-std/stdlib/rustlib translation)

This is the place to translate Having a documentation in your native language is essential if you don't speak English, and still enjoyable even if...

Rust library translation (rust-src/rust-std/stdlib/rustlib translation)

False Positive for rust-lang/rust#83583

The deprecation lint proc_macro_derive_resolution_fallback is intended to catch proc macro generated code that refers to items from parent modules that should not be in scope:

False Positive for rust-lang/rust#83583

A CHIP-8 &amp; SuperChip interpreter written in Rust using rust-sdl2

If you're getting compile errors it may be because

A CHIP-8 &amp; SuperChip interpreter written in Rust using rust-sdl2

Rust-Svelte-on-Rust

Starter template for Rocket backend server

Rust-Svelte-on-Rust
Facebook Instagram Twitter GitHub Dribbble
Privacy