## Parsing Expressions Grammar

PEG are a (no longer so) new approach to parsing.

Clibutl offers a set of functions to write parsers in C without having to use an external "parser generator".

Rather than a dry list of functions, I'll present here a short tutorial on how to build a simple parser.

The approach to PEG used by Clibutl is based on the article:

R. R. Redziejowski. Parsing Expression Grammar as a Primitive Recursive-Descent Parser with Backtracking. Fundamenta Informaticae 79, 3-4 (2007), pp. 513 - 524.

### Define the grammar

First of all, let's define a grammar. I'll use the classical example of numerical expressions. A full example can be found here: https://github.com/rdentato/clibutl/blob/master/examples/calculator/calculator.c . Let's start simple, just additions and subtractions:expr ::= term (op term)* term ::= num op ::= '+' | '-' num ::= digit+ digit ::= '0' | '1' | ... | '9'

Each non-terminal is represented by a

`pegrule`. This is how

`term`looks like:

pegrule(expr) { pegref(term); pegstar { pegref(op); pegref(term); } }

The

`pegref()`function allows referencing other non-terminals, the

`pegstar`

*instruction*represent unbounded optional repetitions. It is important to notice that the code above is valid C code. Of course after having included

`utl.h`!

Let's write down the entire grammar:

pegrule(expr) { pegref(term); pegstar { pegref(op); pegref(term); } } pegrule(term) { pegref(num); } pegrule(op) { pegchoice { pegeither { pegstring("+"); } pegor { pegstring("-"); } } } pegrule(num) { pegplus { pegref(digit); } } pegrule(digit) { pegoneof("0123456789"); }As you can see, this is an almost one-to-one translation of the original grammar. I hope this feature will make peg functions easier to use.

The rule for

`op`shows how to express the PEG ordered choice. The

`pegchoice`instruction reminds the

`switch`statment where the either/or branches are similar to the the

`case`clauses.

The

`pegplus`is similar to

`pegstar`but requires at least one match. You can guess there's also a

`pegopt`instruction (not shown in this example) for optional match.

The

`pegstring()`and

`pegoneof()`functions are atomic recognizers that, respectively, match that specified string or one of the characters in the string.

### Do the parse

With those simple instructions we have now defined a

*recognizing parser*, i.e. a parser that can tell if a string abides to the specified grammar or not. Let's see how to invoke it:

#define MAX_EXPR_LEN 256 char expression[MAX_EXPR_LEN]; int main(int argc, char *argv []) { peg_t parser; parser = pegnew(); while (fgets(expression,MAX_EXPR_LEN,stdin)) { if (pegparse(parser,expr,expression)) printf("(%.*s) is a valid expression\n",pegpos(parser)-expression,expression); else printf("(%s) is not a valid expression\n",expression); fflush(stdout); } parser = pegfree(parser); exit(0); }The

`pegparse()`does all the magic. It takes a

*parser*created with

`pegnew()`, a peg rule to be matched and a string and returns a true value if the initial portion of the string matches the rule or 0 if the string doesn't match at all.

Here some examples of the program output:

"5+4" -> (5+4) is a valid expression "1+3XX" -> (1+3) is a valid expression "YZX" -> (YZX) is not a valid expression "3 + 2" -> (3) is a valid expressionThe last example shows that we need to explicitly take care of white spaces. The recognizer

`pegwspace`matches a possibly empty sequence of white spaces ('\t' and ' '), while

`pegvspace`also matches

*vertical*space like '\r' and '\n'. Look at the

`expr2.c`listing to see how simply you can handle spaces.