I wrote a generic parser in JavaScript, gepars. I currently use it in my projects, along with gelex (a generic lexer) and geast (a generic Abstract Syntax Tree). I wrote it using TDD (Test-Driven Development), and the interesting part is that it turned out to be very useful: I could write some interpreters and compilers with a few lines of code. Usually, the generic parser generates a representation of the source, an Abstract Syntax Tree. The hard part is interpreting the AST or transforming it into machine code. Parsing the source code, however, is now easier.

A simple definition that generates a constant node in the generic AST:

The integer: notation refers to an integer detected by the parser. The function receives value, the string value of that token, and returns the constant AST node.
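As a rough illustration of the idea, the following self-contained sketch pairs a token pattern such as integer: with an action that turns the token's string value into a constant node. The helpers define, applyRule, and constant, and the node shape, are hypothetical stand-ins written for this example; they are not the gepars or geast API.

```javascript
// Hypothetical stand-ins for illustration only; not the gepars/geast API.
const rules = new Map();

// Register a rule: a token pattern such as "integer:" plus an action.
function define(name, pattern, action) {
    rules.set(name, { pattern, action });
}

// Build a plain-object constant node (assumed shape).
function constant(value) {
    return { ntype: 'constant', value };
}

// Apply a rule to one token; returns an AST node, or null if the token type differs.
function applyRule(name, token) {
    const rule = rules.get(name);
    if (!rule) return null;

    // The text before ':' names the token type the rule expects.
    const type = rule.pattern.slice(0, rule.pattern.indexOf(':'));
    if (token.type !== type) return null;

    // The action receives the token value as a string.
    return rule.action(token.value);
}

define('integer', 'integer:', (value) => constant(parseInt(value, 10)));

const node = applyRule('integer', { type: 'integer', value: '42' });
```

The point of the sketch is the division of labor: the pattern decides whether the rule applies, and the action decides what node to build from the raw string.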

Parsing a real number:

Parsing boolean constants (notice the use of lexer token WITH the specified value):

We can define string and other constant types in the same way. A variable, referred to by name, could be defined as:

A basic term in an expression:

Notice the array in the last definition: it is a sequence of parts to be parsed, here describing a generic expression in parentheses.
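To make the array form concrete, here is a minimal sketch of a rule engine in which a rule body is a list of parts, each part being either a token pattern (type:value) or the name of another rule. Everything in it (define, parse, matchesToken, the delimiter token type, the node shape) is an assumption made for this example, not the gepars API.

```javascript
// Hypothetical mini-engine for illustration only; not the gepars API.
const rules = {};

// A rule body is an array of parts: "type:value" token patterns or names of other rules.
function define(name, parts, action) {
    (rules[name] = rules[name] || []).push({ parts, action });
}

// A pattern with an empty value ("integer:") matches any token of that type.
function matchesToken(pattern, token) {
    const sep = pattern.indexOf(':');
    const type = pattern.slice(0, sep);
    const value = pattern.slice(sep + 1);
    return token !== undefined && token.type === type && (value === '' || token.value === value);
}

// Try each alternative of a rule at position pos; return { node, pos } or null.
function parse(name, tokens, pos) {
    for (const { parts, action } of rules[name] || []) {
        const values = [];
        let p = pos;
        let ok = true;

        for (const part of parts) {
            if (part.includes(':')) {
                if (!matchesToken(part, tokens[p])) { ok = false; break; }
                values.push(tokens[p].value);
                p++;
            } else {
                const result = parse(part, tokens, p);
                if (!result) { ok = false; break; }
                values.push(result.node);
                p = result.pos;
            }
        }

        // The action receives one argument per part, in order.
        if (ok) return { node: action(...values), pos: p };
    }

    return null;
}

define('expression', ['term'], (term) => term);
define('term', ['integer:'], (value) => ({ ntype: 'constant', value: parseInt(value, 10) }));
define('term', ['delimiter:(', 'expression', 'delimiter:)'], (open, inner, close) => inner);

const tokens = [
    { type: 'delimiter', value: '(' },
    { type: 'integer', value: '42' },
    { type: 'delimiter', value: ')' }
];

const result = parse('term', tokens, 0);
```

In this sketch the parenthesized alternative simply returns the inner node, so the parentheses leave no node of their own in the tree.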

A generic expression could be defined taking into account operator precedence and associativity:

where the binary operators are:

The parser can handle left recursion and expands it appropriately:

where an expression1 can have an expression1 as its left part.
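A naive recursive-descent function for a left-recursive rule would call itself forever before consuming a token. One common way to expand such a rule is to parse the base case first and then extend the left operand in a loop, which also keeps binary operators left-associative. The sketch below shows that technique with two precedence levels; the names (levels, parseLevel, parseTerm, tokenize) and the node shape are hypothetical, and the strategy gepars uses internally may differ.

```javascript
// Illustrative sketch only; the strategy gepars uses internally may differ.
// Left-recursive grammar being expanded:
//   expression0 := expression0 ('+' | '-') expression1 | expression1
//   expression1 := expression1 ('*' | '/') term | term

const levels = [['+', '-'], ['*', '/']]; // lowest precedence first

function binary(operator, left, right) {
    return { ntype: 'binary', operator, left, right };
}

// Helper for the example: turn pre-split words into integer/operator tokens.
function tokenize(words) {
    return words.map((word) =>
        /^\d+$/.test(word) ? { type: 'integer', value: word } : { type: 'operator', value: word });
}

function parseTerm(state) {
    const token = state.tokens[state.pos++];
    if (!token || token.type !== 'integer') throw new Error('integer expected');
    return { ntype: 'constant', value: parseInt(token.value, 10) };
}

// Parse the higher-precedence operand first, then fold operators of this level to the left.
function parseLevel(state, level) {
    if (level === levels.length) return parseTerm(state);

    let left = parseLevel(state, level + 1);

    while (true) {
        const token = state.tokens[state.pos];
        if (!token || token.type !== 'operator' || !levels[level].includes(token.value)) break;
        state.pos++;
        left = binary(token.value, left, parseLevel(state, level + 1));
    }

    return left;
}

// 8 - 3 - 2 groups as (8 - 3) - 2; 1 + 2 * 3 groups as 1 + (2 * 3).
const subtraction = parseLevel({ tokens: tokenize(['8', '-', '3', '-', '2']), pos: 0 }, 0);
const mixed = parseLevel({ tokens: tokenize(['1', '+', '2', '*', '3']), pos: 0 }, 0);
```

Each precedence level only loops over its own operators, so tighter-binding operators end up deeper in the tree.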

Given a lexer definition, I usually parse a text using code like:

type is the non-terminal to be parsed, such as expression or term.
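The overall flow (lex the text, then ask the parser for a given non-terminal) can be sketched as follows. This is a toy pipeline under stated assumptions: lex, nonTerminals, and parseText are hypothetical names invented for the example and do not reproduce the gelex or gepars API.

```javascript
// Hypothetical stand-ins for illustration only; not the gelex/gepars API.

// A tiny lexer: integers, names, and single-character delimiters; whitespace is skipped.
function lex(text) {
    const tokens = [];

    for (const match of text.matchAll(/(\d+)|([A-Za-z_]\w*)|(\S)/g)) {
        if (match[1] !== undefined) tokens.push({ type: 'integer', value: match[1] });
        else if (match[2] !== undefined) tokens.push({ type: 'name', value: match[2] });
        else tokens.push({ type: 'delimiter', value: match[3] });
    }

    return tokens;
}

// One parse function per non-terminal; each takes { tokens, pos } and returns a node or null.
const nonTerminals = {
    integer(state) {
        const token = state.tokens[state.pos];
        if (!token || token.type !== 'integer') return null;
        state.pos++;
        return { ntype: 'constant', value: parseInt(token.value, 10) };
    },
    name(state) {
        const token = state.tokens[state.pos];
        if (!token || token.type !== 'name') return null;
        state.pos++;
        return { ntype: 'name', name: token.value };
    },
    term(state) {
        return nonTerminals.integer(state) || nonTerminals.name(state);
    }
};

// Entry point: lex the text, then parse the requested non-terminal from the first token.
function parseText(type, text) {
    const state = { tokens: lex(text), pos: 0 };
    return nonTerminals[type](state);
}

const constantNode = parseText('term', '42');
const nameNode = parseText('term', 'counter');
```

Passing the non-terminal as an argument means the same entry point can parse a whole program or just a fragment, which is convenient when testing individual rules.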

These days, I’m using gepars in these personal projects:

  • Walang: programming language to be compiled to WebAssembly
  • Selang: simple programming language for Ethereum smart contracts
  • Rlie: R-like programming language interpreter
  • Erlie: Erlang-like programming language interpreter
  • Solcom: Solidity programming language compiler to Ethereum virtual machine bytecodes.

Related posts:

Geast, a generic Abstract Syntax Tree

Gelex, a generic lexer

Angel “Java” Lopez