Gepars: a generic parser

Angel Java Lopez
May 18, 2019


I wrote a generic parser in JavaScript, gepars. I'm currently using it in my projects, along with gelex (a generic lexer) and geast (a generic Abstract Syntax Tree). It was written using TDD (Test-Driven Development), and the interesting part is how useful it turned out to be: I could write several interpreters and compilers with a few lines of code. Usually, the generic parser produces an Abstract Syntax Tree (AST) as its representation of the source code. The hard part is interpreting the AST or transforming it into machine code, but parsing the source code is now much easier.

A simple example that generates a constant node in the generic AST:

```javascript
const gepars = require('gepars');
const geast = require('geast'); // generic AST

// parser definition
const pdef = gepars.definition();

// define integer node: convert the token text to a number (base 10)
pdef.define('integer', 'integer:', function (value) {
    return geast.constant(parseInt(value, 10));
});
```

The `integer:` notation refers to an integer token detected by the lexer. The function receives `value`, the token's text as a string, and returns the constant AST node.

Parsing a real number:

```javascript
pdef.define('real', 'real:', function (value) {
    return geast.constant(parseFloat(value));
});
```

Parsing boolean constants (notice the use of a lexer token with a specific value: `name:true` matches only a name token whose text is `true`):

```javascript
pdef.define('boolean', 'name:true', function (value) { return geast.constant(true); });
pdef.define('boolean', 'name:false', function (value) { return geast.constant(false); });
```
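The difference between `name:` (any name token) and `name:true` (a name token with that exact text) can be sketched in a few lines. This assumes tokens are plain objects with `type` and `value` fields and that a spec is split at its first colon; the helper `matchesSpec` is hypothetical and is not part of the gepars API.

```javascript
// Hypothetical token shape: { type: 'name', value: 'true' }.
// A spec 'type:' matches any token of that type;
// a spec 'type:value' also requires the token text to equal value.
function matchesSpec(spec, token) {
  const colon = spec.indexOf(':');
  if (colon < 0) throw new Error(`not a token spec: ${spec}`);
  const type = spec.slice(0, colon);
  const value = spec.slice(colon + 1); // '' means "any value"
  if (token.type !== type) return false;
  return value === '' || token.value === value;
}

const trueToken = { type: 'name', value: 'true' };
const fooToken = { type: 'name', value: 'foo' };

console.log(matchesSpec('name:', fooToken));      // true: any name
console.log(matchesSpec('name:true', trueToken)); // true: exact text
console.log(matchesSpec('name:true', fooToken));  // false: wrong text
console.log(matchesSpec('integer:', trueToken));  // false: wrong type
```

Splitting at the first colon only means specs such as `operator:<=` or `delimiter:(` keep their full value.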

String and other constant types can be defined in the same way. A variable referenced by name could be defined as:

```javascript
pdef.define('name', 'name:', function (value) { return geast.name(value); });
```

A basic term in an expression:

```javascript
// terms
pdef.define('term', 'integer');
pdef.define('term', 'real');
pdef.define('term', 'string');
pdef.define('term', 'boolean');
pdef.define('term', 'name');
pdef.define('term', [ 'delimiter:(', 'expression', 'delimiter:)' ],
    function (values) { return values[1]; });
```

Notice the array in the last definition: it is a sequence of parts to be parsed in order, describing an expression enclosed in parentheses. The function receives the array of parsed values and returns `values[1]`, the inner expression.
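To make the idea of alternatives and sequences concrete, here is a minimal backtracking grammar interpreter. It is a hypothetical sketch, not gepars' implementation: `define` and `parseElement` are illustrative names, tokens are assumed to be `{ type, value }` objects, the first alternative that matches wins, and left recursion is not handled.

```javascript
// Grammar rules: name -> list of alternatives, tried in definition order.
const rules = {};

// Hypothetical define(): a single part passes one value to the action,
// an array (sequence) passes the array of values, as in the article's examples.
function define(name, parts, action) {
  const seq = Array.isArray(parts);
  const build = action || (x => x);
  (rules[name] = rules[name] || []).push({
    parts: seq ? parts : [parts],
    action: values => build(seq ? values : values[0]),
  });
}

// Try to parse `element` at position pos; return { value, pos } or null.
function parseElement(element, tokens, pos) {
  if (element.includes(':')) {
    // Terminal: 'type:' or 'type:value'
    const [type, ...rest] = element.split(':');
    const value = rest.join(':');
    const token = tokens[pos];
    if (token && token.type === type && (value === '' || token.value === value)) {
      return { value: token.value, pos: pos + 1 };
    }
    return null;
  }
  // Non-terminal: every part of an alternative must match in sequence.
  for (const alt of rules[element] || []) {
    const values = [];
    let p = pos;
    let ok = true;
    for (const part of alt.parts) {
      const r = parseElement(part, tokens, p);
      if (!r) { ok = false; break; }
      values.push(r.value);
      p = r.pos;
    }
    if (ok) return { value: alt.action(values), pos: p };
  }
  return null;
}

define('integer', 'integer:', value => parseInt(value, 10));
define('term', 'integer');
define('term', ['delimiter:(', 'term', 'delimiter:)'], values => values[1]);

const tokens = [
  { type: 'delimiter', value: '(' },
  { type: 'integer', value: '42' },
  { type: 'delimiter', value: ')' },
];
console.log(parseElement('term', tokens, 0)); // { value: 42, pos: 3 }
```

The sequence collects `['(', 42, ')']`, and the action keeps only the middle element, which is what `values[1]` does in the `term` definition.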

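A generic expression could be defined taking into account operator precedence and associativity. Each numbered level binds tighter than the one before it: `expression0` handles comparisons, `expression1` addition and subtraction, and `expression2` multiplication and division: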

```javascript
// expressions
pdef.define('expression', 'expression0');

pdef.define('expression0', 'expression1');
pdef.define('expression0',
    [ 'expression0', 'binop0', 'expression1' ],
    function (values) { return geast.binary(values[1], values[0], values[2]); });

pdef.define('expression1', 'expression2');
pdef.define('expression1',
    [ 'expression1', 'binop1', 'expression2' ],
    function (values) { return geast.binary(values[1], values[0], values[2]); });

pdef.define('expression2', 'term');
pdef.define('expression2',
    [ 'expression2', 'binop2', 'term' ],
    function (values) { return geast.binary(values[1], values[0], values[2]); });
```

where the binary operators are:

```javascript
pdef.define('binop0', 'operator:<');
pdef.define('binop0', 'operator:<=');
pdef.define('binop0', 'operator:>');
pdef.define('binop0', 'operator:>=');
pdef.define('binop0', 'operator:==');
pdef.define('binop0', 'operator:!=');

pdef.define('binop1', 'operator:+');
pdef.define('binop1', 'operator:-');

pdef.define('binop2', 'operator:*');
pdef.define('binop2', 'operator:/');
```
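Why three levels yield the expected precedence can be seen with a small stand-alone sketch. It uses plain recursive descent with one loop per level (a common hand-written encoding of rules like these), so it illustrates the grammar, not how gepars works internally. Tokens are assumed to be plain strings, and the result is a fully parenthesized string instead of a geast node.

```javascript
// Operator levels copied from the binop0..binop2 definitions,
// from lowest (index 0) to highest (index 2) precedence.
const LEVELS = [
  ['<', '<=', '>', '>=', '==', '!='], // binop0
  ['+', '-'],                          // binop1
  ['*', '/'],                          // binop2
];

// parseLevel(i) plays the role of expression<i>: it parses an operand one
// level higher, then keeps absorbing "<operator of level i> <operand>" pairs.
function parseExpression(tokens) {
  let pos = 0;

  function parseTerm() {
    const token = tokens[pos++];
    if (token === undefined) throw new Error('unexpected end of input');
    return token; // a number or name, kept as text
  }

  function parseLevel(i) {
    const operand = () => (i + 1 < LEVELS.length ? parseLevel(i + 1) : parseTerm());
    let left = operand();
    while (pos < tokens.length && LEVELS[i].includes(tokens[pos])) {
      const op = tokens[pos++];
      left = `(${left} ${op} ${operand()})`;
    }
    return left;
  }

  const result = parseLevel(0);
  if (pos !== tokens.length) throw new Error(`unexpected token: ${tokens[pos]}`);
  return result;
}

console.log(parseExpression(['1', '+', '2', '*', '3', '<', '10']));
// ((1 + (2 * 3)) < 10)
```

Multiplication is grouped first because `expression1` only ever sees complete `expression2` operands, and the comparison is applied last because it lives at the lowest level.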

The parser can handle left recursion and expand it appropriately:

```javascript
pdef.define('expression1',
    [ 'expression1', 'binop1', 'expression2' ],
    function (values) { return geast.binary(values[1], values[0], values[2]); });
```

Here, an `expression1` can have another `expression1` on its left.
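The article does not describe how gepars expands left recursion internally. The sketch below shows one common technique and why left recursion matters: a literal recursive-descent translation of `expression1 -> expression1 binop1 expression2` would call itself before consuming any token and never terminate, so the rule is usually rewritten as "one operand, then a loop" that folds each new operand onto the left. To keep it short, `expression2` is simplified to a single number and the functions evaluate while parsing. Both functions are hypothetical.

```javascript
// Tokens are numbers and the strings '+' and '-'.

// Left-recursive rule rewritten as a loop: ((a op b) op c) ...
function parseLeft(tokens) {
  let pos = 0;
  let value = tokens[pos++];
  while (pos < tokens.length) {
    const op = tokens[pos++];
    const right = tokens[pos++];
    value = op === '+' ? value + right : value - right;
  }
  return value;
}

// For contrast, a right-recursive rule,
// expression1 -> expression2 binop1 expression1, groups to the right: a op (b op c) ...
function parseRight(tokens, pos = 0) {
  const left = tokens[pos];
  if (pos + 1 >= tokens.length) return left;
  const op = tokens[pos + 1];
  const right = parseRight(tokens, pos + 2);
  return op === '+' ? left + right : left - right;
}

const tokens = [10, '-', 3, '-', 2];
console.log(parseLeft(tokens));  // 5, i.e. (10 - 3) - 2
console.log(parseRight(tokens)); // 9, i.e. 10 - (3 - 2)
```

Only the left-recursive form gives subtraction its usual left associativity, which is why supporting left recursion in the grammar is convenient.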

Given a lexer definition, I usually parse text with code like:

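```javascript
// lexers: a lexer definition (for example, one built with gelex), not shown here
function parseNode(type, text) {
    const lexer = lexers.lexer(text);
    const parser = pdef.parser(lexer);

    return parser.parse(type);
}
```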

`type` is the non-terminal to be parsed, i.e. `expression`, `term`, etc.
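The lexer definition behind `lexers` is not shown in this post. To illustrate what a lexer hands to the parser, here is a hypothetical regex-based tokenizer; it is not gelex's API. It only produces the token types referenced by the parser definitions above (`integer`, `real`, `name`, `operator`, `delimiter`), and it returns an array instead of a lexer object.

```javascript
// Rules are tried in order; the first regex that matches at the start wins.
const TOKEN_RULES = [
  { type: 'whitespace', regex: /^\s+/ },
  { type: 'real', regex: /^\d+\.\d+/ },                    // before integer
  { type: 'integer', regex: /^\d+/ },
  { type: 'name', regex: /^[A-Za-z_]\w*/ },                // also true/false
  { type: 'operator', regex: /^(<=|>=|==|!=|[<>+\-*\/])/ }, // two-char operators first
  { type: 'delimiter', regex: /^[()]/ },
];

function tokenize(text) {
  const tokens = [];
  let rest = text;
  while (rest.length > 0) {
    const rule = TOKEN_RULES.find(r => r.regex.test(rest));
    if (!rule) throw new Error(`unexpected character: ${rest[0]}`);
    const [match] = rest.match(rule.regex);
    // Whitespace separates tokens but is not passed to the parser.
    if (rule.type !== 'whitespace') tokens.push({ type: rule.type, value: match });
    rest = rest.slice(match.length);
  }
  return tokens;
}

console.log(tokenize('(x + 1.5) * 2').map(t => t.type).join(' '));
// delimiter name operator real delimiter operator integer
```

Note that `true` comes out as a `name` token with value `true`, which is exactly what a spec like `name:true` expects.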

These days, I’m using gepars in these personal projects:

  • Walang: programming language to be compiled to WebAssembly
  • Selang: simple programming language for Ethereum smart contracts
  • Rlie: R-like programming language interpreter
  • Erlie: Erlang-like programming language interpreter
  • Solcom: Solidity programming language compiler to Ethereum virtual machine bytecodes.

Related posts:

Geast, a generic Abstract Syntax Tree

Gelex, a generic lexer

Angel “Java” Lopez
https://github.com/ajlopez
https://twitter.com/ajlopez
