Gepars: a generic parser
I wrote a generic parser in JavaScript, gepars. I currently use it in my projects, along with gelex (a generic lexer) and geast (a generic Abstract Syntax Tree). It was written using TDD (Test-Driven Development), and I have found it very useful: I could write some interpreters and compilers with only a few lines of parsing code. Usually, the generic parser generates a representation of the program, an Abstract Syntax Tree (AST). The hard part is still to interpret the AST or transform it to machine code, but parsing the source code is now easier.
A simple example that generates a constant node in the generic AST:
const gepars = require('gepars');
const geast = require('geast'); // generic AST

// parser definition
const pdef = gepars.definition();

// define integer node
pdef.define('integer', 'integer:', function (value) {
    return geast.constant(parseInt(value));
});
The 'integer:' notation refers to an integer token detected while parsing. The function receives value, the string value of that token, and returns the constant AST node.
Parsing a real number:
pdef.define('real', 'real:', function (value) {
    return geast.constant(parseFloat(value));
});
Parsing boolean constants (notice the use of a lexer token with a specified value):
pdef.define('boolean', 'name:true', function (value) { return geast.constant(true); });
pdef.define('boolean', 'name:false', function (value) { return geast.constant(false); });
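To make the 'type:value' notation more concrete, here is a minimal sketch of how such a pattern could be matched against a token. It is only an illustration, not the actual gepars implementation: the helper names parsePattern and matchesToken, and the token shape { type, value }, are assumptions made for this example.

```javascript
// Hypothetical helpers, not part of gepars: they only illustrate the notation.
// Patterns without a colon (rule names such as 'term') are not handled here.

// Split 'name:true' into { type: 'name', value: 'true' }.
// An empty value (as in 'integer:') means "any token of that type".
function parsePattern(pattern) {
    const position = pattern.indexOf(':');
    const type = pattern.substring(0, position);
    const value = pattern.substring(position + 1);

    return { type: type, value: value.length ? value : null };
}

// Check a token of the assumed shape { type, value } against a pattern.
function matchesToken(pattern, token) {
    const expected = parsePattern(pattern);

    if (token.type !== expected.type)
        return false;

    return expected.value === null || expected.value === token.value;
}

console.log(matchesToken('integer:', { type: 'integer', value: '42' }));   // true
console.log(matchesToken('name:true', { type: 'name', value: 'true' }));   // true
console.log(matchesToken('name:true', { type: 'name', value: 'answer' })); // false
```

With this reading, 'name:' would accept any name token, while 'name:true' accepts only the name whose text is true.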
We can define string and other constant types in the same way. A variable, referred to by name, could be defined as:
pdef.define('name', 'name:', function (value) { return geast.name(value); });
A basic term in an expression:
// terms
pdef.define('term', 'integer');
pdef.define('term', 'real');
pdef.define('term', 'string');
pdef.define('term', 'boolean');
pdef.define('term', 'name');
pdef.define('term', [ 'delimiter:(', 'expression', 'delimiter:)' ], function (values) { return values[1]; });
Notice the use of an array in the last definition: it is a sequence of parts to be parsed, here describing a generic expression in parentheses. The function receives the parsed parts as values and returns the inner expression, values[1].
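The idea of a sequence of parts whose values are handed to a function can be sketched in isolation. The following is a simplified illustration, not gepars code: matchSequence is a hypothetical name, the token shape { type, value } is assumed, and every part is a token pattern (a real definition can also refer to other rules, such as expression).

```javascript
// Hypothetical sketch, not gepars code: match a sequence of token patterns
// against the first tokens and pass the collected values to an action.
function matchSequence(patterns, tokens, action) {
    if (tokens.length < patterns.length)
        return null;

    const values = [];

    for (let k = 0; k < patterns.length; k++) {
        // 'delimiter:(' -> type 'delimiter', value '('
        // 'integer:'    -> type 'integer', value '' (any value)
        const [type, value] = patterns[k].split(':');
        const token = tokens[k];

        if (token.type !== type || (value && token.value !== value))
            return null; // no match

        values.push(token.value);
    }

    return action(values);
}

const tokens = [
    { type: 'delimiter', value: '(' },
    { type: 'integer', value: '42' },
    { type: 'delimiter', value: ')' }
];

// As in the term definition, the action keeps only the middle value
const result = matchSequence(
    [ 'delimiter:(', 'integer:', 'delimiter:)' ],
    tokens,
    function (values) { return parseInt(values[1], 10); }
);

console.log(result); // 42
```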
A generic expression could be defined taking into account operator precedence and associativity:
// expressions
pdef.define('expression', 'expression0');

pdef.define('expression0', 'expression1');
pdef.define('expression0',
    [ 'expression0', 'binop0', 'expression1' ],
    function (values) { return geast.binary(values[1], values[0], values[2]); });

pdef.define('expression1', 'expression2');
pdef.define('expression1',
    [ 'expression1', 'binop1', 'expression2' ],
    function (values) { return geast.binary(values[1], values[0], values[2]); });

pdef.define('expression2', 'term');
pdef.define('expression2',
    [ 'expression2', 'binop2', 'term' ],
    function (values) { return geast.binary(values[1], values[0], values[2]); });
where the binary operators are:
pdef.define('binop0', 'operator:<');
pdef.define('binop0', 'operator:<=');
pdef.define('binop0', 'operator:>');
pdef.define('binop0', 'operator:>=');
pdef.define('binop0', 'operator:==');
pdef.define('binop0', 'operator:!=');

pdef.define('binop1', 'operator:+');
pdef.define('binop1', 'operator:-');

pdef.define('binop2', 'operator:*');
pdef.define('binop2', 'operator:/');
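To see why layering the rules this way gives the expected precedence, here is a small self-contained sketch that does not use gepars at all. It works on an array of plain string tokens and builds plain objects instead of geast nodes; the names levels, parseExpression, parseLevel and parseTerm are hypothetical, and there is no error handling.

```javascript
// Hypothetical sketch, independent of gepars. Tokens are plain strings.
// Level 0 binds loosest and level 2 tightest, like binop0, binop1 and binop2.
const levels = [
    ['<', '<=', '>', '>=', '==', '!='],
    ['+', '-'],
    ['*', '/']
];

function parseExpression(tokens) {
    let position = 0;

    // term: a number, or an expression in parentheses
    function parseTerm() {
        const token = tokens[position++];

        if (token === '(') {
            const expression = parseLevel(0);
            position++; // skip ')'
            return expression;
        }

        return { type: 'constant', value: Number(token) };
    }

    // Each level first parses the next tighter level, then consumes
    // the operators that belong to its own level.
    function parseLevel(level) {
        if (level === levels.length)
            return parseTerm();

        let left = parseLevel(level + 1);

        while (levels[level].includes(tokens[position])) {
            const operator = tokens[position++];
            const right = parseLevel(level + 1);
            left = { type: 'binary', operator: operator, left: left, right: right };
        }

        return left;
    }

    return parseLevel(0);
}

// 1 + 2 * 3: the root is '+', and its right side is the '*' node
const ast = parseExpression(['1', '+', '2', '*', '3']);
console.log(JSON.stringify(ast, null, 4));
```

Because the multiplication level sits below the addition level, 2 * 3 is grouped first and becomes the right operand of the addition.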
The parser can handle left recursion, and expands it appropriately:
pdef.define('expression1',
    [ 'expression1', 'binop1', 'expression2' ],
    function (values) { return geast.binary(values[1], values[0], values[2]); });
where an expression1 could have another expression1 as its left part.
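One common way to deal with a left-recursive rule is to read the first operand and then fold each following operator and operand onto the left, which keeps the operators left-associative. The sketch below shows only that idea; it does not claim to be how gepars expands left recursion internally, and foldLeft and evaluate are hypothetical names.

```javascript
// Hypothetical sketch: the left-recursive rule
//     list -> list operator item | item
// accepts the same input as: item (operator item)*
// Folding to the left keeps the operators left-associative.
function foldLeft(first, rest) {
    let left = first;

    for (const [operator, right] of rest)
        left = { operator: operator, left: left, right: right };

    return left;
}

// Tiny evaluator for '+' and '-', used only to check the tree shape
function evaluate(node) {
    if (typeof node === 'number')
        return node;

    const left = evaluate(node.left);
    const right = evaluate(node.right);

    return node.operator === '+' ? left + right : left - right;
}

// 10 - 3 - 2 is grouped as (10 - 3) - 2
const tree = foldLeft(10, [['-', 3], ['-', 2]]);
console.log(evaluate(tree)); // 5
```

A right-nested tree for the same input would mean 10 - (3 - 2), which is 9, so the grouping matters for operators like subtraction and division.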
Given a lexer definition, I usually parse a text using code like:
function parseNode(type, text) {
    // lexers is the lexer definition mentioned above
    const lexer = lexers.lexer(text);
    const parser = pdef.parser(lexer);

    return parser.parse(type);
}
Here type is the non-terminal to be parsed, e.g. expression, term, etc.
These days, I’m using gepars in these personal projects:
- Walang: programming language to be compiled to WebAssembly
- Selang: simple programming language for Ethereum smart contracts
- Rlie: R-like programming language interpreter
- Erlie: Erlang-like programming language interpreter
- Solcom: Solidity programming language compiler to Ethereum virtual machine bytecodes.
Related posts:
Geast, a generic Abstract Syntax Tree
Angel “Java” Lopez
https://github.com/ajlopez
https://twitter.com/ajlopez