A Solidity Compiler: Introduction

A few years ago I wrote about compiling Solidity and other ideas in this post. Then I started to code some of those ideas. Today I want to introduce a personal project: a Solidity compiler written in JavaScript, called solcom. You can see the current work-in-progress code in this repository.

Keep it simple

As usual, I prefer simplicity, emergent design and advancing with TDD. This project is written in simple, plain JavaScript. Its only dependencies are other personal projects of mine. They were very useful in other projects, and now it is time to use them in this compiler. Progress is achieved using the TDD workflow. I think it is a good example of how I like to program. But this goes beyond personal taste: I am a big supporter of this style of programming, and I think it is the best way to build production code.

Dogfooding

When the dog food is good enough

After having written interpreters and transpilers for several years, I have managed to distill the main use cases that I need to implement when I start a new project of that type. So a few months ago I built these libraries:

And I use them in this Solidity compiler project.

The lexer

For now, the lexer has a few rules to identify tokens in the source code. A fragment:

const gelex = require('gelex');
const ldef = gelex.definition();
ldef.define('name', '[a-zA-Z_][a-zA-Z0-9_]*');
ldef.define('integer', '[0-9][0-9]*');
ldef.define('operator', '+-*/=<>'.split(''));
ldef.define('delimiter', '()[]{};,'.split(''));
ldef.defineText('string', '"', '"');
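
To illustrate what rules like these do, here is a minimal stand-alone sketch of a rule-based tokenizer in plain JavaScript. It is not gelex and does not reproduce its internals: the `rules` table, the `tokenize` function and the `{ type, value }` token shape are assumptions made for this example only.

```javascript
// Hypothetical rule table mirroring the definitions above (not the gelex API).
// Each regex is anchored so it only matches at the start of the remaining text.
const rules = [
    { type: 'name', regex: /^[a-zA-Z_][a-zA-Z0-9_]*/ },
    { type: 'integer', regex: /^[0-9][0-9]*/ },
    { type: 'operator', regex: /^[+\-*\/=<>]/ },
    { type: 'delimiter', regex: /^[()\[\]{};,]/ },
    { type: 'string', regex: /^"[^"]*"/ }
];

function tokenize(text) {
    const tokens = [];
    let rest = text;

    while (true) {
        // skip whitespace between tokens
        rest = rest.replace(/^\s+/, '');

        if (rest.length === 0)
            break;

        // first rule that matches at the current position wins
        const rule = rules.find(function (r) { return r.regex.test(rest); });

        if (!rule)
            throw new Error("Unexpected character '" + rest[0] + "'");

        let value = rest.match(rule.regex)[0];
        rest = rest.slice(value.length);

        // for strings, keep only the text between the quotes
        if (rule.type === 'string')
            value = value.slice(1, -1);

        tokens.push({ type: rule.type, value: value });
    }

    return tokens;
}
```

Calling `tokenize('contract Counter { uint counter; }')` yields seven tokens: four of type `name` and three of type `delimiter`. Keywords such as `contract` are plain `name` tokens here, which matches how the parser rules further down refer to them as `name:contract`.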

Maybe I could rename some of the tokens (e.g. classify the comma as a separator instead of a delimiter). But using TDD, the change is easy!

The parser

Its definitions are oriented to produce an Abstract Syntax Tree. Some code:

const gepars = require('gepars');
const geast = require('geast');
geast.node('contract', [ 'name', 'body' ]);
const pdef = gepars.definition();
// program and its declarations
pdef.define('program', 'topdecllist', function (value) { return geast.sequence(value); });
pdef.define('topdecllist', [ 'topdecllist', 'topdecl' ], function (values) { values[0].push(values[1]); return values[0]; });
pdef.define('topdecllist', 'topdecl', function (value) { return [ value ]; });
pdef.define('topdecllist', [ '!', 'null' ], function (values) { return [] });
pdef.define('topdecl', 'contract');
// contract and its declarations
pdef.define('contract', [ 'name:contract', 'name:', 'delimiter:{', 'contdecllist', 'delimiter:}' ], function (values) { return geast.contract(values[1], geast.sequence(values[3])); });
pdef.define('contdecllist', [ 'contdecllist', 'contdecl' ], function (values) { values[0].push(values[1]); return values[0]; });
pdef.define('contdecllist', 'contdecl', function (value) { return [ value ]; });
pdef.define('contdecllist', [ '!', 'delimiter:}' ], function (values) { return [] });
pdef.define('contdecl', [ 'type', 'name:', 'delimiter:;' ], function (values) { return geast.variable(values[1], values[0]); });
pdef.define('contdecl', 'method');

Each definition usually consists of:

  • the name of the node to generate
  • the other nodes that compose the new node
  • a function that constructs the new node object
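
To make the three parts concrete, here is a hand-written sketch that recognizes the same shape as the `contract` and `contdecl` rules above, restricted to variable declarations. It does not use gepars or geast. The function `parseContract`, the `{ type, value }` token shape and the plain-object nodes with an `ntype` field are assumptions for this example, as is reading a pattern like `'name:'` as "any name token".

```javascript
// Hypothetical recursive-descent equivalent of:
//   contract -> 'contract' name '{' contdecllist '}'
//   contdecl -> type name ';'
// Tokens are { type, value } objects; nodes are plain objects, not geast nodes.
function parseContract(tokens) {
    let pos = 0;

    // consume one token of the given type (and value, if given) or fail
    function expect(type, value) {
        const token = tokens[pos];

        if (!token || token.type !== type || (value !== undefined && token.value !== value))
            throw new Error('Expected ' + type + (value !== undefined ? " '" + value + "'" : '') + ' at token ' + pos);

        pos++;
        return token.value;
    }

    function atClosingBrace() {
        const token = tokens[pos];
        return token && token.type === 'delimiter' && token.value === '}';
    }

    expect('name', 'contract');
    const name = expect('name');
    expect('delimiter', '{');

    // the declaration list: zero or more "type name ;" items
    const body = [];

    while (pos < tokens.length && !atClosingBrace()) {
        const type = expect('name');
        const varname = expect('name');
        expect('delimiter', ';');
        body.push({ ntype: 'variable', name: varname, type: type });
    }

    expect('delimiter', '}');

    return { ntype: 'contract', name: name, body: body };
}
```

The grammar-driven definitions express the same thing declaratively: the pattern array says which tokens and nodes to expect, and the function at the end plays the role of the `return` statements in this sketch.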

From Source Code to AST to Bytecodes

The process is:

  • Read the source code from a file
  • Use the lexer to generate a stream of tokens
  • Use the parser to read the tokens and build the Abstract Syntax Tree

Once the AST (Abstract Syntax Tree) has been generated, I use another project named evmcompiler to visit each of its nodes and generate the corresponding bytecodes.
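
As a rough illustration of the "visit each node and emit bytecodes" step, the following sketch walks a tiny expression tree and produces EVM bytecode as a hex string. It is not the evmcompiler API: the node shapes (`constant`, `binary`), the `visitors` table and the `generate` function are hypothetical. The opcode values are the standard EVM ones (PUSH1 is 0x60, ADD is 0x01, MUL is 0x02). Only commutative operators are handled, so operand order on the stack does not matter here.

```javascript
// Hypothetical visitor table: one function per node type.
const visitors = {
    // constant -> PUSH1 <value>; only values that fit in one byte
    constant: function (node) {
        if (!Number.isInteger(node.value) || node.value < 0 || node.value > 255)
            throw new Error('Only one-byte constants are supported in this sketch');

        return '60' + node.value.toString(16).padStart(2, '0');
    },

    // binary -> code for both operands, then the operator opcode
    binary: function (node) {
        const opcodes = { '+': '01', '*': '02' }; // ADD, MUL

        if (!opcodes[node.operator])
            throw new Error('Unsupported operator ' + node.operator);

        return generate(node.left) + generate(node.right) + opcodes[node.operator];
    }
};

// dispatch on the node type and return the bytecode as a hex string
function generate(node) {
    const visit = visitors[node.ntype];

    if (!visit)
        throw new Error('Unknown node type ' + node.ntype);

    return visit(node);
}

// 1 + 2 * 3
const tree = {
    ntype: 'binary',
    operator: '+',
    left: { ntype: 'constant', value: 1 },
    right: {
        ntype: 'binary',
        operator: '*',
        left: { ntype: 'constant', value: 2 },
        right: { ntype: 'constant', value: 3 }
    }
};

console.log(generate(tree)); // prints 6001600260030201
```

The output reads as PUSH1 1, PUSH1 2, PUSH1 3, MUL, ADD: a post-order walk of the tree, which suits a stack machine like the EVM.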

A diagram:

From Source Code to Bytecodes

To do:

  • Generate bytecode for the constructor: I’m thinking of generating this code at the end of the bytecode, to improve on the current Solidity implementation.
  • Complete type management: e.g., adding compiler instructions to add a uint8 to another uint8
  • Support import in source code files
  • Improve EVM compiler to have an optimization phase of the generated bytecode
  • etc…

Other outcomes

In the blog post I mentioned at the beginning, I wrote about other ideas to implement. One is to use the Solidity language as input and TRANSPILE to another language:

Generating Java or C# code from Solidity

One issue: this kind of transpiling usually implies abandoning the gas cost logic associated with EVM (Ethereum Virtual Machine) bytecodes. I’m exploring this kind of process (not linked yet with solcom) in the project SolidityCompiler.

Another path to explore is to compile EVM bytecodes to Java or C# programs. In this way, the gas cost could be preserved. I wrote some experiments in the project evm2code.
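
A minimal sketch of that idea, under several assumptions: it translates a three-opcode subset of EVM bytecode into lines of Java-like source, charging gas before each operation. It is not taken from evm2code. The function `toJavaLines` and the emitted helpers `useGas` and `stack` are hypothetical names, and the generated code ignores the EVM's 256-bit wraparound. The gas values (3 for PUSH1 and ADD, 5 for MUL) are the static costs of those opcodes.

```javascript
// Tiny opcode table: static gas cost plus the Java-like statement to emit.
// 'stack' and 'useGas' are hypothetical members of the generated program.
const opcodes = {
    0x01: { gas: 3, code: 'stack.push(stack.pop().add(stack.pop()));' },      // ADD
    0x02: { gas: 5, code: 'stack.push(stack.pop().multiply(stack.pop()));' }  // MUL
};

function toJavaLines(hex) {
    const bytes = Buffer.from(hex, 'hex');
    const lines = [];

    for (let pc = 0; pc < bytes.length; pc++) {
        const op = bytes[pc];

        if (op === 0x60) {
            // PUSH1: the next byte is the value to push
            if (pc + 1 >= bytes.length)
                throw new Error('Truncated PUSH1 at position ' + pc);

            pc++;
            lines.push('useGas(3); stack.push(BigInteger.valueOf(' + bytes[pc] + '));');
        }
        else if (opcodes[op])
            lines.push('useGas(' + opcodes[op].gas + '); ' + opcodes[op].code);
        else
            throw new Error('Unsupported opcode 0x' + op.toString(16));
    }

    return lines;
}
```

For the input `'6001600201'` (PUSH1 1, PUSH1 2, ADD) this returns three lines, each starting with a `useGas(...)` call, so the translated program would consume the same gas as the original bytecode for this subset.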

Next posts: more detailed description of generated AST, code generation when visiting the AST, implementing storage vs memory access, mapping implementation, string and variable array implementation, etc.

Angel “Java” Lopez