Decoupling Storage from Execution in RSK Blockchain
Over the past few years I have been programming a lot, using TDD (Test-Driven Development), to keep my design and development skills in shape. To me, TDD is a great workflow for discovering emergent design solutions in both simple and complex domains. One of my favorite domains is blockchain: I have written tons of code in Java, C# and JavaScript, and even Solidity smart contracts. The top example is my personal blockchain project in Java (see also my published posts).
I learned about blockchain thanks to my work at RSK. The company's flagship product is an Ethereum-like blockchain connected to the Bitcoin blockchain. The open source node is derived from the EthereumJ implementation (now deprecated).
One thing I learned from my personal experiments in blockchain implementation is that Ethereum-like smart contract storage COULD (and SHOULD) BE decoupled from block/transaction execution. I described part of my implementation in Building a Blockchain: Executing Smart Contracts.
The usual Ethereum implementation of smart contract storage uses a Trie (see Ethereum State Trie Architecture Explained for a general description). Implementing a Trie is beautiful work, and I have implemented it many times using TDD (including most of the code of a new Trie implementation for RSK that I wrote in 2016, completed and reviewed with the team, which replaced the original EthereumJ implementation).
Since then, I have thought that smart contracts could be executed WITHOUT knowing that a trie exists. That is, storage can be easily decoupled from execution.
I started a proposal for RSK in 2018 (see branch) to get rid of any coupling (the EthereumJ implementation was very convoluted in that area). But the project took another path that, in my opinion, didn't solve this issue: nothing in the transaction execution SHOULD know about trie storage. Also, I thought that a clearer implementation of how storage cells are used and accessed would be better for the project.
I wrote another proposal for RSK in 2019 (see branch) to have instances like ExecutionContext.java from my personal project. Only TopExecutionContext.java KNOWS about the existence of a trie (and even that could be improved), while ChildExecutionContext.java only manages an internal cache for accounts and storage values.
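To make the idea concrete, here is a minimal sketch of that layering. It is not the code from the RSK branch or from my personal project: the StateStore interface, the string keys and the method names are simplifications assumed only for illustration, and the class names just echo the ones mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the trie-backed state; only the top context sees it.
interface StateStore {
    String get(String account, String cell);
    void put(String account, String cell, String value);
}

// What the execution logic depends on: reads, writes, commit and rollback.
interface ExecutionContext {
    String getValue(String account, String cell);
    void setValue(String account, String cell, String value);
    void commit();
    void rollback();
}

// Shared cache logic: pending writes grouped by account, then by storage cell.
abstract class CachingExecutionContext implements ExecutionContext {
    private final Map<String, Map<String, String>> cache = new HashMap<>();

    protected abstract String readThrough(String account, String cell);
    protected abstract void writeThrough(String account, String cell, String value);

    @Override
    public String getValue(String account, String cell) {
        Map<String, String> cells = cache.get(account);
        if (cells != null && cells.containsKey(cell)) {
            return cells.get(cell);
        }
        return readThrough(account, cell);
    }

    @Override
    public void setValue(String account, String cell, String value) {
        cache.computeIfAbsent(account, a -> new HashMap<>()).put(cell, value);
    }

    @Override
    public void commit() {
        // Push every pending write one level down, then forget it.
        cache.forEach((account, cells) ->
            cells.forEach((cell, value) -> writeThrough(account, cell, value)));
        cache.clear();
    }

    @Override
    public void rollback() {
        cache.clear();
    }
}

// The only class that knows a backing store (the trie, in a real node) exists.
final class TopExecutionContext extends CachingExecutionContext {
    private final StateStore store;

    TopExecutionContext(StateStore store) { this.store = store; }

    @Override
    protected String readThrough(String account, String cell) { return store.get(account, cell); }

    @Override
    protected void writeThrough(String account, String cell, String value) { store.put(account, cell, value); }
}

// A nested call: it only talks to its parent context, never to the store.
final class ChildExecutionContext extends CachingExecutionContext {
    private final ExecutionContext parent;

    ChildExecutionContext(ExecutionContext parent) { this.parent = parent; }

    @Override
    protected String readThrough(String account, String cell) { return parent.getValue(account, cell); }

    @Override
    protected void writeThrough(String account, String cell, String value) { parent.setValue(account, cell, value); }
}

// In-memory store used only to exercise the sketch; a real node would wrap a trie here.
final class InMemoryStateStore implements StateStore {
    private final Map<String, String> data = new HashMap<>();

    @Override
    public String get(String account, String cell) { return data.get(account + "/" + cell); }

    @Override
    public void put(String account, String cell, String value) { data.put(account + "/" + cell, value); }
}

final class ExecutionContextDemo {
    public static void main(String[] args) {
        InMemoryStateStore store = new InMemoryStateStore();
        ExecutionContext top = new TopExecutionContext(store);
        ExecutionContext child = new ChildExecutionContext(top);

        child.setValue("account1", "cell0", "42");
        child.commit(); // now visible in the top context, not yet in the store
        top.commit();   // now written to the store
        System.out.println(store.get("account1", "cell0"));
    }
}
```

In this sketch the execution logic would only ever receive an ExecutionContext, so replacing the in-memory store with a trie-backed one touches TopExecutionContext and nothing else.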
My informal experiments also showed an improvement in execution time (see my comments in a pull request). They are INFORMAL, but they look promising enough to explore. My first thought: the performance improvement is due to this decoupling and to having a clear key-value cache, separated by account and by storage, which in most cases avoids calculating long trie keys. I should improve these experiments but, as you know: lack of time, so many ideas, only one life.
One more thing: this decoupling facilitates the migration to a journal of key values instead of a nested stack of execution contexts, avoiding the repeated retrieval of values through deeply nested smart contract calls. I will write a journal implementation in my personal project, to show that it is totally transparent to the execution logic.
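Until that implementation exists, here is a rough sketch of what a journal could look like, under my own assumptions: the JournaledStorage class, its method names and the "account/cell" string key are hypothetical and chosen only for illustration. There is one flat map with the current values and an undo log; a nested call takes a checkpoint, and a failed call reverts to it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical journal-backed storage: one flat map plus an undo log.
final class JournaledStorage {
    // One undo record: the key that was written and what it held before.
    private static final class Entry {
        final String key;
        final String previous;
        final boolean existed;

        Entry(String key, String previous, boolean existed) {
            this.key = key;
            this.previous = previous;
            this.existed = existed;
        }
    }

    private final Map<String, String> values = new HashMap<>();
    private final List<Entry> journal = new ArrayList<>();

    // Simplification: assumes account names never contain '/'.
    private static String key(String account, String cell) {
        return account + "/" + cell;
    }

    // A read is a single map lookup, whatever the call depth.
    String getValue(String account, String cell) {
        return values.get(key(account, cell));
    }

    void setValue(String account, String cell, String value) {
        String k = key(account, cell);
        journal.add(new Entry(k, values.get(k), values.containsKey(k)));
        values.put(k, value);
    }

    // Marks the start of a nested call; returns the journal position to revert to.
    int checkpoint() {
        return journal.size();
    }

    // Undoes every write made after the checkpoint, newest first.
    void revertTo(int checkpoint) {
        for (int i = journal.size() - 1; i >= checkpoint; i--) {
            Entry e = journal.remove(i);
            if (e.existed) {
                values.put(e.key, e.previous);
            } else {
                values.remove(e.key);
            }
        }
    }
}

final class JournalDemo {
    public static void main(String[] args) {
        JournaledStorage storage = new JournaledStorage();
        storage.setValue("account1", "cell0", "1");

        int mark = storage.checkpoint();           // a nested call starts
        storage.setValue("account1", "cell0", "2");
        storage.revertTo(mark);                    // the nested call failed

        System.out.println(storage.getValue("account1", "cell0"));
    }
}
```

A nested call that succeeds needs no extra work in this sketch: its journal entries simply stay in place, and an outer checkpoint can still undo them later.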
(The top image was taken from the Decoupling blog post.)
Angel “Java” Lopez