Building a Blockchain: Towards Beam Synchronization
I’m very enthusiastic about my personal open source project BlockchainJ, a blockchain written in Java using TDD. To me, it’s very interesting (and a lot of fun) to design, write and explore ideas and code using the Test-Driven Development workflow. I think that you can understand any problem if you write a solution to it. And to write a blockchain is a not trivial project, but also one that can be achieved. Every day, I wrote few lines of test and production code, to complete the task. Now, I can run many nodes, communicate them, mine blocks, execute transfer transactions, and some smart contract transactions. There are many things to complete, but in these years, I learned a lot from this assignment.
In a previous post I described the key value stores I’m using, and at the end of the post, I presented the remote storage: instead of having a local key value store, you can access the data from other nodes in the network. Using this feature, I could implement the execution of a block that comes from the network WITHOUT knowing the previous blocks in the blockchain. It could be described as a beam synchronization: the local node does not need to retrieve all the blocks in the blockchain to start operation. Also, I was thinking about these features to implement a light client.
In this post, I describe remote store in a bit more detail, and present a short code tests executing a block without having local accounts, contract codes and contract storage cells: they are provided by ANOTHER node transparently.
Remote Stores
The retrieve of a remote value associated with a key could be described in this diagram and steps description:
- Remote Store receives a request: given a key (byte array) retrieve the associated value (byte array)
- Remote Store sends a request to Send Processor, and waits for a future instance completion.
- Send Processor sends a request message to some of the peer nodes.
- One peer node sends the value in a message. KeyValueProcessor check the consistency (ie key as a hash correspond to the supplied value, in case the key value store is the Account Trie).
- Key Value Processor completes the pending future for Remote Store.
- Remote Store response with the required value.
In order to use remote stores in a transparent way, there are Dual Stores:
- Dual Store receives a request: given a key (byte array) retrieve the associated value (byte array)
- It checks the existence of the corresponding value consulting Local Store. If it exists, jump to step 4.
- It retrieves the value using the Remote Store. If the value is retrieved, it is ALSO SAVED into Local Store. In this way, the local store is populated incrementally, using the most required data first.
- Remote Store response with the required value.
A Beam Execution
This is a fragment of a test code that executes a block using dual stores:
The key piece is the construction of the BlockExecutor: it is injected with dual stores and store providers (the creation of these store is not included in the above fragment). There are two NodeProcessors that are communicated using peer to peer messages. The dual stores provided to the BlockExecutor have: local stores for node processor 2, and remote stores that access the rest of the network: in this case, the node processor 1.
When the BlockExecutor executes a block that is totally new in the context of node processor 2, it retrieves all the information needed from node processor 1, transparently. After the execution, the account trie nodes and any storate trie nodes retrieved are now available in the local stores of node processor 2. Then, along new blocks are received, they are executed faster, as the local stores (account trie, storage trie and key value code store) have more values, usually the most used in this range of the blockchain.
In this way, the node can start to validate blocks without dying in a long synchronization process. I could use partial merkle proof to assert the value that correspond to a given key, but it is interesting to collect more data about this alternative approach: one thing is that each value retrieved adds information to the local for FUTURE executions. Even shortened the size of the partial proof (maybe the first part is already know by the local node), this algorithm allows to detect quickly if some part of the required path IS ALREADY known by the node. It employs more bandwidth in messages, to be analyzed.
As I confessed, I had lot of fun implementing this feature.
Angel “Java” Lopez