A Simple Trie Pruning Algorithm

In Ethereum type nodes (such as RSK or my personal blockchain project) the state of the network (blocks, accounts, code and storage of contracts, …) is generally saved on key value databases. Some of those databases contain the nodes of an structure called the trie. Read Modified Merkle Patricia Trie — How Ethereum saves a state.

The status of the blockchain accounts (such as nonce, balance, associated code) and the status of the contracts (their storage) are saved in tries. This state is generated every time a new block is executed, and it grows with the blockchain. While there are use cases for having all that information in the node, it may take up a lot of disk space (gigabytes). An alternative is to keep only the information that will be used for the next blocks.

Two years ago I wrote an algorithm to prune the status of account and storage tries within the RSK project. But I think that the algorithm can be improved in two ways: to make it easier and to avoid long pausing the node during the process.

First, the trie is a tree-like structure. What matters in consensus is knowing the top of the state tree (actually its hash), which contains in its branches the entire state at the end of the execution of that block.

A very simplified trie structure: a top node and its descendants, with many levels. Only the leaf nodes has associated values (at least in Ethereum and in my personal project)

When one of the trie values is updated (for example, the status of an account when changing its balance), NEW nodes are generated in the trie, keeping the previous tree without changes. Each tree is immutable.

New value and nodes can be added without changing the original trie

Usually the nodes of a trie are stored into a key value database in disk. It is very common to use LevelDB or similar key value stores. In this algorithm description I’m going to call that database (in LevelDB, a dedicated directory) a bucket.

A database with many tries, a bucket in this algorithm description

Something in the environment decides that it is time to execute the pruning, for example, the database has grown to more than half a mega or 10,000 blocks have been processed since the last pruning. When that moment arrives, a new bucket (database) is created, and from there, the new nodes that are generated in the process of new incoming blocks are saved in the new bucket.

When the pruning algorithm starts, a new bucked is created. The new generated nodes are saved in that second bucket

More precisely, the steps are:

  • Create the new bucket
  • Stop the process of new blocks
  • Select the root nodes that will survive, for example the last 1000 blocks of the current best blockchain, and all of the siblings from the current height minus 50
  • Restart the new blocks process
  • The new generated nodes are written in the new bucket

The only time the block processing is suspended is in the second step: some root nodes will be selected as the top nodes to be kept: there are the top nodes of the alive tries; this algorithm is similar to garbage collector.

In this way only for a moment the process of the node stops. It is important to stop the process, to have a still photo of the siblings from the last heights. Note that if the best blockchain is at height 100_000, not only are the top nodes of the siblings selected from 99_950 to 100_000, but ALL the siblings are selected FROM 99_950 (may be there areknown blocks with a height greater than 100_000 but they have not been chosen as part of the best blockchain in the consensus algorithm).

Once the pruning begins, the tries corresponding to the selected top nodes are copied to the new bucket. You can visit each node in each selected trie, using a stack or queue to avoid recursion, and placing any child node that should be copied to the new bucket into that queue.

Each selected trie will be copy to the new bucket

Attention: it is important that if a new trie is generated by a new block, DO NOT TRY to avoid saving one of your new nodes BECAUSE it is in the old bucket. All nodes must be recorded. Because if we do not save that node into the new bucket, trusting that it will go from the old bucket to the new one during the process of the selected top nodes, it may not be so because it may not be one of the nodes that are reached from the selected top nodes.

When the queue of nodes to process is empty, then the process is finished. All new tries, and the tries corresponding to the selected top nodes will be recorded in the new bucket. The original is deleted, and the normal process is resumed.

While all this copy is running, the block process may continue to work. When this process reads a node, the internal reading process will read the new bucket first, and if it can’t find the requested node there, it goes to the old bucket (or vice versa).

Extended Bucket

I mentioned that on Ethereum and in my personal project, there is a separate trie for storing contracts.

An account could have a pointer (a hash) to the top node of the trie that represents the contract storage associated with such account. So, there is a trie for accounts, and another trie for the storage (two key value database), but the concept of bucket is the same.

This doesn’t complicate the algorithm above: just think of the bucket as an extended bucket, which has two databases (it’s what I did in the original pruning algorithm I wrote in 2018). I could add that I prefer the solution of having two tries instead of one, but that would give for another future post.

Stop and Resume

If the process is interrupted during the copy, when relaunching the node we can see that there are two buckets, and continue from there, again selecting the stops to copy, and choosing one of the buckets as the new one. So the algorithm is resilient to interruptions.

More than two buckets

Another alternative is to have two buckets during the normal process. Then, during the pruning process, a third bucket is created to store the upcoming new nodes, and the selected top nodes are copied, if needed, from the first bucket to the second bucket. Is like generation garbage collector. But I think it is not really needed in a first implementation.

Conclusion

There are many solutions to this problem, but I think that I could implement this simple approach in my personal project. Only if the solution is not satisfactory, I could switch to other more sophisticated algorithm.

Angel “Java” Lopez