Building a Blockchain: Serialization

In my previous post, I described the main entities in my personal blockchain project in Java. Now let’s see how those entities are serialized to byte arrays. Serialization is important for these reasons:

  • it influences the consensus (e.g., the hash of a block is calculated on the serialized block header)
  • it is used in message propagation between nodes

So a clearly defined serialization is required in the base protocol.
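To illustrate the first point, here is a minimal sketch showing that two different byte representations of the same logical value produce different hashes. The class name is hypothetical, and SHA-256 is an assumption chosen only because it ships with the JDK; it is not necessarily the hash function used in the project.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class SerializationHashSketch {
    // Hashes the serialized bytes and returns the digest as lowercase hex.
    // SHA-256 is an assumption made for this illustration.
    static String sha256Hex(byte[] serialized) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(serialized)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The same logical value (the number 1) serialized in two ways
        byte[] minimal = { 0x01 };
        byte[] padded = { 0x00, 0x01 };

        System.out.println(sha256Hex(minimal));
        System.out.println(sha256Hex(padded));
        // Different bytes give different hashes, so nodes would disagree
        System.out.println(sha256Hex(minimal).equals(sha256Hex(padded))); // prints false
    }
}
```

If two nodes serialized the same block header differently, they would compute different block hashes and never agree on the chain.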

The serialization I adopted for my personal project is the one specified in the Ethereum Yellow Paper: Recursive Length Prefix (RLP). A more detailed description is available at Ethereum Wiki: RLP. On that page, I read:

The purpose of RLP (Recursive Length Prefix) is to encode arbitrarily nested arrays of binary data, and RLP is the main encoding method used to serialize objects in Ethereum. The only purpose of RLP is to encode structure; encoding specific data types (eg. strings, floats) is left up to higher-order protocols; but positive RLP integers must be represented in big endian binary form with no leading zeroes (thus making the integer value zero be equivalent to the empty byte array). Deserialised positive integers with leading zeroes must be treated as invalid. The integer representation of string length must also be encoded this way, as well as integers in the payload. Additional information can be found in the Ethereum yellow paper Appendix B.

However, I did not adopt the convention that the integer zero is represented by the empty byte array. To represent a zero value, I use a byte array containing a single zero byte. That way, I have a clear distinction between null (empty byte array) and zero (one zero byte). Having written the code using TDD (Test-Driven Development), I could change this decision at any moment, with confidence and no pain.
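A minimal sketch of this integer convention, assuming non-negative long values: big endian, no leading zero bytes, and zero as a single zero byte. The class and method names are hypothetical, and rejecting the empty array in fromBytes is my assumption; this is not the code in RLPEncoder.java.

```java
final class UnsignedLongBytesSketch {
    // Big-endian bytes with no leading zeros; zero itself is one 0x00 byte
    static byte[] toBytes(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("Negative value");
        }
        if (value == 0) {
            return new byte[] { 0 };
        }
        // Number of bytes needed to hold the significant bits
        int length = (64 - Long.numberOfLeadingZeros(value) + 7) / 8;
        byte[] bytes = new byte[length];
        for (int i = length - 1; i >= 0; i--) {
            bytes[i] = (byte) value;
            value >>>= 8;
        }
        return bytes;
    }

    static long fromBytes(byte[] bytes) {
        // Assumption: the empty array means null, so it is not a valid long here
        if (bytes.length == 0 || bytes.length > 8) {
            throw new IllegalArgumentException("Invalid length");
        }
        if (bytes.length > 1 && bytes[0] == 0) {
            throw new IllegalArgumentException("Leading zero byte");
        }
        long value = 0;
        for (byte b : bytes) {
            value = (value << 8) | (b & 0xff);
        }
        if (value < 0) {
            throw new IllegalArgumentException("Value does not fit in a non-negative long");
        }
        return value;
    }
}
```

With this sketch, toBytes(0) returns one zero byte and toBytes(1024) returns the two bytes 0x04 0x00.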

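RLP encodes byte arrays and lists of byte arrays. In standard RLP, a single byte below 0x80 is its own encoding. A byte array of up to 55 bytes is prefixed with one byte, 0x80 plus its length. A longer byte array is prefixed with 0xb7 plus the number of bytes needed to hold its length, followed by the length itself in big endian.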

Notice that the size overhead of the encoding is, in general, low. Lists of RLP-encoded byte arrays follow similar conventions, with their own length prefixes. Note that a list must not contain raw byte arrays: its elements must themselves be RLP-encoded byte arrays.
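The following is a minimal sketch of standard RLP encoding for byte arrays and lists. The class name and the private helpers are hypothetical; this is not the project's RLP.java, only an illustration of the prefix rules.

```java
import java.io.ByteArrayOutputStream;

final class RlpEncodeSketch {
    // Encodes a raw byte array
    static byte[] encode(byte[] bytes) {
        // A single byte below 0x80 is its own encoding
        if (bytes.length == 1 && (bytes[0] & 0xff) < 0x80) {
            return new byte[] { bytes[0] };
        }
        return withPrefix(0x80, bytes);
    }

    // Encodes a list; every element must already be RLP encoded
    static byte[] encodeList(byte[]... encodedItems) {
        ByteArrayOutputStream payload = new ByteArrayOutputStream();
        for (byte[] item : encodedItems) {
            payload.write(item, 0, item.length);
        }
        return withPrefix(0xc0, payload.toByteArray());
    }

    // base is 0x80 for byte arrays and 0xc0 for lists
    private static byte[] withPrefix(int base, byte[] payload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (payload.length <= 55) {
            // Short form: one prefix byte carrying the length
            out.write(base + payload.length);
        } else {
            // Long form: prefix carries the size of the length, then the length
            byte[] length = lengthBytes(payload.length);
            out.write(base + 55 + length.length);
            out.write(length, 0, length.length);
        }
        out.write(payload, 0, payload.length);
        return out.toByteArray();
    }

    // Big-endian length with no leading zero bytes
    private static byte[] lengthBytes(int length) {
        int size = (32 - Integer.numberOfLeadingZeros(length) + 7) / 8;
        byte[] bytes = new byte[size];
        for (int i = size - 1; i >= 0; i--) {
            bytes[i] = (byte) length;
            length >>>= 8;
        }
        return bytes;
    }
}
```

For example, the three bytes of "dog" encode to 0x83 followed by those three bytes, and the list of encoded "cat" and encoded "dog" encodes to 0xc8 followed by both encoded items. A single zero byte, the representation of integer zero in my convention, encodes to itself.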

To encode and decode these arrays, I wrote RLP.java and RLPTest.java. To encode some simple values (hashes, long integers, Coin values, etc), I have RLPEncoder.java.
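Decoding reverses the prefix rules. Below is a minimal sketch, assuming standard RLP prefixes and doing only basic validation. The class and method names are hypothetical and this is not the project's RLP.java. In this sketch, decodeList returns elements that are still RLP encoded, so each one is decoded separately afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RlpDecodeSketch {
    // Decodes one RLP-encoded byte array back to its raw bytes
    static byte[] decode(byte[] encoded) {
        if (encoded.length == 0 || (encoded[0] & 0xff) >= 0xc0) {
            throw new IllegalArgumentException("Expected an encoded byte array");
        }
        int[] span = payloadSpan(encoded, 0);
        if (span[0] + span[1] != encoded.length) {
            throw new IllegalArgumentException("Length mismatch");
        }
        return Arrays.copyOfRange(encoded, span[0], span[0] + span[1]);
    }

    // Splits an RLP list into its elements, each still RLP encoded
    static byte[][] decodeList(byte[] encoded) {
        if (encoded.length == 0 || (encoded[0] & 0xff) < 0xc0) {
            throw new IllegalArgumentException("Expected an encoded list");
        }
        int[] span = payloadSpan(encoded, 0);
        int end = span[0] + span[1];
        if (end != encoded.length) {
            throw new IllegalArgumentException("Length mismatch");
        }
        List<byte[]> items = new ArrayList<>();
        int offset = span[0];
        while (offset < end) {
            int[] item = payloadSpan(encoded, offset);
            int itemEnd = item[0] + item[1];
            if (itemEnd > end) {
                throw new IllegalArgumentException("Element exceeds list payload");
            }
            items.add(Arrays.copyOfRange(encoded, offset, itemEnd));
            offset = itemEnd;
        }
        return items.toArray(new byte[0][]);
    }

    // Returns {payloadOffset, payloadLength} for the item starting at offset
    private static int[] payloadSpan(byte[] data, int offset) {
        int prefix = data[offset] & 0xff;
        if (prefix < 0x80) {
            return new int[] { offset, 1 }; // the byte is its own payload
        }
        int shortLength = prefix - (prefix < 0xc0 ? 0x80 : 0xc0);
        if (shortLength <= 55) {
            return new int[] { offset + 1, shortLength };
        }
        int lengthOfLength = shortLength - 55;
        if (lengthOfLength > 4 || offset + lengthOfLength >= data.length) {
            throw new IllegalArgumentException("Unsupported or truncated length");
        }
        int length = 0;
        for (int i = 1; i <= lengthOfLength; i++) {
            length = (length << 8) | (data[offset + i] & 0xff);
        }
        if (length < 0 || length > data.length) {
            throw new IllegalArgumentException("Invalid length");
        }
        return new int[] { offset + 1 + lengthOfLength, length };
    }
}
```

Note that this sketch copies each element into a new array, which keeps the code simple at the cost of extra allocations.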

To encode an entity, I first encode its properties to byte arrays (using conventions I adopted, similar to Ethereum's except for the treatment of zero integers), then encode these byte arrays as RLP-encoded byte arrays, and finally combine them all into an RLP list.

I have a dedicated encoder class for each entity. For example, to encode and decode a Transaction there is a TransactionEncoder.java. Its encode code:

public static byte[] encode(Transaction transaction) {
    byte[] rlpSender = RLPEncoder.encodeAddress(transaction.getSender());
    byte[] rlpReceiver = RLPEncoder.encodeAddress(transaction.getReceiver());
    byte[] rlpValue = RLPEncoder.encodeCoin(transaction.getValue());
    byte[] rlpNonce = RLPEncoder.encodeUnsignedLong(transaction.getNonce());
    byte[] rlpData = RLP.encode(transaction.getData());
    byte[] rlpGas = RLPEncoder.encodeUnsignedLong(transaction.getGas());
    byte[] rlpGasPrice = RLPEncoder.encodeCoin(transaction.getGasPrice());

    return RLP.encodeList(rlpSender, rlpReceiver, rlpValue, rlpNonce, rlpData, rlpGas, rlpGasPrice);
}

Its decode code:

public static Transaction decode(byte[] encoded) {
    byte[][] bytes = RLP.decodeList(encoded);

    if (bytes.length != 7)
        throw new IllegalArgumentException("Invalid transaction encoding");

    Address sender = RLPEncoder.decodeAddress(bytes[0]);
    Address receiver = RLPEncoder.decodeAddress(bytes[1]);
    Coin value = RLPEncoder.decodeCoin(bytes[2]);
    long nonce = RLPEncoder.decodeUnsignedLong(bytes[3]);
    byte[] data = ByteUtils.normalizeBytesToNull(RLP.decode(bytes[4]));
    long gas = RLPEncoder.decodeUnsignedLong(bytes[5]);
    Coin gasPrice = RLPEncoder.decodeCoin(bytes[6]);

    return new Transaction(sender, receiver, value, nonce, data, gas, gasPrice);
}

Pending Work

All the RLP logic was written using TDD (Test-Driven Development), so each use case is tested. Still, I could add tests against an RLP test suite (inputs and expected outputs) to validate the implementation. Also, the current implementation copies many byte arrays; it could be improved by using byte slices, as in Go or Rust.
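Java has no built-in slice type for arrays, but java.nio.ByteBuffer can act as a view over part of an array without copying it. A minimal sketch of that idea follows; the class and method names are hypothetical, and this is one possible approach, not a change already made in the project.

```java
import java.nio.ByteBuffer;

final class ByteSliceSketch {
    // Returns a read-only view of data[offset .. offset + length) without copying
    static ByteBuffer slice(byte[] data, int offset, int length) {
        return ByteBuffer.wrap(data, offset, length).slice().asReadOnlyBuffer();
    }

    public static void main(String[] args) {
        // RLP encoding of "dog": one prefix byte, then the payload
        byte[] encoded = { (byte) 0x83, 'd', 'o', 'g' };

        // View the payload in place instead of copying it
        ByteBuffer payload = slice(encoded, 1, 3);
        System.out.println(payload.remaining());    // prints 3
        System.out.println((char) payload.get(0));  // prints d
    }
}
```

The trade-off is that the view shares the underlying array: a later change to the array is visible through every slice, so the decoder must treat the input as immutable.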

Upcoming posts: stores, internode messaging, transaction and block execution, EVM contracts execution, etc.

Angel “Java” Lopez