Trie

A trie is a data structure that behaves like a dictionary of key-values. Any data can be encoded as binaries in a key-value pair. The trie takes care of hashing the data and deriving the Merkle root. Internally, the key-values of a trie are ordered lexicographically by their keys, which form a prefix tree, as follows:

Trie Dot Graph

No matter what order the key-values are put in, the same bag of key-values will always result in the same trie with the same Merkle root. This determinism is known as commutative and associative of data hashing, which allows a trie to be built incrementally with efficiency. The inserts, lookups, updates, and deletes of key-values in a trie are all in O(log(n)). Thus, a new trie can be easily derived by modifying an existing trie. In storage-wise, each trie node is stored in a key-value store with the node's hash as the key and node's data as the value. This makes the trie storage agnostic and flexible enough to support all-sorts of key-value stores, such as embedded databases like LevelDB or blob storages like S3. This also makes modifying an existing trie memory and computation efficient, because only the modified trie nodes need to be loaded from the persistent store, and new nodes need to be saved back.

In Proofable, we extended the trie to be an all-in-one solution package - Proofable certificate. When a trie is updated, a new trie root will be created, but previous trie node data are still there and immutable. The series of trie roots resembles the history of the trie data change. A trie proof (anchoring trie path of the certificate) can be created for each root, which means the hash of the root will be anchored to a blockchain of the user's choice permanently. Once the anchoring is confirmed, all the data contained in the trie at the given root will be mathmatically/cryptographically authenticated, i.e. it will be able to claim that the trie data exist exactly like this at the time of the confirmation. A key-values proof (sub-certificate) can be also extracted from a trie to independently prove a subset of the key-values the user is interested in, e.g. claiming that two key-values of particular interest exist exactly like this at the time of the confirmation. When all manipulations are done, the trie can be exported by the user and put in a safe place they have full control. Later on, the trie can be imported back to do further manipulations.

Using the trie as the basis for Proofable and ProvenDB has the following advantages over more simplistic approaches:

  • Very large amounts of data can be anchored to a blockchain in a single transaction. Only the root of a trie needs to be included in the blockchain transaction. The trie structure itself forms the link between that transaction and the individual elements being proved.
  • The trie supports hierarchies of proof – we can generate proofs for a specific directory for instance, as well as proofs for all the items in the directory.
  • The trie can be built incrementally, reducing the burden of recalculating an entire proof when new elements are added.