Archivers - Solana’s Solution to Petabytes of Blockchain Data Storage
Learn more about 1 of the 8 innovations that make Solana the most performant blockchain in the world.
Solana is the most performant permissionless blockchain in the world. On current iterations of the Solana Testnet, a network of 200 physically distinct nodes supports a sustained throughput of more than 50,000 transactions per second when running with GPUs. Achieving this requires several optimizations and new technologies, and the result is a breakthrough in network capacity that signals a new phase in blockchain development.
There are 8 key innovations that make the Solana network possible:
- Proof of History (PoH): a clock before consensus;
- Tower BFT: a PoH-optimized version of PBFT;
- Turbine: a block propagation protocol;
- Gulf Stream: a mempool-less transaction forwarding protocol;
- Sealevel: a parallel smart contracts runtime;
- Pipelining: a Transaction Processing Unit for validation optimization;
- Cloudbreak: a horizontally scaled accounts database; and
- Archivers: a distributed ledger store.
In this blog post, we’ll explore Archivers, Solana’s distributed ledger store for petabytes of blockchain data. We were introduced to Proof of Replication (PoRep) by Filecoin in 2017. In 2018, we built our own version of PoRep for Solana using a VDF and optimized it for batch verification.
At full capacity, the Solana network will generate 1 Gbps * 365 days ≈ 4 petabytes of data every year. If every node in the network were required to store all that data, network membership would be limited to a centralized few that can maintain that kind of storage capacity. Our Proof of History technology can be leveraged to mitigate this problem by allowing a fast-to-verify implementation of Proof of Replication and enabling a BitTorrent-esque distribution of the ledger across millions of Archiver nodes around the world. Archivers are not consensus participants and have very low hardware requirements.
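The yearly figure above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the quoted rate is a sustained 1 gigabit per second in decimal units; the constant and function names are illustrative and not taken from any Solana codebase.

```python
# Back-of-the-envelope ledger growth at a sustained line rate.
BITS_PER_SECOND = 1_000_000_000        # assumption: 1 gigabit per second, decimal units
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds


def yearly_ledger_bytes(bits_per_second: int = BITS_PER_SECOND) -> int:
    """Bytes written per year if the network runs at full rate all year."""
    return bits_per_second * SECONDS_PER_YEAR // 8


petabytes = yearly_ledger_bytes() / 1e15
print(f"{petabytes:.2f} PB per year")  # prints "3.94 PB per year"
```

The result rounds to the 4 petabytes quoted above. Reading the rate as gigabytes per second instead would give a figure eight times larger.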
At a high level, the Solana Archiver network functions as follows: Archivers must signal to the network that they have X bytes of space available for storing data. Periodically, the network divides the ledger history into pieces to target some replication rate (currently we’re expecting a target rate around 100x) and fault tolerance (achieved with erasure coding), based on the number of Archiver identities and the total available storage of Archivers. Once Archiver-to-data assignments are made, each Archiver downloads its assigned data from consensus validators. Periodically, Archivers will be challenged to prove they’re storing data, at which point they must complete a proof of replication (PoRep). Archivers are awarded ~3% of inflation for their efforts.
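To make the assignment step concrete, here is a deliberately simplified sketch. It is not the network's actual algorithm: it ignores erasure coding and per-Archiver capacity, and the function name, the round-robin policy, and the tiny replication factor are all assumptions chosen for illustration.

```python
from typing import Dict, List


def assign_segments(num_segments: int, archivers: List[str],
                    replication: int) -> Dict[str, List[int]]:
    """Give every ledger segment to `replication` distinct Archivers, round-robin.

    Hypothetical policy: the real network also weighs storage capacity
    and erasure coding, which this sketch leaves out.
    """
    if not 0 < replication <= len(archivers):
        raise ValueError("replication must be between 1 and the number of Archivers")
    assignments: Dict[str, List[int]] = {name: [] for name in archivers}
    for segment in range(num_segments):
        for copy in range(replication):
            # Consecutive indices modulo len(archivers) are distinct while
            # replication <= len(archivers), so no Archiver holds a segment twice.
            holder = archivers[(segment * replication + copy) % len(archivers)]
            assignments[holder].append(segment)
    return assignments


plan = assign_segments(num_segments=4, archivers=["a", "b", "c"], replication=2)
print(plan)
```

With a target replication rate near 100x, the same loop would simply run with `replication=100` over a much larger list of Archiver identities.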
Proof of Replication in Further Depth
The basic idea of Proof of Replication is to encrypt a dataset with a public symmetric key using CBC encryption, and then hash the encrypted dataset. This method is explained in detail in Filecoin’s Proof of Replication Technical Report. Unfortunately, the simple version of this approach is vulnerable to attack.
For example, a dishonest storage node can stream the encryption and delete the data as it is hashed. The simple solution is to force the hash to be computed over the encrypted data in reverse, or perhaps in a random order. This ensures that all the data is present while the proof is generated, but it also requires the Validator to hold the entire encrypted dataset to verify every proof of every identity. The space required to validate becomes (Number of CBC keys) * (data size).
We improve on this approach by randomly sampling the encrypted blocks faster than they can be encrypted, and recording the hash of those samples in the PoH ledger. Thus the blocks stay in exactly the same order for every PoRep, and verification can stream the data and verify all the proofs in a single batch. This way, we can verify multiple proofs concurrently, each one on its own CUDA core.
With the current generation of graphics cards, the Solana network can support up to 1500 replication identities or symmetric keys per GPU card. The total space required for verification is (2 CBC blocks) * (Number of CBC keys), with a core count equal to (Number of CBC keys). A CBC block is expected to be 1MB in size.
Next, we have to construct a game between Validators and Archivers that ensures that Archivers are generating proofs, and that Validators are actually verifying PoReps.
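The two space formulas can be compared directly. The sketch below plugs in the figures quoted in the text (1500 keys per GPU, 1MB CBC blocks, read here as decimal megabytes); the 1 GB slice size is a hypothetical value chosen only to make the comparison concrete.

```python
CBC_BLOCK_BYTES = 1_000_000  # "A CBC block is expected to be 1MB" (decimal MB assumed)
KEYS_PER_GPU = 1500          # replication identities per GPU card, from the text


def naive_verification_bytes(num_keys: int, data_bytes: int) -> int:
    """(Number of CBC keys) * (data size): every encrypted copy kept in full."""
    return num_keys * data_bytes


def sampled_verification_bytes(num_keys: int,
                               block_bytes: int = CBC_BLOCK_BYTES) -> int:
    """(2 CBC blocks) * (Number of CBC keys): streaming verification."""
    return 2 * block_bytes * num_keys


slice_bytes = 1_000_000_000  # hypothetical 1 GB ledger slice, not from the source
print(naive_verification_bytes(KEYS_PER_GPU, slice_bytes) / 1e9, "GB")  # 1500.0 GB
print(sampled_verification_bytes(KEYS_PER_GPU) / 1e9, "GB")             # 3.0 GB
```

Under these assumptions the streaming approach needs about 3 GB regardless of slice size, while the naive approach grows linearly with the data being proven.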
To begin producing PoReps of the ledger, the Archiver client does the following:
- Clients sign a PoH hash at a regular period
- Signature is used as the source of randomness to pick a specific slice of the ledger
- Signature is used to create a symmetric CBC key, and the client encrypts the slice of the ledger with the key.
Since each client signs the same PoH hash, the signatures are randomly distributed between all the clients. Clients then continuously sample the encrypted slice:
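The three setup steps above can be sketched as follows. This is an illustration under stated assumptions, not the Archiver client's code: HMAC-SHA256 stands in for a real public-key signature, the helper names and the key-derivation label are hypothetical, and the CBC encryption itself is left to a proper cryptography library.

```python
import hashlib
import hmac


def sign(secret_key: bytes, poh_hash: bytes) -> bytes:
    """Stand-in for a real signature scheme; HMAC is used for illustration only."""
    return hmac.new(secret_key, poh_hash, hashlib.sha256).digest()


def pick_slice(signature: bytes, num_slices: int) -> int:
    """Use the signature as the source of randomness to choose a ledger slice."""
    return int.from_bytes(signature[:8], "little") % num_slices


def derive_cbc_key(signature: bytes) -> bytes:
    """Derive a 32-byte symmetric key from the signature (hypothetical derivation)."""
    return hashlib.sha256(b"cbc-key" + signature).digest()


poh_hash = hashlib.sha256(b"example PoH tick").digest()  # placeholder PoH hash
signature = sign(b"archiver-secret", poh_hash)
slice_index = pick_slice(signature, num_slices=1024)
cbc_key = derive_cbc_key(signature)
print(slice_index, cbc_key.hex())
```

Because the slice index and the key both come from the signature, two Archivers signing the same PoH hash with different keypairs end up with different slices and different encryption keys.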
- Clients sign a PoH hash at a regular period
- Signature is used as the source of randomness to sample 1 byte per 1MB of the slice.
- Samples are hashed with SHA256
All the clients are forced to sign the same PoH hash value. Since the signature is tied to PoH, the resulting hash of samples is unique to that point in time and to that specific replication.
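The sampling loop described above might look like the sketch below. The way offsets are derived from the signature is an assumption made for illustration (the text only says the signature is the source of randomness), and the signature here is a placeholder byte string rather than a real one.

```python
import hashlib
from typing import List

SAMPLE_STRIDE = 1_000_000  # one sampled byte per 1MB of the encrypted slice


def sample_offsets(length: int, signature: bytes,
                   stride: int = SAMPLE_STRIDE) -> List[int]:
    """Pick one pseudo-random byte offset inside each stride-sized window."""
    offsets = []
    for index, start in enumerate(range(0, length, stride)):
        window = min(stride, length - start)  # the last window may be short
        seed = hashlib.sha256(signature + index.to_bytes(8, "little")).digest()
        offsets.append(start + int.from_bytes(seed[:8], "little") % window)
    return offsets


def sample_proof(encrypted: bytes, signature: bytes,
                 stride: int = SAMPLE_STRIDE) -> bytes:
    """Hash the sampled bytes with SHA256 to form the proof."""
    hasher = hashlib.sha256()
    for offset in sample_offsets(len(encrypted), signature, stride):
        hasher.update(encrypted[offset:offset + 1])
    return hasher.digest()


encrypted_slice = bytes(range(16))  # toy stand-in for an encrypted ledger slice
signature = hashlib.sha256(b"placeholder signature").digest()
proof = sample_proof(encrypted_slice, signature, stride=4)
print(proof.hex())
```

A Validator holding the same encrypted data and the same signature can recompute the digest and compare it, which is what makes batch verification a streaming operation.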
Validators in turn check the clients’ proofs:
- Validator declares how many PoReps it can verify, based on number of GPU cores
- Periodically, Validators sign a PoH hash
- The signature is used to select a slice of the ledger to verify, and a mask to select which samples to verify up to the capacity of the validator
- Validator uploads the proofs that failed verification
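The Validator steps above can be sketched in the same spirit. Everything here is hypothetical: the text does not specify how the mask is built, so this example ranks proofs by a hash of the Validator's signature and the proof identifier, then checks as many as the declared capacity allows. The `expected` values stand for proofs the Validator recomputes from its own copy of the data.

```python
import hashlib
from typing import Dict, List


def select_proofs(signature: bytes, proof_ids: List[str], capacity: int) -> List[str]:
    """Choose up to `capacity` proofs, in an order fixed by the signature."""
    ranked = sorted(
        proof_ids,
        key=lambda pid: hashlib.sha256(signature + pid.encode()).digest(),
    )
    return ranked[:capacity]


def failed_proofs(signature: bytes, submitted: Dict[str, bytes],
                  expected: Dict[str, bytes], capacity: int) -> List[str]:
    """Return the selected proofs that do not match the recomputed value."""
    chosen = select_proofs(signature, sorted(submitted), capacity)
    return [pid for pid in chosen if submitted[pid] != expected.get(pid)]


signature = hashlib.sha256(b"validator signature placeholder").digest()
expected = {"archiver-1": b"\x01", "archiver-2": b"\x02", "archiver-3": b"\x03"}
submitted = {**expected, "archiver-2": b"\xff"}  # one Archiver submits a bad proof
print(failed_proofs(signature, submitted, expected, capacity=3))  # ['archiver-2']
```

With a capacity smaller than the number of submissions, only a signature-dependent subset is checked, so an Archiver cannot predict in advance which proofs will be audited.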
A client can challenge a Validator for a failed proof by fishing for lazy Validators. To prevent grinding attacks, clients must use the same keypair identity continuously. To prevent spam, all the messages in the protocol incur transaction fees. Archivers earn rewards based on the number of successfully submitted proofs. Validators earn a stake-weighted reward for verifying proofs, and fishermen earn a reward by taking a Validator’s slashed coins when they publish a proof of a fake proof.
Archivers, alongside innovations like Proof of History, Sealevel, and Gulf Stream, combine to create the most performant blockchain in the world. Solana’s testnet is live today. You can see it at https://explorer.solana.com. For cost purposes, we are only running a handful of nodes. However, we have spun it up on many instances to over 200 physically distinct nodes (not on shared hardware) across 23 data centers on AWS, GCE, and Azure for benchmarking.
Solana will soon launch a public beta, Tour de SOL (analogous to Cosmos’s Game of Stakes), that incentivizes Validators to run nodes and challenges the public at large to test the limits of the Solana network while earning tokens for doing so.