Tower BFT: Solana’s High Performance Implementation of PBFT
Understand 1 of 8 key technologies that make Solana the most performant blockchain in the world
Solana is the most performant permissionless blockchain in the world. On current iterations of the Solana Testnet, a network of 200 physically distinct nodes supports a sustained throughput of more than 50,000 transactions per second when running with GPUs. Achieving this requires the implementation of several optimizations and new technologies, and the result is a breakthrough in network capacity that signals a new phase in blockchain development.
There are 8 key innovations that make the Solana network possible:
- Proof of History (POH) — a clock before consensus;
- Tower BFT — a PoH-optimized version of PBFT;
- Turbine — a block propagation protocol;
- Gulf Stream — Mempool-less transaction forwarding protocol;
- Sealevel — Parallel smart contracts run-time;
- Pipelining — a Transaction Processing Unit for validation optimization
- Cloudbreak — Horizontally-Scaled Accounts Database; and
- Archivers — Distributed ledger store
In this blog post, we will explore Tower BFT, Solana’s custom implementation of PBFT that prefers liveness to consistency. Tower BFT leverages Solana’s PoH as a clock before consensus to reduce messaging overhead and latency .
“In order to provide aliveness, replicas must move to a new view, if they are unable to execute a request. However, it is important to maximize the period of time, when at least 2f + 1 non-faulty replicas are in the same view, and to ensure that this period of time increases exponentially until some requested operation executes” (Practical Byzantine Fault Tolerance, Miguel Castro and Barbara Liskov).
Solana implements a derivation of PBFT, but with one fundamental difference. Proof of History (PoH) provides a global source of time before consensus. Our implementation of PBFT uses the PoH as the network clock of time, and the exponentially-increasing time-outs that replicas use in PBFT can be computed and enforced in the PoH itself.
PoH is a Verifiable Delay Function implemented as a sequential hash function. We use a loose definition of a VDF, since verification requires (compute time)/(number of cores). The basic principles of how PoH works are as follows:
- Sha256 loops as fast as possible, such that each output is the next input.
- The loop is sampled, and the number of iterations and state are recorded.
The recorded samples represent the passage of time encoded as a verifiable data structure. In addition, this loop can be used to record events.
- Messages that reference any of the samples are guaranteed to have been created after the sample.
- Messages can be inserted into the loop and hashed together with the state. This guarantees that a message was created before the next insert.
This data structure guarantees both time and order of events embedded within, and this core idea is the basis of all of the major technical optimizations in Solana.
Stated another way: Imagine you are on an island, and a bottle floats by with a thumb drive inside. On that drive is the Solana PoH ledger. Using only the PoH ledger, you can compute the state of all the nodes in the network. For example, a node is considered failed if a vote for the ledger has not been recorded in the last X hashes. We can consider the ledger to be valid if over the last X hashes hashes a supermajority of the network that has signed validation messages.
- All the nodes that examine this data structure will compute the exact same result, without requiring any peer-to-peer communication.
- The PoH hash uniquely identifies that fork of the ledger; and
- A validation vote message is only valid if the PoH hash that it voted on is present in the ledger.
This brings us to voting and PBFT. Since the ledger itself works as a reliable network clock, we can encode the PBFT time-outs in the ledger itself.
1. A vote starts with a time-out of N hashes.
A Validator guarantees (with slashing) that once a vote for a PoH hash has been cast, the Validator will not vote for any PoH hash that is not a child of that vote, for at least N hashes.
2. The time-outs for all the predecessor votes double
To render operations more manageable, voting is restricted to a fixed period of hashes, which we call slots. Our goal for slots is the number of hashes that represents around 400ms. Every 400ms, the network has a potential rollback point, but every subsequent vote doubles the amount of real time that the network would have to stall before it can unroll that vote.
Imagine that each validator has voted 32 times in the last 12 seconds. The vote 12 seconds ago now has a timeout of 2³² slots, or roughly 54 years. Effectively, this vote will never be rolled back by the network. Whereas the most recent vote has a timeout of 2 slots, or about 800ms. As new blocks are added to the ledger, old blocks are increasingly likely to be confirmed because the number of slots old votes are committed to doubles every slot, or every 400ms.
Note that while this sounds like probabilistic finality in proof of work, it is not. Once ⅔ of validators have voted on some PoH hash, that PoH hash is canonicalized, and cannot be rolled back. This is distinct from proof of work, in which there is no notion of canonicalization.
To defend against being locked out from the rest of the network, each Validator ensures that they only vote if they see a supermajority of the network voting on the same ledger as well. Each Validator monitors when the timeout on an ancestor vote will increase past a predefined threshold — such as from 5 to 10 minutes — and ensures that a supermajority of the network has voted on a fork containing that vote. In practice, validators:
- Check if the supermajority has voted on a slot that will commit to a timeout of 10 minutes
- If not, don’t vote
So what happens to the network during a partition and timeouts actually start expiring?
- Any votes that have expired are cleared
- Timeouts double for ancestors if and only if the child has an equal timeout
For example, let’s consider a scenario in which the current timeouts are:
64, 32, 16, 8, 4, 2
If a validator stopped voting for 17 slots and voted again, the resulting timeouts for the validator would be:
64, 32, 2
It would take 4 more consecutive votes before all the ancestors double in timeouts again.
64, 32, 4, 2
64, 32, 8, 4, 2
64, 32, 16, 4, 2
Finally the 4th vote would double all the timeouts
128, 64, 32, 16, 8, 4, 2
This approach allows the network to continuously stream blocks without stalling the ledger until a supermajority observes the same ledger. Another notable aspect is that every participant in the network can compute the timeouts for every other participant without any P2P communications. This is what makes Tower BFT asynchronous.
We expect there to be many micro forks that are quickly discarded. When a Validator detects multiple forks, honest Validators compute the effective stake-weighted timeout of every fork and pick the heaviest one. Validator rewards are generated only for the votes that reach the 2³² timeout. Therefore, it is incentive compatible for validators to vote on top of the heaviest fork, since the fork with the largest amount of stake-weighted timeouts will generate the largest amount of rewards for the network.
Solana’s implementation of Tower BFT, alongside innovations like Proof of History, Proof of Replication, and Gulf Stream combine to create the most performant blockchain in the world. Solana’s testnet is live today. You can see it at https://testnet.solana.com. For cost purposes, we are only running a handful of nodes. However, we have spun it up on many instances to over 200 physically distinct nodes (not on shared hardware) across 23 data centers on AWS, GCE, and Azure for benchmarking.
Solana will soon launch a public beta incentivizing validators to run nodes via Tour de SOL — analogous to Cosmos’s Game of Stakes — that challenges the public at large to test the limits of the Solana network while earning tokens for doing so.