Solana Foundation Validator Health Report: March 2023

by Solana Foundation


The Solana validator network continues to thrive. With respect to metrics including node count, consensus nodes (block-producing validators), Nakamoto Coefficient, and node distribution and diversity, the Solana network improved significantly over the past year. It also made several advancements in resiliency in this period, particularly at the validator client level. Today, Solana is one of the largest proof of stake networks in the world by node count, and one of the most distributed, as measured by Nakamoto Coefficient.

Footnotes. Updated March 31, 2023.

One of the important measures of validator health isn’t a number, but how the validator network performs in a crisis. In one recent example, server provider Hetzner abruptly blocked Solana validators from using its services in November 2022. Over 1000 validators and 20% of active stake went offline in a matter of hours. The network remained fully online and performant during this transition. Within a few days, nearly all the affected validators were back online at different data centers. Within a few weeks, almost all of the deactivated stake was back online.

It’s important to note that Solana mainnet experienced a significant performance degradation on February 25. The validator community successfully restarted the network on February 26, 2023. The root cause remains under investigation and more information will be provided when available.

The above events, in addition to significant changes in the market in the past year, have led the Solana Foundation to broaden its evaluation of validator network health. This is expounded on below.

Evolutions in measuring the health of the validator network

The Foundation’s opinion on what makes a healthy validator network has evolved over the last six months, shaped both by data on validator network health and the crises that the industry faced last fall.

In particular, the Foundation has broadened its efforts on ensuring that the validator network is healthy at the software level. The community of core engineers working on the Solana validator client is working to reduce the chance that a single line of code in a single software program halts the network. In this vein, the Foundation has focused on supporting a network of core contributing developers from multiple organizations.

Some of the efforts to improve health at the software level over the past year include:

New Software Clients in Jito and Firedancer: Validators are computers that run the Solana validator client, which is the operating system of the Solana network. A single zero day or bug in a validator client could bring down an entire network if there’s no “backup” software for the network to run on. An important way to hedge against this risk is for validators to be able to choose between multiple software clients. One of the biggest victories for the ecosystem is Solana becoming a multi-client network. Today, sixteen percent of stake runs through the Jito Labs client. Last August, Jump Crypto announced their plan to build Firedancer, an even more performant validator client for the Solana network.

Formalization of Core Contributor Policies and Workstreams: Anyone in the world can contribute code to the Solana network. In the past 120 days, 104 core contributors from multiple organizations, including several independent developers, have committed code to the Solana network GitHub repo. But the process for becoming a core contributor has long been ad hoc. The Solana Foundation introduced a few new processes to make core contribution more streamlined and accessible, inspired by best practices in open-source networks like Mozilla. These include:

  • Core Developer Community Calls: Sharing information between multiple teams working on different Solana software clients can get messy. That’s why the Solana Foundation has begun hosting monthly core contributor calls, open for anyone to join or watch recordings.
  • Solana Improvement Documents: Some changes on the network require coordination and buy-in from all the teams working on Solana validator clients. To make this process efficient, the Solana Foundation introduced Solana Improvement Documents (SIMDs). SIMDs are proposed design documents for how to make changes to the network that require coordination across multiple core development teams. Anyone can submit a SIMD for the core contributor community to review, and then core contributors across teams can come to a consensus on whether and how to adopt the proposal.
  • Solana Labs Contributor Access Policy: Solana Labs formalized a contributor access policy, so that outside contributors know how to contribute to the Solana Labs code base.

Solana mainnet beta launched in March 2020, three years ago. In that time, the ecosystem has matured substantially. Today, the Solana Foundation is focused on increasing node quality in addition to node quantity, supporting an active and independent core developer community, and ensuring there are multiple software clients for validators to choose from.

As we dig into the data below, we’ve included some updated thinking about the importance of the traditional metrics for measuring validator health. The Solana Foundation strives to be rigorous and intellectually honest as we assess the network’s health and opportunities to make it even more resilient, and we encourage the community to share their thoughts here.

Total Validator Count

Blockchains with more validators tend to be more resilient. When a user executes a contract on a blockchain, they need to be confident that their transmission will be recorded. Ideally, each transmission on a blockchain is recorded on every validator on that chain, which is why a higher number of validators is important: the more times that a message is recorded, the more confident a user can be that their message is accurately recorded and won’t be tampered with.

There are two types of validators on the Solana network:

  • Consensus nodes: Consensus nodes are central to the functioning of the network by providing two essential functions: (1) creating and proposing new blocks to the rest of the network and (2) voting on the validity of new blocks proposed by other nodes on the network. Each block contains many messages that are submitted by various users and applications on the network. Every consensus node independently verifies all new messages in a proposed block before voting on its validity. The more nodes that participate in this consensus process, the more confidence a user or third party has that a change to the network or a transaction was verified by a large population.
  • RPC nodes: Remote Procedure Call (RPC) nodes are an application’s gateway to the Solana infrastructure. RPC node operators can offer API, indexing, or other services to provide a convenient interface for users and applications to the core Solana network. These are often commissioned or run by individual applications and are dedicated to that program’s particular task, rather than maintaining consensus on the blockchain. RPC nodes, like consensus nodes, all independently verify all new blocks and changes to the network. They do not vote.

A large number of nodes is critical for the health of the network. There’s no clear number for how many nodes is enough. What’s important is that:

  • Users feel confident that their submission will be recorded, no matter what. This is why it’s important to have a large number of copies of the current state available on many nodes, and that they exist in a broad distribution across the world. Solana’s single global state tracks the latest contents of each account, updated in real time with the latest information.
  • Nodes operate independently of each other. A failure of a single node or set of nodes (say, run by a single entity or in a particular geography) should not impact the functioning of the network.
  • Users can verify the accuracy of submissions by looking at other nodes. If a single node goes down or has an issue recording a message, users can rely on other nodes to verify the accuracy of the blockchain.

How Solana is doing:

The Solana mainnet beta network went live in March 2020. Since then, it’s grown into a network of over 3,400 validators, including over 2,400 consensus nodes.

Last updated 3/6/23.

The absolute number of nodes on Solana is quite high relative to other proof of stake blockchains, and the Foundation is increasingly paying attention to node quality – not just node quantity. What makes a “high-quality validator” is subjective, but some examples include uptime, service levels in the case of a user issue, or how active the validator operator is in the broader validator community. The Foundation is exploring how it can encourage even more quality validators and will share these reflections with the community as they roll out.

In the next section, we’ll discuss the Solana network’s health in terms of its Nakamoto Coefficient.

Nakamoto Coefficient for Stake Distribution/Voting Power

Users of a blockchain must be confident that any valid message they submit will be included in a block and then confirmed through consensus. If a group of consensus nodes becomes compromised or acts maliciously in a coordinated manner, it can attempt to alter or prevent the network from achieving consensus on new blocks. The Nakamoto Coefficient is a common way to measure a blockchain’s resilience against such behavior. The Nakamoto Coefficient, a metric first proposed by Balaji Srinivasan, is defined as the minimum number of nodes that would need to be compromised to alter or stop consensus in a network, thereby preventing some or all new blocks (and therefore the transactions within them) from being confirmed. This process is known as censorship, and could impact the entire network or some subset of users or applications. In proof of stake networks, the Nakamoto Coefficient is the minimum number of nodes required to represent more than one-third (roughly 33.4%) of voting power.

Consider: A business wants to maintain a monopoly over a certain type of app on a chain. If they can strike a deal with validators who represent more than one-third of the stake on a proof-of-stake network, they can stop the entire blockchain from accepting transactions from competing businesses by refusing to vote on blocks containing the censored transactions.
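The definition above can be sketched as a short calculation: sort validators by stake, largest first, and count how many are needed before their combined stake passes the threshold. This is a minimal sketch, not the Foundation's actual methodology. The function name `nakamoto_coefficient`, the strict "more than one-third" comparison, and the stake figures are all assumptions made for illustration.

```python
from fractions import Fraction


def nakamoto_coefficient(stakes, threshold=Fraction(1, 3)):
    """Smallest number of validators whose combined stake exceeds
    `threshold` of the total stake (hypothetical helper)."""
    total = sum(stakes)
    if total <= 0:
        raise ValueError("total stake must be positive")
    cumulative = 0
    # Walk validators from largest to smallest stake.
    for count, stake in enumerate(sorted(stakes, reverse=True), start=1):
        cumulative += stake
        if cumulative > threshold * total:
            return count
    return len(stakes)


# Made-up stake weights, not real network data.
example_stakes = [30, 20, 15, 10, 10, 5, 5, 5]
print(nakamoto_coefficient(example_stakes))  # 2
```

In this made-up distribution the two largest validators hold 50% of stake between them, so the coefficient is 2. A flatter distribution pushes the number up.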

How Solana is doing:

On Solana, the Nakamoto Coefficient is 31. This means the lowest number of validators that would have to collude to censor the network is 31. The Nakamoto Coefficient is unchanged from the last Validator Health Report (August 2022), when it was also 31.

Solana’s Nakamoto Coefficient grew steadily from the chain’s launch in March 2020 through September 2022 and has remained relatively stable since then.

The Solana Foundation is evaluating options for encouraging more high-quality validators. We expect that over time, as high-quality validators get rewarded and more validators work to achieve higher performance, the Nakamoto Coefficient will naturally rise since users will be incentivized to stake to a broader pool of validators.

We’ve also included the Nakamoto Coefficient of several other proof of stake blockchains, for the sake of benchmarking.

Footnotes

A set of validators with a cumulative 33% of the network’s stake delegation is known as a superminority. A compromise of a superminority would impact the blockchain’s real-time ability to guarantee that new blocks be voted on and added to the chain. In the event that a superminority is compromised, the blockchain could recover by excising the affected validators and restarting consensus without them. The Nakamoto Coefficient for stake distribution represents the size of the smallest population of individual validators which comprise a superminority of stake delegations. When stake distribution is highly centralized, a smaller number of validators may represent 33% (superminority) of total stake delegations. In a more decentralized distribution of stake and consensus power, this set is larger.

There are other factors that impact the resilience of the blockchain. An underestimated one is validator client diversity. 

Validator Client Diversity

One of the most underappreciated failure points in a blockchain is the validator client software. Validator clients are like an operating system — they’re the software a validator runs to participate in the network.

Software is written by humans and is prone to error. Clients might have bugs, or a malicious actor might be able to sneak through a harmful piece of code, or compromise code in software used by the validator client. The latter is called a supply chain attack. The open source nature of blockchain makes bugs and attacks easier to catch. Another important way to reduce this risk is to build additional software clients, so that validators can choose which validator client they use.

Validator client diversity is important beyond providing protection from zero day exploitation. If a bug exists in one client, it’s highly unlikely that it exists in other clients. That means that a bug in a single client is much less likely to cause a long network outage, especially if multiple clients are being used at high rates. More validator clients processing more stake may have been useful in preventing or shortening the network’s performance degradation over the weekend of February 25, 2023.

We can measure client diversity through 1) the total number of clients available for use on a blockchain, and 2) the percentage of stake being run through each client.
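Both measures can be read off a simple tally of stake per client. The sketch below is illustrative only: the function name `stake_share_by_client`, the client labels, and the stake figures are assumptions, not real network data.

```python
from collections import defaultdict


def stake_share_by_client(validators):
    """Return {client: fraction of total stake} from (client, stake) pairs
    (hypothetical helper)."""
    totals = defaultdict(float)
    for client, stake in validators:
        totals[client] += stake
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("total stake must be positive")
    return {client: stake / grand_total for client, stake in totals.items()}


# Made-up validators: (client label, stake).
validators = [
    ("solana-labs", 50.0),
    ("jito", 10.0),
    ("solana-labs", 34.0),
    ("jito", 6.0),
]
shares = stake_share_by_client(validators)
print(len(shares))  # number of clients in use
for client, share in sorted(shares.items()):
    print(f"{client}: {share:.0%}")
```

The number of keys in the result gives the client count, and the values give the percentage of stake running through each client.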

How Solana is doing:

Until recently, there was one validator client on Solana, which was originally developed by Solana Labs. In August 2022, Jito Labs released a second validator client to mainnet. This is a fork of Solana Labs code that Jito is independently building and is responsible for maintaining, changing, and deploying.

Also in August 2022, Jump Crypto announced plans to build a completely new validator client on Solana. This validator client is being developed from the ground up in C++ and has a theoretical throughput of one million transactions per second (compared to the original Solana Labs client, whose theoretical throughput is 55,000 transactions per second). Jump Crypto demoed an early version of this client, Firedancer, at the Breakpoint conference in November 2022, showing the client processing 1.2 million transactions per second with simulated traffic.

While most of the current validator network uses the original Solana Labs client, an increasing percentage of the network is opting to use Jito Labs’ software. We’re monitoring this number closely and hope to see it become more evenly distributed over time, particularly once Firedancer is deployed to mainnet beta.

Even a large, highly distributed network with multiple software clients is vulnerable to several exogenous factors that could impact the resilience of a blockchain. We discuss those in the next and final section.

Distribution

The Nakamoto Coefficient and client diversity are critical metrics, but don’t capture the human element involved in running a blockchain. One of the least appreciated aspects of validator network health is the role of exogenous factors, such as geopolitics, natural disasters, and corporate interests.

That’s why, in this final section, we look at the Solana network’s resilience in the context of a few exogenous factors and why they’re important.

Data Center Concentration:

Anyone can run a Solana node. Because Solana requires highly performant hardware, validator operators will often rent server space from privately run data centers to run their nodes. This is not unusual; the majority of the computing power on most blockchains is done on privately owned servers in large data centers.

The risk of using private data centers to run validators is that the owners of data centers have disproportionate power over the functioning of a blockchain. It’s important that stake on a blockchain is relatively distributed among private companies that rent server space, in order to minimize the risk that a single company can compromise a chain.

We saw this risk rear its head in November 2022, when server provider Hetzner blocked Solana nodes. This was the equivalent of a 20% attack on the network and demonstrates why distribution of stake across multiple server providers is so critical.

How Solana is doing:

We’ve split out the data below based on the Autonomous System Numbers (ASNs) of major data centers, based on data that’s publicly available. An Autonomous System (AS) is a network of servers with a single routing number. Each Autonomous System is identified by a unique number, known as the ASN. Depending on how the internal networking/routing is configured, a single ASN could span multiple physical locations in different geographies.

Source (last updated 2/6/23)

Stake on Solana is relatively distributed among ASNs, with no one autonomous system hosting anything close to 33.3% of active stake. At the moment, at least 3 data centers would have to collude to assemble more than 33.3% of stake and halt the network.
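The same counting approach used for validators can be applied after grouping stake by ASN. The following is a rough sketch under stated assumptions: the function name `min_asns_to_exceed` is hypothetical, the stake figures are made up, and the ASNs come from the range reserved for private use rather than any real provider.

```python
from collections import defaultdict


def min_asns_to_exceed(records, threshold=1 / 3):
    """Smallest number of ASNs whose combined stake exceeds `threshold`
    of total stake. `records` is an iterable of (asn, stake) pairs."""
    stake_by_asn = defaultdict(float)
    for asn, stake in records:
        stake_by_asn[asn] += stake
    total = sum(stake_by_asn.values())
    if total <= 0:
        raise ValueError("total stake must be positive")
    cumulative = 0.0
    # Walk ASNs from largest to smallest hosted stake.
    for count, stake in enumerate(
        sorted(stake_by_asn.values(), reverse=True), start=1
    ):
        cumulative += stake
        if cumulative > threshold * total:
            return count
    return len(stake_by_asn)


# Made-up (ASN, stake) records; two validators share ASN 64512.
records = [
    (64512, 10.0), (64512, 5.0), (64513, 14.0), (64514, 12.0),
    (64515, 11.0), (64516, 10.0), (64517, 10.0), (64518, 10.0),
    (64519, 9.0), (64520, 9.0),
]
print(min_asns_to_exceed(records))  # 3
```

In this made-up data the largest ASN hosts 15% of stake, and the top three together host 41%, so three would have to act together to pass one-third.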

Geography:

A global, resilient blockchain has to continue operating, no matter the events in a given part of the world. Consider:

  • One government carries out an attack on underwater fiber cables that deliver internet, and knocks out internet to an entire region.
  • A dissident facing retribution from a dictatorial regime has to feel confident she can access funds, even if that regime chooses to shut down servers running a chain in-country.
  • A natural disaster disrupts all the nodes in a particular region. Users of a blockchain in any part of the world still need to feel confident that the chain will keep running, even when many validators are unexpectedly knocked offline.

How Solana is doing:

Here’s a snapshot of the geographic distribution of the network, organized based on the percentage of stake in each country.

Source (last updated 2/5/23)

The network is well distributed geographically, with no one country having 33.3% of active stake.

Looking to the future

The Solana Foundation is continuously working to improve the health of the validator network by providing tools and education to our global community of validators and stakers, and by encouraging the community to be thoughtful participants in securing the network.

We continue to celebrate growth in the validator community and Nakamoto Coefficient, but our focus has broadened to include less immediately measurable ways to improve the network health. These include:

  • Organizing opportunities for core developer teams to learn from each other and collaborate where possible.
  • Supporting the validator network with new documentation and tools to make it easier to run a successful validator.
  • Hiring a “Head of Staking Ecosystem” to help ensure the Solana Foundation’s staking program effectively supports the Solana validator community, and to serve as an ambassador for the Solana community of stakers, validators, and integrators.

We welcome feedback and questions as we engage with the community.

Special thanks to data consultant Joseph Burgess, and the teams at Block Logic and Chainflow. Your input in helping make the data in this report as accurate as possible is deeply appreciated.

Editor's Note, March 31, 2023: In response to community feedback, the Nakamoto Coefficient for Ethereum was updated to 20 after previously being listed as 1. The Foundation calculated Ethereum's Nakamoto Coefficient using data from Rated to pull the stakeweights of major node operators and Lido to calculate the stakeweights of individual Lido node operators. The Foundation also believes that a real-time halt to the network would require a 50% attack on the Ethereum network (as opposed to 33% on most Proof of Stake networks). With these factors in mind, we've updated the Ethereum Nakamoto Coefficient to 20, which we believe is the minimum number of nodes that would need to be compromised to halt real-time block production on the Ethereum network. The Foundation's original source for Ethereum's Nakamoto Coefficient, Nakaflow, makes two different assumptions in its calculations: 1) It counts all Lido nodes as a single node, and 2) uses the 33% attack threshold, instead of a 50% threshold. We welcome continued community feedback on this calculation and others across the report.
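To show how much the two assumptions described in the note can move the result, here is a small sketch on made-up data. It is not the Foundation's or Nakaflow's actual calculation. The helper names `min_entities` and `pool_operators`, the `pool-x` label, the strict "more than" comparison, and all stake figures are assumptions for illustration.

```python
from collections import defaultdict
from fractions import Fraction


def min_entities(stakes, threshold):
    """Smallest number of entities whose combined stake exceeds
    `threshold` of the total."""
    total = sum(stakes)
    cumulative = 0
    for count, stake in enumerate(sorted(stakes, reverse=True), start=1):
        cumulative += stake
        if cumulative > threshold * total:
            return count
    return len(stakes)


def pool_operators(operators):
    """Merge operators that share a pool label into one entity.
    A label of None marks an independent operator."""
    merged = defaultdict(int)
    independent = []
    for pool, stake in operators:
        if pool is None:
            independent.append(stake)
        else:
            merged[pool] += stake
    return independent + list(merged.values())


# Made-up data: 30 small operators in one pool, 14 larger independents.
operators = [("pool-x", 1)] * 30 + [(None, 5)] * 14
individual = [stake for _, stake in operators]

# Pool counted as a single node, one-third threshold.
print(min_entities(pool_operators(operators), Fraction(1, 3)))  # 2
# Pool members counted individually, one-half threshold.
print(min_entities(individual, Fraction(1, 2)))  # 11
```

On identical stake data, the two sets of assumptions give 2 and 11, which is why the choice of grouping and threshold needs to be stated alongside any reported coefficient.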

