Plans to Improve Network Upgrades
This blog post first appeared on the Solana Labs Medium.
Delivering a fast, reliable and scalable network in order to move toward a better, decentralized web remains a top priority. The issues around last week’s 1.14 network update, which focused on improvements for speed and scale, made it clear that maintaining stability during major updates remains a challenge.
An investigation is still ongoing and more details will be provided here when available, but in the meantime, I want to share the plans in motion to address the balance between reliability and building a scalable and fast network, and where we go from here.
Up to the 1.14 release, core engineers were working to fix live problems that were impacting the network’s speed and usability. These issues included invalid gas metering, lack of flow control for transactions, lack of fee markets, and spiraling RAM, storage and restart overhead.
Addressing these issues was prioritized to improve the user experience on the network. Following the latest release, core engineers plan to improve the software release rollout process by bringing in additional external developers and auditors to test and find exploits, and by continuing to support external core engineers, including the Firedancer team building a second validator client.
Improve the upgrade process
Core engineers will work with validators to improve the software release process. Previous releases followed a pattern like the one used for 1.14:
- Mainnet-beta validators run 1.13
- Testnet validators run 1.14
- Devnet validators run 1.14
- Mainnet-beta validators begin running 1.14 on master canary nodes (i.e. test nodes)
- Validators, RPC operators, as well as teams deploying dApps on the network, provide feedback on 1.14
- Mainnet-beta validators begin a full deployment of 1.14, initiating the upgrade process
Even with a mix of node versions already running against mainnet-beta, the behavior of the network changes when the supermajority switches versions.
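To make the supermajority point concrete, here is a minimal sketch of how one might measure which software version the stake-weighted supermajority is running. It is illustrative only: the `(version, stake)` input shape and the function names are hypothetical, and the threshold is assumed to be more than two-thirds of active stake.

```python
from collections import defaultdict
from fractions import Fraction

# Assumption: "supermajority" means strictly more than 2/3 of active stake.
SUPERMAJORITY = Fraction(2, 3)


def stake_by_version(validators):
    """Sum stake per software version.

    `validators` is an iterable of (version, stake) pairs (hypothetical shape).
    """
    totals = defaultdict(int)
    for version, stake in validators:
        totals[version] += stake
    return dict(totals)


def supermajority_version(validators):
    """Return the version run by more than 2/3 of stake, or None if there is none."""
    totals = stake_by_version(validators)
    total_stake = sum(totals.values())
    if total_stake == 0:
        return None
    for version, stake in totals.items():
        if Fraction(stake, total_stake) > SUPERMAJORITY:
            return version
    return None


# During a canary phase no version may hold the supermajority; once most
# stake has upgraded, the new version does.
canary_phase = [("1.13", 50), ("1.14", 50)]
after_rollout = [("1.13", 30), ("1.14", 70)]
print(supermajority_version(canary_phase))
print(supermajority_version(after_rollout))
```

In this simplified model, canary nodes on the new version do not move the result until enough stake has switched, which is why a mixed cluster may not reveal behavior that only appears after the supermajority changes.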
Core engineers plan to help improve the process as follows:
- Before the mainnet-beta upgrade, downgrade testnet to the current mainnet-beta version and feature-set
- Upgrade testnet to the release candidate of the new version
- Observe how the testnet migration goes in real-time
- Downgrade testnet back to current mainnet-beta version
- Repeat this process while stress-testing the testnet
- Release new version to mainnet-beta validators for upgrade
This would require regenesis of the testnet image during the first downgrade. Part of this simulation should include changing the stake distribution to mirror mainnet-beta.
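The post does not say how the testnet stake distribution would be made to mirror mainnet-beta. As one possible illustration, the sketch below splits a fixed testnet stake budget across validators in proportion to a list of mainnet-beta stakes, using largest-remainder rounding so the integer amounts add up exactly. The function name and inputs are hypothetical and are not part of any Solana tooling.

```python
def mirror_stake(mainnet_stakes, testnet_total):
    """Split `testnet_total` across validators in proportion to `mainnet_stakes`.

    Hypothetical helper: returns integer stakes that sum to `testnet_total`.
    """
    total = sum(mainnet_stakes)
    if total <= 0:
        raise ValueError("total mainnet stake must be positive")

    # Floor of each validator's proportional share.
    shares = [s * testnet_total // total for s in mainnet_stakes]

    # Distribute the units lost to rounding, largest remainder first.
    leftover = testnet_total - sum(shares)
    by_remainder = sorted(
        range(len(mainnet_stakes)),
        key=lambda i: (mainnet_stakes[i] * testnet_total) % total,
        reverse=True,
    )
    for i in by_remainder[:leftover]:
        shares[i] += 1
    return shares


# Three validators holding 50%, 30% and 20% of mainnet-beta stake,
# mapped onto a testnet budget of 1,000 units.
print(mirror_stake([50, 30, 20], 1000))
```

A proportional split keeps each validator's relative weight the same, so a simulated upgrade would cross the supermajority threshold at roughly the same point in the rollout as it would on mainnet-beta.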
Both status.solana.com and @SolanaStatus will continue to be used to announce upgrade sprints and their status.
Forming an adversarial team
Core engineers previously performed integration testing. In addition, an adversarial team has now been formed, made up of nearly one-third of the Solana Labs core engineering team. It will build additional hooks and instrumentation into the validator code to help find exploits across the underlying protocols, and will provide hardware to run medium to large clusters for adversarial simulation.
Improve the restart process
While fully automating the restart process is difficult, different kinds of failures can be handled with simpler procedures. Nodes should automatically discover the latest optimistically confirmed slot and share any missing ledger data with each other.
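As a rough illustration of the discovery step, the sketch below takes each node's report of the highest optimistically confirmed slot it holds, picks the highest one as the restart target, and works out which nodes need to fetch which slot range from which peers. This is a simplified model under stated assumptions: every report is trusted, and the function name and data shapes are hypothetical rather than the validator's actual restart logic.

```python
def plan_restart(confirmed_slots):
    """Plan ledger sharing for a restart.

    `confirmed_slots` maps a node id to the highest optimistically confirmed
    slot that node has in its local ledger (hypothetical shape; all reports
    are assumed to be honest).

    Returns (target_slot, sources, fetch), where `sources` are nodes that
    already hold the target slot and `fetch` maps each lagging node to the
    inclusive slot range it is missing.
    """
    if not confirmed_slots:
        raise ValueError("no node reports available")

    target = max(confirmed_slots.values())
    sources = sorted(n for n, s in confirmed_slots.items() if s == target)
    fetch = {
        node: (slot + 1, target)
        for node, slot in confirmed_slots.items()
        if slot < target
    }
    return target, sources, fetch


target, sources, fetch = plan_restart({"a": 100, "b": 97, "c": 100})
print(target, sources, fetch)
```

A real implementation would also need to weigh reports by stake and guard against incorrect or malicious reports, which this sketch deliberately leaves out.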
Continuing to focus on stability
Over the last 12 months, Solana Labs and third-party core engineering teams have been working to improve the network, and will continue to do so with a focus on stability. For example:
- A second validator client is being built by Jump Crypto’s Firedancer team, focused on increasing the network’s throughput, efficiency and resiliency.
- Mango DAO developers are focused on the tooling needed to build on Solana.
- Network communication technology transitioned to QUIC, a more advanced networking protocol
- Local fee markets were implemented
- Stake weighted QoS was incorporated to improve the ability to land transactions
- Jito’s MEV client is providing alternative paths for landing transactions
- RPC infrastructure was improved to reduce its load
Today, there are more than 2,000 developers building thousands of programs on Solana. These developers were attracted to Solana because it lets them build things they can build nowhere else, but those developers also need a stable and predictable foundation. Core engineers are committed to making these changes so that reliability does not suffer for the sake of innovation and speed.
The community’s input and support are invaluable and help the network get ever closer to a more decentralized future.