Solana's engineering north star is IBRL—Increase Bandwidth, Reduce Latency. Every major upgrade shipping today has that goal in mind. As the network gets faster, the gap between the protocol and the hardware it runs on keeps shrinking.
The closer the protocol gets to the hardware, the less room there is for the abstraction layers that cloud providers and container platforms put in between. You can run a validator on AWS, GCP, or inside a container. In practice, however, core engineers have seen cloud-based deployments perform poorly under load compared to bare metal.
XDP and 100M CUs
The upcoming protocol feature activation for 100M CUs demonstrates the need for bare metal hardware. 100M CUs is a roughly 67% increase over today's 60M CU cap. More compute units per block means more transaction capacity, but it also moves the bottleneck. At 100M CUs, the constraint is no longer execution; it's Turbine, the layer that propagates blocks across the network. If shreds can't fan out to thousands of nodes fast enough, the extra capacity does not help the network.
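The size of the jump follows directly from the two caps:

```latex
\frac{100\,\text{M} - 60\,\text{M}}{60\,\text{M}} = \frac{40}{60} = \frac{2}{3} \approx 66.7\%
```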
To support the 100M CU feature activation, XDP will soon be enabled by default for all clients. To read more about XDP, see the XDP on Solana post.
XDP: High Performance Networking
XDP (eXpress Data Path) is a high-performance networking path in the Linux kernel. It skips the slower, general-purpose kernel network stack and instead handles packets in the NIC driver, as close to the hardware as possible; with AF_XDP sockets, the validator can send and receive packets almost directly through the NIC's queues. The Anza XDP setup guide walks operators through XDP configuration and explains what they must do to get the most performance from their hardware.
Here are a few requirements to highlight:
- Elevated capabilities. The validator process needs CAP_NET_RAW, CAP_NET_ADMIN, CAP_BPF, and CAP_PERFMON.
- Dedicated cores. XDP and Proof of History (PoH) must be assigned to separate physical cores. Not threads, not "vCPUs": physical cores.
- Serious packet rates. Because Turbine fans shreds out aggressively, a staked validator can push close to 150,000 outbound packets per second. Highly staked nodes send even more, because they get more leader slots.
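The capability requirement can be checked on a running Linux system: each process's effective capabilities appear as a hexadecimal bitmask on the CapEff line of /proc/<pid>/status. The following is a minimal sketch, assuming Linux and the bit numbers from linux/capability.h (CAP_NET_ADMIN = 12, CAP_NET_RAW = 13, CAP_PERFMON = 38, CAP_BPF = 39). The helper names are hypothetical, and the Anza XDP setup guide remains the authority on how to actually grant these capabilities.

```python
"""Sketch: decode a Linux effective-capability mask and report which
XDP-related capabilities are missing."""
import os

# Capability bit numbers as defined in linux/capability.h.
REQUIRED_CAPS = {
    "CAP_NET_ADMIN": 12,
    "CAP_NET_RAW": 13,
    "CAP_PERFMON": 38,
    "CAP_BPF": 39,
}


def parse_cap_eff(status_text: str) -> int:
    """Return the effective capability mask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)  # the mask is printed in hex
    raise ValueError("no CapEff line found")


def missing_caps(mask: int) -> list[str]:
    """List required capabilities whose bits are not set in the mask."""
    return sorted(name for name, bit in REQUIRED_CAPS.items() if not mask & (1 << bit))


if __name__ == "__main__":
    # This inspects the Python process itself; read /proc/<validator-pid>/status
    # instead to check the validator process.
    if os.path.exists("/proc/self/status"):
        with open("/proc/self/status") as f:
            print("missing:", missing_caps(parse_cap_eff(f.read())) or "none")
```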
Every one of those requirements is easy to satisfy when you control the hardware: you pick the NIC, pin the cores, and choose the driver and kernel.
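For the dedicated-cores requirement, the difference between a physical core and a hyperthread is visible in Linux sysfs: logical CPUs that share the same (physical_package_id, core_id) pair under /sys/devices/system/cpu/cpuN/topology are siblings on one physical core. A minimal sketch, assuming Linux sysfs is available; the helper names and the choice to reserve the highest-numbered cores are illustrative only, and the actual validator pinning options are documented in the Anza XDP setup guide.

```python
"""Sketch: group logical CPUs into physical cores from Linux sysfs, then
reserve whole physical cores (all SMT siblings) for XDP and PoH."""
import glob
import os
import re
from collections import defaultdict


def read_topology(root: str = "/sys/devices/system/cpu") -> dict[int, tuple[int, int]]:
    """Map logical CPU id -> (physical_package_id, core_id) using sysfs."""
    topo = {}
    for path in glob.glob(os.path.join(root, "cpu[0-9]*")):
        match = re.search(r"cpu(\d+)$", path)
        topo_dir = os.path.join(path, "topology")
        if match is None or not os.path.isdir(topo_dir):
            continue  # skip non-CPU entries and CPUs without topology info
        with open(os.path.join(topo_dir, "physical_package_id")) as f:
            pkg = int(f.read())
        with open(os.path.join(topo_dir, "core_id")) as f:
            core = int(f.read())
        topo[int(match.group(1))] = (pkg, core)
    return topo


def physical_cores(topo: dict[int, tuple[int, int]]) -> list[list[int]]:
    """Group logical CPUs sharing a (package, core) pair, i.e. SMT siblings."""
    groups = defaultdict(list)
    for cpu, key in topo.items():
        groups[key].append(cpu)
    return sorted(sorted(cpus) for cpus in groups.values())


def reserve_cores(topo: dict[int, tuple[int, int]], count: int = 2) -> list[list[int]]:
    """Reserve `count` whole physical cores from the high end, leaving core 0
    (which often handles OS housekeeping) free. Illustrative policy only."""
    cores = physical_cores(topo)
    if len(cores) < count:
        raise ValueError(f"need {count} physical cores, found {len(cores)}")
    return cores[-count:]


if __name__ == "__main__":
    try:
        xdp_core, poh_core = reserve_cores(read_topology(), 2)
        # Pin work to one sibling of each core and keep the other siblings idle.
        print("XDP core (logical CPUs):", xdp_core)
        print("PoH core (logical CPUs):", poh_core)
    except (OSError, ValueError) as err:
        print("could not reserve cores:", err)
```

Reserving the whole sibling group, rather than a single logical CPU, reflects the point above: a hyperthread shares its physical core's execution resources, so an idle sibling is part of what "dedicated" means.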
"Possible" in the cloud is not the same as "competitive"
It would be incorrect to claim XDP can't run in the cloud. It can.
- AWS's ENA driver supports native AF_XDP with zero-copy.
- GCP's gVNIC driver supports driver-mode XDP.
However, the details matter when running a high performance validator. These NIC drivers are supported, but a cloud VM still puts abstractions between the validator and the hardware that XDP wants to reach directly.
- A virtual NIC is not a NIC you own. ENA and gVNIC are fast, provider-managed devices. You don't choose the physical NIC model, firmware, queue implementation, or switch path. The ethtool, ring-size, IRQ-affinity, and NUMA tuning that validator operators rely on is either unavailable or only partially effective.
A cloud instance can be extensively tuned to perform well. However, to make a cloud or containerized validator competitive, you end up dedicating the instance, pinning vCPUs to physical cores, enabling host networking, granting elevated capabilities, and giving the process direct access to the host NIC. Each of those steps moves you away from what virtualization and containers are for. Once you have done all of them, you have discarded the elasticity, isolation, and portability that justified the cloud in the first place. You are now doing more operational work than you would running directly on a bare machine, and the best case is that you only match its performance.
Containers add a layer you then have to delete
Containers add another layer of abstraction that can lead to performance issues.
The only container shape that preserves XDP performance is one that systematically removes container isolation: --network=host to share the host's network namespace, the elevated capabilities listed above, and direct access to the host interface.
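To make that concrete, here is a sketch of the docker run arguments such a container ends up needing. It assumes a Docker and kernel recent enough to recognize the BPF and PERFMON capabilities; the image name is a hypothetical placeholder, and the function name is illustrative. Note that Docker's --cap-add takes capability names without the CAP_ prefix.

```python
"""Sketch: the docker run arguments an XDP-ready validator container tends to
need. Each flag removes a piece of isolation the container would otherwise add."""
import shlex

# Docker capability names drop the CAP_ prefix.
# Assumption: Docker/kernel versions that recognize BPF and PERFMON.
XDP_CAPS = ["NET_RAW", "NET_ADMIN", "BPF", "PERFMON"]


def docker_run_args(image: str) -> list[str]:
    """Build an argv list for a validator container with host networking."""
    args = ["docker", "run"]
    # Share the host's network namespace so XDP sees the real host interface.
    args.append("--network=host")
    # Grant the same elevated capabilities the validator needs on bare metal.
    args += [f"--cap-add={cap}" for cap in XDP_CAPS]
    args.append(image)
    return args


if __name__ == "__main__":
    # "example/agave-validator:latest" is a hypothetical image name.
    print(shlex.join(docker_run_args("example/agave-validator:latest")))
    # prints: docker run --network=host --cap-add=NET_RAW --cap-add=NET_ADMIN --cap-add=BPF --cap-add=PERFMON example/agave-validator:latest
```

Every flag in the list exists only to undo isolation that the container introduced, which is the section's point.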
This is why Anza's own validator requirements state that running an Agave validator for live clusters, including mainnet-beta, inside Docker is "not recommended and generally not supported," citing containerization overhead and performance degradation unless the container is specially configured. The same page warns that running in the cloud "requires significantly greater operational expertise to achieve stability and performance."
CPU, RAM, and Storage
XDP is one upcoming performance improvement, but bare metal's advantages extend across the whole machine, and operators should take advantage of the hardware directly. Anza's requirements call for a high-clock CPU (2.8GHz+ base, AMD Gen 3 / Intel Ice Lake or newer, with SHA and AVX2 support). They also call for generous ECC memory and fast NVMe storage.
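The SHA and AVX2 requirement can be verified on x86 Linux, where the kernel lists CPU features on the flags line of /proc/cpuinfo as sha_ni and avx2. A minimal sketch under that assumption (x86 only; ARM systems report a "Features" line instead), with hypothetical helper names. It does not check base clock or CPU generation, which should be confirmed against the vendor's specifications, since the "cpu MHz" field in /proc/cpuinfo reports the current frequency, not the base clock.

```python
"""Sketch: check /proc/cpuinfo (x86 Linux) for the SHA and AVX2 extensions."""
import os

# Linux reports SHA extensions as "sha_ni" and AVX2 as "avx2".
REQUIRED_FLAGS = {"avx2", "sha_ni"}


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flag set from the first 'flags' line of /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def missing_flags(cpuinfo_text: str) -> list[str]:
    """List required flags that the CPU does not advertise."""
    return sorted(REQUIRED_FLAGS - cpu_flags(cpuinfo_text))


if __name__ == "__main__":
    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as f:
            print("missing CPU flags:", missing_flags(f.read()) or "none")
```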
On a generic cloud instance, storage is often a network-attached block device with provisioned IOPS and throughput ceilings. On bare metal you choose known enterprise NVMe drives and avoid hidden shared-infrastructure limits.
The community has created a very useful resource, the Solana Hardware Compatibility List, cataloging common Solana validator hardware along with a summary of operator opinions on it. The site lists known-good CPUs, storage, and networking for mainnet validators, and it recommends dedicated hardware. Networking is a good example: Anza's requirement for a staked node is a 2 Gbit/s symmetric connection, but the community recommendation of 10–25GbE comes from experience running high-performance machines in practice.
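A back-of-the-envelope calculation shows why headroom matters. This sketch assumes full-size packets carrying Solana's maximum packet data size of 1,232 bytes (1280 - 40 - 8) of UDP payload, plus IPv4, UDP, and Ethernet framing overhead; real shreds may be somewhat smaller, so treat the result as an upper-end estimate rather than a measurement.

```python
"""Back-of-the-envelope sketch: outbound bandwidth implied by a packet rate."""

# Assumption: full-size packets at Solana's max packet data size
# (1280 - 40 - 8 = 1232 bytes of UDP payload). Real shreds may be smaller.
PAYLOAD_BYTES = 1232
# Assumption: IPv4 (20) + UDP (8) + Ethernet header (14) + FCS (4)
# + preamble (8) + inter-frame gap (12) bytes of wire overhead per packet.
WIRE_OVERHEAD_BYTES = 20 + 8 + 14 + 4 + 8 + 12


def gbit_per_s(packets_per_s: float, bytes_per_packet: int) -> float:
    """Convert a packet rate and packet size to gigabits per second."""
    return packets_per_s * bytes_per_packet * 8 / 1e9


if __name__ == "__main__":
    pps = 150_000  # the ~150,000 outbound packets/s figure from the requirements list
    print(f"payload only: {gbit_per_s(pps, PAYLOAD_BYTES):.2f} Gbit/s")  # 1.48
    print(f"on the wire:  {gbit_per_s(pps, PAYLOAD_BYTES + WIRE_OVERHEAD_BYTES):.2f} Gbit/s")  # 1.56
```

Under these assumptions, outbound shred traffic alone would use roughly three quarters of a 2 Gbit/s link before counting inbound traffic or bursts, which is consistent with the community's preference for 10–25GbE.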
Recommendations
For a production, staked Agave validator, the metrics that matter most are skip rate, block propagation, and 100M CU readiness. The strong recommendation is to run on dedicated bare metal. Prefer a high-clock CPU, ECC RAM, fast enterprise NVMe drives, and 10 to 25GbE symmetric connectivity. At today's throughput of roughly 2,000 TPS, and even with 100M CU blocks, most modern NICs are sufficient. As TPS grows, a high-end NIC family proven for AF_XDP zero-copy gives you the headroom to keep up, and Mellanox/NVIDIA ConnectX is the standout choice. Run Agave directly under systemd rather than inside a container.
A cloud VM or container is a perfectly reasonable tool for the right job, such as experimentation, monitoring, or RPC prototyping. If the goal is a top-tier validator with XDP enabled, however, bare metal is strongly recommended over a cloud- or container-based deployment.
References: