How Does Solana Work? Anatoly Yakovenko Episode 2: No Sharding Podcast
About Solana: We are a lightning-fast distributed ledger technology for mission-critical decentralized apps. This podcast is a discussion between our staff, community developers, and industry leaders. You can follow us on Twitter @solana or GitHub @solana-labs. Subscribe on Spotify, Apple Podcasts, Google Podcasts, or the direct RSS feed.
Andrew: I am here with one of the co-founders of Solana, Anatoly Yakovenko. How’s it going?
Anatoly Yakovenko: Hey. It’s going well.
Andrew: We’re in our San Francisco office. This is the first podcast. We haven’t actually set up the studio yet, we’re kind of piecing things together. But it’s exciting to be out here. We are at New Montgomery and Howard Streets, so right downtown in SoMa.
Anatoly Yakovenko: Yep.
Andrew: Amazing office. We talked with Greg on the last episode about kind of how Solana came together, what it was and the main question that I still get from people is about the speed. About how it actually works. Do you want to give me the very basic, how does this actually speed up to this level? And then we’ll dive into more.
Anatoly Yakovenko: There’s a couple aspects to speed. People focus on TPS, which is transactions per second. You can think of that number as how many ways you can, for example, talk to a database per second. Maybe do a bunch of reads per second or a bunch of writes per second and there’s different ways to measure all those things. And kind of another interesting number, in terms of speed, is block time.
Anatoly Yakovenko: So Bitcoin block times are notoriously long, 10 minutes. In Ethereum, I think it’s about 15 seconds, which is much faster but still way too slow for humans. Our block times are 800 milliseconds, and I think we’ll ship with 400 milliseconds, which is pretty awesome. And, again, all these things are possible because of this secret sauce, which is proof of history.
Andrew: I love that we have the secret sauce, but it’s also open source software.
Anatoly Yakovenko: Yeah, yeah, yeah.
Andrew: So you can check it out, you can go into GitHub, you can see how it works. I’m sure there’s a few people listening that actually can make that happen, but not many in the world.
Anatoly Yakovenko: Yeah. So, if you back up into history, people discovered radio and they started communicating with each other. And what they noticed is that if you have two people transmitting over the same frequency at the same time, you get noise. So you get a collision. So the first thing that they did is, you know, you get a license for a specific frequency so you’re the only one that can transmit. But that’s only one-way communication.
Anatoly: So if you want to do bi-directional communication over the same frequency, you need multiple people that can transmit, and how they organized everyone that was transmitting over the same frequency is by using a clock and then giving them a particular minute, or hour, or second. Now in the phone, it’s a five millisecond slot when you can actually transmit.
Anatoly Yakovenko: So imagine, you know, you have what is it? Two billion LTE devices worldwide, but within a tower, you might have, you know, hundreds of thousands of these connecting to a single tower, and they all want to talk to each other and to talk to other people worldwide. So every five milliseconds, only one of them can transmit over a particular chunk of the frequency that’s available. So that allows many more participants to send data over the same channel.
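The time-slot idea described above can be sketched in a few lines of Python. This is only an illustration: the round-robin assignment, the function name `slot_owner`, and the device names are assumptions, and real LTE scheduling is considerably more involved. The one detail taken from the conversation is the five millisecond slot.

```python
SLOT_MS = 5  # slot length mentioned in the conversation


def slot_owner(time_ms: int, devices: list[str]) -> str:
    """Return the single device allowed to transmit at time_ms.

    Assumes a simple round-robin rotation over a shared clock.
    """
    slot_index = time_ms // SLOT_MS
    return devices[slot_index % len(devices)]


# Hypothetical devices sharing one chunk of frequency.
devices = ["phone-a", "phone-b", "phone-c"]
schedule = [slot_owner(t, devices) for t in range(0, 30, SLOT_MS)]
print(schedule)
```

Because every device derives its turn from the same clock, no two devices transmit in the same slot, which is the property the rest of the discussion builds on.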
Anatoly Yakovenko: So once you have a source of time that’s global, that everyone can trust you can do these kinds of optimizations. Of course, everything in blockchain is harder. So the reason why it’s harder is because we’re trying to build this distributed system where nobody trusts each other, right? Nobody trusts what you think the right time is and nobody trusts what I think the right time is, and everyone is expected to try to cheat.
Anatoly Yakovenko: So, you know, as the story goes, in 2017, I had too much coffee and I was up till four in the morning, and I had this realization that you can use this mechanism called sequential hashing. If you take a hash function like SHA-256, the same hash function that Bitcoin uses, you run it over itself. So the output is the next input. So imagine this really, really long chain of these hashes. You can generate a data structure that can only be generated with real time passing. There’s no way for you to spend billions of dollars to make it parallelized and run, you know, a million times faster. There’s no way for you to find some mathematical function that shortcuts the computation so you can run it instantly. It basically means that, you know, if somebody ran this thing and generated this data, they actually spent some amount of real time doing it, and the amount of time is limited only by the speed of their single-threaded, single-core CPU.
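A minimal sketch of the sequential hashing described here, using Python's standard `hashlib`. The function name `hash_chain`, the seed value, and the iteration count are assumptions chosen for illustration, not Solana's actual code.

```python
import hashlib


def hash_chain(seed: bytes, num_hashes: int) -> bytes:
    """Hash the seed once, then feed each output back in as the next input."""
    state = hashlib.sha256(seed).digest()
    for _ in range(num_hashes):
        # Each step needs the previous output, so the loop cannot be
        # split across cores: step N must wait for step N-1.
        state = hashlib.sha256(state).digest()
    return state


# Hypothetical seed and length; a longer chain takes proportionally longer.
final_state = hash_chain(b"genesis", 100_000)
print(final_state.hex())
```

The data dependency inside the loop is the whole point: the only way to reach the final value is to perform every intermediate hash in order.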
Anatoly Yakovenko: And to build a faster CPU, you need to spend, you know, 20 to 40 billion dollars to just get a 50% improvement at best, 30 to 50%. So that’s kind of our clock. It’s very inaccurate. So you running this hash function and me running this hash function, we’re going to get different real-time results. But what matters is that when I present to you a proof that time passed, you know that I didn’t cheat. That I actually spent some real time generating this. And because of this we can construct a distributed system that treats this data set almost like a global water clock. So imagine you have water dripping and the level’s rising, and everybody can see that the level’s rising, and they can trust that real time is passing. And then you can start writing software that depends on certain events happening at certain levels in this clock.
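One way to picture events happening at certain levels in this clock is to mix event data into the hash chain and let anyone re-run the chain to check it. The sketch below is an assumption-laden illustration: the `Entry` layout, the helper names, and the rule for mixing in an event are invented here and are not taken from Solana's implementation.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Entry:
    num_hashes: int  # hashes since the previous entry (how far the level rose)
    event: bytes     # data recorded at this level, may be empty
    digest: bytes    # chain state after this entry


def advance(state: bytes, num_hashes: int) -> bytes:
    """Run the sequential hash num_hashes times."""
    for _ in range(num_hashes):
        state = hashlib.sha256(state).digest()
    return state


def append_entry(prev_digest: bytes, num_hashes: int, event: bytes) -> Entry:
    """Advance the clock, then mix the event into the chain state."""
    state = advance(prev_digest, num_hashes)
    if event:
        state = hashlib.sha256(state + event).digest()
    return Entry(num_hashes, event, state)


def verify(start: bytes, entries: list[Entry]) -> bool:
    """Recompute every entry and compare against the claimed digests."""
    state = start
    for entry in entries:
        if append_entry(state, entry.num_hashes, entry.event).digest != entry.digest:
            return False
        state = entry.digest
    return True


start = hashlib.sha256(b"genesis").digest()
first = append_entry(start, 1000, b"alice pays bob")
second = append_entry(first.digest, 500, b"bob pays carol")
print(verify(start, [first, second]))
```

Changing an event or its position changes every digest after it, so a verifier who recomputes the chain can tell that the recorded order was not rearranged after the fact.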
Andrew: So at this point when I’m telling people about it or discussing it, somebody says, “What’s the catch?” So are you not completely decentralized? Is this not really a blockchain? And the answer is.
Anatoly Yakovenko: Ah, well proof of work has this really awesome property that when you generate a block that it’s totally random. Unfortunately because of that randomness, it’s very hard to … What the randomness buys you in some way is censorship resistance to some extent and maybe some fault tolerance. So the catches are … And it’s very, very simple. Like the Bitcoin white paper, I think you don’t need a computer science degree to understand it. Most folks should be able to read it and maybe there’s some terms you don’t understand, but they’re all, you know, I expect anyone to be able to really get it. What we’re building because of the optimizations we’re doing is just so much more complicated. So the catch is just the software is more complex. So-
Andrew: I highly recommend everybody read the bitcoin of [crosstalk 00:07:45].
Anatoly: Yeah, yeah.
Andrew: A lot of people haven’t and it’s actually just very elegantly written and lovely. You were going to continue on or something or?
Anatoly: Yeah, so complexity is a problem because, for building this global distributed financial system, if the design is so hard to understand for everyone, it’s hard for people to trust, and therefore it’s going to take much longer for us to achieve that kind of level of ubiquity that Bitcoin has. And Greg actually has done a lot of work to help with that. If you guys go to our website, there’s a link to a book. It’s basically in-depth documentation of every design we have in the system and, you know, the reasoning behind them and how and why those pieces are there.
Andrew: And I’ve given that book, it’s on GitHub, right?
Andrew: I’ve given that book to several friends that are really big into Ethereum, and they’re trying to understand how we’re working and then they’ll spend the weekend. I mean, it’s not a light read, but there’s definitely a couple of light bulb moments of like, “Oh, I mean, it’s crazy that you were able to figure it out this way.” So there’s some really interesting tech there. It’s a fun read if you’re into challenging your mind. You might have to look up some things of course. So we’re coming up to showing this to the world.
Andrew: We’re coming up to this actually not being on Testnet and being quite real. Walk me through the next couple months as far as what’s going to happen.
Anatoly: So in April, this was a few months ago. Actually, starting around Christmas time last year we were kind of moving along fairly quickly, but this major piece was missing, which is the fact that previously our Testnet was running in Google Cloud with a single block producer that would just run until failure. And in that network, right, we can demonstrate some really important things: that we can get a bunch of transactions to a single node and then replicate them, and confirm that all these other machines replicated and ran the exact same transactions and came to the exact same result. And that network can do like 200,000 TPS. It could probably do more. We were saturating the switch on Google Cloud, if you can believe that, which is awesome, and it distributed the state to 200 machines in 800 milliseconds and got a confirmation back that everything is exactly the same on all the machines.
Anatoly: The hard part is taking that thing and making the leaders, the block producers rotate in such a way that if any of them fail that the network just continues, and moves forward and doesn’t have a hiccup. We had a really good design. It was just extremely painful to make the changes. Such that we could enable this rotation and fault tolerance. And what was crazy is what was stopping us is almost 3000 lines of tests that were verifying the correctness of the system, but doing it in such a way that any minor change would break the tests in this very brutal way. And the thing wasn’t broken, it wasn’t that the test was broken in terms of it was detecting failures. It was the test itself was no longer compatible and broken and you had to go spend a lot of time fixing them. And this is a big pile of tests and just progress almost came to a halt. And I tried to just to force it through and for about a month we had this, you know, 2000 line PR open and people were arguing, and eventually I just deleted the tests.
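The rotation and fault tolerance described here can be illustrated with a toy leader schedule. Everything in this sketch is hypothetical: the round-robin rule, the function names, and the validator names are assumptions for illustration and do not describe how Solana actually picks leaders.

```python
def leader_for_slot(slot: int, validators: list[str]) -> str:
    """Pick the block producer for a slot with a simple round-robin rule."""
    return validators[slot % len(validators)]


def produce_blocks(
    num_slots: int, validators: list[str], failed: set[str]
) -> list[tuple[int, str]]:
    """Return (slot, leader) pairs for every slot that produced a block."""
    blocks = []
    for slot in range(num_slots):
        leader = leader_for_slot(slot, validators)
        if leader in failed:
            # The slot is skipped and the next scheduled leader carries on,
            # so one failed producer does not halt the network.
            continue
        blocks.append((slot, leader))
    return blocks


validators = ["val-a", "val-b", "val-c"]
print(produce_blocks(6, validators, failed={"val-b"}))
```

With a single block producer, the first failure stops everything. With a schedule that everyone can compute in advance, a failed leader costs only its own slots.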
Anatoly: This was kind of a light bulb moment for me. People in development talk about technical debt. It was crazy, but our technical debt was in our testing, and it took effectively, you know, “I’m the CEO and I can do this,” right? It’s just kind of just do it, right? If you aren’t making progress in a startup, you’re basically failing, right? And after deleting the tests, I think around February. So basically January was spent trying to move this thing forward. February deleted everything. And in about a month we had everything up and running. And we effectively made our first stable, fault-tolerant release, Byzantine fault tolerance, BFT, is kind of this magic term, that could demonstrate that the network can handle leader failure. And after that I kind of started sleeping more, and we started planning the next phase of the company and the project. And this is Tour de SOL. We’re cycling nerds. Four of the five co-founders are Ironman athletes. Raj, you’re next.
Andrew: You’re making threats.
Anatoly: Yeah, yeah. Raj is the only one.
Andrew: Marathon is first then we invite him in.
Anatoly: So yeah, so we decided to kind of follow in the footsteps of Cosmos. I think they trailblazed a lot of the right ways to launch a network, and it’s an awesome community and we’re talking to a lot of the validators that are running Cosmos to participate in Tour de SOL with similar goals. I think our twist on it is we’re doing kind of multiple stages where we’re demonstrating our strength, which is latency and performance. And that’s actually fairly hard to come up with how do we measure this thing that is really important to the network and to the users, but each validator and all these other people that are not part of the company anymore, right? This is a distributed network of people with different interests, and different amount of time they can put into this. So we had to give it some serious thought about how to do that.
Andrew: So Tour de SOL is coming up. You can go to solana.com/tds. Very Tour de France-y of us.
Anatoly: Yep. Actually Tour de France is happening right now.
Andrew: Right now? Great stage last night or this morning.
Andrew: We have, you know this, it’s a pretty niche offering, right? The Tour de SOL event is going to be validators, and if you’re not a validator this is a fun time to brush off some GPUs and play around. But that’s going to be a small group of people. The broader use of Solana, can we talk a little bit about that? In four or five years, how do you see Solana being used?
Anatoly: Oh. Yeah, so I think, you know, crypto winter happened. If you’re just listening now you might not have even realized that it happened, but this last year people were worried that the whole space was going to die. That this was like, “Oh, you know Bitcoin hit 20,000 and then it just went to 3000.” Right? Many people lost a lot of money and were basically done with the space, and slowly but surely, you know, it didn’t die and kind of came back. And the projects that survived I think were the top tier projects from 2017/2018, and it’s really actually exciting to see so many people launch and start building, you know, their crazy dreams. So my crazy dream is I think this idea that you have self-custody, that you own your private keys and those keys represent things of value, is here to stay. And there’s a lot of flavors of how that’s going to be used. I think a lot of companies are going to launch their own blockchains. Like Facebook is actually launching a, you know, eventually open permissionless network, but you know Amazon will probably launch something where they’re like, “Hey, we’ll just control this thing forever.”
Anatoly: But you’ll still have a public key, right? You’ll still have a private key that you own and it will still represent value. So imagine thousands of these settlement chains, right? I think what’s cool about us and what’s cool about what we’re building is that we can be this really, really low latency, high throughput execution engine. So if you have a thousand different settlement platforms, the cool thing to be is the execution and clearing of all these things. Because with this idea that you have ownership and self-custody, it doesn’t really matter what ledger the settlement occurs on. Because I can come to an agreement with you about what we are actually going to trade. Whether I give you my digital hat in, you know, Fortnite for your digital gun in whatever the other cool games are.
Anatoly: And the settlement of those things could occur on totally different games, right? Totally different worlds. But us coming together and making those trades, and doing that in a single open market that’s borderless, that’s really, really fast, that doesn’t have any shards, is the really cool thing, right? For us to be in that spot. So everything we’re building is to, you know, make it faster, right? In every way possible. Which means reducing latency. So block times of 400 milliseconds. Increasing the capacity of the network, right? So right now we’re like 60 to 80,000 TPS and there’s no theoretical reason for us not to do 800,000 TPS. It’s just blood, sweat and tears, right? Of engineering. If you look at some of the folks working on our performance stuff, it’s just constantly running experiments and looking at resource graphs from the Linux kernel about where every latency is being hit, and where memory is taking longer than expected, and trying to figure out how do we move information faster through silicon?
Andrew: In a tongue in cheek way we named this podcast No Sharding.
Andrew: You just mentioned sharding for the first time. Let’s dive in. What is sharding? Why is the podcast named No Sharding?
Anatoly: So sharding, the way most people have interacted with it is that Intel in the end of the ’90s and early aughts, they ran into a problem where they couldn’t really increase clock speeds anymore. You know, the fabrication process, while it could reduce the smallest possible feature from I think 20 to 10 nanometers, couldn’t really increase clock speeds anymore because that requires higher voltages and more heat, and the materials just couldn’t really handle that. So they started shipping chips with a single piece of silicon with multiple CPUs on them. And the first design of a multi-core system was basically like, “Let’s just stick two Intel CPUs on the same die.” And the problem there is that when you do this kind of splitting, this sharding, the state that each CPU is running is not synchronized.
Anatoly: So you have this big pile of memory, like your RAM, you know, that is the single source of truth, and then bits of it are sent to the CPU, and the CPU makes changes to that and then sends them back. But now you have two CPUs that are trying to change the exact same global big pile of RAM. If they’re not synchronized then they may make changes that are not consistent. So they have to do a big pile of work to make sure that each CPU only processes parts of the state that the other one isn’t touching. And if they are touching the same state then they have to do this really, really slow and expensive synchronization step. And that is really the core problem with sharding. I spent, you know, the last five years of my career at Qualcomm basically dealing with this. Like your mobile processor, like the ARM device in your Android phone, has a bunch of cores. All of them are constantly fighting each other. And anytime you’re trying to do anything fast you have to go and ask all the other cores, “Hey, are you touching this memory? If so we have to slow everything down and then go clear the caches.”
Anatoly: And it’s a massively complex problem. So when I talked about complexity, what’s the trade-off between us and Bitcoin? Yeah, we’re more complicated but wait until we deal with any system that implements sharding. That is another order of magnitude of complexity, and the only thing you’re getting there is being able to do more things in parallel. But because you’re separating state, what you’re actually building is almost separate blockchains that, to synchronize, have to go through some third mechanism. So in Ethereum they go through the beacon chain, and that, I think in Ethereum, might take six hours.
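The synchronization step described here has a familiar software analogue: several threads changing one shared value must take turns through a lock. The sketch below uses Python's standard `threading` module as a stand-in for the hardware situation. The class and function names are assumptions, and a software lock is only an analogy for cache coherency between cores.

```python
import threading


class SharedBalance:
    """One piece of shared state that several workers want to change."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def add(self, amount: int) -> None:
        # Only one worker may touch the value at a time. This keeps the
        # result consistent, at the cost of making the workers wait.
        with self._lock:
            self.value += amount


def run_workers(num_workers: int, adds_per_worker: int) -> int:
    balance = SharedBalance()

    def work() -> None:
        for _ in range(adds_per_worker):
            balance.add(1)

    threads = [threading.Thread(target=work) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return balance.value


print(run_workers(4, 10_000))  # prints 40000
```

The lock guarantees a consistent total, but every update to the shared value is serialized, which is the cost the conversation is pointing at.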
Anatoly: So it’s really, really slow, right? So effectively those applications are kind of segregated from each other. We have just one giant big pile of state and a bunch of GPUs to churn through it. So applications that want to talk to each other can do that in any transaction without actually needing to wait. So that is the big difference. The reason why we went this path is because I just didn’t even think of it. As soon as I had this idea that we have a source of time, that actually set us down this road of we don’t need sharding. Because we have this source of time that’s our synchronization point, everything else can then be designed for speed using the traditional optimization techniques. Which is, you know, use as many local CPU cores or GPU cores as you can to do as many things per second. So yeah.
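One way to picture many cores churning through a single pile of state is to group transactions that touch different accounts so they can run side by side. The sketch below is an illustration only and is not Solana's scheduler: the function name `batch_non_conflicting`, the account names, and the batching rule are all assumptions.

```python
def batch_non_conflicting(
    transactions: list[tuple[str, set[str]]]
) -> list[list[str]]:
    """Group ordered transactions into batches that can run in parallel.

    Each transaction is (id, accounts it touches). A transaction joins the
    current batch only if it shares no account with that batch, otherwise a
    new batch starts. Batches run one after another, so the original order
    of conflicting transactions is preserved.
    """
    batches: list[list[str]] = []
    touched: set[str] = set()
    for tx_id, accounts in transactions:
        if batches and touched.isdisjoint(accounts):
            batches[-1].append(tx_id)
            touched |= accounts
        else:
            batches.append([tx_id])
            touched = set(accounts)
    return batches


# Hypothetical transactions over one shared set of accounts.
txs = [
    ("t1", {"alice", "bob"}),
    ("t2", {"carol", "dave"}),
    ("t3", {"bob", "erin"}),   # conflicts with t1 on "bob"
    ("t4", {"frank"}),
]
print(batch_non_conflicting(txs))  # prints [['t1', 't2'], ['t3', 't4']]
```

All four transactions see the same state, so any of them could touch any account. The batching only decides which ones can safely share a time step.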
Andrew: I like it. That makes a lot of sense and I think it’s funny that one of the main reasons we don’t use it is it’s just not needed.
Anatoly: Yeah, yeah.
Andrew: They’re fast enough on, I mean, it’s being introduced to a lot of crypto projects just to speed it up, because you’re having very simple projects totally blow the network.
Anatoly: But the thing is, you can’t really even compare the TPS of a sharded chain and a non-sharded one, because what you’re talking about is like, yeah, okay so you can do like you know with Ethereum in a million shards you can do, you know, 10 million things per second. But if you have to talk between the two different shards, then it takes six hours to synchronize. How often is that going to happen? Right? How expensive is that going to be? Reality is that for us, because all these shards are still blockchains that are open, we can be the execution layer where those applications can talk to each other a lot faster without needing to wait six hours. And the complexity there and the costs there are basically how you handle risk, and how you financially secure the risk of those transfers.
Anatoly: And that’s still not trivial but I think it’s less technologically complex, and more transparent to humans. Imagine I have, you know, a digital hat in Fortnite and you have a sword in World of Warcraft, and we want to trade them. Using Solana you can use some native token like our, you know, lamports, which is what we call our wei or satoshis, to basically agree that, “Hey, we’re going to do this trade and here are some lamports to guarantee that we’re not going to back out in the middle of it or try to screw each other.” And that’s a very transparent and simple way to do things that doesn’t require the technical complexity of dealing with cross-shard application transfers. I honestly suspect that the cost of doing cross-shard transactions is going to be quite expensive in terms of gas fees, because you’re effectively paying Ethereum 1.0 gas fees to do those.
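The deposit idea can be sketched as a small escrow. This is a hypothetical illustration, not a Solana program: the `TradeEscrow` class, the deposit amount, and the rule that a party who backs out forfeits their deposit to the other side are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TradeEscrow:
    deposit: int  # native tokens each side must post (hypothetical amount)
    posted: dict[str, int] = field(default_factory=dict)
    delivered: set[str] = field(default_factory=set)

    def post(self, party: str) -> None:
        """Record that a party has locked up its deposit."""
        self.posted[party] = self.deposit

    def mark_delivered(self, party: str) -> None:
        """Record that a party delivered its side of the trade."""
        self.delivered.add(party)

    def settle(self) -> dict[str, int]:
        """Return how many tokens each party gets back."""
        honest = [p for p in self.posted if p in self.delivered]
        if not honest:
            return dict(self.posted)  # nobody delivered: refund everyone
        pot = sum(self.posted.values())
        share = pot // len(honest)
        # Parties that backed out forfeit their deposit to those that did not.
        return {p: (share if p in honest else 0) for p in self.posted}


escrow = TradeEscrow(deposit=100)
escrow.post("alice")
escrow.post("bob")
escrow.mark_delivered("alice")  # bob backs out
print(escrow.settle())  # prints {'alice': 200, 'bob': 0}
```

The items themselves can settle wherever they live. The deposit only makes backing out more expensive than completing the trade.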
Andrew: And we’ll dive into talking about gas fees and a lot of other topics on future podcasts.
Anatoly: Blockchain is so complicated. It’s crazy.
Andrew: Hopefully speed will simplify it.
Andrew: That’s my big wish. So final, wrapping up the podcast, the final thing I’d like to talk about is the San Francisco office. So if you’re in SoMa, if you’re walking on Howard Street you’ve probably seen our office from the outside.
Andrew: We’re going to start hosting a lot of events here. So we’ve got some ideas on that. We’re also doing podcasts. So we’re going to, you know, if you have any of your heroes that you want to hear from, and I don’t want to do a podcast where it’s like, “How’d you get into crypto?” You know, I actually want to talk about deep dives into tech. So if you have any ideas on that, please email us at firstname.lastname@example.org. We’ll get back to you. So I think the challenges for the community, let’s challenge the community right now. You’re listening to this podcast, my biggest challenge is to get on Discord. Like check out the GitHub book. I think if you want to read Greg’s writing on that. Jump on Discord. We’re a super friendly group of people. I’ve worked at a lot of places and this is by far the most friendly group of founders I’ve ever run into.
Andrew: So that’s a really fun thing to do. And then we are doing one final thing, which is Tour de SOL. So hop on it, and after that, do we have any other challenges for the community?
Anatoly: There’s a bunch of good first issues in all our GitHub projects. Just try one. We’ll make sure that you’re rewarded, you know?
Andrew: We’ll notice. We’ll say thanks. We’ll get you a nice cycling cap.
Andrew: We should get cycling caps.
Anatoly: Yeah, for sure.
Andrew: That’s some good schwag. Thank you so much, Anatoly. Anything else you’d like to say?
Anatoly: No, that’s it. So thank you, Andrew.
Andrew: Thanks for joining us on No Sharding.