This is a transcript of the discussion after the call. Feel free to continue in this topic.
- 1D vs. 2D vs. 3D+
Q: Seeing the step from 1D to 2D, the naive thought that pops up is: "Why stop at 2 dimensions, why not 3+?"
A: We have analyzed the possibility of moving to three-dimensional erasure coding. Looking at the numbers, it doesn't really bring much in comparison to what we have in 2D: it doesn't increase the reliability level that much, and it does increase the complexity significantly. So it's not really necessary at this point. But it was analyzed and evaluated, and, who knows, maybe in the future we will want to move to three-dimensional erasure coding. That's a possibility indeed.
Q: What is it about the way it scales that can give some more intuition about what's happening?
A: Okay, I’m not sure I completely understood the question, but if you’re asking why we need two-dimensional erasure coding versus one-dimensional — it’s primarily about data recoverability.
With 2D erasure coding, you have many more ways to recover lost data than with 1D. For example, in one-dimensional coding, if you lose a part of the data — say one chunk plus one extra cell — you will not be able to recover that portion. If you lose just over 50%, recovery becomes impossible.
But with two-dimensional coding, you can still recover data even if you lose an entire row. Say you retain just 25% of the total data — like this small blue square — you might still be able to reconstruct the missing pieces using the corresponding columns. So in terms of reliability and recoverability, 2D coding is much stronger.
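To make that 25% example concrete, here is a toy feasibility check, purely illustrative and not from the talk, assuming a 2x row and column extension so that any row or column retaining at least half of its cells can be fully rebuilt:

```python
def recoverable(grid: list[list[bool]]) -> bool:
    """Fixpoint check: repeatedly rebuild any row or column that still
    has at least half of its cells (a 2x Reed-Solomon extension makes
    such a row/column fully decodable), until nothing changes."""
    n = len(grid)
    changed = True
    while changed:
        changed = False
        for i in range(n):
            if not all(grid[i]) and sum(grid[i]) >= n // 2:
                grid[i] = [True] * n
                changed = True
        for j in range(n):
            col = [grid[i][j] for i in range(n)]
            if not all(col) and sum(col) >= n // 2:
                for i in range(n):
                    grid[i][j] = True
                changed = True
    return all(all(row) for row in grid)

# Keep only the top-left quadrant (25% of the cells) of an 8x8 extended grid:
n = 8
grid = [[i < n // 2 and j < n // 2 for j in range(n)] for i in range(n)]
print(recoverable(grid))  # True: rows rebuild the right half, then columns the rest
```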
On top of that, 2D coding lets us verify availability faster, because you don't need to sample as much. There's a simple probabilistic calculation: after sampling about 70–73 cells, the chance that unavailable data slips past the sampling drops to something like 10⁻¹⁰. That's a very low failure probability. So not only does it help with recoverability, but also with efficient data availability checks.
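The arithmetic behind that claim can be sketched as follows, under the assumption implied above that unrecoverable 2D-extended data must be missing at least 25% of its cells, so each uniformly random sample succeeds with probability at most 0.75:

```python
# Probability that k independent random samples all succeed even though
# the data is actually unavailable, i.e. that sampling is fooled.
p_available = 0.75  # at most 75% of cells present if unrecoverable

for k in (70, 73, 75):
    print(f"{k} samples -> false-availability probability <= {p_available**k:.1e}")
# 70 samples -> false-availability probability <= 1.8e-09
# 73 samples -> false-availability probability <= 7.6e-10
# 75 samples -> false-availability probability <= 4.3e-10
```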
- Incentivization
Q: What are the incentives for nodes to store and propagate blobs? What happens if a node doesn’t store the blob data it’s supposed to? Is there a punishment mechanism?
A: That's a great question. This is something I didn't mention. You are supposed to store data based on your node ID. Your node ID is public, of course: it is disseminated over the peer-to-peer network, so everybody knows it. From your node ID alone, anyone in the network can compute deterministically which column IDs you have custody of. The minimum requirement is four columns, as I said. If you store more, I will know which ones those are, because just by looking at your node ID I, and anybody else in the network, can compute deterministically that you need to store this column, this one, and maybe this one.
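As a rough illustration of such a deterministic mapping (the constants and hashing scheme here are made up for the sketch; the actual derivation lives in the spec):

```python
import hashlib

NUMBER_OF_COLUMNS = 128   # illustrative constant
CUSTODY_REQUIREMENT = 4   # minimum number of custody columns, as stated above

def custody_columns(node_id: bytes, count: int = CUSTODY_REQUIREMENT) -> list[int]:
    """Derive custody column indices deterministically from a public node ID,
    so any peer can recompute which columns this node must store."""
    columns: list[int] = []
    i = 0
    while len(columns) < count:
        digest = hashlib.sha256(node_id + i.to_bytes(8, "little")).digest()
        col = int.from_bytes(digest[:8], "little") % NUMBER_OF_COLUMNS
        if col not in columns:
            columns.append(col)
        i += 1
    return sorted(columns)

# Anyone who sees this node ID on the network derives the same set:
print(custody_columns(bytes.fromhex("aa" * 32)))
```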
So what happens now? I look at your node ID in my peer table, and then I am allowed to ask you for any of the specific columns that I know you have custody of. If you don't send it, I can decrease your score in my local reputation system, noting that this peer didn't provide the data I wanted this time. Then I can ask again and again, and if you never provide the data that you are supposed to have, I will disconnect from you.
So overall, using this mechanism, nodes in the peer-to-peer network will always try to stay connected to the nodes that reliably provide the data they are supposed to have. And everyone knows what they are supposed to have, because the node ID is public. If they don't provide it, you downscore them to the point of disconnecting from them. So what happens automatically is that nodes that do not provide the data get kicked out of the peer-to-peer network by the other nodes. That's what happens.
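A minimal sketch of the local reputation loop described here, with made-up scoring weights and threshold:

```python
DISCONNECT_THRESHOLD = -10  # illustrative threshold

class LocalReputation:
    """Score peers on whether they serve the custody columns they are
    known (from their node ID) to be responsible for, and drop the ones
    that repeatedly fail."""

    def __init__(self) -> None:
        self.scores: dict[str, int] = {}

    def record_response(self, peer_id: str, served: bool) -> None:
        delta = 1 if served else -3  # failures weigh more than successes
        self.scores[peer_id] = self.scores.get(peer_id, 0) + delta
        if self.scores[peer_id] <= DISCONNECT_THRESHOLD:
            self.disconnect(peer_id)

    def disconnect(self, peer_id: str) -> None:
        print(f"disconnecting {peer_id}: repeatedly failed to serve custody data")
        self.scores.pop(peer_id, None)
```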
Q: Are peer IDs recyclable? What prevents me from saving bandwidth by not storing blobs, and when I'm disconnected, just generating a new peer ID?
A: So, you could do that. For that, you would basically need to erase your local information and relaunch your node, probably re-syncing from zero and then rediscovering the peer-to-peer network. So it's feasible, but it takes some manual work and some time to reboot the node. It doesn't sound like a really dangerous attack: what you are mostly doing is wasting your own time. It's not really going to affect the other peers in the network, and most likely you will be kicked out again very easily. Most nodes are connected to 100 to 150 peers, so one node acting maliciously in that way is not really going to do much harm.
- Peer scoring and gossipsub
Q: So with the peer scoring: how do you integrate that? Because gossipsub would not be aware that a specific peer has to have this data, so there would have to be some layer on top of it which hooks into peer scoring.
A: Yes, absolutely. This happens in the consensus layer client logic; it is not implemented in gossipsub. That is where you keep track of the nodes around you, what data they should hold, and which of them actually provide it. Again, most nodes right now (I think Nimbus usually runs with 160 up to 200 peers connected constantly) have a large variety of peers to choose from when they need data, so a node that is constantly ignoring you or failing to send what you need is going to sink to the bottom of that table very quickly and get disconnected.
Q: How do you actually request that specific data? Is this a separate protocol, some request-response protocol? Just wondering how you put this on top of gossipsub.
A: It's a request-response, yes. So there are two mechanisms. One is push-based: I got this column, I verified that it's correct, and then I push it over my gossipsub topic, so you receive it from your peers and everything is fine. But there is also a request-response protocol, which is pull-based, and that is where, if a specific node is not providing the data you want, you can downscore it.
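Schematically, the two paths could look like this (function and RPC names are hypothetical, just to show the shape of push versus pull):

```python
def on_gossip_column(column, verify, publish, topic):
    """Push path: a column arrives on a gossipsub topic; verify it and
    re-publish it to our mesh peers."""
    if verify(column):
        publish(topic, column)

def fetch_column(peer, column_id, downscore):
    """Pull path: explicitly request a column a peer is known to custody,
    and downscore the peer if it fails to answer."""
    response = peer.request("data_column_by_id", column_id)  # hypothetical RPC
    if response is None:
        downscore(peer)
    return response
```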
Q: Do you use IHAVE, the gossipsub internal mechanisms or something outside of gossipsub to request that specifically?
A: So right now we have already implemented IDONTWANTs in most of the clients. I think Lighthouse has implemented it, Prysm has implemented it, Teku probably as well, so the large majority of the network is already running with IDONTWANTs, which means that once you receive the data, you can send this message to avoid wasting bandwidth on duplicates. We're also working on IHAVE, which is a way of gossiping: I have this data, I'm not going to send it to you, but if you want it, you can ask for it.
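The IDONTWANT flow is roughly this (a toy sketch; `send_idontwant` is a placeholder for the client's actual control-message API):

```python
seen: set[str] = set()

def on_large_message(msg_id: str, payload: bytes, mesh_peers) -> None:
    """On first receipt of a large message, tell mesh peers we don't want
    their copies, avoiding duplicate transmissions of the payload."""
    if msg_id in seen:
        return  # a duplicate that IDONTWANT is meant to prevent
    seen.add(msg_id)
    for peer in mesh_peers:
        peer.send_idontwant(msg_id)  # placeholder method
```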
- Alternatives to RS coding
Q: Is there any other research direction using alternatives to RS codes?
A: This was evaluated at the very beginning. There was a little research on what other erasure codes we could use instead of Reed-Solomon. The standard across the industry, and not only in blockchain technology but in almost every industry, is Reed-Solomon, because it's very practical and also very fast; right now we have several libraries that manage to do Reed-Solomon encoding very fast. We evaluated other types of codes, but Reed-Solomon also has the MDS (maximum distance separable) property, which is very nice to have: any k of the n encoded symbols are enough to reconstruct the data. That property is partially lost when you do it on a two-dimensional schema as I presented, but I think it is the best choice at the moment, given the libraries that are out there and how well developed Reed-Solomon is in the ecosystem. In conclusion, we did evaluate a few other options, and it's possible that in the future we switch to something else if we find something faster and more reliable. At the moment we are with Reed-Solomon.
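As a small demonstration of that erasure-recovery property, here is a sketch using the `reedsolo` Python library (assuming its current API; message and positions are arbitrary): with 10 parity symbols, any 10 known-position erasures are correctable.

```python
from reedsolo import RSCodec  # pip install reedsolo

rsc = RSCodec(10)                      # append 10 parity symbols
msg = b"hello erasure world"
encoded = bytearray(rsc.encode(msg))

# Erase 10 symbols at known positions; with n - k = 10 parity symbols,
# up to 10 erasures are recoverable (the MDS bound).
erased = list(range(10))
for i in erased:
    encoded[i] = 0
decoded, _, _ = rsc.decode(encoded, erase_pos=erased)
assert decoded == msg
```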
- Gossipsub pt.2
Q: One thing I also wanted to ask about gossipsub: you would only send columns, or even parts of columns, but the full blob would not be sent over gossipsub anymore? I'm asking because that will change the gossipsub improvements that have been done so far, which were addressing very large messages. But now you kind of break down the messages, so the actual messages that get disseminated over gossip will be small?
A: That's right. At the moment, what is used on the execution layer network to disseminate blob data, meaning full blobs, is devp2p, and I think that's working quite well there. Then, with this proposal, the consensus layer client doesn't need to receive anything over the network; it just gets the blobs locally and then disseminates parts, the columns, over gossipsub. As you said, partial columns are likely to be much smaller than the full blob, so this will not take advantage of the features that were implemented specifically for large messages, but I think it's definitely going to make use of other features that were implemented recently on gossipsub for other purposes.
Q: Yes, that's very interesting. And these things could also be ported to devp2p. There are also some efforts to switch that over to gossipsub, unifying the two, because there are only a few differences. It would make sense to have that, yeah.
A: Absolutely. There are some discussions about whether we should switch to gossipsub on the execution layer as well. I know that they are already working with discovery v5: previously the execution layer was always on discovery v4, and now I think most of the implementations already work with discovery v5. So there is a lot of transition going on there, and maybe in the future blobs will be disseminated with gossipsub.
Q: And one more question, just to clarify: when you said that dissemination starts at several points, is that because blobs have been shared on the execution layer, and then there are several entry points where they enter the gossipsub network?
A: Exactly. The previous idea was that the block builder sends all the columns and all the rows to everyone, with a counter that says how many hops the message has done; after, say, five hops you don't push the data anymore, you only do pulling. So basically an IHAVE, as you mentioned previously: you send an IHAVE, and if the other peer still doesn't have that data, it requests it from you. That's what we call the push-pull transition. The problem with doing that is that the node that sends you a column or a row with a hop count equal to zero is the block builder and block proposer. That makes it very easy to identify which validator is on which physical node, which is dangerous, because then there is no anonymity in the network anymore. With the proposal I showed here, the columns will not arrive from one single node: they arrive from all the peers you are connected with, because they send you different parts of the column and you reconstruct it locally. You receive pieces with hop zero from several directions, so you don't really know who the block builder is at any point. That's how it keeps the anonymity.
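The hop-counter scheme being described can be sketched like this (all names and the five-hop threshold are illustrative):

```python
from dataclasses import dataclass

MAX_PUSH_HOPS = 5  # after this many hops, switch from push to pull

@dataclass
class Column:
    id: str
    payload: bytes

class Peer:
    def __init__(self, name: str) -> None:
        self.name = name

    def push(self, column: Column, hops: int) -> None:
        # Privacy leak described above: receiving a push with hops == 1
        # means the sender had hops == 0, i.e. it is the block builder.
        print(f"{self.name}: full column {column.id} at hop {hops}")

    def send_ihave(self, column_id: str) -> None:
        print(f"{self.name}: IHAVE {column_id}; will pull it if still missing")

def forward(column: Column, hops: int, peers: list) -> None:
    """Push-pull transition: push the payload while the message is young,
    then switch to IHAVE announcements that peers can pull against."""
    if hops < MAX_PUSH_HOPS:
        for p in peers:
            p.push(column, hops + 1)
    else:
        for p in peers:
            p.send_ihave(column.id)

forward(Column("col-0", b"..."), hops=0, peers=[Peer("A"), Peer("B")])
```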