IFT Research Call, December 18th 2024 - libp2p GossipSub Scaling

Phil · December 19, 2024, 7:21pm

This is a transcript of the discussion after the call. Feel free to continue the discussion in this topic.

Addendum

We already have open PRs for the specifications. Also, we have an idea of using the IDONTWANTs instead of IAMRECEIVING which has a different set of benefits and disadvantages although it doesn’t change the approach. We wanted to avoid introducing a new message, small details would change and current approach makes the implementation simpler for the libp2p community to accept. In the PR GossipSub 2.0, two new messages are introduced - it might make sense to have separate messages after all due to different semantics, but also different trade-offs.

One more thing to consider: many of these improvements are heavily targeted towards the Ethereum use case but we also have Waku and want to improve the performance there as well.

Deciding based on user count (simulations?)

Q: When we look at our protocols and optimizing for high user count, intermediate and concurrent users, how would we determine a happy place for our protocol to perform adequately? Are we able to simulate?*

A: Generally we use simulations, for example in the “Message Staggering” (slide 21), we simulated by ranging from 2000 to 12000 peers, so the answer would be yes, we can simulate for a range of peers that we expect to have (start from lowest, aim for highest possible). The biggest constraint is the available machines, because simulations take a lot of resources, so that a 1MB message with 10000 peers produces multiplicative resource allocation for the machines. We have already simulated for up to 12000-14000 peers - bigger ones can be carried if the hardware constraints are solved. For the network size we can define a lower and upper range and then do the runs for all the possible variations.

Adding to this, we have already done some simple mathematical models to just see what might be the network sizes that the GossipSub can support - there is a scaling limit for unstructured peer-to-peer networks which we just might not be able to get beyond and we simulate within these ranges to have a sensible idea about it.

Good math can also be done from slide 10 - Performance Improvement Measures. Basically these are the transmission rounds - if we have a number D of peers we can imagine that a network rises from 1,000 to maybe 1 million by putting this 1 million here we can find out the number of transmission rounds. So the message gets disseminated to the network. The only problem is the number of rounds increases and the time for that message to reach the entire network is actually this number of rounds multiplied by latency. If messages get transmitted in one transmission round or as many RTDs are needed. You can, for simplicity sake, consider the latency just for one round and have a ballpark idea of what is going to happen.

The other option is you can increase the workload of the peers as the network grows. You can also try raising this number but obviously there has to be a limit on this workload. So this multiplied by the delay can give you the idea and then you can consider what is the latency available to you the average determinant latency just to be able to find out how much time it will take for that message to reach the entire network. So mainly it’s a tradeoff. We have shell simulations and we have runs in VacLab which use Kubernetes.

Differences between messages

Q: What is the difference between the IAMRECEIVING and IDONTWANT - shouldn’t they behave the same at the beginning when you start to receive a message ?

A: IDONTWANT is actually a confirmation saying that okay I’ve received a message. The only flaw of not using IDONTWANT as I’m receiving is maybe someone else is sending me a message and maybe that malicious peer can maybe stop sending the message or something happens and I don’t get to receive that message fully in the case that message is important - that is one problem. The other problem is IDONTWANT message is in specs and that is already being rolled over to Ethereum mainnet also, so the community might not want to change this matrix because that can delay that rollover also. That is the biggest difference. So IDONTWANT is a confirmed way of saying that I received the message and IMRECEIVING is kind of saying that most likely I’ll get that message but if I don’t send an IDONTWANT to you within that amount of time you can always resend that message to me. So IMRECEIVING says don’t cancel or just defer sending to me and IDONTWANT says simply cancel sending to me.

Regarding IDONTWANT and IMRECEIVING they’re semantically different but you could convey this information in the IDONTWANT to and I wrote a comment in this PR to explain it. It’s just a different way to get to exactly the same improvement, but it has its own, disadvantages. So, it is ultimately just a preference thing.

Splitting large messages

A: We can split the large message into multiple fragments and the bigger the number of fragments the more latency is saved in case of bandwidth we get to see very similar results. In fact, fragmentation gives you more bandwidth utilization because of those additional headers that are involved in the network. Also just to discuss fragmentation has certain downsides as well. For example, it allows that opportunity of exploitation for some non-conforming peers. I may never relay last fragment of a message and since other peers are also relaying that message within the network - everyone will have unfinished downloads in their receiving buffers and those buffers might get overwhelmed after some time if a large group of peers starts doing these kind of actions.

Also, if a full message is not valid and I’m sending fragments of the message, every receiver can relay that fragment to the next peers. it can only be validated once it’s received in the form of all fragments by the receiver. So maybe a false message can also get propagated or maybe a spam can be introduced to the network in the form of fragmentation. So these are two exploitations that are possible with fragmentation. So that can be considered.

Leonard might have meant splitting the message in a header and a body not this fragment.

Q: So when we were working on the layoffs protocol in IOG and just a bit of background we got the concept of ranking blocks and input blocks where you had the headers and you broadcast those so that you sent the critical information first and you got that broadcasting going and then when you had lesser traffic you sent the smaller blocks and ultimately got confirmations. It’s not dissimilar because our problem is we’re trying to get the optimization for sending messages and I like the idea of the IWANT versus IHAVE - just trying to work out how do we get it. We’ve lots of data which is superfluous to lots of users but because it’s decentralized we’re having to broadcast via your mesh First of all smaller messages, as we all agree I think, are optimal and transmission rounds - that I’m just trying to work out - is there a happy place we can simulate that allows us to slice that big message which for all intents and purposes could be a message, a video message? Also, it could be even preferences and they’re totally unimportant to a lot of people but because we’re decentralized we have to broadcast to everybody. Just wanted to say we can slice it up into pieces I think you said yes and then the question is how important is it that all the pieces arrive it’s important to that one person who wants to pursue this preferences. Other than that, it’s not important. So maybe we look at the concept of how do we mark those blocks so that if I’m not interested in it, I kind of don’t care about it. We’re trying to work out if you have a situation where you’re packaging a protocol and you might be handing it to an end user - you really want to know what the limits of it are.

A: One thing that comes to mind is if you can spread the header within the network and every receiver knows the IDs of the messages that it will be needing - there can be two options. I want to fetch the exactly wanted ids and other much quicker mechanism can be the unwanted ids. You can simply send IDONTWANT for that. So your mesh operation that continues in the similar mechanism and the ones for which the IDONTWANT was not transmitted they can be eagerly pushed through the mesh.

And then regarding mesh telling that somebody’s not interested in specific parts this would be more like a layer above GossipSub, which would be agnostic to that but Waku could do that on top of gossip sub.