Discussion: Large Message Handling in GossipSub

Efficient dissemination of large messages is essential for GossipSub, as in the case of EIP-4844.

Message delivery time is approximately \tau = \tau_{tx} + \tau_p, where the transmit time is \tau_{tx} = \frac{S}{R} \times H and \tau_p is the propagation delay, with S, R, and H being the message size, available data rate, and number of hops. This implies that message delivery time increases significantly with increasing message size.
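As a rough worked example (taking H = 5 hops as an illustrative value, and reading \tau_p as the propagation delay accumulated over the path with the 100 ms per-link delay used in the experiments below), a 50000-byte message at 100 Mbps gives:

```latex
% Illustrative values: H = 5 hops is assumed; S, R, and the per-link delay match the setup below.
\tau_{tx} = \frac{S}{R} \times H = \frac{50000 \times 8}{10^{8}} \times 5 = 20\,\text{ms},
\qquad
\tau_p \approx 5 \times 100\,\text{ms} = 500\,\text{ms},
\qquad
\tau \approx 520\,\text{ms}.
```

This is in the same ballpark as the average latencies reported in the results below.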

We implemented the IDONTWANT message proposal along with the following three approaches to improve large-message dissemination performance.

1. Message Fragmentation

Message fragmentation partitions a large message into smaller fragments. Fragment transmit time drops to \tau_F = \frac{\tau_{tx}}{n}, where n is the number of fragments. Message relaying requires a store-and-forward process; however, relaying a message as fragments allows intermediate nodes to relay already received fragments in parallel while the sender is still transmitting. This reduces the overall message transmit delay to \tau_{tx} = \frac{S}{R} \times \frac{2H-1}{n}.
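As a minimal sketch, the two transmit-time expressions above can be evaluated directly; H = 5 is an assumed hop count, while S, R, and n match the simulation settings used below:

```python
# Evaluate the transmit-time expressions from this post.
S = 50_000 * 8          # message size in bits (50000 bytes)
R = 100_000_000         # data rate in bits/s (100 Mbps)
H = 5                   # assumed number of hops (illustrative value)
n = 4                   # number of fragments, as in the experiments below

store_and_forward = (S / R) * H          # unfragmented: full message re-sent at every hop
fragmented = (S / R) * (2 * H - 1) / n   # fragmented: intermediate nodes relay fragments in parallel

print(f"unfragmented transmit time: {store_and_forward * 1000:.1f} ms")
print(f"fragmented transmit time:   {fragmented * 1000:.1f} ms")
```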

2. Staggered Message Sending

On receiving a large message, every node sends an IDONTWANT message to its peers to stop receiving multiple copies of the same message. After that, we stagger the sending process by adding a random delay between transmissions, giving peers time to receive already transmitted messages/IDONTWANTs. We use staggered sending with and without message fragmentation.
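Below is a minimal, non-authoritative sketch of this staggering step; send_msg, send_idontwant, and has_idontwant are hypothetical caller-supplied callbacks rather than APIs of any particular GossipSub implementation:

```python
import random
import time

def relay_staggered(msg_id, payload, mesh_peers, send_msg, send_idontwant, has_idontwant,
                    min_delay=0.01, max_delay=0.05):
    """Sketch of staggered forwarding for a just-received large message.

    send_msg, send_idontwant, and has_idontwant are hypothetical callbacks
    supplied by the host implementation.
    """
    peers = list(mesh_peers)

    # Announce the message to all mesh peers so they stop sending us duplicate copies.
    for peer in peers:
        send_idontwant(peer, msg_id)

    # Forward in random order with a short random pause between transmissions,
    # giving peers time to receive the message (or an IDONTWANT) from elsewhere.
    random.shuffle(peers)
    for peer in peers:
        if has_idontwant(peer, msg_id):
            continue  # this peer already announced it has the message
        time.sleep(random.uniform(min_delay, max_delay))
        send_msg(peer, payload)
```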

3. Reduced Message Sending

On receiving a large message, every node forwards it to D_{Low} - 1 randomly selected peers and sends an IDONTWANT message to the remaining peers. Any peer missing the message can use an IWANT message to fetch it. We experimented with reduced sending with and without message fragmentation.
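A similar sketch of the reduced-sending rule follows; D_{Low} is GossipSub's lower mesh-degree bound, and the callbacks are again hypothetical:

```python
import random

def relay_reduced(msg_id, payload, mesh_peers, d_low, send_msg, send_idontwant):
    """Sketch of reduced forwarding: the full message goes to D_low - 1 randomly
    chosen mesh peers; the rest only get an IDONTWANT announcement.

    Peers left without the message can still request it with an IWANT for
    msg_id; that path is handled elsewhere by the host implementation.
    """
    peers = list(mesh_peers)
    k = min(max(d_low - 1, 0), len(peers))
    chosen = set(random.sample(peers, k))  # peers that receive the full message

    for peer in peers:
        if peer in chosen:
            send_msg(peer, payload)
        else:
            send_idontwant(peer, msg_id)   # announce only; the peer may IWANT it later
```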

Performance Comparison

The following results were obtained using the Shadow simulator for a 1000-node network, with each node having 100 Mbps bandwidth and each link a 100 ms propagation delay. The message size is 50000 bytes, and the number of fragments is set to 4.

| Parameter | Current state | Fragmentation only | Stagger only | Stagger with fragmentation | Reduced D only | Reduced D with fragmentation |
|---|---|---|---|---|---|---|
| Total Volume Including Control Traffic (MB) | 2673 | 2804 | 2628 | 2645 | 2271 | 2245 |
| Data Volume (MB) | 2570 | 2692 | 2528 | 2532 | 2183 | 2154 |
| Max Latency (ms) | 1542 | 728 | 1686 | 997 | 1838 | 824 |
| Avg Latency (ms) | 525 | 437 | 597 | 612 | 571 | 504 |
> 1. Message Fragmentation

A critical point to remember with message fragmentation is that the fragments must remain verifiable - messages on Ethereum have a particular shape that is covered by signatures and cannot be split up or forwarded partially without also altering the protection mechanisms that prevent spam / DoS.

In fact, this is a critical point to introduce early in any simulation: only verified traffic must be broadcast, and verification itself introduces latency (on the order of 1-200 ms per hop depending on conditions); it also places constraints on how fragmentation and optimistic sending can happen.
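One rough way to fold this into the delay model from the opening post (an illustrative extension, not something measured in these simulations) is a per-hop verification term \tau_v:

```latex
% \tau_v: per-hop verification latency, roughly 1-200 ms depending on conditions.
\tau \approx \tau_{tx} + \tau_p + H \times \tau_v
```

If verification can only be performed on the complete message, intermediate nodes cannot relay fragments before receiving all of them, which limits the pipelining gain described above.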

Another point to remember is flood publishing, i.e. publishing to peers outside your mesh on the first hop - I don’t see it represented in these numbers or discussion, yet it is critical for understanding dissemination: if the first hop takes too long, this delays the entire process significantly.

Many thanks @arnetheduck for highlighting this. The current evaluations use application-level fragmentation, but yes, the tradeoff between verification/authentication costs and time gains should be considered.
Alternatively, could application-specific fragmentation decisions be introduced?