IFT Research Call, March 12th 2025 - Waku Sync

Phil · March 14, 2025, 4:28am

This is a transcript of the discussion after the call. Feel free to continue the discussion in this topic.

Efficiency tests

Q: Did you do tests regarding efficiency of using sync vs. filter for this purpose?
A: Not yet. For now it’s just an idea to supplement light push and filter with Sync. Replacing filter with Sync would require significantly more testing.

Store sync

Q: And regarding stores, is it already used for store sync?
A: It will be deployed on some of the test fleets soon, for the purpose of syncing store nodes in the network. Testing is still in progress to make sure everything is fine.

Set reconciliation in Bitcoin

Q: Have you looked at set reconciliation in Bitcoin? As far as I’m aware they have similar problems when syncing recent transactions in the mempool. They developed a library called “minisketch” that does something similar to what Waku Sync does. Are there any differences? This problem has been somewhat solved in Bitcoin through this library.
A: Not familiar with that library, apologies. But will take a look at it.

Picking users to send sync requests

Q: How do peers pick another user to send sync requests? Is it chosen randomly?
A: For now it is chosen randomly. You could design a more elaborate strategy but we decided not to for now.

Need of using RLN

Q: Do you need to use RLN to use the sync protocol in a store? Do you need to have RLN enabled in order to use the sync protocol?
A: That is a good question. One caveat I haven’t mentioned is this: since sync propagates messages in the network, if you don’t verify the RLN proofs of those messages, you can kind of bypass RLN. So yes, but the current implementation does not verify the RLN proof. In conclusion, it is not implemented right now, but we will start working on it. The plan is to require RLN in the future.

Q: If you use it for stores, would you have store-only messages that have already validated the RLN proofs?
A: Yes, but you would have to validate the RLN proof of any message that you receive from sync. So you receive messages via relay and verify them, and when you receive messages via sync you would also need to verify them.

Q: So basically, in that case, store nodes could spam themselves. How would the membership then be different for store nodes synced between them?
A: If the store nodes are permissionless, you would probably have to keep RLN roots for some time and then verify the proof against the older roots.

D1: And then you would also need to have the context such as sync versus relay because you need to have two states for verifying the proofs. There would be a nonce to make sure you cannot replay but in this case a single replay would be allowed because you might get the same message via sync and via relay, which would be ok and not considered spam.
D2: If you already have the message you would not receive it again via sync - it’s only the message you missed, I guess.
D1: It would be like that yeah, because you asked for that range.
D1: As you said, you would have to keep the old roots. Definitely interesting to look at and explore.
D2: There is also the problem of having the older roots, such as how would you have them? That’s also a problem.
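The two ideas raised in this exchange, keeping recent RLN roots for a while and tolerating a message that arrives once via relay and once via sync, could be sketched roughly as follows. This is only an illustration under stated assumptions, not the Waku implementation: `RootWindow`, `SyncAwareValidator`, and `max_roots` are hypothetical names, and the `proof_ok` flag stands in for real RLN proof verification, which is not shown.

```python
from collections import deque


class RootWindow:
    """Keeps the most recent membership roots (hypothetical helper)."""

    def __init__(self, max_roots=5):
        self._roots = deque(maxlen=max_roots)  # oldest roots drop off automatically

    def push(self, root):
        self._roots.append(root)

    def contains(self, root):
        return root in self._roots


class SyncAwareValidator:
    """Accepts a message if its proof root is still inside the window.

    Remembers which transport ("relay" or "sync") delivered each message, so
    one copy per transport is treated as a harmless duplicate, not as spam.
    """

    def __init__(self, window):
        self.window = window
        self.seen = {}  # message id -> set of transports that delivered it

    def validate(self, msg_id, proof_root, proof_ok, via):
        # proof_ok is a placeholder for actual RLN proof verification.
        if not proof_ok or not self.window.contains(proof_root):
            return "reject"
        transports = self.seen.setdefault(msg_id, set())
        if via in transports:
            return "replay"  # same message twice on the same transport
        first_time = not transports
        transports.add(via)
        return "accept" if first_time else "duplicate"


window = RootWindow(max_roots=2)
window.push("root-1")
window.push("root-2")
validator = SyncAwareValidator(window)
first = validator.validate("m1", "root-1", True, via="relay")
second = validator.validate("m1", "root-1", True, via="sync")
```

How the older roots would be obtained by a node that was offline is the open question D2 raises, and this sketch does not address it.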

Possibility of attack

Q: I am not so familiar with RBSR, but is there any kind of malicious actor that we need to consider? Because if there is a malicious group and the reach is measured in size they can convince an honest node as to having wrong data. Is there any concern here?
A: The protocol is pairwise, meaning you only sync with one peer at a time. As for how you would define wrong messages, that is done by RLN.
Q: RLN protects against spam, right? It can create wrong data in the non-selection conditions.
A: Yes, one attack that is possible is crafting specific messages so that the fingerprinting function gives you a wrong answer. But since Waku messages require an RLN proof, the design of the fingerprinting can be a bit simpler than usual, since usually you have to defend against that kind of attack. But we decided to not really build any measure against it because of RLN.
Q: I just thought that maybe the peers sent the previous information to other parties. Does it help in minimizing this kind of things? For example, user 1 sends the append of the previous message to other parties and it’s minimal cause to inform, such as ok, I get the sync with this other party and it carries certain information about the previous sync.
A: Would that be useful in selecting the peer you sync with? For example in exchanging the messages to complete the syncing? You kind of trust the total order and the fingerprinting function. You don’t necessarily have to trust the other peer.
Q: But for example if there is a mismatch in fingerprints, who is deciding the missing data? Me or the other peer?
A: By exchanging the payloads that you need to find the differences, you kind of reconstruct the set of the other peer, so both nodes end up with the same differences. You can send wrong information, but then you would just have fewer differences, which makes the malicious party receive fewer messages, which is not really an attack per se.
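To make the fingerprint comparison concrete, here is a minimal sketch of range-based reconciliation between two peers. It is a simplification under several assumptions: message IDs are plain integers standing in for the shared total order, the fingerprint is an XOR of SHA-256 digests (a simple choice that is not hardened against crafted inputs, in the spirit of the answer above), both sets live in one process instead of being exchanged over the network, and `reconcile` and `threshold` are illustrative names.

```python
import hashlib


def fingerprint(items):
    """XOR of SHA-256 digests: an order-independent summary of a set."""
    acc = 0
    for item in items:
        acc ^= int.from_bytes(hashlib.sha256(str(item).encode()).digest(), "big")
    return acc


def in_range(items, lo, hi):
    """Items with lo <= item < hi, sorted by the shared total order."""
    return sorted(x for x in items if lo <= x < hi)


def reconcile(mine, theirs, lo, hi, threshold=2):
    """Return (items I lack, items the peer lacks) within [lo, hi)."""
    a, b = in_range(mine, lo, hi), in_range(theirs, lo, hi)
    if fingerprint(a) == fingerprint(b):
        return set(), set()  # matching fingerprints: range treated as identical
    if len(a) <= threshold or len(b) <= threshold:
        # Small range: exchange the items themselves and diff them directly.
        return set(b) - set(a), set(a) - set(b)
    mid = a[len(a) // 2]  # split at the median of the local items
    need_l, have_l = reconcile(mine, theirs, lo, mid, threshold)
    need_r, have_r = reconcile(mine, theirs, mid, hi, threshold)
    return need_l | need_r, have_l | have_r


mine = {1, 2, 3, 4, 5, 6, 8}
theirs = {1, 2, 3, 5, 6, 7, 8, 9}
need, have = reconcile(mine, theirs, 0, 100)
print(sorted(need), sorted(have))  # [7, 9] [4]
```

In this sketch a peer that falsely reports a matching fingerprint for a range simply stops the recursion there, so it receives nothing from that range, which matches the point that misreporting mainly hurts the party doing it.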

Sync with multiple peers

Q: Can I ask why you need to sync with one peer at a time? Intuitively, if you are syncing two nodes, I don’t see a reason why you couldn’t sync, let’s say, three at a time, and that would maybe speed up the process.
A: Actually you are right. The simplest way to think about it is that you sync two peers with each other, but there is no reason why you couldn’t sync with multiple peers at the same time. One thing that’s kind of interesting is that if I’m syncing with peer A and peer B at the same time, and I receive a message from peer A and add it to my own storage while I’m still syncing with peer B, the sync with peer B will pick up the new message. Look at the recursion: say you are doing a full-range sync with peer B while with peer A you are on the first half. You find the differences with peer A and exchange them. When you then do the same step with peer B, that new message is already incorporated in your storage, so you will find a difference if peer B doesn’t have that message. The same goes for more than two peers at the same time; the logic remains the same. This means that messages can be added or removed between steps.
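The interleaving described in this answer can be illustrated with a small step-wise simulation. It is a sketch under assumptions, not the Waku Sync implementation: message IDs are integers, the fingerprint is an XOR of SHA-256 digests, each peer's set is directly accessible in memory instead of being reached over the network, and `sync_steps` is a hypothetical helper. The key point is that every step recomputes the local fingerprint from storage as it is at that moment.

```python
import hashlib


def fp(items):
    """Order-independent fingerprint: XOR of SHA-256 digests (illustrative)."""
    acc = 0
    for x in items:
        acc ^= int.from_bytes(hashlib.sha256(str(x).encode()).digest(), "big")
    return acc


def sync_steps(local, peer, lo, hi, threshold=2):
    """Generator: each next() handles one range, reading storage as it is now."""
    pending = [(lo, hi)]
    while pending:
        lo, hi = pending.pop(0)
        a = sorted(x for x in local if lo <= x < hi)
        b = sorted(x for x in peer if lo <= x < hi)
        if fp(a) != fp(b):
            if len(a) <= threshold or len(b) <= threshold:
                local.update(b)  # take what we lack in this range
                peer.update(a)   # give the peer what it lacks
            else:
                mid = a[len(a) // 2]  # split at the median of local items
                pending += [(lo, mid), (mid, hi)]
        yield (lo, hi)


local = {1, 2, 3, 6, 7, 8}
peer_a = {1, 2, 3, 4, 6, 7, 8}   # peer A has message 4
peer_b = {1, 2, 3, 6, 7, 8, 9}   # peer B lacks 4 and has message 9

session_b = sync_steps(local, peer_b, 0, 100)
next(session_b)                   # sync with B has only compared the full range
for _ in sync_steps(local, peer_a, 0, 100):
    pass                          # sync with A completes; local now holds 4
for _ in session_b:
    pass                          # remaining steps with B see the updated storage

print(4 in peer_b, local == peer_b)  # True True
```

Peer A does not learn about message 9 here, because its session finished before 9 arrived from peer B; a later sync round would pick it up.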