Waku v2 discv5 Roadmap Discussion

ksr · March 1, 2022, 2:53pm

A first basic version of the selectable protocol-id implementation (issue, PR)
has been completed and is ready for beta testing.
We already had a successful interoperability test with go-waku; thanks @rramos :).

This issue comprises comments with important feedback for the Waku discv5 roadmap, which I copy here to move the discussion to a central place:

@arnetheduck:

leakage into the main discv5 network (or other discv5 networks)

what is the downside here?

basically, a core feature of any discovery protocol is its ability to withstand attacks - a large number of nodes serving the data is one of the ways to achieve this - on the other hand, a false positive seems nearly harmless - you open a connection to that node, note that it didn’t support waku after all, and disconnect - it’s “almost” a no-op.

another way to say the same thing: running a waku-specific discovery network as well as publishing on the “main” discovery network as well as running DNS discovery etc is the way to work around as many types of obstacles as possible, in the interest of securing a wide selection of peers no matter the network conditions

@ksr:

leakage into the main discv5 network (or other discv5 networks)

what is the downside here?

basically, a core feature of any discovery protocol is its ability to withstand attacks - a large number of nodes serving the data is one of the ways to achieve this - on the other hand, a false positive seems nearly harmless - you open a connection to that node, note that it didn’t support waku after all, and disconnect - it’s “almost” a no-op.

Thank you for the feedback. I agree @arnetheduck. I assume the overhead associated with leaking would be quite low, especially if we exclude mobile nodes from discv5. The strong argument for having a separate discovery network is query efficiency; see Waku v2 discv5 Roadmap. I should have stated this more clearly in this issue.

Assuming Waku is part of the Ethereum discv5 network:
The fraction of nodes supporting Waku within this network is small, which leads to a needle-in-a-haystack problem.
Each random node set returned from a query contains Waku capable nodes with a certain probability which might be very low.
This problem gets more significant if we want to introduce capability discovery in the future.
Queries are basically executed as random walks, not leveraging the O(log(n)) hops structured overlays offer.
For queries satisfied by a large number of nodes this is OK; but more specific queries would be inefficient.
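To put rough numbers on the needle-in-a-haystack argument above, here is a minimal sketch. It assumes each query returns a uniformly random, independent sample of the network, which is a simplification of how discv5 lookups behave. The function names and the example figures (network size, number of Waku nodes, nodes per response) are hypothetical.

```python
import math


def p_at_least_one(waku_fraction: float, sample_size: int) -> float:
    """Probability that a random sample of `sample_size` nodes
    contains at least one Waku-capable node (independence assumed)."""
    return 1.0 - (1.0 - waku_fraction) ** sample_size


def expected_queries(waku_fraction: float, sample_size: int) -> float:
    """Expected number of random queries until one of them returns
    at least one Waku-capable node (geometric distribution)."""
    p = p_at_least_one(waku_fraction, sample_size)
    return math.inf if p == 0 else 1.0 / p


if __name__ == "__main__":
    # Hypothetical figures: 50 Waku nodes among 10,000, 16 nodes per response.
    fraction = 50 / 10_000
    print(p_at_least_one(fraction, 16))
    print(expected_queries(fraction, 16))
```

Under these assumptions the hit probability per query shrinks roughly linearly with the Waku fraction, so more specific queries (for example, a particular capability) need proportionally more random queries.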

Filtering Waku nodes via ENR before inserting them into the routing table would still not solve this problem, imho.
This would only help significantly if these Waku-supporting nodes were stable. But these stable nodes would be stable in a separate network, too. Assuming a respective churn rate, this would still converge towards random walk efficiency.

With regards to attacks, I agree that being part of the Ethereum discv5 network mitigates eclipse attacks; but at the cost of overlay routing efficiency that structured P2P overlays offer.

another way to say the same thing: running a waku-specific discovery network as well as publishing on the “main” discovery network as well as running DNS discovery etc is the way to work around as many types of obstacles as possible, in the interest of securing a wide selection of peers no matter the network conditions

We could think about using both Waku2 discv5 and Ethereum discv5. Following the adaptive nodes idea, nodes could choose to (1) not take part in discv5 at all, (2) be part of Waku2 discv5 only, or (3) be part of both Waku2 discv5 and Ethereum discv5 (maintaining two separate routing tables).
In case quicker discovery methods are exhausted, stronger nodes can walk the Ethereum discv5 network and search for Waku2 supporting nodes. If they find Waku capable nodes, they can insert these into the Waku2 discv5 routing table.
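A minimal sketch of the three participation modes and the "walk Ethereum discv5, insert into Waku2 discv5" step could look like the following. Everything here is illustrative: ENRs are modelled as plain dicts, routing tables as flat dicts keyed by node ID (real discv5 tables use k-buckets), and the `waku2` ENR key is an assumption used only to mark Waku-capable nodes.

```python
from dataclasses import dataclass, field
from enum import Enum


class DiscoveryMode(Enum):
    NONE = 1               # (1) no discv5 at all
    WAKU_ONLY = 2          # (2) Waku2 discv5 only
    WAKU_AND_ETHEREUM = 3  # (3) both networks, two separate routing tables


@dataclass
class Node:
    mode: DiscoveryMode
    waku_table: dict = field(default_factory=dict)  # simplified routing table
    eth_table: dict = field(default_factory=dict)   # simplified routing table


def supports_waku(enr: dict) -> bool:
    # Assumption: Waku capability is advertised under a "waku2" ENR key.
    return "waku2" in enr


def harvest_waku_nodes(node: Node, walked_enrs: list) -> int:
    """Copy Waku-capable ENRs found while walking the Ethereum network
    into the Waku2 table. Returns the number of newly added nodes."""
    if node.mode is not DiscoveryMode.WAKU_AND_ETHEREUM:
        return 0  # only "stronger" nodes take part in both networks
    added = 0
    for enr in walked_enrs:
        node.eth_table[enr["id"]] = enr  # Ethereum table stays unfiltered
        if supports_waku(enr) and enr["id"] not in node.waku_table:
            node.waku_table[enr["id"]] = enr
            added += 1
    return added
```

Keeping the Ethereum table unfiltered in this sketch is a deliberate choice: filtering happens only when copying into the Waku2 table, so the node still behaves like a regular participant in the Ethereum network.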

If there are no strong objections, we could go for the separate Waku discv5 only option first, and after thorough testing and dogfooding, decide which route to go.

Any opinions on this?

@arnetheduck:

The fraction of nodes supporting Waku within this network is small, which leads to a needle-in-a-haystack problem.

It’s not so much of a needle-in-a-haystack problem as a needle problem. It’s clearly the case that lookups will feel more snappy when only Waku nodes populate the system, but it also renders the setup easier to shut down - even Telegram, for example, supports multiple discovery methods, including via SMS.

Hence, it makes sense to make room for multiple discovery strategies until the needle has become a football - when you reach the football size, what should the lookup situation be?

This would only help significantly, if these Waku supporting nodes were stable

Distributed systems typically come with stable bootnodes - one way of getting an initial set of waku nodes more quickly is to tweak these bootnodes to deliver “more” waku nodes than other nodes

But these stable nodes would be stable in a separate network, too

In a separate network, if the stable nodes are taken down, there aren’t many options on the table - if instead the information is also disseminated to a wider network, it becomes more difficult to shut it down, mainly because you can no longer selectively shut down one network and not the other.

This is often the case with communications systems: you want to shut down the chat, but if that means also shutting down ethereum, the economics are different.

Basically, the same mechanism that makes it trivial for waku nodes to run a separate network is the trivial network rule you need to put in your firewall to make it not work.

I would generally consider this an important property to bake into the design early, when building a resilient system, and the discovery process is the first link in the resiliency chain.

@kdeme:
It is obviously better to have and use only one discovery network, but it needs to be usable for Waku node lookups.
I think perhaps one valid critique is that this has not really been assessed in practice? Or has it?
The Ethereum discv5 network consists of 10K+ nodes. How many Waku2 nodes would be running at the start? How long would it take to find them? Is that usable (on mobile)? What about when combined with the other discovery methods? I don’t have a good view right now on the state of the Waku2 project in “production”.

It is also important to think well about what brings which security guarantee exactly.

Filtering nodes before adding them to the routing table will drop your “1 network” security completely imo. It would become much easier to eclipse your routing table, especially with a low number of non-malicious Waku nodes.
This is the reason why I have so far preferred the clean separation; at least it is clear then.

Having one discovery network also doesn’t set you free from eclipses on the next layer. When discovery can find very few nodes, you will need to have other measures in place (which you ideally have anyhow). For example, if you don’t have a good incoming connection limit set on libp2p, and outgoing connections are barely happening due to slow discovery, eclipse becomes easier.

Or, consider querying a bunch of nodes from the routing table. While all of them will (or might) return nodes, most results will get filtered out on the Waku ENR field. I think it is clear that this makes the lookup more vulnerable to one malicious node returning many more Waku nodes than the others. One would typically have to do something like sort all returned nodes by distance to the target and keep only the n closest (which is what a lookup normally does).
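The "sort by target distance and keep the n closest" step can be sketched as below. This is an illustration under stated assumptions, not the nim-eth or go-ethereum implementation: ENRs are plain dicts, node IDs are byte strings compared with the Kademlia XOR metric, and `is_waku` is a hypothetical predicate on the ENR.

```python
from typing import Callable


def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia-style XOR distance between two equal-length node IDs."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def keep_n_closest(
    target: bytes,
    responses: list,
    n: int,
    is_waku: Callable[[dict], bool],
) -> list:
    """Merge the responses of all queried peers, drop non-Waku ENRs and
    duplicates, then keep only the n nodes closest to the target."""
    unique = {}
    for response in responses:      # one list of ENRs per queried peer
        for enr in response:
            if is_waku(enr):
                unique[enr["id"]] = enr
    ranked = sorted(unique.values(), key=lambda e: xor_distance(e["id"], target))
    return ranked[:n]
```

With this cap, a peer that returns many Waku ENRs only gains influence if those ENRs are actually close to the target, rather than through sheer volume. It does not by itself stop an attacker who can generate node IDs near the target.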

Anyway, I assumed that the process would be to, for now, use a separated network, as the main network is unusable (however, is it really?), and to eventually move to the main discovery network when either there are enough nodes, or a discovery topic registry is implemented, or both. However, there is the risk that moving to another discovery network at a later stage is something that might not happen, ever.

Distributed systems typically come with stable bootnodes - one way of getting an initial set of waku nodes more quickly is to tweak these bootnodes to deliver “more” waku nodes than other nodes

This could be a good initial help, until there are more nodes (It does feel a bit like a hack though). It adds extra reliance on the bootstrap nodes (centralization), but might be the lesser evil compared to a new network? However, I’m unsure on how to implement it atm, one would have to be careful not to open up a possibility of eclipsing the bootstrap node. (Node filtering should happen on the outgoing data, not the incoming).
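One way to read "filter on the outgoing data, not the incoming" is that the bootnode keeps an unfiltered routing table and only biases what it puts into responses. The sketch below illustrates that idea; `build_response`, `waku_share`, and the dict-based ENRs are hypothetical and not part of any existing discv5 implementation.

```python
import math
from typing import Callable


def build_response(
    candidates: list,
    limit: int,
    waku_share: float,
    is_waku: Callable[[dict], bool],
) -> list:
    """Pick up to `limit` ENRs for a response, reserving a share of the
    slots for Waku nodes. The routing table itself is left unfiltered."""
    waku = [e for e in candidates if is_waku(e)]
    other = [e for e in candidates if not is_waku(e)]
    # Reserve at most ceil(limit * waku_share) slots for Waku nodes.
    waku_quota = min(len(waku), math.ceil(limit * waku_share))
    picked = waku[:waku_quota]
    picked += other[: limit - len(picked)]
    # If there were too few non-Waku nodes, fill the rest with Waku nodes.
    picked += waku[waku_quota : waku_quota + (limit - len(picked))]
    return picked
```

Because non-Waku nodes still enter the table and still appear in responses, an attacker cannot displace honest entries simply by advertising the Waku field, though the bias itself remains a trust placed in the bootnode.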

Another point that differs from eth2: there is no need to continuously find new nodes (in eth2 that need comes from the subnet walking). So once you have a decent set of nodes, traffic on discv5 would be just to maintain the routing table.

@jm-clius:
Another perspective is that the Waku v2 integration effort in Status will soon need a general discovery mechanism that will work across multiple clients, but with a very small number of production nodes at the beginning. The separated network seems to me to provide the most practical first step with the least amount of uncertainty to achieve this, while we investigate how usable discv5 over the main network would be. Agree that this will need active prioritisation from our side to ensure this does happen (or that we at least have experimental backing of our assumptions).