Desktop requirements for WakuV2

rramos · October 22, 2021, 10:57pm

It’d be useful to understand more precise requirements from e.g. Desktop to prioritize work here.
The current state of Waku v2 peer discovery - #3 by oskarth

For proper usage of WakuV2 in desktop I require guidance in the following topics:

Should we support DNS Discovery for bootstrapping nodes? If so, what kind of nodes should we expect it to return? My concern is that DNS discovery only returns the node multiaddresses, so I’d then have to rely on the identify protocol to learn each node’s capabilities.

Currently we hardcode the nodes in the configuration in different categories, so we know which protocols a node supports before starting status-go. To achieve something similar with DNS Discovery, the URL would need something that identifies the type of node, e.g. store.test.status.im, filter.test.status.im
Are there plans to support rendezvous in nim-waku? For mobile devices I understand we need a lightweight discovery mechanism, so I implemented the rendezvous protocol in go-waku with multiaddresses instead of ENRs, and also enabled peer exchange for gossipsub.

I read somewhere that DiscV5 was being worked on in nim-waku. While I can see this discovery mechanism being used successfully in desktop, would the same be true for mobile devices? I’m worried about bandwidth restrictions.
What’s the latest status on peer selection? Currently selecting a peer is done by asking the peerstore to return the first peer it has that supports a protocol, but IMO a ‘smarter’ peer selection algorithm is needed, especially if we want to use the Lightpush and Filter protocols.
Should we only use waku-relay? status-go and desktop support the usage of filter and lightpush when running in light client mode (just selecting the first node that supports these protocols). Since status-go is shared by both desktop and mobile, and filter and lightpush are still in Draft, is it acceptable if mobile uses relay only?
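
To make the peer selection question concrete, here is a minimal sketch comparing the current "first peer that supports the protocol" behaviour with one possible ranking. Everything in it is an assumption for illustration: the `Peer` record, its `failures` and `latency_ms` fields, and the short protocol labels are hypothetical and are not the go-waku or status-go peerstore API.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    peer_id: str
    protocols: set
    failures: int = 0        # hypothetical count of failed requests to this peer
    latency_ms: float = 0.0  # hypothetical last measured round-trip time


def first_supporting(peers, protocol):
    """Behaviour described above: return the first peer that supports the protocol."""
    for peer in peers:
        if protocol in peer.protocols:
            return peer
    return None


def best_supporting(peers, protocol):
    """One possible ranking: fewest failures first, then lowest latency."""
    candidates = [p for p in peers if protocol in p.protocols]
    if not candidates:
        return None
    return min(candidates, key=lambda p: (p.failures, p.latency_ms))
```

The ranking key is only one option; a reputation score or a random pick among the healthy peers would slot into the same place.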

rramos · October 22, 2021, 10:57pm

Production readiness for filter protocol: I discussed this topic with Hanno before. The following items are still pending:
- Filter will keep attempting to push messages indefinitely until a client node unsubscribes or a node restarts, even if the client node goes offline. Something more sophisticated is required, so that nodes are able to drop the peer once it’s determined that a client node is unreachable (either time based or via a limit on the total number of subscriptions)
- 12/WAKU2-FILTER: Failed Nodes and the filter protocol · Issue #253 · vacp2p/rfc · GitHub
- 12/WAKU2-FILTER: ACK · Issue #207 · vacp2p/rfc · GitHub

rramos · October 22, 2021, 10:58pm

Production readiness for store protocol
- Settings to control the number of days a message can be stored as well as controlling the memory usage of a node
- Improvements to `store` memory and disk management · Issue #702 · status-im/nim-waku · GitHub
- 13/WAKU2-STORE / 14/WAKU2-MESSAGE: Allow messages to be marked as "don't store" · Issue #441 · vacp2p/rfc · GitHub (a nice to have)
Production readiness for lightpush protocol: what’s missing to mark this protocol as stable?
Suggest a strategy for using lightpush vs relay for publishing messages. Currently desktop and mobile in waku v1 rely on message confirmations that IIRC are generated by mailservers. Lightpush should help with this, however, I’m not sure of the metrics I should use to determine if I should send a message via lightpush, or send it via relay. A peer selection mechanism is also necessary for choosing which lightpush node to use.

haelius · October 25, 2021, 1:18pm

Thanks, @rramos. This is a great summary and guideline for the direction we are (should be) heading in with Waku v2.

Some thoughts/comments on the current state:

Should we support DNS Discovery for bootstrapping nodes? If so, what kind of nodes should we expect to be returned by this?

Yes, currently in nim-waku we have the soft assumption that the DNS-discoverable nodes support all major protocols. This is a bad decision, of course, especially since we want adaptive user-run nodes where people can pick and choose what they want to support. See RFC: Capabilities advertising · Issue #429 · vacp2p/rfc · GitHub for more on the problem of capabilities advertising. This is an NB WIP for us.

the URL would need something that identifies the type of node, e.g. store.test.status.im, filter.test.status.im

Agreed. Different subdomains for different protocols is a good start and very easy to implement. That would still leave the details of the capabilities to be negotiated (e.g. if store, which content topics? how far back does stored history stretch? etc.). See again RFC: Capabilities advertising · Issue #429 · vacp2p/rfc · GitHub.

Are there plans to support rendezvous in nim-waku?

Not on the immediate roadmap, but certainly not out of the question. The advantage is that it’s very resource-efficient, while the disadvantage is that it will have to be written from scratch whereas things like discv5 and peer exchange will be relatively easy to add. The way I see the road ahead for discovery mechanisms: start work on discv5 (depending on how suitability experiments go, see below), then peer exchange and then rendezvous. Depending on results of these efforts we’ll prioritise the “capabilities exchange” problem in parallel.

I read somewhere that DiscV5 was being worked on in nim-waku. While I can see this discovery mechanism being used successfully in desktop, would the same be true for mobile devices? I’m worried about bandwidth restrictions.

True! The current POC proposes a method whereby an optional discv5 component can be externally started/stopped depending on resource availability. How well this will work remains to be seen, though it’s certainly not ideal for mobile devices. The latter will require at least DNS discovery in combination with something like peer exchange to become aware of other nodes. On top of that a capabilities discovery method is lacking.

multiaddresses instead of ENRs

Interested to know if https://github.com/vacp2p/rfc/pull/465 will have an effect on this decision?

What’s the latest status on peer selection? Currently selecting a peer is done by asking the peerstore to return the first peer…

Good point and we didn’t have a good issue tracking this. Have created one here. Also see this issue about peer reputation which may help with ranking peers here.

Should we only use waku-relay?

Other than the production-readiness issues for filter and the fact that it’s less tested than relay, I don’t see a reason why mobile could not optionally use filter and lightpush. We may want to address some of the production issues in the current spec before moving to stable, but I think this is feasible in the medium term if it is a prerequisite for use on mobile.

haelius · October 25, 2021, 1:42pm

Thanks! To keep track of the overarching effort I’ve created a tracking issue: 12/WAKU2-FILTER: Production readiness · Issue #469 · vacp2p/rfc · GitHub

haelius · October 25, 2021, 1:45pm

Production readiness for store protocol

Thanks. Although the store dimensioning is orthogonal to the protocol itself, we should certainly write a separate RFC with a recommended minimal set of dimensioning settings that each client implementation should support. Tracking issue: 12/WAKU2-STORE: Production recommendations · Issue #470 · vacp2p/rfc · GitHub
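
As an illustration of what a minimal set of dimensioning settings could look like, here is a sketch of a store bounded by both message age and message count. The class name, both parameters and the in-memory layout are assumptions for illustration, not nim-waku or go-waku settings, and it assumes messages are added in timestamp order.

```python
from collections import deque


class BoundedStore:
    """In-memory message store limited by age and by total message count."""

    def __init__(self, max_age_seconds, max_messages):
        self.max_age_seconds = max_age_seconds  # hypothetical retention setting
        self.max_messages = max_messages        # hypothetical capacity setting
        self._messages = deque()                # (timestamp, payload), oldest first

    def add(self, timestamp, payload):
        self._messages.append((timestamp, payload))
        self.prune(now=timestamp)

    def prune(self, now):
        # Drop messages older than the retention window.
        while self._messages and now - self._messages[0][0] > self.max_age_seconds:
            self._messages.popleft()
        # Drop the oldest messages until the store fits the capacity limit.
        while len(self._messages) > self.max_messages:
            self._messages.popleft()

    def query(self, now):
        self.prune(now)
        return [payload for _, payload in self._messages]
```

For example, `BoundedStore(max_age_seconds=30 * 86400, max_messages=50000)` would keep at most 30 days of history and at most 50,000 messages, whichever limit is hit first.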

Production readiness for lightpush protocol: what’s missing to mark this protocol as stable?

Although not extensively dogfooded, I think it could likely be moved to stable. cc @oskarth ?

iurimatias · October 25, 2021, 4:00pm

@haelius regarding the following items, what timelines & ETAs can we expect for each? and which ones (if any) would prevent using waku v2 in production?

peer selection recommendations
Filter will keep attempting to push messages indefinitely until a client node unsubscribes or a node restarts, even if the client node goes offline. Something more sophisticated is required, so that nodes are able to drop the peer once it’s determined that a client node is unreachable (either time based or via a limit on the total number of subscriptions)
12/WAKU2-FILTER: Failed Nodes and the filter protocol · Issue #253 · vacp2p/rfc · GitHub
12/WAKU2-FILTER: ACK · Issue #207 · vacp2p/rfc · GitHub

haelius · October 27, 2021, 11:49am

@iurimatias, in terms of necessity for production:

peer selection recommendations

IMO this should not be a blocker, though @rramos may have specific and essential requirements in mind here.

Filter will keep attempting to push messages indefinitely until a client node unsubscribes or a node restarts, even if the client node goes offline. Something more sophisticated is required, so that nodes are able to drop the peer once it’s determined that a client node is unreachable (either time based or via a limit on the total number of subscriptions)

This is necessary for production readiness, though not dogfooding. A minimal solution here should be easy and fast to implement (a day or two). I don’t think this necessarily needs to be specified and can be implemented according to client’s needs. That said, adding a RECOMMENDATION to the RFC will tie up any loose ends and ensure consistent implementation.
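
A minimal sketch of what such a solution might look like, combining both options from the quoted item: a time-based drop and a cap on the total number of subscriptions. The class, its parameters and the `send` callback are hypothetical and not taken from the 12/WAKU2-FILTER spec or any client.

```python
class FilterSubscriptions:
    """Tracks filter subscribers and drops those that stay unreachable."""

    def __init__(self, timeout_seconds, max_subscriptions):
        self.timeout_seconds = timeout_seconds      # hypothetical setting
        self.max_subscriptions = max_subscriptions  # hypothetical setting
        self._last_ok = {}  # peer_id -> time of last successful push (or subscribe)

    def subscribe(self, peer_id, now):
        # Refuse new subscribers once the cap is reached.
        if peer_id not in self._last_ok and len(self._last_ok) >= self.max_subscriptions:
            return False
        self._last_ok[peer_id] = now
        return True

    def unsubscribe(self, peer_id):
        self._last_ok.pop(peer_id, None)

    def push(self, message, send, now):
        """send(peer_id, message) -> bool is an assumed transport callback."""
        for peer_id, last_ok in list(self._last_ok.items()):
            if send(peer_id, message):
                self._last_ok[peer_id] = now
            elif now - last_ok > self.timeout_seconds:
                # Unreachable for longer than the timeout: stop pushing to this peer.
                del self._last_ok[peer_id]

    def subscribers(self):
        return list(self._last_ok)
```

A peer that misses a push is kept until it has been unreachable for longer than the timeout, so a short connectivity gap on mobile does not immediately cost the subscription.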

12/WAKU2-FILTER: Failed Nodes and the filter protocol · Issue #253 · vacp2p/rfc · GitHub

12/WAKU2-FILTER: ACK · Issue #207 · vacp2p/rfc · GitHub

IMO these should not prevent filter from being used in production, or at least in dev tests across mobile/desktop using filter to start. That said, these will significantly improve filter reliability so I’d like to see them prioritised for Q4 (note we have a fairly heavy dependency on available manpower, production hires, etc.)

On a further note, I think the store issue here is a production requirement. Recommendations for store dimensioning are not essential, but will benefit consistency between implementations.

haelius · October 29, 2021, 10:26am

In order to keep the conversation going from a Vac/Waku v2 POV, I’d like to summarise the important points from the conversation yesterday.

A. What we are doing first:

End-to-end testing/dogfooding of Waku v2 in Desktop and Mobile.
This:

uses the existing test fleet
excludes bridging to Waku v1, for now
focuses on relay and store, for now
requires a discoverable fleet, but since there is a go-waku test fleet that is discoverable using rendezvous, we are not blocked on this point
is (was?) only blocked by some ongoing debugging re connectivity in Mobile

B. Where we want to end up:

End-to-end integration of Waku v2 in Desktop and Mobile in production
This:

uses the (nim-waku) prod fleet for bootstrapping, for now
includes bridging to Waku v1
relay, store, lightpush and filter stable and tested. lightpush is especially important for publish confirmations on Mobile and other clients with short connection windows.
requires a discoverable fleet for bootstrapping, with more and more discovery mechanisms added to encourage decentralisation and user-run nodes

What is missing to get from A → B?

We’ve roughly agreed that we’ll move from A → B in an incremental, dogfooding-based approach with clear goals set out for the next step as we go along.

1. Deploy Waku v1 - v2 bridging

There exists a nim-waku bridge implementation which can be deployed for any v1 and v2 networks. Once this is necessary the item can be actioned (the nim-waku team can help with config/deployment here). It’s also possible to run your own bridge node(s) locally.

2. Discoverable bootstrap nodes

It’s more than likely that our own prod fleet will be used for bootstrapping for some time. nim-waku (and any other registered) nodes are discoverable using DNS discovery, but this is not yet supported in status-go. status-go does support rendezvous protocol, but this is not yet supported in nim-waku.

@rramos, I have two questions here:

Given that DNS discovery is integrated in go-waku, what would be the technical effort to support it in status-go?
For bootstrapping, is there anything that would be missing from DNS discovery?

3. Support for `lightpush` and `filter` on Mobile

Afaik there are no blockers from Waku v2’s side for dogfooding to start on these. There are multiple possible improvements for filter (e.g. ACK, failure handling, etc.) which we should prioritise for stability’s sake. Larger scale dogfooding may highlight more issues.

4. Progress on message reliability issues

From Waku’s POV, store improvements/fixes (e.g. this one) are NB here.

rramos · October 29, 2021, 1:03pm

Given that DNS discovery is integrated in go-waku, what would be the technical effort to support it in status-go?

For bootstrapping, is there anything that would be missing from DNS discovery?

The technical effort depends on the expected usage of DNS Discovery within status-go. Currently we have a configuration with the following items:

ClusterConfig: {
  RelayNodes: ["/ip4/127.0.0.1/tcp/1234", "/ip4/127.0.0.1/tcp/1234", "/ip4/127.0.0.1/tcp/1234"],
  StoreNodes: ["/ip4/127.0.0.1/tcp/1234", "/ip4/127.0.0.1/tcp/1234", "/ip4/127.0.0.1/tcp/1234"],
  FilterNodes: ["/ip4/127.0.0.1/tcp/1234", "/ip4/127.0.0.1/tcp/1234", "/ip4/127.0.0.1/tcp/1234"],
  LightpushNodes: ["/ip4/127.0.0.1/tcp/1234", "/ip4/127.0.0.1/tcp/1234", "/ip4/127.0.0.1/tcp/1234"],
  WakuRendezvousNodes: ["/ip4/127.0.0.1/tcp/1234", "/ip4/127.0.0.1/tcp/1234", "/ip4/127.0.0.1/tcp/1234"],
}

When you log in, status-go will:

automatically connect to the RelayNodes, if you are using the default configuration
ping all the StoreNodes and use the one with the fastest reply time to request message history.
If you are using the light client functionality, select a FilterNode to subscribe to messages, and select a LightpushNode to publish the messages
Periodically ask a random WakuRendezvousNode for peers
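
The store node step above can be sketched roughly as follows. This is only an illustration, not the status-go implementation: `pick_fastest` and the `ping` callback (returning a round-trip time in seconds, or `None` when the node is unreachable) are hypothetical names.

```python
def pick_fastest(nodes, ping):
    """Return the multiaddress with the lowest reply time, or None if none replied.

    ping(addr) is an assumed callback: seconds as a float, or None if unreachable.
    """
    best_addr, best_rtt = None, None
    for addr in nodes:
        rtt = ping(addr)
        if rtt is None:
            continue  # skip nodes that did not reply
        if best_rtt is None or rtt < best_rtt:
            best_addr, best_rtt = addr, rtt
    return best_addr
```

Called as `pick_fastest(config["StoreNodes"], ping)`, it only needs the multiaddresses from the configuration, which is the point of keeping the nodes categorized by protocol up front.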

Since status-go does not automatically connect to all the multiaddresses from fleets.status.im, I keep this categorization in the configuration. It lets status-go choose specific multiaddresses for a given need without first having to connect to a peer and wait for the identify protocol to return the list of protocols that peer supports.

If we use DNS discovery, should status-go then dial all the peers returned by DNS discovery, and wait for the identify protocol to run before knowing what a peer can do? Or how should I use this protocol?

haelius · November 1, 2021, 3:59pm

Thanks, Richard! I think DNS discovery can i.a. be used to populate this configuration. Its best use is to find a list or, in your case, lists of available bootstrap nodes. Of course, sophisticated capability discovery is still TBD, but one way to address your immediate problem is to publish separate lists to separate subdomains for each protocol (and for rendezvous), as I think you suggested earlier. We could also assume for now that the production node DNS list supports all major protocols. This is not a clean solution, but is similar to how we currently use fleets.status.im where we also don’t make protocol distinctions.
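
A rough sketch of the separate-lists idea, assuming one subdomain per category. The `resolve` callback stands in for whatever DNS discovery client is used and is hypothetical; only the `store.` and `filter.` subdomain names appear earlier in this thread, the other three are made up for illustration.

```python
# Config key -> subdomain prefix. Only "store" and "filter" come from the thread;
# "relay", "lightpush" and "rendezvous" are assumed names.
SUBDOMAINS = {
    "RelayNodes": "relay",
    "StoreNodes": "store",
    "FilterNodes": "filter",
    "LightpushNodes": "lightpush",
    "WakuRendezvousNodes": "rendezvous",
}


def build_cluster_config(base_domain, resolve):
    """Populate the categorized config from per-protocol DNS discovery lists.

    resolve(domain) is an assumed callback returning a list of multiaddress strings.
    """
    return {
        key: list(resolve(f"{prefix}.{base_domain}"))
        for key, prefix in SUBDOMAINS.items()
    }
```

With `base_domain="test.status.im"` this would query `store.test.status.im`, `filter.test.status.im` and so on, and the result has the same shape as the ClusterConfig above.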

fryorcraken · December 20, 2021, 12:19am

Some comments from a js-waku POV:

rendezvous: I need to dig deeper into this protocol but it seems to be a good candidate for browser environments. I think it makes more sense to integrate it in js-waku once work has started in nim-waku

protocol selection: I haven’t implemented Waku Filter in js-waku. As it seems to be light client friendly I am tracking the work here: Implement Waku Filter Protocol · Issue #290 · status-im/js-waku · GitHub
At this point in time I position relay vs filter/light push as:

relay more anonymous but may be less reliable (until we get better connection management)
filter/light push: expected more reliable but you disclose to the peer that you are the sender/listening on given topics

Ideally js-waku should offer or document a strategy for using one or the other, or even switching depending on conditions (e.g. connectivity). This is something that is further down the road.

DNS discovery:

I am still working on getting DNS Discovery into js-waku.

Regarding a different subdomain for each protocol: I believe it reduces accessibility for operators.
Ideally we want an operator to have a domain name that indexes the nodes of their fleet, and their domain name to be added to ours, so that we can have as many decentralized nodes as possible via DNS Discovery.
However, if instead of asking an operator to set up one domain we ask them to set up 4, it raises the barrier to entry.
Especially since at this stage a platform operator does not have a lot of incentive to add their nodes to our discovery domain.

I would suggest using one domain for discovery and using the wakuv2 field in the ENR to describe the capability, as already done by @haelius.

Nodes could then browse the ENR tree and select peers based on their capabilities.
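
To illustrate selecting peers by capability, here is a small sketch that treats the capability field as a bitfield. The bit layout (relay, store, filter, lightpush in the four lowest bits) and the `(node_id, capabilities)` record shape are assumptions for illustration; the actual ENR encoding should be taken from the spec.

```python
# Assumed bit layout of the capability byte, for illustration only.
RELAY = 0b0001
STORE = 0b0010
FILTER = 0b0100
LIGHTPUSH = 0b1000


def supports(capabilities, required):
    """True if every required capability bit is set."""
    return (capabilities & required) == required


def select_peers(records, required):
    """records: iterable of (node_id, capabilities) pairs read from the ENR tree."""
    return [node_id for node_id, caps in records if supports(caps, required)]
```

A light client could then ask for `FILTER | LIGHTPUSH` and a full node for `RELAY`, all from a single discovery domain.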