How to sunset go-waku

fryorcraken · August 2, 2024, 6:32am

As part of the 2024 roadmap, Waku’s team scope increased, leading to more work in the status-go codebase. As a result, when setting milestones and deliverables, a directive was proposed:

Code to edit should be extracted: to keep the status-go/go-waku boundary in line with the organisation, code in status-go that needs to be modified to execute the milestones, should generally be extracted from status-go and replaced with an API call.

However, since the work started, this has seldom happened, with the exception of filter management being moved from status-go to go-waku. It is now time to re-engage the discussion on code architecture to ensure alignment on the strategy to adopt.

Goals

The current goal is to make Status’ usage of Waku reliable and scalable, which includes improvements in status-go, go-waku, and nwaku; as well as defining and implementing new protocols and libraries, such as the e2e reliability protocol.

In the context of this goal, it becomes clear that refactoring status-go and replacing APIs would not yield immediate significant value. I believe this is one of the reasons why we have not followed the directive.

Other contributing factors may include:

The complexity of the task
The high code churn, which hinders our ability to define new APIs with confidence

These challenges make it difficult to justify the effort required for such an extensive refactoring and API replacement process.

The next deliverable on the roadmap is the replacement of go-waku with nwaku.

This deliverable will reduce the number of clients to maintain and enable the Logos/IFT directive on the usage of nim.
It also enables the bundling of the work done for Status in nwaku and offering it as a single native SDK for external projects.

When planning milestones and deliverables, a balance must be struck between:

Speed of delivery for each milestone
Total effort

There are two extremes to consider:

1. Each goal is considered in isolation, potentially leading to technical debt being introduced with the first deliverable, increasing total effort.
2. Only the end goal is considered, pushing the delivery date of the first goal closer to that of the third goal, which means postponing the delivery of Status reliability.

By continuing to use go-waku for now, we have leaned towards option 1. The decision was made to deliver significant improvements to Status reliability first and foremost, drawing a line in the sand at RLN in terms of sun-setting go-waku.

Work highlights

To replace go-waku with nwaku in the Status application, the following work needs to be done (in no strict order):

1. Understand and establish best practices for using nim in a Golang context, including go-routines, queues, etc.
2. Move peer-to-peer reliability strategies from status-go to go-waku/nwaku:
a. Regular store query
b. Store query after sending a message
c. Disconnection detection and automated store queries
3. Improve and implement light client logic in nwaku, such as filter management and general client-side behavior of req-res protocols, including reliability mechanisms
4. Move e2e reliability strategies from Golang to nim/nwaku.
5. Understand how to bundle nwaku in Android and iOS Status mobile apps.

The proposed approach is to use nwaku first in Status Desktop, creating a special relay-only build. This will allow for early dogfooding without waiting for the completion of tasks 3, 4, and 5.

(3) can then be completed by adding light client mode to the special Desktop build and making nwaku the default client in desktop. At this point, go-waku will no longer be used for relay or desktop purposes.
Finally, (5) can be accomplished to fully sunset go-waku.

During discussions with Waku Golang engineers, it is clear that the most significant portion of work revolves around (2).
I can see the challenges being multifaceted. However, my expertise with status-go is limited, and I would appreciate any input:

i. Defining the interface between status-go and Waku

This entails defining an API and a contract between these two domains. It is crucial to consider how we envision the ideal Waku SDK API and ensure it fits within status-go.
I designate this as a distinct item due to its importance, where the boundary between Waku and Status (or other applications) must be clearly defined.

ii. Implementing the defined interface between status-go and Waku

This is the largest chunk of work, requiring status-go to be refactored to accommodate the new API.

Ordering

There are several constraints and possibilities when it comes to ordering the work for replacing go-waku with nwaku:

Any logic moved from status-go to go-waku must be present in nwaku before the replacement. This means that more logic shifted to go-waku now creates potential blockers for the decommission.
Implementing an API does not necessarily require implementing it directly in nwaku; an API can be defined with code relocated elsewhere within status-go, before being moved or replaced.
To enable early dogfooding, go-waku will initially continue to be used on mobile devices. This means that any code pushed to nwaku must exist in Golang form. However, this does not necessarily mean that the code should be present in go-waku; it can remain in status-go with a build-time switch.
The previous point also suggests having an abstraction layer that allows for swapping between nwaku and go-waku at build time.

By considering these constraints and possibilities, we can devise a more effective plan for replacing go-waku with nwaku while minimizing potential blockers and ensuring smooth progress.
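As a rough illustration of the abstraction layer mentioned in the last constraint, the sketch below defines one interface that both clients would satisfy. Everything in it is an assumption made for illustration: `WakuBackend`, its methods, the two stub types, and `NewBackend` are hypothetical names and do not mirror the real go-waku or libwaku APIs.

```go
package main

import (
	"errors"
	"fmt"
)

// WakuBackend is a hypothetical interface. The method set is illustrative
// only and does not mirror the real go-waku or libwaku APIs.
type WakuBackend interface {
	Name() string
	Publish(contentTopic string, payload []byte) error
}

// goWakuBackend stands in for a wrapper around go-waku.
type goWakuBackend struct{}

func (goWakuBackend) Name() string { return "go-waku" }

func (goWakuBackend) Publish(contentTopic string, payload []byte) error {
	if contentTopic == "" {
		return errors.New("empty content topic")
	}
	return nil // a real wrapper would call into go-waku here
}

// nwakuBackend stands in for a wrapper around nwaku's libwaku bindings.
type nwakuBackend struct{}

func (nwakuBackend) Name() string { return "nwaku" }

func (nwakuBackend) Publish(contentTopic string, payload []byte) error {
	if contentTopic == "" {
		return errors.New("empty content topic")
	}
	return nil // a real wrapper would cross the FFI boundary here
}

// NewBackend picks an implementation by name so the sketch fits in one file.
// For a build-time switch, each implementation would instead live in its own
// file guarded by a build constraint (for example `//go:build nwaku` and
// `//go:build !nwaku`), with each file defining its own constructor.
func NewBackend(kind string) (WakuBackend, error) {
	switch kind {
	case "go-waku":
		return goWakuBackend{}, nil
	case "nwaku":
		return nwakuBackend{}, nil
	default:
		return nil, fmt.Errorf("unknown backend %q", kind)
	}
}

func main() {
	backend, err := NewBackend("nwaku")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if err := backend.Publish("/example/1/chat/proto", []byte("hello")); err != nil {
		fmt.Println("publish failed:", err)
		return
	}
	fmt.Println("published via", backend.Name())
}
```

The caller only depends on the interface, so status-go code written against it would not need to change when the backend is swapped.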

Strategy

Another way to present the two extremes is:

Accumulate so much technical debt to deliver reliability (a) quickly that the next steps (b) and (c) become extremely difficult.
Spend so much time and effort avoiding technical debt that reliability (a) does not get delivered within the original proposed timeline.

To avoid this, a plan must be defined to tackle the work by breaking it down into small achievable chunks. It is also best if each new interface used in status-go provides direct value to the product and customer.

A potential solution would be to set a series of deliverables defined as follows:

Feature X is available in Waku SDK and dogfed in Status app

A new API is defined for feature X that abstracts the underlying protocol.

This API is implemented in nwaku, including c-binding and Golang wrapper.

The API is used in status-go and the Status app, i.e., a refactor.

Dogfooding performed within the Status app context.

Documentation to use the API is available to developers

In this case, “feature X” represents:

Regular store queries
Store check when sending messages
Disconnection detection and automated store query (including hibernation)
etc

By doing so, each small achievable chunk provides a tangible benefit to the project.
This approach ensures that progress is made with regular practical outcomes.
Some of this work is already planned in the roadmap (relay reliability in nwaku), but the current challenge lies in dogfooding the progress made in nwaku. It is not clear who the consumers of the peer-to-peer reliability protocols in nwaku are now.

Another approach could be to push for the replacement of APIs in status-go immediately, and not consider the currently defined deliverables complete until this refactor is finished. This was the original intention when creating the roadmap, but there are some drawbacks.

The primary concern with this approach is that API definition and usage within status-go is difficult to quantify; it can only be evaluated by examining the code. Moreover, the ultimate value of this effort – a simple, high-quality API for the Waku SDK, hardened through Status usage – may not be ensured without further refining the SDK, providing documentation, and offering examples. These tasks have secondary priority compared to ensuring reliability and scalability for Status.

End-to-end reliability library

A lesson must be learned from the previous experience when designing the new e2e reliability API. It is crucial to invest time upfront to ensure that the API is simple and accessible to developers without any knowledge of the underlying protocol.

This was not the case with the current MVDS API or Waku implementations, which leak protocol details to an extent where protocol knowledge is required to utilize these libraries, which in turn leads to a lack of domain separation.

Questions / Next steps

The key questions we need to address are:

1. Are we intentionally shifting the Waku/Status API rework from the Direct Message Reliability Milestones to the Nwaku in Status Desktop Milestone, or even further, to milestones that are not yet defined?
2. Or, are we able to define a clear value and outcome in proceeding with the API refactor now?
3. Based on our current understanding, what is the most cost-effective way to replace go-waku with nwaku? How can we prioritize mitigating the highest risks first?
4. Should we focus exclusively on replacing go-waku with nwaku initially, and move logic (and refine APIs) later?

These questions may lead to further considerations as well.

Ivansete · August 2, 2024, 8:00am

An additional point we need to consider is that nwaku should support Windows. This is a task we started to look at.

kaichao · August 2, 2024, 9:15am

Thanks for the summary, @fryorcraken.

It makes sense to move common patterns like the regular store query from status-go to go-waku, or to add similar logic to nwaku.

Replacing go-waku with nwaku seems risky for a consumer app like Status, specifically with regard to testing and the release timeline. Should we instead harden the usability of go-waku? Such a replacement can be planned along with a major release of the Status app, for example from 1.0 to 2.0.

Also, there is outdated code in status-go, like waku-v1; it is better to remove it sooner rather than later.

fryorcraken · August 5, 2024, 10:32am

Thanks @Ivansete. At this point in time, the deliverable is marked for *nix systems only. Windows is being kept out of scope.
We can decide closer to the time if we include Windows or do it as a separate deliverable depending on how the progress goes.

prem · August 5, 2024, 11:13am

Nicely summarized @fryorcraken

Wrt Questions / Next steps, my thought would be to go with point 4: replace go-waku with nwaku initially, and move logic (and refine APIs) later.

I see the following advantages with this approach:

By invoking just the simpler waku protocol APIs, the switch to nwaku can be tested and stabilized quickly, rather than together with a whole lot of other logic which is present in status-go/go-waku.
There was feedback from Andrea (Status) that they had issues integrating nim/nim-waku into status-go. I am not sure of the details or of the issues faced, but integrating with just the nwaku protocol APIs would be a better way to iron out any teething issues.

This would make the integration and testing with nwaku faster rather than migrating all waku SDK logic from status-go/go-waku to nwaku and then integrating with nwaku.

rramos · August 5, 2024, 1:28pm

It’s worth mentioning that we have initiated the process of transferring some code from status-go to go-waku, and some of these PRs have already been merged:

refactor: move rate limiter and priority queue from status-go to api package by richard-ramos · Pull Request #1171 · waku-org/go-waku · GitHub
refactor: move missing messages logic from status-go to go-waku by richard-ramos · Pull Request #1174 · waku-org/go-waku · GitHub
chore: move filter manager from status-go to go-waku by chaitanyaprem · Pull Request #1177 · waku-org/go-waku · GitHub
chore: move outgoing message check from status-go to go-waku by kaichaosun · Pull Request #1180 · waku-org/go-waku · GitHub
feat_: rate limit message publishing by richard-ramos · Pull Request #5523 · status-im/status-go · GitHub
refactor: extract missing messages logic to go-waku by richard-ramos · Pull Request #5638 · status-im/status-go · GitHub
chore_: extract message hash query for outgoing messages to go-waku by kaichaosun · Pull Request #5652 · status-im/status-go · GitHub
chore_: move filter mgr to go-waku by chaitanyaprem · Pull Request #5653 · status-im/status-go · GitHub

Our initial approach involved working on status-go and then moving the code to go-waku. Although this method takes extra time due to the refactoring needed to extract the code from status-go, I believe it proved effective given the constraints we faced for version 2.30. It was crucial to have functional code ready as soon as possible, and implementing changes in status-go first allowed us to achieve that goal efficiently.

Future Development Strategy

For upcoming features, I agree that our focus should be on developing on the SDK side first before integrating with status-go. However, we must remain flexible and reevaluate our approach based on deadlines. If necessary, we can revert to our original strategy. As long as the features we develop are limited in scope, such as those linked above, alternating between these two strategies should be feasible.

Clarification on status-go / go-waku refactoring

Regarding the statement:

Any logic moved from status-go to go-waku must be present in nwaku before the replacement. This means that more logic shifted to go-waku now creates potential blockers for the decommission.

I want to clarify that our current effort to extract code from status-go to go-waku is confined to moving logic to this package: go-waku API, which is not tightly coupled with the go-waku code itself. Once we receive the green light to focus on replacing go-waku with nwaku, it should be straightforward to extract this api package into a separate SDK that can live in a different repository. We can then define interfaces that allow us to switch between nwaku and go-waku. Separating this SDK code from go-waku will also facilitate translating the code from Go to Nim, as it will not be encumbered by status-go logic.

Transitioning from go-waku to nwaku

Based on my current understanding, transitioning from go-waku to nwaku on the desktop involves the following steps:

Circuit Relay Client Support: nwaku requires circuit relay client support. There is a question of whether this feature “works” in go-libp2p. Based on our experience, I think that even if this feature is not guaranteed to work, it would still be a beneficial addition.
Feature Exposure in libwaku: Besides relay, we need to expose storev3, peer exchange, DNS discovery, and discv5 in libwaku. (We might also need to support filter server; @prem, please confirm if it is required.)
Golang Package Creation: We need to create a Golang package that performs the FFI integration of nwaku’s libwaku functions. This task should be relatively straightforward, although some hidden bugs might surface, which we will address as they arise.
Integration with status-go/status-desktop: In status-go/status-desktop, we need to utilize this new package. Additional work is required to ensure that libwaku is packaged in desktop releases. As @Ivansete mentioned, nwaku must run on Windows. More than 90% of desktop users use Windows, so supporting this OS is critical. This work can proceed in parallel, and just requires planning and prioritization to get it started.

Mobile Integration

For mobile integration, I have already begun enabling the use of libwaku on Android (nwaku/examples/mobile at master · waku-org/nwaku · GitHub). We will need support for the client side of the light protocols on libwaku (lightpush and filter) and to add iOS support. One concern is the file size of the libraries since app stores limit the size of applications to 100-150MB (I need to verify the exact limit). We may need assistance from the mobile team to develop strategies for loading libwaku and librln upon the app’s first execution rather than bundling them within the app. While this may not be a significant issue, I recall that certain features, like torrent functionality, were excluded from the mobile version to reduce file size, so it’s worth taking this limitation into account.

prem · August 5, 2024, 1:58pm

Yes, you are right. Indeed we need to expose storev3, peer-exchange client, DNS discovery and discv5.
I don’t think the filter and lightpush servers need to be exposed outside though, because as of now all of that resides only within go-waku, and similarly it can reside within nwaku. We just have to expose config to enable/disable these when the wakunode is created.

prem · August 5, 2024, 2:37pm

This is working barring a few issues and is supposed to work… so it would be required in nwaku to improve connectivity, especially when a node is behind strict NAT.

arnetheduck · August 15, 2024, 7:29am

FWIW, here’s a PR that puts nwaku in status desktop instead of go-waku: frankenwaku: send chat messages with nwaku by arnetheduck · Pull Request #9805 · status-im/status-desktop · GitHub

afair, the majority of the work was on the status-go side, where a lot of the code handling go-waku integration was tangled with all sorts of things, from database operations to user avatar selection. Dropping in nwaku instead of go-waku is fairly simple, given that both these projects have been engineered with an API in mind that ultimately is/can be exposed via REST.

Notably, this is “more or less” the approach that frankenwaku uses: from the point of view of status-go, nwaku becomes an (asynchronous) service to which RPC commands are sent and responses arrive eventually.

It’s worth noting that the way frankenwaku does things is quite insane though: from memory, to send a message from the desktop gui, qt gives it to nim which encodes it as json which gets sent to a geth message bus (!!!) which sends it to some status-go jungle which sends it to a whisper/waku interface which sends it back to nwaku via some contrived pipeline - no wonder this is hard to follow. Nevertheless, if you remove the noise, this is more or less the way to do go-nim integration: treat the two languages as two processes and establish a thread-safe channel between them where messages get sent with “whatever” encoding - the REST Json that waku natively offers is as good as any to get going.
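The "two processes with a thread-safe channel" idea can be sketched in plain Go, with a goroutine standing in for the nwaku side. This is only an illustration under stated assumptions: the `Request` and `Response` envelopes, the `send` method name, and the `serve` and `call` helpers are all hypothetical, and the real waku REST JSON format is not modelled here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Request and Response are hypothetical envelope types for this sketch.
type Request struct {
	ID     int             `json:"id"`
	Method string          `json:"method"`
	Params json.RawMessage `json:"params"`
}

type Response struct {
	ID     int    `json:"id"`
	Result string `json:"result,omitempty"`
	Error  string `json:"error,omitempty"`
}

// serve plays the role of the nwaku side: it reads JSON-encoded requests
// from in and writes JSON-encoded responses to out. It closes out once in
// has been closed.
func serve(in <-chan []byte, out chan<- []byte) {
	defer close(out)
	for raw := range in {
		var req Request
		var resp Response
		if err := json.Unmarshal(raw, &req); err != nil {
			resp.Error = "bad request"
		} else {
			resp.ID = req.ID
			switch req.Method {
			case "send":
				resp.Result = "accepted"
			default:
				resp.Error = "unknown method"
			}
		}
		encoded, _ := json.Marshal(resp) // cannot fail for this struct
		out <- encoded
	}
}

// call sends one request and waits for its response. Only bytes cross the
// channel, so neither side shares memory with the other.
func call(in chan<- []byte, out <-chan []byte, req Request) (Response, error) {
	raw, err := json.Marshal(req)
	if err != nil {
		return Response{}, err
	}
	in <- raw
	var resp Response
	err = json.Unmarshal(<-out, &resp)
	return resp, err
}

func main() {
	in, out := make(chan []byte), make(chan []byte)
	go serve(in, out)

	req := Request{ID: 1, Method: "send", Params: json.RawMessage(`{"text":"hi"}`)}
	resp, err := call(in, out, req)
	close(in)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("response %d: %s\n", resp.ID, resp.Result)
}
```

In a real integration the channel would be replaced by whatever crosses the Go/Nim boundary, but the shape stays the same: encoded messages in, encoded messages out, matched by an ID.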

Fitting nwaku in there is one step along this way though it’s worth noting that the gains from going down this path come later, when geth can be removed from status-go by reusing nimbus and the light implementations it offers of nearly everything from web3, light client verification, portal client etc etc that would turn status into a fully sovereign node.