As part of the 2024 roadmap, Waku’s team scope increased, leading to more work in the status-go codebase. As a result, when setting milestones and deliverables, a directive was proposed:
Code to edit should be extracted: to keep the status-go/go-waku boundary in line with the organisation, code in status-go that needs to be modified to execute the milestones should generally be extracted from status-go and replaced with an API call.
However, since the work started, this has seldom happened, with the exception of filter management being moved from status-go to go-waku. It is now time to re-engage the discussion on code architecture to ensure alignment on the strategy to adopt.
Goals
The current goal is to make Status’ usage of Waku reliable and scalable, which includes improvements in status-go, go-waku, and nwaku; as well as defining and implementing new protocols and libraries, such as the e2e reliability protocol.
In the context of this goal, it becomes clear that refactoring status-go and replacing APIs would not yield immediate significant value. I believe this is one of the reasons why we have not followed the directive.
Other contributing factors may include:
- The complexity of the task
- The high code churn, which hinders our ability to define new APIs with confidence
These challenges make it difficult to justify the effort required for such an extensive refactoring and API replacement process.
The next deliverable on the roadmap is the replacement of go-waku with nwaku.
This deliverable will reduce the number of clients to maintain and support the Logos/IFT directive on the use of nim.
It also enables the bundling of the work done for Status in nwaku and offering it as a single native SDK for external projects.
When planning milestones and deliverables, a balance must be struck between:
- Speed of delivery for each milestone
- Total effort
There are two extremes to consider:
- Each goal is considered in isolation, potentially leading to technical debt being introduced with the first deliverable, increasing total effort.
- Only the end goal is considered, pushing the delivery date of the first goal closer to that of the third goal, which means postponing the delivery of Status reliability.
By continuing to use go-waku for now, we have leaned towards option 1. The decision was made to deliver significant improvements to Status reliability first and foremost, drawing a line in the sand at RLN in terms of sun-setting go-waku.
Work highlights
To replace go-waku with nwaku in the Status application, the following work needs to be done (in no strict order):
- Understand and establish best practices for using nim in a Golang context, including go-routines, queues, etc.
- Move peer-to-peer reliability strategies from status-go to go-waku/nwaku:
a. Regular store query
b. Store query after sending a message
c. Disconnection detection and automated store queries
- Improve and implement light client logic in nwaku, such as filter management and general client-side behavior of req-res protocols, including reliability mechanisms
- Move e2e reliability strategies from Golang to nim/nwaku.
- Understand how to bundle nwaku in Android and iOS Status mobile apps.
The proposed approach is to use nwaku first in Status Desktop, creating a special relay-only build. This will allow for early dogfooding without waiting for the completion of tasks 3, 4, and 5.
(3) can then be completed by adding light client mode to the special Desktop build and making nwaku the default client in desktop. At this point, go-waku will no longer be used for relay or desktop purposes.
Finally, (5) can be accomplished to fully sunset go-waku.
During discussions with Waku Golang engineers, it became clear that the most significant portion of the work revolves around (2).
I can see the challenges being multifaceted. However, my expertise with status-go is limited, and I would appreciate any input:
i. Defining the interface between status-go and Waku
This entails defining an API and a contract between these two domains. It is crucial to consider how we envision the ideal Waku SDK API and ensure it fits within status-go.
I list this as a distinct item because of its importance: the boundary between Waku and Status (or other applications) must be clearly defined.
ii. Implementing the defined interface between status-go and Waku
This is the largest chunk of work, requiring status-go to be refactored to accommodate the new API.
Ordering
There are several constraints and possibilities when it comes to ordering the work for replacing go-waku with nwaku:
- Any logic moved from status-go to go-waku must be present in nwaku before the replacement. This means that the more logic is shifted to go-waku now, the more potential blockers are created for decommissioning go-waku.
- Implementing an API does not necessarily require implementing it directly in nwaku; an API can be defined with code relocated elsewhere within status-go, before being moved or replaced.
- To enable early dogfooding, go-waku will initially continue to be used on mobile devices. This means that any code pushed to nwaku must exist in Golang form. However, this does not necessarily mean that the code should be present in go-waku; it can remain in status-go with a build-time switch.
- The previous point also suggests having an abstraction layer that allows for swapping between nwaku and go-waku at build time.
By considering these constraints and possibilities, we can devise a more effective plan for replacing go-waku with nwaku while minimizing potential blockers and ensuring smooth progress.
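The abstraction layer mentioned in the last constraint could look roughly like the sketch below. All names here (`WakuBackend`, `newBackend`, the `use_nwaku` tag) are hypothetical and do not come from status-go, go-waku, or nwaku. In a real tree, each backend would live in its own file guarded by a Go build constraint; a boolean parameter simulates that switch here so the sketch compiles as a single file.

```go
package main

import (
	"errors"
	"fmt"
)

// WakuBackend is a hypothetical minimal interface that status-go would
// code against, regardless of which client sits underneath.
type WakuBackend interface {
	Name() string
	Publish(topic string, payload []byte) error
}

// validate holds checks shared by both backends.
func validate(topic string) error {
	if topic == "" {
		return errors.New("empty topic")
	}
	return nil
}

// goWakuBackend stands in for a go-waku based implementation.
// Assumption: in a real tree this file would start with
// `//go:build !use_nwaku`.
type goWakuBackend struct{}

func (goWakuBackend) Name() string { return "go-waku" }
func (goWakuBackend) Publish(topic string, payload []byte) error {
	return validate(topic) // real code would hand the message to go-waku
}

// nwakuBackend stands in for a Go wrapper over the nwaku C bindings.
// Assumption: in a real tree this file would start with
// `//go:build use_nwaku`.
type nwakuBackend struct{}

func (nwakuBackend) Name() string { return "nwaku" }
func (nwakuBackend) Publish(topic string, payload []byte) error {
	return validate(topic) // real code would call into nwaku via cgo
}

// newBackend simulates the build-time switch with a runtime flag.
// With build tags, each guarded file would define its own newBackend().
func newBackend(useNwaku bool) WakuBackend {
	if useNwaku {
		return nwakuBackend{}
	}
	return goWakuBackend{}
}

func main() {
	for _, useNwaku := range []bool{false, true} {
		b := newBackend(useNwaku)
		err := b.Publish("example-topic", []byte("hello"))
		fmt.Printf("%s: publish error = %v\n", b.Name(), err)
	}
}
```

With real build constraints, a command such as `go build -tags use_nwaku` would select the nwaku file, and the calling code in status-go would not change.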
Strategy
Another way to present the two extremes is:
- Accumulate so much technical debt to deliver reliability (a) quickly that the next steps (b) and (c) become extremely difficult.
- Spend so much time and effort avoiding technical debt that reliability (a) does not get delivered within the original proposed timeline.
To avoid this, a plan must be defined to tackle the work by breaking it down into small achievable chunks. Ideally, each new interface used in status-go should also provide direct value to the product and customer.
A potential solution would be to set a series of deliverables defined as follows:
Feature X is available in Waku SDK and dogfed in Status app
- A new API is defined for feature X that abstracts the underlying protocol.
- This API is implemented in nwaku, including c-binding and Golang wrapper.
- The API is used in status-go and the Status app, i.e., status-go is refactored to use it.
- Dogfooding is performed within the Status app context.
- Documentation on how to use the API is available to developers.
In this case, “feature X” represents:
- Regular store queries
- Store check when sending messages
- Disconnection detection and automated store query (including hibernation)
- etc
By doing so, each small achievable chunk provides a tangible benefit to the project.
This approach ensures that progress is made with regular practical outcomes.
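As an illustration of what one such "feature X" API might look like, here is a minimal sketch of the "store check when sending messages" feature. It is an assumption-laden example, not the actual status-go or Waku SDK design: `Transport`, `SendWithStoreCheck`, `ErrNotStored`, and the in-memory `flakyTransport` are all hypothetical names. The caller asks for a confirmed send and does not need to know how the store query works.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotStored is returned when no store confirmation was obtained.
var ErrNotStored = errors.New("message not found in store after all attempts")

// Transport is a hypothetical low-level surface: publish a message and
// ask a store node whether it holds a given message.
type Transport interface {
	Publish(msgID string, payload []byte) error
	StoreHas(msgID string) (bool, error)
}

// SendWithStoreCheck publishes the message, then checks the store; if the
// message is missing it re-publishes, up to maxAttempts times. It returns
// the number of publish attempts made. A real implementation would wait
// between publishing and querying; that delay is omitted here.
func SendWithStoreCheck(t Transport, msgID string, payload []byte, maxAttempts int) (int, error) {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err := t.Publish(msgID, payload); err != nil {
			return attempt, err
		}
		found, err := t.StoreHas(msgID)
		if err != nil {
			return attempt, err
		}
		if found {
			return attempt, nil
		}
	}
	return maxAttempts, ErrNotStored
}

// flakyTransport is an in-memory fake that silently drops the first
// dropFirst publishes, to exercise the retry path.
type flakyTransport struct {
	dropFirst int
	published int
	stored    map[string]bool
}

func (f *flakyTransport) Publish(msgID string, payload []byte) error {
	f.published++
	if f.published > f.dropFirst {
		f.stored[msgID] = true
	}
	return nil
}

func (f *flakyTransport) StoreHas(msgID string) (bool, error) {
	return f.stored[msgID], nil
}

func main() {
	ft := &flakyTransport{dropFirst: 2, stored: map[string]bool{}}
	attempts, err := SendWithStoreCheck(ft, "msg-1", []byte("hi"), 5)
	fmt.Printf("attempts=%d err=%v\n", attempts, err) // attempts=3 err=<nil>
}
```

Because the retry logic sits behind a single function, it could start life inside status-go and later move to go-waku or nwaku without changing its callers.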
Some of this work is already planned in the roadmap (relay reliability in nwaku), but the current challenge lies in dogfooding the progress made in nwaku. It is not clear who the consumers of the peer-to-peer reliability protocols in nwaku currently are.
Another approach could be to push for the replacement of APIs in status-go immediately, and not consider the currently defined deliverables complete until this refactor is finished. This was the original intention when creating the roadmap, but there are some drawbacks.
The primary concern with this approach is that API definition and usage within status-go is difficult to quantify; it can only be evaluated by examining the code. Moreover, the ultimate value of this effort – a simple, high-quality API for the Waku SDK, hardened through Status usage – may not be ensured without further refining the SDK, providing documentation, and offering examples. These tasks have secondary priority compared to ensuring reliability and scalability for Status.
End-to-end reliability library
A lesson must be learned from the previous experience when designing the new e2e reliability API. It is crucial to invest time upfront to ensure that the API is simple and accessible to developers without any knowledge of the underlying protocol.
This was not the case with the current MVDS API or Waku implementations, which leak protocol details to the extent that protocol knowledge is required to utilize these libraries. This, in turn, leads to a lack of domain separation.
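To make the point about domain separation concrete, the sketch below shows one possible shape for an API that does not leak protocol details. It is purely illustrative and does not reflect MVDS or the actual e2e reliability protocol: `ReliableChannel`, `Receipt`, and `DeliveryStatus` are hypothetical names, and the acknowledgement handling is a toy stand-in. The application only sees `Send` and `Status`; message identifiers and acknowledgements stay internal.

```go
package main

import "fmt"

// DeliveryStatus is the only delivery information exposed to the application.
type DeliveryStatus int

const (
	Pending DeliveryStatus = iota
	Delivered
)

func (s DeliveryStatus) String() string {
	if s == Delivered {
		return "Delivered"
	}
	return "Pending"
}

// Receipt is an opaque handle; its field is unexported so callers
// cannot depend on protocol-level identifiers.
type Receipt struct{ id uint64 }

// ReliableChannel hides sequencing and acknowledgement tracking.
// This toy version is not safe for concurrent use.
type ReliableChannel struct {
	nextID uint64
	acked  map[uint64]bool
}

func NewReliableChannel() *ReliableChannel {
	return &ReliableChannel{acked: map[uint64]bool{}}
}

// Send registers the message as pending and returns an opaque receipt.
// A real implementation would also hand the payload to the transport.
func (c *ReliableChannel) Send(payload []byte) Receipt {
	c.nextID++
	c.acked[c.nextID] = false
	return Receipt{id: c.nextID}
}

// Status reports whether the message behind a receipt was acknowledged.
func (c *ReliableChannel) Status(r Receipt) DeliveryStatus {
	if c.acked[r.id] {
		return Delivered
	}
	return Pending
}

// handleAck is internal: it would be driven by the protocol layer,
// never by application code. Unknown IDs are ignored.
func (c *ReliableChannel) handleAck(id uint64) {
	if _, known := c.acked[id]; known {
		c.acked[id] = true
	}
}

func main() {
	ch := NewReliableChannel()
	r := ch.Send([]byte("hello"))
	fmt.Println(ch.Status(r)) // Pending
	ch.handleAck(r.id)        // simulates an acknowledgement from the network
	fmt.Println(ch.Status(r)) // Delivered
}
```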
Questions / Next steps
The key questions we need to address are:
- Are we intentionally shifting the Waku/Status API rework from the Direct Message Reliability Milestones to the Nwaku in Status Desktop Milestone, or even further, to milestones that are not yet defined?
- Or, are we able to define a clear value and outcome in proceeding with the API refactor now?
- Based on our current understanding, what is the most cost-effective way to replace go-waku with nwaku? How can we prioritize mitigating the highest risks first?
- Should we focus exclusively on replacing go-waku with nwaku initially, and move logic (and refine APIs) later?
These questions may lead to further considerations as well.