Members of the Waku team regularly bring up that the development of multiple Waku clients (nwaku, go-waku, js-waku) and the usage of Nim are hindering progress. Despite allocating resources to each client, having more engineers does not result in faster delivery due to the high churn in terms of protocol research, design and implementation.
Therefore, I am writing this post to provide a factual view of the benefits and disadvantages of having several implementations and of the impact of using Nim, and to initiate a discussion on the subject.
My expectation is that, at a minimum, this post encourages a fact-based discussion on the place of Nim in the Waku, and possibly Logos, ecosystems and a better alignment across the organisation.
It is my opinion that whether the proposed strategy is accepted is less valuable than strong alignment across the organisation.
Note that I hesitated between keeping this discussion internal and making it public. In an effort to be in line with our principle of transparency and our commitment to build in the open, I have opted to publish this post here.
The use of Nim is a topic of debate when it comes to the efficiency of developing multiple Waku clients. In this summary, I will do my best to explain the strategic value of Nim.
The Status organization has developed Nim expertise through the previous Nimbus project. Therefore, it is logical to utilize this expertise for new projects like Waku.
However, there is limited transfer of Nim knowledge between teams, as observed within the potential Nim team’s scope.
Using a specific language can sometimes help attract specific talent. The understanding and enthusiasm for a particular language may indicate deeper analytical skills or system knowledge. This has been observed in the blockchain domain, as seen with IOHK starting with a Haskell-only codebase.
However, this has not yielded benefits for Waku, as none of the current Waku team members had prior experience with Nim before contributing to Waku.
The Logos infrastructure project should be seen as a cohesive tech stack for the Logos network state. Using a common system language to achieve this goal is likely to facilitate inter-project integration.
However, Waku is currently focused on resolving the chat use-case for the Status Community, with the core library (status-go) written in Golang, while Codex uses Nim and Nomos uses Rust. Finally, while it is beneficial to integrate a library in the same language, it is also possible to expose c-bindings and enable inter-project integration this way.
The properties of the Nim language, such as native dependency-free executables and suitability for embedded and hard-realtime systems, make it well-suited for the Waku use case: supporting resource-restricted devices and environments.
As the primary user of Nim, Status has the opportunity to influence the direction of the Nim ecosystem.
However, this comes with several downsides, as it means most of the battle-testing is done by Status and not by other projects or organisations. Moreover, it is not clear how much this opportunity is being leveraged. The lack of a dedicated Nim ecosystem team is an example of such a missed opportunity.
The maintainers of nim-libp2p are core contributors, which brings several benefits, including:
- The ability to influence the nim-libp2p roadmap, such as implementing websocket and WebRTC transports or specific optimizations.
- Priority support from maintainers when encountering bugs.
- Access to in-house libp2p expertise.
It is important to note that nim-libp2p c-bindings are a possibility, meaning that the aforementioned benefits could still be retained regardless of the Waku language implementation. Additionally, nim-libp2p cannot deviate completely from the libp2p protocol, and most of the features required by Waku (e.g. websocket, webrtc) were already available in other libp2p implementations (such as go, rust, and js).
What are the disadvantages of using Nim and how do they impact delivery?
Ref: Justification - Nim Integration/Support engineer (apologies to readers who are not CCs as this document is internal).
Ref: Nim Tooling Wishlist
The lack of expected tooling for a modern language is a daily obstacle to productive development.
In most languages, IDEs provide features such as browsing, auto-completion, hinting, refactoring, and error highlighting, which enable developers to write code faster and reduce their mental load (by relying on the IDE to provide hints on type names, for example).
Linting, static analysis, and dynamic analysis tools help reduce bugs and performance issues in a codebase, saving time spent on investigating and fixing these issues.
Several “standard” libraries do not have strong ownership and maintainers. While critical bugs are being fixed by the libraries’ original authors, extending them is not always prioritized. Critical libraries such as nim-libp2p have clear ownership, but other libraries originally developed by the Nimbus team remain in limbo in terms of roadmap and feature extension.
Some libraries simply do not exist, such as proto-file-based code generation for protobuf.
While we expect and observe that senior software engineers see Nim as a tool being used towards a goal, this is not the case for junior and mid-level engineers who, once familiar with Nim and its lack of tooling, see it as a daily source of frustration. In the short lifetime of Waku, we have witnessed engineers departing mainly due to their distaste for Nim.
This is also an issue when hiring talent, as a strong emphasis needs to be placed on the fact that Nim is the language of choice, in order to filter out any candidate who may leave because of it.
Tooling and library maintenance are symptoms of the low maturity of the Nim ecosystem in general. Another consequence of Nim not being a popular language is that the nwaku team sometimes has to be the first to attempt something in Nim, such as:
- Packaging a Nim library for NodeJS (attempted)
- Using websocket (in Nim in general and within the nim-libp2p context)
- Using WebRTC, WebTransport (TBD)
- Interpreting WASM in Nim (TBD for application level message validation)
- Embedding a Nim library on mobile, including using Nim c-bindings for Swift/Kotlin/Flutter/etc (TBD, assuming we are not using go-waku)
Not only does this mean added effort for the Waku and nim-libp2p teams to deliver, but also a higher chance of encountering bugs from dependencies, as demonstrated with the nim-websocket and nim-libp2p websocket transports.
In this Google Sheet, I attempted to summarize the cost of the usage of Nim and the R&D of an additional Waku implementation.
Our rough estimation is that additional Waku implementations slow down research effort by 10 to 15% due to the time spent by research engineers to support software engineers in duplicating a new protocol in their codebase.
We also estimate that the usage of Nim itself necessitates an additional 3 contributors to reach the same result as with another modern language: 1 on nim-libp2p, and 2 assuming a nwaku team of 4 developers.
The estimates are based on the specific experience of developing nwaku in the past 2-3 years. Feedback on the calculation of the estimates is of course welcome.
I believe it best to keep the actual dollar figures internal, and will share them on the Logos Discord internal channel.
From a long-term perspective, it is evident that multiple native implementations of Waku will be needed to ensure widespread adoption. While c-bindings are an acceptable temporary solution, mature projects are likely to need the flexibility brought by a library in the same language as their product, especially considering that Waku sits low in the tech stack.
Ideally, and similarly to libp2p, new implementations would be started by other projects, for which Waku’s value to their product is so undeniable that they have a clear incentive of bootstrapping a new Waku library.
Waku is near MVP stage, with Status and The Waku Network Gen 0 launching soon. There is still a high churn in terms of research and engineering, and this churn is happening on all implementations, which is why now is the right time to discuss the cost of multiple Waku implementations.
Once Waku reaches a stage of protocol maturity where the core protocols are widely used and functional, and new research only exists to enable specific use cases, then it would make sense for the Waku team to revisit the maintenance of several native clients and measure the ROI in terms of effort vs reward (onboarding specific mature projects).
This could include maintaining a Nim library for the likes of Codex, Nimbus or Nomos.
What outcomes can we expect from this discussion?
The mandate of the Waku team is to provide a suite of censorship-resistant, privacy-minded, and portable communication protocols. These protocols enable Status Communities, as well as other Status features and Logos projects.
The development of multiple Waku implementations hinders this goal, as stated in this discussion.
Having a “no change” outcome is likely to be the worst outcome when various solutions have been proposed to improve the situation.
One possible solution is to provide increased organizational support for Nim developers within the Logos Organization.
The obstacles faced by Nim developers have been clearly identified by various Nim teams (Waku, Codex, Nimbus). Many of these obstacles can be addressed by allocating dedicated resources to develop the necessary tools and libraries, as mentioned in the Justification - Nim Integration/Support engineer (internal document).
However, despite all parties recognizing the potential for significant improvement, and despite initial efforts being made to coordinate and define the responsibilities of a Nim team, no concrete progress has been made in terms of commitment to fund and establish such a team, as discussed in this Discord conversation (internal, Logos Discord).
It is important to note that considering the scope of the changes required in terms of tooling and libraries, it is likely to take 3 to 12 months before the project team sees any benefits from the establishment of such a team. Furthermore, the process of hiring and setting up the team itself may take an additional 3 to 6 months (unless we can source it from current CCs).
Regardless of the outcome for Waku, it is important to consider that such a team can still provide benefits to the Logos Organization and should be seen as a long-term solution to the issues outlined in this discussion.
One solution that has been implemented to address this issue is defining a specific scope for each Waku implementation, with the aim of avoiding redundant work. An example of this scoping can be found at this link.
However, it has become clear that this solution has limitations when it comes to mitigating redundant work:
- Managing discrepancies due to a different scope: The nwaku implementation serves as the reference implementation and service node implementation. Consequently, less effort is dedicated to the development of the “light client” feature. On the other hand, go-waku is the library used for Status applications, including Status mobile which utilizes the light client protocol. This discrepancy creates challenges when trying to align the behavior of both implementations, such as providing a common REST API.
- Redundant work is still necessary: Despite this scoping, it is still necessary to implement all protocols in nwaku, as it serves as the reference implementation. This means that any protocol work done on go-waku and js-waku is duplicate work (already done in nwaku), regardless of the specific scope of each client.
One possible outcome is to drop go-waku. Instead, the research and development effort could be focused on the nwaku client, making it the preferred Waku native library.
However, it has already been assessed that integrating the nwaku lib into the status-go codebase is too difficult. This difficulty led to the creation of go-waku.
Months of work have been dedicated to attempting the integration of nwaku into status-go. It is unclear what could be done differently for a second attempt to be successful. Furthermore, considering the timeline pressure to publish Status, dropping go-waku is not a viable option without jeopardizing the success of the Status app.
Finally, Golang is a widely used language in the web3 ecosystem and in distributed systems and, if not for Status, would likely be needed by other projects. This also has the benefit of encouraging contributions to the Waku ecosystem, with go-waku being the repo with the most PRs opened by non-Status CCs.
Dropping nwaku is another possible alternative. The benefits are:
- Removing support for one client and focusing all efforts on go-waku and js-waku.
- Eliminating frictions caused by the immaturity of the Nim ecosystem and other Nim hindrances.
- Golang is designed to build scalable distributed systems.
- This would be an advantage when running a Waku node on multi-core hardware, where it can use the available resources (all the cores/CPUs) more efficiently. This would address the high-resource end of the Waku adaptive node concept, which may not be efficiently addressed by Nim (as it is single threaded).
- Golang’s stdlib is well-documented, and one of the major strengths of the language is its first-class support for concurrency with goroutines and channels, whereas for Nim, chronos is used due to the limitations of Nim’s support for concurrency.
- C-Bindings for go-waku are mature and already available to provide Waku to other languages.
- There is some evidence of the efficiency of coding in Golang: over 2 years, Richard not only single-handedly caught up with nwaku in terms of feature parity, but also delivered several c-binding wrappers while continuing his status-go commitments.
- Thanks to go-mobile and integration in Status Mobile, go-waku is already available and tested on mobile platforms.
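To make the concurrency points in the list above more concrete, here is a minimal Go sketch of fanning CPU-bound work out across all cores with goroutines and a channel. It is not taken from go-waku: `checksum` and `processAll` are hypothetical names, and the checksum merely stands in for real per-message work such as validation.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// checksum is a hypothetical stand-in for CPU-bound per-message work
// (for example message validation). It is not a go-waku function.
func checksum(payload []byte) uint32 {
	var sum uint32
	for _, b := range payload {
		sum = sum*31 + uint32(b)
	}
	return sum
}

// processAll distributes payload indices to `workers` goroutines over a
// channel and returns one result per payload, in input order.
func processAll(payloads [][]byte, workers int) []uint32 {
	if workers < 1 {
		workers = 1 // avoid blocking forever when no worker reads the channel
	}
	results := make([]uint32, len(payloads))
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is written by exactly one goroutine,
				// so no lock is needed around results.
				results[i] = checksum(payloads[i])
			}
		}()
	}

	for i := range payloads {
		jobs <- i
	}
	close(jobs) // lets the workers' range loops terminate
	wg.Wait()   // all writes to results are complete after this returns
	return results
}

func main() {
	payloads := [][]byte{[]byte("hello"), []byte("waku"), []byte("")}
	// One worker per logical CPU reported by the runtime.
	fmt.Println(processAll(payloads, runtime.NumCPU()))
}
```

The Go scheduler spreads these goroutines over the available OS threads without any extra library, which is the property the list above refers to. Whether this translates into measurable gains for a Waku node would still need to be confirmed by profiling.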
Note that go-libp2p is the reference implementation for libp2p, so there is no risk of specific libp2p protocols being unavailable.
In relation to the Nim language benefits, note that go-waku is already statically compiled.
However, there are downsides to consider:
- Rerunning simulations (RLN, etc.) that were done with nwaku. These simulations should be performed regardless to ensure that the performance for Status meets expectations.
- Losing in-house nim-libp2p support:
  - As mentioned earlier, integrating nim-libp2p via c-bindings in go-waku is an option.
  - The expertise available from the p2p team can still be utilized.
  - go-libp2p is the reference implementation for libp2p, which means that any libp2p protocol we need would be implemented there first, reducing the risk of relying on a feature that is lagging (e.g., this would be a concern if we were to use rust-libp2p).
- Losing a native Waku library for potential integration in Codex or Nimbus.
- Adaptation period for engineers and researchers who do not have previous Golang experience, and for all nwaku engineers to get used to the go-waku codebase.
Status Communities needs to access the Waku network from the browser for a couple of web apps.
Decentralization is one of Status’ principles. Therefore, relying on a proxy, like a Web3 RPC Provider, to access the Waku network is not a viable option.
However, this approach falls under the ecosystem maturity risk category, which includes the following considerations:
- Can we make it run in the browser? NodeJS? React Native?
- Can we bundle the library in a small package size?
- Can it interface with WASM (zerokit)?
- Can it be split into several packages for modular and composable usage?
Going down this path would incur an initial steep cost, similar to the nwaku c-binding experiment or the integration of nwaku in status-go, before we can understand whether it is feasible and practical.
Taking it a step further, we can consider dropping two implementations. This would consolidate the risks mentioned in previous sections.
An alternative approach would be to start fresh with a single codebase that can be used across all languages, facilitated by c-bindings and WASM export.
This would mitigate the risks associated with abandoning each individual implementation.
The previous section listed all possible outcomes with associated benefits and risks.
Only two outcomes have risks that can be easily mitigated:
- Creating a Nim team
- Dropping nwaku and Nim in favour of go-waku for service node and native library needs.
The Nim team outcome has already been discussed in detail in More support for Nim projects and developers from the Logos Organization. As mentioned, this would only yield benefits 3 to 12 months down the line, mitigating most, but not all, of Nim’s cons.
However, it would not address the issue of maintaining several clients during this high-churn period.
This leaves dropping nwaku as the only viable option to remove the friction, frustration and slow-down created by maintaining 3 clients and using Nim.
How do we move all research and engineering activities from nwaku to go-waku?
- Features: go-waku is mostly equal in terms of features; there may be some specific CLI flags that need to be sorted out. Various discrepancies have already been flagged and addressed thanks to the QA effort done by Vac/DST.
- Performance: Static analysis is in place for go-waku but some further memory profiling could be done. Relay and RLN simulation should be done for go-waku either way, as it is the main client for Status.
- PostgreSQL: Delivering a PostgreSQL backend is a Status requirement to deal with a centralized/federated architecture. The research work to provide a more distributed store service has already started. Consideration will be needed to decide whether to switch the Status fleet to go-waku + PostgreSQL (already implemented, but stress testing would be needed) or keep nwaku and deliver the distributed Waku store with go-waku.
- nwaku maintenance: as the set service node for Status fleet, nwaku maintenance would need to continue, or stress testing of PostgreSQL go-waku implementation would need to be done to enable a switch to a go-waku only fleet.
- Testing: in terms of interoperability testing, nwaku and go-waku are both being tested against js-waku and included in the work done by the DST team. Both go-waku and nwaku also have dedicated test engineers working on test improvements. The DST team is now working on an interop framework that is meant for both go-waku and nwaku from the start.
10k/1mil milestones: The remaining work is mainly around testing and simulation. This was mainly done for nwaku, with plans to do it for go-waku once finalized. We are facing delays in terms of DST simulations. The overall work is likely to be reduced if simulations only need to be done for one client, go-waku, the one being used in Status Communities.
PostgreSQL simulation done in nwaku and other “nwaku as a service” node work would have to be replicated in go-waku. However, this would not be a show stopper for the Status launch, as nwaku as it is would be enough for the launch. This work would be needed because we want to move Status to go-waku service nodes for new features such as distributed store, RLN, etc. Do note there is also the option to keep nwaku and PostgreSQL, as research is in progress to distribute the Waku store service.
Gen 0: There is mostly feature parity between go-waku and nwaku; only the local simulations performed by the Waku Research team would have to be re-run.
We currently have 5 engineers in the nwaku team and 3 in the go-waku team. One of the nwaku engineers is moving to a solution role and another is moving to a research role. The combined team would then be 6 engineers. This is a reasonable size and no management overhead is to be expected.
As previously stated, this would have minimal impact on the Status Communities milestone. If anything, it would free up time for more simulations, stress testing and QA efforts on the go-waku library.
There is an impact on a potential integration of Waku in Codex or Nimbus. To mitigate this impact, using c-bindings is an option. For example, TheGraphcast successfully uses go-waku via a Rust wrapper of go-waku c-bindings.
If Nomos were to use Waku, then the Rust bindings already use go-waku so there would be no difference.
- Review the proposal above within the Waku team (Nov 2023)
- Push this proposal to Logos’ leadership and founders (Dec 2023)
- If there is buy-in, review with Status app leadership (Dec 2023)
- If agreed, go-waku becomes the reference implementation (Jan 2024)
- Onboarding of researchers and nwaku engineers
- New protocols are now implemented in go-waku
- Re-organize and re-distribute upcoming go-waku work to nwaku engineers
- Move all nwaku specific DST activities to go-waku (10k node simulation)
- Re-run simulations for go-waku:
  - PostgreSQL stress testing may be re-done depending on discussion and agreement with Status app team
- Deploy go-waku fleets to replace existing nwaku fleets (some go-waku fleets are already running for Status), dogfood fleet, monitor
- go-waku fleets become “default” fleets for js-waku, Status (if applicable)