The Cost of Multiple Waku Implementations

Thanks for this long post. Long in, long out :slight_smile:

Generally, the decision to own the full stack is strategic: it ensures that we build and retain ownership, licensing and competence for all the parts of the infrastructure that we rely on. This is reflected in investments across the board, from messaging and privacy (which Waku represents), through storage (codex), consensus (nimbus and nomos) and user experience (app / desktop), to, in the case of logos, the broader philosophical movement from which our efforts are derived.

From a historical point of view, this remains a contributing reason why we went with Nim in the first place.

The strategy also serves as insurance against changes in the nature of the projects that we depend upon, and it gives us a seat at the table to affect the direction of their development.

A key piece of the strategy is also to encourage protocol-first development - ie every time we create an implementation, it should be possible to replicate it in any language by anyone without complication (as exemplified by the ease with which go-waku, a second implementation, could be built once a first implementation had already solved the architectural problems) - this puts additional stress on documentation and well-written standards.

In terms of alignment, the main direction of language support for Research / Logos is Rust / Nim / wasm (and by extension JS) for the time being.

We retain and make exceptions for go-lang in order to support Status via status-go and to avoid costly rewrites of the existing codebase, even though a refactoring of status-go is high on the wishlist - it is a poorly understood codebase that has accumulated a significant amount of feature and tech debt as priorities have shifted in app development.

Our Nim needs have until recently been scaled roughly according to the needs of the Nimbus team (which itself is small) and a research contingent on the waku side. The Nimbus team in turn has delivered not only an extremely efficient implementation of an Ethereum client but, thanks to its seat at the table, has also changed the direction of the Ethereum protocol (via the light client, portal, etc initiatives). As a side effect of that development, they have also delivered an increasing number of high-quality Nim libraries aimed at chain development in general - not only the code, but also the research and understanding that comes with it.

Ditto our libp2p implementation - efforts like IDONTWANT, although shipped as part of the EIP-4844 package on Ethereum, address a core need at the protocol level: rendering gossip more efficient for larger transfers (such as in-chat images), a need whose raison d'être can be traced back to early status app and privacy discussions - unlike sharding, it does so without compromising the anonymity set. This is a good example of the benefit of working across many “zoom levels” of abstraction and layers.
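To make the mechanism concrete, here is a minimal sketch of the idea behind IDONTWANT - in Go, and not the nim-libp2p code; names, shapes and thresholds are purely illustrative. A peer that has already received a large message announces its ID to its mesh peers, and they then skip the redundant transfer of the full payload to it:

```go
package main

import "fmt"

// messageID identifies a gossipsub message (illustrative; real IDs are derived
// from the message contents according to the topic's message-id function).
type messageID string

// peer models the per-connection state we keep for a mesh peer.
type peer struct {
	name string
	// dontWant records message IDs this peer has told us it already has.
	dontWant map[messageID]bool
}

// onIDontWant handles an incoming IDONTWANT control message from a mesh peer.
func (p *peer) onIDontWant(ids ...messageID) {
	for _, id := range ids {
		p.dontWant[id] = true
	}
}

// forward sends a full message to every mesh peer that has not announced it
// already - for large payloads this is where the bandwidth saving comes from.
func forward(mesh []*peer, id messageID, payload []byte) {
	for _, p := range mesh {
		if p.dontWant[id] {
			continue // peer already has it; skip the redundant transfer
		}
		fmt.Printf("sending %d bytes of %s to %s\n", len(payload), id, p.name)
	}
}

func main() {
	a := &peer{name: "a", dontWant: map[messageID]bool{}}
	b := &peer{name: "b", dontWant: map[messageID]bool{}}

	// b received the message from elsewhere and announces IDONTWANT for it.
	b.onIDontWant("msg-1")

	// Only a gets the half-megabyte payload; b is spared the duplicate.
	forward([]*peer{a, b}, "msg-1", make([]byte, 512*1024))
}
```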

IDONTWANT is currently unique to nim-libp2p - this highlights another important point: the work to improve the core infrastructure protocols and implementations will not go away if we move to a different implementation - it will remain, meaning that if we want to pursue a different implementation, we need to take a seat at the table of that implementation and do the work there. We have done this in the past and continue to do so where applicable, but such seats do not come without compromises - when the upstream projects have different priorities, the priorities of the project owner win. The strategy of owning not just bits and pieces but significant parts of the stack dates back to practical experience of this risk turning into reality in areas dear to us.

This cross-pollination works both ways - as waku drives the priority of implementing webrtc and websockets in libp2p, for example, projects like nimbus and codex will benefit from the increased resilience and flexibility, but above all this work puts us in a position to better and more practically understand topics like validator privacy, resilient and flexible networking and connectivity etc when we scale up efforts to logos as well.

The implementation of IDONTWANT was done in deep collaboration with the EF, libp2p and others - their research shows up in the quality of the protocol, but, as this post is focusing on the budgetary aspects, it doesn’t show up in our budget - this is the power of community development that we’re looking to harness - our skills and efforts combined with those of other likeminded developers and orgs to create an environment in which all projects benefit. Incredibly powerful, and something to harness both within status and outside. I like to rant about this point, BTW, when prompted.

It is indeed important for efficiency that projects support each other in their efforts and that they take our collective needs into account when developing their priorities. As we're growing, that prioritization obviously must take new forms, hence the desire to start a new core Nim infrastructure team. Starting a new team takes time - hiring, organising, building understanding etc. That said, it is happening, with hires going through their eval periods and the team being formed as an independent entity rather than an afterthought of the Nimbus team.

This highlights another key point - the Nimbus team has, as the outcome of their efforts, solved their most acute needs in terms of Nim infrastructure support. That no longer makes them the best candidates for driving further prioritisation in this area - this responsibility now falls, as evidenced by this post, in part on the waku team, which is where growth and active new development are happening. It is thus not enough for waku developers to say “nim tooling is bad”; rather, it becomes both an opportunity and a responsibility of each developer to articulate their needs and/or take action, as can be seen in the writing of this post - the Nimbus team would not, at this stage, be in the right spot to articulate this as efficiently. We do not expect contributors to be passive consumers in this respect, even if that's a comfortable place to be - when something itches, we scratch the itch, then go back to its root cause and fix that too.

With that out of the way, on to more practical stuff:

Duplicate work

As a reminder from past discussions, it's good to reiterate that each implementation we currently have has broadly orthogonal use cases and, as such, should have minimal overlap in terms of features and maintenance outside of the protocol, which, thanks to being well documented, doesn't require much additional effort. When those priorities are not adhered to, we make the bed we lie in, ie one of overlapping maintenance efforts. As an example, go-waku exists to serve the needs of status-go - if anything outside of what status-go needs gets developed, that is a direct violation of this mandate, and the consequence is inefficiency and increased maintenance for the organisation. Locally, this might have seemed like a reasonable step, which is often why it ends up happening - globally, perhaps not.

We assume in our approach that each developer understands this point and directs their efforts accordingly, so that the outcome aligns with this broad understanding - something that can be hard to remember in the heat of the moment because “it looks so easy to do” - see xkcd: Automation for a classic introduction to this oft-seen engineering phenomenon. For those of you in lead positions, recognising and directing effort is paramount and a key part of your responsibilities, but ultimately, to thrive in an organisation like ours, this is a skill that needs to be honed by every engineer. As many project management books will tell you, what you don't do is often more critical to success than what you do - close to home, there's an infinite amount of things the Nimbus team has not done (including not following up on all the bells and optional whistles of auxiliary libraries) in order to be where they are today (ie shipping with a comparatively small team on a monthly basis, in tune with the rust, go, java and other implementations that have the supposed “language advantage”).

Native implementations

The core architectural granularity of integration in the web3 space remains REST/JSON-RPC in various shapes and forms - ie the interface between applications can largely be resolved at this level, including, for example, for our RLN.

This granularity is strategic and important to maintain - it allows us to focus on what we do well and outsource difficult problems to users who are more intimately familiar with their specific use cases. RLN is a good example here - it addresses a “hard problem” (sybil resistance) for which we have one proposal (zk proofs), but other users may have alternatives more tailored to their needs (ie private lists etc). By outsourcing this to an RPC protocol, we retain flexibility and can black-box the problem from our perspective, while also gaining implementation independence.
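As a hedged illustration of what that black-boxing buys - everything here is hypothetical, `SybilGuard` and its implementations are not an existing API - the application only ever asks a narrow "may this sender publish?" question, and whether the answer is backed by RLN-style zk proofs behind an RPC boundary or by a private allow list stays an implementation detail:

```go
package main

import "fmt"

// SybilGuard is a hypothetical, deliberately narrow boundary: the application
// asks whether a publish attempt should be admitted, nothing more.
type SybilGuard interface {
	Allow(sender string, epoch uint64) (bool, error)
}

// rlnGuard sketches an implementation that would defer to a node exposing
// RLN-style rate limiting over RPC (the actual call is elided here).
type rlnGuard struct{ endpoint string }

func (g rlnGuard) Allow(sender string, epoch uint64) (bool, error) {
	// A real client would submit/verify a zero-knowledge membership proof via
	// the node's RPC interface; this sketch only shows the shape of the boundary.
	return true, nil
}

// allowListGuard is the "more tailored" alternative: a private list the
// operator controls, with no proofs involved at all.
type allowListGuard struct{ members map[string]bool }

func (g allowListGuard) Allow(sender string, _ uint64) (bool, error) {
	return g.members[sender], nil
}

// publish only depends on the boundary, not on how sybil resistance is solved.
func publish(guard SybilGuard, sender, msg string) {
	ok, err := guard.Allow(sender, 42)
	if err != nil || !ok {
		fmt.Println("rejected:", sender)
		return
	}
	fmt.Println("published:", msg)
}

func main() {
	publish(rlnGuard{endpoint: "http://localhost:8545"}, "alice", "hi via RLN")
	publish(allowListGuard{members: map[string]bool{"bob": true}}, "alice", "hi via list")
}
```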

This same RPC interface can be consumed either via loose coupling (ie http) or tight coupling (in-process), and ultimately via well-documented protocols which serve as the basis for advanced consumers. As correctly highlighted above, the step to a native implementation with deep language integration, as opposed to coarse-grained RPC, comes at a separate stage and is generally not necessary for us to develop useful products. It is, however, an important point that needs to be taken into account when developing the product, and our choices here, if deliberate, do not result in locking us into one particular implementation or language for a specific component. Whether we get go-lang waku PRs or RPC/API extension requests is downstream from the architectural decisions in how we present and develop the product, not upstream.
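A minimal sketch of the loose/tight distinction, with made-up method and endpoint names rather than nwaku's actual API: the application codes against one small interface; a loosely coupled implementation talks JSON-RPC over HTTP to a node running out of process, while a tightly coupled in-process binding can satisfy the same interface without the caller changing. This is also the shape that keeps the go-waku/nwaku swap discussed below tractable.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Relay is the coupling boundary: the application only depends on this.
type Relay interface {
	Publish(topic string, payload []byte) error
}

// httpRelay is the loosely coupled variant: a JSON-RPC call to a node running
// out of process. The method name is illustrative, not a documented endpoint.
type httpRelay struct{ url string }

func (r httpRelay) Publish(topic string, payload []byte) error {
	req := map[string]any{
		"jsonrpc": "2.0",
		"id":      1,
		"method":  "relay_publish", // hypothetical method name
		"params":  []any{topic, payload},
	}
	body, err := json.Marshal(req)
	if err != nil {
		return err
	}
	resp, err := http.Post(r.url, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	return nil
}

// inProcessRelay is the tightly coupled variant: the same interface satisfied
// by an embedded implementation (or an FFI binding), with no wire hop involved.
type inProcessRelay struct{}

func (inProcessRelay) Publish(topic string, payload []byte) error {
	fmt.Printf("in-process publish of %d bytes to %s\n", len(payload), topic)
	return nil
}

func main() {
	// Swapping transports is a one-line change; the rest of the app is untouched.
	var r Relay = inProcessRelay{} // or: httpRelay{url: "http://localhost:8545"}
	_ = r.Publish("/toy/1/chat", []byte("hello"))
}
```

The point of the sketch is the direction of the dependency: the product decides the boundary, and the choice of implementation or language sits behind it.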

When interaction between components becomes so tight that the specific implementation language is of concern, it is perhaps a good time to pause and examine whether the problem is being approached in the best way architecturally.

Coming back to waku and its go implementation, this is again a good example - it is quite possible to swap out go-waku for nim-waku in status-go, if desired. Waku can be defined as an RPC interaction between an application and the network, and this highlights the power of the RPC model - what usually stands in the way of such attempts is indeed tight coupling left unchecked while features keep getting added.

Here is a weekend experiment pr that exemplifies this - it requires several changes on the status-go side but very few on the nwaku side. The tight coupling of activities in status-go (chat protocols, database updates, community permissions and a plethora of other responsibilities in the same paragraph of code) leads to this outcome - developing waku as, first and foremost, an implementation of an RPC protocol for the consumer means that the waku implementation itself does not need to be changed much to integrate it into any application. The same applies to the other end: as long as the wire protocol of waku is well documented, the effort to create a native implementation tailored to a particular need is low - the hard problems faced during the first round of development will by this time have been solved, and it becomes possible to focus on “second-implementation” problems like fine-tuning, optimisation and low-hanging UX fruit. This applies at this point to both existing and new implementations.
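To show why a documented wire format is what makes a tailored second implementation cheap, here is a toy codec - this is not the Waku wire format (which lives in protobuf over libp2p), just a stand-in layout invented for illustration. Once the layout below is written down, either side can be reimplemented in any language without consulting the first implementation's code:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// envelope is a toy message format, documented as:
//   uint32 topic length | topic bytes | uint64 timestamp | payload bytes
// (big endian). It is NOT the Waku wire format - just a stand-in showing that
// the written-down layout, not the first implementation, is the contract.
type envelope struct {
	topic     string
	timestamp uint64
	payload   []byte
}

// encode writes an envelope according to the documented layout.
func encode(e envelope) []byte {
	var buf bytes.Buffer
	binary.Write(&buf, binary.BigEndian, uint32(len(e.topic)))
	buf.WriteString(e.topic)
	binary.Write(&buf, binary.BigEndian, e.timestamp)
	buf.Write(e.payload)
	return buf.Bytes()
}

// decode reads the same layout back; any second implementation only needs
// this description to interoperate.
func decode(b []byte) (envelope, error) {
	r := bytes.NewReader(b)
	var topicLen uint32
	if err := binary.Read(r, binary.BigEndian, &topicLen); err != nil {
		return envelope{}, err
	}
	topic := make([]byte, topicLen)
	if _, err := r.Read(topic); err != nil {
		return envelope{}, err
	}
	var ts uint64
	if err := binary.Read(r, binary.BigEndian, &ts); err != nil {
		return envelope{}, err
	}
	payload := make([]byte, r.Len())
	r.Read(payload)
	return envelope{topic: string(topic), timestamp: ts, payload: payload}, nil
}

func main() {
	wire := encode(envelope{topic: "/toy/1/chat", timestamp: 1700000000, payload: []byte("hi")})
	e, _ := decode(wire)
	fmt.Printf("%s @ %d: %s\n", e.topic, e.timestamp, e.payload)
}
```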

Tooling and technicalities

Worth remembering on the tooling front is that every tool out there (apart from the code formatter) developed for C also works with Nim: this includes profilers (VTune), debuggers (gdb/lldb), compiler-based instrumentation (the *Sanitizer family), valgrind etc.

Re wasm, we use emscripten - https://eth-light.xyz/ is an example of the light client implementation from Nimbus, written in Nim, running in the browser. It is also an excellent example of how we conduct and document protocol work. At some point we even had a wasm smart contract stack in the works, though priorities changed.

Collaboration and hunger

Opportunities to collaborate and cross-pollinate are there for anyone with the hunger to go after them, and I'm happy to guide and help anyone looking for more concrete ways to do this, including spending some of their time productively in a core library / infrastructure team. Improving a library, writing a protocol document, providing a guideline for how a specific solution can be made generic, or articulating a critical need by writing an issue or making a PR across status for a project you're not directly involved with are all highly encouraged, as a general principle. When we look for outstanding work, this is often where we find it: someone who went out of their way to connect the dots between the various efforts that we have and shared their experience. It is rewarding in many ways to look up and see how the seed you sowed grows, nurtured by those you helped when planting it.

Wrap up

I foresee that the need to float each other's boats will remain across our research projects, and as such, nwaku provides invaluable support for the other projects we have going outside of its own core development (similar to how nimbus has delivered utility outside of its mandate, in the form of libraries and code that is not specifically a blockchain client) - both for Nim itself and, more broadly, by making our varied products more useful. That support flows back to waku in the ways one would expect: collaborations, core protocol changes in areas that matter to us as an organisation, access to a broader pool of researchers that we give to and who in turn give back, etc, as well as the ability to influence where we pool the Nim resources available to us.

I recognise from this post the growing pains in this process - ie as we hunker down to deliver milestones, it becomes easy to miss the benefits that are brought by such collaborations, but also to forget to do the collaboration itself.

Just like nim-libp2p became a team independent of Nimbus as needs grew, so is “core library” support growing into its own effort, with the aim to address our growing needs in areas useful to the projects that use Nim, and this post in and of itself is a signal that this is something worth investing in. I think we can make it work.

P.S. the deleted post is a fat-fingered incomplete version of this one.
