Introducing the Reliable Channel API

As we defined the Waku API and scoped the work for the Chat SDK, it became clear that some “Waku Application SDK” would need to be defined.

We identified that SDS, segmentation and the rate limit manager would be good tools for any developer. However, they did not fit in the Waku API: they are application-level matters, whereas the Waku API sits in the message routing and peer discovery domains.

Moreover, one may want to use those blocks without the opinionated encryption mechanisms of the Chat SDK, justifying the need for such an API.

This Application SDK and its layers were previously discussed in:

The explicit definition of an API remains in line with our previous learnings here: Prometheus, REST and FFI: Using APIs as common language - #10 by fryorcraken

Finally, I have often used this blog post as a reference for the layered architecture of Waku. The present post supersedes it.

This is the result of many discussions with the Waku app chat team @pablo @kaichao @jazzz, as well as my own experiments with Reliable Channels (JS) and the Waku API.

Waku’s layered architecture

I propose a new definition of Waku’s architecture:

```mermaid
block-beta
    columns 4
    sc["Secure Conversations (Chat SDK)"]:4
    e["Encryption"]:1
    rc["Reliable Channels"]:3
    space:1
    block:reliablechannels:3
        columns 1
        sg["Segmentation"]:1
        sds["Scalable Data Sync"]:1
        enc["Encrypt/Decrypt"]:1
        rlm["Rate Limit Manager"]:1
    end
    e --> enc
    P2PReliability["P2P Reliability (Waku API)"]:4
    block:wakuapi:4
        columns 7
        RLNRelay["RLN Relay"]:1
        Store["Store"]:1
        Lightpush["Lightpush"]:1
        Filter["Filter"]:1
        Discv5["Waku Discv5"]:1
        PeerExchange["Peer Exchange"]:1
        PeerManager["Peer Manager"]:1
    end
    block:libp2p:4
        P2PEncryption["Point-to-Point Encryption"]
        Multiplexer["Multiplexer"]
        Transport["Transport"]
    end
    block:base:4
        UDP["UDP"]
        TCP["TCP/IP"]
    end
    classDef borrowed fill:#e1f5ff,stroke:#0066cc,stroke-width:2px,stroke-dasharray: 5 5
    class enc borrowed
```
  • At the top is the secure conversation part of the Chat SDK. I am still unclear on how some parts fit between status-go and the chat SDK, such as identity. Further discussions, explorations and work may be needed here.
  • Secure conversations leverage reliable channels and some specific encryption mechanisms. The encryption mechanisms are the core USPs for the Chat SDK, in addition to using Waku.
  • Reliable Channels is what we have loosely referred to as the “Waku Application SDK” in the past. It is an opinionated layer on top of Waku that enables e2e reliability (SDS) with smooth handling of Waku restrictions: message size (segmentation) and RLN Relay rate limits (message rate limit manager).

Then, the Waku API remains as defined and enables an easy, foolproof way of using Waku with some p2p reliability mechanisms.

Reliable channels

The existing TypeScript implementation of reliable channel only implements the SDS wrapping part of this new API.

In Nim, the various components are ready (SDS, segmentation, rate limit manager). The reliable channel API intends to bring them together in a way that is easy to integrate when adding encryption.

Here is a spec draft for reliable channels, as defined in this new layered architecture: introduce reliable channel API by fryorcraken · Pull Request #89 · waku-org/specs · GitHub

And a Frankenstein, non-working implementation in js-waku: feat: Waku API's subscribe by fryorcraken · Pull Request #2683 · waku-org/js-waku · GitHub

(The implementation, and feedback on it, are blocked on the Waku API’s new send and subscribe functions.)

Segmentation

A decision that breaks away from Status’s current approach is that segmentation is to be done within the encryption layer, instead of outside of it. See the layers above.

In the Status protocol, messages are encrypted first, and then segmented. This means that segmentation can be seen by external observers.

But it also means more precise segmentation (no encryption overhead to plan for, as encryption is already applied).

With the new ordering, we segment a message first, then apply SDS, then encrypt.

The reason is to apply SDS to the individual message chunks, so that when some chunks are already acknowledged, only the missing ones need to be retransmitted.

It does make the API slightly more complex (we expose the chunks to the API consumer). We could simplify the API by not exposing the chunks.

This proposal stems from the fact that Waku RLN Relay is rate limited, and we would want to avoid retransmission of all chunks when some are already acknowledged.

The downside is that encryption + RLN proof overhead have to be predicted in the segmentation process, as they are applied after segmentation (hence the proposal of a 100 KiB default chunk size).

Another upside, though I’m not sure it matters, is that segmentation becomes hidden from external observers, preventing any correlation between messages.
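
To make the ordering concrete, here is a minimal sketch of the outgoing pipeline under this proposal. All names (`segment`, `sdsWrap`, `encrypt`, the chunk size constant) are hypothetical and do not reflect the actual js-waku API:

```typescript
// Hypothetical sketch of the proposed outgoing pipeline:
// segment first, then SDS-wrap each chunk, then encrypt each wrapped chunk.

const CHUNK_SIZE = 100 * 1024; // proposed default: 100 KiB, leaving headroom
                               // for SDS wrapping, encryption and RLN proof overhead

function segment(payload: Uint8Array): Uint8Array[] {
  const chunks: Uint8Array[] = [];
  for (let i = 0; i < payload.length; i += CHUNK_SIZE) {
    chunks.push(payload.slice(i, i + CHUNK_SIZE));
  }
  return chunks;
}

// `sdsWrap` and `encrypt` stand in for the SDS layer and the
// encryption hook provided by the upper layer (e.g. the Chat SDK).
declare function sdsWrap(chunk: Uint8Array): Uint8Array;
declare function encrypt(wrapped: Uint8Array): Promise<Uint8Array>;

async function send(payload: Uint8Array): Promise<Uint8Array[]> {
  // Each chunk gets its own SDS wrapping, so an acknowledged chunk
  // never needs to be retransmitted even if its siblings are lost.
  return Promise.all(segment(payload).map((c) => encrypt(sdsWrap(c))));
}
```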

Encryption

There may be a better name for this block and I defer to @jazzz to review.

Next steps

Feel free to review any of the pull requests referenced or discuss here.

As we progress on the Chat SDK deliverables, we may want to do a checkpoint to confirm that the scopes currently defined still make sense.

As part of this review, we could add, for 2025 H2 or 2026 H1, a specific deliverable that ships this API in JS and Nim, to both ensure a strict and tidy boundary within the Chat SDK and provide said API to all developers.

Finally, as we define the Waku API, we need to ensure it enables reliable channels, with reliable channels being the primary consumer of the Waku API moving forward for both JS and Nim.

Edit: diagram correction: the SDS wrapper wraps a plain payload, and is itself encrypted.


These learnings led to more justification and precision around reducing the store API within the Waku API: No store in Messaging API

We now have a clear consumer for the Waku API: reliable channels.

Which means that store-related functions on the Waku API need to be sufficient for reliable channels, aka SDS, and nothing more.

So it means exposing store hash queries, to find messages based on retrieval hints. This is fine and expected, and hash queries should be fine with regard to optimization.

However, it does not mean exposing time range queries; instead, we assume they are triggered and used as per the P2P-RELIABILITY specs.

One optimization I have demonstrated in js-waku’s reliable channels is that time range queries are done backward in time and can be stopped based on a hook.

In the case of SDS, it means finding a message with a valid causal history, so that from then on, hash queries can be used to retrieve the other messages.

This is something we may actually want to force upon developers, requiring them to pass such a hook, to make them think of a way to optimize message retrieval.
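
As an illustration, the backward query with a stop hook could look roughly like this (hypothetical names; not the actual js-waku store API):

```typescript
// Hypothetical sketch of a backward time range query with a stop hook.
// `queryPageBackward` stands in for a store query returning one page,
// newest first.

interface StoredMessage {
  hash: Uint8Array;
  timestamp: number;
  payload: Uint8Array;
}

declare function queryPageBackward(
  contentTopic: string,
  cursor?: Uint8Array
): Promise<{ messages: StoredMessage[]; cursor?: Uint8Array }>;

// The hook lets the caller stop as soon as it has what it needs,
// e.g. an SDS message with a fully resolved causal history.
async function queryBackwardUntil(
  contentTopic: string,
  shouldStop: (msg: StoredMessage) => boolean
): Promise<void> {
  let cursor: Uint8Array | undefined;
  do {
    const page = await queryPageBackward(contentTopic, cursor);
    for (const msg of page.messages) {
      if (shouldStop(msg)) return; // e.g. found valid causal history
    }
    cursor = page.cursor;
  } while (cursor !== undefined);
}
```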

Do note that the main difference between js-waku’s reliable channel implementation and the store reliability in the P2P reliability specs is that instead of doing periodic time range queries (spec), it (JS) does:

  • periodic hash queries based on missed messages as identified by SDS
  • a backward time range query from now until the last successful store query, upon connecting to a new store node

To be reviewed with regard to aligning JS and the spec.


My main comment is that SDS is designed to function within the encryption layer (i.e. “above” encryption in the stack). This is because questions about end-to-end reliability can really only be answered where a conversation’s participants are known (which maps to the underlying encryption).

SDS would in fact be very vulnerable to attacks if it did not sit within the trusted encryption of an app-layer conversation (anyone could introduce malicious or circular dependency graphs), and it would break sender anonymity, conversational anonymity, etc. by design. SDS-R even adds possible computations where the number of participants in a conversation must be known, something only possible pre-encryption.

That said, I do agree with Reliable Channels being the de facto “Waku Application SDK” and the kind of abstractions we want to hide within the API would be very useful. I think it will have to build on (or also be adapted for) “the opinionated encryption mechanisms of the Chat SDK”. I’ve been toying with ideas on how parts of SDS can be used on lower, post-encryption layers, closer to routing, but even then we’d still have to introduce another e2e reliability layer within the pre-encryption layer too, as e2e reliability cannot be provided on the routing layer.

For me, the Reliable Channels API is very close to what the conversation part of Chat SDK should look like. In fact, perhaps Chat SDK ~= Reliable Channels + Identities, with a revised encryption layer. We can try to make SDS as pluggable as possible to other encryption environments, but use the Chat SDK encryption approaches by default as they’re developed.

This sentence seems to contradict the proposal of introducing SDS post-encryption as in the rest of the doc. Am I missing something? FWIW, I like the idea of segmenting before SDS wrapping to improve scalability.


I agree with the picture we are drawing here.

Waku API becomes a way to use the network in a convenient/reliable way and encapsulates Filter / Store / Lightpush + P2P Reliability.

Whereas the Waku Application API goes into establishing reliable logical channels (from the PoV of an app) by using SDS / Encryption / Identities.

I believe this is the best product proposition we can have so far.


My initial response below… still valid, but the issue is that there was a mistake in the diagram! Now I understand @haelius’ response…


Yes, if you look at the diagram: while “encryption” is on the side of reliable channels, it should be an object/hook passed to reliable channels, so that indeed, it can sit below SDS (as you can see on the right side of the diagram).

I guess this is where this sort of diagram can fail to express the complexity of the architecture. I think the important points are:

  • Providing an opinionated integration of SDS + segmentation + rate limit manager is useful, as deciding how to configure all three of those (for Waku RLN Relay) is a headache we want to take away from developers, even our own (aka @jazzz)
  • Encryption is a separate matter altogether, so while we would expect most people to just use the Chat SDK, it makes sense to keep it separate, as it is a different domain. We are drawing a line.
  • Yet, encryption does have to sit under SDS, so the layering is awkward and is expressed via a hook (see IEncryption in the js-waku PR, and the sketch below).
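
For illustration, the shape of such a hook could be roughly as follows. This is a sketch only: the actual interface is IEncryption in the js-waku PR and may differ.

```typescript
// Rough sketch of an encryption hook passed into reliable channels.
// The real interface is `IEncryption` in the js-waku PR and may differ.
interface IEncryption {
  encrypt(payload: Uint8Array): Promise<Uint8Array>;
  decrypt(payload: Uint8Array): Promise<Uint8Array>;
}

declare class ReliableChannel {
  // Reliable channels stay agnostic: they call the hook below SDS
  // without knowing whether it is Noise, sender keys, de-MLS, etc.
  constructor(contentTopic: string, encryption: IEncryption);
}

// The Chat SDK would provide its default encryption:
declare const chatSdkEncryption: IEncryption;
const channel = new ReliableChannel("/my-app/1/chat/proto", chatSdkEncryption);
```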

The nuance I am proposing is that:

Chat SDK ~= Reliable Channels + Identities + Encryption

AKA, while reliable channels need access to encryption mechanisms, they are agnostic to those.

For the Chat SDK, it means that in a scenario where we upgrade to/offer Noise, then sender keys, then de-MLS, there should be no changes to the reliable channels code and API.

It is logic and code we can write now, and it should remain unaffected by the encryption (and identity) choices we make in the Chat SDK.

Yes, but the nuance here is that reliable channels do not come with encryption, whereas the Chat SDK comes with a default encryption that is passed to reliable channels.

Developers will have the choice to either:

  • use reliable channels and bring their own encryption (and identities)
  • or use the Chat SDK, which will provide default encryption/identities

Again, the nuance here is meant to help organize work and teams, and to have clear boundaries in terms of what codebase does what.

Oopsies, the diagram needs correction: yes, SDS should be applied first, then encryption (and for incoming messages, decryption first, then SDS).


Ah, makes much more sense now. 🙂 I missed that Encryption would be a pluggable component provided by the upper-layer Chat SDK.

Now I agree with this division and proposal. I think SDS would have to evolve, though, to be configurable for encryption use cases with a small number of participants (e.g. 1:1 chats work with SDS, but they could be much more efficient; a bloom filter is completely unnecessary in such cases, for example). I think we should define a new milestone for SDS improvements soon.


Continuing further on the role of store.

The store protocol started out very generalized, especially time range queries, which provide many features for the user (requester):

  • customizable start and end time for the range of messages to query
  • customizable pagination limit
  • customizable direction (forward or backward)
  • pagination cursor

With some limits (now) imposed on the server side:

  • max messages per page (100; default 20)
  • max time range of 24 hours

Generalization is a problem for performance, bug surface, etc.

I believe that we have the opportunity to restrict store capabilities to make it work for SDS (or SDS-like) application protocols, and not more. Such a capability restriction can help with performance and maintenance.

I recently heard an opinion that “code becomes technical debt as you gather knowledge on the domain; removing technical debt is actually applying the knowledge you acquired by building, and using, the system in the first place”.

In our case, I would propose to:

  • remove customizable direction (backward only)
  • remove customizable start time: always applied at time of query

Hence, a time range query would either:

no pagination cursor:

  • always be backward
  • always be from now to some time in the past as specified by user

with pagination cursor:

  • always backward
  • from cursor to some time in the past as specified by user

In the case of reliable channels, it means paginating until a message with a valid causal history is found; store queries can then be switched to hash queries.

@Ivansete @Zoltan do you think such a reduction of store’s functional scope would help in general with maintainability and performance?

@haelius @kaichao @pablo I believe that the p2p reliability strategies as defined in the specs and implemented in status-go would still work; would you agree?

This is of course a high-level, back-of-napkin proposal. There is something to be said about pagination, and about not finding messages within a given time range (no cursor) but wanting to continue the query further.

Re-iterating on store, we actually only need 2 features, even simpler:

no cursor:

  • req parameters: content topics, messages per page
  • res: one page of messages, from time of request, backward

with cursor:

  • req parameters: content topics, messages per page, cursor
  • res: one page of messages from cursor, backward
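
Expressed as request/response types, the whole reduced store surface could look like this (hypothetical type names, not the actual Waku store protobuf definitions):

```typescript
// Hypothetical types for the reduced store query surface.

interface StoreQueryNoCursor {
  contentTopics: string[];
  messagesPerPage: number;
}

interface StoreQueryWithCursor extends StoreQueryNoCursor {
  cursor: Uint8Array; // derived from a message hash
}

interface StoreResponse {
  // One page of messages, always backward:
  // from the time of the request (or from the cursor) towards the past.
  messages: Uint8Array[];
  cursor?: Uint8Array; // present if there may be older messages
}
```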

I’m wondering if it’s possible to not use a cursor, or even any real timestamp or time range query.

From the user’s perspective, when starting the app, the user only has their latest known message hash for the content topic. The question users likely want to ask:
I'm at message 0xabc...123, can someone help me move forward to the next 20 messages and indicate if I'm at the head of the content topic? The user keeps asking until they reach the head.

To implement this logic, we need to order the messages belonging to a content topic in the store protocol or SDS. A few ways could achieve a “non-global ordering”:

  • real-timestamp-based: the current implementation actually looks pretty much like using timestamps as the ordering
  • append-only log with a local Lamport-like counter: increase the counter by 1 when seeing/inserting a new message for the content topic (see the sketch after this list)
  • hash-linked ordering: not sure if it’s performant enough for our usage
  • maybe other cool orderings for a non-strict environment?
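
As a sketch of the second option (all names hypothetical), a store node could assign a per-content-topic counter at insertion time:

```typescript
// Hypothetical sketch of an append-only, per-content-topic
// Lamport-like counter assigned at insertion time.
const counters = new Map<string, number>();
const log = new Map<string, { hash: string; seq: number }[]>();

function insert(contentTopic: string, hash: string): number {
  const seq = (counters.get(contentTopic) ?? 0) + 1;
  counters.set(contentTopic, seq);
  const entries = log.get(contentTopic) ?? [];
  entries.push({ hash, seq });
  log.set(contentTopic, entries);
  return seq; // local, non-global ordering: differs across store nodes
}
```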

Overall I think this makes sense. Bundling these 4 steps together in a well-constructed manner would be a huge help for developers looking to get building faster.

Huge Wins

State Updates Scheduling

In a ratcheting encryption scheme, you only want to update the encryption state when a message has been successfully sent. Similarly, in SDS, updating the message history with messages that were not sent would produce artifacts in the causal tree.

Having this implemented correctly would save developers much time and headaches.
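
A minimal sketch of what “update state only on successful send” could look like, with hypothetical ratchet and send functions:

```typescript
// Hypothetical sketch: the ratchet (and SDS history) state is only
// committed once the send is acknowledged, never optimistically.

declare function trySend(ciphertext: Uint8Array): Promise<boolean>;

interface RatchetState { /* per-conversation ratchet material */ }
declare function ratchetEncrypt(
  state: RatchetState,
  plaintext: Uint8Array
): { ciphertext: Uint8Array; nextState: RatchetState };

async function sendAndCommit(
  state: RatchetState,
  plaintext: Uint8Array
): Promise<RatchetState> {
  const { ciphertext, nextState } = ratchetEncrypt(state, plaintext);
  const acked = await trySend(ciphertext);
  // On failure, keep the old state so a retry does not fork the
  // ratchet, and do not append the message to the SDS causal log.
  return acked ? nextState : state;
}
```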

Would reliable channels handle state storage internally, or provide a persistence and migration API?

Encryption

Providing some pre-built encryption methods too would make it even easier. If chat handles identity, do we have examples of non-authenticated encryption that developers want to use (other than Waku’s built-in encryption)?

Complexities

Encryption Interface

AKA, while reliable channels need access to encryption mechanisms, they are agnostic to those.

With strict protocol definitions I think this statement could be true. However, the encryption interface is more complex than simply encrypt/decrypt. Many encryption approaches (e.g. ratcheting) require some per-conversation state. This stage will also need to perform authentication if it’s being used.
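
In other words, the encryption hook probably looks less like a pair of pure functions and more like something stateful per conversation. A sketch of such an interface (hypothetical, not a concrete proposal):

```typescript
// Hypothetical sketch of a stateful, per-conversation encryption
// interface, beyond plain encrypt/decrypt.
interface ConversationEncryption {
  // State may advance on every call (e.g. a ratchet step).
  encrypt(plaintext: Uint8Array): Promise<Uint8Array>;
  // Decryption must also authenticate the sender where applicable.
  decrypt(ciphertext: Uint8Array): Promise<{
    plaintext: Uint8Array;
    senderVerified: boolean;
  }>;
  // Persist/restore per-conversation state across restarts.
  exportState(): Promise<Uint8Array>;
}
```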

Payload Framing

The Chat SDK and other applications will want the ability to control the outer frame prior to it being sent to Waku. In the chat use case, messages need to be associated with an encryption state before they can be decrypted, due to there being multiple logical channels. As this has performance and privacy impacts, having control over this would be desirable. In use cases where identity is not a factor, this becomes less relevant.

There’s also associated authenticated data to consider, and the matter of parameter binding too. For example, is it safe for payloads to arrive from any channel? How can developers add associated data and ensure it’s verified? There’s hidden complexity here, but it’s solvable through clever APIs.
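
For instance, with an AEAD scheme the channel identifier can be bound as associated data, so a payload lifted from one channel fails verification on another. A sketch using WebCrypto AES-GCM (how the actual encryption modules would handle this is an open question):

```typescript
// Sketch: binding the channel id as additional authenticated data (AAD)
// with WebCrypto AES-GCM. Decryption throws if either the ciphertext
// or the AAD (here, the channel id) does not match.
async function encryptForChannel(
  key: CryptoKey,
  channelId: string,
  plaintext: Uint8Array
): Promise<{ iv: Uint8Array; ciphertext: ArrayBuffer }> {
  const iv = crypto.getRandomValues(new Uint8Array(12));
  const ciphertext = await crypto.subtle.encrypt(
    { name: "AES-GCM", iv, additionalData: new TextEncoder().encode(channelId) },
    key,
    plaintext
  );
  return { iv, ciphertext };
}
```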

Yes, this is also my line of thought.

Note that the cursor is created from the message hash already.

I still think it should preferably be a backward search. The idea is to really optimize for the store node storing as little data as possible. So it should focus on getting the “most recent” messages, aka backward first.

And then it’s about getting enough recent messages on a content topic to get a relevant SDS causal history, and switching to hash queries from then on.

However, it would make sense to send this information (latest known message), to help optimize the query:

req:

  • latest known message (hash/cursor) ABC
  • max messages per page (or we could even remove that)
  • content topic(s)

res:

  • messages from query time (now) backward towards past
  • stops at the latest known message (hash from the req) or the max messages per page, whichever is hit first
  • if the max messages per page is hit, it would return messages from now back to some message DEF

If there are more messages to get, you could then use just the message hashes to ask for a given “interval/page”:

req:

  • latest known message ABC
  • oldest message from recent query DEF

Allowing you to continue paginating backward until you find a relevant message or you are caught up.
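
Put together, the client-side catch-up loop could look like this (hypothetical request/response shapes matching the lists above):

```typescript
// Hypothetical sketch of the backward catch-up loop described above.
interface SyncRequest {
  latestKnownHash: string; // "ABC"
  before?: string;         // oldest hash from the previous page ("DEF")
  contentTopics: string[];
}
interface SyncResponse {
  messages: { hash: string; payload: Uint8Array }[]; // newest first
  reachedLatestKnown: boolean; // stopped at "ABC" rather than the page limit
}
declare function storeQuery(req: SyncRequest): Promise<SyncResponse>;

async function catchUp(latestKnownHash: string, contentTopics: string[]) {
  let before: string | undefined;
  for (;;) {
    const res = await storeQuery({ latestKnownHash, before, contentTopics });
    // Hand messages to SDS; it may decide we already have a valid
    // causal history and can switch to hash queries instead.
    if (res.reachedLatestKnown || res.messages.length === 0) break;
    before = res.messages[res.messages.length - 1].hash;
  }
}
```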

Potentially interesting from a DB PoV:

  • DB needs to be ordered by timestamp (insertion)
  • But no indexing is needed on timestamp (only on hashes + content topic)

I am no PostgreSQL expert, but intuitively it does seem like a better plan.

The only issue is hitting a store node that does not know ABC, and hence you may just “paginate backward” forever. The solution:

  • in general, store incentivization should be cheaper for hash queries (very predictable for the provider) vs range queries (not predictable)
  • Encouraging app developers to use strategies like reliable channel where range queries are minimally used (only to find SDS message with causal history)

Also, from a chat app PoV:

  • Most recent messages usually matter most in a large chat
  • Most recent messages allow jumping back into a conversation quickly
  • If there is history, you catch up with hash queries, which are more efficient

Yes, for SDS you must store internally:

  • messages that are part of the “log”, aka the ones that have been sent and acknowledged
  • outgoing and incoming buffers, for messages we are aware of but not yet committed (delivered) to the log

You can see in the JS SDS implementation that one can pass a localHistory object with some of the JS Array traits: js-waku/packages/sds/src/message_channel/message_channel.ts at 0df18b2a75f558f86f03c7f1c1e4b5e89d92f009 · waku-org/js-waku · GitHub

The default object MemLocalHistory stores in memory, but the idea is that one can replace it to persist the SDS data (see the sketch below).
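
For example, a persistent replacement could look roughly like this. This is a sketch: the actual trait is defined in the linked message_channel.ts, and the subset of Array-like methods shown here is an assumption.

```typescript
// Hypothetical sketch of a persistent local history, replacing the
// in-memory default. A Map stands in for a real key-value store
// (e.g. IndexedDB or SQLite).
interface LocalHistoryEntry {
  messageId: string;
  payload: Uint8Array;
}

class PersistentLocalHistory {
  constructor(private db: Map<number, LocalHistoryEntry>) {}

  get length(): number {
    return this.db.size;
  }

  push(...entries: LocalHistoryEntry[]): number {
    for (const e of entries) this.db.set(this.db.size, e);
    return this.db.size;
  }

  slice(start?: number, end?: number): LocalHistoryEntry[] {
    return [...this.db.values()].slice(start, end);
  }
}
```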

The ReliableChannel spec draft mainly interfaces via event emission, but other APIs could be exposed if needed.

Not sure what you mean by a migration API.

Like everything, we start simple and create an API that works for the Chat SDK encryption scheme. The point here is to define clear boundaries and modules to ensure we have a coherent architecture.

IMO, it is fine if at a later stage we have to extend the Reliable Channel API, specifically around the trait an encryption module has to implement, to enable new encryption mechanisms such as de-MLS.

This is more about proper architecture than about saying that “many developers will want to build on a Reliable Channel SDK”.

Having said that, we do need the Chat SDK to be specified and done to understand the pros and cons of the chosen encryption strategies; based on that, we may better understand what encryption alternatives developers may want.

But again, identifying a layer in the Waku stack does not necessarily mean supporting different upper layers. We actually need to be careful not to aim for too-generalized APIs (see the store discussion in this same thread).

A cleanly defined API != a generalized API.

It is more about defining boundaries and being agnostic to the layers, while still creating an API that is sufficient for the upper layer.

In our case, a potential outcome is having similar Reliable Channel APIs defined in JS and Nim, but having the encryption implemented once and re-exported to Wasm or JS, to be used in the browser or natively.

At the end of the day, said encryption module, while exposing encrypt and decrypt functions, would also access the rest of the reliable channel API, and be able to track and refer to knowledge provided by the SDS layer.

As you said, we will probably need to cater for out-of-order messages, meaning that if one cannot decrypt a message, we may park it for a later stage, the same way that we park SDS messages without a resolved causal history in an incoming buffer.

Moreover, right now the proposed reliable channel API does not expose much of the causal ordering of messages. This is something we have started to discuss with both opchan and SolarPunk to understand what the ideal API would be.