Thanks for the write-up, very thorough, and I particularly appreciate the Protocol steps section.
As discussed yesterday, I will add some notes:
sender_id
In Message, sender_id is probably not necessary, as most of the time it’s either extracted from a signature or inferred (as in the case of the double ratchet, where it’s inferred from the fact that there’s only one other member who can have that shared secret with you, i.e. plausible deniability).
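As an illustration of the inference point, here’s a minimal Go sketch that identifies the sender purely by checking a signature against the known member keys. The ed25519 scheme and the member map are stand-ins for illustration; a real deployment might instead recover the public key directly from a recoverable signature.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"fmt"
)

// inferSender shows why an explicit sender_id can be redundant: with a
// signed payload, the sender is whichever known member key verifies the
// signature. (Sketch only; scheme and member map are assumptions.)
func inferSender(members map[string]ed25519.PublicKey, payload, sig []byte) string {
	for id, pub := range members {
		if ed25519.Verify(pub, payload, sig) {
			return id
		}
	}
	return "" // unknown sender
}

func main() {
	pub, priv, _ := ed25519.GenerateKey(rand.Reader)
	members := map[string]ed25519.PublicKey{"alice": pub}
	msg := []byte("hi")
	sig := ed25519.Sign(priv, msg)
	fmt.Println(inferSender(members, msg, sig)) // alice
}
```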
message_id
message_id is also generally not specified, as it can be manipulated; we generally use the hash of the content or similar instead.
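As a minimal sketch of a content-derived id, assuming SHA-256 over the raw payload (the actual protocol may hash a different encoding):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// messageID derives an identifier from the payload itself, so a peer
// cannot manipulate the id independently of the content.
func messageID(payload []byte) string {
	sum := sha256.Sum256(payload)
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(messageID([]byte("hello")))
}
```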
Lamport timestamps
With regard to Lamport timestamps: we have adopted them in the current protocol, but with one difference, as plain Lamport timestamps don’t support one common case.
A chat can be seen as a permissionless distributed system with no central coordination.
In some cases there’s no explicit “joining” (public chats in the old app, though 1-to-1s are similar in that regard), so a new member who could not retrieve information about a channel would start at 0 if we were to follow canonical Lamport timestamps.
It’s also important to note that the app is, or wants to be, offline first, so syncing before being able to send a message is not something we want in general.
If we are thinking about a chat, that would mean the message would be ordered last, as it has no causal relationship with any other message, which is problematic. What we do instead is “hint” the Lamport timestamp with a wall clock:
clock = max(time.Now(), previous_clock+1)
This maintains the causal relationship between events and is consistent with Lamport timestamps (you can think of it as each member having started at 0 and ticked up internally every second, unless aware of another clock having ticked further). We maintain a total order of events by breaking ties with the message ID, so that everyone has the same ordering of events.
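A minimal sketch of that rule plus the id tie-break, assuming unix-second granularity (the real unit may differ):

```go
package main

import (
	"fmt"
	"time"
)

// nextClock implements clock = max(time.Now(), previous_clock+1):
// at least one tick past anything seen so far, hinted with wall-clock
// time so a fresh member doesn't start at 0 and sort before all history.
func nextClock(previous uint64) uint64 {
	now := uint64(time.Now().Unix()) // assumed unit; the protocol may differ
	if now > previous {
		return now
	}
	return previous + 1
}

// before defines the total order: clock first, message id breaks ties,
// so every member arrives at the same ordering of events.
func before(clockA uint64, idA string, clockB uint64, idB string) bool {
	if clockA != clockB {
		return clockA < clockB
	}
	return idA < idB
}

func main() {
	c1 := nextClock(0)  // new member: jumps to wall-clock time, not 1
	c2 := nextClock(c1) // same second: still strictly increases
	fmt.Println(c1, c2, before(c1, "a", c2, "b"))
}
```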
causal_history
With regard to causal_history, and specifically:
- Check dependencies in the causal history:
- If all dependencies are met, process the message.
- Otherwise, add the message to the buffer of incoming messages with unmet causal dependencies.
This is something we discussed a lot back when mvds was first implemented (the feature, I believe, is already in the codebase). From our point of view, it’s not something we generally want to have, as we almost never want to delay processing messages: it leads to poor user experience in the case of out-of-order messages or message loss (if you are strict about it, then if any message fails to be delivered, the chain is interrupted forever). Most of the protocol is built on the assumption that message loss will occur, and it accommodates that (at the cost of extra bandwidth; we often piggyback information, for example, or accept that there might be gaps, etc.).
There are only a handful of corner cases where that’s not currently the case and this could be helpful. One is message segmentation, where you cannot process a partial message, but a message id alone doesn’t cut it (we need something more like group_id+count; see the sketch below). Another is the current community channel encryption, as we haven’t quite figured out an efficient way to propagate the key, but I think that’s better solved by improving key exchange.
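As an example of the segmentation case, a minimal reassembly sketch where completion is tracked by group_id plus the expected segment count (the field names are hypothetical):

```go
package main

import "fmt"

// Segment of a larger message: a single message id isn't enough as a
// dependency here; we need the group and the expected total to know when
// the whole message can be processed. Field names are hypothetical.
type Segment struct {
	GroupID string
	Index   int
	Count   int
	Payload []byte
}

// Assembler buffers partial messages until all segments of a group arrive.
type Assembler struct {
	groups map[string]map[int][]byte
}

func NewAssembler() *Assembler {
	return &Assembler{groups: make(map[string]map[int][]byte)}
}

// Add buffers one segment; it returns the reassembled payload once all
// Count segments have arrived, and nil while the message is still partial.
func (a *Assembler) Add(s Segment) []byte {
	g, ok := a.groups[s.GroupID]
	if !ok {
		g = make(map[int][]byte)
		a.groups[s.GroupID] = g
	}
	g[s.Index] = s.Payload
	if len(g) < s.Count {
		return nil
	}
	var full []byte
	for i := 0; i < s.Count; i++ {
		full = append(full, g[i]...)
	}
	delete(a.groups, s.GroupID)
	return full
}

func main() {
	a := NewAssembler()
	fmt.Println(a.Add(Segment{"g1", 1, 2, []byte("llo")}) == nil) // true: still partial
	fmt.Println(string(a.Add(Segment{"g1", 0, 2, []byte("he")}))) // "hello"
}
```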
It can be useful to identify gaps, of course. I would personally explore those scenarios more, but it’s certainly of secondary importance from our perspective.
pulling from store nodes based on message_ids
This is fine, but I think it needs to be complementary to querying by timestamp. Querying by ids is very chatty and will most likely lead to long syncing times; we already see long syncing times with straight timestamps and a cursor of 20 (up to 8/9 minutes to sync a day’s worth of data), and this seems strictly slower in the worst-case scenario, which is quite common (a user of an active community goes offline for a day). A rough back-of-envelope is sketched below.
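The volume and latency numbers here are assumptions for illustration, not measurements:

```go
package main

import "fmt"

// Rough cost of paging one day of history with a cursor of 20. The message
// volume and per-page round-trip time are assumed for illustration only.
func main() {
	messagesPerDay := 10000 // assumed traffic of an active community
	cursor := 20            // page size per store-node query
	rttSeconds := 1.0       // assumed latency per round trip

	pages := messagesPerDay / cursor
	minutes := float64(pages) * rttSeconds / 60
	fmt.Printf("%d round trips ≈ %.0f minutes\n", pages, minutes)
	// 500 round trips ≈ 8 minutes, the same order as what we observe;
	// fetching per message id would need at least as many round trips.
}
```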
I think these are the immediate things I could spot in the document. In general, though, it’s probably a good exercise to go a bit deeper into how this integrates with the current functionality, since there’s a fair amount of unknowns, and from experience (mvds etc.), if it’s not worked out to fit with the current technology, there’s a risk of it not being as effective as it could be.