Optimizing Community Description

The CommunityDescription’s message size has been an issue in the past (reached a size over 1MB, the current max message size limit for Status shards) and has been identified as not scalable: Scaling Status Communities : Potential Problems · Issue #177 · vacp2p/research · GitHub

In Optimizing the `CommunityDescription` dissemination - HackMD, @rymnc proposed to extract the membership list from the message and push it to IPFS.

When considering RLN and de-MLS, there may be new options to handle a community’s membership list.

Note: forking the conversation from Scaling Status Communities : Potential Problems · Issue #177 · vacp2p/research · GitHub as the design below assumes usage of RLN or de-MLS for Communities, which has not yet been planned.

RLN for Communities

Consider the case of community-dedicated shards, where one and only one community routes its messages on one or several shards.

The current DoS protection model is opt-in message signing, where the community owner distributes a pre-shared private key to all community members.
Community members sign their messages on the routing layer with this key.
Waku relay nodes routing for this community’s shard validate the signature against the matching public key at the gossipsub level.

While this strategy can help ensure that no other community or application pushes traffic to the protected community, it has a number of caveats:

  • No bandwidth capping or usage guarantees
  • No model to handle kick/ban of members
  • Difficult handling of breach (eg key is leaked). Breach risk is high, as all members have access to the same key.

Using RLN would remediate those drawbacks. The community owner could deploy an RLN smart contract specifically for their community. Each community member would need a membership inserted in the contract. Membership insertion could be done:

  • via stealth commitment where the community owner inserts the membership on the contract
  • via token gating, where the community owner drops a token to accept a member in the community; said token can then be used as an authorization to insert a membership in the smart contract

In this case, the RLN Merkle tree acts as a membership list, as only members of the community are in the tree and can publish to the community’s shard.
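To make the "tree as membership list" idea concrete, here is a minimal sketch of a Merkle inclusion check. It is only an illustration: it uses SHA-256 from the standard library, whereas RLN relies on a zk-friendly hash and zero-knowledge proofs so that a member never reveals which leaf is theirs. The function names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    """Return every tree level, from the hashed leaves up to the root."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:  # odd-sized level: pair the last node with itself
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels, index):
    """Collect (sibling_hash, node_is_left) pairs from leaf to root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):
            sibling = index  # node was paired with itself
        proof.append((level[sibling], index % 2 == 0))
        index //= 2
    return proof


def verify(root, leaf, proof):
    """Recompute the root from a leaf and its proof."""
    node = h(leaf)
    for sibling, node_is_left in proof:
        node = h(node + sibling) if node_is_left else h(sibling + node)
    return node == root
```

A relay node holding only the root could accept a publisher that presents a valid proof, without ever holding the full member list.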

This means the membership list could potentially be removed from CommunityDescription. Instead, members would broadcast their own details to the network in lieu of the community owner sending the full list.

The list could then be constructed from members’ messages. This would also introduce some efficiency, as offline members would not pollute the membership list. After all, if they are inactive and haven’t opened the app for a while, why would a user want to see them in the list (product assumption)?
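As a rough sketch of how a client could assemble the list from members’ own messages, assuming each member periodically broadcasts a small announcement: the `MemberAnnouncement` type, its fields and the `max_age` cutoff are all hypothetical, and authentication of the announcements (e.g. via the RLN proof) is left out.

```python
from dataclasses import dataclass


@dataclass
class MemberAnnouncement:
    pubkey: str
    display_name: str
    timestamp: int  # seconds, set by the sender


def build_member_list(announcements, now, max_age):
    """Keep the latest announcement per member and drop inactive members."""
    latest = {}
    for ann in announcements:
        current = latest.get(ann.pubkey)
        if current is None or ann.timestamp > current.timestamp:
            latest[ann.pubkey] = ann
    return {
        pubkey: ann
        for pubkey, ann in latest.items()
        if now - ann.timestamp <= max_age
    }
```

With this shape, members who have been offline longer than `max_age` simply fall out of the list, which matches the product assumption stated above.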

The missing feature would be a way to express the members’ roles. In this case, the community owner could still include some community members in the CommunityDescription message, but only those whose role is not none.
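A minimal sketch of that filtering step, assuming the members map is available as a plain dict of public key to role names (the role names follow the `CommunityMember.Roles` enum in the annex; the function name is hypothetical):

```python
ROLE_NONE = "ROLE_NONE"


def privileged_members(members):
    """Keep only members holding at least one role other than ROLE_NONE.

    members: dict mapping a member public key to a list of role names.
    """
    return {
        pubkey: roles
        for pubkey, roles in members.items()
        if any(role != ROLE_NONE for role in roles)
    }
```

The description would then grow with the number of owners, admins and token masters rather than with the total number of members.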

deMLS for Communities

Currently, community traffic is encrypted using symmetric encryption. The community owner provides a symmetric key to new members. Members of the community use this key to encrypt and decrypt all messages. This comes with a number of caveats:

  • no forward secrecy
  • authentication is out of band
  • No model to handle kick/ban of members
  • Difficult handling of breach (eg key is leaked). Breach risk is high, as all members have access to the same key.

Using deMLS would address these drawbacks.

deMLS proposes to use an onchain smart contract to manage groups and their members (reference missing).

The community owner would need to deploy a smart contract to manage group encryption for their community. Similar to RLN, the community owner could add/remove members to the group, or drop tokens to allow members to add their own information.

Similar to RLN, it would mean that only members of the deMLS group matching the community would be able to send messages in the encrypted group. An approach similar to the one above, where the members list is constructed from messages emitted by active members, could be adopted.

Conclusion

The previous proposal of using IPFS to store a Community’s membership list is still relevant. However, it does mean integrating new technology in the Status app (IPFS) and solving problems of pinning, retrieval etc.

Looking at medium- and long-term protocols applicable to Status Communities (RLN, deMLS), new opportunities arise to solve the problem of the unbounded size of the community description message.

Annex

The protobuf for CommunityDescription

message CommunityDescription {
  // The Lamport timestamp of the message
  uint64 clock = 1;
  // A mapping of members in the community to their roles
  map<string,CommunityMember> members = 2;
  // The permissions of the Community
  CommunityPermissions permissions = 3;
  // The metadata of the Community
  ChatIdentity identity = 5;
  // A mapping of chats to their details
  map<string,CommunityChat> chats = 6;
  // A list of banned members
  repeated string ban_list = 7;
  // A mapping of categories to their details
  map<string,CommunityCategory> categories = 8;
  // The admin settings of the Community
  CommunityAdminSettings admin_settings = 10;
  // If the community is encrypted
  bool encrypted = 13;
  // The list of tags
  repeated string tags = 14;
}
message CommunityMember {
  enum Roles {
    reserved 2, 3;
    reserved "ROLE_MANAGE_USERS", "ROLE_MODERATE_CONTENT";
    ROLE_NONE = 0;
    ROLE_OWNER = 1;
    ROLE_ADMIN = 4;
    ROLE_TOKEN_MASTER = 5;
  }
  enum ChannelRole {
    // We make POSTER the first role to be the default one.
    // This is for backwards compatibility. Older protobufs won't have this field and will default to 0.
    CHANNEL_ROLE_POSTER = 0;
    CHANNEL_ROLE_VIEWER = 1;
  }
  repeated Roles roles = 1;
  repeated RevealedAccount revealed_accounts = 2 [deprecated = true];
  uint64 last_update_clock = 3;
  ChannelRole channel_role = 4;
}

Thanks for this. Both directions proposed here seem to provide some really nice properties with the Community membership management almost a useful side effect of each. I would especially want to see de-MLS become standardised for group encryption over Waku.

We discussed this proposal briefly during the Status-Waku call and Status devs will comment whether there is an app need to have centralised, downloadable membership list or whether this can be inferred from individual messages from online members.

Intro

Thanks for the proposals. We have been aware for a while that the community description is way too big, and we have been patching it over time to reduce it as much as possible, but we’ve come to a point where we indeed need a more drastic shift to actually solve the problem.

The most obvious option would be to use IPFS or Codex as you pointed out. However, like you said, it has the problem of pinning and not knowing if it will actually work.

Iuri did bring up the idea of using a contract to store the community or part of it, but we didn’t have time to actually discuss the actual implications.

Complexity of the community ownership

Patryk would also have good insights on what are the challenges.
For example, currently only the control node can send updates to the community description because it simplifies a lot of the flows. One of the reasons is that we also have admins and token masters that can send updates. Currently, we use events that those roles send to one another and stack in a queue, so that when the control node comes back online, it processes the queue of events to generate the new updated community description. That version is accepted by every member because it is signed by the control node, which can be validated on the Owner token contract.

All that to say that there is a lot of complexity to community ownership.

Sending member updates

Your idea of sending only updates about new members is interesting. It would help a lot with the bandwidth usage, as the member list is one of the most voluminous parts of the community, especially since it can grow almost infinitely (we currently stop it at 10k on the client side).

However, it then relies heavily on the store nodes to be efficient, since if a new member were to join the community, they would need to fetch all the member updates to get a member list that is close to complete.

We could create some sort of message that the control node sends to the new user with the full list only once to fix that, but then we still send a full list of members to every member at least once, so it doesn’t solve the issue.

Another important point is that the channel’s member list is very important. It contains information about who has the rights to read and/or write in a channel.
So it’s important that users get the member list for permissioned channels to know if they have access, otherwise, they might not even see a channel for which they do have access.

That access is currently recalculated every 8 hours by the owner node. This re-evaluation is important because if a channel requires owning 10 SNT for access and the user sells their tokens, they shouldn’t have access anymore.
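To illustrate what that periodic re-evaluation amounts to, here is a small sketch. `balance_of` is a hypothetical stand-in for the on-chain balance lookup the control node performs; it is not an actual status-go function.

```python
def reevaluate_channel_members(members, balance_of, min_balance):
    """Split channel members into those who keep access and those who lose it.

    members: iterable of wallet addresses currently in the channel.
    balance_of: callable returning the token balance of an address
                (assumed to wrap an on-chain call).
    min_balance: token amount required by the channel permission.
    """
    kept, removed = [], []
    for address in members:
        if balance_of(address) >= min_balance:
            kept.append(address)
        else:
            removed.append(address)
    return kept, removed
```

With a 10 SNT requirement, a member who sold their tokens since the last run ends up in `removed` and would be dropped from the channel’s member list.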

Using a contract

I think using a contract to manage the community is probably the most logical way to fix the issue. It helps with the bandwidth, the accessibility and the decentralization all at once.

One downside is that there will be costs associated, but with current L2s, it’s pretty affordable.

We have not discussed in our team how to achieve using a contract as the source of the data of a community, but like I said, it was raised before by Iuri.

I do think it has a lot of benefits. One is that admins and token masters would be able to modify the community directly: since the admin roles are already managed by tokens, the edit functions would check whether the caller holds the right admin token.

One thing that needs to be discovered and discussed is how to make it as frictionless as possible for the owner.
We wouldn’t want them to have to manually send a transaction each time a user is added to a community. It would remove the current “open” community feature.

One way that comes to mind would be for the people wanting to join a community to send the transaction themselves and if the community is open, then it gets processed and they are added.
The downside here would be that it’s more friction for users.

Another way would be for the owner to send a bank of ETH to the contract, so that when users send the transaction to the contract, it gets paid by that bank (gas relayer type of thing).
The downside here is that malicious users could then create multiple accounts to join over and over to deplete the ETH from the bank.

All that to say, I don’t have a perfect idea, but I’m just brainstorming :smile:

Specialized Store nodes

This was discussed a little with Ivan some time ago, when he brought up that we “spam” store nodes a lot with messages like community descriptions, status updates (online vs offline) and backups.

None of those messages need historical storage. No one cares whether you were online 5 days ago or what the member list of the community was last week; you only want the latest description with the current members and channels.

So that is why we brought up the idea of having a split store node. The first part would be as it is right now, a store for all messages.

The second part would be more like an actual DB. It would store only the latest message from type X from each person.
So for example, when Bob sends a new backup, the last backup message that was sent by Bob gets overwritten.

The big advantage is that it’s easier to retrieve those messages, for new members for example, since they are not lost in a sea of messages.
The drawbacks are that it would take a lot of work to get right. It also loses the abstraction of the store nodes, so maybe it’s not even possible?
Finally, it doesn’t really solve the bandwidth issues, apart from the fact that we wouldn’t have to resend the same backup or community every ~8 hours(?) if they didn’t change.
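Just to pin down the "second part" of the split store, here is a sketch of a latest-only store keyed by message type and sender. The class and method names are hypothetical, and nothing here reflects an existing Waku API.

```python
class LatestOnlyStore:
    """Keeps a single record per (message type, sender) pair."""

    def __init__(self):
        self._records = {}

    def put(self, message_type, sender, timestamp, payload):
        """Overwrite the previous record unless the new one is older."""
        key = (message_type, sender)
        existing = self._records.get(key)
        if existing is not None and existing[0] >= timestamp:
            return False  # stale or duplicate message, keep what we have
        self._records[key] = (timestamp, payload)
        return True

    def get(self, message_type, sender):
        record = self._records.get((message_type, sender))
        return None if record is None else record[1]
```

When Bob sends a new backup, `put("backup", "bob", ...)` replaces his previous one, so a new member retrieves a single record instead of paging through history.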

Conclusion

I didn’t plan to write a novel, but here we are :sweat_smile:

In the end, I don’t really have a good solution. While using a contract is the coolest option, it also increases friction a lot and has some unsolved issues. Plus it’s a lot of work on the contract side and also for the clients (status-go).

Hopefully my ideas spark some even better ideas that we can then actually implement.
I’m looking forward to seeing what others think.

Do note that while we did want to improve the community decentralization, it was not planned for the near term, as we are quite busy with improving the base quality of the app.


somewhat related, if we are considering using smart contracts together with waku/status, this is something that could be worth another look: #STATUSBUIDL week - Moderated Channels - Archive - Status.app


In this case, it may be interesting to better understand deMLS, as the encryption may be useful to have at a channel level (instead of community-wide), especially when permissions play a role.

I understand deMLS does have some ACL features. I wonder if they would be enough to manage permissioned channels (i.e., read/write as part of the protocol).

Thanks for the insight, again, I wonder if we can build a system where this is directly handled by deMLS and a clever smart contract.

Further review is needed here to understand the restrictions. I assume in an open community, anyone can join but there may be some token gated channels?

It may become inevitable, for one reason or another, to have regular transactions to manage members. This is where I would expect strong integration with Status Chain to be needed, potentially with a paymaster to assist community owners.

I think something like that should be feasible, especially when using Status chain (just my opinion).

I think we would need to look into the exact user story here. E.g., how are open communities expected to work?

Yes, we are all on the same page on this subject. @kaichao is working on a series of proposals.
A short-term solution from @haelius was to split the traffic across different shards and set a different retention policy (e.g. 30 days vs 8 hours) per shard.

I think we all agree with this requirement and we will investigate the best way to handle it, including considering alternative technologies to Waku (IPFS, Codex, etc).

This is an important point I have raised recently. The Waku team is currently focusing on improving scalability and bandwidth usage for Communities.
If priorities change on the Status side with regard to the chat protocol, it needs to be stated :slight_smile:


Thanks for all the comments and insights.

@jonathanr, one thing I still wonder about is whether you think there is likely to be a need for a cached Community Description on an off-chain store/DB even if the membership is managed in a smart contract? This assumes that inferring a membership list based only on individual member updates will not be fast enough. In other words, would it be feasible for all community members to perform some on-chain view function to retrieve the list or would we want this to be available off-chain as well (either for robustness, speed, convenience or other reason)? Asking because that will leave us with the same challenge as now: how to store and retrieve the Community Description. To state the obvious - the main issue with Community Description is not so much storing it as the bandwidth consumed while broadcasting it to the Store nodes. If this is a problem we’d have to solve in any case (and possibly for other larger messages, such as the user backups), perhaps the question of the smart contract becomes orthogonal?

Indeed. “Open” just means that anyone can join the community without needing a manual approval from the owner or admin.
The opposite of “open” is “on approval” which, as the name states, requires an admin to manually press “approve” or “reject”.

This is separate from token gating. Even if a community is token gated, eg you need 100 SNT to join, it can be “open”. This means that once someone has 100 SNT, they can join the community automatically (once the control node is online and can validate the ownership of the tokens, but it is done automatically in the background).
Same thing for channel permissions.

As explained above. I guess if we see no other way around it, we could remove the “open” community option, but it then requires manual work from admins.

Indeed, if this becomes a priority for you guys, we can shift priority. I also think that this is important.

This is a good point. If we end up managing members on a contract, that would already reduce the size of the community by a lot, since the member list is one of the biggest parts, especially since it scales the most (up to 10K members at least), and also each permissioned channel currently has a list.

However, I think if we do end up having a smart contract for the members, we might as well use it for the rest of the community?
The only thing then would be to make it smart, similarly to how we have admin events, where we can aggregate changes and then send the final product to the contract only once. The reason being that you don’t want to have to send a transaction for each channel reordering you make for example.

An idea could be to have some sort of “dirty state” with a big “Save and publish” button and “Discard and Cancel”, so that you only send once you’re done with all changes. This is just for the client, but I like writing down the idea to not forget.
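A sketch of that "dirty state" on the client, assuming edits are buffered locally and flushed in a single call. `publish` is a hypothetical callable standing in for whatever sends the one transaction or message.

```python
class CommunityDraft:
    """Buffers edits locally and publishes them in one go."""

    def __init__(self, published, publish):
        self.published = dict(published)  # last published state
        self._pending = {}                # field -> new value
        self._publish = publish           # hypothetical: sends one update

    def edit(self, field, value):
        self._pending[field] = value  # later edits overwrite earlier ones

    @property
    def dirty(self):
        return bool(self._pending)

    def discard(self):
        """'Discard and Cancel': drop all unpublished edits."""
        self._pending.clear()

    def save_and_publish(self):
        """'Save and publish': send all pending edits as a single update."""
        if not self._pending:
            return False
        self._publish(dict(self._pending))
        self.published.update(self._pending)
        self._pending.clear()
        return True
```

Reordering channels five times before saving would then cost one update rather than five.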

Downside of using a contract though is that it now relies on our providers and/or proxy to work well. That’s a problem that is ongoing and the wallet team is working on it, but we always get bit by it.

Anyway, all that to say that if we ever go the direction of using a smart contract, we might as well just throw everything in there, especially if the bandwidth is the big issue.

However, that’s just my own opinion and I might have missed some blind spots that limit doing that.

I’m hoping someone else from the team can chime in with opinions and ideas :smiley:

Allow me to clarify. The decision that the Waku team optimizes for Communities first is my interpretation of Status’ needs and IFT’s directive.

If you, the Status team, need to see a shift of priorities related to the Chat protocol, then we need to know.

Status app requirements set the priorities with regard to chat protocol work, not Waku :slight_smile:

Priorities being:

  1. Reliable messaging
  2. Scalable communities
  3. Bandwidth efficient communities

IPFS is not the worst idea either. Especially if we get members to pin the latest community description by default.

The conversation here is about understanding what our options are.

Yes, I think we would need to do comparative studies. Replacing the regular community description messages on the store node with a series of small messages from online members may be better for store efficiency, or not :slight_smile:

I think we should assume usage of Status chain by default, meaning more control over providers etc. than when using another chain.

This goes hand in hand. The community owner may want to deploy a paymaster that pays for transactions to join their community, as long as the address has 100 SNT.
Meaning that it can reduce the risk of depletion, while enabling free joining of the community.
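As a toy model of that paymaster policy (plain Python, not contract code): sponsorship is granted once per address and only if the address holds the required token amount. The class, its fields and `balance_of` are hypothetical.

```python
class Paymaster:
    """Toy model of a gas sponsor funded by the community owner."""

    def __init__(self, budget_wei, min_token_balance, balance_of):
        self.budget_wei = budget_wei
        self.min_token_balance = min_token_balance
        self.balance_of = balance_of  # hypothetical token balance lookup
        self.sponsored = set()

    def sponsor_join(self, address, gas_cost_wei):
        """Return True and deduct the gas cost if the join is sponsored."""
        if address in self.sponsored:
            return False  # each address is sponsored at most once
        if self.balance_of(address) < self.min_token_balance:
            return False  # accounts without the token cannot drain the budget
        if gas_cost_wei > self.budget_wei:
            return False  # budget exhausted
        self.budget_wei -= gas_cost_wei
        self.sponsored.add(address)
        return True
```

Draining the budget then requires holding 100 SNT on every fresh account, which is the friction that limits the depletion attack.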

As discussed in Cost related to Waku infrastructure - Messenger - Status.app there is no such thing as free lunch. There needs to be some friction.
If a community is open and “anyone can join” then you will have a problem. I believe what is important is to have enough friction options to fit each use case. Options when joining a community are (non-exhaustive):

  1. pays some gas to join the community
  2. holds/stakes specific tokens
  3. gets approval from the community owner
  4. gets an invite from an existing member
  5. has an RLN membership

What is interesting with (1), (2), (5) is that they could be done fully onchain, in an automated manner (no action from the community owner).


For the community description message, instead of creating a specialized store node, we could have a new protocol (stateful store) for Waku:

  • there needs to be another table, let’s call it states for now
  • each record in the states table has the fields (id, pubkey, content, pubsubTopic, contentTopic)
  • we will assign a new shard to route the messages for state creation and updates.

Now let’s see how the community control node uses this protocol:

  • it sends the community description message along with the community pubkey and a signature over the message hash
  • the store node receives the message, verifies the content of the message against the bundled pubkey, and saves it to the states table
  • when a new update event happens, such as a member joining, it sends a new community description message just like the previous one
  • the store node again checks and updates the content.

How community members use the description message:

  • periodically check that the locally saved description message matches the stored one
  • the control node could send a short description update message, which contains the hash of the description, and broadcast it to the community’s assigned shard
  • members receive the short update message and look up the full description from the store.
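The member-side flow above can be sketched roughly as follows. SHA-256 is an assumption (the post does not say which hash is used), and `fetch_from_store` is a hypothetical callable standing in for the store query.

```python
import hashlib


def description_hash(description: bytes) -> str:
    return hashlib.sha256(description).hexdigest()


def sync_description(local, announced_hash, fetch_from_store):
    """Return an up-to-date description, querying the store only if needed.

    local: locally saved description bytes, or None.
    announced_hash: hash carried by the short update message.
    fetch_from_store: hypothetical callable returning the stored description.
    """
    if local is not None and description_hash(local) == announced_hash:
        return local  # already up to date, no store query
    fetched = fetch_from_store()
    if description_hash(fetched) != announced_hash:
        raise ValueError("stored description does not match the announced hash")
    return fetched
```

Only the short hash travels over relay; the full description is pulled once per change by each member who needs it.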

How to incentivize such stateful store is not covered here, as it’s better to discuss it in other posts.


any specific benefit of using a separate shard?

this approach doesn’t seem to help with bandwidth optimization/reduction (which seems to be one of the main reasons for this discussion about moving community-description out of waku).

Using a separate shard means this message is not routed to users; instead, users query the description by topic or other keys.

Only the hash of the description may be broadcast via a global shard or the community’s assigned shard, to which the user is subscribed.

Hey, I’m working on the de-mls PoC in the ACZ team. Let me give a brief review of how it works now and comment on some of the issues.

How it looks like in code:

So far there is only an ACL with the eth wallet addresses of users that can be added to the group. Another important point is that, at this stage, each group needs its own smart contract. We are considering adding the ability for one smart contract to manage several groups.

The smart contract has an owner, and only the owner can add/remove users from the ACL; at this stage, only the owner can add users to the group too.
Here is a step-by-step process of adding to a group with block diagram:

It is also possible to add members to the group in batches rather than one at a time. We have a mechanism for this for both the ACL and the de-mls group directly, once we have the key packages of all members.

I will repeat the main points in brief:

  • Off-chain - in order to add someone to a group, we must have their openmls key package (at this stage we are using openmls, so we allocate two keys: an openmls key for communication within the group and an eth key for authentication and communication outside the group). Accordingly, we request this key package, and with it we get the new member’s eth wallet address.
  • On-chain - add this address to the ACL.
  • Off-chain - send a commit message to the new member to join, and a commit message to all other members to update the group key (and the tree accordingly).

I’m really not sure that this is possible, at least at this stage, because we don’t have built-in mechanisms for validating user rights other than admin (and this is done only through the smart contract, at the implementation level). In fact, all group users are equal, and levels could only be added on top of de-mls.

Counter question: what is considered a join? If it means joining the group without any validation from a group member, then no. If we are talking about whether a user can send a request to join the group instead of waiting for an invitation, then technically it is possible. There is such functionality in openmls, but as I said, it requires validation from a group member, plus there is the question of how the user gets into the ACL.

Here is also the RFC link on the ACL.

If you have any other questions, I’d be happy to help

I am wondering whether private key rotation is a feasible solution for handling kicks/bans and breaches; i.e., if a member is kicked or banned, we rotate, and if we detect a message from a non-member, we rotate.

The product assumption is that the member list is known for every channel, similar to how it works in Discord. Due to channel-level token-gating, each channel can have a different member list (always a subset of all members). There is at least one important feature built on top of it, which is @mentions.

The community indeed uses symmetric encryption, particularly the Hash Ratchet mechanism. The Hash Ratchet mechanism itself provides forward secrecy; each new message is encrypted with a derived key, specifically the hash of the previous key. Compromise of a derived key should not lead to the compromise of previous messages. To prevent excessive computational cost for clients, there’s a maximum number of hashes that can be performed on a given key, which is why the key is periodically rotated. This rotation also adds partial post-compromise security. There is also a model in place to handle the kicking or banning of members. Each time this happens, the key is rotated, ensuring that kicked/banned members are unable to read new conversations. However, a breach remains possible, as a malicious (kicked/banned) member could share all the rotated keys it received from the control node, allowing any party to read the entire past conversation, but not the future conversation.
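For readers less familiar with the mechanism, here is a minimal sketch of a hash ratchet as described above. The hash function (SHA-256) and the cap value are assumptions for illustration, not the actual status-go parameters.

```python
import hashlib

MAX_RATCHET_STEPS = 1000  # hypothetical cap before the key must be rotated


def ratchet(key: bytes) -> bytes:
    """Derive the next key as the hash of the current one."""
    return hashlib.sha256(key).digest()


def derive_message_key(base_key: bytes, step: int) -> bytes:
    """Return the key obtained after `step` ratchet steps from `base_key`."""
    if not 0 <= step <= MAX_RATCHET_STEPS:
        raise ValueError("step out of range, the key needs to be rotated")
    key = base_key
    for _ in range(step):
        key = ratchet(key)
    return key
```

Because the hash is one-way, someone who obtains the key at step n can compute the keys for steps n+1 onward but not the earlier ones, which is why the cap and periodic rotation matter for limiting what a leaked key exposes going forward.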

Important note: Each encrypted token-gated channel has its own Hash Ratchet key (with the same properties as described above), ensuring that non-members cannot read the content. Moreover, for private channels, the member list is itself encrypted, preventing others from identifying who belongs to private channels.

If we want to maintain token-gated channels, we would probably need to deploy a contract for each channel.

At this point for the app to function properly we need to know the list of members upfront. Please take a look at the first point of this post.

There are some events that are reflected on the client side without control node acceptance, specifically, all events that do not affect the member list of either the community or channels. Every event that can potentially affect the member list, such as editing token permissions, must be approved by the control node, as it is the only one that performs on-chain permission checks and, therefore, is the only entity that can manage member lists.

That would not be possible at the moment for admins, as they do not have access to the revealed addresses of members, so they cannot perform on-chain checks of permissions (evaluate membership). Another issue would be the numerous redundant on-chain calls if we have many admins or token-masters. This is one of the reasons why only the control node is responsible for membership reevaluation, with the primary reason being simplicity to avoid conflicts/edge cases.

I think friction for community owners/token masters/admins is unavoidable due to the membership reevaluation process that must occur from time to time to ensure permissions work as expected.

Indeed. Moreover, the store node would need to be able to check on-chain who the owner of the community is to properly verify the signature. I guess that’s not feasible.


Having everything in one message propagated off-chain on Waku is certainly the most convenient and robust approach. Let me give you an example: when a control node approves a member, it sends a response with the acceptance to the member and updates and propagates the CommunityDescription with the new members list. Let’s assume both messages come in order; the user sees they were accepted into the community, triggering a notification to the client, but the community members list is still not updated because the latter message hasn’t arrived yet. This leads to an inconsistent state, and if the CommunityDescription is late or doesn’t arrive, it becomes problematic. To avoid such ordering/timing issues (which occur in various contexts), we usually attach the necessary information to such messages, and so the CommunityDescription is attached to the CommunityRequestToJoinResponse. This is the easiest solution; otherwise, we would need complex logic to queue the request response until the updated CommunityDescription arrives, if it arrives at all.

Moving CommunityDescription out of Waku will most likely solve the bandwidth issue, but at the same time, it will make the protocol much more complex and require some work for adaptation. That’s the tradeoff we must accept, I suppose.

If so, then we can still achieve the same without introducing a new state store protocol in Waku. Anyway, the community description gets broadcast once a day or whenever the community is changed by the community admin/token-master. If this is moved to a separate shard and clients just use store queries to fetch messages from the last 24 hours for this shard, we can still achieve bandwidth optimization without introducing a new Waku protocol. wdyt?

The routing part is similar to the previous suggestion of using light push + store queries for those large messages (no propagation on relay).

In essence, only the last version of CommunityDescription matters. This opens opportunities for bandwidth and disk space savings.

We have identified three strategies to leverage this opportunity, that could be used together:

  • lazy pull: messages should not be broadcast on relay but pulled by the users. This is justified because only the most recent message matters; it does not make sense for a user to come online and pull every version of the message
  • last version only: only the latest version matters, so the system from which users pull should not store previous versions. This implies the system is aware of the specific artefact and its versions
  • update: we may also want to enable update-only messages from the sender, which, instead of pushing the full description, would only update specific fields to save on upload

From a high level, we can:

  • use light push + store (gives us lazy pull)
  • move the CommunityDescription message to a diff mechanism where, instead of pushing the full message every 8 hours, the sender pushes a diff from the last version and the user has to grab all diffs to rebuild the message
  • Build a new Waku protocol that enables those strategies (eg specialized store)
  • Integrate an existing protocol that enables those strategies
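To give a feel for the diff option in the list above, here is a sketch over a flat dict standing in for the description. The diff format (`set` / `unset`) is hypothetical; a real design would also need ordering and signatures.

```python
def make_diff(old: dict, new: dict) -> dict:
    """Describe how to turn `old` into `new` at the top-level field granularity."""
    return {
        "set": {k: v for k, v in new.items() if k not in old or old[k] != v},
        "unset": [k for k in old if k not in new],
    }


def apply_diffs(base: dict, diffs) -> dict:
    """Rebuild the latest description from a base version and ordered diffs."""
    state = dict(base)
    for diff in diffs:
        state.update(diff["set"])
        for key in diff["unset"]:
            state.pop(key, None)
    return state
```

The trade-off is visible here: uploads shrink to the changed fields, but a new member needs a base version plus every diff since, which puts the load back on store queries.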

I believe it still makes sense to use light push + store and shard segregation as short term strategies.

For the long-term solution, we need to look at existing protocols (Codex, BitTorrent, IPFS, gundb) and understand whether or not they are fit for purpose. If they aren’t, we can review whether this is something that would make sense to add to the Waku protocol family.

edit: missing requirements to make an informed decision are:

  • data size
  • frequency of update
  • frequency of retrieval
  • latency of either

Thank you @seemenkina

Looking at rfc-index/vac/raw/eth-demls.md at eth-secpm-splitted · vacp2p/rfc-index · GitHub I think it could be possible to have the step “add Bob’s Ethereum address” be self-serve, where a user adds their own address on the contract and is authorised to do so via a token they hold.

“send request”, “respond request” and “verify request” would then be replaced by
“smart contract verifies tx sender has token X”
“smart contract verifies tx sender inserts own address”

Then, the “admin” node can automatically “Send welcome message” by watching the smart contract, assuming it got Bob’s “openmls key package” because Bob sent it to them or broadcast it over Waku.
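The two contract checks can be modelled in a few lines. This is a plain Python toy, not Solidity and not the de-MLS contract; `holds_token` is a hypothetical stand-in for the token balance check.

```python
class GroupAcl:
    """Toy in-memory model of a self-serve ACL."""

    def __init__(self, holds_token):
        self._holds_token = holds_token  # hypothetical: address -> bool
        self.members = set()

    def add_self(self, tx_sender, address):
        # "smart contract verifies tx sender inserts own address"
        if address != tx_sender:
            raise PermissionError("sender may only insert their own address")
        # "smart contract verifies tx sender has token X"
        if not self._holds_token(tx_sender):
            raise PermissionError("sender does not hold the required token")
        self.members.add(address)
```

An admin node watching `members` would then only need the new member’s key package to send the welcome message.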

The issue is that the key is used for gossipsub validation, meaning you’d have to first propagate the new key without using it.
Then everyone would need to start using the new key at the same time to avoid splitting the gossipsub network.

  • What about fleet nodes and other Waku nodes deployed to support the community? You’d need to add some API there to allow the community owner to tell them the new key
  • What about nodes which are offline? They would need to do store queries before they can enable relay

Not impossible but hard.

I think this can be challenged, how often do you stare at the member list?

Yes noted, I think we would need to be able to infer this from somewhere (eg smart contract) in any case.

You’re unlikely to need an @mention if you have never seen the user online. You would only need to see a user once to get their details via store.

Thanks for the information. All to be taken into account here. It helps.

Great to hear. I didn’t know. Thanks for that.

Why is that an issue? Because of the following?:

Yes indeed, we need to find a balance.

In summary, thank you for the insight. It helps understand the possible directions and risks.