Optimizing Community Description

kaichao · September 27, 2024, 2:29am

Yeah, it will work, but it’s not a general solution for stateful storage, some message may not send every 24 hours, and multiple versions needs to be handled in client side. The stateful store is mean to handle this conflict and reduce version to save more bandwidth.

To update specific fields is too much complexity. A KV store is a much easier solution for decentralized network in my gut feeling. Also if specific fields matters, it should be split into another KV record.

I feel the changes to implement this diff mechanism is also big and needs Waku protocol level support, a generalized Sync protocol is probably needed for this to happen. cc @SionoiS

prem · September 27, 2024, 5:37am

agreed, but i think this protocol will need a little more thought and research to handle other use-cases as well. it doesn’t seem to cover a lot of scope just for this single use-case where every-time all data is over-ridden.
i have been thinking about a similar state-mgmt protocol as well, and i think this protocol needs to support working with state diffs as well as @fryorcraken suggested.

which is why as a stop-gap wondering if we can quickly migrate these messages to separate shard and use existing store protocol itself to query required data at client. At max the overhead on clients would be to handle maybe few more messages which should not be a huge bandwidth overhead considering these won’t be via relay anymore.

Patryk · September 27, 2024, 8:24am

I was describing the state of the app requirements as they are now. If we need to make that compromise and build the members list dynamically due to technical reasons, I am totally fine with that. We just need to be careful not to break users’ expectations.

There are cases where users are inactive on given channels and yet might be targets of mentions on different ones. For example, one can participate in only a few channels of the community, but we might want to tag them in another where an important discussion is ongoing. This scenario can be mitigated with various techniques, such as displaying every active user of a community in the mention list and performing channel membership validation on-chain just before sending (if the member is not in the dynamically built member list, of course).

In the context of this discussion, I assumed that the edit functions refer to membership manipulation. I then assumed that many parties, such as the owner and token masters, would be performing members reevaluation to determine if members satisfy modified permissions. This would lead to redundant on-chain checks. This is an issue because reevaluation itself is very greedy and was identified as one of the reasons why the infura limit was exhausted in the past.

I probably misunderstood, and @jonathanr meant edit functions in general. That’s fine; we just need to figure out members reevaluation. For simplicity’s sake, it could still be performed by the owner, who would trigger it once they recognize that the permissions have been modified in the CommunityDescription on-chain.

No problem, happy to help!

Patryk · September 27, 2024, 8:40am

Might be dumb question but why not to push CommunityDescription to store nodes directly then, i.e. don’t use relay for this type of message at all?

Patryk · September 27, 2024, 9:30am

fryorcraken:

In essence, only the last version of CommunityDescription matters. Which opens opportunities in terms of bandwidth and diskspace saving.

We have identified three strategies to leverage this opportunity, that could be used together:

lazy pull: messages should not be broadcast on relay but being pulled by the users. Which is justified because only the most recent message matters. It does not make sense for a user to go online and pull all the versions of the message

last version only: only the latest version matters, the system from which the users pulls should not store previous versions. Which implies the system being aware of the specific artefact and its versions

update: we may want to also enable an update only from the sender, where instead of pushing the full description, could only update specific fields to save on upload

I think the strategy makes sense in general, but I want to emphasize that CommunityDescription is attached to various message types, and the periodic community publishing done by the owner might not be the biggest concern here in terms of bandwidth.

CommunityDescription is part of:

CommunityRequestToJoinResponse, which is propagated through relay once if the owner accepts the request on their own, or twice if acceptance comes from a privileged member.
CommunityEventsMessage, which is propagated through relay each time privileged members make changes to the community.
SyncInstallationCommunity, which is propagated through relay each time a backup is created or a sync is performed. Backup occurs every 8 hours or on demand. Sync is performed when there are paired devices and the user either requests to join the community, joins the community, leaves the community, or opens the community on mobile (fix(communities)!: stop syncing community on `LastOpenedAt` update by osmaczko · Pull Request #5884 · status-im/status-go · GitHub).

We will need to rework all of these to not contain CommunityDescription.

kaichao · September 27, 2024, 10:08am

There are multiple stores and the only way to share/save messages is via relay currently. The store sync protocol (not fully implemented) may be applied here actually, so that we don’t depend on the relay.

SionoiS · September 27, 2024, 11:41am

It seams to me like some of the problems here boils down to IAM, PKI and decentralized identity.

This spec here solves those problems quite well. GitHub - cryptidtech/provenance-specifications: Specifications for the various new pieces of provenance tech

This video helps understand. https://www.youtube.com/watch?v=LxU4wG4ryFo

If we don’t want to impl. stuff ourselves there’s this company. https://www.cryptid.tech/

Their tech is already built on IPFS.

jonathanr · September 27, 2024, 4:03pm

I think I did have in mind not just edits, but also members management. However, you’re right that those require chain look ups, so it’s not a good idea to have all admins do that. Especially considering normal Admins (not TokenMasters) don’t even have the shared addresses.

In theory, if we do use a smart contract, the users’ addresses would become “public” anyway, so we can have the admins receive the addresses as well.

However, like you said, the full re-evaluation should still be done only by the control node or need a mechanism where only one does it, because it is very request consuming.

So yeah, it’s not simple, but might be doable.

fryorcraken · September 30, 2024, 3:36am

Yes, this is one of the short term strategies we should be using. Using light push to push a message to store node, and not propagating it through relay.

I agree with you. I wanted to through the diff in the mix but a first solution may not need that. In any case, as @SionoiS stated:

A store latest only scenario will need for the sender to prove that the new copy is from the same origin.
Meaning that there needs to be some identity. It would be a very narrow application:

Message X is received and store
Message Y is received and signals here to override X
Some zk id needs to be in place to prove that Y and X are from the same sender.

Yes, noted. Indeed, if CommunityDescription is the first problem to solve, then solution cannot seem to be routing/storage only. We may need to refer to a messages instead of including it and do the necessary adjustement on the application protocol level.

@kaichao I suggest next step is to set a roadmap, ie, steps, to understand what matters most before proposing further solutions.
To analyse that, I suggest moving forward with splitting message type per shards, which will enable us to better understand the volume per type, as well as getting some more fine tune control on message retention.

vpavlin · October 3, 2024, 12:44pm

Sounds like something hopefully solved by e2e reliability (with the causal dependencies etc.)

haelius · October 17, 2024, 9:21am

Although e2e reliability should help guarantee eventual consistency, it will not necessarily prevent the separate Community Description from arriving late. This means there still might be a period of inconsistent state where the “accept” is received before the user is included in the Community Description. I can imagine a different workflow though, where the Community Description is sent before the accept response (as causal dependency of the response), although this may presumably also cause state issues if the accept takes long to deliver. A completely different mechanism may include metadata in messages piggybacking community descriptions that tie each message to a specific versioned update of the Community Description. This version MUST be uploaded separately to the Store node with high reliability before sending the message - a more complex mechanism, but probably possible to design.

prem · October 17, 2024, 12:46pm

I was thinking about similar scenarios today while going through communities code, because this would be a common problem for any app when it comes to state change and messages that become valid only after a state change. As messages may come out of order, they would have been received before state-change, processed and dropped/ignored due to old state. Once new state is updated somehow these older discarded messages need to be re-processed. This could be something we can document as a best practice to be handled?

kaichao · October 24, 2024, 8:41am

Early draft about the roadmap and steps on this page: