A series of breaking changes to chat protocols is expected to come out of current research.
I suggest adopting a template strategy to ensure that expectations are set and problems are avoided.
While the focus of this post is handling breaking changes for the Status app, it can also serve as a guideline for any application using Waku.
Generic roll-out strategy
I usually use these steps when applying breaking changes in any type of application where software is rolled out to the user (i.e., not a website).
Consider a breaking change from an old protocol to a new protocol.
Read and write are generic terms, and could be replaced with subscribe and send in the context of Waku and Chat.
Step | App Version | Read | Write |
---|---|---|---|
Initial | N | Old protocol | Old protocol |
Preparation | N+1 | Old and new protocol | Old protocol |
Switch | N+2 | Old and new protocol | New protocol |
Clean-up | N+3 | New protocol | New protocol |
The steps above concern the code. In terms of data, a migration step may be needed before clean-up to convert existing data to the new protocol format.
`+1`, `+2`, etc. in the table do not necessarily correspond to exact app version bumps; see the next section.
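As a minimal sketch (with hypothetical names such as `Protocol`, `ROLL_OUT` and `can_receive` that are not part of any Status or Waku API), the table can be encoded as the set of protocols each step reads and the single protocol it writes, which makes it easy to check which app versions can talk to each other:

```python
from enum import Enum
from typing import Dict, FrozenSet, Tuple


class Protocol(Enum):
    OLD = "old"
    NEW = "new"


# Hypothetical encoding of the table: step -> (protocols read, protocol written).
ROLL_OUT: Dict[str, Tuple[FrozenSet[Protocol], Protocol]] = {
    "initial": (frozenset({Protocol.OLD}), Protocol.OLD),                    # N
    "preparation": (frozenset({Protocol.OLD, Protocol.NEW}), Protocol.OLD),  # N+1
    "switch": (frozenset({Protocol.OLD, Protocol.NEW}), Protocol.NEW),       # N+2
    "clean-up": (frozenset({Protocol.NEW}), Protocol.NEW),                   # N+3
}


def can_receive(reader_step: str, writer_step: str) -> bool:
    """True if an app at reader_step reads the protocol an app at writer_step writes."""
    reads, _ = ROLL_OUT[reader_step]
    _, writes = ROLL_OUT[writer_step]
    return writes in reads
```

Checking all pairs shows that adjacent steps can always read each other, while `can_receive("initial", "switch")` and `can_receive("clean-up", "preparation")` are both false. Those two combinations are the reason for the support-period constraints in the timeline.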
Prerequisite
First we need to define the support period for an application. This is usually expressed as a duration (e.g. 1 year) or a number of releases (e.g. 3 major releases).
Timeline
`N` is the last version to only read from the old protocol. Only when `N`'s support has ended can we proceed with the switch step.
`N+1` is the last version to write in the old protocol. Only when `N+1`'s support has ended can we proceed with the clean-up step.
For example, if support is set to 1 year from the release date, and a release happens every month:
Time from N | Step | App Version | Read | Write |
---|---|---|---|---|
0 | Initial | N | Old protocol | Old protocol |
1m | Preparation | N+1 | Old and new protocol | Old protocol |
1y | Switch | N+2 | Old and new protocol | New protocol |
1y1m | Clean-up | N+3 | New protocol | New protocol |
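The two timeline rules reduce to simple date arithmetic. This sketch assumes support is measured as a fixed number of days from each version's release date; the function names and dates are illustrative only.

```python
from datetime import date, timedelta
from typing import Dict, Mapping


def end_of_support(release: date, support: timedelta) -> date:
    """Date on which a version released on `release` stops being supported."""
    return release + support


def earliest_step_dates(releases: Mapping[str, date], support: timedelta) -> Dict[str, date]:
    """Earliest dates at which the Switch and Clean-up versions may ship.

    Switch (N+2) must wait for N to reach end of support;
    Clean-up (N+3) must wait for N+1 to reach end of support.
    """
    return {
        "switch": end_of_support(releases["N"], support),
        "clean-up": end_of_support(releases["N+1"], support),
    }


def can_ship(step: str, today: date, releases: Mapping[str, date], support: timedelta) -> bool:
    """True if the given step ("switch" or "clean-up") may ship on `today`."""
    return today >= earliest_step_dates(releases, support)[step]
```

With `N` released on 2025-01-01, `N+1` on 2025-02-01 and 365 days of support, `earliest_step_dates` returns 2026-01-01 for the switch and 2026-02-01 for clean-up, matching the 1y and 1y1m rows of the table.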
Status Communities
Status Communities are mostly self-contained (apart from the request-to-join flow), which enables a faster roll-out strategy:
- The switch can be enabled for newly created communities, so the gap between preparation and switch can be reduced or eliminated. However, only users with the latest app version can join such a community.
- Using the new protocol at community creation can start as an opt-in switch and become the default later (the default setting is a product decision).
- The migration and clean-up steps can be scheduled based on product needs, by answering the question: what is the value of migrating pre-existing communities to the new protocol?
- A property on the community information can be used to signal which protocol is in use and enable correct handling.
Step | App Version | Community creation |
---|---|---|
Initial | N | Old protocol by default |
Preparation + Switch | N+1 | New protocol opt-in switch |
Crystallize (optional) | N+2 | New protocol by default |
Migrate (optional) | N+2 | Existing communities migrate from old to new |
Clean-up | N+3 | Remove old protocol support from code |
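A sketch of how the community table and the community property could interact, assuming a hypothetical `protocol` field on the community information and stage names taken from the table. `can_join` captures the trade-off listed above: a community created with the new protocol is only reachable by apps that support it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, FrozenSet, Optional


class Protocol(Enum):
    OLD = "old"
    NEW = "new"


# Hypothetical per-stage behaviour following the table.
# AVAILABLE: protocols an app at this stage can create communities with and join.
AVAILABLE: Dict[str, FrozenSet[Protocol]] = {
    "initial": frozenset({Protocol.OLD}),                           # N
    "preparation+switch": frozenset({Protocol.OLD, Protocol.NEW}),  # N+1, opt-in
    "crystallize": frozenset({Protocol.OLD, Protocol.NEW}),         # N+2
    "clean-up": frozenset({Protocol.NEW}),                          # N+3
}
# DEFAULT: protocol used at community creation when the user makes no choice.
DEFAULT: Dict[str, Protocol] = {
    "initial": Protocol.OLD,
    "preparation+switch": Protocol.OLD,
    "crystallize": Protocol.NEW,
    "clean-up": Protocol.NEW,
}


@dataclass(frozen=True)
class Community:
    name: str
    protocol: Protocol  # hypothetical property carried in the community information


def create_community(name: str, stage: str, requested: Optional[Protocol] = None) -> Community:
    """Create a community with the requested protocol, or the stage's default."""
    protocol = DEFAULT[stage] if requested is None else requested
    if protocol not in AVAILABLE[stage]:
        raise ValueError(f"{protocol.value} protocol not available at stage {stage!r}")
    return Community(name, protocol)


def can_join(community: Community, client_stage: str) -> bool:
    """An app can only join communities whose protocol it supports."""
    return community.protocol in AVAILABLE[client_stage]
```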
The restrictions are:
- `N+2` should not be released until `N` is EOL (end-of-life)
- `N+3` should not be released until `N+1` is EOL
Examples
Some examples of expected (not committed) breaking changes are:
- Communities created on a common shard (app key DoS protection) to communities created on a dedicated shard (pre-shared key DoS protection)
- New content topic strategy
- Using e2e reliability protocol
- All messages in one shard to messages segregated based on retention needs
One-to-one and private group chats
_For the sake of brevity, we use “one-to-one chats” to refer to both one-to-one direct messages and private groups._
One-to-one chats do not offer the same flexibility as communities in terms of breaking changes. Yet, there are some options to dogfood and roll out breaking changes.
Experimentation and dogfooding
When rolling out a new breaking change, a feature switch can enable the change in the app, with the caveat that there is no backward compatibility. This helps dogfooding and experimenting with the change without proceeding with the migration yet.
In this scenario, users who enable the change are aware of the lack of backward compatibility and may switch back and forth.
This should be handled in a similar manner to changing the target fleet, where the app must be restarted.
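The restart requirement can be modelled by separating the configuration the node is running with from the one the user has just chosen. This is a hypothetical sketch: `NodeConfig`, `App` and the `use_new_protocol` flag are illustrative names, not actual Status settings.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class NodeConfig:
    fleet: str
    # Hypothetical feature switch for an experimental, non-backward-compatible protocol.
    use_new_protocol: bool = False


class App:
    """Hypothetical app shell: configuration changes only take effect on restart."""

    def __init__(self, config: NodeConfig) -> None:
        self.active = config   # configuration the running node was started with
        self.pending = config  # configuration to apply on the next restart

    def set_new_protocol(self, enabled: bool) -> bool:
        """Record the user's choice; return True if a restart is required."""
        self.pending = replace(self.pending, use_new_protocol=enabled)
        return self.pending != self.active

    def change_fleet(self, fleet: str) -> bool:
        """Same mechanism as the feature switch, for comparison."""
        self.pending = replace(self.pending, fleet=fleet)
        return self.pending != self.active

    def restart(self) -> None:
        """Restart the node with the pending configuration."""
        self.active = self.pending
```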
Once the feature has stabilized, the generic roll-out strategy can be applied.
Read: old and new protocol vs bridging
Depending on the change, this may mean subscribing to both old and new content topics, or to different shards.
For example, assume the change migrates one-to-one chats from a single shard to a range of shards. In this case, the application would subscribe to both the single shard and the new shard range for a period of time.
Another strategy previously implemented is bridging: Status runs software that routes/converts messages from the old protocol to the new protocol.
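A sketch of that dual subscription, with made-up shard numbers (a single shard 32 split into shards 64 to 71) and a CRC32 hash of a hypothetical chat id standing in for whatever shard mapping the real change would use:

```python
import zlib
from typing import FrozenSet

# Hypothetical values: the single shard used today and the range replacing it.
OLD_SHARD = 32
NEW_SHARDS = range(64, 72)


def shards_to_subscribe(step: str) -> FrozenSet[int]:
    """Shards an app listens on at each roll-out step."""
    old, new = frozenset({OLD_SHARD}), frozenset(NEW_SHARDS)
    if step == "initial":
        return old
    if step in ("preparation", "switch"):
        return old | new  # read both during the transition
    if step == "clean-up":
        return new
    raise ValueError(f"unknown step {step!r}")


def shard_to_publish(step: str, chat_id: str) -> int:
    """Shard an app sends a given chat's messages on at each roll-out step."""
    if step in ("initial", "preparation"):
        return OLD_SHARD
    if step in ("switch", "clean-up"):
        # Stand-in mapping: a stable hash of the chat id picks one shard in the range.
        index = zlib.crc32(chat_id.encode()) % len(NEW_SHARDS)
        return NEW_SHARDS[index]
    raise ValueError(f"unknown step {step!r}")
```

At every step, the shard a sender publishes on is in the set that peers one step behind or ahead subscribe to, which is the property the generic roll-out strategy relies on.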
Bridging comes with a number of caveats that are often directly related to the breaking change itself:
- scalability: if the new protocol is brought in to increase scalability, bridging messages from new to old is likely to aggravate the scalability issues of the old protocol.
- security: if the new protocol brings new security features (e.g. RLN), then it may not be possible to bridge. Indeed, messages going from a non-RLN relay network to an RLN relay network would need RLN proofs. If a bridge attached proofs to all messages for free, abuse could come from the non-RLN network, negating the intended effect of the breaking change.
This is why I recommend against a bridging strategy.
Data Migration
Data migration usually consists of running a database script to update the data format.
In the case of Chat, it could mean copying messages stored with a specific shard or content topic value to another shard or content topic.
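For illustration, a migration script for one store node's database might look like the sketch below. The `messages` table, its columns and the content topic values are hypothetical, not the schema of any actual Waku store implementation, and SQLite is used only to keep the example self-contained.

```python
import sqlite3


def copy_messages_to_topic(conn: sqlite3.Connection, old_topic: str, new_topic: str) -> int:
    """Copy messages stored under old_topic to new_topic; return rows inserted.

    Assumes a hypothetical table messages(content_topic, payload, timestamp).
    A message already present under new_topic (same payload and timestamp)
    is not copied again, so re-running the script is safe.
    """
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            """
            INSERT INTO messages (content_topic, payload, timestamp)
            SELECT ?, m.payload, m.timestamp
            FROM messages AS m
            WHERE m.content_topic = ?
              AND NOT EXISTS (
                  SELECT 1 FROM messages AS n
                  WHERE n.content_topic = ?
                    AND n.payload = m.payload
                    AND n.timestamp = m.timestamp
              )
            """,
            (new_topic, old_topic, new_topic),
        )
    return cur.rowcount
```

The `NOT EXISTS` guard is a design choice so that an interrupted or repeated run does not duplicate messages.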
A migration strategy is similar to bridging, in that converting/copying data from one network to another may negate the intended effects.
It also assumes that all store nodes can be migrated, which becomes less likely as the network becomes more decentralised.
Hence, I would also recommend against deploying migration strategies.
Proposed Next steps
- Waku team to draft a roll-out proposal for non-backward-compatible protocols; the e2e reliability protocol for communities is likely the first candidate
- Status team to define the support period for Status apps, in terms of versions or time
- Waku chat and Status teams to review how a community could be flagged with a specific feature, to help manage incompatibility, e.g. an attribute in the community invitation link and description message
- Waku chat and Status teams to consider the UX around incompatible communities; a generic solution is likely to be enough regardless of the migration, e.g. “This community has been created with a newer version of the app, please update to join”.
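A sketch of such a generic UX check, assuming a hypothetical `protocol` query parameter on the invitation link (the URLs used with it are placeholders, not the real Status link format). Any value the app does not recognise, including protocols introduced by future breaking changes, leads to the same update prompt, so the check does not need to change per migration.

```python
from typing import FrozenSet, Optional
from urllib.parse import parse_qs, urlparse

UPDATE_PROMPT = (
    "This community has been created with a newer version of the app, "
    "please update to join."
)


def join_prompt(invite_url: str, supported: FrozenSet[str] = frozenset({"old"})) -> Optional[str]:
    """Return a prompt to show instead of joining, or None if the app can join.

    `protocol` is a hypothetical query parameter; links without it are
    treated as communities using the old protocol.
    """
    params = parse_qs(urlparse(invite_url).query)
    protocol = params.get("protocol", ["old"])[0]
    return None if protocol in supported else UPDATE_PROMPT
```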