Improving Content Topics: A Coordinate-Based Approach for Better Privacy and Distribution
Current State and Problems
The current content topic format in Waku follows this structure as defined in 23/WAKU2-TOPICS and RELAY-SHARDING:
/{generation}/{application-name}/{version-of-the-application}/{content-topic-name}/{encoding}
For example: /0/myapp/1/mytopic/cbor
Whilst this format serves basic functionality, it introduces several significant challenges:
1. Complexity for Developers
Developers must navigate autosharding decisions early in development:
- Should they use a fixed app name (concentrating all users in one shard)?
- Should they use dynamic app names for multi-shard scaling?
This creates an unnecessary barrier to entry and potential lock-in scenarios with difficult upgrade paths.
2. Privacy Concerns
The application name appears in clear text within content topics, completely removing plausible deniability. Any observer can definitively identify which application a given IP address is using based on filter subscriptions, store queries, or light push messages.
3. Uneven Traffic Distribution
When applications use static app-name values, they may create uneven traffic distribution in a shared network, with some shards becoming heavily loaded whilst others remain underutilised.
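The privacy concern in point 2 can be made concrete with a small sketch. The helper name `parseLegacyContentTopic` is hypothetical and not part of any Waku library; it only splits a content topic along the format shown above, which is all an observer would need to do to read the application name:

```javascript
// Hypothetical helper (not a Waku API): splits a content topic of the form
// /{generation}/{application-name}/{version}/{content-topic-name}/{encoding}
function parseLegacyContentTopic(contentTopic) {
  const parts = contentTopic.split("/");
  // A well-formed topic starts with "/", so parts[0] is the empty string
  if (parts.length !== 6 || parts[0] !== "") {
    throw new Error(`Unexpected content topic format: ${contentTopic}`);
  }
  const [, generation, applicationName, version, topicName, encoding] = parts;
  return { generation, applicationName, version, topicName, encoding };
}

// Anyone who sees the topic can recover the application name
const observed = parseLegacyContentTopic("/0/myapp/1/mytopic/cbor");
console.log(observed.applicationName); // myapp
```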
Proposed Solution: Coordinate-Based Content Topics
I propose a new approach based on application “coordinates” that addresses all these challenges simultaneously, with a simplified content topic format.
New Content Topic Format
/1/<topic>
Where:
- 1 is the autosharding version (generation)
- <topic> is the numeric coordinate generated for a user
For example: /1/33024
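As a rough illustration of how small the new format is, the sketch below builds and parses it. The function names and the `SPACE_SIZE` constant are assumptions made for this example (the 0-65535 space is the one used as an example in this post), not an existing API:

```javascript
// Assumed for this sketch: the example coordinate space of 0-65535
const SPACE_SIZE = 65536;

// Hypothetical helper: builds "/1/<topic>" from a numeric coordinate
function formatCoordinateTopic(coordinate) {
  if (!Number.isInteger(coordinate) || coordinate < 0 || coordinate >= SPACE_SIZE) {
    throw new RangeError(`Coordinate out of range: ${coordinate}`);
  }
  return `/1/${coordinate}`;
}

// Hypothetical helper: extracts the numeric coordinate from "/1/<topic>"
function parseCoordinateTopic(contentTopic) {
  const match = /^\/1\/(\d+)$/.exec(contentTopic);
  if (!match) {
    throw new Error(`Not a generation 1 content topic: ${contentTopic}`);
  }
  const coordinate = Number(match[1]);
  if (coordinate >= SPACE_SIZE) {
    throw new RangeError(`Coordinate out of range: ${coordinate}`);
  }
  return coordinate;
}

console.log(formatCoordinateTopic(33024)); // /1/33024
console.log(parseCoordinateTopic("/1/33024")); // 33024
```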
Core Concept
- Application Coordinate: Each application generates a unique coordinate within a defined space (e.g., 0-65535)
- Distance Parameter: Applications define a statistical distance within which their content topics are generated
- Content Topic Generation: Individual content topics are generated randomly within the distance range of the application coordinate
- Coordinate-Based Sharding: Sharding is based on the content topic coordinate, meaning that topics close together are more likely to be in the same shard, simplifying the autosharding approach
- Shared Topic Space: Multiple applications may statistically generate identical content topics, enabling plausible deniability
Example Scenario
Consider a network with 8 shards (content topic space 0-65535, each shard covering ~8192 topics):
- “CoolGame” has coordinate 32768 with distance 4096
- “SecureChat” has coordinate 40960 with distance 4096
Alice (CoolGame user): Gets content topic 36864 → /1/36864 (distance 4096 from coordinate)
Bob (SecureChat user): Gets content topic 36864 → /1/36864 (distance 4096 from coordinate)
Despite using different applications, Alice and Bob share the same content topic, so an observer cannot tell from the topic alone which application either of them is using.
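The overlap in this scenario can be checked with a short sketch. The helper `candidateApps` is a hypothetical name; it assumes each application's range is inclusive at both ends (which is what lets Alice land exactly 4096 away from her coordinate) and it ignores wrap-around at the edges of the space:

```javascript
// Values taken from the scenario above
const apps = [
  { name: "CoolGame", coordinate: 32768, distance: 4096 },
  { name: "SecureChat", coordinate: 40960, distance: 4096 },
];

// Hypothetical helper: lists every application whose range
// [coordinate - distance, coordinate + distance] contains the topic
function candidateApps(topic, knownApps) {
  return knownApps
    .filter((app) => Math.abs(topic - app.coordinate) <= app.distance)
    .map((app) => app.name);
}

// 36864 is reachable from both applications, 30000 only from CoolGame
console.log(candidateApps(36864, apps));
console.log(candidateApps(30000, apps));
```

The more applications whose ranges cover a given topic, the less an observer learns from seeing it.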
Benefits
Simplified Developer Experience
- Generate a random coordinate once during project setup (tooling provided)
- Choose a distance parameter (sensible defaults available)
- Generate user content topics automatically (utilities provided)
Enhanced Privacy
- Multiple applications can share content topics
- No clear text application identifiers
- True plausible deniability for users
Improved Traffic Distribution
- Statistical distribution prevents hot spots
- Configurable distance allows scaling control
- Better load balancing across shards
Optimised Shard Usage
- Applications can guarantee maximum shard usage (based on distance vs shard size)
- Predictable scaling characteristics
Proposed Algorithms
1. Coordinate Generation
function generateAppCoordinate(spaceSize = 65536) {
  return Math.floor(Math.random() * spaceSize);
}
2. Content Topic Generation
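function generateContentTopic(appCoordinate, distance, spaceSize = 65536) {
  // Start of the available range (may be negative near the lower edge of the space)
  const rangeStart = appCoordinate - distance;
  // Pick an offset in [0, 2 * distance] so that both ends of the range are reachable
  const offset = Math.floor(Math.random() * (2 * distance + 1));
  // Wrap around so the result always falls within [0, spaceSize);
  // wrapping is one option here, clamping to the edges would be another
  return (((rangeStart + offset) % spaceSize) + spaceSize) % spaceSize;
}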
3. Shard Mapping
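function contentTopicToShard(contentTopic, totalShards, spaceSize = 65536) {
  const shardSize = Math.floor(spaceSize / totalShards);
  // Clamp so the last shard absorbs the remainder when spaceSize
  // is not a multiple of totalShards
  return Math.min(Math.floor(contentTopic / shardSize), totalShards - 1);
}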
// Example: content topic 36864 with 8 shards
// shardSize = 65536 / 8 = 8192
// shard = Math.floor(36864 / 8192) = 4
Scaling Scenarios
This coordinate-based approach handles network scaling elegantly:
Scenario 1: Network-Wide Shard Increase (8 → 16 shards)
When the network decides to increase shards from 8 to 16 due to overall traffic growth:
- Previous: Each shard covered ~8192 topics (65536/8)
- New: Each shard covers ~4096 topics (65536/16)
- Impact: Content topic /1/36864 moves from shard 4 to shard 9
- Migration: All applications automatically benefit from reduced per-shard traffic
- Developer Action: If the number of shards is defined in a single source of truth, such as a smart contract, together with time-based information (e.g., effective from block N of the chain), then applications can upgrade automatically with minimal disruption
Applications with a large distance parameter (see Scenario 2) could have their shard span reduced (with a floor of 2).
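To make the "source of truth with time-based information" idea more tangible, here is a minimal sketch. The schedule shape, its field names, and the block numbers are all assumptions for illustration; nothing here reflects an existing contract or Waku API. The schedule is assumed to be sorted by ascending `fromBlock`:

```javascript
// Hypothetical schedule, e.g. as it might be read from a smart contract
const shardSchedule = [
  { fromBlock: 0, totalShards: 8 },
  { fromBlock: 1000, totalShards: 16 },
];

// Returns the shard count in force at a given block
function totalShardsAt(blockNumber, schedule) {
  let current = schedule[0].totalShards;
  for (const entry of schedule) {
    if (entry.fromBlock <= blockNumber) current = entry.totalShards;
  }
  return current;
}

// Same mapping as the proposed contentTopicToShard, with the shard count
// looked up from the schedule instead of being passed in directly
function shardAt(contentTopic, blockNumber, schedule, spaceSize = 65536) {
  const shardSize = Math.floor(spaceSize / totalShardsAt(blockNumber, schedule));
  return Math.floor(contentTopic / shardSize);
}

console.log(shardAt(36864, 999, shardSchedule)); // 4
console.log(shardAt(36864, 1000, shardSchedule)); // 9
```

Because every node derives the shard from the same schedule, the move from shard 4 to shard 9 would happen at the same block for all participants.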
Scenario 2: Application-Specific Scaling
When an application experiences high user traffic but network shards remain constant (8 shards):
Option A: Increase Distance Parameter
- Current: distance 4096 (max 2 shards)
- New: distance 8192 (max 3 shards)
- Effect: Spreads users across more shards, reducing per-shard load
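The "max shards" figures above can be derived rather than looked up. The sketch below is one way to compute the bound; `maxShardsSpanned` is a hypothetical name, and the calculation assumes topics are drawn from the inclusive range [coordinate - distance, coordinate + distance], that the space size is a multiple of the shard count, and that wrap-around at the edges of the space is ignored:

```javascript
// Upper bound on the number of shards an application's range can touch
function maxShardsSpanned(distance, totalShards, spaceSize = 65536) {
  const shardSize = spaceSize / totalShards;
  // Worst case: the range starts on the last coordinate of a shard,
  // so it covers that shard plus everything the remaining span reaches
  const worstCase = Math.floor((2 * distance - 1) / shardSize) + 2;
  return Math.min(worstCase, totalShards);
}

console.log(maxShardsSpanned(4096, 8)); // 2
console.log(maxShardsSpanned(8192, 8)); // 3
```

The same function also shows the effect of a network-wide change: with 16 shards, a distance of 4096 can span up to 3 shards.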
Option B: Multiple Coordinate Ranges
- Deploy additional coordinate ranges for the same application
- Each range operates independently with its own distance parameter
- Users are assigned to different ranges during onboarding
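Option B might look something like the following sketch. The range values, the uniform choice between ranges, and the function name `assignContentTopic` are illustrative assumptions; a real application could weight ranges by load instead:

```javascript
// Hypothetical configuration: one application with two coordinate ranges
const ranges = [
  { coordinate: 8192, distance: 4096 },
  { coordinate: 49152, distance: 4096 },
];

// Picks a range for a new user, then draws a topic inside that range.
// `random` is injectable so the behaviour can be tested deterministically.
function assignContentTopic(appRanges, random = Math.random, spaceSize = 65536) {
  const range = appRanges[Math.floor(random() * appRanges.length)];
  // Offset in [0, 2 * distance], so both ends of the range are reachable
  const offset = Math.floor(random() * (2 * range.distance + 1));
  const topic = range.coordinate - range.distance + offset;
  // Wrap around so the topic stays within [0, spaceSize)
  return ((topic % spaceSize) + spaceSize) % spaceSize;
}

console.log(`/1/${assignContentTopic(ranges)}`);
```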
Developer Considerations:
- Monitor per-shard traffic for your application’s coordinate range
- Adjust distance parameters based on actual usage patterns
- Consider user experience impact of spreading across more shards
Implementation Considerations
Backward Compatibility: This could be introduced as generation 1 (/1/<topic>) whilst maintaining support for the current generation 0 format.
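Supporting both generations side by side mostly comes down to telling them apart. A minimal sketch, assuming topics always carry an explicit generation segment as in the examples in this post (topics that omit it are not handled here), with `contentTopicGeneration` as a hypothetical helper name:

```javascript
// Returns 0 for the current five-part format and 1 for the proposed "/1/<topic>"
function contentTopicGeneration(contentTopic) {
  // Proposed format: "/1/" followed by a numeric coordinate
  if (/^\/1\/\d+$/.test(contentTopic)) return 1;
  // Current format: "/0/" followed by four non-empty segments
  if (/^\/0\/[^/]+\/[^/]+\/[^/]+\/[^/]+$/.test(contentTopic)) return 0;
  throw new Error(`Unrecognised content topic: ${contentTopic}`);
}

console.log(contentTopicGeneration("/0/myapp/1/mytopic/cbor")); // 0
console.log(contentTopicGeneration("/1/36864")); // 1
```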
Tooling Requirements:
- Coordinate generation utilities
- Content topic generation libraries
- Shard calculation helpers
- Migration tools for existing applications
Configuration Options:
- Default distance parameters for different application types
- Space size configuration for different network scales
- Shard count adaptation algorithms
Request for Feedback
This proposal represents a significant improvement to Waku’s content topic system, addressing developer experience, privacy, and network performance simultaneously.
Key questions for discussions:
- What are your thoughts on this proposal and the simplification of the content topic?
- Do you see potential issues with this model in terms of new app scalability, especially regarding the Status/Chat SDK, compared to the old model?
- Do you agree with the overall simplification, which reduces both code and developer experience complexity?
- The proposed algorithms are extremely simple; are there different or more specific algorithms you would like to see employed instead?
- Do you see any reason why we should not proceed with this change?