I propose pairing two techniques, private set intersection (PSI) and range-based set reconciliation (RBSR), to build a new way to propagate messages in a (Waku) network.
The protocol would work as follows:
- Nodes in the (libp2p) network periodically contact other nodes.
- Each node has a set of topics of interest but is not willing to disclose it. Instead, two nodes run a PSI on the hashes of their topics.
- Nodes use RBSR (Waku Sync) to sync messages only on topics of shared interest.
- Nodes keep track of which peers share their interests the most and bias their connection pattern accordingly, while still syncing with random nodes (the bias trades privacy for some efficiency).
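To make the RBSR step more concrete, here is a minimal local simulation of range-based set reconciliation. It is not Waku Sync's actual wire protocol: the item key `(timestamp, message hash)`, the XOR-of-SHA-256 fingerprint, the split-at-median rule and the `threshold` parameter are all assumptions for illustration. Both peers' sets live in one process, so each recursive `reconcile` call stands in for one fingerprint exchange. The hypothetical `sync_shared_topics` helper restricts syncing to the topics found via PSI.

```python
import hashlib
from typing import Dict, List, Set, Tuple

# Hypothetical message key: (timestamp, message hash), ordered by timestamp first.
Item = Tuple[int, bytes]
LOWER: Item = (0, b"")       # inclusive lower bound of the key space
UPPER: Item = (2**64, b"")   # exclusive upper bound of the key space


def fingerprint(items: List[Item]) -> bytes:
    """Order-independent fingerprint: XOR of SHA-256 over each item (assumption)."""
    acc = 0
    for ts, msg_hash in items:
        digest = hashlib.sha256(ts.to_bytes(8, "big") + msg_hash).digest()
        acc ^= int.from_bytes(digest, "big")
    return acc.to_bytes(32, "big")


def in_range(items: List[Item], lo: Item, hi: Item) -> List[Item]:
    """Items in the half-open range [lo, hi); `items` must be sorted."""
    return [it for it in items if lo <= it < hi]


def reconcile(a: List[Item], b: List[Item], lo: Item = LOWER, hi: Item = UPPER,
              threshold: int = 4) -> Tuple[List[Item], List[Item]]:
    """Return (items only A has, items only B has) within [lo, hi)."""
    ra, rb = in_range(a, lo, hi), in_range(b, lo, hi)
    if fingerprint(ra) == fingerprint(rb):
        return [], []                      # range already in sync: skip it
    if len(ra) <= threshold or len(rb) <= threshold:
        sa, sb = set(ra), set(rb)          # small range: exchange items directly
        return sorted(sa - sb), sorted(sb - sa)
    mid = ra[len(ra) // 2]                 # split at A's median item and recurse
    only_a1, only_b1 = reconcile(a, b, lo, mid, threshold)
    only_a2, only_b2 = reconcile(a, b, mid, hi, threshold)
    return only_a1 + only_a2, only_b1 + only_b2


def sync_shared_topics(store_a: Dict[str, List[Item]], store_b: Dict[str, List[Item]],
                       shared: Set[str]) -> Dict[str, Tuple[List[Item], List[Item]]]:
    """Reconcile only the topics both nodes hold in common (e.g. the PSI output)."""
    return {t: reconcile(sorted(set(store_a.get(t, []))), sorted(set(store_b.get(t, []))))
            for t in shared}
```

Ranges whose fingerprints match are skipped entirely, so the communication cost tracks the size of the difference rather than the size of the stores, which is what makes syncing on every contact affordable.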
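The biased connection pattern in the last bullet could be as simple as an epsilon-greedy policy. Everything here is hypothetical: the `PeerSelector` class, scoring peers by an exponential moving average of past PSI intersection sizes, and the `explore` probability that keeps a fraction of syncs going to uniformly random peers.

```python
import random
from collections import defaultdict
from typing import List, Optional


class PeerSelector:
    """Hypothetical policy: prefer peers that shared many topics in past PSI
    runs, but with probability `explore` pick a uniformly random peer."""

    def __init__(self, explore: float = 0.3, rng: Optional[random.Random] = None):
        self.explore = explore
        self.scores = defaultdict(float)  # peer id -> shared-interest score
        self.rng = rng if rng is not None else random.Random()

    def record(self, peer: str, shared_topics: int, decay: float = 0.9) -> None:
        # Exponential moving average of the intersection size seen with `peer`.
        self.scores[peer] = decay * self.scores[peer] + (1 - decay) * shared_topics

    def pick(self, peers: List[str]) -> str:
        if self.rng.random() < self.explore:
            return self.rng.choice(peers)        # random sync keeps the graph mixed
        weights = [self.scores[p] for p in peers]
        if sum(weights) == 0:
            return self.rng.choice(peers)        # nothing learned yet
        return self.rng.choices(peers, weights=weights, k=1)[0]
```

Lowering `explore` makes a node more efficient but its contact pattern more predictable, which is the privacy-for-efficiency trade-off described above.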
The PSI would be based on curve25519 ECDH for efficiency.
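A rough sketch of the classic semi-honest Diffie-Hellman PSI over Curve25519, in pure Python so it is self-contained. The `x25519` function follows the Montgomery ladder from RFC 7748, but it is not constant-time and is for illustration only. The hash-to-curve step (`hash_to_u`, reading a SHA-256 digest directly as a u-coordinate) is a naive assumption; a real implementation would more likely use a proper hash-to-curve or a prime-order group such as ristretto255. The `psi` function simulates both parties locally.

```python
import hashlib
import secrets
from typing import List

P = 2**255 - 19
A24 = 121665  # (486662 - 2) / 4, as in RFC 7748


def _decode_scalar(k: bytes) -> int:
    b = bytearray(k)
    b[0] &= 248   # clear the 3 low bits (cofactor 8)
    b[31] &= 127  # clear bit 255
    b[31] |= 64   # set bit 254
    return int.from_bytes(b, "little")


def x25519(k: bytes, u: bytes) -> bytes:
    """X25519 per RFC 7748 (Montgomery ladder). NOT constant-time: demo only."""
    scalar = _decode_scalar(k)
    x1 = (int.from_bytes(u, "little") & ((1 << 255) - 1)) % P
    x2, z2, x3, z3, swap = 1, 0, x1, 1, 0
    for t in reversed(range(255)):
        k_t = (scalar >> t) & 1
        swap ^= k_t
        if swap:
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = k_t
        a, b = (x2 + z2) % P, (x2 - z2) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        c, d = (x3 + z3) % P, (x3 - z3) % P
        da, cb = d * a % P, c * b % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, x3, z2, z3 = x3, x2, z3, z2
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


def hash_to_u(topic: bytes) -> bytes:
    # NAIVE hash-to-curve (assumption): the digest is used directly as a
    # u-coordinate, which may land on the curve's twist. Fine for a demo.
    return hashlib.sha256(topic).digest()


def psi(alice_topics: List[bytes], bob_topics: List[bytes]) -> List[bytes]:
    """Semi-honest DH-PSI with both parties simulated locally.
    Returns Alice's topics that Bob also holds; only Alice learns them."""
    a_key, b_key = secrets.token_bytes(32), secrets.token_bytes(32)  # per session
    # 1. Alice -> Bob: her topic hashes blinded with her key.
    msg1 = [x25519(a_key, hash_to_u(t)) for t in alice_topics]
    # 2. Bob -> Alice: msg1 re-blinded with his key (order kept), plus his
    #    own topic hashes blinded with his key (shuffled).
    double_alice = [x25519(b_key, x) for x in msg1]
    bob_blinded = [x25519(b_key, hash_to_u(t)) for t in bob_topics]
    secrets.SystemRandom().shuffle(bob_blinded)
    # 3. Alice re-blinds Bob's values; equal double-blinded values mean a shared topic.
    double_bob = {x25519(a_key, y) for y in bob_blinded}
    return [t for t, d in zip(alice_topics, double_alice) if d in double_bob]
```

This works because X25519 scalar multiplication commutes: blinding with Alice's key then Bob's gives the same value as the reverse order, so only matching topics collide. In this one-sided variant only Alice learns the intersection (Bob learns the size of Alice's set); she could send the result back, or the exchange could be run in both directions.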
This scheme is simple to implement, yet provides good privacy and scaling.
Privacy is good because both the path a message takes and its timing are fairly random, similar to a mixnet. The IP address of the original sender is somewhat obfuscated. Topics can also be shared to gain k-anonymity.
Scaling is also good because each node only relays messages it’s interested in, in contrast to a pubsub system where all messages must be relayed. Each node can cache as many messages as it wants, according to its own limits.
WDYT?
Thanks! FWIW I’ve been thinking about how both PSI and Waku Sync (separately, but also together) could be used to improve content-based protocols like filter, lightpush and store.
Before responding more comprehensively, just to make sure I understand what you propose:
- by “topic interest” you mean something akin to how we use content topics in Waku, i.e. something likely to tag content per application or per subchannel within an application?
- if so, any thoughts on how very sparse content topics with only a few interested nodes would find each other and build enough redundancy for messages to be delivered to everyone with some probability? Would (k-)anonymity and censorship resistance here depend on each node with this topic interest also participating in other topics (otherwise you’ll essentially have a small infrastructure of nodes with a single interest only communicating with each other)?
Yes, some form of coordination or categorization mechanism, similar to (or literally) content topics.
Nothing prevents nodes from sharing a topic to gain some k-anonymity if needed.
For the full privacy benefit, “topics” would have to be random 256-bit numbers shared out of band (à la small world). Good luck guessing such a topic, and you can’t force nodes to reveal their topic interests either. The design would also include fake topics to pad the set size.
The same design still allows more public topics, where you just hash a word or phrase.
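The two kinds of topics and the padding could be generated along these lines. The 32-byte size follows the 256-bit suggestion above; `private_topic`, `public_topic`, `padded_topic_set` and padding to a fixed `size` are hypothetical helpers illustrating the idea, not an existing API.

```python
import hashlib
import secrets
from typing import List, Set

TOPIC_BYTES = 32  # 256-bit topics


def private_topic() -> bytes:
    """Random 256-bit topic, meant to be shared out of band."""
    return secrets.token_bytes(TOPIC_BYTES)


def public_topic(phrase: str) -> bytes:
    """Public topic: anyone who knows the phrase derives the same value."""
    return hashlib.sha256(phrase.encode("utf-8")).digest()


def padded_topic_set(real: Set[bytes], size: int) -> List[bytes]:
    """Pad the real topics with random fake ones up to a fixed set size, then
    shuffle so neither the set size nor the order reveals the real topics."""
    if len(real) > size:
        raise ValueError("more real topics than the padded set size")
    topics = list(real)
    while len(topics) < size:
        topics.append(secrets.token_bytes(TOPIC_BYTES))  # fake topic
    secrets.SystemRandom().shuffle(topics)
    return topics
```

Since fake topics are random 256-bit values, they are indistinguishable from private topics and will essentially never match anything in a peer's set, so they only cost extra PSI work.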
Yes, that is what would happen. I would argue that single-topic interest would be rare in practice, but it is true that in that case an observer of the entire network could track metadata about who contacts whom.
Discovery assumes you already know at least one node or, in the worst case, use a DHT.
For delivery probability, the best bet is a birthday-paradox argument, but even then it’s hard to estimate… There are too many variables (the main downside of this approach vs. structured p2p networks).
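A back-of-the-envelope model for the simplest case, in that birthday-paradox spirit: a node in an `n`-node network contacts `c` distinct peers chosen uniformly at random each round, and `m` other nodes share its topic. The chance of hitting at least one of them in a round is hypergeometric, and independent rounds compound it. The model and function names are assumptions; it ignores the interest bias, churn, multi-hop propagation and the other variables that make the real estimate hard.

```python
from math import comb


def p_meet_per_round(n: int, m: int, c: int) -> float:
    """P(at least one of c distinct, uniformly random contacts shares our topic).

    n: nodes in the network (including us), m: other nodes sharing the topic,
    c: peers contacted per round. Hypergeometric: 1 - C(n-1-m, c) / C(n-1, c).
    """
    others = n - 1
    if not (0 <= m <= others and 0 <= c <= others):
        raise ValueError("need 0 <= m, c <= n - 1")
    # comb(x, c) is 0 when c > x, i.e. we are forced to hit a sharing node.
    return 1 - comb(others - m, c) / comb(others, c)


def p_meet_within(n: int, m: int, c: int, rounds: int) -> float:
    """The same event over several rounds, assuming rounds are independent."""
    return 1 - (1 - p_meet_per_round(n, m, c)) ** rounds
```

Even this toy model shows why sparse topics are the hard case: with `m` small relative to `n`, the per-round probability is roughly `c * m / (n - 1)`, so the number of rounds needed grows linearly with network size.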