This is a follow-up post on @Daniel’s prior great post on Towards a Waku v2 Security Analysis. Do not miss out on that one!
I took the liberty to branch from that post and kick off the discussion around the waku-relay anonymity.
This post DOES NOT MEAN to be a comprehensive, precise, and final analysis of the waku-relay anonymity, but rather just a discussion on this subject.
Background
Waku is “a stack of modular privacy-preserving (or secure and private) and censorship-resistant p2p protocols”.
When you hear this sentence, a lot of questions arise, among them: how privacy-preserving is Waku? Is it truly censorship-resistant?
Well, let’s break the privacy-preserving term into Anonymity and Data confidentiality.
Even with the preceding breakdown we still need to clarify what each term exactly means. This will then allow us to onboard Waku users to what Waku actually guarantees or does not guarantee yet.
In this forum post, we focus on the Anonymity aspect, with the hope of reaching a shared understanding of what anonymity means in Waku, what the adversarial models are, and what level of protection Waku can provide.
We would also like to follow a modular fashion and narrow the scope of discussion to the core protocol of Waku which is its transport layer i.e., Waku-relay.
Why waku-relay first? If we dissect the Waku stack, we end up with three interaction domains: 1) Gossipsub (Waku-relay), 2) the discovery domain, and 3) the req/reply domain, e.g., the store/filter/lightpush protocols. See Figure below.
The second and third domains exist to facilitate the first one i.e., the Waku-relay. So, it makes sense to first understand the anonymity of Waku-Relay, and then extend our study to other domains.
What is the objective of this post?
This post provides:
- An initial breakdown of waku-relay anonymity with the respective definitions
- Adversarial models against anonymity
- A very rough security analysis of the current state of waku-relay
Please read it through and share your thoughts on (including but not limited to):
- Whether the security definitions can be expanded or remodeled
- Whether the adversarial model is comprehensive and realistic; also feel free to share another adversarial model you have in mind
- Your general thoughts on the security analysis
Anonymity in a Private Transport Protocol
- Sender Anonymity: No global entity except the sender knows which entity owns the message.
- Recipient Anonymity: No global entity except the receiver knows which entity received the message.
- Participation Anonymity: No global entity can discover which two entities are engaged in a conversation except the conversation participants.
- Unlinkability: No two protocol messages are attributable to the same conversation except by the conversation participants.
Threat Models
Based on the domain of knowledge, the following non-exclusive categories of adversary exist. Any collusion among the adversaries is conceivable.
- Local adversary (passive (HbC), active (malicious)): An adversary with control of the local network.
- Global adversary (passive (HbC), active (malicious)): An adversary with control of a larger portion of the network, e.g., ISPs.
- Service providers: Any centralized service operators that aid the messaging system, e.g., public key directories.
In this threat model, end-point security is assumed, hence malware and hardware attacks are precluded.
Also, the adversary has NO auxiliary information (background knowledge about users). Including such information would open up all sorts of inference attacks, and countering them demands research techniques like differential privacy, which are left out of scope for now.
WAKU2-Relay Anonymity Analysis
In the following anonymity analysis, we exclude the metadata included in the WakuMessage, the unit of data transported using WAKU2-Relay. The Waku message and its constituent fields are treated as a black box. The analysis of the metadata included in the WakuMessage falls under “conversational security” and deserves a separate track.
Recipient Anonymity: No global entity except the receiver knows which entity received the message
- Level of privacy: k-anonymity
- Adversarial model: holds against a global adversary
- Details: The number of topics transported within the same gossipsub mesh determines recipient anonymity, e.g., if the mesh is used to transport k topics, then the recipient anonymity of all the nodes within that mesh is k-anonymity. That is, every message in that mesh belongs to a participant with probability 1/k.
- The anonymity level can be increased by generalizing the topics, hence supporting more topics within a single mesh.
- Increasing recipient anonymity comes with a bandwidth penalty for all the participants, i.e., nodes have to spend their bandwidth relaying messages that are not within their interests.
The same analysis as above applies to Participation Anonymity.
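The trade-off between recipient anonymity and bandwidth can be made concrete with a minimal Python sketch. It rests on simplifying assumptions that are mine, not part of any Waku spec: all k topics in the mesh carry equal traffic, and a node is interested in exactly one of them. The function names are hypothetical.

```python
from fractions import Fraction


def recipient_guess_probability(k: int) -> Fraction:
    """Probability of correctly guessing which of the k topics in a mesh
    a given node is interested in (assumes no auxiliary information)."""
    if k < 1:
        raise ValueError("a mesh must carry at least one topic")
    return Fraction(1, k)


def relay_overhead(k: int) -> Fraction:
    """Share of relayed traffic that is outside the node's own interest
    (assumes equal traffic per topic and interest in a single topic)."""
    if k < 1:
        raise ValueError("a mesh must carry at least one topic")
    return Fraction(k - 1, k)


for k in (1, 2, 10):
    print(k, recipient_guess_probability(k), relay_overhead(k))
```

Under these assumptions, moving from 2 to 10 topics per mesh lowers the guess probability from 1/2 to 1/10 while raising the share of uninteresting relayed traffic from 1/2 to 9/10.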
Sender anonymity: No global entity except the sender knows which entity owns the message
- Adversarial model: local and global
- No sender anonymity, even against a local adversary
- Anonymity level against a single non-colluding local adversary with no auxiliary information: k-anonymity with k = N, i.e., under this adversarial model, the probability that a given message published over a given pubsub mesh belongs to a certain node is 1/N, where N is the mesh size. Rationale: In waku-relay, messages carry no personally identifiable information (PII) about the message owner. This is because waku-relay follows the StrictNoSign policy as described in the libp2p PubSub specs. As a result of the StrictNoSign policy, messages should be built without the from, signature, and key fields, since each of these three fields individually counts as PII for the author of the message (one can link the creation of the message with the libp2p peerId and thus indirectly with the IP address of the publisher). Due to this, each relayed message can belong to any node in the network with identical probability.
- Anonymity against a more powerful adversary with access to all or a large portion of a targeted node’s neighbor list: Removing identifiable information from messages cannot lead to perfect sender anonymity. The direct neighbors of a publisher might be able to figure out which messages belong to that publisher by analyzing its traffic, as explained next. An adversary who is connected to the target node and to the neighbors of the target node can eavesdrop on the outgoing traffic of the target node and its neighbors, and can spot messages owned by the target node based on the timing difference between the messages. Messages routed sooner by the target node than by its neighbors are more likely to have originated from the target node.
- Anonymity against a more powerful adversary with access to the incoming and outgoing traffic of a target node: In this case, the attacker may notice that some messages appear in the outgoing traffic of the target node but not in the incoming traffic. Those messages are the ones that originated from that node. (I am personally interested to know how realistic this attack is; share your thoughts.)
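The two traffic-analysis attacks described above can be illustrated with a small Python sketch. It is only a toy model: it assumes the adversary can already tell messages apart (the message ids below are hypothetical labels) and has clean timestamps, which a real network would not hand over so easily. The function names are mine.

```python
from typing import Dict, Set


def first_relayer(first_forwarded_at: Dict[str, float]) -> str:
    """Timing heuristic: guess that the peer seen forwarding a message
    earliest is the one that originated it."""
    return min(first_forwarded_at, key=lambda peer: first_forwarded_at[peer])


def originated_by_target(incoming: Set[str], outgoing: Set[str]) -> Set[str]:
    """Traffic differencing: messages that leave the target node without
    ever having arrived at it must have been created there."""
    return outgoing - incoming


# Hypothetical observations for one message: peer -> time (seconds)
# at which the adversary first saw that peer forward it.
observed = {"target": 0.010, "neighbor_a": 0.042, "neighbor_b": 0.055}
print(first_relayer(observed))  # target

# Hypothetical message ids seen entering and leaving the target node.
incoming = {"m1", "m2"}
outgoing = {"m1", "m2", "m3"}
print(sorted(originated_by_target(incoming, outgoing)))  # ['m3']
```

The timing heuristic is only probabilistic, since network jitter can reorder observations, whereas traffic differencing is exact once the adversary really does see all of the target's incoming and outgoing messages.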
Unlinkability: No global entity except the conversation participants can discover that two protocol messages belong to the same conversation.
- Level of privacy: k-anonymity
- Adversarial model: Global
- The number of topics transported within the same mesh determines the unlinkability level, e.g., if the mesh is used to transport k topics, then for every two messages m1 and m2 transported within that mesh, the probability that these two belong to the same conversation is 1/k.
- The anonymity level can be increased by generalizing the topics, hence supporting more topics within a single mesh.
- Increasing the anonymity level comes with a bandwidth penalty for all the participants, i.e., nodes have to spend their bandwidth relaying messages that are not within their interests.
- There might be the possibility of timing attacks?
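The 1/k figure above implicitly treats all k topics as equally busy. As a rough check, the sketch below computes the chance that two independently picked messages share a topic from each topic's share of the mesh traffic. The assumptions are mine: one conversation per topic, and messages sampled independently in proportion to traffic. The function name is hypothetical.

```python
from fractions import Fraction
from typing import Sequence


def same_topic_probability(shares: Sequence[Fraction]) -> Fraction:
    """Probability that two independently sampled messages belong to the
    same topic, given each topic's share of the total mesh traffic."""
    if sum(shares) != 1:
        raise ValueError("traffic shares must sum to 1")
    return sum((s * s for s in shares), Fraction(0))


uniform = [Fraction(1, 4)] * 4
skewed = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
print(same_topic_probability(uniform))  # 1/4
print(same_topic_probability(skewed))   # 3/8
```

With three topics the uniform case would give 1/3, but the skewed example yields 3/8, so uneven topic traffic makes linking slightly easier than the 1/k estimate suggests.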