On the anonymity of Waku-Relay

sanaz · May 13, 2022, 11:12pm

This is a follow-up to @Daniel’s great earlier post, Towards a Waku v2 Security Analysis. Do not miss out on that one!

I took the liberty of branching off from that post to kick off a discussion on Waku-Relay anonymity.

This post is NOT MEANT to be a comprehensive, precise, and final analysis of Waku-Relay anonymity, but rather a starting point for discussing this subject.

Background

Waku is “a stack of modular privacy-preserving (or secure and private) and censorship-resistant p2p protocols”.

When you hear this sentence, a lot of questions arise, among them: How privacy-preserving is Waku? Is it truly censorship-resistant? And so on.

Well, let’s break the term privacy-preserving down into Anonymity and Data Confidentiality.

Even with this breakdown, we still need to clarify what each term means exactly. That will allow us to inform Waku users about what Waku actually guarantees and what it does not guarantee yet.

In this forum post, we focus on the Anonymity aspect, hoping to reach a shared understanding of what anonymity means in Waku, what the adversarial models are, and what level of protection Waku can provide.

We would also like to proceed in a modular fashion and narrow the scope of the discussion to the core protocol of Waku, namely its transport layer, i.e., Waku-Relay.

Why Waku-Relay first? Well, if we dissect the Waku stack, we end up with three interaction domains: 1) GossipSub (Waku-Relay), 2) the discovery domain, and 3) the request/reply domain, e.g., the store/filter/lightpush protocols. See the figure below.
[Figure: waku-arch-wide, the Waku stack and its three interaction domains]

The second and third domains exist to facilitate the first one, i.e., Waku-Relay. So it makes sense to first understand the anonymity of Waku-Relay and then extend our study to the other domains.

What is the objective of this post?

This post provides:

An initial breakdown of Waku-Relay anonymity into several properties, with their respective definitions
Adversarial models against anonymity
A very rough security analysis of the current state of the waku-relay

Please read it through and share your thoughts on (including but not limited to):

Whether the security definitions can be expanded or remodeled
Whether the adversarial model is comprehensive and realistic (also feel free to share any other adversarial model you have in mind)
Your general thoughts on the security analysis

Anonymity in a Private Transport Protocol

Sender Anonymity: No global entity except the sender knows which entity owns the message.
Recipient Anonymity: No global entity except the receiver knows which entity received the message.
Participation Anonymity: No global entity except the conversation participants can discover which two entities are engaged in a conversation.
Unlinkability: No two protocol messages can be attributed to the same conversation by anyone except the conversation participants.

Threat Models

Based on their domain of knowledge, the following non-exclusive categories of adversaries exist. Any collusion among these adversaries is conceivable.

Local adversary (passive (honest-but-curious, HbC), active (malicious)): an adversary in control of the local network.
Global adversary (passive (HbC), active (malicious)): an adversary in control of a larger portion of the network, e.g., ISPs.
Service providers: any centralized service operator that aids the messaging system, e.g., public key directories.

In this threat model, endpoint security is assumed; hence malware and hardware attacks are out of scope.

Also, the adversary has NO auxiliary information (background knowledge about users). Including such information would open the door to all sorts of inference attacks, and countering them would require techniques such as differential privacy, which we leave out of scope for now.

WAKU2-Relay Anonymity Analysis

In the following anonymity analysis, we exclude the metadata included in the WakuMessage, the unit of data transported by WAKU2-Relay. The WakuMessage and its constituent fields are treated as a black box. Analysis of the metadata included in the WakuMessage falls under “conversational security” and deserves a separate track.

Recipient Anonymity: No global entity except the receiver knows which entity received the message

Level of privacy: K-anonymity
Adversarial model: holds against a global adversary
Details: The number of topics transported within the same gossipsub mesh determines recipient anonymity. For example, if the mesh is used to transport k topics, then the recipient anonymity of all the nodes within that mesh is k-anonymity. That is, every message in that mesh belongs to (is intended for) a given participant with probability 1/k.
The anonymity level can be increased by generalizing the topics, hence supporting more topics within a single mesh.
Increasing recipient anonymity comes with a bandwidth penalty for all participants, i.e., nodes have to spend their bandwidth relaying messages outside their interests.
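To make the k-anonymity figure and its bandwidth cost concrete, here is a minimal sketch under toy assumptions that are not part of Waku: all k topics in a mesh carry equal traffic, a recipient cares about exactly one topic, and every node relays every message on the mesh. The function names and parameters are hypothetical.

```python
def recipient_attribution_probability(num_topics: int) -> float:
    """Chance that a message seen on the mesh is meant for a given recipient.

    Toy assumption: the k topics carry equal traffic and the recipient is
    interested in exactly one of them, so a global observer can do no better
    than 1/k.
    """
    if num_topics < 1:
        raise ValueError("a mesh carries at least one topic")
    return 1.0 / num_topics


def relayed_bytes_per_sec(num_topics: int, msgs_per_topic: float, msg_size: int) -> float:
    """Bandwidth each node spends relaying the whole mesh (all k topics)."""
    return num_topics * msgs_per_topic * msg_size


def wasted_fraction(num_topics: int, subscribed_topics: int) -> float:
    """Share of relayed traffic that falls outside the node's own interests."""
    return 1.0 - subscribed_topics / num_topics


if __name__ == "__main__":
    for k in (1, 4, 16):
        print(
            f"k={k:>2}  P(attribution)={recipient_attribution_probability(k):.4f}  "
            f"relay={relayed_bytes_per_sec(k, 10.0, 1000):>9.0f} B/s  "
            f"off-interest={wasted_fraction(k, 1):.0%}"
        )
```

In this model, multiplexing four times as many topics into a mesh divides the attribution probability by four but multiplies every node's relay load by four, which is the trade-off described in the list above.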

The same analysis as above applies to Participation Anonymity.

Sender anonymity: No global entity except the sender knows which entity owns the message

Adversarial model: local and global
~~No sender anonymity against even local adversary~~
Anonymity level against a single non-colluding local adversary with no auxiliary information: K-anonymity with K = N, i.e., under this adversarial model, the probability that a given message published over a given pubsub mesh belongs to a certain node is 1/N, where N is the mesh size.
Rationale: In Waku-Relay, messages carry no personally identifiable information (PII) about the message owner. This is because Waku-Relay follows the StrictNoSign policy described in the libp2p PubSub specs. Under StrictNoSign, messages are built without the from, signature, and key fields, since each of these three fields individually counts as PII for the author of the message (one can link the creation of the message to a libp2p peerId and thus indirectly to the IP address of the publisher). As a result, each relayed message can belong to any node in the network with identical probability.
Anonymity against a more powerful adversary with access to all or a large portion of a targeted node’s neighbor list: removing identifiable information from messages does not by itself yield perfect sender anonymity. The direct neighbors of a publisher might be able to figure out which messages belong to that publisher by analyzing its traffic, as explained next. An adversary connected to both the target node and its neighbors can eavesdrop on their outgoing traffic and spot messages owned by the target node based on timing differences between messages: messages that the target node routes sooner than its neighbors are more likely to have originated at the target node.
Anonymity against a more powerful adversary with access to the incoming and outgoing traffic of a target node: in this case, the attacker may notice that some messages appear in the target node’s outgoing traffic but not in its incoming traffic. Those messages are the ones originated by that node. (I am personally interested to know how realistic this attack is; please share your thoughts.)
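The two traffic-analysis attacks above can be sketched as follows. This is a toy model, not an attack on any real deployment: it assumes a hypothetical observation format in which the adversary can recognize the same message on different links (e.g., because it operates the neighboring peers) and record when each observed node first sent it. All function names are illustrative. Without such observations, the best a passive local observer can do is the uniform 1/N guess from the first bullet.

```python
from typing import Dict, Iterable, Set

# Hypothetical observation format: message id -> {node id: time (s) at which
# the adversary first saw that node send the message}.
Observations = Dict[str, Dict[str, float]]


def uniform_guess(mesh_size: int) -> float:
    """Baseline: with no traffic analysis, any of the N mesh nodes is equally likely."""
    return 1.0 / mesh_size


def first_spy_suspects(obs: Observations, target: str) -> Set[str]:
    """Timing attack: flag messages the target sent before every observed neighbor."""
    suspects: Set[str] = set()
    for msg_id, send_times in obs.items():
        if target not in send_times:
            continue
        others = [t for node, t in send_times.items() if node != target]
        # A message the target forwarded earlier than all of its observed
        # neighbors is more likely to have originated at the target.
        if others and all(send_times[target] < t for t in others):
            suspects.add(msg_id)
    return suspects


def in_out_difference(incoming: Iterable[str], outgoing: Iterable[str]) -> Set[str]:
    """Traffic-analysis attack: messages the target sent but never received."""
    return set(outgoing) - set(incoming)


if __name__ == "__main__":
    obs = {
        "m1": {"A": 0.010, "B": 0.045, "C": 0.050},  # A sent it first
        "m2": {"A": 0.060, "B": 0.020, "C": 0.055},  # B sent it first
    }
    print(first_spy_suspects(obs, "A"))                         # {'m1'}
    print(in_out_difference(["m2", "m3"], ["m1", "m2", "m3"]))  # {'m1'}
    print(uniform_guess(8))                                     # 0.125
```

Both functions only produce suspects, not proofs: gossipsub's randomized forwarding and network jitter can make a relayed message look "first", which is part of why the realism of these attacks is an open question.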

Unlinkability: No non-global entities except the conversation participants can discover that two protocol messages belong to the same conversation.

Level of privacy: K-anonymity
Adversarial model: Global
The number of topics transported within the same mesh determines the unlinkability level. For example, if the mesh is used to transport k topics, then for any two messages m1 and m2 transported within that mesh, the probability that they belong to the same conversation is 1/k.
The unlinkability level can be increased by generalizing the topics, hence supporting more topics within a single mesh.
Increasing unlinkability comes with a bandwidth penalty for all participants, i.e., nodes have to spend their bandwidth relaying messages outside their interests.
Could timing attacks still be possible here?
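The 1/k figure above implicitly assumes that all topics in the mesh are equally busy. Here is a minimal sketch of the more general case under stated assumptions (the function name is hypothetical, one topic is treated as one conversation, and each message's topic is drawn independently in proportion to topic traffic): the probability that two messages share a topic is the sum of the squared traffic shares, which reduces to exactly 1/k for uniform traffic.

```python
from typing import Sequence


def same_conversation_probability(topic_rates: Sequence[float]) -> float:
    """Chance that two messages drawn from the mesh belong to the same topic.

    Assumptions (illustrative only): one topic == one conversation, and each
    observed message's topic is drawn independently in proportion to that
    topic's message rate. With k equally busy topics this is exactly 1/k.
    """
    total = sum(topic_rates)
    if total <= 0:
        raise ValueError("need at least one topic with positive traffic")
    return sum((rate / total) ** 2 for rate in topic_rates)


if __name__ == "__main__":
    print(same_conversation_probability([1, 1, 1, 1]))  # 0.25, i.e. 1/k for k = 4
    print(same_conversation_probability([7, 1, 1, 1]))  # about 0.52: one busy topic
```

In this toy model, uneven traffic makes two messages more likely to share a topic than 1/k suggests (the sum of squares is smallest when shares are equal), so 1/k is the best case for a given k.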

blagoj · May 17, 2022, 10:26pm

Hello Sanaz, great security analysis.

I think what is important to highlight is sender anonymity. I think it is the weakest point of the system compared to the other types of anonymity. The strict-no-sign policy does not do much at the network level, where an adversary can perform metadata analysis. The adversary does not need to be an ISP; it could simply be a party controlling a botnet whose nodes are distributed throughout the network. They could then monitor traffic and collect data about the state of the network.
This is not trivial to do, but it is definitely doable; how feasible it is depends on the size of the network and on the adversary’s resources and dedication.
Additionally, depending on the application, application-level user identifiers can be associated with node IP addresses.
In my opinion, the current Waku-Relay implementation does not provide strong enough sender anonymity. We’ve studied a similar problem, and the gossipsub strict-no-sign policy is a weak measure. The relevance of this metric is also very application-dependent (i.e., sender anonymity might not be necessary for some applications), but for Waku I think it is very relevant.

As a countermeasure, we’ve studied implementing Dandelion++ on top of gossipsub. This can help obfuscate the sender and can improve anonymity considerably. The downsides are the additional latency (which depends on the number of stem hops) and the fact that it is not fully resistant to large attackers. However, it offers better anonymity than the plain gossipsub implementation, and if latency is not a concern it can be utilised (the paper states a latency of roughly 300 ms per stem hop).
There are also other solutions for message obfuscation at the network layer, such as onion routing, which might be explored as a first step (i.e., obfuscate the message sender first, then relay the message to gossipsub, so that the message sender node != the message publisher node).
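To illustrate the “message sender node != message publisher node” idea, here is a simplified stem-then-fluff sketch in the spirit of Dandelion++. It is not the Dandelion++ protocol (which, among other refinements, picks stem relays per epoch rather than uniformly at random per hop) and not an existing Waku or gossipsub API; the function names, the full-mesh toy topology, and the default of 0.3 s per hop (taken from the figure quoted above) are assumptions.

```python
import random
from typing import Callable, List, Tuple


def stem_then_fluff(
    sender: str,
    pick_stem_peer: Callable[[str], str],
    fluff_prob: float,
    rng: random.Random,
    max_hops: int = 20,
) -> Tuple[List[str], str]:
    """Route one message through a random stem, then hand it to gossipsub.

    Simplified Dandelion-style model: the sender always forwards to one stem
    relay; each relay then switches to the fluff phase (an ordinary gossipsub
    publish) with probability fluff_prob, otherwise forwards one more hop.
    Returns the stem path and the node that finally publishes.
    """
    path = [sender]
    node = sender
    for _ in range(max_hops):
        node = pick_stem_peer(node)
        path.append(node)
        if rng.random() < fluff_prob:
            break
    return path, node


def expected_stem_latency_s(fluff_prob: float, per_hop_s: float = 0.3) -> float:
    """Mean added delay: on average 1/fluff_prob stem hops (ignoring max_hops)."""
    return per_hop_s / fluff_prob


if __name__ == "__main__":
    peers = ["A", "B", "C", "D", "E"]
    rng = random.Random(7)

    def pick(node: str) -> str:
        # Any peer other than the current node (toy topology: full mesh).
        return rng.choice([p for p in peers if p != node])

    path, publisher = stem_then_fluff("A", pick, fluff_prob=0.25, rng=rng)
    print("stem path:", path, "publisher:", publisher)
    print("expected added latency (s):", expected_stem_latency_s(0.25))
```

Because this toy walk chooses each hop at random, it can occasionally revisit the sender; the publisher is simply whichever node ends the stem. Each expected stem hop adds about 0.3 s in this model, which is the latency cost mentioned above.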