Towards a Waku v2 Security Analysis

We are planning to write a research log post introducing a Waku threat model,
and future privacy and anonymity goals.
This will not (yet) be part of a thorough scientific security analysis, but rather along the lines of the ProtonMail threat model.

This forum post comprises a non-comprehensive list of aspects the Waku v2 threat model should cover,
and also touches on possible countermeasures, with a focus on Waku’s request response protocols.
(For now, it is just a braindump; I will add more, foremost a list of attacks, after finishing current prio tasks.)
The purpose of this post is starting a discussion and filling in the gaps.

The post on the research log should provide readers with

  • a definition of relevant terms,
  • adversarial model(s),
  • a threat model,
  • a list of threats Waku v2 protects against,
  • an explanation of how Waku protects against these threats, and
  • what is out of scope.

The list of attack-based threats (detailed in the threat model) should be analyzed in (various) adversarial models. (This can be done informally at first, giving an overview of existing attacks and reasoning as to why we expect these attacks to be thwarted in specific adversarial models.)

Edit: As comments suggest, it makes sense to keep a complete threat model / security analysis as a future goal while having a series of posts on parts, e.g. a post on security / anonymity relay protocol.
Also see @sanaz’s post on the anonymity of Waku-Relay.

Adversarial Models

Adversarial models describe the power of an attacker.
Power in this context includes but is not limited to

  • access to resources (e.g. nodes controlled in a network)
  • passive vs active: while passive attackers just listen to traffic, active attackers might inject, alter, and drop messages
  • position in the network
    • internal vs external (access to a network link or not)
    • being able to see ingress and/or egress traffic

Dolev-Yao Model

In the Dolev-Yao model the attacker has full control over the whole network and is only limited by the employed cryptographic measures.
Ideally, Waku v2 would provide all of privacy, anonymity, and censorship resistance in this model.
(However, this model is unrealistically strong, and even when introducing some select relaxations, we still have a very strong model.)

Tor Model

Tor research typically models the attacker as either

  • an AS-level attacker that has full access to the infrastructure of an autonomous system, or
  • an attacker that controls p% (often 20%) of the Tor nodes

Tor does not protect against an attacker that can observe both the entry and the exit node.
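
As a rough illustration of the second attacker type: assume (my simplification, not part of Tor's actual path selection) that a client picks its entry and exit relay independently and uniformly at random, ignoring bandwidth weighting and guard nodes. An attacker controlling a fraction p of the relays then observes both ends of a circuit with probability of roughly p squared. The function name below is made up for this sketch.

```python
def both_ends_compromised(p: float) -> float:
    """Probability that entry and exit relay are both attacker-controlled.

    Assumes independent, uniform relay selection, which is a simplification
    of how Tor really selects paths.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a fraction between 0 and 1")
    return p * p


# The often-used 20% attacker from the model above.
print(f"{both_ends_compromised(0.20):.2%}")  # prints 4.00%
```

So even a strong 20% attacker only correlates a small share of circuits in this toy model, but that share is not protected at all.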

A thorough comparison with Tor seems out of scope because Tor’s main application area is web traffic while Waku’s application area is messaging.
For messaging, Waku will have superior anonymity properties: Waku does not depend on low latency and can leverage mix-nets (see below).
A short comparison seems useful though, because many readers will be familiar with Tor and will want to know how Tor compares to Waku.

Weaker Models

TODO

Desired Properties

Guaranteeing properties like privacy, anonymity, and censorship resistance only makes sense
when these properties are clearly defined, and when stating under which adversarial model they are guaranteed.
The Waku security analysis should define these properties; the following gives some first impression / basis for discussion.

Privacy

Waku does not involve any trusted third party that has access to users’ private data.
Thus, Waku v2 avoids typical privacy issues by design.

Privacy issues we should investigate in the Waku v2 threat model are security threats that impact the user’s privacy.
Any attack breaching confidentiality would affect the user’s privacy.
Waku v2 provides confidentiality via Noise.

However, anonymity can be seen as a part of privacy, too.
Attacks involving meta-data analysis affect the user’s privacy.

Open question: How should we define privacy?

An (incomplete) suggestion:

We define privacy as keeping private information about users and generated by users private, i.e. not disclosing this information to entities the user did not agree to.
In the context of Waku v2, this private information comprises:

  • message content (only disclosed to intended recipients)
  • identity
  • usage patterns

This information must also be protected from inference attacks.

Anonymity

With the above definition of privacy, anonymity is a part of privacy.

Receiver Anonymity

  • hiding the interest in certain topics
  • for any given message, it must be impossible to infer receivers

(Receiver anonymity for 1:1 communication is out of scope for now, because 1:1 communication is not part of Waku.
1:1 communication protocols can be designed on top of Waku v2.
If such a protocol should become part of Waku in the future, it will be included in the Waku threat model.)

Sender Anonymity

It must be impossible to infer the original sender of a message on the Waku v2 layer (and ideally on deeper layers, too).
Protocols running on top of Waku v2, e.g. a 1:1 messaging protocol, can relax this: the receiver is allowed to know who the sender is.

Censorship Resistance

Censorship resistance is a property the threat model has to cover separately.
Even with privacy and anonymity protected, censorship resistance is not guaranteed.

Censorship resistance requires Waku v2 traffic not to be fingerprintable.

Further Security Properties

DoS Resilience

Threats

(Focus on attack-based threats.)

TODO

  • Collusion of a set of (relay) nodes
    • equivalent to an attacker controlling a set of nodes

Attacks against discv5

Waku v2 uses discv5 as a means of ambient peer discovery.
We have to make sure that ambient peer discovery does not leak additional private information about peers.

Attacks against discv5 comprise

  • Sybil attacks (a general problem)
  • eclipse attacks
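
To get a feeling for how Sybil and eclipse attacks interact, here is a toy calculation. It assumes (unlike real discv5, which fills Kademlia-style buckets) that a victim simply picks k distinct peers uniformly at random from n known nodes, of which m are attacker-controlled. The victim is fully eclipsed if all k peers are malicious. The function name and parameters are only for illustration.

```python
from math import comb


def eclipse_probability(n: int, m: int, k: int) -> float:
    """Chance that all k uniformly chosen peers are among the m malicious nodes.

    Toy model only: real discv5 peer selection is not uniform.
    """
    if not (0 <= m <= n and 0 <= k <= n):
        raise ValueError("require 0 <= m <= n and 0 <= k <= n")
    # Hypergeometric probability of drawing k malicious nodes in k draws.
    return comb(m, k) / comb(n, k)


# A Sybil attacker raises m cheaply, which raises the eclipse probability.
for malicious in (200, 500, 900):
    print(malicious, eclipse_probability(n=1000, m=malicious, k=8))
```

The point of the sketch is only the trend: the cheaper it is to create identities, the closer m gets to n, and the more likely a full eclipse becomes.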

Privacy issues in the current version of Waku

  • sender timestamp allows fingerprinting / tracking
  • static PeerID allows tracking and identifying peers

Protection Methods

Sender Anonymity

We could introduce pluggable anonymity in Waku v2, which allows using one of the following mechanisms

  • none
  • Dandelion++
  • mix-net: Nym
  • Tor (as a fallback if mix-net is not accessible)

Because Waku does not require low latency, we can leverage mix-nets like Nym and achieve stronger anonymity guarantees than Tor.
Even with an attacker observing the whole network, mix-nets can protect anonymity.

The pluggable anonymity method could be plugged in between the sender node and the intended receiver peer.
For the relay protocol, the intended receiver would be the first relay node.
All further relay hops would not require any further anonymity methods.

For these mechanisms to work, messages are not allowed to contain identifiers like the PeerID.
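
A minimal sketch of how such a pluggable choice could look, purely for discussion. None of these names exist in any Waku implementation: `AnonymityMode`, `Message`, `select_mode`, and `prepare` are hypothetical, and the `peer_id` field only stands in for "some identifier that must not be sent".

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional, Set


class AnonymityMode(Enum):
    NONE = "none"
    DANDELION = "dandelion++"
    MIXNET = "mix-net"
    TOR = "tor"


@dataclass(frozen=True)
class Message:
    payload: bytes
    peer_id: Optional[str] = None  # hypothetical identifying field


def select_mode(preferred: AnonymityMode, available: Set[AnonymityMode]) -> AnonymityMode:
    """Pick the mechanism to use between sender and first relay."""
    if preferred is AnonymityMode.NONE or preferred in available:
        return preferred
    # Fallback from the list above: Tor if the mix-net is not accessible.
    if preferred is AnonymityMode.MIXNET and AnonymityMode.TOR in available:
        return AnonymityMode.TOR
    # Never downgrade to "none" silently.
    raise RuntimeError(f"{preferred.value} is not available")


def prepare(message: Message, mode: AnonymityMode) -> Message:
    """Strip identifiers whenever an anonymity mechanism is active."""
    if mode is AnonymityMode.NONE:
        return message
    return replace(message, peer_id=None)


mode = select_mode(AnonymityMode.MIXNET, available={AnonymityMode.TOR})
print(mode, prepare(Message(b"hello", peer_id="peer-A"), mode))
```

Raising an error instead of falling back to "none" is a design choice of this sketch: a user who asked for protection should not lose it without noticing.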

Protecting Timestamps

Instead of setting a sender timestamp, the first relay that sees a message could set the timestamp

  • the message might come directly from the sender in case identity protection is off
  • the message might come from a mixnet (with a slight delay) in case protection is on

However, as @s1fr0 pointed out, an attacker could use the timestamp field to encode and transport information along the relay path.
(To protect anonymity, the timestamp field cannot be authenticated.)
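
The following sketch illustrates both points: a relay-side rule that fills in a missing timestamp, and the covert channel mentioned above. It assumes a nanosecond Unix timestamp stored as a plain integer. All function names are made up for this example and are not taken from any Waku implementation.

```python
import time
from typing import Optional


def stamp_at_first_relay(timestamp_ns: Optional[int], now_ns: Optional[int] = None) -> int:
    """Hypothetical relay rule: set the timestamp only if the sender left it empty."""
    if timestamp_ns is not None:
        return timestamp_ns
    return time.time_ns() if now_ns is None else now_ns


def embed_covert_byte(timestamp_ns: int, secret: int) -> int:
    """Overwrite the 8 low-order bits of the timestamp with a secret byte."""
    if not 0 <= secret <= 0xFF:
        raise ValueError("secret must fit into one byte")
    return (timestamp_ns & ~0xFF) | secret


def extract_covert_byte(timestamp_ns: int) -> int:
    """Read the secret byte back from the low-order bits."""
    return timestamp_ns & 0xFF


honest = stamp_at_first_relay(None)
marked = embed_covert_byte(honest, 0x5A)
# The marked timestamp differs from the honest one by less than 256 ns.
print(abs(marked - honest) < 256, hex(extract_covert_byte(marked)))
```

Because the field is not authenticated, later relays cannot tell the marked timestamp from an honest one; the shift is far below normal clock jitter.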

Censorship Resistance

A possible countermeasure against fingerprinting is pluggable transports.

Great to see this coming along!

Although privacy and anonymity are strictly connected, I would keep them as separate concepts. Privacy is about the ability to selectively reveal information (including identity) to some party, while anonymity is the ability to keep your identity (real or artificial) unlinkable from your actions.

It was implicit in many points of your post, but I think the privacy definition should stress (at least to some extent) the reduction of metadata leakage, possibly on all protocol/network layers employed. This is very hard to formalize, and it is probably even harder to measure how safe Waku is from inference of private information using available metadata. However, I think it is worth mentioning as one of the security goals we want to achieve, especially because it justifies some implementation choices we have already made and possibly future designs. So, to sum up informally: privacy should not only allow users to be the only ones who decide to whom they reveal some of their information; ideally, any of their actions should reveal nothing except the fact that someone is performing something in the network (and, in order to address anonymity as well, “someone” should mean: never seen before, never seen again, and with no obvious relation to anyone else (past and future)).

I also think that Censorship Resistance can be hugely impacted by discovery methods: not only because new joiners can be redirected to parallel networks consisting entirely of malicious nodes, where extensive (meta)data collection may help in running targeted attacks, but also because discovery may represent a single point of failure that prevents users from connecting to the network at all (I have in mind, for example, a shutdown of the DNS server used to discover nodes in one of the implemented discovery methods).

Some other possible threats include deliberate backdoors or bugs in the code-base and/or cryptographic primitives/protocol designs employed.

Thanks @ksr for the forum post!
I have some questions and suggestions:
1- There is an initial security analysis of the Waku req/reply protocols, namely store, filter, and light-push, which you can find in their respective specs. For example, for the Waku-Store protocol (13/WAKU2-STORE | Vac RFC), the analysis provides an initial adversarial model, the security considerations, and definitions of the desired security properties. I strongly recommend benefiting from that past effort, especially the security definitions.
2- I see you have mentioned

Waku does not require low latency

why is that?
3- What do you mean by “AS-Level” attacker?
4- As the messages of each req/reply protocol carry different metadata, it would make sense to analyze the protocols separately, as they will be subject to different types of (inference, traffic analysis) attacks.

Another suggestion for keeping the scope of the current analysis small and contained is to follow a more modular approach, i.e., first focus on the security analysis of waku-relay, and later extend the work to the req/reply protocols. I think this was also the initial plan when we wanted to compare Tor with Waku.

Here is the initial waku-relay security analysis research/WAKU2-Relay-Privacy.md at master · vacp2p/research · GitHub

Can you please clarify how you defined 1:1 communication and why is the receiver anonymity out of scope?

@s1fr0 Thank you for your answer!

Although privacy and anonymity are strictly connected, I would keep them as
separate concepts. Privacy is about the ability to selectively reveal
information (including identity) to some party, while anonymity is the ability
to keep your identity (real or artificial) unlinkable from your actions.

Agreed :).

It was implicit in many points of your post, but I think the privacy definition
should stress (at least to some extent) the reduction of metadata leakage,
possibly on all protocol/network layers employed. […]

Agreed. We can use your informal definition in the research log post.
This helps readers quickly grasp the idea.

The threat model I planned would list attacks based on metadata leakage.

I also think that Censorship Resistance can be hugely impacted by discovery methods […]

Yes. One part of the general threat model for Waku v2 should be about ambient peer discovery.
It affects not only censorship resistance but also the aforementioned metadata leakage.
It might leak information about the position of the peer in the network, etc.

@sanaz Thank you for the post :).

1

Yes. I fully agree. This post was more like a dump of my current thoughts.
I will comment on your forum post on the anonymity of Waku-Relay, too.
I also agree that it makes sense to have a first post on the research log that focuses on the relay protocol.
Here, I was aiming at a more general Waku v2 threat model. While focusing on request response protocols for now, I would also include discovery protocols later on.

2

For a messaging protocol, I would assume that our latency requirements are not as strict as for interactive web sessions.
This, in turn, would allow us to leverage mix-nets like Nym and get stronger anonymity properties than Tor.

3

An attacker that controls an entire autonomous system. For instance a malicious ISP.

4

See 1.
A series of research log posts breaking down the threats against each of the protocols, starting with relay, would be nice, imo.
(This is in line with your second comment that I just read ;))
Yet, I still think, a comprehensive threat model (including ambient peer discovery) should be the future goal.
There might be attacks that combine weaknesses in several protocols.

I think this was also the initial plan when we wanted to compare Tor with Waku.

I was just planning to introduce the Tor attacker model, and look at threats vs Waku within this model.
Tor and Waku have different purposes.
Imo, giving an overview of the differences (both technical and from a use-case point of view),
and explaining why Waku can achieve stronger anonymity properties (if we accept higher latency), should be enough.

You could post your existing work here on the research log.
This allows readers to get an overview of Tor and understand the differences.
I would stress that Tor’s main purpose is anonymous web browsing (incl. downloads and interactive sessions), while Waku’s main purpose is messaging. These purposes pose different restrictions on the design, which are reflected in their security/privacy/anonymity models.

For current/future work, I would rather focus on the Waku v2 threat model.

I do not have any specific definition for 1:1 messaging yet.
Only that it would live on top of Waku; receiver anonymity with respect to such a protocol, which sits outside of Waku, would thus be out of scope.
I will edit the post to make this clearer.

Ofc, if we plan to integrate 1:1 messaging as a Waku protocol, it must be part of the threat model.

I am back to working on the first part of the anonymity analysis (tracking issue).
For this, I want to clarify our definition of privacy:

It will comprise the selective reveal of information, and, as
@s1fr0 commented, should also entail

any of their [the user’s] actions should reveal nothing except the fact that someone is performing something in the network

further

and, in order to address anonymity as well, “someone” should mean: never seen before, never seen again, and with no obvious relation to anyone else (past and future).

I really like these as informal, easy-to-grasp definitions.
However, imo,

nothing except the fact that someone is performing something in the network

already contains large parts of anonymity.

I would suggest splitting the concepts to match the following scenario:
When Alice sends a private message to Bob with only privacy and no anonymity active, an observer would see “Alice sends a secure message to Bob”.
The privacy property would make sure that (1) the message is confidential, meaning only Bob can read it.
(2) No further information about Alice or the message is leaked.
We could also tighten that to “Alice sends something to Bob”.

@s1fr0 wdyt?

Yes, I believe anonymity is considerably harder to define than privacy.

(2) No further information about Alice or the message is leaked.

this already partially goes under anonymity: for example, you may leak her communication habits, e.g. message timestamps (maybe even timezone), (padded?) message lengths etc.

In principle, you’re also leaking the fact that a communication between Alice and Bob is happening in the first place! Neither Alice nor Bob ever opted in to revealing this information, if we want to consider the selective reveal of information as part of our privacy definition.

That’s why some privacy solutions send random traffic from any node to some network node, indistinguishable from (real) encrypted communications.

So to not make it too general, I would probably specialize the privacy definition to our main use case: protecting message contents of communications with (randomized) encryption.

Anonymity instead seems to be more appropriate for protecting identity, usage patterns, etc. you mentioned in your first post.

Yes. I agree these concepts are very much intertwined.

(2) No further information about Alice or the message is leaked.

this already partially goes under anonymity: for example, you may leak her communication habits,
e.g. message timestamps (maybe even timezone), (padded?) message lengths etc.

Agreed, but we could “define” anonymity as mainly concerned with identity hiding/protection.
So with the definition I suggested above, protection from leaking the information about
“communication between Alice and Bob is happening” would be part of the anonymity property rather than the privacy property.

Of course, Waku’s goal is to provide both. The separation would just be for the sake of definition.

We could also go back to my initial thought and just define anonymity as a part of privacy.

So to not make it too general, I would probably specialize the privacy definition to our main use case:
protecting message contents of communications with (randomized) encryption.

I would also like to distinguish Privacy from the confidentiality aspect of security.
Imo, part of the meta-data protection should be part of privacy, too.
I’ll ask for your review regarding this on the article :).

Wdyt @s1fr0?

Here is a Waku relay Privacy/Anonymity FAQ HackMD. Feel free to comment here. This is WIP and non-comprehensive, but already contains quite a bit of info. It might be turned into a more concise and presentable format in the future.
I’ll address feedback for the HackMD, but the focus (with respect to privacy/anonymity) will shift to a research log post on relay anonymity.
(The research log post will build on parts of the FAQ document.)

Github issue: Waku v2 privacy/anonymity roadmap

It is actually not that unrealistic, I believe: if a project decides to use a unique pubsub topic, it may control all the nodes, at least at first, until others join (or it could restrict discovery mechanisms).
It is an interesting scenario to review. If a project claims that it uses Waku and that its product is therefore censorship-resistant, does not collect metadata, etc., it would be interesting to know what guarantees Waku actually gives when one party controls all the nodes.