Introducing the Mix Protocol: Enhancing Privacy Across libp2p Networks

fryorcraken · January 20, 2025, 12:31am

Sorry, I still don’t understand what “replacement” means here. Can you describe the behaviour with and without replacement, or provide a reference, please?

fryorcraken · January 20, 2025, 12:32am

Yes, so we agree that the spam protection mentioned here is out of scope of mixnet, as it does not protect the mixnet.

The question is then, how is the mixnet protected?

akshaya · January 21, 2025, 7:28pm

Our PoC encodes full multiaddresses (including peerID) for each hop in the Sphinx packet, eliminating the need for mid-transmission discovery (as mentioned in (6) above).

To enhance reliability without significantly compromising anonymity, we plan to send messages over 3-4 redundant paths. Finding the optimal number of paths is a work in progress. As noted earlier, for complex use cases such as GossipSub, anonymizing only the first hop may suffice, potentially reducing the overall overhead.

akshaya · January 21, 2025, 7:30pm

Random selection without replacement means that once a node is selected, it cannot be selected again for the same path. That is, after a node is selected, it’s removed from the pool of available nodes for that specific path. This process continues until the required path length is reached. This ensures each node in the path is unique.

This approach prevents any single node from appearing multiple times in a path, which is crucial for mitigating traffic correlation attacks.

In contrast, random selection with replacement would allow the same node to be potentially selected multiple times for a single path.
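As a minimal sketch of the difference (not the PoC code; the function names and the node labels are made up for illustration), the two selection modes could look like this in Python:

```python
import secrets

# Cryptographically secure RNG from the standard library.
_rng = secrets.SystemRandom()


def select_path_without_replacement(pool, path_length):
    """Pick `path_length` distinct nodes; each node appears at most once."""
    nodes = list(pool)
    if path_length > len(nodes):
        raise ValueError("pool is too small for a path of unique nodes")
    # sample() draws without replacement, so the result has no duplicates
    # as long as the pool itself has none.
    return _rng.sample(nodes, path_length)


def select_path_with_replacement(pool, path_length):
    """Pick `path_length` nodes independently; repeats are possible."""
    nodes = list(pool)
    return [_rng.choice(nodes) for _ in range(path_length)]


if __name__ == "__main__":
    pool = ["nodeA", "nodeB", "nodeC", "nodeD", "nodeE"]  # hypothetical peers
    print(select_path_without_replacement(pool, 3))
    print(select_path_with_replacement(pool, 3))
```

With replacement, a path such as nodeA, nodeC, nodeA is possible; without replacement it is not.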

akshaya · January 21, 2025, 7:39pm

The Sphinx packet format provides strong protection against traffic analysis, offering unlinkability and resistance to tagging attacks. Its per-hop integrity checks effectively prevent malformed or spoofed packets from propagating, reducing the risk of such attacks on downstream nodes. Random delays and dummy traffic, while pluggable, significantly enhance protection against timing-based attacks and traffic correlation.

These mechanisms obscure traffic patterns, making targeted attacks harder. However, you’re right that the mixnet doesn’t inherently defend against Sybil or large-scale DoS attacks.

Exploring complementary protection mechanisms like proof-of-stake, rate-limiting, or reputation systems could add value—though they come with trade-offs, like increased complexity or reliance on blockchain infrastructure.

ksr · January 22, 2025, 3:34pm

Thank you for the discussion.
This is a good example of working towards finding consensus on an RFC, aligning with our goals for RFC culture to foster more consensus-finding discussions similar to the IETF. cc @Phil

The RFC is currently in its preliminary stage, serving as a starting point for discussion rather than a finalized proposal.
Our goal is to collaboratively refine the ideas presented, seeking consensus to develop the most practical and effective solution.

Here are my thoughts on the various aspects of the libp2p-mix protocol:

Protocol vs Transport

Added in a follow-up post (see below).

Routing

The Sphinx packet should contain complete routing information, including the full multiaddress (potentially allowing several multiaddresses, up to a limit).
We can modify the Sphinx format to accommodate this.
While mid-transmission discovery might have desirable properties in certain niche situations, it’s worth mentioning in the RFC but not exploring further at this stage.

For architectural consistency within libp2p, the inner-protocol endpoint should remain separate from the mix exit node.
This separation aligns better with libp2p’s modular design principles and allows for greater flexibility in protocol implementation and network topology (it requires thorough anonymity analysis though).

Discovery

Discovery should indeed be separate from the mix protocol definition.
We can move the discovery section to the appendix as an example implementation.
The RFC should focus on defining the interface to the discovery service, explaining what information about peers the discovery service has to deliver.
This approach should remain agnostic to ENRs, with ENR implementation being just one possible method.
For the proof of concept, it’s reasonable to leave discovery out of scope initially.

Ideally, we should aim for a discovery system allowing random sampling from the full peer set, with Discv5 being a close approximation.
Efficient capability discovery, a topic we previously considered researching, could be revisited in Vac ACZ.
This is crucial for mitigating the issue of non-functional peers.
My suggestion is to extend libp2p-kaddht with efficient capability discovery, replacing ENRs and the dependency on a libp2p-external discovery service.

Message Pushing

To enhance resilience against non-functional peers, messages pushed through gossipsub or lightpush should be transmitted via multiple diverse paths, similar to the approach used in tor-push.
This can be added as a recommendation in the RFC, though not as part of the core libp2p-mix specification.
Nodes implementing mix SHOULD follow this approach for message push protocols.
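A rough sketch of what picking several diverse paths could look like. It reads "diverse" as node-disjoint, which is an assumption and not something the RFC states; the function name and parameters are hypothetical:

```python
import secrets


def select_disjoint_paths(pool, num_paths, path_length):
    """Pick `num_paths` paths of `path_length` nodes each, sharing no node.

    Assumption: "diverse" means node-disjoint. A real implementation might
    use a weaker notion of diversity.
    """
    nodes = list(pool)
    needed = num_paths * path_length
    if needed > len(nodes):
        raise ValueError("pool is too small for that many disjoint paths")
    # One draw without replacement, then cut the result into equal chunks.
    picked = secrets.SystemRandom().sample(nodes, needed)
    return [
        picked[i * path_length:(i + 1) * path_length]
        for i in range(num_paths)
    ]


if __name__ == "__main__":
    pool = [f"node{i}" for i in range(20)]  # hypothetical mix nodes
    for path in select_disjoint_paths(pool, num_paths=3, path_length=3):
        print(path)
```

Requiring fully disjoint paths needs at least num_paths * path_length known mix nodes, so small networks may have to relax the constraint.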

Spam Protection

Spam protection is not part of the core mix protocol.
libp2p-mix only defines how mix nodes wrap and unwrap packets, serving as one building block of a broader mix architecture.

Of course, spam protection is a crucial aspect of an architecture using the libp2p-mix protocol in practice and will be addressed as part of future work.
For now, the raw RFC includes a simple PoW mechanism in the appendix as an example. We could remove this in future more mature versions of the RFC.
After establishing a running testnet with the core libp2p-mix protocol,
we will prioritize either discovery or spam protection based on feedback and practical needs (in case nothing even more pressing comes up).
Spam protection will be designed as a pluggable component and most likely defined in a separate document to maintain modularity and flexibility.
This approach allows for adapting spam protection mechanisms to different use cases without overcomplicating the core protocol.

We could explore combining RLN with mix for Waku.
The mix protocol would allow decoupling network parameters from the RLN identity, which could offer desirable privacy properties.

fryorcraken · January 24, 2025, 5:19am

Capturing thoughts on spam protection as I dedicated some brain cells to it.

One attack to prevent is an excessive L: consuming network resources by forcing one message to be mixed by a large number of nodes.
As previously mentioned, RLN Relay applied on exit (or even applied on entry) would not protect against this attack.

It may be interesting to consider using RLN to limit the number of unwraps, instead of messages per epoch.

To prevent cyclic paths, so that at worst a message goes through every node in the mixnet but only once, a rate limit of 1 per epoch could be interesting.
The epoch may need to be large enough so that it does not renew by the time the message loops around.

This would also force the user to use a different node for each message sent within the epoch.

akshaya · January 24, 2025, 3:48pm

One attack to prevent is an excessive L: consuming network resources by forcing one message to be mixed by a large number of nodes.

Good point! The Sphinx packet provides some protection here. The packet size is determined by the maximum path length r, which limits the number of hops (L) to at most r. For most real-time use cases, r = 5 should be sufficient, preventing a loop from exceeding 5 hops. We could even set L = r = 3 to strike a balance between efficiency and good anonymity protection.

It may be interesting to consider using RLN to limit the number of unwraps, instead of messages per epoch.

This is an interesting idea. Mix nodes can’t distinguish between packets, so they wouldn’t be able to tell if a packet is being unwrapped for the second time. We’d need to look closely at RLN to see if it could help limit the number of unwraps per epoch.

This would also force the user to use a different node for each message sent within the epoch.

If a mix node could figure out whether the same user is behind two messages (in an epoch), it could lead to unwanted correlation attacks. Additionally, restricting node usage across paths could limit the available paths, reducing overall usability of the system.

ksr · February 12, 2025, 11:53am

Protocol vs Transport

The libp2p-mix functionality should be designed as a protocol, not as a transport.
A transport might seem easier to implement, but it is essential to avoid tailoring the RFC too closely to the Nim implementation and instead ensure it adheres to libp2p’s specification and overall architecture.
In the current libp2p design, a negotiated transport is used for all communication between two peers. However, we want to be able to use mix selectively. The selectivity is not just on the protocol level, but even on the level of specific messages within a given protocol.

Rather than introducing new abstractions, I’d “mixify” existing protocol endpoints.
For example, in gossipsub, modifying the protocol endpoint implementation would allow selective message routing through mix.
While challenging due to multiple message-sending points in gossipsub, this approach avoids significant architectural changes.
API users of protocols like gossipsub could enable mix by setting a simple flag during instantiation (default: no mix).
We could also allow for more fine-grained control through the endpoint API (e.g. per message, message filter, etc.).
For a first version, we could send all messages a node originates through mix (making sure we send each message to D peers via D paths; see tor-push), while all relaying is done as usual.

The mix protocol itself would focus solely on relaying Sphinx packets, and wrapping and unwrapping them.
The endpoint components (entry and exit) can be viewed as a distinct layer (analogous to IP and TCP):

Entry: Handles mixification of protocol endpoints.
Exit: Dispatches packets to tunneled protocol endpoints.
(We will structure the RFC better to reflect this.)

Additional components, such as spam protection and discovery mechanisms, will be implemented separately.

A mix transport could be feasible if libp2p supported per-protocol transports.
This would enable mixification without altering protocol implementations and might make sense for protocols where all messages should traverse mix.
While this is not currently supported, it could be explored in the future.

In Vac/ACZ, we will focus on implementing and testing libp2p-mix within a gossipsub testnet. Following this, we will proceed with its integration into Waku.

ksr · February 12, 2025, 11:55am

Adding to this, endless routing can be avoided by implementing strict constraints on packet structure.
This approach involves:

fixing the length of each routing information record
fixing the maximum header size (containing the routing records, thus fixing the total number of records)
fixing the payload size

While this strategy necessitates some padding, it provides a straightforward defense against endless routing attacks.
Consider a network designed for 3-hop routing where a malicious client attempts to send a packet with 4 routing hops:

If the malicious peer includes four routing records plus a full-sized payload,
the first mix node would reject the packet as its total size would exceed the maximum allowed.

If the attacker manages to get the packet to the third mix node,
this node would identify that the payload is too small, as it would contain an additional routing information record instead of the expected final payload.

(Note: This is already part of the Sphinx packet format and already in our design.)
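To make the 3-hop example concrete, here is a simplified sketch of the two size checks. The byte sizes and names are made up for illustration and are not the values from the RFC. Real Sphinx processing also involves per-hop MACs and header padding, which this model ignores:

```python
# Hypothetical sizes, chosen only for illustration.
RECORD_SIZE = 64       # bytes per routing information record
MAX_HOPS = 3           # network is designed for 3-hop routing
PAYLOAD_SIZE = 1024    # fixed payload size in bytes

HEADER_SIZE = MAX_HOPS * RECORD_SIZE
PACKET_SIZE = HEADER_SIZE + PAYLOAD_SIZE


def check_packet(num_records: int, payload_len: int) -> str:
    """Model the size checks for a packet with the given layout."""
    total = num_records * RECORD_SIZE + payload_len
    # Check made by every mix node, including the first: fixed total size.
    if total != PACKET_SIZE:
        return "rejected: wrong packet size"
    # Check made once MAX_HOPS records are consumed: the remaining bytes
    # must be a full-sized payload, not payload plus an extra record.
    if payload_len != PAYLOAD_SIZE:
        return "rejected: payload too small"
    return "accepted"


if __name__ == "__main__":
    print(check_packet(3, PAYLOAD_SIZE))                # honest 3-hop packet
    print(check_packet(4, PAYLOAD_SIZE))                # 4 records, full payload
    print(check_packet(4, PAYLOAD_SIZE - RECORD_SIZE))  # 4 records, shrunk payload
```

The second call models the first case above (packet too large), and the third call models the second case (the extra record eats into the payload).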