Speeding up RLN proofs
TL;DR version: We propose a technique to speed up RLN proof generation by roughly 3x. The same idea should also apply to similar protocols, eg. Semaphore.
Introduction and context
Rate Limiting Nullifiers (RLN for short) is a rather cute protocol originally invented by folks around Ethereum back in 2019 as a modification of Semaphore, for the purpose of handling spam protection in an anonymous setting. Vac research embraced it very early and developed it further, and it is used in Waku for spam protection. Later Ethereum’s PSE released a production-ready implementation (see the above link).
In short, the core dilemma is: How to detect spammers in a messaging network without linking messages to sender identities?
RLN offers a solution for this with a tricky construction, which ensures that identities are revealed only when the same (previously anonymous) sender tries to send more than N messages in a given time period (or epoch). Here N is some fixed rate limit, tied to the sender’s identity and the particular messaging network; for example it could be N = 10 (or any other number). This behaviour is enforced by ZK proofs.
Side remark: Of course, as this is not magic after all, it’s still possible to spam or DoS one’s direct peers (the very first contact in the network) with invalid proofs or even malformed packets. However, at least those packets won’t be forwarded further in the network (and you could also block the sender’s IP). It’s also possible to create many sybil identities and spam using those, while keeping every single one below the rate limit. Because of this, creating identities should have a cost.
How RLN works in short
People wanting to participate in the network must first register in some central registry with a pseudonymous identity (however, this identity won’t be linked to messages sent). The identity consists of a secret key and a corresponding public key, which is simply a hash of the secret key (it can also include the limit N to make that user-specific).
This registration database is encoded as a Merkle tree, with the users’ public keys as the leaves. Hence a user can prove that they are a member, by producing a Merkle inclusion proof. Unfortunately, to do that, one has to reveal their secret key - otherwise a cheater could just use some other person’s identity (public key). RLN solves this by packaging this Merkle inclusion proof inside a zero knowledge proof.
This ZK proof also outputs a random-looking “nullifier”, specifically designed so that:
- it’s not linked to the sender’s identity (public key)
- in a given time period, a single sender can produce at most N different nullifiers - if a nullifier is reused, the sender’s identity will be unmasked (and thus the sender can be slashed for misbehaviour)
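To make the second property more concrete, here is a toy sketch of the classic slashing mechanism for the simplest case N = 1, where each message publishes one point of a Shamir-style secret sharing of the key. The hash function, the prime field and the encodings below are illustrative stand-ins of mine, not the actual RLN circuit or parameters:

```python
# Toy sketch of RLN-style slashing for N = 1 (Shamir-style secret sharing).
# The hash, the prime and the encodings are stand-ins, not the real parameters.
import hashlib

P = 2**255 - 19   # stand-in prime field

def H(*args) -> int:
    data = b"|".join(str(a).encode() for a in args)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % P

def share(sk: int, epoch: str, message: str):
    """Each message publishes one point (x, y) on the line y = sk + a1*x."""
    a1 = H(sk, epoch)          # per-epoch slope, derived from the secret key
    x = H(message)
    y = (sk + a1 * x) % P
    nullifier = H(a1)          # identical for every message of this sender in this epoch
    return x, y, nullifier

def recover_sk(p1, p2) -> int:
    """Two distinct points on the same line reveal its y-intercept, i.e. the secret key."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    return (y1 - slope * x1) % P

sk = 123456789
x1, y1, n1 = share(sk, "epoch-42", "first message")
x2, y2, n2 = share(sk, "epoch-42", "second message")
assert n1 == n2                                # the reused nullifier flags the double-send
assert recover_sk((x1, y1), (x2, y2)) == sk    # ...and exposes the secret key
```

Two messages in the same epoch give two points on the same line, which is enough to interpolate its y-intercept (the secret key) and slash the sender.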
ZK proof generation for RLN
The original version of RLN uses the Groth16 proof system. While there are some alternative proposals (eg. KZG-RLN), they all come with some disadvantages.
Groth16 is probably the most well-known, and also one of the most widely used ZK proof protocols out there, which probably influenced this choice; but to be honest, there aren’t really any good alternatives here.
Groth16 has some good and some bad properties, but for RLN, the important ones are:
- very short proofs (down to 128 bytes, excluding public inputs and outputs)
- proper statistical zero-knowledge (this is important because there are secret keys inside the proofs, reused many times; so even a tiny information leakage could be a problem)
These days there are much faster, more modern proof protocols out there, but usually they satisfy neither of these two properties. While in theory one could post-compose any of those with a Groth16 “wrapper”, so you get the best of both, the overhead of doing that is several orders of magnitude larger than the actual RLN proof generation itself, so that won’t work here either.
The conclusion is that in the RLN context, we don’t really have any other choice than Groth16 (if you think you have a good alternative, please tell me!).
ZK proofs 101
A zero-knowledge proof can convince you that a given computation (program, circuit) gives a given result for some inputs, while keeping parts of the input secret. If that sounds magical, that’s because it indeed is.
For example, in the RLN context, this computation checks the Merkle inclusion proof (among other things). To do that, we need both the secret key and the Merkle path as inputs; but the ZK proof won’t reveal those, only that the Merkle proof matches the public Merkle root of the registry.
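For intuition, here is a minimal sketch of the Merkle check the circuit performs, written as ordinary code. The hash function and path encoding are stand-ins of mine (the real circuit uses a ZK-friendly hash such as Poseidon over a prime field); inside the ZK proof the secret key and the path are private inputs, and only the root is public:

```python
# Toy sketch of the Merkle membership check done inside the circuit.
# SHA-256 is a stand-in; the real circuit uses a ZK-friendly hash (e.g. Poseidon).
import hashlib

def H(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()

def verify_membership(secret_key: bytes, path: list[tuple[bytes, int]], root: bytes) -> bool:
    """Recompute the root from the leaf hash(secret_key) and the sibling path.
    Each path entry is (sibling, direction); direction = 0 means the current
    node is the left child at that level."""
    node = hashlib.sha256(secret_key).digest()   # public key = hash of the secret key
    for sibling, direction in path:
        node = H(node, sibling) if direction == 0 else H(sibling, node)
    return node == root
```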
Generating a ZK proof always happens in two steps:
- first we simply execute the computation, while logging all intermediate results (this log is called the “witness” or the “trace”);
- second, we use various tricks from mathematics and cryptography to prove that this witness sequence corresponds to a valid execution of our program.
The first step is usually called “witness generation”, and the second “proof generation”. The second step is usually orders of magnitude (eg. 100x) slower, and thus is the main bottleneck. Furthermore, the size of this witness more-or-less determines the time required to create the proof.
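As a toy illustration (this is a made-up example, not RLN’s circuit), “witness generation” for the tiny program out = x^3 + x + 5 simply means evaluating it while logging every intermediate value:

```python
def toy_witness(x: int, p: int = 2**61 - 1) -> list[int]:
    """Witness generation for the toy program out = x**3 + x + 5 over a
    stand-in prime field: evaluate it and log every intermediate value."""
    t1 = (x * x) % p        # x^2
    t2 = (t1 * x) % p       # x^3
    t3 = (t2 + x) % p       # x^3 + x
    out = (t3 + 5) % p
    return [1, x, t1, t2, t3, out]   # by convention the constant 1 comes first
```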
The speedup trick
In the RLN case, the witness consists of about 5500 numbers (finite field elements).
The key observation here is that the bulk of that witness, about 5000 field elements (more than 90%), is spent on the Merkle inclusion proof, which rarely changes: it needs to be updated only when the Merkle root changes, which only happens when a new user registers (or an old one deregisters).
As a user normally sends more than one message in their lifetime, we expect many more proofs to be generated than such changes to occur. This is especially true in particular applications like DoS protection for a mixnet, where we expect very large, continuous traffic.
This means that we can precompute the unchanging part of the witness; but much more importantly, we are lucky that Groth16 works in a way that we can also precompute part of the proof (which is where most of the time is spent).
This simple observation is what leads to the roughly 3x speedup.
Groth16 proof generation
How and why Groth16 proofs work is out of scope here (it’s rather involved…), but we can still look at the actual algorithm, which is not that complicated - one could implement it without understanding why it works.
It turns out that the majority of time is spent on an operation called “multi-scalar multiplication” (MSM), and the second most expensive operation is fast Fourier transform (FFT). These together are responsible for 99% of the proof generation time, with MSM being the bulk.
From a bird’s-eye point of view, MSM is just a fancy name for a sum much like a scalar product:

$$\mathrm{MSM}(x, G) = \sum_{j=0}^{M} x_j * G_j$$

Here the x_j \in\mathbb{F} coefficients are field elements, G_j\in\mathbb{G} are elliptic curve points (group elements), and * is elliptic curve scalar multiplication.
In Groth16, the curve points are constant (they are determined by the circuit during the so-called trusted setup), and the field elements depend on the witness (thus, at the end of the day, on the inputs).
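For concreteness, here is the naive way such a sum is computed, with integers modulo a prime standing in for the curve points (a simplification of mine; real provers work with actual curve points and use faster algorithms such as Pippenger’s bucket method):

```python
MOD = 2**61 - 1   # stand-in modulus; the real sum runs over an elliptic curve group

def msm(coeffs: list[int], points: list[int]) -> int:
    """Naive multi-scalar multiplication: sum of coeffs[j] * points[j]."""
    acc = 0
    for x, g in zip(coeffs, points):
        acc = (acc + x * g) % MOD   # x * g stands in for scalar multiplication x_j * G_j
    return acc
```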
Even better, for 4 out of the 5 MSMs to be calculated during the proof generation, the coefficients x_j are in fact the elements of the witness themselves! Since most of those change very infrequently, we can simply split the sum into two, and precompute the unchanging part:

$$\sum_{j=0}^{M} x_j * G_j = \underbrace{\sum_{j\in\mathcal{F}} x_j * G_j}_{=:\,P} + \sum_{j\notin\mathcal{F}} x_j * G_j$$

where \mathcal{F}\subset [0\dots M] denotes the set of indices of the fixed elements. So the only thing to do is to precompute the first sum P once, and during the (much more frequent) proof generation, compute only the remaining part (recall that that’s less than 10% of the sum!).
And that’s basically the whole idea. Unfortunately both the last MSM and the FFTs seem impossible to precompute; but this already gives a nice boost.
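Below is a minimal, self-contained sketch of the split itself, again with integers modulo a prime standing in for the proving-key points; all names are made up for illustration and do not correspond to Zerokit’s API:

```python
import random

MOD = 2**61 - 1   # same integer stand-in for the group as above

def msm(coeffs, points):
    """Naive stand-in MSM: sum of coeffs[j] * points[j] (mod MOD)."""
    return sum(x * g for x, g in zip(coeffs, points)) % MOD

def precompute_fixed_part(witness, setup_points, fixed_idx):
    """Done only when the Merkle root changes: P = sum over the fixed indices."""
    return msm([witness[j] for j in fixed_idx], [setup_points[j] for j in fixed_idx])

def msm_with_precompute(pre, witness, setup_points, fixed_idx):
    """Done per proof: add the precomputed P to the sum over the few varying terms."""
    varying = [j for j in range(len(witness)) if j not in fixed_idx]
    return (pre + msm([witness[j] for j in varying],
                      [setup_points[j] for j in varying])) % MOD

# Tiny usage example with made-up sizes (in RLN roughly 90% of the indices are fixed):
random.seed(1)
setup_points = [random.randrange(MOD) for _ in range(10)]   # fixed by the trusted setup
witness      = [random.randrange(MOD) for _ in range(10)]
fixed_idx    = set(range(9))                                # pretend the first 9 entries are fixed

pre = precompute_fixed_part(witness, setup_points, fixed_idx)
assert msm_with_precompute(pre, witness, setup_points, fixed_idx) == msm(witness, setup_points)
```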
Furthermore, it’s also possible to split the witness generation into a precomputed part (that’s in fact necessary for the above to work!) and the remaining part; but as witness generation takes less than 1% of the total time, that won’t result in a significant improvement.
Implementations
A proof-of-concept of the above is already implemented using the Nim language (chosen simply because most of the required components were already there), and incorporating it into Zerokit is in progress as I write this.