YouTube recording TBD
This is a transcript of the discussion after the call. Feel free to continue the discussion in this topic.
- SERPs
Q: What do these SERPs actually look like? Are they sort of pseudo addresses that you respond to, and you just correlate them internally?
A: It’s going to be a node and an encrypted packet that you can attach the message to. Once you attach the message to the SERP, you send it to the first hop, and after that the processing looks exactly like a Sphinx packet. The first node receives it, it looks like a Sphinx packet, and the node is oblivious to the fact that it’s a response and not a forward message. It does the usual processing: remove a layer and forward it to the next mix node. The way this is done is that the sender of the message chooses random nodes for the return path, comes up with this SERP, and sends it along with the message. But these SERPs can be used exactly once.
Q: Okay, so in a sense the SERP would contain the return path, do I understand that correctly?
A: Yes.
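For illustration, here is a minimal sketch (in Python, with made-up names such as `MixNode`, `wrap_layer`, and `build_reply_block`; the real Sphinx construction is more involved) of how a sender might pre-build such a single-use reply block:

```python
import os
import random
from dataclasses import dataclass

@dataclass
class MixNode:
    address: str
    public_key: bytes

def wrap_layer(node_key: bytes, inner: bytes) -> bytes:
    # Placeholder for one layer of Sphinx header encryption.
    return node_key + inner

def build_reply_block(mix_nodes, my_address: bytes, num_hops: int = 3):
    """The sender picks a random return path and pre-computes the layered header."""
    return_path = random.sample(mix_nodes, num_hops)   # random nodes for the return path
    reply_id = os.urandom(16)                          # lets the sender match the reply later
    header = my_address                                # innermost routing info: deliver to me
    for node in reversed(return_path):
        header = wrap_layer(node.public_key, header)   # one layer per hop, innermost first
    return return_path[0], header, reply_id

# The recipient attaches its payload to the pre-built header and sends the result to
# the first hop; from there it is processed like any other forward Sphinx packet, and
# each reply block is used exactly once.
```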
Q: Then the other thing that you mentioned was that there’s a drawback of having the exit node not be the destination node as well: you would have the final message in plain text, right? Is this completely inevitable? I assume you would need to know what protocol stream to create to the destination node, but I assume we could introduce another layer of encryption there.
A: Yeah. Initially we were talking about Noise encryption. But these encryptions would be end to end, between the sender and the destination; they would need to actually exchange a key and then do this encryption for us to have another layer of encryption. That’s not straightforward, because with the mix protocol there is no multi-stream negotiation, so they wouldn’t be able to negotiate keys beforehand to encrypt this last hop. That’s something we still need to figure out. If there was a way for the sender and the destination to exchange keys beforehand, they could use them to encrypt the message and send the encrypted message via the mix.
A: Maybe just adding to this: it’s plain text only from the mix’s view, and for specific origin protocols it might be fine. Let’s say the sender and the receiver already have a pairing; they could derive keys from that. But again, that depends on the protocol that’s actually tunneled via mix. So it’s not a restriction of the mix at all, it’s just what the mix itself sees. The general case, where sender and receiver have no pairing, is more difficult and part of future research.
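As a hedged sketch of the pairing idea mentioned above (assuming the sender and receiver already share some secret, which the mix protocol itself does not provide), the payload could be encrypted end to end before it enters the mix, so the exit node only ever forwards ciphertext:

```python
import os
import hashlib
from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305

def derive_key(pairing_secret: bytes) -> bytes:
    # Simple domain-separated derivation for the sketch; a real design would use HKDF.
    return hashlib.sha256(b"mix-e2e-v1" + pairing_secret).digest()

def encrypt_for_destination(pairing_secret: bytes, payload: bytes) -> bytes:
    key = derive_key(pairing_secret)
    nonce = os.urandom(12)
    ciphertext = ChaCha20Poly1305(key).encrypt(nonce, payload, None)
    return nonce + ciphertext          # hand this to the mix instead of the plaintext

def decrypt_from_sender(pairing_secret: bytes, blob: bytes) -> bytes:
    key = derive_key(pairing_secret)
    nonce, ciphertext = blob[:12], blob[12:]
    return ChaCha20Poly1305(key).decrypt(nonce, ciphertext, None)
```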
- Optimizations
Q: In terms of optimizing this, is there a minimum number of mixing hops that we go through, or is it configurable, so that I can say I need some optimization and work with a reduced mixing count?
A: We could definitely work with a reduced number of hops, because the Sphinx packet has a specific structure. We have to decide beforehand what the maximum path length is that we would tolerate, and at any point in time the minimum path length would be three, because only with three nodes are we able to provide sufficient anonymity. Between three and the maximum path length it’s configurable; the sender can choose any number.
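A small sketch of that configurable path selection, assuming illustrative constants (`MIN_HOPS = 3` as stated above, `MAX_HOPS = 5` chosen arbitrarily here):

```python
import random

MIN_HOPS = 3   # below three hops the anonymity argument above no longer holds
MAX_HOPS = 5   # illustrative maximum; the packet format must be sized for it beforehand

def choose_path(known_mix_nodes, num_hops: int = MIN_HOPS):
    """Pick a mix path of configurable length between MIN_HOPS and MAX_HOPS."""
    if not MIN_HOPS <= num_hops <= MAX_HOPS:
        raise ValueError(f"path length must be between {MIN_HOPS} and {MAX_HOPS}")
    # Sample uniformly at random without replacement from the known mix nodes.
    return random.sample(known_mix_nodes, num_hops)
```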
Q: One of the things we’re looking at at the moment is the neutrality stance on the Nomos side and trying to articulate that. Is there optionality for mix nets to have almost pseudo levels of mixing so you can actually flip it? Is it an easy selection to remove the layers, or is it just an on-off switch?
A: You can make it on or off per message; that’s how we are integrating it. But it depends on the entry layer and the exit layer that you configure for each protocol. In the entry and exit layer, ideally you would have a separate send function for the mixed protocol, and there you could have just a flag which says on or off, and based on that the messages are routed through the mix network or not.
A: Maybe just adding to this from the origin protocol’s point of view: it can be selective. It has to be implemented in the entry and exit for that protocol, but for gossipsub, say, only the first hop would go over mix, and we can even do it for select messages within gossipsub but not for the relayed parts. So from the point of view of the protocol that’s tunneled through mix it can be selective, and the user of the API that gossipsub offers to a client node doesn’t have to know: they would just use gossipsub, and the messages that originate locally would go through mix, the others not.
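A hypothetical entry-layer wrapper illustrating the per-message on/off flag described above (the names and the actual libp2p integration points are assumptions, not the real API):

```python
class EntryLayer:
    """Routes a message either through the mix network or over the regular transport."""

    def __init__(self, mix_instance, regular_transport):
        self.mix = mix_instance
        self.transport = regular_transport

    def send(self, peer, message: bytes, anonymize: bool = False):
        if anonymize:
            # Route through the mix network, e.g. only for messages that
            # originate locally, not for relayed gossipsub traffic.
            self.mix.send(peer, message)
        else:
            # Regular direct routing, bypassing the mix entirely.
            self.transport.send(peer, message)
```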
Q: The use case for this is, let’s say you’re using your network for micropayments and you want it to be really fast, but in another situation you want the ultimate obfuscation and security. You want to be able to optimize. That’s what I’m thinking about.
A: Yeah, you can do that. You can also expose this to the client API. Yeah, that works as well.
- Traffic classification
Q: I saw that there’s a constant-size payload, in order to hide the data from traffic classification. I’m curious what kind of factors might go into choosing the optimal size for that, assuming that I was correct in understanding that it’s a constant-size payload, not a header.
A: It’s a constant-size payload. We actually do padding, but it depends on how much padding you want to add and how small the messages are that the application needs to send; based on that you could choose the right packet size.
A: The constant size is also an important factor in preventing specific denial-of-service attacks. Shai mentioned that in a talk with me, so just to repeat it: with this you can prevent malicious packets that would just loop through the network basically infinitely. If you say you have a hop count of three, then you enforce a header size that exactly fits three forward addresses and you enforce a certain body size. An intermediate node can then check that a packet is within these constraints and will drop the message if the body would have to carry further hop information. To get this very simple denial-of-service protection, or mitigation rather, you have to have a fixed size spec.
Q: Sounds like that’s about the header being a fixed size, if you want to limit the number of addresses or hops that can go through there. Or are we still talking about a fixed payload size as well?
A: Even the body has to be fixed for that.
A: The entire structure is actually fixed. You have a specific alpha size, a beta size, a gamma size, and a delta size. So everything is fixed and overall it’s constant size.
A: If you didn’t fix the body as well, I mean the full payload, you could basically put further information into it. The intermediate hop doesn’t know that the header has already been processed; it just gets a message and thinks, okay, the first few bytes are still part of the header. But the payload has to have a specific size, so the node can check: okay, this is too small now, it has hopped too many times, I will drop it.
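A minimal sketch of the fixed-size padding and the length check an intermediate node could perform (all sizes here are invented for illustration; the real Sphinx alpha/beta/gamma/delta sizes differ):

```python
HEADER_SIZE = 3 * 64   # room for exactly three hops' routing information (invented size)
BODY_SIZE = 2048       # fixed payload size, padding included (invented size)
PACKET_SIZE = HEADER_SIZE + BODY_SIZE

def pad_body(message: bytes) -> bytes:
    """Pad a message up to the fixed body size, with a length prefix for unpadding."""
    if len(message) > BODY_SIZE - 2:
        raise ValueError("message too large for the fixed body size")
    length_prefix = len(message).to_bytes(2, "big")
    return length_prefix + message + b"\x00" * (BODY_SIZE - 2 - len(message))

def packet_is_valid(packet: bytes) -> bool:
    # An intermediate node only needs this length check to reject packets that
    # try to carry more hop information than the fixed format allows.
    return len(packet) == PACKET_SIZE
```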
- Packet size
Q: My question is also related to the packet size. For example, the public Waku network has, I think, defined a max packet size of 150 kilobytes, but what we have noticed is that different applications building on top generally tend to use packet sizes that fall in different ranges. We could have applications that may just use 1 KB packet sizes, say simple chat apps which are using just text-based messaging, but there could be apps that may be using, say, 50 KB packet sizes. How would you recommend we define what the mix packet size should be in this case?
A: This is actually something we would need to explore a lot more, but one thing I wanted to point out is that we cannot have different packet sizes, because that itself would reveal which sort of applications are behind them. If we were to say that packets smaller than 50 KB are padded up to 100 KB, then we would know that all these 100 KB packets belong to applications whose message size is less than 50 KB. So we would indirectly be reducing the anonymity set, depending on the application. If that was an acceptable trade-off, because you would be using less bandwidth, we could go with it; otherwise just using one big packet size, with everything padded to it, would be the best approach from an anonymity point of view.
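For a rough feel of the bandwidth trade-off mentioned above, assuming an illustrative single fixed packet size of 50 KB:

```python
FIXED_SIZE_KB = 50   # illustrative single fixed packet size

def padding_overhead(message_kb: float) -> float:
    """Fraction of each transmitted packet that is padding."""
    return (FIXED_SIZE_KB - message_kb) / FIXED_SIZE_KB

print(padding_overhead(1))    # 0.98: a 1 KB chat message is 98% padding
print(padding_overhead(50))   # 0.0: a 50 KB message needs no padding
```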
- Dummy traffic
Q: Are you actually using dummy messages in this protocol?
A: Currently we do not have any cover traffic, but like I mentioned, cover traffic is pluggable. Depending on your application you could decide to have cover traffic, and also choose the frequency of the cover traffic. Cover traffic would provide the more powerful property of unobservability.
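A sketch of what pluggable cover traffic could look like: a background loop emitting dummy packets at exponentially distributed intervals (the rate and the `send_dummy` hook are assumptions, not part of the current protocol):

```python
import random
import time

def cover_traffic_loop(send_dummy, mean_interval_s: float = 5.0, keep_running=lambda: True):
    """Emit dummy packets at exponentially distributed intervals."""
    while keep_running():
        time.sleep(random.expovariate(1.0 / mean_interval_s))
        send_dummy()   # a dummy packet that looks like any other Sphinx packet on the wire
```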
Q: So is it true then to say that if we don’t use cover traffic, the anonymity properties offered are in some sense equivalent to Tor’s anonymity, which is known to be broken? Is that true?
A: No. In Tor you would be able to perform volume attacks and traffic correlation at the endpoints based on timing; that would still not be possible here. If you were to use only this, the unobservability property won’t be there, but it would still provide complete unlinkability, and up to the destination you wouldn’t be able to correlate the message based on packet size or volume.
Q: Because in my understanding this dummy-message traffic is important in order to protect against traffic analysis and things like that.
A: That’s what I mentioned: without dummy messages, you would know when a sender is actually sending messages. If you had dummy traffic, that’s the protection it adds: you wouldn’t be able to say when they are sending real messages and when they are just sending dummy traffic. So across the network you wouldn’t know when there is real traffic and when it’s just dummy traffic. Without dummy traffic, you would know that it’s always real traffic. But you still wouldn’t be able to correlate it, because, like I mentioned, for every mix node the incoming packets and the outgoing packets look exactly the same. They are just timed differently, and they are unlinkable; you have bitwise unlinkability. So you would just see Sphinx packets going into a mix node and a bunch of Sphinx packets coming out of the mix node, and it’s hard to actually correlate them.
Q: Also, at some point in your presentation you mentioned estimates of the probability of deanonymization, I believe.
A: I did, in the known limitations.
Q: Okay. How do you calculate this? Because, as I understand it, the user who sends a message chooses the path, and you have a number of nodes on this path, let’s say three or five; I think three is the minimum.
A: Yeah, this is calculated with a three-hop path.
Q: Three mix nodes, right?
A: Three mix nodes. Yeah, exactly.
Q: And then there’s a probability that all of them are adversarial - that’s the probability we give here.
A: Yeah, that’s the probability. It’s a bit complicated here, because out of all the mix nodes in the network you’re going to select only a part of them and keep them in your cache, and from the cache you choose; but it should actually mimic the probability of choosing nodes uniformly at random from the entire network.
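A worked sketch of that figure, using made-up numbers for the network size and the fraction of adversarial nodes:

```python
from math import comb

def p_all_corrupt(total: int, corrupt: int, hops: int = 3) -> float:
    """Exact probability that every hop of a uniformly sampled path is adversarial."""
    return comb(corrupt, hops) / comb(total, hops)

print(p_all_corrupt(total=100, corrupt=20))   # ~0.007 for a 3-hop path
# The simpler (corrupt/total)**hops approximation gives 0.008 here; the point in the
# answer above is that sampling from a local cache should still mimic this
# uniform-at-random selection over the whole network.
```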
Q: At the end you mentioned next things to do, and one of them, I believe, was testing various properties: performance overhead, message propagation efficiency, measuring bandwidth and latency. How are you going to measure the anonymity properties as well?
A: But since the Sphinx cryptographic packet format theoretically provides all this, I would say that if we have implemented it properly, unless there is something wrong with the implementation, it will have these properties.
Q: But I think this is not enough. Encryption by itself is not enough, right?
A: A thorough analysis of the anonymity properties is definitely also on the list. We also make some compromises to get to a minimum viable product as soon as possible, and then we will improve the anonymity properties from there. This is definitely part of the future steps. But, as mentioned before, the first benchmark is kind of Tor, and getting better than Tor means, as Akshaya explained, that we can defend against these correlation attacks. That’s kind of the first step; the thorough analysis is something that will follow.
Q: The way I understand it, in order to step away from Tor in a better direction, one needs dummy messages.
A: But even without dummy messages you have protection against correlation attacks. So it’s definitely a stronger threat model that this would still protect against, compared to Tor.
Q: That’s kind of true, but then one needs to measure those correlations somehow, to quantify them, and for that you need to assume an adversary and what he does. For example, if he does some sort of statistical analysis, then one needs to show that he cannot do this analysis in a reasonable time.
A: This is true. But again, for us the main thing is to get this out as a peer-to-peer protocol; it’s definitely improving on the status quo, and then we would add this analysis plus improve the security properties. One thing that surely is true: if there’s no cover traffic and the message rate is very low, then you can still correlate. So definitely, we need to analyze these properties.
A: Yes, but that sort of attack would be possible everywhere, so we would assume a sufficient number of mix nodes and sufficient participants, which is required of all mix networks. That’s the assumption I took when I was saying that even without unobservability it has a stronger anonymity model. But yes, definitely, if the traffic is low and the number of mix nodes is low, we have a problem.
A: But again, this is about improving the status quo incrementally; that is the goal of this effort, because for now we have gossipsub without any protection. Gossipsub was the main protocol we wanted to anonymize, and while moving towards gossipsub anonymization we thought, okay, it makes more sense and is even more useful to do this as a general libp2p protocol so that you can anonymize all the protocols. Improving from there is kind of the goal here.
- Implementation, handling the protocol
Q: How is this going to play out? Because from my understanding of how libp2p is designed, it’s a bit intrusive: if another protocol needs to use this protocol, we will push some changes into the specification of the other protocols. That seems a bit weird, or did I understand it incorrectly, maybe?
A: No, you don’t actually require changes to the handlers of these protocols. You just need a change in the send or publish functions. This is sort of unavoidable, because at some point you need to say: instead of routing it the regular way, route it through the mix network whenever I switch something on, and route it regularly whenever I switch it off. That needs to be added somewhere, and that’s where we have this entry layer and the exit layer. We try to make the entry layer and exit layer usable with all the protocols with as few changes as possible. That’s something we have been trying with the entry and exit abstractions. It’s not completely successful yet.
Q: So the idea is to actually specify these entry and exit layers, and then integrate them with the other protocols.
A: The entry layer basically creates a virtual stream to the local mix instance and calls the local mix instance’s send function to route through the mix network; the exit layer is at the other end. It invokes the handler on a buffer: basically, the mix instance stores the message in a buffer and then has a virtual stream to the origin protocol, where the origin protocol can read from the stream. At a later point in time we want to explore better mechanisms for this entry and exit layer, where we actually introduce separate abstractions that would do these things.
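A hypothetical sketch of that exit-layer hand-off (buffer plus virtual stream); the class and method names are illustrative, not the actual implementation:

```python
import io
import queue

class ExitLayer:
    """Buffers messages leaving the mix and exposes them to the origin protocol."""

    def __init__(self):
        self.buffer = queue.Queue()

    def on_mix_delivery(self, message: bytes):
        # Called by the local mix instance when a packet exits the mix network.
        self.buffer.put(message)

    def open_virtual_stream(self) -> io.BytesIO:
        # The origin protocol's existing handler reads from this as if it were
        # a regular libp2p stream.
        return io.BytesIO(self.buffer.get())
```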