Context: A few years ago, Signal changed the way end-to-end encrypted group conversations work. They announced these changes in a blog post.
End-to-end encrypted group conversations are hard. Under the hood, an encrypted group conversation in the Signal app is really an agglomeration of individual encrypted conversations: each group member is talking with every other group member.
This doesn't scale well, because sending a message to a group involves individually encrypting that message for every member in that group. But there's a more critical issue which breaks the integrity of group conversations: this system lacks transcript consistency. Message ordering isn't guaranteed to be consistent among all group members.
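To make the fan-out cost concrete, here is a minimal sketch of a sender encrypting one message separately for every other member. Everything in it is an assumption for illustration: `toy_encrypt` is a stand-in XOR cipher that is not secure and is not what Signal uses, and the names `send_to_group` and `pairwise_keys` are hypothetical.

```python
import hashlib
import os


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Stand-in cipher (NOT secure): XOR with a SHAKE-256 keystream."""
    nonce = os.urandom(16)
    stream = hashlib.shake_256(key + nonce).digest(len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    """Inverse of toy_encrypt: split off the nonce and XOR again."""
    nonce, body = blob[:16], blob[16:]
    stream = hashlib.shake_256(key + nonce).digest(len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


def send_to_group(sender, members, pairwise_keys, plaintext):
    """Return one ciphertext per recipient, each under its own pairwise key."""
    return {
        member: toy_encrypt(pairwise_keys[frozenset((sender, member))], plaintext)
        for member in members
        if member != sender
    }


members = ["alice", "bob", "charlie", "dave"]

# Hypothetical setup: every pair of members already shares a secret key.
pairwise_keys = {
    frozenset((a, b)): os.urandom(32)
    for i, a in enumerate(members)
    for b in members[i + 1:]
}

# One logical group message becomes len(members) - 1 separate ciphertexts.
outbox = send_to_group("alice", members, pairwise_keys, b"hello group")
```

The server only ever sees the entries of `outbox` as unrelated blobs, which is the privacy property of this design, and the sender's work grows linearly with the size of the group.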
Receiving a message out of order doesn't seem like a big deal at first glance. Who cares who sent X message first? The issue is that access control modifications – adding a new member, kicking a member, or promoting an existing member to have the ability to do those things – are broadcast in the same way as regular messages: delivered to each member individually and potentially received out-of-order.
Suppose Alice is kicking Bob from a group conversation, and, at the same time, Charlie is revoking Alice's privileges to kick anyone out. If members receive those two messages in different orders, the group will break into two: one fork of the group will kick Bob out, the other won't. There's no easy way to correct this divergence, especially considering we're in an asynchronous environment with inconsistently connected clients and unpredictable latency. Because of this problem, multi-admin groups weren't a feature of Signal Groups V1.
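The race between Alice's kick and Charlie's revocation can be sketched as two clients replaying the same two events in opposite orders. This is a toy model under my own assumptions: the `GroupState` type, the event tuples and the rule "ignore an event if the actor isn't currently an admin" are hypothetical, not Signal's actual client logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupState:
    members: frozenset
    admins: frozenset


def apply_event(state, event):
    """Apply one access-control event, ignoring it if the actor is not an admin."""
    actor, action, target = event
    if actor not in state.admins:
        return state  # actor has no privileges at this point in the transcript
    if action == "kick":
        return GroupState(state.members - {target}, state.admins - {target})
    if action == "demote":
        return GroupState(state.members, state.admins - {target})
    return state


def replay(state, events):
    """Apply events in the order this particular client received them."""
    for event in events:
        state = apply_event(state, event)
    return state


initial = GroupState(
    members=frozenset({"alice", "bob", "charlie"}),
    admins=frozenset({"alice", "charlie"}),
)
kick = ("alice", "kick", "bob")
demote = ("charlie", "demote", "alice")

view_a = replay(initial, [kick, demote])   # kick arrives first: Bob is removed
view_b = replay(initial, [demote, kick])   # demotion arrives first: kick is ignored
```

Both views agree that Charlie is the only remaining admin, but they disagree on whether Bob is still a member, and neither client has any way of knowing that the other saw a different order.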
With a centralised chat service, no such issue exists because the service nominates itself as the absolute authority on group member and access control lists. All clients defer to the service for the latest membership list. But such a model would break Signal's promise of privacy. Under Signal's initial model of group conversations, groups are an abstraction of the clients only. All the Signal service sees are encrypted blobs flying from user to user – it doesn't know* that they're related to a group, and it wouldn't know which member is an administrator of a group, and so on.
The new Signal Private Group system purports to solve this problem without compromising on the private nature of the app. Signal moved the group member and access control lists to its server, making the service the sole oracle of truth for the latest state of the group. But unlike traditional chat services, group membership is encrypted before it reaches Signal's server. The service stores the group's latest state – membership, roles, and other attributes – without being able to read it.
By using zero-knowledge ‘anonymous credentials’ proofs, the Signal service doesn’t need to know who the members of a group are in order to let any one of them fetch or, if permitted, modify the latest state.
The following video (and this related paper) explain this in great detail:
Elephant in the room
There's nothing obviously wrong with the paper, which deals solely with the anonymous credentials mechanism. That would be fine if the protocol existed in a vacuum. In practice, however, any request to the Signal service needs a way to get there, and this regular HTTP network journey leaks the user's digital identity. Any person accessing or updating a group's encrypted state is revealing their IP address to a Signal server in the process.
If the user has already identified themselves by the time they get to the anonymous credentials process, then the fact that they aren't deanonymised yet again means nothing. They've already signed in at the front desk to enter the building: not making them sign in a second time changes nothing. It doesn't matter how good the cryptography is – it's moot.
By analysing Signal’s log files, it would be very easy to create a list of every IP address belonging to each group. This seems to defeat the point of the anonymous credentials mechanism.
The Signal Groups V2 model moved the group-chat abstraction from the client to the server, entrusting Signal with a new power to list the members of each group (and also to know how many groups there are, when each group was created, and so on).
The usefulness of anonymous credentials under the new private group system rests wholly on trusting Signal to not keep logs. And, of course, you should assume that Signal is logging everything (that's why end-to-end encryption is a thing).
Deniability is gone, again
The Signal Protocol is based on OTR, which provided deniable authentication of messages: a user can, in theory, claim that any message attributed to them wasn't theirs, because both the sender and the receiver know the MAC key used to demonstrate message integrity, so either of them could have produced it.
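The reason a shared MAC key gives deniability can be shown in a few lines. This is a minimal sketch using HMAC-SHA256 from Python's standard library; the choice of HMAC-SHA256 and the helper names `make_tag` and `verify_tag` are my assumptions for illustration, not the exact construction OTR or Signal uses.

```python
import hashlib
import hmac
import os


def make_tag(mac_key: bytes, message: bytes) -> bytes:
    """Compute an authentication tag over the message."""
    return hmac.new(mac_key, message, hashlib.sha256).digest()


def verify_tag(mac_key: bytes, message: bytes, tag: bytes) -> bool:
    """Check a tag in constant time."""
    return hmac.compare_digest(make_tag(mac_key, message), tag)


# Both Alice and Bob hold this key for their conversation.
shared_mac_key = os.urandom(32)

# A message Alice really sent.
alice_message = b"see you at noon"
alice_tag = make_tag(shared_mac_key, alice_message)

# A message Bob made up. He holds the same key, so his tag is just as valid.
forged_message = b"I did it"
forged_tag = make_tag(shared_mac_key, forged_message)

genuine_ok = verify_tag(shared_mac_key, alice_message, alice_tag)
forgery_ok = verify_tag(shared_mac_key, forged_message, forged_tag)
```

Both checks pass, so a valid tag convinces Bob that the message came from Alice (he knows he didn't write it), but it proves nothing to a third party, who can't tell which of the two key holders produced it.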
In 2014, researchers noted that this deniability property wasn't inherited in the Signal Protocol (then named TextSecure) due to the fact that users had to first authenticate with the Signal server before they could send a message. Even though TextSecure achieved deniability on a protocol level, in practice it failed. 'In conclusion, TextSecure only achieves deniability theoretically. Content deniability is provided due to our security proof but we can not prove that no delivery request will be recorded at the TextSecure server.' (Sound familiar?)
Signal solved this particular issue in 2018 with its 'Sealed sender' feature. It stopped requiring senders to authenticate with the service in order to deliver a message.
But you now have to authenticate with the Signal service in order to grab or update a group's encrypted state – and the anonymous authentication scheme employed is not deniable. Signal now has undeniable cryptographic proof that a Signal user with your IP address belongs to a particular group chat. No such undeniable proof existed in Groups V1.
*But timing analysis was already a thing
Above I wrote that all Signal sees are encrypted blobs flying from user to user without knowing they're necessarily related to a group, but that's not exactly true, as the timing of messages can potentially leak group member lists.
If someone sends a message to ten users in a very short time window, it would appear likely that the message was a group message – and those ten recipients are members of a group chat with the sender. If a second or third message is sent to the same ten users again, it seems more likely. By conducting a probabilistic statistical analysis of request logs, an adversary could infer the make-up of group chats.
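Here is a rough sketch of what that analysis could look like. The log format (timestamp, sender, recipient), the function name `infer_groups` and the thresholds are all hypothetical, and a real adversary would want something more statistically careful than counting repeated bursts.

```python
from collections import Counter, defaultdict


def infer_groups(log, window=1.0, min_size=3, min_repeats=2):
    """Guess group memberships from (timestamp_seconds, sender, recipient) entries.

    A burst is a run of deliveries from one sender within `window` seconds of
    the first delivery in the run. A set of users that shows up in at least
    `min_repeats` bursts is reported as a suspected group.
    """
    by_sender = defaultdict(list)
    for timestamp, sender, recipient in sorted(log):
        by_sender[sender].append((timestamp, recipient))

    seen = Counter()
    for sender, deliveries in by_sender.items():
        burst, start = set(), None
        for timestamp, recipient in deliveries:
            if start is None or timestamp - start > window:
                # The previous burst is over: record it if it was big enough.
                if len(burst) >= min_size:
                    seen[frozenset(burst | {sender})] += 1
                burst, start = set(), timestamp
            burst.add(recipient)
        if len(burst) >= min_size:
            seen[frozenset(burst | {sender})] += 1

    return {group: count for group, count in seen.items() if count >= min_repeats}


# Hypothetical delivery log: two fast fan-outs and one unrelated direct message.
log = [
    (0.00, "alice", "bob"), (0.05, "alice", "carol"), (0.10, "alice", "dave"),
    (30.00, "alice", "erin"),
    (60.00, "bob", "alice"), (60.04, "bob", "carol"), (60.09, "bob", "dave"),
]

suspected = infer_groups(log)
```

With this log, the only suspected group is alice, bob, carol and dave, seen in two bursts. Erin's one-to-one message is ignored because it never forms a burst of `min_size` recipients.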
Maybe V2 was an admission that this theoretical leakage was a flaw large enough to mean group membership lists were practically non-secret anyway. Perhaps we may as well stop pretending that group membership is sacred, especially if doing away with membership privacy could mean solving the frustrating issue of transcript consistency?
Not exactly, because even if we assumed that group membership lists were already discoverable with perfect precision through timing-based leaks, there's yet another privacy downgrade introduced in V2: group hierarchy. Through leakage, the service might be able to determine who the members of each group are, but it couldn't tell which member is an administrator. In V2, the service knows which member is in charge because only that member passes the anonymous credentials test to update the group's encrypted state.
V1 vs V2
The differences between Groups V1 and Groups V2 seem to be:
- Consistency issues: Problem in V1. Solved in V2, which allows Signal to greatly improve the UX and feature set of group chats.
- Group participants (by IP address): Probabilistically discoverable in V1 with timing analysis. Known by the service in V2.
- Group administrators (by IP address): Secret in V1. Known by the service in V2.
- Group participation deniability: Deniable in V1. Non-deniable in V2 – the service has mathematical proof that someone with your IP address is a member of the group.
2023-05-06: Based on some feedback received, I feel I need to clarify that my conclusion is not that V2 was a mistake – it's that V2 introduced additional metadata leakage which wasn't present in V1. I'm aware that solving consistency issues allowed Signal to greatly improve the usability of groups and resolve related bugs – my focus was the cost of those improvements.