Remember the situation: you want to send that fresh dank meme to your friend (let's say) Greg. You open your favorite messenger, find the chat with Greg, attach the meme and hit send.
Behind the scenes, your messenger goes to some cloud, reads your chat history from a database, uploads your meme and pings your friend with a push notification that new top-tier content just arrived.
That's how every chat app has worked since forever. Nowadays even a junior who just finished a "Learn Python in 21 seconds from YouTube Shorts" course can build it in one evening with a couple of beers.
It was a great time! We lived happily in this simple and peaceful world until some governments (1, 2, 3) decided that they also want to read the spicy memes in our private correspondence and decide (spoilers for you, westerners) which ones we should go straight to jail for.
Welcome to the brave new internet!
The good news is that end-to-end encryption was invented exactly for this – sending dank memes through "untrusted" channels so only the recipient can read them.
You encrypt data at your end and can still store it on some random server that you don't really trust. Yes, we haven't ascended to peer-to-peer enlightenment yet – that comes in the next part, where we'll see how German and French parliaments exchange le memes through Matrix. So subscribe for the next post, or whatever content creators say.
All the cool kids today prefer end-to-end encryption (E2EE) and try to distance themselves from the "clouds". Not just because suffering makes them brave and sexy (though it does). But they're also young and practical: they understand that clouds are still cheap and convenient for most non-geeky people.
As long as "real" peer-to-peer software remains about as user-friendly as a 1988 cassette player, we're stuck with clouds. We just need to get creative on how we use them.
For now, let's go back to the situation where we were sending our dank meme to Greg. Greg lives a thousand miles away from us, so we still need to use that sketchy thing called The Internet. No way around it.
Time to dive into how encryption actually works.
We covered asymmetric encryption with its "public" and "private" keys in my old Blockchain post, but nobody remembers that, so let's start fresh.
All modern encryption can be divided into two categories: symmetric (AES) and asymmetric (RSA).
Symmetric encryption is pretty straightforward – Greg and I agree beforehand on a special "secret key" that both encrypts and decrypts all our messages. It's like if we both had keys to the same apartment and could drop by anytime for drinks.
The problem? We somehow need to give each other that key in the first place.
With apartment keys, Greg and I could meet in some quiet alley for the handoff. But on The Internet, there's no safe place – we're constantly at risk of having our data stolen or copied without even knowing it.
So symmetric keys aren't great for starting new connections.
But don't throw them in the garbage just yet. Symmetric keys have major advantages: they're way shorter than asymmetric ones and much faster.
Just look at an example: a 128-bit AES key is just 22 characters in base64 (24 if you count the padding). That's literally something you could write down on your palm.
Yet it provides solid protection by today's standards.
Just don't get it tattooed, please.
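If you want to check the palm-sized claim yourself, here is a minimal sketch using only Python's standard library. It generates a random 128-bit key and prints it in base64; the variable names are just for illustration.

```python
import base64
import secrets

# 128 bits = 16 random bytes from the OS's cryptographic random source.
key = secrets.token_bytes(16)

# Standard base64 turns 16 bytes into 24 characters, the last two being
# "=" padding. Dropping the padding leaves 22 characters.
encoded = base64.b64encode(key).decode("ascii")
unpadded = encoded.rstrip("=")

print(unpadded, len(unpadded))
```

Every run prints a different key, but the length stays the same.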
🔥 Fun fact: if you take your keyboard and smack your head with it about ten times, the result could work as a terrible symmetric key. Try it now!
The simplicity and speed of symmetric keys will come in handy when we encrypt huge files later.
To avoid sending secret keys across the internet, smart people invented asymmetric encryption.
With asymmetric encryption, everyone gets two keys – a public key and a private key. These are basically two really long numbers, built from a pair of huge primes (in RSA, at least) and connected by a simple math formula that I won't tell you yet...
Your private key is super-secret because it can decrypt anything encrypted with your public key. But your public key? Share it freely.
You can send your public key to Greg, post it on your website, or even tweet it. Anyone can use it to encrypt a message that only you can read.
To read it, of course, you need the private key. So keep that one safe :)
The public-private key pair has a cool reverse trick too: if you encrypt something with your private key, anyone who knows your public key can verify that YOU sent it, without even knowing your private key.
What's that good for? Digital signatures! Like the ones you use in online banking or government services.
If you're amazed by all this math wizardry, check out "The Code Book" by Simon Singh. It came out in the early 2000s, but nobody's written anything clearer about encryption from ancient times to (almost) modern TLS for regular folks.
Now you might be jumping up yelling: Dude, if these asynchronous keys are so awesome, why'd you waste my time with that AES stuff?
First, it's asymmetric, not asynchronous – read more carefully!
Second, here's what a typical key pair looks like:
See the problem? These chunky boys can, at best, eat your entire bucket of KFC, but ask them to encrypt a 10TB file and they'll say OOOFF like your grandpa on a treadmill.
That's why the modern internet uses both types of keys, usually together, to cancel out each other's weaknesses.
And I have even better news – you can mathematically derive one from the other!
Meet the famous Diffie-Hellman algorithm, which makes crypto-nerds wet because it lets you elegantly create a symmetric key using just two asymmetric pieces: your private key and someone else's public key.
It's beautiful and simple: keys are just numbers, so with simple math operations (raising to powers, modulo division), both sides can end up with identical results, using just the other side's public key and their own private key. Without exposing the private ones. Brilliant!
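To make the "raising to powers, modulo division" part concrete, here is a toy sketch of classic Diffie-Hellman in plain Python. The modulus `P`, the base `G`, and the function names are assumptions picked for illustration; real systems use much larger groups or elliptic curves, so treat this as a classroom model, not something to ship.

```python
import secrets

# Toy public parameters (assumptions for illustration only).
P = 2**127 - 1   # a prime modulus, known to everyone
G = 3            # public base, known to everyone

def make_keypair():
    """Pick a secret number and compute the matching public number."""
    private = secrets.randbelow(P - 2) + 1   # secret number in [1, P-2]
    public = pow(G, private, P)              # G^private mod P
    return private, public

def shared_secret(my_private, their_public):
    """Raise the other side's public key to my private key, modulo P."""
    return pow(their_public, my_private, P)

me_private, me_public = make_keypair()
greg_private, greg_public = make_keypair()

# Only the public numbers travel over the network.
mine = shared_secret(me_private, greg_public)
gregs = shared_secret(greg_private, me_public)

print(mine == gregs)
```

Both sides land on the same number because (G^a)^b and (G^b)^a are equal modulo P, and that number can then be turned into a symmetric key.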
No joke. Every time you visit an https website (including this post), your browser performs a modern version of Diffie-Hellman to establish a secure connection with my server in Germany and show you the little green lock icon on top of your browser.
So this isn't just nerdy stuff – you've been using it all along. Deal with it!
Great, encryption basics covered! Now you can add "security expert" to your resume and start cashing in.
So we just exchange public keys, encrypt our stuff, and send it? That's it? Why do we need a whole post then? We've got TikToks to watch!
TikToks can wait – we've got a problem here.
Everything I've described only works for one-on-one chats between me and Greg. But these days, we're all living in group chats, and there, this approach falls apart completely.
Let's tackle that next.
Houston, we have a problem. We've learned how to encrypt one-on-one messages, but now we've got a chat with 100+ people, and we want it all beautifully end-to-end encrypted so evil clouds can't look at our dank group memes.
Let's think through some possible solutions:
Option 1: we could think of a 100-person group chat as simply 100 different one-on-one chats. Why not, right?
We know everyone else's public keys, so we can encrypt each message separately for each person and send it so that only they can decrypt it, while everyone else can just ignore it.
This means when we write a message to the chat, we encrypt it 100 times. Or 1000 times. Or a million times if there are a million participants.
See the problem?
With this approach, even if you want to send your buddies just one meme, your little iPhone will struggle so hard encrypting it a thousand times that it won't just freeze for several minutes - it'll drain your battery enough that you can't call an Uber and have to walk home.
But forget about your iPhone - there will be a new one next year anyway. There's another problem: instead of one 1MB image, you need to store 1,000 separately encrypted copies of it, about 1 GIGABYTE of data for each meme. No server will like it.
After just a couple of memes, your friend chat will take up more space on the server than atoms in the universe, and your monthly data allowance will last about fifteen seconds.
Is this the future you want? Doesn't seem like it. So forget this option - it's trash. Let's look at the next one.
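The arithmetic behind that verdict fits in a few lines. This is a back-of-the-envelope sketch; `fanout_cost` is a made-up helper, and it ignores encryption overhead and the sender's own copy.

```python
def fanout_cost(participants: int, message_bytes: int) -> tuple[int, int]:
    """Cost of ONE message when it is encrypted separately per recipient.

    Returns (number of encryptions, total bytes stored on the server).
    """
    encryptions = participants
    stored_bytes = participants * message_bytes
    return encryptions, stored_bytes

MB = 1_000_000
GB = 1_000_000_000

ops, stored = fanout_cost(1_000, 1 * MB)
print(ops, stored / GB)
```

For a 1,000-person chat and a 1MB meme this gives 1,000 encryptions and 1 GB of storage, and both numbers grow linearly with the size of the chat.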
Option 2: everyone agrees on one common encryption key for the whole chat, which anyone in it can use to encrypt and decrypt any message inside. It's the same symmetric key for everyone, as we learned earlier.
In practice, usually the person who started the chat generates it and then shares through "some" secure connection with all other participants (using their asymmetric keys, for example).
Don't smile - this is actually a decent option. Many "old" encrypted chats worked exactly this way for years, why not?
The shared key gets transmitted over the network in encrypted form, it's easy to add new people to the chat by giving them our key, and the evil cloud never knows it. Our messages are safe!
But there are still some issues.
First: how do we remove users from the chat?
Okay, if my dudes and I have more than 130 ICQ points (in total), we might invent a system where when one user gets removed, everyone else generates a new shared key to protect future messages.
But this creates new technical problems with communicating this new key to participants who are offline right now - they temporarily lose access to the chat until they come back online and receive the new key.
Fine, eventually everyone will get it - we can write this off as a "UX problem". Maybe we send them a special push notification, asking them to go online and update their keys. Solvable.
But there's a more serious issue: key leakage.
In a chat with a thousand people, this will happen sooner or later - someone will upload an unencrypted backup to iCloud, leave their phone in a bar, or future hackers might just brute-force our key. The probability of the latter is extremely small, but never zero. A chat might use the same key for years, giving hackers plenty of time to buy all the GPUs and hack you.
Serious memes need more serious guarantees.
Though we won't completely trash this method. Telegram's MTProto protocol in secret chats works exactly this way, despite all the downsides. They try to smooth it over with UX tricks, offering, for example, auto-deletion of messages after 20 seconds.
🧠 You can easily confirm this right now by creating a secret chat in Telegram - it'll tell you something like "waiting for the other person to come online and exchange keys with you" and won't let you send any messages until then. Now you know why :)
Later we'll learn that even the god-loved Matrix uses a "session key" (which is the same thing) to improve performance in large chats. It just rotates more frequently. So it's not all black and white.
Option 3 is similar to the previous one, but now we agree on a new shared key for each message sent. Now a single key leak isn't scary - it could only decrypt one specific message (which probably leaked too), not the entire chat.
But we already found out that sending a new key between all participants each time is madness. We need some tricky trick here.
So, imagine our chat is just starting. Participants exchange keys and get one shared key (this is easy to do at the beginning, because each user clicks the Join Chat button themselves). But in this third option, besides the key, they also agree on one more thing - a special algorithm that lets each participant mathematically derive the next key without asking others for help.
You can visualize this as a magic box where we put our old key in one side, turn a handle, and it spits out a new one.
Given the same input key, the box will always produce the same result. For all users. Even offline. And it'll be physically and mathematically impossible to turn the handle backward and get the original key from the new one.
How? It's simple, even a schoolkid could understand: we can multiply our key by some prime number.
Here's a thought experiment: I give you a piece of paper with two numbers - 107 and 283, and ask you to multiply them. You pull a calculator from your pocket and instantly tell me the answer - 30281.
Took about four seconds, right?
Now imagine the reverse: I give you a paper with 30281 written on it and ask you to tell me which two numbers I just multiplied to get this result.
Difficult? Calculator not helping? Exactly.
This example is very simplified but helps understand how our magic box might work - easily deriving a new key from an old one, but not vice versa. This sounds like Diffie-Hellman? It should! Crypto folks call this process derivation, and the new key is a "derived key." That's just a fancy term you can drop to impress your friends.
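You can feel the asymmetry with a small sketch. Multiplying is one operation; going back means trying divisors one by one. The trial-division `factor` helper below is the most naive approach possible and is only here for illustration.

```python
def multiply(a: int, b: int) -> int:
    """The easy direction: one multiplication."""
    return a * b

def factor(n: int):
    """The hard direction: try every candidate divisor until one fits."""
    candidate = 2
    while candidate * candidate <= n:
        if n % candidate == 0:
            return candidate, n // candidate
        candidate += 1
    return None  # no divisor found, so n is prime

product = multiply(107, 283)
print(product)          # the number from the thought experiment
print(factor(product))  # the loop has to walk all the way up to 107
```

With five-digit numbers the loop still finishes instantly, but the number of candidates explodes as the numbers get longer, and for numbers hundreds of digits long no known classical method finishes in a reasonable time.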
An algorithm that works in one direction but not the other is called a Ratchet mechanism. You've probably seen a ratchet screwdriver or wrench that easily turns in one direction but not the other. That's where the name comes from.
From now on, I'll just call this mathemagical key generator a "ratchet".
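Real ratchets don't actually multiply by primes; they usually push the key through a one-way hash function. Here is a minimal sketch with HMAC-SHA256 from Python's standard library standing in for the magic box. The `b"next"` label and the all-zero starting key are placeholders I made up for the example.

```python
import hashlib
import hmac

def ratchet(key: bytes) -> bytes:
    """Turn the handle once: derive the next key from the current one."""
    return hmac.new(key, b"next", hashlib.sha256).digest()

# Placeholder starting key; a real chat would start from a secret shared key.
start = bytes(32)

k1 = ratchet(start)   # key for message 1
k2 = ratchet(k1)      # key for message 2

# Same input, same output, on any device, online or offline.
print(ratchet(start) == k1)
print(k2.hex())
```

Anyone holding `k1` can compute `k2`, but going from `k2` back to `k1` would mean inverting the hash, which is the one-way property the box needs.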
OK, so our ratchet gives us a new key for each new message.
If one key leaks, hackers won't be able to turn the ratchet backward to get old keys and read previous messages - that's good (crypto folks call this property "Forward Secrecy," though it's more like Backward Secrecy, but who are we to judge smart people here?). But what about new messages? The ratchet can still be turned forward easily.
The problem is that there are only a finite number of algorithms for our ratchet. Maybe a dozen standard ones. Or up to a hundred if you count non-standard ones.
We don't invent a new ratcheting algorithm for each new app. In cryptography, inventing your own algorithms is a very bad practice!
But this means a hacker who steals one key can simply try all the known algorithms from Wikipedia and find the right one to decrypt all our future messages. We need to fix this. Our memes are still in danger!
What if our ratchet had some kind of code on it, like a combination lock? And two identical ratchets would only produce the same result if the code on them matches?
Basically, as programmers would say, the ratchet now has a state that affects how it works.
Anyone who remembers how the famous Enigma encryption machine worked should be having war flashbacks right now. We're literally reinventing the same thing over and over...
But wait, doesn't this throw us back to Option 2 with all its problems? How will we all agree on this code? How will we tell it to the people who are offline?
What if another chat participant is drinking beer at a neighborhood bar and can't stop right now to generate you a new public key?
No worries. Let them finish their drink. Watch closely: we don't need to agree on a new code for every message - only when the other person wants to reply to us. Like this:
So the code only changes when the second participant comes online and decides to write us back. Until then, we keep generating our keys with the same code.
This means we can use our familiar Diffie-Hellman algorithm to exchange the new code.
The second participant attaches their new public key to their messages, and we calculate a new shared code and set our ratchet to it to decrypt what they wrote.
This creates a completely asynchronous exchange - we don't need to be in the chat and respond. We'll read and calculate all the codes and ratchets when we come back online.
The next time we want to write back, we'll similarly attach our new public key, and the other participant can take it (often right from the message) and calculate the next code on their side. The offline problem is solved!
Even if a hacker breaks into our chat somewhere in the middle of my monologue, they might decrypt a few messages, but after a new code exchange, they'll lose the ability to decrypt our cozy little chat again.
This algorithm is called Double Ratchet and its ability to "self-heal" after a breach is one of its main features.
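Here is a heavily simplified sketch of that self-healing, combining a toy Diffie-Hellman (for the "code") with a hash ratchet (for the per-message keys). It is not the Signal specification: the parameters `P` and `G`, the function names, and the assumption that the hacker stole only the ratchet state and not my private key are all mine.

```python
import hashlib
import hmac
import secrets

P = 2**127 - 1  # toy prime modulus (assumption; far too small for real use)
G = 3

def make_keypair():
    private = secrets.randbelow(P - 2) + 1
    return private, pow(G, private, P)

def dh_code(my_private, their_public):
    """Diffie-Hellman result, hashed into the ratchet's 'code'."""
    shared = pow(their_public, my_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()

def turn(code, key):
    """One turn of the ratchet handle under the current code."""
    return hmac.new(code, key, hashlib.sha256).digest()

# Chat start: both sides know each other's public keys.
me_priv, me_pub = make_keypair()
greg_priv, greg_pub = make_keypair()
code = dh_code(me_priv, greg_pub)
key = turn(code, b"chat start")

# I send two messages in a row: same code, new key each time.
key = turn(code, key)
key = turn(code, key)

# A hacker steals the current ratchet state and can follow along for now.
stolen_code, stolen_key = code, key

# Greg comes back online and replies with a fresh public key attached.
greg_priv, greg_pub = make_keypair()
gregs_code = dh_code(greg_priv, me_pub)   # computed on Greg's phone
my_code = dh_code(me_priv, greg_pub)      # computed on mine, on receipt

gregs_key = turn(gregs_code, key)
my_key = turn(my_code, key)
hackers_guess = turn(stolen_code, stolen_key)

print(gregs_key == my_key, hackers_guess == my_key)
```

Greg and I end up with the same new key, while the hacker, still turning the handle with the old code, gets something useless.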
I recommend watching this great video:
All encrypted one-on-one chats in modern messengers like WhatsApp, Signal, and Matrix use some variation of the Double Ratchet mechanism. The exception is Telegram, which doesn't end-to-end encrypt regular chats at all, uses the old shared key method for secret chats, and can't end-to-end encrypt group chats either.
You know who else can't encrypt group chats? Double Ratchet 🤡🤡🤡
Wait, what? What about group chats? We went through all this just to arrive at an algorithm that only works for one-on-one chats AGAIN? Are you high or something?
Yes. So this post isn't over yet. You'll have to put up with me a bit longer.
Modern messengers almost always use different types of algorithms for group chats versus one-on-one chats. Double Ratchet works great for 1-1 chats. Group chat logic is far more complex and usually full of tricks that developers use to balance between security and good UX.
So far, they're not doing great :) Anyone who's seriously used Matrix, especially in groups with 500+ people, knows exactly what I'm talking about.
"Unable to decrypt message" still comes to me while I'm sleeping.
Anyway, to give you a perspective on how cutting-edge this topic is - the first real encryption standard for group chat messages, MLS (Messaging Layer Security), was only released in 2023. That's newer than ChatGPT, folks! We're literally talking about bleeding edge stuff here (though of course they'd been developing it since the previous decade).
Before MLS, every chat app invented its own mind-blowing encryption algorithms for groups, but now everyone's finally getting their act together and gradually moving to this standard.
So put your Double Ratchet in the drawer and meet - Ratchet Tree!
The main problem this tree solves is how to quickly agree on a single "key" for our encryption ratchets as a group without transmitting it N×N times between each participant.
It's built pretty simply: first, each chat participant starts at the very bottom of a basic binary tree.
Not all branches of the tree have to be occupied. The chat might have an odd number of participants, and people come and go.
Then there are nodes higher up - from users to the root. These intermediate nodes don't do anything; they're just a convenient abstraction for getting a common root key.
Each tree node has its own public-private key pair. Something like this:
Public keys aren't secret, so all chat participants can know them. But private keys are more interesting.
In the MLS standard, a node's private key is only known to the users whose path to the root passes through that node. So Greg, for example, only knows these private keys:
Each chat participant stores the entire tree locally.
For "their" nodes (the ones on the path from them to the root), they know the private keys; for "others" - only public keys. But the most important thing, the whole reason for this circus - every chat participant knows the private key of the very top node of the tree. Its root. This key is the "code" that everyone enters into their ratchet boxes to read and sign new messages.
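Who knows which private keys is easy to compute if you number the tree nodes. A small sketch, assuming a full binary tree stored heap-style (root is node 0, the children of node i are 2i+1 and 2i+2) and a power-of-two number of leaves. This numbering is my assumption for the example, not the layout MLS itself uses.

```python
def direct_path(leaf_index: int, leaf_count: int) -> list[int]:
    """Node numbers from a member's leaf up to the root (node 0)."""
    node = (leaf_count - 1) + leaf_index   # leaves sit after the inner nodes
    path = [node]
    while node != 0:
        node = (node - 1) // 2             # step to the parent
        path.append(node)
    return path

LEAVES = 8

greg = direct_path(2, LEAVES)       # Greg sits at the third leaf
neighbor = direct_path(3, LEAVES)   # his closest neighbor at the fourth

print(greg)                               # nodes whose private keys Greg knows
print(sorted(set(greg) & set(neighbor)))  # nodes both of them know
```

Every path ends at node 0, which is why the root's private key is the one thing everybody in the chat shares.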
All users in such a chat live in harmony and peace, generating new keys using the common root, and encrypting messages with them. Everything's nice and peaceful.
And then one day, Greg says: "Hey, let's invite Bob to the chat?" After all, group chats were invented to invite people into them, right?
Bob bursts into the chat. First thing, he takes any free spot in the tree. If there are no free spots, the whole tree grows and recalculates keys for new intermediate nodes.
Greg takes Bob's public key, encrypts the current tree snapshot with all the keys (except private ones) and forwards it to Bob. This is called a Welcome Message.
There's an immediate problem - Bob doesn't have any private keys yet. So he can't exchange encrypted messages with us yet. For that, he needs the root key.
And we can't just tell him the old private keys up the tree, because then he could read our previous messages, and what if there were unflattering memes about him?
Time to create a new root key!
The beauty of our tree is that any chat participant (except Bob) can create a new root key.
So let Greg do the work for everyone.
Step 1: Greg throws away his keys and creates new ones. He can create them from scratch or use a ratchet for style - it doesn't matter.
Step 2: Greg turns his ratchet handle to generate new keys for all nodes from himself up the tree to the very root. This upward regeneration is why it's called a Ratchet Tree.
Now Greg knows the new root key, but others don't yet.
Step 3: Now Greg needs to safely communicate the new key to other chat participants.
Let's start with Greg's closest neighbor. They live nearby, so they both know the key directly above them in the tree.
Greg can encrypt the new key with the old public key of this shared node and send this message to his neighbor - "Bob, catch, use this now."
Bob can decrypt and read it since he also knows the shared node key, and then fire up his ratchet and, just like Greg did earlier, calculate all the keys from himself up the tree. And since everyone's ratchet works the same way, the magic of math happens and Bob gets exactly the same keys that Greg got earlier! Including the new root key!
Step 4: With neighbors from another district, it's roughly the same story. Greg takes the well-known key of the neighboring subtree, encrypts the new private key with it, and passes it to the workers in the chat.
And the workers don't need to regenerate their keys - they can stay as they are; they just need to learn the new root key.
The same happens with the rest of the tree. You can figure this out by analogy - you're not dumb.
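The reason everyone ends up with the same root can be sketched with a plain hash chain. I'm using SHA-256 as the ratchet and a made-up `b"path"` label; the real MLS key schedule is more involved, and the encryption of the secrets in transit is left out entirely.

```python
import hashlib
import secrets

def next_secret(secret: bytes) -> bytes:
    """Derive the parent node's secret from the child's (one-way)."""
    return hashlib.sha256(b"path" + secret).digest()

def secrets_up_the_path(start: bytes, steps: int) -> list[bytes]:
    """Turn the ratchet `steps` times, keeping every intermediate secret."""
    chain = [start]
    for _ in range(steps):
        chain.append(next_secret(chain[-1]))
    return chain

# Greg: fresh leaf secret, then three turns up to the root of an 8-leaf tree.
gregs_chain = secrets_up_the_path(secrets.token_bytes(32), 3)
new_root = gregs_chain[-1]

# Greg's closest neighbor is handed the secret of their shared parent
# and only needs the remaining two turns.
neighbors_chain = secrets_up_the_path(gregs_chain[1], 2)

# Someone on the far side of the tree is handed the root secret directly.
far_side_root = gregs_chain[3]

print(neighbors_chain[-1] == new_root, far_side_root == new_root)
```

Nobody can walk the chain downward, so the neighbor never learns Greg's new leaf secret, only the secrets from the shared node up.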
In total, Greg only needs to send 3 messages for all 8 chat participants to learn the new root key. With small numbers this might seem trivial, but if there are, say, 15,000 people in the chat, a complete key rotation will take just 14 transmissions, not 15,000. For that kind of efficiency, developers deserve at least a case of beer.
Those who crammed LeetCode all night for interviews should now explain to everyone that we went from O(n) complexity to O(log n), which is why everything got so beautifully, expensively, richly better.
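For those who skipped LeetCode, the numbers above are just the height of a balanced binary tree. A quick sketch to check them (`rotation_messages` is a made-up helper name):

```python
import math

def rotation_messages(participants: int) -> int:
    """Levels between a leaf and the root of a balanced binary tree."""
    return math.ceil(math.log2(participants))

for size in (8, 15_000, 1_000_000):
    print(size, rotation_messages(size))
```

Eight people need 3 transmissions, 15,000 people need 14, and even a million-person chat stays at 20.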
The same situation happens when removing a user from the chat. The root key rotates so the removed person can't read new messages. The person responsible for key rotation in this case is either the one who did the removing or just some random unfortunate person from the chat.
Voilà!
This key tree doesn't even need to be stored directly on the server - it can be virtual and stored directly on each chat participant's computer. That's the second strength of MLS - it works great in peer-to-peer chats.
But we'll talk about those next time.