An Explanation of the Meltdown/Spectre Bugs for a Non-Technical Audience
Last week the news of two significant computer bugs was announced. They’ve been dubbed Meltdown and Spectre. These bugs take advantage of very technical systems that modern CPUs have implemented to make computers extremely fast. Even highly technical people can find it difficult to wrap their heads around how these bugs work. But, using some analogies, it’s possible to understand exactly what’s going on with these bugs. If you’ve found yourself puzzled by exactly what’s going on with these bugs, read on — this blog is for you.
“When you come to a fork in the road, take it.” — Yogi Berra
Late one afternoon walking through a forest near your home and navigating with the GPS you come to a fork in the path which you’ve taken many times before. Unfortunately, for some mysterious reason your GPS is not working and being a methodical person you like to follow it very carefully.
Cooling your heels waiting for GPS to start working again is annoying because you are losing time when you could be getting home. Instead of waiting, you decide to make an intelligent guess about which path is most likely based on past experience and set off down right hand path.
After walking for a short distance the GPS comes to life and tells you which is the correct path. If you predicted correctly then you’ve saved a significant amount of time. If not, then you hop over to the other path and carry on that way.
Something just like this happens inside the CPU in pretty much every computer. Fundamental to the very essence and operation of a computer is the ability to branch, to choose between two different code paths. As you read this, your web browser is making branch decisions continuously (for example, some part of it is waiting for you to click a link to go to some other page).
One way that CPUs have reached incredible speeds is the ability to predict which of two branches is most likely and start executing it before it knows whether that’s the correct path to take.
For example, the code that checks for you clicking this link might be a little slow because it’s waiting for mouse movements and button clicks. Rather than wait the CPU will start automatically executing the branch it thinks is most likely (probably that you don’t click the link). Once the check actually indicates “clicked” or “not clicked” the CPU will either continue down the branch it took, or abandon the code it has executed and restart at the ‘fork in the path’.
This is known as “branch prediction” and saves a great deal of idling processor time. It relies on the ability of the CPU to run code “speculatively” and throw away results if that code should not have been run in the first place.
Every time you’ve taken the right hand path in the past it’s been correct, but today it isn’t. Today it’s winter and the foliage is sparser and you’ll see something you shouldn’t down that path: a secret government base hiding alien technology.
But wanting to get home fast you take the path anyway not realizing that the GPS is going to indicate that left hand path today and keep you out of danger. Before the GPS comes back to life you catch a glimpse of an alien through the trees.
Moments later two Men In Black appear, erase your memory and dump you back at the fork in the path. Shortly after, the GPS beeps and you set off down the left hand path none the wiser.
Something similar to this happens in the Spectre/Meltdown attack. The CPU starts executing a branch of code that it has previously learnt is typically the right code to run. But it’s been tricked by a clever attacker and this time it’s the wrong branch. Worse, the code will access memory that it shouldn’t (perhaps from another program) giving it access to otherwise secret information (such as passwords).
When the CPU realizes it’s gone the wrong way it forgets all the erroneous work it’s done (and the fact that it accessed memory it shouldn’t have) and executes the correct branch instead. Even though illegal memory was accessed what it contained has been forgotten by the CPU.
The core of Meltdown and Spectre is the ability to exfiltrate information, from this speculatively executed code, concerning the illegally accessed memory through what’s known as a “side channel”.
You’d actually heard rumours of Men In Black and want to find some way of letting yourself know whether you saw aliens or not. Since there’s a short space between you seeing aliens and your memory being erased you come up with a plan.
If you see aliens then you gulp down an energy drink that you have in your backpack. Once deposited back at the fork by the Men In Black you can discover whether you drank the energy drink (and therefore whether you saw aliens) by walking 500 metres and timing yourself. You’ll go faster with the extra carbs in a can of Reactor Core.
Computers have also reached high speeds by keeping a copy of frequently or recently accessed information inside the CPU itself. The closer data is to the CPU the faster it can be used.
This store of recently/frequently used data inside the CPU is called a “cache”. Both branch prediction and the cache mean that CPUs are blazingly fast. Sadly, they can also be combined to create the security problems that have recently been reported with Intel and other CPUs.
In the Meltdown/Spectre attacks, the attacker determines what secret information (the real world equivalent of the aliens) was accessed using timing information (but not an energy drink!). In the split second after accessing illegal memory, and before the code being run is forgotten by the CPU, the attacker’s code loads a single byte into the CPU cache. A single byte which it has perfectly legal access to; something from its own program memory!
The attacker can then determine what happened in the branch just by trying to read the same byte: if it takes a long time to read then it wasn’t in cache, if it doesn’t take long then it was. The difference in timing is all the attacker needs to know what occurred in the branch the CPU should never have executed.
To turn this into an exploit that actually reads illegal memory is easy. Just repeat this process over and over again once per single bit of illegal memory that you are reading. Each single bit’s 1 or 0 can be translated into the presence or absence of an item in the CPU cache which is ‘read’ using the timing trick above.
Although that might seem like a laborious process, it is, in fact, something that can be done very quickly enabling the dumping of the entire memory of a computer. In the real world it would be impractical to hike down the path and get zapped by the Men In Black in order to leak details of the aliens (their color, size, language, etc.), but in a computer it’s feasible to redo the branch over and over again because of the inherent speed (100s of millions of branches per second!).
And if an attacker can dump the memory of a computer it has access to the crown jewels: what’s in memory at any moment is likely to be very, very sensitive: passwords, cryptographic secrets, the email you are writing, a private chat and more.
I hope that this helps you understand the essence of Meltdown and Spectre. There are many variations of both attacks that all rely on the same ideas: get the CPU to speculatively run some code (through branch prediction or another technique) that illegally accesses memory and extract information using a timing side channel through the CPU cache.
Acknowledgements: I’m grateful to all the people who read this and gave me feedback (including gently telling me the ways in which I didn’t understand branch prediction and speculative execution). Thank you especially to David Wragg, Kenton Varda, Chris Branch, Vlad Krasnov, Matthew Prince, Michelle Zatlyn and Ben Cartwright-Cox. And huge thanks for Kari Linder for the illustrations.