At any given moment, attack and defense are locked in a cat-and-mouse game in which each side gains only a momentary advantage. Over the past few months, defense has been playing catch-up with what appears to be a serious hardware bug.
Current speculation suggests that a serious CPU bug might allow code running in user space to read kernel-space memory. Such a capability would make it much easier for attackers to exploit other security bugs that exist in the system or to read sensitive system data. Another theory is that the bug lets one virtual machine read the memory of another virtual machine. This attack vector endangers virtual environments such as Amazon EC2 and Azure Hyper-V, where multiple tenants can co-exist on a single physical machine.
What’s going on?
In the past few days, both Azure and Amazon Web Services announced emergency maintenance windows. We believe that this ties into a series of rapid changes in key parts of both the Windows and Linux kernels.
Since October, developers of both operating systems have been working on a change to a key part of their core memory management code. Changes in this area of the code are incredibly rare and usually amount to small tweaks. To have such a large change coming in with so little discussion is unheard of.
Such a change would be interesting regardless of timing, but several signs indicate that we’re seeing a mitigation for a serious CPU bug capable of breaking the fundamental security barrier between user applications and kernel data.
- First and foremost, the same core architectural change is landing in both the Windows and Linux kernels at the same time. To have both systems attempting to mitigate the same attack at the same time is rare and indicates a hardware issue.
- The speed of these changes. The patches began entering both kernels around November 2017 without any prior notice.
- Linus has recently merged the changes in a hurry, despite the fact that these patches are “fail safe” (and we all know Linus’s opinion on such patches).
- Last, these fixes are causing a noticeable performance hit. Different tests indicate a 5-30% slowdown depending on the use case.
We speculate that the issue is limited to Intel CPUs and does not affect AMD CPUs. According to an AMD developer: “AMD processors are not subject to the types of attacks that the kernel page table isolation feature protects against.” Having said that, it is unclear whether AMD chips are really off the hook, as it looks like both the Linux and Windows fixes will be applied to them too.
AMD CPUs are not affected
Based on this message, along with other partial proofs of concept, we can assume that there is a CPU bug that at minimum allows user programs to reliably read kernel memory and, we speculate, possibly even to manipulate its data.
What are these fixes all about?
A key assumption in modern operating systems is the separation between user applications and kernel code and data. This separation is enforced both in code structure and with hardware support. In hardware, user code is explicitly prevented from accessing kernel memory.
For performance reasons, all modern operating systems rely heavily on this hardware mechanism and allow user applications to share a memory address space with kernel data. When the kernel is called upon (for instance, through a system call), the current processor privilege level is changed and kernel data becomes accessible with little loss in performance.
For this reason, the new design is surprising. It almost entirely separates the user application address space from the kernel address space. The change is simple in concept (though hard to implement well): it splits the virtual address translation tables (known as page tables) so that when user code executes, the kernel address layout is entirely hidden and the CPU cannot access the data. This means that for every context switch, the kernel must reload its entire address space and wipe both address lookup and data caches, incurring a significant performance hit.
Commits in the Linux kernel repository point clearly to the conclusion that this change is a mitigation and that a complete solution will require new hardware.
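The difference between a shared and a split address space can be sketched with a toy model. This is only an illustration of the idea described here, not how Windows or Linux implement it: the `PageTable` class, the page numbers and the helper `describe_user_access` are all hypothetical names invented for this example.

```python
# Toy model of shared vs. split page tables. Illustrative only: every name
# and number here is hypothetical and does not come from any real kernel.


class PageFault(Exception):
    """Raised when a virtual page has no entry in the active page table."""


class PageTable:
    def __init__(self, mappings):
        # virtual page number -> (physical frame number, kernel_only flag)
        self.mappings = dict(mappings)

    def translate(self, vpage, user_mode):
        entry = self.mappings.get(vpage)
        if entry is None:
            raise PageFault(f"page {vpage:#x} is not mapped")
        frame, kernel_only = entry
        if kernel_only and user_mode:
            # The mapping exists; only the permission check stops the access.
            raise PermissionError(f"page {vpage:#x} is kernel-only")
        return frame


USER_PAGES = {0x10: (0x200, False)}
KERNEL_PAGES = {0xFFFF0: (0x001, True)}

# Shared layout: one table holds user and kernel entries, and the hardware
# permission check is the only barrier between them.
shared_table = PageTable({**USER_PAGES, **KERNEL_PAGES})

# Split layout: the table active while user code runs has no kernel entries.
user_table = PageTable(USER_PAGES)
kernel_table = PageTable({**USER_PAGES, **KERNEL_PAGES})


def describe_user_access(table, vpage):
    """Describe what happens when user-mode code touches vpage."""
    try:
        table.translate(vpage, user_mode=True)
        return "allowed"
    except PermissionError:
        return "mapped but blocked by permission check"
    except PageFault:
        return "not mapped"


if __name__ == "__main__":
    print("shared:", describe_user_access(shared_table, 0xFFFF0))
    print("split: ", describe_user_access(user_table, 0xFFFF0))
```

In the shared layout the kernel page is present and protected only by the permission check. In the split layout there is simply no translation for it while user code runs. The cost is that entering the kernel means switching to `kernel_table`, which is where the performance hit comes from.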
From cpufeatures.h in the Linux Kernel
At this point, it’s reasonable to say that the bug allows an attacker to reliably read kernel memory, allowing for easy disclosure of operating system secrets such as password hashes and bypassing the last decade of exploit mitigations such as Kernel Address Space Layout Randomisation.
If so, this could easily allow the compromise of shared container environments, where multiple tenants share a single operating system kernel. In addition, we speculate that in shared virtual environments such as Amazon EC2 and Azure Hyper-V, where multiple tenants can co-exist on a single physical machine, any CPU attack that can steal data from kernel memory can help compromise “adjacent” machines.
So what should you be doing?
If you have instances in a public cloud (AWS, Azure, GCP), expect downtime during the upcoming maintenance windows (AWS in a couple of days and Azure starting next week). For some instances, you might be eligible to perform self-service maintenance proactively in the coming days. More details should be available in the providers’ portals or in an email you probably received in the last couple of days.
Regarding your on-premises environment, be sure you have the means to rapidly patch your machines, and specifically your hypervisors. Take into account that applying these fixes will involve downtime.
As the upcoming patches are expected to mitigate a security flaw that was most probably caused by a CPU bug, mapping your environments’ CPU types and versions might help you later to find the machines that must be patched. For example, on Linux the following command shows the CPU vendor, which tells you whether the CPU is made by Intel:
cat /proc/cpuinfo | grep vendor | uniq
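For an inventory across many Linux hosts, the same check can be scripted. The following is a minimal sketch, assuming an x86 Linux machine whose `/proc/cpuinfo` contains `vendor_id` lines (other architectures may not have them). The function names `cpu_vendors` and `is_intel` are hypothetical helpers written for this example.

```python
from pathlib import Path


def cpu_vendors(cpuinfo_text):
    """Return the sorted, de-duplicated vendor_id values in /proc/cpuinfo text."""
    vendors = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "vendor_id":
            vendors.add(value.strip())
    return sorted(vendors)


def is_intel(vendors):
    """True if any CPU reports the Intel vendor string."""
    return "GenuineIntel" in vendors


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        found = cpu_vendors(cpuinfo.read_text())
        print("vendors:", ", ".join(found) or "none reported")
        print("intel:", is_intel(found))
    else:
        print("/proc/cpuinfo not found; this sketch only covers Linux")
```

Parsing the text in a separate function keeps the check easy to run against output collected from remote machines, rather than only against the local host.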
And on Windows machines,
Once the security issues are disclosed, operating system updates will be available for both Windows and Linux, and you should update your machines as soon as possible.
Why is this interesting?
We believe that the urgent maintenance notifications from the big cloud vendors, along with the embargoed vulnerability disclosures from hypervisor maintainers, mean that a serious, easily exploitable CPU flaw is about to be disclosed. We’ve been following these changes for the past few months and will inform you of any change in recommendations and of specific mitigations as they become available.
To make sure you get the latest, follow @GuardiCore on Twitter.