- Guardicore Labs, in collaboration with SafeBreach Labs, found a critical vulnerability in Hyper-V’s virtual network switch driver (vmswitch.sys).
- Hyper-V serves as the underlying virtualization technology for Azure – Microsoft’s public cloud.
- The vulnerability allows for both Remote Code Execution (RCE) and Denial-of-Service (DoS). Exploiting it allowed an attacker with an Azure virtual machine to take down whole regions of the cloud as well as run arbitrary code on the Hyper-V host.
- The vulnerability first appeared in a vmswitch build from August 2019, suggesting this bug might have been in production for over a year and a half.
- In May 2021, Microsoft assigned the vulnerability CVE-2021-28476 with a CVSS score of 9.9 and released a patch for it.
- The vulnerability was found by Guardicore’s Ophir Harpaz and SafeBreach’s Peleg Hadar using an in-house developed fuzzer named hAFL1. The research process is discussed in their Black Hat USA 2021 talk and is detailed in a dedicated blog post.
Impact & Consequences
The vulnerability lies in vmswitch.sys – Hyper-V’s network switch driver. It is triggered by sending a specially crafted packet from a guest virtual machine to the Hyper-V host and can be exploited to obtain both DoS and RCE.
The security flaw first appeared in a build from August 2019, suggesting that the bug was in production for more than a year and a half. It affected Windows 7, 8.1 and 10 and Windows Server 2008, 2012, 2016 and 2019.
Hyper-V is Azure’s hypervisor; for this reason, a vulnerability in Hyper-V entails a vulnerability in Azure, and can affect whole regions of the public cloud. Triggering denial of service from an Azure VM would crash major parts of Azure’s infrastructure and take down all virtual machines that share the same host.
With a more complex exploitation chain, the vulnerability can grant the attacker remote code execution capabilities. These, in turn, render the attacker omnipotent; with control over the host and all VMs running on top of it, the attacker can access personal information stored on these machines, run malicious payloads, etc.
Understanding The Bug
vmswitch - a Paravirtualized Device
In Hyper-V terminology, the host operating system runs in the “Root Partition” and any guest operating system runs inside a “Child Partition.” To provide the child partitions with interfaces to hardware devices, Hyper-V makes extensive use of paravirtualized devices. With paravirtualization, a VM knows it is virtual; both the VM and the host use modified hardware interfaces, resulting in much better performance. One such paravirtualized device is the networking switch, which was our research target.
Each paravirtualized device consists of two components:
- A virtualization service consumer (VSC) which runs in the child partition. netvsc.sys is the networking VSC.
- A virtualization service provider (VSP) which runs in the root partition. vmswitch.sys is the networking VSP.
The two components talk to each other over VMBus – an inter-partition communication protocol based on hypercalls.
netvsc (the networking consumer) communicates with vmswitch (the provider) over VMBus, using packets of type NVSP. These packets serve various purposes: initializing and establishing a VMBus channel between the two components, configuring various parameters of the communication, and sending data to the Hyper-V host or other VMs. NVSP comprises many different packet types; one of them is NVSP_MSG1_TYPE_SEND_RNDIS_PKT, which is used to send RNDIS packets.
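To make the NVSP framing more concrete, here is a minimal sketch of how a NVSP_MSG1_TYPE_SEND_RNDIS_PKT message could be packed. The numeric value of the message type, the field names and the field order are assumptions modeled on the public Linux netvsc driver headers, not taken from vmswitch itself; the helper names are hypothetical.

```python
import struct

# Assumed values, modeled on the Linux netvsc driver's public headers.
NVSP_MSG1_TYPE_SEND_RNDIS_PKT = 107
CHANNEL_TYPE_DATA = 0       # RNDIS data packets
CHANNEL_TYPE_CONTROL = 1    # RNDIS control messages (init, set, query, ...)
NO_SEND_BUFFER_SECTION = 0xFFFFFFFF  # "payload is not in the shared send buffer"


def pack_send_rndis_pkt(channel_type,
                        section_index=NO_SEND_BUFFER_SECTION,
                        section_size=0):
    """Pack a simplified NVSP 'send RNDIS packet' message.

    Assumed layout (little-endian, four 32-bit fields):
    message type, channel type, send-buffer section index, section size.
    """
    return struct.pack("<IIII",
                       NVSP_MSG1_TYPE_SEND_RNDIS_PKT,
                       channel_type,
                       section_index,
                       section_size)


def unpack_nvsp_type(message):
    """Return the NVSP message type stored in the first 32-bit field."""
    (msg_type,) = struct.unpack_from("<I", message, 0)
    return msg_type


if __name__ == "__main__":
    msg = pack_send_rndis_pkt(CHANNEL_TYPE_CONTROL)
    print(msg.hex(), unpack_nvsp_type(msg))
```

In this simplified picture, the NVSP message only announces that an RNDIS message follows; the RNDIS payload itself travels alongside it over VMBus.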
RNDIS and OIDs
RNDIS (Remote NDIS) defines a message protocol between a host computer and a Remote NDIS device over abstract control and data channels. In a Hyper-V setup, the “host computer” is (quite confusingly) a guest VM, the “Remote NDIS device” is vmswitch or the external network adapter, and the “abstract communication channel” is VMBus.
RNDIS has various message types as well – init, set, query, reset, halt, etc. When a VM wishes to set (or query) certain parameters of its network adapter, it sends vmswitch an OID request – a message with the relevant object identifier (OID) and its parameters. Two examples of such OIDs are OID_GEN_MAC_ADDRESS which is used to set the adapter’s MAC address, and OID_802_3_MULTICAST_LIST which is used to set the adapter’s current multicast address list.
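As an illustration of what an OID set request looks like on the wire, the sketch below builds and parses an RNDIS set message for OID_802_3_MULTICAST_LIST. The field layout follows the published RNDIS message format as we understand it (the information buffer offset is counted from the RequestId field); the function names and the sample multicast addresses are hypothetical.

```python
import struct

RNDIS_MSG_SET = 0x00000005
OID_802_3_MULTICAST_LIST = 0x01010103

_HEADER = struct.Struct("<II")          # MessageType, MessageLength
_SET_REQUEST = struct.Struct("<IIIII")  # RequestId, Oid, InformationBufferLength,
                                        # InformationBufferOffset, Reserved


def build_set_request(request_id, oid, info):
    """Serialize an RNDIS set message carrying `info` as the OID parameters."""
    # The offset is measured from the start of the RequestId field, so the
    # buffer begins right after the five 32-bit set-request fields.
    info_offset = _SET_REQUEST.size
    total_length = _HEADER.size + _SET_REQUEST.size + len(info)
    return (_HEADER.pack(RNDIS_MSG_SET, total_length)
            + _SET_REQUEST.pack(request_id, oid, len(info), info_offset, 0)
            + info)


def parse_set_request(message):
    """Parse a set message and return (request_id, oid, info), with bounds checks."""
    msg_type, msg_length = _HEADER.unpack_from(message, 0)
    if msg_type != RNDIS_MSG_SET or msg_length != len(message):
        raise ValueError("not a well-formed RNDIS set message")
    request_id, oid, info_len, info_offset, _reserved = \
        _SET_REQUEST.unpack_from(message, _HEADER.size)
    start = _HEADER.size + info_offset
    if start + info_len > len(message):
        raise ValueError("information buffer exceeds message bounds")
    return request_id, oid, message[start:start + info_len]


if __name__ == "__main__":
    # Two sample multicast MAC addresses, six bytes each.
    multicast_list = bytes.fromhex("01005e000001") + bytes.fromhex("333300000001")
    message = build_set_request(1, OID_802_3_MULTICAST_LIST, multicast_list)
    print(parse_set_request(message))
```

The parser checks lengths and offsets before slicing, which is the kind of validation a receiver of guest-controlled messages has to perform for every field it consumes.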
Virtual Switch Extensions
vmswitch, Hyper-V’s virtual switch, is also called “Hyper-V Extensible Switch”. Its extensions are NDIS filter drivers or Windows Filtering Platform (WFP) drivers that run inside the switch itself and can capture, filter or forward the packets that they process. The Hyper-V Extensible Switch has a control path for OID requests as shown in the following figure:
A Notorious OID
Some OID requests are destined to the external network adapter, or other network adapters connected to vmswitch. Such OID requests include, for example, hardware offloading, Internet Protocol security (IPsec) and single root I/O virtualization (SR-IOV) requests.
When these requests arrive at the vmswitch interface, they are encapsulated and forwarded down the extensible switch control path using a special OID of type OID_SWITCH_NIC_REQUEST. A new OID request is formed as an NDIS_SWITCH_NIC_OID_REQUEST structure, whose member OidRequest points to the original OID request. The resulting message goes through the vmswitch control path until it reaches its destination driver. The flow can be seen in the diagram below.
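The encapsulation described above can be modeled in a few lines. This is a simplified sketch, not the real NDIS definition: the classes below keep only the port and NIC index fields and the OidRequest member, OIDs are represented by their names rather than numeric values, and the helper names and the sample inner OID are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class OidRequest:
    """Toy stand-in for an NDIS OID request: an identifier plus its parameters."""
    oid: str
    info: Any


@dataclass
class SwitchNicOidRequest:
    """Simplified model of NDIS_SWITCH_NIC_OID_REQUEST (structure header omitted)."""
    source_port_id: int
    source_nic_index: int
    destination_port_id: int
    destination_nic_index: int
    oid_request: Optional[OidRequest]  # in the real structure this is a pointer


def encapsulate(inner, destination_port_id, destination_nic_index):
    """Wrap `inner` in a new OID_SWITCH_NIC_REQUEST bound for a specific NIC."""
    wrapper = SwitchNicOidRequest(
        source_port_id=0,
        source_nic_index=0,
        destination_port_id=destination_port_id,
        destination_nic_index=destination_nic_index,
        oid_request=inner,
    )
    return OidRequest(oid="OID_SWITCH_NIC_REQUEST", info=wrapper)


def unwrap(outer):
    """Return the original request carried inside an OID_SWITCH_NIC_REQUEST."""
    if outer.oid != "OID_SWITCH_NIC_REQUEST":
        raise ValueError("not an encapsulated switch NIC request")
    return outer.info.oid_request


if __name__ == "__main__":
    original = OidRequest(oid="OID_EXAMPLE_OFFLOAD_SETTING", info={"enabled": True})
    forwarded = encapsulate(original, destination_port_id=1, destination_nic_index=0)
    print(unwrap(forwarded) is original)
```

The important detail is that the outer request does not copy the inner one; it only refers to it, which is why the value of that reference matters so much in the next section.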
The Buggy Code
While processing OID requests, vmswitch traces their content for logging and debugging purposes; this also applies to OID_SWITCH_NIC_REQUEST. However, because this request encapsulates another one, vmswitch must handle it specially and dereference OidRequest to trace the inner request as well. The bug is that vmswitch never validates the value of OidRequest and can thus dereference an invalid pointer.
The following steps lead to the vulnerable function in vmswitch:
- The message is first processed by RndisDevHostControlMessageWorkerRoutine – a generic RNDIS message handler function.
- vmswitch identifies a set request and passes the message to a more specific handler – RndisDevHostHandleSetMessage.
- Later on, the message is passed to VmsIfrInfoParamsNdisOidRequestBuffer. The function is responsible for tracing the message parameters using IFR (Inflight Trace Recorder) – a Windows tracing feature which enables logging binary messages in real-time.
- Finally, the packet reaches VmsIfrInfoParams_OID_SWITCH_NIC_REQUEST, which is “specialized” to trace requests of type OID_SWITCH_NIC_REQUEST and their respective structure NDIS_SWITCH_NIC_OID_REQUEST.
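The path above, and the missing check at its end, can be illustrated with a toy model. This is a conceptual sketch only: it collapses the handler chain into one function, represents kernel memory as a dictionary of mapped addresses, and uses an exception to stand in for a host crash. None of the names below are actual vmswitch code.

```python
class HostCrash(Exception):
    """Stands in for a bugcheck of the Hyper-V host in this toy model."""


class ToyKernelMemory:
    """A dictionary of 'mapped' addresses; reading anything else crashes."""

    def __init__(self):
        self._mapped = {}

    def map(self, address, value):
        self._mapped[address] = value

    def read(self, address):
        if address not in self._mapped:
            raise HostCrash(f"invalid read at {address:#x}")
        return self._mapped[address]


def trace_switch_nic_request(memory, oid_request_pointer):
    """Toy tracer for OID_SWITCH_NIC_REQUEST.

    Mirrors the bug: the pointer is dereferenced with no validation at all.
    """
    inner_oid = memory.read(oid_request_pointer)
    return f"trace: inner OID {inner_oid}"


def handle_set_message(memory, oid, info):
    """Collapsed stand-in for the handler chain that ends in the tracer."""
    if oid == "OID_SWITCH_NIC_REQUEST":
        return trace_switch_nic_request(memory, info["OidRequest"])
    return f"trace: OID {oid}"


if __name__ == "__main__":
    memory = ToyKernelMemory()
    memory.map(0x1000, "OID_802_3_MULTICAST_LIST")
    # A request built by the host itself carries a pointer to a real request.
    print(handle_set_message(memory, "OID_SWITCH_NIC_REQUEST", {"OidRequest": 0x1000}))
    # A request whose pointer field holds an arbitrary value takes the host down.
    try:
        handle_set_message(memory, "OID_SWITCH_NIC_REQUEST", {"OidRequest": 0x4141414141414141})
    except HostCrash as crash:
        print("host crashed:", crash)
```

In the model, as in the real driver, the tracer behaves correctly as long as the pointer was produced by trusted code; the failure appears only when the field's content is chosen by someone else.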
netvsc – the networking virtualization service consumer (VSC) – does not send OID requests with OID_SWITCH_NIC_REQUEST. Nonetheless, a design flaw causes vmswitch to accept and process such a request even if it comes from a guest VM. This allowed us to trigger the arbitrary pointer dereference bug in the tracing mechanism by sending an RNDIS set message with OID_SWITCH_NIC_REQUEST directly from a guest VM.
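One way to picture the design flaw is as a missing origin check. The sketch below shows a hypothetical guard that refuses host-internal OIDs when they arrive from a guest; it is an illustration of the idea only and is not a description of how Microsoft's patch works. The set contents and function name are assumptions.

```python
# OIDs that, in this toy model, only the host's own switch code may generate.
HOST_INTERNAL_OIDS = frozenset({"OID_SWITCH_NIC_REQUEST"})


def accept_set_request(oid, from_guest):
    """Decide whether a set request should be processed further.

    Hypothetical policy: a guest may never submit an OID that is meant to be
    created inside the host, regardless of what its payload contains.
    """
    if from_guest and oid in HOST_INTERNAL_OIDS:
        return False
    return True


if __name__ == "__main__":
    print(accept_set_request("OID_802_3_MULTICAST_LIST", from_guest=True))
    print(accept_set_request("OID_SWITCH_NIC_REQUEST", from_guest=True))
    print(accept_set_request("OID_SWITCH_NIC_REQUEST", from_guest=False))
```

Rejecting the request at the boundary means the tracing code never sees a guest-controlled OidRequest value, so the dereference bug is no longer reachable from a VM even if the tracer itself stays unchanged.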
This ability can be the basis for two exploitation scenarios. If the OidRequest member contains an invalid pointer, the Hyper-V host will simply crash. Another option is to make the host’s kernel read from a memory-mapped device register. Such a read can trigger additional, device-specific side effects, which a more complex exploitation chain could leverage for code execution. RCE on a Hyper-V host would enable an attacker to do as she wishes – read sensitive information, run malicious payloads with high privileges, etc.
What made this vulnerability so lethal is the combination of a hypervisor bug – an arbitrary pointer dereference – with a design flaw allowing a too-permissive communication channel between the guest and the host.
Vulnerabilities like CVE-2021-28476 demonstrate the risks that a shared resource model (e.g. a public cloud) brings. Indeed, in cases of shared infrastructures, even simple bugs can lead to devastating results like denial of service and remote code execution.
Vulnerabilities in software are inevitable, and this statement holds for public cloud infrastructure as well. This strengthens the importance of a hybrid cloud strategy, one which does not put all eggs in one basket (or all instances in one region). Such an approach will facilitate the recovery from DoS attack scenarios, and proper segmentation will prevent full compromise in case of a region takeover.