As true believers in collaboration in the cyber industry, we continue to open a window to our interesting projects. We hope it will benefit the community and encourage others to do the same. Several months ago we published the source code for our Infection Monkey project and today we are revealing how we built our Windows Agent to support GuardiCore Reveal, the data center and cloud visibility and segmentation policy component of our flagship product, GuardiCore Centra.
Building a lightweight and highly scalable agent for Windows
We are using agents as one method of data collection. There are several others that we use, allowing customers to decide what is best for their environment or use case. Agents collect process information, network activity and additional metadata from any kind of Windows machine. Our Windows agent has a very small footprint, handles heavily loaded machines without missing any event and works without installing any drivers. This was made possible with Event Tracing for Windows (ETW).
ETW is a dynamic diagnostics and monitoring component in Microsoft Windows, for both hardware and software. It was first introduced in Windows 2000 and has been expanded and improved significantly since then. Although some people consider it the worst API ever made, we decided to take the challenge and check it out.
ETW is a tracing mechanism based on events that are raised by kernel-mode drivers and user-mode applications and can be consumed by any interested process. It follows a provider/consumer model that was designed for high performance and fits countless use cases.
How GuardiCore leveraged Microsoft Event Tracing for Windows (ETW)
The first step was to find relevant ETW providers. Finding a provider that will notify us about each process creation in the operating system was quite an easy task, as it is usually the basic example in the ETW HOWTOs. The provider “Microsoft-Windows-Kernel-Process” does exactly what we needed and is very easy to use.
However, gathering the network activity information required some research about the right provider. There are 1,114 ETW providers on my Win10 laptop, 747 on one of my Win2012 machines and the list goes on and on…
Two specific tools came in handy:
- Logman.exe – a command-line tool that ships with Windows by default. It is a very capable tool for querying providers, creating ETW sessions and more.
- Windows Message Analyzer – A powerful tool for capturing and analyzing a variety of protocols and Windows events. Also, you might find this one particularly useful as it lets you create network captures without installing WinPcap.
We did several rounds of trial and error with many providers, creating controlled network traffic, recording and analyzing it with Message Analyzer, until we found that “Microsoft-Windows-TCPIP” is suitable for the job, given the right configuration*.
* This specific provider doesn’t actually collect all network connections, so we are currently using several more providers for traffic Windows handles differently – SMB for example.
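One round of that trial-and-error loop can also be scripted with Logman instead of a GUI tool. The sketch below is only an illustration, not the procedure we followed: it builds `logman start ... -ets` and `logman stop ... -ets` commands for a single provider and records to an .etl file while you generate your controlled traffic. The session name, file name and helper names are hypothetical, and the commands need an elevated prompt on Windows.

```python
import subprocess
import time


def start_trace_command(session, provider, output_file):
    """Build the logman command that starts a trace session for one provider.

    -p selects the provider, -o the .etl output file, and -ets sends the
    command directly to the event trace session without saving or scheduling.
    """
    return ["logman", "start", session, "-p", provider, "-o", output_file, "-ets"]


def stop_trace_command(session):
    """Build the logman command that stops the session started above."""
    return ["logman", "stop", session, "-ets"]


def record_provider(session, provider, output_file, seconds):
    """Record one provider for a fixed time window (Windows, elevated prompt)."""
    subprocess.run(start_trace_command(session, provider, output_file), check=True)
    try:
        # Generate the controlled network traffic while the session is running.
        time.sleep(seconds)
    finally:
        # Always stop the session, even if the wait is interrupted.
        subprocess.run(stop_trace_command(session), check=True)
```

For example, `record_provider("tcpip-probe", "Microsoft-Windows-TCPIP", "tcpip-probe.etl", 30)` would capture thirty seconds of events from the provider under test into a file that can be inspected afterwards.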
We mapped the right event IDs for network connection creation, along with the other fields needed for our agent. The next phase was to plan and implement our own consumer.
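Conceptually, that mapping boils down to a table from (provider, event ID) to the code that extracts the fields we care about. The Python sketch below shows the idea only. It is not our consumer. Event ID 1 is assumed here to be the process-start event of Microsoft-Windows-Kernel-Process, the TCPIP event ID is a deliberately fake placeholder, and all field names are illustrative assumptions.

```python
from collections import namedtuple

# A decoded ETW event, reduced to what this sketch needs.
Event = namedtuple("Event", ["provider", "event_id", "fields"])

# Placeholder only: NOT a verified Microsoft-Windows-TCPIP event ID.
HYPOTHETICAL_CONNECT_EVENT_ID = 99999

# (provider name, event ID) -> handler function
HANDLERS = {}


def handles(provider, event_id):
    """Decorator that registers a handler for one provider/event ID pair."""
    def register(func):
        HANDLERS[(provider, event_id)] = func
        return func
    return register


@handles("Microsoft-Windows-Kernel-Process", 1)
def on_process_start(fields):
    # Field names are assumptions made for this illustration.
    return {"kind": "process_start", "pid": fields["ProcessID"], "image": fields["ImageName"]}


@handles("Microsoft-Windows-TCPIP", HYPOTHETICAL_CONNECT_EVENT_ID)
def on_connection(fields):
    # Field names are assumptions made for this illustration.
    return {"kind": "connection", "pid": fields["Pid"], "remote": fields["RemoteAddress"]}


def dispatch(event):
    """Return a normalized record, or None for events nobody registered for."""
    handler = HANDLERS.get((event.provider, event.event_id))
    return handler(event.fields) if handler else None
```

Supporting a new provider then means adding one more decorated handler, which is the kind of agility we were after when choosing how to build the consumer.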
GuardiCore ETW consumer
The GuardiCore guest agent is written in C, so using the ETW C++ API seemed natural. However, we wanted to keep expanding the agent’s capabilities with more ETW providers, and C isn’t agile enough for that. Fortunately, in 2013 Microsoft released TraceEvent, an open source project which allows using ETW in a .NET environment. Using TraceEvent is simple and straightforward. After minor modifications to make it work with .NET 3.5 (so it would run on vanilla Win2008R2), it took us only a little more than 1,000 lines of C# code to create our ETW consumer, called “GuardiCore WinDig”.
WinDig can be executed independently from the command line and immediately monitors processes and network activity on your machine. It is lightweight in size, memory usage and CPU. We keep adding functionality to it and collecting more useful data for the user. For example, we show the specific service name that created a network connection, and not “svchost”, which is meaningless. Another example is showing the particular IIS website name that handled a connection instead of “IIS”, which is too general.
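To illustrate the svchost point, here is a small Python sketch that maps process IDs to hosted service names by parsing the output of `tasklist /svc /fo csv`. This is not how WinDig resolves services. It is a stand-in that assumes the three-column CSV layout (image name, PID, services) and the English “N/A” marker for processes without services, and the helper names are made up.

```python
import csv
import io


def parse_tasklist_services(csv_text):
    """Return {pid: [service names]} from `tasklist /svc /fo csv` output."""
    services_by_pid = {}
    for row in csv.reader(io.StringIO(csv_text)):
        # Skip the header row, blank lines and anything without a numeric PID.
        if len(row) != 3 or not row[1].isdigit():
            continue
        _image, pid, services = row
        if services == "N/A":  # assumption: English-locale marker for "no services"
            continue
        services_by_pid[int(pid)] = [name.strip() for name in services.split(",")]
    return services_by_pid


def describe_process(pid, image, services_by_pid):
    """Prefer 'svchost.exe (Dnscache)' over a bare 'svchost.exe' when possible."""
    services = services_by_pid.get(pid)
    if services:
        return "{} ({})".format(image, ", ".join(services))
    return image
```

When a single svchost process hosts several services, this only narrows the answer down to a list of candidates, so pinpointing the exact service takes more work than this sketch shows.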
Testing for high scalability
During the development process, we tested the agent on thousands of machines and it was very stable. It is running in production and delivers good results. The information it collects allows us to deliver the deepest levels of application and network visibility with an extremely efficient solution. We will definitely expand our usage of ETW in the future and make our agents even more appealing to the user.