A couple of years ago Netflix introduced a concept they called the “Simian Army”. The idea was to have a set of automated processes that test their cloud’s resilience to various failure scenarios. A prime example is the “Chaos Monkey”, which randomly shuts down servers in their infrastructure to test the application’s ability to withstand server failures. When you know that a Chaos Monkey is running free in your infrastructure and your service stays up, you know that you can handle server failure effectively. We think that a similar approach applies to securing cloud infrastructure.
What is the Infection Monkey?
The Infection Monkey is a tool that spins up an infected virtual machine inside random parts of your cloud infrastructure. Inside means behind the firewall and whatever other perimeter defences you may have. The machine itself is infested with all sorts of aggressive malware that will actively try to spread and infect everything around it. We make sure to infest the monkey VM with the latest and “greatest” viruses out there, stripped of their destructive payloads. Just like the original Netflix Chaos Monkey, the Infection Monkey runs only within a predefined time frame.
Why release the Infection Monkey into your cloud?
Security breaches happen all the time. They never happen exactly the way you expected, planned for or defended against. Your infrastructure should be able to withstand a breach of the exterior security layer and handle the infection of internal servers. Cloud security needs to be designed for a perimeter breach just as cloud apps need to be designed for server failure. The way to know that you are indeed ready and safe is to periodically release the Infection Monkey inside your cloud.
When should I let the Infection Monkey run loose?
While a real breach will happen at the worst possible moment, the Infection Monkey will wake up in the middle of a business day, in a carefully monitored environment with engineers standing by to address any problem.
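To make the scheduling idea concrete, here is a minimal Python sketch of the release decision: run only during a weekday business-hours window, and pick a random internal segment as the starting point. Everything in it is an assumption for illustration. The window of 10:00 to 16:00, the segment names, and the functions `in_release_window` and `plan_release` are hypothetical and do not describe how the actual tool is built.

```python
import random
from datetime import datetime, time

# Hypothetical release window (assumption, not taken from the tool itself).
WINDOW_START = time(10, 0)
WINDOW_END = time(16, 0)


def in_release_window(now: datetime) -> bool:
    """Return True on weekdays (Mon-Fri) between WINDOW_START and WINDOW_END."""
    is_weekday = now.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and WINDOW_START <= now.time() < WINDOW_END


def plan_release(now: datetime, segments: list, rng=random):
    """Pick a random internal segment to release the monkey into.

    Returns None when we are outside the window or there is nothing to target,
    so the caller never releases it while nobody is watching.
    """
    if not segments or not in_release_window(now):
        return None
    return rng.choice(segments)


if __name__ == "__main__":
    # Hypothetical segment names, for illustration only.
    segments = ["web-tier", "app-tier", "db-tier"]
    target = plan_release(datetime.now(), segments)
    print(target if target else "outside release window, staying caged")
```

The point of returning `None` outside the window is that the default is to do nothing: the monkey only runs when engineers are around to watch it.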
Are you afraid to release the infected monkey?
If we give you the monkey, will you run it inside your cloud infrastructure? Will you feel confident that it will not inflict a tremendous amount of damage?
Probably not. Unfortunately, in most cases today, if a VM inside the data center gets infected, there is very little to stop it from spreading the infection to the servers around it.
We believe you should have a system in place that can detect an infection inside your data center, understand the semantics of the attack, mitigate its spread and remediate infected hosts. All in real time. Recent changes in data center infrastructure make the creation of such a system completely realistic. We believe that modern data centers should deploy security mechanisms that do not rely on the assumption that no host past the perimeter ever gets compromised. Rather, they should have a built-in “immune system” that can handle breaches as they happen. These mechanisms should be in place and actively tested all the time.