Trials and Tribulations – A Practical Look at the Challenges of Azure Security Groups and Flow Logs

Cloud Security Groups are the firewalls of the cloud. They are built-in and provide basic access control functionality as part of the shared responsibility model. However, Cloud Security Groups do not provide the same protection or functionality that enterprises have come to expect with on-premises deployments. While Next-Generation firewalls protect and segment applications on premises’ perimeter (mostly), AWS, Azure, and GCP do not mirror this in the cloud. Segmenting applications using Cloud Security Groups is done in a restricted manner, supporting only layer 4 traffic, ports and IPs. This means that to benefit from application-aware security capabilities with your cloud applications you will need an additional set of controls which is not available with the built-in functionality of Cloud Security Groups.

The basic function that Cloud Security Groups should provide is network separation, so they can be best compared to what VLANs provides on premises, Access Control Lists on switches and endpoint FWs. Unfortunately, like VLANs, ACLs and endpoint FWs, Cloud Security Groups come with similar ailments and limitations. This makes using them complex, expensive and ultimately ineffective for modern networks that are hybrid and require adequate segmentation. To create application aware policies, and micro-segment an application, you need to visualize application dependencies, which Cloud Security Groups do not support. Furthermore, if your application dependencies cross regions within the same cloud provider or between clouds and on premises, application security groups are ineffective by design. We will touch on this topic in upcoming posts.

In today’s post we will focus on a specific scenario and use case that is common to most organizations, discussing Cloud Security Groups and flow logs limitations within a specific vNet, and illustrating what Guardicore’s value is in this scenario.

Experiment: Simulate a SWIFT Application Migration to Azure

Let’s look at the details from an experiment performed by one of our customers during a simulation of a SWIFT application migration to Azure.

Our customer used a subscription in Azure, in the Southern region of Brazil. Within the subscription, there is a Virtual Network (vNet). The vNet includes a Subnet 10.0.2.0/24 with various application servers that serve different roles.

This customer attempted to simulate the migration of their SWIFT application to Azure given the subscription above. General segmentation rules for their migrated SWIFT application were set using both NSGs (Network Security Groups) & ASGs (Application Security Groups). These were used to administrate and control network traffic within the virtual network (vNet) and specifically to segment this application.

Let’s review the difference:

  • An NSG is the Azure resource that is used to enforce and control the network traffic. NSGs control access by permitting or denying network traffic. All traffic entering or leaving your Azure network can be processed via an NSG.
  • An ASG is an object reference within a Network Security Group. ASGs are used within an NSG to apply a network security rule to a specific workload or group of VMs. An ASG is a “network object,” and explicit IP addresses are added to this object. This provides the capability to group VMs into associated groups or workloads.

The lab setup:
The cloud setup in this experiment included a single vNet, with a single Subnet, which has its own Network Security Group (NSG) assigned.

ASGs

  • Notice that they are all contained within the same Resource Group, and belong to the Location of the vNet (Brazil South).

NSGs:

The following NSG rules were in place for the simulated migrated SWIFT Application:

  • Load Balancers to Web Servers, over specific ports, allow.
  • Web Servers to Databases, over specific ports, allow.
  • Deny all else between SWIFT servers.

The problem:

A SWIFT application team member in charge of the simulation project called the cloud security team telling them a critical backup operation had stopped working on the migrated application, and he suspects the connection is blocked. The cloud network team, at this point, had to verify the root cause of the problem, partially through process of elimination, out of several possible options:

  1. The application team member was wrong, it’s not a policy issue but a configuration issue within the application.
  2. The ASGs are misconfigured while NSGs are configured correctly.
  3. The ASGs are configured correctly but the NSGs are misconfigured or missing a rule.

The cloud team began the process of elimination. They used Azure flow logs to try to detect the possible blocked connections. The following is an example of such a log:

Using the Microsoft Azure Log Analytics platform, the cloud team sifted through the data, with no success. They were searching for a blocked connection that could potentially be the backup process. The blocked connection was non-detectable. The cloud team members therefore dismissed the issue as a misconfiguration in the application.

The SWIFT team member insisted it was not an application issue and several days passed with no solution, all while the SWIFT backup operation kept failing. In a live environment, this stalemate would have been a catastrophe, with team members likely working around the clock to find the blocked connection, or prove misconfiguration in the application itself. In many cases an incident like this would lead to removing the security policy for the sake of business continuity as millions of dollars are at stake daily.

After many debates and an escalation of the incident, it was decided- based on the Protect team’s recommendation- to leverage Guardicore Centra in the Azure cloud environment to help with the investigation and migration simulation project.

Using Guardicore Centra, the team used Reveal to filter for all failed connections related to the SWIFT application. This immediately revealed an attempted failed connection, between the SWIFT load balancer and the SWIFT databases. The connection failed due to missing allow security groups. There was no NSG in place to allow SWIFT LBs to talk to SWIFT DBs in the policy.

The filters in Reveal

 

Discovering the process

Guardicore was able to provide visibility down to the process level for further context and identification of the failed backup process.

Application Context is a Necessity

The reason the flow logs were inadequate to detect the connection was that IPs were constantly changing as the application scaled up and down and the migration simulation project moved forward. Throughout this, the teams had no context of when the backup operation was supposed to occur or what servers initiated these attempted connections, therefore the search came up empty handed. They were searching for what they thought would reveal the failed connections. As flow logs are limited to IPs and ports, they were unable to search based on application context.

The cloud team decided to use Guardicore Centra to manage the migration and segmentation of the SWIFT application simulation for ease of management and ease of maintenance. Additionally, they added process and user context to the rules for more granular security and testing. Guardicore Centra enabled comparing the on-premises application deployment with the cloud setup to make sure all configurations were in place.

The team then went on to use Guardicore Centra to simulate the SWIFT policy over real SWIFT traffic. Making sure they are not blocking additional critical services, and will not inadvertently block these in the future.

 

Guardicore Centra provided the cloud security team with:

  • Visibility and speed to detect the relevant blocked flows
  • Process and user context to identify the failed operation as the backup operation
  • Ability to receive real-time alerts on any policy violation
  • Applying process level rules & user level rules required for the critical SWIFT Application
  • Simulation and testing capabilities to simulate the policies over real application traffic before blocking

All of these features are not available in Azure. These limitations cause serious implications, such as the backup operation failure and no ability to adequately investigate and resolve the issue.

Furthermore, as part of general environment hygiene, our customer attempted to add several rules to govern the whole vNet, blocking Telnet and insecure FTP. For Telnet, our customer could add a block rule in Azure on port 23; For FTP, an issue was raised. FTP can communicate over high range ports that many other applications will need to use, how could it be blocked? Using Guardicore, a simple block rule over the ftpd process was put in place with no port restriction, immediately blocking any insecure ftp communication at process level regardless of the ports used.

Visibility is key to any successful application migration project. Understanding your application dependencies is a critical step, enabling setting up the application successfully in the cloud. Guardicore Centra provides rich context for each connection, powerful filtering capabilities, flexible timeframes and more. We collect all the flows, show successful, failed, and blocked connections, and store historical data, not just short windows of it, to be able to support many use cases. These include troubleshooting, forensics, compliance and of course, segmentation. This enables us to help our customers migrate to the cloud 30x faster and achieve their segmentation and application migration goals across any infrastructure.

Leave a Reply

Want to join the discussion?
Feel free to contribute!

Leave a Reply

Your email address will not be published. Required fields are marked *

CAPTCHA ImageChange Image

‹ Back to Guardicore Blog