Commonalities and differences between securing VM-based Workloads on Microsoft Azure and the Google Cloud Platform (GCP)
When moving from an on-premises data center to the cloud, architecting and engineering a secure network is the first challenge. The days of buying hardware and wiring devices are over. Instead, cloud network engineers configure the network with templates and web GUIs. A secure network setup reflects three needs:
- Workload structuring and separation, i.e., the high-level network design
- Fine-grain connectivity and security controls
- Enabling intra- and inter-company interactions between systems, applications, and components
For these three areas, we answer the following questions. First, which network architectural features do cloud vendors offer to reach these aims? What are the commonalities and differences between cloud providers such as Microsoft Azure or Google’s GCP? The latter helps architects to understand underlying principles rather than just product features.
The High-Level Network Design for Clouds
Four layers structure the cloud network security (Figure 1), with the second-highest being the most intuitive to understand. This second-highest layer corresponds to classic, simple on-prem networks. An IP range constitutes a network or network zone and gets a name and identifier. It is a moment of creation: let there be a network with the IP range 10.0.0.0 to 10.15.255.255 and call it “production environment headquarters.” In general, network architects should choose addresses from the private IP range, i.e., from 10.0.0.0 to 10.255.255.255, 172.16.0.0 to 172.31.255.255, or 192.168.0.0 to 192.168.255.255.
Zoning concepts typically reflect one or more of the following dimensions to structure the network landscape and workloads:
- Geographical locations: Germany, Switzerland, Singapore, and China – or New York, Boston, San Francisco
- Stages, for example, production, preproduction, test, and development
- Business lines, e.g., investment banking, retail banking, corporate banking, private banking, or wealth management
- Data sensitivity classes such as secret, confidential, internal, or public
IT departments subdivide network zones into subnets, each taking over a subset of the zone’s IP range. A zone with IP range 10.0.0.0 to 10.15.255.255, for example, cloud have two subnets: 10.0.0.0 to 10.0.255.255 and 10.4.0.0 to 10.4.63.255. Subnets allow for finer granular structuring.
Zoning is a complex task – and that is why cloud security architects have to support the design process. Should all VMs of an application reside in one subnet – or all VMs belonging to the same application or solution cluster? Should frontend and backend components reside in one subnet, or should they be in entirely different zones? Security architecture is one (important) stakeholder. Network architecture and business and enterprise architecture have a say as well.
Implementing High-Level Zoning Concepts in Azure
In Azure, VNets and Subnets are the features for implementing zones and subnets. A VNet’s IP range is coherent. IP addresses within a VNet are unique, though different VNets can have overlapping IP addresses. Within VNet A with all its subnets, there can be only one VM with IP 10.10.10.10. However, if an IT department has two VNets Z1 and Z2, both can contain one VM with IP 10.10.10.10. Furthermore, all resources of a VNet, such as VMs, reside in one region.
A fundamental security aspect is the connectivity between VNets and Subnets. Which VMs can talk with each other by default, which ones not, and which kind of shielding can engineers implement? Azure isolates VNets with their subnets and VMs from each other. Without explicit configuration, a VM in one VNet cannot connect to a VM in a different VNet. In contrast, VMs within a VNet can contact each other by default, even if residing in different Subnets. So, placing VMs in two subnets of the same VNet does not provide any extra level of security without additional (fine-granular level) measures we discuss later.
Implementing High-Level Zoning Concepts in GCP
VPC Networks and Subnets are GCP’s corresponding concepts for structuring a network and its IP range. VMs placed in a VPC Network, whether in the same or different GCP Subnets, can reach every other VM within the VPC. GCP allows VPCs to have overlapping IP ranges, as Figure 2 illustrates. In the example, there are VPC Networks with the IP address ranges 10.0.0.0/16 and one with 10.0.0.0/20, which obviously overlap.
A GCP Subnet is a feature preventing chaos in large VPC Networks. A Subnet takes over parts of the IP range of the VPC Network to which it belongs. All IPs are unique within a Subnet, implying that Subnet IP ranges do not overlap. In GCP, Subnets also have a geographical aspect. Resources within a GCP Subnet reside in one geographic GCP region.
The Management Layers
The need for a sophisticated management layer for organizing hundreds or more VNets or VPC Networks is evident when looking at the companies and customers the cloud vendors target. They do not want (only) small companies or startups paying for 5 or 10 VMs. They want the big fishes as well: corporations listed in the Swiss Market Index, the Eurostoxx 50, etc. These companies need thousands or ten-thousands of VMs. Two layers – zones such as VNets or VPCs plus Subnets – are insufficient for large clouds. Thus, the cloud providers introduced additional management layers on top.
When trying out the clouds the first time, many get the impression that accounts are the highest ordering principle, typically represented by an email address and an additional name or identifier. But cloud providers offer features for combining accounts to GPC Organizations in the Google world and Azure Management Groups for Microsoft. They help for governance and billing purposes or identities. These features help companies for which the move to the cloud was a decentral (chaotic) grass-root movement – or when company networks and cloud infrastructures evolve over the years due to mergers and acquisitions.
Below an Azure Subscription, aka an account, Microsoft provides the concept of Resource Groups, into which all VMs, VNets, and all other components are placed. Google has the concept of GCP Projects, which corresponds to the Azure Resource Groups. But there is one big difference: Google’s idea of GCP Folders, a tree-like flexible structure with multiple layers and nesting. GCP folders help in modeling and structuring workloads based on various dimensions such as business units, geographic regions, or stages with ease (Figure 3). In contrast, Azure has a rigid 3-levels model: organization, subscription, resource group.
The management layers typically provide options to enforce and monitor configurations (“policies”). However, here we focus on directly relevant network security configuration options. In this context, GCP has one related feature: Hierarchical Firewall Policies. GCP allows associating organizations and folders with such policies, which enforce specific firewall rules for all folders and VMs hierarchically lower (or to delegate decisions to lower levels) or restricting the applicability of rules to specific VMs by specifying target networks or service accounts.
Fine-Granular Access Control for Azure
Microsoft and Google follow different philosophies regarding fine-granular access control (and firewalls) within a VNet or VPC Network. Azure provides more – and more rigid – concepts making it essential for engineers to understand the interplay. GCP has fewer, though partially more general or flexible features.
The Azure World
Azure enables architects to secure networks with three concepts: network security groups, application security groups, and firewalls.
A network security group (NSG) is a firewall attached to Azure Subnets or individual VMs (specifically: a VM’s network interface), enabling engineers to control connectivity even within Azure Subnets. An NSG defines which IP addresses outside the Azure Subnet can reach which inside IP address – and vice versa. The rules can apply to individual IPs, groups of IPs, or the complete Subnet. If VMs serve different purposes – frontend, backend, or database servers – NSGs allow configuring their exposure to other subnets or the global internet differently, based on exact necessities and the risk appetite.
Application Security Group (ASG) are a refinement of NSGs and overcome their biggest downside: IP-based rules. IP addresses might change and/or if the number of frontend servers handling customer requests increases, all relevant rules in the NSG rule sets require a modification.
An ASG is a set of VMs with a dedicated tag. They are not (!) Identified based on an IP. Instead of opening the HTTPS port to 10.5.10.26 in a pure NSG setup, ASGs allow opening the HTPPS port for all VMs of the subnet belonging to the ASG “asg-frontend-vms”. The ASG might contain a VM with the IP 10.5.10.26, but any other VM within the subnet would be governed by the same rules as soon, and as long the VM is part of the ASG.
Figure 4 provides an example of an ASG – “asg-frontend-api-servers” – on the left. The sample VM on the right belongs to precisely one ASG: “asg-frontend-api-servers”.
An ASG impacts the VMs associated with it once the ASG becomes part of an NSG. Suppose an NSG allows HTTPS traffic to a VM handling incoming API requests. Then, rather than explicitly listing IPs in the NSG (and updating them when starting or stopping VMs), the engineers create an ASG and add the relevant VMs to this ASG.
In addition to VNets, Subnets, Network Security Groups, and Application Security Groups, Azure also allows configuring a dedicated security service, Azure Firewall. Such a firewall helps block unwanted traffic. It is a component typically placed at the perimeter between company VNets and the global internet. In contrast, VNets, Subnets, NSGs, and ASGs help control the company-internal network traffic.
The various components and features might be overwhelming when looked at the first time – and seem to be overlapping. That’s one way to see it. The other way to look at all these features is that managers and architects can achieve the same goal with overlapping features – and thus share responsibilities between teams more efficiently. A high level of network security when going live is necessary, but it is not enough. IT departments have to manage and maintain all network components to stay on the same security level in the years to come. More sophisticated and partially overlapping features make the work distribution more manageable.
Fine-Granular Access Control for GCP
NSGs on the Subnets and the VMs level together with ASGs – Azure provides various options for fine-granular connectivity configuration. In contrast, GCP provides firewalls (only) on the VPC level. Rules applying only to one VM are part of the VPC firewall rulesets. This minor technical finesse impacts network or firewall processes in IT organizations. Rules on the VPC level favor central firewall management, making it more challenging to delegate (some) firewall rule decisions to individual application teams.
Figure 6 illustrates options for defining VPC Firewall Rules. A rule can apply to all VMs of a VPC Network. It can apply to specifically tagged VMs (like the ASG concept in Azure) or VM with specific service accounts. The latter is a concept not found in Azure. When defining firewall rules, architects can configure the applicability further related to where the network traffic comes from based on IP ranges, source tags, or a service account within the same VPC.
To sum things up: firewall rules and connectivity in GCP and Azure provide different concepts. The result: different setups and organizational approaches. Just one final warning for GCP: The GCP portal presents default rules for opening SSH, ICMP, or even all ingress traffic. These rules simplify the first steps in GCP for newbies – and might result in risky configurations.
Peering
By default, VMs in different VNets and VPC Networks cannot communicate. However, there are solutions for inter-zone communication – and they do not require traffic via the internet: Azure VNet Peering and Google Cloud VPC Network Peering.
Suppose one zone has the IP range 10.0.0.0 to 10.0.255.255 and a second one 10.4.0.0 to 10.4.0.255, VMs, VMs can interact easily. No colliding IP addresses cause issues when determining an actual target VM for traffic. However, why should cloud architects peer zones, thereby allowing communications between VMs placed in two zones to isolate the workloads?
Peering is not beneficial for two-zone networks but for networks with many. GCP, for example, allows peering VPC Networks of the same or different GCP Projects. The VPC Networks can even belong to different GCP Organizations. VMs in peered VPC Networks can communicate with each other if not prevented by firewall rules. The twist is the non-transitive nature of peering. Suppose three VMs – VM1, VM2, and VM3 – are in three different VPC Networks: Z1, Z2, and Z3. Peering Z1 and Z2 plus Z2 and Z3 enables the communication between VM1 and VM2 and between VM2 and VM3, but not between VM1 and VM3. Non-transitive peering is vital when setting up typical hub-and-spoke network designs (Figure 7).
VNet Peering in Azure comes with different features and two main configuration options:
- Enabling inbound and/or outbound traffic separately – not just allowing or disabling bidirectional traffic as in GCP
- Restricting or approving traffic not originating in the peered VNet, also a feature not available in GCP
Peering is a simple yet powerful concept. It helps enable and control company-internal network communications, especially for setting up zones hosting communication middleware or management components.
Connecting VMs and the Internet
Public IPs for VMs are evil; this is the first message a security architect should hammer into all engineers’ minds. Vulnerabilities on the VM or application layers or simple misconfigurations are potential entry doors for attackers. However, locking up all VMs is also not possible. Admins cannot plugin keyboards to VMs to administer them. They come via the network as well.
The solution? Intermediate components or services that forward only parts of the network traffic (e.g., only TCP while blocking UDP traffic) and potentially even inspect the traffic. In particular, the following components and services are relevant:
- Network Address Translation (NAT) Services such as the GCP Cloud NAT or Azure Virtual Network NAT. They allow VMs without public IPs to invoke services on the internet without exposing the VMs to external attackers.
- Load Balancers such as the Microsoft Azure Load Balancer or Google’s Cloud Load Balancer. Their primary purpose is to distribute high loads to multiple backend VMs, hiding this fact for the invoking applications. At the same time, a load balancer reduces the attack surface, first by preventing direct access to VMs and, second, by limiting the traffic that gets through, e.g., to the HTTP(S) protocol.
- Web Application Firewalls restrict even the content of http(s) requests, preventing, e.g., attacks with malformed invocations.
- Azure Bastion allows admins to connect to and manage VMs without publicly exposing a VM with an IP. It works as follows: administrators connect first to the Azure portal. From there, the bastion host provides access to the VMs. A bastion host is a hardened server, making it less likely that (unneeded) components potentially have security issues.
These services are worth more profound analysis. Plus, for some cases, 3rd party solutions are an option, not only the ones from the cloud vendors. But all this is another long story, as is protecting VMs and their workloads against malware and other attacks.
To conclude, Microsoft Azure and Google’s GCP provide similar yet different features for creating and securing networks. Engineers aiming at getting applications to run on VMs benefit from their similarity. In contrast, security architects have a different perspective. They do not want to open up network connections. They must close the obvious and hidden wholes in the network design. For them, it is essential to understand all these little features and details where GCP and Azure – or any other public cloud – differ. It is less searching for easter eggs on a sunny day in spring. It is more isolating and avoiding mines in challenging terrain or stormy sea.