
AWS Backup for IaaS Workloads: The Basics

Use Cases & Technologies

Cloud providers market their platform-as-a-service features, e.g., for AI or IoT. In contrast, many companies currently focus on moving their existing Windows and Linux applications into the cloud. “Lift-and-shift” is the motto. Rearchitecting is a task for later. As you might remember, neither the rise of C nor Java caused the immediate death of COBOL applications. Thus, the ability to back up IaaS workloads is one of the most crucial topics when moving to the cloud – and AWS Backup is Amazon’s cloud-native solution for backups. How does it help?

What does the workload look like, and which use cases should be supported? With the focus on IaaS, the VMs with their disks, file storage, and object storage are in place. The latter is a web service, but object storage is the new main storage variant in the cloud for new components configured or developed in a public cloud environment.

AWS Backup supports a long list of AWS services, including the following IaaS-related ones:

EC2, AWS’s term for VMs
Amazon Elastic Block Store (Amazon EBS), aka block storage for EC2 instances
Amazon Elastic File System (Amazon EFS), a file system solution for EC2 instances running Linux, a technology typically needed by many traditional applications
Amazon FSx supporting various file systems, including Windows File Server or NetApp ONTAP
Amazon Simple Storage Service (Amazon S3) as the object storage service

So, this blog post covers how to organize backups for IaaS workloads running on Windows and Linux EC2 instances with attached EBS volumes, on which applications run that rely on EFS and FSx file systems for classic file storage, plus S3 buckets as the new dominant storage type in clouds. Figure 1 visualizes this.

Typical technical use cases for backup solutions are:

1. Ad-hoc backups, i.e., engineers make a manual copy before complex, risky changes
2. Periodic backups to prepare if something goes wrong and one has to restore a recent or long-ago state
3. Continuous backups for Point-in-Time Recovery (PITR) as a solution covering both use cases above

AWS links to the solutions for 1 and 2 directly on the dashboard page for AWS Backup in the AWS portal (Figure 2), whereas number 3 is part of the configuration options for 2.

A fourth aspect is geo-redundancy, important for disaster recovery events, such as the loss of complete data centers or region-wide blackouts (4). Finally, we look at Frameworks, a governance and compliance tool (5).

AWS Backup Plan

Setting up an AWS Backup Plan in AWS Backup means configuring one or more backup rules (Figure 3, 1) and one or more resource assignments (Figure 3, 2).

There are many ways to configure the various settings for backup rules. Thus, we focus on the settings for two scenarios. The first one is for periodic backups that have to be stored for years and which should be safe even in case of larger catastrophes. Thus, the configuration shown in Figure 4 sets the interval to four hours, whereas the retention time is 10 years, and a copy is stored in a different region than the original backup.

Figure 4: AWS Backup Plan for periodic backups with long retention and copy to another region
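
The rule from Figure 4 can also be written down as data. The following is a minimal sketch, assuming the request shape of the `create_backup_plan` call in boto3 (the AWS SDK for Python). The plan name, rule name, vault name, and destination vault ARN are hypothetical placeholders, and the code only builds the dictionary; it does not call AWS.

```python
# Sketch of a backup plan for the first scenario: a backup every four hours,
# ten years of retention, and a copy in a second region.
# All names and the ARN below are hypothetical placeholders.

RETENTION_DAYS = 10 * 365  # ten years, ignoring leap days


def periodic_backup_plan(vault_name, copy_vault_arn):
    """Build a dict in the shape that boto3's create_backup_plan expects."""
    rule = {
        "RuleName": "every-4-hours-keep-10-years",
        "TargetBackupVaultName": vault_name,
        # AWS cron fields: minute, hour, day-of-month, month, day-of-week, year
        "ScheduleExpression": "cron(0 0/4 * * ? *)",
        "Lifecycle": {"DeleteAfterDays": RETENTION_DAYS},
        "CopyActions": [
            {
                "DestinationBackupVaultArn": copy_vault_arn,
                "Lifecycle": {"DeleteAfterDays": RETENTION_DAYS},
            }
        ],
    }
    return {"BackupPlanName": "long-term-periodic", "Rules": [rule]}


plan = periodic_backup_plan(
    vault_name="primary-vault",
    copy_vault_arn="arn:aws:backup:eu-west-1:111122223333:backup-vault:dr-vault",
)
# With credentials in place, the plan could be submitted with:
#   boto3.client("backup").create_backup_plan(BackupPlan=plan)
```

Keeping the plan as plain data makes it easy to review the retention and copy settings in version control before anything is created in an account.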

The second scenario is about backups needed after operational mistakes. A wrong patch is deployed, an admin – by mistake – drops production data, etc. For this scenario, a continuous backup is perfect, allowing for point-in-time recovery (Figure 5). Just a warning: the fine print states that the continuous backup is available only for three AWS services: RDS, S3, and SAP HANA on EC2. For other services, the “hourly” periodicity counts. So, when looking at the IaaS services in AWS mentioned in the beginning, PITR is not available for EBS Volumes and EFS File Systems which are assigned to EC2 instances.

Figure 5: AWS Backup Rule with continuous backup for Point-in-Time Recovery (PITR)

The resource assignment defines what is in scope for the backup. It allows filtering based on the resource type (e.g., EBS, S3) and tags assigned to resources (Figure 6). Once resource assignments and backup rules are defined, everything is ready, and AWS performs the (next) backup as specified in the rule.

Figure 6: Resource assignments in AWS Backup
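
To make the filtering idea concrete, here is a small local simulation of a resource assignment that selects by resource type and tags. It is only an illustration of the selection logic described above, not how AWS implements it; the `Resource` class, the `in_scope` function, and the sample inventory are all made up for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str
    resource_type: str                      # e.g. "EBS", "S3", "EFS"
    tags: dict = field(default_factory=dict)


def in_scope(resource, resource_types, required_tags):
    """True if the resource has an allowed type and carries all required tags."""
    if resource.resource_type not in resource_types:
        return False
    return all(resource.tags.get(k) == v for k, v in required_tags.items())


# Hypothetical inventory
inventory = [
    Resource("vol-app-data", "EBS", {"backup": "long-term", "app": "billing"}),
    Resource("vol-scratch", "EBS", {"app": "billing"}),
    Resource("bucket-invoices", "S3", {"backup": "long-term"}),
    Resource("fs-shared", "EFS", {"backup": "long-term"}),
]

selected = [
    r.name
    for r in inventory
    if in_scope(r, resource_types={"EBS", "S3"}, required_tags={"backup": "long-term"})
]
```

The untagged scratch volume and the EFS file system fall out of scope, which is exactly the kind of gap worth checking when a tagging convention decides what gets backed up.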

Frameworks for AWS Backup

Backups help bring up an application or a complete data center again when something goes severely wrong. However, if backups are missing in such an emergency, this is not just bad luck but a result of insufficient organization, governance, and oversight. AWS helps to prevent such situations by providing Frameworks for AWS Backup. They define the expected state (or configurations) for backups, helping to identify gaps between how the world should be and how it is. It is a pretty exhaustive list, as Figure 7 illustrates.

Figure 7: Configuration options for AWS Backup Frameworks in the AWS Portal

On-Demand Backups

On-demand backups are easy to configure and start in AWS: Choose the resource type and a concrete instance, make sure that the retention period is set correctly (remember: too many unnecessary backups require a lot of storage), and maybe define the vault to be used for storing the backup, and you are done (Figure 8).

The most notable aspect – besides that there is a single GUI for creating on-demand backups for all relevant services – is that one can back up just one instance of one resource. Sufficient for small applications, but for anything slightly complex, the tag-based definition of what is part of a backup (as known from the periodic backups) is much more efficient.

Figure 8: On-demand Backups in AWS Backup

Additional Backup Services in AWS

AWS unites backup features for all resource types in the single AWS Backup GUI. Still, there are other options to make backups. Amazon Data Lifecycle Manager can back up EBS Volumes. The EFS-to-EFS backup solution is an option for file systems. And S3 has a versioning feature that allows restoring previous versions of an object; AWS Backup even requires versioning to be activated before it can back up a bucket. However, from a governance perspective, having all backups in one place – AWS Backup or a third-party solution – makes oversight easier. And do not forget: AWS Backup comes with this clever and easy-to-use feature for governance, AWS Backup Frameworks. Use it!

Backups for Selected Database-as-a-Service Services in Azure and AWS

Database-as-a-Service (DBaaS) is one step for IT departments to focus on software development, configuring and integrating third-party solutions, and maintaining and optimizing application operations rather than spending time with infrastructure and middleware topics. But even DBaaS does not make backups obsolete. Certainly, engineers should not dump production databases, but what if they do it by mistake? Also, DBaaS does not come with a 100% availability guarantee, even though you can minimize risks with redundancy – but not every database might be worth such ongoing costs. So, what are the options?

In the following, we look at backup features for selected DBaaS offerings:

Azure Cosmos DB
Azure SQL Server
Amazon DynamoDB

DBaaS implies that a web service provides database functionality via its API. Engineers and architects can only design backup strategies based on this API. So, what are the concrete features and options for the mentioned services?

Backups for Azure Cosmos DB

The backup options for most Cosmos DB variants – for NoSQL, MongoDB, Apache Cassandra, Table, and Apache Gremlin, excluding PostgreSQL – are the same. For them, the first configuration decision is between continuous and periodic backup. A periodic backup means that at regular points in time, Azure performs a backup (Figure 1, A). Customers can set the periodicity to any value between 1 and 24 hours (Figure 1, B). The retention period can be up to 30 days (Figure 1, C). Azure keeps backups for that long and then deletes them automatically. Alternatively, engineers can configure continuous backups (Figure 1, D). Then, they can roll back to any point in time within the last 7 or 30 days.

Figure 1: Backup Configurations for Azure Cosmos DB

To prepare for more extensive outages, natural disasters, larger electricity blackouts, etc., Azure copies every backup by default to a paired region, such as from Switzerland (North) to Switzerland (West) or from France (Central) to France (South). If configured via CLI, options are geo-, zone-, and locally redundant storage. Restores from the second region are not self-service but require a service ticket and Microsoft support. Thus, larger events might result in a high volume of service requests and heavy delays for the restoration. Geo-redundancy for the service itself (not just for backups) can be necessary for business-critical applications.

Azure Cosmos DB provides a sophisticated, easy-to-use solution for device failures (even the redundancy alone covers most cases) and operational mistakes, i.e., somebody realizes she deleted some data yesterday that she did not intend to delete. There are shortcomings in other use cases:

No support for ad-hoc backups. They are helpful, e.g., before deploying complicated patches that change the data model and data. Continuous backups cover them, but there is no option with periodic backups.
The absolute limit of 30 days retention time can be too short, depending on the risk appetite and the setup. Cyberattackers might be in the system for a while unnoticed. Data manipulations not causing immediate issues might not be covered. Also, operational mistakes go unnoticed if data is only used monthly, quarterly, or yearly for, e.g., dedicated end-of-month, end-of-quarter, or end-of-year processing.

The statement is not that there are no solutions. Exporting data by script or restoring it into a different database instance might work. However, it is not an out-of-the-box feature. It requires customer engineering – or even a third-party backup solution.

Backups for Azure SQL Server

Azure SQL Server is a managed service – or a database-as-a-service – offered by Microsoft. While customers must make storage and server choices (unlike Cosmos DB, which scales up and down as needed), Microsoft performs (and automates) the database administration, including patching.

Azure SQL Server has point-in-time recovery, i.e., continuous backups enabling the restoration of the data as of a specific point in time within the last 1 to 35 days. Internally, Azure performs full backups weekly and differential backups, depending on the configuration, every 12 or 24 hours (Figure 2, A). In addition, Azure backs up transaction logs every 10 minutes. If the database crashes, a maximum of 10 minutes of processed transactions is lost. In other words, the Recovery Point Objective (RPO) is 10 minutes.

Figure 2: Backup Configurations for Azure SQL
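
A short calculation shows how the three backup types combine into an RPO of at most 10 minutes. This is a simplified sketch: it assumes that full, differential, and log backups run on fixed schedules aligned to the first full backup, which is an assumption for illustration and not a statement about Azure's internal scheduling. All function names and dates are made up.

```python
from datetime import datetime, timedelta

FULL_EVERY = timedelta(days=7)
DIFF_EVERY = timedelta(hours=12)     # 12 or 24 hours, depending on the configuration
LOG_EVERY = timedelta(minutes=10)


def last_run(anchor, step, target):
    """Latest time <= target on a schedule starting at anchor and repeating every step."""
    return anchor + ((target - anchor) // step) * step


def restore_chain(first_full, target):
    """Return the full, differential, and log backup that a restore to target builds on."""
    full = last_run(first_full, FULL_EVERY, target)
    diff = last_run(full, DIFF_EVERY, target)
    log = last_run(diff, LOG_EVERY, target)
    return full, diff, log


first_full = datetime(2024, 1, 1, 0, 0)   # hypothetical start of the schedule
crash = datetime(2024, 1, 10, 13, 27)     # hypothetical crash time

full, diff, log = restore_chain(first_full, crash)
lost = crash - log   # transactions after the last log backup are gone
```

In this example, the restore builds on the full backup from January 8, the differential backup from January 10 at 12:00, and the log backups up to 13:20, so seven minutes of transactions are lost, which stays within the 10-minute RPO.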

If engineers do not change the backup policy, backups are geo-redundant (Figure 3), i.e., copied to a paired region, as discussed for Cosmos DB. If customers have to rely on the geo-redundant copy, the RPO is 1 hour instead of 10 minutes.

Figure 3: Backup Redundancy Options for Azure SQL

A significant difference to Cosmos DB is that backups can be kept much longer. A dedicated configuration option exists for how long to keep a weekly, monthly, and yearly backup – and the setting allows one to configure multiple years, not just some weeks.

The Azure documentation states that the restore time is usually “less than 12 hours”, though it might take considerably longer. Business-critical applications cannot rely on having a backup on the other side of the globe. They might have to configure the service itself as geo-redundant.

Backups for Amazon DynamoDB

DynamoDB is a NoSQL database-as-a-service offering from AWS. Amazon fully integrated DynamoDB with AWS Backup, AWS’s service for managing backups on an enterprise level. While there might also be legacy options, I focus on this backup functionality. After discussing two Azure services, the idea is to see how AWS approaches backups. And one difference was already mentioned: the full integration into the standard AWS backup solution.

For DynamoDB backups, the following paragraphs cover three aspects:

Continuous backups for Point-in-Time Recovery (PITR)
Ad-hoc Backups
Periodic Backups

If explicitly enabled, point-in-time recovery based on continuous backups (Figure 4, A) is available in case of an issue. The retention time is 35 days, i.e., a restore to any given point within the last 35 days is possible.

Figure 4: Setting for Point-in-Time Recovery (PITR)

For ad-hoc backups, the first statement is: AWS provides the feature for DynamoDB (Figure 5); there is no need for a workaround like in Azure. Engineers can configure some details – when it should start, how long it can take, in which vault to store it, and when to move it to cold storage – the most important being the retention time. When should AWS delete the backup? Here, the service allows choosing several years.

Figure 5: Configuring On-Demand Backups for DynamoDB in AWS

Finally, there is the option to schedule periodic backups. The configuration options are nearly the same as in Figure 5, though engineers must define the periodicity and the backup window for periodic backups. But why periodic backups when there is point-in-time recovery?

The main reason is that the retention time for point-in-time recovery is limited to 35 days, whereas periodic backups can be kept for years. Other reasons – exotic ones – are: maybe cost savings, or maybe it is a legacy architectural design from before point-in-time recovery was available.
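
The interplay of the two mechanisms can be sketched in a few lines. The function below is purely illustrative: it takes the 35-day window from the text and a hypothetical list of monthly periodic backups and reports which restore options exist for a wanted date.

```python
from datetime import date, timedelta

PITR_WINDOW = timedelta(days=35)


def restore_options(wanted, today, periodic_backups):
    """List the mechanisms that can bring back the state as of `wanted`."""
    options = []
    if timedelta(0) <= today - wanted <= PITR_WINDOW:
        options.append("point-in-time recovery")
    older = [d for d in periodic_backups if d <= wanted]
    if older:
        # The closest periodic backup at or before the wanted date
        options.append(f"periodic backup from {max(older).isoformat()}")
    return options


today = date(2024, 6, 30)
# Hypothetical schedule: one periodic backup on the first day of each month
monthly = [
    date(y, m, 1) for y in (2023, 2024) for m in range(1, 13) if date(y, m, 1) <= today
]

recent = restore_options(date(2024, 6, 20), today, monthly)
long_ago = restore_options(date(2023, 3, 15), today, monthly)
```

For the recent date, both mechanisms work, and point-in-time recovery is the more precise one. For the date from last year, only the periodic backup from the start of that month remains.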

To conclude: The discussed database-as-a-service offerings from Azure and AWS all come with backup functionality, but the features differ from service to service. Cloud and security architects do well to check the company-internal backup requirements and policies and to validate whether and how they can implement them. This validation is necessary per cloud and service.

Application-Consistent Backups for VM Workloads in the Cloud

Application backups are not as simple as in the world of database lectures. Have you ever heard about the ACID properties with “A” representing “atomicity” and “D” durability? After a database commit, everything is on disk. Nothing can get lost. Plus, all commands within the transaction are on disk – or everything is undone. When a database crashes and restarts, its data reflects precisely the effects of all committed transactions. It works as designed if an application relies on exactly one database. Sadly, applications are more complex, as Figure 1 illustrates. They tend to access not only one database but several in parallel, plus write data to files and file shares – and disks attached to VMs.

Figure 1: Real-life Backup Scenarios for Applications and their VMs

Understanding Application-Consistent Backups and their Benefits

While private users can save their data by manually copying all their files to a different hard disk or the cloud, this approach is too simplistic for larger applications, even if we focus only on VM disks. If it is a large disk, files with names starting with “A” might get copied at 5:00 am and those starting with “Z” at 5:05 am. Thus, the “Z-files” could have been changed between 5:00 am and 5:05 am, making the A-files and the Z-files inconsistent. Furthermore, copying open files causes issues, changes might be only in the memory and not written to disk, or there could be pending I/O transactions. Thus, a clean-up of files and their data might be necessary when restarting the application using such file copies. The clean-up can be automated or a manual task for engineers. In any case, it prolongs application outages.

Application consistency overcomes this challenge. The idea is to perform backups such that applications run after restarting a VM without any clean-up actions. Thus, business-critical applications benefit from lower downtimes, i.e., they can provide better Recovery Time Objectives. Plus, the organization benefits in crisis events, aka business continuity management situations. When a company has to evacuate all workloads to a different cloud data center, the engineers can rely on the VMs to restart and applications to come up without manual intervention. The engineers can focus on fixing more complex issues, e.g., related to integrations with other components, rather than the complete IT department being stuck with cleaning up file systems.

The most prominent solution for application-consistent backups is Microsoft’s Windows Volume Shadow Copy Service (VSS). Microsoft products come with it, and your organization’s applications (and your third-party software provider) can also implement it for Windows workloads. The exact details of VSS are, however, not so relevant from a cloud security architecture perspective. What matters is the available features for application-consistent backups in AWS, GCP, and Azure.

Application-Consistent Backups in the Cloud

Azure comes with a solution for application-consistent backups for Linux VMs, at least for those deployed with the Azure Resource Manager and not the Service Manager. It is a framework enabling application developers or operations specialists to integrate pre- and post-scripts into Azure’s backup process. Pre-scripts can invoke, for example, APIs of the application to tell the application to finish off all “write” activities. Then, Azure performs the backup copy. Afterward, Azure invokes the post-script, and normal operations continue. For this purpose, a configuration file (Figure 2) must be on all relevant VMs.

Figure 2: Configuration File for Application Consistent Backups of Linux VMs in Azure. Highlighted are the configuration of pre- and post-backup scripts. Other settings are for defining parameters and handling exceptions and failures.
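
The pre-/post-script pattern itself is easy to sketch. The following Python snippet is a generic illustration of the sequence described above (pre-script, backup copy, post-script), not Azure's framework or its configuration format. The three functions are hypothetical stand-ins that only record what would happen.

```python
from contextlib import contextmanager

events = []


@contextmanager
def quiesced(pre_script, post_script):
    """Run pre_script, hand control to the backup, then always run post_script."""
    pre_script()
    try:
        yield
    finally:
        # Resume the application even if the backup step fails
        post_script()


def pre_script():
    events.append("pre: flush writes and pause the application")


def take_snapshot():
    events.append("snapshot")


def post_script():
    events.append("post: resume the application")


with quiesced(pre_script, post_script):
    take_snapshot()
```

The `finally` clause reflects an important design point: an application that stays paused because a backup failed is an outage caused by the backup process itself.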

The pre- and post-scripts and this configuration file are critical from a security perspective. They run with root privileges. Thus, they must be secured to prevent attackers who have gained access to the VM from changing these settings and executing malicious code as root.

The situation for Windows VMs on Azure is much easier compared to the Linux world. By default, all VM backups use the Microsoft VSS service. So, if (and only if!) the applications on the VM implement VSS, all backups are application-consistent without the need for extra VM configurations. If not, the disk backup is not application- but only file-consistent.

Finally, a quick remark on the AWS and the Google Cloud Platform (GCP) features. Both follow the same approach as Azure: pre- and post-scripts for Linux VMs, and Microsoft’s VSS for Windows VMs.

Back to the Big Picture

To conclude: Application-consistent backups reduce downtimes of applications by reducing the work for engineers after crashes or VM evacuations. However, the term application consistency can be misleading. When looking again at Figure 1, it is clear that the consistency between the VM disks and the database backups is not guaranteed. Applications have to cover the challenge that the VM disk backup is from 4:07 am, one database backup is from 4:05, and the second from 4:17. So, even with application-consistent backups, there are still exciting tasks and challenges for engineers in the area of backup and recovery!
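
The time gap between the individual backups is simple to quantify. The sketch below uses the three times from the example; the date and the component names are made up for illustration.

```python
from datetime import datetime, timedelta


def backup_skew(timestamps):
    """Time between the oldest and the newest backup of one application."""
    return max(timestamps.values()) - min(timestamps.values())


backups = {
    "vm_disk": datetime(2024, 1, 1, 4, 7),
    "database_1": datetime(2024, 1, 1, 4, 5),
    "database_2": datetime(2024, 1, 1, 4, 17),
}

skew = backup_skew(backups)
# Any skew above zero means the application may see mismatched state after a restore
needs_reconciliation = skew > timedelta(0)
```

Here, the skew is 12 minutes: everything the application wrote to the second database during that time has no counterpart on the restored VM disk or in the first database.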

The Hub and Spoke Network Pattern

Ever wondered how large corporations design their networks in the cloud? The hub-and-spoke pattern is probably the most important to understand their on-prem and cloud network designs, no matter whether an IT department runs on GCP, AWS, Azure, or any other cloud provider.

Mesh is the “free love” vision transferred to enterprise network designs, whereas the hub-and-spoke pattern implements more of a harem concept. But to start with the big picture: The focus is not on individual applications. Network design looks at how to organize connectivity and isolation for networks with hundreds or even thousands of applications. These applications serve business needs, such as SAP or self-developed insurance solutions. The design must consider as well technical or security-related applications such as Web Application Firewalls, IAM services, Messaging Middleware, or Data Loss Prevention solutions.

Hub-and-Spoke vs Mesh Network

The first aspect is zoning. IT departments group their resources. VMs belonging to an HR application might be separated from air traffic control systems. They are in two separate zones. Network design patterns define how such zones interact, i.e., which zones have connectivity and can interact directly with which others. When looking at concepts, two patterns are particularly important: the mesh network pattern and the hub-and-spoke pattern (Figure 1).

A hub-and-spoke network is hierarchical. One zone is the hub. It connects with every other zone – the spokes. It is a classic 1-to-n relationship known from harems. Resources in two separate spoke zones always interact via the hub. A mesh network builds on the free-love idea: every zone can interact with every other zone (and zones might forward traffic to other zones in the absence of direct connectivity).
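
The difference between the two patterns shows up quickly when counting connections. The following sketch models zones as plain strings and assumes a full mesh, i.e., every zone is directly connected to every other zone; the zone names are made up.

```python
from itertools import combinations


def mesh_links(zones):
    """Full mesh: one link between every pair of zones."""
    return set(combinations(sorted(zones), 2))


def hub_and_spoke_links(hub, spokes):
    """Hub-and-spoke: one link between the hub and each spoke."""
    return {tuple(sorted((hub, spoke))) for spoke in spokes}


def hub_path(src, dst, hub):
    """Zones that traffic passes through in a hub-and-spoke network."""
    if src == dst:
        return [src]
    if hub in (src, dst):
        return [src, dst]
    return [src, hub, dst]


spokes = ["hr", "finance", "air-traffic-control", "web-shop"]
zones = ["hub"] + spokes

mesh = mesh_links(zones)
star = hub_and_spoke_links("hub", spokes)
route = hub_path("hr", "finance", "hub")
```

With five zones, the mesh already needs ten links to manage, whereas hub-and-spoke needs four. In general, a full mesh of n zones has n(n-1)/2 links, while hub-and-spoke has n-1, and every spoke-to-spoke path crosses the hub, where it can be inspected.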

Why Hub-and-Spoke Patterns are Popular

Mesh networks (and free love) cause a mess if relationships and connectivity are not tightly managed. In contrast, the hierarchical model of hub-and-spoke networks allows for centralized governance and operations of components and eases controlling the network traffic flow.

Routing all traffic to and from the internet through one hub zone eases control and security. In this zone, DDoS protection, the (network-) data-loss-prevention solution, and other critical internet-connectivity-related applications must run. Then, perimeter security is in place for the complete data center.

One dedicated zone with internet connectivity also eases setting up web application firewalls (WAFs), which usually require integration with an IAM solution. Having both in dedicated central zones is much easier than having WAFs in ten zones and IAM solutions in seven.

Centralizing components in a hub is useful beyond internet connectivity: (Azure or on-prem) Active Directories, messaging middleware, application performance monitoring, and many more benefit from centralization. This insight does not mean building one monolithic hub with all centralized applications. Enterprise-scale networks can have several hubs, e.g., one serving as a management zone with monitoring and another for internet connectivity (Figure 2).

Figure 2: Overlapping Hub Networks Example with an Internet-facing External Zone and an internal Management Zone e.g., for Monitoring

Grouping Criteria for Zoning

Zones are groups of (cloud) resources, e.g., VMs, isolated from other resources and VMs in other zones. The grouping criteria differ between organizations, companies, and departments. Typical grouping criteria are (Figure 2):

Applications or groups of applications forming one solution
Stages such as Production, Preproduction, Integration, Test, and Development.
Teams or departments managing the resources and potentially following different change management processes
Sensitivity of the data of applications

Cloud Features for Implementing Zones

Cloud providers do not offer a “zone” or a “hub-and-spoke network” feature. They provide sophisticated building blocks for structuring workloads and networks. Tenants, Virtual Private Clouds (VPCs), and Subnets are features for organizing networks and resources hierarchically. AWS, Azure, GCP, and all the others provide routing tables for – well – routing. Access Control Lists and Network Security Groups (the exact names differ between the clouds) enable engineers to implement firewalls for blocking or allowing specific network traffic. The clouds come with Internet Gateways, NATs, etc. Everything is ready for you to implement effective and secure network designs. However, whether you do that and how many free-love-like connections you allow is up to you and all other network architects of the world.

AWS Network Zoning and Security for IaaS Workloads

A couple of months ago, I posted an article about IaaS and network perimeter protection for GCP and Azure. Today, I look at the related concepts in Amazon Web Services (AWS).

Structuring Networks in AWS

The Amazon Virtual Private Cloud (VPC) and Subnets are the AWS concepts for structuring the network. A VPC consists of a contiguous IP address range represented by a CIDR block, e.g., 10.0.16.0/24. A network design with overlapping IP ranges for different VPCs is technically possible, though it can result in issues when connecting, aka peering, such VPCs later.

The network design divides the IP range of a VPC further into Subnets. Subnets are the canvases into which engineers deploy VMs – EC2 instances in AWS speak, short for Elastic Compute Cloud (Figure 1). Subnets belonging to the same VPC must not have overlapping IP ranges, but they do not have to consume the complete range. Non-used IP ranges ease responding to changing business needs and IT landscapes. So, everything is similar to what we know from GCP and Azure. The way subnets are protected, however, differs slightly.

Figure 1: Network diagram with VPCs and Subnets and corresponding screens in the AWS GUI
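
The two rules for subnets, staying inside the VPC range and not overlapping each other, can be checked with Python's standard `ipaddress` module. This is a local sanity check for a planned address layout, not an AWS API call; the function name and the subnet ranges are made up, while the VPC range is the example from above.

```python
import ipaddress
from itertools import combinations


def validate_subnets(vpc_cidr, subnet_cidrs):
    """Return a list of problems found in a planned VPC/subnet layout."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = [ipaddress.ip_network(cidr) for cidr in subnet_cidrs]
    problems = []
    for subnet in subnets:
        if not subnet.subnet_of(vpc):
            problems.append(f"{subnet} is outside {vpc}")
    for a, b in combinations(subnets, 2):
        if a.overlaps(b):
            problems.append(f"{a} overlaps {b}")
    return problems


# Two /26 subnets use half of the /24; the rest stays free for later needs
good = validate_subnets("10.0.16.0/24", ["10.0.16.0/26", "10.0.16.64/26"])

# One overlapping subnet and one outside the VPC range
bad = validate_subnets(
    "10.0.16.0/24", ["10.0.16.0/25", "10.0.16.64/26", "10.0.17.0/26"]
)
```

The same `overlaps` check works for comparing the CIDR blocks of two VPCs before peering them.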

Controlling Network Traffic with Network ACLs

One AWS concept for securing the subnet perimeter is the Network Access Control Lists (Network ACLs) feature. Network ACLs allow or deny ingress and egress traffic. They act as a stateless firewall that checks traffic against a simple ruleset. Inbound rules (Figure 2, A) allow or deny traffic from specified IP addresses outside the subnet. Outbound rules (Figure 2, B) inspect traffic leaving the subnet. In addition to the source or target IP, rules can consider the port number (e.g., 23, 8080) and the protocol type. If several rules match, the rule number decides: AWS evaluates the rules in ascending order of their number and applies the first one that matches.

Figure 2: Sample Network ACL
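
The evaluation logic of such a ruleset can be simulated locally. The sketch below is a simplified model with single ports and one CIDR per rule; the class, the function, and the sample rules are made up for illustration. It assumes that rules are processed in ascending rule-number order, that the first match wins, and that traffic matching no rule is denied.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NaclRule:
    number: int
    protocol: str          # "tcp", "udp", or "all"
    port: Optional[int]    # None means any port
    cidr: str
    allow: bool


def evaluate(rules, protocol, port, remote_ip):
    """First matching rule in ascending rule-number order decides."""
    address = ipaddress.ip_address(remote_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.protocol not in ("all", protocol):
            continue
        if rule.port is not None and rule.port != port:
            continue
        if address not in ipaddress.ip_network(rule.cidr):
            continue
        return rule.allow
    return False  # nothing matched: deny


inbound = [
    NaclRule(100, "tcp", 443, "0.0.0.0/0", allow=True),
    NaclRule(90, "tcp", 443, "203.0.113.0/24", allow=False),  # lower number wins
]
outbound = []  # stateless: replies need their own outbound rule

blocked_range = evaluate(inbound, "tcp", 443, "203.0.113.7")
normal_client = evaluate(inbound, "tcp", 443, "198.51.100.9")
reply_to_client = evaluate(outbound, "tcp", 50123, "198.51.100.9")
```

The last line illustrates the stateless behavior: the client's request gets in, but without a matching outbound rule, the reply never leaves the subnet.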

When designing a concrete network, architects should be aware of default behavior and setups – and how Network ACLs, VPCs, and subnets relate. First, each VPC has a default Network ACL. It allows all ingress and egress traffic to and from the subnets within the VPC. However, the VPC (by default) has no connectivity with the internet, external networks, or other AWS VPCs, be it in the same or another AWS account (we discuss this in the following section). Engineers can either change the default Network ACL or create a new one and associate it with one or more subnets of the same VPC.

A simplistic example is a VPC hosting several web applications. Each application might have a frontend and a backend subnet. Frontend subnets have open HTTP and HTTPS ports plus an open JDBC port to connect to the database backend subnets, which allow only JDBC traffic. Figure 3 illustrates such a scenario. VPC A has a Network ACL “Frontends” associated with Subnets SN1 and SN2. There is a different second Network ACL “DB Backend” associated with Subnet SN3. VPC A has a default Network ACL, but it is not in use. In contrast, the default Network ACL for VPC B is associated with the only subnet there, SN7.

Figure 3: Understanding Network Access Control Lists (NACLs) for AWS IaaS Workloads

Security Groups

In addition to Network ACLs, AWS provides a more sophisticated second feature for controlling network traffic: Security Groups. While Network ACLs impact the subnet perimeter, Security Groups focus on the traffic going through the network interfaces of individual VMs (or EC2 instances), for which AWS uses the term Elastic Network Interface (ENI). Applicable Security Groups can be configured per ENI. Each ENI must have at least one but can also have multiple Security Groups associated.

The rules for Security Groups specify the protocol as well as the source or target IP addresses, similar to the rules of Network ACLs. The main differences are:

Security Groups control traffic going through the ENI. Thus, they can also restrict traffic within a subnet, and different VMs within the same network can have different applicable Security Groups.
Security Groups act as stateful firewalls. Only the initiating in- or outbound traffic has to be allowed. Reply traffic is allowed automatically.
There are only “allow” rules, no “deny” rules. Adding an additional Security Group to a VM might result in more ports, protocols, and target or source IPs (or Security Groups) being allowed. Adding rules to Security Groups never restricts or forbids traffic.
By default, all outgoing traffic is allowed, and all incoming traffic is denied.
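
The differences listed above can be illustrated with a toy model of a network interface. It is a sketch under simplifying assumptions, not AWS's implementation: rules match a single port and CIDR, all outgoing traffic is allowed, and connection tracking is reduced to remembering which remote endpoints the instance contacted. All class names, group names, and addresses are made up.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class AllowRule:
    protocol: str
    port: int
    cidr: str


class NetworkInterface:
    """Toy model of an ENI with attached security groups (lists of AllowRule)."""

    def __init__(self, security_groups):
        self.security_groups = security_groups
        self.outbound_connections = set()  # (protocol, remote_ip, remote_port)

    def connect_out(self, protocol, remote_ip, remote_port):
        # Default behavior: all outgoing traffic is allowed
        self.outbound_connections.add((protocol, remote_ip, remote_port))

    def accepts(self, protocol, local_port, remote_ip, remote_port):
        # Stateful: replies to connections the instance opened pass automatically
        if (protocol, remote_ip, remote_port) in self.outbound_connections:
            return True
        # Otherwise, one matching allow rule in any attached group is enough
        source = ipaddress.ip_address(remote_ip)
        return any(
            rule.protocol == protocol
            and rule.port == local_port
            and source in ipaddress.ip_network(rule.cidr)
            for group in self.security_groups
            for rule in group
        )


web = [AllowRule("tcp", 443, "0.0.0.0/0")]
admin = [AllowRule("tcp", 22, "10.0.0.0/8")]

eni_web_only = NetworkInterface([web])
eni_web_and_admin = NetworkInterface([web, admin])
```

Attaching the second group to `eni_web_and_admin` opens SSH from the internal range in addition to HTTPS. There is no way to express "everything in `web` except one address" with allow rules alone; that is a job for a Network ACL.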

An important aspect from the security perspective is that EC2 instances can have secondary ENIs with a second IPv4 address which puts EC2 instances logically into two subnets. There are even more options in a world with IPv6.

Figure 4 illustrates a network design with two subnets in one VPC with one default Security Group and a newly created second security group named SG-1. Both apply to the ENI2 network interface, whereas all other VMs only have one security group. The example also illustrates two network interfaces being attached to the same EC2 instance, which, as a result, becomes part of subnet SN1 and SN2.

Figure 4: Security Groups as Means of Protecting IaaS Workloads in AWS

Connecting VPCs with the Outside

A VPC without connectivity to other VPCs or the internet is securely protected against outside attacks – and completely useless. Suppose users, admins, customers, or other programs cannot start any processing because no one can reach the EC2 instances, and they cannot access outcomes and results. In that case, there is no need to have a VPC. At least some EC2 instances, and thus the subnets and VPCs they belong to, must interact with external servers and services. This creates an attack surface, but that is unavoidable, and tackling the risks is part of the network security design.

VPC Peering is the essential concept enabling communication between VPCs belonging to the same AWS account or other AWS accounts within the same AWS organization. Components and VMs in peered VPCs interact as if they belong to the same VPC. All traffic stays internal. No traffic goes via the internet, keeping the attack surface low regarding outside attacks.

Shared VPCs are a similar concept. AWS accounts can share a VPC with other AWS accounts within the same AWS organization. Then, the other AWS accounts can create and manage components within this shared VPC. However, while everyone uses the same Shared VPC, resources of the different AWS accounts are managed and seen only by the AWS accounts to which they belong. AWS markets Shared VPCs as a feature to reduce the number of VPCs while still ensuring high interconnectivity, separate billing, and isolation or access control.

VPC Peering and Shared VPCs are powerful concepts but are limited to the IT landscape of one single company. In contrast, Internet Gateways and NAT Gateways are key AWS features when companies interact with others. In this context, architects have to understand: Do they need only traffic from within a subnet or VPC to the internet, or should EC2 instances be directly reachable from the internet? The latter, obviously, creates a larger attack surface and a higher security risk since attackers can directly reach out to VMs.

Exposure to the internet requires EC2 instances to have a Public IP and the subnet/VPC to have an Internet Gateway component with a suitable routing table. Such a design allows ingress and egress traffic from and to the internet.

If EC2 instances only need to initiate egress traffic, a NAT Gateway is the solution. It takes invocations from EC2 instances within a subnet, routes them to external IPs, and sends replies back to the initiating EC2 instance. On the way, the NAT replaces the internal caller IP with its external IP and vice versa for replies.
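
The address replacement can be sketched with a small translation table. The class below is a generic illustration of network address translation, not the AWS NAT Gateway implementation; the class name, port numbering, and all addresses are made up.

```python
class NatGateway:
    """Toy NAT: rewrites the private source address and maps replies back."""

    def __init__(self, public_ip):
        self.public_ip = public_ip
        self._next_port = 1024
        self._by_port = {}  # public port -> (private ip, private port)

    def outbound(self, private_ip, private_port, dest_ip, dest_port):
        """Translate an outgoing packet; returns (src_ip, src_port, dst_ip, dst_port)."""
        public_port = self._next_port
        self._next_port += 1
        self._by_port[public_port] = (private_ip, private_port)
        return (self.public_ip, public_port, dest_ip, dest_port)

    def inbound(self, src_ip, src_port, public_port):
        """Translate a reply back to the caller, or drop unsolicited traffic."""
        target = self._by_port.get(public_port)
        if target is None:
            return None  # nobody inside asked for this traffic
        return (src_ip, src_port, *target)


nat = NatGateway("203.0.113.10")
request = nat.outbound("10.0.16.5", 40000, "198.51.100.9", 443)
reply = nat.inbound("198.51.100.9", 443, public_port=request[1])
unsolicited = nat.inbound("198.51.100.9", 443, public_port=5555)
```

The external server only ever sees the NAT's public IP, and traffic that does not belong to a connection started from inside finds no entry in the table. That is why a NAT Gateway supports egress without making the instances reachable from the internet.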

AWS comes with various additional and less frequently used concepts. Three features or services center around interactions between an on-premises and a cloud infrastructure: AWS Virtual Private Network, AWS Direct Connect (dedicated, private connections), and AWS Transit Gateway (managing hybrid networks).

Finally, there is a connectivity or network feature named AWS PrivateLink. It is less relevant for the IaaS world of EC2 but crucial for application landscapes incorporating AWS PaaS and DBaaS services. I will cover the concept in one of my next posts.

Cloud Security Architecture – AI and MLOps – Testing

Klaus Haller's Homepage and Publications