Cloud Security Architecture – AI and MLOps – Testing

Klaus Haller's Homepage and Publications

Skip to content
  • Home
  • 505 grams that change your life as an AI specialist!
  • Artificial Intelligence – MLOps – Information Management
    • Making CRISP-DM Work for Embedded Analytics
    • Quality Assurance and Machine Learning Models
    • What Google Trends teaches us about Jupyter Notebook
    • Tutorial: First Steps in Image Classification using Python, Jupyter Notebook and Gluon-CV on AWS SageMaker
    • 11 YouTube Videos: ReliableInformationSystems.tv
  • Data Migration
    • Article: A detailed process model for large scale data migration projects
    • Article: Data Migration Project Management and Standard Software – Experiences in Avaloq Implementation Projects
    • Article: Testing & Quality Assurance in Data Migration Projects
    • Article: Towards the Industrialization of Data Migration – Concepts and Patterns for Standard Software Implementation Projects
  • EA & IT Architecture
    • 5 IT Operations Cost Traps and How to Avoid Them
    • Kafka Tutorial
      • Apache Kafka Tutorial, Part 1
      • Apache Kafka Tutorial, Part 2
      • Apache Kafka Tutorial, Part 3
    • Oracle Openworld 2020
    • Oracle Openworld Learning 1: Moving your applications to the cloud and moving to a cloud application is something completely different
    • Oracle Openworld Learning 2: How the Cloud impacts the Software Customization Business
    • Oracle Openworld Learning 3: The True Value of Chatbots
    • Oracle Openworld Learning 4: The Oracle Database, ML, and Bitcoin – plus how Computer and Emotions come together
    • My Experience with the IoT on Google Cloud course
    • Big Data Technologien – Meine Erfahrung mit dem Microsoft Professional Programm
    • Project Management
    • Filmbesprechung
  • Testing
    • Test Center Management
      • Article: How Scrum Changes Test Centers
      • Article: Mobile Testing
      • Article: SCRUM – Untergang der Test Center?
      • Article: Software Quality Beyond Testing In-house Code
      • Article: Three New Technologies That Will Disrupt Your Test Organization
      • Efficient Testing versus Compliance
      • Information System Maintenance Costs
    • Test Data Management
      • Article: Datenmigration bei Standardsoftware-Einführungsprojekten
      • Article: Test Data Management – Investieren oder ignorieren?
      • Article: Test Data Management in Practice
      • Article: The Test Data Challenge for Database-Driven Applications (DBTest Workshop)
      • Article: The Test Data Challenge for Database-Driven Applications (testing experience)
      • Article: White-Box Testing for Database-driven Applications – A Requirements Analysis
  • Information Security, IT Risk & Compliance
    • Article: ISO 27001 for Developers and Testers
    • Article: When Data Is a Risk – DLP Tools and Their Role within IT Departments
    • How To Deal With Shadow-IT Applications
    • Why IT Security Departements (alone) Cannot Guarantee Secure Applications
    • Article: Data-Privacy Assessments for Application Landscapes – A Methodology
    • Article: Testdaten als Risikofaktor
    • Article: Data-Loss-Prevention-Tools minimieren Sicherheitslücken in Testumgebungen
    • Computer Security and System Management: Meine Erfahrung mit dem Coursera-Kurs der University of Colorado
    • Article: Ethik und DLP (in German)
  • About me …

Understanding Redundancy for Azure Storage Accounts

PaaS services promise software engineers nothing less than effortless eternal happiness: a few clicks and you have ready-to-use, secure, and unlimited scalable cloud service, be it an AI, a database, or a super-exotic service. But do cloud providers deliver as promised? Is, for example, redundancy a child’s play? Redundancy is much trickier than backups. Backups are about having a copy of an application’s data for emergencies. In contrast, redundancy guarantees that the application runs 7×24. Even major disasters must not impact a solution’s stability.

In the following, we look closely at one crucial Azure service, Azure Storage Accounts. It is Microsoft’s web service for storing data such as objects respectively, blobs, or files. We elaborate on which kind of disasters or failures the web service copes with – and how transparent it is from an application or operations perspective. Azure provides different options for configuring the recovery behavior of Storage Accounts (Figure 1), which the following paragraphs explain in more detail. Companies need specific guidelines on when to use which one – and writing such a guideline is a crucial cloud (security) architecture task.

Figure 1: Redundancy and Backup Options for Azure Storage Accounts – actual replication mechanisms (1) and read-feature for regional failure (2)

Cloud architects have to make (up to) three redundancy-related configuration decisions:

  1. Should the data be replicated three times within one data center (local redundancy), or should there be a copy in three separate data centers (zone redundancy)?
  2. Should there be copies in a data center further away in a secondary region (geo-redundancy)? For regulatory and compliance reasons, geo-redundant copies are usually in the same jurisdiction as the original data. Azure stores, e.g., geo-redundant copies for data in Switzerland-North (greater Zurich area) in the Azure region Switzerland-West around Geneva.
  3. Should applications be allowed to read the geo-redundant copies if the primary region fails (“read access”)?

Figure 2 and Figure 3 illustrate the configuration options for a better understanding. The architecture in Figure 2 visualizes local redundancy. An application reads and writes data stored in one data center (A). The different copies are synchronously replicated and always in sync. If – and only if – the engineers choose the geo-redundant add-on, Azure replicates the data asynchronously to one data center in a secondary region (B). In such a case, the Storage Account can be configured to allow applications to read from the storage account in the secondary region if the first region is unavailable (C).

Figure 2: Azure Storage Account with Local-Redundancy Option

Figure 3 looks very similar. It illustrates zone-redundancy. In such a set-up, the three copies are in different data centers in the primary region (A). If geo-redundancy is active, all asynchronous copies in the secondary are stored in the same single data center (B)as in the previous case in Figure 2. Applications can get read access in case of unavailability of the primary region – as with local redundancy (C).

Figure 3: Azure Storage Account with Zone-Redundancy Option

In the next step, we look closer at three failure scenarios: a failure of a single device, a data center, and an entire region. Which configuration option allows the application to continue to run? How transparent are such incidents for applications? What does the cloud operations team or the applications team have to do?

First, a failure of a single device in a data center has no impact on any application or operational staff. Azure always maintains three up-to-date copies of any data, be it in one or be it in three data centers. These copies are always in sync. Thus, if a device fails, another device in the same or another data center takes over and has precisely the same data. There is neither data loss nor interruption of application availability, nor any necessity for manual interventions.

A failure of a complete data center is transparent for zone-redundant storage accounts. Two other data centers have in-sync copies of the data if one data center burns down or is temporarily unavailable, e.g., due to power issues. Applications can continue to read or write to the Storage Account. No engineer has to intervene. In contrast, a data center failure can have severe consequences if a Storage Account is configured with local redundancy. The data becomes unavailable for read and write operations. If the data center does not come up again, all data is gone.

If an entire region fails due to a large-scale disaster, all Storage Accounts without a geo-redundancy option are temporarily or forever unavailable. Storage Accounts with geo-redundancy have a copy in a different region. In contrast to “normal” local or zone redundancy, geo-redundancy is not transparent. It might require application code adaption. Plus, the cloud staff has to perform failover and cleanup tasks.

Failover to the secondary region is an explicit decision of the customer. Its cloud staff must initiate the failover. Then, the secondary site becomes the new primary one though two aspects require attention. First, loss of data might occur. Microsoft has no formal SLA but mentions that the recovery point objective (RPO) is 15 minutes. In other words: data written in the last 15 minutes before the failure is lost. The application (and the business) must be able to cope with that. Second, Storage Accounts store their data only in one data center after the failover (local redundancy). Engineers have to reconfigure the Storage Accounts to reassure zone- or geo-redundancy. Eventually, they might want to move the data back to the original primary region.

When an entire region fails, this impacts applications even with the geo-redundancy option and even if the application code continues to run. Applications cannot read or write any data to or from their Storage Account in the primary region. Either the application code handles such circumstances (e.g., by showing users a “temporarily unavailable” page), or the application behaves unpredictably or crashes. Applications can reduce the impact on users with more sophisticated routines. The option to allow applications to read from secondary regions in case of a failure of the primary one enables applications to show users at least the currently available data. Writing changes is not possible – or only to another storage account not impacted; quite some work, but potentially worth the effort for highly business-critical applications.

To conclude: Azure offers a variety of configuration options for redundancy for Azure Storage Accounts, one of their most used services. Choosing an adequate option is an essential task for cloud architects. They have to balance the costs of the more sophisticated options with the actual availability needs based on the criticality of their applications.

This entry was posted in Azure, Cloud, Information Security, Security Architecture and tagged Azure Storage Account, Redundancy on August 24, 2022 by WPAdmin.

Post navigation

← Tutorial: Azure Function, Storage Accounts, and Managed-Identity-based Authentication AWS Network Zoning and Security for IaaS Workloads →

Meta

  • Log in
  • Entries feed
  • Comments feed
  • WordPress.org

Order my book "Managing AI in the Enterprise" with Springer or Amazon and get an even better AI line or project manager!

Proudly powered by WordPress