Disaster recovery

The Astro Data Plane is designed to withstand in-region Availability Zone (AZ) degradations and outages as described in Resilience. For full region outages on AWS dedicated clusters, Astro supports self-service cross-region disaster recovery (DR). For a detailed overview of Astro’s AWS disaster recovery architecture, see the AWS disaster recovery whitepaper in the Astronomer Trust Center.

Cross-region disaster recovery (AWS dedicated clusters)

Self-service cross-region disaster recovery requires the Enterprise Business Critical tier and is currently available for AWS dedicated clusters only. Support for GCP and Azure is planned for later this year.

Cross-region DR lets you configure a pair of dedicated clusters — a primary and a secondary — in different AWS regions. The secondary cluster stays continuously synchronized with the primary so you can fail over with minimal downtime and data loss. After failover, Astro automatically enables synchronization in the reverse direction, keeping the original primary ready for failback. When the primary region recovers, you can fail back with a single click.

How AWS disaster recovery works

The primary cluster runs all Deployments in Region A.
A multi-region database replicates Deployment metadata to the secondary cluster in Region B.
Multi-region object storage copies task logs to the secondary cluster.
User-deployed images are replicated to the secondary cluster.
On failover, the secondary cluster is promoted to active. All Deployments, configuration, environment variables, connections, and Airflow variables transfer automatically.
Clusters and Deployments retain their IDs, names, namespaces, and system-managed configuration after failover. All hostnames, including the Airflow UI, Airflow API, and Remote Execution API URLs, remain the same and are updated to resolve to the secondary cluster.
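
The failover behavior described above can be pictured with a small toy model. This is a sketch for illustration only, not Astro's implementation: the DRPair class, the region names, the Deployment ID, and the hostname are all hypothetical. It shows the two properties that matter to clients: identifiers and hostnames stay fixed, while the region they resolve to and the direction of replication both flip.

```python
from dataclasses import dataclass, field


@dataclass
class DRPair:
    """Toy model of a cross-region pair; not Astro's implementation."""

    active_region: str
    standby_region: str
    deployment_ids: list = field(default_factory=list)
    hostnames: list = field(default_factory=list)

    def replication_direction(self) -> str:
        # Data always flows from the active cluster to the standby.
        return f"{self.active_region} -> {self.standby_region}"

    def resolve(self, hostname: str) -> str:
        # Hostnames never change; only the region they resolve to does.
        if hostname not in self.hostnames:
            raise KeyError(hostname)
        return self.active_region

    def fail_over(self) -> None:
        # Promote the standby. Swapping the roles also reverses replication,
        # which is what keeps the old primary ready for failback.
        self.active_region, self.standby_region = (
            self.standby_region,
            self.active_region,
        )


pair = DRPair(
    active_region="us-east-1",
    standby_region="us-west-2",
    deployment_ids=["deployment-1"],        # hypothetical ID
    hostnames=["airflow.example.com"],      # hypothetical hostname
)
pair.fail_over()
print(pair.resolve("airflow.example.com"))  # us-west-2
print(pair.replication_direction())         # us-west-2 -> us-east-1
print(pair.deployment_ids)                  # ['deployment-1']
```

In this model, failback is simply a second call to fail_over(), which mirrors the symmetric failover and failback flow described above.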

RTO and RPO

The following table defines the recovery time objective (RTO) and recovery point objective (RPO) for DR clusters. Targets are benchmarked with 80+ Deployments and 1,250+ concurrent task runs.

Metric	Target
Recovery time objective (RTO)	Less than one hour
Recovery point objective (RPO)	Less than 15 minutes (requires Task Logs Replication SLA)

See Task Logs Replication SLA for details on the RPO guarantee.
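
To make the two targets concrete, the following sketch computes RPO and RTO for a single failover drill and compares them with the targets in the table. The evaluate_drill function and the timestamps are hypothetical and chosen only for illustration. RPO is measured as the gap between the last successful replication and the outage, and RTO as the gap between the outage and the moment the secondary cluster is serving traffic.

```python
from datetime import datetime, timedelta

RTO_TARGET = timedelta(hours=1)      # "Less than one hour"
RPO_TARGET = timedelta(minutes=15)   # "Less than 15 minutes"


def evaluate_drill(last_replicated_at, outage_at, recovered_at):
    """Return (rpo, rto, met) for one failover drill."""
    rpo = outage_at - last_replicated_at  # window of data that may be lost
    rto = recovered_at - outage_at        # time the service was unavailable
    met = rpo < RPO_TARGET and rto < RTO_TARGET
    return rpo, rto, met


# Illustrative timestamps, not measured values.
rpo, rto, met = evaluate_drill(
    last_replicated_at=datetime(2025, 1, 1, 11, 52),
    outage_at=datetime(2025, 1, 1, 12, 0),
    recovered_at=datetime(2025, 1, 1, 12, 40),
)
print(rpo, rto, met)  # 0:08:00 0:40:00 True
```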

What gets failed over

The following items transfer to the secondary cluster automatically during failover:

Deployments and data pipelines
Dag run history, task instance metadata, and XComs
Deployment configuration
Environment variables, connections, Airflow variables, and metrics exports — whether configured via Environment Manager or directly on the Deployment
Task logs. Enable Task Logs Replication SLA for a guaranteed 15-minute RPO.

The following items do not transfer automatically and require manual steps after configuring the secondary cluster:

Networking and DNS configuration. Configure these using self-service features such as VPC peering or Customer Managed Egress, or work with Astronomer support.
imagePullSecrets for Kubernetes Pod Operators (KPOs)
Customer-managed workload identities. You must configure the OIDC issuer and IAM trust policies for the secondary cluster separately. See Workload identity.
Customer-managed Transit Gateway routing on the secondary cluster
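
Because workload identity is one of the manual items, it can help to check that an IAM role's trust policy trusts the OIDC issuers of both clusters before you rely on failover. The sketch below assumes the standard IAM trust policy JSON shape, where federated principals are ARNs containing ":oidc-provider/" followed by the issuer. The account ID and issuer values are hypothetical placeholders; see Workload identity for the actual values for your clusters.

```python
def trusted_oidc_issuers(trust_policy: dict) -> set:
    """Collect OIDC issuers from Allow statements with Federated principals."""
    issuers = set()
    statements = trust_policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may not be a list
        statements = [statements]
    for stmt in statements:
        principal = stmt.get("Principal")
        if stmt.get("Effect") != "Allow" or not isinstance(principal, dict):
            continue
        federated = principal.get("Federated", [])
        if isinstance(federated, str):
            federated = [federated]
        for arn in federated:
            marker = ":oidc-provider/"
            if marker in arn:
                issuers.add(arn.split(marker, 1)[1])
    return issuers


# Hypothetical issuers for the primary and secondary clusters.
PRIMARY = "oidc.primary.example.com/id/AAA"
SECONDARY = "oidc.secondary.example.com/id/BBB"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::111122223333:oidc-provider/{PRIMARY}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
        }
    ],
}

missing = {PRIMARY, SECONDARY} - trusted_oidc_issuers(policy)
print(missing)  # {'oidc.secondary.example.com/id/BBB'}
```

A non-empty result means tasks running on that cluster could not assume the role, so the trust policy needs a statement for the listed issuer.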
