The Astro Data Plane is designed to withstand in-region Availability Zone (AZ) degradations and outages as described in Resilience. For full region outages on AWS dedicated clusters, Astro supports self-service cross-region disaster recovery (DR). For a detailed overview of Astro’s AWS disaster recovery architecture, see the AWS disaster recovery whitepaper in the Astronomer Trust Center.
Cross-region disaster recovery (AWS dedicated clusters)
Self-service cross-region disaster recovery requires the Enterprise Business Critical tier and is currently available for AWS dedicated clusters only. Support for GCP and Azure is planned for later this year.
How AWS disaster recovery works
- The primary cluster runs all Deployments in Region A.
- A multi-region database replicates Deployment metadata to the secondary cluster in Region B.
- Multi-region object storage copies task logs to the secondary cluster.
- User-deployed images are replicated to the secondary cluster.
- On failover, the secondary cluster is promoted to active. All Deployments, configuration, environment variables, connections, and Airflow variables transfer automatically.
- Clusters and Deployments retain their IDs, names, namespaces, and system-managed configuration after failover. All hostnames, including the Airflow UI, Airflow API, and Remote Execution API URLs, stay the same and are updated to point to the secondary cluster.
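Because hostnames do not change on failover, client-side tooling can keep polling the same URL and simply wait for it to recover, with no endpoint rewriting. The following is a minimal sketch of such a poller, assuming your Deployment exposes an HTTP health endpoint that returns a 2xx status when healthy. The function names `http_probe` and `wait_until_healthy` are hypothetical helpers, not part of any Astro API.

```python
import http.client
import time
import urllib.request
from typing import Callable


def http_probe(url: str, timeout_s: float = 5.0) -> bool:
    """Return True if a GET on url answers with a 2xx status, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (non-2xx), DNS failures, and timeouts are OSError subclasses.
        return False


def wait_until_healthy(
    url: str,
    probe: Callable[[str], bool] = http_probe,
    interval_s: float = 30.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> float:
    """Poll the same URL until it is healthy and return the seconds waited.

    The URL is never rewritten: this relies on Deployment hostnames staying
    the same after failover. Raises TimeoutError once timeout_s has elapsed.
    """
    start = clock()
    while True:
        if probe(url):
            return clock() - start
        if clock() - start >= timeout_s:
            raise TimeoutError(f"{url} still unhealthy after {timeout_s:.0f}s")
        sleep(interval_s)
```

For example, `wait_until_healthy("https://<your-deployment-host>/health")` blocks until the placeholder endpoint answers, or raises after one hour (the default `timeout_s`, chosen here to match the RTO target). The exact health path depends on your Airflow version, and `http_probe` sends no credentials; if your endpoint requires authentication, pass a custom `probe`. The injectable `sleep` and `clock` parameters exist so the loop can be tested without real waiting.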
RTO and RPO
The following table defines the recovery time objective (RTO) and recovery point objective (RPO) for DR clusters. Targets are benchmarked with 80+ Deployments and 1,250+ concurrent task runs.

| Metric | Target |
|---|---|
| Recovery time objective (RTO) | Less than one hour |
| Recovery point objective (RPO) | Less than 15 minutes (requires Task Logs Replication SLA) |
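The targets in the table can be turned into a simple pass/fail check for a failover drill. The following is a minimal sketch, assuming you record three timestamps during the drill: when the primary region became unavailable, when Deployments were healthy again on the secondary cluster, and the newest write confirmed present on the secondary. `DrillTimestamps` and `evaluate_drill` are hypothetical names, and how you capture the timestamps is up to you. Keep in mind that the 15-minute RPO for task logs applies only with the Task Logs Replication SLA enabled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Targets from the table above ("less than", so comparisons are strict).
RTO_TARGET = timedelta(hours=1)
RPO_TARGET = timedelta(minutes=15)


@dataclass(frozen=True)
class DrillTimestamps:
    """Timestamps recorded during a failover drill (use timezone-aware datetimes)."""
    outage_start: datetime           # primary region became unavailable
    service_restored: datetime       # Deployments healthy on the secondary cluster
    last_replicated_write: datetime  # newest data confirmed present on the secondary


def evaluate_drill(ts: DrillTimestamps) -> dict:
    # RTO: how long the service was unavailable.
    rto = ts.service_restored - ts.outage_start
    # RPO: how much data, measured in time, was lost at the moment of the outage.
    rpo = max(ts.outage_start - ts.last_replicated_write, timedelta(0))
    return {
        "rto": rto,
        "rpo": rpo,
        "rto_met": rto < RTO_TARGET,
        "rpo_met": rpo < RPO_TARGET,
    }


t0 = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
result = evaluate_drill(DrillTimestamps(
    outage_start=t0,
    service_restored=t0 + timedelta(minutes=42),
    last_replicated_write=t0 - timedelta(minutes=6),
))
print(result["rto_met"], result["rpo_met"])  # True True
```

RPO is clamped at zero so that a write replicated after the outage began does not produce a negative value.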
What gets failed over
The following items transfer to the secondary cluster automatically during failover:
- Deployments and data pipelines
- Dag run history, task instance metadata, and XComs
- Deployment configuration
- Environment variables, connections, Airflow variables, and metrics exports — whether configured via Environment Manager or directly on the Deployment
- Task logs. Enable Task Logs Replication SLA for a guaranteed 15-minute RPO.
- Networking and DNS configuration. Configure using self-service features such as VPC peering or Customer Managed Egress, or work with Astronomer support.
- `imagePullSecrets` for Kubernetes Pod Operators (KPOs)
- Customer-managed workload identities. You must configure the OIDC issuer and IAM trust policies for the secondary cluster separately. See Workload identity.
- Customer-managed Transit Gateway routing on the secondary cluster
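For customer-managed workload identities, one way to prepare for failover is to give the IAM role a trust policy that accepts tokens from both the primary and the secondary cluster's OIDC issuer, so the role keeps working after the secondary is promoted. The following is a minimal sketch that builds such a policy document. The issuer URLs, account ID, and the `system:serviceaccount:<namespace>:<service-account>` subject format are hypothetical placeholders, not values Astro is confirmed to use; take the real issuer and subject from the Workload identity documentation. Each issuer must also be registered as an IAM OIDC identity provider in your AWS account.

```python
import json


def oidc_trust_policy(account_id: str, issuer_urls: list[str], subject: str) -> dict:
    """Build an IAM trust policy allowing web-identity tokens from any listed OIDC issuer."""
    statements = []
    for url in issuer_urls:
        # IAM identifies OIDC providers and condition keys by issuer host/path, without the scheme.
        host = url.removeprefix("https://").rstrip("/")
        statements.append({
            "Effect": "Allow",
            "Principal": {"Federated": f"arn:aws:iam::{account_id}:oidc-provider/{host}"},
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {"StringEquals": {f"{host}:sub": subject}},
        })
    return {"Version": "2012-10-17", "Statement": statements}


policy = oidc_trust_policy(
    account_id="111122223333",  # hypothetical AWS account ID
    issuer_urls=[
        "https://oidc.eks.us-east-1.amazonaws.com/id/PRIMARYEXAMPLE",    # hypothetical Region A issuer
        "https://oidc.eks.us-west-2.amazonaws.com/id/SECONDARYEXAMPLE",  # hypothetical Region B issuer
    ],
    # Hypothetical subject; Deployment namespaces are retained after failover,
    # so one subject may cover both issuers if your subject includes the namespace.
    subject="system:serviceaccount:<deployment-namespace>:<service-account>",
)
print(json.dumps(policy, indent=2))
```

Using one statement per issuer keeps each `sub` condition key tied to its own provider. You could also add an `aud` condition per statement if your issuer sets a fixed audience.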