Each Airflow Deployment in Astro Private Cloud (APC) has its own metadata database. APC supports both PostgreSQL and MySQL as database backends. You can connect to the database to run queries, troubleshoot issues, or perform maintenance tasks.
Database architecture
Each Deployment has a single PostgreSQL database containing two schemas:

| Schema | Purpose |
|---|---|
| airflow | Airflow metadata, including Dags, task instances, connections, and variables |
| celery | Celery task results when using the Celery Executor |
If you use MySQL, each Deployment instead has a single MySQL database. All Airflow metadata tables and Celery result tables exist in the same database without schema separation.
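After you connect (see Connect to the database below), you can verify this layout yourself. For example, on PostgreSQL, a minimal sketch that lists the two schemas:

```sql
-- List the Deployment's schemas (PostgreSQL).
SELECT schema_name
FROM information_schema.schemata
WHERE schema_name IN ('airflow', 'celery');
```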
Retrieve the connection string
Each Deployment stores its database connection string in a Kubernetes secret named <release-name>-metadata. Run the following command to retrieve it:
```bash
kubectl get secret -n <deployment-namespace> \
  <release-name>-metadata -o jsonpath='{.data.connection}' | base64 -d
```
This returns a connection URI in one of the following formats, depending on the database backend.

For PostgreSQL:

```text
postgresql://<username>:<password>@<host>:<port>/<database>?sslmode=prefer
```

For MySQL:

```text
mysql://<username>:<password>@<host>:<port>/<database>
```
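To reuse the URI in later commands, you can capture and roughly parse it in your shell. This is a convenience sketch, not part of APC: CONN is an arbitrary variable name, and this naive parsing breaks if the password contains ':', '@', or '/'.

```bash
# Capture the decoded connection URI for reuse.
CONN=$(kubectl get secret -n <deployment-namespace> \
  <release-name>-metadata -o jsonpath='{.data.connection}' | base64 -d)

# Rough split with shell parameter expansion.
rest="${CONN#*://}"            # <username>:<password>@<host>:<port>/<database>...
hostportdb="${rest#*@}"        # <host>:<port>/<database>...
hostport="${hostportdb%%/*}"   # <host>:<port>
db="${hostportdb#*/}"          # <database> plus any query string
echo "user:     ${rest%%:*}"
echo "host:     ${hostport%%:*}"
echo "port:     ${hostport##*:}"
echo "database: ${db%%\?*}"
```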
Connect to the database
Use kubectl exec
Run the following command to open an Airflow metadata database shell from the scheduler pod:
```bash
kubectl exec -it -n <deployment-namespace> \
  deployment/<release-name>-scheduler -c scheduler -- \
  airflow db shell
```
This command uses the Airflow metadata database connection that is already configured in the Deployment.
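If you only need to confirm that the Deployment can reach its metadata database, Airflow's built-in connectivity check avoids opening an interactive shell. The command exits non-zero when the connection fails:

```bash
kubectl exec -it -n <deployment-namespace> \
  deployment/<release-name>-scheduler -c scheduler -- \
  airflow db check
```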
Connect from your local computer
If the database is accessible as a Kubernetes service, you can use kubectl port-forward to connect from your local computer. Identify the database service and namespace from the <host> field in the connection string, then forward the port:
For PostgreSQL:

```bash
kubectl port-forward -n <database-namespace> \
  svc/<database-service> <local-port>:<database-port>
```

If PgBouncer is enabled, forward port 6543 from the PgBouncer service in the Deployment namespace instead.

Then connect using the credentials from the connection string:

```bash
psql -h localhost -p <local-port> -U <username> -d <database>
```

For MySQL:

```bash
kubectl port-forward -n <database-namespace> \
  svc/<database-service> <local-port>:<database-port>
```

Then connect using the credentials from the connection string:

```bash
mysql -h 127.0.0.1 -P <local-port> -u <username> -p <database>
```
If the database is an external service (for example, Amazon RDS or Google Cloud SQL), connect directly using the host and port from the connection string instead of port-forwarding.
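For example, with PgBouncer enabled on a hypothetical Deployment with release name quasar-1234, the commands might look as follows. The namespace and the <release-name>-pgbouncer service name are assumptions; confirm both with kubectl get svc in your cluster:

```bash
# Forward the PgBouncer port to localhost (hypothetical names).
kubectl port-forward -n quasar-1234 \
  svc/quasar-1234-pgbouncer 6543:6543 &

# Connect through the tunnel with credentials from the connection string.
psql -h localhost -p 6543 -U <username> -d <database>
```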
Common queries
After you connect to the database, use the following queries to inspect Airflow metadata.
Dag status

```sql
SELECT dag_id, is_active, is_paused, last_parsed_time
FROM dag ORDER BY dag_id;
```

Recent Dag runs

```sql
SELECT dag_id, run_id, state, start_date
FROM dag_run ORDER BY start_date DESC LIMIT 20;
```
Task instance status
```sql
SELECT dag_id, task_id, state, start_date, try_number
FROM task_instance
ORDER BY start_date DESC LIMIT 50;
```
Task failures
For PostgreSQL:

```sql
SELECT dag_id, task_id, state, start_date
FROM task_instance
WHERE state = 'failed'
AND start_date > NOW() - INTERVAL '24 hours';
```

For MySQL:

```sql
SELECT dag_id, task_id, state, start_date
FROM task_instance
WHERE state = 'failed'
AND start_date > NOW() - INTERVAL 24 HOUR;
```
External database configuration
To use an external database instead of the APC-managed database, create the Deployment using the upsertDeployment Houston API mutation with the following fields:
- skipAirflowDatabaseProvisioning: Set to true so Commander doesn't provision a database for this Deployment.
- metadataConnection or metadataConnectionJson: The connection string or JSON object pointing to your external database.
- resultBackendConnection or resultBackendConnectionJson: The connection string or JSON object for the Celery result backend.
For a complete example of the upsertDeployment mutation payload with external database configuration, see Bring your own Airflow database.
Back up and restore
For production environments, run backup and restore commands from your local
computer or a dedicated admin computer. Do not run backup or restore commands
from inside Airflow containers.
Before you run a backup, estimate the database size and confirm that your local computer has enough free disk space.
For PostgreSQL:

```sql
SELECT
  schemaname,
  pg_size_pretty(SUM(pg_total_relation_size(format('%I.%I', schemaname, tablename)::regclass))) AS total_size
FROM pg_tables
GROUP BY schemaname
ORDER BY SUM(pg_total_relation_size(format('%I.%I', schemaname, tablename)::regclass)) DESC;
```

For MySQL:

```sql
SELECT
  table_schema,
  ROUND(SUM(data_length + index_length) / 1024 / 1024, 2) AS size_mb
FROM information_schema.tables
GROUP BY table_schema
ORDER BY size_mb DESC;
```
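To compare the estimate against free space on your local computer, check the target directory before dumping (a quick sketch; df output varies slightly by OS):

```bash
# Show available space in the directory where the backup will be written.
df -h .
```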
Back up the database
To back up a PostgreSQL database from your local computer, use port-forwarding or a direct connection to the database host. The -n airflow flag restricts the dump to the airflow schema; pass -n celery as well if you also need the Celery result tables:

```bash
pg_dump -h <host> -p <port> -U <username> -d <database> -n airflow > backup.sql
```

To back up a MySQL database from your local computer, use port-forwarding or a direct connection to the database host:

```bash
mysqldump -h <host> -P <port> -u <username> -p <database> > backup.sql
```
Restore the database
To restore a PostgreSQL database from your local computer, use port-forwarding or a direct connection to the database host:

```bash
psql -h <host> -p <port> -U <username> -d <database> < backup.sql
```

To restore a MySQL database from your local computer, use port-forwarding or a direct connection to the database host:

```bash
mysql -h <host> -P <port> -u <username> -p <database> < backup.sql
```
Replace the placeholders with values from the connection string. See Retrieve the connection string.
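After a restore, a quick sanity check is to confirm that core metadata tables are populated. A minimal sketch for PostgreSQL; on MySQL, drop the airflow. schema qualifier:

```sql
-- Verify that restored Dag metadata is visible.
SELECT COUNT(*) FROM airflow.dag;
```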
Security best practices
Direct database access bypasses Airflow’s security model. Use with caution and
only for troubleshooting or maintenance tasks.
- Use read-only access for monitoring (see the sketch after this list).
- Never expose database ports publicly.
- Use SSL for all connections.
- Rotate credentials regularly.
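For the read-only monitoring access mentioned above, a minimal sketch on PostgreSQL. The airflow_readonly role name and password placeholder are hypothetical:

```sql
-- Hypothetical read-only role for monitoring queries (PostgreSQL).
CREATE ROLE airflow_readonly WITH LOGIN PASSWORD '<strong-password>';
GRANT USAGE ON SCHEMA airflow TO airflow_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA airflow TO airflow_readonly;

-- Also cover tables created after this grant.
ALTER DEFAULT PRIVILEGES IN SCHEMA airflow
  GRANT SELECT ON TABLES TO airflow_readonly;
```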