Each Airflow Deployment in Astro Private Cloud (APC) has its own metadata database. APC supports both PostgreSQL and MySQL as database backends. You can connect to the database to run queries, troubleshoot issues, or perform maintenance tasks.
Database architecture
Each Deployment has a single PostgreSQL database containing two schemas:

| Schema | Purpose |
|---|---|
| airflow | Airflow metadata, including Dags, task instances, connections, and variables |
| celery | Celery task results when using the Celery Executor |
If you use MySQL, each Deployment instead has a single MySQL database. All Airflow metadata tables and Celery result tables exist in the same database without schema separation.
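After you connect (see Connect to the database below), you can verify this layout yourself. For example, on PostgreSQL, a minimal sketch that lists the two schemas:

```sql
-- List the Deployment's schemas (PostgreSQL).
SELECT schema_name
FROM information_schema.schemata
WHERE schema_name IN ('airflow', 'celery');
```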
Retrieve the connection string
Each Deployment stores its database connection string in a Kubernetes secret named <release-name>-metadata. Run the following command to retrieve it:
```bash
kubectl get secret -n <deployment-namespace> \
  <release-name>-metadata -o jsonpath='{.data.connection}' | base64 -d
```
This returns a connection URI in one of the following formats, depending on the database backend.

For PostgreSQL:

```text
postgresql://<username>:<password>@<host>:<port>/<database>?sslmode=prefer
```

For MySQL:

```text
mysql://<username>:<password>@<host>:<port>/<database>
```
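To reuse the URI in later commands, you can capture and roughly parse it in your shell. This is a convenience sketch, not part of APC: CONN is an arbitrary variable name, and this naive parsing breaks if the password contains ':', '@', or '/'.

```bash
# Capture the decoded connection URI for reuse.
CONN=$(kubectl get secret -n <deployment-namespace> \
  <release-name>-metadata -o jsonpath='{.data.connection}' | base64 -d)

# Rough split with shell parameter expansion.
rest="${CONN#*://}"            # <username>:<password>@<host>:<port>/<database>...
hostportdb="${rest#*@}"        # <host>:<port>/<database>...
hostport="${hostportdb%%/*}"   # <host>:<port>
db="${hostportdb#*/}"          # <database> plus any query string
echo "user:     ${rest%%:*}"
echo "host:     ${hostport%%:*}"
echo "port:     ${hostport##*:}"
echo "database: ${db%%\?*}"
```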
Connect to the database
Use kubectl exec
Run the following command to open an Airflow metadata database shell from the scheduler pod:
```bash
kubectl exec -it -n <deployment-namespace> \
  deployment/<release-name>-scheduler -c scheduler -- \
  airflow db shell
```
This command uses the Airflow metadata database connection that is already configured in the Deployment.
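If you only need to confirm that the Deployment can reach its metadata database, Airflow's built-in connectivity check avoids opening an interactive shell. The command exits non-zero when the connection fails:

```bash
kubectl exec -it -n <deployment-namespace> \
  deployment/<release-name>-scheduler -c scheduler -- \
  airflow db check
```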
Connect from your local computer
If the database is accessible as a Kubernetes service, you can use kubectl port-forward to connect from your local computer. Identify the database service and namespace from the <host> field in the connection string, then forward the port:
For PostgreSQL:

```bash
kubectl port-forward -n <database-namespace> \
  svc/<database-service> <local-port>:<database-port>
```

If PgBouncer is enabled, forward port 6543 from the PgBouncer service in the Deployment namespace instead.

Then connect using the credentials from the connection string:

```bash
psql -h localhost -p <local-port> -U <username> -d <database>
```

For MySQL:

```bash
kubectl port-forward -n <database-namespace> \
  svc/<database-service> <local-port>:<database-port>
```

Then connect using the credentials from the connection string:

```bash
mysql -h 127.0.0.1 -P <local-port> -u <username> -p <database>
```
If the database is an external service (for example, Amazon RDS or Google Cloud SQL), connect directly using the host and port from the connection string instead of port-forwarding.
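For example, with PgBouncer enabled on a hypothetical Deployment with release name quasar-1234, the commands might look as follows. The namespace and the <release-name>-pgbouncer service name are assumptions; confirm both with kubectl get svc in your cluster:

```bash
# Forward the PgBouncer port to localhost (hypothetical names).
kubectl port-forward -n quasar-1234 \
  svc/quasar-1234-pgbouncer 6543:6543 &

# Connect through the tunnel with credentials from the connection string.
psql -h localhost -p 6543 -U <username> -d <database>
```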
Common queries
After you connect to the database, use the following queries to inspect Airflow metadata.
Dag status

```sql
SELECT dag_id, is_active, is_paused, last_parsed_time
FROM dag ORDER BY dag_id;
```

Recent Dag runs

```sql
SELECT dag_id, run_id, state, start_date
FROM dag_run ORDER BY start_date DESC LIMIT 20;
```
Task instance status
```sql
SELECT dag_id, task_id, state, start_date, try_number
FROM task_instance
ORDER BY start_date DESC LIMIT 50;
```
Task failures
For PostgreSQL:

```sql
SELECT dag_id, task_id, state, start_date
FROM task_instance
WHERE state = 'failed'
AND start_date > NOW() - INTERVAL '24 hours';
```

For MySQL:

```sql
SELECT dag_id, task_id, state, start_date
FROM task_instance
WHERE state = 'failed'
AND start_date > NOW() - INTERVAL 24 HOUR;
```
External database configuration
To use an external database instead of the APC-managed database, create the Deployment using the upsertDeployment Houston API mutation with the following fields:
- skipAirflowDatabaseProvisioning: Set to true so Commander doesn't provision a database for this Deployment.
- metadataConnection or metadataConnectionJson: The connection string or JSON object pointing to your external database.
- resultBackendConnection or resultBackendConnectionJson: The connection string or JSON object for the Celery result backend.
For a complete example of the upsertDeployment mutation payload with external database configuration, see Bring your own Airflow database.
Back up and restore
For production environments, run backup and restore commands from your local
computer or a dedicated admin computer. Do not run backup or restore commands
from inside Airflow containers.
Before you run a backup, estimate the database size and confirm that your local computer has enough free disk space.
For PostgreSQL:

```sql
SELECT
  schemaname,
  pg_size_pretty(SUM(pg_total_relation_size(format('%I.%I', schemaname, tablename)::regclass))) AS total_size
FROM pg_tables
GROUP BY schemaname
ORDER BY SUM(pg_total_relation_size(format('%I.%I', schemaname, tablename)::regclass)) DESC;
```

For MySQL:

```sql
SELECT
  table_schema,
  ROUND(SUM(data_length + index_length) / 1024 / 1024, 2) AS size_mb
FROM information_schema.tables
GROUP BY table_schema
ORDER BY size_mb DESC;
```
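To compare the estimate against free space on your local computer, check the target directory before dumping (a quick sketch; df output varies slightly by OS):

```bash
# Show available space in the directory where the backup will be written.
df -h .
```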
Back up the database
To back up a PostgreSQL database from your local computer, use port-forwarding or a direct connection to the database host. The -n airflow flag restricts the dump to the airflow schema; pass -n celery as well if you also need the Celery result tables:

```bash
pg_dump -h <host> -p <port> -U <username> -d <database> -n airflow > backup.sql
```

To back up a MySQL database from your local computer, use port-forwarding or a direct connection to the database host:

```bash
mysqldump -h <host> -P <port> -u <username> -p <database> > backup.sql
```
Restore the database
To restore a PostgreSQL database from your local computer, use port-forwarding or a direct connection to the database host:

```bash
psql -h <host> -p <port> -U <username> -d <database> < backup.sql
```

To restore a MySQL database from your local computer, use port-forwarding or a direct connection to the database host:

```bash
mysql -h <host> -P <port> -u <username> -p <database> < backup.sql
```
Replace the placeholders with values from the connection string. See Retrieve the connection string.
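After a restore, a quick sanity check is to confirm that core metadata tables are populated. A minimal sketch for PostgreSQL; on MySQL, drop the airflow. schema qualifier:

```sql
-- Verify that restored Dag metadata is visible.
SELECT COUNT(*) FROM airflow.dag;
```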
Security best practices
Direct database access bypasses Airflow’s security model. Use with caution and
only for troubleshooting or maintenance tasks.
- Use read-only access for monitoring (see the sketch after this list).
- Never expose database ports publicly.
- Use SSL for all connections.
- Rotate credentials regularly.
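For the read-only monitoring access mentioned above, a minimal sketch on PostgreSQL. The airflow_readonly role name and password placeholder are hypothetical:

```sql
-- Hypothetical read-only role for monitoring queries (PostgreSQL).
CREATE ROLE airflow_readonly WITH LOGIN PASSWORD '<strong-password>';
GRANT USAGE ON SCHEMA airflow TO airflow_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA airflow TO airflow_readonly;

-- Also cover tables created after this grant.
ALTER DEFAULT PRIVILEGES IN SCHEMA airflow
  GRANT SELECT ON TABLES TO airflow_readonly;
```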