Astro Private Cloud (APC) uses PostgreSQL as the primary database system for both platform management (Houston) and Apache Airflow deployments. The following sections describe the database architecture, provisioning, connection pooling, and high availability configuration.Documentation Index
Fetch the complete documentation index at: https://astronomer-preview.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
Database components
Houston database
The Houston API uses a PostgreSQL database to store platform configuration and state. Key tables:- User - System users with authentication credentials
- Workspace - Organizational units for grouping deployments
- Deployment - Airflow deployment configurations
- Cluster - Data plane cluster properties including cloud provider, region, and configuration
- RoleBinding - User and service account role assignments
- ServiceAccount - API keys for programmatic access
- DeployRevision - Deployment version history
- TaskUsage - Task execution metrics
Airflow metadata database
Each Airflow Deployment gets its own isolated PostgreSQL database containing two schemas:| Schema | Purpose |
|---|---|
airflow | Airflow scheduler and webserver metadata |
celery | Celery task results backend |
Database provisioning
Automatic provisioning
By default, Commander automatically provisions databases for new Deployments. No additional configuration is required.External database
To use pre-existing or managed databases, setskipAirflowDatabaseProvisioning to true in the upsertDeployment mutation:
Connection pooling (PgBouncer)
APC uses PgBouncer for connection pooling to reduce database connection overhead. Airflow can open many database connections due to its distributed nature, and each PostgreSQL connection creates a dedicated OS process. PgBouncer reduces this overhead by maintaining a pool of reusable server connections. For more detail on why PgBouncer is recommended, see the Apache Airflow Helm chart production guide.PgBouncer is only used with the embedded PostgreSQL database. When you provide a
metadataConnectionString URI through the upsertDeployment mutation, Airflow connects directly to the external database and bypasses PgBouncer. If you use an external database and need connection pooling, configure it through your managed database service or deploy a separate PgBouncer instance.transaction pool mode by default, which means server connections are returned to the pool after each transaction completes. This mode does not support session-level features such as prepared statements or SET commands that persist across transactions.
Configuration
Pool sizes
Pool sizes are configured per Deployment in the Airflow chart values:cl_waiting metric. If clients consistently wait for connections, increase pool sizes or scale PgBouncer replicas.
Airflow also maintains its own SQLAlchemy connection pool (default
pool_size: 5, max_overflow: 10) between each Airflow component and PgBouncer. With these defaults, each Airflow process can open up to 15 simultaneous database connections. The full connection chain is: Airflow component → SQLAlchemy pool → PgBouncer → PostgreSQL.Kerberos support
PgBouncer in APC includes Kerberos support:High availability
PostgreSQL replication
Replication is disabled by default. The following example shows a recommended production configuration that enables streaming replication with synchronous commit:Default and recommended for production
The following table compares the chart default values with starting points that Astronomer recommends for production environments. The Default column reflects the values shipped in the Helm chart. The Recommended for production column reflects general guidance based on Astronomer operational experience. Adjust these values based on the number of Deployments, workload scale, and observed resource utilization in your environment. For additional production guidance, see the Apache Airflow Helm chart production guide. Astronomer also recommends using a managed database service such as Amazon RDS, Google Cloud SQL, or Azure Database for PostgreSQL rather than the embedded PostgreSQL container for production workloads.PostgreSQL
| Setting | Default | Recommended for production |
|---|---|---|
postgresql.resources.requests.cpu | 250m | 1000m. Adjust based on query load |
postgresql.resources.requests.memory | 256Mi | 2Gi. Adjust based on active connection count |
postgresql.resources.limits.cpu | 1000m | 2000m. Adjust based on query load |
postgresql.resources.limits.memory | 2Gi | 4Gi. Adjust based on active connection count |
postgresql.persistence.enabled | true | true |
postgresql.persistence.size | 8Gi | 50Gi or more. Scale based on the number of Deployments and retention policy |
postgresql.replication.enabled | false | true |
postgresql.replication.slaveReplicas | 1 | 2 |
postgresql.replication.synchronousCommit | "off" | "on" |
postgresql.replication.numSynchronousReplicas | 0 | 1 |
PgBouncer (platform level)
| Setting | Default | Recommended for production |
|---|---|---|
pgbouncer.resources.requests.cpu | 250m | 250m |
pgbouncer.resources.requests.memory | 256Mi | 256Mi |
pgbouncer.resources.limits.cpu | 250m | 500m. Adjust based on connection throughput |
pgbouncer.resources.limits.memory | 256Mi | 512Mi. Adjust based on connection count |
PgBouncer (per Deployment)
These values match the Apache Airflow Helm chart defaults. No official formula ties pool sizes to a specific workload metric such as Dag count. Instead, adjust pool sizes based on observed connection utilization.| Setting | Default | Recommended for production |
|---|---|---|
pgbouncer.metadataPoolSize | 10 | 10. Increase if cl_waiting is consistently non-zero |
pgbouncer.resultBackendPoolSize | 5 | 5. Increase for Deployments with high Celery task throughput |
pgbouncer.maxClientConn | 100 | 100. Increase if Deployments run many concurrent Airflow components |
Backup and recovery
For database backup and restore procedures, including size estimation,pg_dump/mysqldump commands, and restore steps, see Access Airflow database.
Connection strings
Houston database
The Houston database connection depends on whether you use an external database or the in-cluster PostgreSQL. You configure this during control plane installation. For external databases, provide the connection throughhouston.backendConnection in your Helm values:
houston.backendSecretName. The secret must contain a connection key with the full connection URI.
To retrieve the active Houston database connection string, read the astronomer-bootstrap secret:
The in-cluster PostgreSQL option (
global.postgresqlEnabled: true) is only for development or proof-of-concept environments and is not supported in production.Airflow database
Both the metadata and result backend connections use the same database with different schemas and credentials:Monitoring
Monitor the following metrics to track database and connection pool health:| Metric | Description |
|---|---|
pg_stat_activity | Active database connections |
pg_database_size | Database size on disk |
pgbouncer_pools_* | PgBouncer connection pool statistics, including cl_waiting |
Best practices
- Enable replication for production Deployments.
- Use PgBouncer to manage connection overhead.
- Monitor connection pools for
cl_waitingand database size growth. - Configure regular backups. See Access Airflow database.
- Size pools based on observed connection utilization, not Dag count.
- Use separate credentials for each Deployment.
- Enable SSL for database connections in production.
- Use a managed database service for production workloads instead of the in-cluster PostgreSQL container.