
You can use the Houston API to create Astronomer Software Deployments that use pre-created databases, external to the Airflow Deployment, as both the metadata database and the result backend.

Prerequisites

  • Workspace Admin user privileges and a Workspace ID
  • (Optional) A MySQL or PostgreSQL database
  • (Optional) An existing Deployment

If you connect a Deployment with existing DAG data to a new external database, you must migrate that historic data to the new database yourself. Otherwise, information about your historic Deployment activity, such as task instances and DAG runs, won't be displayed, because the database where that information was stored has changed.

Step 1: Add configuration

  1. Open your values.yaml file.
  2. Add the following to your values.yaml file. These configurations enable the use of manual connection strings and provide an example of the database connection string format, which you can later update for your specific connection.
deployments:
  manualConnectionStrings:
    enabled: true
  database:
    connection: postgresql://example_user:example_pass@host:5432?sslmode=prefer
  3. Push the configuration change. See Apply a Config Change.

Step 2: (Optional) Create your database

If you need to create a new database, substitute astro-db-name with your own database name. Identifiers that contain hyphens must be double-quoted in PostgreSQL.
CREATE DATABASE "astro-db-name";

Step 3: Add a user account to your database for the connection

Substitute astro-user-name and astro-user-password with your own values. You can use an existing database for this step. Identifiers that contain hyphens, such as these example names, must be double-quoted in PostgreSQL.
  1. Create a user with a password for Astronomer Software to use to access the database.
CREATE USER "astro-user-name" WITH PASSWORD 'astro-user-password';
  2. Grant all privileges on the database to the user.
GRANT ALL PRIVILEGES ON DATABASE "astro-db-name" TO "astro-user-name";
  3. Grant USAGE and CREATE privileges on the public schema to astro-user-name:
GRANT USAGE, CREATE ON SCHEMA public TO "astro-user-name";
Now connect to the database you created, which is astro-db-name in this example, and run the following queries.
  4. Grant all privileges on all tables, sequences, and functions to the user.
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO "astro-user-name";
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO "astro-user-name";
GRANT ALL PRIVILEGES ON ALL FUNCTIONS IN SCHEMA public TO "astro-user-name";
  5. Set default privileges for the user, so the user automatically has access to any new tables, sequences, or functions.
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL PRIVILEGES ON TABLES TO "astro-user-name";
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL PRIVILEGES ON SEQUENCES TO "astro-user-name";
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL PRIVILEGES ON FUNCTIONS TO "astro-user-name";

Step 4: Retrieve database host information

Retrieve the connection information for your external database. For example, on AWS you can retrieve your endpoint information by following Finding the connection information for an RDS for MySQL DB instance in the Amazon RDS documentation.

Step 5: Compose connection strings for your database

You need connection strings that define how Astronomer Software connects to your external databases from your Airflow Deployment. These strings become the values of metadataConnection and resultBackendConnection when you create, update, or upsert your Deployment. Use your astro-user-name, astro-user-password, astro-db-name, and the host information you retrieved to compose the connection strings in the following format, depending on whether you want to define a metadata database connection or a result backend connection.

With PGBouncer disabled

  • metadataConnection:
    postgresql://astro-user-name:astro-user-password@host:5432/astro-db-name
    
  • resultBackendConnection:
    db+postgresql://astro-user-name:astro-user-password@host:5432/astro-db-name
    
:::warning Celery Executor
The connection string validation regex doesn't cover the result backend connection string format, which includes the db+ prefix. This prefix is specifically required by Celery executor workers: if the connection string doesn't include db+, the Celery worker pods fail. The regex validation doesn't cover this format because doing so would complicate the validation logic across different scenarios.
:::
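If your credentials contain characters that are significant in URLs, it can help to compose these strings programmatically rather than by hand. The following is a minimal Python sketch using the example placeholder values from the steps above; the `db+` prefix for the result backend comes from the format described here.

```python
from urllib.parse import quote

# Placeholder values from the previous steps -- substitute your own.
user = "astro-user-name"
password = "astro-user-password"  # special characters are URL-encoded below
host = "host"
db = "astro-db-name"

# Percent-encode credentials so characters such as '@' or ':' in the
# password don't break the URL structure.
base = f"postgresql://{quote(user)}:{quote(password)}@{host}:5432/{db}"

metadata_connection = base
# The Celery result backend requires the 'db+' prefix.
result_backend_connection = f"db+{base}"

print(metadata_connection)
print(result_backend_connection)
```

With these placeholder values the output matches the string formats shown above; a password containing `@` or `:` would be safely percent-encoded.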

With PGBouncer enabled

If you have PGBouncer enabled and are using Postgres, you must configure metadataConnectionJson and resultBackendConnectionJson instead. Use your astro-user-name, astro-user-password, astro-db-name, and the host information you retrieved to compose the connection objects in the following format, depending on whether you want to define a metadata database connection or a result backend connection.
  • metadataConnectionJson:
    
    "metadataConnectionJson": {
        "user": "astro-user-name",
        "pass": "astro-user-password",
        "protocol": "postgresql",
        "host": "host",
        "port": 5432,
        "db": "astro-db-name"
    }
    
  • resultBackendConnectionJson:
    "resultBackendConnectionJson": {
        "user": "astro-user-name",
        "pass": "astro-user-password",
        "protocol": "postgresql",
        "host": "host",
        "port": 5432,
        "db": "astro-db-name"
    }
    
    
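As a sketch, the two JSON objects can be assembled from the same connection details, since both use the same structure. Note that, unlike the plain connection strings, neither field carries the `db+` prefix in this JSON form.

```python
import json

# Placeholder connection details -- substitute your own.
connection = {
    "user": "astro-user-name",
    "pass": "astro-user-password",
    "protocol": "postgresql",
    "host": "host",
    "port": 5432,
    "db": "astro-db-name",
}

# Both fields use the same object shape; no 'db+' prefix appears here.
variables = {
    "metadataConnectionJson": connection,
    "resultBackendConnectionJson": connection,
}

print(json.dumps(variables, indent=2))
```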

Step 6: Add to Deployment configuration

Use the Houston API to create your Deployment configuration.
When using an external database, set skipAirflowDatabaseProvisioning to true in the upsertDeployment mutation to prevent Houston from provisioning a database for this Deployment. This field is only available on the upsertDeployment mutation, not on createDeployment.
The following example shows the mutation and query variables for createDeployment. See Houston API code examples for examples of how to use the update and upsert options to configure your Deployment.

Create a new Deployment

mutation createDeployment(
  $workspaceUuid: Uuid!
  $releaseName: String
  $namespace: String!
  $type: String!
  $label: String!
  $description: String
  $version: String
  $airflowVersion: String
  $runtimeVersion: String
  $executor: ExecutorType
  $workers: Workers
  $webserver: Webserver
  $scheduler: Scheduler
  $triggerer: Triggerer
  $config: JSON
  $properties: JSON
  $dagDeployment: DagDeployment
  $astroUnitsEnabled: Boolean
  $rollbackEnabled: Boolean
  $metadataConnection: String
  $resultBackendConnection: String
  $metadataConnectionJson: JSON
  $resultBackendConnectionJson: JSON
) {
  createDeployment(
    workspaceUuid: $workspaceUuid
    releaseName: $releaseName
    namespace: $namespace
    type: $type
    label: $label
    airflowVersion: $airflowVersion
    description: $description
    version: $version
    executor: $executor
    workers: $workers
    webserver: $webserver
    scheduler: $scheduler
    triggerer: $triggerer
    config: $config
    properties: $properties
    runtimeVersion: $runtimeVersion
    dagDeployment: $dagDeployment
    astroUnitsEnabled: $astroUnitsEnabled
    rollbackEnabled: $rollbackEnabled
    metadataConnection: $metadataConnection
    resultBackendConnection: $resultBackendConnection
    metadataConnectionJson: $metadataConnectionJson
    resultBackendConnectionJson: $resultBackendConnectionJson
  ) {
    id
    config
    urls {
      type
      url
      __typename
    }
    properties
    description
    label
    releaseName
    namespace
    status
    type
    version
    workspace {
      id
      label
      __typename
    }
    airflowVersion
    runtimeVersion
    desiredAirflowVersion
    dagDeployment {
      type
      nfsLocation
      repositoryUrl
      branchName
      syncInterval
      syncTimeout
      ephemeralStorage
      dagDirectoryLocation
      rev
      sshKey
      knownHosts
      __typename
    }
    createdAt
    updatedAt
    __typename
  }
}

Example JSON query variables

{
  "workspaceUuid": "cm3g0cjd2000008l74jigb54y",
  "metadataConnectionJson": {
    "user": "astro-user-name",
    "pass": "astro-user-password",
    "protocol": "postgresql",
    "host": "host",
    "port": 5432,
    "db": "astro-db-name"
  },
  "resultBackendConnectionJson": {
    "user": "astro-user-name",
    "pass": "astro-user-password",
    "protocol": "postgresql",
    "host": "postgres-db-lb.external-postgres.svc.cluster.local",
    "port": 5432,
    "db": "astro-db-name"
  },
  "namespace": "",
  "type": "airflow",
  "config": {
    "executor": "CeleryExecutor",
    "workers": {},
    "webserver": {},
    "scheduler": {
      "replicas": 1
    },
    "triggerer": {}
  },
  "executor": "CeleryExecutor",
  "workers": {},
  "webserver": {},
  "scheduler": {
    "replicas": 1
  },
  "triggerer": {},
  "label": "Rt1160-Celery-Pgbouncer-Enabled-Json-5",
  "description": "",
  "runtimeVersion": "11.6.0",
  "properties": {
    "extra_capacity": {
      "cpu": 1000,
      "memory": 3840
    }
  },
  "astroUnitsEnabled": false,
  "rollbackEnabled": true,
  "dagDeployment": {
    "type": "dag_deploy",
    "nfsLocation": "",
    "repositoryUrl": "",
    "branchName": "",
    "syncInterval": 1,
    "syncTimeout": 120,
    "ephemeralStorage": 2,
    "dagDirectoryLocation": "",
    "rev": "",
    "sshKey": "",
    "knownHosts": ""
  }
}

Example query string variables

{
  "workspaceUuid": "cm3g0cjd2000008l74jigb54y",
  "metadataConnection": "postgresql://astro-user-name:astro-user-password@host:5432/astro-db-name",
  "resultBackendConnection": "db+postgresql://astro-user-name:astro-user-password@host:5432/astro-db-name",
  "namespace": "",
  "type": "airflow",
  "config": {
    "executor": "CeleryExecutor",
    "workers": {},
    "webserver": {},
    "scheduler": {
      "replicas": 1
    },
    "triggerer": {}
  },
  "executor": "CeleryExecutor",
  "workers": {},
  "webserver": {},
  "scheduler": {
    "replicas": 1
  },
  "triggerer": {},
  "label": "Rt1160-Celery-Pgbouncer-Enabled-Json-5",
  "description": "",
  "runtimeVersion": "11.6.0",
  "properties": {
    "extra_capacity": {
      "cpu": 1000,
      "memory": 3840
    }
  },
  "astroUnitsEnabled": false,
  "rollbackEnabled": true,
  "dagDeployment": {
    "type": "dag_deploy",
    "nfsLocation": "",
    "repositoryUrl": "",
    "branchName": "",
    "syncInterval": 1,
    "syncTimeout": 120,
    "ephemeralStorage": 2,
    "dagDirectoryLocation": "",
    "rev": "",
    "sshKey": "",
    "knownHosts": ""
  }
}
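To tie the steps together, the following Python sketch posts the createDeployment mutation with variables like the example above. The Houston endpoint URL pattern (https://houston.BASEDOMAIN/v1), the Authorization header format, and the workspace ID and label values are assumptions to adapt for your installation, and the mutation is abbreviated to only the variables this guide sets.

```python
import json
import urllib.request

# Hypothetical endpoint and token -- replace BASEDOMAIN with your base
# domain and supply a token for a Workspace Admin user.
HOUSTON_URL = "https://houston.BASEDOMAIN/v1"
API_TOKEN = "<your-api-token>"

# Abbreviated form of the createDeployment mutation shown above.
MUTATION = """
mutation createDeployment(
  $workspaceUuid: Uuid!
  $namespace: String!
  $type: String!
  $label: String!
  $metadataConnection: String
  $resultBackendConnection: String
) {
  createDeployment(
    workspaceUuid: $workspaceUuid
    namespace: $namespace
    type: $type
    label: $label
    metadataConnection: $metadataConnection
    resultBackendConnection: $resultBackendConnection
  ) {
    id
    releaseName
    status
  }
}
"""

variables = {
    "workspaceUuid": "<workspace-id>",  # hypothetical Workspace ID
    "namespace": "",
    "type": "airflow",
    "label": "external-db-deployment",  # hypothetical label
    "metadataConnection": "postgresql://astro-user-name:astro-user-password@host:5432/astro-db-name",
    "resultBackendConnection": "db+postgresql://astro-user-name:astro-user-password@host:5432/astro-db-name",
}

payload = json.dumps({"query": MUTATION, "variables": variables}).encode()
request = urllib.request.Request(
    HOUSTON_URL,
    data=payload,
    headers={
        "Content-Type": "application/json",
        "Authorization": API_TOKEN,
    },
)
# Uncomment to send the request to your Houston API:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The request is only constructed here, not sent; uncomment the final lines once the endpoint and token are filled in for your installation.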