Info This page has not yet been updated for Airflow 3. The concepts shown are relevant, but some code may need to be updated. If you run any examples, take care to update import statements and watch for any other breaking changes.
BigQuery is Google’s fully managed and serverless data warehouse. Integrating BigQuery with Airflow lets you execute BigQuery jobs from a DAG. There are multiple ways to connect Airflow and BigQuery, all of which require a GCP Service Account:
- Use the contents of a service account key file directly in an Airflow connection.
- Copy the service account key file to your Airflow project.
- Store the contents of a service account key file in a secrets backend.
- Use a Kubernetes service account to integrate Airflow and BigQuery. This is possible only if you run Airflow on Astro or Google Kubernetes Engine (GKE).
Tip If you’re an Astro user, Astronomer recommends using workload identity to authorize your Deployments to BigQuery. This eliminates the need to specify secrets in your Airflow connections or copy credential files to your Astro project. See Authorize Deployments to your cloud.
Prerequisites
- The Astro CLI.
- A locally running Astro project.
- A Google Cloud project with BigQuery API enabled.
- Permissions to create an IAM service account or use an existing one. See Google documentation.
Get connection details
A connection from Airflow to Google BigQuery requires the following information:
- Service account name
- Service account key file
- Google Cloud Project ID
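As a rough illustration of how these details fit together, the sketch below builds a JSON-formatted connection that Airflow can read from an `AIRFLOW_CONN_<CONN_ID>` environment variable. The helper name `build_bigquery_conn_env`, the connection ID, the project ID, and the key file path are all hypothetical. The extra field names (`project`, `key_path`, `keyfile_dict`) are assumed to match the Google provider version you have installed; older provider versions may expect differently named fields, so check them against your environment before relying on this.

```python
import json


def build_bigquery_conn_env(conn_id, project_id, key_path=None, keyfile_json=None):
    """Return (env_var_name, json_value) describing a Google Cloud connection.

    Hypothetical helper for illustration only. The extra field names are an
    assumption about the installed Google provider, not a guaranteed contract.
    """
    extra = {"project": project_id}
    if key_path is not None:
        # Path to a service account key file copied into the Airflow project.
        extra["key_path"] = key_path
    if keyfile_json is not None:
        # Contents of the service account key file, passed as a JSON string.
        extra["keyfile_dict"] = keyfile_json

    connection = {"conn_type": "google_cloud_platform", "extra": extra}
    # Airflow looks up environment variables named AIRFLOW_CONN_<CONN_ID>.
    return "AIRFLOW_CONN_" + conn_id.upper(), json.dumps(connection)


if __name__ == "__main__":
    name, value = build_bigquery_conn_env(
        "bigquery_default",                      # hypothetical connection ID
        "my-gcp-project",                        # hypothetical project ID
        key_path="/usr/local/airflow/include/key.json",  # hypothetical path
    )
    print(f"{name}='{value}'")
```

Leaving both `key_path` and `keyfile_json` unset produces a connection with only a project ID, which is the shape you would expect when credentials come from somewhere other than the connection itself.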
Create your connection
Info Astro users can also create connections using the Astro Environment Manager, which stores connections in an Astro-managed secrets backend. These connections can be shared across multiple deployed and local Airflow environments. See Create Airflow connections in the Astro UI.
How it works
Airflow uses the python-bigquery library to connect to GCP BigQuery through the BigQueryHook. If you don’t define specific key credentials in the connection, Google defaults to using Application Default Credentials (ADC). This means when you use Workload Identity to connect to BigQuery, Airflow relies on ADC to authenticate.
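The fallback described above can be sketched as a small decision function. This is a simplified illustration of the idea, not the hook's actual code: the function name `pick_credential_source` and the returned labels are hypothetical, and treating the two key fields as mutually exclusive is an assumption about how the connection is validated.

```python
def pick_credential_source(extra):
    """Decide which credential source a connection's extras point to.

    Simplified sketch only. `extra` is a dict of connection extras, and the
    field names `keyfile_dict` and `key_path` are assumed, not guaranteed.
    """
    explicit = [field for field in ("keyfile_dict", "key_path") if extra.get(field)]
    if len(explicit) > 1:
        # Assumption: supplying both key fields is ambiguous and rejected.
        raise ValueError("Specify only one of keyfile_dict or key_path")
    if explicit:
        # Key credentials were defined in the connection, so use them.
        return explicit[0]
    # Nothing explicit was defined: fall back to Application Default
    # Credentials, which is the path taken with Workload Identity.
    return "application_default_credentials"


if __name__ == "__main__":
    print(pick_credential_source({"project": "my-gcp-project"}))
    print(pick_credential_source({"key_path": "/usr/local/airflow/include/key.json"}))
```

In the google-auth library, ADC resolution is exposed through `google.auth.default()`, which returns a credentials object together with a project ID when one can be determined from the environment.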