BigQuery is Google’s fully managed, serverless data warehouse. Integrating BigQuery with Airflow lets you execute BigQuery jobs from a DAG. There are multiple ways to connect Airflow and BigQuery, all of which require a GCP service account:
- Use the contents of a service account key file directly in an Airflow connection (see the sketch after this list).
- Copy the service account key file to your Airflow project.
- Store the contents of a service account key file in a secrets backend.
- Use a Kubernetes service account to integrate Airflow and BigQuery. This is possible only if you run Airflow on Astro or Google Kubernetes Engine (GKE).
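For example, the first option can be wired up by embedding the key file contents in the connection's `extra` field. The following is a minimal sketch, assuming a key file at `/usr/local/airflow/include/service-account.json`, the default connection ID `google_cloud_default`, and a placeholder project ID; exact extra field names can vary by Google provider version:

```python
import json

from airflow.models.connection import Connection

# Hypothetical key file path; replace with wherever your key file lives.
with open("/usr/local/airflow/include/service-account.json") as f:
    keyfile_contents = f.read()

conn = Connection(
    conn_id="google_cloud_default",
    conn_type="google_cloud_platform",
    extra=json.dumps(
        {
            # The full key file contents live in the connection itself,
            # so no file needs to be shipped with the project.
            "keyfile_dict": keyfile_contents,
            "project": "my-gcp-project",  # hypothetical project ID
        }
    ),
)

# Airflow reads connections from AIRFLOW_CONN_<CONN_ID> environment variables,
# so the generated URI can be exported as AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT.
print(conn.get_uri())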
Prerequisites
- The Astro CLI.
- A locally running Astro project.
- A Google Cloud project with BigQuery API enabled.
- Permissions to create an IAM service account or use an existing one. See Google documentation.
Get connection details
A connection from Airflow to Google BigQuery requires the following information:
- Service account name
- Service account key file
- Google Cloud Project ID
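If you already have a key file, the service account name and project ID can be read straight out of it, since a downloaded key file contains all three values. A short sketch, assuming the key was saved as `service-account.json`:

```python
import json

# Hypothetical path; use wherever you saved the downloaded key file.
with open("service-account.json") as f:
    key = json.load(f)

# Service account name, e.g. <name>@<project>.iam.gserviceaccount.com
print(key["client_email"])
# Google Cloud project ID
print(key["project_id"])
# The JSON file itself is the service account key file.
```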
Create your connection
You can create the connection in the Airflow UI or through environment variables. Astro users can also create connections using the Astro Environment Manager, which stores connections in an Astro-managed secrets backend. These connections can be shared across multiple deployed and local Airflow environments. See Create Airflow connections in the Astro UI.
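One way to sanity-check the finished connection from inside your Airflow environment is to run a trivial query through the hook. This is a sketch assuming the connection ID `google_cloud_default`, run from a Python shell where Airflow and the Google provider are installed:

```python
from airflow.providers.google.cloud.hooks.bigquery import BigQueryHook

# Hypothetical conn_id; use whatever ID you gave your connection.
hook = BigQueryHook(gcp_conn_id="google_cloud_default")

# get_client() returns an authenticated google.cloud.bigquery.Client.
client = hook.get_client()
for row in client.query("SELECT 1 AS ok").result():
    print(row.ok)  # prints 1 if the connection authenticates
```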
How it works
Airflow uses the python-bigquery library to connect to BigQuery through the BigQueryHook. If you don’t define specific key credentials in the connection, Google defaults to Application Default Credentials (ADC). This means that when you use Workload Identity to connect to BigQuery, Airflow relies on ADC to authenticate.
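As a concrete illustration, here is a minimal DAG that submits a query job through the BigQueryHook-backed BigQueryInsertJobOperator. The connection ID is an assumption; any Google provider operator that accepts `gcp_conn_id` resolves credentials the same way:

```python
from pendulum import datetime

from airflow.decorators import dag
from airflow.providers.google.cloud.operators.bigquery import (
    BigQueryInsertJobOperator,
)


@dag(start_date=datetime(2024, 1, 1), schedule=None, catchup=False)
def bigquery_example():
    # Submits a BigQuery query job; credentials come from the referenced
    # connection, or from ADC if the connection defines no key.
    BigQueryInsertJobOperator(
        task_id="run_query",
        gcp_conn_id="google_cloud_default",  # hypothetical conn_id
        configuration={
            "query": {
                "query": "SELECT 1 AS ok",
                "useLegacySql": False,
            }
        },
    )


bigquery_example()
```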