Databricks is a popular unified data and analytics platform built around Apache Spark that provides users with fully managed Apache Spark clusters and interactive workspaces. This guide provides the basic setup for creating a Databricks connection. For a complete integration tutorial, see Orchestrate Databricks jobs with Airflow.
Prerequisites
- An Airflow environment with the Airflow Databricks provider (apache-airflow-providers-databricks) installed.
- A Databricks account.
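To confirm the first prerequisite, one option is to ask Python's packaging metadata whether the provider distribution is present in the environment that runs Airflow. The following is a minimal sketch using only the standard library. The helper name `installed_version` is hypothetical and not part of Airflow or the provider.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; not an Airflow API.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


provider = "apache-airflow-providers-databricks"
found = installed_version(provider)
if found is None:
    print(f"{provider} is not installed; try: pip install {provider}")
else:
    print(f"{provider} {found} is installed")
```

Run the check with the same Python interpreter that Airflow uses, otherwise it inspects a different environment.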
Astro users can also create connections using the Astro Environment Manager, which stores connections in an Astro-managed secrets backend. These connections can be shared across multiple deployed and local Airflow environments. See Create Airflow connections in the Astro UI.
Connect with an OAuth Connection
An OAuth connection from Airflow to Databricks requires the following information:
- Host: Databricks URL
- Service Principal Client ID / Login: Service Principal Client ID
- Service Principal Client Secret / Password: Service Principal Client Secret
Complete the following steps to retrieve these values:
- In the Databricks Cloud UI, copy the URL of your Databricks workspace. It should be formatted as either https://dbc-75fc7ab7-96a6.cloud.databricks.com/ or https://your-org.cloud.databricks.com/.
- Create a service principal in Databricks and copy the Client ID and Client Secret. For details, see Authorize service principal access to Databricks with OAuth.
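The values gathered above can also be supplied outside the Airflow UI. The following is a minimal sketch that serializes them into the JSON format Airflow accepts in an `AIRFLOW_CONN_<CONN_ID>` environment variable. It makes several assumptions: the connection ID `databricks_default`, the helper name `build_oauth_connection`, and the placeholder credentials are illustrative, and the `service_principal_oauth` extra field reflects how recent versions of the Databricks provider are understood to enable OAuth with a service principal. Check the provider documentation for your installed version before relying on it.

```python
import json


def build_oauth_connection(host: str, client_id: str, client_secret: str) -> str:
    """Serialize a Databricks OAuth connection as Airflow connection JSON.

    Hypothetical helper for illustration; not part of Airflow.
    """
    conn = {
        "conn_type": "databricks",
        "host": host,               # Databricks workspace URL
        "login": client_id,         # Service Principal Client ID
        "password": client_secret,  # Service Principal Client Secret
        # Assumption: this flag tells the provider to treat login/password
        # as OAuth client credentials for a service principal.
        "extra": {"service_principal_oauth": True},
    }
    return json.dumps(conn)


# Placeholder values; replace with your workspace URL and credentials.
oauth_conn_json = build_oauth_connection(
    host="https://your-org.cloud.databricks.com/",
    client_id="<service-principal-client-id>",
    client_secret="<service-principal-client-secret>",
)

# Export this in the environment where Airflow runs.
print(f"AIRFLOW_CONN_DATABRICKS_DEFAULT='{oauth_conn_json}'")
```

Avoid committing the client secret to source control. An environment variable or secrets backend keeps it out of DAG code.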
Connect with a Personal Access Token
A Personal Access Token (PAT) connection from Airflow to Databricks requires the following information:
- Host: Databricks URL
- Personal Access Token / Password: Personal access token
Complete the following steps to retrieve these values:
- In the Databricks Cloud UI, copy the URL of your Databricks workspace. It should be formatted as either https://dbc-75fc7ab7-96a6.cloud.databricks.com/ or https://your-org.cloud.databricks.com/.
- To use a personal access token for a user, follow the Databricks documentation to generate a new token. To generate a personal access token for a service principal, see Manage personal access tokens for a service principal. Copy the personal access token.
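As with OAuth, the PAT values can be expressed as Airflow connection JSON for an `AIRFLOW_CONN_<CONN_ID>` environment variable. The following is a minimal sketch in which the connection ID `databricks_default`, the helper name `build_pat_connection`, and the placeholder token are assumptions made for illustration. The token goes in the password field and the login is left unset, matching the field mapping listed above.

```python
import json


def build_pat_connection(host: str, token: str) -> str:
    """Serialize a Databricks PAT connection as Airflow connection JSON.

    Hypothetical helper for illustration; not part of Airflow.
    """
    conn = {
        "conn_type": "databricks",
        "host": host,       # Databricks workspace URL
        "password": token,  # personal access token
    }
    return json.dumps(conn)


# Placeholder values; replace with your workspace URL and token.
pat_conn_json = build_pat_connection(
    host="https://your-org.cloud.databricks.com/",
    token="<personal-access-token>",
)

# Export this in the environment where Airflow runs.
print(f"AIRFLOW_CONN_DATABRICKS_DEFAULT='{pat_conn_json}'")
```

Because a personal access token grants the permissions of the user or service principal that owns it, store the resulting value in a secrets backend or environment variable, not in DAG files.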
See also
- Apache Airflow Databricks provider package documentation
- Databricks modules in the Airflow Registry