If some of your tasks require specific resources such as a GPU, you might want to run them in a different cluster than your Airflow instance. In setups where both clusters are used by the same AWS, Azure, or GCP account, you can manage separate clusters with roles and permissions.
To launch Pods in external clusters from a local Airflow environment, your local Airflow environment must have valid credentials with permission to launch a Pod in the external cluster. For managed Kubernetes services from public cloud providers, authentication is federated through the provider's native IAM service. To grant the Astro role permission to launch Pods on your cluster, you can either include static credentials or use workload identity to authorize the Astro role to your cluster.
Prerequisites
- Network connectivity between your Airflow execution environment and the external Kubernetes cluster:
  - Hosted execution mode: A network connection between your Astro Deployment and the external cluster.
  - Remote execution mode: Network connectivity between the environment where your Remote Execution Agent runs and the external cluster. You are responsible for managing this connectivity. A direct network connection between Astro and the external cluster is not required.
Setup
To trigger remote Pods on an Azure AKS cluster, add the following packages and dependencies to your Docker image:
FROM quay.io/astronomer/astro-runtime:X.Y.Z

USER root
RUN curl -sL https://aka.ms/InstallAzureCLIDeb | bash
RUN az aks install-cli
USER astro
The following configuration file is a sample Kubernetes kubeconfig file that allows the Kubernetes command-line tool, kubectl, or other clients to connect to a remote Kubernetes cluster, remote-kpo, using Azure Workload Identity for authentication.

# Specifies the version of the Kubernetes API for this configuration file.
# v1 is the standard version used for kubeconfig files.
apiVersion: v1
# List of Kubernetes clusters that the configuration can connect to.
clusters:
- cluster:
    # base64-encoded certificate for the Kubernetes API server to verify SSL communication.
    certificate-authority-data: <certificate>
    # URL of the Kubernetes API server.
    # This is the endpoint of the remote cluster you want to interact with.
    server: <Azure server address>
  # Name of the cluster, which is referenced in the contexts section.
  name: <AKS cluster>
# List of contexts that define which cluster and user combination to use when interacting with Kubernetes.
contexts:
# Describes the context for connecting to the cluster.
- context:
    # References the cluster from the clusters section.
    cluster: <AKS cluster>
    # Associates the user configuration to be used for authentication with the cluster.
    user: <user>
  # The name of the context, which is referenced by current-context.
  name: <AKS cluster>
# Specifies the active context that will be used by default when running kubectl commands.
current-context: <AKS cluster>
# Identifies the file type as a Kubernetes Config.
kind: Config
preferences: {}
# List of users and the method they use for authentication.
users:
# Defines the user that is being used in the context.
# This user is responsible for authenticating with the Kubernetes cluster.
- name: <your user>
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      args:
      - get-token
      - --login
      - workloadidentity
      - --tenant-id
      - <Tenant ID from Step 1>
      - --client-id
      - <Client ID from Step 1>
      # The server ID for Azure Kubernetes Service (AKS). This is a static ID representing AKS.
      - --server-id
      - 6dae42f8-4368-4678-94ff-3960e28e3630
      # Specifies the path to the federated token that the managed identity uses to authenticate.
      - --federated-token-file
      - /var/run/secrets/azure/tokens/azure-identity-token
      - --environment
      - AzurePublicCloud
      command: kubelogin
      # Specifies that kubelogin should not provide additional cluster information beyond the authentication token.
      provideClusterInfo: false
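Name mismatches between the `contexts`, `clusters`, and `users` sections are a common source of connection errors. The following stdlib-only sketch (`check_kubeconfig` is a hypothetical helper, not part of any Kubernetes client library) verifies that a parsed kubeconfig's internal references line up:

```python
# Sanity-check that a parsed kubeconfig is internally consistent:
# current-context must name an entry in contexts, and that context must
# reference existing entries in the clusters and users lists.
def check_kubeconfig(cfg: dict) -> bool:
    contexts = {c["name"]: c["context"] for c in cfg.get("contexts", [])}
    clusters = {c["name"] for c in cfg.get("clusters", [])}
    users = {u["name"] for u in cfg.get("users", [])}
    ctx = contexts.get(cfg.get("current-context"))
    return ctx is not None and ctx["cluster"] in clusters and ctx["user"] in users


# Minimal example mirroring the structure of the sample file above.
sample = {
    "current-context": "my-aks-cluster",
    "contexts": [{"name": "my-aks-cluster",
                  "context": {"cluster": "my-aks-cluster", "user": "my-user"}}],
    "clusters": [{"name": "my-aks-cluster", "cluster": {}}],
    "users": [{"name": "my-user", "user": {}}],
}
print(check_kubeconfig(sample))  # True
```

If the function returns False, compare the `name` fields against `current-context` and the context's `cluster` and `user` entries before debugging authentication itself.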
There are multiple ways to pass the kubeconfig file to your Airflow connection. If your kubeconfig file contains any sensitive information, we recommend storing it as JSON inside the connection, as described in option 3.

1. If your kubeconfig file resides in the default location on the machine (~/.kube/config), you can leave all fields empty in the connection configuration. Airflow automatically uses the kubeconfig from the default location.
2. Add a COPY command at the end of your Dockerfile to add your kubeconfig file to your Astro Runtime Docker image. Then, reference the kubeconfig file by inserting its path into the Kube config path field of your Airflow connection.
3. Add the same COPY command at the end of your Dockerfile. Then, convert your kubeconfig file to JSON format and paste it into the Kube config (JSON format) field in the connection configuration. Use an online converter like https://jsonformatter.org/yaml-to-json to convert YAML to JSON. Remove any sensitive information first.
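A minimal sketch of the COPY command used in options 2 and 3, assuming your kubeconfig file is named `kubeconfig` and stored in the `include` folder of your Astro project (both the file name and paths are illustrative, so adjust them to your project layout):

```dockerfile
# Hypothetical example: bake the kubeconfig into the Astro Runtime image.
COPY include/kubeconfig /usr/local/airflow/include/kubeconfig
```

With this layout, the Kube config path field in option 2 would be /usr/local/airflow/include/kubeconfig.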
# import the DAG object
from airflow import DAG

# import the KubernetesPodOperator
from airflow.providers.cncf.kubernetes.operators.pod import (
    KubernetesPodOperator,
)
from airflow.utils.dates import days_ago

default_args = {
    "owner": "Astronomer",
    "depends_on_past": False,
}

# instantiate the DAG
with DAG(
    dag_id="remote_kpo",
    default_args=default_args,
    schedule_interval=None,
    start_date=days_ago(1),
    tags=["KPO"],
):
    # launch a Pod in the remote Kubernetes cluster
    remote_kpo = KubernetesPodOperator(
        task_id="az_remote_kpo",
        kubernetes_conn_id="<my-az-connection>",
        namespace="<my-aks-namespace>",
        image="debian",
        cmds=["bash", "-cx"],
        # bash -c expects the full command as a single string
        arguments=["echo hello world!"],
        name="hello-world",
        get_logs=True,
        in_cluster=False,
    )
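One subtlety in the example above: `cmds` and `arguments` map to the container's `command` and `args`, and `bash -c` expects the script that follows as a single string. The subprocess call below mirrors that behavior locally (it assumes bash is available on your PATH, which is not part of the original example):

```python
import subprocess

# Local equivalent of cmds=["bash", "-cx"] with arguments=["echo hello world!"].
# The whole command is ONE string after -c; splitting it into
# ["echo", "hello world!"] would make bash run only "echo".
result = subprocess.run(
    ["bash", "-cx", "echo hello world!"],
    capture_output=True,
    text=True,
)
print(result.stdout.strip())  # hello world!
```

The `-x` flag additionally traces each command to stderr, which is useful when reading the Pod logs that `get_logs=True` streams back to the Airflow task log.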