Airflow 3: This feature is only available for Airflow 3.x Deployments.
Airflow task logs are generated when tasks execute in the Worker and Triggerer components. The logging sidecar is a container that runs alongside these components to collect and ship task logs to external systems like Splunk, Elasticsearch, AWS CloudWatch, or other log aggregation services.
The following procedure describes how to configure your Remote Execution Agent to use the logging sidecar. This process configures the loggingSidecar section in your values.yaml file, which controls the deployment of a sidecar container that collects and forwards task logs.
Prerequisites
- You must have deployment create or pod create permissions in the Kubernetes namespace where your Remote Execution Agent is installed.
Enable the Logging Sidecar
- Configure volumes in the Agent Worker and Agent Triggerer components of your values.yaml file to collect task logs:
workers:
  - name: default-worker
    volumes:
      - name: task-logs
        emptyDir: {}
    volumeMounts:
      - name: task-logs
        mountPath: /usr/local/airflow/logs
triggerer:
  volumes:
    - name: task-logs
      emptyDir: {}
  volumeMounts:
    - name: task-logs
      mountPath: /usr/local/airflow/logs
- To enable the logging sidecar, set enabled to true in your Remote Execution Agent's values.yaml file, and define the name of your logging sidecar and the image you want to use. Astronomer recommends using Vector for exporting task logs; the following example uses the timberio/vector Docker image.
loggingSidecar:
  enabled: true
  name: vector-logging-sidecar
  image: timberio/vector:0.45.0-debian
- Allocate resources for your sidecar container in the values.yaml file:
loggingSidecar:
  resources:
    limits:
      cpu: "0.5"
      memory: "1Gi"
    requests:
      cpu: "0.5"
      memory: "1Gi"
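The cpu and memory values above follow standard Kubernetes resource-quantity conventions: "0.5" is half a CPU core, and "1Gi" uses the binary suffix for 2^30 bytes. As a quick illustration (a hypothetical helper, not part of any Astronomer or Kubernetes tooling), the binary memory suffixes convert to bytes like this:

```python
def parse_memory(quantity: str) -> int:
    """Convert a Kubernetes binary memory quantity like '1Gi' or '512Mi' to bytes."""
    suffixes = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
    for suffix, factor in suffixes.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: plain bytes

print(parse_memory("1Gi"))  # 1073741824
```

So the sidecar in this example is capped at 1073741824 bytes of memory; size it up if your Deployment produces high task-log volume.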
Example logging sidecar configuration
The following YAML file shows a full configuration example for a logging sidecar that uses Vector to export task log data to the Splunk Cloud Platform.
loggingSidecar:
  enabled: true
  name: vector-logging-sidecar
  image: timberio/vector:0.45.0-debian
  # Mount the task logs directory to access log files
  volumeMounts:
    - name: task-logs
      mountPath: /etc/vector/task_logs
  # Resource allocation for the sidecar container
  resources:
    limits:
      cpu: "0.5"
      memory: "1Gi"
    requests:
      cpu: "0.5"
      memory: "1Gi"
  # Vector configuration
  config: |
    data_dir: /etc/vector/task_logs
    # Define log sources
    sources:
      task_logs:
        type: file
        include:
          - /etc/vector/task_logs/**/*.log
    transforms:
      parse_task_log_file:
        type: remap
        inputs:
          - task_logs
        source: |
          parsed = parse_regex!(.file, r'/dag_id=(?P<dagID>[0-9a-z-_]+)/run_id=(?P<runID>[^/]+)/task_id=(?P<taskID>[0-9a-z-_]+)/(?:map_index=(?P<mapIndex>-?[0-9]+)/)?attempt=(?P<attempt>[0-9]+)/(?P<tiID>[0-9a-z-]+)(?:\.log\.trigger\.[0-9]+)?\.log$')
          .tiID = parsed.tiID
          .attempt = parsed.attempt
          .taskID = parsed.taskID
          .runID = parsed.runID
          .dagID = parsed.dagID
          .mapIndex = parsed.mapIndex ?? -1
    sinks:
      splunk:
        type: splunk_hec_logs
        inputs:
          - parse_task_log_file
        endpoint: https://<your-domain>.splunkcloud.com
        default_token: <token>
        index: <your-index>
        indexed_fields:
          - tiID
          - attempt
          - taskID
          - runID
          - dagID
        encoding:
          codec: "text"
workers:
  - name: default-worker
    volumes:
      - name: task-logs
        emptyDir: {}
    volumeMounts:
      - name: task-logs
        mountPath: /usr/local/airflow/logs
triggerer:
  volumes:
    - name: task-logs
      emptyDir: {}
  volumeMounts:
    - name: task-logs
      mountPath: /usr/local/airflow/logs
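The parse_task_log_file transform relies entirely on the structure of the task log file path to recover metadata. To check what it extracts, the same regex can be exercised in Python; the sample path and task-instance ID below are made up for illustration:

```python
import re

# Python translation of the regex used in the parse_task_log_file transform,
# to inspect which fields it extracts from a task log file path.
TASK_LOG_RE = re.compile(
    r'/dag_id=(?P<dagID>[0-9a-z-_]+)'
    r'/run_id=(?P<runID>[^/]+)'
    r'/task_id=(?P<taskID>[0-9a-z-_]+)'
    r'/(?:map_index=(?P<mapIndex>-?[0-9]+)/)?'
    r'attempt=(?P<attempt>[0-9]+)'
    r'/(?P<tiID>[0-9a-z-]+)(?:\.log\.trigger\.[0-9]+)?\.log$'
)

# Made-up sample path in the layout the Worker writes, as seen by the
# sidecar under its /etc/vector/task_logs mount.
path = (
    "/etc/vector/task_logs/dag_id=example_dag"
    "/run_id=manual__2025-01-01T00:00:00+00:00"
    "/task_id=extract/attempt=2"
    "/0194a1b2-3c4d-7e8f-9a0b-1c2d3e4f5a6b.log"
)

fields = TASK_LOG_RE.search(path).groupdict()
print(fields["dagID"], fields["taskID"], fields["attempt"], fields["mapIndex"])
# example_dag extract 2 None
```

Note that mapIndex is unmatched (None) for unmapped tasks; the `.mapIndex = parsed.mapIndex ?? -1` line in the VRL source supplies the -1 default for that case before the event reaches the Splunk sink.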