
Astro Private Cloud (APC) uses Vector for log collection and forwarding. You can configure Vector to send Airflow task logs to Amazon S3 for long-term storage, compliance, or integration with other analytics tools.
If you previously configured S3 log forwarding using Fluentd in APC 0.37 or earlier, you must replace your fluentd.s3 configuration with the Vector extraSinks configuration described in this document. Fluentd is no longer used for log collection in APC 1.0.

Architecture

Vector continues forwarding logs to Elasticsearch for the Airflow UI while also sending copies to S3.
The logs forwarded to S3 are Airflow task logs and deployment logs, not APC platform logs from Houston, Commander, or Registry.

Prerequisites

  • An existing S3 bucket
  • AWS IAM credentials with S3 write access
  • APC 1.0 or later
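
If you don't already have a bucket, a minimal AWS CLI sketch for creating one (the bucket name and region are placeholders):

aws s3api create-bucket \
  --bucket your-logs-bucket \
  --region us-east-1
# For regions other than us-east-1, also pass:
#   --create-bucket-configuration LocationConstraint=<region>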

Configure AWS IAM

Create IAM policy

Create an IAM policy with S3 write permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": "arn:aws:s3:::your-logs-bucket"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject"
      ],
      "Resource": "arn:aws:s3:::your-logs-bucket/*"
    }
  ]
}
For more information on S3 permissions, see Amazon S3 actions.
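
If you manage IAM with the AWS CLI, you can create this policy from the JSON above; the policy name and file name here are placeholders:

# Save the policy JSON above as s3-logs-policy.json, then:
aws iam create-policy \
  --policy-name vector-s3-logs \
  --policy-document file://s3-logs-policy.json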

Provide credentials to Vector

IRSA (Recommended)

For EKS clusters, use IAM Roles for Service Accounts (IRSA) to securely provide AWS credentials:
  1. Create an IAM role with the S3 policy attached
  2. Configure the trust relationship for the Vector service account:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E:sub": "system:serviceaccount:astronomer:astronomer-vector"
        }
      }
    }
  ]
}
  3. Annotate the Vector service account in your values.yaml:
vector:
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/vector-s3-role
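
A minimal AWS CLI sketch of steps 1 and 2, assuming the trust policy above is saved as trust-policy.json and the policy from the previous step is named vector-s3-logs:

# Create the role with the OIDC trust policy, then attach the S3 policy.
aws iam create-role \
  --role-name vector-s3-role \
  --assume-role-policy-document file://trust-policy.json
aws iam attach-role-policy \
  --role-name vector-s3-role \
  --policy-arn arn:aws:iam::123456789012:policy/vector-s3-logs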

EC2 instance profile

For self-managed Kubernetes on EC2, attach the IAM policy to the EC2 instance profile used by your worker nodes.
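
For example, assuming your worker nodes use an instance role named worker-node-role (a placeholder), attach the policy from the first step:

aws iam attach-role-policy \
  --role-name worker-node-role \
  --policy-arn arn:aws:iam::123456789012:policy/vector-s3-logs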

Static credentials

For non-AWS environments or testing, use static credentials:
vector:
  extraEnv:
    - name: AWS_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: aws-credentials
          key: access-key-id
    - name: AWS_SECRET_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          name: aws-credentials
          key: secret-access-key
    - name: AWS_REGION
      value: "us-east-1"
Create the secret:
kubectl create secret generic aws-credentials \
  --namespace astronomer \
  --from-literal=access-key-id=AKIAIOSFODNN7EXAMPLE \
  --from-literal=secret-access-key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Static credentials are less secure than IRSA or instance profiles. Use only for testing or non-AWS environments.

Configure Vector S3 sink

Add the S3 sink to your values.yaml:
vector:
  extraSinks:
    s3_logs:
      type: aws_s3
      inputs:
        - transform_remove_fields
      bucket: "your-logs-bucket"
      region: "us-east-1"
      key_prefix: "airflow-logs/{{ "{{ namespace }}" }}/{{ "{{ release }}" }}/%Y/%m/%d/"
      compression: gzip
      encoding:
        codec: json
      batch:
        max_bytes: 10485760
        timeout_secs: 300
      request:
        retry_attempts: 5

Configuration options

For a full list of available options, see the Vector aws_s3 sink configuration reference.

| Option | Description | Example |
| --- | --- | --- |
| bucket | S3 bucket name | my-logs-bucket |
| region | AWS region | us-east-1 |
| key_prefix | S3 object key prefix with templating | logs/%Y/%m/%d/ |
| compression | Compression algorithm | gzip, zstd, none |
| encoding.codec | Output format | json, text, ndjson |
| batch.max_bytes | Max batch size before flush | 10485760 (10 MB) |
| batch.timeout_secs | Max time before flush | 300 (5 minutes) |

Key prefix templating

Use template variables in key_prefix:

| Variable | Description |
| --- | --- |
| {{ namespace }} | Kubernetes namespace |
| {{ release }} | Deployment release name |
| %Y, %m, %d | Date components |
| %H, %M, %S | Time components |

Example: airflow-logs/{{ namespace }}/%Y/%m/%d/%H/
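
With this prefix, an object from a hypothetical deployment namespace could land at a key like the following (the trailing object name is generated by Vector):

airflow-logs/astronomer-example-deployment/2026/04/16/18/<generated-object-name>.log.gz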

Apply configuration

Push the configuration to your APC installation. For detailed instructions, see Apply a config change.
helm upgrade astronomer astronomer/astronomer \
  -f values.yaml \
  --namespace astronomer

Verify Vector pods restart with the new configuration:
kubectl rollout status daemonset/astronomer-vector -n astronomer

Verify log delivery

Check Vector logs

kubectl logs -n astronomer -l app=vector --tail=100 | grep -i s3

List S3 objects

aws s3 ls s3://your-logs-bucket/airflow-logs/ --recursive | head -20

Read a log file

aws s3 cp s3://your-logs-bucket/airflow-logs/path/to/file.json.gz - | gunzip | head -5

Advanced configuration

Filter logs by severity

Only forward ERROR and WARNING logs to S3 using a VRL filter condition:
vector:
  extraTransforms:
    filter_errors:
      type: filter
      inputs:
        - transform_remove_fields
      condition:
        type: vrl
        source: '.level == "ERROR" || .level == "WARNING"'

  extraSinks:
    s3_errors:
      type: aws_s3
      inputs:
        - filter_errors
      bucket: "your-logs-bucket"
      # ... rest of config

Partition by deployment

Organize logs by deployment namespace:
vector:
  extraSinks:
    s3_logs:
      type: aws_s3
      inputs:
        - transform_remove_fields
      bucket: "your-logs-bucket"
      key_prefix: "deployments/{{ "{{ namespace }}" }}/{{ "{{ pod }}" }}/%Y/%m/%d/"
      # ... rest of config

Multiple destinations

Forward to both S3 and another system:
vector:
  extraSinks:
    s3_archive:
      type: aws_s3
      inputs:
        - transform_remove_fields
      bucket: "archive-bucket"
      # ... config

    splunk_realtime:
      type: splunk_hec
      inputs:
        - transform_remove_fields
      endpoint: "https://splunk.example.com:8088"
      token: "${SPLUNK_TOKEN}"
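
The ${SPLUNK_TOKEN} reference assumes that variable exists in the Vector container's environment. One way to provide it, mirroring the static-credentials pattern above with a hypothetical splunk-credentials secret:

vector:
  extraEnv:
    - name: SPLUNK_TOKEN
      valueFrom:
        secretKeyRef:
          name: splunk-credentials
          key: hec-token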

S3 lifecycle policies

Configure S3 lifecycle rules to manage log retention:
{
  "Rules": [
    {
      "ID": "ArchiveOldLogs",
      "Status": "Enabled",
      "Filter": {
        "Prefix": "airflow-logs/"
      },
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "STANDARD_IA"
        },
        {
          "Days": 90,
          "StorageClass": "GLACIER"
        }
      ],
      "Expiration": {
        "Days": 365
      }
    }
  ]
}
Apply via AWS CLI:
aws s3api put-bucket-lifecycle-configuration \
  --bucket your-logs-bucket \
  --lifecycle-configuration file://lifecycle.json

Troubleshooting

Logs not appearing in S3

  1. Check Vector pod logs:
    kubectl logs -n astronomer -l app=vector | grep -i error
    
  2. Verify AWS credentials:
    kubectl exec -n astronomer -it ds/astronomer-vector -c vector -- \
      sh -c 'echo $AWS_ACCESS_KEY_ID'
    
  3. Inspect the logs for credential errors or permission issues. Look for lines containing CredentialsNotLoaded (no credentials found) or Invalid credentials (credentials rejected by AWS). For example:
    2026-04-16T18:27:48.827213Z ERROR vector::topology::builder: msg="Healthcheck failed." error=Invalid credentials component_kind="sink" component_type="aws_s3" component_id=s3_logs
    
    To see which credentials Vector loaded, look for lines matching aws_config::profile::credentials:
    2026-04-16T18:27:48.247566Z  INFO aws_config::profile::credentials: constructed abstract provider from config file chain=ProfileChain { base: AccessKey(Credentials { provider_name: "ProfileFile", access_key_id: "AKIA5WLLPVSPD7JDVSXF", secret_access_key: "** redacted **", expires_after: "never" }), chain: [] }
    
    These lines show the access key ID in use, which can help confirm whether the correct credentials are being picked up.

Permission denied errors

Verify your IAM policy includes both s3:PutObject and s3:ListBucket permissions. The bucket resource ARN should not include /* for ListBucket.
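
To confirm the policy outside of Vector, you can attempt a test write with the same credentials (the object key here is arbitrary):

aws s3api put-object \
  --bucket your-logs-bucket \
  --key airflow-logs/permissions-test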

High latency

Adjust batch settings for faster delivery:
vector:
  extraSinks:
    s3_logs:
      batch:
        max_bytes: 5242880    # 5MB
        timeout_secs: 60      # 1 minute