By default, Airflow uses the metadata database to store XComs, which works well for local development but has limited performance. For production environments that use XCom to pass data between tasks, Astronomer recommends using a custom XCom backend. Custom XCom backends allow you to configure where Airflow stores information that is passed between tasks using XComs. The Object Storage XCom Backend available in the Common IO provider is the easiest way to store XComs in a remote object storage solution. This tutorial shows you how to set up a custom XCom backend using object storage for AWS S3, GCP Cloud Storage, or Azure Blob Storage. To learn more about other options for setting custom XCom backends, see Strategies for custom XCom backends in Airflow.
Warning While a custom XCom backend allows you to store virtually unlimited amounts of data as XComs, you will also need to scale other Airflow components to pass large amounts of data between tasks. For help running Airflow at scale, reach out to Astronomer.
Time to complete
This tutorial takes approximately 45 minutes to complete.
Assumed knowledge
To get the most out of this tutorial, make sure you have an understanding of:
- XCom basics. See Passing data between Airflow tasks.
- Airflow connections. See Manage connections in Apache Airflow.
Prerequisites
- The Astro CLI with an Astro project running Astro Runtime 11.5.0 or higher (Airflow 2.9.2 or higher). To set up a custom XCom backend with older versions of Airflow, see Custom XCom backends.
- An account in either AWS, GCP, or Azure with permissions to create and configure an object storage container.
Step 1: Set up your object storage container
First, you need to set up the object storage container in your cloud provider where Airflow will store the XComs.
Step 2: Install the required provider packages
To use the Object Storage XCom Backend, you need to install the Common IO provider package and the provider package for your object storage container provider.
Step 3: Set up your Airflow connection
An Airflow connection is necessary to connect Airflow with your object storage container provider. In this tutorial, you'll use the Airflow UI to configure your connection.
- Start your Astro project by running `astro dev start`.
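Once the project is running, you can confirm that the packages from Step 2 are installed by running a short Python check inside the Airflow environment, for example from a shell opened with `astro dev bash`. This is a minimal sketch. The distribution names are assumptions: the Common IO provider, the provider for each cloud, and the fsspec filesystem package that provider relies on (`s3fs`, `gcsfs`, `adlfs`). The `installed_versions` helper is hypothetical and is not part of Airflow.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional

# Assumed distribution names per cloud; adjust them to match your requirements.txt.
REQUIRED = {
    "aws": ["apache-airflow-providers-common-io", "apache-airflow-providers-amazon", "s3fs"],
    "gcp": ["apache-airflow-providers-common-io", "apache-airflow-providers-google", "gcsfs"],
    "azure": ["apache-airflow-providers-common-io", "apache-airflow-providers-microsoft-azure", "adlfs"],
}


def installed_versions(cloud: str) -> Dict[str, Optional[str]]:
    """Return the installed version of each required distribution, or None if it is missing."""
    versions: Dict[str, Optional[str]] = {}
    for dist in REQUIRED[cloud]:
        try:
            versions[dist] = version(dist)
        except PackageNotFoundError:
            versions[dist] = None
    return versions


if __name__ == "__main__":
    for dist, ver in installed_versions("aws").items():
        print(f"{dist}: {ver or 'NOT INSTALLED'}")
```

Any entry reported as `NOT INSTALLED` points to a package that still needs to be added before you continue.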
Step 4: Configure your custom XCom backend
You can configure a custom XCom backend with object storage by setting environment variables in your Astro project.
Info If you are setting up a custom XCom backend for an Astro Deployment, you must set the following environment variables for your Deployment. See Environment variables for instructions.
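Each variable in this step follows the `AIRFLOW__<SECTION>__<KEY>` pattern for Airflow config options. The section name `common.io` is kept with its dot and upper-cased, matching the names used in this tutorial. The minimal sketch below builds the `.env` lines for the tutorial's setup. Besides the variables listed in this step, it also sets `AIRFLOW__COMMON.IO__XCOM_OBJECTSTORAGE_PATH`, which the Common IO backend reads as the storage location in the form `<scheme>://<conn_id>@<bucket>/<prefix>`. That is where the connection from Step 3 comes in. The connection ID `my_aws_conn`, the bucket `my-bucket`, and all helper names are hypothetical placeholders.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlsplit

# Hypothetical helpers for illustration; they are not part of Airflow.
BACKEND_CLASS = "airflow.providers.common.io.xcom.backend.XComObjectStorageBackend"


def airflow_env_var(section: str, key: str) -> str:
    """Build a config env var name as AIRFLOW__<SECTION>__<KEY> (section upper-cased, dot kept)."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def objectstorage_path(scheme: str, conn_id: str, bucket: str, prefix: str) -> str:
    """Build a <scheme>://<conn_id>@<bucket>/<prefix> URL for the XCom storage location."""
    return f"{scheme}://{conn_id}@{bucket}/{prefix.strip('/')}"


def split_objectstorage_path(path: str) -> Tuple[str, Optional[str], Optional[str], str]:
    """Recover (scheme, conn_id, bucket, prefix) from an object storage URL."""
    parts = urlsplit(path)
    return parts.scheme, parts.username, parts.hostname, parts.path.lstrip("/")


def dotenv_lines(path: str, threshold: int, compression: Optional[str] = None) -> List[str]:
    """Return KEY=value lines for an Astro project's .env file."""
    settings = {
        ("core", "xcom_backend"): BACKEND_CLASS,
        ("common.io", "xcom_objectstorage_path"): path,
        ("common.io", "xcom_objectstorage_threshold"): str(threshold),
    }
    if compression is not None:  # optional, for example "zip"
        settings[("common.io", "xcom_objectstorage_compression")] = compression
    return [f"{airflow_env_var(section, key)}={value}" for (section, key), value in settings.items()]


if __name__ == "__main__":
    xcom_path = objectstorage_path("s3", "my_aws_conn", "my-bucket", "xcom")
    print("\n".join(dotenv_lines(xcom_path, threshold=1000)))
```

The printed lines can be pasted into `.env`. For GCP or Azure, only the scheme, connection ID, and bucket or container name in the path would change.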
- Add the `AIRFLOW__CORE__XCOM_BACKEND` environment variable to your `.env` file. It defines the class to use for the custom XCom backend implementation.
- Add the `AIRFLOW__COMMON.IO__XCOM_OBJECTSTORAGE_THRESHOLD` environment variable to your `.env` file to determine when Airflow stores XComs in object storage instead of the metadata database. The default value is `-1`, which stores all XComs in the metadata database. Set the value to `0` to store all XComs in object storage. With any positive value, an XCom whose byte size is greater than the threshold is stored in object storage, and an XCom whose size is equal to or less than the threshold is stored in the metadata database. For this tutorial, set the threshold to `1000` bytes, which means any XCom larger than 1 KB is stored in object storage.
- Optional. Define the `AIRFLOW__COMMON.IO__XCOM_OBJECTSTORAGE_COMPRESSION` environment variable to compress the XComs stored in object storage with an fsspec-supported compression algorithm such as `zip`. The default value is `None`.
- Restart your Airflow project by running `astro dev restart`.
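The threshold rule described in this step can be sketched as a small pure function. This is a minimal sketch under two assumptions. First, an XCom's byte size is approximated as the length of its UTF-8 JSON encoding. Second, the helper names are hypothetical. The function mirrors the rule as documented here, not the provider's internal code.

```python
import json
from typing import Any

DB = "metadata database"
OBJECT_STORAGE = "object storage"


def serialized_size(value: Any) -> int:
    """Approximate an XCom's size as the byte length of its UTF-8 JSON encoding (an assumption)."""
    return len(json.dumps(value).encode("utf-8"))


def storage_location(size_in_bytes: int, threshold: int) -> str:
    """Apply the threshold rule described for AIRFLOW__COMMON.IO__XCOM_OBJECTSTORAGE_THRESHOLD."""
    if threshold < 0:  # default -1: keep every XCom in the metadata database
        return DB
    if threshold == 0:  # 0: send every XCom to object storage
        return OBJECT_STORAGE
    # Positive threshold: only XComs strictly larger than it go to object storage.
    return OBJECT_STORAGE if size_in_bytes > threshold else DB


if __name__ == "__main__":
    small_obj = {"a": 23}
    big_dict = {f"key{i}": "x" * 100 for i in range(100)}
    for name, obj in [("small_obj", small_obj), ("big_dict", big_dict)]:
        size = serialized_size(obj)
        print(f"{name}: {size} bytes -> {storage_location(size, threshold=1000)}")
```

With the tutorial's threshold of 1000, `{"a": 23}` encodes to 9 bytes and stays in the metadata database. A dictionary of 100 keys holding 100-character strings encodes to more than 11,000 bytes and goes to object storage.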
Step 5: Test your custom XCom backend
We will use a simple DAG to test your custom XCom backend.
- Create a new file called `custom_xcom_backend_test.py` in the `dags` directory of your Astro project and add the following code:
- Manually trigger the `custom_xcom_backend_test` DAG in the Airflow UI and navigate to the XCom tab of the `push_objects` task. You should see that the `small_obj` XCom shows its value, meaning it was stored in the metadata database because it is smaller than 1 KB. The `big_dict` XCom shows the path to the object in object storage that contains the serialized value of the XCom.