Once you've set up an Astro project, started implementing pipelines, and extracted several Python functions for reusability, you may end up with multiple Airflow deployments. How do you reuse code between your projects? In this guide, you'll learn about various options for reusing code and their pros and cons. Specifically, this guide demonstrates three options, ordered from simple to implement but limited in reusability, to more involved to implement but broadly reusable:
| Solution | When to use |
|---|---|
| Shared Python code in same file | When reusing code only within a single script |
| Shared Python code in /include folder | When reusing code in multiple scripts, but within the same Git repository |
| Shared Python code in Python package in separate project | When reusing code in multiple Git projects |
Imagine a DAG file containing two copy-pasted functions, `_get_locations()` and `_get_purchases()`, which each instantiate a database client, execute a query, and return the result. The only difference is the query being executed. Any change to the database connection logic now requires two changes. If you continue copy-pasting these functions to run more queries, you end up with multiple copies of the same business logic, each of which must be updated whenever you modify the database connection logic.
## Shared Python code in same file
To reduce the burden of maintaining the same business logic in multiple places, and to have only one way of querying the database, you can extract the code into a single function within the same DAG file. The function takes the query as an argument, and the database connection logic is defined only once, regardless of which query is given. This way, you no longer have to maintain the same business logic in multiple places. To use this function, reference it in your DAG script:
## Shared Python code in /include folder
To reuse a piece of code across multiple scripts, it needs to be accessible in a shared location. The Astro Runtime Docker image provides a convenient mechanism for this: the /include folder.
You can store the function in a separate file, for example /include/db.py:
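A sketch of what that shared module could contain, with hypothetical connection details:

```python
# include/db.py: a hypothetical shared module holding the only copy
# of the database connection logic. The SQLite path is an assumption;
# substitute your real connection handling.
import sqlite3

DB_PATH = "example.db"  # hypothetical database location

def query_db(query: str):
    """Run `query` against the shared database and return all rows."""
    with sqlite3.connect(DB_PATH) as conn:
        return conn.execute(query).fetchall()
```

In a DAG file, you could then write `from include.db import query_db` (assuming the Astro project root is on `sys.path`, as it is in the Astro Runtime image).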
The `query_db` function can then be imported from multiple scripts within the same Git repository.
## Shared Python code in Python package in separate project
In some cases, you might have code that needs to be shared across different Airflow deployments. For example, if you're onboarding multiple teams to the Astronomer platform and each team has its own code repository, you can't reuse code via the /include folder, because it resides in a different Git repository.
To reuse code across multiple projects, you need to store it in a separate Git repository that multiple projects can consume. The best way to do this is to turn that repository into your own Python package. This takes more work to set up, but it enables multiple teams working in multiple Git repositories to maintain a single source of shared code. You can see an example Python package in this repo.
The number of options for developing, building, and releasing a Python package is nearly limitless, and this guide only provides general guidance. See Structuring your project and Packaging Python projects for more information on Python packaging.
Setting up a custom Python package requires roughly the following steps:
- Create a separate Git repository for your shared code.
- Write a `pyproject.toml` file. This is a configuration file that contains the build requirements of your Python project. You can find an example here.
- Create a folder for your code, e.g. `my_company_airflow`.
- Create a folder for tests, e.g. `tests`.
- Create a CI/CD pipeline to test, build, and release your package. You can see an example GitHub Actions workflow in the custom package demo.
- Ensure your setup works correctly by building and releasing a first version of the package.
- Validate the package by installing it in a project via the `requirements.txt` file.
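A minimal `pyproject.toml` for such a package might look like the following sketch; the package name, versions, and dependencies are illustrative, not prescriptive:

```toml
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "my-company-airflow"   # hypothetical package name
version = "0.1.0"
description = "Shared Airflow helpers for all team projects"
requires-python = ">=3.9"
dependencies = []             # add whatever your shared code needs
```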
For example, the shared database logic could live in `my_company_airflow/db.py`:
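A sketch of what that packaged module could look like, again with hypothetical connection details:

```python
# my_company_airflow/db.py: shared module shipped inside the package.
# The SQLite-based connection handling is an illustrative assumption.
import sqlite3

def query_db(query: str, db_path: str = "example.db"):
    """Run `query` and return all rows; the only copy of this logic
    across every project that installs the package."""
    with sqlite3.connect(db_path) as conn:
        return conn.execute(query).fetchall()
```

Each project then pins the package in its `requirements.txt` (e.g. `my-company-airflow==0.1.0`) and imports it with `from my_company_airflow.db import query_db`.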
When setting up a shared package, also consider the following:

- Think about how you will distribute the Python package. Do you have, or need, an internal repository for storing Python packages, such as Artifactory or devpi?
- Determine who is responsible for maintaining the shared Git repository.
- Set development standards from the beginning, such as Flake8 linting and Black formatting.
- Ensure the end-to-end CI/CD pipeline works first, then start developing application code.