Airflow factory pattern for DAGs and the KubernetesPodOperator

Alon Rolnik
Nov 21, 2021

A few months ago, I started working on a project that uses Airflow. It runs on the Kubernetes executor, and most of its DAGs rely mainly on the KubernetesPodOperator.

It had dozens of DAGs, all defined ad hoc and full of boilerplate code.

The first thing I did was introduce a factory method to generate those DAGs, so we would have control over how they are built.

In the beginning, nothing new was added; it was just a function that called another function. But as time went by, it turned out to be very useful, because more and more common logic was needed.

What does the DAG factory look like?

You might ask why I chose the name AirflowDAGFactory. You can use another name, Factory for example, but then make sure to set dag_discovery_safe_mode to False in the [core] section of the configuration file (airflow.cfg). Otherwise, Airflow only parses files in the DAGs folder that contain the words Airflow and DAG, so a DAG file that only calls Factory(...) would be skipped.

There is nothing special here so far, but we can already see how the factory becomes useful. Take the dagrun_timeout parameter: it lets us define a default timeout for every DAG generated in the system. Need special macros for templating? Add them to user_defined_macros once and use them in all your DAGs.
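The original factory code is not reproduced here, so the following is only a minimal sketch of what such a factory could look like, assuming Airflow 2. The default values (owner, retries, the two-hour dagrun_timeout, catchup=False) and the "environment" macro are hypothetical placeholders, and build_kwargs/create_dag are illustrative method names, not the author's. The argument merging is kept separate from the DAG construction so it can be tested without Airflow installed.

```python
from datetime import timedelta
from typing import Any, Dict


class AirflowDAGFactory:
    """Builds DAGs with system-wide defaults applied (a sketch)."""

    # Hypothetical defaults; tune them for your own deployment.
    dagrun_timeout = timedelta(hours=2)
    default_args: Dict[str, Any] = {
        "owner": "data-platform",
        "retries": 1,
        "retry_delay": timedelta(minutes=5),
    }
    # Hypothetical macro, available in every template as {{ environment }}.
    user_defined_macros: Dict[str, Any] = {"environment": "production"}

    @classmethod
    def build_kwargs(cls, dag_id: str, **overrides: Any) -> Dict[str, Any]:
        """Merge the shared defaults with per-DAG overrides."""
        # Copy the class-level dicts so callers never mutate the shared defaults.
        default_args = {**cls.default_args, **overrides.pop("default_args", {})}
        macros = {**cls.user_defined_macros, **overrides.pop("user_defined_macros", {})}
        kwargs: Dict[str, Any] = {
            "dag_id": dag_id,
            "default_args": default_args,
            "dagrun_timeout": cls.dagrun_timeout,
            "user_defined_macros": macros,
            "catchup": False,
        }
        # Any other DAG argument (schedule_interval, start_date, tags, ...) wins.
        kwargs.update(overrides)
        return kwargs

    @classmethod
    def create_dag(cls, dag_id: str, **overrides: Any):
        """Return an Airflow DAG built from the merged arguments."""
        # Imported here so build_kwargs can be unit-tested without Airflow.
        from airflow import DAG

        return DAG(**cls.build_kwargs(dag_id, **overrides))
```

A DAG file would then call something like AirflowDAGFactory.create_dag("daily_report", schedule_interval="@daily", start_date=datetime(2021, 11, 1)). Changing a default such as dagrun_timeout in the class now changes it for every generated DAG.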

Most of our DAGs use the KubernetesPodOperator, so we introduced a factory method for it as well.

KubernetesPodOperator Factory:

This factory turned out to be very useful when we upgraded to Airflow 2. A lot changed, mainly around the Kubernetes executor and the KubernetesPodOperator, and because every pod task was created through this factory function, we could make all the adjustments the upgrade required in one place.
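Here is a minimal sketch of such an operator factory, assuming the 2021-era apache-airflow-providers-cncf-kubernetes package. The namespace, the "team" label, and the other default values are hypothetical, and build_pod_kwargs/create_pod_operator are illustrative names. The parameters used (namespace, in_cluster, get_logs, is_delete_operator_pod, startup_timeout_seconds, labels, name, image) match that provider version; check them against the one you run. Upgrade-sensitive details such as the import path (airflow.contrib.operators.kubernetes_pod_operator in Airflow 1.10) live in a single function.

```python
from typing import Any, Dict

# Hypothetical cluster-wide defaults; adjust them for your own cluster.
DEFAULT_POD_KWARGS: Dict[str, Any] = {
    "namespace": "airflow-jobs",
    "in_cluster": True,
    "get_logs": True,
    "is_delete_operator_pod": True,
    "startup_timeout_seconds": 300,
}
# Hypothetical label attached to every pod.
DEFAULT_LABELS: Dict[str, str] = {"team": "data-platform"}


def build_pod_kwargs(task_id: str, image: str, **overrides: Any) -> Dict[str, Any]:
    """Merge the shared pod defaults with per-task settings."""
    labels = {**DEFAULT_LABELS, **overrides.pop("labels", {})}
    kwargs: Dict[str, Any] = {
        **DEFAULT_POD_KWARGS,
        "task_id": task_id,
        # Pod names must be DNS-1123 compliant: lowercase, no underscores.
        "name": task_id.lower().replace("_", "-"),
        "image": image,
        "labels": labels,
    }
    # Per-task arguments (cmds, arguments, env_vars, ...) win over the defaults.
    kwargs.update(overrides)
    return kwargs


def create_pod_operator(task_id: str, image: str, **overrides: Any):
    """Return a KubernetesPodOperator built from the merged arguments."""
    # Provider import path as of 2021; newer cncf.kubernetes releases expose
    # the operator from airflow.providers.cncf.kubernetes.operators.pod instead.
    from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
        KubernetesPodOperator,
    )

    return KubernetesPodOperator(**build_pod_kwargs(task_id, image, **overrides))
```

With every task going through create_pod_operator, a renamed parameter or a moved module is a one-line change instead of an edit in every DAG file.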

What if you want to track all the pods from the same run_id? Add the run_id as a label on your pods; later, you can search for it in your monitoring systems.
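One catch: Kubernetes label values are limited to 63 characters of letters, digits, '-', '_' and '.', and must start and end with a letter or digit, while an Airflow 2 run_id looks like scheduled__2021-11-21T00:00:00+00:00. The sketch below sanitizes the value first. The helper to_label_value, the macro name label_safe, and RUN_ID_LABELS are hypothetical, and templating the label assumes "labels" is among the operator's template_fields in your provider version.

```python
import hashlib
import re
from typing import Dict

# Kubernetes label values: at most 63 characters from [A-Za-z0-9._-],
# beginning and ending with an alphanumeric character (or empty).
MAX_LABEL_LENGTH = 63
_INVALID_CHARS = re.compile(r"[^A-Za-z0-9._-]")
_EDGE_NON_ALNUM = re.compile(r"^[^A-Za-z0-9]+|[^A-Za-z0-9]+$")


def to_label_value(value: str) -> str:
    """Turn an arbitrary string, such as an Airflow run_id, into a valid label value."""
    # Replace forbidden characters, then trim non-alphanumerics from both ends.
    safe = _EDGE_NON_ALNUM.sub("", _INVALID_CHARS.sub("-", value))
    if len(safe) > MAX_LABEL_LENGTH:
        # Truncate, appending a short hash of the original so long values stay distinguishable.
        digest = hashlib.sha1(value.encode("utf-8")).hexdigest()[:8]
        head = safe[: MAX_LABEL_LENGTH - len(digest) - 1].rstrip("._-")
        safe = f"{head}-{digest}"
    return safe


# Hypothetical wiring: register the helper on the DAG with
# user_defined_macros={"label_safe": to_label_value} and pass these labels to
# the pod operator. Assumes "labels" is one of the operator's template_fields.
RUN_ID_LABELS: Dict[str, str] = {"run_id": "{{ label_safe(run_id) }}"}
```

For example, to_label_value("scheduled__2021-11-21T00:00:00+00:00") returns "scheduled__2021-11-21T00-00-00-00-00", so all pods of that run can be listed with kubectl get pods -l run_id=scheduled__2021-11-21T00-00-00-00-00.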

Conclusions

Using factory methods for your DAGs and operators in Airflow will help you monitor your DAGs more easily, reduce boilerplate code, and make upgrades easier when code changes are introduced between versions.

Father, Husband, Software Engineer. Blog about software, startups, distributed systems. Follow me on Twitter https://twitter.com/AlonRolnik