
Setup

In this exercise, you will provision an Airflow Helm chart and test the autoscaling of Airflow with KEDA.

Important

This tutorial describes the steps using the Rafay Web Console. The entire workflow can also be fully automated and embedded into an automation pipeline.



Assumptions

You have already provisioned or imported a Kubernetes cluster into your Rafay Org and created a cluster blueprint with the KEDA add-on.


Step 1: Create Namespace

  • Log in to the Web Console
  • Navigate to Infrastructure -> Namespaces
  • Create a new namespace, specify a name (e.g. airflow), and select the type Wizard
  • In the placement section, select a cluster
  • Click Save & Go to Publish
  • Publish the namespace

Create namespace
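Once the namespace has been published, it should exist on the cluster. As a quick, optional check, you can confirm it from the cluster's KUBECTL console (or any kubeconfig pointed at the cluster):
kubectl get namespace airflow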


Step 2: Create Airflow Add-on

  • Navigate to Infrastructure -> Add-Ons
  • Select New Add-On -> Create New Add-On from Catalog
  • Search for airflow
  • Select airflow from default-helm

Catalog

  • Select Create Add-On
  • Enter a name for the add-on
  • Specify the namespace (e.g. airflow) and select the namespace created in the previous step

Addon create

  • Click Create
  • Enter a version name
  • Upload the following Helm values. Be sure to update the storageClassName with a storage class available in your cluster (see the kubectl command further below to list the available storage classes)
# Airflow Worker Config
workers:
  # Number of airflow celery workers in StatefulSet
  replicas: 1
  # Allow KEDA autoscaling.
  keda:
    enabled: true
dags:
  persistence:
    enabled: true
    storageClassName: openebs-hostpath
    #existingClaim: airflow-dags-pvc
    subPath: ""
  mountPath: /opt/airflow/dags
config:
  core:
    parallelism: 128  
  • Click Save Changes

Airflow create
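If you are not sure which storage classes exist in your cluster, you can list them before uploading the values. The openebs-hostpath value in the example above is only a placeholder and may not be present in your environment:
kubectl get storageclass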

Important

The parallelism setting of 128 in the values file limits the maximum scaling of Airflow workers to 8: the chart's KEDA trigger sizes the worker pool from the number of running and queued tasks divided by the worker concurrency (16 by default), so 128 tasks correspond to 8 workers. This level of scaling will only occur when the number of running and queued Airflow tasks reaches 128, which may require multiple DAG runs.
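If you also want an explicit ceiling on worker replicas that is independent of parallelism, the upstream Airflow chart exposes a maxReplicaCount value in the KEDA section (shown below with the upstream default of 10). Verify the key against the values.yaml of your chart version before using it:
workers:
  keda:
    enabled: true
    # Hard cap on worker replicas enforced through the KEDA ScaledObject.
    # Assumption: key name and default (10) match the upstream apache-airflow chart.
    maxReplicaCount: 10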


Step 3: Update Blueprint

  • Navigate to Infrastructure -> Blueprints
  • Edit the previously created KEDA blueprint
  • Enter a version name
  • Click Configure Add-Ons
  • Select the previously created Airflow add-on
  • Click Save Changes

Blueprint

  • Click Save Changes

Blueprint


Step 4: Apply Blueprint

  • Navigate to Infrastructure -> Clusters
  • Click the gear icon on your cluster and select Update Blueprint
  • Select the previously updated blueprint
  • Click Save and Publish

Blueprint

After a few seconds, the blueprint with the KEDA and Airflow add-ons will be published on the cluster.

Blueprint
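As an optional check, you can confirm that the KEDA operator pods came up with the blueprint. The namespace below is only an assumption (keda is a common choice); use whatever namespace your KEDA add-on deploys into:
kubectl get pods -n keda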


Step 5: Verify deployment

  • Navigate to Infrastructure -> Clusters
  • Click KUBECTL on your cluster
  • Type the following command
kubectl get all -n airflow

Airflow

  • Type the following command
kubectl get scaledobjects -n airflow

Scaledobjects
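To see the exact autoscaling behavior KEDA will apply (minimum and maximum replicas plus the trigger), you can dump the ScaledObject in full. The object name below assumes a release named airflow; substitute the name returned by the previous command:
kubectl get scaledobject airflow-worker -n airflow -o yaml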


Step 6: Test Scaling

Create Airflow DAG

  • Create a file named keda_test_dag.py with the following contents
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime
import time

def simulate_task(task_number):
    print(f"Task {task_number} is running")
    time.sleep(60)  # simulate a heavy task
    print(f"Task {task_number} is done")

default_args = {
    'start_date': datetime(2023, 1, 1),
}

with DAG(
    dag_id='keda_scaling_test',
    default_args=default_args,
    schedule_interval=None,
    catchup=False,
    concurrency=20,
    max_active_runs=10,
    tags=['keda', 'scaling', 'test']
) as dag:
    for i in range(20):  # 20 parallel tasks
        PythonOperator(
            task_id=f'simulated_task_{i}',
            python_callable=simulate_task,
            op_args=[i]
        )
  • Enter the following command to copy the file into the Airflow scheduler pod. Be sure to replace the scheduler pod name with the pod name in your environment.
kubectl cp ./keda_test_dag.py airflow-scheduler-fd795f55b-hmpqj:/opt/airflow/dags -n airflow
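Optionally, confirm that the file landed in the DAGs folder by listing it from the scheduler. This assumes the scheduler Deployment is named airflow-scheduler, which matches the pod names shown later in this step:
kubectl exec -n airflow deploy/airflow-scheduler -- ls /opt/airflow/dags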

Login to Airflow

  • Enter the following command to create a port forward to the Airflow webserver
kubectl port-forward svc/airflow-webserver 8080:8080 -n airflow

Login

Open http://localhost:8080 in your browser to access the Airflow console. You will see the previously loaded DAG. If you do not, refresh the page; it can take a few minutes for the DAG to appear.

Dag

Trigger DAG

  • Enter the following command to watch the Airflow pods
watch kubectl get pods -n airflow
  • In the Airflow console, click the trigger DAG button on the DAG

Trigger Dag
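If you prefer the command line over the UI, you can also unpause and trigger the DAG from inside the scheduler pod (new DAGs start paused by default unless your configuration says otherwise). The DAG id matches the dag_id defined in the file above:
kubectl exec -n airflow deploy/airflow-scheduler -- airflow dags unpause keda_scaling_test
kubectl exec -n airflow deploy/airflow-scheduler -- airflow dags trigger keda_scaling_test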

You will see that KEDA scales up the required Airflow worker pods.

Every 2.0s: kubectl get pods -n airflow                                                                                                       keeda-cluster-tim: Wed May 28 20:29:25 2025

NAME                                 READY   STATUS    RESTARTS   AGE
airflow-postgresql-0                 1/1     Running   0          23m
airflow-redis-0                      1/1     Running   0          23m
airflow-scheduler-85cf6f8d47-zbcqf   2/2     Running   0          23m
airflow-statsd-8b64dd664-5qn5b       1/1     Running   0          23m
airflow-triggerer-0                  2/2     Running   0          23m
airflow-webserver-66bff8568c-x5lj9   1/1     Running   0          23m
airflow-worker-0                     2/2     Running   0          18s

After a few minutes, the DAG tasks will have completed and KEDA will scale down the worker pods.

If the DAG is triggered multiple times in a short period, the number of queued tasks will increase, causing KEDA to scale up additional worker pods.
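Under the hood, KEDA drives a standard HorizontalPodAutoscaler for the worker StatefulSet, so you can also watch the scaling decisions directly as queued tasks accumulate and drain:
kubectl get hpa -n airflow -w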