Setup
In this exercise, you will deploy Apache Airflow using a Helm chart and test the autoscaling of Airflow workers with KEDA.
Important
This tutorial describes the steps using the Rafay Web Console. The entire workflow can also be fully automated and embedded into an automation pipeline.
Watch a video of the steps below.
Assumptions¶
You have already provisioned or imported a Kubernetes cluster into your Rafay Org and created a blueprint with the KEDA add-on.
Step 1: Create Namespace¶
- Log in to the Web Console
- Navigate to Infrastructure -> Namespaces
- Create a new namespace, specify the name (e.g. airflow) and select type as Wizard
- In the placement section, select a cluster
- Click Save & Go to Publish
- Publish the namespace
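Optionally, confirm the namespace exists on the cluster with kubectl (for example, from the KUBECTL shell used later in this exercise):
kubectl get namespace airflow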
Step 2: Create Airflow Add-on¶
- Navigate to Infrastructure -> Add-Ons
- Select New Add-On -> Create New Add-On from Catalog
- Search for airflow
- Select airflow from default-helm
- Select Create Add-On
- Enter a name for the add-on
- Specify the namespace (e.g. airflow) and select the namespace created in the previous step
- Click Create
- Enter a version name
- Upload the following Helm values. Be sure to update storageClassName with a storage class available in your cluster.
# Airflow Worker Config
workers:
  # Number of airflow celery workers in StatefulSet
  replicas: 1
  # Allow KEDA autoscaling.
  keda:
    enabled: true
dags:
  persistence:
    enabled: true
    storageClassName: openebs-hostpath
    #existingClaim: airflow-dags-pvc
    subPath: ""
  mountPath: /opt/airflow/dags
config:
  core:
    parallelism: 128
- Click Save Changes
Important
The parallelism setting of 128 in the values file limits the maximum scaling of Airflow workers to 8 pods. Scaling to that level only occurs when the number of running and queued Airflow tasks reaches 128, which may require multiple DAG runs.
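For context, with the chart defaults the KEDA ScaledObject computes the desired worker count by dividing the number of running and queued tasks by the Celery worker concurrency (16 by default), so 128 tasks corresponds to ceil(128 / 16) = 8 workers. Once the add-on is deployed, you can inspect the generated trigger query with a command like the following (this assumes the chart's default ScaledObject name for a release named airflow; adjust for your environment):
kubectl get scaledobject airflow-worker -n airflow -o jsonpath='{.spec.triggers[0].metadata.query}'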
Step 3: Update Blueprint¶
- Navigate to Infrastructure -> Blueprints
- Edit the previously created KEDA blueprint
- Enter a version name
- Click Configure Add-Ons
- Select the previously created Airflow add-on
- Click Save Changes
- Click Save Changes
Step 4: Apply Blueprint¶
- Navigate to Infrastructure -> Clusters
- Click the gear icon on your cluster and select Update Blueprint
- Select the previously updated blueprint
- Click Save and Publish
After a few seconds, the blueprint with the KEDA and Airflow add-ons will be published on the cluster.
Step 5: Verify deployment¶
- Navigate to Infrastructure -> Clusters
- Click KUBECTL on your cluster
- Type the following command
kubectl get all -n airflow
- Type the following command
kubectl get scaledobjects -n airflow
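KEDA manages worker scaling through a HorizontalPodAutoscaler it creates for the ScaledObject, so you can also watch the computed replica count directly. The HPA name depends on your release name, so list them all:
kubectl get hpa -n airflow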
Step 6: Test Scaling¶
Create Airflow DAG¶
- Create a file named keda_test_dag.py with the following contents
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime
import time


def simulate_task(task_number):
    print(f"Task {task_number} is running")
    time.sleep(60)  # simulate a heavy task
    print(f"Task {task_number} is done")


default_args = {
    'start_date': datetime(2023, 1, 1),
}

with DAG(
    dag_id='keda_scaling_test',
    default_args=default_args,
    schedule_interval=None,
    catchup=False,
    concurrency=20,
    max_active_runs=10,
    tags=['keda', 'scaling', 'test']
) as dag:
    for i in range(20):  # 20 parallel tasks
        PythonOperator(
            task_id=f'simulated_task_{i}',
            python_callable=simulate_task,
            op_args=[i]
        )
- Enter the following command to copy the file into the Airflow scheduler pod. Be sure to update the name of the scheduler pod with the pod name in your environment.
kubectl cp ./keda_test_dag.py airflow-scheduler-fd795f55b-hmpqj:/opt/airflow/dags -n airflow
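If you are unsure of the scheduler pod name, you can look it up by label first (this assumes the chart's default component label):
kubectl get pods -n airflow -l component=scheduler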
Login to Airflow¶
- Enter the following command to create a port forward to the Airflow webserver
kubectl port-forward svc/airflow-webserver 8080:8080 -n airflow
- In your browser, navigate to http://localhost:8080/
- Log in with admin/admin
You will see the previously loaded DAG. If you do not, refresh the page; it can take a few minutes for the scheduler to parse the new DAG file.
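If the DAG still does not appear after a few minutes, you can check whether the scheduler has parsed it from inside the scheduler pod (replace the pod name with the one in your environment):
kubectl exec -n airflow airflow-scheduler-fd795f55b-hmpqj -c scheduler -- airflow dags list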
Trigger DAG¶
- Enter the following command to watch the Airflow pods
watch kubectl get pods -n airflow
- In the Airflow console, click the trigger DAG button on the DAG
You will see KEDA scale up the required Airflow worker pods.
Every 2.0s: kubectl get pods -n airflow                    keeda-cluster-tim: Wed May 28 20:29:25 2025

NAME                                 READY   STATUS    RESTARTS   AGE
airflow-postgresql-0                 1/1     Running   0          23m
airflow-redis-0                      1/1     Running   0          23m
airflow-scheduler-85cf6f8d47-zbcqf   2/2     Running   0          23m
airflow-statsd-8b64dd664-5qn5b       1/1     Running   0          23m
airflow-triggerer-0                  2/2     Running   0          23m
airflow-webserver-66bff8568c-x5lj9   1/1     Running   0          23m
airflow-worker-0                     2/2     Running   0          18s
After a few minutes, the DAG tasks will have completed and KEDA will scale down the worker pods.
If the DAG is triggered multiple times in a short period, the number of queued tasks increases, causing KEDA to scale up additional worker pods.
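To drive the queue depth up quickly, you can trigger the DAG repeatedly from the Airflow CLI instead of the UI. A minimal sketch, assuming the chart's default webserver deployment name:
for i in 1 2 3 4 5; do kubectl exec -n airflow deploy/airflow-webserver -- airflow dags trigger keda_scaling_test; done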