Install Apache Airflow
This section provides a step-by-step guide to installing Apache Airflow.
Apache Airflow (or simply Airflow) is a platform to programmatically author, schedule, and monitor workflows. In Airflow, workflows are defined as code, and they are more maintainable, versionable, testable, and collaborative.You can use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes tasks on an array of workers while following the specified dependencies. Availability of a rich set of command line utilities help performing complex surgeries on DAGs. The user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed.
Requirements
- Kubernetes version 1.30+
- Helm version 3.10+
- Optional: PV provisioner support
Download and Install Apache Airflow
To install the Apache Airflow:
- Add Apache Airflow repositories to helm
configuration.
helm repo add apache-airflow https://airflow.apache.org helm repo update - Download the Helm chart to
local.
mkdir -p /home/user/airflow-chart-download cd /home/user/airflow-chart-download # Download the chart locally, it should download a tarball in the same directory helm pull apache-airflow/airflow # untar the downloaded tarball tar -zxvf airflow-1.18.0.tgz - Update the airflow image with the HCL provided airflow image in the
helm chart inside the values.yaml as shown
below.
defaultAirflowRepository: <Repository_URL> defaultAirflowTag: <Image_Tag> airflowVersion: <3.1.2> - Alternatively, you can override the default configuration in the values-local.yaml file.
- Airflow requires persistent storage for DAGs and logs. Include the DAGs and Log
details in the values-local.yaml
file.
dags: persistence: enabled: true existingClaim: airflow-dags-pvc logs: persistence: enabled: true existingClaim: airflow-logs-pvcNote: Make sure these volumes are created before starting the installation. - Optional: If you want customize the URL path for the Airflow UI, edit your
values.yamlfile to set the Airflow base URL. - Update the
base_urlfield under theconfig:apisection with your desired path:config: api: base_url: https://<your-domain-or-loadbalancer>/airflowNote: You must ensure that your Kubernetes Ingress is configured to correctly route traffic for the/airflowpath to the Airflow API server service.
Deploy/Upgrade Apache Airflow
- Deploy the Apache Airflow using
Helm.
helm upgrade --install airflow /home/user/airflow-chart-download/airflow \ --namespace airflow --create-namespace --debug \ -f values-local.yamlNote: Always specify an override file (-f values-local.yaml) to avoid editing the base chart. - After deploying Apache Airflow, verify the deployed
pods.
kubectl get pods --namespace airflow - Access the airflow UI application using the HOST/PORT
URL.
http://<LoadBalancer_or_Ingress_Hostname>/airflow
Uninstall Apache Airflow
To uninstall the Apache Airflow:
- To completely remove the Airflow, run the command
below:
helm delete airflow --namespace airflow - The Helm command may not delete all resources, specifically those created by Helm hooks. You may need to manually verify and remove any remaining resources, if you no longer need the stored data.