Install Apache Airflow

This section provides a step-by-step guide to installing Apache Airflow.

Apache Airflow (or simply Airflow) is a platform to programmatically author, schedule, and monitor workflows. In Airflow, workflows are defined as code, and they are more maintainable, versionable, testable, and collaborative.You can use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes tasks on an array of workers while following the specified dependencies. Availability of a rich set of command line utilities help performing complex surgeries on DAGs. The user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed.

Requirements

  • Kubernetes version 1.30+
  • Helm version 3.10+
  • Optional: PV provisioner support

Download and Install Apache Airflow

To install the Apache Airflow:

  1. Add Apache Airflow repositories to helm configuration.
    helm repo add apache-airflow https://airflow.apache.org
    helm repo update
  2. Download the Helm chart to local.
    mkdir -p /home/user/airflow-chart-download
    cd /home/user/airflow-chart-download
    # Download the chart locally, it should download a tarball in the same directory
    helm pull apache-airflow/airflow
    # untar the downloaded tarball
    tar -zxvf airflow-1.18.0.tgz
  3. Update the airflow image with the HCL provided airflow image in the helm chart inside the values.yaml as shown below.
    defaultAirflowRepository: <Repository_URL>
    defaultAirflowTag: <Image_Tag>
    airflowVersion: <3.1.2>
  4. Alternatively, you can override the default configuration in the values-local.yaml file.
  5. Airflow requires persistent storage for DAGs and logs. Include the DAGs and Log details in the values-local.yaml file.
    dags:
      persistence:
        enabled: true
        existingClaim: airflow-dags-pvc
    logs:
      persistence:
        enabled: true
        existingClaim: airflow-logs-pvc
    Note: Make sure these volumes are created before starting the installation.
  6. Optional: If you want customize the URL path for the Airflow UI, edit your values.yaml file to set the Airflow base URL.
  7. Update the base_url field under the config:api section with your desired path:
    config:
    api:
      base_url: https://<your-domain-or-loadbalancer>/airflow
    Note: You must ensure that your Kubernetes Ingress is configured to correctly route traffic for the /airflow path to the Airflow API server service.

Deploy/Upgrade Apache Airflow

  1. Deploy the Apache Airflow using Helm.
    helm upgrade --install airflow /home/user/airflow-chart-download/airflow \
      --namespace airflow --create-namespace --debug \
      -f values-local.yaml
    Note: Always specify an override file (-f values-local.yaml) to avoid editing the base chart.
  2. After deploying Apache Airflow, verify the deployed pods.
    kubectl get pods --namespace airflow
  3. Access the airflow UI application using the HOST/PORT URL.
    http://<LoadBalancer_or_Ingress_Hostname>/airflow

Uninstall Apache Airflow

To uninstall the Apache Airflow:

  1. To completely remove the Airflow, run the command below:
    helm delete airflow --namespace airflow
  2. The Helm command may not delete all resources, specifically those created by Helm hooks. You may need to manually verify and remove any remaining resources, if you no longer need the stored data.