Part 1: Apache Airflow setup (macOS, Apple)

Photo by Chris Ried on Unsplash
python-notes

Airflow Setup Instructions for macOS - Installation - 2024

Step: 01

Installation from PyPI

pip install "apache-airflow==2.8.3" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.8.3/constraints-3.8.txt"

(The constraints file name ends in the Python version. Replace 3.8 with your own major.minor version.)

[OR]

AIRFLOW_VERSION=2.8.3
PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-no-providers-${PYTHON_VERSION}.txt"
pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
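
Both install variants rely on the same URL pattern: the Airflow version goes in the branch name and the Python major.minor version goes in the file name. As a minimal sketch, the two helpers below rebuild that URL from its parts. The function names constraint_url and major_minor are hypothetical, chosen only for this illustration, and the URL pattern is the one used in the first command above.

```shell
# Build the constraints URL for a given Airflow version and Python major.minor.
# (Hypothetical helper; the pattern is copied from the pip command above.)
constraint_url() {
  airflow_version="$1"
  python_version="$2"
  printf 'https://raw.githubusercontent.com/apache/airflow/constraints-%s/constraints-%s.txt\n' \
    "$airflow_version" "$python_version"
}

# Reduce a full version string such as "3.11.7" to "3.11".
major_minor() {
  printf '%s\n' "$1" | cut -d "." -f 1-2
}
```

For example, constraint_url 2.8.3 "$(major_minor 3.11.7)" prints the URL of the constraints file for Python 3.11, which can then be passed to pip with --constraint.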

Note: If pip fails while running the commands above, upgrade pip first: python3 -m pip install -U pip

If the 'airflow' command is not recognized:

Ensure that ~/.local/bin is in your PATH environment variable, and add it if necessary:

export PATH="$PATH:$HOME/.local/bin"
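
Running that line more than once appends the same directory again each time. A small sketch of a guard that adds the directory only when it is missing is shown below. The function name add_to_path is hypothetical, and the sketch assumes a POSIX-style shell such as bash or zsh.

```shell
# Append a directory to PATH only if it is not already listed.
# (Hypothetical helper for illustration.)
add_to_path() {
  case ":${PATH}:" in
    *":$1:"*) ;;                 # already present, nothing to do
    *) PATH="${PATH}:$1" ;;      # missing, so append it
  esac
}

add_to_path "${HOME}/.local/bin"
export PATH
```

Wrapping PATH in colons before matching avoids false positives on directories that merely share a prefix with the one being added.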

Step: 02

1. Airflow Working Directory

Create a directory to hold Airflow's configuration files and metadata database.

mkdir -p ~/Documents/airflow-tutorial/airflow

Point the AIRFLOW_HOME environment variable at that directory.

export AIRFLOW_HOME=~/Documents/airflow-tutorial/airflow

By default, Airflow uses ~/airflow as its AIRFLOW_HOME directory. We can override this by setting a different path.

Airflow creates the airflow.cfg file and the logs folder here. We'll also store our DAGs and plugins in this directory.

To make the setting permanent, add the export line for AIRFLOW_HOME to your shell profile (~/.bash_profile, or ~/.zshrc on recent macOS versions where zsh is the default shell).
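
One way to add the export to a profile file without duplicating it on repeated runs is sketched below. The function name persist_line is hypothetical, and the choice of profile file is left to the caller because it depends on which shell you use.

```shell
# Append a line to a file only if that exact line is not already there.
# (Hypothetical helper for illustration.)
persist_line() {
  file="$1"
  line="$2"
  touch "$file"
  # -x matches whole lines, -F treats the text literally, -q stays silent.
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

For example, persist_line ~/.zshrc 'export AIRFLOW_HOME=~/Documents/airflow-tutorial/airflow' adds the line once, and later calls leave the file unchanged. Open a new terminal (or source the file) for the change to take effect.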

2. Initialize Database (migrate)

After installation, the metadata database must be initialized. By default, Airflow uses a SQLite database, and the following command creates the necessary tables:

cd ${AIRFLOW_HOME}
airflow db migrate

Output:

DB: sqlite:////Users/xxxx/Documents/airflow-tutorial/airflow/airflow.db
Performing upgrade to the metadata database sqlite:////Users/xxxx/Documents/airflow-tutorial/airflow/airflow.db
[2024-03-14T12:57:33.692+0000] {migration.py:216} INFO - Context impl SQLiteImpl.
[2024-03-14T12:57:33.693+0000] {migration.py:219} INFO - Will assume non-transactional DDL.
[2024-03-14T12:57:33.693+0000] {migration.py:216} INFO - Context impl SQLiteImpl.
[2024-03-14T12:57:33.693+0000] {migration.py:219} INFO - Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Running stamp_revision -> 88344c1d9134
Database migrating done!

Note: the older airflow db init command is deprecated. Running it prints the following warning:

DeprecationWarning: db init is deprecated. Use db migrate instead to migrate the db and/or airflow connections create-default-connections to create the default connections

> ll
total 1080
-rw------- 1 xxxxx staff 83K 14 Mar 12:57 airflow.cfg
-rw-r--r-- 1 xxxxx staff 456K 14 Mar 12:57 airflow.db
drwxr-xr-x 3 xxxxx staff 96B 14 Mar 12:57 logs

Here, airflow.cfg holds Airflow's configuration settings, airflow.db is the SQLite database file, and logs is the folder where Airflow writes its log files.
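
A quick way to confirm that the migration produced these files is sketched below. The function name check_airflow_home is hypothetical, and the check assumes the default SQLite setup used in this tutorial, where the database lives in a file named airflow.db.

```shell
# Report whether the files created by `airflow db migrate` are present.
# (Hypothetical helper; assumes the default SQLite backend.)
check_airflow_home() {
  dir="${1:-${AIRFLOW_HOME:-$HOME/airflow}}"
  missing=0
  for name in airflow.cfg airflow.db; do
    if [ ! -f "$dir/$name" ]; then
      echo "missing: $dir/$name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling check_airflow_home with no argument checks the directory in AIRFLOW_HOME, falling back to ~/airflow when the variable is unset. It returns a non-zero status and names each missing file if the database step has not been run yet.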

3. Setup Admin user

Create a user with the Admin role to log in to the Airflow web interface.

airflow users create \
--username admin \
--password admin \
--firstname Vijay \
--lastname Anand \
--role Admin \
--email admin@example.com

Output:

[2024-03-14T13:09:21.075+0000] {override.py:1820} INFO - Added Permission can read on View Menus to role Admin
[2024-03-14T13:09:21.078+0000] {override.py:1769} INFO - Created Permission View: menu access on Resources
[2024-03-14T13:09:21.080+0000] {override.py:1820} INFO - Added Permission menu access on Resources to role Admin
[2024-03-14T13:09:21.089+0000] {override.py:1769} INFO - Created Permission View: can read on Permission Views
[2024-03-14T13:09:21.091+0000] {override.py:1820} INFO - Added Permission can read on Permission Views to role Admin
[2024-03-14T13:09:21.095+0000] {override.py:1769} INFO - Created Permission View: menu access on Permission Pairs
[2024-03-14T13:09:21.097+0000] {override.py:1820} INFO - Added Permission menu access on Permission Pairs to role Admin
[2024-03-14T13:09:21.217+0000] {override.py:1458} INFO - Added user admin
User "admin" created with role "Admin"

Run the following command to list the users:

airflow users list

Output:

id | username | email | first_name | last_name | roles
===+==========+===================+============+===========+======
1 | admin | admin@example.com | Vijay | Anand | Admin

4. Starting the Airflow scheduler

The scheduler is the component that monitors the DAGs and triggers their tasks once the dependencies are met.

Start the scheduler:

airflow scheduler -D

[OR]

airflow scheduler \
--pid ${AIRFLOW_HOME}/logs/airflow-scheduler.pid \
--stdout ${AIRFLOW_HOME}/logs/airflow-scheduler.out \
--stderr ${AIRFLOW_HOME}/logs/airflow-scheduler.out \
-l ${AIRFLOW_HOME}/logs/airflow-scheduler.log \
-D

(The -D flag runs the scheduler as a daemon process.)

Stop the scheduler:

ps -ef | grep 'airflow scheduler' | grep -v grep | awk '{print $2}' | xargs kill -9

[OR]

cat ${AIRFLOW_HOME}/logs/airflow-scheduler.pid | xargs kill -15
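
The two stop commands differ in severity: kill -15 asks the process to shut down cleanly, while kill -9 ends it immediately. A minimal sketch that tries the gentle signal first and only escalates when the process does not exit is shown below. The function name stop_from_pidfile and the 10-second wait are assumptions made for this illustration, and the PID file exists at the given path only if the process was started with the --pid option.

```shell
# Stop the process recorded in a PID file: SIGTERM first, SIGKILL as a fallback.
# (Hypothetical helper; the 10-second grace period is an arbitrary choice.)
stop_from_pidfile() {
  pidfile="$1"
  if [ ! -f "$pidfile" ]; then
    echo "no PID file at $pidfile" >&2
    return 1
  fi
  pid="$(cat "$pidfile")"
  # Ask for a clean shutdown; if the signal cannot be sent, the process is gone.
  kill -15 "$pid" 2>/dev/null || return 0
  i=0
  # kill -0 sends no signal; it only tests whether the process still exists.
  while kill -0 "$pid" 2>/dev/null && [ "$i" -lt 10 ]; do
    sleep 1
    i=$((i + 1))
  done
  # Still running after the grace period: force it.
  if kill -0 "$pid" 2>/dev/null; then
    kill -9 "$pid"
  fi
}
```

For example, stop_from_pidfile "${AIRFLOW_HOME}/logs/airflow-scheduler.pid" stops the scheduler started with the --pid option above. The same call works for the webserver with its own PID file.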

5. Starting the Airflow Web Server

Start the webserver:

export AIRFLOW_HOME=~/Documents/airflow-tutorial/airflow
airflow webserver --port 8080 -D

[OR]

airflow webserver \
--pid ${AIRFLOW_HOME}/logs/airflow-webserver.pid \
--stdout ${AIRFLOW_HOME}/logs/airflow-webserver.out \
--stderr ${AIRFLOW_HOME}/logs/airflow-webserver.out \
-l ${AIRFLOW_HOME}/logs/airflow-webserver.log \
-D

(The -D flag runs the webserver as a daemon process.)

Once the scheduler and webserver are running, open a browser and go to http://localhost:8080/, then log in with the admin user created earlier. Port 8080 is the default port for the Airflow webserver.
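
The webserver can take a few seconds to come up after the command returns. As a rough sketch, the helper below polls a URL with curl until it answers or a retry limit is reached. The function name wait_for_url and the default of 30 attempts are assumptions made for this illustration, and the sketch assumes curl is installed (it ships with macOS).

```shell
# Poll a URL once per second until it responds successfully or we give up.
# (Hypothetical helper; 30 attempts is an arbitrary default.)
wait_for_url() {
  url="$1"
  tries="${2:-30}"
  n=0
  while [ "$n" -lt "$tries" ]; do
    # -f makes curl fail on HTTP errors, -sS hides progress but keeps errors.
    if curl -fsS -o /dev/null "$url" 2>/dev/null; then
      return 0
    fi
    n=$((n + 1))
    sleep 1
  done
  return 1
}
```

For example, wait_for_url http://localhost:8080/health returns success once the webserver's health endpoint responds, at which point the login page should load in the browser.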

Stop the webserver:

ps -ef | grep -i 'airflow webserver' | grep -v grep | awk '{print $2}' | xargs kill -9

[OR]

cat ${AIRFLOW_HOME}/logs/airflow-webserver.pid | xargs kill -15

If the airflow executable is not on your PATH, you can run the same command-line interface as a Python module:

python -m airflow

These steps cover a local Apache Airflow setup on macOS. Adjust the paths and versions as needed for your environment.

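To kill all Airflow processes (this matches every process whose command line contains "airflow", so use it as a last resort):

for pid in $(ps -ef | grep "airflow" | grep -v grep | awk '{print $2}'); do kill -9 "$pid"; done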

Reference:

  1. https://www.linkedin.com/pulse/install-apache-airflow-mac-os-ranga-reddy/