Part 1: Apache Airflow setup (macOS)
Airflow Setup Instructions for macOS - Installation - 2024
Step: 01
Installation from PyPI
pip install "apache-airflow==2.8.3" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.8.3/constraints-3.8.txt"
pip install "apache-airflow==2.8.3"
[OR]
AIRFLOW_VERSION=2.8.3
PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-no-providers-${PYTHON_VERSION}.txt"
pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
Note: If you face any issues with pip while executing the above command, upgrade pip itself with the following command:
python3 -m pip install -U pip
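To verify the installation, print the installed version; if the pinned install above succeeded, this should report 2.8.3:
airflow version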
If the 'airflow' command is not recognized, ensure that ~/.local/bin is in your PATH environment variable, and add it if necessary:
export PATH=$PATH:~/.local/bin
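This change only lasts for the current shell session. To make it permanent (a minimal sketch, assuming the default zsh shell on recent macOS; use ~/.bash_profile for bash):
# Append the PATH update to the zsh profile and reload it
echo 'export PATH=$PATH:~/.local/bin' >> ~/.zshrc
source ~/.zshrc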
Step: 02
1. Airflow Working Directory
Create configuration files and metadata storage directories.
mkdir -p ~/Documents/airflow-tutorial/airflow
Set environment variables by exporting the AIRFLOW_HOME directory.
export AIRFLOW_HOME=~/Documents/airflow-tutorial/airflow
By default, Airflow uses ~/airflow as its AIRFLOW_HOME directory. We can override this by setting a different path.
Airflow will initialize the airflow.cfg file here, along with the logs folder. We'll store our dags and plugins in this directory.
Alternatively, we can set AIRFLOW_HOME as a permanent environment variable in your shell profile (e.g. ~/.zshrc or ~/.bash_profile).
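For example, a minimal sketch (again assuming zsh; adjust the profile file for bash) that persists AIRFLOW_HOME and pre-creates the dags and plugins folders mentioned above:
# Persist AIRFLOW_HOME across shell sessions
echo 'export AIRFLOW_HOME=~/Documents/airflow-tutorial/airflow' >> ~/.zshrc
source ~/.zshrc
# Pre-create the folders Airflow will scan for DAGs and plugins
mkdir -p ${AIRFLOW_HOME}/dags ${AIRFLOW_HOME}/plugins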
2. Initialize Database (migrate)
After installation, it's crucial to initialize the metadata database. By default, Airflow uses a SQLite database; the following command creates the necessary tables:
cd ${AIRFLOW_HOME}
airflow db migrate
Output:
DB: sqlite:////Users/xxxx/Documents/airflow-tutorial/airflow/airflow.db
Performing upgrade to the metadata database sqlite:////Users/xxxx/Documents/airflow-tutorial/airflow/airflow.db
[2024-03-14T12:57:33.692+0000] {migration.py:216} INFO - Context impl SQLiteImpl.
[2024-03-14T12:57:33.693+0000] {migration.py:219} INFO - Will assume non-transactional DDL.
[2024-03-14T12:57:33.693+0000] {migration.py:216} INFO - Context impl SQLiteImpl.
[2024-03-14T12:57:33.693+0000] {migration.py:219} INFO - Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Running stamp_revision -> 88344c1d9134
Database migrating done!
DeprecationWarning: db init is deprecated. Use db migrate instead to migrate the db and/or airflow connections create-default-connections to create the default connections.
> ll
total 1080
-rw-------  1 xxxxx  staff   83K 14 Mar 12:57 airflow.cfg
-rw-r--r--  1 xxxxx  staff  456K 14 Mar 12:57 airflow.db
drwxr-xr-x  3 xxxxx  staff   96B 14 Mar 12:57 logs
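To confirm that Airflow can reach this database, you can run the built-in connectivity check, which exits with status 0 when the connection succeeds:
airflow db check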
Here, the airflow.cfg file contains Airflow's configuration properties and various settings, airflow.db is the SQLite database file, and logs is the folder for log output.
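Rather than opening airflow.cfg directly, you can also inspect the effective configuration from the CLI, for example:
# Print a single setting (the directory Airflow scans for DAG files)
airflow config get-value core dags_folder
# Dump the full effective configuration
airflow config list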
3. Setup Admin user
Create an Admin User: Create a user to access the Airflow web interface.
airflow users create \
  --username admin \
  --password admin \
  --firstname Vijay \
  --lastname Anand \
  --role Admin \
  --email admin@example.com
Output:
[2024-03-14T13:09:21.075+0000] {override.py:1820} INFO - Added Permission can read on View Menus to role Admin
[2024-03-14T13:09:21.078+0000] {override.py:1769} INFO - Created Permission View: menu access on Resources
[2024-03-14T13:09:21.080+0000] {override.py:1820} INFO - Added Permission menu access on Resources to role Admin
[2024-03-14T13:09:21.089+0000] {override.py:1769} INFO - Created Permission View: can read on Permission Views
[2024-03-14T13:09:21.091+0000] {override.py:1820} INFO - Added Permission can read on Permission Views to role Admin
[2024-03-14T13:09:21.095+0000] {override.py:1769} INFO - Created Permission View: menu access on Permission Pairs
[2024-03-14T13:09:21.097+0000] {override.py:1820} INFO - Added Permission menu access on Permission Pairs to role Admin
[2024-03-14T13:09:21.217+0000] {override.py:1458} INFO - Added user admin
User "admin" created with role "Admin"
Run the following command to list the users:
airflow users list
Output:
id | username | email             | first_name | last_name | roles
===+==========+===================+============+===========+======
1  | admin    | admin@example.com | Vijay      | Anand     | Admin
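The same users create command works for the other built-in roles (User, Op, Viewer, Public). For example, a read-only UI account could look like this (username, password, and email here are placeholders):
airflow users create \
  --username viewer \
  --password viewer \
  --firstname Read \
  --lastname Only \
  --role Viewer \
  --email viewer@example.com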
4. Starting the Airflow scheduler
The scheduler is the component that actually manages and runs the various jobs.
Start the scheduler:
airflow scheduler -D
[OR]
airflow scheduler \
  --pid ${AIRFLOW_HOME}/logs/airflow-scheduler.pid \
  --stdout ${AIRFLOW_HOME}/logs/airflow-scheduler.out \
  --stderr ${AIRFLOW_HOME}/logs/airflow-scheduler.out \
  -l ${AIRFLOW_HOME}/logs/airflow-scheduler.log \
  -D
(runs the scheduler as a daemon process)
Stop the scheduler:
ps -ef | egrep 'airflow scheduler' | grep -v grep | awk '{print $2}' | xargs kill -9
[OR]
cat ${AIRFLOW_HOME}/logs/airflow-scheduler.pid | xargs kill -15
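To verify that a scheduler is actually alive (rather than just grepping the process table), Airflow ships a job health check that fails with a non-zero exit code when no recent scheduler heartbeat is found:
airflow jobs check --job-type SchedulerJob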
5. Starting the Airflow Web Server
Start the webserver:
export AIRFLOW_HOME=~/Documents/airflow-tutorial/airflow
airflow webserver --port 8080 -D
[OR]
airflow webserver \
  --pid ${AIRFLOW_HOME}/logs/airflow-webserver.pid \
  --stdout ${AIRFLOW_HOME}/logs/airflow-webserver.out \
  --stderr ${AIRFLOW_HOME}/logs/airflow-webserver.out \
  -l ${AIRFLOW_HOME}/logs/airflow-webserver.log \
  -D
(runs the webserver as a daemon process)
After the scheduler and webserver have been initialized, open any browser and go to http://localhost:8080/. Port 8080 is the default port for the Airflow webserver.
Stop the webserver:
ps -ef | grep -i 'airflow webserver' | grep -v grep | awk '{print $2}' | xargs kill -9
[OR]
cat ${AIRFLOW_HOME}/logs/airflow-webserver.pid | xargs kill -15
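Once the webserver is running, you can also probe its health endpoint, which returns a JSON document with the metadatabase and scheduler status:
curl http://localhost:8080/health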
You can also invoke the Airflow CLI as a Python module:
python -m airflow
These instructions provide detailed steps for setting up Apache Airflow on macOS, ensuring a smooth setup process for managing workflows effectively. Adjustments can be made as necessary based on your specific environment and requirements.
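For quick local experiments, Airflow 2.x also provides a single command that initializes the database, creates a user, and runs the scheduler and webserver together in one terminal (not intended for production use):
airflow standalone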
To kill all Airflow processes:
for pid in $(ps -ef | grep "airflow" | grep -v grep | awk '{print $2}'); do kill -9 $pid; done