前言

最近打算读取和修改Airflow metadata DB数据,完成用户权限管理,还有获取Airflow dags tasks运行信息
后来修改Airflow API设置,发现2.2.4版本stable rest API功能很多,上面需求基本上可以通过call API完成
既然花时间了解Airflow metadata table,干脆写个文章记录下
本文章只针对Airflow 2.2.4版本,转载请注明来自 Nero Blog

Airflow基本介绍

Airflow基础概念: Airflow Concepts
Airflow概述: Airflow Overview
Airflow官方网站提到: 了解Airflow DB架构主要帮助排查故障和学习,不建议直接访问或者修改DB数据

It might be changed at any time and you should not directly access the database to retrieve information from it or modify the data - you should use REST API Reference to do that instead. The main purpose of this diagram is to help with troubleshooting and understanding of the internal Airflow DB architecture in case you have any problems with the database - for example when dealing with problems with migrations

Airflow tables (缓慢更新ing…)

ab_permission

ab_permission

Columns Data type NULL
id INTEGER NOT NULL
name VARCHAR(100) NOT NULL





ab_permission记录9种基础action

  1. can_read
  2. can_edit
  3. can_delete
  4. can_create
  5. menu_access
  6. can_list
  7. can_show
  8. can_get
  9. can_test
Airflow Resource-Based permissions

Permissions (each consistent of a resource + action pair) are then added to roles.

ab_permission_view

ab_permission_view

Columns Data type NULL
id INTEGER NOT NULL
permission_id INTEGER
view_menu_id INTEGER






ab_permission_view记录Airflow现有permission组合
Permission是由resource(permission_id) + action(view_menu_id)组合而成
因此ab_permission和ab_view_menu 2个表,是ab_permission_view关联表
跟dag权限相关的资源
  1. DAG Runs
  2. Task Instances
  3. DAG Code
  4. Task Logs
  5. Website
  6. Task Reschedules

ab_permission_view_role

ab_permission_view_role

Columns Data type NULL
id INTEGER NOT NULL
permission_view_id INTEGER
role_id INTEGER






ab_permission_view_role记录permission和role的关联关系
role和permission的关系是一对多,如表格所示
id permission_view_id role_id
1 14 3
2 5 3
3 38 3

ab_register_user

ab_register_user

Columns Data type NULL
id INTEGER NOT NULL
email VARCHAR(256) NOT NULL
first_name VARCHAR(64) NOT NULL
last_name VARCHAR(64) NOT NULL
password VARCHAR(256)
registration_date DATETIME
registration_hash VARCHAR(256)
username VARCHAR(256) NOT NULL












ab_register_user记录Airflow注册用户信息

ab_role

ab_role

Columns Data type NULL
id INTEGER NOT NULL
name VARCHAR(64) NOT NULL





ab_role记录Airflow现有的role
Airflow自带的role有5个
  1. Admin
  2. Public
  3. Viewer
  4. User
  5. Op

ab_user

ab_user

Columns Data type NULL
id INTEGER NOT NULL
active BOOLEAN
changed_by_fk INTEGER
changed_on DATETIME
created_by_fk INTEGER
created_on DATETIME
email VARCHAR(256) NOT NULL
fail_login_count INTEGER
first_name VARCHAR(64) NOT NULL
last_login DATETIME
last_name VARCHAR(64) NOT NULL
login_count INTEGER
password VARCHAR(256) NOT NULL
username VARCHAR(256) NOT NULL


















ab_user记录Airflow用户信息
其中包括用户基本资料、登录时间、登录失败次数...
修改或删除用户API和command中,会用username识别用户,在创建用户时要保证username的唯一性

ab_user_role

ab_user_role

Columns Data type NULL
id INTEGER NOT NULL
role_id INTEGER
user_id INTEGER






ab_user_role记录user和role的关联关系
user和role的关系是一对多,如表格所示
id role_id user_id
1 2 3
2 1 3

ab_view_menu

ab_view_menu

Columns Data type NULL
id INTEGER NOT NULL
name VARCHAR(250) NOT NULL





ab_view_menu记录52种Airflow resource
每当新建一个dag,会创建新资源DAG:{dag_id}
  1. Connections
  2. DAGs
  3. DAGs Runs
  4. Task Instances
  5. Audit Logs
  6. ImportError
  7. Pools
  8. Providers
  9. Variables
  10. XComs
  11. DAG Code
  12. Configurations
  13. Plugins
  14. Roles
  15. Permissions
  16. Users
  17. DAG Dependencies
  18. Jobs
  19. My Password
  20. My Profile
  21. SLA Misses
  22. Task Logs
  23. Website
  24. Browse
  25. Documentation
  26. Docs
  27. Admin
  28. Task Reschedules
  29. Triggers
  30. Password
  31. IndexView
  32. UtilView
  33. LocaleView
  34. SecurityApi
  35. AuthLDAPView
  36. CustomUserLDAPModelView
  37. List Users
  38. Security
  39. List Roles
  40. User Stats Chart
  41. User's Statistics
  42. RegisterUserModelView
  43. Base Permissions
  44. View Menus
  45. Views/Menus
  46. Permission Views
  47. Permission on Views/Menus
  48. MenuApi
  49. AutocompleteView
  50. Airflow
  51. DagDependenciesView
  52. RedocView

alembic_version

alembic_version

Columns Data type NULL
version_num VARCHAR(32) NOT NULL






connection

connection

Columns Data type NULL
id INTEGER NOT NULL
conn_id VARCHAR(250) NOT NULL
conn_type VARCHAR(500) NOT NULL
description VARCHAR(5000) NOT NULL
extra TEXTT
host VARCHAR(5000)
is_encrypted BOOLEAN
is_encrypted_encrypted BOOLEAN
login VARCHAR(5000)
password VARCHAR(5000)
port INTEGER
schema VARCHAR(5000)






dag

dag

Columns Data type NULL
dag_id VARCHAR(250) NOT NULL
default_view VARCHAR(250)
description TEXT
fileloc VARCHAR(2000)
has_import_errors BOOLEAN
has_task_concurrency_limits BOOLEAN NOT NULL
is_active BOOLEAN
is_paused BOOLEAN
is_subdag BOOLEAN
last_expired TIMESTAMP
last_parsed_time TIMESTAMP
last_pickled TIMESTAMP
max_active_runs INTEGER
max_active_tasks INTEGER NOT NULL
next_dagrun TIMESTAMP
next_dagrun_create_after TIMESTAMP
next_dagrun_data_interval_end TIMESTAMP
next_dagrun_data_interval_start TIMESTAMP
owners VARCHAR(2000)
pickle_id INTEGER
processor_subdir VARCHAR(2000)
root_dag_id VARCHAR(250)
schedule_interval TEXT
scheduler_lock BOOLEAN
timetable_description VARCHAR(1000)






dag_code

dag_code

Columns Data type NULL
fileloc_hash BIGINT NOT NULL
fileloc VARCHAR(2000) NOT NULL
last_updated TIMESTAMP NOT NULL
source_code TEXT NOT NULL






dag_pickle

dag_pickle

Columns Data type NULL
id INTEGER NOT NULL
created_dttm TIMESTAMP
pickle BLOB
pickle_hash BIGINT






dag_run

dag_run

Columns Data type NULL
id INTEGER NOT NULL
conf BLOB
creating_job_id INTEGER
dag_hash VARCHAR(32)
dag_id VARCHAR(250) NOT NULL
data_interval_end TIMESTAMP
data_interval_start TIMESTAMP
end_date TIMESTAMP
execution_date TIMESTAMP NOT NULL
external_trigger BOOLEAN
last_scheduling_decision TIMESTAMP
log_template_id INTEGER
queued_at TIMESTAMP
run_id VARCHAR(250) NOT NULL
run_type VARCHAR(50) NOT NULL
start_date TIMESTAMP
state VARCHAR(50)






dag_tag

dag_tag

Columns Data type NULL
dag_id VARCHAR(250) NOT NULL
name VARCHAR(100) NOT NULL






import_error

import_error

Columns Data type NULL
id INTEGER NOT NULL
filename VARCHAR(1024)
stacktrace TEXT
timestamp TIMESTAMP






job

job

Columns Data type NULL
id INTEGER NOT NULL
dag_id VARCHAR(250)
end_date TIMESTAMP
executor_class VARCHAR(500)
hostname VARCHAR(500)
job_type VARCHAR(30)
latest_heartbeat TIMESTAMP
start_date TIMESTAMP
state VARCHAR(20)
unixname VARCHAR(1000)














log
rendered_task_instance_fields
sensor_instance
serialized_dag
session
sla_miss
slot_pool
task_fail
task_instance
task_reschedule
test_pg_connection
trigger
variable
xcom