Airflow Metadata Database
前言
最近打算读取和修改Airflow metadata DB数据,完成用户权限管理,还有获取Airflow dags tasks运行信息
后来修改Airflow API设置,发现2.2.4版本stable rest API功能很多,上面需求基本上可以通过call API完成
既然花时间了解Airflow metadata table,干脆写个文章记录下
本文章只针对Airflow 2.2.4版本,转载请注明来自 Nero Blog!
Airflow基本介绍
Airflow基础概念: Airflow Concepts
Airflow概述: Airflow Overview
Airflow官方网站提到: 了解Airflow DB架构主要帮助排查故障和学习,不建议直接访问或者修改DB数据
It might be changed at any time and you should not directly access the database to retrieve information from it or modify the data - you should use REST API Reference to do that instead. The main purpose of this diagram is to help with troubleshooting and understanding of the internal Airflow DB architecture in case you have any problems with the database - for example when dealing with problems with migrations
Airflow tables (缓慢更新ing…)
ab_permission
ab_permission
Columns | Data type | NULL |
---|---|---|
id | INTEGER | NOT NULL |
name | VARCHAR(100) | NOT NULL |
ab_permission记录9种基础action
- can_read
- can_edit
- can_delete
- can_create
- menu_access
- can_list
- can_show
- can_get
- can_test
Permissions (each consistent of a resource + action pair) are then added to roles.
ab_permission_view
ab_permission_view
Columns | Data type | NULL |
---|---|---|
id | INTEGER | NOT NULL |
permission_id | INTEGER | |
view_menu_id | INTEGER |
ab_permission_view记录Airflow现有permission组合
Permission是由resource(permission_id) + action(view_menu_id)组合而成
因此ab_permission和ab_view_menu 2个表,是ab_permission_view关联表
跟dag权限相关的资源
- DAG Runs
- Task Instances
- DAG Code
- Task Logs
- Website
- Task Reschedules
ab_permission_view_role
ab_permission_view_role
Columns | Data type | NULL |
---|---|---|
id | INTEGER | NOT NULL |
permission_view_id | INTEGER | |
role_id | INTEGER |
ab_permission_view_role记录permission和role的关联关系
role和permission的关系是一对多,如表格所示
id | permission_view_id | role_id |
---|---|---|
1 | 14 | 3 |
2 | 5 | 3 |
3 | 38 | 3 |
ab_register_user
ab_register_user
Columns | Data type | NULL |
---|---|---|
id | INTEGER | NOT NULL |
VARCHAR(256) | NOT NULL | |
first_name | VARCHAR(64) | NOT NULL |
last_name | VARCHAR(64) | NOT NULL |
password | VARCHAR(256) | |
registration_date | DATETIME | |
registration_hash | VARCHAR(256) | |
username | VARCHAR(256) | NOT NULL |
ab_register_user记录Airflow注册用户信息
ab_role
ab_role
Columns | Data type | NULL |
---|---|---|
id | INTEGER | NOT NULL |
name | VARCHAR(64) | NOT NULL |
ab_role记录Airflow现有的role
Airflow自带的role有5个
- Admin
- Public
- Viewer
- User
- Op
ab_user
ab_user
Columns | Data type | NULL |
---|---|---|
id | INTEGER | NOT NULL |
active | BOOLEAN | |
changed_by_fk | INTEGER | |
changed_on | DATETIME | |
created_by_fk | INTEGER | |
created_on | DATETIME | |
VARCHAR(256) | NOT NULL | |
fail_login_count | INTEGER | |
first_name | VARCHAR(64) | NOT NULL |
last_login | DATETIME | |
last_name | VARCHAR(64) | NOT NULL |
login_count | INTEGER | |
password | VARCHAR(256) | NOT NULL |
username | VARCHAR(256) | NOT NULL |
ab_user记录Airflow用户信息
其中包括用户基本资料、登录时间、登录失败次数...
修改或删除用户API和command中,会用username识别用户,在创建用户时要保证username的唯一性
ab_user_role
ab_user_role
Columns | Data type | NULL |
---|---|---|
id | INTEGER | NOT NULL |
role_id | INTEGER | |
user_id | INTEGER |
ab_user_role记录user和role的关联关系
user和role的关系是一对多,如表格所示
id | role_id | user_id |
---|---|---|
1 | 2 | 3 |
2 | 1 | 3 |
ab_view_menu
ab_view_menu
Columns | Data type | NULL |
---|---|---|
id | INTEGER | NOT NULL |
name | VARCHAR(250) | NOT NULL |
ab_view_menu记录52种Airflow resource
每当新建一个dag,会创建新资源DAG:{dag_id}
- Connections
- DAGs
- DAGs Runs
- Task Instances
- Audit Logs
- ImportError
- Pools
- Providers
- Variables
- XComs
- DAG Code
- Configurations
- Plugins
- Roles
- Permissions
- Users
- DAG Dependencies
- Jobs
- My Password
- My Profile
- SLA Misses
- Task Logs
- Website
- Browse
- Documentation
- Docs
- Admin
- Task Reschedules
- Triggers
- Password
- IndexView
- UtilView
- LocaleView
- SecurityApi
- AuthLDAPView
- CustomUserLDAPModelView
- List Users
- Security
- List Roles
- User Stats Chart
- User's Statistics
- RegisterUserModelView
- Base Permissions
- View Menus
- Views/Menus
- Permission Views
- Permission on Views/Menus
- MenuApi
- AutocompleteView
- Airflow
- DagDependenciesView
- RedocView
alembic_version
alembic_version
Columns | Data type | NULL |
---|---|---|
version_num | VARCHAR(32) | NOT NULL |
connection
connection
Columns | Data type | NULL |
---|---|---|
id | INTEGER | NOT NULL |
conn_id | VARCHAR(250) | NOT NULL |
conn_type | VARCHAR(500) | NOT NULL |
description | VARCHAR(5000) | NOT NULL |
extra | TEXTT | |
host | VARCHAR(5000) | |
is_encrypted | BOOLEAN | |
is_encrypted_encrypted | BOOLEAN | |
login | VARCHAR(5000) | |
password | VARCHAR(5000) | |
port | INTEGER | |
schema | VARCHAR(5000) |
dag
dag
Columns | Data type | NULL |
---|---|---|
dag_id | VARCHAR(250) | NOT NULL |
default_view | VARCHAR(250) | |
description | TEXT | |
fileloc | VARCHAR(2000) | |
has_import_errors | BOOLEAN | |
has_task_concurrency_limits | BOOLEAN | NOT NULL |
is_active | BOOLEAN | |
is_paused | BOOLEAN | |
is_subdag | BOOLEAN | |
last_expired | TIMESTAMP | |
last_parsed_time | TIMESTAMP | |
last_pickled | TIMESTAMP | |
max_active_runs | INTEGER | |
max_active_tasks | INTEGER | NOT NULL |
next_dagrun | TIMESTAMP | |
next_dagrun_create_after | TIMESTAMP | |
next_dagrun_data_interval_end | TIMESTAMP | |
next_dagrun_data_interval_start | TIMESTAMP | |
owners | VARCHAR(2000) | |
pickle_id | INTEGER | |
processor_subdir | VARCHAR(2000) | |
root_dag_id | VARCHAR(250) | |
schedule_interval | TEXT | |
scheduler_lock | BOOLEAN | |
timetable_description | VARCHAR(1000) |
dag_code
dag_code
Columns | Data type | NULL |
---|---|---|
fileloc_hash | BIGINT | NOT NULL |
fileloc | VARCHAR(2000) | NOT NULL |
last_updated | TIMESTAMP | NOT NULL |
source_code | TEXT | NOT NULL |
dag_pickle
dag_pickle
Columns | Data type | NULL |
---|---|---|
id | INTEGER | NOT NULL |
created_dttm | TIMESTAMP | |
pickle | BLOB | |
pickle_hash | BIGINT |
dag_run
dag_run
Columns | Data type | NULL |
---|---|---|
id | INTEGER | NOT NULL |
conf | BLOB | |
creating_job_id | INTEGER | |
dag_hash | VARCHAR(32) | |
dag_id | VARCHAR(250) | NOT NULL |
data_interval_end | TIMESTAMP | |
data_interval_start | TIMESTAMP | |
end_date | TIMESTAMP | |
execution_date | TIMESTAMP | NOT NULL |
external_trigger | BOOLEAN | |
last_scheduling_decision | TIMESTAMP | |
log_template_id | INTEGER | |
queued_at | TIMESTAMP | |
run_id | VARCHAR(250) | NOT NULL |
run_type | VARCHAR(50) | NOT NULL |
start_date | TIMESTAMP | |
state | VARCHAR(50) |
dag_tag
dag_tag
Columns | Data type | NULL |
---|---|---|
dag_id | VARCHAR(250) | NOT NULL |
name | VARCHAR(100) | NOT NULL |
import_error
import_error
Columns | Data type | NULL |
---|---|---|
id | INTEGER | NOT NULL |
filename | VARCHAR(1024) | |
stacktrace | TEXT | |
timestamp | TIMESTAMP |
job
job
Columns | Data type | NULL |
---|---|---|
id | INTEGER | NOT NULL |
dag_id | VARCHAR(250) | |
end_date | TIMESTAMP | |
executor_class | VARCHAR(500) | |
hostname | VARCHAR(500) | |
job_type | VARCHAR(30) | |
latest_heartbeat | TIMESTAMP | |
start_date | TIMESTAMP | |
state | VARCHAR(20) | |
unixname | VARCHAR(1000) |
log
rendered_task_instance_fields
sensor_instance
serialized_dag
session
sla_miss
slot_pool
task_fail
task_instance
task_reschedule
test_pg_connection
trigger
variable
xcom