Duyệt | Sr. Data Engineer
© 2026 duyet.net | Sr. Data Engineer | 2026-02-27

[Figure: diagram of how Airflow controls parallelism and concurrency]

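Airflow provides installation-level configuration options that allow for larger scheduling capacity and frequency:

  • parallelism: The maximum number of task instances that can run concurrently per scheduler.
  • max_active_tasks_per_dag: The maximum number of task instances allowed to run concurrently in each DAG.
  • max_active_runs_per_dag: The maximum number of active DAG runs per DAG.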
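
These three options live in the [core] section of airflow.cfg, and Airflow also reads any option from an environment variable named AIRFLOW__{SECTION}__{KEY}. As a minimal sketch, the helper below builds those variable names. The name `airflow_env_var` is hypothetical (it is not part of Airflow), and the numeric values are illustrative, not recommendations.

```python
# Hypothetical helper: builds the environment variable name that Airflow
# reads for a given config option (AIRFLOW__{SECTION}__{KEY}).
def airflow_env_var(section: str, key: str) -> str:
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# Illustrative values only; tune them to your executor and worker capacity.
core_limits = {
    "parallelism": 64,
    "max_active_tasks_per_dag": 32,
    "max_active_runs_per_dag": 8,
}

# Map each [core] option to its environment variable form.
env = {airflow_env_var("core", key): str(value) for key, value in core_limits.items()}

for name, value in env.items():
    print(f"export {name}={value}")
```

The printed lines can be pasted into a shell profile or a container environment. Setting the same keys under [core] in airflow.cfg has the same effect.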

Individual DAGs have parameters that improve efficiency by overriding the installation-level values:

  • max_active_tasks: Overrides max_active_tasks_per_dag.
  • max_active_runs: Overrides max_active_runs_per_dag.
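
To make the override rule concrete, here is a simplified model in plain Python. It is not Airflow's scheduler code: `resolve_dag_limits` and `startable_tasks` are hypothetical names and the numbers are illustrative. It assumes that a DAG-level value, when given, replaces the installation-level value, and that a new task instance may start only while both the global parallelism limit and the DAG's max_active_tasks limit have room.

```python
from typing import Optional

# Installation-level settings (illustrative values, mirroring the [core] options).
CORE = {
    "parallelism": 32,
    "max_active_tasks_per_dag": 16,
    "max_active_runs_per_dag": 16,
}


def resolve_dag_limits(max_active_tasks: Optional[int] = None,
                       max_active_runs: Optional[int] = None) -> dict:
    """Return the limits a DAG ends up with: its own value, else the default."""
    return {
        "max_active_tasks": (CORE["max_active_tasks_per_dag"]
                             if max_active_tasks is None else max_active_tasks),
        "max_active_runs": (CORE["max_active_runs_per_dag"]
                            if max_active_runs is None else max_active_runs),
    }


def startable_tasks(running_total: int, running_in_dag: int, dag_limits: dict) -> int:
    """How many more task instances of this DAG may start right now (simplified)."""
    global_room = CORE["parallelism"] - running_total
    dag_room = dag_limits["max_active_tasks"] - running_in_dag
    return max(0, min(global_room, dag_room))


default_dag = resolve_dag_limits()
small_dag = resolve_dag_limits(max_active_tasks=4, max_active_runs=1)

print(default_dag)
print(small_dag)
print(startable_tasks(running_total=30, running_in_dag=1, dag_limits=small_dag))
```

In a real DAG file the same two values are passed as keyword arguments to the DAG constructor, for example `DAG(..., max_active_tasks=4, max_active_runs=1)`.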

Operators or tasks also have configurations that improve efficiency and scheduling priority:

  • max_active_tis_per_dag: Controls how many instances of the same task can run concurrently across DAG runs.
  • pool: See Pools.
  • priority_weight: See Priority Weights.
  • queue: See Queues. Applies to CeleryExecutor deployments only.
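
As a rough illustration of how pool and priority_weight interact, the sketch below models a pool with a fixed number of free slots and picks queued task instances in descending priority_weight order. It is a simplified stand-in, not Airflow's scheduler: the `QueuedTask` class, the `pick_tasks` function, the task names, and the slot counts are all hypothetical, and real scheduling also applies the other limits described above.

```python
from dataclasses import dataclass


@dataclass
class QueuedTask:
    task_id: str
    priority_weight: int = 1  # Airflow's default priority_weight is 1
    pool_slots: int = 1       # number of pool slots the task occupies


def pick_tasks(queued: list, free_slots: int) -> list:
    """Pick task ids by descending priority_weight until the pool is full."""
    picked = []
    # sorted() is stable, so tasks with equal weight keep their queue order.
    for task in sorted(queued, key=lambda t: t.priority_weight, reverse=True):
        if task.pool_slots <= free_slots:
            picked.append(task.task_id)
            free_slots -= task.pool_slots
    return picked


queued = [
    QueuedTask("extract"),
    QueuedTask("load_critical", priority_weight=10),
    QueuedTask("heavy_report", priority_weight=5, pool_slots=2),
]

# With two free slots, the highest-weight task goes first; heavy_report
# needs two slots and no longer fits, so extract takes the last slot.
print(pick_tasks(queued, free_slots=2))
```

On a real operator these settings are plain keyword arguments such as `pool=...`, `priority_weight=10`, and `max_active_tis_per_dag=1`.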

Credits

  • Airflow Fundamental Concepts > Backfill
  • How to control the parallelism or concurrency of an Airflow installation?

Jul 16, 2023
Category: Data
Tags: Data, Data Engineering, Apache Airflow