YAML Format
A minimal DAG definition looks like this:
steps:
  - name: step 1
    command: echo hello
  - name: step 2
    command: echo world
    depends:
      - step 1
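
Because depends is written as a list, a step can presumably wait on more than one upstream step. The sketch below rests on that assumption rather than on an example from the original text; the step names and echo commands are placeholders.

```yaml
steps:
  - name: extract a
    command: echo "extract a"
  - name: extract b
    command: echo "extract b"
  # Assumed behavior: runs only after both upstream steps have finished.
  - name: merge
    command: echo "merge"
    depends:
      - extract a
      - extract b
```
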
The dir field sets the working directory for a step.
steps:
  - name: step 1
    dir: /path/to/working/directory
    command: some command
The script field provides a way to run arbitrary snippets of code in any language.
steps:
  - name: step 1
    command: "bash"
    script: |
      cd /tmp
      echo "hello world" > hello
      cat hello
    output: RESULT
  - name: step 2
    command: echo ${RESULT} # hello world
    depends:
      - step 1
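
Since the script body is handed to whatever command names, the same pattern should carry over to other interpreters. This is a minimal sketch, assuming python3 is on the PATH and receives the script the same way bash does in the example above; the step names and the TOTAL variable are illustrative.

```yaml
steps:
  - name: compute
    command: python3
    script: |
      total = sum(range(1, 11))
      print(total)
    # Captures the printed value, as with RESULT in the bash example.
    output: TOTAL
  - name: report
    command: echo "total is ${TOTAL}"
    depends:
      - compute
```
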
You can define environment variables in the env field and refer to them elsewhere in the DAG.
env:
  - SOME_DIR: ${HOME}/batch
  - SOME_FILE: ${SOME_DIR}/some_file
steps:
  - name: some task in some dir
    dir: ${SOME_DIR}
    command: python main.py ${SOME_FILE}
You can define parameters using the params field and refer to each parameter as $1, $2, etc. Parameters can also be command substitutions or environment variables. They can be overridden by the --params= parameter of the start command.
params: param1 param2
steps:
  - name: some task with parameters
    command: python main.py $1 $2
Named parameters are also available as follows.
params: ONE=1 TWO=`echo 2`
steps:
  - name: some task with parameters
    command: python main.py $ONE $TWO
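
Named parameters can also be combined with the command substitutions and environment variables described above. The following is a sketch only: RUN_DATE, OUT_DIR, and the export step are hypothetical names, and it assumes an environment variable reference is expanded inside a named parameter the same way it is in env.

```yaml
# RUN_DATE comes from a command substitution, OUT_DIR from an environment variable.
params: RUN_DATE=`date '+%Y%m%d'` OUT_DIR=${HOME}/out
steps:
  - name: export
    command: python main.py $RUN_DATE $OUT_DIR
```
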
You can use command substitution in field values: a string enclosed in backquotes (`) is evaluated as a command and replaced with its standard output.
env:
  TODAY: "`date '+%Y%m%d'`"
steps:
  - name: hello
    command: "echo hello, today is ${TODAY}"
Sometimes you have parts of a DAG that you only want to run under certain conditions. You can use the preconditions field to add conditional branches to your DAG.
For example, the task below only runs on the first day of each month.
steps:
  - name: A monthly task
    command: monthly.sh
    preconditions:
      - condition: "`date '+%d'`"
        expected: "01"
If you want the DAG to continue to the next step regardless of the result of the step's precondition check, you can use the continueOn field:
steps:
  - name: A monthly task
    command: monthly.sh
    preconditions:
      - condition: "`date '+%d'`"
        expected: "01"
    continueOn:
      skipped: true
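
The same two fields can gate a step on the day of the week while letting a dependent step run either way. This is a sketch under assumptions: weekly.sh and cleanup.sh are hypothetical scripts, and it assumes that a downstream step listed in depends still runs when the upstream step is skipped with continueOn set.

```yaml
steps:
  - name: weekly report
    command: weekly.sh          # hypothetical script
    preconditions:
      # date '+%u' prints the weekday as a number, where Monday is 1.
      - condition: "`date '+%u'`"
        expected: "1"
    continueOn:
      skipped: true
  - name: daily cleanup
    command: cleanup.sh         # hypothetical script
    depends:
      - weekly report
```
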
You can define functions in the DAG file and call them from steps. The params field is required for functions. The args field is used to pass arguments to functions. The arguments can be command substitutions or environment variables.
functions:
  - name: my_function
    params: param1 param2
    command: python main.py $param1 $param2
steps:
  - name: step 1
    call:
      function: my_function
      args:
        param1: 1
        param2: 2
The output field can be used to set an environment variable to the step's standard output. Leading and trailing whitespace is trimmed automatically. The environment variable can be used in subsequent steps.
steps:
  - name: step 1
    command: "echo foo"
    output: FOO # will contain "foo"
The stdout field can be used to write standard output to a file.
steps:
  - name: create a file
    command: "echo hello"
    stdout: "/tmp/hello" # the content will be "hello\n"
The stderr field redirects standard error to a separate file instead of writing it to the normal log file.
steps:
  - name: output error file
    command: "echo error message >&2"
    stderr: "/tmp/error.txt"
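
The two fields can plausibly be combined to send each stream to its own file. This is a minimal sketch, assuming stdout and stderr may both be set on the same step; the step name and file paths are hypothetical.

```yaml
steps:
  - name: split output
    command: "bash"
    script: |
      echo "normal output"
      echo "error output" >&2
    stdout: "/tmp/split.out"   # hypothetical path for standard output
    stderr: "/tmp/split.err"   # hypothetical path for standard error
```
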
It is often desirable to take action when a specific event happens, for example, when a DAG fails. To achieve this, you can use the handlerOn field.
handlerOn:
  failure:
    command: notify_error.sh
  exit:
    command: cleanup.sh
steps:
  - name: A task
    command: main.sh
If you want a task to run repeatedly at regular intervals, you can use the repeatPolicy field. To stop a repeating task gracefully, use the stop command.
steps:
  - name: A task
    command: main.sh
    repeatPolicy:
      repeat: true
      intervalSec: 60
You can use the schedule field to schedule a DAG with a cron expression.
schedule: "5 4 * * *" # Run at 04:05.
steps:
  - name: scheduled job
    command: job.sh
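
For reference, a five-field cron expression reads minute, hour, day of month, month, day of week. The sketch below assumes the scheduler accepts the standard range syntax in the day-of-week field; the step name is illustrative.

```yaml
# Fields: minute hour day-of-month month day-of-week
schedule: "0 9 * * 1-5" # Run at 09:00, Monday through Friday.
steps:
  - name: weekday job
    command: job.sh
```

Under the same assumption about standard cron syntax, a step value such as "*/15 * * * *" would run the DAG every 15 minutes.
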
See the scheduler configuration for more details.
This section provides a comprehensive list of available fields that can be used to configure DAGs and their steps in detail. Each field serves a specific purpose, enabling granular control over how the DAG runs. The fields include:
name
: The name of the DAG, which is optional. The default name is the name of the file.

description
: A brief description of the DAG.

schedule
: The execution schedule of the DAG in Cron expression format.

group
: The group name to organize DAGs, which is optional.

tags
: Free tags that can be used to categorize DAGs, separated by commas.

env
: Environment variables that can be accessed by the DAG and its steps.

logDir
: The directory where the standard output is written. The default value is ${DAGU_HOME}/logs/dags.

restartWaitSec
: The number of seconds to wait after the DAG process stops before restarting it.

histRetentionDays
: The number of days to retain execution history (not for log files).

delaySec
: The interval time in seconds between steps.

maxActiveRuns
: The maximum number of parallel running steps.

params
: The default parameters that can be referred to by $1, $2, and so on.

preconditions
: The conditions that must be met before a DAG or step can run.

mailOn
: Whether to send an email notification when a DAG or step fails or succeeds.

MaxCleanUpTimeSec
: The maximum time to wait after sending a TERM signal to running steps before killing them.

handlerOn
: The command to execute when a DAG or step succeeds, fails, cancels, or exits.

steps
: A list of steps to execute in the DAG.
In addition, a global configuration file, $DAGU_HOME/config.yaml, can be used to gather common settings, such as logDir or env.
Note: If the DAGU_HOME environment variable is not set, the default path is $HOME/.dagu/config.yaml.
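
A global configuration file of this kind might look like the sketch below. The contents are illustrative only: they assume the file accepts logDir and env in the same form as a DAG file, and the log directory shown is a hypothetical path.

```yaml
# $DAGU_HOME/config.yaml (illustrative contents)
logDir: ${HOME}/logs/dagu        # hypothetical shared log directory
env:
  - PATH: /usr/local/bin:${PATH} # made available to every DAG, by assumption
```
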
Example of a DAG file that sets these fields:
name: DAG name
description: run a DAG
schedule: "0 * * * *"
group: DailyJobs
tags: example
env:
  - LOG_DIR: ${HOME}/logs
  - PATH: /usr/local/bin:${PATH}
logDir: ${LOG_DIR}
restartWaitSec: 60
histRetentionDays: 3
delaySec: 1
maxActiveRuns: 1
params: param1 param2
preconditions:
  - condition: "`echo $2`"
    expected: "param2"
mailOn:
  failure: true
  success: true
MaxCleanUpTimeSec: 300
handlerOn:
  success:
    command: "echo succeed"
  failure:
    command: "echo failed"
  cancel:
    command: "echo canceled"
  exit:
    command: "echo finished"
Each step can have its own set of configurations, including:
name
: The name of the step.

description
: A brief description of the step.

dir
: The working directory for the step.

command
: The command and parameters to execute.

stdout
: The file to which the standard output is written.

output
: The variable to which the result is written.

script
: The script to execute.

signalOnStop
: The signal name (e.g., SIGINT) to be sent when the process is stopped.

mailOn
: Whether to send an email notification when the step fails or succeeds.

continueOn
: Whether to continue to the next step even if the step failed or its preconditions were not met.

retryPolicy
: The retry policy for the step.

repeatPolicy
: The repeat policy for the step.

preconditions
: The conditions that must be met before a step can run.
Example:
steps:
  - name: some task
    description: some task
    dir: ${HOME}/logs
    command: bash
    stdout: /tmp/outfile
    output: RESULT_VARIABLE
    script: |
      echo "any script"
    signalOnStop: "SIGINT"
    mailOn:
      failure: true
      success: true
    continueOn:
      failure: true
      skipped: true
    retryPolicy:
      limit: 2
      intervalSec: 5
    repeatPolicy:
      repeat: true
      intervalSec: 60
    preconditions:
      - condition: "`echo $1`"
        expected: "param1"
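
The retryPolicy field has no standalone example above, so here is a minimal sketch of how it might be used on a step that can fail transiently. The scripts are hypothetical, and the readings of limit and intervalSec given in the comments are assumptions inferred from the field names.

```yaml
steps:
  - name: fetch data
    command: fetch.sh            # hypothetical script that may fail transiently
    retryPolicy:
      limit: 3                   # assumed: maximum number of retries
      intervalSec: 10            # assumed: seconds to wait between attempts
  - name: process data
    command: process.sh          # hypothetical script
    depends:
      - fetch data
```
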