Configuring Jobs

Jobs are configured by files in the jobs directory, which may contain subdirectories to organize jobs. Each job file has the .yaml suffix and specifies exactly one job, in YAML format, with a unique job ID.

The Apsis config file specifies the location of the jobs directory; see [[config]].

A job config contains these top-level keys:

  • params (optional), the names of the job’s parameters

  • program, which specifies what to run

  • schedule, which specifies when to schedule runs

  • metadata (optional), additional information not interpreted by Apsis

  • conditions (optional) that must be met before a run starts, given under the condition key

  • actions (optional) to take when a run changes state

Wherever a config field takes a duration in seconds, you may also write the duration with a unit, such as 30s, 10 min (600 seconds), 1.5h (5400 seconds), or 1 day (86400 seconds).
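The conversions above can be sketched in Python. This is only an illustration of the arithmetic, not the parser Apsis uses; the function name to_seconds and the table of unit spellings are assumptions made for this example.

```python
import re

# Hypothetical unit table for illustration; the spellings Apsis itself
# accepts may differ.
_UNIT_SECONDS = {
    "s": 1, "sec": 1, "second": 1, "seconds": 1,
    "m": 60, "min": 60, "minute": 60, "minutes": 60,
    "h": 3600, "hr": 3600, "hour": 3600, "hours": 3600,
    "d": 86400, "day": 86400, "days": 86400,
}

# A number, optional whitespace, and an optional alphabetic unit.
_DURATION = re.compile(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*")


def to_seconds(duration):
    """Convert a number, or a string such as '1.5h', to seconds."""
    if isinstance(duration, (int, float)):
        # A bare number is already a duration in seconds.
        return float(duration)
    match = _DURATION.fullmatch(duration)
    if match is None:
        raise ValueError(f"not a duration: {duration!r}")
    number, unit = match.groups()
    if unit == "":
        return float(number)
    try:
        return float(number) * _UNIT_SECONDS[unit.lower()]
    except KeyError:
        raise ValueError(f"unknown unit: {unit!r}") from None
```

With this sketch, to_seconds("10 min") and to_seconds(600) give the same value.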

Job ID

The job’s ID is given by the path under the jobs directory, with the .yaml suffix removed. For example, if the jobs directory is /path/to/jobs, the job file /path/to/jobs/data/pipeline/start.yaml has the job ID data/pipeline/start.
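The mapping from file path to job ID can be sketched with Python’s pathlib. The helper name job_id_from_path is hypothetical and is not part of Apsis; the sketch only mirrors the rule described above.

```python
from pathlib import PurePosixPath


def job_id_from_path(jobs_dir, job_file):
    """Derive a job ID from a job file's path under the jobs directory."""
    # Raises ValueError if job_file is not inside jobs_dir.
    relative = PurePosixPath(job_file).relative_to(jobs_dir)
    if relative.suffix != ".yaml":
        raise ValueError(f"not a job file: {job_file}")
    # Drop the .yaml suffix; the remaining relative path is the job ID.
    return relative.with_suffix("").as_posix()


job_id = job_id_from_path(
    "/path/to/jobs", "/path/to/jobs/data/pipeline/start.yaml"
)
```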

Params

The params key in the job config takes a list of parameter names. For example,

params:
  - date
  - message

or equivalently,

params: ["date", "message"]

If params is omitted, the job has no parameters. A job without parameters can still be run repeatedly, just like a cron job.

Program

The program key describes how a run executes. Apsis provides several types of programs, and you may extend Apsis with additional program types as well.

See Configuring Programs for more information.

Schedule

The schedule key specifies when new runs are created and the times for which they are scheduled.

A job may have a single schedule, given as a dict, or multiple schedules, as a list of dicts.

# Single schedule
schedule:
    type: interval
    interval: 3600

# Two schedules
schedule:
  - type: interval
    interval: 3600
  - type: daily
    tz: UTC
    daytime: 12:00:00

See Configuring Schedules for more information.

Metadata

A job can store arbitrary metadata, such as descriptive text, tags, and operator instructions. The metadata key accepts arbitrary subkeys. None of them affects how runs of the job are executed.

Apsis does understand certain metadata keys. The description key contains descriptive Markdown text shown in the UI.

metadata:
    description: |
        Daily cleanup job.

        Removes temporary files that have been created within the last 24
        hours.

The labels key is an array of string labels, also shown in the UI.

metadata:
    labels:
        - test
        - blue-team

Any other metadata keys are preserved but ignored by Apsis.

Conditions

A condition temporarily prevents a scheduled run from starting. While waiting for a condition, the run is in the waiting state. Multiple conditions may apply to a run; the run waits until all of them are satisfied.

Max running jobs

The max_running condition causes a run to wait as long as there are too many other running runs with the same job ID and arguments. For example, with a count of 1, only one such run may be running at a time.

condition:
    type: max_running
    count: 1
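The check behind this condition can be pictured with a short Python sketch. It is an illustration under assumptions, not Apsis code: runs are modeled as plain dicts with hypothetical run_id, job_id, args, and state keys, and only the running state is counted.

```python
def must_wait(run, all_runs, count):
    """Return True if `run` should keep waiting under max_running."""
    running = sum(
        1
        for other in all_runs
        if other["run_id"] != run["run_id"]
        # Only runs of the same job with the same arguments count.
        and other["job_id"] == run["job_id"]
        and other["args"] == run["args"]
        and other["state"] == "running"
    )
    return running >= count


runs = [
    {"run_id": "r1", "job_id": "load", "args": {"date": "2024-01-02"}, "state": "running"},
    {"run_id": "r2", "job_id": "load", "args": {"date": "2024-01-02"}, "state": "waiting"},
    {"run_id": "r3", "job_id": "load", "args": {"date": "2024-01-03"}, "state": "waiting"},
]
```

Here r2 has to wait with a count of 1, because r1 is running with the same arguments, while r3 has different arguments and is not held back.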

Dependencies

The dependency condition causes a run to wait until another run exists in a given state. Specify the job ID of the dependency, and any arguments.

condition:
    type: dependency
    job_id: "previous job"
    args:
        label: foobar

The arguments are template-expanded. If the dependency job shares a param with the dependent job, it may be omitted; the same arg is used.
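The rule for omitted arguments can be sketched as follows. The function resolve_dependency_args is hypothetical, and the sketch assumes the explicitly given arguments have already been template-expanded.

```python
def resolve_dependency_args(dep_params, explicit_args, run_args):
    """Work out the arguments that identify the dependency run.

    dep_params: parameter names of the dependency job
    explicit_args: args given in the condition (already expanded)
    run_args: args of the dependent run
    """
    resolved = {}
    for param in dep_params:
        if param in explicit_args:
            # An explicitly configured argument takes precedence.
            resolved[param] = str(explicit_args[param])
        elif param in run_args:
            # A shared param may be omitted; the run's own arg is used.
            resolved[param] = run_args[param]
        else:
            raise LookupError(f"no value for dependency param {param!r}")
    return resolved


dep_args = resolve_dependency_args(
    dep_params=["date", "label"],
    explicit_args={"label": "foobar"},
    run_args={"date": "2024-01-02", "region": "us"},
)
```

In this example the dependency job takes date and label. Only label is given in the condition, so date is taken from the dependent run, and the run’s region argument is not passed on because the dependency job has no such param.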

By default, the dependency causes the run to wait until a matching run in the success state exists. You can specify another target state or set of states:

condition:
    type: dependency
    job_id: "previous job"
    args:
        label: foobar
    states: ["success", "failure"]

This condition does not create the dependency run. You must create that run elsewhere, usually by scheduling it. If the run doesn’t exist at all, the dependency condition waits until waiting.max_time elapses, and then transitions the run to error.

To check that a corresponding dependency run exists at all, set exist to true. With this setting, the condition transitions the run to error immediately if the dependency run does not exist or has completed unsuccessfully. If a dependency run exists that may still transition to success, the condition waits as usual.

condition:
    type: dependency
    job_id: "previous job"
    args:
        label: foobar
    exist: true

Instead of true, you may provide a set of states in which the run must exist. The default is any state from which one of the target states is reachable.

Skipping Duplicates

The skip_duplicate condition causes a run to transition to the skipped state if there is another run with the same job ID and arguments that is waiting, starting, or running.

condition:
    type: skip_duplicate

By default, Apsis looks for other runs in the waiting, starting, or running states to determine whether to skip this run. You can override this with check_states. You can also use target_state to specify a different (finished) state to transition to. For example, to transition a run to error if there is already another run in either the failure or error state:

condition:
  type: skip_duplicate
  check_states: [failure, error]
  target_state: error

As with other conditions, this condition is applied only when a run is in the waiting state.
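The decision this condition makes can be sketched in Python. The sketch is an illustration only: runs are modeled as plain dicts with hypothetical run_id, job_id, args, and state keys, and the function name skip_target is not part of Apsis. The defaults mirror the check_states and target_state behavior described above.

```python
DEFAULT_CHECK_STATES = ("waiting", "starting", "running")


def skip_target(run, other_runs,
                check_states=DEFAULT_CHECK_STATES,
                target_state="skipped"):
    """Return the state to transition `run` to, or None to let it proceed."""
    for other in other_runs:
        if other["run_id"] == run["run_id"]:
            continue  # a run is not a duplicate of itself
        if (
            other["job_id"] == run["job_id"]
            and other["args"] == run["args"]
            and other["state"] in check_states
        ):
            return target_state
    return None


new_run = {"run_id": "r2", "job_id": "cleanup", "args": {}, "state": "waiting"}
existing = [
    {"run_id": "r1", "job_id": "cleanup", "args": {}, "state": "failure"},
]
```

With the defaults, new_run proceeds, because r1 is no longer waiting, starting, or running. With check_states set to failure and error and target_state set to error, as in the config above, new_run transitions to error instead.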

Actions

FIXME: Write this.

Binding

Apsis creates specific runs for a job, according to the job’s schedule. When Apsis creates a run, it binds the run’s arguments in the program and conditions. Each string-valued config field is expanded as a jinja2 template. The run’s args are available as substitution variables.

For example, consider this job config:

params:
- color
- fruit

program:
    type: shell
    command: "echo The color of {{ fruit }} is {{ color }}."

When Apsis creates a run with color: red and fruit: apple, it expands the program to,

program:
    type: shell
    command: "echo The color of apple is red."
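The expansion can be reproduced outside Apsis with the third-party jinja2 package. The sketch below is an approximation under assumptions: the bind function is hypothetical, it exposes only the run’s args to the templates, and it does not reproduce whatever template environment Apsis actually sets up.

```python
from jinja2 import Template  # third-party: pip install jinja2


def bind(config, args):
    """Expand every string-valued field of a config as a Jinja2 template."""
    if isinstance(config, str):
        return Template(config).render(**args)
    if isinstance(config, dict):
        return {key: bind(value, args) for key, value in config.items()}
    if isinstance(config, list):
        return [bind(value, args) for value in config]
    # Numbers, booleans, and None are left unchanged.
    return config


program = {
    "type": "shell",
    "command": "echo The color of {{ fruit }} is {{ color }}.",
}
bound = bind(program, {"color": "red", "fruit": "apple"})
```

Strings without any {{ … }} markers, such as the type field, pass through unchanged.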

The contents of a {{ … }} expansion are evaluated as a jinja2 expression. Additional Ora types and functions, such as get_calendar, are available in these expressions.

These functions and types allow you to perform time computations on program and condition dates and times. For example, this job has a dependency on another job, load data. Each run of this job is labeled with a date, and depends on a load data run with the previous date, according to the workdays calendar.

params: [region, date]

...

condition:
    type: dependency
    job_id: load data
    args:
        date: "{{ get_calendar('workdays').before(date) }}"

Keep in mind that Apsis run arguments are always strings, so Apsis converts the result using str.