Job

Learn about the concept of a Job, which is a unit of work that is created by a pipeline.

February 22, 2024

About

A job is an execution of a pipeline triggered by new data detected in an input repository.

When a commit is made to the input repository of a pipeline, jobs are created for all downstream pipelines in a directed acyclic graph (DAG), but they do not run until the prior pipelines they depend on produce their output. Each job runs the user’s code against the current commit in a repository at a specified branch and then submits the results to the output repository of the pipeline as a single output commit.
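The ordering described above can be sketched in a few lines of Python. This is only an illustrative model, not Pachyderm's scheduler: the `PIPELINE_INPUTS` mapping, the pipeline names, and both helper functions are hypothetical, and the sketch assumes each pipeline's output repository shares the pipeline's name.

```python
from collections import deque

# Hypothetical DAG: each pipeline maps to the repositories it reads from.
# Assumption for this sketch: a pipeline's output repo shares the pipeline's name.
PIPELINE_INPUTS = {
    "edges": ["images"],
    "montage": ["images", "edges"],
    "report": ["montage"],
}


def downstream_pipelines(repo, pipeline_inputs):
    """Return every pipeline reachable from `repo`, in breadth-first order."""
    seen = []
    queue = deque([repo])
    while queue:
        current = queue.popleft()
        for pipeline, inputs in pipeline_inputs.items():
            if current in inputs and pipeline not in seen:
                seen.append(pipeline)
                queue.append(pipeline)
    return seen


def run_order(repo, pipeline_inputs):
    """Order the triggered jobs so each one runs only after its upstream jobs."""
    pending = set(downstream_pipelines(repo, pipeline_inputs))
    finished = []
    while pending:
        # A job is ready once none of its inputs is still waiting on a pending job.
        ready = sorted(
            p for p in pending
            if not any(dep in pending for dep in pipeline_inputs[p])
        )
        if not ready:
            raise ValueError("cycle detected; pipelines must form a DAG")
        finished.extend(ready)
        pending.difference_update(ready)
    return finished


if __name__ == "__main__":
    # A commit to "images" creates jobs for all three pipelines at once,
    # but they can only run in dependency order.
    print(run_order("images", PIPELINE_INPUTS))
```

In this model, a commit to `images` creates jobs for `edges`, `montage`, and `report` together, while `montage` waits for `edges` and `report` waits for `montage`.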

Each job has a unique alphanumeric identifier (ID) that users can reference in the <pipeline>@<jobID> format. Jobs have the following states:

| State | Description |
|---|---|
| CREATED | An input commit exists, but the job has not been started by a worker yet. |
| STARTING | The worker has allocated resources for the job (that is, the job counts towards parallelism), but it is still waiting on the inputs to be ready. |
| UNRUNNABLE | The job could not be run because one or more of its inputs is the result of a failed or unrunnable job. As a simple example, say that pipelines Y and Z both depend on the output from pipeline X. If pipeline X fails, the jobs for both pipelines Y and Z pass from STARTING to UNRUNNABLE to signify that they had to be cancelled because of upstream failures. |
| RUNNING | The worker is processing datums. |
| EGRESS | The worker has completed all the datums and is uploading the output to the egress endpoint. |
| FINISHING | After all of the datum processing and egress (if any) is done, the job transitions to a finishing state where all of the post-processing tasks, such as compaction, are performed. |
| FAILURE | The worker encountered too many errors when processing a datum. |
| KILLED | The job timed out, or a user called StopJob. |
| SUCCESS | The job completed without failing, being killed, or becoming unrunnable. |
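
As a rough illustration of how a client script might work with these states and with the `<pipeline>@<jobID>` reference format, here is a minimal Python sketch. The `JobState` enum, `TERMINAL_STATES`, `is_terminal`, and `parse_job_ref` are hypothetical names invented for this example, not part of any Pachyderm API. Treating UNRUNNABLE, FAILURE, KILLED, and SUCCESS as final states is this sketch's reading of the descriptions above.

```python
from enum import Enum


class JobState(Enum):
    """Job states as named in the table above (hypothetical enum for this sketch)."""
    CREATED = "CREATED"
    STARTING = "STARTING"
    UNRUNNABLE = "UNRUNNABLE"
    RUNNING = "RUNNING"
    EGRESS = "EGRESS"
    FINISHING = "FINISHING"
    FAILURE = "FAILURE"
    KILLED = "KILLED"
    SUCCESS = "SUCCESS"


# Assumption: these four states are final, so a job in one of them no longer changes.
TERMINAL_STATES = frozenset({
    JobState.UNRUNNABLE,
    JobState.FAILURE,
    JobState.KILLED,
    JobState.SUCCESS,
})


def is_terminal(state):
    """Return True if the job is no longer expected to change state."""
    return state in TERMINAL_STATES


def parse_job_ref(ref):
    """Split a '<pipeline>@<jobID>' reference into (pipeline, job_id)."""
    pipeline, sep, job_id = ref.partition("@")
    if not sep or not pipeline or not job_id:
        raise ValueError(f"expected <pipeline>@<jobID>, got {ref!r}")
    return pipeline, job_id


if __name__ == "__main__":
    # The pipeline name and job ID below are made up for illustration.
    pipeline, job_id = parse_job_ref("edges@5f93d03b65fa421996185e53f7f8b1e4")
    print(pipeline, job_id, is_terminal(JobState.RUNNING))
```

A polling loop built on this sketch would keep checking a job until `is_terminal` returns `True`, then branch on whether the final state is `SUCCESS`.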