A job is an execution of a pipeline triggered by new data detected in an input repository.
When a commit is made to the input repository of a pipeline, jobs are created for all downstream pipelines in a directed acyclic graph (DAG), but they do not run until the prior pipelines they depend on produce their output. Each job runs the user’s code against the current commit in a repository at a specified branch and then submits the results to the output repository of the pipeline as a single output commit.
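The fan-out described above can be sketched in a few lines. This is a minimal model, not Pachyderm's implementation or API: it assumes a hypothetical `PIPELINE_INPUTS` mapping in which each pipeline lists the repos it reads from, and in which a pipeline's output repo shares the pipeline's name. A commit to one repo creates a job for every pipeline downstream of it, and the jobs are ordered so that each one runs only after the pipelines it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical DAG: each pipeline maps to the repos it reads from.
# In this sketch, a pipeline's output repo has the same name as the pipeline.
PIPELINE_INPUTS: dict[str, list[str]] = {
    "X": ["raw"],
    "Y": ["X"],
    "Z": ["X"],
    "report": ["Y", "Z"],
}


def downstream_pipelines(repo: str, inputs: dict[str, list[str]]) -> set[str]:
    """Return every pipeline that reads from `repo` directly or transitively."""
    found: set[str] = set()
    frontier = [repo]
    while frontier:
        current = frontier.pop()
        for pipeline, deps in inputs.items():
            if current in deps and pipeline not in found:
                found.add(pipeline)
                frontier.append(pipeline)
    return found


def job_run_order(repo: str, inputs: dict[str, list[str]]) -> list[str]:
    """Create one job per downstream pipeline and order the jobs so that each
    one runs only after the upstream pipelines it depends on have output."""
    affected = downstream_pipelines(repo, inputs)
    # Keep only edges between affected pipelines; the committed repo itself
    # already has its new data, so no job waits on it.
    graph = {p: [d for d in inputs[p] if d in affected] for p in affected}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(job_run_order("raw", PIPELINE_INPUTS))
```

For a commit to `raw`, the job for `X` comes first and the job for `report` comes last; `Y` and `Z` can run in either order once `X` has produced its output.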
Each job has a unique alphanumeric identifier (ID) that users can reference in the `<pipeline>@<jobID>` format. Jobs have the following states:
- An input commit exists, but the job has not been started by a worker yet.
- The worker has allocated resources for the job (that is, the job counts toward parallelism), but it is still waiting for its inputs to be ready.
- The job could not be run because one or more of its inputs is the result of a failed or unrunnable job. For example, suppose pipelines Y and Z both depend on the output of pipeline X. If pipeline X fails, the jobs for both Y and Z transition to UNRUNNABLE to signify that they were cancelled because of the upstream failure.
- The worker is processing datums.
- The worker has completed all datums and is uploading the output to the egress endpoint.
- After all datum processing and egress (if any) are done, the job transitions to a finishing state, in which post-processing tasks such as compaction are performed.
- The worker encountered too many errors while processing a datum.
- The job timed out, or a user called `StopJob`.
- The job completed successfully: none of the failure conditions above occurred.
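The UNRUNNABLE rule can be illustrated with a small sketch that walks the DAG in dependency order. It is a simplified model, not Pachyderm's code: the `Outcome` labels `OK` and `FAILED` are hypothetical stand-ins (only UNRUNNABLE is named in the text), and the `DEPENDS_ON` graph mirrors the X, Y, Z example plus a hypothetical `report` pipeline to show that the effect carries further downstream.

```python
from enum import Enum
from graphlib import TopologicalSorter


class Outcome(Enum):
    # Hypothetical labels for this sketch; only UNRUNNABLE comes from the text.
    OK = "ok"
    FAILED = "failed"
    UNRUNNABLE = "unrunnable"


# Y and Z depend on X; the hypothetical "report" pipeline depends on Y.
DEPENDS_ON: dict[str, list[str]] = {
    "X": ["raw"],
    "Y": ["X"],
    "Z": ["X"],
    "report": ["Y"],
}


def resolve_outcomes(depends_on: dict[str, list[str]], failed: set[str]) -> dict[str, Outcome]:
    """Mark a job UNRUNNABLE if any input comes from a failed or unrunnable
    job; mark jobs listed in `failed` as FAILED; mark the rest OK."""
    # Only pipeline-to-pipeline edges matter; input repos such as "raw" are dropped.
    graph = {p: [d for d in deps if d in depends_on] for p, deps in depends_on.items()}
    outcomes: dict[str, Outcome] = {}
    for pipeline in TopologicalSorter(graph).static_order():
        upstream = [outcomes[d] for d in graph[pipeline]]
        if any(o is not Outcome.OK for o in upstream):
            # The job never runs, so an upstream problem takes precedence.
            outcomes[pipeline] = Outcome.UNRUNNABLE
        elif pipeline in failed:
            outcomes[pipeline] = Outcome.FAILED
        else:
            outcomes[pipeline] = Outcome.OK
    return outcomes


if __name__ == "__main__":
    print(resolve_outcomes(DEPENDS_ON, {"X"}))
```

When X fails, Y and Z become UNRUNNABLE, and so does `report`, because its input comes from an unrunnable job. If only Y fails, Z is unaffected while `report` still becomes UNRUNNABLE.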
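The `<pipeline>@<jobID>` reference format can be handled with a small parser. This is an illustrative sketch, not part of any Pachyderm client library: `JobRef`, `parse_job_ref`, and `format_job_ref` are hypothetical helpers, and the sketch assumes that pipeline names contain no `@` or whitespace and that job IDs are purely ASCII alphanumeric. The example values are made up.

```python
import re
from typing import NamedTuple


class JobRef(NamedTuple):
    pipeline: str
    job_id: str


# Assumption: pipeline names contain no "@" or whitespace, and job IDs are
# ASCII alphanumeric only.
_JOB_REF = re.compile(r"(?P<pipeline>[^@\s]+)@(?P<job_id>[A-Za-z0-9]+)")


def parse_job_ref(ref: str) -> JobRef:
    """Split a '<pipeline>@<jobID>' string into its pipeline and job ID."""
    match = _JOB_REF.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a <pipeline>@<jobID> reference: {ref!r}")
    return JobRef(match["pipeline"], match["job_id"])


def format_job_ref(pipeline: str, job_id: str) -> str:
    """Build a '<pipeline>@<jobID>' string from its parts."""
    return f"{pipeline}@{job_id}"


if __name__ == "__main__":
    print(parse_job_ref("edges@5f93d03b"))
```

Using `fullmatch` rather than `match` rejects trailing characters, so a malformed reference raises `ValueError` instead of being silently truncated.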