Create a Pipeline

Learn how to create a pipeline using the pachctl create pipeline command.

November 29, 2023

To create a pipeline, you need to define a pipeline specification in YAML, JSON, or Jsonnet.

Before You Start

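A basic pipeline must have all of the following:

  - pipeline.name: the name of your pipeline.
  - transform.cmd: the command that runs your user code.
  - transform.image: the Docker image that contains your user code.
  - input: the data source, for example input.pfs.repo (the input repository) and input.pfs.glob (the glob pattern that defines the datums).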

How to Create a Pipeline

Via Local File

  1. Define a pipeline specification in YAML, JSON, or Jsonnet.

  2. Pass the pipeline configuration to HPE ML Data Management:

    pachctl create pipeline -f <pipeline_spec>
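The two steps above can also be scripted. The following is a minimal Python sketch, not part of HPE ML Data Management itself: it assumes pachctl is installed and on your PATH, and the helper name create_pipeline_from_file and its dry_run flag are hypothetical. It writes a spec dictionary to a local JSON file, then builds (and, unless dry_run is set, runs) the same pachctl create pipeline -f command.

```python
import json
import subprocess
from pathlib import Path


def create_pipeline_from_file(spec: dict, path: str, dry_run: bool = False) -> list[str]:
    """Write a pipeline spec to a local JSON file and submit it with pachctl.

    Hypothetical helper for illustration; assumes pachctl is on PATH.
    Returns the command it built so callers can inspect it.
    """
    # Step 1: define the pipeline specification in a local file (JSON here).
    Path(path).write_text(json.dumps(spec, indent=2))

    # Step 2: pass the file to pachctl, as you would on the command line.
    cmd = ["pachctl", "create", "pipeline", "-f", path]
    if not dry_run:
        # check=True raises CalledProcessError if pachctl exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    spec = {
        "pipeline": {"name": "edges"},
        "transform": {"cmd": ["python3", "/edges.py"], "image": "pachyderm/opencv"},
        "input": {"pfs": {"repo": "images", "glob": "/*"}},
    }
    # dry_run=True only writes the file and prints the command.
    print(create_pipeline_from_file(spec, "edges.json", dry_run=True))
```

Returning the command list makes the sketch easy to test without a running cluster; drop dry_run=True to actually submit the pipeline.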

Via URL

  1. Find a pipeline specification hosted in a public or internal repository.
  2. Pass the pipeline configuration to HPE ML Data Management:

    pachctl create pipeline -f https://raw.githubusercontent.com/pachyderm/pachyderm/2.8.x/examples/opencv/edges.json

Via Jsonnet

Jsonnet pipeline specs let you pass parameters to a pipeline specification when you create the pipeline, so you can reuse a baseline pipeline while changing the values of chosen fields. For example, you can create multiple pipelines from the same Jsonnet spec file, each pointing at a different input repository; parameterize the command line in the transform field; or pass different Docker images to train different models on the same dataset.

In the following example, we create a pipeline named edges-1 whose input repository is images:

pachctl create pipeline --jsonnet jsonnet/edges.jsonnet --arg suffix=1 --arg src=images
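To show what the --arg flags do, the following Python sketch mimics the parameterization. It is an illustration only: it is not Jsonnet and not how pachctl evaluates templates. The edges_spec function is a hypothetical stand-in for a template that, as the command above implies, names the pipeline edges-<suffix> and reads from the repo given in src; the actual contents of jsonnet/edges.jsonnet may differ.

```python
import json


def parse_args(pairs: list[str]) -> dict[str, str]:
    """Turn ["suffix=1", "src=images"] into {"suffix": "1", "src": "images"}.

    Mirrors the key=value form used by the --arg flag.
    """
    args = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        args[key] = value
    return args


def edges_spec(suffix: str, src: str) -> dict:
    """Hypothetical template: one baseline spec with two parameterized fields."""
    return {
        "pipeline": {"name": f"edges-{suffix}"},  # name varies with suffix
        "transform": {"cmd": ["python3", "/edges.py"], "image": "pachyderm/opencv"},
        "input": {"pfs": {"repo": src, "glob": "/*"}},  # input repo varies with src
    }


if __name__ == "__main__":
    args = parse_args(["suffix=1", "src=images"])
    print(json.dumps(edges_spec(**args), indent=2))
```

Calling edges_spec with different arguments yields distinct pipelines from one baseline, which is the kind of reuse Jsonnet specs are meant to provide.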

You can define multiple pipeline specifications in a single file by separating the specs with ---. This works in both JSON and YAML files.
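The following minimal Python sketch shows how such a file breaks into separate specs. It assumes the separator sits on its own line and handles JSON only, using the standard json module; the split_specs helper is hypothetical, and pachctl does its own parsing.

```python
import json


def split_specs(text: str) -> list[dict]:
    """Split a multi-spec file on lines containing only '---' and parse each part as JSON."""
    chunks, current = [], []
    for line in text.splitlines():
        if line.strip() == "---":
            chunks.append("\n".join(current))
            current = []
        else:
            current.append(line)
    chunks.append("\n".join(current))
    # Skip empty chunks, e.g. from a leading or trailing separator.
    return [json.loads(chunk) for chunk in chunks if chunk.strip()]


if __name__ == "__main__":
    # Two placeholder specs in one file.
    text = """{"pipeline": {"name": "edges"}}
---
{"pipeline": {"name": "montage"}}"""
    for spec in split_specs(text):
        print(spec["pipeline"]["name"])
```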

Examples

JSON

{
  "pipeline": {
    "name": "edges"
  },
  "description": "A pipeline that performs image edge detection by using the OpenCV library.",
  "transform": {
    "cmd": [ "python3", "/edges.py" ],
    "image": "pachyderm/opencv"
  },
  "input": {
    "pfs": {
      "repo": "images",
      "glob": "/*"
    }
  }
}
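Before submitting a spec, you may want to check it for the minimum fields a basic pipeline needs. This minimal Python sketch assumes those fields are pipeline.name, transform.cmd, transform.image, input.pfs.repo, and input.pfs.glob, matching the example above. Specs that use other input types (for example cron or join inputs) are outside what this hypothetical missing_fields helper handles.

```python
import json

# Dotted paths this sketch treats as required (an assumption based on the
# example spec, not an exhaustive schema).
REQUIRED = [
    "pipeline.name",
    "transform.cmd",
    "transform.image",
    "input.pfs.repo",
    "input.pfs.glob",
]


def missing_fields(spec: dict) -> list[str]:
    """Return the required dotted paths that are absent or empty in spec."""
    missing = []
    for path in REQUIRED:
        node = spec
        # Walk down one key at a time; stop as soon as a level is missing.
        for key in path.split("."):
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        if not node:
            missing.append(path)
    return missing


if __name__ == "__main__":
    spec = json.loads("""
    {
      "pipeline": {"name": "edges"},
      "transform": {"cmd": ["python3", "/edges.py"], "image": "pachyderm/opencv"},
      "input": {"pfs": {"repo": "images", "glob": "/*"}}
    }
    """)
    print(missing_fields(spec))
```

An empty list means every assumed field is present; any paths returned point at what to fill in before running pachctl create pipeline.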

YAML

pipeline:
  name: edges
description: A pipeline that performs image edge detection by using the OpenCV library.
transform:
  cmd:
  - python3
  - "/edges.py"
  image: pachyderm/opencv
input:
  pfs:
    repo: images
    glob: "/*"

Considerations