Create a Machine Learning Workflow

Learn how to integrate into your Machine Learning workflows.

April 4, 2024

Because HPE ML Data Management is a language- and framework-agnostic platform, and because it easily distributes analysis over large data sets, data scientists can use any tooling to create machine learning workflows. Even if that tooling is unfamiliar to the rest of an engineering organization, data scientists can use containers to develop and deploy scalable solutions on their own. Moreover, HPE ML Data Management's pipeline logic, paired with data versioning, makes results reproducible, whether for debugging or while developing improvements to a model.

To get the most out of HPE ML Data Management's built-in functionality, combine model training processes, persisted models, and model utilization processes (such as making inferences or generating results) into a single HPE ML Data Management pipeline Directed Acyclic Graph (DAG).

Such a pipeline enables you to achieve the following goals:

Keep a rigorous historical record of which models were used on what data to produce which results.
Automatically update online ML models when training data or parameterization changes.
Easily revert to other versions of an ML model when a new model does not produce an expected result or when bad data is introduced into a training data set.
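To make the recommended DAG more concrete, the following Python sketch generates two pipeline specifications as JSON: a `train` pipeline that reads a training-data repository and writes a persisted model, and an `inference` pipeline that crosses that model with incoming data. It assumes the JSON pipeline-spec layout (`pipeline`, `transform`, `input`, `pfs`, `cross`, `glob`) accepted by `pachctl create pipeline`; the repository names, image names, and script paths (`training-data`, `example/train:1.0`, `/code/train.py`, and so on) are hypothetical placeholders, so check the field names against the pipeline specification reference for your version before relying on them.

```python
import json

# Hypothetical names: replace with your own repositories and images.
TRAINING_REPO = "training-data"
INFERENCE_REPO = "inference-data"


def training_pipeline() -> dict:
    """Pipeline that trains a model from the training repo and writes it to /pfs/out."""
    return {
        "pipeline": {"name": "train"},
        "transform": {
            "image": "example/train:1.0",  # hypothetical image
            "cmd": ["python3", "/code/train.py",
                    f"/pfs/{TRAINING_REPO}", "/pfs/out"],
        },
        # Glob "/" treats the whole training set as a single datum,
        # so any change to it produces one newly trained model.
        "input": {"pfs": {"repo": TRAINING_REPO, "glob": "/"}},
    }


def inference_pipeline(model_repo: str) -> dict:
    """Pipeline that scores each input file with the latest persisted model."""
    return {
        "pipeline": {"name": "inference"},
        "transform": {
            "image": "example/infer:1.0",  # hypothetical image
            "cmd": ["python3", "/code/infer.py",
                    f"/pfs/{model_repo}", f"/pfs/{INFERENCE_REPO}", "/pfs/out"],
        },
        # Cross the full model repo ("/") with each top-level file of the
        # inference repo ("/*"), giving one datum per input file.
        "input": {
            "cross": [
                {"pfs": {"repo": model_repo, "glob": "/"}},
                {"pfs": {"repo": INFERENCE_REPO, "glob": "/*"}},
            ]
        },
    }


if __name__ == "__main__":
    train = training_pipeline()
    # A pipeline's output repo shares the pipeline's name, so "train" holds the model.
    infer = inference_pipeline(train["pipeline"]["name"])
    for spec in (train, infer):
        print(json.dumps(spec, indent=2))
```

Because the inference pipeline names the `train` output repository as an input, a new commit to the training data retrains the model, and the resulting model commit in turn triggers inference. Each printed spec could be saved to a file and submitted with `pachctl create pipeline -f <file>`.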

The following diagram demonstrates an ML pipeline:

Example of a machine learning workflow

You can update the training dataset at any time to automatically train a new persisted model. You can also use any language or framework, including Apache Spark™, TensorFlow™, scikit-learn™, or others, and output persisted models in any format, such as pickle, XML, or POJO. Regardless of the framework, HPE ML Data Management versions the model so that you can track the data that was used to train each model.

HPE ML Data Management processes new data arriving in the input repository with the updated model. You can also recompute old predictions with the updated model, or test new models on previously ingested, versioned data. Together, these capabilities let you avoid manually updating historical results or manually swapping ML models in production.
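The provenance bookkeeping described above can be illustrated with a toy, in-memory model. This is not the HPE ML Data Management API: `Commit`, `commit`, and `lineage` are hypothetical helpers that only show how recording each output's input commits lets you trace a prediction back to the exact model and training data behind it, and how an earlier input batch can be rescored with a retrained model without losing the earlier result.

```python
import itertools
from dataclasses import dataclass

_ids = itertools.count(1)  # toy commit-ID generator (hypothetical)


@dataclass(frozen=True)
class Commit:
    """Toy stand-in for a versioned commit: its ID, its repo, and the IDs
    of the commits it was computed from (its provenance)."""
    id: int
    repo: str
    provenance: tuple[int, ...] = ()


def commit(repo: str, *inputs: Commit) -> Commit:
    """Create a new commit in `repo` that records which commits produced it."""
    return Commit(next(_ids), repo, tuple(c.id for c in inputs))


def lineage(c: Commit, index: dict[int, Commit]) -> set[int]:
    """Return the IDs of every commit that `c` was transitively computed from."""
    seen: set[int] = set()
    stack = list(c.provenance)
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(index[cid].provenance)
    return seen


# Training data v1 produces model v1, which scores an input batch.
data_v1 = commit("training-data")
model_v1 = commit("train", data_v1)
batch = commit("inference-data")
preds_v1 = commit("inference", model_v1, batch)

# Updated training data produces model v2; the same batch is rescored,
# and the v1 predictions remain in history alongside the new ones.
data_v2 = commit("training-data")
model_v2 = commit("train", data_v2)
preds_v2 = commit("inference", model_v2, batch)

index = {c.id: c for c in (data_v1, model_v1, batch, preds_v1,
                           data_v2, model_v2, preds_v2)}

if __name__ == "__main__":
    print(sorted(lineage(preds_v1, index)))  # [1, 2, 3]
    print(sorted(lineage(preds_v2, index)))  # [3, 5, 6]
```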

For examples of ML workflows in HPE ML Data Management, see Machine Learning Examples.
