First-Time Setup

HPE Machine Learning Data Management can be deployed on Kubernetes in many different environments, but for a first-time setup we recommend Docker Desktop. This installation method is fast and provides everything you need to start the Beginner Tutorial.

Before You Start

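This guide uses Homebrew for several installation steps. If you do not already have it, install it:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"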

On Windows, set up WSL with Ubuntu first. Manual Step Summary:

  1. Open a PowerShell terminal.
  2. Run each of the following:

    dism.exe /online /enable-feature /featurename:Microsoft-Windows-Subsystem-Linux /all /norestart

    dism.exe /online /enable-feature /featurename:VirtualMachinePlatform /all /norestart
  3. Run each of the following:

    wsl --update

    wsl --set-default-version 2

    wsl --install -d Ubuntu
  4. Restart your machine.
  5. Start a WSL terminal and set up your first Ubuntu user.
  6. Update Ubuntu:

    sudo apt update
    sudo apt upgrade -y
  7. Install Homebrew in Ubuntu so you can complete the rest of this guide:

    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

On Windows, run every installation step after "1. Install Docker Desktop" in the WSL terminal (Ubuntu), not in PowerShell.

You are now ready to continue to Step 1.
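Before continuing, it can help to confirm which environment your terminal is actually running in, since the later steps differ between PowerShell, WSL, and macOS. The following is a minimal sketch rather than part of the official instructions. The function name detect_env is hypothetical, and the WSL check assumes that the kernel release string contains "microsoft".

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify the current terminal environment.
# Assumption: WSL kernels report "microsoft" in `uname -r`.
detect_env() {
  local kernel release
  kernel="$(uname -s)"
  release="$(uname -r | tr '[:upper:]' '[:lower:]')"
  case "$kernel" in
    Darwin) echo "macos" ;;
    Linux)
      case "$release" in
        *microsoft*) echo "wsl" ;;
        *)           echo "linux" ;;
      esac ;;
    *) echo "other" ;;
  esac
}

echo "Detected environment: $(detect_env)"

# Homebrew is used by later steps; report whether it is on the PATH.
if command -v brew >/dev/null 2>&1; then
  echo "Homebrew found: $(command -v brew)"
else
  echo "Homebrew not found on PATH"
fi
```

If this reports "other" on Windows, you are probably not inside the Ubuntu WSL terminal.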


1. Install Docker Desktop

  1. Install Docker Desktop for your machine.
  2. Open the Docker Desktop Settings for Mac, Windows, or Linux.
    • Adjust your resources (~4 CPUs and ~12 GB of memory)
    • Enable Kubernetes
    • On Windows, enable Docker Desktop's WSL integration for Ubuntu if Ubuntu is not your default Linux distro.
  3. Select Apply & Restart.
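After Docker Desktop restarts, you may want to confirm that the resource settings took effect. The sketch below reads the CPU count and total memory from docker info and compares them with the figures suggested above. The function name check_resources and the 11 GiB threshold are assumptions made for illustration; the threshold is set slightly below 12 because the reported total is often a little under the configured value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare reported resources with the suggested
# ~4 CPUs and ~12 GB of memory. Prints "ok", "low", or "unknown".
check_resources() {
  local cpus="$1" mem_bytes="$2"
  local min_cpus=4
  local min_mem_gib=11   # assumption: allow for a total slightly under 12 GiB
  case "$cpus" in ''|*[!0-9]*) echo "unknown"; return 0 ;; esac
  case "$mem_bytes" in ''|*[!0-9]*) echo "unknown"; return 0 ;; esac
  local mem_gib=$(( mem_bytes / 1024 / 1024 / 1024 ))
  if [ "$cpus" -ge "$min_cpus" ] && [ "$mem_gib" -ge "$min_mem_gib" ]; then
    echo "ok"
  else
    echo "low"
  fi
}

if command -v docker >/dev/null 2>&1; then
  # NCPU and MemTotal (bytes) are fields of the `docker info` output.
  info="$(docker info --format '{{.NCPU}} {{.MemTotal}}' 2>/dev/null || true)"
  cpus="${info%% *}"
  mem="${info##* }"
  echo "Docker resources: $(check_resources "$cpus" "$mem")"
else
  echo "docker not found on PATH; skipping the check"
fi
```

A result of "low" does not necessarily prevent the installation, but pods may be slow to start or fail to schedule.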

2. Install Pachctl CLI

Homebrew (macOS, or Ubuntu on WSL)

brew tap pachyderm/tap && brew install pachyderm/tap/pachctl@2.10

Debian-based Linux

AMD

curl -o /tmp/pachctl.deb -L https://github.com/pachyderm/pachyderm/releases/download/v2.10.0/pachctl_2.10.0_amd64.deb && sudo dpkg -i /tmp/pachctl.deb

ARM

curl -o /tmp/pachctl.deb -L https://github.com/pachyderm/pachyderm/releases/download/v2.10.0/pachctl_2.10.0_arm64.deb && sudo dpkg -i /tmp/pachctl.deb

Other Linux

AMD

curl -L https://github.com/pachyderm/pachyderm/releases/download/v2.10.0/pachctl_2.10.0_linux_amd64.tar.gz | sudo tar -xzv --strip-components=1 -C /usr/local/bin

ARM

curl -L https://github.com/pachyderm/pachyderm/releases/download/v2.10.0/pachctl_2.10.0_linux_arm64.tar.gz | sudo tar -xzv --strip-components=1 -C /usr/local/bin
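If you are unsure whether your machine needs the AMD or ARM download, the architecture can be read from uname -m. The sketch below maps that value to the suffix used in the release file names above and prints the matching URLs without downloading anything. The function name map_arch is hypothetical, and the URL pattern is simply copied from the commands in this step.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map `uname -m` output to the suffix used in the
# release file names (amd64 or arm64).
map_arch() {
  case "$1" in
    x86_64|amd64)  echo "amd64" ;;
    aarch64|arm64) echo "arm64" ;;
    *)             echo "unsupported" ;;
  esac
}

VERSION="2.10.0"   # the release used throughout this guide
ARCH="$(map_arch "$(uname -m)")"

if [ "$ARCH" = "unsupported" ]; then
  echo "No matching pachctl build listed in this guide for $(uname -m)"
else
  BASE="https://github.com/pachyderm/pachyderm/releases/download/v${VERSION}"
  echo "Debian package: ${BASE}/pachctl_${VERSION}_${ARCH}.deb"
  echo "Linux tarball:  ${BASE}/pachctl_${VERSION}_linux_${ARCH}.tar.gz"
fi
```

You can then pass the printed URL to the corresponding curl command.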

3. Install & Configure Helm

  1. Install Helm:

    brew install helm

    Or, on Linux without Homebrew:

    curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
  2. Add the Pachyderm repo to Helm:

    helm repo add pachyderm https://helm.pachyderm.com  
    helm repo update  
  3. Install PachD:

    Tip
    Open your browser and check http://localhost before installing. If another tool is already using the same port as HPE Machine Learning Data Management, add the following argument to the command below: --set proxy.service.httpPort=8080
    helm install pachyderm pachyderm/pachyderm \
      --set deployTarget=LOCAL \
      --set proxy.enabled=true \
      --set proxy.service.type=LoadBalancer  \
      --set proxy.host=localhost 

    Are you using an Enterprise trial key? If so, you can set up Enterprise Pachyderm locally by storing your trial key in a license.txt file and passing it into the following Helm command:

    helm install pachyderm pachyderm/pachyderm \
      --set deployTarget=LOCAL \
      --set pachd.enterpriseLicenseKey="$(cat license.txt)" \
      --set proxy.enabled=true \
      --set proxy.service.type=LoadBalancer  \
      --set proxy.host=localhost \
      --set pachd.storage.backend=<YOUR_BACKEND> \
      --set pachd.storage.storageURL="<YOUR_STORAGE_URL>"

    Replace <YOUR_STORAGE_URL> with the URL of your bucket or container, for example s3://my-bucket, gs://my-bucket, or azure://my-container.

    This unlocks Enterprise features but also requires user authentication to access Console. A mock user is created by default to get you started, with the username: admin and password: password.

    The Helm installation may take several minutes to complete.
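The port check described in the tip for this step can also be done from the terminal before you run helm install. The sketch below tests whether anything is already listening on port 80 and, if so, prints the extra Helm argument for port 8080. The function names port_in_use and helm_port_arg are hypothetical, and the check relies on bash's /dev/tcp redirection, which is assumed to be available in your shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if something accepts TCP connections on
# 127.0.0.1 at the given port. Relies on bash's /dev/tcp redirection.
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Hypothetical helper: print the extra Helm argument for a non-default port.
helm_port_arg() {
  if [ "$1" = "80" ]; then
    echo ""
  else
    echo "--set proxy.service.httpPort=$1"
  fi
}

PORT=80
if port_in_use "$PORT"; then
  PORT=8080   # fallback value suggested in the tip
fi

echo "Selected HTTP port: $PORT"
echo "Extra Helm argument: $(helm_port_arg "$PORT")"
```

Run this only before installing. Once the installation is up, port 80 is expected to be in use by the proxy itself.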

4. Verify Installation

  1. In a new terminal, run the following command to check the status of your pods:
    kubectl get pods
    NAME                          READY   STATUS    RESTARTS   AGE
    console-5b67678df6-s4d8c      1/1     Running   0          2m8s
    etcd-0                        1/1     Running   0          2m8s
    pachd-c5848b5c7-zwb8p         1/1     Running   0          2m8s
    pg-bouncer-7b855cb797-jqqpx   1/1     Running   0          2m8s
    postgres-0                    1/1     Running   0          2m8s
  2. Re-run this command after a few minutes if pachd is not ready.
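Rather than reading the READY column by eye, you can filter the output for pods that are not fully ready yet. This is a minimal sketch under the assumption that the output has the column layout shown above (NAME, READY, STATUS). The function name not_ready_pods is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl get pods` output on stdin and print the
# names of pods whose READY count is incomplete. Completed pods are skipped.
not_ready_pods() {
  awk 'NR > 1 && $3 != "Completed" {
    split($2, ready, "/")
    if (ready[1] != ready[2]) print $1
  }'
}

if command -v kubectl >/dev/null 2>&1; then
  if output="$(kubectl get pods --request-timeout=10s 2>/dev/null)"; then
    pending="$(printf '%s\n' "$output" | not_ready_pods)"
    if [ -z "$pending" ]; then
      echo "All listed pods report ready"
    else
      echo "Still waiting on:"
      echo "$pending"
    fi
  else
    echo "kubectl could not reach the cluster"
  fi
else
  echo "kubectl not found on PATH"
fi
```

If pachd appears in the list, wait a few minutes and run the check again.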

5. Connect to Cluster

pachctl connect http://localhost:80

Warning
If you set httpPort to a different value, such as 8080, use that value in the command:

pachctl connect http://localhost:8080

Optionally, open your browser and navigate to the Console UI at http://localhost (include the port if you changed httpPort).

Tip

You can check your Pachyderm version and connection to pachd at any time with the following command:

pachctl version
COMPONENT           VERSION
pachctl             2.10.0
pachd               2.10.0
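To turn the version check into a quick pass/fail test, you can compare the two rows of that output. The sketch below assumes the two-column layout shown above. The function name versions_match is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `pachctl version` output on stdin and succeed only
# if both components are listed with the same version string.
versions_match() {
  awk '$1 == "pachctl" { client = $2 }
       $1 == "pachd"   { server = $2 }
       # Appending "" forces a string comparison, so 2.10 and 2.1 differ.
       END { exit !(client != "" && (client "") == (server "")) }'
}

if command -v pachctl >/dev/null 2>&1; then
  if pachctl version 2>/dev/null | versions_match; then
    echo "pachctl and pachd versions match"
  else
    echo "Versions differ or pachd is unreachable"
  fi
else
  echo "pachctl not found on PATH"
fi
```

A missing pachd row usually means that pachctl is not connected to the cluster, so this also serves as a basic connection check.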