Mount a Repo Locally
Learn how to mount a repository as a local filesystem using the pachctl mount command.
December 4, 2023
HPE ML Data Management uses FUSE to mount repositories as local filesystems. Because Apple has announced phasing out support for macOS kernel extensions, including FUSE, this functionality is no longer stable on macOS Catalina (10.15) or later.
HPE ML Data Management enables you to mount a repository
as a local filesystem on your computer by using the
pachctl mount
command. This command
uses the Filesystem in Userspace (FUSE) user interface to export a HPE ML Data Management
File System (PFS) to a Unix computer system.
This functionality is useful when you want to pull data locally to experiment,
review the results of a pipeline, or modify the files
in the input repository directly.
You can mount a HPE ML Data Management repo in one of the following modes:
Read-only
â you can read the mounted files to further experiment with them locally, but cannot modify them.Read-write
â you can read mounted files, modify their contents, and push them back into your centralized HPE ML Data Management input repositories.
Prerequisites #
You must have the following configured for this functionality to work:
Unix or Unix-like operating system, such as Ubuntu 16.04 or macOS Yosemite or later.
FUSE for your operating system installed:
On macOS, run:
brew install osxfuse
On Ubuntu, run:
sudo apt-get install -y fuse
For more information, see:
Mounting Repositories in Read-Only Mode #
By default, HPE ML Data Management mounts all repositories in read-only mode. You can access the files through your file browser or enable third-party applications access. Read-only access enables you to explore and experiment with the data, without modifying it. For example, you can mount your repo to a local computer and then open that directory in a Jupyter Notebook for exploration.
The pachctl mount
command allows you to mount not only the default
branch, typically a master
branch, but also other HPE ML Data Management
branches. By default, HPE ML Data Management mounts the master
branch. However,
if you add a branch to the name of the repo, the HEAD
of that branch
will be mounted.
Example:
pachctl mount images --repos images@staging
You can also mount a specific commit, but because commits
might be on multiple branches, modifying them might result in data deletion
in the HEAD
of the branches. Therefore, you can only mount commits in
read-only mode. If you want to write to a specific commit that is not
the HEAD
of a branch, you can create a new branch with that commit as HEAD
.
Mounting Repositories in Read-Write Mode #
Running the pachctl mount
command with the --write
flag grants you
write access to the mounted repositories, which means that you can
open the files for editing and put them back to the HPE ML Data Management
repository.
Your changes are saved to the HPE ML Data Management repository only after you interrupt the pachctl mount
with CTRL+C
or with pachctl unmount
, unmount /<path-to-mount>
, or fusermount -u /<path-to-mount>
.
For example, you have the OpenCV example pipeline
up and running. If you want to edit files in the images
repository, experiment with brightness and contrast
settings in liberty.png
, and finally have your edges
pipeline process those changes.
If you do not mount the images
repo, you would have to
first download the files to your computer, edit them,
and then put them back to the repository. The pachctl mount
command automates all these steps for you. You can mount just the
images
repo or all HPE ML Data Management repositories as directories
on you machine, edit as needed, and, when done,
exit the pachctl mount
command. Upon exiting the pachctl mount
command, HPE ML Data Management uploads all the changes to the corresponding
repository.
If someone else modifies the files while you are working on them
locally, their changes will likely be overwritten when you exit
pachctl mount
. This happens because Therefore, make sure that you do not work on the
same files while someone else is working on them.
Use writable mount ONLY when you have sole ownership over the mounted data. Otherwise, merge conflicts or unexpected data overwrites can occur.
Because output repositories are created by the HPE ML Data Management pipelines, they are immutable. Only a pipeline can change and update files in these repositories. If you try to change a file in an output repo, you will get an error message.
How to Mount/Unmount a HPE ML Data Management Repo #
To mount a HPE ML Data Management repo on a local computer, complete the following steps:
In a terminal, go to a directory in which you want to mount a HPE ML Data Management repo. It can be any new empty directory on your local computer. For example,
mydirectory
.Run
pachctl mount
for a repository and branch that you want to mount:pachctl mount <path-on-your-computer> [flags]
Example:
- If you want to mount all the repositories in your HPE ML Data Management cluster
to a
mydirectory
directory on your computer and giveWRITE
access to them, run:
pachctl mount mydirectory --write
- If you want to mount the master branch of the
images
repo and enable file editing in this repository, run:
pachctl mount mydirectory --repos images@master+w
To give read-only access, omit
+w
.System Response:
ro for images: &{Branch:master Write:true} ri: repo:<name:"montage" > created:<seconds:1591812554 nanos:348079652 > size_bytes:1345398 description:"Output repo for pipeline montage." branches:<repo:<name:"montage" > name:"master" > continue ri: repo:<name:"edges" > created:<seconds:1591812554 nanos:201592492 > size_bytes:136795 description:"Output repo for pipeline edges." branches:<repo:<name:"edges" > name:"master" > continue ri: repo:<name:"images" > created:<seconds:1591812554 nanos:28450609 > size_bytes:244068 branches:<repo:<name:"images" > name:"master" > MkdirAll /var/folders/jl/mm3wrxqd75l9r1_d0zktphdw0000gn/T/pfs201409498/images
The command runs in your terminal until you terminate it by pressing
CTRL+C
.- Tip
Mount multiple repos at once by appending each mount instruction to the same command.
For example, the following will mount both repos to the
/mydirectory
directory.
pachctl mount ./mydirectory -r first_repo@master -r second_repo@master
- If you want to mount all the repositories in your HPE ML Data Management cluster
to a
You can check that the repo was mounted by running the mount command in your terminal:
mount /dev/disk1s1 on / (apfs, local, read-only, journaled) devfs on /dev (devfs, local, nobrowse) /dev/disk1s2 on /System/Volumes/Data (apfs, local, journaled, nobrowse) /dev/disk1s5 on /private/var/vm (apfs, local, journaled, nobrowse) map auto_home on /System/Volumes/Data/home (autofs, automounted, nobrowse) pachctl@osxfuse0 on /Users/testuser/mydirectory (osxfuse, nodev, nosuid, synchronous, mounted by testuser)
Access your mountpoint.
For example, in macOS, open Finder, press
CMD + SHIFT + G
, and type the mountpoint location. If you have mounted the repo to~/mydirectory
, type~/mydirectory
.Edit the files as needed.
When ready, add your changes to the HPE ML Data Management repo by stopping the
pachctl mount
command withCTRL+C
or by runningpachctl unmount <mountpoint>
(orunmount /<path-to-mount>
, orfusermount -u /<path-to-mount>
).If you have mounted a writable HPE ML Data Management share, interrupting the
pachctl mount
command results in the upload of your changes to the corresponding repo and branch, which is equivalent to running thepachctl put file
command. You can check that HPE ML Data Management runs a new job for this work by listing current jobs withpachctl list job
.