Pachyderm
tl;dr: A combinator data component that installs Pachyderm, a data lineage and pipelining solution.
Introduction
Pachyderm is an open-source solution that provides data lineage and pipelines. Data lineage is important for _provenance_: knowing the origin of downstream assets. In ML, the assets are often models, and provenance describes how a model came to be. Precise knowledge of what a model was trained on is important for disaster recovery, auditing, and robustness.
Pipelines encode a process. This can be anything from automating pre-processing to training and deploying models. Pachyderm's solution is unique because its pipelines are backed by data lineage: they are data-driven rather than process-driven.
Test Drive
The fastest way to get started is to use the test drive functionality provided by TestFaster. Click on the "Launch Test Drive" button below (opens a new window).
Quick Start Pachyderm Tutorial
Once the test drive has launched, click the two links to the left to get started with Pachyderm:
- Click the Jupyter link and launch the demo.ipynb notebook.
- Click the Dashboard link to launch the Pachyderm Enterprise Dashboard.
Usage
Prerequisites
Start by preparing a Kubernetes cluster, either with one of the infrastructure components or by using your own cluster.
Component Usage
```hcl
module "pachyderm" {
  source = "combinator-ml/pachyderm/k8s"
  # Optional settings go here
}
```
See the full configuration options below.
Requirements
Name | Version |
---|---|
helm | ~> 2.1.2 |
kubernetes | ~> 2.2.0 |
null | ~> 3.1.0 |
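If your root configuration pins providers itself, a `terraform` block matching the constraints above might look like the following. The `hashicorp/...` source addresses are the standard registry locations for these providers; the block is a sketch for the calling configuration, not copied from the module's source.

```hcl
terraform {
  required_providers {
    # Same version constraints as the Requirements table.
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.1.2"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.2.0"
    }
    null = {
      source  = "hashicorp/null"
      version = "~> 3.1.0"
    }
  }
}
```

The `~>` operator allows only the rightmost version component to increase, so `~> 2.1.2` accepts 2.1.x releases from 2.1.2 upward but not 2.2.0.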
Providers
Name | Version |
---|---|
helm | ~> 2.1.2 |
kubernetes | ~> 2.2.0 |
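Because the module uses the `helm` and `kubernetes` providers, the calling configuration has to point both of them at the target cluster. A minimal sketch, assuming credentials live in a kubeconfig file at `~/.kube/config` (adjust the path, or add a context, for your environment):

```hcl
# Assumption: cluster credentials are in the default kubeconfig file.
provider "kubernetes" {
  config_path = "~/.kube/config"
}

provider "helm" {
  # The helm provider takes its own nested Kubernetes connection block.
  kubernetes {
    config_path = "~/.kube/config"
  }
}
```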
Modules
No Modules.
Resources
Name |
---|
helm_release |
kubernetes_namespace |
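To make the resource list concrete, here is a hypothetical sketch of how a `kubernetes_namespace` and a `helm_release` typically fit together in a module like this. It is not the module's actual source: the resource labels, the chart name, and the `chart_repository` variable are placeholders, and the `values` default is simplified to an empty list.

```hcl
# Hypothetical sketch only; labels and names are illustrative placeholders.
variable "namespace" {
  type    = string
  default = "pachyderm"
}

variable "values" {
  type    = list(string)
  default = [] # simplified; see the Inputs table for the real default
}

variable "chart_repository" {
  # Hypothetical: the Helm repository that hosts the Pachyderm chart.
  type = string
}

resource "kubernetes_namespace" "this" {
  metadata {
    name = var.namespace
  }
}

# Referencing the namespace resource makes Terraform create it before the release.
resource "helm_release" "pachyderm" {
  name       = "pachyderm"
  repository = var.chart_repository
  chart      = "pachyderm"
  namespace  = kubernetes_namespace.this.metadata[0].name
  values     = var.values
}

output "namespace" {
  value = kubernetes_namespace.this.metadata[0].name
}
```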
Inputs
Name | Description | Type | Default | Required |
---|---|---|---|---|
namespace | (Optional) The namespace to install the release into. | string | "pachyderm" | no |
values | (Optional) List of values in raw yaml to pass to helm. See https://github.com/pachyderm/helmchart/blob/master/pachyderm/values.yaml. | list(string) | [ | no |
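Both inputs are set directly in the module block. The sketch below overrides the namespace and passes one raw YAML document through `values`. The YAML keys are placeholders, so check the linked values.yaml for the keys your chart version actually supports.

```hcl
module "pachyderm" {
  source = "combinator-ml/pachyderm/k8s"

  # Install into a custom namespace instead of the default "pachyderm".
  namespace = "pachyderm-dev"

  # Each list element is a raw YAML string handed to Helm.
  # Placeholder keys: replace with real keys from the chart's values.yaml.
  values = [
    <<-EOT
    exampleKey:
      exampleSetting: example-value
    EOT
  ]
}
```

When several YAML strings are supplied, the helm provider merges them in order, as Helm does with multiple `-f` files, so later entries override earlier ones.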
Outputs
Name | Description |
---|---|
namespace | Namespace is the kubernetes namespace of the release. |
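Other resources in the same configuration can consume the `namespace` output, for example to deploy something alongside Pachyderm. A minimal sketch, assuming the module block is named `pachyderm`; the config map name and data are hypothetical:

```hcl
# Re-export the namespace from the root module.
output "pachyderm_namespace" {
  description = "Namespace where Pachyderm was installed."
  value       = module.pachyderm.namespace
}

# Hypothetical example resource placed in the same namespace as Pachyderm.
resource "kubernetes_config_map" "example" {
  metadata {
    name      = "example-config"
    namespace = module.pachyderm.namespace
  }

  data = {
    note = "deployed next to Pachyderm"
  }
}
```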