tl;dr: A combinator data component that installs Pachyderm, a data lineage and pipelining solution.


Pachyderm is an open-source-driven solution that provides data lineage and pipelines. Data lineage is important for _provenance_: knowing the origin of downstream assets. In ML, the assets are often models, and the provenance describes how a model came to be. Precise knowledge of what a model was trained on is important for disaster recovery, auditing, and robustness.

Pipelines encode a process. This can be anything from automating pre-processing to training and deploying models. Pachyderm's solution is unique because it is backed by data lineage; that is, its pipelines are data-driven, not process-driven.

Test Drive

The fastest way to get started is to use the test drive functionality provided by TestFaster. Click on the "Launch Test Drive" button below (opens a new window).

💻 Launch Test Drive 💻

Quick Start Pachyderm Tutorial

Once the test drive has launched, click the two links to the left to get started with Pachyderm:

  1. Click the Jupyter link and launch the demo.ipynb notebook.
  2. Click on the Dashboard link to launch the Pachyderm Enterprise Dashboard.



Start by preparing your Kubernetes cluster using one of the infrastructure components, or use your own cluster.

Component Usage

module "pachyderm" {
  source  = "combinator-ml/pachyderm/k8s"
  # Optional settings go here
}

See the full configuration options below.
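
As a rough sketch of what the optional settings might look like, the variant below sets both documented inputs, `namespace` and `values`. The namespace name `pachyderm-dev` is a hypothetical placeholder, and the YAML simply restates the module's documented default for `values` in heredoc form (with two-space indentation assumed); adjust it to whatever the Helm chart accepts.

```hcl
module "pachyderm" {
  source = "combinator-ml/pachyderm/k8s"

  # Hypothetical namespace; the module's default is "pachyderm".
  namespace = "pachyderm-dev"

  # Raw YAML passed through to Helm. This mirrors the documented default:
  # TLS disabled, debug logging, local storage backend.
  values = [
    <<-EOT
    tls:
      certName: null # Disable TLS
      create: null # Disable TLS
    pachd:
      logLevel: debug
      storage:
        backend: LOCAL
    EOT
  ]
}
```

Because `values` is a `list(string)`, each list element is a separate raw YAML document handed to Helm.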


Requirements

| Name | Version |
|------|---------|
| helm | ~> 2.1.2 |
| kubernetes | ~> 2.2.0 |
| null | ~> 3.1.0 |


Providers

| Name | Version |
|------|---------|
| helm | ~> 2.1.2 |
| kubernetes | ~> 2.2.0 |


Modules

No Modules.




Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| namespace | (Optional) The namespace to install the release into. | string | "pachyderm" | no |
| values | (Optional) List of values in raw yaml to pass to helm. See | list(string) | "tls:\n certName: null # Disable TLS\n create: null # Disable TLS\npachd:\n logLevel: debug\n storage:\n backend: LOCAL\n" | no |


Outputs

| Name | Description |
|------|-------------|
| namespace | The Kubernetes namespace of the release. |
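
The `namespace` output can be referenced from the calling configuration. A minimal sketch, assuming the module block is named `pachyderm` as in Component Usage; the output name `pachyderm_namespace` is a hypothetical choice:

```hcl
# Re-export the namespace the release was installed into, so other
# configurations or scripts can look it up.
output "pachyderm_namespace" {
  description = "Kubernetes namespace where Pachyderm was installed."
  value       = module.pachyderm.namespace
}
```

After `terraform apply`, running `terraform output pachyderm_namespace` prints the value.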