Overview

Kubeflow Pipelines (KFP) provides a platform for building and deploying portable, scalable machine learning (ML) workflows on Kubernetes. A pipeline is a set of automated steps that defines how data is processed and transformed and how ML models are trained and deployed. KFP is one of the most popular components of the Kubeflow platform.

A pipeline is a definition of a workflow that composes one or more components together to form a computational directed acyclic graph (DAG). At runtime, each component execution corresponds to a single container execution, which may create ML artifacts.


Capabilities

Some of the key capabilities of KFP are described below.

Automation

KFP automates the process of building and deploying ML models, making it easier to develop and maintain high-quality models over time. This can save time and effort and reduce the risk of errors.

Reproducibility

KFP pipelines are defined as version-controlled specifications (i.e., pipelines as code), making them easy to reproduce and debug. Workflows are written as Python functions decorated with Kubeflow’s domain-specific language (DSL). This is important for ensuring the reliability and reproducibility of ML models.

Collaboration

KFP supports multiple users and allows you to track and manage the progress of your ML workflows. This makes it easier to collaborate with other team members and stakeholders.

Scalability

KFP can run on large-scale compute resources, allowing your ML workflows to scale to large volumes of data. KFP executes compute-intensive steps as Kubernetes Pods and Jobs, leveraging Kubernetes for distribution and scaling.
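For example, resource requests can be attached to individual tasks so that Kubernetes schedules them onto appropriately sized nodes. The following is a minimal sketch using the KFP v2 SDK; the component name, pipeline name, and resource values are illustrative.

```python
from kfp import dsl


@dsl.component
def heavy_preprocess(rows: int):
    # Stand-in for a compute-intensive step; runs in its own container.
    print(f"Processing {rows} rows")


@dsl.pipeline(name="scaling-example")
def scaling_pipeline(rows: int = 1_000_000):
    # Resource limits set on a task are applied to the underlying pod spec.
    task = heavy_preprocess(rows=rows)
    task.set_cpu_limit("4")
    task.set_memory_limit("16G")
```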

Portability

KFP pipelines can be deployed on any Kubernetes cluster, making it easy to move your ML workflows between different environments.

Note

Users may wish to review this Get Started Guide, which demonstrates all the capabilities listed above.


Typical Steps

Author Pipeline

Users author components and pipelines using the KFP Python SDK.
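A minimal sketch of what authoring looks like with the KFP v2 SDK is shown below; the component, pipeline, and parameter names are illustrative.

```python
from kfp import dsl


@dsl.component
def train_model(learning_rate: float) -> str:
    # Placeholder training step; at runtime this body executes inside its own container.
    print(f"Training with learning_rate={learning_rate}")
    return "model-v1"


@dsl.pipeline(name="example-training-pipeline")
def training_pipeline(learning_rate: float = 0.01):
    # Each task in the DAG corresponds to one container execution.
    train_task = train_model(learning_rate=learning_rate)
```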

Compile Pipeline

Compile the pipeline to an intermediate representation (IR) YAML file.
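Continuing the sketch above, the compiler serializes the pipeline function to IR YAML; the output file name is arbitrary.

```python
from kfp import compiler

# Serialize the pipeline defined above into intermediate representation (IR) YAML.
compiler.Compiler().compile(
    pipeline_func=training_pipeline,
    package_path="training_pipeline.yaml",
)
```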

Execute Pipeline

Submit the compiled pipeline to run on a KFP-conformant deployment.
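A minimal sketch of submitting the compiled package with the SDK client follows; the host URL is a placeholder for your deployment's endpoint, and authentication details will vary.

```python
from kfp.client import Client

# The host below is a placeholder; point it at your KFP deployment's API endpoint.
kfp_client = Client(host="http://localhost:8080")

run = kfp_client.create_run_from_pipeline_package(
    "training_pipeline.yaml",
    arguments={"learning_rate": 0.001},
)
print(run.run_id)
```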


Interfaces

Users can interact with and visualize their KFP pipelines via three interfaces:

Web Console

Users can click "Pipelines" in the Kubeflow web console and perform the following tasks there:

  • Upload a pipeline
  • Create an experiment to group one or more of your pipeline runs.
  • Create and start a run within the experiment.
  • Explore the configuration, graph, and output of your pipeline run.
  • Compare the results of one or more runs within an experiment.
  • Schedule runs by creating a recurring run.

Python SDK

The KFP SDK provides a set of Python packages that you can use to specify and run your ML workflows. The current version of the SDK is v2.

Note

Users may wish to review this Get Started Guide to understand how the KFP SDK is used to define the pipeline.

REST API

The REST API is primarily meant for integration with CI/CD systems. For example, an automated workflow may need to trigger a pipeline run when new data arrives.
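As a rough illustration, a CI/CD job might POST to the runs endpoint. The path and payload below follow the shape of the v2beta1 API, but the field names, authentication, and the pipeline and version IDs are assumptions to verify against your deployment's API reference.

```python
import requests

# Assumed KFP v2beta1 REST endpoint; adjust host, auth, and API version for your deployment.
KFP_HOST = "http://localhost:8080"

response = requests.post(
    f"{KFP_HOST}/apis/v2beta1/runs",
    json={
        "display_name": "ci-triggered-run",
        # Reference a previously uploaded pipeline version (placeholder IDs).
        "pipeline_version_reference": {
            "pipeline_id": "<pipeline-id>",
            "pipeline_version_id": "<version-id>",
        },
        "runtime_config": {"parameters": {"learning_rate": 0.001}},
    },
)
response.raise_for_status()
print(response.json())
```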