
Design

Rafay's MLOps platform is an enterprise-ready platform based on Kubeflow and tightly integrated with a number of related add-ons and community software. It is optimized for specific infrastructure types and providers, which streamlines both deployment and ongoing lifecycle management.

The architecture diagram represents the variant of Rafay's MLOps platform that is designed and optimized to be operated on Google Cloud Platform (GCP).

Architecture


Infrastructure

At the core of the architecture is Google Kubernetes Engine (GKE) operating in a GCP Project. Data is persisted and accessed through Google Cloud Storage (GCS), while relational data is managed via Google Cloud SQL, and Redis is deployed for online feature storage.

Within the GKE cluster, Istio is used for managing the service mesh and traffic control. Feast, a feature store, is integrated into the platform for managing and serving machine learning features.
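To make the role of an online feature store concrete, here is a minimal sketch of a feature lookup by entity key. It is only an illustration under stated assumptions: the `InMemoryOnlineStore` class, the `driver_1001` key, and the feature names are hypothetical, a plain dictionary stands in for Redis, and none of this is the Feast API.

```python
from typing import Any


class InMemoryOnlineStore:
    """Hypothetical stand-in for an online feature store (a dict instead of Redis)."""

    def __init__(self) -> None:
        # Maps an entity key (for example a driver ID) to its latest feature values.
        self._rows: dict[str, dict[str, Any]] = {}

    def write(self, entity_key: str, features: dict[str, Any]) -> None:
        # Newer values overwrite older ones, so only the latest value is kept.
        self._rows.setdefault(entity_key, {}).update(features)

    def read(self, entity_key: str, feature_names: list[str]) -> dict[str, Any]:
        # Unknown entities or features come back as None instead of raising.
        row = self._rows.get(entity_key, {})
        return {name: row.get(name) for name in feature_names}


store = InMemoryOnlineStore()
store.write("driver_1001", {"trips_today": 12, "avg_rating": 4.8})
store.write("driver_1001", {"trips_today": 13})  # a fresher value arrives
features = store.read("driver_1001", ["trips_today", "avg_rating", "acceptance_rate"])
print(features)
```

The point of the sketch is the access pattern: a model server asks for the latest values of a few named features for one entity and gets them back in a single keyed read, which is the kind of workload a key-value store such as Redis suits.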

User Components

The platform is designed to be used by data scientists, researchers, and MLOps engineers. They are provided access to Kubeflow, a machine learning toolkit, along with its associated components such as Jupyter notebooks for interactive computing, Kubeflow Pipelines for orchestrating machine learning workflows, and MLflow for tracking experiments and model lifecycle management.
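As a conceptual illustration of what orchestrating a machine learning workflow involves, the sketch below orders a set of steps so that each one runs only after its dependencies. It uses only the Python standard library and is not the Kubeflow Pipelines SDK; the step names and the `execution_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each step maps to the set of steps it depends on.
WORKFLOW = {
    "load_data": set(),
    "validate_data": {"load_data"},
    "build_features": {"load_data"},
    "train_model": {"validate_data", "build_features"},
    "evaluate_model": {"train_model"},
}


def execution_order(workflow: dict[str, set[str]]) -> list[str]:
    """Return one valid order in which every step follows its dependencies."""
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(workflow).static_order())


order = execution_order(WORKFLOW)
print(order)
```

Here `validate_data` and `build_features` do not depend on each other, so either may come first; a real orchestrator could run such independent steps in parallel.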

Authentication

The platform authenticates users through a corporate Identity Provider (IdP) such as Okta. The integration uses Dex to manage authentication within the Kubernetes environment.
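Dex exposes an OpenID Connect (OIDC) interface to applications, so a login typically starts by redirecting the browser to an authorization URL. The sketch below builds such a URL for the standard authorization code flow using only the Python standard library. The endpoint, client ID, and redirect URI are placeholder assumptions, not values from the platform; in a real deployment the endpoint would be read from the issuer's `/.well-known/openid-configuration` discovery document.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlsplit


def build_authorization_url(
    authorization_endpoint: str, client_id: str, redirect_uri: str
) -> tuple[str, str]:
    """Build an OIDC authorization code request URL plus the state to verify later."""
    # A random state value lets the client reject forged callbacks (CSRF protection).
    state = secrets.token_urlsafe(16)
    query = urlencode(
        {
            "response_type": "code",  # authorization code flow
            "client_id": client_id,
            "redirect_uri": redirect_uri,
            "scope": "openid profile email",
            "state": state,
        }
    )
    return f"{authorization_endpoint}?{query}", state


# All three values below are hypothetical placeholders.
url, state = build_authorization_url(
    "https://dex.example.com/auth",
    "example-client",
    "https://app.example.com/callback",
)
params = parse_qs(urlsplit(url).query)
print(url)
```

After the user signs in with the upstream Identity Provider, the browser is sent back to the redirect URI with a `code` and the same `state`; the application checks the state before exchanging the code for tokens.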


High Level Steps

The image below describes the three simple steps that an administrator has to follow to configure and deploy the platform in their Rafay Org.


Watch this video to see the end-to-end experience of an administrator configuring and deploying the platform in their Rafay Org.