Setup

This section describes the steps that the platform team has to follow to deploy and operate the MLOps platform on their GCP infrastructure. This offering uses Rafay Environment Manager to provision and manage infrastructure in GCP.

At a high level, the administrator follows these steps to make the platform operational on their infrastructure:

  1. Load System Template into Org
  2. Customize Template
  3. Deploy Template

The sequence diagram below shows these high-level steps.

sequenceDiagram 
    participant plat as Platform Team 
    participant rafay as Environment Manager 
    participant idp as Identity Provider 
    participant gcp as Google Cloud

    plat->>rafay: Load System Template
    plat->>rafay: Customize Template
    plat->>rafay: Deploy Environment Template
    rafay->>gcp: Provision Infrastructure
    rafay->>idp: Integrate MLOps<br> with Corporate IdP (OKTA)
    rafay-->>plat: Setup Complete 

Select and Share the GKE System Template

  • As an Org Admin, navigate to Settings > Template Catalog.

Template Catalog

  • Select the GCP category, where the Kubeflow on GCP template is listed.

Kubeflow Template

  • Click Get Started.

Get Started

  • Provide the following details:
      • A unique name for the shared template.
      • A version name (e.g., v1).
      • Select an existing project or create a new project to share the template with.
  • Click Continue.

Project Shared

  • The platform redirects you to the selected project (kubeflow-gcp).
  • Navigate to Agents and select the agent that will drive the workflow. Note: it is recommended to use a newly deployed agent running the latest version.

Agent EM

  • Save the template as a draft or set it as an Active Version. Learn more about version management here.

Once complete, the new environment card appears in your organization under Environments > Environments.

Environment Template

Input Variables

The following input variables can be configured to customize the template before deployment.

| Name | Description | Value | Value Type | Restricted Values |
| --- | --- | --- | --- | --- |
| Ingress Domain | Selecting Rafay will use a Rafay provided domain. Selecting Custom will allow the user to provide their own domain for hosting the Kubeflow UI endpoint. | Rafay | text | Rafay, Custom |
| Kubeflow Host Name | The Kubeflow hostname that will be used for the Kubeflow UI endpoint. This is only required when "Ingress Domain" is set to Custom. | | text | |
| Kubeflow Host Cert | The host certificate for the Kubeflow URL domain that will be used. This is only required when "Ingress Domain" is set to Custom. | | text | |
| Kubeflow Host Key | The host certificate key for the Kubeflow URL domain that will be used. This is only required when "Ingress Domain" is set to Custom. | | text | |
| Okta Client ID | Client ID for Okta | | text | |
| Okta Client Secret | Client Secret for Okta | | text | |
| Okta Domain | Domain of Okta | | text | |
| GCP Project | The GCP Project this cluster and associated resources should be located in | | text | |
| GCP Region | The GCP region this cluster and associated resources should be located in | us-west1 | text | |
| GCP SQL Username | User will be created on SQL instance | mlops-db | text | |
| GCP SQL User Password | GCP SQL Password for the new user | | text | |
| GCP Kubeflow Bucket Name | Kubeflow Bucket resource to create | kubeflow_bucket | text | |
| GCP MLflow Bucket Name | MLflow Bucket resource to create | mlflow_bucket | text | |
| GCP MLflow Service Account | GCP MLFlow service account for workload identity | gcp-mlflow-tracking-sa | text | |
| GCP Redis Instance Memory Size | GCP Redis Instance Memory (GB) | 1 | text | |
| GCP Redis Instance Tier | GCP Redis Instance Tier | BASIC | text | |
| GCP SQL Instance Name | Name of GCP SQL Instance | mlops-instance | text | |
| GCP SQL Instance Tier | Tier of GCP SQL Instance | db-f1-micro | text | |
| GCP SQL Root Password | Root Password of the GCP SQL resource | | text | |
| cluster_name | Name of the Cluster where the installation will be performed | | text | |
| GKE Network Name | GKE Network Name | default | text | |
| Istio SVC Type | Istio Service Type | LoadBalancer | text | LoadBalancer, ClusterIP, NodePort |
| Cert Manager Enabled | Enable Cert Manager | false | text | true, false |
| Enable Culling | Cull Notebooks after a period of inactivity | true | text | true, false |
| Cull Idle Time | Time before Notebook Culling (minutes) | 30 | text | |
| Kubeflow Static User Email | Kubeflow Static User Email | user@example.com | text | |
| Kubeflow Static User Password | Kubeflow Static User Password | user | text | |
| Kubeflow MySQL Port | Kubeflow MySQL Port | 3306 | text | |
| Manage Feast Redis Externally | Flag to indicate if Redis is hosted externally in gke or locally in cluster | false | text | true, false |
| Feast Redis Instance Name | GCP Feast Redis Instance resource to create | feast-online-store | text | |
| Feast Redis Port | Feast Redis Port | 6379 | text | |
| Feast MySQL Port | Feast MySQL Port | 3306 | text | |
| Pipeline External S3 Host | External Pipeline S3 Host | storage.googleapis.com | text | |
| Pipeline External S3 Region | External Pipeline S3 Region | auto | text | |
| Istio SVC LoadBalancer Type | Istio Service LoadBalancer Type | External | text | Internal, External |
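
Because several variables in the table above accept only a fixed set of values, and three are required only when "Ingress Domain" is set to Custom, it can help to sanity-check the values before entering them in the template. The following is a minimal Python sketch of such a check. The function `validate_inputs` and the dictionaries it uses are hypothetical helpers written for illustration. They are not part of Rafay Environment Manager or any Rafay API, and they only mirror the names and restricted values listed in the table.

```python
# Hypothetical pre-flight check for the template input variables.
# Names and allowed values are copied from the table above; nothing here
# calls a Rafay API.

# Variables whose values are limited to a fixed set ("Restricted Values" column).
RESTRICTED = {
    "Ingress Domain": {"Rafay", "Custom"},
    "Istio SVC Type": {"LoadBalancer", "ClusterIP", "NodePort"},
    "Cert Manager Enabled": {"true", "false"},
    "Enable Culling": {"true", "false"},
    "Manage Feast Redis Externally": {"true", "false"},
    "Istio SVC LoadBalancer Type": {"Internal", "External"},
}

# Variables the table marks as required only when "Ingress Domain" is Custom.
CUSTOM_DOMAIN_FIELDS = (
    "Kubeflow Host Name",
    "Kubeflow Host Cert",
    "Kubeflow Host Key",
)


def validate_inputs(values: dict[str, str]) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []

    # Check every supplied restricted variable against its allowed set.
    for name, allowed in RESTRICTED.items():
        if name in values and values[name] not in allowed:
            problems.append(
                f"{name}: {values[name]!r} is not one of {sorted(allowed)}"
            )

    # A custom ingress domain needs a hostname, certificate, and key.
    if values.get("Ingress Domain") == "Custom":
        for name in CUSTOM_DOMAIN_FIELDS:
            if not values.get(name):
                problems.append(f"{name}: required when Ingress Domain is Custom")

    return problems


if __name__ == "__main__":
    example = {"Ingress Domain": "Custom", "Istio SVC Type": "LoadBalancer"}
    for problem in validate_inputs(example):
        print(problem)
```

Variables that are omitted are skipped by the restricted-value check, on the assumption that the template then falls back to the value shown in the Value column.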