Setup
This section describes the steps the platform team must follow to deploy and operate the MLOps platform on its GCP infrastructure. The offering uses Rafay Environment Manager to provision and manage the required infrastructure in GCP.
At a high level, the administrator must complete the following steps to make the platform operational on their infrastructure:
- Load System Template into Org
- Customize Template
- Deploy Template
The sequence diagram below illustrates these high-level steps.
```mermaid
sequenceDiagram
    participant plat as Platform Team
    participant rafay as Environment Manager
    participant idp as Identity Provider
    participant gcp as Google Cloud
    plat->>rafay: Load System Template
    plat->>rafay: Customize Template
    plat->>rafay: Deploy Environment Template
    rafay->>gcp: Provision Infrastructure
    rafay->>idp: Integrate MLOps<br> with Corporate IdP (OKTA)
    rafay-->>plat: Setup Complete
```
Select and Share the GKE System Template
- As an Org Admin, navigate to Settings > Template Catalog.
- Select the GCP category, where the Kubeflow on GCP template is listed.
- Click Get Started.
- Provide the following details:
    - A unique name for the shared template.
    - A version name (e.g., v1).
    - An existing project, or a new project, to share the template with.
- Click Continue.
- The platform redirects you to the selected project (kubeflow-gcp).
- Navigate to Agents and select the agent that will drive the workflow. It is recommended to use a newly deployed agent running the latest version.
- Save the template as a draft, or set it as the Active Version. See the version management documentation for details.
Once complete, the new environment card appears in your organization under Environments > Environments.
Input Variables
The following input variables can be set to customize the template before deployment.
Name | Description | Default Value | Value Type | Restricted Values |
---|---|---|---|---|
Ingress Domain | Rafay uses a Rafay-provided domain. Custom lets you provide your own domain for hosting the Kubeflow UI endpoint. | Rafay | text | Rafay, Custom |
Kubeflow Host Name | Hostname for the Kubeflow UI endpoint. Required only when "Ingress Domain" is set to Custom. | | text | |
Kubeflow Host Cert | Host certificate for the Kubeflow URL domain. Required only when "Ingress Domain" is set to Custom. | | text | |
Kubeflow Host Key | Key for the Kubeflow host certificate. Required only when "Ingress Domain" is set to Custom. | | text | |
Okta Client ID | Okta client ID | | text | |
Okta Client Secret | Okta client secret | | text | |
Okta Domain | Okta domain | | text | |
GCP Project | GCP project in which the cluster and associated resources are created | | text | |
GCP Region | GCP region in which the cluster and associated resources are created | us-west1 | text | |
GCP SQL Username | Name of the user to create on the GCP SQL instance | mlops-db | text | |
GCP SQL User Password | Password for the new GCP SQL user | | text | |
GCP Kubeflow Bucket Name | Name of the Kubeflow bucket to create | kubeflow_bucket | text | |
GCP MLflow Bucket Name | Name of the MLflow bucket to create | mlflow_bucket | text | |
GCP MLflow Service Account | GCP service account used by MLflow through Workload Identity | gcp-mlflow-tracking-sa | text | |
GCP Redis Instance Memory Size | Memory size of the GCP Redis instance (GB) | 1 | text | |
GCP Redis Instance Tier | Tier of the GCP Redis instance | BASIC | text | |
GCP SQL Instance Name | Name of the GCP SQL instance | mlops-instance | text | |
GCP SQL Instance Tier | Tier of the GCP SQL instance | db-f1-micro | text | |
GCP SQL Root Password | Root password of the GCP SQL instance | | text | |
cluster_name | Name of the cluster where the installation will be performed | | text | |
GKE Network Name | GKE Network Name | default | text | |
Istio SVC Type | Istio Service Type | LoadBalancer | text | LoadBalancer, ClusterIP, NodePort |
Cert Manager Enabled | Enable Cert Manager | false | text | true, false |
Enable Culling | Cull Notebooks after a period of inactivity | true | text | true, false |
Cull Idle Time | Time before Notebook Culling (minutes) | 30 | text | |
Kubeflow Static User Email | Kubeflow Static User Email | user@example.com | text | |
Kubeflow Static User Password | Kubeflow Static User Password | user | text | |
Kubeflow MySQL Port | Kubeflow MySQL Port | 3306 | text | |
Manage Feast Redis Externally | Whether the Feast Redis store is hosted externally in GCP (true) or locally in the cluster (false) | false | text | true, false |
Feast Redis Instance Name | Name of the GCP Redis instance to create for Feast | feast-online-store | text | |
Feast Redis Port | Feast Redis Port | 6379 | text | |
Feast MySQL Port | Feast MySQL Port | 3306 | text | |
Pipeline External S3 Host | External Pipeline S3 Host | storage.googleapis.com | text | |
Pipeline External S3 Region | External Pipeline S3 Region | auto | text | |
Istio SVC LoadBalancer Type | Istio Service LoadBalancer Type | External | text | Internal, External |
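The restricted values and the conditional requirements in the table (for example, the host name, certificate, and key are needed only for a Custom ingress domain) can be checked before deploying. The following is a minimal Python sketch, not part of Rafay's tooling, that validates a dictionary of inputs. It assumes the keys match the display names in the table (the template's internal variable names may differ) and treats inputs without a default value as mandatory, which the table does not state explicitly.

```python
from __future__ import annotations

# Hypothetical pre-deployment check for the Kubeflow on GCP template inputs.
# Assumption: keys use the display names from the Input Variables table;
# the template's internal variable names may differ.

# Inputs with a fixed set of allowed values (all values are passed as text).
RESTRICTED = {
    "Ingress Domain": {"Rafay", "Custom"},
    "Istio SVC Type": {"LoadBalancer", "ClusterIP", "NodePort"},
    "Istio SVC LoadBalancer Type": {"Internal", "External"},
    "Cert Manager Enabled": {"true", "false"},
    "Enable Culling": {"true", "false"},
    "Manage Feast Redis Externally": {"true", "false"},
}

# Needed only when "Ingress Domain" is Custom.
CUSTOM_DOMAIN_INPUTS = ("Kubeflow Host Name", "Kubeflow Host Cert", "Kubeflow Host Key")

# Assumption: inputs with no default in the table are treated as mandatory.
ALWAYS_REQUIRED = (
    "Okta Client ID",
    "Okta Client Secret",
    "Okta Domain",
    "GCP Project",
    "GCP SQL User Password",
    "GCP SQL Root Password",
    "cluster_name",
)

# Text inputs that should hold a positive integer (sizes, minutes, ports).
INTEGER_INPUTS = (
    "GCP Redis Instance Memory Size",
    "Cull Idle Time",
    "Kubeflow MySQL Port",
    "Feast Redis Port",
    "Feast MySQL Port",
)


def validate_inputs(values: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means no rule was violated."""
    problems: list[str] = []
    for name, allowed in RESTRICTED.items():
        if name in values and values[name] not in allowed:
            problems.append(f"{name}: {values[name]!r} not in {sorted(allowed)}")
    for name in ALWAYS_REQUIRED:
        if not values.get(name):
            problems.append(f"{name}: value is required")
    # "Ingress Domain" defaults to Rafay, which needs no host name, cert, or key.
    if values.get("Ingress Domain", "Rafay") == "Custom":
        for name in CUSTOM_DOMAIN_INPUTS:
            if not values.get(name):
                problems.append(f"{name}: required when Ingress Domain is Custom")
    for name in INTEGER_INPUTS:
        raw = values.get(name)
        if raw is not None and not (raw.isdigit() and int(raw) > 0):
            problems.append(f"{name}: {raw!r} is not a positive integer")
    return problems


if __name__ == "__main__":
    # Custom domain without a cert/key, and no Okta or GCP inputs set.
    example = {
        "Ingress Domain": "Custom",
        "Kubeflow Host Name": "kubeflow.example.com",
        "Cull Idle Time": "30",
    }
    for problem in validate_inputs(example):
        print(problem)
```

Because every Value Type in the table is text, the sketch compares booleans as the strings "true" and "false" rather than as Python booleans, and parses numeric fields such as ports and Cull Idle Time from strings.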