
System GKE Template

Overview

This system template allows you to configure, templatize, and provision a GKE cluster using the OpenTofu provider for GCP. The templates are designed to support both Day 0 (initial setup) and Day 2 (ongoing management and maintenance) operations.

The template enables users to provision and manage the lifecycle of a GKE cluster and the add-ons defined in cluster blueprints. As part of the template output, the end user receives a kubeconfig file that grants cluster-wide privileges and enables secure access to the cluster.
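The kubeconfig handed back by the template is a standard Kubernetes client configuration. The sketch below is a hypothetical, minimal example (cluster names and the server URL are placeholders, not values the template actually emits) showing how the API server endpoint is resolved from such a file's structure:

```python
# Minimal kubeconfig-shaped dict; all values are hypothetical placeholders.
kubeconfig = {
    "apiVersion": "v1",
    "kind": "Config",
    "clusters": [
        {"name": "gke-demo", "cluster": {"server": "https://34.66.0.10"}},
    ],
    "contexts": [
        {"name": "gke-demo", "context": {"cluster": "gke-demo", "user": "admin"}},
    ],
    "current-context": "gke-demo",
}

def api_server(cfg: dict) -> str:
    """Return the API server URL for the current context's cluster."""
    ctx_name = cfg["current-context"]
    ctx = next(c for c in cfg["contexts"] if c["name"] == ctx_name)["context"]
    cluster = next(c for c in cfg["clusters"] if c["name"] == ctx["cluster"])
    return cluster["cluster"]["server"]

print(api_server(kubeconfig))  # https://34.66.0.10
```

In practice you would simply point `kubectl` at the downloaded file (for example via the `KUBECONFIG` environment variable) rather than parse it by hand.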

Initial Setup

The platform team is responsible for performing the initial configuration and setup of the GKE template. The sequence diagram below outlines the high-level steps. In this process, the platform team shares the template from the system catalog into the project they manage, and then shares it downstream with end users.

```mermaid
sequenceDiagram
    participant Admin as Platform Admin
    participant Catalog as System Catalog
    participant Project as End User Project

    Admin->>Catalog: Selects GKE Template from System Catalog
    Admin->>Project: Shares Template with Predefined Controls
    Project-->>Admin: Template Available in End User's Project
```

End User Flow

The end user launches a shared template, provides required input values, and deploys the cluster.

```mermaid
sequenceDiagram
    participant User as End User
    participant Project as Rafay Project
    participant Cluster as GCP Infra

    User->>Project: Launches Shared Template for GKE
    User->>Project: Provides Required Input Values (API Key, Secret, GCP Service Account Details)
    User->>Project: Clicks "Deploy"
    Project->>Cluster: Provisions a GKE Cluster on GCP Infra
    Cluster-->>User: Cluster Deployed Successfully
```


Resources:

This system template will deploy the following resources:

  • GKE Cluster on the GCP Infrastructure.

Prerequisites

  1. GCP Credentials:

    • Ensure necessary permissions to create and manage GCP resources.
    • Refer to the required IAM roles listed here.
    • Service Account JSON credentials (gcp-credentials.json) must be provided at the time of launching the template.
  2. Rafay Configuration:

    At template launch, supply the following configuration values:

    • API_KEY: The API key for the Rafay controller.
    • API_SECRET: The API secret for the Rafay controller.
    • gcp-credentials.json: The GCP service account JSON file.


  3. Agent Configuration:

    An agent must be configured in the project where the template will be used. Follow these instructions to deploy an agent. Existing agents can also be reused.
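Before launching the template, it can help to sanity-check the service account key file locally. The sketch below is an illustrative pre-flight check, not part of the template itself; the field names follow the standard GCP service-account key JSON format:

```python
import json

# Required fields per the standard GCP service-account key format.
REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email"}

def check_service_account_key(raw: str) -> list[str]:
    """Return a list of problems found in a service-account JSON string."""
    try:
        creds = json.loads(raw)
    except json.JSONDecodeError:
        return ["file is not valid JSON"]
    problems = []
    missing = REQUIRED_FIELDS - creds.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if creds.get("type") != "service_account":
        problems.append("'type' must be 'service_account'")
    return problems
```

Running `check_service_account_key(open("gcp-credentials.json").read())` on a valid key returns an empty list; anything else lists what needs fixing before launch.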


Input Variables for GKE System Template

| Name | Default Value | Description |
|---|---|---|
| Logging components* | `[]` | List of services to monitor: SYSTEM_COMPONENTS, APISERVER, CONTROLLER_MANAGER, SCHEDULER, and WORKLOADS. Empty list disables logging. |
| Enable Cloud TPU* | `false` | Enable Cloud TPU resources in the cluster. WARNING: changing this after cluster creation is destructive! |
| GCP Project ID of shared VPC's host | | The project ID of the shared VPC's host (for shared VPC support). |
| Default max pods per node* | `110` | The maximum number of pods to schedule per node. |
| Datapath provider* | `DATAPATH_PROVIDER_UNSPECIFIED` | Allowed: [DATAPATH_PROVIDER_UNSPECIFIED, LEGACY_DATAPATH, ADVANCED_DATAPATH]. Sets the desired datapath provider for this cluster. |
| Node pools* | `[ { name="default-node-pool", ... } ]` | List of maps containing node pool configurations (e.g., machine type, disk size, autoscaling settings). |
| Use control plane's external IP* | `true` | When false, the cluster's private endpoint is used, and access through the public endpoint is disabled. |
| Disable default SNAT* | `false` | Whether to disable the default SNAT for private use of public IP addresses. |
| Enable legacy authorization* | `false` | Enable the ABAC authorizer. Provides static permissions beyond those in RBAC or IAM. Defaults to false. |
| Services secondary range name | `null` | The name of the secondary subnet IP range for services. |
| Enable network policy addon* | `false` | Enable the network policy addon. |
| Enable Dataplane V2 metrics* | `false` | Whether advanced datapath metrics are enabled. |
| Issue a client certificate* | `false` | Issues a client certificate for authentication. Changing this after cluster creation is destructive. |
| Network name* | `default` | The VPC network to host the cluster. |
| Services IP address range* | `null` | IP address range for services IPs. Can be blank, netmask (e.g., /14), or CIDR (e.g., 10.96.0.0/14). |
| Enable private cluster* | `false` | Whether to enable private cluster network access. |
| Enable binary authorization* | `false` | Enable Binary Authorization Admission controller. |
| Network policy provider* | `CALICO` | Allowed: [PROVIDER_UNSPECIFIED, CALICO]. Sets the network policy provider for the cluster. |
| Authorized networks* | `[]` | List of master authorized networks in CIDR format. |
| Enable Backup for GKE* | `false` | Enable Backup for GKE agent in the cluster. |
| Gateway API Channel* | `CHANNEL_DISABLED` | Allowed: [CHANNEL_DISABLED, CHANNEL_STANDARD]. Configures the gateway API channel for the cluster. |
| Private Endpoint Subnetwork* | `null` | Subnetwork for master's private endpoint. |
| Pods IP address range (route-based) | `null` | IP address range for Kubernetes pods. Defaults to an automatically assigned CIDR. |
| Enable Compute Engine Persistent Disk CSI Driver* | `true` | Whether to enable the Google Compute Engine Persistent Disk Container Storage Interface (CSI) Driver. |
| Blueprint Version* | `latest` | Version of the blueprint assigned to the cluster. Use latest for system blueprints. |
| Enable Maintenance window* | `false` | Whether to enable maintenance windows. |
| Kubernetes version* | `1.30` | Allowed: [1.28, 1.29, 1.30, 1.31]. Sets the Kubernetes version for the cluster. |
| Pods secondary range name | `null` | Name of the secondary subnet IP range for pods. |
| Maintenance exclusions | `[]` | List of maintenance exclusions with start/end times and exclusion scopes. |
| Enable Intranode visibility* | `false` | Enable intra-node visibility for VPC network traffic. |
| Ray Operator Config* | `{ enabled=false, ... }` | Configuration for the Ray Operator Addon. |
| Regional cluster* | `true` | Whether the cluster is regional. Setting to false creates a zonal cluster. |
| Region* | `us-central1` | The region to host the cluster. |
| Rafay project name* | `$(environment.project.name)$` | Name of the Rafay project. Defaults to the environment's project name. |
| Cluster description* | `Rafay managed cluster` | Description of the cluster. |
| Cloud monitoring components* | `[]` | List of services to monitor in GCP. |
| Google Groups for RBAC* | `null` | Name of the RBAC security group for use with Google security groups in Kubernetes RBAC. |
| Cluster Name* | `$(environment.name)$` | Name of the cluster. Defaults to the environment name. |
| Subnetwork name* | `default` | The subnetwork to host the cluster. |
| GCP Project ID* | `dev-382813` | The project ID to host the cluster (required). |
| Enable Secret manager* | `false` | Enable the Secret Manager add-on for this cluster. |
| Maintenance Recurrence* | `FREQ=WEEKLY;BYDAY=MO,...` | Recurrence frequency for maintenance windows in RFC5545 format. |
| Firewall Rules* | `[ { name="my-custom-rule", ... } ]` | Firewall rules to be created for clusters. |
| Maintenance start time* | `05:00` | Start time for daily or recurring maintenance operations (RFC3339 format). |
| Rest Endpoint* | `<UPDATE HERE>` | Select the endpoint of the controller. |
| API Key* | | Enter the API key of the controller. |
| API Secret* | | Enter the API secret of the controller. |
| gcp-credentials.json* | | Provide the cloud credentials for creating the GKE cluster. |
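Several of the variables above only accept values from a fixed set. The sketch below validates a proposed configuration against those sets before launch; the snake_case keys approximate the table's variable names and are not necessarily the template's internal identifiers:

```python
# Allowed values taken from the input-variables table; keys are illustrative.
ALLOWED = {
    "datapath_provider": {
        "DATAPATH_PROVIDER_UNSPECIFIED", "LEGACY_DATAPATH", "ADVANCED_DATAPATH",
    },
    "network_policy_provider": {"PROVIDER_UNSPECIFIED", "CALICO"},
    "gateway_api_channel": {"CHANNEL_DISABLED", "CHANNEL_STANDARD"},
    "kubernetes_version": {"1.28", "1.29", "1.30", "1.31"},
}

def invalid_choices(config: dict) -> dict:
    """Return {key: bad_value} for every constrained key set to a disallowed value."""
    return {
        k: v for k, v in config.items()
        if k in ALLOWED and v not in ALLOWED[k]
    }

config = {
    "kubernetes_version": "1.30",
    "datapath_provider": "ADVANCED_DATAPATH",
    "gateway_api_channel": "CHANNEL_WRONG",   # deliberately invalid
}
print(invalid_choices(config))  # {'gateway_api_channel': 'CHANNEL_WRONG'}
```

An empty result means every constrained variable holds an allowed value; unconstrained keys pass through unchecked.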

Launch Time

Launching a GKE cluster using this template typically takes 15 to 20 minutes.