Dec

Upcoming Release

Stay tuned for our upcoming release, which will include new features, enhancements, and bug fixes.
For more details, check out the upcoming release notes here.

v2.12 Update 1 - SaaS

26 Dec, 2024

The section below provides a brief description of the new functionality and enhancements in this release.


GPU Platform as a Service (GPU PaaS)

The primary objective of Rafay's GPU PaaS offering is to empower GPU Cloud providers and enterprise customers to deliver GPU self-service capabilities to their end users. To facilitate this, two additional portals are available:

  • PaaS Studio: Designed for administrators to create and manage pre-defined configurations (profiles), such as GPU compute, that can be made accessible to end users

  • Developer Hub: Intended for end users to log in and launch instances based on the pre-defined profiles for their consumption

Note

Limited Access: This capability is selectively enabled for customer organizations. Please contact support if you would like this feature to be enabled for your organization.

By default, these additional portals are accessible only to Organization Admins. Additional roles have been introduced for this purpose, and non-Org Admin users must be explicitly assigned one of them to gain access to these portals.

For more information on this feature, please refer here

Catalog for System Templates

System Catalog

With this release, the platform offers a catalog of pre-built system templates designed to simplify and enhance the user experience for administrators. These templates are fully supported by Rafay, with regular updates and new features added over time.

With the system templates, Rafay Admins have to follow two simple steps to provide a self service experience for their end users:

  1. Configure and customize the system template (i.e. provide credentials, specify defaults, and determine which values end users can and cannot override) in a project owned by the Platform team
  2. Publish by sharing the template with end user projects

Info

Administrators can also use the published system template with instance and service profiles in the newly introduced PaaS to provide end users with a self service experience.

The system catalog significantly improves the process of consuming capabilities available in the platform.

Key Features:

  • Shareable Templates: Organization Administrators can share the catalog templates with specific projects, enabling the creation of environment instances based on them

  • Customizable: Users can clone and customize catalog templates to align with their unique workflows. For example, a ServiceNow approval step can be added before an environment provisioning is initiated.

Note

Limited Access: System templates are selectively enabled for customer organizations. Please contact support if you would like these templates enabled for your organization.


System Templates

The following templates are available with this release. Additional system templates will be made available progressively along with incremental updates to the existing templates.

Cluster Lifecycle

| # | Template Name | Description |
|---|---|---|
| 1 | system-gke | Standardize Cluster Provisioning and Management with Google Kubernetes Engine (GKE) |
| 2 | system-mks | Standardize Cluster Provisioning and Management on Private Cloud with Rafay's Kubernetes Distribution |
| 3 | system-vsphere-mks | Standardize Cluster Provisioning and Management on VMware vSphere with Rafay's Kubernetes Distribution |

Multi-Tenancy

| # | Template Name | Description |
|---|---|---|
| 1 | system-vcluster-anyk8s | Implement vCluster-based multi-tenancy with required security controls to reduce infrastructure costs |

AI/ML

| # | Template Name | Description |
|---|---|---|
| 1 | system-kuberay-anyk8s | Enable Self-Service Deployment of Ray vClusters for Data Scientists and ML Engineers |
| 2 | system-mks-kubeflow | Implement a Kubeflow-based ML platform on private cloud |
| 3 | system-gcp-kubeflow | Implement a Kubeflow-based ML platform on GCP |
| 4 | system-kubeflowprofile-anyk8s | Create and Manage Kubeflow Profiles with Collaboration Controls |
| 5 | system-notebook-anyk8s | Enable Self-Service Deployment of Jupyter Notebooks for Data Scientists and ML Engineers |

Note

The templates are designed to support both Day 0 (initial setup) and Day 2 (ongoing management and maintenance) operations.

For more information on this feature, please refer here


v1.1.39 - Terraform Provider

13 Dec, 2024

An updated version of the Terraform provider is now available.

Enhancements

This release includes several enhancements, improving configuration options across different resources:

  • rafay_mks_cluster: Added configuration for installer_ttl (Conjurer TTL).

  • rafay_config_context, rafay_environment_template, rafay_resource_template: Added configuration for selectors to alias a variable and restrict the override scope.


Bug Fixes

| Bug ID | Description |
|---|---|
| RC-38662 | Terraform: pod_identity_associations is failing. |
| RC-38516 | EKS: Terraform provider crashes if the user provides the same name to multiple VPC subnets. |

v2.12 - SaaS

09 Dec, 2024


Upstream Kubernetes for Bare Metal and VMs

Exposing Kubelet Arguments for Clusters

Support is being added for configuring additional Kubelet arguments for upstream Kubernetes clusters (Rafay's Managed Kubernetes offering). This gives users greater flexibility to fine-tune the behavior of the Kubelet (the node-level component that manages pods and containers).

With this enhancement, administrators will be able to tailor Kubelet configurations to meet specific operational needs, ensuring optimal performance. These configurations can be applied both at the cluster level and the node level during Day 2 operations.

UI Configuration Example for Using Kubelet Args

Day 0 and Day 2 Support

  • Day 0: Configure Kubelet arguments during the initial cluster setup

Day 0

  • Day 2: Modify Kubelet arguments for existing clusters

Day 2

Node-Level Kubelet Args

For Day 2 configuration, you can also modify Kubelet arguments at the node level to meet specific operational needs.

Node-Level

RCTL Cluster Configuration Example for Using Kubelet Args

Below is an example configuration showcasing how to use the RCTL Cluster specification to configure kubeletExtraArgs at both the cluster and node levels.

apiVersion: infra.k8smgmt.io/v3
kind: Cluster
metadata:
  name: demo-mks
  project: demo
spec:
  blueprint:
    name: minimal
    version: latest
  config:
    autoApproveNodes: true
    dedicatedMastersEnabled: false
    highAvailability: false
    kubeletExtraArgs:  # Cluster-Level  
      max-pods: '220'  
    installerTtl: 365
    kubernetesVersion: v1.30.4
    location: sanjose-us
    network:
      cni:
        name: Calico
        version: 3.26.1
      podSubnet: 10.244.0.0/16
      serviceSubnet: 10.96.0.0/12
    nodes:
    - arch: amd64
      hostname: demo-mks-kubelet-1
      kubeletExtraArgs:  # Node-Level
        max-pods: '100'   
      labels:
        testType: scale-test
      operatingSystem: Ubuntu20.04
      privateip: <Ip Address>
      roles:
      - Worker
      - Master
      ssh:
        ipAddress: <Ip Address>
        port: "22"
        privateKeyPath: /Users/demo/.ssh/id_rsa
        username: ubuntu
  type: mks

For more information on this feature, please refer here
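
The example specification sets max-pods at both levels (220 for the cluster, 100 for the node). The following Python sketch shows one way the effective value per node could be worked out. It assumes that a node-level key overrides the cluster-level key of the same name; these release notes do not state the precedence, so treat that rule and the function name `effective_kubelet_args` as illustrative assumptions rather than documented behavior.

```python
def effective_kubelet_args(cluster_args, node_args):
    """Combine kubeletExtraArgs for one node.

    ASSUMPTION (not confirmed by the release notes): a node-level key
    overrides the cluster-level key of the same name.
    """
    effective = dict(cluster_args)   # start from the cluster-level settings
    effective.update(node_args)      # node-level values win on conflicts
    return effective


cluster_level = {"max-pods": "220"}
node_level = {"max-pods": "100"}

# Node that sets its own value
print(effective_kubelet_args(cluster_level, node_level))  # {'max-pods': '100'}
# Node without node-level settings falls back to the cluster-level value
print(effective_kubelet_args(cluster_level, {}))          # {'max-pods': '220'}
```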

CNI Customization

Support for CNI customization using the blueprint add-on approach is being added with this enhancement. This feature enables users to seamlessly modify Primary CNI configurations, ensuring they align with the specific networking requirements of clusters.

Important

The following primary CNIs have been qualified: Cilium, Calico, Kube-OVN

Note: You must add the following labels to the respective CNI add-ons:

key: rafay.type
value: cni

key: rafay.cni.name
value: cilium   # or calico or kube-ovn

Add-on Example

Addon Labels

Cluster with Primary CNI

Cluster

For more information on this feature, please refer here

Node Approval Enhancement

The node approval step of the node addition process has been streamlined in this release. Approving a node now takes approximately 80 seconds, which is significantly faster than before.

Conjurer Enhancement

In this release, we have enhanced the behavior of the conjurer -d command to ensure better handling of cron jobs during cleanup.

  • Previously, the conjurer -d command unintentionally cleaned up user-created cron jobs in addition to those created during the provisioning process
  • With this enhancement:
      • The conjurer -d command now exclusively cleans up cron jobs brought up as part of the provisioning process
      • User-created cron jobs are no longer affected during cleanup

UI Enhancement: Force Delete Option for Nodes

This enhancement introduces a force delete option for nodes, designed to address situations where a node is in a bad state, and standard deletion attempts fail. The force delete option repeatedly tries to reach the node, and if unsuccessful, it removes the node directly from the Rafay controller. This ensures that unresponsive nodes can be efficiently cleaned up.

Force Delete
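
The force delete flow described above (retry reaching the node, then fall back to removing it from the controller) can be sketched in Python as follows. This is a minimal illustration only: the function name, the callbacks `reach_node` and `remove_from_controller`, and the retry count are hypothetical and do not reflect Rafay's actual implementation or API.

```python
import time
from typing import Callable


def force_delete_node(
    node: str,
    reach_node: Callable[[str], bool],              # hypothetical: True if the node responded
    remove_from_controller: Callable[[str], None],  # hypothetical: drops the controller record
    attempts: int = 3,                              # assumed retry count
    delay_seconds: float = 5.0,                     # assumed pause between retries
) -> str:
    """Return which path was taken: 'standard-delete' or 'removed-from-controller'."""
    for _ in range(attempts):
        if reach_node(node):
            # Node is reachable again, so the normal deletion path can proceed.
            return "standard-delete"
        time.sleep(delay_seconds)
    # Node never responded: remove its record directly from the controller.
    remove_from_controller(node)
    return "removed-from-controller"


inventory = ["node-1", "node-2"]
outcome = force_delete_node(
    "node-2",
    reach_node=lambda name: False,            # simulate an unresponsive node
    remove_from_controller=inventory.remove,
    delay_seconds=0,
)
print(outcome, inventory)  # removed-from-controller ['node-1']
```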


Amazon EKS

Access Entries

This enhancement adds Amazon EKS access management controls for managing the access of AWS IAM principals (users, groups, and roles) to Amazon EKS clusters. This includes a new set of controls, called access entries, for managing the access of IAM principals to Kubernetes clusters.

Important

Permissions Required to Leverage New Modes (API and API_AND_CONFIG_MAP):

  • eks:ListAccessPolicies
  • eks:ListAccessEntries
  • eks:ListAssociatedAccessPolicies
  • eks:AssociateAccessPolicy
  • eks:CreateAccessEntry
  • eks:UpdateAccessEntry
  • eks:DescribeAccessEntry
  • eks:DisassociateAccessPolicy
  • eks:DeleteAccessEntry
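
As an illustration, the permissions listed above could be collected into a standard IAM policy document. The Python sketch below generates such a JSON document. The `"Resource": "*"` scope and the helper name `build_policy` are assumptions made for this example; scope the resource to specific cluster ARNs as your security requirements dictate.

```python
import json

# The nine permissions listed in these release notes
ACCESS_ENTRY_ACTIONS = [
    "eks:ListAccessPolicies",
    "eks:ListAccessEntries",
    "eks:ListAssociatedAccessPolicies",
    "eks:AssociateAccessPolicy",
    "eks:CreateAccessEntry",
    "eks:UpdateAccessEntry",
    "eks:DescribeAccessEntry",
    "eks:DisassociateAccessPolicy",
    "eks:DeleteAccessEntry",
]


def build_policy(actions, resource="*"):
    """Build an IAM policy document; resource="*" is an assumption for brevity."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": list(actions), "Resource": resource}
        ],
    }


print(json.dumps(build_policy(ACCESS_ENTRY_ACTIONS), indent=2))
```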

Day 0

Access entries can be configured during cluster creation

Access Day 0

Day 2

Access entries and authentication modes can be updated post-cluster creation

Access Day 2

For more information on this feature, please refer here

CNI Customization

Support for CNI customization using the blueprint add-on approach is being added with this enhancement. This feature enables users to seamlessly modify Primary CNI configurations, ensuring they align with the specific networking requirements of clusters.

Important

The following primary CNIs have been qualified: Cilium, Calico.

Note: You must add the following labels to the respective CNI add-ons:

key: rafay.type
value: cni

key: rafay.cni.name
value: cilium   # or calico

Note

Cilium Support:
- Cilium is supported in Day 2 only, as the cluster is initially provisioned using the default AWS CNI.
- Once the cluster provisioning is complete, users can retrieve the cluster endpoint.
- This endpoint is required to configure the Cilium add-on by updating the endpoint in the values.yaml file of the Cilium add-on.
- After updating, create a blueprint and apply it to update Cilium on the cluster.

For more information on this feature, please refer here

Pod Identity Enhancement

  • Adding Pod Identity Association Support for Managed Add-ons:
    This release enables EKS Managed Add-ons to leverage IAM permissions through EKS Pod Identity Associations, enhancing flexibility and security for managed add-ons.

Add the following global configuration to enable automatic Pod Identity Associations for all add-ons:

addonsConfig:
  autoApplyPodIdentityAssociations: true

Below is a sample configuration for managed add-ons with Pod Identity Associations defined for specific add-ons:

addons:
- name: eks-pod-identity-agent
  version: latest
- name: vpc-cni
  podIdentityAssociations:
  - namespace: kube-system
    permissionPolicyARNs:
    - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
    serviceAccountName: aws-node
  version: latest
- name: coredns
  version: latest
- name: kube-proxy
  version: latest
- name: aws-ebs-csi-driver
  version: latest

GitOps System Sync

Deletion of Clusters and Environments

In this release, we have introduced termination protection settings to prevent accidental deletion of clusters and environments during a GitOps System Sync (Git to System) operation.

Key Details:

  • With this setting, users have the flexibility to allow or prevent deletion of clusters or environments through the GitOps interface
  • Termination protection is controlled at the organizational level under System > Settings
  • Platform Admins can enable or disable this flag based on organizational requirements
  • Default Setting: Termination protection is enabled by default (this is in line with the current behavior)

Termination Protection
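
The effect of the flag on a Git to System sync can be sketched in Python as follows. This is a simplified model, not Rafay's implementation: it assumes that a sync treats resources present in the system but absent from Git as deletion candidates, and the function name `plan_sync_deletions` is hypothetical.

```python
def plan_sync_deletions(in_system: set[str], in_git: set[str],
                        termination_protection: bool) -> set[str]:
    """Return the clusters/environments a Git-to-System sync would delete.

    ASSUMPTION: resources that exist in the system but are missing from
    Git are the deletion candidates.
    """
    candidates = in_system - in_git
    if termination_protection:
        # Protection enabled (the default): nothing is deleted by the sync.
        return set()
    return candidates


system_state = {"cluster-a", "cluster-b", "env-dev"}
git_state = {"cluster-a", "env-dev"}

print(plan_sync_deletions(system_state, git_state, termination_protection=True))   # set()
print(plan_sync_deletions(system_state, git_state, termination_protection=False))  # {'cluster-b'}
```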

For more information on this feature, please refer here


Environment Manager

Auto-population of Variables

A new feature now allows users to retrieve variables defined in the IaC linked to a resource template directly through the UI, eliminating the need to manually add variables one by one. Once the variables are fetched, users are presented with two options:

  • Merge Input Variables: Use this option to add only the new or missing variables, leaving any already-defined variables intact
  • Replace Input Variables: Select this option to redefine the variables completely, starting from scratch.

Auto-populate variables
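
The difference between the two options can be illustrated with a small Python sketch that models input variables as dictionaries. The function names and sample variables are hypothetical; the sketch only mirrors the behavior described above.

```python
def merge_input_variables(defined: dict, fetched: dict) -> dict:
    """Add only variables that are new or missing; keep existing definitions."""
    merged = dict(defined)
    for name, value in fetched.items():
        merged.setdefault(name, value)  # existing entries are left untouched
    return merged


def replace_input_variables(defined: dict, fetched: dict) -> dict:
    """Discard existing definitions and start over from the fetched variables."""
    return dict(fetched)


# Variables already defined on the resource template (sample data)
defined = {"region": "us-west-2", "node_count": 3}
# Variables fetched from the linked IaC (sample data)
fetched = {"region": "us-east-1", "instance_type": "t3.large"}

print(merge_input_variables(defined, fetched))
# {'region': 'us-west-2', 'node_count': 3, 'instance_type': 't3.large'}
print(replace_input_variables(defined, fetched))
# {'region': 'us-east-1', 'instance_type': 't3.large'}
```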

Selector for Variables (Alias)

This feature has been introduced to address the following use cases:

  • The same variable names are defined across multiple resource templates
  • A more user-friendly variable name is preferred during an environment launch

To achieve this, a Selector feature is now available at the environment template level, allowing users to customize and manage variable names for these scenarios.

Selector for variables
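
To make the aliasing idea concrete, the Python sketch below models a selector as a mapping from a user-facing alias to a variable inside a specific resource template. The data model, the template names, and the function `resolve_inputs` are hypothetical and are not the product's actual schema.

```python
# Hypothetical model: alias shown at environment launch ->
# (resource template, variable name inside that template).
# Both templates define a variable called "name", so aliases disambiguate them.
SELECTORS = {
    "cluster_name": ("cluster-template", "name"),
    "database_name": ("database-template", "name"),
}


def resolve_inputs(user_values: dict, selectors: dict) -> dict:
    """Route each aliased value to the variable of the template it targets."""
    resolved: dict = {}
    for alias, value in user_values.items():
        template, variable = selectors[alias]
        resolved.setdefault(template, {})[variable] = value
    return resolved


print(resolve_inputs({"cluster_name": "demo", "database_name": "orders"}, SELECTORS))
# {'cluster-template': {'name': 'demo'}, 'database-template': {'name': 'orders'}}
```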


Console Logins & API Access

IP Whitelist

This feature enhances security by allowing customers to restrict console logins and API access to authorized IP addresses. With IP whitelisting, administrators can specify exact IP addresses or CIDR IP ranges, ensuring that only users from approved networks or addresses can access the console or perform API calls.

For more information on this feature, please refer here

Note

This capability is enabled selectively for customers and is not available to all organizations by default.
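
The matching logic behind an allowlist of exact IP addresses and CIDR ranges can be sketched with Python's standard `ipaddress` module. This is an illustration of the concept only; the function name `is_allowed` and the sample addresses (taken from documentation-reserved ranges) are assumptions, and the sketch does not represent how the platform enforces the restriction.

```python
import ipaddress


def is_allowed(client_ip: str, allowlist: list[str]) -> bool:
    """Return True if client_ip matches an exact address or falls in a CIDR range."""
    addr = ipaddress.ip_address(client_ip)
    for entry in allowlist:
        # A bare address such as "203.0.113.7" is parsed as a single-host network.
        network = ipaddress.ip_network(entry, strict=False)
        if addr in network:
            return True
    return False


ALLOWLIST = ["203.0.113.7", "198.51.100.0/24"]  # sample entries

print(is_allowed("203.0.113.7", ALLOWLIST))    # True  (exact address)
print(is_allowed("198.51.100.42", ALLOWLIST))  # True  (inside the CIDR range)
print(is_allowed("192.0.2.1", ALLOWLIST))      # False (not listed)
```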


Visibility & Monitoring

GPU Operator

Prior to this release, the Rafay integrated GPU monitoring stack required the GPU Operator to be installed in the gpu-operator-resources namespace. With this enhancement, customers can now specify the namespace where the GPU Operator is installed, similar to other Prometheus-related configurations (e.g., Kube State Metrics, Node Exporter).

GPU Operator

For more information on this feature, please refer here


Cost Management

Profiles

In a previous release, the Use Cluster Credential option was introduced for AWS Cost Profiles. With this enhancement, the feature is now extended to Azure and GCP as well.

This improvement eliminates the need to configure multiple Cost Profiles or Blueprints in scenarios where clusters operate across multiple AWS accounts. The Cloud Credential used for cluster provisioning will also be utilized to retrieve custom pricing information for cost calculations.


Catalog

Additions to System Catalog

The System Catalog has been updated to add support for the following repositories.

| Category | Description |
|---|---|
| CNI | kube-ovn |

Bug Fixes

| Bug ID | Description |
|---|---|
| RC-38353 | RHEL9 master node fails to add to RHEL8 cluster |
| RC-37906 | Upstream K8s: /usr/bin/conjurer -d removes existing cronjobs |
| RC-33380 | Workload updates "modified at" field even if no changes were made before clicking "Publish" |
| RC-39014 | Workflow gets stuck when configuring an expression that evaluates to JSON in the format "prefix-(expr)" |