Stay tuned for updates on our upcoming release, featuring new features, enhancements, and bug fixes.
For more details, check out the upcoming release notes here.
The primary objective of Rafay's GPU PaaS offering is to empower GPU Cloud providers and enterprise customers to deliver GPU self-service capabilities to their end users. To facilitate this, two additional portals are available:
PaaS Studio: Designed for administrators to create and manage pre-defined configurations (profiles), such as GPU compute, that can be made accessible to end users
Developer Hub: Intended for end users to log in and launch instances based on the pre-defined profiles for their consumption
Note
Limited Access: This capability is selectively enabled for customer organizations. Please contact support if you would like this feature to be enabled for your organization.
These additional portals are accessible only to Organization Admins by default. Additional roles have been introduced for this purpose and must be explicitly assigned to non-Org Admin users before they can access these portals.
For more information on this feature, please refer here.
With this release, the platform offers a catalog of pre-built system templates designed to simplify and enhance the user experience for administrators. These templates are fully supported by Rafay, with regular updates and new features added over time. With these templates, Rafay Admins need only follow two simple steps to provide a self-service experience for their end users:
Configure and customize the system template (i.e., provide credentials, specify defaults, and determine which values end users can/cannot override) in a project owned by the Platform team
Publish by sharing the template with end user projects
Info
Administrators can also use the published system template with instance and service profiles in the newly introduced PaaS to provide end users with a self-service experience.
The system catalog significantly improves the process of consuming capabilities available in the platform.
Key Features:
Shareable Templates: Organization Administrators can share the catalog templates with specific projects, enabling the creation of environment instances based on them
Customizable: Users can clone and customize catalog templates to align with their unique workflows. For example, a ServiceNow approval step can be added before an environment provisioning is initiated.
Note
Limited Access: System templates are selectively enabled for customer organizations. Please contact support if you would like these templates enabled for your organization.
The following templates are available with this release. Additional system templates will be made available progressively along with incremental updates to the existing templates.
Cluster Lifecycle
| # | Template Name | Description |
|---|---------------|-------------|
| 1 | system-gke | Standardize Cluster Provisioning and Management with Google Kubernetes Engine (GKE) |
| 2 | system-mks | Standardize Cluster Provisioning and Management on Private Cloud with Rafay's Kubernetes Distribution |
| 3 | system-vsphere-mks | Standardize Cluster Provisioning and Management on VMware vSphere with Rafay's Kubernetes Distribution |
Multi-Tenancy
| # | Template Name | Description |
|---|---------------|-------------|
| 1 | system-vcluster-anyk8s | Implement vCluster-based multi-tenancy with required security controls to reduce infrastructure costs |
AI/ML
| # | Template Name | Description |
|---|---------------|-------------|
| 1 | system-kuberay-anyk8s | Enable Self-Service Deployment of Ray vClusters for Data Scientists and ML Engineers |
This release includes several enhancements, improving configuration options across different resources:
rafay_mks_cluster: Added configuration for installer_ttl (Conjurer TTL).
rafay_config_context, rafay_environment_template, rafay_resource_template: Added configuration for selectors to alias a variable and restrict the override scope.
Support is being added for configuring additional Kubelet arguments for upstream Kubernetes clusters (Rafay's Managed Kubernetes offering), giving users greater flexibility to fine-tune the behavior of Kubelet (the node-level component that manages pods and containers).
With this enhancement, administrators will be able to tailor Kubelet configurations to meet specific operational needs, ensuring optimal performance. These configurations can be applied both at the cluster level and the node level during Day 2 operations.
Day 0: Configure Kubelet arguments during the initial cluster setup
Day 2: Modify Kubelet arguments for existing clusters
Node-Level Kubelet Args
For Day 2 configuration, you can also modify Kubelet arguments at the node level to meet specific operational needs.
RCTL Cluster Configuration Example for Using Kubelet Args
Below is an example configuration showcasing how to use the RCTL Cluster specification to configure kubeletExtraArgs at both the cluster and node levels.
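The sketch below is illustrative only: apart from kubeletExtraArgs, the field names, versions, and node details are placeholders, so consult the RCTL cluster specification documentation for the authoritative schema.

```yaml
# Illustrative RCTL cluster spec for an upstream (MKS) cluster.
# Only kubeletExtraArgs is taken from the feature description; other
# field names and values are placeholders and may differ in practice.
apiVersion: infra.k8smgmt.io/v3
kind: Cluster
metadata:
  name: mks-cluster-example        # hypothetical cluster name
  project: defaultproject
spec:
  type: mks
  blueprint:
    name: minimal
  config:
    kubernetesVersion: v1.29.4
    # Cluster-level Kubelet arguments (applied to all nodes)
    kubeletExtraArgs:
      max-pods: "150"
      image-gc-high-threshold: "80"
    nodes:
      - hostname: worker-node-1    # hypothetical node
        privateIP: 10.0.0.11
        roles:
          - Worker
        # Node-level Kubelet arguments for this specific node
        kubeletExtraArgs:
          eviction-hard: "memory.available<200Mi"
```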
Support for CNI customization using the blueprint add-on approach is being added with this enhancement. This feature enables users to seamlessly modify Primary CNI configurations, ensuring they align with the specific networking requirements of clusters.
Important
The following primary CNIs have been qualified: Cilium, Calico, Kube-OVN.
Note: You must add the following labels to the respective CNI add-ons:
- key: rafay.type, value: cni
- key: rafay.cni.name, value: cilium (or calico or kube-ovn)
This release streamlines and improves the node approval step of the node addition process. Approval time has been significantly reduced, with a node now approved in approximately 80 seconds.
This enhancement introduces a force delete option for nodes, designed to address situations where a node is in a bad state and standard deletion attempts fail. The force delete option repeatedly tries to reach the node and, if unsuccessful, removes the node directly from the Rafay controller. This ensures that unresponsive nodes can be efficiently cleaned up.
This enhancement adds Amazon EKS access management controls for managing the access of AWS IAM principals (users, groups, and roles) to Amazon EKS clusters. It includes a new set of controls, called access entries, for managing the access of IAM principals to Kubernetes clusters.
Important
Permissions Required to Leverage New Modes (API and API_AND_CONFIG_MAP):
eks:ListAccessPolicies
eks:ListAccessEntries
eks:ListAssociatedAccessPolicies
eks:AssociateAccessPolicy
eks:CreateAccessEntry
eks:UpdateAccessEntry
eks:DescribeAccessEntry
eks:DisassociateAccessPolicy
eks:DeleteAccessEntry
Day 0
Access entries can be configured during cluster creation
Day 2
Access entries and authentication modes can be updated post-cluster creation
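As a rough illustration, the sketch below shows how an authentication mode and access entries might be declared in an eksctl-style EKS cluster spec. The principal and policy ARNs are placeholders, and the exact field layout in the Rafay EKS cluster specification may differ.

```yaml
# Illustrative accessConfig block (eksctl-style schema); ARNs are placeholders.
accessConfig:
  authenticationMode: API_AND_CONFIG_MAP
  accessEntries:
    # Grant cluster-wide admin access to an IAM role
    - principalARN: arn:aws:iam::111122223333:role/eks-admin-role
      accessPolicies:
        - policyARN: arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy
          accessScope:
            type: cluster
    # Grant read-only access scoped to a single namespace
    - principalARN: arn:aws:iam::111122223333:role/app-team-role
      accessPolicies:
        - policyARN: arn:aws:eks::aws:cluster-access-policy/AmazonEKSViewPolicy
          accessScope:
            type: namespace
            namespaces:
              - app-team-ns
```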
For more information on this feature, please refer here.
Support for CNI customization using the blueprint add-on approach is being added with this enhancement. This feature enables users to seamlessly modify Primary CNI configurations, ensuring they align with the specific networking requirements of clusters.
Important
The following primary CNIs have been qualified: Cilium, Calico.
Note: You must add the following labels to the respective CNI add-ons:
- key: rafay.type, value: cni
- key: rafay.cni.name, value: cilium (or calico)
Note
Cilium Support:
- Cilium is supported as a Day 2 operation only, as the cluster is initially provisioned using the default AWS CNI.
- Once the cluster provisioning is complete, users can retrieve the cluster endpoint.
- This endpoint must be set in the values.yaml file of the Cilium add-on (see the sketch after this note).
- After updating, create a blueprint and apply it to update Cilium on the cluster.
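A minimal sketch of such a values.yaml override is shown below, assuming an ENI-based Cilium datapath. k8sServiceHost and k8sServicePort are standard Cilium Helm values; the endpoint is a placeholder, and the remaining settings depend on the desired Cilium configuration.

```yaml
# Illustrative values.yaml override for the Cilium add-on on EKS.
# Set k8sServiceHost to the cluster API endpoint retrieved after
# provisioning (without the https:// prefix); the value below is a placeholder.
k8sServiceHost: ABCDEF1234567890.gr7.us-west-2.eks.amazonaws.com
k8sServicePort: 443
# ENI-based datapath settings (adjust to the desired Cilium configuration)
eni:
  enabled: true
ipam:
  mode: eni
routingMode: native
egressMasqueradeInterfaces: eth0
```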
For more information on this feature, please refer here.
Adding Pod Identity Association Support for Managed Add-ons:
This release enables EKS Managed Add-ons to leverage IAM permissions through EKS Pod Identity Associations, enhancing flexibility and security for managed add-ons.
Add the global configuration to enable automatic Pod Identity Associations for all add-ons:
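The sketch below shows what this could look like in an eksctl-style cluster configuration. The addonsConfig flag and the per-add-on association follow the eksctl schema; the add-on, service account, and policy ARN are illustrative, and the exact layout in the Rafay cluster spec may differ.

```yaml
# Illustrative eksctl-style configuration for Pod Identity Associations.
addonsConfig:
  # Automatically create Pod Identity Associations for add-ons that support them
  autoApplyPodIdentityAssociations: true
addons:
  - name: aws-ebs-csi-driver
    # Explicit association for this add-on's controller service account
    podIdentityAssociations:
      - namespace: kube-system
        serviceAccountName: ebs-csi-controller-sa
        permissionPolicyARNs:
          - arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy
```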
In this release, we have introduced termination protection settings to prevent accidental deletion of clusters and environments during a GitOps System Sync (Git to System) operation.
A new feature now allows users to retrieve variables defined in the IaC linked to a resource template directly through the UI, eliminating the need to manually add variables one by one. Once the variables are fetched, users are presented with two options:
Merge Input Variables: Use this option to add only the new or missing variables, leaving any already-defined variables intact
Replace Input Variables: Select this option to redefine the variables completely, starting from scratch.
This feature has been introduced to address the following use cases:
The same variable names are defined across multiple resource templates
A more user-friendly variable name is preferred during an environment launch
To achieve this, a Selector feature is now available at the environment template level, allowing users to customize and manage variable names for these scenarios.
This feature enhances security by allowing customers to restrict console logins and API access to authorized IP addresses. With IP whitelisting, administrators can specify exact IP addresses or CIDR IP ranges, ensuring that only users from approved networks or addresses can access the console or perform API calls.
For more information on this feature, please refer here.
Note
This capability is enabled selectively for customers and is not available to all organizations by default.
Prior to this release, the Rafay integrated GPU monitoring stack required the GPU Operator to be installed in the gpu-operator-resources namespace. With this enhancement, customers can now specify the namespace where the GPU Operator is installed, similar to other Prometheus-related configurations (e.g., Kube State Metrics, Node Exporter).
For more information on this feature, please refer here.
In a previous release, the Use Cluster Credential option was introduced for AWS Cost Profiles. With this enhancement, the feature is now extended to Azure and GCP as well.
This improvement eliminates the need to configure multiple Cost Profiles or Blueprints in scenarios where clusters operate across multiple AWS accounts. The Cloud Credential used for cluster provisioning will also be utilized to retrieve custom pricing information for cost calculations.