Stay tuned for updates on our upcoming release, featuring new features, enhancements, and bug fixes.
For more details, check out the upcoming release notes here.
The primary objective of Rafay's GPU PaaS offering is to empower GPU Cloud providers and enterprise customers to deliver GPU self-service capabilities to their end users. To facilitate this, two additional portals are available:
PaaS Studio: Designed for administrators to create and manage pre-defined configurations (profiles), such as GPU compute, that can be made accessible to end users
Developer Hub: Intended for end users to log in and launch instances based on the pre-defined profiles for their consumption
Note
Limited Access: This capability is selectively enabled for customer organizations. Please contact support if you would like this feature to be enabled for your organization.
These additional portals are accessible only to Organization Admins by default. Additional roles have been introduced for this purpose and must be explicitly assigned to non-Org Admin users before they can access these portals.
For more information on this feature, please refer here.
With this release, the platform offers a catalog of pre-built system templates designed to simplify and enhance the user experience for administrators. These templates are fully supported by Rafay, with regular updates and new features added over time. With these templates, Rafay Admins need only follow two simple steps to provide a self-service experience for their end users:
Configure and customize the system template (i.e., provide credentials, specify defaults, and determine which values end users can/cannot override) in a project owned by the Platform team
Publish by sharing the template with end user projects
Info
Administrators can also use the published system template with instance and service profiles in the newly introduced PaaS to provide end users with a self-service experience.
The system catalog significantly improves the process of consuming capabilities available in the platform.
Key Features:
Shareable Templates: Organization Administrators can share the catalog templates with specific projects, enabling the creation of environment instances based on them
Customizable: Users can clone and customize catalog templates to align with their unique workflows. For example, a ServiceNow approval step can be added before an environment provisioning is initiated.
Note
Limited Access: System templates are selectively enabled for customer organizations. Please contact support if you would like these templates enabled for your organization.
The following templates are available with this release. Additional system templates will be made available progressively along with incremental updates to the existing templates.
Cluster Lifecycle
| # | Template Name | Description |
|---|---------------|-------------|
| 1 | system-gke | Standardize Cluster Provisioning and Management with Google Kubernetes Engine (GKE) |
| 2 | system-mks | Standardize Cluster Provisioning and Management on Private Cloud with Rafay's Kubernetes Distribution |
| 3 | system-vsphere-mks | Standardize Cluster Provisioning and Management on VMware vSphere with Rafay's Kubernetes Distribution |
Multi-Tenancy
| # | Template Name | Description |
|---|---------------|-------------|
| 1 | system-vcluster-anyk8s | Implement vCluster-based multi-tenancy with required security controls to reduce infrastructure costs |
AI/ML
| # | Template Name | Description |
|---|---------------|-------------|
| 1 | system-kuberay-anyk8s | Enable Self-Service Deployment of Ray vClusters for Data Scientists and ML Engineers |
This release includes several enhancements, improving configuration options across different resources:
rafay_mks_cluster: Added configuration for installer_ttl (Conjurer TTL).
rafay_config_context, rafay_environment_template, rafay_resource_template: Added configuration for selectors to alias a variable and restrict the override scope.
Support is being added for configuring additional Kubelet arguments for upstream Kubernetes clusters (Rafay's Managed Kubernetes offering), giving users greater flexibility to fine-tune the behavior of Kubelet (the node-level component that manages pods and containers).
With this enhancement, administrators will be able to tailor Kubelet configurations to meet specific operational needs, ensuring optimal performance. These configurations can be applied both at the cluster level and the node level during Day 2 operations.
Day 0: Configure Kubelet arguments during the initial cluster setup
Day 2: Modify Kubelet arguments for existing clusters
Node-Level Kubelet Args
For Day 2 configuration, you can also modify Kubelet arguments at the node level to meet specific operational needs.
RCTL Cluster Configuration Example for Using Kubelet Args
Below is an example configuration showcasing how to use the RCTL Cluster specification to configure kubeletExtraArgs at both the cluster and node levels.
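The sketch below is illustrative only: apart from kubeletExtraArgs, the field names, versions, and node details are placeholders, so consult the RCTL cluster specification documentation for the authoritative schema.

```yaml
# Illustrative RCTL cluster spec for an upstream (MKS) cluster.
# Only kubeletExtraArgs is taken from the feature description; other
# field names and values are placeholders and may differ in practice.
apiVersion: infra.k8smgmt.io/v3
kind: Cluster
metadata:
  name: mks-cluster-example        # hypothetical cluster name
  project: defaultproject
spec:
  type: mks
  blueprint:
    name: minimal
  config:
    kubernetesVersion: v1.29.4
    # Cluster-level Kubelet arguments (applied to all nodes)
    kubeletExtraArgs:
      max-pods: "150"
      image-gc-high-threshold: "80"
    nodes:
      - hostname: worker-node-1    # hypothetical node
        privateIP: 10.0.0.11
        roles:
          - Worker
        # Node-level Kubelet arguments for this specific node
        kubeletExtraArgs:
          eviction-hard: "memory.available<200Mi"
```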
Support for CNI customization using the blueprint add-on approach is being added with this enhancement. This feature enables users to seamlessly modify Primary CNI configurations, ensuring they align with the specific networking requirements of clusters.
Important
The following primary CNIs have been qualified: Cilium, Calico, Kube-OVN.
Note: You must add the following labels to the respective CNI add-ons:
- key: rafay.type, value: cni
- key: rafay.cni.name, value: cilium (or calico or kube-ovn)
This release streamlines and improves the node approval step of the node addition process. Approval time has been significantly reduced, with a node now approved in approximately 80 seconds.
This enhancement introduces a force delete option for nodes, designed to address situations where a node is in a bad state and standard deletion attempts fail. The force delete option repeatedly tries to reach the node and, if unsuccessful, removes the node directly from the Rafay controller. This ensures that unresponsive nodes can be efficiently cleaned up.
This enhancement adds Amazon EKS access management controls for managing the access of AWS IAM principals (users, groups, and roles) to Amazon EKS clusters. It includes a new set of controls, called access entries, for managing the access of IAM principals to Kubernetes clusters.
Important
Permissions Required to Leverage New Modes (API and API_AND_CONFIG_MAP):
eks:ListAccessPolicies
eks:ListAccessEntries
eks:ListAssociatedAccessPolicies
eks:AssociateAccessPolicy
eks:CreateAccessEntry
eks:UpdateAccessEntry
eks:DescribeAccessEntry
eks:DisassociateAccessPolicy
eks:DeleteAccessEntry
Day 0
Access entries can be configured during cluster creation
Day 2
Access entries and authentication modes can be updated post-cluster creation
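As a rough illustration, the sketch below shows how an authentication mode and access entries might be declared in an eksctl-style EKS cluster spec. The principal and policy ARNs are placeholders, and the exact field layout in the Rafay EKS cluster specification may differ.

```yaml
# Illustrative accessConfig block (eksctl-style schema); ARNs are placeholders.
accessConfig:
  authenticationMode: API_AND_CONFIG_MAP
  accessEntries:
    # Grant cluster-wide admin access to an IAM role
    - principalARN: arn:aws:iam::111122223333:role/eks-admin-role
      accessPolicies:
        - policyARN: arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy
          accessScope:
            type: cluster
    # Grant read-only access scoped to a single namespace
    - principalARN: arn:aws:iam::111122223333:role/app-team-role
      accessPolicies:
        - policyARN: arn:aws:eks::aws:cluster-access-policy/AmazonEKSViewPolicy
          accessScope:
            type: namespace
            namespaces:
              - app-team-ns
```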
For more information on this feature, please refer here.
Support for CNI customization using the blueprint add-on approach is being added with this enhancement. This feature enables users to seamlessly modify Primary CNI configurations, ensuring they align with the specific networking requirements of clusters.
Important
The following primary CNIs have been qualified: Cilium, Calico.
Note: You must add the following labels to the respective CNI add-ons:
- key: rafay.type, value: cni
- key: rafay.cni.name, value: cilium (or calico)
Note
Cilium Support:
- Cilium is supported as a Day 2 operation only, as the cluster is initially provisioned using the default AWS CNI.
- Once the cluster provisioning is complete, users can retrieve the cluster endpoint.
- This endpoint must be set in the values.yaml file of the Cilium add-on (see the sketch after this note).
- After updating, create a blueprint and apply it to update Cilium on the cluster.
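A minimal sketch of such a values.yaml override is shown below, assuming an ENI-based Cilium datapath. k8sServiceHost and k8sServicePort are standard Cilium Helm values; the endpoint is a placeholder, and the remaining settings depend on the desired Cilium configuration.

```yaml
# Illustrative values.yaml override for the Cilium add-on on EKS.
# Set k8sServiceHost to the cluster API endpoint retrieved after
# provisioning (without the https:// prefix); the value below is a placeholder.
k8sServiceHost: ABCDEF1234567890.gr7.us-west-2.eks.amazonaws.com
k8sServicePort: 443
# ENI-based datapath settings (adjust to the desired Cilium configuration)
eni:
  enabled: true
ipam:
  mode: eni
routingMode: native
egressMasqueradeInterfaces: eth0
```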
For more information on this feature, please refer here.
Adding Pod Identity Association Support for Managed Add-ons:
This release enables EKS Managed Add-ons to leverage IAM permissions through EKS Pod Identity Associations, enhancing flexibility and security for managed add-ons.
Add the global configuration to enable automatic Pod Identity Associations for all add-ons:
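The sketch below shows what this could look like in an eksctl-style cluster configuration. The addonsConfig flag and the per-add-on association follow the eksctl schema; the add-on, service account, and policy ARN are illustrative, and the exact layout in the Rafay cluster spec may differ.

```yaml
# Illustrative eksctl-style configuration for Pod Identity Associations.
addonsConfig:
  # Automatically create Pod Identity Associations for add-ons that support them
  autoApplyPodIdentityAssociations: true
addons:
  - name: aws-ebs-csi-driver
    # Explicit association for this add-on's controller service account
    podIdentityAssociations:
      - namespace: kube-system
        serviceAccountName: ebs-csi-controller-sa
        permissionPolicyARNs:
          - arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy
```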
In this release, we have introduced termination protection settings to prevent accidental deletion of clusters and environments during a GitOps System Sync (Git to System) operation.
A new feature now allows users to retrieve variables defined in the IaC linked to a resource template directly through the UI, eliminating the need to manually add variables one by one. Once the variables are fetched, users are presented with two options:
Merge Input Variables: Use this option to add only the new or missing variables, leaving any already-defined variables intact
Replace Input Variables: Select this option to redefine the variables completely, starting from scratch.
This feature has been introduced to address the following use cases:
The same variable names are defined across multiple resource templates
A more user-friendly variable name is preferred during an environment launch
To achieve this, a Selector feature is now available at the environment template level, allowing users to customize and manage variable names for these scenarios.
This feature enhances security by allowing customers to restrict console logins and API access to authorized IP addresses. With IP whitelisting, administrators can specify exact IP addresses or CIDR IP ranges, ensuring that only users from approved networks or addresses can access the console or perform API calls.
For more information on this feature, please refer here.
Note
This capability is enabled selectively for customers and is not available to all organizations by default.
Prior to this release, the Rafay integrated GPU monitoring stack required the GPU Operator to be installed in the gpu-operator-resources namespace. With this enhancement, customers can now specify the namespace where the GPU Operator is installed, similar to other Prometheus-related configurations (e.g., Kube State Metrics, Node Exporter).
For more information on this feature, please refer here.
In a previous release, the Use Cluster Credential option was introduced for AWS Cost Profiles. With this enhancement, the feature is now extended to Azure and GCP as well.
This improvement eliminates the need to configure multiple Cost Profiles or Blueprints in scenarios where clusters operate across multiple AWS accounts. The Cloud Credential used for cluster provisioning will also be utilized to retrieve custom pricing information for cost calculations.