Upcoming Release
Stay tuned for updates on our upcoming release, featuring new features, enhancements, and bug fixes.
For more details, check out the upcoming release notes here.
v2.12 Update 1 - SaaS¶
26 Dec, 2024
The section below provides a brief description of the new functionality and enhancements in this release.
GPU Platform as a Service (GPU PaaS)¶
The primary objective of Rafay's GPU PaaS offering is to empower GPU Cloud providers and enterprise customers to deliver GPU self-service capabilities to their end users. To facilitate this, two additional portals are available:
- PaaS Studio: Designed for administrators to create and manage pre-defined configurations (profiles), such as GPU compute, that can be made accessible to end users
- Developer Hub: Intended for end users to log in and launch instances based on the pre-defined profiles for their consumption
Note
Limited Access: This capability is selectively enabled for customer organizations. Please contact support if you would like this feature to be enabled for your organization.
These additional portals are, by default, accessible only to Organization Admins. New roles have been introduced for this purpose and must be explicitly assigned before non-Org Admin users can access these portals.
For more information on this feature, please refer here
Catalog for System Templates¶
System Catalog¶
With this release, the platform offers a catalog of pre-built system templates designed to simplify and enhance the user experience for administrators. These templates are fully supported by Rafay, with regular updates and new features added over time.
With system templates, Rafay Admins need to follow only two simple steps to provide a self-service experience for their end users:
- Configure and customize the system template (i.e., provide credentials, specify defaults, and determine which values end users can/cannot override) in a project owned by the Platform team
- Publish by sharing the template with end user projects
Info
Administrators can also use the published system template with instance and service profiles in the newly introduced PaaS to provide end users with a self-service experience.
The system catalog significantly improves the process of consuming capabilities available in the platform.
Key Features:
- Shareable Templates: Organization Administrators can share the catalog templates with specific projects, enabling the creation of environment instances based on them
- Customizable: Users can clone and customize catalog templates to align with their unique workflows. For example, a ServiceNow approval step can be added before environment provisioning is initiated.
Note
Limited Access: System templates are selectively enabled for customer organizations. Please contact support if you would like these templates enabled for your organization.
System Templates¶
The following templates are available with this release. Additional system templates will be made available progressively along with incremental updates to the existing templates.
Cluster Lifecycle
| # | Template Name | Description |
|---|---|---|
| 1 | system-gke | Standardize Cluster Provisioning and Management with Google Kubernetes Engine (GKE) |
| 2 | system-mks | Standardize Cluster Provisioning and Management on Private Cloud with Rafay's Kubernetes Distribution |
| 3 | system-vsphere-mks | Standardize Cluster Provisioning and Management on VMware vSphere with Rafay's Kubernetes Distribution |
Multi-Tenancy
| # | Template Name | Description |
|---|---|---|
| 1 | system-vcluster-anyk8s | Implement vCluster-based multi-tenancy with required security controls to reduce infrastructure costs |
AI/ML
| # | Template Name | Description |
|---|---|---|
| 1 | system-kuberay-anyk8s | Enable Self-Service Deployment of Ray vClusters for Data Scientists and ML Engineers |
| 2 | system-mks-kubeflow | Implement a Kubeflow-based ML platform on private cloud |
| 3 | system-gcp-kubeflow | Implement a Kubeflow-based ML platform on GCP |
| 4 | system-kubeflowprofile-anyk8s | Create and Manage Kubeflow Profiles with Collaboration Controls |
| 5 | system-notebook-anyk8s | Enable Self-Service Deployment of Jupyter Notebooks for Data Scientists and ML Engineers |
Note
The templates are designed to support both Day 0 (initial setup) and Day 2 (ongoing management and maintenance) operations.
For more information on this feature, please refer here
v1.1.39 - Terraform Provider¶
13 Dec, 2024
An updated version of the Terraform provider is now available.
Enhancements¶
This release includes several enhancements, improving configuration options across different resources:
- `rafay_mks_cluster`: Added configuration for `installer_ttl` (Conjurer TTL).
- `rafay_config_context`, `rafay_environment_template`, `rafay_resource_template`: Added configuration for selectors to alias a variable and restrict the override scope.
Bug Fixes¶
| Bug ID | Description |
|---|---|
| RC-38662 | Terraform: pod_identity_associations is failing. |
| RC-38516 | EKS: Terraform provider crashes if the user provides the same name to multiple VPC subnets. |
v2.12 - SaaS¶
09 Dec, 2024
Upstream Kubernetes for Bare Metal and VMs¶
Exposing Kubelet Arguments for Clusters¶
Support is being added for configuring additional Kubelet arguments for upstream Kubernetes clusters (Rafay's Managed Kubernetes offering), giving users greater flexibility to fine-tune the behavior of Kubelet (the node-level component that manages pods and containers).
With this enhancement, administrators will be able to tailor Kubelet configurations to meet specific operational needs, ensuring optimal performance. These configurations can be applied both at the cluster level and the node level during Day 2 operations.
UI Configuration Example for Using Kubelet Args¶
Day 0 and Day 2 Support
- Day 0: Configure Kubelet arguments during the initial cluster setup
- Day 2: Modify Kubelet arguments for existing clusters
Node-Level Kubelet Args
For Day 2 configuration, you can also modify Kubelet arguments at the node level to meet specific operational needs.
RCTL Cluster Configuration Example for Using Kubelet Args¶
Below is an example configuration showcasing how to use the RCTL Cluster specification to configure `kubeletExtraArgs` at both the cluster and node levels.
```yaml
apiVersion: infra.k8smgmt.io/v3
kind: Cluster
metadata:
  name: demo-mks
  project: demo
spec:
  blueprint:
    name: minimal
    version: latest
  config:
    autoApproveNodes: true
    dedicatedMastersEnabled: false
    highAvailability: false
    kubeletExtraArgs: # Cluster-Level
      max-pods: '220'
    installerTtl: 365
    kubernetesVersion: v1.30.4
    location: sanjose-us
    network:
      cni:
        name: Calico
        version: 3.26.1
      podSubnet: 10.244.0.0/16
      serviceSubnet: 10.96.0.0/12
    nodes:
      - arch: amd64
        hostname: demo-mks-kubelet-1
        kubeletExtraArgs: # Node-Level
          max-pods: '100'
        labels:
          testType: scale-test
        operatingSystem: Ubuntu20.04
        privateip: <Ip Address>
        roles:
          - Worker
          - Master
        ssh:
          ipAddress: <Ip Address>
          port: "22"
          privateKeyPath: /Users/demo/.ssh/id_rsa
          username: ubuntu
  type: mks
```
For more information on this feature, please refer here
CNI Customization¶
Support for CNI customization using the blueprint add-on approach is being added with this enhancement. This feature enables users to seamlessly modify Primary CNI configurations, ensuring they align with the specific networking requirements of clusters.
Important
The following primary CNIs have been qualified: Cilium, Calico, Kube-OVN
Note: You must add the following labels to the respective CNI add-ons
| Key | Value |
|---|---|
| rafay.type | cni |
| rafay.cni.name | cilium (or calico, or kube-ovn) |
Add-on Example
Cluster with Primary CNI
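As a hedged illustration of the labels described above, the snippet below shows how they might appear in an add-on's metadata. The add-on name and project are hypothetical placeholders; only the two labels themselves come from this note.

```yaml
# Illustrative metadata snippet for a primary CNI add-on.
# Only the two labels are the required values from this note; other fields are placeholders.
metadata:
  name: cilium-cni          # hypothetical add-on name
  project: demo             # hypothetical project
  labels:
    rafay.type: cni
    rafay.cni.name: cilium  # or calico, or kube-ovn
```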
Info
To learn more about this feature, please read our recent blog: Deploying Custom CNI (Kube-OVN) in Rafay MKS Upstream Kubernetes Cluster Using the Blueprint Add-On Approach
For more information on this feature, please refer here
Node Approval Enhancement¶
The node approval step of the node addition process has been streamlined in this release. Approving a node now takes approximately 80 seconds, a significant reduction that makes node addition much faster.
Conjurer Enhancement¶
In this release, we have enhanced the behavior of the `conjurer -d` command to ensure better handling of cron jobs during cleanup.
- Previously, the `conjurer -d` command unintentionally cleaned up user-created cron jobs in addition to those created during the provisioning process
- With this enhancement:
  - The `conjurer -d` command now exclusively cleans up cron jobs created as part of the provisioning process
  - User-created cron jobs are no longer affected during cleanup
UI Enhancement: Force Delete Option for Nodes¶
This enhancement introduces a force delete option for nodes, designed to address situations where a node is in a bad state, and standard deletion attempts fail. The force delete option repeatedly tries to reach the node, and if unsuccessful, it removes the node directly from the Rafay controller. This ensures that unresponsive nodes can be efficiently cleaned up.
Amazon EKS¶
Access Entries¶
With this release, enhanced Amazon EKS access management controls have been added for managing the access of AWS IAM principals (users, groups, and roles) to Amazon EKS clusters. This includes a new set of controls, called access entries, for managing the access of IAM principals to Kubernetes clusters.
Important
Permissions required to leverage the new modes (API and API_AND_CONFIG_MAP):
- eks:ListAccessPolicies
- eks:ListAccessEntries
- eks:ListAssociatedAccessPolicies
- eks:AssociateAccessPolicy
- eks:CreateAccessEntry
- eks:UpdateAccessEntry
- eks:DescribeAccessEntry
- eks:DisassociateAccessPolicy
- eks:DeleteAccessEntry
- Day 0: Access entries can be configured during cluster creation
- Day 2: Access entries and authentication modes can be updated post-cluster creation
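As a rough sketch only, the snippet below shows how an authentication mode and an access entry might be expressed in an eksctl-style configuration, which Rafay's EKS cluster specification generally mirrors; the ARNs and the exact placement within the cluster file are assumptions rather than values from this release.

```yaml
# Hedged sketch of eksctl-style access configuration; ARNs are placeholders.
accessConfig:
  authenticationMode: API_AND_CONFIG_MAP        # or API, or CONFIG_MAP
  accessEntries:
    - principalARN: arn:aws:iam::111122223333:role/platform-admins   # placeholder principal
      accessPolicies:
        - policyARN: arn:aws:eks::aws:policy/AmazonEKSClusterAdminPolicy
          accessScope:
            type: cluster
```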
For more information on this feature, please refer here
CNI Customization¶
Support for CNI customization using the blueprint add-on approach is being added with this enhancement. This feature enables users to seamlessly modify Primary CNI configurations, ensuring they align with the specific networking requirements of clusters.
Important
The following primary CNIs have been qualified: Cilium, Calico.
Note: You must add the following labels to the respective CNI add-ons:
| Key | Value |
|---|---|
| rafay.type | cni |
| rafay.cni.name | cilium (or calico) |
Note
Cilium Support:
- Cilium is supported in Day 2 only, as the cluster is initially provisioned using the default AWS CNI.
- Once the cluster provisioning is complete, users can retrieve the cluster endpoint.
- This endpoint is required to configure the Cilium add-on by updating the endpoint in the `values.yaml` file of the Cilium add-on.
- After updating, create a blueprint and apply it to update Cilium on the cluster.
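For illustration, the endpoint update described in the note above would typically look like the following in the Cilium add-on's `values.yaml`, assuming the upstream Cilium Helm chart's key names; the hostname is a placeholder for the endpoint retrieved after provisioning.

```yaml
# Hedged example: point Cilium at the cluster's API endpoint (placeholder values).
k8sServiceHost: ABCDEF1234567890.gr7.us-west-2.eks.amazonaws.com   # cluster endpoint, without the https:// prefix
k8sServicePort: 443
```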
For more information on this feature, please refer here
Pod Identity Enhancement¶
- Pod Identity Association Support for Managed Add-ons: This release enables EKS Managed Add-ons to leverage IAM permissions through EKS Pod Identity Associations, enhancing flexibility and security for managed add-ons.
Add the global configuration to enable automatic Pod Identity Associations for all add-ons:
```yaml
addonsConfig:
  autoApplyPodIdentityAssociations: true
```
Below is a sample configuration for managed add-ons with Pod Identity Associations defined for specific add-ons:
```yaml
addons:
  - name: eks-pod-identity-agent
    version: latest
  - name: vpc-cni
    podIdentityAssociations:
      - namespace: kube-system
        permissionPolicyARNs:
          - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        serviceAccountName: aws-node
    version: latest
  - name: coredns
    version: latest
  - name: kube-proxy
    version: latest
  - name: aws-ebs-csi-driver
    version: latest
```
GitOps System Sync¶
Deletion of Clusters and Environments¶
In this release, we have introduced termination protection settings to prevent accidental deletion of clusters and environments during a GitOps System Sync (Git to System) operation.
Key Details:¶
- With this setting, users have the flexibility to allow or prevent deletion of clusters or environments through the GitOps interface
- Termination protection is controlled at the organizational level under System > Settings
- Platform Admins can enable or disable this flag based on organizational requirements
- Default Setting: Termination protection is enabled by default (this is in line with the current behavior)
For more information on this feature, please refer here
Environment Manager¶
Auto-population of Variables¶
A new feature now allows users to retrieve variables defined in the IaC linked to a resource template directly through the UI, eliminating the need to manually add variables one by one. Once the variables are fetched, users are presented with two options:
- Merge Input Variables: Use this option to add only the new or missing variables, leaving any already-defined variables intact
- Replace Input Variables: Select this option to redefine the variables completely, starting from scratch.
Selector for Variables (Alias)¶
This feature has been introduced to address the following use cases:
- The same variable names are defined across multiple resource templates
- A more user-friendly variable name is preferred during an environment launch
To achieve this, a Selector feature is now available at the environment template level, allowing users to customize and manage variable names for these scenarios.
Console Logins & API Access¶
IP Whitelist¶
This feature enhances security by allowing customers to restrict console logins and API access to authorized IP addresses. With IP whitelisting, administrators can specify exact IP addresses or CIDR IP ranges, ensuring that only users from approved networks or addresses can access the console or perform API calls.
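For example, an allow list could combine individual addresses and CIDR ranges, as in the hedged sketch below; the key name and the RFC 5737 documentation addresses are illustrative only, since the actual entries are configured in the console.

```yaml
# Illustrative allow list only; the key name and addresses are placeholders.
allowedSources:
  - 203.0.113.25      # a single administrator workstation IP
  - 198.51.100.0/24   # an office network CIDR range
```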
For more information on this feature, please refer here
Note
This capability is enabled selectively for customers and is not available to all organizations by default.
Visibility & Monitoring¶
GPU Operator¶
Prior to this release, the Rafay integrated GPU monitoring stack required the GPU Operator to be installed in the gpu-operator-resources namespace. With this enhancement, customers can now specify the namespace where the GPU Operator is installed, similar to other Prometheus-related configurations (e.g., Kube State Metrics, Node Exporter).
For more information on this feature, please refer here
Cost Management¶
Profiles¶
In a previous release, the Use Cluster Credential option was introduced for AWS Cost Profiles. With this enhancement, the feature is now extended to Azure and GCP as well.
This improvement eliminates the need to configure multiple Cost Profiles or Blueprints in scenarios where clusters operate across multiple cloud accounts. The Cloud Credential used for cluster provisioning will also be utilized to retrieve custom pricing information for cost calculations.
Catalog¶
Additions to System Catalog¶
The System Catalog has been updated to add support for the following repositories.
| Category | Repository |
|---|---|
| CNI | kube-ovn |
Bug Fixes¶
| Bug ID | Description |
|---|---|
| RC-38353 | RHEL9 master node fails to add to RHEL8 cluster |
| RC-37906 | Upstream K8s: /usr/bin/conjurer -d removes existing cronjobs |
| RC-33380 | Workload updates "modified at" field even if no changes were made before clicking "Publish" |
| RC-39014 | Workflow gets stuck when configuring an expression that evaluates to JSON in the format "prefix-(expr)" |