v3.7 - Self-Hosted

03 Dec, 2025

Amazon EKS & Azure AKS

Kubernetes v1.33 Support

Benefit

Stay current with the latest Kubernetes release, unlocking new features and ongoing support.

Support for Kubernetes v1.33 is being added for Amazon EKS and Azure AKS. This includes:

  • Provisioning of new clusters with v1.33
  • Upgrading existing clusters from earlier versions to v1.33

AKS 1.33 Cluster Provisioning

AKS 1.33 Cluster Upgrade

EKS 1.33 Cluster Provisioning

AmazonLinux2 Deprecation in EKS 1.33

AWS has deprecated AmazonLinux2 in EKS 1.33. When you create a new cluster with v1.33, the node AMI family now defaults to AmazonLinux2023.

For more information, see the AWS documentation on Amazon Linux 2 AMI deprecation.

For existing clusters using AmazonLinux2:

Before upgrading to EKS 1.33, you must:

  1. Add a new node group based on AmazonLinux2023 (see the sketch after this list)
  2. Test and validate the new node group with your application pods
  3. Migrate pods from the old node group to the new node group
  4. Delete the old node group after migration is complete
  5. Then proceed with the EKS 1.33 upgrade
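
For step 1, a minimal sketch of what the new node group could look like, assuming an eksctl-style cluster specification (field names and values are illustrative, not the documented schema):

managedNodeGroups:
  - name: ng-al2023             # new node group based on AmazonLinux2023 (step 1)
    amiFamily: AmazonLinux2023  # replaces the deprecated AL2 AMI family
    instanceType: m5.xlarge     # illustrative instance type
    desiredCapacity: 3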

Important: If any existing EKS cluster with AmazonLinux2 is upgraded to EKS 1.33 directly without adding node groups based on AmazonLinux2023, the node group upgrade will fail with the following error:

internal error: failed to get aws session, InvalidParameterException: AMI Type AL2_x86_64 is only supported for kubernetes versions 1.32 or earlier { RespMetadata: { StatusCode: 400, RequestID: "462022b3-da9b-4d90-a2b9-2d12b9586de3" }, Message_: "AMI Type AL2_x86_64 is only supported for kubernetes versions 1.32 or earlier" }

Upstream Kubernetes for Bare Metal and VMs

Kubernetes v1.34 Support

Benefit

Stay current with the latest Kubernetes release, unlocking new features and ongoing support.

Support is being added for Kubernetes v1.34 for Rafay's Kubernetes distribution. This includes:

  • Provisioning of new clusters with v1.34
  • Upgrading existing clusters from earlier versions to v1.34

1.34 k8s version

Note

For more details on Kubernetes v1.34 support in Rafay MKS, refer to our blog post.

The following new Kubernetes patch versions have been added in this release:

  • v1.33.5
  • v1.32.9
  • v1.31.13

These patch versions include the latest bug fixes from upstream and are available for both new cluster provisioning and cluster upgrades from earlier minor or patch versions. Previous patch versions for v1.31, v1.32, and v1.33 have been removed for new cluster creation only.

v1.30.x EOL

Kubernetes v1.30.12 is End of Life and will soon be removed for new cluster creation. We recommend upgrading existing clusters from v1.30.12 to one of the supported versions listed above.

Enhanced Debug Capabilities

Benefit

Direct debug log download functionality has been added for upstream clusters, providing enhanced debugging capabilities. Users can now directly download logs from nodes to troubleshoot issues more effectively.

Previously, users had to SSH to the nodes to see these logs. Now, every node card includes an option to view and download debug logs.

Note

The debug log view/download functionality requires the node to be healthy and accessible on the network. This feature will not work for nodes that are down or not reachable. For more information, see Node Debug Log Connectivity Issues.

Debug Logs Option

Debug Logs download

For more information about debug logs, see Debug Logs.

RHEL 10 Support

Benefit

Support has been added for Red Hat Enterprise Linux (RHEL) 10 operating system. This allows customers to leverage RHEL 10 based nodes for Rafay MKS clusters.

RHEL 10

ARM Support Enhancement

Benefit

Complete ARM architecture support has been added for Ubuntu 24.04 LTS and Ubuntu 22.04 LTS on both master and worker nodes. This enables customers to leverage ARM-based infrastructure for their Kubernetes workloads.

Previously, ARM support covered only worker nodes, and only the minimal blueprint was available.

Platform Version v1.1.0

Benefit

New platform version v1.1.0 provides enhanced cluster management capabilities with integrated core components and automated utilities.

The new platform version v1.1.0 has been added. It integrates the Orchestration proxy core component (which proxies communication between the control agent on the cluster and the Rafay Controller) and the cluster utils version controller component. The cluster utils component includes utilities pushed to the nodes for self-healing, certificate rotation, and monitoring.

Changes to the cluster utils can now be pushed through platform version upgrades, enabling seamless updates and improved cluster maintenance. The cluster utils are also under version control.

v1.1.0

For more information about platform versioning, see Platform Versioning.

Automated Binary Cleanup on Cluster and Node Deletion

Benefit

Ensures complete cleanup of binaries and libraries installed during cluster provisioning, preventing leftover artifacts and reducing manual cleanup efforts.

When deleting a cluster or node, the platform now automatically runs conjurer -d to remove all binaries and libraries that were installed during cluster bring-up or conjurer execution. This enhancement ensures:

  • Cluster Deletion: During cluster deletion, conjurer -d is executed at the end of the process to remove all binaries installed on the nodes as part of cluster provisioning or conjurer runs.

  • Node Deletion: When deleting a node that is up and running, the platform removes the node and automatically runs conjurer -d on it to remove all libraries installed during cluster bring-up.

This automated cleanup process helps maintain clean infrastructure and prevents accumulation of leftover binaries and libraries on nodes.


RCTL Enhancement

Force Delete

Benefit

Allows recovery from failed cluster deletions without manual cleanup of the cluster object.

A new force delete option has been added to RCTL to force delete the cluster object if a previous delete failed for any reason.

Usage:

./rctl delete cluster <cluster-name> --force

Note

You need to download the new RCTL binary to use this flag.


Add-ons, Workloads & Blueprints

Cluster Overrides

Benefit

Improves usability and provides more flexibility for administrators.

Two key UI enhancements:

  • Custom Input for Managed Add-ons – The resource selector previously supported only drop-downs, limited to custom add-ons. Now, a free-form Custom Input option is available, enabling overrides for managed add-ons directly from the UI.

Cluster Overrides

  • Enhanced Placement UI – Placement configuration has been redesigned for clarity, especially in scenarios where admins create overrides for clusters across projects (e.g., using labels).

Cluster Overrides

Artifact Files

Benefit

Improves reliability of add-on deployments by automatically retrying artifact fetches before failing.

When artifact fetches (from Git or Helm repos) fail due to transient network issues, the system now retries automatically. This avoids the need to create new add-on versions solely for re-attempting artifact pulls.

Retries Overrides

Support for Kustomize Add-ons

Benefit

Simplifies management and deployment of add-ons by leveraging native Kustomize capabilities.

Support is being added for Kustomize. Kustomize is a natively supported Kubernetes tool that allows users to customize raw, template-free YAML files for multiple environments.

With this update, a new add-on type called "Kustomize" is being made available, in addition to the existing Helm 3 and K8s YAML options. The workflow remains consistent with the current experience for Helm 3 and YAML-based add-ons, providing users with a familiar and unified deployment model.

Kustomize
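
For reference, a minimal kustomization.yaml of the kind a Kustomize add-on would point at (file and label names are illustrative):

# kustomization.yaml - minimal example
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml   # plain, template-free manifests
  - service.yaml
commonLabels:
  app.kubernetes.io/managed-by: rafay   # label applied to all resources; illustrative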

Ability to Pull Artifacts from Remote URLs

Benefit

Provides more flexibility and reduces operational overhead by enabling direct deployment of Kubernetes manifests from external sources.

In some cases, customers want to deploy Kubernetes manifests directly from the internet without manually downloading and managing them in internal repositories. To support this, a new option — "Pull files from URL" — is being introduced.

In addition to the existing "Upload files manually" and "Pull files from repository" options, users can now specify a remote URL to automatically fetch and deploy artifacts, streamlining the workflow and minimizing manual steps.

Remote URL
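
A hypothetical spec fragment for the new option is shown below; the url field and the surrounding artifact structure are illustrative assumptions, not the documented schema:

# Hypothetical artifact block for "Pull files from URL"
artifact:
  type: Yaml
  options:
    url: https://raw.githubusercontent.com/example/app/main/manifest.yaml   # illustrative URL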


Environment Manager

Reconciliation on Environment Runs

Benefit

Improves operational efficiency and reduces deployment times.

Previously, redeploying an environment always redeployed all resources, even if unchanged. With selective resource reconciliation, admins can now specify the reconcile_resources field to redeploy only targeted resources.

Example: In a failover where only DNS updates are required, admins can reconcile the DNS resource without redeploying the entire environment. If no resources are specified, the system redeploys all resources by default.
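
A hypothetical sketch of a selective redeploy follows; only the reconcile_resources field name comes from this release, and the surrounding structure is an illustrative assumption:

# Hypothetical environment run payload
spec:
  action: deploy
  reconcile_resources:
    - dns-record   # redeploy only this resource; omit the field to redeploy everything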

Environment Deletion and Failure States

Benefit

Provides clearer lifecycle management and better visibility into environment operations.

Environments now support three actions: Deploy, Destroy, and Delete (which performs a Destroy followed by removal of the environment object).

Enhancements include:

  • New distinct status states
  • Ability to filter environments by Active, Inactive, Deploy Failed, and Delete Failed in both the Environments list and Dashboard
  • For environments in Delete Failed status, admins can review logs, clean up resources, and use a new remove object action to manually delete the environment

Namespace

Ephemeral Storage Resource Quota Limits

Benefit

Improves resource management and cost control through enforcement of ephemeral storage quotas at the namespace level.

A previous release introduced ephemeral storage limits as namespace quotas via non-UI interfaces. This has now been extended to the UI, making configuration easier.

Storage Quotas
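
Namespace quotas of this kind correspond to standard Kubernetes ResourceQuota objects; a minimal equivalent manifest (namespace and values are illustrative):

apiVersion: v1
kind: ResourceQuota
metadata:
  name: ephemeral-storage-quota
  namespace: demo                     # illustrative namespace
spec:
  hard:
    requests.ephemeral-storage: 2Gi   # cap on total ephemeral storage requested by pods
    limits.ephemeral-storage: 4Gi     # cap on total ephemeral storage limits across pods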


Cost Management

Chargeback Reports

Benefit

Enables metadata-enriched chargeback reports for better visibility and more accurate cost allocation across tenants.

An earlier release introduced Chargeback summary reports aggregated by namespace. These now support custom label-based metadata enrichment, enabling more precise chargeback reporting for multi-tenant clusters. This capability is also available through the UI.

Chargeback
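
As an illustration, label-based enrichment can key off Kubernetes namespace labels such as the following (the label keys are examples, not required names):

apiVersion: v1
kind: Namespace
metadata:
  name: team-analytics
  labels:
    cost-center: "cc-1042"           # example label used to attribute costs
    business-unit: "data-platform"   # example label for grouping in reports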


Agents

PV Requirement

Benefit

Agents no longer require a Persistent Volume (PV), simplifying deployment and reducing infrastructure overhead.

In previous releases, agents required a PV. Starting with agents v3.7, this requirement has been removed, allowing for more flexible and streamlined deployments.


OCI Helm Repository

Chart Version Support Enhancement

Benefit

Enhanced compatibility with OCI Helm repositories by supporting chart versions containing special characters such as "_" or "+", ensuring seamless integration with these versioning schemes.

An enhancement has been added to the OCI Helm repository functionality to handle chart versions that include "_" (underscore) or "+" (plus) characters in the version scheme. This ensures that such versions work seamlessly with the platform's Helm repository integration.

Previously, chart versions containing these special characters caused issues during chart deployment. With this enhancement, the platform properly handles and processes these version formats without compatibility issues.


Template Catalog

System Templates

AKS System Template

Benefit

Ready-to-use templates for complete cluster lifecycle management (LCM), enabling self-service consumption of managed Kubernetes compute.

The AKS System Template is now available in the Template Catalog.

For more information, see AKS Cluster Template.

GKE Autopilot Template

Benefit

Ready-to-use templates for complete cluster lifecycle management (LCM), enabling self-service consumption of managed Kubernetes compute.

The GKE Autopilot Template is now available in the Template Catalog.

For more information, see GKE Autopilot Template.

Template Catalog Update

Benefit

Provides organizations with selective access to specialized AI/ML and data processing templates in their template catalog.

The following templates will be selectively available in an Org's template catalog:

  • vLLM Inference on K8s
  • vCluster with KubeRay
  • KubeRay
  • Kubeflow on Private Cloud
  • Kubeflow Profile Management
  • Kubeflow on GCP
  • Jupyter Notebook

Terraform Provider

v1.1.52 - Terraform Provider

17 Oct, 2025

This update of the Terraform provider includes the following improvements and bug fixes.

Enhancements

The following resources have been updated with additional fields and enhanced functionality:

1. rafay_mks_cluster - Platform Version Update

Benefit

Enhanced cluster management capabilities with integrated core components and automated utilities.

The platform_version field has been updated to include platform version v1.1.0, which incorporates the following cluster core components:

  • cluster-utils – Utilities for self-healing, certificate rotation, and monitoring
  • Orchestration proxy – Proxies communication between the control agent and the Rafay Controller

Core components are now version controlled and their upgrade lifecycle is managed through this platform version flow.

For more information, see Platform Version v1.1.0.

2. rafay_import_cluster - Proxy Configuration Support

Benefit

Enables cluster import through proxy environments, improving connectivity options for enterprise deployments.

Added support for proxy configuration through a new proxy_config field, allowing users to leverage their proxy infrastructure when importing clusters to the Rafay platform.


v1.1.53 - Terraform Provider

This update of the Terraform provider includes the following improvements and bug fixes.

Enhancements

The following enhancements have been added in this release:

  • rafay_addon and rafay_workload: Enhanced with Web YAML and Kustomize support. Example references are available in the documentation.

  • Credential Fetching: Enhanced to support fetching credentials, such as the API key, from secret management tools like Vault. For example, you can configure the provider to fetch credentials from Vault using:

provider "rafay" {
  api_key       = data.vault_kv_secret_v2.rafay.data.api_key
  rest_endpoint = data.vault_kv_secret_v2.rafay.data.endpoint
  project       = data.vault_kv_secret_v2.rafay.data.project
}

For more information, see the provider configuration documentation on direct credentials.


Bug Fixes

Bug ID   | Component         | Description
RC-44769 | RCTL Cluster      | Resolved an issue where the RCTL command for listing v3 clusters did not work as expected.
RC-43141 | Env Manager       | Resolved an issue where environment input variables were incorrectly reverted to their previous values.
RC-42554 | Workloads         | Resolved an issue where values.yaml in a Workload Template Artifact did not update to the latest version unless the template's Chart version was changed.
RC-40894 | Backup & Restore  | Resolved an issue that prevented deletion of the backup agent in certain edge-case scenarios.
RC-43576 | Zero-Trust Access | The relay agent no longer requires deployment of privileged containers.
RC-44366 | Workloads         | Fixed an issue where listing v3 workloads incorrectly included non-v3 workloads.
RC-44894 | ZTKA              | Fixed an issue where the relay-agent service would restart when handling a high volume of requests. This fix requires a blueprint sync and a base blueprint version of 3.7 or higher.
RC-44915 | Upstream K8s      | Fixed an issue where Day 2 master node addition failed after an upgrade from Kubernetes 1.32 to 1.33.
RC-44692 | Upstream K8s      | Fixed an issue where node addition failed when nodes were in different subnets while using the Calico CNI.

Note

To leverage the RCTL fixes, please download the latest RCTL binary.