v1.1.52 - Terraform Provider

17 Oct, 2025

This update of the Terraform provider includes the following improvements and bug fixes.

Enhancements

The following resources have been updated with additional fields and enhanced functionality:

1. rafay_mks_cluster - Platform Version Update

Benefit

Enhanced cluster management capabilities with integrated core components and automated utilities.

The platform_version field has been updated to include platform version v1.1.0, which incorporates cluster core components including:

  • cluster-utils - Utilities for self-healing, certificate rotation, and monitoring
  • Orchestration proxy - Proxies communication between the control agent and Rafay Controller

Core components are now version controlled and their upgrade lifecycle is managed through this platform version flow.

For more information, see Platform Version v1.1.0.

2. rafay_import_cluster - Proxy Configuration Support

Benefit

Enables cluster import through proxy environments, improving connectivity options for enterprise deployments.

Added support for proxy configuration through a new proxy_config field, allowing users to leverage their proxy infrastructure when importing clusters to the Rafay platform.


v3.7 - SaaS

15 Oct, 2025


Amazon EKS & Azure AKS

Kubernetes v1.33 Support

Benefit

Stay current with the latest Kubernetes release, unlocking new features and ongoing support.

Support for Kubernetes v1.33 is being added for Amazon EKS and Azure AKS. This includes:

  • Provisioning of new clusters with v1.33
  • Upgrading existing clusters from earlier versions to v1.33

AKS 1.33 Cluster Provisioning

AKS 1.33 Cluster Upgrade

EKS 1.33 Cluster Provisioning

AmazonLinux2 Deprecation in EKS 1.33

AWS has deprecated AmazonLinux2 in EKS 1.33. New clusters created with v1.33 use AmazonLinux2023 as the default node AMI family.

For more information, see the AWS documentation on Amazon Linux 2 AMI deprecation.

For existing clusters using AmazonLinux2:

Before upgrading to EKS 1.33, you must:

  1. Add a new node group based on AmazonLinux2023
  2. Test and validate the new node group with your application pods
  3. Migrate pods from the old node group to the new node group
  4. Delete the old node group after migration is complete
  5. Then proceed with the EKS 1.33 upgrade

Important: If any existing EKS cluster with AmazonLinux2 is upgraded to EKS 1.33 directly without adding node groups based on AmazonLinux2023, the node group upgrade will fail with the following error:

internal error: failed to get aws session, InvalidParameterException: AMI Type AL2_x86_64 is only supported for kubernetes versions 1.32 or earlier { RespMetadata: { StatusCode: 400, RequestID: "462022b3-da9b-4d90-a2b9-2d12b9586de3" }, Message_: "AMI Type AL2_x86_64 is only supported for kubernetes versions 1.32 or earlier" }
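The pre-upgrade requirement above can be checked before an upgrade is attempted. The following is a minimal Python sketch, not part of Rafay or AWS tooling: the `NodeGroup` type and the `blocking_node_groups` helper are hypothetical, and the node group data is assumed to be supplied by the caller. It flags node groups whose AMI type starts with `AL2_` (as in the `AL2_x86_64` value from the error above) when the target version is 1.33 or later.

```python
from dataclasses import dataclass


@dataclass
class NodeGroup:
    """Hypothetical, simplified view of an EKS node group."""
    name: str
    ami_type: str  # e.g. "AL2_x86_64" or "AL2023_x86_64_STANDARD"


def parse_minor(version: str) -> tuple:
    """Turn "1.33", "v1.33" or "1.33.0" into (1, 33)."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def blocking_node_groups(node_groups, target_version):
    """Return names of node groups that would block the upgrade.

    Assumption: AmazonLinux2 AMI types are identified by the "AL2_" prefix.
    AmazonLinux2023 types start with "AL2023_" and therefore do not match.
    """
    if parse_minor(target_version) < (1, 33):
        return []
    return [ng.name for ng in node_groups if ng.ami_type.startswith("AL2_")]


if __name__ == "__main__":
    groups = [
        NodeGroup("old-ng", "AL2_x86_64"),
        NodeGroup("new-ng", "AL2023_x86_64_STANDARD"),
    ]
    blockers = blocking_node_groups(groups, "1.33")
    if blockers:
        print("Migrate these node groups before upgrading:", blockers)
```

An empty result means no AmazonLinux2 node groups remain, which corresponds to step 5 in the list above.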

Upstream Kubernetes for Bare Metal and VMs

Enhanced Debug Capabilities

Benefit

Direct debug log download functionality has been added for upstream clusters, providing enhanced debugging capabilities. Users can now directly download logs from nodes to troubleshoot issues more effectively.

Previously, users had to SSH into the nodes to see these logs. Now, every node card includes an option to view or download debug logs.

Debug Logs Option

Debug Logs download

For more information about debug logs, see Debug Logs.

RHEL 10 Support

Benefit

Support has been added for Red Hat Enterprise Linux (RHEL) 10 operating system. This allows customers to leverage RHEL 10 based nodes for Rafay MKS clusters.

RHEL 10

ARM Support Enhancement

Benefit

Complete ARM architecture support has been added for Ubuntu 24.04 LTS and Ubuntu 22.04 LTS on both master and worker nodes. This enables customers to leverage ARM-based infrastructure for their Kubernetes workloads.

Previously, ARM was supported only on worker nodes, and only minimal blueprint support was available.

Platform Version v1.1.0

Benefit

New platform version v1.1.0 provides enhanced cluster management capabilities with integrated core components and automated utilities.

Platform version v1.1.0 has been added. It integrates two core components: the Orchestration proxy, which proxies communication between the control agent on the cluster and the Rafay Controller, and the cluster utils component, which includes utilities pushed to the nodes for self-healing, certificate rotation, and monitoring.

The cluster utils are now version controlled, so changes to them can be delivered through a platform version upgrade. This enables seamless updates and improves cluster maintenance.

v1.1.0

For more information about platform versioning, see Platform Versioning.


RCTL Enhancement

Force Delete

Benefit

A new force delete option has been added to RCTL. It force-deletes the cluster object when a regular delete has failed for any reason.

Usage:

./rctl delete cluster <cluster-name> --force

Note

You need to download the new RCTL binary to use this flag.


Add-ons, Workloads & Blueprints

Cluster Overrides

Benefit

Improves usability and provides more flexibility for administrators.

Two key UI enhancements:

  • Custom Input for Managed Add-ons – Resource selector previously only supported drop downs for custom add-ons. Now, a free-form Custom Input option is available, enabling overrides for managed add-ons directly from the UI.

Cluster Overrides

  • Enhanced Placement UI – Placement configuration has been redesigned for clarity, especially in scenarios where admins create overrides for clusters across projects (e.g., using labels).

Cluster Overrides

Artifact Files

Benefit

Improves reliability of add-on deployments by automatically retrying artifact fetches before failing.

When artifact fetches (from Git or Helm repos) fail due to transient network issues, the system now retries automatically. This avoids the need to create new add-on versions solely for re-attempting artifact pulls.

Retries Overrides
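The retry behavior described above follows a common pattern: retry a fetch on transient errors and fail only after a bounded number of attempts. The following minimal Python sketch illustrates that general pattern. It is not the platform's implementation; `TransientFetchError`, `fetch_with_retries`, the attempt count, and the exponential backoff delays are all assumptions made for illustration.

```python
import time


class TransientFetchError(Exception):
    """Hypothetical error type for a recoverable network failure."""


def fetch_with_retries(fetch, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or max_attempts is exhausted.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
    The last TransientFetchError is re-raised if every attempt fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except TransientFetchError:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_fetch():
        # Simulated artifact pull that fails twice, then succeeds.
        state["calls"] += 1
        if state["calls"] < 3:
            raise TransientFetchError("temporary network issue")
        return "artifact-bytes"

    print(fetch_with_retries(flaky_fetch, base_delay=0.01))
```

Passing `sleep` as a parameter keeps the sketch easy to test without real delays.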


Environment Manager

Reconciliation on Environment Runs

Benefit

Improves operational efficiency and reduces deployment times.

Previously, redeploying an environment always redeployed all resources, even if unchanged. With selective resource reconciliation, admins can now specify the reconcile_resources field to redeploy only targeted resources.

Example: In a failover where only DNS updates are required, admins can reconcile the DNS resource without redeploying the entire environment. If no resources are specified, the system redeploys all resources by default.
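The selection rule described above (redeploy only the listed resources, or everything when none are listed) can be sketched in a few lines of Python. This is an illustration only: the `resources_to_redeploy` function and the resource names are hypothetical, and the sketch assumes `reconcile_resources` is a list of resource names, which the release notes do not specify.

```python
def resources_to_redeploy(all_resources, reconcile_resources=None):
    """Pick which environment resources a run should redeploy.

    If reconcile_resources is empty or None, every resource is redeployed
    (the default behavior). Otherwise only the named resources are selected,
    in the order they appear in the environment.
    """
    if not reconcile_resources:
        return list(all_resources)
    unknown = [r for r in reconcile_resources if r not in all_resources]
    if unknown:
        raise ValueError(f"unknown resources: {unknown}")
    wanted = set(reconcile_resources)
    return [r for r in all_resources if r in wanted]


if __name__ == "__main__":
    environment = ["vpc", "cluster", "dns"]  # hypothetical resource names
    print(resources_to_redeploy(environment, ["dns"]))  # failover: DNS only
    print(resources_to_redeploy(environment))           # default: everything
```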

Environment Deletion and Failure States

Benefit

Provides clearer lifecycle management and better visibility into environment operations.

Environments now support three actions: Deploy, Destroy, and Delete (which performs a Destroy followed by removal of the environment object).

Enhancements include:

  • New distinct status states
  • Ability to filter environments by Active, Inactive, Deploy Failed, and Delete Failed in both the Environments list and Dashboard
  • For environments in Delete Failed status, admins can review logs, clean up resources, and use a new remove object action to manually delete the environment

Namespace

Ephemeral Storage Resource Quota Limits

Benefit

Improved resource management and cost control through enforcement of ephemeral storage quotas at the namespace level.

A previous release introduced ephemeral storage limits as namespace quotas via non-UI interfaces. This has now been extended to the UI, making configuration easier.

Storage Quotas


Cost Management

Chargeback Reports

Benefit

Enables metadata-enriched chargeback reports for better visibility and more accurate cost allocation across tenants.

An earlier release introduced chargeback summary reports aggregated by namespace. These reports now support custom label-based metadata enrichment, enabling more precise chargeback reporting for multi-tenant clusters. This capability is also available through the UI.

Chargeback


Agents

PV Requirement

Benefit

Agents no longer require a Persistent Volume (PV), simplifying deployment and reducing infrastructure overhead.

In previous releases, agents required a PV. Starting with agents v3.7+, this requirement has been removed, allowing for more flexible and streamlined deployments.


OCI Helm Repository

Chart Version Support Enhancement

Benefit

Enhanced compatibility with OCI Helm repositories by supporting chart versions that contain special characters such as "_" or "+", ensuring seamless integration with these versioning schemes.

An enhancement has been added to the OCI Helm repository functionality to handle chart versions that include "_" (underscore) or "+" (plus) characters in the version scheme. This ensures that such versions work seamlessly with the platform's Helm repository integration.

Previously, chart versions containing these special characters caused issues during chart deployment. With this enhancement, the platform handles these version formats correctly.
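For background on why these two characters matter: OCI tags cannot contain "+", so Helm stores a chart version such as `1.2.3+build.5` under a tag with "+" replaced by "_", and reverses the substitution when the chart is pulled. The Python sketch below illustrates that Helm convention only. It is not a description of how the platform implements this fix, and the function names are hypothetical.

```python
def chart_version_to_oci_tag(version: str) -> str:
    """Map a chart version to an OCI-safe tag ("+" is not allowed in tags)."""
    return version.replace("+", "_")


def oci_tag_to_chart_version(tag: str) -> str:
    """Reverse the mapping when reading a tag back as a chart version."""
    return tag.replace("_", "+")


if __name__ == "__main__":
    version = "1.2.3+build.5"  # hypothetical chart version with build metadata
    tag = chart_version_to_oci_tag(version)
    print(tag)
    print(oci_tag_to_chart_version(tag))
```

Because the reverse mapping turns every "_" into "+", a client that compares tags and versions without applying this convention can fail to match them.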


Template Catalog Update

Benefit

Gives organizations access to specialized AI/ML and data processing templates.

The following templates will be selectively available in the Org template catalog:

  • vLLM Inference on K8s
  • vCluster with KubeRay
  • KubeRay
  • Kubeflow on Private Cloud
  • Kubeflow Profile Management
  • Kubeflow on GCP
  • Jupyter Notebook

Bug Fixes

Bug ID | Component | Description
RC-43141 | Env Manager | Resolved an issue where environment input variables were incorrectly reverted to their previous values
RC-42554 | Workloads | Resolved an issue where values.yaml in a Workload Template Artifact did not update to the latest version unless the template’s Chart version was changed
RC-40894 | Backup & Restore | Resolved an issue that prevented deletion of the backup agent in certain edge-case scenarios
RC-43576 | Zero-Trust Access | The relay agent no longer requires deployment of privileged containers

Known issues

Bug ID | Component | Description
RC-44765 | Agent | Issue occurs when an agent pool is specified in the environment configuration