

Managing Environments at Scale with Fleet Plans

As organizations scale their cloud infrastructure, managing dozens or even hundreds of environments becomes increasingly complex. Whether you are rolling out security patches, updating configuration variables, or deploying new template versions, performing these operations manually on each environment is time-consuming, error-prone, and simply unsustainable.

Fleet Plans solve this challenge by eliminating the need to manage environments individually: they enable bulk operations across multiple environments in parallel.

Fleet Plans General Flow

Fleet Plans provide a streamlined workflow for managing multiple environments at scale, enabling bulk operations with precision and control.

Note: Fleet Plans currently support day 2 operations only, focusing on managing and updating existing environments rather than initial provisioning.

Granular Control of Your EKS Auto Mode Managed Nodes with Custom Node Classes and Node Pools

A couple of releases back, we added EKS Auto Mode support to our platform, covering both quick configuration and custom configuration. In this blog, we will explore how you can create an EKS cluster using quick configuration, and then dive deep into creating custom node classes and node pools and deploying them to EKS Auto Mode enabled clusters via add-ons.
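To make the custom configuration path concrete, below is a minimal sketch of a custom NodeClass and NodePool that could be packaged as an add-on and deployed to an Auto Mode enabled cluster. Resource names, tags, and selector values are illustrative placeholders; the API groups follow EKS Auto Mode conventions (eks.amazonaws.com for NodeClass, karpenter.sh for NodePool).

```yaml
# Illustrative sketch only: names, tags, and selector values are placeholders.
apiVersion: eks.amazonaws.com/v1
kind: NodeClass
metadata:
  name: custom-nodeclass
spec:
  # Subnets and security groups the Auto Mode managed nodes should use
  subnetSelectorTerms:
    - tags:
        Name: "private-subnet-*"
  securityGroupSelectorTerms:
    - tags:
        kubernetes.io/cluster/demo-auto-mode-cluster: owned
  ephemeralStorage:
    size: 100Gi
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: custom-nodepool
spec:
  template:
    spec:
      nodeClassRef:
        group: eks.amazonaws.com
        kind: NodeClass
        name: custom-nodeclass
      requirements:
        # Constrain the instance shapes and capacity type for this pool
        - key: eks.amazonaws.com/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
  limits:
    cpu: "64"
```

Once applied, for example through a cluster add-on in a Blueprint, nodes for pending pods are provisioned from this pool using only the subnets, security groups, and instance shapes defined above.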

Dynamic Resource Allocation for GPU Allocation on Rafay's MKS (Kubernetes 1.34)

This blog demonstrates how to leverage Dynamic Resource Allocation (DRA) for efficient GPU allocation using Multi-Instance GPU (MIG) strategy on Rafay's Managed Kubernetes Service (MKS) running Kubernetes 1.34.

In our previous blog series, we covered various aspects of Dynamic Resource Allocation (DRA) in Kubernetes:

DRA is GA in Kubernetes 1.34

With Kubernetes 1.34, Dynamic Resource Allocation (DRA) is Generally Available (GA) and enabled by default on MKS clusters. This means you can immediately start using DRA features without additional configuration.
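As a quick preview of what this looks like in practice, here is a minimal sketch of a ResourceClaimTemplate and a Pod that consumes it. It assumes the NVIDIA DRA driver is installed and exposes a MIG device class (the mig.nvidia.com class name and the container image tag are illustrative); the field names follow the resource.k8s.io/v1 API that went GA in Kubernetes 1.34.

```yaml
# Illustrative sketch: assumes the NVIDIA DRA driver publishes a MIG device class.
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: mig-gpu-template
spec:
  spec:
    devices:
      requests:
        - name: mig-slice
          exactly:
            deviceClassName: mig.nvidia.com   # device class published by the DRA driver
            count: 1
---
apiVersion: v1
kind: Pod
metadata:
  name: mig-workload
spec:
  resourceClaims:
    - name: gpu
      resourceClaimTemplateName: mig-gpu-template
  containers:
    - name: cuda-smoke-test
      image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04   # illustrative image tag
      command: ["nvidia-smi", "-L"]
      resources:
        claims:
          - name: gpu   # binds the container to the claim declared above
```

With a template-based claim like this, a matching MIG slice is allocated when the Pod is scheduled and released when the Pod terminates.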

Prerequisites

Before we begin, ensure you have:

  • A Rafay MKS cluster running Kubernetes 1.34 (see MKS v1.34 Blog)
  • GPU nodes with compatible NVIDIA GPUs (A100, H100, or similar MIG-capable GPUs)
  • Container Device Interface (CDI) enabled (automatically enabled in MKS for Kubernetes 1.34)
  • Basic understanding of Dynamic Resource Allocation concepts (covered in our previous blog series)
  • Active Rafay account with appropriate permissions to manage MKS clusters and addons

Kubernetes v1.34 for Rafay MKS

As part of our continuous effort to bring the latest Kubernetes versions to our users, support for Kubernetes v1.34 will be added soon to the Rafay Operations Platform for MKS cluster types.

Both new cluster provisioning and in-place upgrades of existing clusters will be supported. As with most Kubernetes releases, this version also deprecates and removes a number of features. To ensure zero impact to our customers, we have validated every feature of the Rafay Kubernetes Operations Platform on this Kubernetes version. Support will be promoted from Preview to Production in a few days and made available to all customers.

Kubernetes v1.34 Release

Upstream Kubernetes on RHEL 10 using Rafay

Our upcoming release will add a number of new features and enhancements. This blog focuses on the upcoming support for Upstream Kubernetes on nodes based on Red Hat Enterprise Linux (RHEL) v10.0. Both new cluster provisioning and in-place upgrades of existing Kubernetes clusters will be supported for lifecycle management.

RHEL 9.2

Introducing Platform Version with Rafay MKS clusters

Our upcoming release introduces support for a number of new features and enhancements. One such enhancement is the introduction of Platform Versioning for Rafay MKS clusters, a major feature in our v3.5 release. This new capability is designed to simplify and standardize the upgrade lifecycle of critical components in upstream Kubernetes clusters managed by Rafay MKS.

Why Platform Version?

Upgrading Kubernetes clusters is essential, but core components such as etcd, the CRI, and Salt Minion also require regular updates for:

  • Security patches
  • Compatibility with new Kubernetes features
  • Performance improvements

Platform Versioning introduces a structured, reliable, and repeatable upgrade path for these foundational components, reducing risk and operational overhead.

What is a Platform Version?

A Platform Version defines a tested and validated set of component versions that can be safely upgraded together. This ensures compatibility and stability across your clusters.

We are introducing v1.0.0 as the very first Platform Version for new clusters. This version includes:

  • CRI: v2.0.4
  • etcd: v3.5.21
  • Salt Minion: v3006.9

Note

For existing clusters, the initial platform version will be shown as v0.1.0, which is assigned for reference purposes to older clusters that were created before platform versioning was introduced. Please perform the upgrade to v1.0.0 during scheduled downtime, as it involves updates to core components such as etcd and CRI.

How Does Platform Versioning Work?

You can upgrade the Platform Version in two ways:

  • During a Kubernetes version upgrade
  • As a standalone platform upgrade

This flexibility allows you to keep your clusters secure and up to date, regardless of your Kubernetes upgrade schedule.

Platform Version

Controlled and Responsive Update Cadence

Platform Versions are not released frequently. New versions are published only when:

  • A high severity CVE or vulnerability is addressed
  • A major performance or compatibility feature is introduced
  • There are significant version changes in core components

This approach ensures that upgrades are meaningful and necessary, minimizing disruption.

Whenever a new Platform Version is released, existing clusters can seamlessly upgrade to the latest version, ensuring they benefit from the latest security patches and improvements without manual intervention.

Evolving Platform Versions and Expanding Coverage

We are committed to continuously improving Platform Versioning. In future releases, we will introduce new platform versions that expand its scope by including more critical components. For this initial release, we started with three foundational components, etcd, CRI, and Salt Minion, because of their critical importance to cluster stability. Over time, we will enhance Platform Versioning to cover additional components, ensuring your clusters remain robust, secure, and up to date.

Platform Version Documentation

For detailed documentation, see: Platform Version Docs

In Summary

Platform Versioning makes it easier than ever to keep your clusters current and secure by managing the upgrade lifecycle of foundational components like etcd, CRI, and Salt Minion.

Whether you apply it alongside a Kubernetes version bump or independently, Platform Versioning ensures your infrastructure remains stable, secure, and optimized now and in the future.

Kubernetes v1.33 for Rafay MKS

As part of our upcoming May release, alongside other enhancements and features, we are adding support for Kubernetes v1.33 with Rafay MKS (i.e., upstream Kubernetes for bare metal and VM-based environments).

Both new cluster provisioning and in-place upgrades of existing clusters are supported. As with most Kubernetes releases, v1.33 deprecates and removes several features. To ensure zero impact to our customers, we have validated every feature of the Rafay Kubernetes Operations Platform on this Kubernetes version.

Kubernetes v1.33 Release

Using Hubble and Cilium with Rafay MKS based Kubernetes Cluster for Data Centers

In our first blog about Hubble for Cilium, we reviewed a real-life example highlighting where traditional monitoring tools fall short. We then looked at how Hubble + Cilium can address these gaps. In the second blog, we discussed how Rafay provides our customers with a tight, turnkey integration with Cilium for various cluster types (i.e., Rafay MKS for Data Centers and Public Cloud Distributions such as Amazon EKS).

In this get started guide, we will review how a platform engineer can configure, deploy and use Hubble for Cilium on a Rafay MKS Kubernetes cluster operating in a data center (aka on-premises environment). The three high level steps are:

  1. Provision an Upstream Kubernetes Cluster in your data center using Rafay MKS
  2. Configure and Deploy Cilium CNI as a software add-on in a Cluster Blueprint (i.e., Bring Your Own CNI); a sample Hubble configuration is sketched after this list
  3. Use Hubble to observe network flows
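
For step 2, the Cilium add-on in the Cluster Blueprint is typically configured through Helm values. A minimal sketch of values that enable Hubble, the Hubble relay, and the Hubble UI is shown below; exact keys can vary slightly across Cilium chart versions.

```yaml
# Illustrative Helm values for the Cilium add-on; adjust for your chart version.
hubble:
  enabled: true
  relay:
    enabled: true        # aggregates flow data from every node
  ui:
    enabled: true        # web UI for exploring network flows
  metrics:
    enabled:
      - dns
      - drop
      - tcp
      - flow
```

With these values in place, network flows can be inspected from the Hubble UI or the hubble CLI once the blueprint is applied to the cluster.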

Hubble Intro

Upstream Kubernetes on Flatcar Linux using Rafay

This blog is Part 3 of our series on Flatcar Linux and Kubernetes.

  • In Part 1, we introduced Flatcar Linux and why it is a great fit for Kubernetes.
  • In Part 2, we covered how to install a Flatcar instance locally.
  • In this Part 3, we focus on deploying and managing Upstream Kubernetes on Flatcar Linux using Rafay MKS.

Our upcoming February release will introduce a number of new features and enhancements. We will write about these in separate blogs. This blog is focused on support for Rafay MKS based Upstream Kubernetes on nodes running Flatcar Linux. The Rafay platform enables users to seamlessly provision new clusters and perform in-place upgrades of Kubernetes clusters, simplifying lifecycle management.

For more details on Flatcar Linux, visit the official Flatcar Linux website.

Flatcar Logo


Provision Cluster

Rafay MKS based Upstream Kubernetes clusters can be configured and provisioned on Flatcar Linux using all supported interfaces, i.e.:

  • Web Console
  • API
  • CLI (declarative spec)
  • GitOps
  • Rafay Terraform/OpenTofu Provider

In this blog, we will demonstrate this using the web console and the Rafay RCTL CLI.
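
As a preview of the CLI path, below is a minimal sketch of the declarative flow. The spec skeleton is illustrative only (field names and values are placeholders rather than the complete Rafay MKS cluster schema); the cluster is then applied with the RCTL CLI.

```yaml
# flatcar-cluster.yaml -- illustrative skeleton; field names and values are
# placeholders and do not reflect the complete Rafay MKS cluster spec.
apiVersion: infra.k8smgmt.io/v3
kind: Cluster
metadata:
  name: flatcar-mks-cluster
  project: defaultproject
spec:
  type: mks
  blueprint:
    name: minimal
# Apply the spec with the RCTL CLI, for example:
#   rctl apply -f flatcar-cluster.yaml
```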