
Naveen Chakrapani

Introducing "Schedules" on the Rafay Platform: Simplifying Cost Optimization and Compliance for Platform Teams

Platform teams today are increasingly tasked with balancing cost efficiency, compliance, and operational agility across complex cloud environments. Cost-optimization measures and compliance-related tasks are critical, yet executing them consistently and effectively across environments can be challenging.

With the recent introduction of the “Schedules” capability on the Rafay Platform, platform teams can now orchestrate one-time or recurring actions across environments in a standardized, centralized manner. This new feature enables teams to implement cost-saving policies, manage compliance actions, and ensure operational efficiency—all from a single interface. Here’s a closer look at how this feature can streamline your workflows and add value to your platform operations.
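To illustrate the kind of recurring, cost-saving action a schedule might drive, the sketch below uses the croniter library to evaluate a cron expression and decide whether a "scale down dev environments overnight" action is due. The cron expression, cluster names, and scale_down function are hypothetical placeholders for illustration; this is not the Schedules API itself.

```python
# Illustrative only: a generic recurring "cost-saving action" driven by a cron
# expression. The schedule, cluster names, and scale_down() are hypothetical;
# this is not Rafay's Schedules API.
from datetime import datetime, timedelta
from croniter import croniter  # pip install croniter

DEV_CLUSTERS = ["dev-us-east-1", "dev-eu-west-1"]   # hypothetical cluster names
SCALE_DOWN_SCHEDULE = "0 20 * * 1-5"                # 8 PM on weekdays

def scale_down(cluster: str) -> None:
    # Placeholder for whatever action the schedule triggers
    # (e.g. scaling dev node groups down for the night).
    print(f"Scaling down {cluster} to save cost overnight")

def run_if_due(now: datetime, window: timedelta = timedelta(minutes=5)) -> None:
    """Fire the action if the cron schedule came due within the current window."""
    next_run = croniter(SCALE_DOWN_SCHEDULE, now - window).get_next(datetime)
    if now - window <= next_run <= now:
        for cluster in DEV_CLUSTERS:
            scale_down(cluster)

run_if_due(datetime.now())
```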


Enhancing Security and Compliance in Break Glass Workflows with Rafay

Maintaining stringent security and compliance standards is more critical than ever. Implementing break glass workflows for developers presents unique challenges that require careful consideration to prevent unauthorized access and ensure regulatory compliance.

In the previous blog, we introduced the concept of break glass workflows and why organizations require them. This blog post delves into how Rafay enables Platform teams to orchestrate secure and compliant break glass workflows within their organizations. Watch a video recording of this feature in Rafay.

Declarative configuration for Cluster Overrides

Cluster overrides

By default, K8s objects require certain values to be set inside their specs that match the cluster's configuration. If this were done within the add-on (or workload) manifest, many duplicate add-ons (or workloads) would need to be created for a fleet of clusters. To mitigate this, the platform supports cluster overrides. These allow the customer to use a single add-on (or workload) org-wide and dynamically inject values into a manifest as it is being deployed to the cluster (see the sketch after the examples below).

Examples include:

  • Use of a different license key for a security tool based on the business unit

  • Configuration of different resource requests for a monitoring tool based on environment type (test or prod)

  • Dynamic configuration of cluster name during deployment of a load balancer (e.g. AWS Load Balancer)
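As a minimal sketch of the underlying idea, outside of Rafay: keep one manifest template org-wide and inject per-cluster values (such as the cluster name or resource requests) at deploy time. The template, the per-cluster value map, and the render function below are hypothetical illustrations of the concept, not the platform's override syntax.

```python
# Illustrative sketch of dynamic value injection into a shared manifest.
# The template and per-cluster values are hypothetical; Rafay's cluster
# overrides express the same idea declaratively within the platform.
from string import Template

HELM_VALUES_TEMPLATE = Template("""\
clusterName: $cluster_name
resources:
  requests:
    cpu: $cpu_request
    memory: $memory_request
""")

# One shared add-on definition, different values per cluster/environment.
CLUSTER_VALUES = {
    "prod-us-east-1": {"cluster_name": "prod-us-east-1",
                       "cpu_request": "500m", "memory_request": "512Mi"},
    "test-us-east-1": {"cluster_name": "test-us-east-1",
                       "cpu_request": "100m", "memory_request": "128Mi"},
}

def render_values(cluster: str) -> str:
    """Render the shared template with values for a specific cluster."""
    return HELM_VALUES_TEMPLATE.substitute(CLUSTER_VALUES[cluster])

print(render_values("test-us-east-1"))
```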

In-place Upgrades to Amazon EKS v1.28 Clusters using Rafay

In our recent release, we added support for in-place upgrades of EKS clusters based on Kubernetes v1.28.

Our customers have told us that they would like to provision new EKS clusters on the latest Kubernetes version so that they do not have to plan and schedule Kubernetes upgrades for these clusters right away. As a result, we generally introduce support for provisioning new clusters on the new Kubernetes version first and then follow up with support for zero-touch in-place upgrades.

Note

Organizations that wish to perform sophisticated pre-upgrade checks (e.g. for deprecated or removed APIs) are strongly encouraged to use Rafay's Fleet Operations for Amazon EKS.
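For a rough sense of what such a pre-upgrade check involves, the sketch below scans local manifest files for apiVersions that were removed in earlier Kubernetes releases. The deprecation map is a small, hand-maintained illustration (not exhaustive) and the directory path is a placeholder; it is not a substitute for Fleet Operations.

```python
# Minimal sketch of a pre-upgrade check for removed Kubernetes APIs.
# The REMOVED_APIS map is a small illustrative subset, not an exhaustive list.
import glob
import yaml  # pip install pyyaml

REMOVED_APIS = {
    ("policy/v1beta1", "PodSecurityPolicy"): "removed in v1.25",
    ("autoscaling/v2beta2", "HorizontalPodAutoscaler"): "removed in v1.26",
}

def find_removed_apis(manifest_dir: str) -> list[str]:
    """Report manifests that still use API versions removed in recent releases."""
    findings = []
    for path in glob.glob(f"{manifest_dir}/**/*.yaml", recursive=True):
        with open(path) as f:
            for doc in yaml.safe_load_all(f):
                if not doc:
                    continue
                key = (doc.get("apiVersion"), doc.get("kind"))
                if key in REMOVED_APIS:
                    findings.append(f"{path}: {key[1]} {key[0]} ({REMOVED_APIS[key]})")
    return findings

for finding in find_removed_apis("./manifests"):  # placeholder directory
    print(finding)
```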

Rightsizing exercises with Cost Explorer

As organizations increase their K8s footprint and onboard more applications, it becomes critical to have a unified (cross-account, cross-cloud) view of resource utilization metrics across clusters. Without this, organizations are running blind to their K8s cost structure and cannot operate their infrastructure in a cost-effective manner.

A recent release introduced a new integrated capability within the platform referred to as "Cost Explorer". This capability provides organizations with the information necessary to effectively undertake "cluster rightsizing" and "application rightsizing" exercises.
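To make the arithmetic behind a rightsizing exercise concrete, here is a hedged sketch: compare what a workload requests against what it actually uses, and flag large gaps as candidates for rightsizing. The sample workloads and the 40% threshold are illustrative assumptions, not output from Cost Explorer.

```python
# Illustrative rightsizing arithmetic: compare requested vs. actually used CPU.
# The workload data and the 40% waste threshold are made-up examples.
WORKLOADS = [
    # (name, cpu_requested_millicores, cpu_used_p95_millicores)
    ("payments-api", 2000, 350),
    ("checkout-web", 1000, 700),
    ("batch-reports", 4000, 600),
]

WASTE_THRESHOLD = 0.40  # flag if more than 40% of the request goes unused

def rightsizing_candidates(workloads):
    """Yield workloads whose p95 usage is far below their CPU request."""
    for name, requested, used in workloads:
        unused_fraction = (requested - used) / requested
        if unused_fraction > WASTE_THRESHOLD:
            yield name, requested, used, round(unused_fraction * 100)

for name, requested, used, pct in rightsizing_candidates(WORKLOADS):
    print(f"{name}: requests {requested}m, p95 usage {used}m -> ~{pct}% unused")
```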

Implementing Chargeback/Showback for multi-tenant clusters

As organizations embrace multi-tenancy, i.e. sharing clusters among applications/teams to reduce cluster sprawl and spend, it is imperative that granular resource utilization metrics are collected and aggregated from their clusters. Tracking and reporting costs on a per application/team basis (referred to as chargeback/showback) is essential for a number of reasons, including:

  • Billing internal teams/applications (their cost center IDs) based on their consumption
  • Gaining visibility into the cost structure to determine inefficiencies and drive cost optimization exercises
  • Forecasting future spend

Rafay's integrated Cost Management solution makes it extremely simple for customers to standardize collection of metrics in a consistent manner across clusters (cloud, on-premise) and implement chargeback/showback models.
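As a rough sketch of the aggregation behind a showback report: roll per-namespace resource costs up to the owning team's cost center and report each team's share of spend. The namespace-to-cost-center mapping and cost figures below are hypothetical sample data, not tied to Rafay's data model.

```python
# Illustrative showback aggregation: roll namespace-level costs up to teams.
# Namespace ownership and cost figures are hypothetical sample data.
from collections import defaultdict

NAMESPACE_OWNER = {            # namespace -> cost center ID (hypothetical)
    "payments": "CC-1001",
    "checkout": "CC-1001",
    "analytics": "CC-2002",
}

NAMESPACE_COST_USD = {         # monthly cost attributed to each namespace
    "payments": 1240.50,
    "checkout": 830.25,
    "analytics": 2110.00,
}

def showback_report():
    """Aggregate per-namespace cost into a per-cost-center summary."""
    totals = defaultdict(float)
    for namespace, cost in NAMESPACE_COST_USD.items():
        totals[NAMESPACE_OWNER.get(namespace, "UNALLOCATED")] += cost
    grand_total = sum(totals.values())
    for cost_center, cost in sorted(totals.items()):
        print(f"{cost_center}: ${cost:,.2f} ({cost / grand_total:.0%} of spend)")

showback_report()
```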

How Platform Teams can enable developers to use their preferred Kubernetes Tools

There are cases where developers may prefer to use tools on their laptops, such as Lens Desktop, to visualize resources and interact with Kubernetes clusters. A desktop-based app such as Lens can provide a better user experience for developers than the kubectl CLI.

In this blog, we will describe how Platform Teams can use Rafay’s Zero-Trust Access service to enable developers to use popular Kubernetes visualization apps to troubleshoot their applications. Watch a video showing how a developer can use Lens Desktop with Rafay's Zero Trust Kubectl Access service to securely and remotely access Kubernetes clusters.

Spinning up cost effective clusters for training sessions

We have been running a number of internal and external (with partners/customers) enablement sessions over the last few weeks to provide "hands-on, labs-based training" on some recently introduced capabilities in the Rafay Kubernetes Operations Platform.

Here's what we set up for those enablement sessions:

  • Each attendee was provided with their own Kubernetes cluster
  • We spun up ~25 "ephemeral" Kubernetes clusters on Digital Ocean (for the life of the session)
  • We needed the clusters to be provisioned in just a few minutes for the training exercise
  • Each attendee had their own dedicated "Project" in the Rafay Org

A question that we frequently got asked after those enablement sessions was "I would love to run similar sessions with my extended team, how much did it cost to run those clusters?".

Our total spend for ~25 ephemeral clusters on Digital Ocean for these enablement sessions was less than $15. No wonder there has been so much interest in this approach.

We decided it would help everyone if we shared the automation scripts and the methodology we have been using to provision Digital Ocean clusters and import them into Rafay's platform here.
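For a flavor of what such automation can look like, below is a hedged sketch that creates a short-lived DOKS cluster through DigitalOcean's public API using the requests library. The region, node size, Kubernetes version slug, and naming scheme are assumptions for illustration; it is not the actual script bundle referenced above, and importing the cluster into Rafay is a separate step.

```python
# Hedged sketch: create a short-lived DOKS cluster for a training attendee via
# DigitalOcean's public API. Region, node size, and version are assumptions;
# check DigitalOcean's docs for currently supported version slugs.
import os
import requests

DO_API = "https://api.digitalocean.com/v2/kubernetes/clusters"
TOKEN = os.environ["DIGITALOCEAN_TOKEN"]

def create_training_cluster(attendee: str) -> str:
    """Provision a small ephemeral cluster named after the attendee."""
    payload = {
        "name": f"training-{attendee}",
        "region": "nyc1",                       # assumed region
        "version": "1.28.2-do.0",               # placeholder version slug
        "node_pools": [
            {"size": "s-2vcpu-4gb", "count": 1, "name": f"{attendee}-pool"}
        ],
    }
    resp = requests.post(
        DO_API,
        json=payload,
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["kubernetes_cluster"]["id"]

for attendee in ["alice", "bob"]:               # hypothetical attendee list
    print("created cluster", create_training_cluster(attendee))
```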


Multi-tenancy: Best practices for shared Kubernetes clusters

Some of the key questions that platform teams have to think about very early on in their K8s journey are:

  • How many clusters should I have? What is the right number for my organization?
  • Should I set up dedicated or shared clusters for my application teams?
  • What are the governance controls that need to be in place?

The model that customers are increasingly adopting is to standardize on shared clusters as the default and create a dedicated cluster only when certain considerations are met.

graph LR
  A[Request for compute from Application teams] --> B[Evaluate against list of considerations] --> C[Dedicated or shared clusters];

A few example scenarios for which Platform teams often end up setting up dedicated clusters are:

  • Application has low latency requirements (target SLA/SLO is significantly different from others)
  • Application has specific requirements that are unique to it (e.g. GPU worker nodes, CNI plugin)
  • Based on type of environment - 'Prod' has dedicated clusters while 'Dev' and 'Test' environments use shared clusters

With shared clusters (the most cost-efficient and therefore the default model in most customer environments), there are certain challenges around security and operational efficiency that platform teams have to solve for.
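One way to make the evaluation step in the flow above repeatable is to encode the considerations as a simple checklist. The sketch below is a hypothetical decision helper based on the example scenarios listed earlier; the specific fields and rules are assumptions for illustration, not a Rafay feature.

```python
# Hypothetical decision helper encoding the "dedicated vs. shared" evaluation.
# The considerations mirror the example scenarios above; the rules are assumptions.
from dataclasses import dataclass

@dataclass
class ComputeRequest:
    team: str
    environment: str          # e.g. "prod", "dev", "test"
    needs_gpu: bool = False
    custom_cni: bool = False
    latency_sensitive: bool = False

def placement(req: ComputeRequest) -> str:
    """Return 'dedicated' if any consideration applies, otherwise 'shared'."""
    if req.environment == "prod":
        return "dedicated"           # prod gets dedicated clusters in this example
    if req.needs_gpu or req.custom_cni or req.latency_sensitive:
        return "dedicated"
    return "shared"                  # shared clusters remain the cost-efficient default

print(placement(ComputeRequest(team="analytics", environment="dev")))            # shared
print(placement(ComputeRequest(team="ml", environment="dev", needs_gpu=True)))   # dedicated
```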