
Comparing HPA and KEDA: Choosing the Right Tool for Kubernetes Autoscaling

In Kubernetes, autoscaling is key to ensuring application performance while managing infrastructure costs. Two powerful tools that help achieve this are the Horizontal Pod Autoscaler (HPA) and Kubernetes Event-Driven Autoscaling (KEDA). While they share the goal of scaling workloads, their approaches and capabilities differ significantly.

In this introductory blog, we will provide a bird's-eye view of how the two compare and when you might choose one over the other.
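To make the comparison concrete, here is a minimal sketch of each approach. The resource names, target Deployments, and the Prometheus query are placeholders for illustration only; the HPA scales on pod CPU utilization, while the KEDA ScaledObject scales on an external event-driven metric and can scale to zero:

```yaml
# HPA: scales a Deployment on CPU utilization (autoscaling/v2 API)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa            # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web              # placeholder target
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
---
# KEDA: scales a Deployment on an external metric (Prometheus query)
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler      # placeholder name
spec:
  scaleTargetRef:
    name: worker           # placeholder target
  minReplicaCount: 0       # KEDA can scale workloads down to zero
  maxReplicaCount: 20
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090   # placeholder address
        query: sum(rate(http_requests_total[1m]))          # placeholder query
        threshold: "100"
```

Note that KEDA builds on the HPA under the hood: it feeds external event metrics into an HPA it manages for you, which is why the two are often complementary rather than mutually exclusive.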

HPA vs KEDA

Support for Parallel Execution with Rafay's Integrated GitOps Pipeline

At Rafay, we are continuously evolving our platform to deliver powerful capabilities that streamline and accelerate the software delivery lifecycle. One such enhancement is the recent update to our GitOps pipeline engine, designed to optimize execution time and flexibility — enabling a better experience for platform teams and developers alike.

Integrated Pipeline for Diverse Use Cases

Rafay provides a tightly integrated pipeline framework that supports a range of common operational use cases, including:

  • System Synchronization: Use Git as the single source of truth to orchestrate controller configurations
  • Application Deployment: Define and automate your app deployment process directly from version-controlled pipelines
  • Approval Workflows: Insert optional approval gates to control when and how specific pipeline stages are triggered, offering an added layer of governance and compliance

This comprehensive design empowers platform teams to standardize delivery patterns while still accommodating organization-specific controls and policies.

From Sequential to Parallel Execution with DAG Support

Historically, Rafay’s GitOps pipeline executed all stages sequentially, regardless of interdependencies. While effective for simpler workflows, this model added unnecessary execution time for more complex operations.

With our latest update, the pipeline engine now supports Directed Acyclic Graphs (DAGs) — allowing stages to execute in parallel, wherever dependencies allow.
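As an illustration of the idea, the sketch below uses a generic, hypothetical `dependsOn` notation (not Rafay's actual pipeline schema) to show how a DAG lets independent stages run concurrently while still enforcing ordering where it matters:

```yaml
# Hypothetical DAG-style pipeline definition (illustrative only,
# not Rafay's actual configuration schema)
stages:
  - name: build
  - name: unit-tests
    dependsOn: [build]        # waits for build
  - name: security-scan
    dependsOn: [build]        # runs in parallel with unit-tests
  - name: deploy
    dependsOn: [unit-tests, security-scan]   # waits for both branches
```

Because `unit-tests` and `security-scan` share no dependency on each other, a DAG-aware engine can run them simultaneously, reducing total pipeline time compared to a strictly sequential execution.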

Simplifying Blueprint and Add-on Management with Draft Versions

Managing infrastructure at scale demands both agility and precision—especially when it comes to version control. At Rafay, we have long supported versioning for key configuration objects such as Blueprints and Add-ons, enabling platform teams to roll out changes systematically and maintain operational consistency.

However, as many teams have discovered, managing these versions during testing and validation phases can introduce unnecessary complexity. We are excited to announce a major usability enhancement: Support for Draft Versions.

Why Versioning Matters

Versioning in Rafay’s platform delivers several key advantages:

  • Change Tracking: Keep a historical record of changes made to Blueprints and Add-ons over time
  • Staged Rollouts: Gradually deploy updates across environments and clusters to minimize risk
  • Compliance Assurance: Demonstrate adherence to organizational policies and track Day-2 changes in a controlled way

These capabilities are especially crucial for teams responsible for maintaining secure, production-grade Kubernetes environments.

The Challenge: Version Sprawl During Testing

While versioning is powerful, it has traditionally introduced friction during the testing and validation phase. Each time a platform engineer made a minor change to an Add-on or Blueprint, a new version needed to be created—even if the version wasn’t production-ready.

This led to:

  • Version fatigue, with large volumes of partially validated versions cluttering the system
  • Increased manual overhead and inefficiency for platform teams
  • Risk of accidental usage of incomplete configurations in downstream projects

A Fresh New Look: Rafay Console Gets a UI/UX Makeover for Enhanced Usability

At Rafay, we believe that user experience is as critical as the powerful automation capabilities we deliver. With that commitment in mind, we’ve been working for the last few months on a revamp of the Rafay Console User Interface (UI). The changes are purposeful and designed to streamline navigation, increase operational clarity, and elevate your productivity. Whether you’re managing clusters, deploying workloads, or orchestrating environments, the new interface will put everything you need right at your fingertips.

The new UI will launch as part of our v3.5 Release, scheduled to roll out at the end of May 2025. We understand change is hard, and it can take a few hours for users to get used to the new experience. Note that existing projects and configurations remain unchanged, and users can continue managing their infrastructure and applications without interruption.

In this blog, we provide a closer look at the most impactful improvements and how they will benefit our users.

Migration

Introduction to User Namespaces in Kubernetes

In Kubernetes, some features arrive quietly but leave a massive impact. Kubernetes v1.33 is turning out to be one such release. In the previous blog, my colleague described how you can provision and operate Kubernetes v1.33 clusters on bare metal and VM-based environments using Rafay.

In this blog, we will discuss a new feature in v1.33 called User Namespaces. This feature is not a headline grabber like a service mesh, but it is a game changer for container security.
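For a quick taste of the feature, opting a pod into a user namespace is a one-line change: setting `hostUsers: false` in the pod spec maps container UIDs to unprivileged UIDs on the host, so a process running as root inside the container is not root on the node. The pod and image names below are placeholders:

```yaml
# Minimal sketch: run a pod inside a user namespace
apiVersion: v1
kind: Pod
metadata:
  name: userns-demo        # placeholder name
spec:
  hostUsers: false         # opt out of the host user namespace
  containers:
    - name: app
      image: nginx         # placeholder image
```

With this setting, even a container escape leaves the attacker holding an unprivileged UID on the host rather than host root, which sharply limits the blast radius.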

Container in a Jail

Kubernetes v1.33 for Rafay MKS

As part of our upcoming May release, alongside other enhancements and features, we are adding support for Kubernetes v1.33 with Rafay MKS (i.e., upstream Kubernetes for bare metal and VM-based environments).

Both new cluster provisioning and in-place upgrades of existing clusters are supported. As with most Kubernetes releases, v1.33 deprecates and removes several features. To ensure zero impact to our customers, we have validated every feature of the Rafay Kubernetes Operations Platform on this Kubernetes version.

Kubernetes v1.33 Release

Powering Multi-Tenant, Serverless AI Inference for Cloud Providers

The AI revolution is here, and Large Language Models (LLMs) are at its forefront. Cloud providers are uniquely positioned to offer powerful AI inference services to their enterprise and retail customers. However, delivering these services in a scalable, multi-tenant, and cost-effective serverless manner presents significant operational challenges.

Rafay enables cloud providers to deliver Serverless Inference to hundreds of users and enterprises.

Info

Earlier this week, we announced our Multi-Tenant Serverless Inference offering for GPU & Sovereign Cloud Providers. Learn more about this here.

Multi Tenant

Family vs. Lineage: Unpacking Two Often-Confused Ideas in the LLM World

LLMs have begun to resemble sprawling family trees. Folks who are relatively new to LLMs will notice two words appear constantly in technical blogs: "family" and "lineage".

They sound interchangeable, and users frequently conflate them, but they describe different slices of an LLM’s life story.

Important

Understanding the difference is more than trivia: it determines how you pick models, tune them, and keep inference predictable at scale.

LLM Family vs Lineage

Why “Family” Matters in the World of LLMs

When GPU bills run into six digits and every millisecond of latency counts, platform teams learn that vocabulary choices and hidden-unit counts aren’t the only things that separate one model checkpoint from another.

LLMs travel in families—lineages of models that share a common architecture, tokenizer, and training recipe. Think of them the way you might think of Apple’s M-series chips or Toyota’s Prius line: the tuning changes, the size varies, but the underlying design stays stable enough that tools, drivers, and workflows remain interchangeable.

In this blog, we will learn about what we mean by a family for LLMs and why this matters for Inference.

LLM Family

Choosing Your Engine for LLM Inference: The Ultimate vLLM vs. TensorRT LLM Guide

This is the next blog in our series on LLMs and Generative AI. When deploying large language models (LLMs) for inference, it is critical to consider efficiency, scalability, and performance. Many users will already be familiar with the two market-leading options: vLLM and NVIDIA's TensorRT-LLM.

In this blog, we dive into their pros and cons, helping users select the most appropriate option for their use case.

vLLM vs TensorRT LLM