Mohan Atreya

A Fresh New Look: Rafay Console Gets a UI/UX Makeover for Enhanced Usability

At Rafay, we believe that user experience is as critical as the powerful automation capabilities we deliver. With that commitment in mind, we’ve been working for the last few months on a revamp of the Rafay Console User Interface (UI). The changes are purposeful and designed to streamline navigation, increase operational clarity, and elevate your productivity. Whether you’re managing clusters, deploying workloads, or orchestrating environments, the new interface will put everything you need right at your fingertips.

The new UI will launch as part of our v3.5 release, scheduled to roll out at the end of May 2025. We understand that change is hard and that it can take a few hours for users to get used to the new experience. Note that existing projects and configurations remain unchanged, and users can continue managing their infrastructure and applications without interruption.

In this blog, we provide a closer look at the most impactful improvements and how they will benefit our users.

Migration

Introduction to User Namespaces in Kubernetes

In Kubernetes, some features arrive quietly but leave a massive impact. Kubernetes v1.33 is turning out to be a release with several such features. In the previous blog, my colleague described how you can provision and operate Kubernetes v1.33 clusters on bare metal and VM-based environments using Rafay.

In this blog, we will discuss a new feature in v1.33 called User Namespaces. This feature is not a headline grabber like a service mesh, but it is a game changer for container security.
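As a rough illustration of what opting in to this feature looks like, the sketch below builds a Pod manifest with the `hostUsers` field in the Pod spec set to `false`, which is how a Pod requests its own user namespace in Kubernetes. The helper name, Pod name, and image are hypothetical and chosen only for this example; the cluster's nodes and container runtime must also support user namespaces, which this sketch does not check.

```python
import json


def pod_with_user_namespace(name, image):
    """Build a Pod manifest that asks Kubernetes for a separate user namespace.

    `name` and `image` are illustrative placeholders, not values from the blog.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # hostUsers: false requests that the Pod not share the host's
            # user namespace, so root inside the container maps to an
            # unprivileged user on the node.
            "hostUsers": False,
            "containers": [
                {"name": name, "image": image, "command": ["sleep", "3600"]}
            ],
        },
    }


manifest = pod_with_user_namespace("userns-demo", "busybox:1.36")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since kubectl accepts JSON manifests as well as YAML.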

Container in a Jail

Powering Multi-Tenant, Serverless AI Inference for Cloud Providers

The AI revolution is here, and Large Language Models (LLMs) are at its forefront. Cloud providers are uniquely positioned to offer powerful AI inference services to their enterprise and retail customers. However, delivering these services in a scalable, multi-tenant, and cost-effective serverless manner presents significant operational challenges.

Rafay enables cloud providers to deliver Serverless Inference to hundreds of users and enterprises.

Info

Earlier this week, we announced our Multi-Tenant Serverless Inference offering for GPU & Sovereign Cloud Providers. Learn more about this here.

Multi Tenant

Family vs. Lineage: Unpacking Two Often-Confused Ideas in the LLM World

LLMs have begun to resemble sprawling family trees. Folks that are relatively new to LLMs will notice two words appear constantly in technical blogs: "family" and "lineage".

They sound interchangeable, and users frequently conflate them. But they describe different slices of an LLM’s life story.

Important

Understanding the difference is more than trivia. It determines how you pick models, tune them, and keep inference predictable at scale.

LLM Family vs Lineage

Why “Family” Matters in the World of LLMs

When GPU bills run into six digits and every millisecond of latency counts, platform teams learn that vocabulary choices and hidden-unit counts aren’t the only things that separate one model checkpoint from another.

LLMs travel in families—lineages of models that share a common architecture, tokenizer, and training recipe. Think of them the way you might think of Apple’s M-series chips or Toyota’s Prius line: the tuning changes, the size varies, but the underlying design stays stable enough that tools, drivers, and workflows remain interchangeable.

In this blog, we will learn about what we mean by a family for LLMs and why this matters for Inference.

LLM Family

Choosing Your Engine for LLM Inference: The Ultimate vLLM vs. TensorRT LLM Guide

This is the next blog in our series on LLMs and Generative AI. When deploying large language models (LLMs) for inference, it is critical to consider efficiency, scalability, and performance. Users will likely be familiar with two market-leading options: vLLM and Nvidia's TensorRT LLM.

In this blog, we dive into their pros and cons, helping users select the most appropriate option for their use case.

vLLM vs TensorRT LLM

Demystifying Quantization: Why It Matters for LLMs and Inference Efficiency

As Large Language Models (LLMs) like GPT, LLaMA, and DeepSeek reach hundreds of billions of parameters, the demand for high-speed, low-cost inference has skyrocketed. Quantization is a technique that drastically reduces model size and computational requirements by using lower-precision numbers. In this blog, we will discuss quantization and why it is essential.
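To make "lower-precision numbers" concrete, here is a minimal sketch of symmetric 8-bit quantization in plain Python. It is only one simple scheme, assumed here for illustration; production inference engines typically use more elaborate per-channel or per-group variants. The function names and sample weights are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale factor converts between float and integer values.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


# Illustrative weights, not taken from any real model.
weights = [0.12, -0.5, 0.33, 1.27, -1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("quantized:", q)
print("scale:", scale)
print("max round-trip error:", max_error)
```

Each weight is stored in one byte instead of the four bytes a 32-bit float needs, a 4x reduction in size, at the cost of a rounding error of at most about half the scale per weight.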

Quantization

Compiling an LLM for High Performance Inference

This is the next blog in our series on LLMs and Inference. In the previous blog, we discussed the safetensors format for LLMs. In this blog, we will walk through a critical step for LLM Inference.

Compiling a Large Language Model (LLM) generally refers to optimizing the model’s computational graph and kernel execution to improve inference or training performance on specific hardware (like GPUs or TPUs). Think of this as the next logical step that is performed after loading a model.

LLM Compilation

End-User Self-Service for Automated User Profile Creation in SageMaker Domains

As organizations expand their use of Amazon SageMaker to empower data scientists and machine learning (ML) engineers, managing access to development environments becomes a critical concern. In the last blog, we discussed how SageMaker Domains can provide isolated, secure, and fully-featured environments for users.

However, manually creating user profiles for every user quickly becomes a bottleneck, especially in large or fast-growing organizations. Asking users to submit an IT ticket and wait days for it to be fulfilled is unacceptable in today's fast-paced environment.

In this blog, we will describe how organizations use Rafay's GPU PaaS to provide their users with a self-service experience to onboard themselves into SageMaker Domains without waiting on IT or platform teams. This not only improves efficiency and user experience but also ensures consistency and compliance across the organization.

SageMaker AI Self Service