Amazon EKS Auto Mode simplifies cluster management by automatically handling worker node scaling, patching, and other operational tasks. This guide walks you through migrating your existing EKS clusters to Auto Mode using Rafay.
Cloud providers building GPU or Neo Cloud services face a universal challenge: how to turn resource consumption into revenue with accuracy, automation, and operational efficiency. In our previous blog, we demonstrated how to programmatically retrieve usage data from Rafay’s Usage Metering APIs and generate structured CSVs for downstream processing in an external billing platform.
In this follow-up blog, we take the next step toward a complete billing workflow—automatically transforming usage into billable cost using SKU-specific pricing. With GPU clouds scaling faster than ever and enterprise AI workloads becoming increasingly dynamic, providers must ensure their billing engine is consistent, transparent, and tightly integrated with their platform. The enhancements described in this blog are designed exactly for that.
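To make the usage-to-cost step concrete, here is a minimal sketch of the transformation. The SKU names, hourly rates, and CSV columns below are illustrative assumptions rather than Rafay's actual metering schema; in the platform, the cost comes from the SKU-specific rates you configure.

```python
from csv import DictReader, DictWriter

# Illustrative SKU price list (hypothetical SKUs and USD-per-hour rates).
SKU_RATES = {
    "gpu-a100-80gb": 2.40,
    "gpu-h100-80gb": 4.10,
    "cpu-standard-8": 0.35,
}

def usage_to_cost(usage_rows):
    """Join usage records with SKU-specific rates to produce billable line items."""
    for row in usage_rows:
        rate = SKU_RATES[row["sku"]]
        hours = float(row["usage_hours"])
        yield {
            "tenant": row["tenant"],
            "sku": row["sku"],
            "usage_hours": hours,
            "rate_per_hour": rate,
            "cost": round(hours * rate, 2),
        }

if __name__ == "__main__":
    # usage.csv stands in for the CSV generated in the previous blog,
    # assumed here to have columns: tenant, sku, usage_hours.
    with open("usage.csv") as src, open("billable.csv", "w", newline="") as dst:
        reader = DictReader(src)
        writer = DictWriter(
            dst, fieldnames=["tenant", "sku", "usage_hours", "rate_per_hour", "cost"]
        )
        writer.writeheader()
        writer.writerows(usage_to_cost(reader))
```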
The Kubernetes community has officially started the countdown to retire Ingress NGINX, one of the most widely used ingress controllers in the ecosystem.
SIG Network and the Security Response Committee have announced that Ingress NGINX will move to best-effort maintenance until March 2026, after which there will be no new releases, no bug fixes, and no security updates. 
At the same time, the broader networking story in Kubernetes is evolving: Gateway API is now positioned as the successor to Ingress. In this blog, we describe why this is happening, when a replacement makes sense, and how and when you should migrate.
As the demand for AI training and inference surges, GPU Clouds are increasingly looking to offer higher-level, turnkey AI services—not just raw GPU instances. Some customers may be familiar with Run:AI from NVIDIA as an AI workload orchestration and optimization platform. Delivering Run:AI as a scalable, repeatable SKU—something customers can select and provision with a few clicks—requires deep automation, lifecycle management, and tenant isolation capabilities. This is exactly what Rafay provides.
With Rafay, GPU Clouds can deliver Run:AI as a self-service SKU, ensuring customers receive a fully configured Run:AI environment—complete with GPU infrastructure, a Kubernetes cluster, necessary operators, and a ready-to-use Run:AI tenant—all deployed automatically. This blog explains how Rafay enables cloud providers to industrialize Run:AI provisioning into a consistent, production-ready SKU.
A couple of releases back, we added EKS Auto Mode support to our platform, with both quick and custom configuration options. In this blog, we will explore how you can create an EKS cluster using quick configuration, and then dive deeper into creating custom node classes and node pools using add-ons and deploying them on EKS Auto Mode-enabled clusters.
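As a rough illustration of the custom path, the sketch below creates a custom NodePool on an Auto Mode cluster. The requirement keys/values and the NodeClass named "default" are assumptions for this example, and applying the object directly with the Kubernetes Python client is shown only for brevity; in the workflow described in this blog, the same manifest is packaged and delivered as a Rafay add-on.

```python
from kubernetes import client, config

# Minimal, illustrative NodePool for an EKS Auto Mode cluster.
# Field values here (requirements, NodeClass name) are example assumptions.
node_pool = {
    "apiVersion": "karpenter.sh/v1",
    "kind": "NodePool",
    "metadata": {"name": "custom-pool"},
    "spec": {
        "template": {
            "spec": {
                "nodeClassRef": {
                    "group": "eks.amazonaws.com",
                    "kind": "NodeClass",
                    "name": "default",
                },
                "requirements": [
                    {"key": "karpenter.sh/capacity-type", "operator": "In", "values": ["on-demand"]},
                    {"key": "kubernetes.io/arch", "operator": "In", "values": ["amd64"]},
                ],
            }
        }
    },
}

# Apply the NodePool using the current kubeconfig context.
config.load_kube_config()
api = client.CustomObjectsApi()
api.create_cluster_custom_object(
    group="karpenter.sh", version="v1", plural="nodepools", body=node_pool
)
```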
In Part-1, we explored how Rafay GPU PaaS empowers developers to use fractional GPUs, allowing multiple workloads to share GPU compute efficiently. This enabled better utilization and cost control — without compromising isolation or performance.
In Part-2, we will show how you can enhance this by giving users the means to select fractional GPU memory. While fractional GPUs provide a share of the GPU’s compute cores, different workloads have dramatically different GPU memory needs. With this update, developers can now choose exactly how much GPU memory they want for their pods — bringing fine-grained control, better scheduling, and cost efficiency.
Enterprises and GPU Cloud providers are rapidly evolving toward a self-service model for developers and data scientists. They want to provide instant access to high-performance compute — especially GPUs — while keeping utilization high and costs under control.
Rafay GPU PaaS enables enterprises and GPU Clouds to achieve exactly that: developers and data scientists can spin up resources such as Developer Pods or Jupyter Notebooks backed by fractional GPUs, directly from an intuitive self-service interface.
This is Part-1 in a multi-part series on end-user, self-service access to fractional-GPU-based AI/ML resources.
This blog is part of our DRA series, continuing from our earlier posts: Introduction to DRA, Enabling DRA with Kind, and MIG with DRA. This post focuses on pre-DRA vs post-DRA GPU management on Rafay upstream Kubernetes clusters.
In the previous blog, we learned the basics of NIM (NVIDIA Inference Microservices). In this follow-on blog, we will do a deep dive into the NIM Kubernetes Operator, a Kubernetes-native extension that automates the deployment and management of NVIDIA’s NIM containers. By combining the strengths of Kubernetes orchestration with NVIDIA’s optimized inference stack, the NIM Operator makes it dramatically easier to deliver production-grade generative AI at scale.
Generative AI is moving from experiments to production, and the bottleneck is no longer training—it’s serving: getting high-quality model inference running reliably, efficiently, and securely across clouds, data centers, and the edge.
NVIDIA’s answer is NIM (NVIDIA Inference Microservices). NIM is a set of prebuilt, performance-tuned containers that expose industry-standard APIs for popular model families (LLMs, vision, speech) and run anywhere there’s an NVIDIA GPU. Think of NIM as a “batteries-included” model-serving layer that blends TensorRT-LLM optimizations, Triton runtimes, security hardening, and OpenAI-compatible APIs into one deployable unit.
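Because NIM containers expose OpenAI-compatible endpoints, any standard OpenAI client can talk to them. A minimal sketch, assuming an LLM NIM is already running locally on port 8000 (the base URL and model identifier below are placeholders for your own deployment):

```python
from openai import OpenAI

# Point the standard OpenAI client at the NIM's OpenAI-compatible endpoint.
# No real API key is needed for a locally hosted NIM.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",  # example model identifier; match your deployed NIM
    messages=[{"role": "user", "content": "Summarize what NVIDIA NIM is in one sentence."}],
)
print(response.choices[0].message.content)
```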