In the previous blog, we introduced Dynamic Resource Allocation (DRA), which went GA in Kubernetes v1.34, released in August 2025.
In this post, we’ll configure DRA on a Kubernetes 1.34 cluster.
Info
We have designed the steps so that users can try this on their macOS or Windows laptops in less than 15 minutes. The steps in this blog are written for macOS users.
Artificial intelligence (AI) and high-performance computing (HPC) workloads are evolving at unprecedented speed. Enterprises today require infrastructure that can scale elastically, provide consistent performance, and ensure secure multi-tenant operation. NVIDIA’s Performance Reference Architecture (PRA), built on HGX platforms with Shared NVSwitch GPU Passthrough Virtualization, delivers precisely this capability.
This is the introductory blog in a multi-part series. In this blog, we explain why PRA is critical for modern enterprises and service providers, highlight the benefits of adoption, and outline the key steps required to successfully deploy and support the PRA architecture.
When it came to selecting an immutable operating system for Rafay's Kubernetes Distribution (Rafay MKS), we found ourselves evaluating two strong contenders: Talos and Flatcar Linux. Both offered immutability and a focus on running containers, but in the end, Flatcar Linux won out for our needs. In this blog, we provide a deeper look into why we made that choice, and how the pros and cons stacked up.
Whether you're training deep learning models, running simulations, or just curious about your GPU's performance, nvidia-smi is your go-to command-line tool. Short for NVIDIA System Management Interface, this utility provides essential real-time information about your NVIDIA GPU’s health, workload, and performance.
In this blog, we’ll explore what nvidia-smi is, how to use it, and walk through a real output from a system using an NVIDIA T1000 8GB GPU.
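For readers who want the same information in a script, here is a minimal sketch that parses the CSV form of nvidia-smi output. It relies on the `--query-gpu` and `--format=csv,noheader,nounits` options of nvidia-smi; the choice of fields, the function names, and the sample line are assumptions made for illustration, and the sample is not output from a real system.

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi; this particular selection is an assumption.
QUERY_FIELDS = ["name", "memory.used", "memory.total", "utilization.gpu"]


def parse_gpu_rows(csv_text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for record in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if not record:  # skip blank lines
            continue
        name, mem_used, mem_total, util = (field.strip() for field in record)
        rows.append(
            {
                "name": name,
                "memory_used_mib": int(mem_used),
                "memory_total_mib": int(mem_total),
                "utilization_pct": int(util),
            }
        )
    return rows


def query_gpus():
    """Run nvidia-smi (must be on PATH) and return the parsed rows."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_gpu_rows(result.stdout)


# Made-up sample line for illustration only; not output from a real system.
SAMPLE = "Example GPU, 512, 8192, 7\n"
gpus = parse_gpu_rows(SAMPLE)
print(gpus[0]["memory_used_mib"], "of", gpus[0]["memory_total_mib"], "MiB used")
```

On a machine with an NVIDIA driver installed, calling `query_gpus()` would return one entry per GPU. Some GPUs report fields as `[N/A]`, which this sketch does not handle and would need an extra check.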
In the previous blog, we reviewed the limitations of Kubernetes GPU scheduling. These often result in:
Resource fragmentation – large portions of GPU memory remain idle and unusable.
Topology blindness – multi-GPU workloads may be scheduled suboptimally.
Cost explosion – teams overprovision GPUs to work around scheduling inefficiencies.
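To make the fragmentation point concrete, the following back-of-the-envelope sketch computes how much GPU memory sits idle when every pod is given a whole GPU. It assumes whole-GPU allocation (the traditional device plugin model, where GPUs are requested as integer counts); the function name and all numbers are hypothetical.

```python
def idle_memory_fraction(gpu_memory_gib, pod_usage_gib):
    """Fraction of allocated GPU memory left unused when each pod
    receives one whole GPU regardless of how much memory it needs."""
    total_allocated = gpu_memory_gib * len(pod_usage_gib)
    total_used = sum(pod_usage_gib)
    return (total_allocated - total_used) / total_allocated


# Hypothetical example: four pods on 80 GiB GPUs, each using a fraction.
usage = [10, 20, 10, 40]
print(f"{idle_memory_fraction(80, usage):.0%} of allocated GPU memory is idle")
```

In this made-up scenario, 240 of the 320 GiB allocated go unused, which is the kind of waste that pushes teams toward overprovisioning.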
In this post, we’ll look at how a new GA feature in Kubernetes v1.34 — Dynamic Resource Allocation (DRA) — aims to solve these problems and transform GPU scheduling in Kubernetes.
Kubernetes has cemented its position as the de-facto standard for orchestrating containerized workloads in the enterprise. In recent years, its role has expanded beyond web services and batch processing into one of the most demanding domains of all: AI/ML workloads.
Organizations now run everything from lightweight inference services to massive, distributed training pipelines on Kubernetes clusters, relying heavily on GPU-accelerated infrastructure to fuel innovation.
But there’s a problem. In this blog, we will explore why the current model falls short, what a more advanced GPU allocation approach looks like, and how it can unlock efficiency, performance, and cost savings at scale.
In the world of FinOps, precise cost allocation is more than just a “nice to have”; it’s the foundation for accurate chargeback, accountability, and informed decision-making. With Rafay’s latest release, Chargeback Summary Reports aggregated by namespace now support custom label-based metadata enrichment.
This enhancement empowers FinOps teams to add business-relevant metadata (like team or cost_center) directly into their cost reports, making it easier to trace expenses to the right owners and justify resource consumption.
In large, multi-tenant Kubernetes environments, namespaces often represent workloads owned by different teams, applications, or business units. Without enriched metadata, a FinOps practitioner might see “Namespace A” incurring costs, but need extra steps to figure out which team or cost center is responsible.
Now, you can define specific label keys (e.g., team, cost_center) in the chargeback report configuration, and Rafay will automatically include them as additional columns in the report—populated with values from the namespace labels. This directly embeds organizational context into your cost visibility.
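Conceptually, the enrichment is a join between per-namespace cost rows and namespace labels. The sketch below illustrates the idea only and is not Rafay's implementation or report format; the function name, field names, and data are all hypothetical.

```python
def enrich_report(cost_rows, namespace_labels, label_keys):
    """Add one column per configured label key to each namespace cost row.

    cost_rows:        list of dicts with at least a "namespace" key
    namespace_labels: dict mapping namespace name -> dict of its labels
    label_keys:       label keys to surface as extra report columns
    """
    enriched = []
    for row in cost_rows:
        labels = namespace_labels.get(row["namespace"], {})
        new_row = dict(row)  # copy so the input rows are left untouched
        for key in label_keys:
            # Namespaces without the label get an empty cell.
            new_row[key] = labels.get(key, "")
        enriched.append(new_row)
    return enriched


# Hypothetical data for illustration.
costs = [
    {"namespace": "payments", "cost_usd": 120.0},
    {"namespace": "search", "cost_usd": 75.5},
]
labels = {"payments": {"team": "fintech", "cost_center": "cc-100"}}

for row in enrich_report(costs, labels, ["team", "cost_center"]):
    print(row)
```

The "search" namespace has no labels in this example, so its team and cost_center columns come back empty rather than dropping the row.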
Note:
This enhancement applies to namespace-based aggregation in chargeback reports (not namespace-label-based aggregation). The reason is that when a primary label value (e.g., cost_center) is the same across multiple namespaces but the secondary label values (e.g., team) differ, the report cannot aggregate on the primary label.
Modern enterprises rarely run applications in a single cluster. A production fleet might include on-prem clusters in Singapore and London, a regulated environment in AWS us-east-1, and a developer sandbox on someone’s laptop. GitOps with Argo CD is the natural way to keep all those clusters in the desired state, but the moment clusters live in different security domains (firewalled data centers, private VPCs, or even air-gapped networks) the simple argocd cluster add story breaks down:
Bespoke bastion hosts or VPN tunnels for every hop
Long-lived bearer-token Secrets stashed in Argo’s namespace
High latency between the GitOps engine and far-flung clusters, turning reconciliations into a slog
Rafay’s Zero-Trust Kubectl Access (ZTKA) solves all three problems in one stroke by front-loading the connection with a hardened Kube API Access Proxy and issuing just-in-time (JIT), short-lived ServiceAccounts inside every cluster.
In this blog, we will describe how Rafay Zero Trust Kubectl Access Proxy gives Argo CD a secure path to every cluster in the fleet, even when those clusters sit deep behind corporate firewalls.
When developers are halfway around the world from their clusters, every kubectl get pods can feel like it’s moving through molasses. Rafay’s Zero-Trust Kubectl (ZTKA) service fixes the security risks and the lag by adding a network of regional proxies between the user and the cluster.
Zero-Trust Kubectl in a Nutshell
Rafay ZTKA routes all CLI and web-terminal traffic through its Kube API Access Proxy. The key design goals are:
Friction-free for users (“vanilla kubectl”)
Zero infrastructure to manage for platform teams
Centralized RBAC + audit
“Great performance” even for clusters behind firewalls
Under the hood, users authenticate to Rafay; Rafay spins up just-in-time service accounts inside the target cluster and tears them down after idle timeouts, eliminating credential sprawl.
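The just-in-time lifecycle described above can be pictured with a toy in-memory model: an account appears on first use and is removed once it has been idle longer than a timeout. This is a sketch of the general pattern, not Rafay's implementation; the class, its methods, and the timeout values are hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```python
class JitAccountRegistry:
    """Toy model of just-in-time accounts that expire after an idle timeout."""

    def __init__(self, idle_timeout_s):
        self.idle_timeout_s = idle_timeout_s
        self._last_used = {}  # user -> timestamp of most recent request

    def touch(self, user, now):
        """Record a request; creates the account if it does not exist yet."""
        created = user not in self._last_used
        self._last_used[user] = now
        return created

    def reap(self, now):
        """Remove accounts idle for longer than the timeout; return their names."""
        expired = [
            user
            for user, last in self._last_used.items()
            if now - last > self.idle_timeout_s
        ]
        for user in expired:
            del self._last_used[user]
        return expired

    def active(self):
        return sorted(self._last_used)


# Hypothetical timeline, in seconds.
registry = JitAccountRegistry(idle_timeout_s=60)
registry.touch("alice", now=0)
registry.touch("bob", now=50)
print(registry.reap(now=100))  # alice has been idle for 100 s, bob for 50 s
print(registry.active())
```

Because accounts exist only while they are in use, nothing long-lived is left behind in the cluster to leak or rotate.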
Many organizations rely on pull-based GitOps tools (e.g. Argo CD) to detect and remediate drift on their Kubernetes clusters. This approach allows clusters to diverge before reconciling them on the next polling interval. For the last four years, Rafay customers have benefited from an architecturally different approach that focuses on true drift prevention, backed by robust detection capabilities across both cluster blueprints and application workloads.
Info
In a previous blog, we discussed how Argo CD's reconciliation works and its best practices.