Mohan Atreya

GPU Sharing Strategies in Kubernetes

In previous blogs, we discussed why GPUs are managed differently in Kubernetes and how the GPU Operator can help streamline their management. In Kubernetes, although you can request fractional CPU units for workloads, you cannot request fractional GPU units.

Pod manifests must request GPU resources in whole integers, which results in an entire physical GPU being allocated to a single container even if that container requires only a fraction of its capacity. In this blog, we will describe two popular strategies for sharing a GPU on Kubernetes.
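To make this concrete, here is a minimal sketch using the Python kubernetes client of how a Pod asks for a GPU today. The image tag and namespace are illustrative, not prescriptive; the key point is that the nvidia.com/gpu extended resource only accepts whole units.

```python
from kubernetes import client, config

config.load_kube_config()  # assumes a kubeconfig pointing at a GPU-enabled cluster

# A Pod that requests one whole GPU via the nvidia.com/gpu extended
# resource. Fractional values such as "0.5" are rejected by Kubernetes.
pod = client.V1Pod(
    api_version="v1",
    kind="Pod",
    metadata=client.V1ObjectMeta(name="cuda-demo"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="cuda-demo",
                image="nvidia/cuda:12.2.0-base-ubuntu22.04",  # illustrative image
                command=["nvidia-smi"],
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "1"}  # integers only; "0.5" is invalid
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```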

Why do we need a GPU Operator for Kubernetes?

This is a follow-up to the previous blog, where we discussed device plugins for GPUs in Kubernetes and reviewed why the Nvidia device plugin is necessary for GPU support. A GPU Operator is needed to automate and simplify the management of GPUs for workloads running on Kubernetes.

In this blog, we will look at how a GPU Operator helps automate and streamline operations, through the lens of Nvidia's market-leading implementation.

[Figure: Kubernetes GPU management without and with the GPU Operator]

Using GPUs in Kubernetes

Unlike CPU and memory, GPUs are not natively supported in Kubernetes. Kubernetes manages CPU and memory out of the box: it can automatically schedule containers based on these resources, allocate them to Pods, and handle resource isolation and over-subscription.

GPUs are considered specialized hardware, and supporting them in Kubernetes requires the use of device plugins. Device plugins make Kubernetes GPU-aware, allowing it to discover, allocate, and schedule GPUs for containerized workloads. Without a device plugin, Kubernetes is unaware of the GPUs available on the nodes and cannot assign them to Pods. In this blog, we will discuss why GPUs are not natively supported and how device plugins help address this gap.
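For illustration, once a device plugin is running, each node advertises its GPUs as an allocatable extended resource that the scheduler can see. A quick sketch using the Python kubernetes client, assuming the Nvidia device plugin (which registers the nvidia.com/gpu resource):

```python
from kubernetes import client, config

config.load_kube_config()  # assumes a reachable cluster in your kubeconfig
v1 = client.CoreV1Api()

# Print the number of GPUs each node's device plugin advertises.
# Nodes without a device plugin simply do not report the resource.
for node in v1.list_node().items:
    allocatable = node.status.allocatable or {}
    gpus = allocatable.get("nvidia.com/gpu", "0")
    print(f"{node.metadata.name}: {gpus} allocatable GPU(s)")
```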

[Figure: Device plugins in Kubernetes]

Rafay Newsletter - September 2024

Welcome to the September 2024 edition of the Rafay customer newsletter. This month, we’re excited to bring you the latest product enhancements and insightful content crafted to help you make the most of your AI/ML, Kubernetes, and cloud-native operations.

Every month, we push out a number of incremental updates: product documentation, new functionality, YouTube videos, tech blogs, and more. Our users tell us it would be great if we summarized all of the month's updates in a newsletter they can read or listen to in 10 minutes.


Why do we need Custom Schedulers for Kubernetes?

The Kubernetes scheduler is the brain responsible for assigning pods to nodes based on resource availability, constraints, and affinity/anti-affinity rules. For small to medium-sized clusters running simple stateless applications like web services or APIs, the default Kubernetes scheduler is a great fit: it manages resource allocation, ensures even distribution of workloads across nodes, and supports features like node affinity, pod anti-affinity, and automatic rescheduling.

The default scheduler is extremely well-suited for long-running applications like web services, APIs, and microservices. Learn more about the scheduling framework.

Unfortunately, AI/ML workloads have very different requirements that the default scheduler cannot satisfy!
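To see how a workload opts out of the default scheduler, note that a Pod simply names an alternative in its spec. A minimal sketch with the Python kubernetes client; the scheduler name and image below are hypothetical placeholders for whichever AI/ML-aware scheduler you deploy:

```python
from kubernetes import client

# Pods bypass the default scheduler by naming an alternative in
# spec.schedulerName. "my-batch-scheduler" is a hypothetical placeholder
# for an AI/ML-aware scheduler (for example, one that supports gang scheduling).
pod = client.V1Pod(
    api_version="v1",
    kind="Pod",
    metadata=client.V1ObjectMeta(name="training-job"),
    spec=client.V1PodSpec(
        scheduler_name="my-batch-scheduler",  # hypothetical custom scheduler
        containers=[
            client.V1Container(
                name="trainer",
                image="registry.example.com/trainer:latest",  # illustrative image
            )
        ],
    ),
)
```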

[Figure: Kubernetes scheduling framework]

Break Glass Workflows for Developer Access to Kubernetes Clusters - Introduction

In any large-scale, production-grade Kubernetes setup, maintaining the security and integrity of the clusters is critical. However, there are exceptional circumstances—such as production outages or critical bugs—where developers need emergency access to a Kubernetes cluster to resolve issues.

This is where a "Break Glass" process comes into play. It is a controlled procedure that grants temporary, elevated access to developers in critical situations, with the appropriate safeguards in place to minimize risks.
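As a simplified illustration (not a complete break-glass implementation), the elevated-access step often reduces to creating a short-lived RBAC binding that is revoked once the incident closes. A sketch using the Python kubernetes client; the user, namespace, and binding name are hypothetical:

```python
from kubernetes import client, config

config.load_kube_config()
rbac = client.RbacAuthorizationV1Api()

# Grant the built-in "edit" ClusterRole to one developer in one namespace.
# A real break-glass workflow would gate this behind an approval step,
# audit the grant, and delete the binding automatically when the incident
# is resolved. The user, binding name, and namespace are hypothetical.
binding = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "RoleBinding",
    "metadata": {"name": "break-glass-jane"},
    "roleRef": {
        "apiGroup": "rbac.authorization.k8s.io",
        "kind": "ClusterRole",
        "name": "edit",
    },
    "subjects": [
        {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "User",
            "name": "jane@example.com",
        }
    ],
}
rbac.create_namespaced_role_binding(namespace="prod-app", body=binding)

# Once the incident is closed, revoke the elevated access:
# rbac.delete_namespaced_role_binding(name="break-glass-jane", namespace="prod-app")
```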

[Figure: Break glass workflow]

Pod Identity versus IRSA for Amazon EKS - Part 1

When managing containerized applications on Amazon Elastic Kubernetes Service (EKS), a critical concern is granting your applications the permissions they need to securely access AWS resources. Traditionally, AWS has provided mechanisms like IAM Roles for Service Accounts (IRSA) to enable fine-grained permissions management within EKS clusters. However, EKS Pod Identity, a newer feature, offers a more refined and efficient solution.

In this blog, we’ll explore how EKS Pod Identity differs from IRSA, and why it represents a significant improvement for identity management in Amazon EKS-based environments. Let's assume an application resident in our EKS cluster needs to securely access data in an AWS S3 bucket.
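For example, with IRSA the link between a Kubernetes service account and an IAM role is a single annotation on the service account. A minimal sketch using the Python kubernetes client; the role ARN, account ID, and names are placeholders:

```python
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# IRSA: annotate a service account with the IAM role it should assume.
# Pods that use this service account receive web-identity credentials
# scoped to whatever the role allows (here, reading an S3 bucket).
sa = client.V1ServiceAccount(
    api_version="v1",
    kind="ServiceAccount",
    metadata=client.V1ObjectMeta(
        name="s3-reader",  # hypothetical name
        annotations={
            # placeholder account ID and role name
            "eks.amazonaws.com/role-arn": "arn:aws:iam::111122223333:role/s3-read-only"
        },
    ),
)
v1.create_namespaced_service_account(namespace="default", body=sa)
```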

[Figure: Application accessing AWS S3]

Bringing DevOps and Automation to Machine Learning via MLOps

The vast majority of organizations are new to AI/ML. As a result, most of the in-house systems and processes supporting it are likely ad hoc. Industry analysts like Gartner forecast that organizations will need to quickly transition their AI/ML initiatives from pilots to production in order to make it across the chasm.

Most organizations already have reasonably mature DevOps processes and systems in place. So, going mainstream with AI should be a walk in the park, correct? It turns out that this is not really true: “IT leaders responsible for AI are discovering the AI pilot paradox, where launching pilots is deceptively easy but deploying them into production is notoriously challenging.” (Chirag Dekate, Gartner)

In this blog, we will try to answer the following questions:

Why do we need a new process called MLOps when most organizations already have reasonably mature DevOps practices? How is MLOps different from DevOps?

[Figure: DevOps vs MLOps]