

GPU Sharing Strategies in Kubernetes

In the previous blogs, we discussed why GPUs are managed differently in Kubernetes and how the GPU Operator can help streamline management. In Kubernetes, although you can request fractional CPU units for workloads, you cannot request fractional GPU units.

Pod manifests must request GPU resources as whole integers, which means an entire physical GPU is allocated to a single container even if that container needs only a fraction of its capacity. In this blog, we will describe two popular strategies for sharing a GPU on Kubernetes.
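For example, a Pod asks for GPUs through the nvidia.com/gpu extended resource exposed by the Nvidia device plugin, and only whole integers are accepted (the Pod and image names below are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload            # hypothetical name for illustration
spec:
  containers:
    - name: cuda-app
      image: nvidia/cuda:12.2.0-base-ubuntu22.04   # example image
      resources:
        limits:
          nvidia.com/gpu: 1     # must be a whole integer; 0.5 is rejected
```

For extended resources like nvidia.com/gpu, requests must equal limits, so it suffices to set the limit; a fractional value is rejected by the API server.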

[Figure: GPU sharing in Kubernetes]

Why do we need a GPU Operator for Kubernetes?

This is a follow-up to the previous blog, where we discussed device plugins for GPUs in Kubernetes and reviewed why the Nvidia device plugin is necessary for GPU support. A GPU Operator builds on this by automating and simplifying the management of GPUs for workloads running on Kubernetes.

In this blog, we will look at how a GPU Operator helps automate and streamline operations through the lens of a market-leading implementation by Nvidia.
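As a rough illustration of what the operator takes off your plate, the Helm values of the nvidia/gpu-operator chart let you toggle the node-level components it manages for you. This is a minimal sketch; the exact key names vary across chart versions, so treat them as assumptions to verify against the chart's values.yaml:

```yaml
# Sketch of Helm values for the NVIDIA GPU Operator chart
# (key names may differ between chart versions -- check values.yaml)
driver:
  enabled: true        # operator installs the GPU driver on each GPU node
toolkit:
  enabled: true        # installs the NVIDIA container toolkit
devicePlugin:
  enabled: true        # deploys the device plugin that advertises GPUs
```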

[Figure: Without and with the GPU Operator]

Using GPUs in Kubernetes

Unlike CPU and memory, GPUs are not natively supported in Kubernetes. Kubernetes manages CPU and memory natively: it can automatically schedule containers based on these resources, allocate them to Pods, and handle resource isolation and over-subscription.

GPUs are considered specialized hardware and require device plugins for support in Kubernetes. Device plugins make Kubernetes GPU-aware, allowing it to discover, allocate, and schedule GPUs for containerized workloads. Without a device plugin, Kubernetes is unaware of the GPUs available on its nodes and cannot assign them to Pods. In this blog, we will discuss why GPUs are not natively supported and how device plugins address this gap.
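Once a device plugin is registered on a node, its GPUs appear as an extended resource in that node's status (for example, in the output of kubectl get node -o yaml). The snippet below is an illustrative sketch with made-up values:

```yaml
# Excerpt of a node's status after the NVIDIA device plugin registers
# (resource quantities are illustrative)
status:
  capacity:
    cpu: "16"
    memory: 64Gi
    nvidia.com/gpu: "4"    # advertised by the device plugin
  allocatable:
    nvidia.com/gpu: "4"    # schedulable by Pods requesting this resource
```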

[Figure: Device plugins in Kubernetes]

Enhancing Security and Compliance in Break Glass Workflows with Rafay

Maintaining stringent security and compliance standards is more critical today than ever. Implementing break glass workflows for developers presents unique challenges that require careful consideration to prevent unauthorized access and ensure regulatory compliance.

In the previous blog, we introduced the concept of break glass workflows and why organizations require them. This blog post delves into how Rafay enables Platform teams to orchestrate secure and compliant break glass workflows within their organizations. Watch a video recording of this feature in Rafay.

Why do we need Custom Schedulers for Kubernetes?

The Kubernetes scheduler is the brain responsible for assigning Pods to nodes based on resource availability, constraints, and affinity/anti-affinity rules. For small to medium-sized clusters running simple stateless applications like web services or APIs, the default Kubernetes scheduler is a great fit: it manages resource allocation, ensures even distribution of workloads across nodes, and supports features like node affinity, pod anti-affinity, and automatic rescheduling.

In other words, the default scheduler is extremely well-suited for long-running applications like web services, APIs, and microservices. Learn more about the scheduling framework.

Unfortunately, AI/ML workloads have very different requirements, such as gang scheduling of distributed training jobs, that the default scheduler cannot satisfy!
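Custom schedulers run alongside the default one, and a Pod opts in via the spec.schedulerName field. As a minimal sketch, the open-source batch scheduler Volcano, commonly used for AI/ML workloads, registers under the scheduler name volcano (the Pod and image names below are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: training-pod            # hypothetical name
spec:
  schedulerName: volcano        # hand this Pod to the custom scheduler
                                # instead of the default-scheduler
  containers:
    - name: trainer
      image: example.com/trainer:latest   # placeholder image
```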

[Figure: Kubernetes scheduling framework]

Break Glass Workflows for Developer Access to Kubernetes Clusters - Introduction

In any large-scale, production-grade Kubernetes setup, maintaining the security and integrity of the clusters is critical. However, there are exceptional circumstances—such as production outages or critical bugs—where developers need emergency access to a Kubernetes cluster to resolve issues.

This is where a "Break Glass" process comes into play. It is a controlled procedure that grants temporary, elevated access to developers in critical situations, with the appropriate safeguards in place to minimize risks.
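In Kubernetes terms, break glass access often reduces to temporarily binding a developer to an elevated role. Below is a minimal, hypothetical sketch; note that Kubernetes RBAC has no built-in expiry, so an external workflow must delete the binding when the window closes:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: break-glass-jane-doe            # hypothetical, created at incident start
  annotations:
    expiry: "2024-01-01T02:00:00Z"      # informational only; an external
                                        # controller must delete this binding
subjects:
  - kind: User
    name: jane.doe@example.com          # developer granted emergency access
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-admin                   # elevated, temporary access
  apiGroup: rbac.authorization.k8s.io
```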

[Figure: Break glass]

User Access Reports for Kubernetes

Access reviews are mandated by regulations such as SOX, HIPAA, GLBA, PCI, NYDFS, and SOC-2, and are critical to help organizations maintain a strong risk management posture and uphold compliance. These reviews are typically conducted on a periodic basis (e.g., monthly, quarterly, or annually) depending on the organization's policies and tolerance for risk.

Providing auditors with periodic user access reports for Kubernetes is a critical task for any platform team. It becomes especially onerous for organizations that operate tens or hundreds of Kubernetes clusters used by hundreds of app developers and SREs; doing this via manual processes is impractical.

[Figure: General process]

In this blog, we will look at why user access reports are critical for organizations and how Rafay's customers implement this with very high levels of automation.

CIS Benchmark for Kubernetes using Rafay

The Center for Internet Security (CIS) Benchmark for Kubernetes consists of secure configuration guidelines for setting up Kubernetes infrastructure. These benchmarks encapsulate security best practices for configuring Kubernetes to support a strong security posture. The CIS Kubernetes Benchmark is written for the open-source, upstream Kubernetes distribution and is intended to be as universally applicable across distributions as possible.

In this blog, we describe how our customers perform CIS benchmark scans of their fleet of Kubernetes clusters using Rafay.
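For reference, the checks behind a CIS scan can also be run directly with an open-source tool such as kube-bench, typically as a Kubernetes Job on each node. The sketch below is a simplified version of kube-bench's published Job manifest (the real one mounts additional host paths), independent of how Rafay orchestrates the scan:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: kube-bench
spec:
  template:
    spec:
      hostPID: true                   # kube-bench inspects host processes
      restartPolicy: Never
      containers:
        - name: kube-bench
          image: docker.io/aquasec/kube-bench:latest
          command: ["kube-bench"]
          volumeMounts:
            - name: etc-kubernetes
              mountPath: /etc/kubernetes
              readOnly: true          # kube-bench reads control-plane configs
      volumes:
        - name: etc-kubernetes
          hostPath:
            path: /etc/kubernetes
```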

Vector Databases for Generative AI on Kubernetes

Many of our customers use Kubernetes extensively for AI/ML use cases. This is one of the reasons why we have turnkey support for Nvidia GPUs on EKS, AKS, and upstream Kubernetes in on-prem data centers. Recently, several customers have looked at adding Generative AI support to their applications, which requires a slightly different technology stack.

Traditional relational databases are adept at handling structured data, which they store in tables. However, AI use cases center on unstructured data (e.g. images, audio, and text), which is not well suited to storage and retrieval in a tabular format. This critical gap has opened the door for a new type of database, the vector database, which can natively store and process vector embeddings. The rapid rise of large-scale generative AI models has further propelled demand for vector databases.

In this blog, we will review why vector databases are well suited and critical for AI and Generative AI. We will then look at how you can deploy and operate vector databases on Kubernetes using the Rafay Kubernetes Operations Platform in just one step.
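For a sense of what such a deployment involves when done by hand, here is a minimal sketch of running a single-node vector database (Qdrant, as one example) on Kubernetes. This is an illustrative manifest, not the one-step Rafay workflow described above:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: qdrant
spec:
  replicas: 1
  selector:
    matchLabels:
      app: qdrant
  template:
    metadata:
      labels:
        app: qdrant
    spec:
      containers:
        - name: qdrant
          image: qdrant/qdrant:latest     # official image; pin a version in practice
          ports:
            - containerPort: 6333         # Qdrant's HTTP API port
          volumeMounts:
            - name: storage
              mountPath: /qdrant/storage  # where Qdrant persists vectors
      volumes:
        - name: storage
          emptyDir: {}                    # use a PersistentVolumeClaim in production
```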