Product Blog¶

June 23, 2025
in Product Blog, Approvals, Compliance
3 min read

Enforcing ServiceNow-Based Approvals with Rafay

Enterprises often require explicit approvals before critical actions can proceed especially when provisioning infrastructure or making configuration changes. With Rafay’s out-of-the-box (OOB) workflow handlers, customers can easily integrate with popular ITSM systems such as ServiceNow (SNOW).

Catalog

This post explains how to configure and use Rafay’s ServiceNow Workflow Handler to enforce approval gates.

Workflow Handlers in Rafay

Rafay enables platform teams to attach Workflow Handlers to key actions as pre-hooks or post-hooks:

Pre-hook Handlers: Triggered before an action (e.g., pause provisioning until approval is received)
Post-hook Handlers: Triggered after an action (e.g., notify stakeholders after infrastructure (environment) creation)

Typical Scenarios

Here are a few use cases where ServiceNow-based approvals come into play:

Developers request a vCluster to test their app before raising a PR
Platform admins initiate a Kubernetes upgrade for a fleet of clusters that requires approval

June 21, 2025
in Product Blog, Amazon EKS, Graviton, Cost Savings
5 min read

Slash EKS Cluster Costs by 20-30% Instantly with AWS Graviton

If you’re running Kubernetes workloads on Amazon EKS backed by Intel-based instances, you’re leaving significant savings on the table. In this blog, we will look at how many Rafay customers have been able to immediately cut compute costs by ~20-30% with minimal effort and quickly comply with internal cost saving mandates.

Graviton Ready

June 20, 2025
in Product Blog, Slinky, SLURM, Kubernetes, Self Service
4 min read

Self-Service Slurm Clusters on Kubernetes with Rafay GPU PaaS

In the previous blog, we discussed how Project Slinky bridges the gap between Slurm, the de facto job scheduler in HPC, and Kubernetes, the standard for modern container orchestration.

Project Slinky and Rafay’s GPU Platform-as-a-Service (PaaS) combined provide enterprises and cloud providers with a transformative combination that enables secure, multi-tenant, self-service access to Slurm-based HPC environments on shared Kubernetes clusters. Together, they allow cloud providers and enterprise platform teams to offer Slurm-as-a-Service on Kubernetes—without compromising on performance, usability, or control.

Design

June 19, 2025
in Product Blog, Slinky, SLURM, Kubernetes
3 min read

Project Slinky: Bringing Slurm Scheduling to Kubernetes

As high-performance computing (HPC) environments evolve, there’s an increasing demand to bridge the gap between traditional HPC job schedulers and modern cloud-native infrastructure. Project Slinky is an open-source project that integrates Slurm, the industry-standard workload manager for HPC, with Kubernetes, the de facto orchestration platform for containers.

This enables organizations to deploy and operate Slurm-based workloads on Kubernetes clusters allowing them to leverage the best of both worlds: Slurm’s mature, job-centric HPC scheduling model and Kubernetes’s scalable, cloud-native runtime environment.

Project Slinky

June 18, 2025
in Product Blog, Cilium, MetalLB, Kubernetes, Load Balancer
3 min read

Get Started with Cilium as a Load Balancer for On-Premises Kubernetes Clusters

Organizations deploying Kubernetes in on-premises data centers or hybrid cloud environments often face challenges with exposing services externally. Unlike public cloud providers that offer managed load balancers out of the box, bare metal environments require custom solutions. This is where Cilium steps in as a powerful alternative, offering native load balancing capabilities using BGP (Border Gateway Protocol).

Cilium is more than just a CNI plugin. It enables advanced networking features, such as observability, security, and load balancing—all integrated deeply with the Kubernetes networking model. Specifically, Cilium can advertise Kubernetes LoadBalancer service IPs to external routers using BGP, making these services reachable directly from external networks without needing to rely on cloud-native load balancers or manual proxy setups. This is ideal for enterprises running bare metal Kubernetes clusters, air-gapped environments, or hybrid cloud setups.

Want to dive deeper? Check out our introductory blog on Cilium’s Kubernetes load balancing capabilities. Navigate to the detailed step-by-step instructions for additional information.

June 17, 2025
in Product Blog, Cilium, MetalLB, Kubernetes, Load Balancer
4 min read

Using Cilium as a Kubernetes Load Balancer: A Powerful Alternative to MetalLB

In Kubernetes, exposing services of type LoadBalancer in on-prem or bare-metal environments typically requires a dedicated "Layer 2" or "BGP-based" software load balancer—such as MetalLB. While MetalLB has been the go-to solution for this use case, recent advances in Cilium, a powerful eBPF-based Kubernetes networking stack, offer a modern and more integrated alternative.

Cilium isn’t just a fast, scalable Container Network Interface (CNI). It also includes cilium-lb, a built-in eBPF-powered load balancer that can replace MetalLB with a more performant, secure, and cloud-native approach.

June 11, 2025
in Product Blog, Amazon SageMaker AI, Cost Management
4 min read

Cost Management for SageMaker AI: The Case for Strong Administrative Guardrails

Enterprises are increasingly leveraging Amazon SageMaker AI to empower their data science teams with scalable, managed machine learning (ML) infrastructure. However, without proper administrative controls, SageMaker AI usage can lead to unexpected cost overruns and significant waste.

In large organizations where dozens or hundreds of data scientists may be experimenting concurrently, this risk compounds quickly.

June 10, 2025
in Product Blog, BioContainer, Docker, Bioinformatics, Get Started
5 min read

Get Started with BioContainers using Rafay

In this step-by-step guide, the Bioinformatics data scientist will use Rafay's end user portal to launch a well resourced remote VM and run a series of BioContainers with Docker.

June 6, 2025
in Product Blog, BioContainer, Docker, Bioinformatics, Singularity
3 min read

BioContainers: Streamlining Bioinformatics with the Power of Portability

In today's fast-paced world of bioinformatics, the constant evolution of tools, dependencies, and operating system environments presents a significant challenge. Researchers often spend countless hours grappling with software installation, configuration, and version conflicts, hindering their ability to focus on scientific discovery. Enter biocontainers – a revolutionary approach that leverages containerization technology to package bioinformatics software and its entire environment into self-contained, portable units.

Imagine a meticulously organized lab where every experiment, regardless of its complexity, can be instantly replicated with identical results.

This is the promise of biocontainers. Built upon established container platforms like Docker and Singularity, biocontainers encapsulate everything a bioinformatics tool needs to run: the application itself, its libraries, dependencies, and even specific operating system configurations.

May 29, 2025
in Product Blog, Inventory Management
3 min read

Why Inventory Management is Table Stakes for GPU Clouds

In the world of GPU clouds, where speed, scalability, and efficiency are paramount, it’s surprising how many “Neo cloud” providers still manage their infrastructure the old-fashioned way—through spreadsheets.

As laughable as it sounds, this is the harsh reality. Inventory management, one of the most foundational aspects of a reliable cloud platform, is often overlooked or under built. And for modern GPU clouds, that’s a deal breaker.

Inventory Management