# Capabilities of the Bare Metal GPU Service
The Bare Metal GPU Service provides a way to consume powerful, pre-configured physical machines that are optimized for advanced AI/ML workloads. These nodes offer full access to GPUs, CPUs, memory, storage, and networking resources with no virtualization overhead.
## Key Capabilities
The following capabilities are supported as part of the Bare Metal GPU Service:
| Capability | Description |
|---|---|
| Multi-GPU Support | Enables use of nodes with 1, 4, or 8 high-performance GPUs for scale-out training and inference workloads. |
| Kubernetes Integration | Supports Kubernetes-native workflows; users can deploy workloads using standard manifests and Helm charts. |
| Custom OS Images | Boots bare metal nodes with pre-approved base operating systems such as Ubuntu 22.04 LTS. |
| GPU Sharing (Optional) | Offers full node access by default, but can also support GPU sharing configurations when enabled at the cluster level. |
| High-Speed Interconnects | Nodes are equipped with NVLink, NVSwitch, and NDR InfiniBand for high-bandwidth GPU-to-GPU communication. |
| Dedicated CPU Nodes | Allows provisioning of CPU-only nodes for non-GPU workloads such as orchestration, preprocessing, or storage. |
| User-Controlled Lifecycle | End users can start, stop, and terminate nodes through self-service controls with quota enforcement. |
| Custom Initialization Hooks | Supports bootstrap scripts and environment-specific initialization logic. |
| Telemetry & Monitoring | Integrates with monitoring dashboards and system metrics for observability (requires setup). |
| Networking & Security | Supports workload isolation through Kubernetes namespaces, CNI-based policies, and secure ingress/egress. |
| No Virtualization Overhead | Direct access to hardware ensures maximum performance for demanding AI/ML pipelines. |
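As an illustration of the Kubernetes-native workflow, a workload can request the GPUs on a bare metal node with a standard manifest. This is a minimal sketch: the pod name and container image are placeholders, and the `nvidia.com/gpu` resource assumes the NVIDIA device plugin is installed on the cluster.

```yaml
# Hypothetical example: a pod requesting all 8 GPUs on an 8-GPU bare metal node.
# Assumes the NVIDIA device plugin exposes the nvidia.com/gpu extended resource.
apiVersion: v1
kind: Pod
metadata:
  name: multi-gpu-training          # placeholder name
spec:
  containers:
    - name: trainer
      image: nvcr.io/nvidia/pytorch:24.01-py3   # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 8          # matches an 8-GPU node configuration
  restartPolicy: Never
```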
## Supported Workloads
The service is optimized for:
- Large Language Model (LLM) training and fine-tuning
- Multi-GPU distributed training jobs
- High-throughput inference pipelines
- Data preprocessing and feature engineering
- Serving orchestration or control plane components
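For multi-GPU distributed training jobs, one common pattern is to launch one worker per node and derive each worker's rank from the pod's completion index. The sketch below uses a standard Kubernetes Indexed Job rather than any platform-specific API; the job name and image are placeholders.

```yaml
# Sketch of a two-node distributed training job using a Kubernetes Indexed Job.
# Each pod receives its rank via the JOB_COMPLETION_INDEX environment variable.
apiVersion: batch/v1
kind: Job
metadata:
  name: llm-finetune               # placeholder name
spec:
  completions: 2                   # one worker per bare metal node
  parallelism: 2
  completionMode: Indexed          # injects JOB_COMPLETION_INDEX into each pod
  template:
    spec:
      containers:
        - name: worker
          image: nvcr.io/nvidia/pytorch:24.01-py3   # placeholder image
          resources:
            limits:
              nvidia.com/gpu: 8    # all GPUs on each node
      restartPolicy: Never
```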
## Access Patterns
Users can consume bare metal resources through:
- Compute Profiles with the `baremetal` type
- Environment Templates mapped to supported node types
- Custom Providers to inject hooks, data, and logic into provisioning workflows
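The exact compute profile schema is platform-specific and is not shown in this section; the following is only a hypothetical sketch of the shape such a profile might take, with every field name illustrative rather than part of the real API.

```yaml
# Hypothetical sketch only -- field names are illustrative, not the actual schema.
kind: ComputeProfile
metadata:
  name: gpu-8x-baremetal           # placeholder profile name
spec:
  type: baremetal                  # the baremetal type referenced above
  gpuCount: 8                      # 1, 4, or 8 per the supported node shapes
  osImage: ubuntu-22.04-lts        # a pre-approved base OS image
```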
## Platform Setup Overview
The platform team is responsible for the initial configuration and enablement of the Bare Metal GPU Service. This setup includes onboarding physical nodes into the Rafay platform, defining system-level resource pools (such as public IP pools and VLANs), configuring networking interfaces (including DPUs), and enabling self-service compute profiles for specific projects.
The architecture typically involves physically provisioned servers with GPU and CPU roles, high-speed interconnects (e.g., NVLink, NDR InfiniBand), and secure tenant-facing network configurations. The platform ensures these resources are exposed to end users in a controlled and quota-enforced manner.
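At the Kubernetes layer, the tenant-facing isolation described above can be enforced with standard CNI network policies. The sketch below (namespace and policy names are placeholders) restricts a tenant namespace to intra-namespace ingress traffic:

```yaml
# Minimal sketch: allow ingress only from pods in the same tenant namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-cross-namespace       # placeholder name
  namespace: tenant-a              # placeholder tenant namespace
spec:
  podSelector: {}                  # applies to all pods in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}          # permit only same-namespace pods
```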
The following sequence diagram outlines the high-level process for preparing the platform for bare metal consumption:
```mermaid
sequenceDiagram
    participant Admin as NCP-Admin
    participant Infra as Bare Metal Infrastructure
    participant Rafay as Rafay Platform
    Admin->>Infra: Rack & Provision Bare Metal Servers
    Admin->>Infra: Install Base OS (e.g., Ubuntu 22.04 LTS)
    Admin->>Infra: Setup Networking (VLANs, IP Pools, DPU Config)
    Admin->>Infra: Attach High-Speed Storage (e.g., NVMe, Ceph)
    Admin->>Rafay: Register Bare Metal Node Resources
    Rafay-->>Infra: Perform Hardware Discovery and Validation
    Admin->>Rafay: Configure Compute Profiles (baremetal type)
    Admin->>Rafay: Setup Environment Templates and Custom Init Hooks
    Admin->>Rafay: Provision Workload Environments using Bare Metal Nodes
    Rafay->>Infra: Bootstrap Kubernetes, System Components, GPU Drivers
    Rafay-->>Admin: Nodes Ready for AI/ML Workloads
```
## Supported Integrations
| Integration | Availability |
|---|---|
| GitOps Workflows | ✅ Supported |
| Service Account Injection | ✅ Supported |
| Container Runtime Options (e.g., Kata) | ⚙️ Configurable (on request) |
| GPU Monitoring Dashboards | ⚙️ Requires setup |
| Storage Plugins (e.g., CSI) | ✅ Supported |
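With the CSI storage integration, workloads claim storage through a standard PersistentVolumeClaim. In this sketch the claim name and storage class are placeholders; the actual class name depends on which CSI driver the platform team installs.

```yaml
# Sketch: claiming storage via a CSI-backed StorageClass (names are placeholders).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: training-data              # placeholder claim name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: nvme-local     # placeholder; depends on the installed CSI driver
  resources:
    requests:
      storage: 500Gi
```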
## Summary
The Bare Metal GPU Service is designed for users who need full hardware access and control to maximize AI/ML performance. It supports highly parallelized workloads with multiple GPUs, dedicated networking, and deep customization for training pipelines and inference systems.