Bare Metal Requirements¶
Bare metal infrastructure must include a combination of GPU-capable and CPU-only nodes. GPU nodes are used for high-performance AI/ML workloads, while CPU-only nodes support services such as orchestration layers and storage controllers (e.g., Ceph).
Bare Metal Servers¶
- GPU nodes for training, inference, or LLM workloads
- CPU-only nodes for control plane components, storage, and background jobs
Infrastructure Components¶
- Top-of-Rack (ToR) switches for connecting GPU and storage nodes
- Out-of-Band (OOB) switches for BMC/iDRAC access
- Ceph or similar distributed storage setup
- Optional BlueField DPU interfaces for enhanced isolation and performance
Head Node Requirements¶
A dedicated head node is required to manage bare metal provisioning and coordination for BMaaS.
Hardware Requirements¶
- Minimum Memory: 16 GB RAM
- Minimum CPU: 8 vCPUs
- Storage: At least 500 GB of disk capacity
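
As a rough pre-flight aid, the minimums above could be checked with a short script. This is only a sketch: the function names `check_minimums` and `read_local_resources` are hypothetical, the memory reading assumes a Linux host with `/proc/meminfo`, and the disk figure is taken from the filesystem mounted at `/`, which may not match how capacity is counted in a given deployment.

```python
import os
import shutil

# Minimums taken from the hardware requirements list above.
MIN_MEMORY_GB = 16
MIN_CPUS = 8
MIN_DISK_GB = 500


def check_minimums(memory_gb, cpus, disk_gb):
    """Return a list of shortfalls; an empty list means all minimums are met."""
    problems = []
    if memory_gb < MIN_MEMORY_GB:
        problems.append(f"memory: {memory_gb:.1f} GB < {MIN_MEMORY_GB} GB")
    if cpus < MIN_CPUS:
        problems.append(f"cpu: {cpus} < {MIN_CPUS}")
    if disk_gb < MIN_DISK_GB:
        problems.append(f"disk: {disk_gb:.1f} GB < {MIN_DISK_GB} GB")
    return problems


def read_local_resources(path="/"):
    """Read total memory, CPU count, and disk size of the local Linux host."""
    with open("/proc/meminfo") as f:
        # The first line looks like: "MemTotal:       16384256 kB"
        mem_kb = int(f.readline().split()[1])
    disk_bytes = shutil.disk_usage(path).total
    # Memory is converted from KiB to GiB, disk from bytes to decimal GB.
    return mem_kb / 1024**2, os.cpu_count() or 0, disk_bytes / 1000**3
```

On the head node, `check_minimums(*read_local_resources())` would return the list of shortfalls. Note that the kernel reserves some memory, so a machine sold as 16 GB typically reports slightly less than 16 GiB in `MemTotal`; a small tolerance may be appropriate.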
Operating System¶
- Supported OS: Ubuntu 22.04/24.04 or RHEL 8/9
- For RHEL-based installations, the system must have valid entitlements enabled and connectivity to the default RHEL repository servers.
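
The supported OS list can be checked against the contents of `/etc/os-release`. The sketch below assumes that "RHEL 8/9" covers any minor release of those major versions and that Ubuntu must match the listed releases exactly; both are assumptions, and the helper names are hypothetical. It does not verify RHEL entitlements or repository connectivity.

```python
# Supported releases from the list above, keyed by the os-release ID field.
SUPPORTED = {
    "ubuntu": ("22.04", "24.04"),
    "rhel": ("8", "9"),
}


def parse_os_release(text):
    """Parse KEY=value lines from os-release content into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def is_supported(text):
    """Return True if the os-release content names a supported OS release."""
    fields = parse_os_release(text)
    distro = fields.get("ID", "")
    version = fields.get("VERSION_ID", "")
    if distro == "rhel":
        # RHEL reports minor releases such as "9.4"; compare the major version only.
        version = version.split(".")[0]
    return version in SUPPORTED.get(distro, ())
```

On a head node this could be called as `is_supported(open("/etc/os-release").read())`.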
Network Requirements¶
- At least one network interface on the head node must be connected to the same Layer 2 (L2) broadcast domain as the target bare metal nodes, so that the head node can receive DHCP requests from those machines.
- The head node and target nodes must reside on the same VLAN for provisioning workflows.
Operating System¶
- A base Linux OS image (e.g., Ubuntu) should be accessible over the network for bootstrapping and provisioning
Storage¶
- Distributed storage (such as Ceph) must be reachable from all GPU and CPU nodes
- Storage VLANs must be configured and routable across node groups
Network¶
Multiple VLANs must be provisioned to support different traffic types and access layers.
| VLAN Type | Description |
|---|---|
| OOB VLAN | BMC/iDRAC management network |
| TAN VLAN | Tenant Access Network, typically enabled over VXLAN |
| Storage VLAN | Network segment for Ceph or other storage traffic |
| iDRAC VLAN | Management VLAN for Dell iDRAC interfaces |
| DPU VLAN | VLAN for managing BlueField DPUs (if applicable) |
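
One way to keep a VLAN plan like the table above consistent is to record it as data and validate it before configuring switches. The following is a minimal sketch: the VLAN IDs are placeholders, the short names are hypothetical, and treating only the OOB, TAN, and storage VLANs as mandatory is an assumption made for illustration. The 1 to 4094 range is the usable ID range under IEEE 802.1Q.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vlan:
    name: str
    vlan_id: int


# Placeholder IDs for illustration only, not recommended values.
EXAMPLE_PLAN = [
    Vlan("oob", 100),
    Vlan("tan", 200),
    Vlan("storage", 300),
    Vlan("idrac", 400),
    Vlan("dpu", 500),
]

# Assumption: iDRAC and DPU VLANs depend on the hardware in use.
REQUIRED_NAMES = {"oob", "tan", "storage"}


def validate_plan(plan):
    """Return a list of problems found in the VLAN plan (empty if none)."""
    errors = []
    seen = {}
    for vlan in plan:
        if not 1 <= vlan.vlan_id <= 4094:
            errors.append(f"{vlan.name}: VLAN ID {vlan.vlan_id} outside 1-4094")
        if vlan.vlan_id in seen:
            errors.append(f"{vlan.name}: VLAN ID {vlan.vlan_id} already used by {seen[vlan.vlan_id]}")
        else:
            seen[vlan.vlan_id] = vlan.name
    for name in sorted(REQUIRED_NAMES - {v.name for v in plan}):
        errors.append(f"missing required VLAN: {name}")
    return errors
```

Calling `validate_plan(EXAMPLE_PLAN)` returns an empty list; a plan that reuses an ID or omits the storage VLAN returns one message per problem.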
VLAN Pool Configuration¶
- VLAN pools must be preconfigured for tenant network creation
- IP address ranges should be assigned and managed via IPAM or equivalent tools
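
To illustrate how a preconfigured VLAN pool might be paired with address ranges, the sketch below carves equal-sized subnets out of one parent block using Python's standard `ipaddress` module. The pool CIDR, VLAN IDs, and the function name `allocate_subnets` are hypothetical; a real deployment would normally delegate this to its IPAM tool.

```python
import ipaddress


def allocate_subnets(pool_cidr, vlan_ids, new_prefix):
    """Map each VLAN ID to the next unused /new_prefix subnet of pool_cidr."""
    pool = ipaddress.ip_network(pool_cidr)
    subnets = pool.subnets(new_prefix=new_prefix)
    allocation = {}
    for vlan_id in vlan_ids:
        try:
            allocation[vlan_id] = next(subnets)
        except StopIteration:
            # The parent block has no subnets of the requested size left.
            raise ValueError(f"pool {pool_cidr} exhausted at VLAN {vlan_id}") from None
    return allocation
```

For example, `allocate_subnets("10.20.0.0/22", [1000, 1001, 1002, 1003], 24)` assigns one /24 to each of the four tenant VLANs, and asking for a fifth raises `ValueError` because a /22 holds only four /24 subnets.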
SSH Access¶
- SSH access is required for provisioning, debugging, and manual intervention
Rafay Controller Accessibility¶
- Bare metal nodes and control interfaces must have outbound access to the Rafay Controller for cluster lifecycle operations, telemetry, and observability integrations
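
Outbound reachability can be spot-checked from a node with a plain TCP connection attempt. This is a sketch under stated assumptions: the hostname `controller.example.com` is a placeholder, and port 443 is assumed rather than taken from this document, so substitute the endpoint and port that apply to your controller. A successful TCP connection does not prove that TLS, proxies, or authentication are configured correctly.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

A call such as `can_connect("controller.example.com", 443)` would be run from each node that needs controller access. The same helper can be pointed at port 22 of a target node to confirm that SSH is reachable.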