
NVIDIA Performance Reference Architecture: An Introduction

Artificial intelligence (AI) and high-performance computing (HPC) workloads are evolving at unprecedented speed. Enterprises today require infrastructure that can scale elastically, provide consistent performance, and ensure secure multi-tenant operation. NVIDIA’s Performance Reference Architecture (PRA), built on HGX platforms with Shared NVSwitch GPU Passthrough Virtualization, delivers precisely this capability.

This is the introductory blog in a multi-part series. In this post, we explain why PRA is critical for modern enterprises and service providers, highlight the benefits of adoption, and outline the key steps required to deploy and support the PRA design successfully.


The Case for Performance Reference Architectures (PRA)

As AI adoption accelerates, organizations face three major challenges:

  • Performance at Scale – Training and inference require full GPU bandwidth, even in virtualized environments.
  • Consistency Across Deployments – Enterprises cannot afford unpredictable performance across data centers or GPU generations.
  • Operational Efficiency & Security – Shared resources must be isolated, managed, and automated for reliability and trust.

NVIDIA’s PRA addresses these challenges by providing a blueprint for GPU virtualization that guarantees near bare-metal performance while enabling multi-tenant elasticity.


Architecture Overview

At the heart of PRA is the Shared NVSwitch Virtualization Model, which combines direct GPU passthrough to tenant VMs with trusted, tenant-invisible management of the NVSwitch fabric. This model offers the following capabilities and benefits:

Benefits of Adopting PRA

  • Elastic GPU Tenancy – Scale workloads dynamically without compromising performance.
  • Predictable Operations – Standardized configurations across environments reduce operational risk.
  • Customer Trust – Multi-tenant isolation builds confidence for regulated or sensitive workloads, especially for service providers.
  • Future-Proof Infrastructure – Smooth transitions across GPU generations and evolving AI requirements.

As a result, support for PRA is not just critical for service providers; it is existential.


Why is PRA Critical?

Near Bare-Metal Performance

PRA delivers the performance enterprises demand by bypassing overheads in traditional virtualization. Tenants benefit from full NVLink bandwidth — critical for large-scale AI training and HPC.

Consistency Across Hardware Generations

A standardized fabric design ensures that workloads run predictably, whether on current HGX platforms or future ones, simplifying long-term infrastructure planning.

Secure Multi-Tenant Isolation

By removing fabric visibility from tenants, enterprises prevent unauthorized access and ensure strict GPU partitioning, essential for cloud and enterprise compliance.

Elasticity for AI Workloads

Organizations can scale seamlessly from single-GPU inference to multi-GPU training VMs, without re-architecting their environment.


How to Implement PRA

Adopting PRA requires a combination of infrastructure readiness and operational best practices.

1. Host System Requirements

  • Deploy server-grade OS images (e.g., Ubuntu Server).
  • Enable virtualization extensions (VT-x / AMD-V) and IOMMU in BIOS/GRUB.
  • Configure kernel modules and disable conflicting drivers.
  • Expand memory mapping regions to support VMs with >44 GB allocation.
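The host-preparation steps above can be sketched as follows. This is a minimal illustration for an Intel host running Ubuntu; the exact kernel parameters, file paths, and driver-blacklist details are assumptions you should verify against your platform's documentation (use amd_iommu=on on AMD hosts).

```shell
# /etc/default/grub – enable the IOMMU so devices can be passed through
GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on iommu=pt"

# Keep the in-tree nouveau driver from claiming the GPUs
# (contents of /etc/modprobe.d/blacklist-nouveau.conf):
#   blacklist nouveau
#   options nouveau modeset=0

# Apply the GRUB change and reboot
sudo update-grub && sudo reboot

# After reboot, confirm IOMMU groups exist (non-empty listing expected)
ls /sys/kernel/iommu_groups/
```

If the final listing is empty, virtualization extensions or the IOMMU are still disabled in BIOS and passthrough will not work.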

2. NVSwitch Fabric Management

For production, ensure a dedicated, trusted deployment of NVIDIA Fabric Manager is available to manage the fabric. This can be either a Service VM–managed configuration or one in which the NVSwitch devices are managed directly on the host.

The latter approach eliminates the need for an additional Service VM, simplifying operations and reducing overhead. By centralizing fabric management at the host level, you also maintain secure and consistent control of the NVSwitch fabrics while delivering the same high performance and reliability expected in production deployments.
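A host-managed deployment is selected through the Fabric Manager configuration file. The excerpt below is a sketch; the file path and the meaning of the FABRIC_MODE values are taken from NVIDIA's Fabric Manager documentation and should be verified against the release you deploy.

```shell
# File: /usr/share/nvidia/nvswitch/fabricmanager.cfg
# 1 = Shared NVSwitch multitenancy mode (0 = bare-metal / full passthrough)
FABRIC_MODE=1

# Restart the service and confirm it is healthy
sudo systemctl restart nvidia-fabricmanager
systemctl is-active nvidia-fabricmanager   # expect "active"
```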

3. GPU Partition Lifecycle

  • Each GPU partition must be activated before passthrough and deactivated after VM shutdown.
  • NVIDIA provides automation via Fabric Manager Partition Manager Client (fmpm) and libvirt hook scripts.
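The partition lifecycle can be driven manually with the fmpm client before wiring it into libvirt hooks. The flag spellings below are assumptions; confirm them with `fmpm --help` for your Fabric Manager release.

```shell
fmpm -l                 # list supported GPU partitions and their IDs

PARTITION_ID=3          # illustrative partition ID
fmpm -a $PARTITION_ID   # activate the partition BEFORE GPU passthrough

# ... start the tenant VM, run workloads, shut the VM down ...

fmpm -d $PARTITION_ID   # deactivate the partition AFTER VM shutdown
```

In production these two calls belong in the libvirt prepare/release hooks so that activation and deactivation can never be skipped by an operator.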

4. Networking & RDMA

  • ACS (Access Control Services) must be enabled to enforce peer-to-peer isolation.
  • ATS (Address Translation Services) must be enabled on ConnectX NICs for GPUDirect RDMA.
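A quick way to verify both settings is sketched below. The device path and the ATS_ENABLED firmware knob are illustrative; confirm them against your PCIe topology and the NVIDIA/Mellanox firmware tools documentation.

```shell
# Check that ACS is exposed on the switch ports isolating GPU peers
sudo lspci -vvv | grep -i "Access Control Services"

# Enable ATS on a ConnectX NIC for GPUDirect RDMA
# (firmware setting; takes effect after a reboot or firmware reset)
sudo mlxconfig -d /dev/mst/mt4123_pciconf0 set ATS_ENABLED=true
```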

5. Guest VM Optimization

  • Configure PCI resource reallocation (pci=realloc pci=nocrs).
  • Apply performance tuning: vCPU pinning, NUMA awareness, huge pages, and I/O threading.
  • Follow NVIDIA’s resource allocation guidelines (e.g., reserving CPUs/memory for host, dividing remaining across tenant VMs).
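The resource-allocation guideline above can be expressed as a small helper: reserve a fixed share of CPUs for the host, then divide the remainder evenly across tenant VMs. The reservation numbers here are placeholders for illustration, not official NVIDIA values.

```shell
# Hypothetical helper: per-VM vCPU count after a host reservation
per_vm_vcpus() {
  local total_cpus=$1 host_reserved=$2 num_vms=$3
  echo $(( (total_cpus - host_reserved) / num_vms ))
}

# Example: 128 cores, 8 reserved for the host, 4 tenant VMs
per_vm_vcpus 128 8 4   # prints 30
```

The same split-then-divide logic applies to memory; pair it with `virsh vcpupin` so each VM's vCPUs land on the NUMA node closest to its GPUs.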

6. Operational Automation

To scale effectively, enterprises should automate PRA setup through infrastructure-as-code and orchestration pipelines, covering:

  • Provisioning and lifecycle management of VMs.
  • Automated GPU partition activation/deactivation.
  • Consistency checks for drivers and Fabric Manager versions.
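The consistency check in the last bullet can be sketched as below. The package name assumes Ubuntu's nvidia-fabricmanager packaging, and the major-version comparison is a simplification; adapt both to your distribution and version policy.

```shell
# Compare the loaded NVIDIA driver version with the installed Fabric
# Manager package version; warn on a major-version mismatch.
driver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)
fm=$(dpkg-query -W -f='${Version}' 'nvidia-fabricmanager-*' 2>/dev/null)

echo "driver=$driver fabric-manager=$fm"
[ "${driver%%.*}" = "${fm%%.*}" ] || echo "WARNING: driver / Fabric Manager major version mismatch"
```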

Conclusion

NVIDIA’s Performance Reference Architecture (PRA) for VMs is not just a configuration manual — it’s a strategic framework for building the next generation of AI infrastructure. By following this blueprint, enterprises and cloud providers can deliver bare-metal performance in virtualized environments, secure multi-tenant services, and scalable platforms ready for the AI-driven future.

Service Providers and Enterprises that align with PRA will be better positioned to meet tomorrow’s performance demands with confidence.

Key Recommendations

  1. Adopt a dedicated, trusted deployment of NVIDIA Fabric Manager in production.
  2. Automate GPU Partition Lifecycle with Fabric Manager APIs and orchestration integration.
  3. Enable ACS and ATS to ensure secure, high-performance interconnects.
  4. Invest in VM Performance Tuning to extract maximum efficiency for AI/ML workloads.
  5. Standardize on PRA across environments to ensure consistent, future-ready deployments.