
NVIDIA Performance Reference Architecture: An Introduction

Artificial intelligence (AI) and high-performance computing (HPC) workloads are evolving at unprecedented speed. Enterprises today require infrastructure that can scale elastically, provide consistent performance, and ensure secure multi-tenant operation. NVIDIA’s Performance Reference Architecture (PRA), built on HGX platforms with Shared NVSwitch GPU Passthrough Virtualization, delivers precisely this capability.

This is the introductory blog in a multi-part series. In this post, we explain why PRA is critical for modern enterprises and service providers, highlight the benefits of adoption, and outline the key steps required to deploy and support the PRA design successfully.


The Case for Performance Reference Architectures (PRA)

As AI adoption accelerates, organizations face three major challenges:

  • Performance at Scale – Training and inference require full GPU bandwidth, even in virtualized environments.
  • Consistency Across Deployments – Enterprises cannot afford unpredictable performance across data centers or GPU generations.
  • Operational Efficiency & Security – Shared resources must be isolated, managed, and automated for reliability and trust.

NVIDIA’s PRA addresses these challenges by providing a blueprint for GPU virtualization that guarantees near bare-metal performance while enabling multi-tenant elasticity.


Architecture Overview

At the heart of PRA is the Shared NVSwitch Virtualization Model, which combines direct GPU passthrough to tenant VMs with trusted, tenant-invisible management of the NVSwitch fabric. This model offers the following capabilities and benefits:

Benefits of Adopting PRA

  • Elastic GPU Tenancy – Scale workloads dynamically without compromising performance.
  • Predictable Operations – Standardized configurations across environments reduce operational risk.
  • Customer Trust – Multi-tenant isolation builds confidence for regulated or sensitive workloads, especially for service providers.
  • Future-Proof Infrastructure – Smooth transitions across GPU generations and evolving AI requirements.

As a result, support for PRA is not just critical for service providers; it is existential.


Why is PRA Critical?

Near Bare-Metal Performance

PRA delivers the performance enterprises demand by bypassing overheads in traditional virtualization. Tenants benefit from full NVLink bandwidth — critical for large-scale AI training and HPC.

Consistency Across Hardware Generations

A standardized fabric design ensures that workloads run predictably, whether on current HGX platforms or future ones, simplifying long-term infrastructure planning.

Secure Multi-Tenant Isolation

By removing fabric visibility from tenants, enterprises prevent unauthorized access and ensure strict GPU partitioning, essential for cloud and enterprise compliance.

Elasticity for AI Workloads

Organizations can scale seamlessly from single-GPU inference to multi-GPU training VMs, without re-architecting their environment.


How to Implement PRA

Adopting PRA requires a combination of infrastructure readiness and operational best practices.

1. Host System Requirements

  • Deploy server-grade OS images (e.g., Ubuntu Server).
  • Enable virtualization extensions (VT-x / AMD-V) and IOMMU in BIOS/GRUB.
  • Configure kernel modules and disable conflicting drivers.
  • Expand memory mapping regions to support VMs with >44 GB allocation.
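The host-preparation steps above can be sketched as follows. This is a minimal illustration for an Intel host running Ubuntu; the exact kernel parameters, file paths, and driver-blacklist details are assumptions you should verify against your platform's documentation (use amd_iommu=on on AMD hosts).

```shell
# /etc/default/grub – enable the IOMMU so devices can be passed through
GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on iommu=pt"

# Keep the in-tree nouveau driver from claiming the GPUs
# (contents of /etc/modprobe.d/blacklist-nouveau.conf):
#   blacklist nouveau
#   options nouveau modeset=0

# Apply the GRUB change and reboot
sudo update-grub && sudo reboot

# After reboot, confirm IOMMU groups exist (non-empty listing expected)
ls /sys/kernel/iommu_groups/
```

If the final listing is empty, virtualization extensions or the IOMMU are still disabled in BIOS and passthrough will not work.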

2. NVSwitch Fabric Management

For production, ensure a dedicated, trusted deployment of NVIDIA Fabric Manager is available to manage the fabric. This can be either a Service VM–managed configuration or one in which the NVSwitch devices are managed directly on the host.

The latter approach eliminates the need for an additional Service VM, simplifying operations and reducing overhead. By centralizing fabric management at the host level, you also maintain secure and consistent control of the NVSwitch fabrics while delivering the same high performance and reliability expected in production deployments.
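A host-managed deployment is selected through the Fabric Manager configuration file. The excerpt below is a sketch; the file path and the meaning of the FABRIC_MODE values are taken from NVIDIA's Fabric Manager documentation and should be verified against the release you deploy.

```shell
# File: /usr/share/nvidia/nvswitch/fabricmanager.cfg
# 1 = Shared NVSwitch multitenancy mode (0 = bare-metal / full passthrough)
FABRIC_MODE=1

# Restart the service and confirm it is healthy
sudo systemctl restart nvidia-fabricmanager
systemctl is-active nvidia-fabricmanager   # expect "active"
```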

3. GPU Partition Lifecycle

  • Each GPU partition must be activated before passthrough and deactivated after VM shutdown.
  • NVIDIA provides automation via Fabric Manager Partition Manager Client (fmpm) and libvirt hook scripts.
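The partition lifecycle can be driven manually with the fmpm client before wiring it into libvirt hooks. The flag spellings below are assumptions; confirm them with `fmpm --help` for your Fabric Manager release.

```shell
fmpm -l                 # list supported GPU partitions and their IDs

PARTITION_ID=3          # illustrative partition ID
fmpm -a $PARTITION_ID   # activate the partition BEFORE GPU passthrough

# ... start the tenant VM, run workloads, shut the VM down ...

fmpm -d $PARTITION_ID   # deactivate the partition AFTER VM shutdown
```

In production these two calls belong in the libvirt prepare/release hooks so that activation and deactivation can never be skipped by an operator.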

4. Networking & RDMA

  • ACS (Access Control Services) must be enabled to enforce peer-to-peer isolation.
  • ATS (Address Translation Services) must be enabled on ConnectX NICs for GPUDirect RDMA.
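A quick way to verify both settings is sketched below. The device path and the ATS_ENABLED firmware knob are illustrative; confirm them against your PCIe topology and the NVIDIA/Mellanox firmware tools documentation.

```shell
# Check that ACS is exposed on the switch ports isolating GPU peers
sudo lspci -vvv | grep -i "Access Control Services"

# Enable ATS on a ConnectX NIC for GPUDirect RDMA
# (firmware setting; takes effect after a reboot or firmware reset)
sudo mlxconfig -d /dev/mst/mt4123_pciconf0 set ATS_ENABLED=true
```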

5. Guest VM Optimization

  • Configure PCI resource reallocation (pci=realloc pci=nocrs).
  • Apply performance tuning: vCPU pinning, NUMA awareness, huge pages, and I/O threading.
  • Follow NVIDIA’s resource allocation guidelines (e.g., reserving CPUs/memory for host, dividing remaining across tenant VMs).
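The resource-allocation guideline above can be expressed as a small helper: reserve a fixed share of CPUs for the host, then divide the remainder evenly across tenant VMs. The reservation numbers here are placeholders for illustration, not official NVIDIA values.

```shell
# Hypothetical helper: per-VM vCPU count after a host reservation
per_vm_vcpus() {
  local total_cpus=$1 host_reserved=$2 num_vms=$3
  echo $(( (total_cpus - host_reserved) / num_vms ))
}

# Example: 128 cores, 8 reserved for the host, 4 tenant VMs
per_vm_vcpus 128 8 4   # prints 30
```

The same split-then-divide logic applies to memory; pair it with `virsh vcpupin` so each VM's vCPUs land on the NUMA node closest to its GPUs.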

6. Operational Automation

To scale effectively, enterprises should automate PRA setup through infrastructure-as-code and orchestration pipelines, covering:

  • Provisioning and lifecycle management of VMs.
  • Automated GPU partition activation/deactivation.
  • Consistency checks for drivers and Fabric Manager versions.
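The consistency check in the last bullet can be sketched as below. The package name assumes Ubuntu's nvidia-fabricmanager packaging, and the major-version comparison is a simplification; adapt both to your distribution and version policy.

```shell
# Compare the loaded NVIDIA driver version with the installed Fabric
# Manager package version; warn on a major-version mismatch.
driver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)
fm=$(dpkg-query -W -f='${Version}' 'nvidia-fabricmanager-*' 2>/dev/null)

echo "driver=$driver fabric-manager=$fm"
[ "${driver%%.*}" = "${fm%%.*}" ] || echo "WARNING: driver / Fabric Manager major version mismatch"
```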

Conclusion

NVIDIA’s Performance Reference Architecture (PRA) for VMs is not just a configuration manual — it’s a strategic framework for building the next generation of AI infrastructure. By following this blueprint, enterprises and cloud providers can deliver bare-metal performance in virtualized environments, secure multi-tenant services, and scalable platforms ready for the AI-driven future.

Service Providers and Enterprises that align with PRA will be better positioned to meet tomorrow’s performance demands with confidence.

Key Recommendations

  1. Adopt a dedicated, trusted deployment of NVIDIA Fabric Manager in production.
  2. Automate GPU Partition Lifecycle with Fabric Manager APIs and orchestration integration.
  3. Enable ACS and ATS to ensure secure, high-performance interconnects.
  4. Invest in VM Performance Tuning to extract maximum efficiency for AI/ML workloads.
  5. Standardize on PRA across environments to ensure consistent, future-ready deployments.