
Choosing the Right Fractional GPU Strategy for Cloud Providers

As demand for GPU-accelerated workloads soars across industries, cloud providers are under increasing pressure to offer flexible, cost-efficient, and isolated access to GPUs. While full GPU allocation remains the norm, it often leads to resource waste—especially for lightweight or intermittent workloads.

In the previous blog, we described the three primary technical approaches for fractional GPUs. In this blog, we'll explore the most viable approaches to offering fractional GPUs in a GPU-as-a-Service (GPUaaS) model, and evaluate their suitability for cloud providers serving end customers.


Best Choice: MIG (Multi-Instance GPU)

MIG is ideal for cloud providers for the following reasons:

| Advantage | Benefit to Cloud Providers |
|---|---|
| Strong isolation | Prevents noisy neighbors and ensures SLA-grade stability |
| Predictable performance | Enables tiered plans with performance guarantees |
| Hardware-enforced partitioning | Ensures consistent resource separation across tenants |
| Supports metering and quotas | Maps well to billing per GPU instance / tenant |
| Kubernetes integration | Easy to expose via `nvidia.com/mig-<profile>` resources |
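Once the GPU operator has carved a card into MIG instances, a tenant pod requests a slice like any other extended resource. A minimal sketch, assuming the NVIDIA device plugin is installed and the `1g.5gb` profile has been provisioned (pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mig-inference              # hypothetical pod name
spec:
  containers:
  - name: inference
    image: nvcr.io/nvidia/tritonserver:24.01-py3   # example image
    resources:
      limits:
        nvidia.com/mig-1g.5gb: 1   # request one 1g.5gb MIG slice
```

Because the slice appears as a first-class resource, it can be counted against per-tenant `ResourceQuota` objects just like whole GPUs.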


Ideal Customer Use Cases

  • Real-time inference for retail customers
  • Enterprise workloads with SLAs
  • Training with guaranteed resources
  • Tiered fractional GPU offerings (e.g., 1g.5gb, 2g.10gb)
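The tiered offerings above map cleanly to usage-based billing, since each MIG profile occupies a fixed share of the GPU's seven compute slices. A minimal Python sketch of that pricing logic, where the hourly rate and profile table are illustrative assumptions rather than real prices:

```python
# Hypothetical pricing sketch: bill each MIG profile as its fraction of a
# full A100 40GB hourly rate, scaled by its share of the 7 compute slices.
FULL_GPU_HOURLY = 3.50  # assumed list price in USD/hour (illustrative only)

# Profile name -> (compute slices out of 7, memory in GB out of 40)
MIG_PROFILES = {
    "1g.5gb":  (1, 5),
    "2g.10gb": (2, 10),
    "3g.20gb": (3, 20),
    "7g.40gb": (7, 40),
}

def hourly_rate(profile: str) -> float:
    """Price a MIG slice by its compute fraction of the full GPU."""
    slices, _mem_gb = MIG_PROFILES[profile]
    return round(FULL_GPU_HOURLY * slices / 7, 4)

for profile in MIG_PROFILES:
    print(profile, hourly_rate(profile))
```

Because each profile is hardware-partitioned, the fraction billed is also the fraction actually delivered, which is exactly what time slicing cannot guarantee.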

Runner-up: Time Slicing

Time Slicing can be an effective approach for the following reasons:

  • Cost-effective shared plans
  • R&D and dev/test environments
  • Elastic compute for batch or exploratory ML
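Time slicing is typically enabled through the NVIDIA Kubernetes device plugin's sharing configuration, which advertises each physical GPU as multiple schedulable replicas. A sketch of that config fragment, where the replica count is an example value to tune per workload mix:

```yaml
version: v1
sharing:
  timeSlicing:
    resources:
    - name: nvidia.com/gpu
      replicas: 4    # each physical GPU is advertised as 4 schedulable GPUs
```

Note that replicas are purely a scheduling construct: all four pods share the same memory and compute with no enforcement, which is the root of the limitations below.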

⚠️ Limitations

  • Weaker isolation: No resource partitioning at hardware level
  • Inconsistent performance under load
  • Harder to bill precisely per tenant


Ideal Customer Use Cases

  • Notebooks and exploratory ML
  • Batch inference
  • Training pipelines with elastic scheduling
  • Internal jobs that don’t require strict SLAs

Experimental: Custom Schedulers (e.g., KAI)

This approach is not ideal for cloud providers, for the following reasons:

  • No hardware isolation or enforcement
  • Requires complex scheduler extensions
  • Difficult to align with metering and billing
  • Resource usage enforcement is manual or cooperative
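To make the "cooperative" nature concrete: in KAI-style fractional scheduling, a pod typically declares its requested share via an annotation rather than a hardware-backed resource limit. A sketch under that assumption (the annotation key and scheduler name follow KAI's conventions, but verify against your scheduler's documentation; pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fractional-job            # hypothetical pod name
  annotations:
    gpu-fraction: "0.5"           # request half a GPU; honored by the scheduler,
                                  # not enforced by the hardware
spec:
  schedulerName: kai-scheduler    # assumes KAI is installed under this name
  containers:
  - name: trainer
    image: python:3.11            # placeholder image
    command: ["python", "train.py"]
```

Nothing stops this container from consuming the whole GPU at runtime, which is why the approach only works among trusted tenants.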


Potential Customer Use Cases

  • Internal research labs
  • Academic clouds
  • Trusted user environments

Conclusion: MIG is the Most Production-Ready

The table below provides a scoring matrix comparing the three approaches.

| Criteria | MIG | Time Slicing | Custom Scheduler (KAI) |
|---|---|---|---|
| Performance isolation | ✅ Strong | ⚠️ Weak | ⚠️ Soft |
| SLA-friendly | ✅ Yes | ❌ Limited | ❌ Not suitable |
| Billing accuracy | ✅ Per MIG profile | ⚠️ Difficult | ⚠️ Not enforceable |
| Multi-tenancy support | ✅ Excellent | ⚠️ Risk of contention | ⚠️ Manual only |
| Deployment complexity | ⚠️ Moderate | ✅ Low | ❌ High |
| Hardware support | ❌ MIG-capable GPUs only (e.g., A100, A30, H100) | ✅ All NVIDIA GPUs | ✅ All NVIDIA GPUs |

For GPU cloud providers delivering fractional access to external customers, MIG (Multi-Instance GPU) offers the best combination of:

  • ✅ Security and isolation
  • ✅ Performance predictability
  • ✅ Quota and billing integration
  • ✅ Multi-tenant scalability

If your hardware supports it (e.g., A100, A30, H100), MIG should be your default fractional strategy. Use time slicing for budget-friendly plans, and reserve custom schedulers for internal or cooperative environments.