Overview
Introduction
The GenAI Service provides partners with an integrated environment to onboard, manage, and deploy GenAI models on GPU-backed infrastructure. The service includes dedicated resources that support the complete model lifecycle, from preparing compute capacity to publishing models for organizational use.
The GenAI Service is available in the self-hosted control plane and is configured by the Partner Admin.
Capabilities
The GenAI Service enables partners to:
- Add GPU-backed compute capacity for hosting GenAI workloads
- Onboard models from Hugging Face, NVIDIA NGC, or custom storage buckets
- Deploy models to GPU-powered compute clusters
- Configure inference endpoints
- Organize models using providers
- Make models available to organizations
- Track token usage for deployed models (see the sketch after this list)
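For instance, the token-usage capability could be queried programmatically. The sketch below is a minimal illustration against a hypothetical REST API; the base URL, path, query parameters, and response fields are all assumptions for illustration, not a documented interface.

```python
import requests

# Hypothetical Operations Console API; the real base URL, paths,
# and fields may differ.
BASE_URL = "https://ops-console.example.com/api/genai"
HEADERS = {"Authorization": "Bearer <partner-admin-token>"}

# Assumed query for per-deployment token usage over a date range.
resp = requests.get(
    f"{BASE_URL}/token-usage",
    headers=HEADERS,
    params={"deploymentId": "dep-123", "from": "2024-06-01", "to": "2024-06-30"},
)
resp.raise_for_status()

# Assumed response shape: a list of daily usage records.
for record in resp.json():
    print(record["date"], record["inputTokens"], record["outputTokens"])
```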
GenAI Resources in the Operations Console
A Partner Admin accesses GenAI resources from the Operations Console under the GenAI section:
- Compute Clusters
- Endpoints
- Providers
- Storage Namespaces
- Models
- Model Deployments
- Token Usage
These resources represent the end-to-end workflow for hosting and serving GenAI models. The sketch below illustrates one way they might relate to one another.
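One way to picture how these resources fit together is as a simple object model. This is an illustrative guess; the class names, fields, and relationships are assumptions, not the service's actual schema.

```python
from dataclasses import dataclass, field

# Illustrative guess at how the GenAI resources relate; all names,
# fields, and relationships here are assumptions, not the schema.

@dataclass
class ComputeCluster:
    name: str
    gpu_type: str            # e.g. "A100"
    node_count: int

@dataclass
class Endpoint:
    name: str
    cluster: ComputeCluster  # inference endpoints run on GPU clusters

@dataclass
class Provider:
    name: str                # groups related models, e.g. by vendor

@dataclass
class StorageNamespace:
    name: str                # where uploaded or pulled model files live

@dataclass
class Model:
    name: str
    provider: Provider
    storage: StorageNamespace

@dataclass
class ModelDeployment:
    model: Model
    endpoint: Endpoint       # the model is served from this endpoint
    shared_with: list[str] = field(default_factory=list)  # org IDs with access
```

Reading bottom-up: a Model Deployment ties a Model (grouped under a Provider, with files in a Storage Namespace) to an Endpoint backed by a Compute Cluster, and records which organizations it is shared with.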
High-Level Workflow
At a high level, the onboarding workflow includes the following steps; a combined sketch follows the list:
- Preparing GPU-backed compute capacity
- Creating an inference endpoint
- Adding providers and storage namespaces
- Creating model metadata (model cards)
- Uploading or pulling model files
- Deploying the model to an endpoint
- Sharing deployments with organizations
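Put end to end, the workflow might look like the following sketch, which reuses the hypothetical REST API from the earlier example; every path, payload field, and identifier here is an assumption for illustration only.

```python
import requests

# Same hypothetical API as above; all paths and payload fields are
# assumptions made for illustration.
BASE_URL = "https://ops-console.example.com/api/genai"
HEADERS = {"Authorization": "Bearer <partner-admin-token>"}

def create(path: str, payload: dict) -> dict:
    resp = requests.post(f"{BASE_URL}/{path}", headers=HEADERS, json=payload)
    resp.raise_for_status()
    return resp.json()

# Prepare GPU-backed compute capacity.
cluster = create("compute-clusters", {"name": "gpu-a100", "gpuType": "A100", "nodes": 2})

# Create an inference endpoint on that cluster.
endpoint = create("endpoints", {"name": "inference-1", "clusterId": cluster["id"]})

# Add a provider and a storage namespace.
provider = create("providers", {"name": "example-provider"})
namespace = create("storage-namespaces", {"name": "model-files"})

# Create the model card and pull the files from Hugging Face.
model = create("models", {
    "name": "llama-3-8b-instruct",
    "providerId": provider["id"],
    "storageNamespaceId": namespace["id"],
    "source": "huggingface",
    "sourceReference": "meta-llama/Meta-Llama-3-8B-Instruct",
})

# Deploy the model to the endpoint.
deployment = create("model-deployments", {"modelId": model["id"], "endpointId": endpoint["id"]})

# Share the deployment with an organization.
create(f"model-deployments/{deployment['id']}/share", {"organizationIds": ["org-123"]})
```

In practice these steps are carried out through the Operations Console resources listed earlier; the sketch only makes the ordering and the dependencies between resources explicit.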