Overview

Introduction

The GenAI Service provides partners with an integrated environment to onboard, manage, and deploy GenAI models on GPU-backed infrastructure. The service includes dedicated resources that support the complete model lifecycle, from preparing compute capacity to publishing models for organizational use.

The GenAI Service is available in the self-hosted control plane and is configured by the Partner Admin.


Capabilities

The GenAI Service enables partners to:

  • Add GPU-backed compute capacity for hosting GenAI workloads
  • Onboard models from Hugging Face, NVIDIA NGC, or custom storage buckets
  • Deploy models to GPU-powered compute clusters
  • Configure inference endpoints
  • Organize models using providers
  • Make models available to organizations
  • Track token usage for deployed models

GenAI Resources in the Operations Console

A Partner Admin accesses GenAI resources from the Operations Console under the GenAI section:

  • Compute Clusters
  • Endpoints
  • Providers
  • Storage Namespaces
  • Models
  • Model Deployments
  • Token Usage

Together, these resources cover the end-to-end workflow for hosting and serving GenAI models.
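
The way these resources depend on one another can be pictured with a minimal sketch. The classes and field names below are hypothetical and are meant only to show the dependency order (clusters host endpoints, models reference a provider and a storage namespace, and a deployment binds a model to an endpoint); they are not the service's actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of how the GenAI resources relate to one another.
# All class and field names are illustrative, not the real service schema.

@dataclass
class ComputeCluster:       # GPU-backed capacity that hosts deployments
    name: str
    gpu_type: str

@dataclass
class Endpoint:             # inference endpoint served from a cluster
    name: str
    cluster: ComputeCluster

@dataclass
class Provider:             # logical grouping for related models
    name: str

@dataclass
class StorageNamespace:     # source of model files
    name: str
    source: str             # e.g. "huggingface", "ngc", "custom-bucket"

@dataclass
class Model:                # model card: metadata describing a model
    name: str
    provider: Provider
    storage: StorageNamespace

@dataclass
class ModelDeployment:      # a model running behind an endpoint
    model: Model
    endpoint: Endpoint
    shared_with: list[str] = field(default_factory=list)  # organization IDs
```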


High-Level Workflow

At a high level, the onboarding workflow includes the following steps, sketched in code after the list:

  1. Preparing GPU-backed compute capacity
  2. Creating an inference endpoint
  3. Adding providers and storage namespaces
  4. Creating model metadata (model cards)
  5. Uploading or pulling model files
  6. Deploying the model to an endpoint
  7. Sharing deployments with organizations
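
The sketch below walks through the same seven steps as a hypothetical REST sequence. The base URL, routes, and payload fields are assumptions made for illustration only; only the `requests` library usage is real. Refer to the service's API reference for the actual interface.

```python
# Hypothetical walkthrough of the onboarding workflow. Every URL path and
# payload field below is an assumption, not the service's documented API.
import requests

BASE = "https://genai.example.com/api/v1"   # hypothetical control-plane URL
HEADERS = {"Authorization": "Bearer <token>"}

def post(path, payload):
    """POST a JSON payload to the hypothetical API and return the response."""
    resp = requests.post(f"{BASE}{path}", json=payload, headers=HEADERS)
    resp.raise_for_status()
    return resp.json()

# 1. Prepare GPU-backed compute capacity
cluster = post("/compute-clusters", {"name": "gpu-cluster-1", "gpuType": "A100"})

# 2. Create an inference endpoint on that cluster
endpoint = post("/endpoints", {"name": "llm-endpoint", "clusterId": cluster["id"]})

# 3. Add a provider and a storage namespace
provider = post("/providers", {"name": "acme-models"})
namespace = post("/storage-namespaces", {"name": "hf-source", "source": "huggingface"})

# 4. Create model metadata (the model card)
model = post("/models", {
    "name": "llama-3-8b-instruct",
    "providerId": provider["id"],
    "storageNamespaceId": namespace["id"],
})

# 5. Pull the model files from the configured source
post(f"/models/{model['id']}/pull", {})

# 6. Deploy the model to the endpoint
deployment = post("/model-deployments", {
    "modelId": model["id"],
    "endpointId": endpoint["id"],
})

# 7. Share the deployment with organizations
post(f"/model-deployments/{deployment['id']}/share", {"organizations": ["org-123"]})
```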