
Models

New Model

To create a new model, click on "New Model" and follow the workflow described below. At a high level, there are two distinct steps:

  1. Create Model
  2. Create Model Deployment

General

Create New Model

  • Provide a unique name and an optional description for the model.
  • Select the "use case" for the model from the dropdown list. The default is "chat"

Info

The available use cases in the dropdown list are shown below.

New Model-Use Cases


Provider

Select from the dropdown list of existing providers. Admins can also click on "Create New" to navigate to the workflow to create a new provider.

Create New Model


Configuration

Select the type of repository from which the model and its weights will be accessed. The following repository types are currently supported:

  1. NGC
  2. Hugging Face
  3. Storage Namespace (weights downloaded and stored locally)

Repository Type-NGC

When this option is selected, the model weights and related information are downloaded during deployment from NVIDIA's NGC Catalog. You need to provide the following details for the Rafay Platform to access NGC:

  • NGC API Key (authentication)
  • Source
  • Revision

Optionally, you can also enable caching of the downloaded model to avoid downloading it repeatedly from NGC for autoscaled deployments and replicas.

NGC Configuration

Info

Ensure the data plane is configured with access to the Internet so that the worker nodes can download the model from NGC when required.
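If you want to verify the API key and source before entering them in the form, one option is to use NVIDIA's NGC CLI from any machine with Internet access. The sketch below assumes the ngc CLI is installed; the model path is a placeholder, and command syntax may vary slightly across CLI versions.

    # Store the same NGC API key that will be entered in the form (interactive prompt)
    ngc config set

    # Confirm the key can reach the NGC Catalog and see the target model
    # (replace org/team/model_name with the actual NGC source)
    ngc registry model info org/team/model_name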


Repository Type-Hugging Face

When this option is selected, the model weights and related information are downloaded during deployment from Hugging Face. You need to provide the following details for the Rafay Platform to access Hugging Face:

  • Hugging Face API Key (authentication)
  • Source
  • Revision

HF Configuration

Info

Ensure the data plane is configured with access to the Internet so that the worker nodes can download the model from Hugging Face when required.
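As a quick sanity check before filling in the form, you can confirm that the API key has access to the model repository and revision. The sketch below assumes the huggingface-cli tool (part of the huggingface_hub package) is installed; the repository ID and revision are placeholders.

    # Authenticate with the same Hugging Face API key used in the form
    huggingface-cli login --token $HF_TOKEN

    # Verify the token can read the repository at the given revision
    # (meta-llama/Llama-3.1-8B-Instruct and main are example values only)
    huggingface-cli download meta-llama/Llama-3.1-8B-Instruct \
      --revision main --include "config.json" --local-dir /tmp/hf-check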


Repository Type-Storage Namespace

When this option is selected, the model weights and related information are downloaded during deployment from the locally hosted storage namespace. Select the required storage namespace from the dropdown list. Alternatively, admins can also initiate the workflow to create a new storage namespace directly from the model configuration page.

Storage NS Configuration

Info

For this selection, the data plane does not require connectivity to the Internet. Since the worker nodes retrieve the model and its weights from the locally hosted storage namespace, model deployments can be significantly faster than with the other two options.


Upload Model Weights

Once the storage namespace has been created, the administrator needs to upload model weights to it before it can be used for model deployments.

Upload Model Weights to Storage NS

Step 1

Download the model weights to a server or laptop from the official model repository (e.g., Hugging Face). Model weights can be very large (700 GB or more).

Important

Please ensure that the model weights are in the Safetensors format. This is the only format supported at this time.
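For example, weights hosted on Hugging Face can be pulled to a local directory with the huggingface-cli tool. This is a sketch only; the repository ID is a placeholder, and gated models will additionally require an authenticated token.

    # Download the full model snapshot (Safetensors weights, tokenizer, config)
    # to a local directory; replace the repository ID with your model
    huggingface-cli download meta-llama/Llama-3.1-8B-Instruct \
      --local-dir ./llama-3.1-8b-instruct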

Step 2

You will use the AWS CLI to upload the model weights to the specified storage namespace (i.e., S3-compatible object storage).

  • Download and install the AWS CLI
  • Configure the AWS CLI with access keys
  • Upload the downloaded model weights to the storage namespace. Follow the step-by-step instructions provided in the "Upload Model Content" tab; a generic sketch is shown below.

Upload Model Weights Instructions
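The bucket name, endpoint URL, and credentials for your storage namespace are shown in the "Upload Model Content" tab. The commands below are a generic sketch of the same flow, with placeholder values for the endpoint, bucket, and prefix.

    # Configure the AWS CLI with the access key and secret key for the storage namespace
    aws configure

    # Upload the locally downloaded weights to the S3-compatible storage namespace
    aws s3 cp ./llama-3.1-8b-instruct \
      s3://<bucket-name>/<model-prefix>/ \
      --recursive --endpoint-url https://<storage-endpoint>

    # Verify that all files were uploaded
    aws s3 ls s3://<bucket-name>/<model-prefix>/ \
      --endpoint-url https://<storage-endpoint>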


List All Models

In the Ops Console, click on GenAI and then Models. This will display the list of configured and deployed models.

List of Models


View Model Details

In the Ops Console, click on a selected model to view its details. Shown below is an example of a "Llama 3.1-8b Instruct" model powered by the NIM inference engine.

Model Details


Delete Model

In the Ops Console, click on the "ellipses" (3 dots on the far right) under Action for an existing model.

  • Click on "Delete" to delete the model.
  • You will be prompted to confirm the deletion.

Confirm Delete

Info

Deletion is not reversible. All associated infrastructure and resources will be torn down during this process.


Share Model

In the Ops Console, click on the "ellipses" (3 dots on the far right) under Action for an existing model. Now, click on "Manage Sharing" to initiate a workflow to share the model with All or Select tenant orgs.

  • By default, a newly created model is not shared with any tenant org.
  • Select "All Orgs" to make the model available to all tenant orgs under management
  • Select "Select Orgs" to make the model available to selected tenant orgs.

Sharing Model