Models
New Model¶
To create a new model, click on "New Model" and follow the workflow described below. At a high level, there are two distinct steps:
- Create Model
- Create Model Deployment
General¶
- Provide a unique name and an optional description for the model.
- Select the "use case" for the model from the dropdown list. The default is "chat".
Info
The list of available use cases in the dropdown list is shown below.
Provider¶
Select from the dropdown list of existing providers. Admins can also click on "Create New" to navigate to the workflow to create a new provider.
Configuration¶
Select the type of repository where the model and its weights will be accessed. The following repository types are currently supported:
- NGC
- Hugging Face
- Storage Namespace (weights downloaded and stored locally)
Repository Type-NGC¶
When this option is selected, the model weights and related information are downloaded during deployment from NVIDIA's NGC Catalog. You need to provide the following details for the Rafay Platform to access NGC:
- NGC API Key (authentication)
- Source
- Revision
Optionally, you can also enable caching of the downloaded model to avoid having to download it repeatedly from NGC for autoscaling deployments and replicas.
Info
Ensure the data plane is configured with access to the Internet so that the worker nodes can download the model from NGC when required.
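To illustrate the "Source" and "Revision" fields, NGC model targets are commonly written as `org/[team/]model:version`. The helper below is a hypothetical sketch (not part of the Rafay Platform or the NGC CLI) showing how such a target string breaks down into its parts.

```python
# Hypothetical helper for illustration only: split an NGC target string
# of the form "org/[team/]model:version" into its component parts.
def parse_ngc_source(target: str) -> dict:
    path, _, version = target.partition(":")
    parts = path.split("/")
    if len(parts) == 2:
        org, team, model = parts[0], None, parts[1]
    elif len(parts) == 3:
        org, team, model = parts
    else:
        raise ValueError(f"unexpected NGC target: {target!r}")
    return {"org": org, "team": team, "model": model, "version": version or None}
```

For example, `parse_ngc_source("nvidia/nemo/llama-3_1-8b-instruct:1.0")` yields the org, team, model name, and version that together identify one downloadable model revision.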
Repository Type-Hugging Face¶
When this option is selected, the model weights and related information are downloaded during deployment from Hugging Face. You need to provide the following details for the Rafay Platform to access Hugging Face:
- Hugging Face API Key (authentication)
- Source
- Revision
Info
Ensure the data plane is configured with access to the Internet so that the worker nodes can download the model from Hugging Face when required.
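To illustrate how the "Source" (repository id) and "Revision" fields are used, Hugging Face serves individual repository files through its public `resolve` endpoint at `https://huggingface.co/<repo_id>/resolve/<revision>/<filename>`. The helper below is illustrative only; authenticated downloads additionally send the API key as an `Authorization: Bearer <token>` header.

```python
# Illustrative sketch: map a Hugging Face repo id and revision to the
# direct download URL for one file, using HF's public "resolve" endpoint.
def hf_file_url(repo_id: str, filename: str, revision: str = "main") -> str:
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"
```

For example, `hf_file_url("meta-llama/Llama-3.1-8B-Instruct", "config.json")` points at the `config.json` on the `main` revision of that repository.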
Repository Type-Storage Namespace¶
When this option is selected, the model weights and related information are downloaded during deployment from the locally hosted storage namespace. Select the required storage namespace from the dropdown list. Alternatively, admins can initiate the workflow to create a new storage namespace right from the model configuration page.
Info
For this selection, the data plane does not require connectivity to the Internet. Since the worker nodes retrieve the model and its weights from the locally hosted storage namespace, model deployments can be significantly faster than with the other two options.
Upload Model Weights
Once the storage namespace has been created, the administrator needs to upload model weights to it before it can be used for model deployments.
Step 1
Download the model weights to a server/laptop from the official model repository (e.g. Hugging Face). Note that model weights can be very large (e.g. 700 GB or more).
Important
Please ensure that the model weights are in Safetensors format. This is the only supported format at this time.
Step 2
You will use the AWS CLI to upload the model weights to the specified storage namespace (i.e. S3-compatible object storage).
- Download and install the AWS CLI
- Configure the AWS CLI with access keys
- Upload the downloaded model weights to the storage namespace. Follow the "step-by-step" instructions provided in the "Upload Model Content" tab.
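The steps above can be sketched as a dry-run helper that builds the `aws s3 cp` commands needed to upload a local weights directory to an S3-compatible endpoint. The bucket name and endpoint URL below are illustrative placeholders; the actual values and exact commands come from the "Upload Model Content" tab in the console.

```python
# Dry-run sketch: generate the "aws s3 cp" commands to upload every file in
# a local weights directory to an S3-compatible storage namespace. Bucket
# and endpoint values are placeholders, not real Rafay defaults.
from pathlib import Path

def build_upload_commands(weights_dir: str, bucket: str, endpoint_url: str) -> list:
    root = Path(weights_dir)
    commands = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            key = path.relative_to(root).as_posix()
            commands.append(
                f"aws s3 cp {path} s3://{bucket}/{key} --endpoint-url {endpoint_url}"
            )
    return commands
```

Printing the returned list lets you review every object key before running the commands; `--endpoint-url` is the standard AWS CLI flag for pointing `aws s3` at non-AWS, S3-compatible storage.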
List All Models¶
In the Ops Console, click on GenAI and then Models. This will display the list of configured and deployed models.
View Model Details¶
In the Ops Console, click on a selected model to view details. Shown below is an example of a "Llama 3.1-8b Instruct" model powered by the "NIM" Inference engine.
Delete Model¶
In the Ops Console, click on the ellipsis (3 dots on the far right) under Action for an existing model.
- Click on "Delete" to delete the model.
- You will be prompted to confirm deletion.
Info
Deletion is not reversible. All associated infrastructure and resources will be torn down during this process.
Share Model¶
In the Ops Console, click on the ellipsis (3 dots on the far right) under Action for an existing model. Then click on "Manage Sharing" to initiate a workflow to share the model with all or select tenant orgs.
- By default, a newly created model is not shared with any tenant org.
- Select "All Orgs" to make the model available to all tenant orgs under management.
- Select "Select Orgs" to make the model available to selected tenant orgs.