Models
New Model¶
To create a new model, click on "New Model" and follow the workflow described below. At a high level, there are two distinct steps:
- Create Model
- Create Model Deployment
General¶
- Provide a unique name and an optional description for the model.
- Select the "use case" for the model from the dropdown list. The default is "chat".
Info
The list of available use cases in the dropdown list is shown below.
Provider¶
Select from the dropdown list of existing providers. Admins can also click on "Create New" to navigate to the workflow to create a new provider.
Configuration¶
Select the type of repository where the model and its weights will be accessed. The following repository types are currently supported:
- NGC
- Hugging Face
- Storage Namespace (weights downloaded and stored locally)
Repository Type-NGC¶
When this option is selected, the model weights and related information are downloaded during deployment from NVIDIA's NGC Catalog. You need to provide the following details for the Rafay Platform to access NGC:
- NGC API Key (authentication)
- Source
- Revision
Optionally, you can also enable caching of the downloaded model to avoid having to download it repeatedly from NGC for autoscaling deployments and replicas.
Info
Ensure the data plane is configured with access to the Internet so that the worker nodes can download the model from NGC when required.
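To illustrate the "Source" and "Revision" fields, NGC model targets are commonly written as `org/[team/]model:version`. The helper below is a hypothetical sketch (not part of the Rafay Platform or the NGC CLI) showing how such a target string breaks down into its parts.

```python
# Hypothetical helper for illustration only: split an NGC target string
# of the form "org/[team/]model:version" into its component parts.
def parse_ngc_source(target: str) -> dict:
    path, _, version = target.partition(":")
    parts = path.split("/")
    if len(parts) == 2:
        org, team, model = parts[0], None, parts[1]
    elif len(parts) == 3:
        org, team, model = parts
    else:
        raise ValueError(f"unexpected NGC target: {target!r}")
    return {"org": org, "team": team, "model": model, "version": version or None}
```

For example, `parse_ngc_source("nvidia/nemo/llama-3_1-8b-instruct:1.0")` yields the org, team, model name, and version that together identify one downloadable model revision.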
Repository Type-Hugging Face¶
When this option is selected, the model weights and related information are downloaded during deployment from Hugging Face. You need to provide the following details for the Rafay Platform to access Hugging Face:
- Hugging Face API Key (authentication)
- Source
- Revision
Info
Ensure the data plane is configured with access to the Internet so that the worker nodes can download the model from Hugging Face when required.
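To illustrate how the "Source" (repository id) and "Revision" fields are used, Hugging Face serves individual repository files through its public `resolve` endpoint at `https://huggingface.co/<repo_id>/resolve/<revision>/<filename>`. The helper below is illustrative only; authenticated downloads additionally send the API key as an `Authorization: Bearer <token>` header.

```python
# Illustrative sketch: map a Hugging Face repo id and revision to the
# direct download URL for one file, using HF's public "resolve" endpoint.
def hf_file_url(repo_id: str, filename: str, revision: str = "main") -> str:
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"
```

For example, `hf_file_url("meta-llama/Llama-3.1-8B-Instruct", "config.json")` points at the `config.json` on the `main` revision of that repository.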
Repository Type-Storage Namespace¶
When this option is selected, the model weights and related information are downloaded during deployment from the locally hosted storage namespace. Select the required storage namespace from the dropdown list. Alternatively, admins can initiate the workflow to create a new storage namespace right from the model configuration page.
Info
For this selection, the data plane does not require connectivity to the Internet. Since the worker nodes retrieve the model and its weights from the locally hosted storage namespace, model deployments can be significantly faster than with the other two options.
Upload Model Weights
Once the storage namespace has been created, the administrator needs to upload model weights to it before it can be used for model deployments.
Step 1
Download the model weights to a server/laptop from the official model repository (e.g. Hugging Face). Note that model weights can be very large (e.g. 700 GB or more).
Important
Please ensure that the model weights are in Safetensors format. This is the only supported format at this time.
Step 2
You will use the AWS CLI to upload the model weights to the specified storage namespace (i.e. S3-compatible object storage).
- Download and install the AWS CLI
- Configure the AWS CLI with access keys
- Upload the downloaded model weights to the storage namespace. Follow the "step-by-step" instructions provided in the "Upload Model Content" tab.
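The steps above can be sketched as a dry-run helper that builds the `aws s3 cp` commands needed to upload a local weights directory to an S3-compatible endpoint. The bucket name and endpoint URL below are illustrative placeholders; the actual values and exact commands come from the "Upload Model Content" tab in the console.

```python
# Dry-run sketch: generate the "aws s3 cp" commands to upload every file in
# a local weights directory to an S3-compatible storage namespace. Bucket
# and endpoint values are placeholders, not real Rafay defaults.
from pathlib import Path

def build_upload_commands(weights_dir: str, bucket: str, endpoint_url: str) -> list:
    root = Path(weights_dir)
    commands = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            key = path.relative_to(root).as_posix()
            commands.append(
                f"aws s3 cp {path} s3://{bucket}/{key} --endpoint-url {endpoint_url}"
            )
    return commands
```

Printing the returned list lets you review every object key before running the commands; `--endpoint-url` is the standard AWS CLI flag for pointing `aws s3` at non-AWS, S3-compatible storage.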
List All Models¶
In the Ops Console, click on GenAI and then Models. This will display the list of configured and deployed models.
View Model Details¶
In the Ops Console, click on a selected model to view details. Shown below is an example of a "Llama 3.1-8b Instruct" model powered by the "NIM" Inference engine.
Delete Model¶
In the Ops Console, click on the ellipsis (3 dots on the far right) under Action for an existing model.
- Click on "Delete" to delete the model.
- You will be prompted to confirm deletion.
Info
Deletion is not reversible. All associated infrastructure and resources will be torn down during this process.
Share Model¶
In the Ops Console, click on the ellipsis (3 dots on the far right) under Action for an existing model. Then click on "Manage Sharing" to initiate a workflow to share the model with all or select tenant orgs.
- By default, a newly created model is not shared with any tenant org.
- Select "All Orgs" to make the model available to all tenant orgs under management.
- Select "Select Orgs" to make the model available to selected tenant orgs.