Models
Overview¶
A model represents the GenAI asset that partners onboard into the system.
Each model includes core metadata such as name, description, provider, and use case.
This information determines how the model is organized, how it appears in the console, and how it will later be deployed for inference.
Models created here become available in the Model Deployments workflow.
Accessing Models¶
A Partner Admin can access the Models page from the Operations Console.
This screen displays all existing models with details such as provider, repository, and creation time.
Create a Model¶
- Navigate to Operations Console → GenAI → Models
- Click New Model to open the model creation form
The model creation screen includes three sections:
General Details, Provider, and Configuration.
General Details¶
The following fields are available under General Details:
- Name: Enter a unique name for the model.
- Description: Optional text describing the model.
- Usecase: Select the use case that best represents the model’s supported capability.
The correct use case must be selected based on the model’s inherent capabilities.
Providers¶
Each model must be associated with a Provider, which represents the model’s originating source (such as Llama, Qwen, NVIDIA, Google, or any custom provider). All providers created earlier in the Providers page appear in this dropdown.

1. Open the Provider dropdown.
2. Choose the appropriate provider based on the model you are onboarding. Selecting the correct provider ensures the model is grouped and displayed under the right model family.
3. If the required provider is not available in the list, a new provider can be created directly from the dropdown. This opens the same Create Provider form available under GenAI → Providers.
Configuration¶
The Configuration section defines where the model will be sourced from and where its artifacts are stored. Partners can choose from three repository options:
- NVIDIA NGC: Select this option when the model is sourced from the NVIDIA NGC catalog.

  Key details:

    * Requires an NGC API Key from the customer’s active NGC subscription.
    * Parameters such as Source (NGC model path) and Revision must be provided.
    * Supports models officially published and maintained by NVIDIA.

  Use this option when accessing licensed or enterprise-grade models available through NGC.
- Hugging Face: Select this option to pull the model directly from a Hugging Face repository.

  Key details:

    * Supports public foundation models such as Llama, Qwen, and other multimodal or task-specific models.
    * The system retrieves the model from Hugging Face using the API Key, Source (repo path), and Revision.
    * No internal storage namespace is used.

  Use this option when onboarding open-source models directly from Hugging Face.
- Storage Namespace: Select this option when the model artifacts are stored in a customer-owned private bucket configured under Storage Namespaces.

  Key details:

    * Used for custom, fine-tuned, or proprietary models.
    * Partners upload their model artifacts to their bucket (e.g., AWS S3) and reference it here.
    * Supports organization-specific model workflows.

  Storage Namespace is recommended when developers produce their own model versions and want to host them privately rather than pulling from public ecosystems.
When Storage Namespace is chosen:
- A list of previously created storage namespaces is displayed.
- Selecting one allows the system to fetch model files from that bucket.
- If a storage namespace is not yet available, a new one can be created inline using Create New.
This ensures models are uploaded and retrieved securely using the configured credentials and bucket settings.
Click Save to create the model card.
Upload Model Content¶
After saving the model card, the system opens the Upload Model Content tab. This provides the steps required to upload model artifacts into the selected storage namespace.
Create Access Key¶
An Access Key is required to authenticate with the storage backend linked to the model. If an access key already exists, it can be reused. Otherwise, select New Access Key to generate one. The key pair is shown only once, so it must be copied and stored securely.
Install AWS CLI¶
The AWS CLI is used to sync model content to the storage namespace. If not already installed, run:
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
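Once the installer completes, you can optionally confirm that the CLI is available on your PATH (the version string in the output will vary by release):
aws --version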
Configure AWS Credentials¶
Use the Access Key generated on the portal to authenticate the AWS CLI.
Run:
aws configure
Provide the following:
- AWS Access Key ID — Key created from the Access Key tab
- AWS Secret Access Key — Secret key from the Access Key tab
- Default region — Region used in your storage namespace (e.g., us-west-2)
- Default output format — Optional
Users who prefer not to override the default AWS CLI profile may configure a named profile:
aws configure --profile <profile-name>
This enables separate credentials for different environments or storage namespaces.
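As an alternative to the interactive prompts, the same values can be set non-interactively with aws configure set. The placeholder values below are illustrative and should be replaced with the key pair generated on the portal and the region of your storage namespace:
aws configure set aws_access_key_id <access-key-id> --profile <profile-name>
aws configure set aws_secret_access_key <secret-access-key> --profile <profile-name>
aws configure set region us-west-2 --profile <profile-name>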
Download Model Files Locally¶
Before syncing model content to the storage namespace, ensure the model files are available locally.
For models sourced from Hugging Face, download the model using the Hugging Face CLI:
hf download Qwen/Qwen2.5-1.5B-Instruct
The model files are downloaded to the local Hugging Face cache directory. Navigate to the directory containing the downloaded model files before running the sync command.
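If you prefer to download into a specific working directory rather than the cache, the Hugging Face CLI also accepts a --local-dir option; the target path below is only illustrative:
hf download Qwen/Qwen2.5-1.5B-Instruct --local-dir ./qwen2.5-1.5b-instruct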
This is a one-time step per model version and avoids repeated large downloads during model onboarding.
Prepare Model Files Locally¶
Before running the sync command, ensure that the model files are available locally on the system where the AWS CLI is configured.
The upload process syncs files from the current local directory to the configured Storage Namespace. The model artifacts must already exist on disk before running the command.
Key points:
- Model files must be present locally before upload
- This is a one-time preparation step per model version
- Large models must be fully downloaded before syncing
For models sourced from Hugging Face, the model can be downloaded locally using Hugging Face tooling. Downloaded files are stored in the local Hugging Face cache directory and can be synced directly from there.
Upload (Sync) the Model Content¶
Once the model files are available locally and AWS CLI is configured, upload the model artifacts to the configured Storage Namespace.
Navigate to the directory that contains the downloaded model files. This directory must include all required artifacts such as model weights, tokenizer files, and configuration files.
Run the sync command from this directory:
aws s3 --endpoint-url https://<rafay-gateway-url>/gateway sync . s3://<model-name>/<revision>/
This command uploads all files from the current directory to the model’s storage location using the Rafay gateway and the selected Storage Namespace.
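If a named AWS CLI profile was configured earlier, or if you want to preview the transfer before uploading anything, the same command can be run with --profile and --dryrun (an illustrative sketch; replace the placeholders as in the command above):
aws s3 --endpoint-url https://<rafay-gateway-url>/gateway sync . s3://<model-name>/<revision>/ --profile <profile-name> --dryrun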
After the upload completes successfully, the model content is available in the bucket and the model becomes ready for deployment.
Model Files Validation¶
After the sync operation completes, the uploaded model artifacts are validated automatically.
- Navigate to the Files & Versions tab of the model.
- All files uploaded to the storage namespace are listed here, including model weights, configuration files, and tokenizer assets.
The presence of these files confirms that:
- The upload to the storage namespace was successful
- The model content is correctly associated with the model card
- The model is ready for the next step in the workflow
Once the files appear in this view, the model creation and upload process is complete, and the model can proceed to deployment.
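As an optional cross-check from the command line, the uploaded objects can also be listed directly from the bucket through the same gateway endpoint (placeholders as in the sync command above):
aws s3 --endpoint-url https://<rafay-gateway-url>/gateway ls s3://<model-name>/<revision>/ --recursive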
Share Model with Organizations¶
Models can be shared with one or more organizations to control where they are available for deployment.
From the Models page, open the required model and select Manage Sharing to share the model with one or more organizations.
Saving the changes makes the model visible and usable in the selected organizations’ consoles for deployment and inference.