Get Started
The template enables streamlined deployment of inference services on GPU-enabled Kubernetes clusters. In this guide, we will be using the Inference template available from the Template Catalog.
- Service profiles are based on environment templates powered by Environment Manager.
- Users can also create and configure custom environment templates for use cases not covered by the out-of-the-box templates in the Template Catalog.
Info
Please check the Public Roadmap or contact support for details on additional templates for the Template Catalog.
Please ensure that you have properly configured the cluster with GPUs and an Ingress Controller by following the infrastructure-related instructions. The endpoint URL will be exposed via an HTTPS-based Ingress on a domain.
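Before proceeding, you may want to confirm that the cluster actually advertises GPU capacity. The sketch below is a minimal, illustrative check over `kubectl get nodes -o json` output; it assumes the standard NVIDIA device plugin, which exposes GPUs as the extended resource `nvidia.com/gpu` (your cluster may use a different device plugin and resource name).

```python
def gpu_nodes(nodes_json: dict) -> list:
    """Return names of nodes advertising allocatable NVIDIA GPUs."""
    names = []
    for node in nodes_json.get("items", []):
        allocatable = node.get("status", {}).get("allocatable", {})
        # The NVIDIA device plugin exposes GPUs as the extended
        # resource "nvidia.com/gpu" in node allocatable capacity.
        if int(allocatable.get("nvidia.com/gpu", "0")) > 0:
            names.append(node["metadata"]["name"])
    return names

# Example: two nodes, only one of which advertises a GPU.
sample = {
    "items": [
        {"metadata": {"name": "cpu-node-1"},
         "status": {"allocatable": {"cpu": "8"}}},
        {"metadata": {"name": "gpu-node-1"},
         "status": {"allocatable": {"cpu": "16", "nvidia.com/gpu": "1"}}},
    ]
}
print(gpu_nodes(sample))  # -> ['gpu-node-1']
```

In practice you would feed this the parsed output of `kubectl get nodes -o json`; an empty result means the GPU prerequisite is not met.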
Prerequisites¶
Before proceeding, ensure the following:
- Host Cluster: Ensure that a Kubernetes host cluster is available and ready for Inference deployment.
- Agent Configuration: Configure agents through Global Settings or during cluster provisioning.
Inference Configuration¶
- As an Org Admin, go to Environment -> Environment Templates
- Select `system-pod-as-a-service` and click the edit icon for a specific version to make the required changes, or click New Version
- Provide the following details:
  - A unique name for the template
  - A version name (e.g., `1.0`)
- Go to Agents and configure the required Agent to drive the workflow. Select the Agent from the dropdown list. If no Agents are shown, the Agent may need to be set up (refer to the prerequisites).
Config Context¶
The config context will typically encapsulate credentials and environment variables required for the agent to perform its job. In this case, we will configure the Rafay Agent with credentials so that it can make programmatic (API) calls to the specified Rafay Org and Hugging Face.
To get the Rafay API Key + Secret for the administrator user:
- Navigate to "My Tools -> Manage Keys" and click on "New API Key".
- Copy the API Key + Secret combination.
Info
Click here to learn more about API Key & Secret for programmatic access.
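As an aside, programmatic calls made with this key typically pass it in a request header. The sketch below is purely illustrative: the header name `X-API-KEY` is an assumption for this example, not the documented Rafay API contract, so consult the API reference linked above for the real header.

```python
def auth_headers(api_key: str) -> dict:
    """Build request headers carrying an API key for programmatic calls.

    The "X-API-KEY" header name is a hypothetical placeholder; the
    actual header is defined by the controller's API documentation.
    """
    return {
        "X-API-KEY": api_key,  # assumed header name, for illustration only
        "Content-Type": "application/json",
    }

headers = auth_headers("example-key-id.example-secret")
print(headers["Content-Type"])  # -> application/json
```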
Now, we are ready to configure our agent's config context.
- Under the "Config Contexts" tab in the environment template, edit "kubeconfig-mounter"
- Expand "Environment Variables" and you should see three entries: HF Token, API Key & Controller Endpoint
- Click on Edit for API Key
- Paste the API Key/Secret string from the above step into the value section
- Set Override to "Not Allowed" to ensure none of the downstream users have visibility or access to the config context
- Save & Continue
- Click on Edit for HF Token
- Paste the Hugging Face Token from your Hugging Face account into the value section
- Set Override to "Not Allowed" to ensure none of the downstream users have visibility or access to the config context
- Save & Continue
Controller API Endpoint
For self-hosted Rafay Controller deployments, the agent will need to be configured to point to the controller's URL. In this guide, we will be using the URL for Rafay's SaaS option.
- Click on Edit for "Controller Endpoint".
- Note that Rafay's SaaS Endpoint URL is already configured and can be updated if required.
- Set Override to "Not Allowed" to ensure none of the downstream users have visibility or access to the config context
- Save & Continue
- Click Save
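Once saved, the config context injects these values into the agent's runtime as environment variables. The sketch below shows how an agent-side script might verify that the three entries are present; the variable names (`HF_TOKEN`, `RAFAY_API_KEY`, `CONTROLLER_ENDPOINT`) are hypothetical placeholders, since the actual names are defined by the `kubeconfig-mounter` config context in your template.

```python
import os

# Hypothetical variable names; the real names come from the config context.
REQUIRED_VARS = ["HF_TOKEN", "RAFAY_API_KEY", "CONTROLLER_ENDPOINT"]

def missing_config_vars(env=None) -> list:
    """Return the names of required variables absent from the agent's env."""
    if env is None:
        env = os.environ
    return [name for name in REQUIRED_VARS if not env.get(name)]

# Example: the Hugging Face token has not been injected yet.
missing = missing_config_vars({
    "RAFAY_API_KEY": "key-id.secret",
    "CONTROLLER_ENDPOINT": "https://console.example.com",
})
print(missing)  # -> ['HF_TOKEN']
```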
Configure Input Variables¶
- Customize and templatize all Inference configurations using input variables, including:
- Cluster and Access settings: Host server, kubeconfig, client certificate and key data, certificate authority data
- Resource settings: CPU request, CPU limit, memory request, memory limit, GPU limit, PVC storage
- Networking settings: Namespace, ingress domain, ingress IP, ingress namespace, subdomain, custom domain, custom secret
- Cluster management: Host cluster name, project, cluster name
- Restrict user edits for specific variables by:
- Setting overrides to Not Allowed for selected variables
- Defining default values that cannot be edited at the time of launch
- Save the template as a Draft to allow ongoing edits until the configuration is finalized. Once all changes are complete, set it as an Active Version to freeze the version. Learn more about version management.
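The override rules above amount to a simple merge: user-supplied values are honored only for variables whose override is allowed, while locked variables keep their defaults. A minimal sketch of that resolution logic, with illustrative variable names and defaults (not the template's actual schema):

```python
def resolve_inputs(defaults: dict, override_allowed: dict, user_values: dict) -> dict:
    """Merge user-supplied values over defaults, honoring override flags."""
    resolved = dict(defaults)
    for name, value in user_values.items():
        if override_allowed.get(name, False):
            resolved[name] = value
        # Values for variables with Override set to "Not Allowed" are
        # silently ignored here; a real system might reject them instead.
    return resolved

defaults = {"cpu_limit": "4", "gpu_limit": "1", "namespace": "inference"}
allowed = {"cpu_limit": True, "gpu_limit": True, "namespace": False}
print(resolve_inputs(defaults, allowed, {"gpu_limit": "2", "namespace": "dev"}))
# -> {'cpu_limit': '4', 'gpu_limit': '2', 'namespace': 'inference'}
```

Note the `namespace` override is discarded because its override flag is off, which mirrors setting Override to "Not Allowed" in the template.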
Configure PaaS¶
Next, a PaaS Admin configures a custom PaaS Service Profile to make the template available for self-service deployment:
- Navigate to PaaS Studio > Service Profiles
- Select the project where the template was created and click New Service Profile
- Enter a Name for the profile and select the previously created Template and Version
- For Service Type, select Inference Endpoints
- For Will compute be auto-created, select Yes
- Click Save & Continue
- On the next screen, go to Input Settings and enable or disable overrides for the required parameters.
PaaS admins can enable or disable overrides for input fields. The fields marked as enabled will be available for end users to provide their values during deployment. In this example, CPU, GPU Count, GPU Type, and Host Cluster Name are enabled, so users can enter values for these parameters when deploying inference endpoints.
- Click Save Changes
- Navigate to Output Settings and click Add Output. When creating the inference service profile, the PaaS Admin can define outputs such as URLs, API keys, or other required details. These outputs are displayed to the end users after they deploy the inference service, allowing them to access and use the service.
- Enter the Name, Label and Resource
- If required, add more outputs and click Save Changes
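Conceptually, each declared output pairs a label with a value resolved at deployment time. The sketch below illustrates that pairing; the output names, labels, and resolved values are hypothetical examples, not fields from the actual service profile schema.

```python
def render_outputs(output_defs: list, resolved: dict) -> list:
    """Pair each declared output (name, label) with its resolved value."""
    return [
        {"label": d["label"], "value": resolved.get(d["name"], "<pending>")}
        for d in output_defs
    ]

# Hypothetical outputs a PaaS Admin might declare for an inference service.
outputs = [
    {"name": "endpoint_url", "label": "Endpoint URL"},
    {"name": "api_key", "label": "API Key"},
]
# Values resolved after deployment; the API key is not yet available.
resolved = {"endpoint_url": "https://model.inference.example.com"}
print(render_outputs(outputs, resolved))
```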
Once the changes are saved, the inference service profile becomes available in the Developer Hub. End users can select this profile and deploy an inference service. For the detailed process flow, see User Workflow for Inference Services.