Get Started
The template enables streamlined deployment of inference services on GPU-enabled Kubernetes clusters. In this guide, we will be using the Inference template available from the Template Catalog.
- Service profiles are based on environment templates powered by Environment Manager.
- Users can also create and configure custom environment templates for use cases not covered by the out-of-the-box templates in the Template Catalog.
Info
Please check the Public Roadmap or contact support for details on additional templates for the Template Catalog.
Please ensure that you have properly configured the cluster with GPUs and an Ingress Controller by following the infrastructure-related instructions. The endpoint URL will be exposed via an HTTPS-based Ingress on a domain.
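Before proceeding, you may want to confirm that the cluster actually advertises GPU capacity. The sketch below is a minimal, illustrative check over `kubectl get nodes -o json` output; it assumes the standard NVIDIA device plugin, which exposes GPUs as the extended resource `nvidia.com/gpu` (your cluster may use a different device plugin and resource name).

```python
def gpu_nodes(nodes_json: dict) -> list:
    """Return names of nodes advertising allocatable NVIDIA GPUs."""
    names = []
    for node in nodes_json.get("items", []):
        allocatable = node.get("status", {}).get("allocatable", {})
        # The NVIDIA device plugin exposes GPUs as the extended
        # resource "nvidia.com/gpu" in node allocatable capacity.
        if int(allocatable.get("nvidia.com/gpu", "0")) > 0:
            names.append(node["metadata"]["name"])
    return names

# Example: two nodes, only one of which advertises a GPU.
sample = {
    "items": [
        {"metadata": {"name": "cpu-node-1"},
         "status": {"allocatable": {"cpu": "8"}}},
        {"metadata": {"name": "gpu-node-1"},
         "status": {"allocatable": {"cpu": "16", "nvidia.com/gpu": "1"}}},
    ]
}
print(gpu_nodes(sample))  # -> ['gpu-node-1']
```

In practice you would feed this the parsed output of `kubectl get nodes -o json`; an empty result means the GPU prerequisite is not met.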
Prerequisites¶
Before proceeding, ensure the following:
- Host Cluster: Ensure that a Kubernetes host cluster is available and ready for Inference deployment.
- Agent Configuration: Configure agents through Global Settings or during cluster provisioning.
Inference Configuration¶
- As an Org Admin, go to Environment -> Environment Templates
- Select `system-pod-as-a-service` and click the edit icon for a specific version to make the required changes, or click New Version
- Provide the following details:
  - A unique name for the template
  - A version name (e.g., `1.0`)
- Go to Agents and configure the required Agent to drive the workflow. Select the Agent from the dropdown list. If no Agents are shown, the Agent may need to be set up (refer to the prerequisites).
Config Context¶
The config context will typically encapsulate credentials and environment variables required for the agent to perform its job. In this case, we will configure the Rafay Agent with credentials so that it can make programmatic (API) calls to the specified Rafay Org and Hugging Face.
To get the Rafay API Key + Secret for the administrator user:
- Navigate to "My Tools -> Manage Keys" and click on "New API Key".
- Copy the API Key + Secret combination.
Info
Click here to learn more about API Key & Secret for programmatic access.
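As an aside, programmatic calls made with this key typically pass it in a request header. The sketch below is purely illustrative: the header name `X-API-KEY` is an assumption for this example, not the documented Rafay API contract, so consult the API reference linked above for the real header.

```python
def auth_headers(api_key: str) -> dict:
    """Build request headers carrying an API key for programmatic calls.

    The "X-API-KEY" header name is a hypothetical placeholder; the
    actual header is defined by the controller's API documentation.
    """
    return {
        "X-API-KEY": api_key,  # assumed header name, for illustration only
        "Content-Type": "application/json",
    }

headers = auth_headers("example-key-id.example-secret")
print(headers["Content-Type"])  # -> application/json
```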
Now, we are ready to configure our agent's config context.
- Under the "Config Contexts" tab in the environment template, edit "kubeconfig-mounter"
- Expand "Environment Variables" and you should see three entries: HF Token, API Key & Controller Endpoint
- Click on Edit for API Key
- Paste the API Key/Secret string from the above step into the value section
- Set Override to "Not Allowed" to ensure none of the downstream users have visibility or access to the config context
- Save & Continue
- Click on Edit for HF Token
- Paste the Hugging Face Token from your Hugging Face account into the value section
- Set Override to "Not Allowed" to ensure none of the downstream users have visibility or access to the config context
- Save & Continue
Controller API Endpoint
For self-hosted Rafay Controller deployments, the agent will need to be configured to point to the controller's URL. In this guide, we will be using the URL for Rafay's SaaS option.
- Click on Edit for "Controller Endpoint".
- Note that Rafay's SaaS Endpoint URL is already configured and can be updated if required.
- Set Override to "Not Allowed" to ensure none of the downstream users have visibility or access to the config context
- Save & Continue
- Click Save
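Once saved, the config context injects these values into the agent's runtime as environment variables. The sketch below shows how an agent-side script might verify that the three entries are present; the variable names (`HF_TOKEN`, `RAFAY_API_KEY`, `CONTROLLER_ENDPOINT`) are hypothetical placeholders, since the actual names are defined by the `kubeconfig-mounter` config context in your template.

```python
import os

# Hypothetical variable names; the real names come from the config context.
REQUIRED_VARS = ["HF_TOKEN", "RAFAY_API_KEY", "CONTROLLER_ENDPOINT"]

def missing_config_vars(env=None) -> list:
    """Return the names of required variables absent from the agent's env."""
    if env is None:
        env = os.environ
    return [name for name in REQUIRED_VARS if not env.get(name)]

# Example: the Hugging Face token has not been injected yet.
missing = missing_config_vars({
    "RAFAY_API_KEY": "key-id.secret",
    "CONTROLLER_ENDPOINT": "https://console.example.com",
})
print(missing)  # -> ['HF_TOKEN']
```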
Configure Input Variables¶
- Customize and templatize all Inference configurations using input variables, including:
- Cluster and Access settings: Host server, kubeconfig, client certificate and key data, certificate authority data
- Resource settings: CPU request, CPU limit, memory request, memory limit, GPU limit, PVC storage
- Networking settings: Namespace, ingress domain, ingress IP, ingress namespace, subdomain, custom domain, custom secret
- Cluster management: Host cluster name, project, cluster name
- Restrict user edits for specific variables by:
- Setting overrides to Not Allowed for selected variables
- Defining default values that cannot be edited at the time of launch
- Save the template as a Draft to allow ongoing edits until the configuration is finalized. Once all changes are complete, set it as an Active Version to freeze the version. Learn more about version management.
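The override rules above amount to a simple merge: user-supplied values are honored only for variables whose override is allowed, while locked variables keep their defaults. A minimal sketch of that resolution logic, with illustrative variable names and defaults (not the template's actual schema):

```python
def resolve_inputs(defaults: dict, override_allowed: dict, user_values: dict) -> dict:
    """Merge user-supplied values over defaults, honoring override flags."""
    resolved = dict(defaults)
    for name, value in user_values.items():
        if override_allowed.get(name, False):
            resolved[name] = value
        # Values for variables with Override set to "Not Allowed" are
        # silently ignored here; a real system might reject them instead.
    return resolved

defaults = {"cpu_limit": "4", "gpu_limit": "1", "namespace": "inference"}
allowed = {"cpu_limit": True, "gpu_limit": True, "namespace": False}
print(resolve_inputs(defaults, allowed, {"gpu_limit": "2", "namespace": "dev"}))
# -> {'cpu_limit': '4', 'gpu_limit': '2', 'namespace': 'inference'}
```

Note the `namespace` override is discarded because its override flag is off, which mirrors setting Override to "Not Allowed" in the template.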
Configure PaaS¶
Next, a PaaS Admin configures a custom PaaS Service Profile to make the template available for self-service deployment:
- Navigate to PaaS Studio > Service Profiles
- Select the project where the template was created and click New Service Profile
- Enter a Name for the profile and select the previously created Template and Version
- For Service Type, select Inference Endpoints
- For Will compute be auto-created, select Yes
- Click Save & Continue
- On the next screen, go to Input Settings and enable or disable overrides for the required parameters.
PaaS admins can enable or disable overrides for input fields. The fields marked as enabled will be available for end users to provide their values during deployment. In this example, CPU, GPU Count, GPU Type, and Host Cluster Name are enabled, so users can enter values for these parameters when deploying inference endpoints.
- Click Save Changes
- Navigate to Output Settings and click Add Output. When creating the inference service profile, the PaaS Admin can define outputs such as URLs, API keys, or other required details. These outputs are displayed to the end users after they deploy the inference service, allowing them to access and use the service.
- Enter the Name, Label and Resource
- If required, add more outputs and click Save Changes
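Conceptually, each declared output pairs a label with a value resolved at deployment time. The sketch below illustrates that pairing; the output names, labels, and resolved values are hypothetical examples, not fields from the actual service profile schema.

```python
def render_outputs(output_defs: list, resolved: dict) -> list:
    """Pair each declared output (name, label) with its resolved value."""
    return [
        {"label": d["label"], "value": resolved.get(d["name"], "<pending>")}
        for d in output_defs
    ]

# Hypothetical outputs a PaaS Admin might declare for an inference service.
outputs = [
    {"name": "endpoint_url", "label": "Endpoint URL"},
    {"name": "api_key", "label": "API Key"},
]
# Values resolved after deployment; the API key is not yet available.
resolved = {"endpoint_url": "https://model.inference.example.com"}
print(render_outputs(outputs, resolved))
```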
Once the changes are saved, the inference service profile becomes available in the Developer Hub. End users can select this profile and deploy an inference service. For the detailed process flow, see User Workflow for Inference Services.