Time Slicing

The NVIDIA GPU Operator enables oversubscription of GPUs through a set of extended options for the NVIDIA Kubernetes Device Plugin. GPU time-slicing enables workloads that are scheduled on oversubscribed GPUs to interleave with one another.

Time-slicing lets a system administrator define a set of replicas for a GPU, each of which can be handed out independently to a pod to run workloads on. Unlike Multi-Instance GPU (MIG), there is no memory or fault isolation between replicas, but for some workloads this is better than not being able to share at all. Internally, GPU time-slicing is used to multiplex workloads from replicas of the same underlying GPU.
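
The NVIDIA Kubernetes Device Plugin expresses this sharing through a time-slicing configuration. As a minimal sketch, assuming a single GPU should be advertised as 10 replicas (presumably the replica count behind the time-slice-10 profile referenced later in this guide), the configuration looks like this:

version: v1
sharing:
  timeSlicing:
    resources:
      - name: nvidia.com/gpu
        replicas: 10

With a configuration like this, a node with one physical GPU advertises nvidia.com/gpu: 10, and up to ten pods each requesting a single GPU can be scheduled onto it.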


Assumptions

  • You have provisioned or imported one or more Kubernetes clusters into a Project in your Rafay Org that contain one or more GPUs.
  • Ensure that you have NOT already deployed the NVIDIA GPU Operator on the cluster.
  • You have set up the RCTL CLI

Deploy GPU Time Slicing

To deploy GPU time slicing on the managed Kubernetes cluster, perform the following steps:


Step 1: Create GPU Operator Namespace

In this step, you will create a namespace for the GPU Operator which will be installed in a later step.

  • Download the namespace specification file
  • Update the Project name in the YAML file with the name of the project to create the resource in
  • Execute the following RCTL command to create the namespace
rctl --v3 apply -f 01-gpu-operator-namespace.yaml
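
The downloaded specification file is the source of truth. As an illustration only, a Rafay v3 namespace specification generally takes roughly the following shape (the apiVersion and field names shown here are assumptions and may differ from the downloaded file); the project field is what you update:

apiVersion: infra.k8smgmt.io/v3
kind: Namespace
metadata:
  # Namespace the GPU Operator will be installed into
  name: gpu-operator
  # Replace with the name of your Rafay project
  project: <project-name>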

Step 2: Add the NVIDIA Helm Repository

In this step, you will add the NVIDIA Helm repository into the controller. This repository will be used to pull the Helm chart of the GPU Operator.

  • Download the repository specification file
  • Update the Project name in the YAML file with the name of the project to create the resource in
  • Execute the following RCTL command to create the repository
rctl --v3 apply -f 02-nvidia-helm-repository.yaml
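
The repository specification points the controller at NVIDIA's public Helm repository, which hosts the GPU Operator chart. For context, this is the same repository you would add with plain Helm (the endpoint in the downloaded specification is expected to match this URL):

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update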

Step 3: Create GPU Operator Resource Quota Custom Addon

In this step, you will create a custom addon for the GPU Operator Resource Quota.

  • Download the resource quota addon specification file
  • Download the GPU Operator Resource Quota specification file
  • Update the Project name in the 03-addon-gpu-operator-quota.yaml file with the name of the project to create the resource in
  • Execute the following RCTL command to create the resource quota addon
rctl --v3 apply -f 03-addon-gpu-operator-quota.yaml
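
For background, on some clusters pods that request the system-node-critical or system-cluster-critical priority classes, as the GPU Operator components do, are only admitted in namespaces that carry a ResourceQuota scoped to those priority classes. The downloaded file defines the actual quota; a representative example of this kind of quota looks like the following (names and limits here are illustrative):

apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-operator-quota
  namespace: gpu-operator
spec:
  hard:
    pods: 100
  scopeSelector:
    matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values:
          - system-node-critical
          - system-cluster-critical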

Step 4: Create GPU Operator Custom Addon

In this step, you will create a custom addon for the GPU Operator.

  • Download the GPU Operator addon specification file
  • Download the GPU Operator values file
  • Update the Project name in the 04-addon-nvidia-gpu-operator.yaml file with the name of the project to create the resource in
  • Execute the following RCTL command to create the GPU Operator addon
rctl --v3 apply -f 04-addon-nvidia-gpu-operator.yaml
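
The values file is where the time-slice configurations referenced in Step 6 are defined. The downloaded values file is authoritative; as an illustrative excerpt, the GPU Operator's device plugin can expose a named profile such as time-slice-10 that embeds the sharing block shown at the top of this page:

devicePlugin:
  config:
    # Create a ConfigMap that holds the named time-slicing profiles
    create: true
    name: time-slicing-config
    data:
      time-slice-10: |-
        version: v1
        sharing:
          timeSlicing:
            resources:
              - name: nvidia.com/gpu
                replicas: 10

Nodes then select a profile through the nvidia.com/device-plugin.config label, which is applied in Step 6.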

Step 5: Create Blueprint

In this step, you will create a cluster blueprint which contains the previously created addons for the GPU Operator.

  • Download the blueprint specification file
  • Update the Project name in the YAML file with the name of the project to create the resource in
  • Execute the following RCTL command to create the blueprint
rctl --v3 apply -f 05-blueprint-gpu.yaml
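
The downloaded blueprint file ties the two custom addons together; use it as-is. Purely as a rough sketch of the idea (the apiVersion, addon names, and versions below are assumptions), a Rafay v3 blueprint referencing custom addons looks roughly like this:

apiVersion: infra.k8smgmt.io/v3
kind: Blueprint
metadata:
  name: gpu-blueprint
  project: <project-name>
spec:
  version: v1
  base:
    name: default
  customAddons:
    # Names and versions must match the addons created in Steps 3 and 4
    - name: gpu-operator-resource-quota
      version: v1
    - name: nvidia-gpu-operator
      version: v1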

Step 6: Label GPU Nodes

In this step, you will label the GPU nodes with the time-slice configuration they should use. Note that the time-slice configurations are defined in the GPU Operator values file that was used previously in the GPU Operator addon.

  • Execute the following command for each GPU node, being sure to update the node name in the command
kubectl label nodes <node-name> nvidia.com/device-plugin.config=time-slice-10
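
To confirm that the label was applied, you can list the nodes along with the label value:

kubectl get nodes -L nvidia.com/device-plugin.config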

Step 7: Apply Node Taints

In this step, you will apply a taint to the GPU nodes so that only workloads that tolerate it are scheduled there, keeping the GPU nodes available for the test application deployed in the next step.

  • Execute the following command for each GPU node, being sure to update the node name in the command
kubectl taint nodes <node-name> nvidia.com/gpu=Present:NoSchedule
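
With this taint in place, only pods that carry a matching toleration can be scheduled onto the GPU nodes. A pod that should run there, such as the test workload in the next step, is assumed to include a toleration like the following in its spec:

tolerations:
  - key: nvidia.com/gpu
    operator: Equal
    value: Present
    effect: NoSchedule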

Step 8: Deploy Test Application

  • Download the test application specification file (a sketch of a typical time-sliced test workload is shown after this list)
  • Under Applications, select Workloads, then create a New Workload with the name gpu-time-slice-testapp
  • Set Package Type to k8s YAML
  • Select a namespace
  • Click CONTINUE
  • Upload the 06-test-workload.yaml file that was downloaded, then proceed to workload placement
  • Select the target cluster from the list of available clusters, click Save, and go to publish
  • Publish the workload and make sure that it is published successfully to the target cluster before moving to the next step
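
The downloaded 06-test-workload.yaml is authoritative; the sketch below only illustrates the general shape of such a test workload: a Deployment with more replicas than physical GPUs, where every pod requests one nvidia.com/gpu and tolerates the taint applied earlier. The image, command, and replica count here are assumptions:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-workload
spec:
  replicas: 5
  selector:
    matchLabels:
      app: gpu-workload
  template:
    metadata:
      labels:
        # Matches the label selector used in the verification step
        app: gpu-workload
    spec:
      tolerations:
        - key: nvidia.com/gpu
          operator: Equal
          value: Present
          effect: NoSchedule
      containers:
        - name: gpu-test
          # Illustrative image and command; prints the GPU UUID, then idles
          image: nvidia/cuda:12.2.0-base-ubuntu22.04
          command: ["/bin/sh", "-c", "nvidia-smi -L && sleep infinity"]
          resources:
            limits:
              nvidia.com/gpu: 1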

Step 9: Verify GPU Time Slicing

After deploying the application to the cluster, verify that all of the test application's pods are sharing the same GPU.

  • Execute the following command, being sure to update the namespace where the test application was deployed
kubectl logs --all-containers -l app=gpu-workload -n <Namespace of Test Application>

In the output, you will see that the GPU UUID is the same for each pod, showing that each pod is running on the same GPU.
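
You can also confirm that the node is advertising the time-sliced replica count rather than its physical GPU count by checking its allocatable resources, which should report the configured number of replicas (for example, 10):

kubectl get node <node-name> -o jsonpath='{.status.allocatable.nvidia\.com/gpu}'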

Optionally, you can navigate to Infrastructure -> Clusters and view the GPU count on the cluster card.


Recap

Congratulations! You have successfully deployed GPU time slicing on your cluster.