Nvidia distributes its GPU Operator software via its official Helm repository. In this step, you will create a repository resource in your project so that the controller can retrieve the Helm charts automatically.
Open Terminal (on macOS/Linux) or Command Prompt (Windows) and navigate to the folder where you forked the Git repository
Navigate to the folder "/getstarted/gpueks/addon"
The "repository.yaml" file contains the declarative specification for the repository. In this case, the specification is of type "Helm Repository" and the "endpoint" is pointing to Nvidia's official Helm repository.
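The actual file in the Git repository is the source of truth, but a Helm repository spec of this kind generally looks along these lines. Note that the resource name and field layout below are illustrative assumptions; the endpoint shown is Nvidia's public NGC Helm repository.

```yaml
# Illustrative sketch only -- field names and the resource name are
# assumptions; use the repository.yaml shipped in the Git repository.
kind: Repository
metadata:
  name: nvidia-gpu-helm-repo   # assumed repository name
  project: defaultproject
spec:
  type: Helm                                    # a "Helm Repository" type spec
  endpoint: https://helm.ngc.nvidia.com/nvidia  # Nvidia's official Helm repository
```

You would then create the repository with rctl, analogous to the namespace step below (e.g. something like "rctl create repository -f repository.yaml"; verify the exact subcommand against your rctl version's help).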
In this step, you will create a namespace for the Nvidia GPU Operator. The "namespace.yaml" file contains the declarative specification.
The following items may need to be updated or customized if you changed them or used alternate names:
value: demo-gpu-eks (the name of your cluster, referenced by the "rafay.dev/clusterName" label)
```yaml
kind: ManagedNamespace
apiVersion: config.rafay.dev/v2
metadata:
  name: gpu-operator-resources
  description: namespace for gpu-operator
  labels:
  annotations:
spec:
  type: RafayWizard
  resourceQuota:
  placement:
    placementType: ClusterSpecific
    clusterLabels:
    - key: rafay.dev/clusterName
      value: demo-gpu-eks
```
Open Terminal (on macOS/Linux) or Command Prompt (Windows) and navigate to the folder where you forked the Git repository
Navigate to the folder "/getstarted/gpueks/addon"
Type the command below
rctl create namespace -f namespace.yaml
If you did not encounter any errors, you can optionally verify if everything was created correctly on the controller.
Navigate to the "defaultproject" project in your Org
Select Infrastructure -> Namespaces
You should see a namespace called "gpu-operator-resources"
In this step, you will create a custom cluster blueprint with the Nvidia GPU Operator and a number of other system addons. The "blueprint.yaml" file contains the declarative specification.
Open Terminal (on macOS/Linux) or Command Prompt (Windows) and navigate to the folder where you forked the Git repository
Navigate to the folder "/getstarted/gpueks/blueprint"
Although we have a custom blueprint, we have not yet provided any details on what it comprises. In this step, you will create and add a new version to the custom blueprint. The YAML below is the declarative spec for the new version.
```yaml
kind: BlueprintVersion
metadata:
  name: v1
  project: defaultproject
  description: Nvidia GPU Operator
spec:
  blueprint: gpu-blueprint
  baseSystemBlueprint: default
  baseSystemBlueprintVersion: ""
  addons:
  - name: gpu-operator
    version: v1
  # cluster-scoped or namespace-scoped
  pspScope: cluster-scoped
  rafayIngress: false
  rafayMonitoringAndAlerting: true
  # BlockAndNotify or DetectAndNotify
  driftAction: BlockAndNotify
```
Type the command below to add a new version
rctl create blueprint version -f blueprint-v1.yaml
If you did not encounter any errors, you can optionally verify if everything was created correctly on the controller.
Select Infrastructure -> Blueprint
Click on the gpu-blueprint custom cluster blueprint
At this point, you have created a "cluster blueprint" with the GPU Operator as one of its addons and applied the blueprint to the cluster.
Note that you can reuse this cluster blueprint for as many clusters as you require in this project, and you can also share the blueprint with other projects.
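To illustrate reuse, a cluster spec can reference the same blueprint by name. The sketch below is hypothetical: the cluster name is a placeholder and the exact field layout depends on your rctl version, so check your controller's documentation before using it.

```yaml
# Hypothetical sketch: a second EKS cluster reusing the same blueprint.
# Field names are assumptions; verify against your rctl cluster spec docs.
kind: Cluster
metadata:
  name: demo-gpu-eks-2        # placeholder cluster name
  project: defaultproject
spec:
  type: eks
  blueprint: gpu-blueprint    # the custom blueprint created above
  blueprintversion: v1        # the version added in the previous step
```

Because the blueprint is referenced by name and version, updating the blueprint to a new version lets you roll the same addon set out to every cluster that uses it.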