Part 4: Workload

What Will You Do

In this part of the self-paced exercise, you will deploy NVIDIA's Triton Inference Server to an Amazon EKS cluster that has a GPU node group.

Step 1: Create Workload Namespace

We will now create a namespace for the Triton Inference Server resources.

  • Open Terminal (on macOS/Linux) or Command Prompt (Windows) and navigate to the folder where you forked the Git repository
  • Navigate to the folder "/getstarted/tritoneks/workload"
  • Type the command
rctl create ns -f triton-namespace.yaml 

This step creates a namespace in your project. The controller can create the namespace on multiple clusters based on its placement policy. Next, publish the namespace so that it is created on the cluster.

rctl publish ns triton


To verify that the namespace was successfully created on your EKS cluster, run the following kubectl command

kubectl get ns triton 

You should see results like the following showing the namespace on the cluster.

triton   Active   11s
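If you want to script this check instead of reading the output by eye, a small helper can test the status column. This is a minimal sketch that assumes the three-column output shown above (name, status, age); the function name ns_is_active is hypothetical and is not part of kubectl or rctl.

```shell
# Succeed only if a single "kubectl get ns" output line reports Active.
# $1: one output line without headers, e.g. "triton   Active   11s"
ns_is_active() {
  status=$(printf '%s\n' "$1" | awk '{print $2}')
  [ "$status" = "Active" ]
}

# Usage against a live cluster (not executed here):
#   ns_is_active "$(kubectl get ns triton --no-headers)" && echo "namespace ready"
```

The helper only parses text, so it can be tried out on sample lines before pointing it at a cluster.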

Step 2: Create Metrics Repository

The Triton Inference Server requires both Prometheus and Grafana to be deployed. In this step, you will create a repository in your project so that the controller can retrieve the Helm chart to deploy these resources.

  • Open Terminal (on macOS/Linux) or Command Prompt (Windows) and navigate to the folder where you forked the Git repository
  • Navigate to the folder "/getstarted/tritoneks/workload"

The "metrics-repository.yaml" file contains the declarative specification for the repository.

If you used a different cluster name, be sure to update the spec with the new cluster name.

kind: Repository
  name: triton-metrics
  repositoryType: GitRepository
  credentialType: CredentialTypeNotSet

Type the command below

rctl create repository -f metrics-repository.yaml

If you did not encounter any errors, you can optionally verify that the repository was created correctly on the controller.

  • Navigate to your Org and Project
  • Select Integrations -> Repositories and click on "triton-metrics"

Metrics Repository

Step 3: Deploy Metrics Workload

The Triton Inference Server requires both Prometheus and Grafana to be deployed. In this step, we will deploy these resources as a workload with a custom override.

  • Navigate to the console, select Applications -> Workloads
  • Click New Workload -> Create New Workload
  • Enter triton-metrics for the name
  • Select Helm 3 for the package type
  • Select Pull files from repository
  • Select Git for the repository type
  • Select triton for the namespace
  • Click Continue


  • Select triton-metrics for the repository
  • Enter main for the revision
  • Enter charts/kube-prometheus-stack for the path
  • Select Value Path
  • Click ADD PATH
  • Enter charts/kube-prometheus-stack/values.yaml for the path
  • Click Save and Go to Placement


  • Select the GPU cluster
  • Click Save and Go to Publish


  • Click Exit

  • Navigate to Applications -> Cluster Overrides

  • Select New Override
  • Enter triton-metrics for the name
  • Select Helm for the file type
  • Click Create


  • Select triton-metrics for the resource selector
  • Select Specific Clusters for placement type
  • Select the GPU Cluster


  • Select Upload file manually
  • Enter the following text into the window

    serviceMonitorSelectorNilUsesHelmValues: false
  • Click Save Changes


  • Navigate to Applications -> Workloads
  • Click on the triton-metrics workload
  • Go to the Publish tab
  • Click Publish

The workload is now published using the overrides.


  • Click Exit
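Before moving on, it can help to confirm that the Prometheus and Grafana pods have started in the triton namespace. The sketch below is one possible way to do that, assuming the standard "kubectl get pods" column order (NAME, READY, STATUS, RESTARTS, AGE); the function name pods_not_ready is hypothetical.

```shell
# Print the names of pods whose STATUS is neither Running nor Completed.
# Reads "kubectl get pods --no-headers" output on stdin.
pods_not_ready() {
  awk '$3 != "Running" && $3 != "Completed" {print $1}'
}

# Usage against a live cluster (not executed here):
#   kubectl get pods -n triton --no-headers | pods_not_ready
```

Empty output suggests that every pod in the namespace has reached a healthy state.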

Step 4: Create Triton Repository

In this step, you will create a repository in your project so that the controller can retrieve the Helm chart to deploy the Triton Inference Server.

  • Open Terminal (on macOS/Linux) or Command Prompt (Windows) and navigate to the folder where you forked the Git repository
  • Navigate to the folder "/getstarted/tritoneks/workload"

The "triton-repository.yaml" file contains the declarative specification for the repository.

If you used a different cluster name, be sure to update the spec with the new cluster name.

kind: Repository
  name: triton-server
  repositoryType: GitRepository
  credentialType: CredentialTypeNotSet

Type the command below

rctl create repository -f triton-repository.yaml

If you did not encounter any errors, you can optionally verify that the repository was created correctly on the controller.

  • Navigate to your Org and Project
  • Select Integrations -> Repositories and click on "triton-server"

Triton Repository

Step 5: Deploy Triton Workload

In this step, we will deploy the Triton Inference Server with a custom override.

  • Navigate to the console, select Applications -> Workloads
  • Click New Workload -> Create New Workload
  • Enter triton-server for the name
  • Select Helm 3 for the package type
  • Select Pull files from repository
  • Select Git for the repository type
  • Select triton for the namespace
  • Click Continue


  • Select triton-server for the repository
  • Enter main for the revision
  • Enter deploy/aws/ for the path
  • Select Value Path
  • Click ADD PATH
  • Enter deploy/aws/values.yaml for the path
  • Click Save and Go to Placement


  • Select the GPU cluster
  • Click Save and Go to Publish


  • Click Exit

  • Navigate to Applications -> Cluster Overrides

  • Select New Override
  • Enter triton-server for the name
  • Select Helm for the file type
  • Click Create


  • Select triton-server for the resource selector
  • Select Specific Clusters for placement type
  • Select the GPU Cluster


  • Select Upload file manually
  • Enter the following text into the window. Be sure to populate the values with the correct information for your environment.

  modelRepositoryPath: s3://triton-inference-server-repo/model_repository/
  numGpus: 1
  region: <AWS Region base64 encoded>
  id: <AWS_SECRET_KEY_ID base64 encoded>
  key: <AWS_SECRET_ACCESS_KEY base64 encoded>
  • Click Save Changes
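The region, id, and key values in the override must be base64 encoded. One way to produce them is sketched below, assuming a POSIX shell with the base64 utility available; the function name b64 and the sample region are illustrative only. Using printf instead of echo keeps a trailing newline out of the encoded value.

```shell
# Base64-encode a single value without adding a trailing newline to the input.
# tr strips any line wrapping that some base64 implementations add to long output.
b64() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Example with a placeholder region (substitute your own values):
#   b64 "us-west-2"
```

Run the helper once for each of the three values and paste the results into the override.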


  • Navigate to Applications -> Workloads
  • Click on the triton-server workload
  • Go to the Publish tab
  • Click Publish

The workload is now published using the overrides.


  • Click Exit

Step 6: Verify Workload

We will now verify the Triton Inference Server is up and running.

  • Navigate to the console, select Infrastructure -> Clusters
  • Click kubectl on the GPU cluster
  • Enter the following command
kubectl get services -n triton

Locate the EXTERNAL-IP of the Triton Inference Server in the output.

kubectl get services -n triton
NAME                                            TYPE           CLUSTER-IP       EXTERNAL-IP                                                              PORT(S)                                        AGE
alertmanager-operated                           ClusterIP      None             <none>                                                                   9093/TCP,9094/TCP,9094/UDP                     29m
prometheus-operated                             ClusterIP      None             <none>                                                                   9090/TCP                                       29m
triton-metrics-kube-promet-alertmanager         ClusterIP     <none>                                                                   9093/TCP                                       29m
triton-metrics-kube-promet-operator             ClusterIP   <none>                                                                   443/TCP                                        29m
triton-metrics-kube-promet-prometheus           ClusterIP    <none>                                                                   9090/TCP                                       29m
triton-server-triton-inference-server           LoadBalancer   8000:32522/TCP,8001:32350/TCP,8002:30133/TCP   14m
triton-server-triton-inference-server-metrics   ClusterIP    <none>                                                                   8080/TCP                                       14m
  • Append :8000/v2 to the EXTERNAL-IP value and enter the resulting URL into a browser

You will see something similar to the following showing the running server

Published Workload
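The same check can be made from a terminal instead of a browser. The sketch below builds the URL from the EXTERNAL-IP value; it assumes the service name shown in the output above and that the load balancer is exposed by hostname, as is typical on AWS. The function name triton_url is hypothetical.

```shell
# Build the Triton HTTP endpoint URL from the EXTERNAL-IP column value.
triton_url() {
  printf 'http://%s:8000/v2\n' "$1"
}

# Usage against a live cluster (not executed here):
#   host=$(kubectl get svc triton-server-triton-inference-server -n triton \
#            -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
#   curl -s "$(triton_url "$host")"
```

A JSON response describing the server indicates that the HTTP endpoint is reachable.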


Congratulations! At this point, you have successfully configured and provisioned an Amazon EKS cluster with the Triton Inference Server.