Intermediate
In this guide you will review usage metrics at the operator level by deploying a load generator to populate usage metrics within your existing Token Factory deployment.
Assumptions¶
This exercise assumes you have completed the Token Factory Basics Get Started Guide
1. Create Load Generator¶
In this section, you will create a GenAI load generator on your existing Kubernetes cluster node.
- SSH into the Kubernetes Cluster node
- Run the following commands to install the load testing tool hey
sudo apt update
sudo apt install hey -y
- Verify the installation be running the following command to see the version of "hey"
hey -v
- Create the load test script by running the following command
vi run_load_test.sh
- Add the following content to the script and save it. Be sure to update the API Key and the Endpoint URL. These values can be found within the model used in the previous Basics Get Started guide.
#!/bin/bash
PATH=/usr/bin
API_KEY="<YOUR_API_KEY>"
ENDPOINT_URL="<YOUR_ENDPOINT_URL>"
MODEL_NAME="gs-deployment"
PROMPT="What is the best opensource inference library in general?"
hey -n 300 -c 15 -t 90 -m POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $API_KEY" \
-d "{\"model\": \"$MODEL_NAME\", \"temperature\": 0.1, \"max_tokens\": 128, \"messages\": [{\"role\": \"user\", \"content\": \"$PROMPT\"}]}" \
"ENDPOINT_URL"
- Run the following command to make the file executable
chmod +x run_load_test.sh
- Run the following command to setup a cronJob to run the script
crontab -e
- Add the following line to the crontab to run the script every 2 minutes. Be sure to update the Script Path in the command
*/2 * * * * /<SCRIPT_PATH>/run_load_test.sh >> /var/log/loadtest.log 2>&1
- After 2 minutes, run the following command to verify the script is working
tail -f /var/log/loadtest.log
You will see output similar to the following:
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0004 secs
req write: 0.0000 secs, 0.0000 secs, 0.0006 secs
resp wait: 1.6844 secs, 0.7029 secs, 2.4664 secs
resp read: 0.0000 secs, 0.0000 secs, 0.0004 secs
Status code distribution:
[200] 300 responses
2. View Metrics¶
Next, you will use the Ops console to view the Token Factory usage metrics across models and tenants.
- In the Ops console, navigate to GenAI -> Token Usage
You will see the Overview Dashboard
- Click on the Token Usage tab to see the token usage metrics
- Click on the Model Analytics tab to see the model analytics


