Usage Metering
Every end user has access to a personalized dashboard that presents a summary and trend of their usage; click "Token Usage" to navigate to it. Operators are provided with three visualization dashboards:
- Overview
- Token Usage
- Model Analytics
Overview¶
The overview dashboard provides a centralized overview of language model API usage, costs, and efficiency metrics.
Summary Metrics¶
The top row presents five key performance indicators in card format:
- Total Spend — Shows the daily run rate and the reporting window (1 day).
- Tokens — Total tokens consumed, at a rate of x/day.
- Models — Models in use. The "Top Spender" label identifies the model with the highest spend.
- Rate/1K — Blended average cost per 1,000 tokens across input and output.
- Efficiency — Expressed as "tokens per dollar", indicating how many tokens are generated per unit of spend.
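The arithmetic behind these cards can be sketched as follows. This is an illustrative reconstruction, not the product's API: the function name and field names are assumptions, and the figures plugged in are the example values from this page (28K input tokens, 26K output tokens, $0.06 + $0.10 spend).

```python
# Hypothetical sketch: deriving the summary KPI cards from raw counters.
# Names and figures are illustrative, not the product's schema.

def summary_metrics(input_tokens: int, output_tokens: int,
                    input_cost: float, output_cost: float,
                    window_days: int = 1) -> dict:
    """Compute the Overview summary cards from raw token/cost counters."""
    total_tokens = input_tokens + output_tokens
    total_spend = input_cost + output_cost
    return {
        "total_spend_per_day": total_spend / window_days,
        "tokens_per_day": total_tokens / window_days,
        "rate_per_1k": total_spend / (total_tokens / 1000),  # blended $/1K tokens
        "efficiency": total_tokens / total_spend,            # tokens per dollar
    }

# Using this page's example figures:
m = summary_metrics(28_000, 26_000, 0.06, 0.10)
print(round(m["rate_per_1k"], 5))   # blended rate, ~$0.00296 per 1K tokens
print(round(m["efficiency"]))       # ~337,500 tokens per dollar
```

Note that "Rate/1K" and "Efficiency" are reciprocals of each other up to the factor of 1,000, so either one can be derived from the other.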
Token Distribution¶
A horizontal stacked bar shows the split between input and output tokens: 52% input vs. 48% output. Below the bar, two detail cards break this down further: Input at 28K tokens (at a rate of 500,000 tokens per dollar) and Output at 26K tokens (at 250,000 tokens per dollar).
Cost Distribution¶
A similar horizontal bar visualizes spend by token type: 35% input cost vs. 65% output cost. The detail cards show Input Cost at $0.06 ($0.0020 per 1K tokens) and Output Cost at $0.10 ($0.0040 per 1K tokens). This highlights that output tokens are twice as expensive per token as input tokens, making them the dominant cost driver despite being slightly fewer in count.
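The token and cost shares above follow directly from the per-type prices. A minimal check, using this page's example numbers (the per-1K prices are taken from the detail cards and treated as given):

```python
# Illustrative arithmetic for the two distribution bars; figures are the
# example values from this page, not live data.

input_tokens, output_tokens = 28_000, 26_000
input_rate, output_rate = 0.0020, 0.0040   # $ per 1K tokens

input_cost = input_tokens / 1000 * input_rate     # ≈ $0.06
output_cost = output_tokens / 1000 * output_rate  # ≈ $0.10

token_share_in = input_tokens / (input_tokens + output_tokens)
cost_share_out = output_cost / (input_cost + output_cost)

print(round(token_share_in, 2))  # ~0.52 → the 52% input token share
print(round(cost_share_out, 2))  # ~0.65 → the 65% output cost share
```

Because the output price per token is double the input price, output dominates cost (65%) even though it is the smaller token bucket (48%).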
Top Organizations¶
A donut chart and table show usage broken down by organization. Only one organization — Acme — is present, accounting for 100% of activity. The center of the donut displays $0 as the total (likely a rounding or display threshold artifact given the $0.16 total spend shown above).
Model Distribution¶
A second donut chart breaks down spend by model. One model is active: qwen-cpu-deployment, contributing $0.16 and 100% of usage. The purple segment fills the entire ring, reflecting single-model usage.
Token Usage¶
The Token Usage Dashboard provides a comprehensive view of token consumption and associated costs across inference workloads. It enables the operator to monitor overall usage, analyze input vs. output token distribution, and track cost and volume trends over time at the deployment or model level.
At the top level, the dashboard highlights:
- Total Tokens consumed within the selected timeframe
- Input Tokens processed by the models
- Output Tokens generated in responses
Detailed visualizations include:
- Token Usage Cost Trend – Displays how token consumption translates into cost over time, segmented by deployment or model.
- Token Count Trend – Shows token volume patterns, helping identify usage spikes, workload shifts, or scaling events.
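Conceptually, both trend charts bucket per-request usage records into time intervals per deployment. A minimal sketch of that aggregation, assuming a hypothetical record shape (`time`, `deployment`, `tokens`) and hourly buckets; this is not the product's internal schema:

```python
# Hedged sketch: bucketing usage records into an hourly per-deployment trend.
from collections import defaultdict
from datetime import datetime

def token_trend(records):
    """Sum tokens per (deployment, hour-bucket) for a trend chart."""
    series = defaultdict(int)
    for r in records:
        # Floor the timestamp to the start of its hour.
        bucket = r["time"].replace(minute=0, second=0, microsecond=0)
        series[(r["deployment"], bucket)] += r["tokens"]
    return dict(series)

trend = token_trend([
    {"time": datetime(2025, 2, 22, 0, 10), "deployment": "qwen-cpu-deployment", "tokens": 100},
    {"time": datetime(2025, 2, 22, 0, 50), "deployment": "qwen-cpu-deployment", "tokens": 200},
    {"time": datetime(2025, 2, 22, 1, 5),  "deployment": "qwen-cpu-deployment", "tokens": 50},
])
print(trend)  # two hourly buckets: 300 tokens, then 50 tokens
```

The cost trend is the same aggregation with tokens multiplied by the per-type price before summing.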
This dashboard empowers platform and AI teams to optimize model efficiency, control operational costs, and make data-driven decisions about scaling, model selection, and workload optimization.
Model Analytics¶
The image below shows the Model Analytics tab, the third of three top-level views (alongside Overview and Token Usage). It provides model-level performance insights, concentration analysis, and cost trending. A Filters control remains accessible in the upper-right corner.
Model Insights¶
The top section is titled Model Insights with the subtitle "Key metrics & recommendations." It contains four summary cards, each color-coded by category:
- Total (green label) — This represents the aggregate model-level spend (the $0 display likely reflects a rounding or display threshold, given the $0.16 total shown elsewhere on the dashboard).
- Concentration (teal label) — This metric indicates how concentrated usage is across available models; 100% means all traffic is routed to a single deployment.
- Top (red/coral label) — This identifies the highest-cost model in the environment.
- Avg (red/coral label) — This is the average cost per model, which equals the total when only one model is active.
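One simple way to read the Concentration card is as the top model's share of total spend; that definition is an assumption for this sketch, not a confirmed formula from the product:

```python
# Hedged sketch: Concentration modeled as the top model's share of total
# spend. The definition is an assumption for illustration.

def concentration(costs_by_model: dict) -> float:
    """Return the largest single model's fraction of total spend (0..1)."""
    total = sum(costs_by_model.values())
    return max(costs_by_model.values()) / total if total else 0.0

# Single-model case from this page: all spend on one deployment -> 100%.
print(concentration({"qwen-cpu-deployment": 0.16}))  # 1.0
```

With a second model carrying, say, a quarter of the spend, the same metric would drop to 75%, which is what the banner below nudges operators toward.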
Info
Below the cards, an informational banner with a blue icon is labeled Single Model Usage. It reads: "All cost is concentrated in a single model. Consider exploring additional models for better flexibility and risk distribution." This serves as a proactive recommendation to platform operators to diversify their model portfolio.
Usage Trend¶
The lower-left section displays a Usage Trend area chart with the subtitle "Cost over time." The Y-axis ranges from $0 to $0.015 and the X-axis spans a single day (Feb 22, 12:00 AM through Feb 22, 12:00 PM). The chart plots cost for qwen-cpu-deployment (shown in green in the legend).
The trend line fluctuates slightly around the $0.01 mark, with minor peaks and dips indicating small variations in request volume or token counts across time intervals. The shaded green area beneath the line visually emphasizes the cost footprint over time.
Model List¶
The lower-right section is a Model List table. A header shows "1 models" on the left and $0.16 Total Cost on the right. The table has three columns: Model, Cost & Share, and Distribution.
The single row lists qwen-cpu-deployment (with a numbered yellow badge "1"), a cost of $0.16 with "100.0% of total," and a horizontal yellow/gold bar representing its full share of the distribution. This table would scale to show multiple rows when additional models are deployed, allowing side-by-side cost comparison.
Filters¶
A number of filters let the administrator slice the data and visualize it in the desired format.
Time Range¶
The user can specify the following time constraints for the data.
- Date Range
- Start and End Time
Other Criteria¶
Users can also visualize the usage data based on the following parameters:
- Tenant Org
- Project
- Model Deployment
- Endpoint
- Cost Type (input vs output tokens)
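Applying these criteria amounts to keeping only the usage records whose fields match every selected value. A minimal sketch, with field names invented for illustration (the product's actual record schema is not documented here):

```python
# Illustrative filtering of usage records by the criteria above.
# Field names ("org", "model", "cost_type", ...) are assumptions.

def filter_usage(records, **criteria):
    """Keep records whose fields match every given criterion."""
    return [r for r in records
            if all(r.get(k) == v for k, v in criteria.items())]

rows = [
    {"org": "Acme", "model": "qwen-cpu-deployment", "cost_type": "input"},
    {"org": "Acme", "model": "qwen-cpu-deployment", "cost_type": "output"},
]
print(len(filter_usage(rows, cost_type="output")))  # 1
```

Time-range filtering works the same way, with a start/end comparison on the record timestamp instead of an equality check.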



