Billing
Rafay's Serverless Inference tracks token usage for all users and aggregates the usage data centrally. In addition to presenting this data in dashboards for both administrators and end users, administrators and operators can retrieve it programmatically.
- GPU cloud providers can use these APIs to integrate with their billing platforms and generate bills for users.
- Enterprises can use these APIs for chargeback and showback workflows.
Because the APIs described below span multiple tenant orgs, they must be called with Partner API credentials.
Summary of the APIs
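| API Name | Purpose / Description | Endpoint (POST) |
|---|---|---|
| Total Usage Costs | Get the total usage cost for a specific time range. | /api.gaap.k8smgmt.io/v1alpha1/cost/total |
| Usage Costs for Specific Org | Get the total usage cost grouped by tenant organization. | /api.gaap.k8smgmt.io/v1alpha1/org/cost |
| Usage Costs by Model | Get the usage cost breakdown by deployed model. | /api.gaap.k8smgmt.io/v1alpha1/model/cost |
| Usage Cost by Project | Get the usage cost grouped by project. | /api.gaap.k8smgmt.io/v1alpha1/project/cost |
| Token Usage by User API Key | Get the usage cost breakdown per user API key. | /api.gaap.k8smgmt.io/v1alpha1/token/cost |
| Usage by User API Key and Model | Get the usage cost per user API key for each model. | /api.gaap.k8smgmt.io/v1alpha1/token/model/cost |
| Aggregated Model Usage | Get detailed time-series usage, including token counts and costs. | /api.gaap.k8smgmt.io/v1alpha1/model/usage/aggregated |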
Note: Filters for tenant org, model deployment, and endpoint can be added to the request payload as optional parameters.
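Every endpoint below accepts the same JSON body: a `start`/`end` time range plus optional filters. The following is a minimal sketch, using only Python's standard library, that builds that body and prepares (but does not send) a POST request. It assumes `start` and `end` are Unix timestamps in seconds, which matches the example values. The helper names, the example domain `ops.example.com`, and the `X-API-KEY` header are hypothetical placeholders; check your Partner API credential documentation for the actual authentication header.

```python
import json
import urllib.request
from typing import Optional

def build_payload(start: int, end: int,
                  organization_id: Optional[str] = None,
                  model_id: Optional[str] = None) -> dict:
    """Build the shared request body; optional filters are included only when set."""
    if end <= start:
        raise ValueError("end must be later than start")
    payload = {"start": start, "end": end}
    if organization_id is not None:
        payload["organizationId"] = organization_id
    if model_id is not None:
        payload["modelId"] = model_id
    return payload

def build_cost_request(domain: str, path: str, payload: dict,
                       api_key: str) -> urllib.request.Request:
    """Prepare a POST request to a billing endpoint without sending it.

    The URL layout copies the documented endpoints; the auth header name
    is a hypothetical placeholder.
    """
    url = f"https://{domain}/api.gaap.k8smgmt.io/v1alpha1/{path.lstrip('/')}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-API-KEY": api_key,  # hypothetical header name
        },
        method="POST",
    )

req = build_cost_request(
    "ops.example.com",  # hypothetical Ops Console domain
    "cost/total",
    build_payload(1763274471, 1763879271, organization_id="7w2lnkp"),
    "PARTNER_API_KEY",  # placeholder credential
)
# Once the domain and credentials are real, send it with urllib.request.urlopen(req).
```

Keeping payload construction separate from the request makes it easy to reuse the same body across all seven endpoints, changing only `path`.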
Total Usage Costs
Get the total usage cost for a given time period. The start and end fields are Unix timestamps in seconds.
Endpoint: https://<Ops Console Domain>/api.gaap.k8smgmt.io/v1alpha1/cost/total
Method: POST
Request Payload (example: usage cost for the last 7 days):
{
"start": 1763274471,
"end": 1763879271
}
Request Payload with Filters:
{
"start": 1763274471,
"end": 1763879271,
"organizationId": "7w2lnkp",
"modelId": "gpt-oss-120b-inf006"
}
Note: organizationId and modelId are optional filter parameters.
Response:
{
"$schema": "https://<Ops Console Domain>/api.gaap.k8smgmt.io/schemas/TotalCostUsage.json",
"apiVersion": "ai.rafay.dev/v1alpha1",
"kind": "TotalCostUsage",
"metadata": {
"name": "total-cost-usage"
},
"total_cost": 0.11354999999999998
}
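The example range covers exactly 7 days (1763879271 - 1763274471 = 604,800 seconds). Note also that `total_cost` is returned as a binary floating-point number (hence the trailing `...99998`), so a billing integration may prefer to convert it to `Decimal` and round it explicitly. A minimal sketch; the helper names and the five-decimal precision are illustrative choices, not part of the API.

```python
import time
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional, Tuple

SECONDS_PER_DAY = 86_400

def last_n_days(n: int, now: Optional[int] = None) -> Tuple[int, int]:
    """Return a (start, end) pair of Unix timestamps covering the last n days."""
    end = int(time.time()) if now is None else now
    return end - n * SECONDS_PER_DAY, end

def total_cost(response: dict, places: str = "0.00001") -> Decimal:
    """Read total_cost from a TotalCostUsage response and round it deterministically."""
    if response.get("kind") != "TotalCostUsage":
        raise ValueError(f"unexpected kind: {response.get('kind')!r}")
    # str() gives the float's shortest repr, so Decimal sees the same digits
    # as the JSON text rather than the full binary expansion.
    value = Decimal(str(response["total_cost"]))
    return value.quantize(Decimal(places), rounding=ROUND_HALF_UP)

sample = {"kind": "TotalCostUsage", "total_cost": 0.11354999999999998}
print(last_n_days(7, now=1763879271))  # (1763274471, 1763879271)
print(total_cost(sample))              # 0.11355
```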
Usage Costs for Specific Org
Get usage costs grouped by tenant organization. To narrow the results to a specific org, pass its organizationId as a filter.
Endpoint: https://<Ops Console Domain>/api.gaap.k8smgmt.io/v1alpha1/org/cost
Method: POST
Request Payload:
{
"start": 1763274471,
"end": 1763879271
}
Request Payload with Filters:
{
"start": 1763274471,
"end": 1763879271,
"organizationId": "7w2lnkp",
"modelId": "gpt-oss-120b-inf006"
}
Note: organizationId and modelId are optional filter parameters.
Response:
{
"$schema": "https://<Ops Console Domain>/api.gaap.k8smgmt.io/schemas/OrgCostUsageList.json",
"apiVersion": "ai.rafay.dev/v1alpha1",
"kind": "OrgCostUsageList",
"metadata": {
"count": 2,
"limit": 10
},
"items": [
{
"organizationId": "7w2lnkp",
"organizationName": "genai",
"total_cost": 0.07707
},
{
"organizationId": "q72dg2g",
"organizationName": "org-1",
"total_cost": 0.036480000000000005
}
]
}
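For chargeback, each entry in `items` maps to one line per tenant org. The sketch below, using the values from the example response, keys costs by `organizationId` as `Decimal` and sums them. In these examples the per-org costs add up to the Total Usage Costs figure for the same range (0.11355 after rounding); treat that as an observation about the sample data rather than a documented guarantee. `org_costs` is a hypothetical helper.

```python
from decimal import Decimal
from typing import Dict

def org_costs(response: dict) -> Dict[str, Decimal]:
    """Map organizationId -> total_cost (as Decimal) from an OrgCostUsageList response."""
    # str() passes the float's shortest repr to Decimal instead of its binary expansion.
    return {item["organizationId"]: Decimal(str(item["total_cost"]))
            for item in response.get("items", [])}

org_response = {"items": [  # values copied from the example response above
    {"organizationId": "7w2lnkp", "organizationName": "genai", "total_cost": 0.07707},
    {"organizationId": "q72dg2g", "organizationName": "org-1", "total_cost": 0.036480000000000005},
]}

costs = org_costs(org_response)
grand_total = sum(costs.values(), Decimal("0"))
print(grand_total.quantize(Decimal("0.00001")))  # 0.11355
```

Keying by `organizationId` rather than `organizationName` avoids collisions if two orgs ever share a display name.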
Usage Costs by Model
Get the usage cost breakdown by deployed model.
Endpoint: https://<Ops Console Domain>/api.gaap.k8smgmt.io/v1alpha1/model/cost
Method: POST
Request Payload:
{
"start": 1763274471,
"end": 1763879271
}
Request Payload with Filters:
{
"start": 1763274471,
"end": 1763879271,
"organizationId": "7w2lnkp",
"modelId": "gpt-oss-120b-inf006"
}
Note: organizationId and modelId are optional filter parameters.
Response:
{
"$schema": "https://<Ops Console Domain>/api.gaap.k8smgmt.io/schemas/ModelCostUsageList.json",
"apiVersion": "ai.rafay.dev/v1alpha1",
"kind": "ModelCostUsageList",
"metadata": {
"count": 4,
"limit": 10
},
"items": [
{
"modelId": "gpt-oss-120b-inf006",
"total_cost": 0.05418
},
{
"modelId": "qwen-deployment",
"total_cost": 0.045059999999999996
},
{
"modelId": "qwen-deployment-02",
"total_cost": 0.01413
},
{
"modelId": "vllm-qwen-sn",
"total_cost": 0.00018
}
]
}
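A common showback view is each model's share of total spend. The following is a minimal sketch over the `ModelCostUsageList` example above; `model_shares` is a hypothetical helper, and percentages are rounded to one decimal place for display only.

```python
from typing import List, Tuple

def model_shares(response: dict) -> List[Tuple[str, float]]:
    """Return (modelId, percent of total spend) pairs, highest spend first."""
    items = response.get("items", [])
    total = sum(item["total_cost"] for item in items)
    if total == 0:
        # Avoid dividing by zero when nothing was billed in the range.
        return [(item["modelId"], 0.0) for item in items]
    ranked = sorted(items, key=lambda item: item["total_cost"], reverse=True)
    return [(item["modelId"], round(100 * item["total_cost"] / total, 1)) for item in ranked]

model_response = {"items": [  # values copied from the example response above
    {"modelId": "gpt-oss-120b-inf006", "total_cost": 0.05418},
    {"modelId": "qwen-deployment", "total_cost": 0.045059999999999996},
    {"modelId": "qwen-deployment-02", "total_cost": 0.01413},
    {"modelId": "vllm-qwen-sn", "total_cost": 0.00018},
]}

for model_id, pct in model_shares(model_response):
    print(f"{model_id:24} {pct:5.1f}%")
```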
Usage Cost by Project
Get the usage cost breakdown by project within tenant orgs.
Endpoint: https://<Ops Console Domain>/api.gaap.k8smgmt.io/v1alpha1/project/cost
Method: POST
Request Payload:
{
"start": 1763274471,
"end": 1763879271
}
Request Payload with Filters:
{
"start": 1763274471,
"end": 1763879271,
"organizationId": "7w2lnkp",
"modelId": "gpt-oss-120b-inf006"
}
Note: organizationId and modelId are optional filter parameters.
Response:
{
"$schema": "https://<Ops Console Domain>/api.gaap.k8smgmt.io/schemas/ProjectCostUsageList.json",
"apiVersion": "ai.rafay.dev/v1alpha1",
"kind": "ProjectCostUsageList",
"metadata": {
"count": 2,
"limit": 10
},
"items": [
{
"projectId": "defaultproject",
"projectName": "defaultproject",
"total_cost": 0.07707
},
{
"projectId": "project-1",
"projectName": "project-1",
"total_cost": 0.036480000000000005
}
]
}
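For showback reports, the project breakdown can be written directly to CSV. The sketch below uses Python's `csv` module with an in-memory buffer; `projects_to_csv` and its column names are illustrative choices, not a format the API defines.

```python
import csv
import io

def projects_to_csv(response: dict) -> str:
    """Render a ProjectCostUsageList response as CSV text, one row per project."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["project_id", "project_name", "total_cost"])
    for item in response.get("items", []):
        # Format to 5 decimals for display only; raw floats are not modified elsewhere.
        writer.writerow([item["projectId"], item["projectName"], f"{item['total_cost']:.5f}"])
    return buffer.getvalue()

project_response = {"items": [  # values copied from the example response above
    {"projectId": "defaultproject", "projectName": "defaultproject", "total_cost": 0.07707},
    {"projectId": "project-1", "projectName": "project-1", "total_cost": 0.036480000000000005},
]}

print(projects_to_csv(project_response), end="")
```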
Token Usage by User API Key
Get the usage cost breakdown by user API key. Note that an end user can have multiple API keys.
Endpoint: https://<Ops Console Domain>/api.gaap.k8smgmt.io/v1alpha1/token/cost
Method: POST
Request Payload:
{
"start": 1763274471,
"end": 1763879271
}
Request Payload with Filters:
{
"start": 1763274471,
"end": 1763879271,
"organizationId": "7w2lnkp",
"modelId": "gpt-oss-120b-inf006"
}
Note: organizationId and modelId are optional filter parameters.
Response:
{
"$schema": "https://<Ops Console Domain>/api.gaap.k8smgmt.io/schemas/TokenCostUsageList.json",
"apiVersion": "ai.rafay.dev/v1alpha1",
"kind": "TokenCostUsageList",
"metadata": {
"count": 2,
"limit": 10
},
"items": [
{
"user_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...MASKED_TOKEN...",
"total_cost": 0.07707
},
{
"user_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...MASKED_TOKEN...",
"total_cost": 0.036480000000000005
}
]
}
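The response shown here is keyed by API key (`user_token`, masked in these docs), not by user, so a per-user total would need a key-to-user mapping from elsewhere. When listing the top-spending keys in a report, you may also want to avoid printing full keys. A minimal sketch; `mask`, `top_keys`, and the unmasked sample keys are hypothetical.

```python
from typing import List, Tuple

def mask(token: str, keep: int = 4) -> str:
    """Hide all but the last `keep` characters of an API key."""
    if len(token) <= keep:
        return "*" * len(token)
    return "..." + token[-keep:]

def top_keys(response: dict, n: int = 5) -> List[Tuple[str, float]]:
    """Return up to n (masked key, total_cost) pairs, highest cost first."""
    items = sorted(response.get("items", []), key=lambda i: i["total_cost"], reverse=True)
    return [(mask(i["user_token"]), i["total_cost"]) for i in items[:n]]

token_response = {"items": [  # hypothetical keys standing in for the masked values
    {"user_token": "key-alpha-1111", "total_cost": 0.036480000000000005},
    {"user_token": "key-beta-2222", "total_cost": 0.07707},
]}

print(top_keys(token_response))
```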
Usage by User API Key and Model
Get the usage cost for each combination of user API key and model.
Endpoint: https://<Ops Console Domain>/api.gaap.k8smgmt.io/v1alpha1/token/model/cost
Method: POST
Request Payload:
{
"start": 1763274471,
"end": 1763879271
}
Request Payload with Filters:
{
"start": 1763274471,
"end": 1763879271,
"organizationId": "7w2lnkp",
"modelId": "gpt-oss-120b-inf006"
}
Note: organizationId and modelId are optional filter parameters.
Response:
{
"$schema": "https://<Ops Console Domain>/api.gaap.k8smgmt.io/schemas/TokenModelCostUsageList.json",
"apiVersion": "ai.rafay.dev/v1alpha1",
"kind": "TokenModelCostUsageList",
"metadata": {
"count": 5,
"limit": 10
},
"items": [
{
"user_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...MASKED_TOKEN...",
"model_name": "gpt-oss-120b-inf006",
"total_cost": 0.05418
},
{
"user_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...MASKED_TOKEN...",
"model_name": "qwen-deployment",
"total_cost": 0.02271
},
{
"user_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...MASKED_TOKEN...",
"model_name": "qwen-deployment",
"total_cost": 0.02235
},
{
"user_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...MASKED_TOKEN...",
"model_name": "qwen-deployment-02",
"total_cost": 0.01413
},
{
"user_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...MASKED_TOKEN...",
"model_name": "vllm-qwen-sn",
"total_cost": 0.00018
}
]
}
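Because this response is a flat list of (key, model, cost) rows, a nested view per API key is often easier to reconcile against invoices. The following is a minimal sketch; `pivot_by_key` is a hypothetical helper, and the sample keys are hypothetical stand-ins for the masked values.

```python
from collections import defaultdict
from typing import Dict

def pivot_by_key(response: dict) -> Dict[str, Dict[str, float]]:
    """Nest TokenModelCostUsageList items as {user_token: {model_name: total_cost}}."""
    table: Dict[str, Dict[str, float]] = defaultdict(dict)
    for item in response.get("items", []):
        models = table[item["user_token"]]
        # Add rather than overwrite, in case a key/model pair appears more than once.
        models[item["model_name"]] = models.get(item["model_name"], 0.0) + item["total_cost"]
    return dict(table)

pairs = {"items": [  # hypothetical keys standing in for the masked values
    {"user_token": "key-a", "model_name": "gpt-oss-120b-inf006", "total_cost": 0.05418},
    {"user_token": "key-a", "model_name": "qwen-deployment", "total_cost": 0.02271},
    {"user_token": "key-b", "model_name": "qwen-deployment", "total_cost": 0.02235},
]}

print(pivot_by_key(pairs))
```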
Aggregated Model Usage
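Get aggregated model usage data, including detailed token counts and costs, grouped into time buckets. In the example below, each bucket appears to cover one UTC day, and buckets with no activity have no usages array.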
Endpoint: https://<Ops Console Domain>/api.gaap.k8smgmt.io/v1alpha1/model/usage/aggregated?limit=30
Method: POST
Query Parameters:
- limit (optional): Maximum number of results to return (default: 30).
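In the example response, `count` is 8 for a 7-day range: the range does not start at midnight UTC, so it touches 8 daily buckets. Assuming daily UTC buckets and that `limit` caps the number of buckets returned (both inferred from the example, not documented), the sketch below estimates the buckets a range needs and builds the URL with the `limit` query parameter. `daily_buckets`, `aggregated_usage_url`, and `ops.example.com` are hypothetical.

```python
from urllib.parse import urlencode

SECONDS_PER_DAY = 86_400

def daily_buckets(start: int, end: int) -> int:
    """Count the UTC calendar days touched by [start, end] (assumes daily buckets)."""
    return end // SECONDS_PER_DAY - start // SECONDS_PER_DAY + 1

def aggregated_usage_url(domain: str, limit: int = 30) -> str:
    """Build the aggregated-usage URL with the optional limit query parameter."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return (f"https://{domain}/api.gaap.k8smgmt.io/v1alpha1/model/usage/aggregated?"
            + urlencode({"limit": limit}))

start, end = 1763274471, 1763879271
needed = daily_buckets(start, end)
print(needed)  # 8
# Never request fewer buckets than the documented default of 30.
print(aggregated_usage_url("ops.example.com", max(needed, 30)))
```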
Request Payload:
{
"start": 1763274471,
"end": 1763879271
}
Request Payload with Filters:
{
"start": 1763274471,
"end": 1763879271,
"organizationId": "7w2lnkp",
"modelId": "gpt-oss-120b-inf006"
}
Note: organizationId and modelId are optional filter parameters.
Response:
{
"$schema": "https://<Ops Console Domain>/api.gaap.k8smgmt.io/schemas/AICostUsage.json",
"apiVersion": "ai.rafay.dev/v1alpha1",
"kind": "AICostUsage",
"metadata": {
"count": 8,
"limit": 30
},
"items": [
{
"startTime": {
"seconds": 1763251200
}
},
{
"startTime": {
"seconds": 1763337600
}
},
{
"startTime": {
"seconds": 1763424000
}
},
{
"startTime": {
"seconds": 1763510400
}
},
{
"startTime": {
"seconds": 1763596800
},
"usages": [
{
"model_id": "gpt-oss-120b-inf006",
"partnerId": "wg29ek0",
"organizationId": "7w2lnkp",
"projectId": "defaultproject",
"input_tokens": 180,
"output_tokens": 512,
"token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...MASKED_TOKEN...",
"input_tokens_cost": 0.0054,
"output_tokens_cost": 0.03072,
"total_tokens": 692,
"total_cost": 0.03612,
"startTime": {
"seconds": 1763596800
}
},
{
"model_id": "vllm-qwen-sn",
"partnerId": "wg29ek0",
"organizationId": "7w2lnkp",
"projectId": "defaultproject",
"input_tokens": 9,
"token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...MASKED_TOKEN...",
"input_tokens_cost": 0.00018,
"total_tokens": 9,
"total_cost": 0.00018,
"startTime": {
"seconds": 1763596800
}
}
]
},
{
"startTime": {
"seconds": 1763683200
},
"usages": [
{
"model_id": "qwen-deployment",
"partnerId": "wg29ek0",
"organizationId": "q72dg2g",
"projectId": "project-1",
"input_tokens": 240,
"output_tokens": 1995,
"token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...MASKED_TOKEN...",
"input_tokens_cost": 0.0024000000000000002,
"output_tokens_cost": 0.01995,
"total_tokens": 2235,
"total_cost": 0.02235,
"startTime": {
"seconds": 1763683200
}
},
{
"model_id": "qwen-deployment-02",
"partnerId": "wg29ek0",
"organizationId": "q72dg2g",
"projectId": "project-1",
"input_tokens": 180,
"output_tokens": 1233,
"token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...MASKED_TOKEN...",
"input_tokens_cost": 0.0018,
"output_tokens_cost": 0.01233,
"total_tokens": 1413,
"total_cost": 0.01413,
"startTime": {
"seconds": 1763683200
}
},
{
"model_id": "qwen-deployment",
"partnerId": "wg29ek0",
"organizationId": "7w2lnkp",
"projectId": "defaultproject",
"input_tokens": 270,
"output_tokens": 2001,
"token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...MASKED_TOKEN...",
"input_tokens_cost": 0.0026999999999999997,
"output_tokens_cost": 0.02001,
"total_tokens": 2271,
"total_cost": 0.02271,
"startTime": {
"seconds": 1763683200
}
}
]
},
{
"startTime": {
"seconds": 1763769600
}
},
{
"startTime": {
"seconds": 1763856000
},
"usages": [
{
"model_id": "gpt-oss-120b-inf006",
"partnerId": "wg29ek0",
"organizationId": "7w2lnkp",
"projectId": "defaultproject",
"input_tokens": 90,
"output_tokens": 256,
"token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...MASKED_TOKEN...",
"input_tokens_cost": 0.0027,
"output_tokens_cost": 0.01536,
"total_tokens": 346,
"total_cost": 0.01806,
"startTime": {
"seconds": 1763856000
}
}
]
}
]
}
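The aggregated response is nested: time buckets contain optional `usages` lists, and individual entries may omit fields (in the example, `vllm-qwen-sn` has no `output_tokens` or `output_tokens_cost`). The following is a minimal sketch that flattens it into per-day cost totals and per-model token counts, treating missing fields as zero; that default is an assumption. The helper names are hypothetical, and the sample is an abbreviated copy of the example response.

```python
from collections import defaultdict
from datetime import datetime, timezone
from decimal import Decimal
from typing import Dict

def daily_totals(response: dict) -> Dict[str, Decimal]:
    """Sum total_cost per time bucket, keyed by the bucket's UTC date (YYYY-MM-DD)."""
    totals: Dict[str, Decimal] = {}
    for bucket in response.get("items", []):
        seconds = bucket["startTime"]["seconds"]
        day = datetime.fromtimestamp(seconds, tz=timezone.utc).date().isoformat()
        # Buckets without activity carry no "usages" key, so default to an empty list.
        usages = bucket.get("usages", [])
        totals[day] = sum((Decimal(str(u.get("total_cost", 0))) for u in usages), Decimal("0"))
    return totals

def tokens_by_model(response: dict) -> Dict[str, Dict[str, int]]:
    """Sum input/output tokens per model_id; absent token fields count as 0."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: {"input": 0, "output": 0})
    for bucket in response.get("items", []):
        for u in bucket.get("usages", []):
            counts[u["model_id"]]["input"] += u.get("input_tokens", 0)
            counts[u["model_id"]]["output"] += u.get("output_tokens", 0)
    return dict(counts)

agg = {"items": [  # abbreviated from the example response above
    {"startTime": {"seconds": 1763510400}},
    {"startTime": {"seconds": 1763596800}, "usages": [
        {"model_id": "gpt-oss-120b-inf006", "input_tokens": 180,
         "output_tokens": 512, "total_cost": 0.03612},
        {"model_id": "vllm-qwen-sn", "input_tokens": 9, "total_cost": 0.00018},
    ]},
]}

print(daily_totals(agg))
print(tokens_by_model(agg))
```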