
GPU/Neo Cloud Billing using Rafay’s Usage Metering APIs

Cloud providers offering GPU or Neo Cloud services need accurate, automated mechanisms to track resource consumption. Usage data is the foundation for the billing, showback, and chargeback models that customers expect. The Rafay Platform provides usage metering APIs that can be easily integrated into a provider's billing system.

In this blog, we’ll walk through how to use these APIs with a sample Python script to generate detailed usage reports.

Usage Metering


Prerequisites & Environment

This exercise assumes that you have access to an instance of the Rafay Platform. Also ensure that you have Org Admin level access to the Default Org so that you can use its API keys to programmatically retrieve the usage metering data.

Set the following environment variables on your system. Ensure you update with the correct values for your environment.

export DAYS=30
export RAFAY_CONSOLE_URL=rafay.acme.com 
export RAFAY_DEFAULT_API_KEY=default_org_api_key
  • DAYS — metering window (lookback) in days
  • RAFAY_CONSOLE_URL — Base domain for your Rafay Platform (no protocol)
  • RAFAY_DEFAULT_API_KEY — Org Admin API key for the Default Org used for x-api-key auth

Run "env" in your terminal to verify that the variables are set correctly.

Tip: Cloud Providers can run this script via a nightly cron/Kubernetes CronJob to keep metering data current on their systems.


What the Example Script Produces

The example script will use the APIs to retrieve the data and generate two timestamped CSVs in the working directory:

  • ncp-metrics-<timestamp>.csv
  • ncp-metrics-sorted-<timestamp>.csv (sorted for convenient downstream processing)

Info

Columns include organization, profile type, profile, instance, usage (hours), and status — ideal for billing ETL and dashboards.
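
As a quick illustration of how downstream billing ETL might consume these columns, the sketch below parses a couple of sample rows with csv.DictReader. The row values here are made up; only the column names come from the script's output header.

```python
import csv
import io

# Hypothetical sample rows matching the columns the script emits
sample = """Organization,Profile Type,Profile,Instance,Usage(h),Status
Acme,vm,h100-small-vm,nvidia-h100-8gpu-vm,0.34,ACTIVE
Pepsi,container,openwebui,lan-openwebui,52.54,ACTIVE
"""

reader = csv.DictReader(io.StringIO(sample))
rows = list(reader)

# Aggregate usage hours, as a billing pipeline would before rating
total_hours = sum(float(r["Usage(h)"]) for r in rows)
print(f"{len(rows)} rows, {total_hours:.2f} usage hours")
```

Because the header names become dictionary keys, a billing pipeline can reference columns like "Usage(h)" by name rather than by position.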


Full Annotated Script

Shown below is working sample code in Python to retrieve the usage/metering data.

"""
ncp_metrics.py — Annotated

Purpose: Retrieve usage metering data from the Rafay Platform for a configurable window
         (DAYS env var) and write results to timestamped CSV files for billing.

Environment variables:
  - DAYS: Number of days to look back (e.g., 30 days).
  - RAFAY_CONSOLE_URL: Your Rafay Platform's base domain (e.g., rafay.acme.com).
  - RAFAY_DEFAULT_API_KEY: Default Org's Org Admin API key used for x-api-key auth.

Typical usage:
  $ export DAYS=30
  $ export RAFAY_CONSOLE_URL=rafay.acme.com
  $ export RAFAY_DEFAULT_API_KEY=default_org_api_key
  $ python ncp_metrics.py

Notes:
  - Output files: ncp-metrics-<timestamp>.csv and a sorted variant,
                  ncp-metrics-sorted-<timestamp>.csv
  - Safe to run as a cron job / Kubernetes CronJob for nightly metering pulls.
"""

import csv
import json
import os
import requests
import sys
import time
from datetime import datetime, timedelta, timezone


# Entrypoint: validates env vars, computes the time window, fetches usage data,
# and writes the CSVs.
def main():
    timestr = time.strftime("%m%d%Y-%H%M%S")
    metrics_row = ["Organization", "Profile Type", "Profile", "Instance", "Usage(h)", "Status"]
    filename = "ncp-metrics-" + timestr + ".csv"
    filename_sorted = "ncp-metrics-sorted-" + timestr + ".csv"
    fd_csv = open(filename, 'w', newline='')
    csv_writer = csv.writer(fd_csv)

    # Check required env vars (validate before converting DAYS to int,
    # since int(None) would raise a TypeError)
    DAYS = os.environ.get('DAYS')
    RAFAY_CONSOLE_URL = os.environ.get('RAFAY_CONSOLE_URL')
    RAFAY_DEFAULT_API_KEY = os.environ.get('RAFAY_DEFAULT_API_KEY')

    if DAYS is None:
        print("Please set DAYS environment variable for the duration to collect metrics for")
        sys.exit(1)
    if RAFAY_CONSOLE_URL is None:
        print("Please set RAFAY_CONSOLE_URL environment variable to your console URL")
        sys.exit(1)
    if RAFAY_DEFAULT_API_KEY is None:
        print("Please set RAFAY_DEFAULT_API_KEY environment variable to your default org API key")
        sys.exit(1)

    # Output header
    csv_writer.writerow(metrics_row)

    # Compute the time window in UTC (now minus DAYS)
    current_time_utc = datetime.now(timezone.utc)
    past_time_utc = current_time_utc - timedelta(days=int(DAYS))

    current_time_str = get_formatted_utc_timestamp(current_time_utc)
    past_time_str = get_formatted_utc_timestamp(past_time_utc)

    # Fetch organizations (tenants)
    organizations = get_organizations(RAFAY_CONSOLE_URL, RAFAY_DEFAULT_API_KEY)

    # For each org, fetch usage details across profile types/profiles/instances
    for org in organizations:
        org_name = org.get('name', 'Unknown')
        profiles = get_profiles(RAFAY_CONSOLE_URL, org['metadata']['name'], RAFAY_DEFAULT_API_KEY)

        for profile in profiles:
            profile_type = profile.get('spec', {}).get('type', 'Unknown')
            profile_name = profile.get('metadata', {}).get('name', 'Unknown')

            # Fetch instance usage within the time window
            instance_usage = get_profile_instance_usage(
                RAFAY_CONSOLE_URL,
                org['metadata']['name'],
                profile_name,
                past_time_str,
                current_time_str,
                RAFAY_DEFAULT_API_KEY
            )

            # Write each row to CSV and echo progress to stdout
            for item in instance_usage.get("instance_usage_data", []):
                row = [
                    org_name,
                    profile_type,
                    profile_name,
                    item.get('instance_name', ''),
                    item.get('usage_hours', 0),
                    item.get('status', '')
                ]
                csv_writer.writerow(row)
                print(f"Organization: {org_name}")
                print(f"Profile: {profile_name}")
                print(f"Instance: {item.get('instance_name', '')}")
                print(f"Usage: {item.get('usage_hours', 0)}h\n")

    fd_csv.close()

    # Produce a sorted version of the CSV for easy consumption
    sort_csv(filename, filename_sorted)


def get_formatted_utc_timestamp(dt: datetime) -> str:
    """
    Formats a datetime as a UTC timestamp, e.g., 2023-09-01T12:34:56Z.
    """
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")


def get_organizations(console_url: str, api_key: str):
    """
    Returns a list of organizations accessible to the API key.
    """
    url = f"https://{console_url}/v2/organizations"
    headers = {
        "content-type": "application/json",
        "x-api-key": api_key,
    }
    r = requests.get(url, headers=headers, timeout=60)
    r.raise_for_status()
    data = r.json()
    # Expect data to contain 'items' with org metadata
    return data.get('items', [])


def get_profiles(console_url: str, org_id: str, api_key: str):
    """
    Returns a list of NCP (Neo Cloud Platform/GPU) profiles for an organization.
    """
    url = f"https://{console_url}/v2/organizations/{org_id}/ncp/profiles"
    headers = {
        "content-type": "application/json",
        "x-api-key": api_key,
    }
    r = requests.get(url, headers=headers, timeout=60)
    r.raise_for_status()
    data = r.json()
    return data.get('items', [])


def get_profile_instance_usage(
    console_url: str,
    org_id: str,
    profile_name: str,
    start_time: str,
    end_time: str,
    api_key: str
):
    """
    Returns usage details (per instance) between start_time and end_time
    for a given org/profile.
    """
    url = (
        f"https://{console_url}/v2/organizations/{org_id}/ncp/profiles/"
        f"{profile_name}/usage?start_time={start_time}&end_time={end_time}"
    )
    headers = {
        "content-type": "application/json",
        "x-api-key": api_key,
    }
    r = requests.get(url, headers=headers, timeout=90)
    r.raise_for_status()
    return r.json()


def sort_csv(input_file: str, output_file: str):
    """
    Sorts the input CSV by Organization, Profile Type, Profile, Instance.
    Writes the sorted rows to output_file with the same header.
    """
    print("\n--- Sorting CSV File ---")
    print(f"Reading from '{input_file}', sorting by column 'Organization'...")
    try:
        with open(input_file, mode='r', newline='') as infile:
            reader = csv.reader(infile)
            header = next(reader, None)

            # Define sort key based on column positions
            def sort_key(row):
                return (row[0], row[1], row[2], row[3])

            # Read all and sort (skip empty rows)
            data = [row for row in reader if row]
            sorted_data = sorted(data, key=sort_key)

        with open(output_file, mode='w', newline='') as outfile:
            writer = csv.writer(outfile)
            writer.writerow(header)
            writer.writerows(sorted_data)

        print(f"Successfully sorted data and saved to '{output_file}'.")

    except FileNotFoundError:
        print(f"Error: The file '{input_file}' was not found.")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")


if __name__ == "__main__":
    main()

Running the Script

To run the script, use the following Python command:

python3 ncp_metrics.py 

You should see something like the following. In the output below (results truncated), there are several tenants (Orgs): Coke, Acme, and Pepsi. The script iterates through the SKU instances across these tenants, reporting the usage for each instance.

Organization: Coke
Profile: managed-developer-pods-v2
Instance: demo-serverless-pod
Usage: 286.59h

Organization: Pepsi
Profile: openwebui
Instance: lan-openwebui
Usage: 52.54h

Organization: Pepsi 
Profile: unsloth-finetune
Instance: test-abc
Usage: 0.12h

Organization: Coke
Profile: slurm-k8s
Instance: test-mohan
Usage: 17.73h

Organization: Acme
Profile: h110-small-vm
Instance: nvidia-h100-8gpu-vm
Usage: 0.34h


--- Sorting CSV File ---
Reading from 'ncp-metrics-09142025-075429.csv', sorting by column 'Organization'...
Successfully sorted data and saved to 'ncp-metrics-sorted-09142025-075429.csv'.

Important

The API returns a lot more data. In this example script, we have limited the output to a few select fields.
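
If you want to see every field the API returns before deciding which ones to keep, you can pretty-print one response. In the sketch below, the instance_usage dict is a stand-in for the parsed JSON returned by the script's get_profile_instance_usage() helper; the actual payload will contain more fields.

```python
import json

# Stand-in for the parsed JSON returned by get_profile_instance_usage()
instance_usage = {
    "instance_usage_data": [
        {"instance_name": "demo-serverless-pod", "usage_hours": 286.59, "status": "ACTIVE"}
    ]
}

# Pretty-print the full payload to inspect all available fields
print(json.dumps(instance_usage, indent=2, sort_keys=True))
```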

An example of results in the "unsorted" CSV is shown below

Example CSV Usage Metering

Shown below is an example of results in the "sorted" CSV. In this example,

  • First, all rows are grouped by the name of the Org (Tenant)
  • Next, within each Org, rows are sorted by Profile Type
  • Next, within each profile type, sorted again by Profile.
  • Finally, within each profile, results are sorted by Instance.

Example CSV Usage Metering


Integrate Usage Metering Data into Billing System

Cloud Providers and Enterprises can use the following approach to integrate the usage and metering data into their billing or chargeback systems:

  1. ETL/ELT into your billing DB (e.g., Postgres, BigQuery).
  2. Join usage rows with your price book (e.g., by profile type/name or instance attributes).
  3. Calculate charges, e.g., gpu_hours * rate.
  4. Optionally add surcharges (priority queueing, reserved vs. on-demand, storage, egress).
  5. Generate invoices and expose line items in the customer portal.
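
Steps 2 and 3 above can be sketched as a join of usage rows against an in-memory price book. The profile names, rates, and usage values below are illustrative; a real pipeline would pull the price book from your billing database.

```python
from decimal import Decimal

# Hypothetical price book keyed by profile name; rates in $/GPU-hour
PRICE_BOOK = {
    "h100-small-vm": Decimal("2.50"),
    "openwebui": Decimal("0.40"),
}

# Usage rows as they would arrive from the metering CSV
usage_rows = [
    {"org": "Acme", "profile": "h100-small-vm", "usage_hours": Decimal("0.34")},
    {"org": "Pepsi", "profile": "openwebui", "usage_hours": Decimal("52.54")},
]

def charge(row):
    """Rate the row against the price book, rounded to cents."""
    rate = PRICE_BOOK.get(row["profile"], Decimal("0"))
    return (rate * row["usage_hours"]).quantize(Decimal("0.01"))

invoice = {r["org"]: charge(r) for r in usage_rows}
print(invoice)
```

Decimal is used instead of float to avoid binary rounding errors in monetary amounts, which matters once these line items land on customer invoices.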

Operational Tips

  • Data Pipeline: Consider separating price modeling from usage collection so you can adjust pricing without changing this data pipeline.
  • Resilience: Add retry/backoff around GETs; log failures per org/profile.
  • Idempotency: Use unique output filenames and keep raw CSVs for audit.
  • Security: Keep the API key in a secret store (Kubernetes Secret, Vault) instead of an environment variable in production.
  • Observability: Emit metrics (# orgs, profiles scanned, API latency, rows written).
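
For the resilience tip above, a minimal retry-with-backoff wrapper might look like the sketch below. The helper name, attempt counts, and delays are illustrative; in the script you would wrap each requests.get() call in it.

```python
import time

def call_with_retry(fn, attempts=3, base_delay=1.0):
    """Call fn(); on failure, retry with exponential backoff (1s, 2s, 4s, ...)."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # exhausted all attempts; surface the last error
            time.sleep(base_delay * (2 ** attempt))

# Usage (hypothetical): wrap the existing GETs, e.g.
#   r = call_with_retry(lambda: requests.get(url, headers=headers, timeout=60))

# Quick demo with a flaky function that fails twice, then succeeds
state = {"calls": 0}
def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("transient")
    return "ok"

print(call_with_retry(flaky, attempts=4, base_delay=0.01))
```

Per the logging tip, the except branch is also a natural place to log the failing org/profile before sleeping.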