
User Types

Service providers may wish to offer Serverless Inference to both enterprises and individual end users (i.e., retail users).


Enterprise Customer

This section describes how enterprise end users, tenant organizations, and administrators interact with LLM deployments hosted on Rafay Serverless Inference. The diagram illustrates the logical flow of identity, access, and usage attribution in a multi-tenant environment.

Figure: Enterprise Customer (identity, access, and usage attribution flow)


Overview

Rafay Serverless Inference provides a multi-tenant architecture designed for enterprises that require isolated environments for groups of users while sharing centralized LLM infrastructure. Each enterprise customer is represented as a Tenant Organization, which includes:

  • End Users who consume LLM APIs
  • Org Admins who manage users, API keys, and policies
  • Logical segmentation of usage, billing, and access

Multiple tenant orgs can simultaneously access Rafay-managed LLM deployments, while remaining fully isolated from each other.


Key Components

End Users (Enterprise)

End users interact with the system through API calls or UI-driven inference requests.

Characteristics:

  1. Belong to a specific tenant organization
  2. Operate using tenant-scoped API keys
  3. Have access only to models and endpoints permitted by their tenant
  4. Their token usage is measured and attributed to the tenant and the individual’s API key

These users are shown on the left side of the diagram, with each user’s request flowing into their assigned tenant.
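
For illustration, here is a minimal sketch of such a request in Python, assuming an OpenAI-style chat completions endpoint (a common convention for LLM inference gateways). The gateway URL, request path, model name, and payload shape are assumptions for this example, not Rafay's documented API; only the tenant-scoped key pattern is taken from the description above.

```python
import os

import requests

# Illustrative values only: the URL, path, and OpenAI-style payload shape
# are assumptions, not Rafay's documented API surface.
GATEWAY_URL = "https://inference.example.com/v1/chat/completions"

# Tenant-scoped API key, issued within the user's tenant org.
api_key = os.environ["TENANT_API_KEY"]

response = requests.post(
    GATEWAY_URL,
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        # The model must be one the tenant is permitted to access.
        "model": "llama-3-8b",
        "messages": [{"role": "user", "content": "Summarize our Q3 report."}],
    },
    timeout=60,
)
response.raise_for_status()
print(response.json())
```

Because the key is scoped to the tenant, the platform can attribute the resulting token usage to both the tenant org and the individual key.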

Tenant Organizations

Each enterprise customer is assigned a Tenant Org within Rafay’s multi-tenant control plane. A Tenant Org provides:

  • Logical isolation of users
  • Role-based access controls
  • Segregated API keys
  • Per-tenant usage metering
  • Per-tenant cost tracking
  • Customizable access to LLM deployments

In the diagram, Tenant Org-1 and Tenant Org-2 represent two independent enterprise customers accessing shared LLM infrastructure.
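
To make the logical contents of a Tenant Org concrete, the sketch below models two tenants sharing one set of LLM deployments. These are illustrative Python structures, not Rafay data types; the field names are assumptions chosen to mirror the bullet list above.

```python
from dataclasses import dataclass, field


# Conceptual model only: illustrates what a Tenant Org logically contains
# (scoped keys, model permissions, independent usage metering).
@dataclass
class TenantOrg:
    name: str
    api_keys: dict[str, str] = field(default_factory=dict)  # user -> key
    allowed_models: set[str] = field(default_factory=set)
    tokens_used: int = 0  # metered per tenant, never pooled across tenants


tenant_1 = TenantOrg("Tenant Org-1", allowed_models={"llama-3-8b"})
tenant_2 = TenantOrg("Tenant Org-2", allowed_models={"llama-3-8b", "mistral-7b"})

# Both tenants point at the same shared LLM deployments, but their keys,
# permissions, and usage counters never intersect.
```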

Org Admins

Every tenant has one or more Org Admins, responsible for:

  • Managing end users
  • Assigning roles and permissions
  • Monitoring token usage and cost
  • Configuring access to specific LLM deployments
  • Overseeing quota management and rate limits
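
As a hedged sketch of what these responsibilities might look like programmatically, the example below calls a hypothetical tenant admin API. The base URL, paths, and payload fields are all assumptions for illustration; Rafay's actual admin interface is not shown here.

```python
import requests

# Hypothetical admin endpoints: every URL and field name below is an
# assumption for illustration, not a documented Rafay API.
ADMIN_API = "https://console.example.com/api/v1/tenants/tenant-org-1"
headers = {"Authorization": "Bearer <org-admin-token>"}

# Add an end user to the tenant (the platform would issue a tenant-scoped key).
requests.post(
    f"{ADMIN_API}/users",
    headers=headers,
    json={"email": "dev@example.com", "role": "end-user"},
)

# Configure a monthly token quota and a per-minute rate limit for the tenant.
requests.put(
    f"{ADMIN_API}/limits",
    headers=headers,
    json={"monthly_token_quota": 50_000_000, "requests_per_minute": 600},
)
```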

Retail Customer

This section describes how retail end users interact with LLM deployments hosted on Rafay Serverless Inference. The diagram illustrates the logical flow of users connecting to their assigned tenant environments, which then access shared LLM deployments through Rafay Serverless Inference.

Figure: Retail Customer (users connecting to tenant orgs and shared LLM deployments)

Retail users (e.g., a casual data scientist or individual user) often require a simplified multi-tenant model where individual users operate within tenant environments without the hierarchical admin structure seen in enterprise deployments. Rafay Serverless Inference enables this by providing isolated Tenant Orgs for retail applications while allowing shared access to centralized LLM deployments.

Each tenant org represents an isolated environment with:

  • End Users who directly consume LLM APIs
  • Segregated usage attribution and metering
  • Logical separation of access, quotas, and API keys

Multiple tenant orgs (up to N) can simultaneously access Rafay-managed LLM deployments while remaining fully segregated.


Key Components

End Users (Retail)

End users in a retail environment directly interact with LLM capabilities through applications or API calls.

Characteristics:

  1. Each user is associated with a single tenant org
  2. Users operate using API keys scoped to their tenant
  3. Users may have restricted access based on application configuration
  4. All token usage is attributed to the tenant org and, optionally, the user’s API key

In the diagram, these users appear on the left, with arrows flowing into their respective tenant orgs.
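
To show how per-request attribution might surface to a retail user, the snippet below reads a usage block from a parsed inference response. The field names follow the common OpenAI-style usage shape, which is an assumption here rather than Rafay's confirmed response schema.

```python
# Stand-in for a parsed inference response; the "usage" field names follow
# the common OpenAI-style convention and are an assumption, not Rafay's
# confirmed schema.
reply = {
    "usage": {"prompt_tokens": 42, "completion_tokens": 128, "total_tokens": 170}
}

usage = reply["usage"]
# These tokens are attributed to the tenant org and, optionally, to the
# individual API key that issued the request.
print(f"Tokens billed to this tenant: {usage['total_tokens']}")
```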


Tenant Organizations

Each retail customer or application tier is represented as a Tenant Org in Rafay’s multi-tenant system.

A Tenant Org provides:

  • Logical separation from other tenants
  • Independent API keys and access policies
  • Isolated token usage, cost tracking, and quotas
  • Custom control over which models or endpoints the tenant can access

All tenant orgs connect to the shared LLM deployment environment shown on the right of the diagram: LLM deployments hosted on the fully managed Serverless Inference platform. While the infrastructure is shared, each tenant’s usage is isolated and reported independently.
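
The reporting side of "shared infrastructure, isolated usage" can be sketched as a simple aggregation: because every usage record carries a tenant identifier (and optionally an API key), per-tenant reports can be produced without any cross-tenant visibility. The record shape below is illustrative only.

```python
from collections import defaultdict

# Illustrative usage records: each one is tagged with the tenant (and key)
# that generated it, so reporting stays isolated per tenant.
records = [
    {"tenant": "tenant-org-1", "api_key": "key-a", "total_tokens": 170},
    {"tenant": "tenant-org-1", "api_key": "key-b", "total_tokens": 90},
    {"tenant": "tenant-org-2", "api_key": "key-c", "total_tokens": 310},
]

per_tenant: defaultdict[str, int] = defaultdict(int)
for record in records:
    per_tenant[record["tenant"]] += record["total_tokens"]

for tenant, tokens in sorted(per_tenant.items()):
    print(f"{tenant}: {tokens} tokens")  # each tenant sees only its own report
```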