User Types
Service providers may wish to offer Serverless Inference to both enterprises and individual end users (also referred to as retail users).
Enterprise Customer
This section describes how enterprise end users, tenant organizations, and administrators interact with LLM deployments hosted on Rafay Serverless Inference. The diagram illustrates the logical flow of identity, access, and usage attribution in a multi-tenant environment.
Overview
Rafay Serverless Inference provides a multi-tenant architecture designed for enterprises that require isolated environments for groups of users while sharing centralized LLM infrastructure. Each enterprise customer is represented as a Tenant Organization, which includes:
- End Users who consume LLM APIs
- Org Admins who manage users, API keys, and policies
- Logical segmentation of usage, billing, and access
Multiple tenant orgs can simultaneously access Rafay-managed LLM deployments, while remaining fully isolated from each other.
Key Components
End Users (Enterprise)
End users interact with the system through API calls or UI-driven inference requests.
Characteristics:
- Belong to a specific tenant organization
- Operate using tenant-scoped API keys
- Have access only to the models and endpoints permitted by their tenant
- Have their token usage measured and attributed to both the tenant and their individual API key
These users are shown on the left side of the diagram, with each user’s request flowing into their assigned tenant.
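For illustration, the snippet below shows how an enterprise end user might invoke a tenant-scoped endpoint. The base URL, API key, and model name are placeholders, and the OpenAI-style request shape is an assumption made for this sketch, not a documented Rafay contract.

```python
import requests

# Placeholder endpoint and tenant-scoped key (hypothetical values).
BASE_URL = "https://serverless-inference.example.com/v1"
TENANT_API_KEY = "tnt1-user42-key"  # identifies both the tenant org and the user

# Because the key is tenant-scoped, token usage from this call can be
# attributed to the tenant org and to this individual user's key.
resp = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": f"Bearer {TENANT_API_KEY}"},
    json={
        "model": "llama-3.1-8b-instruct",  # a model this tenant is permitted to use
        "messages": [{"role": "user", "content": "Summarize yesterday's incident report."}],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```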
Tenant Organizations
Each enterprise customer is assigned a Tenant Org within Rafay’s multi-tenant control plane. A Tenant Org provides:
- Logical isolation of users
- Role-based access controls
- Segregated API keys
- Per-tenant usage metering
- Per-tenant cost tracking
- Customizable access to LLM deployments
In the diagram, Tenant Org-1 and Tenant Org-2 represent two independent enterprise customers accessing shared LLM infrastructure.
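One way to picture this per-tenant scoping is as a small data model. The record below is purely illustrative; the field names are hypothetical, not Rafay's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class TenantOrg:
    """Illustrative per-tenant record; fields are hypothetical, not Rafay's schema."""
    name: str
    allowed_models: set[str]                         # customizable access to LLM deployments
    api_keys: set[str] = field(default_factory=set)  # segregated, tenant-scoped keys
    tokens_used: int = 0                             # per-tenant usage metering
    cost_usd: float = 0.0                            # per-tenant cost tracking

# Two independent enterprise customers sharing the same LLM infrastructure:
org1 = TenantOrg("Tenant Org-1", allowed_models={"llama-3.1-8b-instruct"})
org2 = TenantOrg("Tenant Org-2", allowed_models={"llama-3.1-70b-instruct"})
```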
Org Admins
Every tenant has one or more Org Admins, responsible for:
- Managing end users
- Assigning roles and permissions
- Monitoring token usage and cost
- Configuring access to specific LLM deployments
- Overseeing quota management and rate limits (see the admission-check sketch below)
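To make the quota and rate-limit duties concrete, here is a minimal admission check a gateway could run before forwarding a request. The limit values and in-memory bookkeeping are assumptions for the sketch, not how the platform actually enforces these policies.

```python
import time
from collections import defaultdict

# Hypothetical per-tenant limits an Org Admin might configure.
TENANT_LIMITS = {
    "tenant-org-1": {"requests_per_minute": 60, "monthly_token_quota": 5_000_000},
}

_request_log: dict[str, list[float]] = defaultdict(list)  # tenant -> request timestamps
_tokens_used: dict[str, int] = defaultdict(int)           # tenant -> tokens consumed

def admit_request(tenant: str, estimated_tokens: int) -> bool:
    """Return True if the request is within the tenant's rate and quota limits."""
    limits = TENANT_LIMITS[tenant]
    now = time.time()
    # Sliding one-minute window for the per-tenant rate limit.
    window = [t for t in _request_log[tenant] if now - t < 60]
    if len(window) >= limits["requests_per_minute"]:
        return False
    # Reject requests that would exceed the monthly token quota.
    if _tokens_used[tenant] + estimated_tokens > limits["monthly_token_quota"]:
        return False
    window.append(now)
    _request_log[tenant] = window
    return True
```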
Retail Customer
This section describes how retail end users interact with LLM deployments hosted on Rafay Serverless Inference. The diagram illustrates the logical flow of users connecting to their assigned tenant environments, which then access shared LLM deployments through Rafay Serverless Inference.
Retail users (e.g., an individual data scientist or casual user) often require a simplified multi-tenant model in which individual users operate within tenant environments but do not need the hierarchical admin structure seen in enterprise deployments. Rafay Serverless Inference supports this by providing isolated Tenant Orgs for retail applications while allowing shared access to centralized LLM deployments.
Each tenant org represents an isolated environment with:
- End Users who directly consume LLM APIs
- Segregated usage attribution and metering
- Logical separation of access, quotas, and API keys
Multiple tenant orgs (up to N) can simultaneously access Rafay-managed LLM deployments while remaining fully segregated.
Key Components
End Users (Retail)
End users in a retail environment directly interact with LLM capabilities through applications or API calls.
Characteristics:
- Each user is associated with a single tenant org
- Users operate using API keys scoped to their tenant
- Users may have restricted access based on application configuration
- All token usage is attributed to the tenant org and, optionally, the user’s API key
In the diagram, these users appear on the left, with arrows flowing into their respective tenant orgs.
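The attribution described above could be recorded along these lines, keyed by both tenant org and API key so usage can be rolled up either way. This is a sketch of the idea, not the platform's actual metering format.

```python
from collections import Counter

# Usage is keyed by (tenant_org, api_key) so it can be attributed at either level.
usage: Counter = Counter()

def record_usage(tenant_org: str, api_key: str,
                 prompt_tokens: int, completion_tokens: int) -> None:
    usage[(tenant_org, api_key)] += prompt_tokens + completion_tokens

record_usage("tenant-org-1", "key-abc", prompt_tokens=120, completion_tokens=340)
record_usage("tenant-org-1", "key-xyz", prompt_tokens=80, completion_tokens=200)

# Roll up per-key usage to the tenant level, e.g. for billing:
per_tenant: Counter = Counter()
for (tenant, _key), tokens in usage.items():
    per_tenant[tenant] += tokens
print(per_tenant)  # Counter({'tenant-org-1': 740})
```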
Tenant Organizations
Each retail customer or application tier is represented as a Tenant Org in Rafay’s multi-tenant system.
A Tenant Org provides:
- Logical separation from other tenants
- Independent API keys and access policies
- Isolated token usage, cost tracking, and quotas
- Custom control over which models or endpoints the tenant can access
All tenant orgs connect to the shared LLM deployment environment shown on the right of the diagram, hosted on the fully managed Serverless Inference platform. While the infrastructure is shared, each tenant's usage is isolated and reported independently.