User Types
Service providers may wish to offer Serverless Inference to both enterprises and individual end users (also referred to as retail users).
Enterprise Customer
This section describes how enterprise end users, tenant organizations, and administrators interact with LLM deployments hosted on Rafay Serverless Inference. The diagram illustrates the logical flow of identity, access, and usage attribution in a multi-tenant environment.
Overview
Rafay Serverless Inference provides a multi-tenant architecture designed for enterprises that require isolated environments for groups of users while sharing centralized LLM infrastructure. Each enterprise customer is represented as a Tenant Organization, which includes:
- End Users who consume LLM APIs
- Org Admins who manage users, API keys, and policies
- Logical segmentation of usage, billing, and access
Multiple tenant orgs can simultaneously access Rafay-managed LLM deployments, while remaining fully isolated from each other.
Key Components
End Users (Enterprise)
End users interact with the system through API calls or UI-driven inference requests.
Characteristics:
- Belong to a specific tenant organization
- Operate using tenant-scoped API keys
- Have access only to the models and endpoints permitted by their tenant
- Have their token usage measured and attributed to both the tenant and their individual API key
These users are shown on the left side of the diagram, with each user’s request flowing into their assigned tenant.
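For illustration, the snippet below shows how an enterprise end user might invoke a tenant-scoped endpoint. The base URL, API key, and model name are placeholders, and the OpenAI-style request shape is an assumption made for this sketch, not a documented Rafay contract.

```python
import requests

# Placeholder endpoint and tenant-scoped key (hypothetical values).
BASE_URL = "https://serverless-inference.example.com/v1"
TENANT_API_KEY = "tnt1-user42-key"  # identifies both the tenant org and the user

# Because the key is tenant-scoped, token usage from this call can be
# attributed to the tenant org and to this individual user's key.
resp = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": f"Bearer {TENANT_API_KEY}"},
    json={
        "model": "llama-3.1-8b-instruct",  # a model this tenant is permitted to use
        "messages": [{"role": "user", "content": "Summarize yesterday's incident report."}],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```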
Tenant Organizations
Each enterprise customer is assigned a Tenant Org within Rafay’s multi-tenant control plane. A Tenant Org provides:
- Logical isolation of users
- Role-based access controls
- Segregated API keys
- Per-tenant usage metering
- Per-tenant cost tracking
- Customizable access to LLM deployments
In the diagram, Tenant Org-1 and Tenant Org-2 represent two independent enterprise customers accessing shared LLM infrastructure.
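One way to picture this per-tenant scoping is as a small data model. The record below is purely illustrative; the field names are hypothetical, not Rafay's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class TenantOrg:
    """Illustrative per-tenant record; fields are hypothetical, not Rafay's schema."""
    name: str
    allowed_models: set[str]                         # customizable access to LLM deployments
    api_keys: set[str] = field(default_factory=set)  # segregated, tenant-scoped keys
    tokens_used: int = 0                             # per-tenant usage metering
    cost_usd: float = 0.0                            # per-tenant cost tracking

# Two independent enterprise customers sharing the same LLM infrastructure:
org1 = TenantOrg("Tenant Org-1", allowed_models={"llama-3.1-8b-instruct"})
org2 = TenantOrg("Tenant Org-2", allowed_models={"llama-3.1-70b-instruct"})
```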
Org Admins
Every tenant has one or more Org Admins, responsible for:
- Managing end users
- Assigning roles and permissions
- Monitoring token usage and cost
- Configuring access to specific LLM deployments
- Overseeing quota management and rate limits (see the admission-check sketch below)
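To make the quota and rate-limit duties concrete, here is a minimal admission check a gateway could run before forwarding a request. The limit values and in-memory bookkeeping are assumptions for the sketch, not how the platform actually enforces these policies.

```python
import time
from collections import defaultdict

# Hypothetical per-tenant limits an Org Admin might configure.
TENANT_LIMITS = {
    "tenant-org-1": {"requests_per_minute": 60, "monthly_token_quota": 5_000_000},
}

_request_log: dict[str, list[float]] = defaultdict(list)  # tenant -> request timestamps
_tokens_used: dict[str, int] = defaultdict(int)           # tenant -> tokens consumed

def admit_request(tenant: str, estimated_tokens: int) -> bool:
    """Return True if the request is within the tenant's rate and quota limits."""
    limits = TENANT_LIMITS[tenant]
    now = time.time()
    # Sliding one-minute window for the per-tenant rate limit.
    window = [t for t in _request_log[tenant] if now - t < 60]
    if len(window) >= limits["requests_per_minute"]:
        return False
    # Reject requests that would exceed the monthly token quota.
    if _tokens_used[tenant] + estimated_tokens > limits["monthly_token_quota"]:
        return False
    window.append(now)
    _request_log[tenant] = window
    return True
```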
Retail Customer
This section describes how retail end users interact with LLM deployments hosted on Rafay Serverless Inference. The diagram illustrates the logical flow of users connecting to their assigned tenant environments, which then access shared LLM deployments through Rafay Serverless Inference.
Retail users (e.g., an individual data scientist or casual user) often require a simplified multi-tenant model in which individual users operate within tenant environments but do not need the hierarchical admin structure seen in enterprise deployments. Rafay Serverless Inference supports this by providing isolated Tenant Orgs for retail applications while allowing shared access to centralized LLM deployments.
Each tenant org represents an isolated environment with:
- End Users who directly consume LLM APIs
- Segregated usage attribution and metering
- Logical separation of access, quotas, and API keys
Multiple tenant orgs (up to N) can simultaneously access Rafay-managed LLM deployments while remaining fully segregated.
Key Components
End Users (Retail)
End users in a retail environment directly interact with LLM capabilities through applications or API calls.
Characteristics:
- Each user is associated with a single tenant org
- Users operate using API keys scoped to their tenant
- Users may have restricted access based on application configuration
- All token usage is attributed to the tenant org and, optionally, the user’s API key
In the diagram, these users appear on the left, with arrows flowing into their respective tenant orgs.
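The attribution described above could be recorded along these lines, keyed by both tenant org and API key so usage can be rolled up either way. This is a sketch of the idea, not the platform's actual metering format.

```python
from collections import Counter

# Usage is keyed by (tenant_org, api_key) so it can be attributed at either level.
usage: Counter = Counter()

def record_usage(tenant_org: str, api_key: str,
                 prompt_tokens: int, completion_tokens: int) -> None:
    usage[(tenant_org, api_key)] += prompt_tokens + completion_tokens

record_usage("tenant-org-1", "key-abc", prompt_tokens=120, completion_tokens=340)
record_usage("tenant-org-1", "key-xyz", prompt_tokens=80, completion_tokens=200)

# Roll up per-key usage to the tenant level, e.g. for billing:
per_tenant: Counter = Counter()
for (tenant, _key), tokens in usage.items():
    per_tenant[tenant] += tokens
print(per_tenant)  # Counter({'tenant-org-1': 740})
```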
Tenant Organizations
Each retail customer or application tier is represented as a Tenant Org in Rafay’s multi-tenant system.
A Tenant Org provides:
- Logical separation from other tenants
- Independent API keys and access policies
- Isolated token usage, cost tracking, and quotas
- Custom control over which models or endpoints the tenant can access
All tenant orgs connect to the shared LLM deployment environment shown on the right of the diagram, hosted on the fully managed Serverless Inference platform. While the infrastructure is shared, each tenant's usage is isolated and reported independently.