Administrators
Administrators are privileged users in the Rafay Org assigned to the customer. They are typically members of the IT/Operations or Platform team in an enterprise. They need to complete a few simple steps before they can start onboarding end users (developers and data scientists).
- Create Infrastructure Configuration
- Create Compute Profiles
- GPU Partitioning Strategy
- Specify Allocation Strategy for each Compute Profile
- Create Services Profiles
- Add/Invite End Users
Important
Review the Support Matrix for details on supported infrastructure and environments.
Infrastructure Configuration¶
In this step, the administrator defines details of the infrastructure that will be used by Rafay GPU PaaS to fulfill end user requests for workspaces. Infrastructure can span both public and private cloud environments.
Public Cloud¶
The administrator will have to provide details about the cloud providers that should be used. This will include details such as the following:
- Cloud Provider: e.g. AWS, Azure, GCP, OCI
- Region: e.g. us-east-1
- Credentials: e.g. IAM Role required to provision/scale infrastructure dynamically
- Kubernetes Version: e.g. v1.29
- Instance Types: e.g. p4d.24xlarge on AWS, which is powered by 8 Nvidia A100 GPUs (40GB memory each)
Data Center¶
The administrator will have to provide details about the data center that should be used. This will include details such as the following:
- Infrastructure Type: e.g. VMware vSphere, Nutanix, Bare Metal
- Credentials: e.g. SSH Keys
- Operating System: e.g. Ubuntu 22.04 LTS
- GPU Type: e.g. Nvidia A100, H100
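The two infrastructure configurations above can be pictured as simple declarative records. A minimal sketch in Python; the field names are illustrative assumptions, not Rafay's actual API:

```python
# Illustrative sketch only: field names are assumptions, not Rafay's actual API.
from dataclasses import dataclass

@dataclass
class PublicCloudConfig:
    provider: str          # e.g. "AWS", "Azure", "GCP", "OCI"
    region: str            # e.g. "us-east-1"
    credential_ref: str    # e.g. name of an IAM role used for provisioning
    k8s_version: str       # e.g. "v1.29"
    instance_types: list   # e.g. ["p4d.24xlarge"]

@dataclass
class DataCenterConfig:
    infra_type: str        # e.g. "VMware vSphere", "Nutanix", "Bare Metal"
    credential_ref: str    # e.g. name of an SSH key pair
    operating_system: str  # e.g. "Ubuntu 22.04 LTS"
    gpu_type: str          # e.g. "Nvidia A100"

aws = PublicCloudConfig("AWS", "us-east-1", "rafay-provisioner-role",
                        "v1.29", ["p4d.24xlarge"])
```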
Compute Profile¶
A compute profile is a "predefined configuration" specifying compute resources such as CPU, GPU, RAM, and storage. Administrators create and manage compute profiles as "organizational standards" that end users can consume via self service.
A profile consists of the following constructs:
- A user-friendly name (e.g. Small, Medium, Large)
- Key-Value pairs describing resources
Once the administrator publishes a compute profile, it becomes available for selection and use by end users. Let us look at a few examples of compute profiles to understand how this is defined.
Note
There is "no limit" on the number of compute profiles that can be maintained at the same time.
Example 1: Single GPU Profile¶
The example shown below is a "single GPU model" profile where the administrator has provided "Medium" as the user friendly name. The infrastructure resources associated with this profile are:
Key | Value |
---|---|
GPU Model | A100 |
GPU Count | 8 |
CPU | 208 |
Storage | 800 GB |
Example 2: Multi GPU Profile¶
The example shown below is a "multi GPU, heterogeneous GPU" profile where the administrator has provided "Medium" as the user friendly name. The infrastructure resources associated with this profile are:
Key | Value |
---|---|
GPU Model | A100 |
GPU Count | 8 |
CPU | 208 |
Storage | 800 GB |
GPU Model | H100 |
GPU Count | 64 |
CPU | 124 |
Storage | 1024 GB |
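Both examples can be modeled as a named profile holding one or more resource groups; the heterogeneous profile simply carries two groups. A hypothetical sketch of this structure (an assumption for illustration, not the product's schema):

```python
# Hypothetical model of a compute profile; not Rafay's actual schema.
from dataclasses import dataclass

@dataclass
class ResourceGroup:
    gpu_model: str
    gpu_count: int
    cpu: int
    storage_gb: int

@dataclass
class ComputeProfile:
    name: str      # user-friendly name, e.g. "Medium"
    groups: list   # one ResourceGroup per GPU type

single = ComputeProfile("Medium", [ResourceGroup("A100", 8, 208, 800)])
multi = ComputeProfile("Medium", [
    ResourceGroup("A100", 8, 208, 800),
    ResourceGroup("H100", 64, 124, 1024),
])

def total_gpus(profile):
    """Sum GPU counts across all resource groups in the profile."""
    return sum(g.gpu_count for g in profile.groups)
```

Modeling the profile as a list of groups lets the single-model and heterogeneous cases share one shape instead of duplicating keys.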
Partitioning Strategy¶
It can be ineffective and impractical to allocate a full GPU to an end user, especially if they do not require it. In this step, the administrator selects and specifies how they would like Rafay GPU PaaS to partition their high-end GPUs.
For example, the admin can spatially partition an Nvidia A100 GPU into multiple GPU instances as described in the table below. Once the administrator provides the desired number of instances, Rafay GPU PaaS performs the spatial partitioning automatically behind the scenes and assigns the resulting GPU instances to multiple end users.
Number of Instances | Per-Instance Resources |
---|---|
One | 7 compute slices, 40GB |
Two | 3 compute slices, 20GB |
Three | 2 compute slices, 10GB |
Seven | 1 compute slice, 5GB |
Note
Under the covers, Rafay uses native technologies supported by the GPU vendor to implement partitioning. For example, Nvidia's Multi-Instance GPU (MIG) is used to spatially partition GPUs into multiple slices.
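As a concrete illustration of the slice budget behind such a table: an A100-40GB exposes seven compute slices, and any partitioning must fit within that budget. The sketch below validates a requested layout against it (the profile sizes follow Nvidia's published MIG profiles for the A100-40GB; the validator function itself is hypothetical — real partitioning is performed by the GPU driver):

```python
# Hypothetical validator for MIG layouts on an A100-40GB.
# Actual partitioning is done by the driver (e.g. via nvidia-smi).

# Nvidia MIG profiles for A100-40GB: name -> (compute slices, memory GB)
A100_40GB_PROFILES = {
    "1g.5gb": (1, 5),
    "2g.10gb": (2, 10),
    "3g.20gb": (3, 20),
    "4g.20gb": (4, 20),
    "7g.40gb": (7, 40),
}
TOTAL_SLICES = 7  # an A100 exposes 7 compute slices

def fits(layout):
    """Return True if the requested MIG instances fit on one A100-40GB.

    layout maps a MIG profile name to the number of instances requested.
    """
    used = sum(A100_40GB_PROFILES[p][0] * n for p, n in layout.items())
    return used <= TOTAL_SLICES

print(fits({"3g.20gb": 2}))   # two 3-slice instances fit (6 of 7 slices)
print(fits({"1g.5gb": 8}))    # eight 1-slice instances exceed the budget
```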
Allocation Strategy¶
Once a compute profile has been created, administrators have to specify an allocation strategy for the profile.
Strategy | Description |
---|---|
On Demand (default) | Resources are provisioned only after a request is submitted by the end user |
Warm Pool | Resources are allocated from a standby pool, and the pool is replenished immediately |
The allocation strategy allows organizations to operate in the Goldilocks zone, balancing "developer experience" against "costs". Allocating pre-provisioned instances of compute profiles in near real time from a warm pool ensures that the data scientist is immediately productive. But this comes at the cost of keeping instances in standby mode in the warm pool.
On Demand Strategy¶
When a compute profile is associated with an "on-demand" type allocation strategy, the warm pool is essentially maintained at "zero (0)" instances. This means that when a user requests an instance of a compute profile, provisioning is performed "on-demand".
Advantages¶
- Significant cost savings because of no idling resources
Disadvantages¶
- Poor developer experience due to wait time for infrastructure provisioning
- Risk of provisioning failure if the provider lacks available infrastructure capacity
Warm Pool Strategy¶
A warm pool is a collection of pre-provisioned, ready-to-use compute resources based on published compute profiles. These resources are kept in a standby state, allowing them to be immediately allocated to end users when requested.
Administrators need to configure the "desired capacity" for the warm pool size for each compute profile. The warm pool manager in GPU PaaS implements a "control loop" for every compute profile ensuring that the desired capacity is always maintained.
A warm pool strategy makes sense for compute profiles that are used frequently and extensively by users.
In the example below, the warm pool consists of instances of the Small, Medium and Large compute profiles. The administrator has specified a "desired warm pool capacity" for each compute profile. In this example organization, the administrators know that the Small profile is used very frequently and heavily by their data scientists, so they have decided to maintain a reasonably sized warm pool for it.
Compute Profile | Desired Capacity |
---|---|
Small | Nine (9) |
Medium | Two (2) |
Large | One (1) |
The capacity defined in the warm pool signifies the "desired state" of capacity of the warm pool. A control loop actively monitors the capacity of the warm pool. It will automatically provision/deprovision resources to ensure that the desired warm pool capacity is maintained.
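The control loop described above reduces to a reconcile step: compare the actual standby instances against the desired capacity and provision or deprovision the difference. A simplified sketch, with illustrative names rather than the actual GPU PaaS implementation:

```python
# Simplified warm pool reconciler; names and structure are illustrative,
# not the actual GPU PaaS implementation.

def reconcile(desired: dict, actual: dict) -> dict:
    """Return the number of instances to provision (+) or deprovision (-)
    for each compute profile so the warm pool matches its desired state."""
    deltas = {}
    for profile, want in desired.items():
        have = actual.get(profile, 0)
        if want != have:
            deltas[profile] = want - have
    return deltas

desired = {"Small": 9, "Medium": 2, "Large": 1}
actual = {"Small": 7, "Medium": 2, "Large": 0}   # two Smalls just allocated
print(reconcile(desired, actual))  # {'Small': 2, 'Large': 1}
```

Running this loop periodically (or on allocation events) keeps the pool at its desired state without any manual intervention.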
The image below describes how the warm pool controller replenishes the warm pool automatically after two instances are allocated to fulfill incoming requests.
Depending on the type of compute profile, the warm pool control loop adds/removes capacity either by adding/removing "nodes" on existing clusters or by provisioning/deprovisioning entire "clusters". For example, for a "super sized" compute profile (e.g. 100 GPUs), it may be impractical to fulfill the request by adding nodes to an existing cluster. In cases like this, the warm pool will provision a dedicated cluster instead.
Administrators may also want to incorporate "cost saving" strategies to manage their warm pool capacity. For example, instead of a static warm pool capacity through the entire week, it may make sense to shrink the warm pool capacity during weekends.
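A weekday/weekend schedule like the one suggested above can be expressed as a simple policy function. The baseline capacities and the 50% weekend reduction below are illustrative assumptions, not product defaults:

```python
# Illustrative schedule-based warm pool sizing; numbers are assumptions.
from datetime import date

BASELINE = {"Small": 9, "Medium": 2, "Large": 1}

def desired_capacity(profile: str, on: date) -> int:
    """Shrink the warm pool to half its baseline on weekends.

    date.weekday() returns 5 for Saturday and 6 for Sunday.
    """
    base = BASELINE[profile]
    if on.weekday() >= 5:
        return base // 2
    return base

print(desired_capacity("Small", date(2024, 6, 7)))  # Friday
print(desired_capacity("Small", date(2024, 6, 8)))  # Saturday
```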
Important
The warm pool feature works only when Rafay is configured to provision and manage the lifecycle of Kubernetes clusters in your accounts. Without this, Rafay cannot manage the warm pool capacity.
Services Profile¶
Once a data scientist/developer creates a workspace via self service, it is essentially an empty system with just the raw infrastructure resources. This means that the user has to expend a lot of effort and time installing various software components into the workspace before it is usable. This can be a significant productivity drain for users.
A services profile is meant to address this challenge. It provides the means for administrators to publish templates for an entire software tech stack that end users can deploy into their "workspaces" with a single click. For example, data scientists that require a turnkey MLOps Platform in their workspace can be provided with a "vending machine" type experience: they simply select from the catalog of services profiles and deploy one into their newly created workspace with a single click.
The table below is an example of an "MLOps Platform" services profile that encapsulates the following software components.
Capability | Component |
---|---|
Pipelines | Kubeflow |
Model Registry | MLflow |
Feature Store | Feast |
Object Store | GCS |
Authentication | Okta SSO |
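A services profile like the one above is essentially a named catalog entry mapping capabilities to components. A hypothetical sketch of the "one-click deploy" idea (the structure and function names are assumptions for illustration, not Rafay's schema):

```python
# Hypothetical representation of a services profile; not Rafay's schema.
MLOPS_PROFILE = {
    "name": "MLOps Platform",
    "components": {
        "Pipelines": "Kubeflow",
        "Model Registry": "MLflow",
        "Feature Store": "Feast",
        "Object Store": "GCS",
        "Authentication": "Okta SSO",
    },
}

def deploy(profile: dict, workspace: str) -> list:
    """Simulate a one-click deploy: return the install actions that would
    run against the workspace, one per component in the profile."""
    return [f"install {comp} ({cap}) into {workspace}"
            for cap, comp in profile["components"].items()]

for action in deploy(MLOPS_PROFILE, "ws-data-sci-01"):
    print(action)
```

The point of the single catalog entry is that the end user never assembles the stack component by component; selecting the profile drives every install step.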
Turnkey Profiles¶
Rafay maintains and provides a number of first-class services profiles for very common use cases. Organizations can use these turnkey services profiles "as-is" with zero effort to develop and maintain them. With these services profiles, administrators can deliver the following "as a service" capabilities to their users.
- Jupyter Notebook as a Service
- Training as a Service
- Inference as a Service
- MLOps Platform as a Service
- Ray as a Service
- Jobs as a Service
End Users¶
Once an administrator has configured at least one compute profile and associated an allocation strategy with it, end users can request and use GPU Workspaces. Administrators need to add users to a "group" that is associated with the role for self-service access to GPU PaaS.
For more information, review the section on users.