Baremetal

The Bare Metal compute option in the Developer Hub enables deployment of dedicated physical servers that provide direct access to hardware resources. Administrators can provision Bare Metal instances for users by defining profiles with specific hardware configurations. End users can then deploy these instances with a single click, offering a streamlined and high-performance infrastructure experience.


Create Baremetal Compute Instance

To create a Baremetal compute instance:

  • Navigate to the Developer Hub and select the Baremetal type
  • Click New Bare Metal to create a new instance

Users can also reach the Baremetal option from the left navigation pane, under the Compute section.

Clicking on the Baremetal option from the left pane or selecting "New Bare Metal" from the home page opens a wizard that allows users to select from the available Baremetal Compute Profiles.

Once the profile is selected, provide the required details. If pricing for the selected profile is configured in Global Settings by the Org Admin, a monthly estimate will be displayed.

  • Name: Enter a unique name for the compute instance
  • Description: Provide a brief summary of the instance
  • Compute Profile: Proceed with the selected compute profile
  • Workspace: Select the workspace from the drop-down menu
  • Contract Term (In Months): Specify the duration of the contract. Pricing is calculated dynamically from the selected term and is displayed in the estimate section on the right
  • Operating System: Choose the desired operating system for the instance (e.g., Ubuntu 22.04)
  • Public SSH Key: Paste your public SSH key to enable secure access to the instance
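Before pasting a key into the Public SSH Key field, it can help to confirm that the text is a public key in the usual single-line OpenSSH form rather than a private key or a truncated copy. The Python sketch below is illustrative only and is not part of the Developer Hub. The function name is hypothetical, and the check assumes the OpenSSH layout `<type> <base64 blob> [comment]`, where the decoded blob begins with a length-prefixed copy of the key type.

```python
import base64
import binascii
import struct


def looks_like_openssh_public_key(line: str) -> bool:
    """Return True if `line` has the shape '<type> <base64-blob> [comment]'
    and the type name stored inside the blob matches the declared type.

    Hypothetical helper for local sanity checks; it does not verify
    that the key itself is cryptographically valid.
    """
    parts = line.strip().split(None, 2)
    if len(parts) < 2:
        return False
    key_type, b64_blob = parts[0], parts[1]
    try:
        blob = base64.b64decode(b64_blob, validate=True)
    except (binascii.Error, ValueError):
        return False
    if len(blob) < 4:
        return False
    # The blob starts with a 4-byte big-endian length, then the type name.
    (name_len,) = struct.unpack(">I", blob[:4])
    embedded = blob[4:4 + name_len]
    return len(embedded) == name_len and embedded == key_type.encode("utf-8")
```

A typical use is to read the contents of a `.pub` file, pass it to the function, and paste the key into the form only if the result is `True`.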

The instance initially displays a status of In Progress.

Upon successful deployment, the status updates to Success.
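For readers who script around deployments, the transition from In Progress to a final status can be thought of as a simple polling loop. This page does not describe an API for reading instance status, so the sketch below takes a caller-supplied `get_status` function, which is purely hypothetical. The status strings mirror the ones shown in the UI; the timeout and interval values are arbitrary assumptions.

```python
import time
from typing import Callable


def wait_for_final_status(
    get_status: Callable[[], str],
    pending: str = "In Progress",
    timeout_s: float = 1800.0,
    interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll `get_status` until it returns something other than `pending`.

    `get_status` is a hypothetical callable supplied by the caller;
    this document does not define how status is retrieved.
    Raises TimeoutError if the status is still pending at the deadline.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status != pending:
            return status  # e.g. "Success", or any other final state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"status still {pending!r} after {timeout_s} seconds")
        sleep(interval_s)
```

The caller then compares the returned value with "Success" to decide whether the deployment completed.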

Important

The list of compute profiles presented to the end user is dynamic, i.e. if the administrator updates or publishes a new profile, it is immediately available to the end user.


View Compute Instances

When the user clicks on Baremetal, they are presented with the list of Baremetal compute instances in their workspace. The list shows the following columns:

  • Name
  • Workspace
  • Created At
  • Services
  • Publish Status
  • Actions


Post-Deployment Operations

Once an instance has been launched and is operational, users can perform the actions described in this section.

Remote Access

End users need access to instances that operate behind a firewall in a private data center or public cloud. These instances can be a Kubernetes namespace, a Virtual Cluster, or a Dedicated Kubernetes cluster. Secure remote access is powered by Rafay's Zero Trust Kubectl Access (ZTKA) feature.

The user can download the ZTKA "kubeconfig" file, configure their kubectl CLI to use it, and access the instance remotely. Alternatively, the user can click the "kubectl" button, which opens a web shell for securely interacting with the instance.
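As a minimal sketch of using a downloaded kubeconfig file from a script, the Python helper below builds a kubectl invocation with the standard `--kubeconfig` flag. The function names and the file path in the usage note are hypothetical, and the sketch assumes that kubectl is installed and available on the PATH.

```python
import os
import subprocess
from typing import List


def kubectl_command(kubeconfig_path: str, *args: str) -> List[str]:
    """Build a kubectl command line that uses the given kubeconfig file."""
    return ["kubectl", "--kubeconfig", os.path.expanduser(kubeconfig_path), *args]


def run_kubectl(kubeconfig_path: str, *args: str) -> str:
    """Run kubectl and return its standard output.

    Raises subprocess.CalledProcessError if kubectl exits with an error.
    """
    result = subprocess.run(
        kubectl_command(kubeconfig_path, *args),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, `run_kubectl("~/Downloads/ztka-kubeconfig.yaml", "get", "pods")` would list pods using the downloaded file, where the file name is only a placeholder for wherever the kubeconfig was saved.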

Collaborators

It is common for end users to work with both internal collaborators (i.e. employees) and external collaborators (i.e. people outside the company). Users can add or remove other users on a specific instance. Once the user enters a collaborator's email address, an email invitation is sent to the collaborator with details on how to access the instance. Once collaborators log in, they have the same level of access (i.e. role/privilege) to the instance.


Instance Actions

After a Bare Metal compute instance is deployed, various lifecycle operations can be performed from the Actions panel. The following options are available:

  • Start: Powers on the instance if it is currently in a stopped state
  • Stop: Initiates a graceful shutdown of the instance. This is typically used for maintenance or to reduce resource usage
  • Power Cycle: Performs a complete power cycle, turning the instance off and then back on. Useful for applying configuration changes or resolving issues
  • Power Reset: Executes a hardware-level reset, similar to pressing the reset button on a physical server. Intended for unresponsive system scenarios
  • Delete: Permanently removes the instance along with all related configurations. This action cannot be undone

⚠️ It is recommended to back up any critical data before using the Power Reset or Delete actions. Deleting an instance is irreversible.


View Metrics

To access the metrics for a Bare Metal instance:

  1. Navigate to the list of Bare Metal instances
  2. Click the ellipsis (⋮) icon under the Actions column for the desired instance
  3. Select View Metrics from the dropdown

The Metrics Overview page is displayed, providing insights into:

  • CPU Utilization: Displays current, peak, and average CPU usage and committed resources
  • Memory Utilization: Shows memory usage statistics, including current, peak, and average usage
  • Storage Utilization: Indicates disk usage in terms of current, peak, and average usage

Additionally, GPU information is displayed for each allocated GPU with details such as model and identifier (e.g., GPU #1, GPU #2).
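To make the current, peak, and average figures concrete, the Python sketch below derives all three from a series of utilization samples. It is an illustration only: this page does not state the time window or sampling method the Metrics Overview page uses, so the sketch assumes an ordered list of percentage samples with the most recent value last. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class UtilizationSummary:
    current: float  # most recent sample
    peak: float     # highest sample in the window
    average: float  # arithmetic mean over the window


def summarize(samples: Sequence[float]) -> UtilizationSummary:
    """Summarize utilization samples ordered from oldest to newest."""
    if not samples:
        raise ValueError("at least one sample is required")
    return UtilizationSummary(
        current=samples[-1],
        peak=max(samples),
        average=sum(samples) / len(samples),
    )
```

A large gap between peak and average suggests bursty load, while a current value close to the peak suggests that the instance is under sustained pressure at the moment.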

GPU Metrics Breakdown

To view GPU-specific metrics, expand the corresponding GPU section. The detailed metrics include:

  • GPU Utilization: Indicates the percentage of GPU core processing capacity currently in use. A high value suggests the GPU is actively engaged in compute tasks such as inference or training.

  • GPU Memory Copy Utilization: Reflects how actively the GPU is transferring data between its memory and compute units. Elevated values may indicate frequent data movement, which could impact performance if memory bandwidth becomes a bottleneck.

  • GPU Temperature: Displays the current temperature of the GPU in degrees Celsius. Continuous high temperatures may lead to thermal throttling or hardware issues, and could indicate inadequate cooling.

  • GPU SM Clocks: Shows the clock speed of the Streaming Multiprocessors (SMs), which are responsible for executing shader and compute workloads. This metric helps assess if the GPU is running at its full performance potential.

  • GPU Memory Clocks: Indicates the frequency at which the GPU memory operates. It determines the speed at which data is read from or written to the GPU memory, affecting overall memory throughput.

  • Framebuffer Memory Used: Displays the amount of memory currently consumed by the GPU for active workloads. This includes model weights, input data, and intermediate results stored in memory.

  • Framebuffer Memory Free: Indicates the remaining available memory on the GPU that can be used for additional tasks. Monitoring this helps ensure there is sufficient memory to handle new or growing workloads without failure.

These metrics provide a comprehensive view of the GPU's performance characteristics and help in identifying potential bottlenecks or hardware limitations during workload execution.
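Similar values can be cross-checked from inside the instance. The sketch below assumes the GPUs are NVIDIA devices and that the `nvidia-smi` utility is installed, neither of which is stated on this page. It queries fields that roughly correspond to the metrics above and parses the CSV output. The function names are hypothetical, and the mapping between `nvidia-smi` fields and the metrics shown in the UI is an assumption.

```python
import csv
import io
import subprocess
from typing import Dict, List, Optional

# nvidia-smi query fields, in the order they are requested.
QUERY_FIELDS = [
    "index",
    "utilization.gpu",     # GPU utilization (%)
    "utilization.memory",  # memory read/write activity (%)
    "temperature.gpu",     # degrees Celsius
    "clocks.sm",           # SM clock (MHz)
    "clocks.mem",          # memory clock (MHz)
    "memory.used",         # framebuffer memory used (MiB)
    "memory.free",         # framebuffer memory free (MiB)
]


def _to_number(value: str) -> Optional[float]:
    """Convert a CSV cell to float; return None for values such as '[N/A]'."""
    try:
        return float(value)
    except ValueError:
        return None


def parse_gpu_csv(text: str) -> List[Dict[str, Optional[float]]]:
    """Parse output produced with --format=csv,noheader,nounits."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if record:
            rows.append({k: _to_number(v) for k, v in zip(QUERY_FIELDS, record)})
    return rows


def query_gpus() -> List[Dict[str, Optional[float]]]:
    """Run nvidia-smi on the local machine and return one dict per GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(QUERY_FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)
```

Calling `query_gpus()` on the instance returns one dictionary per GPU, which can then be compared with the values shown in the expanded GPU section.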