Alerts

The Controller continuously monitors both clusters and workloads deployed on the managed clusters. When a critical issue with the cluster or the workload is detected, the Controller generates an "Alert".

Alerts are generated when observed events "persist" and are unable to resolve automatically after a number of retries. The entire history of "Alerts" is persisted on the Controller and a reverse chronological history is available to Org Admins on the Console.

Alert Lifecycle¶

All Alerts start life as "Open Alerts". When the underlying issue is resolved (automatically or manually) and the issue does not manifest anymore, the alert is automatically "Closed".

Filters are provided to help sort and manage the alerts appropriately:

Project
Alerts Status (Open/Closed)
Type
Cluster
Severity
Time Range

For every alert, the following data is presented to the user:

Date: When the issue was first observed and therefore the alert was generated automatically
Duration: How long the issue has persisted
Type: See details below
Cluster: The cluster in which the issue was observed
Severity: How severe is this alert (Critical/Warning/Info)
Summary: Brief description of the issue
Description: Detailed description of the issue behind the alert

Alert Severity¶

All alerts have an associated Severity. A CRITICAL alert means the administrator needs to pay attention immediately to help address the underlying issue. A WARNING severity means there is an underlying issue that is trending poorly and will need attention quickly. An Info severity is mostly for Informational purposes only.

SLA¶

For application and ops teams, SLA can be a critical measure of their effectiveness. The "duration" of the alert provides an excellent indication of SLA. Issues should ideally be triaged and resolved ASAP in minutes.

Manage Notifications¶

You can configure which system alerts you want to receive and specify the email recipients for notifications.

Notifications¶

Under Notifications, you can enable or disable alerts for specific monitored objects. When enabled, notifications are triggered whenever relevant events occur in the environment.

Notification Type	Description	Default State
Cluster	Receive notifications related to overall cluster health and connectivity.	Enabled
Pod	Alerts for pod-related events such as failures, restarts, or unhealthy status.	Enabled
Node	Monitors node availability and performance metrics like CPU, memory, and status.	Enabled
PVC	Tracks Persistent Volume Claims for binding or capacity issues.	Enabled
Agent Health	Alerts when an agent loses connectivity or becomes unhealthy.	Enabled

Users can toggle the switch next to each notification type to enable or disable alerts as needed.

Recipient Emails¶

Under Recipient Emails, you can manage who receives these notifications.

Add or delete recipient email addresses using the Add and Delete (🗑️) icons.
Only valid email formats are accepted.
All listed recipients will receive email notifications for the enabled categories.

Actions:

Click Add to include a new email address.
Click the trash icon to remove an existing email.
Select Save to confirm your configuration.
Choose Cancel to discard changes.

Alerts Quick View¶

Cluster administrators are provided with a quick view of all open alerts associated with a cluster. In the Console, navigate to the cluster card to get a bird's eye view of open alerts.