
Alerts

The Controller continuously monitors the managed clusters and the workloads deployed on them. When it detects a critical issue with a cluster or a workload, the Controller generates an "Alert".

Alerts are generated when observed events "persist" and do not resolve automatically after a number of retries. The entire history of "Alerts" is persisted on the Controller, and Org Admins can view it in reverse chronological order on the Console.
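
The "persist after retries" behavior can be illustrated with a small sketch. This is not the Controller's implementation: the class name PersistenceTracker, the max_retries parameter, and its default value are hypothetical, since the actual retry count is not stated here. The sketch only shows the general idea of raising an alert once an issue has survived a fixed number of rechecks.

```python
from dataclasses import dataclass


@dataclass
class PersistenceTracker:
    """Signal an alert only when an issue survives several consecutive checks.

    Hypothetical model for illustration; names and defaults are assumptions.
    """

    max_retries: int = 3  # assumed value, the real retry count is not documented
    consecutive_failures: int = 0

    def observe(self, healthy: bool) -> bool:
        """Record one check result and return True if an alert should be raised."""
        if healthy:
            # The issue resolved on its own, so the counter starts over.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        # First observation plus max_retries rechecks have all failed.
        # Equality (not >=) makes the alert fire exactly once per incident.
        return self.consecutive_failures == self.max_retries + 1
```

With max_retries=2, a transient failure that clears on the second recheck never produces an alert, while three consecutive failed checks produce exactly one.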


Alert Lifecycle

All Alerts start life as "Open Alerts". When the underlying issue is resolved (automatically or manually) and no longer manifests, the alert is automatically "Closed".

Filters are provided to help sort and manage the alerts appropriately:

  • Project
  • Alert Status (Open/Closed)
  • Type
  • Cluster
  • Severity
  • Time Range

For every alert, the following data is presented to the user:

  • Date: When the issue was first observed, which is also when the alert was generated
  • Duration: How long the issue has persisted
  • Type: See details below
  • Cluster: The cluster in which the issue was observed
  • Severity: How severe the alert is (Critical/Warning/Info)
  • Summary: Brief description of the issue
  • Description: Detailed description of the issue behind the alert
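
To make the fields and filters above concrete, here is a minimal sketch of how an alert record and the filter options could be modeled. It is an illustration only and does not represent the Controller's data model or API: the Alert class, its attribute names, the default project value, and the filter_alerts function are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional


@dataclass
class Alert:
    """Hypothetical record mirroring the fields shown for each alert."""

    date: datetime        # when the issue was first observed
    duration: timedelta   # how long the issue has persisted
    alert_type: str
    cluster: str
    severity: str         # "Critical", "Warning" or "Info"
    summary: str
    description: str
    project: str = "default"  # assumed field, used only for filtering
    status: str = "Open"      # "Open" or "Closed"


def filter_alerts(
    alerts: Iterable[Alert],
    *,
    project: Optional[str] = None,
    status: Optional[str] = None,
    alert_type: Optional[str] = None,
    cluster: Optional[str] = None,
    severity: Optional[str] = None,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> List[Alert]:
    """Apply the listed filters; a filter left as None is ignored."""
    selected = []
    for alert in alerts:
        if project is not None and alert.project != project:
            continue
        if status is not None and alert.status != status:
            continue
        if alert_type is not None and alert.alert_type != alert_type:
            continue
        if cluster is not None and alert.cluster != cluster:
            continue
        if severity is not None and alert.severity != severity:
            continue
        if start is not None and alert.date < start:
            continue
        if end is not None and alert.date > end:
            continue
        selected.append(alert)
    # Newest first, matching the reverse chronological history on the Console.
    return sorted(selected, key=lambda a: a.date, reverse=True)
```

For example, filter_alerts(alerts, cluster="prod-1", status="Open") would return only the open alerts for a cluster named "prod-1" (a made-up name), newest first.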



Alert Severity

All alerts have an associated severity. A Critical alert means the administrator needs to act immediately to address the underlying issue. A Warning alert means an underlying issue is trending poorly and will need attention soon. An Info alert is for informational purposes only.


SLA

For application and ops teams, SLA can be a critical measure of their effectiveness. The "Duration" of an alert shows how long an issue went unresolved, which makes it a useful indicator of SLA performance. Ideally, issues are triaged and resolved within minutes.
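
As a rough illustration of using alert duration as an SLA indicator, the sketch below computes the share of alerts resolved within a target time. The function name sla_compliance and the 15-minute target in the usage note are assumptions chosen for the example; no SLA target is prescribed here.

```python
from datetime import timedelta
from typing import Sequence


def sla_compliance(durations: Sequence[timedelta], target: timedelta) -> float:
    """Return the fraction of alerts whose duration is within the target.

    Hypothetical helper for illustration; the target is chosen by the team.
    """
    if not durations:
        # No alerts means nothing breached the target.
        return 1.0
    within_target = sum(1 for d in durations if d <= target)
    return within_target / len(durations)
```

Given alert durations of 5 minutes, 12 minutes, 45 minutes, and 2 hours with a 15-minute target, the function returns 0.5, because two of the four alerts were resolved in time.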


Manage Notifications

You can configure which system alerts you want to receive and specify the email recipients for notifications.

Notifications

Under Notifications, you can enable or disable alerts for specific monitored objects. When enabled, notifications are triggered whenever relevant events occur in the environment.

| Notification Type | Description | Default State |
|---|---|---|
| Cluster | Receive notifications related to overall cluster health and connectivity. | Enabled |
| Pod | Alerts for pod-related events such as failures, restarts, or unhealthy status. | Enabled |
| Node | Monitors node availability and performance metrics like CPU, memory, and status. | Enabled |
| PVC | Tracks Persistent Volume Claims for binding or capacity issues. | Enabled |
| Agent Health | Alerts when an agent loses connectivity or becomes unhealthy. | Enabled |

Users can toggle the switch next to each notification type to enable or disable alerts as needed.
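
The toggle behavior can be pictured as a simple mapping from notification type to an enabled flag. The sketch below is a hypothetical model, not the Console's configuration format: DEFAULT_NOTIFICATIONS, toggle, and should_notify are illustrative names, and only the five notification types and their enabled defaults come from this page.

```python
from typing import Dict

# Notification types and default states as listed on this page.
DEFAULT_NOTIFICATIONS: Dict[str, bool] = {
    "Cluster": True,
    "Pod": True,
    "Node": True,
    "PVC": True,
    "Agent Health": True,
}


def toggle(settings: Dict[str, bool], notification_type: str) -> Dict[str, bool]:
    """Return a copy of the settings with one notification type flipped."""
    if notification_type not in settings:
        raise KeyError(f"unknown notification type: {notification_type}")
    updated = dict(settings)  # copy, so the caller's settings stay unchanged
    updated[notification_type] = not updated[notification_type]
    return updated


def should_notify(settings: Dict[str, bool], notification_type: str) -> bool:
    """Only enabled types trigger a notification; unknown types never do."""
    return settings.get(notification_type, False)
```

Calling toggle(DEFAULT_NOTIFICATIONS, "PVC") yields settings in which PVC events no longer notify, while the other four types remain enabled.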

Recipient Emails

Under Recipient Emails, you can manage who receives these notifications.

  • Add or delete recipient email addresses using the Add and Delete (🗑️) icons.
  • Only valid email formats are accepted.
  • All listed recipients will receive email notifications for the enabled categories.

Actions:

  • Click Add to include a new email address.
  • Click the trash icon to remove an existing email.
  • Select Save to confirm your configuration.
  • Choose Cancel to discard changes.
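
The recipient workflow (add, delete, then Save or Cancel) can be sketched as an editor that works on a draft copy of the list. Everything here is illustrative: the RecipientEditor class is hypothetical, and the regular expression is a deliberately simple stand-in, because the exact rule the Console uses to decide what counts as a valid email format is not documented here.

```python
import re
from typing import Iterable, List

# Assumed rule: exactly one "@", no whitespace, and a dot in the domain part.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


class RecipientEditor:
    """Hypothetical model of the recipient list with Save/Cancel semantics."""

    def __init__(self, saved: Iterable[str]) -> None:
        self._saved: List[str] = list(saved)
        self._draft: List[str] = list(self._saved)

    def add(self, email: str) -> None:
        """Add an address to the draft, rejecting invalid formats."""
        email = email.strip()
        if not EMAIL_PATTERN.match(email):
            raise ValueError(f"invalid email format: {email!r}")
        if email not in self._draft:
            self._draft.append(email)

    def delete(self, email: str) -> None:
        """Remove an address from the draft if it is present."""
        self._draft = [e for e in self._draft if e != email]

    def save(self) -> List[str]:
        """Confirm the draft as the new recipient list."""
        self._saved = list(self._draft)
        return list(self._saved)

    def cancel(self) -> List[str]:
        """Discard unsaved changes and return to the last saved list."""
        self._draft = list(self._saved)
        return list(self._saved)
```

Keeping a separate draft means that nothing changes for existing recipients until the configuration is saved, which matches the Save and Cancel actions described above.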

Alerts Quick View

Cluster administrators are provided with a quick view of all open alerts associated with a cluster. In the Console, navigate to the cluster card to get a bird's eye view of open alerts.
