The Controller provides a web-based view of the entire fleet of Kubernetes clusters under management. Deep cluster insights and time series data are powered by the "managed Prometheus" addon that is part of the default cluster blueprint. Every managed cluster with the controller-managed "monitoring addon" automatically deploys Prometheus and a number of related components to the "rafay-infra" namespace on the cluster.
Metrics scraped by Prometheus are cached locally for approximately 2 hours. This data is also streamed back to the Controller and stored in a centralized "time series database". The resulting "summary" and "trend" data is then made available to authorized users of the Controller via dashboards.
Selecting the "minimal blueprint", or deselecting the "monitoring" addon in a custom cluster blueprint, will result in many of the dashboards not being populated with data.
Multi Cluster Dashboard
The Web Console provides a single view of all clusters under management by the organization. Users are presented with "At a Glance" operational details in the multi cluster dashboard that allow them to quickly address issues.
A "Cluster Dashboard" is available for all managed Kubernetes clusters. Users can use the dashboard to view detailed information about all Kubernetes resources on the cluster. Both "Current State" and "Trends" over time for Kubernetes resources are presented to aid users in not just debugging/troubleshooting but also forecasting and planning.
The dashboard organizes the information into logical categories:
- Cluster Level
- Node Level
- Specific Kubernetes Resources
Each cluster card provides at-a-glance information for a managed cluster:
- Location etc
The cluster card shows "CPU and Memory Resources" on the cluster. For each metric, both total resources and current utilization are tracked and displayed.
The utilization chart will automatically transition from Green to Red if a threshold of 80% utilization is reached.
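The threshold rule described above can be illustrated with a minimal sketch. Only the 80% figure comes from the text; the function names, the color strings, and the assumption that exactly 80% already counts as "reached" are hypothetical and do not describe how the Controller is actually implemented.

```python
# Illustrative only: names and structure are assumptions, not the Controller's code.
RED_THRESHOLD_PERCENT = 80.0  # threshold stated in the documentation


def utilization_percent(used: float, total: float) -> float:
    """Return current utilization as a percentage of total resources."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


def chart_color(used: float, total: float,
                threshold: float = RED_THRESHOLD_PERCENT) -> str:
    """Return "red" once utilization reaches the threshold, else "green"."""
    return "red" if utilization_percent(used, total) >= threshold else "green"
```

For example, a cluster using 6 of 8 CPU cores (75%) would map to green under this sketch, while one using 8 of 10 cores (80%) would map to red.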
The operational status of managed clusters is shown to admins at a fine-grained level:
- Reachability: Does the cluster have connectivity to the Controller?
- Cluster Health: Is the cluster healthy?
- Blueprint Sync Status: Has the configured blueprint synchronized on the cluster?
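One way to picture these three status indicators is as independent yes/no checks per cluster. The sketch below is a hypothetical model for illustration only: the `ClusterStatus` class, its field names, and the `failing_checks` helper are assumptions and are not part of the Controller's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterStatus:
    """Hypothetical record of the three status indicators on a cluster card."""
    name: str
    reachable: bool          # Reachability: connectivity to the Controller
    healthy: bool            # Cluster Health
    blueprint_synced: bool   # Blueprint Sync Status


def failing_checks(status: ClusterStatus) -> list[str]:
    """Return the labels of the indicators that currently need attention."""
    checks = {
        "Reachability": status.reachable,
        "Cluster Health": status.healthy,
        "Blueprint Sync Status": status.blueprint_synced,
    }
    return [label for label, ok in checks.items() if not ok]
```

Keeping the checks separate, rather than collapsing them into a single "OK/not OK" flag, mirrors the fine-grained view described above: a cluster can be reachable and synchronized yet still unhealthy.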
The cluster card also provides a view into the number of workloads operational on the cluster and their status.