Cluster Blueprints and Drift Detection
Around three years back, we noticed many of our customers struggling with enterprise-wide standardization of their Kubernetes clusters. Every cluster in their Organization was a snowflake, and they were looking for a way to enforce that every cluster had a "baseline set of add-ons". This prompted us to develop Cluster Blueprints, which have turned out to be one of the most heavily used features in our platform.
In this blog, we will describe a superpower setting in the cluster blueprints feature that we see customers use heavily for their production clusters to secure against unplanned drift.
The Drift Problem
Although a cluster blueprint solves the "standardization" challenge, it is still possible for users with Cluster Admin privileges to make "accidental" changes to the add-ons associated with that blueprint.
When something like this occurs, the cluster has "drifted" away from the desired state. Unplanned, out-of-band changes can result in significant operational, compliance and security issues. For example, what if the change altered the configuration of a critical security scanner?
Drift Detection
Cluster Blueprints in the Rafay Kubernetes Operations platform can be configured to actively monitor for unexpected drift. This monitoring and enforcement is performed by the Rafay Kubernetes Operator deployed on the managed cluster. Customers have two options for response when drift is detected.
Option 1: Notify
Generates an audit event when unplanned drift is detected.
Option 2: Block
Blocks the unplanned drift and generates an audit event.
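To make the difference between the two options concrete, here is a minimal sketch in Python. It is not the Rafay Kubernetes Operator's implementation; the names `DriftAction`, `DriftResult` and `handle_change` are hypothetical, and add-on state is reduced to a simple name-to-version mapping for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DriftAction(Enum):
    """Hypothetical names for the two responses described above."""
    NOTIFY = "notify"
    BLOCK = "block"


@dataclass
class DriftResult:
    allowed: bool      # whether the out-of-band change goes through
    audit_event: str   # message recorded in either mode


def handle_change(desired: dict, proposed: dict, action: DriftAction) -> Optional[DriftResult]:
    """Compare the blueprint's desired state with a proposed live state.

    Returns None when nothing differs. Otherwise an audit event is always
    produced, and the change is allowed only in NOTIFY mode.
    """
    drifted = sorted(
        name
        for name in desired.keys() | proposed.keys()
        if desired.get(name) != proposed.get(name)
    )
    if not drifted:
        return None
    event = "unplanned drift detected in: " + ", ".join(drifted)
    return DriftResult(allowed=(action is DriftAction.NOTIFY), audit_event=event)


# Assumed example data: an admin edits the security scanner add-on by hand.
desired = {"security-scanner": "1.4.0", "ingress": "2.1.0"}
proposed = {"security-scanner": "1.4.0-edited", "ingress": "2.1.0"}

blocked = handle_change(desired, proposed, DriftAction.BLOCK)
notified = handle_change(desired, proposed, DriftAction.NOTIFY)
```

In both modes the audit event is identical; the only difference is whether the change is allowed to land on the cluster.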
It is good operational practice to ensure that all updates to production clusters are "planned", "version controlled" and "approved". The diagram below shows an environment where "drift detection based blocking" is used in conjunction with a modern GitOps-based pipeline that performs the "allowed/planned update".
sequenceDiagram
Git Repo->>Git Repo: Pull Request
Git Repo->>Git Repo: Merge
Git Repo->>Controller: Webhook
Controller->>Cluster: Update Blueprint
Cluster->>Cluster: Monitor
Rogue Admin-->>Cluster: Attempts out of band change
Cluster-->>Controller: Audit Event
Cluster-->>Rogue Admin: "X" Attempt Blocked "X"
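The flow in the diagram can also be walked through as a small simulation. This is only a sketch of the sequence of events, under the assumption that planned updates arrive through the controller after a merge while everything else counts as out of band. The `Cluster` class, `on_merge_webhook` function and add-on names are hypothetical and do not correspond to any Rafay API.

```python
class Cluster:
    """Toy stand-in for a managed cluster with drift blocking."""

    def __init__(self, block_drift: bool = True):
        self.block_drift = block_drift
        self.addons = {}      # add-on name -> version currently applied
        self.audit_log = []   # audit events sent back to the controller

    def update_blueprint(self, addons: dict) -> None:
        """Planned path: the controller applies the merged, approved state."""
        self.addons = dict(addons)

    def out_of_band_change(self, user: str, addon: str, version: str) -> bool:
        """Unplanned path: always audited, applied only if blocking is off."""
        self.audit_log.append(f"{user} attempted out-of-band change to {addon}")
        if self.block_drift:
            return False
        self.addons[addon] = version
        return True


def on_merge_webhook(cluster: Cluster, merged_addons: dict) -> None:
    """Controller side: a merged pull request triggers a blueprint update."""
    cluster.update_blueprint(merged_addons)


cluster = Cluster(block_drift=True)
on_merge_webhook(cluster, {"security-scanner": "1.4.0"})
accepted = cluster.out_of_band_change("rogue-admin", "security-scanner", "0.0.1")
```

After this run the planned version is still in place, `accepted` is `False`, and the audit log holds a single entry for the blocked attempt.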
Here's an example of what the "Rogue Admin" would encounter when they try to delete a "drift protected" resource in the cluster blueprint.
Try It Out
If you are interested in trying this out yourself, sign up for a Free Org/Tenant and use our "Getting Started Guide" for Cluster Blueprints and Drift Detection.
Get Started with Drift Detection
Blog Ideas
Sincere thanks to those who spend time reading our product blogs and provide us with feedback and ideas. Please contact the Rafay Product Team if you would like us to write about specific topics.