Break Glass Workflows for Developer Access to Kubernetes Clusters - Introduction

In any large-scale, production-grade Kubernetes setup, maintaining the security and integrity of the clusters is critical. However, there are exceptional circumstances—such as production outages or critical bugs—where developers need emergency access to a Kubernetes cluster to resolve issues.

This is where a "Break Glass" process comes into play. It is a controlled procedure that grants temporary, elevated access to developers in critical situations, with the appropriate safeguards in place to minimize risks.


Background

In a highly regulated or security-conscious environment, developer access to production Kubernetes clusters is typically restricted. Admins enforce least-privilege access to minimize attack surfaces and avoid accidental changes.

A break glass process provides a way to bypass these restrictions temporarily, but with clear accountability, logging, and oversight. The sections below describe what a typical break glass process looks like.

Pre-authorization

The developers or teams who may need break glass access should be pre-approved. This means having a clear list of users who, under emergency situations, can request elevated privileges.
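
As a minimal sketch of what such a list could look like in code, the Python snippet below keeps a per-cluster roster and checks a user against it. The roster contents, the cluster names, and the `is_preauthorized` helper are all hypothetical; in practice the list would more likely live in an identity provider or a version-controlled policy file.

```python
# Hypothetical roster: cluster name -> users pre-approved for break glass access.
PREAPPROVED = {
    "prod-us-east": {"alice", "bob"},
    "prod-eu-west": {"alice"},
}


def is_preauthorized(user: str, cluster: str, roster: dict = PREAPPROVED) -> bool:
    """Return True only if the user is on the pre-approved list for this cluster."""
    # Unknown clusters fall through to an empty set, so the default is "deny".
    return user in roster.get(cluster, set())


print(is_preauthorized("alice", "prod-eu-west"))  # True
print(is_preauthorized("bob", "prod-eu-west"))    # False
```

Defaulting to "deny" for unknown users and clusters keeps the check consistent with the least-privilege stance described above.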

Justification

Before access is granted, the developer should provide a reason for needing break glass access. The justification should be clear, specific, and logged. This not only adds a layer of accountability but also ensures that break glass isn’t abused for non-emergency purposes.
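
One way to make "clear and specific" checkable is to validate the justification before the request goes any further. The sketch below assumes two conventions that are not part of the process described here: incident IDs of the form `INC-<number>` and a minimum length of 20 characters. Both are placeholders to adapt to your own incident tooling.

```python
import re

# Assumed conventions: an incident ID such as "INC-1042" and a minimum length.
INCIDENT_PATTERN = re.compile(r"\bINC-\d+\b")
MIN_LENGTH = 20


def validate_justification(text: str) -> list:
    """Return a list of problems; an empty list means the justification is acceptable."""
    problems = []
    cleaned = text.strip()
    if len(cleaned) < MIN_LENGTH:
        problems.append(f"justification is shorter than {MIN_LENGTH} characters")
    if not INCIDENT_PATTERN.search(cleaned):
        problems.append("justification does not reference an incident ID")
    return problems


print(validate_justification("INC-1042: checkout pods crash-looping after deploy"))  # []
print(validate_justification("need access"))  # reports both problems
```

A check like this cannot judge whether the emergency is real, but it does guarantee that every logged request points at something a reviewer can look up later.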

Multi-factor Authentication (MFA)

Access should be granted only after the requester completes MFA. This confirms that the person receiving the elevated privileges is the intended, pre-approved user, even if a password or token has been compromised.
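
The process above does not prescribe a particular second factor. As one illustration, the sketch below verifies a time-based one-time password (TOTP, RFC 6238, HMAC-SHA1 variant) using only the Python standard library. The `mfa_passed` helper and its one-step tolerance for clock drift are assumptions made for this example; a production setup would normally delegate this check to an identity provider.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1 variant) for the given time."""
    counter = unix_time // step
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation, as defined in RFC 4226
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def mfa_passed(secret: bytes, submitted: str, unix_time: int, step: int = 30) -> bool:
    """Accept the code for the current time step or the one just before it."""
    times = (unix_time, max(unix_time - step, 0))
    # compare_digest avoids leaking information through comparison timing.
    return any(
        hmac.compare_digest(submitted.encode(), totp(secret, t, step).encode())
        for t in times
    )


# Test secret taken from the RFC 6238 appendix; never hard-code real secrets.
secret = b"12345678901234567890"
print(totp(secret, 59))                  # 287082
print(mfa_passed(secret, "287082", 59))  # True
```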

Time-limited Access

The elevated access granted during a break glass event should be temporary. Access can be configured to be revoked automatically after a fixed period (e.g., 1 hour or 4 hours). This ensures that the developer’s elevated permissions don’t persist longer than the emergency requires.
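
Kubernetes RoleBindings carry no built-in expiry, so something outside the binding has to track the deadline. The sketch below models that bookkeeping only: a grant with a time-to-live, a policy ceiling, and a helper that a periodic job could use to find grants due for revocation. The `Grant` class, `MAX_TTL`, and both functions are hypothetical names for this example, and the actual revocation step (for instance, deleting a binding) is deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MAX_TTL = timedelta(hours=4)  # assumed policy ceiling


@dataclass(frozen=True)
class Grant:
    user: str
    cluster: str
    issued_at: datetime
    ttl: timedelta

    @property
    def expires_at(self) -> datetime:
        return self.issued_at + self.ttl

    def is_active(self, now: datetime) -> bool:
        # Active from issue time up to, but not including, the expiry instant.
        return self.issued_at <= now < self.expires_at


def issue_grant(user: str, cluster: str, ttl: timedelta, now: datetime) -> Grant:
    """Create a grant, rejecting non-positive or over-long durations."""
    if not timedelta(0) < ttl <= MAX_TTL:
        raise ValueError(f"ttl must be positive and at most {MAX_TTL}")
    return Grant(user, cluster, issued_at=now, ttl=ttl)


def due_for_revocation(grants: list, now: datetime) -> list:
    """Grants whose deadline has passed; a periodic job would revoke these."""
    return [g for g in grants if now >= g.expires_at]


start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
grant = issue_grant("alice", "prod-us-east", timedelta(hours=1), now=start)
print(grant.is_active(start + timedelta(minutes=59)))  # True
print(grant.is_active(start + timedelta(hours=1)))     # False
```

Passing `now` in explicitly, rather than reading the clock inside the functions, keeps the expiry logic easy to test.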

Audit Trails and Monitoring

Every break glass event should be logged comprehensively. This includes the person requesting access, the justification, the duration of access, and actions performed while elevated access was in effect. Logs should be stored in an immutable system and periodically reviewed to detect potential misuse or security violations.
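
To illustrate how a log can be made resistant to quiet edits, the sketch below chains records together with SHA-256 hashes, so that changing any earlier record breaks verification of the chain. Note the limits of this example: hash chaining makes tampering detectable, but it does not by itself make storage immutable, and the record fields shown are assumptions rather than a prescribed schema.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(event: dict, prev_hash: str) -> str:
    # sort_keys gives a stable serialization, so the same event always hashes the same.
    payload = json.dumps({"event": event, "prev_hash": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_event(log: list, event: dict) -> dict:
    """Append an event, linking it to the hash of the previous record."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    record = {"event": event, "prev_hash": prev_hash, "hash": _digest(event, prev_hash)}
    log.append(record)
    return record


def verify_chain(log: list) -> bool:
    """Recompute every hash and link; return False on the first mismatch."""
    prev_hash = GENESIS
    for record in log:
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != _digest(record["event"], record["prev_hash"]):
            return False
        prev_hash = record["hash"]
    return True


log = []
append_event(log, {"user": "alice", "action": "grant", "justification": "INC-1042", "ttl_minutes": 60})
append_event(log, {"user": "alice", "action": "delete pod checkout-7f9c"})
print(verify_chain(log))  # True

log[0]["event"]["justification"] = "edited later"
print(verify_chain(log))  # False
```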

Approval Process

In many organizations, break glass access requires approval from a higher-level authority (e.g., a team lead or security officer). This ensures that the request is genuine and aligns with business continuity needs. Automated workflows can speed up the approval so that emergencies are handled quickly, without bureaucratic delay.
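
An automated approval workflow can be reduced to a small state machine: a request starts as pending and moves to approved or denied exactly once. The sketch below shows that core under stated assumptions: the `APPROVERS` set and all names are hypothetical, and the rule that requesters cannot approve their own request is an extra safeguard added for this example rather than something the process above requires.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    DENIED = "denied"


@dataclass
class AccessRequest:
    requester: str
    cluster: str
    justification: str
    status: Status = Status.PENDING
    decided_by: Optional[str] = None


# Hypothetical set of people allowed to approve break glass requests.
APPROVERS = {"team-lead", "security-officer"}


def decide(request: AccessRequest, approver: str, approve: bool,
           approvers: set = APPROVERS) -> AccessRequest:
    """Record a single approve/deny decision on a pending request."""
    if request.status is not Status.PENDING:
        raise ValueError("request has already been decided")
    if approver not in approvers:
        raise PermissionError(f"{approver} is not an authorized approver")
    if approver == request.requester:
        # Assumed safeguard: no self-approval, even for authorized approvers.
        raise PermissionError("requesters cannot approve their own request")
    request.status = Status.APPROVED if approve else Status.DENIED
    request.decided_by = approver
    return request


req = AccessRequest("alice", "prod-us-east", "INC-1042: checkout pods crash-looping")
decide(req, "team-lead", approve=True)
print(req.status.value, req.decided_by)  # approved team-lead
```

Recording `decided_by` alongside the status gives the audit trail the approver's identity as well as the requester's.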

Post-incident Review

After the break glass access is used, a post-mortem or retrospective meeting should be conducted. This helps assess the reason for the access and ensures that any process or automation improvements can be made to avoid needing manual intervention next time. This process also ensures accountability, preventing misuse.


Summary

A well-implemented break glass process ensures that your Kubernetes environment is secure while allowing for emergency interventions when needed. By defining clear policies, setting up robust approval workflows, and ensuring thorough auditing, you can handle emergencies swiftly without compromising security.

By following these steps, you ensure that access is not only controlled but also transparent and accountable. This balance between security and operational agility is crucial in modern, distributed systems like Kubernetes.

In a follow-on blog post, we will describe how organizations implement secure break glass workflows for temporary and remote developer access to Kubernetes clusters.