Kubernetes Upgrades

The Controller provides seamless workflows to help customers manage the lifecycle of Kubernetes clusters, including workflows to keep the Kubernetes version up to date.


Kubernetes Versions

Kubernetes versions are expressed as vMajor.vMinor.vPatch; for example, in v1.26.3 the major version is 1, the minor version is 26, and the patch version is 3. The Kubernetes project typically releases a new vMinor version every three to four months. New vPatch updates are made available to address security issues and bugs.


Supported Versions

The Kubernetes project maintains release branches for the most recent three minor releases. Applicable fixes, including security fixes, are typically backported ONLY to these three release branches.

We actively track the Kubernetes project for the availability of patches and minor/major versions. These are immediately put through a round of testing and qualification before being made available to customers. By default, the latest version of Kubernetes is used for cluster provisioning. Customers can also optionally select an older minor version from the supported Kubernetes version matrix during cluster provisioning. For example, the screenshot below shows the k8s version selection dropdown for provisioning on bare metal/VMs.

Version Selection

  • Users must update the Blueprint (both custom and default) of the existing cluster before upgrading the cluster to k8s 1.24
  • Users must upgrade Helm and YAML workloads to the new API versions before upgrading the cluster to k8s 1.24

Important

A workload upgrade fails if the workload has not been migrated to the new API versions before the cluster is upgraded to k8s 1.24. To update a deprecated YAML or Helm workload on a k8s v1.24 cluster, users must Unpublish (delete) the workload and then Publish (create) it again to apply the changes. A sketch of how affected manifests can be identified ahead of time is shown below.
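
Because the required change is a manifest-level apiVersion migration, affected workloads can be found by scanning the rendered YAML before the cluster upgrade. The sketch below is illustrative tooling, not part of the Controller, and its REMOVED_APIS table is a hypothetical, non-exhaustive sample of upstream API removals.

```python
# Minimal sketch (not Controller tooling): flag manifests whose
# apiVersion/kind pairs were removed in recent Kubernetes releases.
# The table below is illustrative, not exhaustive.
import sys

import yaml  # pip install pyyaml

REMOVED_APIS = {
    ("batch/v1beta1", "CronJob"): "batch/v1",
    ("policy/v1beta1", "PodDisruptionBudget"): "policy/v1",
    ("autoscaling/v2beta1", "HorizontalPodAutoscaler"): "autoscaling/v2",
}

def check_manifest(path: str) -> None:
    """Print a warning for every document that uses a removed API."""
    with open(path) as f:
        for doc in yaml.safe_load_all(f):
            if not isinstance(doc, dict):
                continue
            key = (doc.get("apiVersion"), doc.get("kind"))
            if key in REMOVED_APIS:
                print(f"{path}: {key[1]} uses removed apiVersion "
                      f"{key[0]}; migrate to {REMOVED_APIS[key]}")

if __name__ == "__main__":
    for manifest_path in sys.argv[1:]:
        check_manifest(manifest_path)
```

For Helm workloads, the same scan can be run against the output of `helm template` so that chart-rendered manifests are checked as well.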


In-Place k8s Upgrades

Design Goals

  • Customer applications should not suffer from a lack of orchestration capabilities (e.g. autoscaling) during the k8s upgrade process.
  • Kubernetes upgrades can be scheduled and performed in the customer's preferred maintenance windows.
  • Kubernetes upgrades can be performed with a canary approach, i.e. one canary cluster first, followed by the remaining clusters.
  • Customer applications should be able to operate in a heterogeneous k8s environment for extended periods of time, i.e. some clusters on the latest version and the remaining clusters on a prior version.

Master Nodes

HA clusters have multiple Kubernetes masters deployed on three separate nodes. The master nodes are upgraded one at a time, ensuring that there is no disruption to either customer containers or core control/management functions.

Non-HA, single-node systems have one Kubernetes master. When Kubernetes is upgraded on these systems, control functions are paused briefly until the upgrade is complete. It is worth emphasizing that there is no impact to the customer's containers while Kubernetes is being upgraded. One way to observe the one-at-a-time master rollout is sketched below.
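
As an illustration (not part of the product), the rolling nature of the master upgrade can be observed by polling the kubelet version reported by each control-plane node with the official Kubernetes Python client. The node label used here is the standard upstream control-plane label, which is an assumption about how masters are labeled in a given cluster.

```python
# Minimal sketch: print the kubelet version of each control-plane node.
# During a rolling master upgrade, the versions differ until every
# master has been upgraded. Older clusters may label masters with
# "node-role.kubernetes.io/master" instead.
from kubernetes import client, config

config.load_kube_config()  # assumes a local kubeconfig for the cluster
v1 = client.CoreV1Api()

nodes = v1.list_node(label_selector="node-role.kubernetes.io/control-plane")
for node in nodes.items:
    print(node.metadata.name, node.status.node_info.kubelet_version)
```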

Master Node Upgrades

Worker Nodes

The worker nodes are upgraded one at a time, ensuring that there is no disruption to customer containers. Before a worker node is upgraded, it is tainted with "NoSchedule" to ensure that new pods are not scheduled on it. Once the upgrade is complete, the taint is removed. A sketch of this taint step is shown below.
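
The taint step can be pictured with the official Kubernetes Python client. This is a minimal sketch of the mechanism described above, not the Controller's implementation; the taint key used is hypothetical.

```python
# Minimal sketch of the taint-before-upgrade step. The taint key
# "upgrade-in-progress" is hypothetical, not the Controller's own key.
from kubernetes import client, config

TAINT_KEY = "upgrade-in-progress"

config.load_kube_config()
v1 = client.CoreV1Api()

def taint_node(node_name: str) -> None:
    # Add a NoSchedule taint so that no new pods land on the node.
    node = v1.read_node(node_name)
    taints = node.spec.taints or []
    taints.append(client.V1Taint(key=TAINT_KEY, effect="NoSchedule"))
    v1.patch_node(node_name, {"spec": {"taints": taints}})

def untaint_node(node_name: str) -> None:
    # Remove the taint once the node upgrade is complete.
    node = v1.read_node(node_name)
    taints = [t for t in (node.spec.taints or []) if t.key != TAINT_KEY]
    v1.patch_node(node_name, {"spec": {"taints": taints}})
```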

Note that during the worker node upgrade process, there is no disruption to the data path of customer applications.

Worker Node Upgrades


Typical Process

Upgrade Notifications

When new Kubernetes versions (vMinor or vPatch) are made available, cluster administrators are notified. For example, the cluster shown below is running an older version of Kubernetes (v1.16.12) and displays a red "upgrade available" notification badge.

Upgrade Notifications

Clicking the notification badge presents the available upgrade options. For example, in the screenshot below, the user can upgrade to

  • The latest vPatch (v1.14.1 to v1.14.10)
  • The next possible vMinor (v1.14.1 to v1.15.12)

Upgrade Notification Detail

Important

When upgrading the k8s version from 1.25 to 1.26, the containerd version is upgraded to 1.6.10.


Start Upgrade

Only authorized users with the appropriate RBAC permissions are allowed to perform Kubernetes upgrades.

The Controller performs an "in-place" upgrade of Kubernetes. During the upgrade process, nodes are cordoned (not drained) before they are upgraded. As a result, pods already resident on a node remain where they are running, with no loss of transient data in local volumes.

Important

Cordoning prevents new pods from being scheduled on the node. Draining removes the pods currently on the node and reschedules them onto other nodes, which results in pod evictions. The difference is sketched below.
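
In API terms, cordoning only flips the node's spec.unschedulable field, whereas draining additionally evicts pods. A minimal sketch of the cordon step with the official Python client:

```python
# Minimal sketch: cordoning flips a single node field and evicts
# nothing, which is why resident pods keep transient data in local
# volumes during the upgrade.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

def cordon(node_name: str, unschedulable: bool = True) -> None:
    # Equivalent of `kubectl cordon` (or `kubectl uncordon` when
    # unschedulable=False); no pods are touched either way.
    v1.patch_node(node_name, {"spec": {"unschedulable": unschedulable}})
```

Uncordoning after the upgrade is the same call with unschedulable=False.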


Preflight Checks

Preflight checks are automatically performed before the cluster is upgraded/downgraded to the new Kubernetes version. The process is terminated if preflight checks do not pass.

The following preflight checks are performed and must pass before the upgrade process is allowed to proceed; a sketch of this style of check follows the list.

  • Cluster Readiness (i.e. is the cluster actually provisioned and in a READY state?)
  • Control Channel Health (i.e. is the OS-level control channel to the Controller active?)
  • Kubeadm Internal Preflight Checks (i.e. verify the cluster's health, node health, etc.)
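
As an analogy (not the Controller's actual preflight code), the first check can be approximated with the official Python client by confirming that every node reports a Ready condition; the kubeadm checks, by contrast, run inside kubeadm itself during the upgrade.

```python
# Minimal sketch of a node-readiness preflight in the spirit of the
# checks listed above (not the Controller's implementation).
from kubernetes import client, config

def all_nodes_ready() -> bool:
    config.load_kube_config()
    v1 = client.CoreV1Api()
    for node in v1.list_node().items:
        ready = any(cond.type == "Ready" and cond.status == "True"
                    for cond in (node.status.conditions or []))
        if not ready:
            return False
    return True

if __name__ == "__main__":
    # Terminate the process, mirroring the behavior described above.
    if not all_nodes_ready():
        raise SystemExit("preflight failed: cluster not ready")
    print("preflight passed")
```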

Node Upgrades

The software binaries for the target Kubernetes version are downloaded from the Controller as a single tar file, typically ~40 MB in size.


Post-Upgrade Validation

Post-upgrade validation checks are automatically performed after the cluster is upgraded/downgraded to the new Kubernetes version. The following checks are performed and must pass before the upgrade is deemed successful; a sketch follows the list.

  • Node Ready Check (i.e. did all cluster nodes report back as READY after the upgrade?)
  • Pods Running Check (i.e. are all pods in the critical namespaces running after the upgrade?)
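
A minimal sketch of this style of validation, assuming the official Python client and a hypothetical list of critical namespaces; the 10-minute retry window mirrors the note below and is not the Controller's exact logic.

```python
# Minimal sketch: retry for up to 10 minutes until all nodes are Ready
# and all pods in the critical namespaces are Running or Succeeded.
import time

from kubernetes import client, config

CRITICAL_NAMESPACES = ["kube-system"]  # hypothetical list

def validate(timeout_s: int = 600, interval_s: int = 15) -> bool:
    config.load_kube_config()
    v1 = client.CoreV1Api()
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        nodes_ready = all(
            any(c.type == "Ready" and c.status == "True"
                for c in (n.status.conditions or []))
            for n in v1.list_node().items)
        pods_ok = all(
            p.status.phase in ("Running", "Succeeded")
            for ns in CRITICAL_NAMESPACES
            for p in v1.list_namespaced_pod(ns).items)
        if nodes_ready and pods_ok:
            return True
        time.sleep(interval_s)
    return False
```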

Important

Performing k8s upgrades on unhealthy clusters can result in long validation windows because the process retries continuously for 10 minutes to allow the cluster and pods to settle.


Successful Upgrade

The upgrade process typically takes a few minutes (3-5 mins), depending on the number of nodes in the cluster and the network bandwidth available for software downloads. Note that the time taken for every step is measured and displayed to the user.

Successful Upgrade


Unsuccessful Upgrade

If the upgrade process was unsuccessful, users are presented with the option to

(a) Retry OR (b) Rollback

Failed Upgrade

Retries can be useful for transient errors, such as binary downloads failing on remote clusters with poor network connectivity. A rollback takes the cluster back to its original state before the upgrade was performed.

Retry and Rollback


Upgrade History

The Controller maintains a history of all successful and unsuccessful upgrades.

  • Navigate to the Cluster
  • Click on Activity
  • Click the "eye" icon to display detailed information about a particular upgrade job

Upgrade History


Audit Trail

An audit entry is generated when a Kubernetes upgrade is performed. It is possible to retroactively determine who performed the upgrade, when it was performed, and from which version to which version.

Upgrade Audit Trail