ARC Zonal Shift

Overview

ARC Zonal Shift enables traffic to move away from an Availability Zone (AZ) to improve application resilience during zonal events. The feature supports both manual zonal shift operations and automated zonal autoshift behavior.

Zonal Shift is controlled centrally through cluster configuration and managed from the Console after cluster provisioning.


Day-0 Configuration (Cluster Specification)

On Day 0, ARC Zonal Shift is configured as part of cluster creation through the cluster specification. The following example specification enables both Zonal Shift (zonalShiftConfig) and Auto Zonal Shift (autoZonalShiftConfig).

kind: Cluster
metadata:
  name: gopimallela-zs-ui
  project: defaultproject
spec:
  blueprint: minimal
  blueprintversion: 4.1.0
  cloudprovider: uday-qa-acc
  cniprovider: aws-cni
  proxyconfig: {}
  type: eks
---
accessConfig:
  authenticationMode: CONFIG_MAP
addons:
- name: coredns
  version: v1.12.1-eksbuild.2
- name: vpc-cni
  version: v1.20.4-eksbuild.2
- name: kube-proxy
  version: v1.33.3-eksbuild.4
- name: aws-ebs-csi-driver
  version: latest
addonsConfig: {}
apiVersion: rafay.io/v1alpha5
autoModeConfig: {}
autoZonalShiftConfig:
  allowedWindows:
  - Mon:00:00-Mon:01:00
  enabled: true
  outcomeAlarms:
  - arn:aws:cloudwatch:us-west-2:211125364662:alarm:rafay-core-dev-RDSinstanceCPUUtilization
kind: ClusterConfig
managedNodeGroups:
- amiFamily: AmazonLinux2023
  desiredCapacity: 2
  iam:
    withAddonPolicies:
      autoScaler: true
  instanceTypes:
  - t3.xlarge
  maxSize: 2
  minSize: 2
  name: ng-17e9c131
  version: "1.33"
  volumeSize: 80
  volumeType: gp3
- amiFamily: AmazonLinux2023
  desiredCapacity: 2
  iam:
    withAddonPolicies:
      autoScaler: true
  instanceTypes:
  - t3.xlarge
  maxSize: 2
  minSize: 2
  name: ng-2
  nodeRepairConfig:
    enabled: true
    maxParallelNodesRepairedCount: 2
    maxUnhealthyNodeThresholdCount: 1
    nodeRepairConfigOverrides:
    - minRepairWaitTimeMins: 20
      nodeMonitoringCondition: NetworkingReady
      nodeUnhealthyReason: ContainerRuntimeFailed
      repairAction: Replace
    - minRepairWaitTimeMins: 10
      nodeMonitoringCondition: Ready
      nodeUnhealthyReason: ContainerRuntimeFailed
      repairAction: Replace
  version: "1.33"
  volumeSize: 80
  volumeType: gp3
metadata:
  name: gopimallela-zs-ui
  region: us-west-2
  tags:
    email: gopikrishna@rafay.co
    env: dev
  version: "1.33"
vpc:
  cidr: 192.168.0.0/16
  clusterEndpoints:
    privateAccess: true
    publicAccess: false
  nat:
    gateway: Single
zonalShiftConfig:
  enabled: true

Zonal Shift Parameters

  1. zonalShiftConfig
zonalShiftConfig:
  enabled: true
Parameter | Description
enabled | Enables ARC Zonal Shift for the cluster. Acts as the central switch required for both manual and automatic zonal shift operations. Can be configured during Day-0 via cluster specification and updated later as a Day-2 operation. Once enabled, zonal shift operations are available from the Console.
  2. autoZonalShiftConfig
autoZonalShiftConfig:
  enabled: true
  outcomeAlarms:
    - arn:aws:cloudwatch:...
  allowedWindows:
    - Mon:00:00-Mon:01:00
Parameter | Description
enabled | Enables zonal autoshift. When enabled, traffic can be shifted automatically during zonal events and practice runs.
outcomeAlarms | Mandatory. CloudWatch alarm ARNs used to evaluate practice run results. If an alarm enters the ALARM state, the practice run is marked as failed.
allowedWindows | Time windows during which practice runs are allowed. Format: Day:HH:MM-Day:HH:MM, in UTC (for example, Mon:00:00-Mon:01:00).
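
The parameter rules above can be checked before a specification is applied. The following is a minimal sketch, not part of the product: the function name validate_zonal_shift is hypothetical, the specification is assumed to be already loaded into a Python dict, and the accepted day names (three-letter English abbreviations such as Mon) are an assumption inferred from the documented example.

```python
import re

# Assumed pattern, inferred from the documented Day:HH:MM-Day:HH:MM format.
_DAY = r"(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)"
_TIME = r"(?:[01]\d|2[0-3]):[0-5]\d"
WINDOW_RE = re.compile(rf"^{_DAY}:{_TIME}-{_DAY}:{_TIME}$")


def validate_zonal_shift(cluster_config: dict) -> list:
    """Return a list of problems in the zonal shift sections (empty if none)."""
    problems = []
    zonal = cluster_config.get("zonalShiftConfig") or {}
    auto = cluster_config.get("autoZonalShiftConfig") or {}

    if not auto.get("enabled"):
        return problems  # nothing further to check when autoshift is off

    # zonalShiftConfig.enabled is the central switch for automatic shifts too.
    if not zonal.get("enabled"):
        problems.append("autoZonalShiftConfig.enabled requires zonalShiftConfig.enabled")
    # outcomeAlarms is documented as mandatory.
    if not auto.get("outcomeAlarms"):
        problems.append("autoZonalShiftConfig.outcomeAlarms must list at least one alarm ARN")
    for window in auto.get("allowedWindows") or []:
        if not WINDOW_RE.match(window):
            problems.append(f"invalid allowedWindows entry: {window!r}")
    return problems
```

An empty list means the two sections are consistent with the rules in the tables; each string in a non-empty list names one violated rule.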

Day-0 vs Day-2 Operations

Day-0 (Cluster Provisioning)

Allowed:

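  • Configure zonalShiftConfig.enabled.
  • Configure autoZonalShiftConfig (enabled, outcomeAlarms, allowedWindows), as in the example specification above.
  • Establish centralized Zonal Shift capability for the cluster.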

Important: Console-based zonal shift operations, such as starting a manual shift, are not supported during Day-0 configuration. They become available after the cluster is provisioned.

Day-2 (Post-Provision Management)

Available from the Console:

  • Enable or disable Zonal Shift
  • Start manual zonal shift
  • View history
  • Configure Auto Zonal Shift
  • Add outcome alarms
  • Configure practice run windows
  • Manage autoshift settings

Managing Zonal Shift in the Console

After cluster creation:

  1. Navigate to the cluster and open Configuration.
  2. Locate EKS Zonal Shift and click Manage.

Settings

Displays and controls whether Zonal Shift is enabled for the cluster.

  • Shows the current state.
  • To change the state, enable or disable Zonal Shift for the cluster and click Save.

History

The History tab provides a consolidated view of all zonal shift activities performed on the cluster, including manual shifts, automatic shifts, and practice runs.

Users can filter entries based on:

  • Status — Active, Expired, or Canceled
  • Type — Zonal Shift, Practice Run, or Zonal Autoshift

This tab helps users:

  • Track past and current zonal shift operations.
  • Verify whether a shift completed successfully, expired, or was canceled.
  • Review operational timelines and execution details for troubleshooting or auditing.
  • Understand how traffic-shift actions have impacted the cluster over time.

The History view is useful for monitoring operational health, validating resiliency testing, and maintaining visibility into Day-2 zonal shift activities.

Start (Manual Zonal Shift)

Used to initiate a manual zonal shift.

Steps

  1. Select the Availability Zone to shift traffic away from and set the expiration duration.
  2. Optionally add a comment, then click Start to begin the shift.

Behavior

  • Traffic moves away from the selected AZ.
  • Only one shift can be active per resource at a time.
  • The shift expires automatically after the configured duration.
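
The behavior above can be illustrated with a toy in-memory model. This is a sketch for reasoning about the rules only; it is not how the Console or AWS implements zonal shifts, and the names ZonalShift and ShiftRegistry are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ZonalShift:
    resource: str
    away_from: str        # Availability Zone that traffic moves away from
    expires_at: datetime
    comment: str = ""


class ShiftRegistry:
    """Toy model of the documented rules; hypothetical, for illustration only."""

    def __init__(self):
        self._shifts = {}

    def active(self, resource, now):
        """Return the unexpired shift for a resource, or None."""
        shift = self._shifts.get(resource)
        if shift is not None and shift.expires_at > now:
            return shift
        self._shifts.pop(resource, None)  # drop an expired entry, if any
        return None

    def start(self, resource, away_from, duration, now, comment=""):
        """Start a shift, refusing if the resource already has an active one."""
        if self.active(resource, now) is not None:
            raise RuntimeError(f"{resource} already has an active zonal shift")
        shift = ZonalShift(resource, away_from, now + duration, comment)
        self._shifts[resource] = shift
        return shift
```

In this model, a second start call for the same resource raises RuntimeError until the first shift's expiration time has passed, after which a new shift can be started.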

Auto Config

Used to configure zonal autoshift behavior.

Enable Autoshift

  • Enables automatic traffic shifting.
  • Starts AWS-managed practice runs used to validate resiliency.

Configure Alarms

Alarms are used to control and evaluate Zonal Autoshift practice runs.

Outcome Alarms (Mandatory)

Outcome alarms define whether a practice run is successful or failed.

  • Evaluated during the practice run.
  • If an alarm enters the ALARM state, the practice run is marked as failed.
  • Used to validate workload stability when traffic is shifted away from an Availability Zone.

Blocking Alarms (Optional)

Blocking alarms prevent practice runs from starting when the environment is already unhealthy.

  • Checked before a practice run starts.
  • If a blocking alarm is in the ALARM state, the practice run does not start.

This helps avoid additional impact during ongoing issues.
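
The roles of the two alarm types can be summarized as a small decision function. This is a sketch of the logic as described in this section, not the service's implementation: the Outcome names and the function evaluate_practice_run are hypothetical, and alarm states are assumed to be supplied as CloudWatch state strings (OK, ALARM, INSUFFICIENT_DATA).

```python
from enum import Enum


class Outcome(Enum):
    BLOCKED = "blocked"      # a blocking alarm was in ALARM before the run
    FAILED = "failed"        # an outcome alarm entered ALARM during the run
    SUCCEEDED = "succeeded"


def evaluate_practice_run(blocking_states, outcome_states_during_run):
    """Classify a practice run from alarm states.

    blocking_states: states of blocking alarms sampled before the run starts.
    outcome_states_during_run: states of outcome alarms observed during the run.
    """
    # Blocking alarms are checked first: an unhealthy environment stops the run.
    if any(state == "ALARM" for state in blocking_states):
        return Outcome.BLOCKED
    # Otherwise the run proceeds and outcome alarms decide pass or fail.
    if any(state == "ALARM" for state in outcome_states_during_run):
        return Outcome.FAILED
    return Outcome.SUCCEEDED
```

Because blocking alarms are optional, an empty blocking_states list simply lets the run proceed to the outcome check.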

Configure Run Windows

Run windows control when automatic Zonal Autoshift practice runs are allowed to execute.

  • Allowed windows define the time periods when practice runs can occur.
  • Blocked windows prevent practice runs from starting during specific time ranges.

Run windows help schedule practice runs during appropriate periods and avoid execution during business-critical or peak usage hours.
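
To make the window semantics concrete, the sketch below checks whether a given UTC time falls inside a Day:HH:MM-Day:HH:MM window. Several points are assumptions rather than documented behavior: that blocked windows use the same string format as allowed windows, that a window's end time is exclusive, that a window may wrap past the end of the week, and that a blocked window takes precedence when both kinds are supplied. All function names are hypothetical.

```python
from datetime import datetime, timezone

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def _minute_of_week(day, hour, minute):
    """Minutes elapsed since Monday 00:00."""
    return DAYS.index(day) * 24 * 60 + hour * 60 + minute


def parse_window(window):
    """Convert 'Day:HH:MM-Day:HH:MM' into (start, end) minutes of the week."""
    start_text, end_text = window.split("-")
    points = []
    for text in (start_text, end_text):
        day, hour, minute = text.split(":")
        points.append(_minute_of_week(day, int(hour), int(minute)))
    return points[0], points[1]


def in_window(when, window):
    """True if a timezone-aware datetime falls inside the window (end exclusive)."""
    when = when.astimezone(timezone.utc)  # windows are expressed in UTC
    start, end = parse_window(window)
    now = _minute_of_week(DAYS[when.weekday()], when.hour, when.minute)
    if start <= end:
        return start <= now < end
    return now >= start or now < end  # window wraps past Sunday midnight


def practice_run_allowed(when, allowed, blocked=()):
    """Assumed rule: outside every blocked window and, if any allowed windows
    are set, inside at least one of them."""
    if any(in_window(when, w) for w in blocked):
        return False
    return not allowed or any(in_window(when, w) for w in allowed)
```

For example, with the allowed window Mon:00:00-Mon:01:00 from the specification above, a time of Monday 00:30 UTC is permitted and Monday 01:00 UTC is not under the exclusive-end assumption.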

Practice Run Behavior

When autoshift is enabled:

  • Practice runs occur approximately once per week.
  • Each run lasts about 30 minutes.
  • During a run, traffic shifts away from a single Availability Zone.
  • Practice runs help validate application resilience without waiting for a real zonal event.