ARC Zonal Shift
Overview
Amazon Application Recovery Controller (ARC) Zonal Shift moves traffic away from an Availability Zone (AZ) to improve application resilience during zonal events. The feature supports both manual zonal shifts and automated zonal autoshift.
Zonal Shift is controlled centrally through cluster configuration and managed from the Console after cluster provisioning.
Day-0 Configuration (Cluster Specification)
ARC Zonal Shift can be configured at cluster creation (Day-0) through the cluster specification. The following example shows a cluster specification that includes both Zonal Shift and Auto Zonal Shift configuration.
```yaml
kind: Cluster
metadata:
  name: gopimallela-zs-ui
  project: defaultproject
spec:
  blueprint: minimal
  blueprintversion: 4.1.0
  cloudprovider: uday-qa-acc
  cniprovider: aws-cni
  proxyconfig: {}
  type: eks
---
accessConfig:
  authenticationMode: CONFIG_MAP
addons:
- name: coredns
  version: v1.12.1-eksbuild.2
- name: vpc-cni
  version: v1.20.4-eksbuild.2
- name: kube-proxy
  version: v1.33.3-eksbuild.4
- name: aws-ebs-csi-driver
  version: latest
addonsConfig: {}
apiVersion: rafay.io/v1alpha5
autoModeConfig: {}
autoZonalShiftConfig:
  allowedWindows:
  - Mon:00:00-Mon:01:00
  enabled: true
  outcomeAlarms:
  - arn:aws:cloudwatch:us-west-2:211125364662:alarm:rafay-core-dev-RDSinstanceCPUUtilization
kind: ClusterConfig
managedNodeGroups:
- amiFamily: AmazonLinux2023
  desiredCapacity: 2
  iam:
    withAddonPolicies:
      autoScaler: true
  instanceTypes:
  - t3.xlarge
  maxSize: 2
  minSize: 2
  name: ng-17e9c131
  version: "1.33"
  volumeSize: 80
  volumeType: gp3
- amiFamily: AmazonLinux2023
  desiredCapacity: 2
  iam:
    withAddonPolicies:
      autoScaler: true
  instanceTypes:
  - t3.xlarge
  maxSize: 2
  minSize: 2
  name: ng-2
  nodeRepairConfig:
    enabled: true
    maxParallelNodesRepairedCount: 2
    maxUnhealthyNodeThresholdCount: 1
    nodeRepairConfigOverrides:
    - minRepairWaitTimeMins: 20
      nodeMonitoringCondition: NetworkingReady
      nodeUnhealthyReason: ContainerRuntimeFailed
      repairAction: Replace
    - minRepairWaitTimeMins: 10
      nodeMonitoringCondition: Ready
      nodeUnhealthyReason: ContainerRuntimeFailed
      repairAction: Replace
  version: "1.33"
  volumeSize: 80
  volumeType: gp3
metadata:
  name: gopimallela-zs-ui
  region: us-west-2
  tags:
    email: gopikrishna@rafay.co
    env: dev
  version: "1.33"
vpc:
  cidr: 192.168.0.0/16
  clusterEndpoints:
    privateAccess: true
    publicAccess: false
  nat:
    gateway: Single
zonalShiftConfig:
  enabled: true
```
Zonal Shift Parameters
zonalShiftConfig
```yaml
zonalShiftConfig:
  enabled: true
```
| Parameter | Description |
|---|---|
| `enabled` | Enables ARC Zonal Shift for the cluster. Acts as the central switch required for both manual and automatic zonal shift operations. Can be configured during Day-0 via cluster specification and updated later as a Day-2 operation. Once enabled, zonal shift operations are available from the Console. |
autoZonalShiftConfig
```yaml
autoZonalShiftConfig:
  enabled: true
  outcomeAlarms:
  - arn:aws:cloudwatch:...
  allowedWindows:
  - Mon:00:00-Mon:01:00
```
| Parameter | Description |
|---|---|
| `enabled` | Enables automatic zonal autoshift. When enabled, traffic can be automatically shifted during zonal events and practice runs. |
| `outcomeAlarms` | Mandatory CloudWatch alarm ARNs used to evaluate practice run results. If an alarm enters ALARM state, the practice run is marked as failed. |
| `allowedWindows` | Defines time windows when practice runs are allowed. Format: Day:HH:MM-Day:HH:MM (UTC). |
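The `allowedWindows` format can be checked before a specification is applied. The following is a minimal sketch, not part of the product: `parse_window` is a hypothetical helper, and it assumes that days are written as three-letter abbreviations (`Mon` through `Sun`) and that times use a 24-hour clock, which matches the `Mon:00:00-Mon:01:00` example above.

```python
import re

DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

# One endpoint of a window, for example "Mon:00:00" (assumed three-letter day names).
_POINT = r"(Mon|Tue|Wed|Thu|Fri|Sat|Sun):([01]\d|2[0-3]):([0-5]\d)"
_WINDOW_RE = re.compile(rf"{_POINT}-{_POINT}")


def parse_window(text: str) -> tuple[int, int]:
    """Return (start, end) as minutes since Monday 00:00 UTC.

    Raises ValueError if text is not in Day:HH:MM-Day:HH:MM form.
    """
    match = _WINDOW_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid window {text!r}; expected Day:HH:MM-Day:HH:MM")
    d1, h1, m1, d2, h2, m2 = match.groups()
    start = DAYS.index(d1) * 1440 + int(h1) * 60 + int(m1)
    end = DAYS.index(d2) * 1440 + int(h2) * 60 + int(m2)
    return start, end
```

Calling `parse_window("Mon:00:00-Mon:01:00")` returns `(0, 60)`, while a malformed entry such as `Mon:24:00-Mon:01:00` raises `ValueError`.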
Day-0 vs Day-2 Operations
Day-0 (Cluster Provisioning)
Allowed:
- Configure `zonalShiftConfig.enabled`
- Establish centralized Zonal Shift capability for the cluster.
❗ Important: UI-based zonal shift operations are not supported during Day-0 configuration.
Day-2 (Post-Provision Management)
Available from the Console:
- Enable or disable Zonal Shift
- Start manual zonal shift
- View history
- Configure Auto Zonal Shift
- Add outcome alarms
- Configure practice run windows
- Manage autoshift settings
Managing Zonal Shift in the Console
After cluster creation:
- Navigate to the cluster and open Configuration.
- Locate EKS Zonal Shift and click Manage.
Settings
Shows and controls whether Zonal Shift is enabled for the cluster.
- Shows the current state.
- To change it, enable or disable Zonal Shift for the cluster and click Save.
History
The History tab provides a consolidated view of all zonal shift activities performed on the cluster, including manual shifts, automatic shifts, and practice runs.
Users can filter entries based on:
- Status — Active, Expired, or Canceled
- Type — Zonal Shift, Practice Run, or Zonal Autoshift
This tab helps users:
- Track past and current zonal shift operations.
- Verify whether a shift completed successfully, expired, or was canceled.
- Review operational timelines and execution details for troubleshooting or auditing.
- Understand how traffic-shift actions have impacted the cluster over time.
The History view is useful for monitoring operational health, validating resiliency testing, and maintaining visibility into Day-2 zonal shift activities.
Start (Manual Zonal Shift)
Use Start to initiate a manual zonal shift.
Steps
- Select the Availability Zone to shift traffic away from and set the expiration duration.
- Optionally enter a comment, then click Start to begin the shift.
Behavior
- Traffic moves away from the selected AZ.
- Only one shift can be active per resource at a time.
- The shift expires automatically after the configured duration.
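The behavior rules above (one active shift per resource, automatic expiry) can be modeled in a few lines. This is an illustrative in-memory sketch only: `ShiftTracker` and its methods are hypothetical names and do not correspond to any Rafay or AWS API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


class ShiftTracker:
    """Toy model of manual zonal shifts (hypothetical, for illustration)."""

    def __init__(self) -> None:
        # resource name -> (AZ that traffic is shifted away from, expiry time)
        self._shifts: dict = {}

    def active_shift(self, resource: str, now: Optional[datetime] = None):
        """Return (az, expires_at) if a shift is still active, else None."""
        now = now or datetime.now(timezone.utc)
        shift = self._shifts.get(resource)
        if shift is not None and now >= shift[1]:
            # The shift has passed its expiration, so it no longer applies.
            del self._shifts[resource]
            return None
        return shift

    def start(self, resource: str, away_from_az: str, duration: timedelta,
              now: Optional[datetime] = None) -> datetime:
        """Start a shift and return its expiry time."""
        now = now or datetime.now(timezone.utc)
        if duration <= timedelta(0):
            raise ValueError("expiration duration must be positive")
        if self.active_shift(resource, now) is not None:
            # Only one shift can be active per resource.
            raise RuntimeError(f"{resource} already has an active shift")
        expires_at = now + duration
        self._shifts[resource] = (away_from_az, expires_at)
        return expires_at
```

In this model, a second `start` call for the same resource fails until the first shift has expired, which mirrors the behavior described above.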
Auto Config
Use Auto Config to configure zonal autoshift behavior.
Enable Autoshift
- Enables automatic traffic shifting during zonal events.
- Starts AWS-managed practice runs that validate resiliency.
Configure Alarms
Alarms are used to control and evaluate Zonal Autoshift practice runs.
Outcome Alarms (Mandatory)
Outcome alarms define whether a practice run is successful or failed.
- Evaluated during the practice run.
- If an alarm enters the ALARM state, the practice run is marked as failed.
- Used to validate workload stability when traffic is shifted away from an Availability Zone.
Blocking Alarms (Optional)
Blocking alarms prevent practice runs from starting when the environment is already unhealthy.
- Checked before a practice run starts.
- If a blocking alarm is in the ALARM state, the practice run does not start.
This helps avoid additional impact during ongoing issues.
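The roles of the two alarm types can be summarized as a small decision function. This is a simplified sketch of the rules described in this section, not the actual AWS evaluation logic: `evaluate_practice_run` and `PracticeRunResult` are hypothetical names, and alarm states are passed in as plain strings using the CloudWatch state names (`OK`, `ALARM`, `INSUFFICIENT_DATA`).

```python
from enum import Enum


class PracticeRunResult(Enum):
    BLOCKED = "blocked"      # a blocking alarm prevented the run from starting
    SUCCEEDED = "succeeded"  # no outcome alarm entered ALARM during the run
    FAILED = "failed"        # at least one outcome alarm entered ALARM


def evaluate_practice_run(blocking_states: dict, outcome_states: dict) -> PracticeRunResult:
    """Classify a practice run from alarm states.

    blocking_states: alarm ARN -> state observed before the run starts (may be empty).
    outcome_states:  alarm ARN -> state observed during the run (must not be empty).
    """
    if not outcome_states:
        # Outcome alarms are mandatory.
        raise ValueError("at least one outcome alarm is required")
    if any(state == "ALARM" for state in blocking_states.values()):
        return PracticeRunResult.BLOCKED
    if any(state == "ALARM" for state in outcome_states.values()):
        return PracticeRunResult.FAILED
    return PracticeRunResult.SUCCEEDED
```

Blocking alarms are checked first because they apply before the run begins; outcome alarms only matter once a run has actually started.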
Configure Run Windows
Run windows control when automatic Zonal Autoshift practice runs are allowed to execute.
- Allowed windows define the time periods when practice runs can occur.
- Blocked windows prevent practice runs from running during specific time ranges.
Run windows help schedule practice runs during appropriate periods and avoid execution during business-critical or peak usage hours.
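To reason about whether a given moment falls inside a run window, the window can be compared against the minute of the week in UTC. The sketch below is illustrative and rests on assumptions: `in_window` is a hypothetical helper, the window uses the `Day:HH:MM-Day:HH:MM` format shown for `allowedWindows`, the end of the window is treated as exclusive, and a window whose end precedes its start is taken to wrap across the week boundary.

```python
from datetime import datetime, timezone

DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def _minute_of_week(day: str, hours: str, minutes: str) -> int:
    """Minutes elapsed since Monday 00:00 for a Day, HH, MM triple."""
    return DAYS.index(day) * 1440 + int(hours) * 60 + int(minutes)


def in_window(window: str, when: datetime) -> bool:
    """Return True if the timezone-aware datetime `when` falls inside `window`."""
    start_text, end_text = window.split("-")
    start = _minute_of_week(*start_text.split(":"))
    end = _minute_of_week(*end_text.split(":"))

    when_utc = when.astimezone(timezone.utc)
    # datetime.weekday() returns 0 for Monday, matching the DAYS order.
    now = when_utc.weekday() * 1440 + when_utc.hour * 60 + when_utc.minute

    if start <= end:
        return start <= now < end
    # The window wraps past Sunday midnight (for example Sun:23:00-Mon:01:00).
    return now >= start or now < end
```

The same check could be applied to blocked windows if they follow the same format, which this document does not specify.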
Practice Run Behavior
When autoshift is enabled:
- Practice runs occur approximately once per week.
- Each run lasts about 30 minutes.
- Traffic shifts away from a single Availability Zone.
- Helps validate application resilience without requiring production incidents.






