Configure External Storage for Air-Gapped Controllers¶
Note: This document is a reference guide that demonstrates the configuration steps using Ceph and AWS EBS/EFS as example CSI storage solutions. Any external storage backend that provides a CSI driver is supported by the Rafay Controller; use this guide as a template for other CSI-compatible solutions such as GCP Persistent Disk/Filestore, Azure Disk/Files, or NFS.
Rafay Controller supports both internal (OpenEBS) and external storage backends for dynamic PVC provisioning. By default, the controller installs and uses OpenEBS for local storage.
If your environment requires a centralized or distributed storage solution such as Ceph, NFS, or any CSI-compatible backend, you can switch to external storage.
Prerequisites¶
Before enabling external storage, ensure the following prerequisites are met:
Step 1: Verify CSI Driver Installation¶
CSI drivers must already be installed on the Rafay Controller. Verify installation with:
kubectl get csidrivers
kubectl get pods -n kube-system | grep csi
Important: Ensure CSI driver pods are in Running state before proceeding.
Step 2: Verify StorageClass Availability¶
Required StorageClasses must already exist on the Controller. Verify with:
kubectl get sc
Note: Both RWO (ReadWriteOnce) and RWX (ReadWriteMany) classes must exist if you plan to use both access modes.
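A quick way to confirm that the expected classes exist and are backed by the right driver is to list each class alongside its provisioner:
kubectl get sc -o custom-columns=NAME:.metadata.name,PROVISIONER:.provisioner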
Step 3: Verify Network Connectivity¶
Ensure the Controller nodes have network connectivity to the external storage backend. Test connectivity from a node:
# For Ceph
telnet <ceph-mon-ip> 6789
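If telnet is not available on the air-gapped nodes, nc can be used instead; the ports below are common defaults (Ceph monitor messenger v1, NFS/EFS) and may differ in your environment:
# Ceph monitor
nc -zv <ceph-mon-ip> 6789
# NFS / EFS
nc -zv <nfs-or-efs-endpoint> 2049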
Step 4: Verify CSI Driver Readiness¶
Verify that the CSI driver object is registered and reports its capabilities (for example, whether attach is required):
kubectl get csidriver <driver-name> -o jsonpath='{.spec.attachRequired}'
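You can also confirm that the driver has registered on every node by inspecting the CSINode objects; the jsonpath below prints each node with the drivers registered on it:
# List the CSI drivers registered on each node
kubectl get csinodes -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.drivers[*].name}{"\n"}{end}'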
Supported External Storage Backends¶
Rafay Controller supports any CSI-compatible storage backend that provides a Kubernetes StorageClass. This includes, but is not limited to:
- Ceph RBD / CephFS - distributed block and file storage (e.g., Rook Ceph)
- AWS EBS / EFS - Amazon managed block and file storage
- GCP PD / Filestore - Google Cloud managed storage solutions
- Azure Disk / Azure Files - Microsoft Azure managed storage
- NFS - network file system (when CSI NFS driver is installed)
- Any other CSI-compatible backend that provides a Kubernetes StorageClass
The examples in this document use Rook Ceph as a reference, but you can apply the same configuration principles to any CSI-compatible storage solution.
Configuration Steps¶
Step 1: Install CSI Storage Drivers¶
Critical: After executing radm init (which installs Kubernetes on the Controller nodes), install your required CSI storage drivers on the cluster before installing dependencies on the Controller.
- Install your CSI storage drivers on the cluster
- Ensure the CSI pods and StorageClasses are available and healthy
- Only then proceed with radm dependency
Warning: If you run radm dependency before the CSI drivers are installed, the controller installation will fail due to missing storage dependencies.
Step 2: Configure External Storage in config.yaml¶
Configure the following parameters in your config.yaml before executing dependencies:
storage:
  external:
    enabled: true                    # When true, skip installing internal storage (OpenEBS)
    storageClass:
      readWriteOnce: "ceph-rbd"      # RWO StorageClass (e.g., gp2, openebs-hostpath, standard)
      readWriteMany: "ceph-cephfs"   # RWX StorageClass (e.g., efs-sc, openebs-kernel-nfs, ceph-cephfs)
Important Notes:
- Both RWO and RWX classes are recommended for full functionality
- The controller will fail to start if the specified StorageClasses do not exist
- Ensure the StorageClass names exactly match what exists in your cluster
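Before moving on, it can help to confirm that the class names in config.yaml actually exist on the cluster; the names below are the example values used above:
# Exits non-zero if either configured StorageClass is missing
kubectl get sc ceph-rbd ceph-cephfs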
Step 3: Install Controller Dependencies¶
Run the following command to install controller dependencies with external storage enabled:
sudo radm dependency --config config.yaml
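After the dependency installation completes, a quick sanity check is to confirm that PVCs created by the controller bind against the configured external classes rather than OpenEBS (namespaces and class names vary by environment):
# All controller PVCs should be Bound and reference the external StorageClasses
kubectl get pvc -A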
Configuration Examples¶
The following examples demonstrate how to configure external storage using different CSI-compatible backends. These are reference examples; adapt them to your specific storage solution.
Example 1: Ceph (RBD + CephFS) - Reference Example
storage:
  external:
    enabled: true
    storageClass:
      readWriteOnce: "ceph-rbd"
      readWriteMany: "ceph-cephfs"
Pre-configuration requirements:
- Ceph cluster must be healthy: ceph health should report HEALTH_OK
- Verify cephx credentials are correctly configured in the CSI secrets
- Ensure RBD and CephFS are enabled in your Ceph cluster
- Validate that the pool/filesystem exists:
ceph osd pool ls
ceph fs ls
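If Ceph is deployed with Rook, these checks can typically be run through the Rook toolbox; the rook-ceph namespace and rook-ceph-tools deployment used below are the Rook defaults and may differ in your installation:
# Check overall cluster health from the Rook toolbox
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph health
# List the pools and CephFS filesystems backing the StorageClasses
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd pool ls
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph fs ls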
Example 2: AWS (EBS + EFS) - Reference Example
storage:
  external:
    enabled: true
    storageClass:
      readWriteOnce: "gp2"       # EBS-backed StorageClass
      readWriteMany: "efs-sc"    # EFS-backed StorageClass (example name)
Pre-configuration requirements:
- AWS EBS CSI driver must be installed with proper IAM permissions
- AWS EFS CSI driver must be installed
- Verify the IAM role has ec2:AttachVolume and ec2:CreateVolume permissions for EBS
- EFS security groups must allow NFS traffic (port 2049) from worker nodes
- Validate that EFS is reachable over NFS from the Controller nodes:
nc -zv <efs-dns-name> 2049
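You can also confirm that both AWS CSI drivers are registered with the cluster (ebs.csi.aws.com and efs.csi.aws.com are the standard driver names published by AWS):
# Both drivers should be listed
kubectl get csidriver ebs.csi.aws.com efs.csi.aws.com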
Validation Steps¶
Step 1: Validate StorageClass¶
Verify that your required StorageClasses exist and are correctly configured:
kubectl get sc
kubectl describe sc <rwo-class-name>
kubectl describe sc <rwx-class-name>
Ensure the output shows:
- Provisioner matches your CSI driver (e.g., pd.csi.storage.gke.io, rbd.csi.ceph.com)
- Parameters are correctly set for your backend
- Reclaim Policy is set appropriately (usually Delete or Retain)
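As an illustration, a Ceph RBD StorageClass for ceph-csi typically looks similar to the following; the clusterID, pool, and secret names are placeholders for values from your environment (Rook installations use a namespaced provisioner name such as rook-ceph.rbd.csi.ceph.com):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: <ceph-cluster-id>
  pool: <rbd-pool-name>
  csi.storage.k8s.io/fstype: ext4
  csi.storage.k8s.io/provisioner-secret-name: csi-rbd-secret
  csi.storage.k8s.io/provisioner-secret-namespace: kube-system
  csi.storage.k8s.io/node-stage-secret-name: csi-rbd-secret
  csi.storage.k8s.io/node-stage-secret-namespace: kube-system
reclaimPolicy: Delete
allowVolumeExpansion: true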
Step 2: Inspect CSI Driver Configuration¶
Verify your CSI driver is properly configured:
kubectl get storageclasses -o yaml | grep -A 10 provisioner
kubectl get csidrivers -o yaml
kubectl get pods -n kube-system | grep csi
Check for any driver-related errors or misconfigurations.
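A simple end-to-end check is to create a test PVC against the configured RWO class and confirm that it binds; the PVC name and the ceph-rbd class below are placeholders for your own values:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: csi-test-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ceph-rbd
  resources:
    requests:
      storage: 1Gi
EOF
# The PVC should reach Bound shortly (classes using WaitForFirstConsumer stay Pending until a pod consumes the PVC)
kubectl get pvc csi-test-pvc
# Clean up the test claim
kubectl delete pvc csi-test-pvc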
Troubleshooting¶
Issue: PVC Stuck in Pending¶
A PVC in Pending state indicates the storage backend cannot provision a volume.
Diagnostic commands:
# Check PVC status and events
kubectl describe pvc <pvc-name>
# Look for specific error messages
kubectl get events | grep <pvc-name>
# Check if StorageClass exists
kubectl get sc <storage-class-name>
# Verify CSI driver is running
kubectl get pods -n kube-system | grep csi
Common causes and solutions:
- StorageClass does not exist:
  - Verify the StorageClass name in your config matches exactly with kubectl get sc
  - StorageClass names are case-sensitive
- CSI driver missing or not ready:
  - Ensure the CSI driver pods are in Running state:
    kubectl get pods -n kube-system | grep -i csi
    kubectl logs -n kube-system <csi-driver-pod> | grep -i error
- Backend unreachable:
  - Verify network connectivity from Controller nodes to the storage backend. Test from a debug pod:
    kubectl debug node/<node-name> -it --image=ubuntu
    ping <storage-backend-ip>
    telnet <storage-backend-ip> <port>
- Wrong pool/share configuration:
  - For Ceph: Verify the pool exists with ceph osd pool ls and that RBD is enabled
- Missing credentials:
  - For Ceph: Verify cephx keys are in the CSI secret:
    kubectl get secret -n kube-system csi-rbd-secret -o yaml
  - For AWS: Verify the IAM role has the required permissions: ec2:CreateVolume, ec2:AttachVolume, ec2:DeleteVolume
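When the cause is not obvious from the PVC events, provisioning errors usually also appear in the cluster events and in the logs of your CSI driver's provisioner pod (pod names vary by driver):
# Recent provisioning-related events, newest last
kubectl get events -A --sort-by=.lastTimestamp | grep -i -E 'provision|volume'
# Logs from the CSI provisioner pod for your driver
kubectl logs -n kube-system <csi-provisioner-pod> --all-containers | grep -i -E 'error|failed'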
Issue: Pod Stuck in ContainerCreating (Mount Issues)¶
When a pod is stuck in ContainerCreating state, the container cannot mount the PVC.
Diagnostic commands:
# Check pod events
kubectl describe pod <pod-name>
# Check kubelet logs for mount errors (run on the affected node)
journalctl -u kubelet | grep -i mount
# Verify PVC is bound
kubectl get pvc <pvc-name>
# Check if volume is attached to the node
kubectl get volumeattachments | grep <pvc-name>
Common causes and solutions:
- Node cannot connect to the storage backend:
  - The Controller nodes cannot reach the storage backend. Verify network connectivity:
    # From the node
    ping <storage-backend-ip>
    telnet <storage-backend-ip> <port>
- Firewall/security group issues:
  - Ensure firewall rules allow traffic from Controller nodes to the storage backend:
    - Ceph: Ports 6789 (mon), 6800-7300 (osd/mds)
    - EBS: AWS API access through the EC2 instance role
    - EFS: Port 2049 from the security group
- Ceph MON servers down:
  - Check backend health:
    # For Ceph
    ceph status
    ceph quorum_status
- Missing kernel modules:
  - Required kernel modules must be loaded on Controller nodes:
    # Check for the RBD module
    lsmod | grep rbd
    modprobe rbd
- Wrong fstype:
  - Verify the StorageClass specifies the correct filesystem type:
    # Check parameters, especially fstype
    kubectl describe sc <storage-class-name>
- CSI driver plugin not running on the node:
  - Ensure CSI node plugins are deployed on all worker nodes:
    kubectl get daemonset -n kube-system | grep csi
    kubectl get pods -n kube-system -o wide | grep csi
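If the volume appears attached but the mount still fails, inspecting the VolumeAttachment object for the affected PV often surfaces the error reported by the CSI driver:
# Find the attachment for the affected PV, then inspect its status for errors
kubectl get volumeattachments
kubectl describe volumeattachment <attachment-name>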