Configure External Storage for Air-Gapped Controllers

Note: This document is a reference guide that demonstrates the configuration steps using Ceph and AWS EBS/EFS as example CSI storage solutions. Any external storage backend that provides a CSI driver is supported by the Rafay Controller, so you can use this guide as a template for other CSI-compatible options such as GCP Persistent Disk/Filestore, Azure Disk/Files, or NFS.

Rafay Controller supports both internal (OpenEBS) and external storage backends for dynamic PVC provisioning. By default, the controller installs and uses OpenEBS for local storage.

If your environment requires a centralized or distributed storage solution such as Ceph, NFS, or any CSI-compatible backend, you can switch to external storage.


Prerequisites

Before enabling external storage, ensure the following prerequisites are met:

Step 1: Verify CSI Driver Installation

CSI drivers must already be installed on the Rafay Controller. Verify installation with:

kubectl get csidrivers
kubectl get pods -n kube-system | grep csi

Important: Ensure CSI driver pods are in Running state before proceeding.

Step 2: Verify StorageClass Availability

Required StorageClasses must already exist on the Controller. Verify with:

kubectl get sc

Note: Both RWO (ReadWriteOnce) and RWX (ReadWriteMany) classes must exist if you plan to use both access modes.
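
If a required class is missing, create it through your CSI driver's tooling. As a minimal sketch, an RWX-capable StorageClass backed by the upstream NFS CSI driver might look like the following; the server address and export path are placeholders for your environment:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-rwx
provisioner: nfs.csi.k8s.io          # provisioner name used by the upstream NFS CSI driver
parameters:
  server: nfs.example.internal       # NFS server address (placeholder)
  share: /exports/controller         # exported path (placeholder)
reclaimPolicy: Delete
volumeBindingMode: Immediate
mountOptions:
  - nfsvers=4.1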

Step 3: Verify Network Connectivity

Ensure the Controller nodes can reach the external storage backend. Test connectivity from a node:

# For Ceph
telnet <ceph-mon-ip> 6789
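
The same check applies to other backends; test whichever port your storage service listens on (endpoints are placeholders):

# For NFS or AWS EFS (NFS protocol)
telnet <nfs-or-efs-endpoint> 2049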

Step 4: Verify CSI Driver Readiness

Verify that the CSI driver object is registered and inspect its settings, for example whether volume attach is required:

kubectl get csidriver <driver-name> -o jsonpath='{.spec.attachRequired}'
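
You can also confirm that the driver has registered on each node by inspecting the CSINode objects:

kubectl get csinodes
kubectl get csinode <node-name> -o jsonpath='{.spec.drivers[*].name}'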

Supported External Storage Backends

Rafay Controller supports any CSI-compatible storage backend that provides a Kubernetes StorageClass. This includes, but is not limited to:

  • Ceph RBD / CephFS - distributed block and file storage (e.g., Rook Ceph)
  • AWS EBS / EFS - Amazon managed block and file storage
  • GCP PD / Filestore - Google Cloud managed storage solutions
  • Azure Disk / Azure Files - Microsoft Azure managed storage
  • NFS - network file system (when CSI NFS driver is installed)
  • Any other CSI-compatible backend that provides a Kubernetes StorageClass

The examples in this document use Rook Ceph as a reference, but you can apply the same configuration principles to any CSI-compatible storage solution.


Configuration Steps

Step 1: Install CSI Storage Drivers

Critical: After executing radm init (which installs Kubernetes on the Controller nodes), install your required CSI storage drivers on the cluster before installing the Controller dependencies.

  1. Install your CSI storage drivers on the cluster
  2. Ensure the CSI pods and StorageClasses are available and healthy
  3. Only then proceed with radm dependency

Warning: If you proceed with radm dependency before CSI drivers are installed, the controller will fail due to missing storage dependencies.
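
For illustration, an in-cluster Ceph deployment with Rook might be installed via Helm roughly as shown below. The repository and chart names come from the upstream Rook project and may differ for your version; in an air-gapped environment the charts and container images must first be mirrored to your local registry. Treat this as a sketch and follow your storage vendor's installation guide.

# Example only: Rook Ceph operator and cluster charts (mirror charts/images locally first)
helm repo add rook-release https://charts.rook.io/release
helm install --create-namespace --namespace rook-ceph rook-ceph rook-release/rook-ceph
helm install --namespace rook-ceph rook-ceph-cluster rook-release/rook-ceph-cluster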

Step 2: Configure External Storage in config.yaml

Configure the following parameters in your config.yaml before executing dependencies:

storage:
  external:
    enabled: true  # When true, the internal OpenEBS storage is not installed
  storageClass:
    readWriteOnce: "ceph-rbd"     # RWO StorageClass (ex: ceph-rbd, gp2, openebs-hostpath)
    readWriteMany: "ceph-cephfs"  # RWX StorageClass (ex: ceph-cephfs, efs-sc, openebs-kernel-nfs)

Important Notes:

  • Both RWO and RWX classes are recommended for full functionality
  • The controller will fail to start if the specified StorageClasses do not exist
  • Ensure the StorageClass names exactly match what exists in your cluster
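
Before moving on, a quick pre-flight check such as the following confirms that the classes named in config.yaml actually exist (the class names below are the example values used in this guide):

for sc in ceph-rbd ceph-cephfs; do
  kubectl get sc "$sc" >/dev/null 2>&1 || echo "Missing StorageClass: $sc"
done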

Step 3: Install Controller Dependencies

Run the following command to install controller dependencies with external storage enabled:

sudo radm dependency --config config.yaml

Configuration Examples

The following examples demonstrate how to configure external storage using different CSI-compatible backends. These are reference examples—you can adapt them to your specific storage solution.

Example 1: Ceph (RBD + CephFS) - Reference Example

storage:
  external:
    enabled: true
  storageClass:
    readWriteOnce: "ceph-rbd"
    readWriteMany: "ceph-cephfs"

Pre-configuration requirements:

  1. Ceph cluster must be healthy: ceph health should report HEALTH_OK
  2. Verify cephx credentials are correctly configured in CSI secrets
  3. Ensure RBD and CephFS are enabled in your Ceph cluster
  4. Validate pool/filesystem exists:
    ceph osd pool ls
    ceph fs ls
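
You can also confirm the Ceph CSI components and StorageClasses are present; the rook-ceph namespace below is an assumption based on a default Rook installation, so adjust it if your Ceph CSI driver is deployed elsewhere:

kubectl get pods -n rook-ceph | grep csi
kubectl get sc ceph-rbd ceph-cephfs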
    

Example 2: AWS (EBS + EFS) - Reference Example

storage:
  external:
    enabled: true
  storageClass:
    readWriteOnce: "gp2"
    readWriteMany: "gp2"

Pre-configuration requirements:

  1. AWS EBS CSI driver must be installed with proper IAM permissions
  2. EFS CSI driver must be installed
  3. Verify IAM role has ec2:AttachVolume, ec2:CreateVolume permissions for EBS
  4. EFS security groups must allow NFS traffic (port 2049) from worker nodes
  5. Validate EFS is accessible:
    # EFS uses NFSv4, so showmount is not supported; test the NFS port instead
    telnet <efs-dns-name> 2049
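
If the AWS CSI drivers were installed with their upstream defaults, you can confirm their pods are running with a check like the one below; the pod name prefixes are assumptions based on the upstream Helm charts:

kubectl get pods -n kube-system | grep -E 'ebs-csi|efs-csi'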
    

Validation Steps

Step 1: Validate StorageClass

Verify that your required StorageClasses exist and are correctly configured:

kubectl get sc
kubectl describe sc <rwo-class-name>
kubectl describe sc <rwx-class-name>

Ensure the output shows:

  • Provisioner matches your CSI driver (e.g., pd.csi.storage.gke.io, rbd.csi.ceph.com)
  • Parameters are correctly set for your backend
  • Reclaim Policy is set appropriately (usually Delete or Retain)

Step 2: Inspect CSI Driver Configuration

Verify your CSI driver is properly configured:

kubectl get storageclasses -o yaml | grep -A 10 provisioner
kubectl get csidrivers -o yaml
kubectl get pods -n kube-system | grep csi

Check for any driver-related errors or misconfigurations.
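
Step 3: Provision a Test PVC (Optional)

As an end-to-end check, you can create a small test PVC against your RWO StorageClass and confirm it binds. This is a minimal sketch; ceph-rbd is the example class name from this guide and should be replaced with your own:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: external-storage-test
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ceph-rbd
  resources:
    requests:
      storage: 1Gi
EOF

kubectl get pvc external-storage-test
kubectl delete pvc external-storage-test

Note: If the StorageClass uses WaitForFirstConsumer volume binding, the PVC will remain Pending until a pod consumes it.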


Troubleshooting

Issue: PVC Stuck in Pending

A PVC in Pending state indicates the storage backend cannot provision a volume.

Diagnostic commands:

# Check PVC status and events
kubectl describe pvc <pvc-name>

# Look for specific error messages
kubectl get events | grep <pvc-name>

# Check if StorageClass exists
kubectl get sc <storage-class-name>

# Verify CSI driver is running
kubectl get pods -n kube-system | grep csi

Common causes and solutions:

  1. StorageClass does not exist:
    • Verify the StorageClass name in your config matches exactly with kubectl get sc
    • StorageClass names are case-sensitive
  2. CSI driver missing or not ready:
    • Ensure the CSI driver pods are in Running state:
      kubectl get pods -n kube-system | grep -i csi
      kubectl logs -n kube-system <csi-driver-pod> | grep -i error
      
  3. Backend unreachable:
    • Verify network connectivity from Controller nodes to the storage backend. Test from a node:
      kubectl debug node/<node-name> -it --image=ubuntu
      
      From the debug pod (install tools first: apt-get update && apt-get install -y iputils-ping telnet):
      ping <storage-backend-ip>
      telnet <storage-backend-ip> <port>
      
  4. Wrong pool/share configuration:
    • For Ceph: Verify the pool exists with ceph osd pool ls and RBD is enabled
  5. Missing credentials:
    • For Ceph: Verify cephx keys are in the CSI secret:
      kubectl get secret -n kube-system csi-rbd-secret -o yaml
      
    • For AWS: Verify IAM role has required permissions: ec2:CreateVolume, ec2:AttachVolume, ec2:DeleteVolume
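
If the events point at provisioning itself, the external-provisioner sidecar logs are usually the most informative. The container name csi-provisioner is the conventional sidecar name but can differ per driver, and the namespace depends on where your driver is installed:

kubectl get pods -A | grep -i provisioner
kubectl logs -n <csi-namespace> <csi-controller-pod> -c csi-provisioner | grep -i error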

Issue: Pod Stuck in ContainerCreating (Mount Issues)

When a pod is stuck in ContainerCreating, the kubelet cannot attach or mount the volume backing the PVC.

Diagnostic commands:

# Check pod events
kubectl describe pod <pod-name>

# Check kubelet logs for mount errors (run on the affected node; kubelet is not a pod)
journalctl -u kubelet | grep -i mount

# Verify PVC is bound
kubectl get pvc <pvc-name>

# Check if volume is attached to the node
kubectl get volumeattachments | grep <pvc-name>

Common causes and solutions:

  1. Node cannot connect to the storage backend:
    • The Controller nodes cannot reach the storage backend. Verify network connectivity:
      # From the node
      ping <storage-backend-ip>
      telnet <storage-backend-ip> <port>
      
  2. Firewall/security group issues:
    • Ensure firewall rules allow traffic from Controller nodes to the storage backend:
      • Ceph: Ports 6789 (mon), 6800-7300 (osd/mds)
      • EBS: AWS API access through EC2 instance role
      • EFS: Port 2049 from security group
  3. Ceph MON servers down:
    • Check backend health:
      # For Ceph
      ceph status
      ceph quorum_status
      
  4. Missing kernel modules:
    • Required kernel modules must be loaded on Controller nodes:
      # Check for RBD module
      lsmod | grep rbd
      modprobe rbd
      
  5. Wrong fstype:
    • Verify the StorageClass specifies the correct filesystem type:
      # Check parameters, especially fstype
      kubectl describe sc <storage-class-name>
      
  6. CSI driver plugin not running on node:
    • Ensure CSI node plugins are deployed on all worker nodes:
      kubectl get daemonset -n kube-system | grep csi
      kubectl get pods -n kube-system -o wide | grep csi