Air-Gapped Controller Troubleshooting
This guide covers common issues and their solutions for air-gapped (self-hosted) Rafay controllers.
Cluster Issues
Cluster Dashboard Not Loading
Problem: Cluster dashboards may not load due to TimescaleDB instability affecting metrics flow.
Troubleshooting Steps:
- Check whether the required pods are running:
   kubectl get pods -A | grep timescale
   kubectl get pods -A | grep promscale
- If pods are unhealthy:
  - Review pod logs:
     kubectl logs -n <namespace> <timescale-pod-name>
  - Restart TimescaleDB pods:
     kubectl delete pod <timescale-pod-name> -n <namespace>
Tip
Look for connection timeouts or OOM (out of memory) errors in the logs.
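The checks above can be scripted. The following is a minimal sketch, not part of the Rafay tooling: the function names unhealthy_metrics_pods and scan_pod_logs are hypothetical, the column positions assume the default output of kubectl get pods -A, and the log patterns are illustrative guesses rather than an exhaustive list.

```shell
#!/bin/sh
# List TimescaleDB/Promscale pods that are not fully ready.
# Assumes the default column order: NAMESPACE NAME READY STATUS ...
unhealthy_metrics_pods() {
  kubectl get pods -A --no-headers \
    | grep -E 'timescale|promscale' \
    | awk '$4 == "Completed" { next }
           { split($3, ready, "/"); if ($4 != "Running" || ready[1] != ready[2]) print $1 "/" $2, $4, $3 }'
}

# Print recent log lines that hint at OOM or timeout problems.
# $1 = namespace, $2 = pod name. Patterns are illustrative assumptions.
scan_pod_logs() {
  kubectl logs -n "$1" "$2" --tail=500 2>&1 \
    | grep -iE 'oomkilled|out of memory|timeout|timed out'
}
```

Running unhealthy_metrics_pods prints one line per problem pod as namespace/name, status, and ready count. Pass the namespace and pod name from that output to scan_pod_logs before deciding whether a restart is needed.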
MKS Upgrade Failures During Preflight Checks
Problem: MKS upgrades may fail intermittently with control channel status fluctuations:
Status of control channel for <Node_name>: Down  
Status of control channel for <Node_name>: Up  
Status of control channel for <Node_name>: Up  
Status of control channel for <Node_name>: Up
Solution:
- SSH into the affected MKS node
- Restart the required services:
   sudo systemctl restart salt-minion
   sudo systemctl restart chisel.service
Note
Wait a few seconds after restarting services before re-attempting the upgrade.
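Instead of waiting a fixed interval, the restart can be followed by a poll of the unit state. This is a sketch under assumptions: restart_and_wait is a hypothetical helper, and the 30-second limit is an arbitrary value chosen for illustration. An "active" unit does not by itself guarantee that the control channel is back up, so the preflight status output remains the final check.

```shell
#!/bin/sh
# Restart a systemd unit and poll until it reports "active".
# $1 = unit name. Gives up after roughly 30 seconds (arbitrary limit).
restart_and_wait() {
  unit="$1"
  sudo systemctl restart "$unit" || return 1
  tries=0
  while [ "$tries" -lt 30 ]; do
    if [ "$(systemctl is-active "$unit")" = "active" ]; then
      echo "$unit is active"
      return 0
    fi
    tries=$((tries + 1))
    sleep 1
  done
  echo "$unit did not become active within 30s" >&2
  return 1
}
```

On the affected node, this could be used as restart_and_wait salt-minion && restart_and_wait chisel.service before re-attempting the upgrade.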
ZTKA Connection Issues
Problem: Unable to access the cluster through the ZTKA channel (kubectl hangs or times out).
Troubleshooting Steps:
- Verify all cluster pods are running
- Check relay-agent pod logs:
   kubectl logs -n rafay-system -l app=relay-agent
   kubectl logs -n rafay-system -l app=v2-relay-agent
Common Causes:
- Firewall rules blocking outbound controller connections
- DNS or networking issues in restricted environments
Solution:
Restart the relay-agent pod:
kubectl delete pod -n rafay-system -l app=relay-agent
Tip
After the restart, verify kubectl connectivity through the ZTKA channel.
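Because the original symptom is a hanging kubectl, the verification is easier with a bounded request timeout. The sketch below assumes that the current kubeconfig context is the ZTKA kubeconfig for the affected cluster; check_ztka is a hypothetical helper name and the 15-second timeout is an arbitrary choice.

```shell
#!/bin/sh
# Probe the API server through the current kubeconfig context,
# failing fast instead of hanging. Assumes that context uses ZTKA.
check_ztka() {
  if kubectl get nodes --request-timeout=15s >/dev/null 2>&1; then
    echo "ZTKA channel reachable"
  else
    echo "ZTKA channel unreachable" >&2
    return 1
  fi
}
```

A non-zero return from check_ztka after the relay-agent restart suggests going back to the firewall and DNS causes listed above.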
Controller Issues
TLS & Certificate Issues
Problem: UI access fails with certificate errors or container image pulls fail due to TLS validation.
Solution: Replace/renew expired TLS certificates using the following steps:
- Prepare Certificate Files
  - Generate a new TLS certificate and private key for the wildcard controller domain (e.g., *.controller.example.com)
  - Save the certificate chain as tls.crt
  - Save the private key as tls.key
- Back Up Existing TLS Secrets
   kubectl get secret admin-ingress-certs -n istio-system -o yaml > admin-ingress-certs.yaml
   kubectl get secret rafay-container-registry-tls-secret-opaque -n rafay-core -o yaml > rafay-container-registry-tls-secret-opaque.yaml
- Update TLS Secrets
   kubectl create secret generic admin-ingress-certs \
     --from-file=tls.crt=tls.crt \
     --from-file=tls.key=tls.key \
     -n istio-system -o yaml --dry-run=client | kubectl apply -f -
   kubectl create secret generic rafay-container-registry-tls-secret-opaque \
     --from-file=tls.crt=tls.crt \
     --from-file=tls.key=tls.key \
     -n rafay-core -o yaml --dry-run=client | kubectl apply -f -
- Restart Affected Deployments
   kubectl rollout restart deployment/istio-ingressgateway -n istio-system
   kubectl rollout restart deployment/admin-api -n rafay-core
Note
Check the certificate presented at the controller console URL before and after the replacement to confirm that the new certificate is applied correctly and trusted by the browser.
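The certificate files and the served certificate can also be checked from the command line with openssl. This is a sketch rather than a Rafay-provided procedure: the three function names are hypothetical, the 30-day default is an arbitrary threshold, and the checks assume that the leaf certificate is the first entry in tls.crt.

```shell
#!/bin/sh
# Succeeds if the certificate in $1 is still valid $2 seconds from now.
# Default window: 2592000 seconds (30 days), an arbitrary threshold.
cert_valid_for() {
  openssl x509 -in "$1" -noout -checkend "${2:-2592000}" >/dev/null
}

# Succeeds if the private key in $2 belongs to the certificate in $1,
# by comparing the public key derived from each file.
cert_matches_key() {
  cert_pub="$(openssl x509 -in "$1" -noout -pubkey)" || return 1
  key_pub="$(openssl pkey -in "$2" -pubout)" || return 1
  [ "$cert_pub" = "$key_pub" ]
}

# Print the notBefore/notAfter dates of the certificate served by
# host $1 on port 443.
served_cert_dates() {
  openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null \
    | openssl x509 -noout -dates
}
```

Running cert_valid_for tls.crt && cert_matches_key tls.crt tls.key before updating the secrets catches an expired certificate or a mismatched key early. Running served_cert_dates against the console hostname (for example a hypothetical console.controller.example.com) before and after the rollout shows whether the ingress gateway is serving the new certificate. It does not verify browser trust.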