Air-Gapped Controller Troubleshooting
This guide covers common issues and their solutions for air-gapped (self-hosted) Rafay controllers.
Cluster Issues
Cluster Dashboard Not Loading
Problem: Cluster dashboards may fail to load when TimescaleDB is unstable and interrupts the flow of metrics.
Troubleshooting Steps:
- Check if required pods are running:
kubectl get pods -A | grep timescale
kubectl get pods -A | grep promscale
- If pods are unhealthy:
- Review pod logs:
kubectl logs -n <namespace> <timescale-pod-name>
- Restart TimescaleDB pods:
kubectl delete pod <timescale-pod-name> -n <namespace>
Tip
Look for connection timeouts or OOM (out of memory) errors in the logs.
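The checks above can be sketched as two small shell helpers. This is a minimal sketch, not part of the product: the function names `filter_unhealthy` and `scan_logs` are hypothetical, the column positions assume the default `kubectl get pods -A --no-headers` layout, and the log patterns are only a guess at what timeout and OOM messages typically look like.

```shell
# Reads `kubectl get pods -A --no-headers` output on stdin and prints the
# TimescaleDB/Promscale pods that are not fully ready.
# Assumed columns with -A: NAMESPACE NAME READY STATUS RESTARTS AGE
filter_unhealthy() {
  awk '
    $2 ~ /timescale|promscale/ {
      if ($4 == "Completed") { next }      # finished jobs are not a problem
      split($3, ready, "/")                # READY looks like "1/2"
      if ($4 != "Running" || ready[1] != ready[2]) {
        print $1 "/" $2, $4, $3            # namespace/pod, status, ready count
      }
    }'
}

# Reads pod logs on stdin and prints lines that hint at timeouts or OOM.
# The patterns are illustrative assumptions, not an exhaustive list.
scan_logs() {
  grep -iE 'oom[- ]?kill|out of memory|timeout|timed out' || true
}

# Usage against a live cluster:
#   kubectl get pods -A --no-headers | filter_unhealthy
#   kubectl logs -n <namespace> <timescale-pod-name> | scan_logs
```

Both helpers read from standard input, so they can also be run against saved command output when the cluster is not directly reachable.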
MKS Upgrade Failures During Preflight Checks
Problem: MKS upgrades may fail intermittently with control channel status fluctuations:
Status of control channel for <Node_name>: Down
Status of control channel for <Node_name>: Up
Status of control channel for <Node_name>: Up
Status of control channel for <Node_name>: Up
Solution:
- SSH into the affected MKS node
- Restart the required services:
sudo systemctl restart salt-minion
sudo systemctl restart chisel.service
Note
Wait a few seconds after restarting services before re-attempting the upgrade.
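As a rough sketch, the restart and the short wait can be wrapped in one helper that also confirms the services came back. The function name `restart_and_verify` and the `WAIT_SECONDS` variable are hypothetical, and the 10-second default is an arbitrary assumption standing in for "a few seconds".

```shell
# Restart the given systemd services, pause, then confirm each one is active.
# WAIT_SECONDS is a hypothetical knob; 10 is an assumed default.
restart_and_verify() {
  for svc in "$@"; do
    sudo systemctl restart "$svc" || return 1
  done
  sleep "${WAIT_SECONDS:-10}"
  for svc in "$@"; do
    if systemctl is-active --quiet "$svc"; then
      echo "$svc: active"
    else
      echo "$svc: NOT active"
      return 1
    fi
  done
}

# Usage on the affected MKS node:
#   restart_and_verify salt-minion chisel.service
```

If either service reports as not active, inspecting it with `systemctl status` before retrying the upgrade is likely more productive than restarting again.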
ZTKA Connection Issues
Problem: The cluster cannot be accessed through the ZTKA channel (kubectl hangs or times out).
Troubleshooting Steps:
- Verify all cluster pods are running
- Check relay-agent pod logs:
kubectl logs -n rafay-system -l app=relay-agent
kubectl logs -n rafay-system -l app=v2-relay-agent
Common Causes:
- Firewall rules blocking outbound controller connections
- DNS or networking issues in restricted environments
Solution:
Restart the relay-agent pod:
kubectl delete pod -n rafay-system -l app=relay-agent
Tip
After restart, verify kubectl connectivity through ZTKA channel.
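The post-restart verification might look like the sketch below. The helper names are hypothetical, the 120-second and 15-second timeouts are arbitrary assumptions, and the path passed to `check_ztka_access` is assumed to be a ZTKA kubeconfig that you already have on disk.

```shell
# Wait for relay-agent pods to become Ready after the restart.
# Run this with a kubeconfig that reaches the cluster directly.
# $1: optional timeout (assumed default: 120s)
wait_for_relay_agent() {
  kubectl wait pod -n rafay-system -l app=relay-agent \
    --for=condition=Ready --timeout="${1:-120s}"
}

# Probe the cluster through the ZTKA kubeconfig with a bounded request
# timeout, so a hanging channel fails fast instead of blocking the shell.
# $1: path to the ZTKA kubeconfig file
check_ztka_access() {
  if kubectl --kubeconfig "$1" get nodes --request-timeout=15s >/dev/null 2>&1; then
    echo "ZTKA access OK"
  else
    echo "ZTKA access FAILED"
    return 1
  fi
}

# Usage:
#   wait_for_relay_agent 120s
#   check_ztka_access /path/to/ztka-kubeconfig.yaml
```

If the probe still fails after the pods are Ready, the firewall and DNS causes listed above are the next things to rule out.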
Controller Issues
TLS & Certificate Issues
Problem: UI access fails with certificate errors or container image pulls fail due to TLS validation.
Solution: Replace/renew expired TLS certificates using the following steps:
- Prepare Certificate Files
  - Generate a new TLS certificate and private key for the wildcard controller domain (e.g., *.controller.example.com)
  - Save the certificate chain as tls.crt
  - Save the private key as tls.key
- Back Up Existing TLS Secrets
kubectl get secret admin-ingress-certs -n istio-system -o yaml > admin-ingress-certs.yaml
kubectl get secret rafay-container-registry-tls-secret-opaque -n rafay-core -o yaml > rafay-container-registry-tls-secret-opaque.yaml
- Update TLS Secrets
kubectl create secret generic admin-ingress-certs \
  --from-file=tls.crt=tls.crt \
  --from-file=tls.key=tls.key \
  -n istio-system -o yaml --dry-run=client | kubectl apply -f -
kubectl create secret generic rafay-container-registry-tls-secret-opaque \
  --from-file=tls.crt=tls.crt \
  --from-file=tls.key=tls.key \
  -n rafay-core -o yaml --dry-run=client | kubectl apply -f -
- Restart Affected Deployments
kubectl rollout restart deployment/istio-ingressgateway -n istio-system
kubectl rollout restart deployment/admin-api -n rafay-core
Note
Check the validity of the controller console URL before and after the certificate replacement to ensure the new certificate is applied correctly and trusted by the browser.
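Before updating the secrets, it can help to confirm that tls.crt and tls.key actually belong together and that the certificate is not about to expire. The following is a minimal sketch using standard openssl subcommands. The function names are hypothetical, and it assumes the leaf certificate is the first entry in tls.crt.

```shell
# Succeeds when the first certificate in $1 and the private key in $2
# share the same public key.
cert_matches_key() {
  cert_pub="$(openssl x509 -in "$1" -noout -pubkey 2>/dev/null)"
  key_pub="$(openssl pkey -in "$2" -pubout 2>/dev/null)"
  [ -n "$cert_pub" ] && [ "$cert_pub" = "$key_pub" ]
}

# Succeeds when the first certificate in $1 is still valid $2 days from now
# (0 days if $2 is omitted, i.e. "not expired yet").
cert_valid_for_days() {
  openssl x509 -in "$1" -noout -checkend "$(( ${2:-0} * 86400 ))" >/dev/null 2>&1
}

# Usage:
#   cert_matches_key tls.crt tls.key && echo "certificate and key match"
#   cert_valid_for_days tls.crt 30   || echo "expires within 30 days"
#
# To inspect the certificate currently stored in a secret:
#   kubectl get secret admin-ingress-certs -n istio-system \
#     -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -enddate
```

Running the same expiry check against the stored secret before and after the replacement gives a quick confirmation that the new certificate was applied.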