Cluster provisioning will fail if issues are detected that cannot be automatically overcome. When this occurs, the user is presented with an error message on the Web Console with a link to download the "error logs".
There are a number of environmental issues that can cause provisioning failure.
Pre Flight Checks¶
Users are strongly recommended to perform "pre-flight checks" on the nodes before initiating provisioning. These checks are designed to quickly detect environmental or configuration issues that would otherwise cause provisioning to fail.
Please initiate provisioning ONLY after the pre-flight checks have passed successfully.
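As a rough illustration of what pre-flight checks typically look at, the sketch below runs a few local sanity checks on a Linux node. The thresholds (2 cores, 4 GiB RAM) and the swap check are illustrative assumptions, not the platform's official minimum requirements; always rely on the product's own pre-flight tooling for the authoritative list.

```shell
#!/bin/sh
# Illustrative node sanity checks (assumed thresholds, not official requirements).

# CPU core count
cores=$(nproc)
if [ "$cores" -ge 2 ]; then
  echo "OK: $cores CPU cores"
else
  echo "WARN: only $cores CPU core(s)"
fi

# Total memory in MiB, read from /proc/meminfo (Linux only)
mem_mib=$(awk '/MemTotal/ {printf "%d", $2/1024}' /proc/meminfo)
if [ "$mem_mib" -ge 4096 ]; then
  echo "OK: ${mem_mib} MiB RAM"
else
  echo "WARN: only ${mem_mib} MiB RAM"
fi

# Kubernetes nodes generally expect swap to be disabled
if [ "$(swapon --show 2>/dev/null | wc -l)" -eq 0 ]; then
  echo "OK: swap disabled"
else
  echo "WARN: swap is enabled"
fi
```

A real pre-flight run would also cover items this sketch cannot check locally, such as outbound connectivity to the controller and required open ports.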
If you are unable to resolve the issue yourself, please contact Support directly or via the private Slack channel provided for your organization. The support organization is available 24x7 and can assist you immediately.
Please make sure that you have downloaded the "error log" file presented at the time of failure, and provide it to the support team for troubleshooting.
Remote Diagnosis and Resolution¶
For customers using the SaaS Controller, note that Ops/Support is actively monitoring your clusters.
With the customer's permission, as long as the nodes are operational (i.e. running), Support can remotely debug, diagnose, and resolve issues. Support will inform the customer whether the underlying issue is due to misconfiguration (e.g. network connectivity) or an environmental problem (e.g. bad storage).
Support DOES NOT require any form of inbound connectivity to perform remote diagnosis and fixes.
Storage Health check¶
Customers using rook-ceph managed storage must deploy the default-upstream blueprint to the cluster.
Step 1: Verify the Blueprint Sync¶
The rook-ceph storage is provided as an add-on in the default-upstream blueprint, so users can verify the rook-ceph managed storage deployment using the blueprint sync icon. Refer to Update Blueprint for more details on blueprint sync status.
Step 2: Verify the pods through Kubectl¶
Once the blueprint sync succeeds, users can view the rook-ceph pods running, as shown in the example below:
```
kubectl -n rook-ceph get pod
NAME                                             READY   STATUS      RESTARTS   AGE
csi-cephfsplugin-4r8c5                           3/3     Running     0          4m1s
csi-cephfsplugin-provisioner-b54db7d9b-mh7mb     6/6     Running     0          4m
csi-rbdplugin-6684r                              3/3     Running     0          4m1s
csi-rbdplugin-provisioner-5845579d68-sq7f2       6/6     Running     0          4m1s
rook-ceph-mgr-a-f576c8dc4-76z96                  1/1     Running     0          3m8s
rook-ceph-mon-a-6f6684764f-sljtr                 1/1     Running     0          3m29s
rook-ceph-operator-75fbfb7756-56hq8              1/1     Running     1          17h
rook-ceph-osd-0-5c466fd66f-8lsq2                 1/1     Running     0          2m55s
rook-ceph-osd-prepare-oci-robbie-tb-mks1-b9t7s   0/1     Completed   0          3m5s
rook-discover-5q2cq                              1/1     Running     1          17h
```
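Rather than eyeballing the pod list, a small helper can count pods that are in an unexpected state. This is a sketch: it assumes the `rook-ceph` namespace shown above and treats `Running` and `Completed` as the only healthy statuses, which matches the example output but may need adjusting for your cluster.

```shell
#!/bin/sh
# Count rook-ceph pods whose STATUS is neither Running nor Completed.

check_pods() {
  # Reads "NAME READY STATUS RESTARTS AGE" lines on stdin and prints
  # the number of pods in an unexpected state (0 means all healthy).
  awk '$3 != "Running" && $3 != "Completed" {bad++} END {print bad+0}'
}

# Example usage against a live cluster (requires kubectl access):
#   kubectl -n rook-ceph get pod --no-headers | check_pods
```

If the count is non-zero, `kubectl -n rook-ceph describe pod <name>` on the offending pods is a reasonable next step before contacting Support.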