
Troubleshooting

Cluster provisioning fails if issues are detected that cannot be automatically resolved. When this happens, the Web Console displays an error message with a link to download the "error logs".

Provisioning Failure


Self Diagnosis

There are a number of environmental issues that can cause provisioning failure.

Pre-Flight Checks

We strongly recommend that users perform "pre-flight checks" on the nodes before initiating provisioning. These pre-flight checks are designed to quickly detect environmental or misconfiguration issues that would otherwise cause cluster provisioning to fail.

Please initiate provisioning ONLY after the pre-flight checks have passed successfully.


Support

If you are unable to resolve the issue yourself, please contact Support directly or through the private Slack channel provided for your organization. The support organization is available 24x7 and can assist you immediately.

Please make sure you download the "error log file" linked in the failure message, and provide it to the support team to help with troubleshooting.


Remote Diagnosis and Resolution

If you are using the SaaS Controller, note that Ops/Support actively monitors your clusters.

With the customer's permission, and as long as the nodes are operational (i.e., running), Support can remotely debug, diagnose, and resolve issues. Support will inform the customer whether the underlying issue is due to misconfiguration (e.g., network connectivity) or to environmental problems (e.g., bad storage).

Important

Support DOES NOT require any form of inbound connectivity to perform remote diagnosis and fixes.


Storage Health Check

Customers using rook-ceph storage nodes must deploy the default-upstream blueprint to the cluster.

Step 1: Verify the Blueprint Sync

Because rook-ceph storage is provided as an add-on in the default-upstream blueprint, users can verify the rook-ceph managed storage deployment using the blueprint sync icon. Refer to Update Blueprint for more information about the blueprint sync status.

Step 2: Verify the Pods Using kubectl

After a successful blueprint sync, users can view the running rook-ceph pods, as shown in the example below:

kubectl -n rook-ceph get pod
NAME                                             READY   STATUS      RESTARTS   AGE
csi-cephfsplugin-4r8c5                           3/3     Running     0          4m1s
csi-cephfsplugin-provisioner-b54db7d9b-mh7mb     6/6     Running     0          4m
csi-rbdplugin-6684r                              3/3     Running     0          4m1s
csi-rbdplugin-provisioner-5845579d68-sq7f2       6/6     Running     0          4m1s
rook-ceph-mgr-a-f576c8dc4-76z96                  1/1     Running     0          3m8s
rook-ceph-mon-a-6f6684764f-sljtr                 1/1     Running     0          3m29s
rook-ceph-operator-75fbfb7756-56hq8              1/1     Running     1          17h
rook-ceph-osd-0-5c466fd66f-8lsq2                 1/1     Running     0          2m55s
rook-ceph-osd-prepare-oci-robbie-tb-mks1-b9t7s   0/1     Completed   0          3m5s
rook-discover-5q2cq                              1/1     Running     1          17h
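On clusters with many nodes, the pod list can get long. The small shell helper below is a minimal sketch that scans the output of the command above and prints only the pods that do not look healthy. It is not part of the product. The function name `check_rook_pods` is hypothetical, and the helper assumes the default `kubectl get pod` column layout (NAME, READY, STATUS, ...). It treats a pod as healthy if it is `Running` with all containers ready, or `Completed` (as the `rook-ceph-osd-prepare` pod is in the example).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a product tool): reads `kubectl -n rook-ceph get pod`
# output on stdin, prints pods that look unhealthy, and returns 1 if any are found.
check_rook_pods() {
  awk '
    # Skip the header row and any blank or malformed lines.
    NR == 1 || NF < 3 { next }
    {
      # READY is "ready/total", e.g. "3/3".
      split($2, r, "/")
      # Completed job pods (e.g. rook-ceph-osd-prepare-*) normally show 0/1.
      if ($3 == "Completed") next
      # Anything not Running, or Running with unready containers, is reported.
      if ($3 != "Running" || r[1] != r[2]) {
        print $1, $2, $3
        bad = 1
      }
    }
    END { exit (bad ? 1 : 0) }
  '
}
```

For example: `kubectl -n rook-ceph get pod | check_rook_pods && echo "all rook-ceph pods look healthy"`. Any pod the helper prints (for example, one stuck in `Pending` or `CrashLoopBackOff`) is worth including when you contact Support.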