Troubleshoot
If you encounter an issue deploying or accessing the environment, use the following steps to resolve it.
- Log into the Rafay console, navigate to Environments -> Environments, and click the MLOps environment.
- If there are any failures, expand the failed activity and review its error messages in detail.
- If the error messages relate to the variables that were entered, edit the environment variables at the top of the screen and click Save & Deploy to reprovision the environment with the updated variables.
Once the issue has been identified and corrected, go back to the environment and attempt to deploy the environment again.
- If there is a failure deploying one of the application resources (MLflow, Kubeflow or Feast), review the error message within Environment Manager. Correct the issue and deploy the environment again.
- If additional details are needed, go to Infrastructure -> Clusters and open the Resources tab of the deployed cluster. Select Pods in the left-hand pane and find any pods that are not in a Running state.
- For each pod that is not in a Running state, select the Actions button for that pod and select events to see whether any issues are reported.
- For each pod that is not in a Running state, select the Actions button for that pod and select shell and logs -> logs. Review the logs to determine why the pod is not in a Running state.
Once the issue has been identified and corrected, go back to the environment and attempt to deploy the environment again.
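The pod check described above can also be scripted. The following is a minimal sketch, assuming you have `kubectl` installed and a kubeconfig with access to the deployed cluster (neither is required by the console steps). The function names `unhealthy_pods`, `fetch_pods`, and `report` are illustrative and not part of any Rafay tooling. It filters on the pod phase only, so a pod whose phase is Running but whose containers are restarting will not be listed.

```python
import json
import subprocess

# Phases that do not need attention: Running pods, and pods that ran to completion.
HEALTHY_PHASES = {"Running", "Succeeded"}


def unhealthy_pods(pod_list: dict) -> list:
    """Return (namespace, name, phase) for every pod whose phase is not healthy."""
    results = []
    for pod in pod_list.get("items", []):
        phase = pod.get("status", {}).get("phase", "Unknown")
        if phase not in HEALTHY_PHASES:
            meta = pod.get("metadata", {})
            results.append((meta.get("namespace", ""), meta.get("name", ""), phase))
    return results


def fetch_pods() -> dict:
    """Ask kubectl for all pods in all namespaces as JSON (assumes kubectl access)."""
    completed = subprocess.run(
        ["kubectl", "get", "pods", "--all-namespaces", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(completed.stdout)


def report() -> None:
    """Print one line per pod that is not Running or Succeeded."""
    for namespace, name, phase in unhealthy_pods(fetch_pods()):
        print(f"{namespace}/{name}: {phase}")
```

Calling `report()` prints the pods to investigate. From there, `kubectl describe pod <name> -n <namespace>` shows the pod's events and `kubectl logs <name> -n <namespace>` shows its logs, which matches the events and logs views in the console.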
- If the environment deployed successfully but the MLOps URL cannot be accessed, confirm that the DNS records for the URL's domain point to the cluster's public IP addresses.
- If the environment deployed successfully but the MLOps URL cannot be accessed, confirm that the TLS certificates are valid for the URL's domain.
- If the local account is not working, ensure that you are logging in with the username and password that were entered during deployment.
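The DNS and certificate checks above can be run from any workstation. The sketch below is one possible way to do it using only the Python standard library. It assumes the MLOps URL is served over HTTPS on port 443, and the function names (`resolve_ips`, `days_until_expiry`, `check_endpoint`) are illustrative. Supply your own hostname and the cluster's public IP addresses.

```python
import socket
import ssl
import time
from typing import Optional, Set


def resolve_ips(hostname: str) -> Set[str]:
    """Return the set of IP addresses that DNS currently returns for hostname."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days until a certificate's notAfter timestamp (negative if already expired)."""
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400


def check_endpoint(hostname: str, expected_ips: Set[str]) -> None:
    """Compare DNS answers with the expected IPs, then inspect the served certificate."""
    resolved = resolve_ips(hostname)
    print("DNS returns:", sorted(resolved))
    print("Expected IPs missing from DNS:", sorted(expected_ips - resolved))

    # The default context verifies the chain and the hostname, so an invalid
    # certificate raises ssl.SSLCertVerificationError here.
    context = ssl.create_default_context()
    with socket.create_connection((hostname, 443), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            cert = tls.getpeercert()
    remaining = days_until_expiry(cert["notAfter"])
    print("Certificate days remaining:", round(remaining, 1))
```

For example, `check_endpoint("mlops.example.com", {"203.0.113.10"})` uses a placeholder hostname and IP address. A non-empty "missing" list suggests the DNS records do not yet point at the cluster, and a verification error suggests a problem with the certificate.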
Important
Contact your assigned Rafay customer success representative if you need assistance with further troubleshooting.