DL MLOps Pipeline
Please review the overview section for details about the end-to-end Deep Learning MLOps pipeline you will implement in the steps below.
Step 1: Login¶
In this step, you will log in to your MLOps Platform.
- Navigate to the URL (This will be provided by your platform team)
- Log in using your local credentials or SSO credentials (via an identity provider such as Okta)
Once logged in, you will see the home dashboard screen.
Step 2: Create a Notebook¶
In this step, you will create a Jupyter Notebook. The Rafay MLOps platform, based on Kubeflow, lets you use notebooks to create Pipelines in which each step runs in its own container.
ML orchestration in Kubeflow is done through Pipelines. A Pipeline is a multi-step workflow in which each step runs in its own container, with a well-defined way for data and objects to flow between steps. This allows containers to be reused across different Pipelines with little or no extra code. When Kubeflow Pipelines runs a Component, a container image is spawned in a Kubernetes Pod and the component’s inputs are passed in as command-line arguments. When the component has finished, its outputs are returned as files. These input arguments and output files are known as artifacts.
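To make this concrete, here is a minimal sketch of a two-step pipeline written with the KFP v2 Python SDK; the component names and logic are illustrative and are not the lab's actual code.

```python
from kfp import dsl


@dsl.component(base_image="python:3.11")
def preprocess(text: str, cleaned: dsl.OutputPath(str)):
    # Inputs arrive as command-line arguments; outputs are written to
    # files (artifacts) that Kubeflow passes to downstream steps.
    with open(cleaned, "w") as f:
        f.write(text.strip().lower())


@dsl.component(base_image="python:3.11")
def train(cleaned: dsl.InputPath(str)):
    # Reads the artifact produced by the previous step.
    with open(cleaned) as f:
        print("training on:", f.read())


@dsl.pipeline(name="two-step-example")
def two_step_example(text: str = "Hello World"):
    # Each call below runs in its own container in a Kubernetes Pod.
    step1 = preprocess(text=text)
    train(cleaned=step1.outputs["cleaned"])
```

Because each component is just a containerized function with declared inputs and outputs, the same component can be wired into other pipelines unchanged.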
We will use this notebook to convert our Python-based pipeline code into Kubeflow Pipelines–compatible code.
- Navigate to Notebooks
- Click New Notebook
- Enter a name for the notebook
- Select JupyterLab
- Select kubeflownotebookswg/jupyter-scipy:v1.8.0 as the custom notebook image
- Set the minimum CPU to 1 core
- Set the minimum memory to 1 Gi
- Click Advanced Options under Data Volumes
- Select Configurations -> Allow access to Kubeflow Pipelines
- Click Launch
It will take 1-2 minutes to create the notebook.
Step 3: Generate Pipeline¶
In this step, you will build a model life cycle pipeline.
- Navigate to Notebooks
- Click Connect on the previously created notebook
- Click Terminal to open the terminal
- Enter the following command in the terminal to install kfp-kubernetes
```
pip install --user kfp-kubernetes
```
- Download the following notebook file
- In the left-hand folder tree, click the upload files icon
- Upload the previously downloaded dl-notebook.ipynb file
- Double click the dl-notebook.ipynb file in the folder tree to open the notebook
- Click the run icon
- Click the refresh icon in the folder tree view and you will see a new file named dl-pipeline.yaml
- Right-click the file and select Download to download it to your local machine
Note
The file you downloaded is the Kubeflow Pipelines–compatible code generated from our Python code.
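For orientation, here is a rough sketch of the kind of code such a notebook might contain, assuming the KFP v2 SDK plus the kfp-kubernetes extension installed earlier. The component body, PVC size, and storage class below are illustrative assumptions, not the lab's exact code.

```python
from kfp import compiler, dsl, kubernetes


@dsl.component(base_image="python:3.11")
def train(data_path: str):
    # Placeholder training step; a real step would write TensorBoard
    # logs under data_path for Step 5 to visualize.
    print("training, logging to", data_path)


@dsl.pipeline(name="dl-pipeline")
def dl_pipeline(data_path: str = "/data"):
    # kfp-kubernetes provisions a PVC so steps can share data and
    # TensorBoard logs across their containers.
    pvc = kubernetes.CreatePVC(
        pvc_name_suffix="-tb-example",
        access_modes=["ReadWriteMany"],
        size="1Gi",                      # assumption: adjust per cluster
        storage_class_name="standard",   # assumption: adjust per cluster
    )
    step = train(data_path=data_path)
    kubernetes.mount_pvc(step, pvc_name=pvc.outputs["name"],
                         mount_path="/data")


# Compile to the YAML file that appears in the folder tree.
compiler.Compiler().compile(dl_pipeline, package_path="dl-pipeline.yaml")
```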
Step 4: Create Pipeline¶
In this step, we will upload the pipeline YAML file generated in the previous step.
- Navigate back to the Kubeflow dashboard
- Click Pipelines
- Click Upload pipeline
- Enter a name for the pipeline
- Select Upload a file
- Click Choose file and select the previously downloaded dl-pipeline.yaml file
- Click Create
- Click Experiments (KFP)
- Click Create experiment
- Enter a name for the experiment
- Click Next
Now, we will run the pipeline.
- Click Choose on the Pipeline section and select the previously created pipeline
- Click Choose on the Experiment section and select the previously created experiment
- Enter /data for the data_path parameter
- Click Start
The pipeline will begin to run. After ~15 minutes, you should see that the pipeline has completed successfully.
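As an aside, the same run can be submitted programmatically from the notebook instead of through the UI; this is what the "Allow access to Kubeflow Pipelines" setting from Step 2 enables. A minimal sketch, with a hypothetical experiment name:

```python
import kfp

client = kfp.Client()  # in-cluster; uses the notebook's credentials

run = client.create_run_from_pipeline_package(
    "dl-pipeline.yaml",
    arguments={"data_path": "/data"},  # same parameter set in the UI
    experiment_name="dl-experiment",   # hypothetical experiment name
)
print("Started run:", run.run_id)
```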
Step 5: TensorBoard¶
In this step, we will create a TensorBoard that uses the data written to the PVC used by the pipeline. The TensorBoard will allow us to visualize the accuracy of the model.
- Navigate back to the Kubeflow dashboard
- Click TensorBoards
- Click New TensorBoard
- Enter a name for the TensorBoard
- Select PVC
- Choose the PVC whose name ends in -tb-example
- Select Configurations -> Allow access to Kubeflow Pipelines
- Click Create
- Once the TensorBoard is created, click Connect
You will be redirected to the TensorBoard where you will be able to view the results of the model training.
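For context, the sketch below shows roughly how a training step can emit the metrics TensorBoard displays. It assumes TensorFlow/Keras and a tiny synthetic dataset; the log directory stands in for the PVC mount used by the lab's pipeline.

```python
import numpy as np
import tensorflow as tf

# Tiny synthetic dataset so the sketch runs end to end.
x = np.random.rand(64, 8).astype("float32")
y = np.random.randint(0, 2, size=(64,))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# The callback writes loss/accuracy scalars under the mounted volume;
# a TensorBoard pointed at the same PVC renders them.
tb = tf.keras.callbacks.TensorBoard(log_dir="/data/logs")  # assumed mount path
model.fit(x, y, epochs=3, callbacks=[tb])
```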
Recap¶
Congratulations! At this point, you have successfully created a Jupyter notebook, used it to generate and run a pipeline that trained a deep learning model, and visualized the training results with TensorBoard metrics.