Overview

Notebooks provide web-based development environments where data scientists and ML engineers can develop and experiment with models using popular notebook interfaces such as Jupyter.

Notebooks are deployed to and operated on the platform's Kubernetes clusters. Each notebook server runs as a container inside a Kubernetes Pod, which means the type of IDE (and which packages are installed) is determined by the container image you choose for your server.
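As a sketch of what "a container inside a Kubernetes Pod" looks like in practice, Kubeflow-style platforms typically describe a notebook server with a custom resource whose Pod template names the container image. The resource name, namespace, and image tag below are illustrative assumptions, not values prescribed by this platform:

```yaml
# Hypothetical example of a Kubeflow-style Notebook resource.
# Name, namespace, and image are placeholders -- substitute your own.
apiVersion: kubeflow.org/v1
kind: Notebook
metadata:
  name: my-notebook        # illustrative server name
  namespace: my-team       # illustrative namespace
spec:
  template:
    spec:
      containers:
        - name: my-notebook
          # The image determines the IDE (e.g. JupyterLab) and the
          # packages preinstalled in the environment.
          image: kubeflownotebookswg/jupyter-scipy:latest
```

Swapping the `image` field for a custom image is how project-specific environments are provisioned without changing anything else about the server.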

Notebooks allow users to:

  • Write and run code interactively for data analysis, preprocessing, and model training.
  • Access powerful Kubernetes-based infrastructure directly from the notebook, providing scalability and resource allocation tailored to machine learning workflows.
  • Integrate with other components like Pipelines, Katib (hyperparameter tuning), and KFServing (model serving), which enables a smooth transition from development to production.

Key Features

This section describes some of the key features of notebooks in the MLOps platform.

  1. Seamless Jupyter Notebook Integration: A native interface for managing and creating Jupyter notebooks, making it easy to leverage familiar tools for data exploration and model building.

  2. Scalable Infrastructure: By running on top of Kubernetes, the notebooks can dynamically allocate CPU, GPU, and memory resources as needed, allowing you to scale your notebook environments for more demanding workloads.

  3. Multi-User and Collaborative Environment: Support for multi-user environments with proper role-based access control (RBAC), enabling secure, isolated access to notebook servers for multiple data scientists within a team or organization.

  4. Notebook Customization: Users can easily customize their notebook environment by selecting pre-configured images with necessary libraries or creating their own custom images for specific projects or tasks.

  5. Integration with MLOps Pipelines: Notebooks can be integrated with Pipelines, making it easy to convert your code into reusable pipelines for end-to-end machine learning workflows.

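To illustrate the scalable-infrastructure point above, CPU, GPU, and memory allocation for a notebook container uses standard Kubernetes resource requests and limits. The values below are placeholder assumptions, and the GPU resource name assumes the NVIDIA device plugin is installed on the cluster:

```yaml
# Hypothetical resource allocation for a notebook container.
# All values are illustrative; adjust to your workload and quotas.
resources:
  requests:
    cpu: "2"
    memory: 8Gi
  limits:
    cpu: "4"
    memory: 16Gi
    nvidia.com/gpu: 1   # schedules the notebook Pod onto a GPU node
```

Because these are ordinary Pod-level settings, the Kubernetes scheduler handles placement, and quotas or limit ranges in the namespace bound what a single notebook can claim.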

Benefits

  • Consistency Across Environments: Since these notebooks run on top of Kubernetes, they provide a consistent, reproducible environment for development, testing, and deployment. This helps eliminate discrepancies between local development and production.

  • Collaboration: Multi-user support and the ability to share notebooks foster collaboration across teams. Data scientists and engineers can work in parallel on the same platform, sharing insights and models more easily.

  • Resource Optimization: Kubernetes' native scheduling and scaling ensure that your notebooks only use the resources they need, and they can scale up or down based on workload demand, optimizing cloud resource usage.

  • End-to-End ML Workflow Integration: Notebooks can be integrated into Pipelines, allowing you to take code written in a notebook and turn it into reusable components that can be executed as part of a CI/CD pipeline for machine learning workflows.

Important

Idle notebooks are automatically shut down (i.e. culled) and their resources returned to the common resource pool. Users can restart their notebooks at any time.
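On Kubeflow-based platforms, a stopped (culled) notebook is often represented by an annotation on the Notebook resource, and removing that annotation restarts the server. The annotation name below follows Kubeflow's convention and is an assumption; verify the exact mechanism for this platform:

```yaml
# Hypothetical: a culled notebook carries a stop annotation.
# Removing the annotation restarts the server (Kubeflow convention;
# confirm the annotation name on your platform).
metadata:
  annotations:
    kubeflow-resource-stopped: "2024-01-01T00:00:00Z"   # illustrative timestamp
```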