
Lifecycle

This section describes how users can manage the lifecycle of notebooks in their workspace, where they can create and run multiple notebooks at the same time.


Create Notebook

Follow the steps below to create a new notebook. Once users authenticate successfully, they are logged into their workspace and can access the dashboard.

Configure

  • Navigate to the Notebooks section
  • Click on New Notebook and provide a name

Type

Select the type of notebook server. The following types are supported:

  • JupyterLab
  • Visual Studio Code
  • RStudio

Notebook Type


Image

Choose a base image that includes libraries like TensorFlow or PyTorch, or a custom-built image. This table lists the default images available as part of the platform. Administrators can create and provide custom images for users; please review the related documentation for detailed instructions.

Note

Packages installed by users after spawning a notebook persist for the lifetime of the notebook if they are installed into a PVC-backed directory.


Resources

Specify the CPU, GPU, and memory allocation for the notebook. The following defaults are used if the user does not specify anything:

  • CPU: 0.5
  • Memory: 1 Gi
  • GPUs: None
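The allocation a notebook actually received can be sanity-checked from inside it. Below is a minimal sketch; note that the cgroup path used for the memory limit is an assumption that applies to cgroup v1 hosts only, and `os.cpu_count()` reports the node's CPUs rather than the container's CPU limit.

```python
import os
import shutil

# os.cpu_count() reports the node's CPUs, not the container's CPU
# limit, so treat it as an upper bound.
print("Visible CPUs:", os.cpu_count())

# GPUs are typically visible via the NVIDIA driver tooling.
print("nvidia-smi available:", shutil.which("nvidia-smi") is not None)

# Memory limit enforced by the container runtime (cgroup v1 path;
# this path is an assumption and differs under cgroup v2).
limit_path = "/sys/fs/cgroup/memory/memory.limit_in_bytes"
if os.path.exists(limit_path):
    with open(limit_path) as f:
        print("Memory limit (bytes):", f.read().strip())
```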

Workspace Volume

A workspace volume for a notebook refers to a persistent storage volume that is attached to a notebook server (e.g., Jupyter Notebook). This volume allows the notebook to store data, scripts, models, or other files that persist across sessions and remain available even after the notebook server is stopped or restarted. It provides a way to maintain a consistent working environment for users and ensures that important files are not lost when the notebook is temporarily shut down.

By default, users are provided with a 5GB workspace volume (backed by a PVC, so it persists across sessions) based on the default storage class configured on the host cluster. During notebook creation, users have the option to change these defaults. See the example below for the options available to the user.

Notebook Workspace Volume

The workspace volume is a specific PVC attached to a notebook server where users can store their work. It is commonly used to store:

  • Notebooks: Jupyter or other notebook files.
  • Datasets: Files used during data processing and model training.
  • Scripts: Code, libraries, and custom tools.
  • Model Artifacts: Saved models and checkpoints during training.

The volume is mounted inside the notebook’s file system, making it accessible from within the running notebook environment.

For example, let’s say a data scientist is working on a large machine learning project where they need access to datasets, notebooks, and model checkpoints. Here’s how a workspace volume would be helpful:

  1. They upload their dataset to the workspace volume.
  2. The dataset remains available across multiple notebook sessions, so they don’t have to upload it every time.
  3. They save trained models and checkpoints directly to the volume, allowing them to restart the training from the latest checkpoint if needed.
  4. If the notebook server is stopped or scaled down, the volume retains all the files, ensuring that nothing is lost.
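The checkpoint-resume pattern in the steps above can be sketched as follows. The real workspace mount point depends on the notebook configuration, so a temporary directory stands in for it here.

```python
import json
import os
import tempfile

# Stand-in for the workspace volume mount (the actual path depends on
# the notebook configuration).
workspace = tempfile.mkdtemp()
ckpt_path = os.path.join(workspace, "checkpoint.json")

def save_checkpoint(epoch, loss):
    """Persist training state to the workspace volume."""
    with open(ckpt_path, "w") as f:
        json.dump({"epoch": epoch, "loss": loss}, f)

def load_checkpoint():
    """Resume from the latest checkpoint, or start fresh."""
    if not os.path.exists(ckpt_path):
        return {"epoch": 0, "loss": None}
    with open(ckpt_path) as f:
        return json.load(f)

save_checkpoint(epoch=5, loss=0.42)
# After a notebook restart, training resumes from the saved state:
state = load_checkpoint()
print(state["epoch"])  # 5
```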

Important

For user-installed packages to persist across restarts, please ensure they are installed using `pip install --user <package>`.
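The reason this works: `pip install --user` places packages under the user site-packages directory, which lives inside the user's home directory. When the home directory is backed by the workspace PVC, those packages survive restarts. A quick way to see where they would land:

```python
import site

# pip install --user targets site.USER_SITE, a directory under
# site.USER_BASE (by default inside the user's home directory).
# If that path sits on the PVC-backed workspace volume, user-installed
# packages persist across notebook restarts.
print("user base:", site.USER_BASE)
print("user site:", site.USER_SITE)
```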


Data Volumes

A data volume in a notebook refers to a persistent storage volume that is mounted to the notebook server to store and access data required for machine learning (ML) tasks, such as datasets, model files, or intermediate results. Data volumes allow notebooks to read from and write to persistent storage, ensuring that data is retained across sessions and that larger datasets can be efficiently handled without requiring constant re-uploading.

The data volume is mounted to a specific directory in the notebook’s file system. For example, it might be mounted at /mnt/data, allowing the notebook server to access and store files in that location.
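Reading from and writing to such a mount works like any other directory. A minimal sketch follows; a temporary directory stands in for the `/mnt/data` mount point from the example above so the snippet runs anywhere.

```python
import csv
import os
import tempfile

# Stand-in for the data volume mount point (e.g. /mnt/data).
data_dir = tempfile.mkdtemp()

# Write an intermediate result back to the volume...
out_path = os.path.join(data_dir, "scores.csv")
with open(out_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows([["model", "accuracy"], ["baseline", "0.81"]])

# ...and it remains on the volume for later sessions to read.
with open(out_path) as f:
    rows = list(csv.reader(f))
print(rows[1])  # ['baseline', '0.81']
```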

Users can attach a "new" volume during notebook creation or an "existing" volume that may have been seeded with data.

Notebook Data Volume

Data volumes are especially useful for long-running experiments or workflows where data must be preserved.

  • Handling Large Datasets: Data volumes allow users to store and access large datasets that may not fit within the local storage of the notebook server or may take a long time to upload or download for every session. This is especially beneficial for machine learning tasks that involve significant amounts of data for training and testing.

  • Data Sharing Across Sessions: With a data volume attached to a notebook, the same data can be used across multiple sessions without reloading or reprocessing it. This makes it easy to resume work or continue analysis without losing progress.

  • Separation of Compute and Storage: Data volumes allow for the separation of compute (the notebook server) and storage (the data volume). This allows you to scale, stop, or restart the notebook server without affecting the data.

  • Collaboration: If the data volume is mounted as a shared volume, multiple users can access the same datasets from different notebooks, facilitating collaboration. For example, a team of data scientists can work on the same project using the same datasets, without needing to create multiple copies.


Connect to Notebook

Once the notebook has been configured, click Launch.

Depending on the image type selected, it can take a few seconds to a few minutes for the notebook to become ready for use. A Kubernetes Ingress is automatically created for the notebook server's web interface.

Click on Connect to launch the Jupyter Notebook interface.

Connect Notebooks

This launches a new browser tab and redirects the already-authenticated user to their notebook via the Ingress. Users can start developing and running machine learning code directly in the browser from this point onwards. Within the notebook, users can:

  • Perform data preprocessing and visualization.
  • Train machine learning models using distributed computing resources.
  • Experiment with hyperparameters, using the Katib integration for tuning.
  • Export or serve trained models directly from the notebook using KFServing.
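The first of these activities, data preprocessing, might start as simply as the sketch below. The data is inlined so the snippet runs without any attached volume; in practice it would be read from the workspace or data volume.

```python
import csv
import io
import statistics

# Inlined sample data standing in for a file on an attached volume.
raw = "age,income\n31,52000\n45,61000\n29,48000\n"

# Parse the CSV and compute a simple summary statistic.
rows = list(csv.DictReader(io.StringIO(raw)))
mean_income = statistics.mean(int(r["income"]) for r in rows)
print(mean_income)  # approx. 53666.67
```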

Shutdown Notebook

Users can click on the "shutdown" icon in their notebook listing page to shut it down. This will stop the notebook server and release the associated Kubernetes resources in the namespace.

Shutdown Notebook

Auto Shutdown

Administrators can configure automatic culling of idle notebooks. Once the idle threshold is met, the notebook is automatically shut down and its resources are released to the host cluster.

Note

The default idle timer is 24 hours and can be customized by the administrator.


List Notebooks

Users can configure and launch multiple notebooks in their workspace as long as the resources configured for the notebooks do not exceed the configured user quota. In the example below, the user has two notebooks in their workspace, but only one is active.

List Notebooks


Start Notebook

Notebooks that were stopped, whether manually or by culling, can be restarted by clicking the Start icon.


Delete Notebook

Users can delete a notebook by clicking on the Delete icon. Note that this is a destructive action and cannot be undone.