Evidently
Evidently helps evaluate, test, and monitor data and ML-powered systems. It is both a library of 100+ ready-made evaluations and a framework for easily implementing your own: from Python functions to LLM judges.
Evidently has a modular architecture, and you can start with ad hoc checks without complex installations. There are three interfaces with which you can:
- Get a visual report with a summary of evaluation metrics
- Run conditional checks with a TestSuite to get a pass/fail outcome, and
- Plot the evaluation results over time on a Monitoring Dashboard.
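For instance, an ad hoc drift check can run entirely inside a notebook before any cloud setup. The snippet below is a minimal sketch assuming the classic Report/TestSuite API (evidently 0.4.x) and small synthetic DataFrames that stand in for your own data:

```python
import numpy as np
import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset
from evidently.test_suite import TestSuite
from evidently.test_preset import DataDriftTestPreset

# Synthetic reference and "production" data, with a deliberate shift in both columns
rng = np.random.default_rng(0)
ref_data = pd.DataFrame({"age": rng.normal(40, 10, 200), "hours": rng.normal(40, 5, 200)})
cur_data = pd.DataFrame({"age": rng.normal(48, 10, 200), "hours": rng.normal(34, 5, 200)})

# 1. Visual report: an inline summary of the drift metrics
report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=ref_data, current_data=cur_data)
report.show()

# 2. Test suite: the same evaluation expressed as pass/fail conditions
suite = TestSuite(tests=[DataDriftTestPreset()])
suite.run(reference_data=ref_data, current_data=cur_data)
suite.show()
```

The third interface, the Monitoring Dashboard, plots these results over time once they are uploaded to a workspace, as shown later in this guide.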
Configuration¶
This section provides a sample notebook that creates and uploads both Reports and Test Suites to Evidently Cloud (Evidently's SaaS offering). The resulting metrics are then displayed in Monitoring Dashboards, which can be used to visualize trends.
You can create a free account here.
Within your Rafay Kubeflow-based MLOps environment:
- Navigate to Notebooks
- Click New Notebook
- Enter a name for the notebook
- Select JupyterLab
- Set the minimum CPU to 1
- Set the minimum memory to 1
- Click Launch
It will take 1-2 minutes to create the notebook.
- Navigate to Notebooks
- Click Connect on the previously created notebook
- In the left-hand folder tree, click the upload files icon
- Upload the previously downloaded evidently_advanced.ipynb file
- Double click the evidently_advanced.ipynb file in the folder tree to open the notebook
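If the notebook image does not already include the Evidently library, it can be installed from the first cell. The version pin below is only an assumption, chosen to match the classic Report/TestSuite API used throughout this guide:

```python
# Install Evidently into the running notebook kernel (skip if already present)
%pip install "evidently<0.5"
```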
Credentials¶
You need to provide credentials so the notebook can connect to your Evidently Cloud account and upload model metrics; a connection sketch follows these steps.
- Update the section in the notebook that contains ENTER API KEY with the value of your API key, found here
- Update the sections in the notebook that contain ENTER PROJECT UUID with the value of your project UUID, found here after selecting or creating a project
- Click the Restart kernel and run all cells icon
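Under the hood, the API key and project UUID are used to open a connection to Evidently Cloud, roughly as sketched below; the placeholder strings are the values you substitute, and the exact cell layout in evidently_advanced.ipynb may differ:

```python
from evidently.ui.workspace.cloud import CloudWorkspace

# Connect to Evidently Cloud using your personal API token
ws = CloudWorkspace(
    token="ENTER API KEY",  # replace with your API key
    url="https://app.evidently.cloud",
)

# Look up the project that will receive the Reports and Test Suites
project = ws.get_project("ENTER PROJECT UUID")  # replace with your project UUID
```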
Model Metrics¶
Within the Notebook, the code will use the Adult dataset from OpenML. It will split this dataset to create two different datasets, one that mimics a reference dataset and one that mimics a production dataset. A reference dataset is needed to compare current production data against in order to evaluate drift. In this example, artificial drift between the datasets will be introduced to help simulate drift conditions.
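The preparation step roughly follows the pattern below; the split criterion and the columns used to inject drift are illustrative assumptions and may differ from the notebook:

```python
import numpy as np
from sklearn import datasets

# Download the Adult dataset from OpenML as a pandas DataFrame
adult = datasets.fetch_openml(name="adult", version=2, as_frame="auto").frame

# Split by education level into a "reference" slice and a "production" slice
adult_ref = adult[~adult.education.isin(["Some-college", "HS-grad", "Bachelors"])]
adult_cur = adult[adult.education.isin(["Some-college", "HS-grad", "Bachelors"])].copy()

# Introduce artificial drift: blank out two columns in part of the production
# data so the drift and quality evaluations have something to detect
adult_cur.iloc[:2000, 3:5] = np.nan
```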
Reports¶
The notebook will first use the Evidently library to evaluate the datasets on data drift and data quality. Reports of these evaluations will be created and sent to Evidently Cloud. Ten reports will be created to simulate a daily report being produced over a ten-day period. The frequency with which reports are sent can be customized by the user, from scheduled batch runs to near real time.
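A single day's report can be built and uploaded along the lines of the sketch below; it assumes the adult_ref/adult_cur frames and the ws/project objects from the earlier snippets, and loops over ten shifted timestamps to simulate the daily reports:

```python
from datetime import datetime, timedelta
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset, DataQualityPreset

def send_daily_report(day: int) -> None:
    # Evaluate drift and quality for one simulated day, using a different
    # slice of the "production" data each time
    report = Report(
        metrics=[DataDriftPreset(), DataQualityPreset()],
        timestamp=datetime.now() - timedelta(days=9 - day),
    )
    report.run(
        reference_data=adult_ref,
        current_data=adult_cur.iloc[100 * day : 100 * (day + 1)],
    )
    ws.add_report(project.id, report)  # upload to Evidently Cloud

for day in range(10):
    send_daily_report(day)
```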
The reports can be accessed by going to your Evidently project and selecting Reports in the left-hand tree.
- Click Explore on the report to explore additional details
Test Suites¶
The notebook will then use the Evidently library to evaluate the datasets on data drift and data quality. Test Suites of these evaluations will be created and sent to Evidently Cloud. Ten Test Suites will be created to simulate a daily test run over a ten-day period.
Test Suites differ from Reports in that Test Suites check for specific conditions within the data and present a pass/fail status for each check.
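The test suite step can be sketched in the same way, again assuming the frames and workspace objects from the earlier snippets; here the preset tests return explicit pass/fail results instead of a metric summary:

```python
from datetime import datetime, timedelta
from evidently.test_suite import TestSuite
from evidently.test_preset import DataDriftTestPreset, DataQualityTestPreset

def send_daily_test_suite(day: int) -> None:
    # Run pass/fail checks for one simulated day against the reference data
    suite = TestSuite(
        tests=[DataDriftTestPreset(), DataQualityTestPreset()],
        timestamp=datetime.now() - timedelta(days=9 - day),
    )
    suite.run(
        reference_data=adult_ref,
        current_data=adult_cur.iloc[100 * day : 100 * (day + 1)],
    )
    ws.add_test_suite(project.id, suite)  # upload to Evidently Cloud

for day in range(10):
    send_daily_test_suite(day)
```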
The test suites can be accessed by going to your Evidently project and selecting Test Suites in the left-hand tree.
- Click Explore on the test suite to explore additional details
Monitoring Dashboards¶
Now that multiple days' worth of Reports and Test Suites have been sent to Evidently Cloud, the system can display this data within a Monitoring Dashboard to show trends for specific metrics.
The dashboards can be accessed by going to your Evidently project and selecting Dashboard in the left-hand tree.
Custom panels within the dashboard can be created to show the specific report metrics required for your dataset monitoring; a panel-creation sketch follows.
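Panels can also be added programmatically from the notebook. The sketch below assumes the project object from the credentials step; the metric name and field path are illustrative, so adjust them to the metrics your reports actually contain:

```python
from evidently.ui.dashboards import (
    DashboardPanelPlot,
    PanelValue,
    PlotType,
    ReportFilter,
)

# Plot the share of drifted columns from each uploaded report as a line chart
project.dashboard.add_panel(
    DashboardPanelPlot(
        title="Share of drifted columns",
        filter=ReportFilter(metadata_values={}, tag_values=[]),
        values=[
            PanelValue(
                metric_id="DatasetDriftMetric",
                field_path="share_of_drifted_columns",
                legend="drift share",
            ),
        ],
        plot_type=PlotType.LINE,
    )
)
project.save()  # persist the dashboard configuration to Evidently Cloud
```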
By clicking the Data Tests tab within the Dashboard window, you will then be directed to the monitoring dashboards for the uploaded Test Suites.