About

Rafay's MLOps platform for Kubeflow provides an integrated Feature Store based on the Feast project. Feast is an open-source feature store that serves as a central repository for managing, storing, and serving machine learning features. It plays a critical role in productionizing machine learning models by allowing teams to manage the full lifecycle of feature data, from feature ingestion and storage to serving features for both real-time online inference and offline batch model training.

Feast simplifies the complexities of managing features in machine learning pipelines by offering a unified platform for feature storage, management, and serving. By ensuring feature consistency, low-latency access, and reusability, Feast enables efficient model development and deployment at scale.



Architecture & Design

Different systems may produce data at different rates, and a feature store must be flexible enough to handle those different cadences, both for ingestion and during retrieval. For example, sensor data could be produced in real time, arriving every second, or there could be a monthly file generated from an external system reporting a summary of the last month's transactions. Each of these needs to be processed and ingested into the feature store.

A user-facing application may operate at very low latency using up-to-the-second features, whereas when training the model, features are pulled offline as a larger batch but with higher latency.

No single database system can both scale to potentially terabytes of data and deliver extremely low latency on the order of milliseconds. Rafay's Integrated Feature Store is therefore designed around separate offline and online feature stores.

Offline vs Online

Rafay's integrated Feast-based Feature Store is specifically configured to use different services for the offline and online use cases. For example, the design for Google Cloud (GCP) provisions and uses the following managed services.

| Store | Description | Service |
|---|---|---|
| Online Store | Low-latency feature storage for real-time model inference | Google Memorystore (Managed Redis) |
| Offline Store | Large-scale storage for feature batch processing and model training | Google Cloud Storage (GCS) |
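In Feast, this split is expressed in the repository's `feature_store.yaml`. The fragment below is a hedged sketch of what a GCP-oriented configuration might look like; the project name, bucket, and Redis endpoint are placeholders, and Rafay provisions the actual values automatically.

```yaml
# Illustrative feature_store.yaml for a GCP deployment (all values are
# placeholders; Rafay provisions the real endpoints and buckets).
project: my_ml_project
provider: gcp
registry: gs://my-feast-bucket/registry.db   # feature metadata on GCS
online_store:
  type: redis                                # Memorystore (Managed Redis)
  connection_string: "10.0.0.3:6379"
offline_store:
  type: file                                 # parquet feature data on GCS
```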