Overview
Feature stores are like data warehouses for data science. Their primary goal is to shorten the time it takes to go from data ingestion to ML model training and inference. Especially at larger organizations, data scientists often duplicate work by creating the same features again and again, and then validating them separately for each use case. This significantly increases both time-to-market and cost. A feature store registers the available features and makes them discoverable and consumable by ML training pipelines and inference services.
A commonly cited estimate is that roughly 80% of a data scientist’s time goes into data wrangling: sourcing, ingesting, cleaning, and featurizing data. That time would be better spent developing models.
Benefits
At its core, a feature store provides centralized storage for machine learning features. It enables efficient reuse of features across different models and gives both training and inference a consistent, reliable source of features.
Online and Offline Features
Online Features
These are used in real-time serving environments, where inference needs low-latency access to features. Online stores are typically backed by fast databases such as Redis or DynamoDB to deliver the required performance.
Offline Features
These are stored in data warehouses or big data storage solutions (like BigQuery, Snowflake, or S3) and are used for batch processing and model training.
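The split between the two kinds of store can be illustrated with a small, stdlib-only sketch. This is not Feast's API: `FeatureRow`, `OfflineStore`, `OnlineStore`, and `materialize` are hypothetical names, and plain Python containers stand in for the warehouse and the key-value database. The offline side keeps every timestamped row for training, while the online side keeps only the latest value per entity for fast lookup.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class FeatureRow:
    entity_id: str
    event_time: datetime
    values: dict


@dataclass
class OfflineStore:
    """Append-only history, standing in for a warehouse table."""
    rows: list = field(default_factory=list)

    def write(self, row: FeatureRow) -> None:
        self.rows.append(row)


@dataclass
class OnlineStore:
    """Latest value per entity, standing in for a key-value database."""
    latest: dict = field(default_factory=dict)

    def read(self, entity_id: str) -> dict:
        return self.latest[entity_id].values


def materialize(offline: OfflineStore, online: OnlineStore) -> None:
    """Copy the most recent row for each entity into the online store."""
    for row in offline.rows:
        current = online.latest.get(row.entity_id)
        if current is None or row.event_time > current.event_time:
            online.latest[row.entity_id] = row


# Hypothetical sample data: a 7-day trip count per driver.
offline = OfflineStore()
offline.write(FeatureRow("driver_1", datetime(2024, 1, 1), {"trips_7d": 12}))
offline.write(FeatureRow("driver_1", datetime(2024, 1, 8), {"trips_7d": 15}))
offline.write(FeatureRow("driver_2", datetime(2024, 1, 5), {"trips_7d": 3}))

online = OnlineStore()
materialize(offline, online)
```

After `materialize` runs, `online.read("driver_1")` returns only the most recent values, while `offline.rows` still holds the full history that a training job would scan.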
Consistency
One of the major benefits of using Feast, an open-source feature store, is consistency between training and inference. Feast makes the same features that were used during training available in real time for serving, which mitigates the training-serving skew problem.
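One common way to keep training and serving aligned is a point-in-time lookup: every training example sees the feature value that was known at its own timestamp, and serving uses the same lookup at request time. The sketch below is a minimal stdlib illustration under that assumption, not Feast's implementation; `history` and `value_as_of` are hypothetical names and the values are made up.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical feature history for one entity, sorted by event time.
history = [
    (datetime(2024, 1, 1), 0.91),
    (datetime(2024, 1, 8), 0.87),
    (datetime(2024, 1, 15), 0.95),
]


def value_as_of(history, as_of):
    """Return the latest feature value with event time <= as_of, or None."""
    times = [t for t, _ in history]
    idx = bisect_right(times, as_of)
    return history[idx - 1][1] if idx > 0 else None


# Training: each label is joined to the value known at its own timestamp,
# so no value from the future leaks into the training set.
label_times = [datetime(2024, 1, 3), datetime(2024, 1, 10)]
training_values = [value_as_of(history, t) for t in label_times]

# Serving: the same lookup, evaluated at request time.
serving_value = value_as_of(history, datetime(2024, 1, 20))
```

Because both paths call the same function on the same history, the model is trained on values computed the same way as the ones it receives in production.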
Ingestion and Serving
Feast allows users to ingest raw data from multiple sources, transform it into features, and store them in a feature store. These features can then be retrieved for either batch training or online inference.
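The flow from raw data to retrievable features can be sketched in a few lines of plain Python. This is a toy illustration rather than Feast code: the trip records, `build_features`, `get_training_rows`, and `get_online_row` are all hypothetical, and a dictionary plays the role of the feature store.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw trip events, as they might arrive from a source system.
raw_trips = [
    {"driver_id": "driver_1", "distance_km": 4.0},
    {"driver_id": "driver_1", "distance_km": 6.0},
    {"driver_id": "driver_2", "distance_km": 10.0},
]


def build_features(trips):
    """Aggregate raw trips into one feature row per driver."""
    distances = defaultdict(list)
    for trip in trips:
        distances[trip["driver_id"]].append(trip["distance_km"])
    return {
        driver_id: {"trip_count": len(d), "avg_distance_km": mean(d)}
        for driver_id, d in distances.items()
    }


feature_store = build_features(raw_trips)  # the "store" is just a dict here


def get_training_rows(store):
    """Batch retrieval: every entity's features, e.g. for model training."""
    return [{"driver_id": k, **v} for k, v in sorted(store.items())]


def get_online_row(store, driver_id):
    """Online retrieval: one entity's features, e.g. for a prediction request."""
    return store[driver_id]
```

The transformation runs once at ingestion time, and both retrieval paths read the stored result instead of recomputing it.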
Versioning and Reusability
Feature definitions in Feast are declared as code, so they can be kept under version control. This lets users track changes and reuse features across multiple models, which saves time and keeps results consistent.
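To make the idea of tracking changes while reusing features concrete, here is a toy registry that keeps every version of a definition. It is an assumption-laden sketch, not how Feast stores definitions: `FeatureDefinition`, `FeatureRegistry`, and the integer version scheme are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureDefinition:
    name: str
    version: int
    description: str


class FeatureRegistry:
    """Toy registry: keeps every version of every feature definition."""

    def __init__(self):
        self._definitions = {}  # name -> list of versions, oldest first

    def register(self, name, description):
        """Add a new version of a feature and return its definition."""
        versions = self._definitions.setdefault(name, [])
        definition = FeatureDefinition(name, len(versions) + 1, description)
        versions.append(definition)
        return definition

    def get(self, name, version=None):
        """Return a specific version, or the newest one if none is given."""
        versions = self._definitions[name]
        return versions[-1] if version is None else versions[version - 1]


registry = FeatureRegistry()
registry.register("avg_distance_km", "Mean trip distance, all time")
registry.register("avg_distance_km", "Mean trip distance, last 30 days")

pinned = registry.get("avg_distance_km", version=1)  # an older model
latest = registry.get("avg_distance_km")             # a newer model
```

An existing model can stay pinned to the definition it was trained on, while a new model picks up the latest one from the same registry.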
Portability
Feast is cloud-agnostic and can be integrated into various data and model serving pipelines, including cloud platforms like AWS, GCP, and Azure.