By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.
A feature store is a centralized repository for storing, sharing, and reusing ML features across training and inference. It reduces feature drift (inconsistencies between the features used in training and those served in production, often called training-serving skew) and redundant feature engineering (recomputing the same features for different models). Amazon SageMaker Feature Store is AWS's managed solution, offering an online store for low-latency reads and an offline store for batch workloads.
Real-world scenario: A ride-hailing app (like Uber) needs to predict ETA (estimated time of arrival) in real time. Features like traffic conditions, driver location, historical trip times, and weather data must be computed once and reused across:
- Training (batch jobs to build the ETA model).
- Real-time inference (when a user requests a ride, the app fetches the latest features in milliseconds).
- Batch inference (daily reports on ETA accuracy).
Without a feature store, teams waste time recomputing features, risk inconsistencies, and struggle with latency.
| Term | Definition |
| --- | --- |
| SageMaker Feature Store | AWS's managed feature store for storing, sharing, and retrieving ML features. Supports online (low-latency) and offline (batch) access. Reduces feature drift and duplication. |
| Feature Group | A collection of features (e.g., `user_id`, `avg_trip_duration`, `current_traffic`) stored in SageMaker Feature Store. Each feature group has a schema and can be online-only, offline-only, or both. |
| Online Store | A low-latency key-value store (AWS cites single-digit-millisecond reads) for real-time inference, e.g., fetching a user's latest features when they open the app. It keeps only the latest record for each record identifier. |
| Offline Store | A batch-optimized store for training and batch inference (e.g., generating daily reports). Data is stored in Amazon S3 in Parquet format and queried via Athena or SageMaker Processing. |
| Feature Definition | The name and type of a single feature (e.g., `FeatureName: "avg_trip_duration"`, `FeatureType: "Fractional"`). The basic feature types are Integral, Fractional, and String. A shared definition keeps the schema consistent across teams. |
| Record Identifier | A unique key (e.g., `user_id`, `trip_id`) that links features to an entity (e.g., a user or ride). Used to query features in the online store. |
| Event Time | A timestamp (e.g., `trip_start_time`) that records when a feature value was computed or observed. Critical for time-travel queries (e.g., "What were the features for User X at 3 PM yesterday?"). |
| Feature Drift | When the features used in training differ from those served in production (e.g., a feature is computed differently in training vs. inference); also called training-serving skew. A feature store reduces this risk by serving both phases from the same stored feature values. |
| Tecton / Feast | Feature stores that are not AWS-native. Feast is open source and self-hosted; Tecton is a commercial managed service (comparable to SageMaker Feature Store but not tied to AWS). AWS exams focus on SageMaker Feature Store. |
| SageMaker Processing | A managed batch processing service for feature engineering (e.g., computing `avg_trip_duration` from raw trip logs). Outputs can be written to the offline store. |
| SageMaker Pipelines | AWS's ML orchestration service for automating feature engineering, training, and inference workflows. Can trigger SageMaker Processing jobs to update the feature store. |
| Athena | AWS's serverless SQL query engine for analyzing data in S3 (e.g., querying the offline store for training data). |
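The event-time idea can be illustrated without any AWS calls. The following is a minimal sketch of a point-in-time ("time-travel") lookup over an in-memory history; the helper names `parse_ts` and `features_as_of` and the sample rows are hypothetical, and a real offline store would be queried through Athena rather than a Python list.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    """Parse an ISO-8601 UTC timestamp such as '2023-10-01T12:00:00Z'."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def features_as_of(records, record_id, as_of):
    """Return the newest record for record_id with event_time <= as_of, or None."""
    cutoff = parse_ts(as_of)
    candidates = [
        r for r in records
        if r["user_id"] == record_id and parse_ts(r["event_time"]) <= cutoff
    ]
    return max(candidates, key=lambda r: parse_ts(r["event_time"]), default=None)


# Hypothetical history: every write is kept, each with its own event_time.
history = [
    {"user_id": "123", "avg_trip_duration": 14.0, "event_time": "2023-09-30T09:00:00Z"},
    {"user_id": "123", "avg_trip_duration": 15.2, "event_time": "2023-10-01T12:00:00Z"},
    {"user_id": "456", "avg_trip_duration": 22.5, "event_time": "2023-09-30T10:00:00Z"},
]

# "What were the features for user 123 at 3 PM on 2023-09-30?"
snapshot = features_as_of(history, "123", "2023-09-30T15:00:00Z")
print(snapshot)
```

Filtering on the event time before picking the newest row is what keeps later values from leaking into a training example built for an earlier moment.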
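```python
from sagemaker.session import Session
from sagemaker.feature_store.feature_group import FeatureGroup
from sagemaker.feature_store.feature_definition import (
    FeatureDefinition,
    FeatureTypeEnum,
)

sagemaker_session = Session()

# The schema lives on the FeatureGroup object; storage options go to create().
user_feature_group = FeatureGroup(
    name="user-features",
    sagemaker_session=sagemaker_session,
    feature_definitions=[
        FeatureDefinition("user_id", FeatureTypeEnum.STRING),
        FeatureDefinition("avg_trip_duration", FeatureTypeEnum.FRACTIONAL),
        FeatureDefinition("event_time", FeatureTypeEnum.STRING),
    ],
)

# Passing s3_uri enables the offline store; enable_online_store adds the online store.
user_feature_group.create(
    s3_uri="s3://my-bucket/offline-store/",
    record_identifier_name="user_id",
    event_time_feature_name="event_time",
    role_arn="arn:aws:iam::123456789012:role/FeatureStoreRole",
    enable_online_store=True,
)
```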
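A feature group needs a schema before it can be created. As a rough sketch, the list below is in the shape that boto3's `create_feature_group` accepts for its `FeatureDefinitions` parameter (dicts with `FeatureName` and `FeatureType`); the helper `to_feature_definitions` and the `user_schema` mapping are hypothetical names introduced for illustration, and only the three basic feature types are covered.

```python
def to_feature_definitions(schema):
    """Map {feature_name: python_type} to a list of FeatureName/FeatureType dicts."""
    # Basic Feature Store types: Integral, Fractional, String.
    type_map = {int: "Integral", float: "Fractional", str: "String"}
    definitions = []
    for name, py_type in schema.items():
        if py_type not in type_map:
            raise ValueError(f"Unsupported type for feature {name!r}: {py_type}")
        definitions.append({"FeatureName": name, "FeatureType": type_map[py_type]})
    return definitions


# Hypothetical schema for the user feature group; event_time is kept as an
# ISO-8601 string here, which is an assumption of this sketch.
user_schema = {
    "user_id": str,
    "avg_trip_duration": float,
    "last_trip_time": str,
    "event_time": str,
}

feature_definitions = to_feature_definitions(user_schema)
print(feature_definitions)
```

Keeping the schema in one mapping makes it easier for training and inference code to agree on feature names and types.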
PutRecord
Example:
```python
import boto3

# PutRecord belongs to the Feature Store runtime API, exposed through the
# boto3 "sagemaker-featurestore-runtime" client.
featurestore_runtime = boto3.client("sagemaker-featurestore-runtime")

featurestore_runtime.put_record(
    FeatureGroupName="user-features",
    Record=[
        # Every value is sent as a string, whatever the feature type.
        {"FeatureName": "user_id", "ValueAsString": "123"},
        {"FeatureName": "avg_trip_duration", "ValueAsString": "15.2"},
        {"FeatureName": "event_time", "ValueAsString": "2023-10-01T12:00:00Z"},
    ],
)
```
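The `Record` payload for PutRecord is a list of `FeatureName`/`ValueAsString` pairs, which is tedious to write by hand. A small sketch of a converter from a plain dict follows; `to_record` is a hypothetical helper, and skipping `None` values is a design choice of this sketch rather than documented API behavior.

```python
def to_record(features):
    """Convert {name: value} into the list-of-dicts shape used by PutRecord."""
    return [
        {"FeatureName": name, "ValueAsString": str(value)}
        for name, value in features.items()
        if value is not None  # assumption: omit missing features instead of sending "None"
    ]


record = to_record(
    {
        "user_id": "123",
        "avg_trip_duration": 15.2,
        "last_trip_time": None,
        "event_time": "2023-10-01T12:00:00Z",
    }
)
print(record)
```

The resulting list could then be passed as the record argument of a PutRecord call.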
```sql
SELECT user_id, avg_trip_duration, last_trip_time
FROM user_features_offline
WHERE event_time BETWEEN timestamp '2023-01-01' AND timestamp '2023-10-01'
```
GetRecord
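```python
import boto3

# GetRecord reads from the online store via the Feature Store runtime client.
featurestore_runtime = boto3.client("sagemaker-featurestore-runtime")

response = featurestore_runtime.get_record(
    FeatureGroupName="user-features",
    RecordIdentifierValueAsString="123",
)
print(response["Record"])  # Latest features for user_id=123
```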
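The `Record` field returned by GetRecord is a list of `FeatureName`/`ValueAsString` pairs, so inference code usually flattens it first. A minimal sketch, where `record_to_dict` and `sample_response` are hypothetical and the empty-dict fallback assumes a response without a `Record` key when no record is found:

```python
def record_to_dict(response):
    """Flatten a GetRecord-style response into {feature_name: value_as_string}."""
    # Assumption: "Record" is missing from the response when no record exists.
    return {
        item["FeatureName"]: item["ValueAsString"]
        for item in response.get("Record", [])
    }


# Hand-written stand-in for a real GetRecord response.
sample_response = {
    "Record": [
        {"FeatureName": "user_id", "ValueAsString": "123"},
        {"FeatureName": "avg_trip_duration", "ValueAsString": "15.2"},
        {"FeatureName": "event_time", "ValueAsString": "2023-10-01T12:00:00Z"},
    ]
}

features = record_to_dict(sample_response)
avg_trip_duration = float(features["avg_trip_duration"])  # values arrive as strings
print(features, avg_trip_duration)
```

Because every value comes back as a string, numeric features need an explicit cast before they are fed to a model.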
A fintech company needs to detect fraudulent transactions in real time. Features like `user_spending_patterns` and `device_location` must be fetched in <10ms. Which SageMaker Feature Store configuration should they use?
- A) Offline-only store
- B) Online-only store
- C) Both online and offline stores
- D) Neither; use DynamoDB directly

Answer: B) Online-only store
Explanation: Real-time fraud detection requires low-latency feature access, which the online store provides. The offline store is not needed to meet this requirement; it becomes useful only if the same feature group must also supply historical data for training.
A data scientist is building a recommendation model and needs to train on 3 months of historical user features. The features are already computed and stored in S3. Which approach is most cost-effective?
- A) Query the online store for all historical features
- B) Use Athena to query the offline store (S3)
- C) Recompute all features from raw data
- D) Use DynamoDB Streams to replay historical features

Answer: B) Use Athena to query the offline store (S3)
Explanation: The offline store (S3) is cheaper for batch queries, and Athena can efficiently scan historical data.
A team notices that their production model's accuracy is dropping because the features used in training differ from those in inference. What is the most likely cause, and how can SageMaker Feature Store help?
- A) The model is overfitting; retrain with more data
- B) Feature drift; use the same feature group for training and inference
- C) The online store is too slow; switch to DynamoDB
- D) The offline store is corrupted; restore from backup

Answer: B) Feature drift; use the same feature group for training and inference
Explanation: Feature drift occurs when training and inference features differ. A feature store ensures the same feature logic is used in both phases.
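The mismatch described in this question can be caught with a simple comparison of the feature values a model saw in training against the values served for the same entity. The sketch below is illustrative only: `find_skew`, the sample rows, and the tolerance are all hypothetical, and it is not a SageMaker feature.

```python
import math


def find_skew(training_row, serving_row, rel_tol=1e-6):
    """Return {feature: (training_value, serving_value)} for features that differ."""
    mismatches = {}
    for name in sorted(training_row.keys() | serving_row.keys()):
        t = training_row.get(name)
        s = serving_row.get(name)
        both_numeric = isinstance(t, (int, float)) and isinstance(s, (int, float))
        # Compare numbers with a tolerance, everything else by equality.
        same = math.isclose(t, s, rel_tol=rel_tol) if both_numeric else t == s
        if not same:
            mismatches[name] = (t, s)
    return mismatches


# Hypothetical case: training used minutes, the serving path computed seconds.
training_row = {"user_id": "123", "avg_trip_duration": 15.2}
serving_row = {"user_id": "123", "avg_trip_duration": 912.0}

mismatches = find_skew(training_row, serving_row)
print(mismatches)
```

Reading both phases from the same feature group removes the second computation path, which is where this kind of unit mismatch tends to appear.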