Fatskills
Study Guide: Data Science and Machine Learning 101: Model Deployment and MLOps MLOps Principles Monitoring Drift Detection Retraining Feature Store
Source: https://www.fatskills.com/introdution-to-engineering/chapter/data-science-and-machine-learning-data-science-and-machine-learning-model-deployment-and-mlops-mlops-principles-monitoring-drift-detection-retraining-feature-store

By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.

What This Is

MLOps (ML Operations) is the set of practices that keep a machine‑learning model reliable after it leaves the notebook. It covers continuous monitoring, detecting data or concept drift, triggering automated retraining, and managing features in a centralized feature store. In production, a churn‑prediction model that scores millions of customers each day must stay accurate even as buying habits change; MLOps supplies the guardrails that catch degradation early and refresh the model without manual firefighting.


Key Terms & Formulas

  • Monitoring Dashboard – Real‑time UI (e.g., Grafana, Evidently) that visualises metrics such as latency, error rate, and prediction distribution.
  • Data Drift (Covariate Shift) – Change in the input feature distribution \(P(X)\), often quantified by \(D_{KL}(P_{\text{train}}(X) \,\|\, P_{\text{prod}}(X))\), where \(D_{KL}\) is the Kullback‑Leibler divergence.
  • Concept Drift – Change in the relationship \(P(Y \mid X)\); often measured by a drop in validation AUC on freshly labelled data or by a statistical test on residuals.
  • Population Stability Index (PSI) – \(\text{PSI} = \sum_{i=1}^{k} (p_i - q_i) \ln\frac{p_i}{q_i}\), where \(p_i\) and \(q_i\) are the proportions of training and production data in bin \(i\) of \(k\) bins. PSI > 0.2 signals notable drift.
  • Retraining Trigger – Rule‑based or model‑based condition (e.g., if PSI > 0.2 or val_auc_drop > 0.05:) that launches a new training pipeline.
  • Feature Store – Centralized catalog (e.g., Feast, Tecton) that version‑controls feature definitions, stores offline batches, and serves online look‑ups.
  • Online Feature Retrieval – Low‑latency API (GET /features?ids=...) that returns pre‑computed vectors for inference; typically < 10 ms SLA.
  • Model Registry – Service (e.g., MLflow, SageMaker Model Registry) that tracks model versions, signatures, and stage (Staging → Production).
  • Canary Deployment – Gradual rollout (e.g., 5 % traffic) to compare new model metrics against the incumbent before full promotion.
  • Evidently AI Drift Detector – Open‑source library that computes drift statistics such as PSI and the KS test and visualises feature drift in a few lines of code.
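
The PSI and KL definitions above can be computed directly from binned proportions. A minimal NumPy sketch, assuming 10 bins cut at the training‑set quantiles and a 1e‑6 floor for empty bins; both choices are illustrative (libraries such as Evidently use their own binning), and bin_proportions, psi and kl_divergence are hypothetical helper names.

```python
import numpy as np

EPS = 1e-6  # assumed floor for empty bins so ln(p_i / q_i) stays finite


def bin_proportions(train, prod, k=10):
    """Return (p, q): share of train and production values in each of k bins.

    Bin edges are the training-set quantiles, so each bin holds ~1/k of train;
    values outside the training range fall into the first or last bin.
    """
    train = np.asarray(train, dtype=float)
    prod = np.asarray(prod, dtype=float)
    cuts = np.quantile(train, np.linspace(0, 1, k + 1)[1:-1])  # k - 1 interior cut points
    p = np.bincount(np.searchsorted(cuts, train, side="right"), minlength=k) / len(train)
    q = np.bincount(np.searchsorted(cuts, prod, side="right"), minlength=k) / len(prod)
    return np.clip(p, EPS, None), np.clip(q, EPS, None)


def psi(train, prod, k=10):
    """PSI = sum_i (p_i - q_i) * ln(p_i / q_i), with p = train and q = production."""
    p, q = bin_proportions(train, prod, k)
    return float(np.sum((p - q) * np.log(p / q)))


def kl_divergence(train, prod, k=10):
    """Binned D_KL(P_train || P_prod) = sum_i p_i * ln(p_i / q_i)."""
    p, q = bin_proportions(train, prod, k)
    return float(np.sum(p * np.log(p / q)))


rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 10_000)
prod_same = rng.normal(0.0, 1.0, 10_000)
prod_shifted = rng.normal(1.0, 1.0, 10_000)  # mean shifted by one standard deviation

print(round(psi(train, prod_same), 4), round(psi(train, prod_shifted), 4))
```

Cutting bins on training quantiles keeps every reference bin populated; the floor only matters when production mass lands in a bin the other sample never touches.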


Step‑by‑Step / Process Flow

  1. Ingest & Store Features – Write raw data to a data lake, materialise nightly batch features with Spark, and register them in a feature store.
  2. Train Baseline & Register – Train a model (e.g., XGBClassifier) using the feature store’s offline API, log parameters, metrics, and the model artifact to a registry.
  3. Deploy with Monitoring Hooks – Push the model to a serving platform (Docker + FastAPI). Attach a monitoring agent that logs prediction histograms, latency, and PSI per feature.
  4. Detect Drift – Run a scheduled job (e.g., Airflow DAG) that pulls the latest production data, computes PSI/KS, and compares current validation AUC (on the most recent window for which ground‑truth labels have arrived) to the stored baseline.
  5. Trigger Retraining – If drift thresholds are crossed, automatically launch a new training pipeline (same code, new data) and register the candidate model.
  6. Canary & Promote – Deploy the candidate to a canary cohort, monitor live metrics, and if they improve by at least X % over the incumbent (e.g., a lift in churn‑recall), promote the candidate to Production.
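
Steps 4 to 6 boil down to two small decisions: whether to retrain, and which requests the canary serves. A minimal sketch, assuming the drift job already produces per‑feature PSI and AUC values; DriftReport, should_retrain and route_to_canary are hypothetical names, the thresholds reuse the example rule from the key terms, and hashing the customer ID is one common way (not the only one) to keep each customer pinned to a single model during the canary.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DriftReport:
    """Hypothetical summary emitted by the scheduled drift job (step 4)."""
    baseline_auc: float  # AUC stored alongside the registered model
    current_auc: float   # AUC on the latest window with ground-truth labels
    psi_by_feature: dict = field(default_factory=dict)  # feature name -> PSI


def should_retrain(report, psi_limit=0.2, auc_drop_limit=0.05, min_drifted=1):
    """Step 5: retrain if enough features drift OR validation AUC drops too far.

    Raising min_drifted makes the data-drift arm less trigger-happy.
    """
    drifted = [name for name, value in report.psi_by_feature.items() if value > psi_limit]
    auc_drop = report.baseline_auc - report.current_auc
    return len(drifted) >= min_drifted or auc_drop > auc_drop_limit


def route_to_canary(customer_id, canary_share=0.05):
    """Step 6: stable hash split so each customer always sees the same model."""
    digest = hashlib.sha256(str(customer_id).encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # first 32 bits mapped onto [0, 1]
    return bucket < canary_share


report = DriftReport(baseline_auc=0.80, current_auc=0.79,
                     psi_by_feature={"monthly_usage": 0.35, "tenure": 0.04})
print(should_retrain(report))  # True: monthly_usage PSI exceeds 0.2
canary_share = sum(route_to_canary(i) for i in range(10_000)) / 10_000
```

A deterministic hash (rather than a random draw per request) means a customer does not flip between models mid‑session, which keeps the canary comparison clean.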

Common Mistakes

  • Mistake: Only monitoring overall accuracy.
    Correction: Track distributional metrics (PSI, KS) and business KPIs (churn‑recall, revenue lift); accuracy can stay stable while the model silently degrades on a sub‑population.

  • Mistake: Hard‑coding feature transformations in the inference code.
    Correction: Centralise all preprocessing in the feature store so offline and online pipelines stay identical; version the feature definitions.

  • Mistake: Relying on a single drift threshold.
    Correction: Combine statistical tests (PSI, KS) with performance checks (AUC drop) and use a multi‑trigger policy to avoid false alarms.

  • Mistake: Retraining on the same stale data.
    Correction: Pull the latest production window (e.g., last 30 days) before each retrain; optionally blend in a longer historical window (or a sample of older data) to preserve long‑term trends.

  • Mistake: Deploying the new model without a canary.
    Correction: Use a canary rollout to compare live metrics before full promotion; this catches integration bugs and unexpected side‑effects.
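
The first mistake is easy to see with toy numbers (made up for illustration, not measured): a small segment can collapse while the overall figure still looks healthy. A minimal pure‑Python sketch; the segment names and the 0.85 alert floor are hypothetical.

```python
from collections import defaultdict


def accuracy_by_segment(records):
    """records: iterable of (segment, y_true, y_pred) -> (overall, {segment: accuracy})."""
    hits, totals = defaultdict(int), defaultdict(int)
    for segment, y_true, y_pred in records:
        totals[segment] += 1
        hits[segment] += int(y_true == y_pred)
    overall = sum(hits.values()) / sum(totals.values())
    return overall, {seg: hits[seg] / totals[seg] for seg in totals}


def weak_segments(per_segment, floor=0.85):
    """Segments whose accuracy has fallen below the alert floor."""
    return sorted(seg for seg, acc in per_segment.items() if acc < floor)


# Toy data: a large healthy segment and a small segment the model now gets wrong.
records = (
    [("standard", 1, 1)] * 912 + [("standard", 1, 0)] * 38
    + [("legacy_plan", 1, 1)] * 30 + [("legacy_plan", 1, 0)] * 20
)
overall, per_segment = accuracy_by_segment(records)
print(f"overall={overall:.3f}", per_segment, weak_segments(per_segment))
```

Overall accuracy stays at 0.942 while the small segment sits at 0.6, so only the per‑segment view raises an alert.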


Data Science Interview / Practical Insights

  1. “Explain the difference between data drift and concept drift.” – Interviewers expect you to contrast covariate shift (a change in \(P(X)\)) with a change in \(P(Y \mid X)\), and to give a concrete metric for each (PSI for the former, AUC drop for the latter).
  2. “How would you design a monitoring system for a fraud‑detection model with a 0.1 % fraud rate?” – Interviewers look for discussion of precision‑recall curves, alert thresholds on recall, and imbalance‑aware drift metrics (e.g., KS on the fraud score).
  3. “What are the pros and cons of a feature store versus embedding the feature pipeline in the model code?” – Mention reusability, train/serve consistency, lineage, and low online latency as pros; note operational overhead as a con.
  4. “When would you choose a scheduled retraining vs. an event‑driven retraining?” – Scheduled is simple and guarantees freshness; event‑driven reacts faster to abrupt drift but requires robust drift detection logic.
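
For question 2, the arithmetic behind "accuracy is the wrong alert metric" is worth having ready. A sketch with illustrative counts (not real data): at a 0.1 % fraud rate, a model that never flags fraud is 99.9 % accurate, while a useful model barely moves accuracy but lifts recall from 0 to 0.8. The summarize helper is a hypothetical name.

```python
def summarize(tp, fp, fn, tn):
    """Accuracy, precision and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # 0.0 when nothing is flagged
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Illustrative counts: 100,000 transactions at a 0.1 % fraud rate -> 100 frauds.
always_legit = summarize(tp=0, fp=0, fn=100, tn=99_900)    # never flags fraud
useful_model = summarize(tp=80, fp=70, fn=20, tn=99_830)   # flags 150, catches 80

for name, m in [("always_legit", always_legit), ("useful_model", useful_model)]:
    print(name, {k: round(v, 4) for k, v in m.items()})
```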

Quick Check Questions

  1. Scenario: Your churn model’s PSI for “monthly_usage” jumps to 0.35, but validation AUC stays at 0.78.
    Answer: Investigate feature drift; the model may still be accurate overall, but a sub‑segment could be mis‑predicted – consider a targeted retrain or feature redesign.

  2. Scenario: Production latency spikes after deploying a new XGBoost model.
    Answer: Check the online feature retrieval path and model size; use model compression (e.g., tree pruning) or move heavy preprocessing to the feature store.

  3. Scenario: You have a feature store, but the online API has been returning stale values for a week.
    Answer: Verify the feature materialisation schedule and ensure the online store (and any cache in front of it) is refreshed after each batch; add a health‑check alert on data freshness.
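
The freshness alert in scenario 3 can be as simple as comparing each feature view's last refresh time with the nightly schedule. A minimal sketch; the feature‑view names, timestamps and 26‑hour budget (24‑hour cadence plus slack) are hypothetical, and in practice the timestamps would come from the feature store's metadata or the materialisation job's logs.

```python
from datetime import datetime, timedelta, timezone

# Assumed freshness budget: nightly (24 h) materialisation plus 2 h of slack.
MAX_AGE = timedelta(hours=26)


def stale_feature_views(last_refresh, now, max_age=MAX_AGE):
    """Names of feature views whose last online refresh is older than max_age."""
    return sorted(name for name, ts in last_refresh.items() if now - ts > max_age)


# Hypothetical metadata: when each feature view last reached the online store.
now = datetime(2024, 5, 10, 6, 0, tzinfo=timezone.utc)
last_refresh = {
    "customer_profile": datetime(2024, 5, 10, 2, 0, tzinfo=timezone.utc),
    "customer_usage": datetime(2024, 5, 3, 2, 0, tzinfo=timezone.utc),  # a week old
}
stale = stale_feature_views(last_refresh, now)
if stale:
    print("ALERT: stale online features:", ", ".join(stale))
```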


Last‑Minute Cram Sheet (10 one‑liners)

  1. ⚠️ PSI < 0.1 → little drift; 0.1 to 0.2 → moderate drift; > 0.2 → strong data drift.
  2. Feature Store = source‑of‑truth for both offline training and online inference.
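  3. Canary rollout = a small share of live traffic (e.g., 5 %) served by the new model + metric comparison before full promotion (unlike shadow traffic, where the new model scores mirrored requests but its predictions never reach users).
  4. Evidently AI: report = Report(metrics=[DataDriftPreset()]); report.run(reference_data=train, current_data=prod) builds a per‑feature drift report (legacy 0.4.x API; PSI is one of the selectable tests, and newer releases changed the API).
  5. Retraining trigger = if psi > 0.15 or val_auc_drop > 0.03: (simple rule; thresholds are illustrative and should be tuned per model).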
  6. Model Registry stores: artifact, signature, stage, and lineage.
  7. Latency SLA < 10 ms for online feature fetch; batch SLA ≈ 1 h for nightly jobs.
  8. KS test p‑value < 0.05 → reject null that train & prod feature distributions are identical.
  9. Version features (feature_name_v2) instead of overwriting to keep reproducibility.
  10. ⚠️ Monitoring only “accuracy” misses drift; always log prediction distribution histograms.
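
Item 8 in code: the two‑sample KS statistic is the largest gap between the two empirical CDFs. A NumPy‑only sketch that uses the standard large‑sample critical value c(alpha) * sqrt((n + m) / (n * m)) instead of an exact p‑value; ks_statistic and ks_rejects are hypothetical names, and in practice scipy.stats.ks_2samp returns the statistic and p‑value directly.

```python
import numpy as np


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])  # the supremum is attained at an observed value
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))


def ks_rejects(a, b, alpha=0.05):
    """Large-sample test: reject 'same distribution' if D > c(alpha) * sqrt((n + m) / (n * m))."""
    n, m = len(a), len(b)
    c_alpha = np.sqrt(-np.log(alpha / 2) / 2)  # about 1.358 for alpha = 0.05
    return bool(ks_statistic(a, b) > c_alpha * np.sqrt((n + m) / (n * m)))


rng = np.random.default_rng(1)
train = rng.normal(0.0, 1.0, 2_000)
prod = rng.normal(0.5, 1.0, 2_000)  # hypothetical shifted production sample
print(ks_statistic(train, prod), ks_rejects(train, prod))
```

With the very large samples typical of production logs, the critical value shrinks toward zero, so even negligible shifts become "significant"; pairing KS with an effect‑size threshold such as PSI keeps alerts meaningful.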

