By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.
MLOps (ML Operations) is the set of practices that keep a machine‑learning model reliable after it leaves the notebook. It covers continuous monitoring, detecting data or concept drift, triggering automated retraining, and managing features in a centralized feature store. In production, a churn‑prediction model that scores millions of customers each day must stay accurate even as buying habits change; MLOps supplies the guardrails that catch degradation early and refresh the model without manual firefighting.
if PSI > 0.2 or val_auc_drop > 0.05:
GET /features?ids=...
XGBClassifier
Mistake: Monitoring only overall accuracy. Correction: Track distributional metrics (PSI, KS) and business KPIs (churn recall, revenue lift); overall accuracy can stay stable while the model silently degrades on a sub‑population.
Mistake: Hard‑coding feature transformations in the inference code. Correction: Centralise all preprocessing in the feature store so offline and online pipelines stay identical; version the feature definitions.
Mistake: Relying on a single drift threshold. Correction: Combine statistical tests (PSI, KS) with performance checks (AUC drop) and use a multi‑trigger policy to avoid false alarms.
Mistake: Retraining on the same stale data. Correction: Pull the latest production window (e.g., last 30 days) before each retrain; optionally combine it with a longer rolling window so the model still sees long‑term trends.
Mistake: Deploying the new model without a canary. Correction: Use a canary rollout to compare live metrics before full promotion; this catches integration bugs and unexpected side‑effects.
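The drift checks and the multi‑trigger policy described above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names `psi`, `ks_statistic` and `should_retrain` are hypothetical, the PSI and AUC limits reuse the 0.2 and 0.05 values quoted earlier in this guide, and the KS limit of 0.1 and the "at least two signals" rule are assumptions chosen only for the example.

```python
import math
from bisect import bisect_right


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a reference and a production sample."""
    ref = sorted(expected)
    # Interior bin edges are taken from quantiles of the reference sample.
    edges = [ref[int(len(ref) * i / bins)] for i in range(1, bins)]

    def shares(values):
        counts = [0] * bins
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # Clip empty bins to eps so the logarithm stays finite.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def ks_statistic(expected, actual):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    ref, cur = sorted(expected), sorted(actual)
    return max(
        abs(bisect_right(ref, v) / len(ref) - bisect_right(cur, v) / len(cur))
        for v in ref + cur
    )


def should_retrain(psi_value, ks_value, auc_drop,
                   psi_limit=0.2, ks_limit=0.1, auc_limit=0.05,
                   min_triggers=2):
    """Fire only when at least min_triggers signals agree.

    ks_limit and min_triggers are assumed values for illustration.
    """
    fired = [psi_value > psi_limit, ks_value > ks_limit, auc_drop > auc_limit]
    return sum(fired) >= min_triggers


# Synthetic example: production values are the training values shifted by 3.
train = [i / 100 for i in range(1000)]
prod = [v + 3.0 for v in train]
decision = should_retrain(psi(train, prod), ks_statistic(train, prod), auc_drop=0.0)
```

Requiring two agreeing signals trades some sensitivity for fewer false alarms; a team that prefers to catch every drift event could set `min_triggers=1`, which behaves like a plain "or" over the three checks.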
Scenario: Your churn model’s PSI for “monthly_usage” jumps to 0.35, but validation AUC stays at 0.78. Answer: Investigate feature drift; the model may still be accurate overall, but a sub‑segment could be mis‑predicted – consider a targeted retrain or feature redesign.
Scenario: Production latency spikes after deploying a new XGBoost model. Answer: Check the online feature retrieval path and model size; use model compression (e.g., tree pruning) or move heavy preprocessing to the feature store.
Scenario: You have a feature store, but the online API has been returning stale values for a week. Answer: Verify the feature materialisation schedule and ensure the online cache is invalidated after each batch run; add a health‑check alert for data freshness.
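A data‑freshness health check for the stale‑feature scenario might look like the sketch below. It assumes the feature store can report a last‑materialised timestamp per feature; the helper names, the 24‑hour limit and the `tenure` feature are hypothetical and not taken from any particular feature‑store product.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_materialised_at, max_age=timedelta(hours=24), now=None):
    """Return True when the newest feature batch is older than max_age."""
    now = now or datetime.now(timezone.utc)
    return now - last_materialised_at > max_age


def stale_features(last_updated, max_age=timedelta(hours=24), now=None):
    """Sorted names of features whose latest materialisation is too old."""
    return sorted(
        name for name, ts in last_updated.items() if is_stale(ts, max_age, now)
    )


# Hypothetical timestamps: one feature a week old, one refreshed two hours ago.
check_time = datetime(2024, 1, 8, tzinfo=timezone.utc)
last_updated = {
    "monthly_usage": check_time - timedelta(days=7),
    "tenure": check_time - timedelta(hours=2),
}
to_alert = stale_features(last_updated, now=check_time)
```

Running a check like this on a schedule and alerting on a non‑empty result would surface a stalled materialisation job within a day rather than a week.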
evidently.calculate_psi(train, prod, feature_name)
if psi > 0.15 or val_auc_drop > 0.03:
feature_name_v2