By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.
Logging, monitoring, alerting, and explainability are critical for keeping ML pipelines reliable, performant, and accountable. Without them, models can silently degrade, drift, or produce biased predictions, leading to costly failures. For example, in a real-time fraud detection system, you need to:
- Log inference requests (who, when, what input, what prediction).
- Monitor latency, error rates, and feature drift (e.g., sudden spikes in transaction amounts).
- Alert when model confidence drops below a threshold (e.g., average prediction confidence < 0.7).
- Explain why a transaction was flagged (e.g., "high transaction amount + unusual location") to comply with regulations like GDPR.
Google Cloud provides Cloud Logging (centralized logs), Cloud Monitoring (metrics, dashboards, alerts), and Vertex Explainable AI (model interpretability) to solve these challenges.
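The monitoring and alerting items in the fraud example boil down to a few statistics over a time window. Below is a minimal pure-Python sketch, assuming you already collect per-request latency, an error flag, and the predicted fraud probability. The `Inference` record, `summarize` function, and 0.7 threshold are illustrative rather than a Google Cloud API, and treating confidence as the probability of the predicted class is an assumption. In a deployed system, Cloud Monitoring would typically compute aggregates such as p99 latency for you.

```python
import math
from dataclasses import dataclass


@dataclass
class Inference:
    latency_ms: float  # end-to-end latency of one request
    ok: bool           # False if the request returned an error
    fraud_prob: float  # predicted probability that the transaction is fraud


def p99(values):
    """99th percentile using the nearest-rank method."""
    ordered = sorted(values)
    rank = math.ceil(0.99 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def summarize(window, min_confidence=0.7):
    """Summarize one time window of requests and decide whether to alert."""
    if not window:
        raise ValueError("window is empty")
    error_rate = sum(not r.ok for r in window) / len(window)
    # Binary classifier (assumption): confidence = probability of the predicted class
    confidences = [max(r.fraud_prob, 1 - r.fraud_prob) for r in window if r.ok]
    mean_conf = sum(confidences) / len(confidences) if confidences else 0.0
    return {
        "p99_latency_ms": p99([r.latency_ms for r in window]),
        "error_rate": error_rate,
        "mean_confidence": mean_conf,
        "alert": mean_conf < min_confidence,
    }


window = [
    Inference(100, True, 0.90),
    Inference(120, True, 0.40),
    Inference(500, False, 0.0),
    Inference(110, True, 0.55),
]
print(summarize(window))
```

Here the failed request pushes p99 latency to 500 ms, and the two borderline predictions (0.40 and 0.55) drag mean confidence below 0.7, so the window raises an alert.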
Scenario: You’re deploying a Vertex AI endpoint for a recommendation model. You need to log:
- Inference requests (user ID, input features, timestamp).
- Prediction responses (recommended items, confidence scores).
- Errors (e.g., "Feature X missing").
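Steps:
1. Enable logging for the Vertex AI endpoint:
   - In the Google Cloud console, open your endpoint under Vertex AI > Online prediction > Endpoints.
   - The serving container's stdout/stderr (container logs) and, if enabled at deployment, access logs are written to Cloud Logging.
   - To capture a sample of prediction inputs and outputs, enable request-response logging on the endpoint; it writes to a BigQuery table rather than to Cloud Logging.
2. Custom Logs via Python SDK:
```python
from datetime import datetime, timezone

from google.cloud import logging

logging_client = logging.Client()
logger = logging_client.logger("vertex_ai_recommendations")


def predict(request, model):
    # `request` and `model` are placeholders for your serving framework's objects
    prediction = model.predict(request.features)
    # One structured (JSON) entry per request; values must be JSON-serializable
    logger.log_struct(
        {
            "user_id": request.user_id,
            "features": request.features,
            "prediction": prediction,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        },
        severity="INFO",
    )
    return prediction
```
3. Query Logs in Cloud Logging:
   - Filter the custom log by name, e.g. `logName="projects/PROJECT_ID/logs/vertex_ai_recommendations"`, optionally adding conditions such as `jsonPayload.user_id="USER_ID"`. Endpoint container logs use `resource.type="aiplatform.googleapis.com/Endpoint"`.
   - Export to BigQuery for long-term analysis (for example, with a log sink).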
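The scenario also asks for error entries such as "Feature X missing". A minimal sketch of how the payload and a matching query could be built, assuming a hypothetical required-feature schema (`REQUIRED_FEATURES`) and hypothetical helper names `build_log_entry` and `build_filter`; it only prepares data and strings, so it runs without Google Cloud credentials.

```python
from datetime import datetime, timezone

# Hypothetical feature schema for the recommendation model
REQUIRED_FEATURES = ("user_id", "recent_items", "device")


def build_log_entry(features, prediction=None):
    """Return (payload, severity) for Logger.log_struct(payload, severity=...)."""
    missing = [name for name in REQUIRED_FEATURES if name not in features]
    payload = {
        "user_id": features.get("user_id"),
        "features": features,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    if missing:
        # Log the failure instead of a prediction
        payload["error"] = "Feature(s) missing: " + ", ".join(missing)
        return payload, "ERROR"
    payload["prediction"] = prediction
    return payload, "INFO"


def build_filter(project_id, log_name, user_id=None, errors_only=False):
    """Compose a Cloud Logging query for entries written by the custom logger."""
    clauses = [f'logName="projects/{project_id}/logs/{log_name}"']
    if user_id is not None:
        clauses.append(f'jsonPayload.user_id="{user_id}"')
    if errors_only:
        clauses.append("severity>=ERROR")
    return " AND ".join(clauses)


payload, severity = build_log_entry({"user_id": "u123", "device": "ios"})
print(severity, payload["error"])
print(build_filter("my-project", "vertex_ai_recommendations", user_id="u123", errors_only=True))
```

With a real client you would pass the result to `logger.log_struct(payload, severity=severity)`, and paste the filter string into Logs Explorer or hand it to `logging_client.list_entries(filter_=...)`. The project ID `my-project` is a placeholder.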
Scenario: Your fraud detection model’s precision drops from 95% to 80%. You need to detect this automatically.
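Precision can only be computed once ground-truth labels arrive (for example, confirmed fraud cases), so it is usually checked from a labeled feedback pipeline while drift metrics serve as an earlier signal. A minimal sketch, assuming you join predictions with labels yourself; `precision_alert`, its 0.95 baseline, and its 0.05 tolerance are illustrative choices, not Vertex AI settings.

```python
def precision(pairs):
    """pairs: iterable of (predicted_fraud, actually_fraud) booleans."""
    tp = fp = 0
    for predicted, actual in pairs:
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
    flagged = tp + fp
    return tp / flagged if flagged else None  # undefined when nothing was flagged


def precision_alert(pairs, baseline=0.95, tolerance=0.05):
    """True if precision fell more than `tolerance` below the baseline."""
    p = precision(pairs)
    return p is not None and p < baseline - tolerance


# 8 true positives and 2 false positives -> precision 0.8
labeled = [(True, True)] * 8 + [(True, False)] * 2 + [(False, False)] * 5
print(precision(labeled), precision_alert(labeled))
```

The 95% to 80% drop in the scenario would trip this check, since 0.8 is below 0.95 - 0.05.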
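Steps:
1. Enable Vertex AI Model Monitoring:
   - In Vertex AI > Model Monitoring, create a monitoring job for the endpoint.
   - Select:
     - Objective: training-serving skew detection (serving data vs. the training data baseline) and/or prediction drift detection (serving data vs. earlier serving windows).
     - Thresholds: a per-feature distance threshold (0.3 by default). Vertex AI measures L-infinity distance for categorical features and Jensen-Shannon divergence for numerical features.
     - Schedule: a monitoring interval such as hourly or daily.
   - Precision itself needs ground-truth labels, so skew and drift act as early warnings that it may be slipping.
2. Set Alerting:
   - In the monitoring job, send anomaly alerts by email or to Cloud Monitoring notification channels (for example Slack or PagerDuty).
   - For endpoint health, go to Cloud Monitoring > Alerting > Create Policy and add conditions on the endpoint's latency and error metrics.
3. Visualize in Dashboards:
   - Create a Cloud Monitoring dashboard with:
     - Latency (p99).
     - Error rate.
     - Drift scores (L-infinity distance, Jensen-Shannon divergence).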
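Vertex AI Model Monitoring compares each feature's serving distribution with its baseline using L-infinity distance (categorical features) or Jensen-Shannon divergence (numerical features), and flags features whose score exceeds a threshold (0.3 by default). The sketch below computes both distances in plain Python so the threshold has something concrete to compare against. The feature names and data are made up, and Vertex AI's own binning and normalization may differ, so treat the numbers as illustrative.

```python
import math
from collections import Counter


def l_infinity(baseline_values, serving_values):
    """Largest absolute gap between categorical value frequencies."""
    p, q = Counter(baseline_values), Counter(serving_values)
    n_p, n_q = len(baseline_values), len(serving_values)
    return max(abs(p[k] / n_p - q[k] / n_q) for k in set(p) | set(q))


def jensen_shannon(p_counts, q_counts):
    """JS divergence (log base 2, so 0..1) between histograms with the same bins."""
    p = [c / sum(p_counts) for c in p_counts]
    q = [c / sum(q_counts) for c in q_counts]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Bins where x is 0 contribute nothing; y > 0 whenever x > 0 here
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return (kl(p, m) + kl(q, m)) / 2


def features_over_threshold(scores, threshold=0.3):
    """Return the features whose distance exceeds the alert threshold."""
    return sorted(name for name, score in scores.items() if score > threshold)


scores = {
    "payment_type": l_infinity(["card", "card", "cash", "card"],
                               ["cash", "cash", "card", "cash"]),
    "amount_bucket": jensen_shannon([40, 30, 20, 10], [38, 31, 21, 10]),
}
print(scores, features_over_threshold(scores))
```

The categorical feature flips from mostly `card` to mostly `cash` (distance 0.5) and is flagged, while the numerical histogram barely moves and stays well under 0.3.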
Scenario: A bank’s loan approval model rejects an applicant. The applicant requests an explanation (GDPR "right to explanation").
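Steps:
1. Configure Explainability When Uploading the Model:
   - When you upload (import) the model to Vertex AI, attach explanation parameters with an attribution method such as `sampled_shapley_attribution` or `integrated_gradients_attribution`, plus explanation metadata describing the model's inputs and outputs.
   - Deploy the model to an endpoint; it can then serve explanation requests as well as predictions.
2. Request Explanations at Inference:
```python
from google.cloud import aiplatform

endpoint = aiplatform.Endpoint("projects/PROJECT/locations/REGION/endpoints/ENDPOINT_ID")
input_data = {"income": 52000, "loan_amount": 15000}  # hypothetical feature names
response = endpoint.explain(instances=[input_data])
# Per-feature attribution scores for the first instance
print(response.explanations[0].attributions[0].feature_attributions)
```
3. Log Explanations for Compliance:
   - Store explanations in Cloud Logging or BigQuery for audits.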
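For the audit step, raw attributions are usually condensed into a short list of reasons and stored with the decision. A minimal sketch, assuming you have already converted the endpoint's feature attributions into a plain `dict` of feature name to score, and that the model outputs a rejection (risk) score so positive attributions push toward rejection. The feature names and the helpers `top_reasons` and `audit_record` are hypothetical.

```python
import json
from datetime import datetime, timezone


def top_reasons(attributions, k=2):
    """Names of the k features that pushed the score up the most."""
    positive = [(name, score) for name, score in attributions.items() if score > 0]
    positive.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in positive[:k]]


def audit_record(application_id, decision, attributions, k=2):
    """One JSON line for Cloud Logging or a BigQuery audit table."""
    return json.dumps(
        {
            "application_id": application_id,
            "decision": decision,
            "top_reasons": top_reasons(attributions, k),
            "attributions": attributions,
            "logged_at": datetime.now(timezone.utc).isoformat(),
        },
        sort_keys=True,
    )


# Hypothetical attributions for a model whose output is a rejection score
attributions = {
    "debt_to_income": 0.42,
    "missed_payments": 0.31,
    "years_employed": -0.10,
    "loan_amount": 0.05,
}
print(" + ".join(top_reasons(attributions)))
print(audit_record("app-001", "rejected", attributions))
```

Keeping the full attribution map next to the short reason list lets an auditor check the summary against the underlying scores later.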
"Vertex AI Model Monitoring vs. Vertex Explainable AI?" Model Monitoring watches a deployed model's input and prediction distributions for skew and drift across many requests, while Explainable AI attributes an individual prediction to its input features.
Key Constraints:
Explainable AI supports tabular data and images (not text).
"Which Service?" Scenarios:
A fintech company’s fraud detection model is deployed on Vertex AI. They need to comply with GDPR and provide explanations for rejected transactions. Which GCP service should they use?
- Answer: Vertex Explainable AI (provides feature attributions for individual predictions).
- Why not Vertex AI Model Monitoring? It detects drift, not explanations.

A data scientist notices that their Vertex AI endpoint’s latency has increased from 100 ms to 500 ms. They want to identify the bottleneck. Which two GCP services should they use?
- Answer: Cloud Trace (distributed tracing) + Cloud Profiler (CPU/heap analysis).
- Why not Cloud Logging? Logs won’t show latency breakdowns.

A retail company’s recommendation model is experiencing feature drift (user behavior changed after the holiday season). They want to detect this automatically and alert the ML team. Which GCP service should they configure?
- Answer: Vertex AI Model Monitoring (tracks drift/skew) + Cloud Monitoring alerts.
- Why not Cloud Logging? Logs won’t calculate drift metrics.