Study Guide: Cloud ML - Google Cloud Professional Machine Learning Engineer: Training Options (Vertex AI Workbench, Custom Training, AutoML, BigQuery ML)
Source: https://www.fatskills.com/hesi/chapter/cloud-ml-cert-gcp-ml-training-options-vertex-ai-workbench-custom-training-automl-bigquery-ml

By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.

What This Is

Training options in GCP determine how you build, tune, and deploy ML models: full control (Custom Training), speed (AutoML), or SQL-based simplicity (BigQuery ML). Real-world scenario: A retail chain wants to predict customer churn. Analysts use BigQuery ML to train a logistic regression model directly on transaction data (SQL only, no Python), while data scientists run a Vertex AI Custom Training job with PyTorch for higher accuracy. Meanwhile, marketers use AutoML Tables to train and deploy a model without writing code.


Key Terms & Services

  • Vertex AI Workbench: GCP’s managed JupyterLab environment for ML development, with built-in integrations to BigQuery, Cloud Storage, and Vertex AI services. Best for interactive exploration, prototyping, and debugging.
  • Vertex AI Custom Training: Fully managed service to run containerized training jobs (TensorFlow, PyTorch, XGBoost) on GCP compute. Supports distributed training, hyperparameter tuning, and custom Docker images.
  • Vertex AI AutoML: No-code/low-code service to train high-quality models for vision, NLP, tabular data, and more. Automates feature engineering, model selection, and hyperparameter tuning.
  • BigQuery ML: SQL-based ML tool to create and deploy models directly in BigQuery. Supports regression, classification, clustering, and forecasting. Ideal for analysts who know SQL but not Python.
  • Vertex AI Training with Pre-built Containers: GCP-provided Docker images for TensorFlow, PyTorch, and scikit-learn. Reduces setup time for custom training jobs.
  • Vertex AI Hyperparameter Tuning: Automated search for optimal hyperparameters (e.g., learning rate, batch size) using Bayesian optimization or grid/random search.
  • Vertex AI Pipelines: Orchestrates ML workflows (training, evaluation, deployment) using Kubeflow Pipelines or TensorFlow Extended (TFX). Ensures reproducibility and CI/CD for ML.
  • Cloud Storage (GCS): Object storage for training data, model artifacts, and checkpoints. Most Vertex AI training jobs read their inputs from it or write their outputs to it.
  • AI Platform (Legacy): Older GCP ML service (now part of Vertex AI). Exam trap: Know that Vertex AI replaced AI Platform, but some questions may still reference it.
  • Distributed Training: Scaling training across multiple GPUs/TPUs or machines (e.g., using tf.distribute.Strategy or PyTorch’s DistributedDataParallel; DataParallel only covers multiple GPUs on a single machine). Supported in Vertex AI Custom Training.
  • Model Explainability: Tools like Vertex AI Explainable AI to interpret model predictions (e.g., feature importance, SHAP values). Often tested in exam scenarios about compliance or debugging.
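
To make the hyperparameter tuning entry concrete, here is a minimal sketch of random search, one of the strategies named above. It is not the Vertex AI service or its API: `toy_validation_loss` is a made-up stand-in for a real training run, and the search ranges are assumptions chosen for illustration.

```python
import math
import random


def toy_validation_loss(learning_rate: float, batch_size: int) -> float:
    """Hypothetical stand-in for a training run; lowest near lr=0.01, batch_size=64."""
    return (math.log10(learning_rate) + 2) ** 2 + ((batch_size - 64) / 64) ** 2


def random_search(num_trials: int, seed: int = 0):
    """Sample hyperparameters at random and return (best_loss, best_params)."""
    rng = random.Random(seed)  # seeded so the search is reproducible
    trials = []
    for _ in range(num_trials):
        params = {
            # Learning rate sampled on a log scale between 1e-4 and 1e-1.
            "learning_rate": 10 ** rng.uniform(-4, -1),
            "batch_size": rng.choice([16, 32, 64, 128]),
        }
        trials.append((toy_validation_loss(**params), params))
    # Compare on the loss only, so ties never fall back to comparing dicts.
    return min(trials, key=lambda trial: trial[0])


best_loss, best_params = random_search(num_trials=20)
print(best_loss, best_params)
```

A managed tuning service runs each trial as a separate training job and can use Bayesian optimization to pick the next trial from earlier results, which usually needs fewer trials than sampling blindly.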

Step-by-Step / Process Flow

1. Choosing the Right Training Option

Use Case | Service | Why?
Quick prototyping in SQL | BigQuery ML | Analysts can train models with CREATE MODEL statements.
No-code/low-code ML | Vertex AI AutoML | Business users train models without writing Python.
Full control over training | Vertex AI Custom Training | Data scientists need custom architectures (e.g., transformers, CNNs).
Interactive development | Vertex AI Workbench | JupyterLab environment with GCP integrations (BigQuery, GCS, Vertex AI).
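
The selection logic in the table can be sketched as a small decision function. This is a study aid under simplifying assumptions, not an official decision tree: the `Requirements` fields and the order of the checks are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical summary of what a team needs from a training option."""
    data_in_bigquery: bool = False
    team_writes_python: bool = True
    needs_custom_architecture: bool = False
    interactive_exploration: bool = False


def choose_training_option(req: Requirements) -> str:
    """Map requirements to the service suggested by the table."""
    if req.needs_custom_architecture:
        return "Vertex AI Custom Training"
    if req.interactive_exploration:
        return "Vertex AI Workbench"
    if req.data_in_bigquery and not req.team_writes_python:
        return "BigQuery ML"
    if not req.team_writes_python:
        return "Vertex AI AutoML"
    return "Vertex AI Custom Training"


analysts = Requirements(data_in_bigquery=True, team_writes_python=False)
print(choose_training_option(analysts))
```

Exam questions usually hide these same signals in the scenario text: who the users are, where the data lives, and how much control over the model is required.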

2. Training a Model with Vertex AI Custom Training

  1. Prepare data: Upload training/evaluation data to Cloud Storage (GCS) in a supported format (CSV, TFRecord, Parquet).
  2. Write training code: Use a framework (TensorFlow, PyTorch) and containerize it (or use a pre-built container).
  3. Define job specs: In the GCP Console or gcloud, specify:
       • Machine type (e.g., n1-standard-4 for CPU, n1-standard-16 + GPU for deep learning).
       • Container image (custom or pre-built).
       • Hyperparameters (passed as command-line args or via Vertex AI Hyperparameter Tuning).
  4. Submit the job: Run via gcloud ai custom-jobs create or the Console. Vertex AI handles provisioning and scaling.
  5. Monitor training: View logs in Cloud Logging and metrics in Vertex AI Experiments.
  6. Deploy model: Register the trained model in Vertex AI Model Registry and deploy to an endpoint.
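
The job spec from the "Define job specs" step can be sketched in Python as the worker pool structure that, to the best of my understanding, the Vertex AI SDK accepts. The image URI, project, bucket, accelerator type, and argument names below are hypothetical placeholders, and the commented-out submission lines are untested here and assume the google-cloud-aiplatform package plus valid credentials.

```python
from typing import Any, Dict, List, Optional


def build_worker_pool_specs(
    image_uri: str,
    machine_type: str = "n1-standard-4",
    accelerator_type: Optional[str] = None,
    accelerator_count: int = 0,
    replica_count: int = 1,
    args: Optional[List[str]] = None,
) -> List[Dict[str, Any]]:
    """Build a single worker pool spec for a containerized training job."""
    machine_spec: Dict[str, Any] = {"machine_type": machine_type}
    if accelerator_type and accelerator_count > 0:
        # Only attach accelerator fields when a GPU is actually requested.
        machine_spec["accelerator_type"] = accelerator_type
        machine_spec["accelerator_count"] = accelerator_count
    return [
        {
            "machine_spec": machine_spec,
            "replica_count": replica_count,
            "container_spec": {"image_uri": image_uri, "args": list(args or [])},
        }
    ]


specs = build_worker_pool_specs(
    image_uri="us-docker.pkg.dev/my-project/my-repo/trainer:latest",  # hypothetical image
    machine_type="n1-standard-16",
    accelerator_type="NVIDIA_TESLA_T4",  # assumed accelerator choice
    accelerator_count=1,
    args=["--learning_rate=0.01", "--batch_size=64"],  # hypothetical hyperparameter flags
)
print(specs)

# Assumed submission path (not executed here):
# from google.cloud import aiplatform
# aiplatform.init(project="my-project", location="us-central1",
#                 staging_bucket="gs://my-bucket")
# aiplatform.CustomJob(display_name="churn-train", worker_pool_specs=specs).run()
```

Hyperparameters travel as container arguments, which is why the training script should parse them from the command line.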

3. Training a Model with AutoML

  1. Upload data: Import data from BigQuery, GCS, or local files into a Vertex AI Dataset.
  2. Select model type: Choose AutoML Tables, Vision, NLP, or Video based on data type.
  3. Configure training: Set budget (node-hours) and optional test/train split.
  4. Train: Vertex AI automatically:
       • Performs feature engineering (e.g., embeddings for text, resizing for images).
       • Tests multiple architectures (e.g., EfficientNet for images, BERT for text).
       • Optimizes hyperparameters.
  5. Evaluate: Review metrics (precision, recall, AUC) in the Console.
  6. Deploy: One-click deployment to a Vertex AI Endpoint for online predictions.

4. Training a Model with BigQuery ML

  1. Write SQL: Use CREATE MODEL to train directly on BigQuery tables:
       CREATE MODEL `project.dataset.churn_model`
       OPTIONS(model_type='LOGISTIC_REG', input_label_cols=['churned']) AS
       SELECT * FROM `project.dataset.customer_data`;
  2. Evaluate: Run ML.EVALUATE to check metrics:
       SELECT * FROM ML.EVALUATE(MODEL `project.dataset.churn_model`);
  3. Predict: Use ML.PREDICT for batch inference:
       SELECT * FROM ML.PREDICT(MODEL `project.dataset.churn_model`,
         (SELECT * FROM `project.dataset.new_customers`));
  4. Export (optional): Export the model to GCS for deployment in Vertex AI.
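
The optional export step has no statement in the steps above, so here is a small sketch that renders one. It assumes BigQuery's EXPORT MODEL statement with a URI option; the bucket path is a hypothetical placeholder, and the helper only builds the SQL string rather than running it.

```python
def export_model_sql(model: str, gcs_uri: str) -> str:
    """Render an EXPORT MODEL statement for a BigQuery ML model."""
    if not gcs_uri.startswith("gs://"):
        raise ValueError("EXPORT MODEL needs a Cloud Storage URI starting with gs://")
    return f"EXPORT MODEL `{model}` OPTIONS(URI = '{gcs_uri}');"


sql = export_model_sql(
    "project.dataset.churn_model",
    "gs://my-bucket/models/churn_model",  # hypothetical bucket path
)
print(sql)
```

The statement can be pasted into the BigQuery console, or run from Python with a BigQuery client if the google-cloud-bigquery package is available. The exported artifacts can then be registered in Vertex AI for online serving.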

Common Mistakes

Mistake 1: Using AutoML for Large-Scale Custom Models

  • Mistake: Assuming AutoML can handle complex architectures (e.g., transformers with 1B+ parameters) or custom loss functions.
  • Correction: AutoML is for no-code/low-code use cases. For custom models, use Vertex AI Custom Training with a custom container.

Mistake 2: Ignoring Data Format Requirements

  • Mistake: Uploading data in unsupported formats (e.g., JSON for AutoML Tables, which only accepts CSV/BigQuery).
  • Correction: Check GCP’s data format docs. For AutoML Tables, use CSV or BigQuery; for Custom Training, use TFRecord/Parquet for better performance.
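
If the source data is newline-delimited JSON, one way to meet the CSV requirement is a small conversion step before upload. This is a minimal sketch using only the standard library; the field names are invented for the example, and a real dataset would likely need type and missing-value handling on top of it.

```python
import csv
import io
import json


def json_lines_to_csv(json_lines: str) -> str:
    """Convert newline-delimited JSON objects into CSV text with a header row."""
    rows = [json.loads(line) for line in json_lines.splitlines() if line.strip()]
    # Union of all keys, sorted so the column order is deterministic.
    fieldnames = sorted({key for row in rows for key in row})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # keys missing from a row are written as empty cells
    return out.getvalue()


sample = '{"id": 1, "churned": 0}\n{"id": 2, "churned": 1}\n'  # hypothetical records
print(json_lines_to_csv(sample))
```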

Mistake 3: Overlooking Cost Controls in AutoML

  • Mistake: Letting AutoML run for 100+ node-hours without setting a budget, leading to unexpected costs.
  • Correction: Always set a node-hour budget in AutoML (e.g., 20 hours). Monitor costs in Cloud Billing.
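
A quick budget calculation helps before starting a job. The sketch below converts node-hours to milli node-hours, which is, as far as I know, the unit the Vertex AI SDK uses for its budget parameter, and estimates the worst-case spend. The price used is a made-up number for illustration, not a published rate.

```python
def to_milli_node_hours(node_hours: float) -> int:
    """Convert a node-hour budget to milli node-hours (1 node-hour = 1000)."""
    return int(round(node_hours * 1000))


def max_training_cost(node_hours: float, price_per_node_hour: float) -> float:
    """Upper bound on training spend if the whole budget is consumed."""
    return node_hours * price_per_node_hour


budget_hours = 20
hypothetical_price = 3.0  # assumed price per node-hour, for illustration only

print(to_milli_node_hours(budget_hours))
print(max_training_cost(budget_hours, hypothetical_price))
```

Training can stop early if the model stops improving, so the actual cost may be lower than this bound.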

Mistake 4: Using BigQuery ML for Real-Time Predictions

  • Mistake: Trying to use ML.PREDICT for low-latency inference (e.g., fraud detection).
  • Correction: BigQuery ML is for batch predictions. For real-time, deploy the model to a Vertex AI Endpoint or use Cloud Functions with the BigQuery ML model exported to GCS.

Mistake 5: Not Leveraging Vertex AI Workbench for Debugging

  • Mistake: Debugging training jobs by SSH’ing into VMs or manually checking logs.
  • Correction: Use Vertex AI Workbench for interactive debugging. It integrates with Cloud Logging and Vertex AI Experiments for metrics.

Certification Exam Insights

1. Service Selection Traps

  • AutoML vs. Custom Training:
      • AutoML is for business users or quick prototyping (e.g., marketing teams predicting customer lifetime value).
      • Custom Training is for data scientists needing control (e.g., fine-tuning a BERT model for legal documents).
  • BigQuery ML vs. Vertex AI:
      • BigQuery ML is for SQL users who want to train models without leaving BigQuery (e.g., analysts predicting sales).
      • Vertex AI is for scalable, production-grade ML (e.g., deploying a model behind an API).

2. Key Constraints

  • AutoML Limits:
      • Training budget is capped per job, and the cap depends on the data type (for tabular models the documented range is 1 to 72 node-hours).
      • No custom loss functions or architectures.
  • BigQuery ML Limits:
      • No GPU support (CPU-only).
      • Model size limit (~100MB for most models).
  • Custom Training Costs:
      • GPU/TPU costs can add up quickly. Use preemptible VMs for cost savings (but jobs may be interrupted).
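
Because preemptible VMs can be interrupted, the training code should be able to resume from a checkpoint. Below is a minimal sketch of that pattern with a simulated interruption. The checkpoint here is a local JSON file holding only a step counter; a real job would save model and optimizer state, typically to Cloud Storage. The `stop_after` parameter exists only to fake a preemption.

```python
import json
import os
import tempfile
from typing import Optional


def train_with_checkpoints(
    total_steps: int, checkpoint_path: str, stop_after: Optional[int] = None
) -> int:
    """Run (or resume) training and return the last completed step."""
    step = 0
    if os.path.exists(checkpoint_path):
        # Resume from the last saved step instead of starting over.
        with open(checkpoint_path) as f:
            step = json.load(f)["step"]
    steps_run = 0
    while step < total_steps:
        if stop_after is not None and steps_run >= stop_after:
            return step  # simulated preemption
        step += 1  # placeholder for one real training step
        steps_run += 1
        with open(checkpoint_path, "w") as f:
            json.dump({"step": step}, f)
    return step


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "checkpoint.json")
    first_run = train_with_checkpoints(10, path, stop_after=4)  # interrupted early
    second_run = train_with_checkpoints(10, path)  # resumes and finishes
print(first_run, second_run)
```

Saving after every step is wasteful in practice; checkpointing every few hundred steps or every epoch trades a little repeated work for much less I/O.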

3. Tricky Scenarios

  • Scenario: A team needs to train a model on sensitive healthcare data with HIPAA compliance. Which service?
      • Answer: Vertex AI Custom Training (supports VPC-SC and CMEK for encryption). AutoML and BigQuery ML may not meet compliance requirements for all use cases.
  • Scenario: A company wants to retrain a model weekly with new data. Which service supports automation?
      • Answer: Vertex AI Pipelines (orchestrates training, evaluation, and deployment). BigQuery ML can be automated via Cloud Scheduler + Cloud Functions, but Vertex AI Pipelines is more robust.
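
The weekly retraining scenario boils down to a train, evaluate, deploy sequence with a quality gate. The sketch below shows that control flow in plain Python. It is not the Kubeflow Pipelines or Vertex AI Pipelines API: the step functions are hypothetical stubs, and the AUC threshold is an assumed example value.

```python
from typing import Any, Callable


def run_retraining(
    train: Callable[[], Any],
    evaluate: Callable[[Any], float],
    deploy: Callable[[Any], None],
    min_auc: float,
) -> bool:
    """Train a model and deploy it only if it clears the evaluation gate."""
    model = train()
    auc = evaluate(model)
    if auc < min_auc:
        return False  # keep the currently deployed model
    deploy(model)
    return True


deployed = []
was_deployed = run_retraining(
    train=lambda: "model-v2",      # stub: pretend a model was trained
    evaluate=lambda model: 0.91,   # stub: pretend the AUC was 0.91
    deploy=deployed.append,        # stub: record the deployment
    min_auc=0.85,                  # assumed threshold
)
print(was_deployed, deployed)
```

A pipeline service adds what this sketch lacks: scheduling, retries, artifact tracking, and a record of which data and code produced each model.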

4. "Which Service?" Questions

  • Question: A data analyst wants to train a model on a 100GB BigQuery table without writing Python. Which service?
      • Answer: BigQuery ML (SQL-based training).
  • Question: A startup needs to deploy a model in 1 day with minimal ML expertise. Which service?
      • Answer: Vertex AI AutoML (no-code training and deployment).

Quick Check Questions

Question 1

A retail company wants to predict customer churn using a logistic regression model on a BigQuery table. The team consists of SQL analysts with no Python experience. Which GCP service should they use?
  A) Vertex AI AutoML Tables
  B) Vertex AI Custom Training
  C) BigQuery ML
  D) Vertex AI Workbench

Answer: C) BigQuery ML – Best for SQL users training models directly on BigQuery data.


Question 2

A data scientist needs to fine-tune a PyTorch model with distributed training across 4 GPUs. Which GCP service should they use?
  A) Vertex AI AutoML
  B) Vertex AI Custom Training
  C) BigQuery ML
  D) Vertex AI Workbench

Answer: B) Vertex AI Custom Training – Supports custom containers, distributed training, and GPU scaling.


Question 3

A marketing team wants to quickly deploy a model to predict customer lifetime value (CLV) using a CSV file with 50K rows. They have no ML expertise. Which service should they use?
  A) Vertex AI Custom Training
  B) Vertex AI AutoML Tables
  C) BigQuery ML
  D) Vertex AI Pipelines

Answer: B) Vertex AI AutoML Tables – No-code solution for tabular data with one-click deployment.


Last-Minute Cram Sheet

  1. Vertex AI Workbench = Managed JupyterLab for interactive ML development.
  2. Vertex AI Custom Training = Full control over training (custom containers, GPUs, distributed training).
  3. Vertex AI AutoML = No-code/low-code training for vision, NLP, and tabular data. Node-hour budget is capped per job (the cap varies by data type).
  4. BigQuery ML = Train models in SQL (no Python). No GPU support.
  5. Pre-built containers = GCP-provided Docker images for TensorFlow, PyTorch, scikit-learn.
  6. Hyperparameter tuning = Use Vertex AI Hyperparameter Tuning (Bayesian optimization).
  7. Data formats:
       • AutoML Tables: CSV or BigQuery.
       • Custom Training: TFRecord/Parquet (better performance).
  8. Cost traps:
       • AutoML node-hours can get expensive (set a budget).
       • GPU/TPU costs in Custom Training (use preemptible VMs for savings).
  9. Deployment:
       • BigQuery ML models can be exported to GCS for Vertex AI deployment.
       • AutoML models deploy to Vertex AI Endpoints with one click.
  10. Exam trap: Vertex AI replaced AI Platform, so don’t pick legacy services!