Study Guide: Google Cloud Professional Machine Learning Engineer Certification Exam : Topics You Should Study
Source: https://www.fatskills.com/law/chapter/google-cloud-professional-machine-learning-engineer-certification-exam-topic-you-should-study

By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.

Topics to Study (and links)


- Recall, precision — get this picture into your head permanently (https://en.wikipedia.org/wiki/Precision_and_recall#/media/File:Precisionrecall.svg). Understand the implications of both. How would you increase either value? If you have the picture in mind, it will help you compute what you need.
— you also need to keep in mind the formula for both.
- Accuracy, F-score — as with recall and precision, remember the accuracy formula too. Just know what the F-score is; you probably don’t need to remember its formula.
- Log Loss — (https://www.kaggle.com/dansbecker/what-is-log-loss)
- AUC ROC curve and its use. (https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc)
- Precision-Recall curve (less important but know what it is)
- Based on the formulae, what needs to change to improve recall, precision, accuracy, etc. Think in terms of both the formula and the graph for things like ROC.
- Pub/Sub, Dataflow and pipelines, Dataprep — If you’ve done the Professional Data Engineer (PDE) exam, no additional study is needed.
- BigQuery — This is the central data warehouse for both storing and analyzing structured data. Again, if you’ve done the PDE, you’ll be fine on this section.
- Data cleaning and validation. Think about tools like Dataprep, Dataflow, Dataproc, and cleaning data with code.
- How to deal with missing data — do you remove it? Do you compute (impute) it? In which scenarios would you take each approach?
- Data Normalization — what is this and why would you use it? (https://developers.google.com/machine-learning/data-prep/transform/normalization)
- Transforming Data — numerical and categorical (https://developers.google.com/machine-learning/data-prep/transform/transform-numeric)
- Feature crossing (https://developers.google.com/machine-learning/crash-course/feature-crosses/video-lecture)
- One-hot encoding, hashed one-hot encoding (https://developers.google.com/machine-learning/crash-course/feature-crosses/crossing-one-hot-vectors)
- Binning (https://developers.google.com/machine-learning/crash-course/representation/cleaning-data)
- Metadata management in Kubeflow (https://www.kubeflow.org/docs/components/metadata/)
- Basic TensorFlow (https://www.tensorflow.org/tutorials)
- TFRecords — when are they preferred over options like DataFrames and CSV files? (https://www.tensorflow.org/tutorials/load_data/tfrecord)
- How to optimize for data input? If data is not coming in fast enough, what can you do?
- How to optimize for data processing? If data is coming in fast, but processing is slow, what can you do?
- Use of tf.data.Dataset.prefetch/interleave/cache (https://www.tensorflow.org/guide/data_performance)
- How to work with sensitive data? (https://cloud.google.com/solutions/sensitive-data-and-ml-datasets)
- DLP (https://cloud.google.com/dlp)
- Different types of data encryption
— Format Preserving Encryption (https://en.wikipedia.org/wiki/Format-preserving_encryption)
— encryption with AES-256
— salting
- BigQuery ML — BigQuery by itself is important. All the ML you can do with BQML is also very important. What can you do with BQML? What are its limitations? (https://cloud.google.com/bigquery-ml/docs/introduction)
- BQML — Can it work with other ML libraries? Can you import models from tf, scipy, etc.?
- BQML available algorithms (https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-create)
- AI Platform — what all can you do here? For what kinds of tasks would you look outside for other tools? (https://cloud.google.com/ai-platform)
- Basic algorithms — regression, classification, k-means. What kind of data does each one work with? When would you use them?
- Containerization — learn the basics of Docker and Kubernetes. How do you create images? Why use containers at all?
- Kubeflow pipelines — why and how do you use containerization w.r.t. machine learning pipelines? (https://cloud.google.com/ai-hub/docs/kubeflow-pipeline)
- Which libraries can work with AI Platform and Kubeflow Pipelines? For other libraries, what should you do?
- Regularization methods — what is L0, L1, L2 regularization? When would you use them? (https://developers.google.com/machine-learning/crash-course/regularization-for-simplicity/l2-regularization)
- Develop the ability to look at a scenario and see what kind of regularization to apply.
- Develop the ability to look at a scenario and see what kind of algorithms to apply.
— don’t have to go too deep into any algorithm
- What-If Tool — when do you use it? How do you use it? How do you discover different outcomes? How do you conduct experiments? (https://pair-code.github.io/what-if-tool/)
- Explainable AI — what is this? When would you use this? (https://cloud.google.com/explainable-ai)
- How can Kubeflow do scheduled training? It can use external tools (https://cloud.google.com/solutions/machine-learning/architecture-for-mlops-using-tfx-kubeflow-pipelines-and-cloud-build#triggering-and-scheduling-kubeflow-pipelines) and also a built-in scheduler.
- When do you need to bring in Cloud Composer? (https://cloud.google.com/composer)
- Which of the available options should you choose for different requirements — AutoML, pre-built APIs, AI Platform algorithms, custom algorithms?
- Know a little about TensorFlow Extended — I’m expecting its importance could grow in the future.
- Building ML pipelines — how do you construct a continuous, end-to-end pipeline, starting from data ingestion and ending with making predictions?
- There are references to scenarios for CNNs and DNNs, but not in much detail. So gather a general understanding of when you would use different algorithms or training methods.
- Various AutoML possibilities — Vision, Natural Language, Translation, Tables, Video Intelligence. When should you use the AutoML version and when not?
- How is the AutoML offering different from the API offering? For example, there is also Cloud Vision, Video AI, Cloud Natural Language. When would you use one or the other?
- Splitting data into training, validation, and test sets. (https://developers.google.com/machine-learning/crash-course/training-and-test-sets/splitting-data)
- Issues encountered when splitting is not done correctly.
- Project management around machine learning — how to separate users into groups and projects. This is similar to what you encounter in ACE, PCA, and DevOps.
- Why and how to ensure that your model is current? Data and the model could be getting stale. You will need to create pipelines to ensure that they are updated continuously.
- Strategies for deploying models in production. This is similar to what you would do in DevOps. Know things like canary testing, A/B testing, going from model experimentation to testing to versioning to deployment.
- Hyperparameter tuning and available tools (https://cloud.google.com/ai-platform/training/docs/hyperparameter-tuning-overview)
- Kinds of AutoML Edge models (https://cloud.google.com/vision/automl/docs/train-edge)
- Formatting input for online prediction (https://cloud.google.com/ai-platform/prediction/docs/online-predict#formatting_your_input_for_online_prediction)
- Collaborative filtering, Feature/family filtering — what are they and where are they used? (https://en.wikipedia.org/wiki/Collaborative_filtering)
- Quantization
- Feature Attributions — Sampled Shapley, integrated gradients, and XRAI — when do you use these? (https://cloud.google.com/blog/products/ai-machine-learning/explaining-model-predictions-structured-data, https://cloud.google.com/ai-platform/prediction/docs/ai-explanations/overview#compare-methods)
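
The recall, precision, accuracy, F-score, and log loss items at the top of this list come down to a few formulas over confusion-matrix counts. The sketch below is a minimal, standard-library illustration rather than an official reference: the function names, the toy labels and scores, and the thresholds are all made up for this example. It shows how moving the decision threshold trades precision against recall.

```python
import math


def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, FN, TN when predicting positive for score >= threshold."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, accuracy and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


def log_loss(y_true, y_prob, eps=1e-15):
    """Binary log loss: -mean(y*log(p) + (1-y)*log(1-p))."""
    total = 0.0
    for label, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += label * math.log(p) + (1 - label) * math.log(1 - p)
    return -total / len(y_true)


# Toy data (hypothetical): four positives, four negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

for threshold in (0.25, 0.5, 0.75):
    metrics = classification_metrics(*confusion_counts(y_true, scores, threshold))
    print(threshold, metrics)

print("log loss:", log_loss(y_true, scores))
```

On this toy data, raising the threshold from 0.5 to 0.75 lifts precision from 0.75 to 1.0 while recall drops from 0.75 to 0.5; lowering it to 0.25 does the opposite, pushing recall to 1.0 at the cost of precision.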