Home > Google Cloud Certified Professional Data Engineer > Quizzes > Google Certified Professional Data Engineer: Choosing Training and Serving Infrastructure
Google Certified Professional Data Engineer: Choosing Training and Serving Infrastructure
Fast practice, instant feedback. Timer auto-submits when time’s up.
Avg score: 14% Most missed: “A study of global climate change is building a network of environmental sensors …”
Google Certified Professional Data Engineer: Choosing Training and Serving Infrastructure
Time left 00:00
10 Questions

1. You are developing a machine learning model that will predict failures in high-precision machining equipment. The equipment has hundreds of IoT sensors that send telemetry data every second. Thousands of the machines are in use in a variety of operating conditions. A year’s worth of data is available for model training. You plan to use TensorFlow, a synchronous training strategy, and TPUs. Which of the following strategies would you use?
2. You have developed a TensorFlow model using only the most basic TensorFlow operations and no custom operations. You have a large volume of data available for training, but by your estimates it could take several weeks to train the model using a 16 vCPU Compute Engine instance. Which of the following should you try instead?
3. Your client has developed a machine learning model that detects anomalies in equity trading time-series data. The model runs as a service in a Google Kubernetes Engine (GKE) cluster deployed in the us-west-1 region. A number of financial institutions in New York and London are interested in licensing the technology, but they are concerned that the total time required to make a prediction is longer than they can tolerate. The distance between the serving infrastructure and New York is about 4,800 kilometers, and the distance to London is about 8,000 kilometers. This is an example of what kind of problem with serving a machine learning model?
4. You have developed a machine learning model that uses a specialized Fortran library that is optimized for highly parallel, high-precision arithmetic. You only have access to the compiled code and cannot make any changes to source code. You want to use an accelerator to reduce the training time of your model. Which of the following options would you try first?
5. In the Google Cloud Platform IoT reference model, which of the following GCP services is used for stream processing?
6. A study of global climate change is building a network of environmental sensors distributed across the globe. Sensors are deployed in groups of 12 sensors and a gateway. An analytics pipeline is implemented in GCP. Data will be ingested by Cloud Pub/Sub and analyzed using the stream processing capabilities of Cloud Dataflow. The analyzed data will be stored in BigQuery for further analysis by scientists. The bandwidth between the gateways and the GCP is limited and sometimes unreliable. The scientists have determined that they need the average temperature, pressure, and humidity measurements of each group of 12 sensors for a one-minute period. Each sensor sends data to the gateway every second. This generates 720 data points (12 sensors × 60 seconds) every minute for each of the three measurements. The scientists only need the one-minute average for temperature, pressure, and humidity. What data processing strategy would you implement?
7. A startup is developing a product for autonomous vehicle manufacturers that will enable its vehicles to detect objects better in adverse weather conditions. The product uses a machine learning model built on TensorFlow. Which of the following options would you choose to serve this model?
8. In the Google Cloud Platform IoT reference model, which of the following GCP services is used for ingestion?
9. Your DevOps team is deploying an IoT system to monitor and control environmental conditions in your building. You are using a standard IoT architecture. Which of the following components would you not use?
10. You are in the early stages of developing a machine learning model using a framework that requires high-precision arithmetic and benefits from massive parallelization. Your data set fits within 32 GB of memory. You want to use Jupyter Notebooks to build the model iteratively and analyze results. What kind of infrastructure would you use?