Latest 2025 Realistic Verified Professional-Machine-Learning-Engineer Dumps - 100% Free Professional-Machine-Learning-Engineer Exam Dumps
Get 2025 Updated Free Google Professional-Machine-Learning-Engineer Exam Questions and Answers
NEW QUESTION # 23
You recently created a new Google Cloud project. After testing that you can submit a Vertex AI Pipeline job from Cloud Shell, you want to run your code from a Vertex AI Workbench user-managed notebook instance. You created the instance and ran the code, but this time the job fails with an insufficient permissions error. What should you do?
- A. Ensure that the Vertex AI Workbench instance is assigned the Identity and Access Management (IAM) Notebooks Runner role.
- B. Ensure that the Workbench instance that you created is in the same region as the Vertex AI Pipelines resources you will use.
- C. Ensure that the Vertex AI Workbench instance is assigned the Identity and Access Management (IAM) Vertex AI User role.
- D. Ensure that the Vertex AI Workbench instance is on the same subnetwork as the Vertex AI Pipelines resources that you will use.
Answer: C
NEW QUESTION # 24
Your organization's call center has asked you to develop a model that analyzes customer sentiments in each call. The call center receives over one million calls daily, and data is stored in Cloud Storage. The data collected must not leave the region in which the call originated, and no Personally Identifiable Information (PII) can be stored or analyzed. The data science team has a third-party tool for visualization and access which requires a SQL ANSI-2011 compliant interface. You need to select components for data processing and for analytics. How should the data pipeline be designed?
- A. 1 = Cloud Function, 2 = Cloud SQL
- B. 1 = Dataflow, 2 = Cloud SQL
- C. 1 = Dataflow, 2 = BigQuery
- D. 1 = Pub/Sub, 2 = Datastore
Answer: C
NEW QUESTION # 25
Your team frequently creates new ML models and runs experiments. Your team pushes code to a single repository hosted on Cloud Source Repositories. You want to create a continuous integration pipeline that automatically retrains the models whenever there is any modification of the code. What should be your first step to set up the CI pipeline?
- A. Configure a Cloud Function that builds the repository each time a new branch is created.
- B. Configure a Cloud Build trigger with the event set as "Pull Request"
- C. Configure a Cloud Function that builds the repository each time there is a code change.
- D. Configure a Cloud Build trigger with the event set as "Push to a branch"
Answer: D
Explanation:
Cloud Build is a service that executes your builds on Google Cloud infrastructure. Cloud Build can import source code from Cloud Source Repositories, Cloud Storage, GitHub, Bitbucket, or any publicly hosted Git repository. Cloud Build lets you create and manage build triggers, which are automated workflows that run whenever a code change is pushed to your source repository.
You can use Cloud Build triggers to automatically retrain your ML models whenever the code is modified. Therefore, option D is the best first step for setting up the CI pipeline: a Cloud Build trigger with the event set as "Push to a branch" runs whenever a new commit is pushed to a specific branch of your source repository. The other options are not relevant or optimal for this scenario. References:
* Cloud Build
* Cloud Source Repositories
* Google Professional Machine Learning Certification Exam 2023
* Latest Google Professional Machine Learning Engineer Actual Free Exam Questions
NEW QUESTION # 26
You recently trained an XGBoost model on tabular data. You plan to expose the model for internal use as an HTTP microservice. After deployment, you expect a small number of incoming requests. You want to productionize the model with the least amount of effort and latency. What should you do?
- A. Use a prebuilt XGBoost Vertex container to create a model and deploy it to Vertex AI Endpoints.
- B. Build a Flask-based app. Package the app in a custom container on Vertex AI and deploy it to Vertex AI Endpoints.
- C. Deploy the model to BigQuery ML by using CREATE MODEL with the BOOSTED_TREE_REGRESSOR statement, and invoke the BigQuery API from the microservice.
- D. Build a Flask-based app. Package the app in a Docker image and deploy it to Google Kubernetes Engine in Autopilot mode.
Answer: A
Explanation:
XGBoost is a popular open-source library that provides a scalable and efficient implementation of gradient boosted trees. You can use XGBoost to train a classification or regression model on tabular data. You can also use Vertex AI to productionize the model and expose it for internal use as an HTTP microservice. Vertex AI is a service that allows you to create and train ML models using Google Cloud technologies. You can use a prebuilt XGBoost Vertex container to create a model and deploy it to Vertex AI Endpoints. A prebuilt Vertex container is a container image that contains the dependencies and libraries needed to run a specific ML framework, such as XGBoost. You can use a prebuilt Vertex container to simplify the model creation and deployment process, without having to build your own custom container. Vertex AI Endpoints is a service that allows you to serve your ML models online and scale them automatically. You can use Vertex AI Endpoints to deploy the model from the prebuilt Vertex container and expose it as an HTTP microservice. You can also configure the endpoint to handle a small number of incoming requests, and optimize the latency and cost of serving the model. By using a prebuilt XGBoost Vertex container and Vertex AI Endpoints, you can productionize the model with the least amount of effort and latency. References:
* XGBoost documentation
* Vertex AI documentation
* Prebuilt Vertex container documentation
* Vertex AI Endpoints documentation
* Preparing for Google Cloud Certification: Machine Learning Engineer Professional Certificate
NEW QUESTION # 27
You need to analyze user activity data from your company's mobile applications. Your team will use BigQuery for data analysis, transformation, and experimentation with ML algorithms. You need to ensure real-time ingestion of the user activity data into BigQuery. What should you do?
- A. Run a Dataflow streaming job to ingest the data into BigQuery.
- B. Configure Pub/Sub and a Dataflow streaming job to ingest the data into BigQuery.
- C. Configure Pub/Sub to stream the data into BigQuery.
- D. Run an Apache Spark streaming job on Dataproc to ingest the data into BigQuery.
Answer: A
Explanation:
The best option to ensure real-time ingestion of the user activity data into BigQuery is to run a Dataflow streaming job to ingest the data into BigQuery. Dataflow is a fully managed service that can handle both batch and stream processing of data, and can integrate seamlessly with BigQuery and other Google Cloud services. Dataflow can also use Apache Beam as the programming model, which provides a unified and portable API for developing data pipelines. By using Dataflow, you can avoid the complexity and overhead of managing your own infrastructure, and focus on the logic and transformation of your data. Dataflow can also handle various types of data, such as structured, unstructured, or binary data, and can apply windowing, aggregation, and other operations on the data streams.
The other options are not optimal for the following reasons:
C. Configuring Pub/Sub to stream the data into BigQuery is not a good option, as Pub/Sub is a messaging service that can publish and subscribe to data streams, but cannot perform any transformation or processing on the data. Pub/Sub can be used as a source or a sink for Dataflow, but not as a standalone solution for ingesting data into BigQuery.
D. Running an Apache Spark streaming job on Dataproc to ingest the data into BigQuery is not a good option, as it requires setting up and managing your own cluster of virtual machines, which can increase the cost and complexity of your solution. Moreover, Apache Spark is not natively integrated with BigQuery, and requires using connectors or intermediate storage to write data to BigQuery, which can introduce latency and inefficiency.
B. Configuring Pub/Sub and a Dataflow streaming job to ingest the data into BigQuery is not a bad option, but it is not necessary, as Dataflow can directly read data from the mobile applications without using Pub/Sub as an intermediary. Using Pub/Sub can add an extra layer of abstraction and reliability, but it can also increase the cost and complexity of your solution, and introduce some delay in the data ingestion.
Reference:
Professional ML Engineer Exam Guide
Preparing for Google Cloud Certification: Machine Learning Engineer Professional Certificate
Google Cloud launches machine learning engineer certification
Dataflow documentation
BigQuery documentation
NEW QUESTION # 28
You are developing an ML model that predicts the cost of used automobiles based on data such as location, condition, model type, color, and engine/battery efficiency. The data is updated every night. Car dealerships will use the model to determine appropriate car prices. You created a Vertex AI pipeline that reads the data, splits the data into training/evaluation/test sets, performs feature engineering, trains the model by using the training dataset, and validates the model by using the evaluation dataset. You need to configure a retraining workflow that minimizes cost. What should you do?
- A. Compare the results to the evaluation results from a previous run. If the performance improved, deploy the model to a Vertex AI endpoint. Configure a cron job to redeploy the pipeline every night.
- B. Compare the training and evaluation losses of the current run. If the losses are similar, deploy the model to a Vertex AI endpoint with training/serving skew threshold model monitoring. When the model monitoring threshold is triggered, redeploy the pipeline.
- C. Compare the training and evaluation losses of the current run. If the losses are similar, deploy the model to a Vertex AI endpoint. Configure a cron job to redeploy the pipeline every night.
- D. Compare the results to the evaluation results from a previous run. If the performance improved, deploy the model to a Vertex AI endpoint with training/serving skew threshold model monitoring. When the model monitoring threshold is triggered, redeploy the pipeline.
Answer: D
NEW QUESTION # 29
You have deployed a model on Vertex AI for real-time inference. During an online prediction request, you get an "Out of Memory" error. What should you do?
- A. Apply for a quota increase for the number of prediction requests.
- B. Use base64 to encode your data before using it for prediction.
- C. Send the request again with a smaller batch of instances.
- D. Use batch prediction mode instead of online mode.
Answer: C
NEW QUESTION # 30
You are training an ML model using data stored in BigQuery that contains several values that are considered Personally Identifiable Information (PII). You need to reduce the sensitivity of the dataset before training your model. Every column is critical to your model. How should you proceed?
- A. Use the Cloud Data Loss Prevention (DLP) API to scan for sensitive data, and use Dataflow with the DLP API to encrypt sensitive values with Format Preserving Encryption.
- B. Use the Cloud Data Loss Prevention (DLP) API to scan for sensitive data, and use Dataflow to replace all sensitive data by using the encryption algorithm AES-256 with a salt.
- C. Using Dataflow, ingest the columns with sensitive data from BigQuery, and then randomize the values in each sensitive column.
- D. Before training, use BigQuery to select only the columns that do not contain sensitive data. Create an authorized view of the data so that sensitive values cannot be accessed by unauthorized individuals.
Answer: A
Explanation:
The best option for reducing the sensitivity of the dataset before training the model is to use the Cloud Data Loss Prevention (DLP) API to scan for sensitive data, and use Dataflow with the DLP API to encrypt sensitive values with Format Preserving Encryption. This option allows you to keep every column in the dataset, while protecting the sensitive data from unauthorized access or exposure. The Cloud DLP API can detect and classify various types of sensitive data, such as names, email addresses, phone numbers, and credit card numbers. Dataflow can create scalable and reliable pipelines to process large volumes of data from BigQuery and other sources. Format Preserving Encryption (FPE) is a technique that encrypts sensitive data while preserving its original format and length, which can help maintain the utility and validity of the data.
By using Dataflow with the DLP API, you can apply FPE to the sensitive values in the dataset, and store the encrypted data in BigQuery or another destination. You can also use the same pipeline to decrypt the data when needed, by using the same encryption key and method.
The other options are not as suitable as option A, for the following reasons:
* Option B: Using the Cloud DLP API to scan for sensitive data, and using Dataflow to replace all sensitive data by using the encryption algorithm AES-256 with a salt, would reduce the sensitivity of the data, but also the utility and validity of the data. AES-256 is a symmetric encryption algorithm that uses a 256-bit key to encrypt and decrypt data. A salt is a random value that is added to the data before encryption, to increase the randomness and security of the encrypted data. However, AES-256 does not preserve the format or length of the original data, which can cause problems when storing or processing the data. For example, if the original data is a 10-digit phone number, AES-256 would produce a much longer and different string, which can break the schema or logic of the dataset.
* Option C: Using Dataflow to ingest the columns with sensitive data from BigQuery, and then randomizing the values in each sensitive column, would reduce the sensitivity of the data, but also the utility and accuracy of the data. Randomization is a technique that replaces sensitive data with random values, which can prevent re-identification of the data, but also distort the distribution and relationships of the data. This can affect the performance and quality of the ML model, especially if every column is critical to the model.
* Option D: Before training, using BigQuery to select only the columns that do not contain sensitive data, and creating an authorized view of the data so that sensitive values cannot be accessed by unauthorized individuals, would reduce the exposure of the sensitive data, but also the completeness and relevance of the data. An authorized view is a BigQuery view that allows you to share query results with particular users or groups, without giving them access to the underlying tables. However, this option assumes that you can identify the columns that do not contain sensitive data, which may not be easy or accurate. Moreover, this option would remove some columns from the dataset, which can affect the performance and quality of the ML model, especially if every column is critical to the model.
References:
* Preparing for Google Cloud Certification: Machine Learning Engineer, Course 5: Responsible AI, Week 2: Privacy
* Google Cloud Professional Machine Learning Engineer Exam Guide, Section 5: Developing responsible AI solutions, 5.2 Implementing privacy techniques
* Official Google Cloud Certified Professional Machine Learning Engineer Study Guide, Chapter 9: Responsible AI, Section 9.4: Privacy
* De-identification techniques
* Cloud Data Loss Prevention (DLP) API
* Dataflow
* Using Dataflow and Sensitive Data Protection to securely tokenize and import data from a relational database to BigQuery
* [AES encryption]
* [Salt (cryptography)]
* [Authorized views]
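To make the format-preserving idea concrete, the following is a toy sketch only. It is not the Cloud DLP implementation and is not a vetted cipher; it assumes a small Feistel-style construction over digit strings, keyed with HMAC-SHA256, and the names `toy_fpe_encrypt` and `toy_fpe_decrypt` are hypothetical. It illustrates the two properties the explanation relies on: the output keeps the length and character class of the input, and the transformation is reversible with the same key.

```python
import hashlib
import hmac


def _round_value(key: bytes, round_no: int, half: str, width: int) -> int:
    """Keyed pseudo-random number in [0, 10**width) derived from one half."""
    message = f"{round_no}:{half}".encode("ascii")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % (10 ** width)


def _check(digits: str, rounds: int) -> None:
    if not (digits.isascii() and digits.isdigit()) or len(digits) < 2:
        raise ValueError("expected a string of at least two ASCII digits")
    if rounds <= 0 or rounds % 2:
        raise ValueError("rounds must be a positive even number")


def toy_fpe_encrypt(digits: str, key: bytes, rounds: int = 8) -> str:
    """Toy digit-preserving encryption (illustrative, NOT production crypto)."""
    _check(digits, rounds)
    mid = len(digits) // 2
    left, right = digits[:mid], digits[mid:]
    for round_no in range(rounds):
        width = len(left)
        # Mix the left half with a keyed function of the right half, then swap.
        mixed = (int(left) + _round_value(key, round_no, right, width)) % (10 ** width)
        left, right = right, str(mixed).zfill(width)
    return left + right


def toy_fpe_decrypt(digits: str, key: bytes, rounds: int = 8) -> str:
    """Inverse of toy_fpe_encrypt: undo the rounds in reverse order."""
    _check(digits, rounds)
    mid = len(digits) // 2
    left, right = digits[:mid], digits[mid:]
    for round_no in reversed(range(rounds)):
        width = len(right)
        original = (int(right) - _round_value(key, round_no, left, width)) % (10 ** width)
        left, right = str(original).zfill(width), left
    return left + right


# Hypothetical key and phone number, for illustration only.
demo_key = b"demo-key-not-for-production"
token = toy_fpe_encrypt("4155550123", demo_key)
print(token, toy_fpe_decrypt(token, demo_key))
```

Because the token is still a 10-digit string, it fits the same column type and validation rules as the original value, which is the property that a general-purpose block cipher output would not have.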
NEW QUESTION # 31
You are an ML engineer at a global car manufacturer. You need to build an ML model to predict car sales in different cities around the world. Which features or feature crosses should you use to train city-specific relationships between car type and number of sales?
- A. Two feature crosses as an element-wise product: the first between binned latitude and one-hot encoded car type, and the second between binned longitude and one-hot encoded car type
- B. Three individual features: binned latitude, binned longitude, and one-hot encoded car type
- C. One feature obtained as an element-wise product between latitude, longitude, and car type
- D. One feature obtained as an element-wise product between binned latitude, binned longitude, and one-hot encoded car type
Answer: D
Explanation:
A feature cross is a synthetic feature that is obtained by combining two or more existing features, usually by taking their product or concatenation. A feature cross can help to capture the nonlinear and interaction effects between the original features, and improve the predictive performance of the model. A feature cross can be applied to different types of features, such as numeric, categorical, or geospatial features.
For the use case of building an ML model to predict car sales in different cities around the world, the best option is to use one feature obtained as an element-wise product between binned latitude, binned longitude, and one-hot encoded car type. This option involves creating a feature cross that combines three individual features: binned latitude, binned longitude, and one-hot encoded car type. Binning is a technique that transforms a continuous numeric feature into a discrete categorical feature by dividing its range into equal intervals, or bins. One-hot encoding is a technique that transforms a categorical feature into a binary vector, where each element corresponds to a possible category, and has a value of 1 if the feature belongs to that category, and 0 otherwise. By applying binning and one-hot encoding to the latitude, longitude, and car type features, the feature cross can capture the city-specific relationships between car type and number of sales, as each combination of bins and car types can represent a different city and its preference for a certain car type.
For example, the feature cross can learn that a city with a latitude bin of [40, 50], a longitude bin of [-80, -70], and a car type of SUV has a higher number of sales than a city with a latitude bin of [-10, 0], a longitude bin of [10, 20], and a car type of sedan. Therefore, using one feature obtained as an element-wise product between binned latitude, binned longitude, and one-hot encoded car type is the best option for this use case.
References:
* Feature Crosses | Machine Learning Crash Course
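The crossed feature described above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the bin counts (18 latitude bins and 36 longitude bins of 10 degrees each), the `CAR_TYPES` list, and the helper names are all hypothetical choices, not values from the question. The cross is built as the product of every combination of one-hot entries, so exactly one position is 1 for each (latitude bin, longitude bin, car type) triple.

```python
CAR_TYPES = ["sedan", "suv", "truck"]  # hypothetical vocabulary


def bin_index(value, low, high, n_bins):
    """Index of the equal-width bin containing value, clamped to the valid range."""
    width = (high - low) / n_bins
    index = int((value - low) // width)
    return min(max(index, 0), n_bins - 1)


def one_hot(index, size):
    """Binary vector with a single 1 at the given index."""
    return [1 if i == index else 0 for i in range(size)]


def crossed_feature(lat, lon, car_type, n_lat_bins=18, n_lon_bins=36):
    """Cross of binned latitude x binned longitude x one-hot car type."""
    lat_vec = one_hot(bin_index(lat, -90.0, 90.0, n_lat_bins), n_lat_bins)
    lon_vec = one_hot(bin_index(lon, -180.0, 180.0, n_lon_bins), n_lon_bins)
    car_vec = one_hot(CAR_TYPES.index(car_type), len(CAR_TYPES))
    # Product over every combination of entries, flattened into one vector.
    return [a * b * c for a in lat_vec for b in lon_vec for c in car_vec]


new_york_suv = crossed_feature(40.7, -74.0, "suv")
paris_suv = crossed_feature(48.9, 2.35, "suv")
print(len(new_york_suv), new_york_suv.index(1), paris_suv.index(1))
```

Each position in the crossed vector gets its own weight during training, which is what lets a linear model learn a separate effect of car type per geographic cell. Crossing latitude and longitude separately with car type (option A) would only learn effects per latitude band and per longitude band, not per city.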
NEW QUESTION # 32
You need to develop a custom TensorFlow model that will be used for online predictions. The training data is stored in BigQuery. You need to apply instance-level data transformations to the data for model training and serving. You want to use the same preprocessing routine during model training and serving. How should you configure the preprocessing routine?
- A. Create a pipeline in Vertex AI Pipelines to read the data from BigQuery and preprocess it using a custom preprocessing component.
- B. Create a preprocessing function that reads and transforms the data from BigQuery. Create a Vertex AI custom prediction routine that calls the preprocessing function at serving time.
- C. Create a BigQuery script to preprocess the data, and write the result to another BigQuery table.
- D. Create an Apache Beam pipeline to read the data from BigQuery and preprocess it by using TensorFlow Transform and Dataflow.
Answer: D
Explanation:
According to the official exam guide, one of the skills assessed in the exam is to "design, build, and productionalize ML models to solve business challenges using Google Cloud technologies". TensorFlow Transform is a library for preprocessing data with TensorFlow. TensorFlow Transform enables you to define and execute distributed pre-processing or feature engineering functions on large data sets, and then export the same functions as a TensorFlow graph for re-use during training or serving. TensorFlow Transform can handle both instance-level and full-pass data transformations. Apache Beam is an open source framework for building scalable and portable data pipelines. Apache Beam supports both batch and streaming data processing. Dataflow is a fully managed service for running Apache Beam pipelines on Google Cloud. Dataflow handles the provisioning and management of the compute resources, as well as the optimization and execution of the pipelines. Therefore, option D is the best way to configure the preprocessing routine for the given use case, as it allows you to use the same preprocessing logic during model training and serving, and leverage the scalability and performance of Dataflow. The other options are not relevant or optimal for this scenario. Reference:
Professional ML Engineer Exam Guide
TensorFlow Transform
Apache Beam
Dataflow
Google Professional Machine Learning Certification Exam 2023
Latest Google Professional Machine Learning Engineer Actual Free Exam Questions
NEW QUESTION # 33
You are developing an image recognition model using PyTorch based on ResNet50 architecture. Your code is working fine on your local laptop on a small subsample. Your full dataset has 200k labeled images. You want to quickly scale your training workload while minimizing cost. You plan to use 4 V100 GPUs. What should you do?
- A. Package your code with Setuptools, and use a pre-built container. Train your model with Vertex AI using a custom tier that contains the required GPUs.
- B. Configure a Compute Engine VM with all the dependencies that launches the training. Train your model with Vertex AI using a custom tier that contains the required GPUs.
- C. Create a Vertex AI Workbench user-managed notebooks instance with 4 V100 GPUs, and use it to train your model.
- D. Create a Google Kubernetes Engine cluster with a node pool that has 4 V100 GPUs. Prepare and submit a TFJob operator to this node pool.
Answer: A
Explanation:
The best option for scaling the training workload while minimizing cost is to package the code with Setuptools, use a pre-built container, and train the model with Vertex AI using a custom tier that contains the required GPUs. This option has the following advantages:
It allows the code to be easily packaged and deployed, as Setuptools is a Python tool that helps to create and distribute Python packages, and pre-built containers are Docker images that contain all the dependencies and libraries needed to run the code. By packaging the code with Setuptools, and using a pre-built container, you can avoid the hassle and complexity of building and maintaining your own custom container, and ensure the compatibility and portability of your code across different environments.
It leverages the scalability and performance of Vertex AI, which is a fully managed service that provides various tools and features for machine learning, such as training, tuning, serving, and monitoring. By training the model with Vertex AI, you can take advantage of the distributed and parallel training capabilities of Vertex AI, which can speed up the training process and improve the model quality. Vertex AI also supports various frameworks and models, such as PyTorch and ResNet50, and allows you to use custom containers and custom tiers to customize your training configuration and resources.
It reduces the cost and complexity of the training process, as Vertex AI allows you to use a custom tier that contains the required GPUs, which can optimize the resource utilization and allocation for your training job. By using a custom tier that contains 4 V100 GPUs, you can match the number and type of GPUs that you plan to use for your training job, and avoid paying for unnecessary or underutilized resources. Vertex AI also offers various pricing options and discounts, such as per-second billing, sustained use discounts, and preemptible VMs, that can lower the cost of the training process.
The other options are less optimal for the following reasons:
Option B: Configuring a Compute Engine VM with all the dependencies that launches the training, and then training the model with Vertex AI using a custom tier that contains the required GPUs, introduces additional complexity and overhead. This option requires creating and managing a Compute Engine VM, which is a virtual machine that runs on Google Cloud. However, using a Compute Engine VM to launch the training may not be necessary or efficient, as it requires installing and configuring all the dependencies and libraries needed to run the code, and maintaining and updating the VM. Moreover, using a Compute Engine VM to launch the training may incur additional cost and latency, as it requires paying for the VM usage and transferring the data and the code between the VM and Vertex AI.
Option C: Creating a Vertex AI Workbench user-managed notebooks instance with 4 V100 GPUs, and using it to train the model, introduces additional cost and risk. This option requires creating and managing a Vertex AI Workbench user-managed notebooks instance, which is a service that allows you to create and run Jupyter notebooks on Google Cloud. However, using a Vertex AI Workbench user-managed notebooks instance to train the model may not be optimal or secure, as it requires paying for the notebooks instance usage, which can be expensive and wasteful, especially if the notebooks instance is not used for other purposes. Moreover, using a Vertex AI Workbench user-managed notebooks instance to train the model may expose the model and the data to potential security or privacy issues, as the notebooks instance is not fully managed by Google Cloud, and may be accessed or modified by unauthorized users or malicious actors.
Option D: Creating a Google Kubernetes Engine cluster with a node pool that has 4 V100 GPUs, and then preparing and submitting a TFJob operator to this node pool, introduces additional complexity and cost. This option requires creating and managing a Google Kubernetes Engine cluster, which is a fully managed service that runs Kubernetes clusters on Google Cloud. Moreover, this option requires creating and managing a node pool that has 4 V100 GPUs, which is a group of nodes that share the same configuration and resources. Furthermore, this option requires preparing and submitting a TFJob operator to this node pool, which is a Kubernetes custom resource that defines a TensorFlow training job. However, using Google Kubernetes Engine, a node pool, and a TFJob operator to train the model may not be necessary or efficient, as it requires configuring and maintaining the cluster, the node pool, and the TFJob operator, and paying for their usage. Moreover, this setup may not be compatible or scalable, as TFJob is designed for TensorFlow models, not PyTorch models, and may not support distributed or parallel training.
Reference:
[Vertex AI: Training with custom containers]
[Vertex AI: Using custom machine types]
[Setuptools documentation]
[PyTorch documentation]
[ResNet50 | PyTorch]
NEW QUESTION # 34
You work at an organization that maintains a cloud-based communication platform that integrates conventional chat, voice, and video conferencing into one platform. The audio recordings are stored in Cloud Storage. All recordings have an 8 kHz sample rate and are more than one minute long. You need to implement a new feature in the platform that will automatically transcribe voice call recordings into a text for future applications, such as call summarization and sentiment analysis. How should you implement the voice call transcription feature following Google-recommended best practices?
- A. Upsample the audio recordings to 16 kHz, and transcribe the audio by using the Speech-to-Text API with asynchronous recognition.
- B. Use the original audio sampling rate, and transcribe the audio by using the Speech-to-Text API with asynchronous recognition.
- C. Use the original audio sampling rate, and transcribe the audio by using the Speech-to-Text API with synchronous recognition.
- D. Upsample the audio recordings to 16 kHz, and transcribe the audio by using the Speech-to-Text API with synchronous recognition.
Answer: B
NEW QUESTION # 35
Your team needs to build a model that predicts whether images contain a driver's license, passport, or credit card. The data engineering team already built the pipeline and generated a dataset composed of 10,000 images with driver's licenses, 1,000 images with passports, and 1,000 images with credit cards. You now have to train a model with the following label map: ['drivers_license', 'passport', 'credit_card']. Which loss function should you use?
- A. Categorical cross-entropy
- B. Categorical hinge
- C. Binary cross-entropy
- D. Sparse categorical cross-entropy
Answer: A
Explanation:
- **Categorical cross-entropy** is better to use when you want to **prevent the model from giving more importance to a certain class**. If the **classes are very unbalanced**, you will get a better result by using categorical cross-entropy.
- **Sparse categorical cross-entropy** is a better choice if you have a very large number of classes, enough to cause heavy memory usage: because sparse categorical cross-entropy uses fewer columns, it **uses less memory**.
https://stats.stackexchange.com/questions/326065/cross-entropy-vs-sparse-cross-entropy-when-to-use-one-over-the-other
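As a small illustration of how the two losses relate, the sketch below computes both by hand for a single example. It is a plain-Python approximation of the definitions, not the Keras implementation, and the logits and helper names are made up for the example. It shows that the two losses differ in how the label is encoded (a one-hot vector versus an integer class index) and give the same value for the same prediction, which is why the memory argument above applies only to the label representation.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(one_hot_label, probs):
    """Loss when the label is a one-hot vector, e.g. [0, 1, 0]."""
    return -sum(y * math.log(p) for y, p in zip(one_hot_label, probs))


def sparse_categorical_cross_entropy(class_index, probs):
    """Loss when the label is an integer class index, e.g. 1."""
    return -math.log(probs[class_index])


# Label map from the question; index 1 is 'passport'.
labels = ["drivers_license", "passport", "credit_card"]
probs = softmax([2.0, 0.5, -1.0])  # hypothetical model outputs

dense_loss = categorical_cross_entropy([0, 1, 0], probs)
sparse_loss = sparse_categorical_cross_entropy(labels.index("passport"), probs)
print(dense_loss, sparse_loss)
```

With three classes the one-hot label costs three numbers per example instead of one, so the memory difference only becomes significant when the number of classes is large.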
NEW QUESTION # 36
You deployed an ML model into production a year ago. Every month, you collect all raw requests that were sent to your model prediction service during the previous month. You send a subset of these requests to a human labeling service to evaluate your model's performance. After a year, you notice that your model's performance sometimes degrades significantly after a month, while other times it takes several months to notice any decrease in performance. The labeling service is costly, but you also need to avoid large performance degradations. You want to determine how often you should retrain your model to maintain a high level of performance while minimizing cost. What should you do?
- A. Run training-serving skew detection batch jobs every few days to compare the aggregate statistics of the features in the training dataset with recent serving data. If skew is detected, send the most recent serving data to the labeling service.
- B. Compare the cost of the labeling service with the lost revenue due to model performance degradation over the past year. If the lost revenue is greater than the cost of the labeling service, increase the frequency of model retraining; otherwise, decrease the model retraining frequency.
- C. Identify temporal patterns in your model's performance over the previous year. Based on these patterns, create a schedule for sending serving data to the labeling service for the next year.
- D. Train an anomaly detection model on the training dataset, and run all incoming requests through this model. If an anomaly is detected, send the most recent serving data to the labeling service.
Answer: D
NEW QUESTION # 37
A Machine Learning team uses Amazon SageMaker to train an Apache MXNet handwritten digit classifier model using a research dataset. The team wants to receive a notification when the model is overfitting.
Auditors want to view the Amazon SageMaker log activity report to ensure there are no unauthorized API calls.
What should the Machine Learning team do to address the requirements with the least amount of code and fewest steps?
- A. Use AWS CloudTrail to log Amazon SageMaker API calls to Amazon S3. Set up Amazon SNS to receive a notification when the model is overfitting
- B. Implement an AWS Lambda function to log Amazon SageMaker API calls to Amazon S3. Add code to push a custom metric to Amazon CloudWatch. Create an alarm in CloudWatch with Amazon SNS to receive a notification when the model is overfitting.
- C. Use AWS CloudTrail to log Amazon SageMaker API calls to Amazon S3. Add code to push a custom metric to Amazon CloudWatch. Create an alarm in CloudWatch with Amazon SNS to receive a notification when the model is overfitting.
- D. Implement an AWS Lambda function to log Amazon SageMaker API calls to AWS CloudTrail. Add code to push a custom metric to Amazon CloudWatch. Create an alarm in CloudWatch with Amazon SNS to receive a notification when the model is overfitting.
Answer: C
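The "custom metric" mentioned in options B, C, and D has to be computed by the training code. A minimal sketch of one possible overfitting signal follows, assuming per-epoch training and validation losses are available. The function names, window size, and threshold are hypothetical, and publishing the value to CloudWatch (for example with the boto3 `put_metric_data` call) is deliberately left out.

```python
def overfitting_gap(train_losses, val_losses, window=3):
    """Average (validation - training) loss over the last `window` epochs."""
    recent = list(zip(train_losses, val_losses))[-window:]
    return sum(v - t for t, v in recent) / len(recent)


def is_overfitting(train_losses, val_losses, window=3, threshold=0.1):
    """Flag overfitting when the recent gap exceeds `threshold` and the
    validation loss has not improved over the last `window` epochs."""
    if len(val_losses) < window + 1:
        return False  # not enough history to judge
    gap = overfitting_gap(train_losses, val_losses, window)
    val_not_improving = val_losses[-1] >= val_losses[-window - 1]
    return gap > threshold and val_not_improving


# Hypothetical loss curves: training keeps improving, validation turns upward.
overfit_train = [1.00, 0.70, 0.50, 0.35, 0.25, 0.18]
overfit_val = [1.05, 0.80, 0.65, 0.66, 0.70, 0.75]

# Hypothetical healthy run: both curves fall together.
healthy_train = [1.00, 0.80, 0.60, 0.50, 0.45]
healthy_val = [1.02, 0.83, 0.64, 0.55, 0.50]

print(is_overfitting(overfit_train, overfit_val))
print(is_overfitting(healthy_train, healthy_val))
```

The gap value is the number that would be pushed as the custom metric; the alarm threshold would then live in CloudWatch rather than in the training script.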
NEW QUESTION # 38
You have been asked to develop an input pipeline for an ML training model that processes images from disparate sources at a low latency. You discover that your input data does not fit in memory. How should you create a dataset following Google-recommended best practices?
- A. Convert the images into TFRecords, store the images in Cloud Storage, and then use the tf.data API to read the images for training.
- B. Convert the images to tf.Tensor objects, and then run Dataset.from_tensor_slices().
- C. Create a tf.data.Dataset.prefetch transformation.
- D. Convert the images to tf.Tensor objects, and then run tf.data.Dataset.from_tensors().
Answer: A
Explanation:
An input pipeline prepares and feeds data to a machine learning model for training or inference. It typically consists of several steps, such as reading, parsing, transforming, batching, and prefetching the data. A well-designed input pipeline improves training performance and efficiency: it can handle large and complex datasets, optimize data processing, and reduce latency and memory usage [1].
For an ML training model that processes images from disparate sources at low latency, the best option is to convert the images into TFRecords, store them in Cloud Storage, and then use the tf.data API to read them for training. This option involves the following components and techniques:
* TFRecords: TFRecord is a binary file format that stores a sequence of data records, such as images, text, or audio. TFRecords help to serialize, compress, and store the data efficiently, and reduce data loading and parsing time. They also support sharding and interleaving, which improve data throughput and parallelism [2].
* Cloud Storage: Cloud Storage is a service for storing and accessing data on Google Cloud. It can store and manage large, distributed datasets, such as images from different sources, with high availability, durability, and scalability. It also integrates with other Google Cloud services, such as Compute Engine, AI Platform, and Dataflow [3].
* tf.data API: The tf.data API is a set of tools for creating and manipulating data pipelines in TensorFlow. It can read, transform, batch, and prefetch data efficiently, and it supports various data sources and formats, such as TFRecords, CSV, JSON, and images.
With these components, the input pipeline can process image datasets that do not fit in memory while keeping latency low during training. Options B and D require the images to be loaded into memory as tensors first, and option C is only a transformation applied to an existing dataset, so none of them addresses the memory constraint. Converting the images into TFRecords, storing them in Cloud Storage, and reading them with the tf.data API is therefore the best option for this use case.
References:
[1] Build TensorFlow input pipelines | TensorFlow Core
[2] TFRecord and tf.Example | TensorFlow Core
[3] Cloud Storage documentation | Google Cloud
[4] tf.data: Build TensorFlow input pipelines | TensorFlow Core
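The reason a record-oriented file format helps with data that does not fit in memory can be shown without TensorFlow. The sketch below is a standard-library analogy, not the TFRecord format itself: it writes length-prefixed binary records to disk and reads them back lazily in batches, so only one batch is held in memory at a time. The file name, the fake image payloads, and the helper names are all hypothetical. In TensorFlow the corresponding pieces would be `tf.io.TFRecordWriter` for writing and `tf.data.TFRecordDataset` with `map`, `batch`, and `prefetch` for reading.

```python
import os
import struct
import tempfile


def write_records(path, records):
    """Write byte strings as length-prefixed records (8-byte little-endian length)."""
    with open(path, "wb") as f:
        for payload in records:
            f.write(struct.pack("<Q", len(payload)))
            f.write(payload)


def read_records(path):
    """Lazily yield one record at a time instead of loading the whole file."""
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                return  # end of file
            (length,) = struct.unpack("<Q", header)
            yield f.read(length)


def batch_stream(records, batch_size):
    """Group a stream of records into lists of at most `batch_size` items."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch


# Hypothetical "images": five small byte strings standing in for encoded files.
fake_images = [bytes([i]) * (i + 1) for i in range(5)]
path = os.path.join(tempfile.mkdtemp(), "images.records")
write_records(path, fake_images)

batches = list(batch_stream(read_records(path), batch_size=2))
print([len(b) for b in batches])
```

Because `read_records` is a generator, a training loop can consume batches as they are read, which is the same streaming behavior that tf.data provides over TFRecord files.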
NEW QUESTION # 39
You work for a retail company. You have been asked to develop a model to predict whether a customer will purchase a product on a given day. Your team has processed the company's sales data, and created a table with the following columns:
* Customer_id
* Product_id
* Date
* Days_since_last_purchase (measured in days)
* Average_purchase_frequency (measured in 1/days)
* Purchase (binary class, if customer purchased product on the Date)
You need to interpret your model's results for each individual prediction. What should you do?
- A. Create a BigQuery table. Use BigQuery ML to build a logistic regression classification model. Use the values of the coefficients of the model to interpret the feature importance, with higher values corresponding to more importance.
- B. Create a Vertex AI tabular dataset. Train an AutoML model to predict customer purchases. Deploy the model to a Vertex AI endpoint and enable feature attributions. Use the "explain" method to get feature attribution values for each individual prediction.
- C. Create a Vertex AI tabular dataset. Train an AutoML model to predict customer purchases. Deploy the model to a Vertex AI endpoint. At each prediction, enable L1 regularization to detect non-informative features.
- D. Create a BigQuery table. Use BigQuery ML to build a boosted tree classifier. Inspect the partition rules of the trees to understand how each prediction flows through the trees.
Answer: B
Explanation:
According to the official exam guide [1], one of the skills assessed in the exam is to "explain the predictions of a trained model". Vertex AI provides feature attributions based on Shapley values, a concept from cooperative game theory that assigns credit to each feature in a model for a particular outcome [2]. Feature attributions help you understand how the model arrives at each prediction, and debug or optimize the model accordingly. You can use AutoML for Tabular Data to generate and query local feature attributions [3]. The other options do not fit this scenario: model coefficients (option A) describe the model as a whole rather than an individual prediction, L1 regularization (option C) is applied during training and not at prediction time, and tracing each prediction through the trees of a boosted ensemble (option D) is not practical.
References:
[1] Professional ML Engineer Exam Guide
[2] Feature attributions for classification and regression
[3] AutoML for Tabular Data
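To make the idea of per-prediction Shapley attributions concrete, here is a small sketch that computes exact Shapley values for a single prediction of a toy scoring function. It is not the Vertex AI API: the model, its weights, the all-zero baseline, and the feature values are invented for illustration. Exact enumeration is only feasible for a handful of features, so managed services typically approximate it.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley attributions for one prediction.

    predict:  function taking a dict of feature values and returning a number.
    instance: dict of feature values for the prediction being explained.
    baseline: dict of reference values used when a feature is "absent".
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous_output = predict(current)
        for name in order:
            # Switch this feature from its baseline value to its actual value
            # and credit it with the resulting change in the model output.
            current[name] = instance[name]
            output = predict(current)
            totals[name] += output - previous_output
            previous_output = output
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_purchase_score(x):
    """Hypothetical scoring function with an interaction between two features."""
    return (
        0.5 * x["days_since_last_purchase"] * x["average_purchase_frequency"]
        + 0.1 * x["days_since_last_purchase"]
    )


instance = {"days_since_last_purchase": 10, "average_purchase_frequency": 0.2}
baseline = {"days_since_last_purchase": 0, "average_purchase_frequency": 0.0}

attributions = shapley_values(toy_purchase_score, instance, baseline)
print(attributions)
```

The attributions always sum to the difference between the prediction for the instance and the prediction for the baseline, which is what lets them be read as each feature's share of one specific prediction.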
NEW QUESTION # 40
A real estate company wants to create a machine learning model for predicting housing prices based on a historical dataset. The dataset contains 32 features.
Which model will meet the business requirement?
- A. Linear regression
- B. Logistic regression
- C. Principal component analysis (PCA)
- D. K-means
Answer: A
NEW QUESTION # 41
You are developing a model to identify traffic signs in images extracted from videos taken from the dashboard of a vehicle. You have a dataset of 100,000 images that were cropped to show one out of ten different traffic signs. The images have been labeled accordingly for model training and are stored in a Cloud Storage bucket. You need to be able to tune the model during each training run. How should you train the model?
- A. Train a model for image classification by using Vertex AI AutoML.
- B. Develop the model training code for image classification and train a model by using Vertex AI custom training.
- C. Develop the model training code for object detection and train a model by using Vertex AI custom training.
- D. Train a model for object detection by using Vertex AI AutoML.
Answer: B
NEW QUESTION # 42
Your data science team needs to rapidly experiment with various features, model architectures, and hyperparameters. They need to track the accuracy metrics for various experiments and use an API to query the metrics over time. What should they use to track and report their experiments while minimizing manual effort?
- A. Use Kubeflow Pipelines to execute the experiments. Export the metrics file, and query the results using the Kubeflow Pipelines API.
- B. Use AI Platform Training to execute the experiments. Write the accuracy metrics to BigQuery, and query the results using the BigQuery API.
- C. Use AI Platform Notebooks to execute the experiments. Collect the results in a shared Google Sheets file, and query the results using the Google Sheets API.
- D. Use AI Platform Training to execute the experiments. Write the accuracy metrics to Cloud Monitoring, and query the results using the Monitoring API.
Answer: A
NEW QUESTION # 43
You have been given a dataset with sales predictions based on your company's marketing activities. The data is structured and stored in BigQuery, and has been carefully managed by a team of data analysts. You need to prepare a report providing insights into the predictive capabilities of the data. You were asked to run several ML models with different levels of sophistication, including simple models and multilayered neural networks.
You only have a few hours to gather the results of your experiments. Which Google Cloud tools should you use to complete this task in the most efficient and self-serviced way?
- A. Read the data from BigQuery using Dataproc, and run several models using SparkML.
- B. Train a custom TensorFlow model with Vertex AI, reading the data from BigQuery featuring a variety of ML algorithms.
- C. Use BigQuery ML to run several regression models, and analyze their performance.
- D. Use Vertex AI Workbench user-managed notebooks with scikit-learn code for a variety of ML algorithms and performance metrics.
Answer: C
Explanation:
* Option C is correct because using BigQuery ML to run several regression models and analyze their performance is the most efficient and self-serviced way to complete the task. BigQuery ML lets you create and use ML models within BigQuery using SQL queries [1]. You can use it to run different types of regression models, such as linear regression, logistic regression, or DNN regression [2], and to analyze model performance with metrics such as the mean squared error, the accuracy, or the ROC curve [3]. BigQuery ML is fast, scalable, and easy to use, as it does not require any data movement, code beyond SQL, or additional tools [4].
* Option A is incorrect because reading the data from BigQuery using Dataproc and running several models using SparkML is not the most efficient and self-serviced way to complete the task. Dataproc is a service for creating and managing clusters of virtual machines that run Apache Spark and other open-source tools [5]. SparkML is a library that provides ML algorithms and utilities for Spark. This option requires more effort and resources than option C, as it involves moving the data from BigQuery to Dataproc, creating and configuring the clusters, writing and running the SparkML code, and analyzing the results.
* Option B is incorrect because training a custom TensorFlow model with Vertex AI is not the most efficient and self-serviced way to complete the task. TensorFlow is a framework for creating and training ML models using Python or other languages, and Vertex AI is a service for training and deploying ML models using built-in algorithms or custom containers. This option also requires more effort and resources than option C, as it involves writing and running the TensorFlow code, creating and managing the training jobs, and analyzing the results.
* Option D is incorrect because using Vertex AI Workbench user-managed notebooks with scikit-learn code is not the most efficient and self-serviced way to complete the task. Vertex AI Workbench is a service for creating and using notebooks for ML development and experimentation, and scikit-learn is a library that provides ML algorithms and utilities for Python. This option also requires more effort and resources than option C, as it involves creating and managing the notebooks, writing and running the scikit-learn code, and analyzing the results.
References:
[1] BigQuery ML overview
[2] Creating a model in BigQuery ML
[3] Evaluating a model in BigQuery ML
[4] BigQuery ML benefits
[5] Dataproc overview
[6] SparkML overview
[7] Vertex AI Workbench overview
[8] Scikit-learn overview
[9] TensorFlow overview
[10] Vertex AI overview
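To show why the BigQuery ML route is quick, the sketch below generates the SQL needed to train and evaluate three models of increasing sophistication on the same table. It only builds the statements as strings; running them would require a BigQuery client or the console, which is not shown. The dataset name `marketing`, the table `sales_history`, and the label column `sales` are hypothetical placeholders.

```python
def create_model_sql(model_name, model_type, table, label):
    """Build a BigQuery ML CREATE MODEL statement as a string."""
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS(model_type = '{model_type}', input_label_cols = ['{label}']) AS\n"
        f"SELECT * FROM `{table}`"
    )


def evaluate_model_sql(model_name):
    """Build the matching ML.EVALUATE query for a trained model."""
    return f"SELECT * FROM ML.EVALUATE(MODEL `{model_name}`)"


# From a simple baseline to a multilayered neural network.
MODEL_TYPES = ["LINEAR_REG", "BOOSTED_TREE_REGRESSOR", "DNN_REGRESSOR"]

statements = []
for model_type in MODEL_TYPES:
    # Hypothetical naming scheme: one model per type in the `marketing` dataset.
    name = f"marketing.sales_{model_type.lower()}"
    statements.append(
        create_model_sql(name, model_type, "marketing.sales_history", "sales")
    )
    statements.append(evaluate_model_sql(name))

print(";\n\n".join(statements))
```

Each model is one statement against data that is already in BigQuery, so comparing a simple model with a neural network comes down to changing the `model_type` option.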
NEW QUESTION # 44
A Machine Learning Specialist is preparing data for training on Amazon SageMaker. The Specialist is using one of the SageMaker built-in algorithms for the training. The dataset is stored in .CSV format and is transformed into a numpy.array, which appears to be negatively affecting the speed of the training.
What should the Specialist do to optimize the data for training on SageMaker?
- A. Use the SageMaker hyperparameter optimization feature to automatically optimize the data.
- B. Use the SageMaker batch transform feature to transform the training data into a DataFrame.
- C. Use AWS Glue to compress the data into the Apache Parquet format.
- D. Transform the dataset into the RecordIO protobuf format.
Answer: D
NEW QUESTION # 45
You need to quickly build and train a model to predict the sentiment of customer reviews with custom categories without writing code. You do not have enough data to train a model from scratch. The resulting model should have high predictive performance. Which service should you use?
- A. AutoML Natural Language
- B. AI Hub pre-made Jupyter Notebooks
- C. Cloud Natural Language API
- D. AI Platform Training built-in algorithms
Answer: A
Explanation:
AutoML Natural Language is a service that allows you to build and train custom natural language models without writing code. You can use AutoML Natural Language to perform sentiment analysis with custom categories, such as positive, negative, or neutral. You can also use pre-trained models or transfer learning to leverage existing knowledge and reduce the amount of data required to train a model from scratch. AutoML Natural Language provides a user-friendly interface and a powerful AutoML engine that optimizes your model for high predictive performance.
Cloud Natural Language API is a service that provides pre-trained models for common natural language tasks, such as sentiment analysis, entity analysis, and syntax analysis. However, it does not allow you to customize the categories or use your own data for training.
AI Hub pre-made Jupyter Notebooks are interactive documents that contain code, text, and visualizations for various machine learning scenarios. However, they require some coding skills and data preparation to use them effectively.
AI Platform Training built-in algorithms are pre-configured machine learning algorithms that you can use to train models on AI Platform. However, they do not support sentiment analysis as a natural language task.
References:
* AutoML Natural Language documentation
* Cloud Natural Language API documentation
* AI Hub documentation
* AI Platform Training documentation
NEW QUESTION # 46
You work on a growing team of more than 50 data scientists who all use AI Platform. You are designing a strategy to organize your jobs, models, and versions in a clean and scalable way. Which strategy should you choose?
- A. Set up a BigQuery sink for Cloud Logging logs that is appropriately filtered to capture information about AI Platform resource usage. In BigQuery, create a SQL view that maps users to the resources they are using.
- B. Separate each data scientist's work into a different project to ensure that the jobs, models, and versions created by each data scientist are accessible only to that user.
- C. Use labels to organize resources into descriptive categories. Apply a label to each created resource so that users can filter the results by label when viewing or monitoring the resources.
- D. Set up restrictive IAM permissions on the AI Platform notebooks so that only a single user or group can access a given instance.
Answer: C
Explanation:
Labels are key-value pairs that can be attached to any AI Platform resource, such as jobs, models, versions, or endpoints [1]. Labels can help you organize your resources into descriptive categories, such as project, team, environment, or purpose. You can use labels to filter the results when you list or monitor your resources, or to group them for billing or quota purposes [2]. Using labels is a simple and scalable way to manage your AI Platform resources without creating unnecessary complexity or overhead. Therefore, using labels to organize resources is the best strategy for this use case.
References:
[1] Using labels
[2] Filtering and grouping by labels
