Ultimate Guide to Prepare with Free Google Professional-Data-Engineer Exam Questions & Answers [Q75-Q99]


Ultimate Guide to Prepare with Free Google Professional-Data-Engineer Exam Questions and Answers

Pass the Google Professional-Data-Engineer Exam with the Test Engine PDF – All Free Dumps

Understanding functional and technical aspects of the Google Professional Data Engineer Exam: Building and operationalizing data processing systems

The following will be discussed here:

  • Monitoring pipelines
  • Storage costs and performance
  • Building and operationalizing processing infrastructure
  • Validating a migration
  • Transformation
  • Data acquisition and import
  • Provisioning resources
  • Batch and streaming
  • Integrating with new data sources
  • Building and operationalizing data processing systems
  • Testing and quality control
  • Lifecycle management of data
  • Building and operationalizing storage systems
  • Effective use of managed services (Cloud Bigtable, Cloud Spanner, Cloud SQL, BigQuery, Cloud Storage, Cloud Datastore, Cloud Memorystore)
  • Adjusting pipelines
  • Building and operationalizing pipelines
  • Awareness of current state and how to migrate a design to a future state

Understanding functional and technical aspects of the Google Professional Data Engineer Exam: Ensuring solution quality

The following will be discussed here:

  • Verification and monitoring
  • Ensuring scalability and efficiency
  • Ensuring privacy (e.g., Data Loss Prevention API)
  • Resizing and autoscaling resources
  • Designing for data and application portability (e.g., multi-cloud, data residency requirements)
  • Data security (encryption, key management)
  • Choosing between ACID, idempotent, eventually consistent requirements
  • Pipeline monitoring (e.g., Stackdriver)
  • Ensuring reliability and fidelity
  • Legal compliance (e.g., Health Insurance Portability and Accountability Act (HIPAA), Children’s Online Privacy Protection Act (COPPA), FedRAMP, General Data Protection Regulation (GDPR))
  • Ensuring flexibility and portability

 

Q75. You are working on a niche product in the image recognition domain. Your team has developed a model that is dominated by custom C++ TensorFlow ops your team has implemented. These ops are used inside your main training loop and are performing bulky matrix multiplications. It currently takes up to several days to train a model. You want to decrease this time significantly and keep the cost low by using an accelerator on Google Cloud. What should you do?

 
 
 
 

Q76. Your company is using wildcard tables to query data across multiple tables with similar names. The SQL statement is currently failing with the following error:
# Syntax error : Expected end of statement but got "-" at [4:11]
SELECT age FROM bigquery-public-data.noaa_gsod.gsod WHERE age != 99 AND _TABLE_SUFFIX = '1929' ORDER BY age DESC
Which table name will make the SQL statement work correctly?

 
 
 
 
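As a hedged illustration of what Q76 is probing: in BigQuery Standard SQL, a project name containing a hyphen (such as bigquery-public-data) must be enclosed in backticks, and wildcard tables need a trailing `*` before `_TABLE_SUFFIX` can be filtered. The sketch below shows a corrected query string plus a small local simulation of how `_TABLE_SUFFIX` selects among similarly named tables; the helper function is hypothetical, for illustration only.

```python
# Corrected form of the failing statement: backticks around the hyphenated
# project name, and a trailing "*" to make it a wildcard table reference.
query = """
#standardSQL
SELECT age
FROM `bigquery-public-data.noaa_gsod.gsod*`
WHERE age != 99
  AND _TABLE_SUFFIX = '1929'
ORDER BY age DESC
"""

def matching_tables(tables, prefix, suffix):
    """Local stand-in for _TABLE_SUFFIX: keep tables named prefix + suffix."""
    return [t for t in tables if t == prefix + suffix]

print(matching_tables(["gsod1928", "gsod1929", "gsod1930"], "gsod", "1929"))
# → ['gsod1929']
```

The key point for the exam item is the backtick quoting: without it, the parser reads the `-` in `bigquery-public-data` as a minus sign, producing exactly the reported syntax error.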

Q77. You are designing storage for two relational tables that are part of a 10-TB database on Google Cloud. You want to support transactions that scale horizontally. You also want to optimize data for range queries on nonkey columns. What should you do?

 
 
 
 

Q78. Business owners at your company have given you a database of bank transactions. Each row contains the user ID, transaction type, transaction location, and transaction amount. They ask you to investigate what type of machine learning can be applied to the data. Which three machine learning applications can you use? (Choose three.)

 
 
 
 
 
 

Q79. You are planning to migrate your current on-premises Apache Hadoop deployment to the cloud. You need to ensure that the deployment is as fault-tolerant and cost-effective as possible for long-running batch jobs. You want to use a managed service. What should you do?

 
 
 
 

Q80. Which of the following statements about Legacy SQL and Standard SQL is not true?

 
 
 
 

Q81. What are two of the benefits of using denormalized data structures in BigQuery?

 
 
 
 

Q82. Why do you need to split a machine learning dataset into training data and test data?

 
 
 
 
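For Q82, the underlying idea is that a held-out test set estimates how the model generalizes to unseen data and exposes overfitting, which training-set accuracy alone cannot. A minimal sketch using only the standard library (function and split ratio are illustrative, not from any particular framework):

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle, then hold out test_fraction of the rows for evaluation."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)   # shuffle to avoid ordering bias
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]

train, test = train_test_split(range(100))
print(len(train), len(test))  # → 80 20
```

The shuffle matters: if the source data is ordered (say, by date), a naive head/tail split would train and test on systematically different distributions.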

Q83. The _________ for Cloud Bigtable makes it possible to use Cloud Bigtable in a Cloud Dataflow pipeline.

 
 
 
 

Q84. You want to use a database of information about tissue samples to classify future tissue samples as either normal or mutated. You are evaluating an unsupervised anomaly detection method for classifying the tissue samples. Which two characteristics support this method? (Choose two.)

 
 
 
 
 

Q85. You want to use a BigQuery table as a data sink. In which writing mode(s) can you use BigQuery as a sink?

 
 
 
 

Q86. Your financial services company is moving to cloud technology and wants to store 50 TB of financial time-series data in the cloud. This data is updated frequently and new data will be streaming in all the time. Your company also wants to move their existing Apache Hadoop jobs to the cloud to get insights into this data.
Which product should they use to store the data?

 
 
 
 

Q87. Which of the following is NOT one of the three main types of triggers that Dataflow supports?

 
 
 
 

Q88. You’re training a model to predict housing prices based on an available dataset with real estate properties.
Your plan is to train a fully connected neural net, and you’ve discovered that the dataset contains latitude and longitude of the property. Real estate professionals have told you that the location of the property is highly influential on price, so you’d like to engineer a feature that incorporates this physical dependency.
What should you do?

 
 
 
 
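A common approach to the situation in Q88 (not necessarily this item's official answer) is to bucketize latitude and longitude and take a feature cross of the buckets, so the model can learn neighborhood-level price effects rather than a linear dependence on raw coordinates. A minimal sketch, with bucket counts and ranges chosen purely for illustration:

```python
def bucketize(value, low, high, n_buckets):
    """Map a continuous value into one of n_buckets equal-width bins."""
    if value <= low:
        return 0
    if value >= high:
        return n_buckets - 1
    width = (high - low) / n_buckets
    return int((value - low) / width)

def lat_lon_cross(lat, lon, n_buckets=100):
    """Feature cross: one categorical ID per (lat bucket, lon bucket) cell."""
    lat_b = bucketize(lat, -90.0, 90.0, n_buckets)
    lon_b = bucketize(lon, -180.0, 180.0, n_buckets)
    return lat_b * n_buckets + lon_b

print(lat_lon_cross(37.77, -122.42))  # a single cell ID for this location
```

Raw latitude fed directly into a dense layer can only contribute monotonically to the prediction; the crossed buckets instead give each geographic cell its own learnable weight.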

Q89. You work for a shipping company that has distribution centers where packages move on delivery lines to route them properly. The company wants to add cameras to the delivery lines to detect and track any visual damage to the packages in transit. You need to create a way to automate the detection of damaged packages and flag them for human review in real time while the packages are in transit. Which solution should you choose?

 
 
 
 

Q90. You designed a database for patient records as a pilot project to cover a few hundred patients in three clinics. Your design used a single database table to represent all patients and their visits, and you used self-joins to generate reports. The server resource utilization was at 50%. Since then, the scope of the project has expanded. The database must now store 100 times more patient records. You can no longer run the reports, because they either take too long or they encounter errors with insufficient compute resources. How should you adjust the database design?

 
 
 
 

Q91. You are running a pipeline in Cloud Dataflow that receives messages from a Cloud Pub/Sub topic and writes the results to a BigQuery dataset in the EU. Currently, your pipeline is located in europe-west4 and has a maximum of 3 workers, instance type n1-standard-1. You notice that during peak periods, your pipeline is struggling to process records in a timely fashion, when all 3 workers are at maximum CPU utilization. Which two actions can you take to increase performance of your pipeline? (Choose two.)

 
 
 
 
 

Q92. Your company’s on-premises Apache Hadoop servers are approaching end-of-life, and IT has decided to migrate the cluster to Google Cloud Dataproc. A like-for-like migration of the cluster would require 50 TB of Google Persistent Disk per node. The CIO is concerned about the cost of using that much block storage.
You want to minimize the storage cost of the migration. What should you do?

 
 
 
 

Q93. Your weather app queries a database every 15 minutes to get the current temperature. The frontend is powered by Google App Engine and serves millions of users. How should you design the frontend to respond to a database failure?

 
 
 
 
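One reasonable pattern for Q93 is to serve the last-known reading from a cache when the database is unreachable, since a 15-minute-old temperature is still useful. The class and method names below are hypothetical, a sketch of the fallback idea rather than an App Engine-specific implementation:

```python
import time

class TemperatureFrontend:
    def __init__(self, fetch_from_db, max_stale_seconds=900):  # 15 minutes
        self._fetch = fetch_from_db
        self._cache = None          # (timestamp, temperature)
        self._max_stale = max_stale_seconds

    def current_temperature(self):
        try:
            temp = self._fetch()
            self._cache = (time.time(), temp)   # refresh cache on success
            return temp
        except Exception:
            # Database failure: fall back to the cached reading instead of
            # failing the request outright.
            if self._cache is not None:
                return self._cache[1]
            raise

def flaky_db():
    raise ConnectionError("db down")

frontend = TemperatureFrontend(lambda: 21.5)
print(frontend.current_temperature())  # → 21.5
```

In a real deployment the cache would live in a shared layer (e.g., Memcache) rather than instance memory, so every frontend instance sees the same last-known value.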

Q94. When a Cloud Bigtable node fails, ____ is lost.

 
 
 
 

Q95. Which row keys are likely to cause a disproportionate number of reads and/or writes on a particular node in a Bigtable cluster (select 2 answers)?

 
 
 
 
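The hotspotting issue behind Q95 is that Bigtable stores rows in lexicographic key order, so monotonically increasing keys (timestamps, incrementing IDs) all land at one end of the keyspace and a single node absorbs most writes. A sketch of the anti-pattern and one common mitigation (the salt length and key layout here are illustrative choices, not a prescribed schema):

```python
import hashlib

def hot_key(timestamp_ms):
    """Anti-pattern: monotonically increasing keys hotspot a single node."""
    return f"{timestamp_ms:013d}"

def salted_key(user_id, timestamp_ms, prefix_len=4):
    """Better: a short hash prefix spreads writes across the keyspace."""
    salt = hashlib.sha256(user_id.encode()).hexdigest()[:prefix_len]
    return f"{salt}#{user_id}#{timestamp_ms:013d}"

print(hot_key(1700000000000))
print(salted_key("alice", 1700000000000))
```

The trade-off: salted keys distribute load evenly, but range scans over time for all users become scatter-gather reads across every salt prefix.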

Q96. You are planning to use Google’s Dataflow SDK to analyze customer data such as the sample displayed below. Your project requirement is to extract only the customer name from the data source and then write to an output PCollection.
Tom,555 X street
Tim,553 Y street
Sam, 111 Z street
Which operation is best suited for the above data processing requirement?

 
 
 
 
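Q96 describes a one-to-one element transform: each input record maps to exactly one output element, which in the Dataflow/Beam SDK is a Map (a simple ParDo). Below is a plain-Python stand-in for that transform over the sample records, to show the shape of the operation without requiring the SDK:

```python
records = [
    "Tom,555 X street",
    "Tim,553 Y street",
    "Sam, 111 Z street",
]

def extract_name(record):
    """One-to-one element transform, as a Map (ParDo) would apply it."""
    return record.split(",", 1)[0].strip()

names = list(map(extract_name, records))
print(names)  # → ['Tom', 'Tim', 'Sam']
```

Because each input yields exactly one output, a Map fits; FlatMap or GroupByKey would be for one-to-many or aggregation cases respectively.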

Q97. Which row keys are likely to cause a disproportionate number of reads and/or writes on a particular node in a Bigtable cluster (select 2 answers)?

 
 
 
 

Q98. You work for an advertising company, and you’ve developed a Spark ML model to predict click-through rates at advertisement blocks. You’ve been developing everything at your on-premises data center, and now your company is migrating to Google Cloud. Your data center will be migrated to BigQuery. You periodically retrain your Spark ML models, so you need to migrate existing training pipelines to Google Cloud. What should you do?

 
 
 
 

Q99. You have spent a few days loading data from comma-separated values (CSV) files into the Google BigQuery table CLICK_STREAM. The column DT stores the epoch time of click events. For convenience, you chose a simple schema where every field is treated as the STRING type. Now, you want to compute web session durations of users who visit your site, and you want to change the data type of the DT column to TIMESTAMP. You want to minimize the migration effort without making future queries computationally expensive. What should you do?

 
 
 
 
 
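The conversion at the heart of Q99 is turning an epoch-seconds value stored as a STRING into a proper timestamp; in BigQuery SQL that is typically `TIMESTAMP_SECONDS(CAST(DT AS INT64))`. A local Python equivalent of that cast (the function name is illustrative):

```python
from datetime import datetime, timezone

def epoch_string_to_timestamp(dt_string):
    """Convert an epoch-seconds STRING to an aware UTC datetime."""
    return datetime.fromtimestamp(int(dt_string), tz=timezone.utc)

ts = epoch_string_to_timestamp("1420070400")
print(ts.isoformat())  # → 2015-01-01T00:00:00+00:00
```

Materializing the converted column once (rather than casting inside every session-duration query) is what keeps future queries from paying the conversion cost repeatedly.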

Online Exam Practice Tests with detailed explanations!: https://www.dumpleader.com/Professional-Data-Engineer_exam.html
