The complete Building Batch Data Pipelines on GCP course is currently offered by Google Cloud through the Coursera platform.

About this Course:
Data pipelines typically fall under one of the Extract-Load, Extract-Load-Transform, or Extract-Transform-Load paradigms. This course describes which paradigm should be used, and when, for batch data. Furthermore, this course covers several technologies on Google Cloud for data transformation, including BigQuery, executing Spark on Dataproc, pipeline graphs in Cloud Data Fusion, and serverless data processing with Dataflow. Learners will get hands-on experience building data pipeline components on Google Cloud using Qwiklabs.
EL, ELT, ETL Quiz 1 Answers
Q1. Which of the following is the ideal use case for Extract and Load (EL)?
- Ans: Scheduled periodic loads of log files (e.g. once a day)
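The three paradigms named in the course description differ mainly in whether a transform step exists and where it runs. The following is a minimal sketch of that difference, not the course's own code: `sqlite3` stands in for a warehouse such as BigQuery, and the sample rows, table names, and helper functions are all hypothetical.

```python
import sqlite3

# Hypothetical log records; the status field carries stray whitespace.
RAW_ROWS = [("2024-01-01", " 200 "), ("2024-01-01", " 500 "), ("2024-01-02", " 200 ")]


def extract():
    """Extract: return the source rows unchanged."""
    return list(RAW_ROWS)


def load(conn, table, rows):
    """Load: write rows into a warehouse table as they are."""
    conn.execute(f"CREATE TABLE {table} (day TEXT, status TEXT)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)


def clean(rows):
    """Transform in pipeline code: strip whitespace from the status field."""
    return [(day, status.strip()) for day, status in rows]


def run_el(conn):
    # EL: no transform step; the data is loaded exactly as extracted.
    load(conn, "el_logs", extract())


def run_etl(conn):
    # ETL: transform outside the warehouse, then load the cleaned rows.
    load(conn, "etl_logs", clean(extract()))


def run_elt(conn):
    # ELT: load the raw rows first, then transform inside the warehouse with SQL.
    load(conn, "elt_raw", extract())
    conn.execute(
        "CREATE TABLE elt_logs AS SELECT day, TRIM(status) AS status FROM elt_raw"
    )


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the warehouse
run_el(conn)
run_etl(conn)
run_elt(conn)

el_rows = conn.execute("SELECT day, status FROM el_logs ORDER BY day, status").fetchall()
etl_rows = conn.execute("SELECT day, status FROM etl_logs ORDER BY day, status").fetchall()
elt_rows = conn.execute("SELECT day, status FROM elt_logs ORDER BY day, status").fetchall()
```

In this sketch ETL and ELT end with the same cleaned rows and differ only in where the cleaning happened, while EL leaves the data untouched, which suits sources that are already in a usable form.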
Executing Spark on Cloud Dataproc Quiz 2 Answers
Q1. Which of the following statements are true about Cloud Dataproc?
- Lets you run Spark and Hadoop clusters with minimal administration
- Helps you create job-specific clusters without HDFS
Q2. Match each of the terms with what they do when setting up clusters in Cloud Dataproc:
Term Definition
__ 1. Zone – A. Costs less but may not always be available
__ 2. Standard Cluster mode – B. Determines the Google data center where compute nodes will be located
__ 3. Preemptible – C. Provides 1 master and N workers
- 1. Zone: B
- 2. Standard Cluster mode: C
- 3. Preemptible: A
Q3. Cloud Dataproc provides the ability for Spark programs to separate compute & storage by:
- Reading and writing data directly from/to Cloud Storage
Cloud Data Fusion and Cloud Composer Quiz 3 Answers
Q1. Cloud Data Fusion is the ideal solution when you need
- to build visual pipelines
Data Processing with Cloud Dataflow Quiz 4 Answers
Q1. Which of the following statements are true?
- Dataflow executes Apache Beam pipelines
- Dataflow transforms support both batch and streaming pipelines
Q2. Match each of the Dataflow terms with what they do in the life of a Dataflow job:
Term Definition
__ 1. Transform – A. Output endpoint for your pipeline
__ 2. PCollection – B. A data processing operation or step in your pipeline
__ 3. Sink – C. A set of data in your pipeline
- 1. Transform: B
- 2. PCollection: C
- 3. Sink: A
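To make the three terms above concrete, here is a plain-Python analogy of how they relate within one job. It is only a sketch: none of these names belong to the Apache Beam or Dataflow API, and `ListSink`, `run_pipeline`, and the sample records are hypothetical.

```python
# Plain-Python analogy only; this is not Apache Beam code.

def to_upper(records):
    """Transform: a processing step that turns one collection into another."""
    return [record.upper() for record in records]


def keep_errors(records):
    """Transform: a second step that filters the collection."""
    return [record for record in records if "ERROR" in record]


class ListSink:
    """Sink: the output endpoint that receives the final collection."""

    def __init__(self):
        self.written = []

    def write(self, records):
        self.written.extend(records)


def run_pipeline(source, transforms, sink):
    # The intermediate list plays the role of a PCollection:
    # the set of data flowing between steps.
    pcollection = list(source)
    for transform in transforms:
        pcollection = transform(pcollection)
    sink.write(pcollection)
    return sink


sink = run_pipeline(
    ["error: disk full", "info: started", "error: timeout"],
    [to_upper, keep_errors],
    ListSink(),
)
result = sink.written
```

In a real Dataflow job the collections are distributed and the steps are applied by the runner, but the roles are the same: data sets move through transforms and end at a sink.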