CCD-410: Cloudera Certified Developer for Apache Hadoop (CCDH) Exam
Avg score: 27% Most missed: “MapReduce v2 (MRv2/YARN) splits which major functions of the Job Tracker into se…”
25 Questions

1. You want to populate an associative array in order to perform a map-side join. You've decided to put this information in a text file, place that file into the DistributedCache, and read it in your Mapper before any records are processed. Identify which method in the Mapper you should use to implement code for reading the file and populating the associative array.
2. Which process describes the lifecycle of a Mapper?
3. When can a reduce class also serve as a combiner without affecting the output of a MapReduce program?
4. You've written a MapReduce job that will process 500 million input records and generate 500 million key-value pairs. The data is not uniformly distributed. Your MapReduce job will create a significant amount of intermediate data that it needs to transfer between mappers and reducers, which is a potential bottleneck. A custom implementation of which interface is most likely to reduce the amount of intermediate data transferred across the network?
5. In a MapReduce job with 500 map tasks, how many map task attempts will there be?
6. Your client application submits a MapReduce job to your Hadoop cluster. Identify the Hadoop daemon on which the Hadoop framework will look for an available slot to schedule a MapReduce operation.
7. All keys used for intermediate output from mappers must:
8. Can you use MapReduce to perform a relational join on two large tables sharing a key? Assume that the two tables are formatted as comma-separated files in HDFS.
9. You want to perform analysis on a large collection of images. You want to store this data in HDFS and process it with MapReduce but you also want to give your data analysts and data scientists the ability to process the data directly from HDFS with an interpreted high-level programming language like Python. Which format should you use to store this data in HDFS?
10. You need to run the same job many times with minor variations. Rather than hardcoding all job configuration options in your driver code, you've decided to have your Driver subclass org.apache.hadoop.conf.Configured and implement the org.apache.hadoop.util.Tool interface. Identify which invocation correctly passes mapred.job.name with a value of Example to Hadoop.
11. You have written a Mapper which invokes the following five calls to the OutputCollector.collect method: output.collect(new Text("Apple"), new Text("Red")); output.collect(new Text("Banana"), new Text("Yellow")); output.collect(new Text("Apple"), new Text("Yellow")); output.collect(new Text("Cherry"), new Text("Red")); output.collect(new Text("Apple"), new Text("Green")); How many times will the Reducer's reduce method be invoked?
12. You have a directory named jobdata in HDFS that contains four files: _first.txt, second.txt, .third.txt and #data.txt. How many files will be processed by the FileInputFormat.setInputPaths() command when it's given a Path object representing this directory?
13. Identify the MapReduce v2 (MRv2 / YARN) daemon responsible for launching application containers and monitoring application resource usage?
14. Given a directory of files with the following structure: line number, tab character, string. Example (with [tab] marking the tab character): 1[tab]abialkjfjkaoasdfjksdlkjhqweroij; 2[tab]kadfjhuwqounahagtnbvaswslmnbfgy; 3[tab]kjfteiomndscxeqalkzhtopedkfsikj. You want to send each line as one record to your Mapper. Which InputFormat should you use to complete the line: conf.setInputFormat(____.class)?
15. Identify which best defines a SequenceFile.
16. Which describes how a client reads a file from HDFS?
17. You need to perform statistical analysis in your MapReduce job and would like to call methods in the Apache Commons Math library, which is distributed as a 1.3 megabyte Java archive (JAR) file. Which is the best way to make this library available to your MapReduce job at runtime?
18. You want to count the number of occurrences of each unique word in the supplied input data. You've decided to implement this by having your mapper tokenize each word and emit a literal value 1, and then have your reducer increment a counter for each literal 1 it receives. After successfully implementing this, it occurs to you that you could optimize this by specifying a combiner. Will you be able to reuse your existing Reducer as your combiner in this case, and why or why not?
19. What is a SequenceFile?
20. When is the earliest point at which the reduce method of a given Reducer can be called?
21. Identify the tool best suited to import a portion of a relational database every day as files into HDFS, and generate Java classes to interact with that imported data?
22. On a cluster running MapReduce v1 (MRv1), a TaskTracker heartbeats into the JobTracker on your cluster, and alerts the JobTracker it has an open map task slot. What determines how the JobTracker assigns each map task to a TaskTracker?
23. You are developing a combiner that takes as input Text keys, IntWritable values, and emits Text keys, IntWritable values. Which interface should your class implement?
24. Workflows expressed in Oozie can contain:
25. In a MapReduce job, you want each of your input files processed by a single map task. How do you configure a MapReduce job so that a single map task processes each input file regardless of how many blocks the input file occupies?