CCD-410: Cloudera Certified Developer for Apache Hadoop (CCDH) Exam | Practice Test / Quiz / MCQs

1. You want to populate an associative array in order to perform a map-side join. You've decided to put this information in a text file, place that file into the DistributedCache and read it in your Mapper before any records are processed. Identify which method in the Mapper you should use to implement code for reading the file and populating the associative array.

A. combine
B. configure
C. init
D. map
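The scenario in this question can be sketched in plain Python. This is only a simulation under stated assumptions, not the Hadoop API: the class `JoinMapper`, the tab-separated lookup format, and the sample records are all hypothetical. It shows the lookup table being loaded once per task, before any record reaches `map`.

```python
import io


class JoinMapper:
    """Plain-Python stand-in for a Mapper (hypothetical, not the Hadoop API)."""

    def __init__(self):
        self.lookup = {}

    def configure(self, cache_file):
        # Runs once per task, before any record is mapped:
        # load the tab-separated lookup file into an in-memory dict.
        for line in cache_file:
            key, value = line.rstrip("\n").split("\t", 1)
            self.lookup[key] = value

    def map(self, key, value):
        # Runs once per input record: join it against the in-memory table.
        if key in self.lookup:
            yield key, (value, self.lookup[key])


mapper = JoinMapper()
mapper.configure(io.StringIO("u1\tAlice\nu2\tBob\n"))  # stands in for the cached file
records = [("u1", "click"), ("u3", "view"), ("u2", "buy")]
joined = [out for k, v in records for out in mapper.map(k, v)]
print(joined)
```

Records whose key is missing from the lookup table (here `u3`) are simply dropped, which corresponds to an inner join.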

2. Which process describes the lifecycle of a Mapper?

A. The JobTracker spawns a new Mapper to process all records in a single file.
B. The TaskTracker spawns a new Mapper to process all records in a single input split.
C. The JobTracker calls the TaskTracker's configure() method, then its map() method and finally its close() method.
D. The TaskTracker spawns a new Mapper to process each key-value pair.

3. When can a reduce class also serve as a combiner without affecting the output of a MapReduce program?

A. When the signature of the reduce method matches the signature of the combine method.
B. When the types of the reduce operation's input key and input value match the types of the reducer's output key and output value and when the reduce operation is both commutative and associative.
C. Always. The point of a combiner is to serve as a mini-reducer directly after the map phase to increase performance.
D. Always. Code can be reused in Java since it is a polymorphic object-oriented programming language.
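Why the properties of the reduce operation matter can be illustrated with a small plain-Python experiment. It is a sketch under assumptions, not Hadoop code: `run`, `sum_reduce`, `mean_reduce` and the sample values are hypothetical, and the combiner is modelled as applying the reduce function to each map task's output before the final reduce.

```python
def sum_reduce(values):
    return sum(values)


def mean_reduce(values):
    return sum(values) / len(values)


def run(reduce_fn, map_outputs, use_combiner):
    # map_outputs holds one list of values per map task, all for the same key.
    if use_combiner:
        # The combiner pre-reduces each map task's values locally.
        partials = [reduce_fn(chunk) for chunk in map_outputs]
    else:
        # No combiner: every raw value reaches the reducer.
        partials = [v for chunk in map_outputs for v in chunk]
    return reduce_fn(partials)


map_outputs = [[1, 2, 3], [10]]
print(run(sum_reduce, map_outputs, False), run(sum_reduce, map_outputs, True))
print(run(mean_reduce, map_outputs, False), run(mean_reduce, map_outputs, True))
```

Summing gives the same result either way, while averaging does not: a mean of per-task means is generally not the mean of all values.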

4. You've written a MapReduce job that will process 500 million input records and generate 500 million key-value pairs. The data is not uniformly distributed. Your MapReduce job will create a significant amount of intermediate data that it needs to transfer between mappers and reducers, which is a potential bottleneck. A custom implementation of which interface is most likely to reduce the amount of intermediate data transferred across the network?

A. Partitioner
B. Combiner
C. WritableComparable
D. OutputFormat

6. Your client application submits a MapReduce job to your Hadoop cluster. Identify the Hadoop daemon on which the Hadoop framework will look for an available slot to schedule a MapReduce operation.

A. TaskTracker
B. NameNode
C. JobTracker
D. DataNode

7. All keys used for intermediate output from mappers must:

A. Be a subclass of FileInputFormat.
B. Implement a splittable compression algorithm.
C. Implement WritableComparable.
D. Override isSplitable.

8. Can you use MapReduce to perform a relational join on two large tables sharing a key? Assume that the two tables are formatted as comma-separated files in HDFS.

A. Yes, so long as both tables fit into memory.
B. Yes.
C. Yes, but only if one of the tables fits into memory.
D. No, MapReduce cannot perform relational operations.
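One common way to join two large comma-separated tables is a reduce-side join, sketched below in plain Python. This is an illustration under assumptions, not Hadoop code: the function names, the "L"/"R" tags and the sample rows are hypothetical, and the `groups` dict merely stands in for the shuffle, which in a real job is done by the framework rather than in one process's memory.

```python
from collections import defaultdict
from itertools import chain


def map_table(tag, csv_lines):
    # Emit (join key, (source tag, remaining columns)) for each row.
    for line in csv_lines:
        key, rest = line.split(",", 1)
        yield key, (tag, rest)


def reduce_join(key, tagged_values):
    # One call per key: pair every left row with every right row.
    left = [v for tag, v in tagged_values if tag == "L"]
    right = [v for tag, v in tagged_values if tag == "R"]
    for l in left:
        for r in right:
            yield key, l, r


def join(left_lines, right_lines):
    groups = defaultdict(list)  # stand-in for the shuffle/sort phase
    for key, tagged in chain(map_table("L", left_lines), map_table("R", right_lines)):
        groups[key].append(tagged)
    out = []
    for key in sorted(groups):
        out.extend(reduce_join(key, groups[key]))
    return out


customers = ["1,Ann", "2,Bo"]
orders = ["1,book", "1,pen", "3,cup"]
result = join(customers, orders)
print(result)
```

In this scheme only the rows sharing a single key are handled together in the reducer.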

9. You want to perform analysis on a large collection of images. You want to store this data in HDFS and process it with MapReduce but you also want to give your data analysts and data scientists the ability to process the data directly from HDFS with an interpreted high-level programming language like Python. Which format should you use to store this data in HDFS?

A. SequenceFiles
B. Avro
C. JSON
D. HTML

10. You need to run the same job many times with minor variations. Rather than hardcoding all job configuration options in your driver code, you've decided to have your Driver subclass org.apache.hadoop.conf.Configured and implement the org.apache.hadoop.util.Tool interface. Identify which invocation correctly passes mapred.job.name with a value of Example to Hadoop.

A. hadoop MyDriver -D mapred.job.name=Example input output
B. hadoop mapred.job.name=Example MyDriver input output
C. hadoop MyDriver mapred.job.name=Example input output
D. hadoop setproperty mapred.job.name=Example MyDriver input output

11. You have written a Mapper which invokes the following five calls to the OutputCollector.collect method:

output.collect(new Text("Apple"), new Text("Red"));
output.collect(new Text("Banana"), new Text("Yellow"));
output.collect(new Text("Apple"), new Text("Yellow"));
output.collect(new Text("Cherry"), new Text("Red"));
output.collect(new Text("Apple"), new Text("Green"));

How many times will the Reducer's reduce method be invoked?

A. 3
B. 0
C. 6
D. 1
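The grouping behaviour behind this question can be simulated in plain Python. The sketch assumes the framework's default behaviour of sorting map output by key and calling reduce once per distinct key; `reduce_calls` is a hypothetical helper, not a Hadoop function.

```python
from itertools import groupby

# The five (key, value) pairs emitted by the Mapper in the question.
emitted = [
    ("Apple", "Red"),
    ("Banana", "Yellow"),
    ("Apple", "Yellow"),
    ("Cherry", "Red"),
    ("Apple", "Green"),
]


def reduce_calls(pairs):
    # Sort by key, then group: each group corresponds to one reduce() call.
    ordered = sorted(pairs, key=lambda kv: kv[0])
    return [
        (key, [value for _, value in group])
        for key, group in groupby(ordered, key=lambda kv: kv[0])
    ]


calls = reduce_calls(emitted)
for key, values in calls:
    print(key, values)
print(len(calls), "reduce invocations")
```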

12. You have a directory named jobdata in HDFS that contains four files: _first.txt, second.txt, .third.txt and #data.txt. How many files will be processed by the FileInputFormat.setInputPaths() method when it's given a Path object representing this directory?

A. None, the directory cannot be named jobdata.
B. Four, all files will be processed.
C. Two, file names with a leading period or underscore are ignored.
D. Three, the pound sign is an invalid character for HDFS file names.
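FileInputFormat skips files whose names begin with an underscore or a period when it lists an input directory. The effect on the four files in this question can be sketched in plain Python; `visible_inputs` is a hypothetical helper that only mimics that filter, not a Hadoop call.

```python
def visible_inputs(names):
    # Mimic the hidden-file rule: skip names starting with "_" or ".".
    return [name for name in names if not name.startswith(("_", "."))]


files = ["_first.txt", "second.txt", ".third.txt", "#data.txt"]
processed = visible_inputs(files)
print(processed)
```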

13. Identify the MapReduce v2 (MRv2 / YARN) daemon responsible for launching application containers and monitoring application resource usage?

A. ResourceManager
B. ApplicationMasterService
C. ApplicationMaster
D. NodeManager

14. Given a directory of files with the following structure: line number, tab character, string.

Example:

1	abialkjfjkaoasdfjksdlkjhqweroij
2	kadfjhuwqounahagtnbvaswslmnbfgy
3	kjfteiomndscxeqalkzhtopedkfsikj

You want to send each line as one record to your Mapper. Which InputFormat should you use to complete the line: conf.setInputFormat(____.class)?

A. SequenceFileInputFormat
B. SequenceFileAsTextInputFormat
C. BDBInputFormat
D. KeyValueFileInputFormat

15. Identify which best defines a SequenceFile.

A. A SequenceFile contains a binary encoding of an arbitrary number of key-value pairs. Each key must be the same type. Each value must be the same type.
B. A SequenceFile contains a binary encoding of an arbitrary number of WritableComparable objects, in sorted order.
C. A SequenceFile contains a binary encoding of an arbitrary number of homogeneous Writable objects.
D. A SequenceFile contains a binary encoding of an arbitrary number of heterogeneous Writable objects.

16. Which describes how a client reads a file from HDFS?

A. The client contacts the NameNode for the block location(s). The NameNode then queries the DataNodes for block locations. The DataNodes respond to the NameNode, and the NameNode redirects the client to the DataNode that holds the requested data block(s). The client then reads the data directly off the DataNode.
B. The client queries all DataNodes in parallel. The DataNode that contains the requested data responds directly to the client. The client reads the data directly off the DataNode.
C. The client contacts the NameNode for the block location(s). The NameNode contacts the DataNode that holds the requested data block. Data is transferred from the DataNode to the NameNode, and then from the NameNode to the client.
D. The client queries the NameNode for the block location(s). The NameNode returns the block location(s) to the client. The client reads the data directly off the DataNode(s).

17. You need to perform statistical analysis in your MapReduce job and would like to call methods in the Apache Commons Math library, which is distributed as a 1.3 megabyte Java archive (JAR) file. Which is the best way to make this library available to your MapReduce job at runtime?

A. Have your system administrator place the JAR file on a Web server accessible to all cluster nodes and then set the HTTP_JAR_URL environment variable to its location.
B. Have your system administrator copy the JAR to all nodes in the cluster and set its location in the HADOOP_CLASSPATH environment variable before you submit your job.
C. Package your code and the Apache Commons Math library into a zip file named JobJar.zip.
D. When submitting the job on the command line, specify the -libjars option followed by the JAR file path.

18. You want to count the number of occurrences for each unique word in the supplied input data. You've decided to implement this by having your mapper tokenize each word and emit a literal value 1, and then have your reducer increment a counter for each literal 1 it receives. After successfully implementing this, it occurs to you that you could optimize this by specifying a combiner. Will you be able to reuse your existing Reducer as your combiner in this case and why or why not?

A. No, because the Combiner is incompatible with a mapper which doesn't use the same data type for both the key and value.
B. No, because the Reducer and Combiner are separate interfaces.
C. Yes, because the sum operation is both associative and commutative and the input and output types to the reduce method match.
D. No, because the sum operation in the reducer is incompatible with the operation of a Combiner.
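The word-count scenario can be simulated in plain Python to compare a run with and without a combiner. This is a sketch under assumptions, not Hadoop code: it assumes the reducer sums the values it receives, each inner list in `splits` stands for one map task's input, and all function names are hypothetical.

```python
from collections import defaultdict


def map_words(line):
    # Emit (word, 1) for every token, as the mapper in the question does.
    for word in line.split():
        yield word, 1


def sum_reduce(key, values):
    # Used both as the reducer and, optionally, as the combiner.
    return key, sum(values)


def group(pairs):
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def word_count(splits, use_combiner):
    shuffled = []
    for split in splits:
        pairs = [kv for line in split for kv in map_words(line)]
        if use_combiner:
            # Pre-aggregate this map task's output before it is "sent".
            pairs = [sum_reduce(k, vs) for k, vs in group(pairs).items()]
        shuffled.extend(pairs)
    counts = dict(sum_reduce(k, vs) for k, vs in group(shuffled).items())
    return counts, len(shuffled)


splits = [["a b a", "b a"], ["a c"]]
counts_plain, sent_plain = word_count(splits, use_combiner=False)
counts_comb, sent_comb = word_count(splits, use_combiner=True)
print(counts_plain, sent_plain)
print(counts_comb, sent_comb)
```

Both runs produce the same counts; the only difference is how many intermediate pairs cross from the map side to the reduce side.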

19. What is a SequenceFile?

A. A SequenceFile contains a binary encoding of an arbitrary number of heterogeneous Writable objects.
B. A SequenceFile contains a binary encoding of an arbitrary number of key-value pairs. Each key must be the same type. Each value must be the same type.
C. A SequenceFile contains a binary encoding of an arbitrary number of homogeneous Writable objects.
D. A SequenceFile contains a binary encoding of an arbitrary number of WritableComparable objects, in sorted order.

20. When is the earliest point at which the reduce method of a given Reducer can be called?

A. It depends on the InputFormat used for the job.
B. As soon as a mapper has emitted at least one record.
C. As soon as at least one mapper has finished processing its input split.
D. Not until all mappers have finished processing all records.

21. Identify the tool best suited to import a portion of a relational database every day as files into HDFS, and generate Java classes to interact with that imported data?

A. Sqoop
B. Flume
C. Oozie
D. Pig

22. On a cluster running MapReduce v1 (MRv1), a TaskTracker heartbeats into the JobTracker on your cluster, and alerts the JobTracker it has an open map task slot. What determines how the JobTracker assigns each map task to a TaskTracker?

A. The number and speed of CPU cores on the TaskTracker node.
B. The location of the InputSplit to be processed in relation to the location of the node.
C. The amount of free disk space on the TaskTracker node.
D. The amount of RAM installed on the TaskTracker node.

23. You are developing a combiner that takes as input Text keys, IntWritable values, and emits Text keys, IntWritable values. Which interface should your class implement?

A. Combiner
B. Reducer
C. Reducer
D. Mapper

24. Workflows expressed in Oozie can contain:

A. Iterative repetition of MapReduce jobs until a desired answer or state is reached.
B. Sequences of MapReduce jobs only; no Pig or Hive tasks or jobs. These MapReduce sequences can be combined with forks and path joins.
C. Sequences of MapReduce and Pig jobs. These are limited to linear sequences of actions with exception handlers but no forks.
D. Sequences of MapReduce and Pig. These sequences can be combined with other actions including forks, decision points, and path joins.

25. In a MapReduce job, you want each of your input files processed by a single map task. How do you configure a MapReduce job so that a single map task processes each input file regardless of how many blocks the input file occupies?

A. Write a custom MapRunner that iterates over all key-value pairs in the entire file.
B. Write a custom FileInputFormat and override the method isSplitable to always return false.
C. Increase the parameter that controls minimum split size in the job configuration.
D. Set the number of mappers equal to the number of input files you want to process.
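How splittability affects the number of map tasks per file can be approximated with a few lines of plain Python. This is a deliberately simplified model, not Hadoop's actual split computation (which applies additional rules such as minimum and maximum split sizes); `num_splits` and the sizes used are hypothetical.

```python
import math

MB = 1024 * 1024


def num_splits(file_size, block_size, splittable):
    # Simplified model: a splittable file yields about one split per block,
    # while a non-splittable file is always handled as a single split.
    if not splittable:
        return 1
    return max(1, math.ceil(file_size / block_size))


print(num_splits(300 * MB, 128 * MB, splittable=True))
print(num_splits(300 * MB, 128 * MB, splittable=False))
```

Under this model a 300 MB file with 128 MB blocks is read by three map tasks when splittable and by exactly one when not, regardless of how many blocks it occupies.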

25 Questions

Without work one finishes nothing. - Ralph Waldo Emerson
© 2026 Fatskills.com

All trademarks, logos and brand names are the property of their respective owners. All company, product and service names used in this website are for identification purposes only. Use of these names, trademarks and brands does not imply endorsement.
