Filtering Data Using Hadoop MapReduce
Extracting meaningful information from a very large dataset can be painstaking. In this Skillsoft Aspire course learners examine how Hadoops MapReduce can be used to speed up this operation. In a new project code the Mapper for an application to count the number of passengers in each Titanic class in the input data set. Then develop a Reducer and Driver to generate final passenger counts in each Titanic class. Build the project by using Maven and run on Hadoop master node to check that output correctly shows passenger class numbers. Apply MapReduce to filter only surviving Titanic passengers from the input data set. Execute the application and verify that filtering has worked correctly; examine job and output files with YARN cluster manager and HDFS (Hadoop Distributed File System) NameNode web User interfaces. Using a restaurant apps data set use MapReduce to obtain the distinct set of cuisines offered. Build and run the application and confirm output with HDFS from both command line and web application. The exercise involves filtering data by using MapReduce.