Hadoop & MapReduce Getting Started
In this course, learners will explore the theory behind big data analysis using Hadoop and how MapReduce enables parallel processing of large data sets distributed across a cluster of machines. Begin with an introduction to big data and the various sources and characteristics of data available today. Look at the challenges involved in processing big data and the options available to address them. Next, get a brief overview of Hadoop, its role in processing big data, and the functions of its components, such as the Hadoop Distributed File System (HDFS), MapReduce, and YARN (Yet Another Resource Negotiator). Explore how Hadoop's MapReduce framework processes data in parallel on a cluster of machines. Recall the steps involved in building a MapReduce application and the specifics of the Map phase in processing each row of data in the input files. Recognize the functions of the Shuffle and Reduce phases in sorting and interpreting the output of the Map phase to produce a meaningful result. To conclude, complete an exercise on the fundamentals of Hadoop and MapReduce.
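
As a rough illustration of the Map, Shuffle, and Reduce phases described above, the following is a minimal single-machine sketch in plain Python. It does not use Hadoop or any Hadoop API. The word-count task, the sample rows, and the function names (map_phase, shuffle_phase, reduce_phase, run_job) are assumptions made for illustration only and are not taken from the course material.

```python
from collections import defaultdict


def map_phase(row):
    """Map: turn one input row into a list of (key, value) pairs.

    Assumed task: word count, so each word is emitted with a count of 1.
    """
    return [(word.lower(), 1) for word in row.split()]


def shuffle_phase(pairs):
    """Shuffle: group all values by key and sort the keys."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def run_job(rows):
    """Run the three phases in sequence on a list of input rows.

    On a real cluster the map and reduce calls would run in parallel on
    different machines; here they run one after another in a single process.
    """
    mapped = [pair for row in rows for pair in map_phase(row)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample_rows = ["big data big cluster", "data on a cluster"]  # made-up input
    print(run_job(sample_rows))
```

In this sketch each row is mapped independently of the others, which is the property that would let a framework distribute the Map phase across machines. The Shuffle step is what guarantees that every value for a given key reaches the same Reduce call.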