MapReduce Algorithm Example

Filed Under: Big Data

Welcome to MapReduce algorithm example. Before writing MapReduce programs in CloudEra Environment, first we will discuss how MapReduce algorithm works in theory with some simple MapReduce example in this post. In my next posts, we will discuss about How to develop a MapReduce Program to perform WordCounting and some more useful and simple examples.

MapReduce Algorithm

MapReduce is a Distributed Data Processing Algorithm, introduced by Google in it’s MapReduce Tech Paper.

MapReduce Algorithm is mainly inspired by Functional Programming model. ( Please read this post “Functional Programming Basics” to get some understanding about Functional Programming , how it works and it’s major advantages).

MapReduce algorithm is mainly useful to process huge amount of data in parallel, reliable and efficient way in cluster environments.

I hope, everyone is familiar with “Divide and Conquer” algorithm. It uses Divide and Conquer technique to process large amount of data.

It divides input task into smaller and manageable sub-tasks (They should be executable independently) to execute them in-parallel.

MapReduce Algorithm Steps

MapReduce Algorithm uses the following three main steps:

  1. Map Function
  2. Shuffle Function
  3. Reduce Function

Here we are going to discuss each function role and responsibility in MapReduce algorithm. If you don’t understand it well in this section, don’t get panic. Please read next section, where we use one simple word counting example to explain them in-detail. Once you read next section again come back to this section re-read it again. I bet you will definitely understand these 3 steps or functions very well.

Map Function

Map Function is the first step in MapReduce Algorithm. It takes input tasks (say DataSets. I have given only one DataSet in below diagram.) and divides them into smaller sub-tasks. Then perform required computation on each sub-task in parallel.

This step performs the following two sub-steps:

  1. Splitting
  2. Mapping
  • Splitting step takes input DataSet from Source and divide into smaller Sub-DataSets.
  • Mapping step takes those smaller Sub-DataSets and perform required action or computation on each Sub-DataSet.

The output of this Map Function is a set of key and value pairs as <Key, Value> as shown in the below diagram.

MapReduce Example Mapping Function

MapReduce First Step Output:

map reduce algorithm map function output

Shuffle Function

It is the second step in MapReduce Algorithm. Shuffle Function is also know as “Combine Function”.

It performs the following two sub-steps:

  1. Merging
  2. Sorting

It takes a list of outputs coming from “Map Function” and perform these two sub-steps on each and every key-value pair.

  • Merging step combines all key-value pairs which have same keys (that is grouping key-value pairs by comparing “Key”). This step returns <Key, List<Value>>.
  • Sorting step takes input from Merging step and sort all key-value pairs by using Keys. This step also returns <Key, List<Value>> output but with sorted key-value pairs.

Finally, Shuffle Function returns a list of <Key, List<Value>> sorted pairs to next step.

MapReduce algorithm shuffle function

MapReduce Second Step Output:

MapReduce algorithm shuffle function output

Reduce Function

It is the final step in MapReduce Algorithm. It performs only one step : Reduce step.

It takes list of <Key, List<Value>> sorted pairs from Shuffle Function and perform reduce operation as shown below.

MapReduce algorithm reduce function

MapReduce Final Step Output:

MapReduce algorithm reduce function output

Final step output looks like first step output. However final step <Key, Value> pairs are different than first step <Key, Value> pairs. Final step <Key, Value> pairs are computed and sorted pairs.

We can observe the difference between first step output and final step output with some simple example. We will discuss same steps with one simple example in next section.

That’s it all three steps of MapReduce Algorithm.

MapReduce Example – Word Count

In this section, we are going to discuss about “How MapReduce Algorithm solves WordCount Problem” theoretically. We will implement a Hadoop MapReduce Program and test it in my coming post.

Problem Statement:
Count the number of occurrences of each word available in a DataSet.

Input DataSet
Please find our example Input DataSet file in below diagram. Just for simplicity, we are going to use simple small DataSet. However, Real-time applications use very huge amount of Data.

MapReduce Example wordcount inputfile content

Client Required Final Result

MapReduce Example wordcount final output

MapReduce – Map Function (Split Step)

MapReduce Example wordcount mapping split step

MapReduce – Map Function (Mapping Step)

MapReduce Example wordcount mapping mapping step

MapReduce – Shuffle Function (Merge Step)

MapReduce Example wordcount mapping merge step1

MapReduce Example wordcount mapping merge step2

MapReduce – Shuffle Function (Sorting Step)

MapReduce Example wordcount mapping sorting step

MapReduce – Reduce Function (Reduce Step)

MapReduce Example wordcount mapping reduce step

MapReduce 3 Step Process With WordCount Example

MapReduce Example wordcount complete steps

That’s it all about MapReduce Algorithm and map reduce example step by step. It’s time to start Developing and testing MapReduce Programs. First, We are going to develop same “WordCounting” example in my coming post.

Please drop me a comment if you like my post or have any issues/suggestions.

Reference: Wikipedia, Google Research Paper

Comments

  1. PURUSHOTTAM says:

    Thanks for sharing well articulated map-reduce function. it was easy to understand.

  2. Ali Muttaleb says:

    It’s was simple and nice test as I did in my work also. to reduce the features

  3. Vatsal Mathur says:

    I need help in completing my assignment of map only algorithm

  4. Bhumika says:

    This article helped a lot to understand map reduce

  5. evelyn says:

    great article..it was of help.thnx

  6. mir says:

    In Hadoop, MapReduce is a calculation that decomposes large manipulation jobs into individual tasks that can be executed in parallel cross a cluster of servers. The results of tasks can be joined together to compute final results.

  7. marina says:

    Perfect, thanks a lot

  8. ARUNA RAJENDRAN says:

    EXCELLENT AND VERY USEFUL

    1. Djelel says:

      hello,

      do you have a pseudo algorithme for shuffle step in mapreduce function for an example ??

      thanks

  9. Megha Gupta says:

    I have a question in mind that instead of following such a long process, word count can be done easily by taking buckets of each word and placing the words in its own bucket. when all the words have been placed in the bucket, apply counter (counter will count the number of words in the bucket). Our question (count the no of words) gets solved by using buckets also which is more easy and saves time, Please reply for the same.

    1. bunny says:

      the process you mentioned will be used for small sets of data,,,,but with large data sets point of view buckets may not be advisable

  10. Shashi says:

    Excellent !

  11. Kotesh says:

    very very useful article….

  12. Kavita says:

    Its an excellent tutorial Thank You so much

  13. krishna N says:

    very good example.

  14. Sarvar says:

    Good article with nice explanation. It seems there’s small mistake. I think merging step will be in reduce process just after sorting step.

    1. Sarvar says:

      sorry, my mistake

  15. Sharan says:

    Very well written and displayed. Very helpful for the beginners to understand the basic concepts.

  16. Hadis says:

    Excellent
    very very good
    thank you very much

  17. Suresh R says:

    Very nice to start leanring Hadoop

  18. charmy shah says:

    Hello Pankaj,

    I am learning hadoop ritenow, trying to do basic wordcount example on eclipse but I am getting error,

    Exception in thread “main” java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
    at org.apache.hadoop.conf.Configuration.(Configuration.java:173)
    at WordCount.main

    can you please help how to solve that, I tried everything I can but I cannot solve the problem

    Charmy

  19. Gunjan Bhadra says:

    nice one

  20. Ramchandra Nichite says:

    This artical is very easy for understanding. I want some mathematical solved example .
    plz help me..

  21. liam says:

    Excellent!This is the best article for introduce Mapredure Algorithm I have ever read.

  22. sachin says:

    It is good but I need some algorithm or program to understand I mean I get your use-case or example. But it takes lot of time to implement this .So, I need ur hrlp…

Leave a Reply

Your email address will not be published. Required fields are marked *

close
Generic selectors
Exact matches only
Search in title
Search in content
Search in posts
Search in pages