Hadoop Architecture – YARN, HDFS and MapReduce

Filed Under: Big Data

Before reading this post, please go through my previous post, “Hadoop 1.x: Architecture and How it Works”, to get a basic understanding of Hadoop.

Hadoop Architecture

In this post, we are going to discuss the Apache Hadoop 2.x architecture and how its components work in detail.

Post’s Brief Table of Contents

  • Hadoop 2.x Architecture
  • Hadoop 2.x Major Components
  • How Hadoop 2.x Major Components Work

Hadoop 2.x Architecture

Apache Hadoop 2.x and later versions use the following architecture. This is the Hadoop 2.x high-level architecture; we will discuss the detailed low-level architecture in the coming sections.

[Figure: Hadoop 2.x high-level architecture]

  • The Hadoop Common module is the Hadoop base API (a JAR file) for all Hadoop components. All other components work on top of this module.
  • HDFS stands for Hadoop Distributed File System. It is also known as HDFS V2, as it is part of Hadoop 2.x with some enhanced features. It is used as the distributed storage system in the Hadoop architecture.
  • YARN stands for Yet Another Resource Negotiator. It is a new component in the Hadoop 2.x architecture. It is also known as “MR V2”.
  • MapReduce is a batch processing, or distributed data processing, module. It is carried over from Hadoop 1.x (where it is known as “MR V1”) with some updated features; in Hadoop 2.x it runs on top of YARN.
  • All remaining Hadoop ecosystem components work on top of these three major components: HDFS, YARN and MapReduce. We will discuss all Hadoop ecosystem components in detail in my coming posts.
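
To make the MapReduce idea a bit more concrete, here is a minimal single-process sketch in Python. It is not Hadoop code and uses no Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration. It only shows the map, group-by-key, and reduce steps that a real MapReduce job distributes across many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

For example, `word_count(["big data big cluster", "data node"])` counts "big" and "data" twice each, and "cluster" and "node" once each.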

When compared to Hadoop 1.x, the Hadoop 2.x architecture is designed completely differently. It adds one new component, YARN, and also updates the responsibilities of the HDFS and MapReduce components.

Hadoop 2.x Major Components

Hadoop 2.x has the following three Major Components:

  • HDFS
  • YARN
  • MapReduce

These three are also known as the Three Pillars of Hadoop 2. The key change here is YARN. It is a real game-changing component in the Big Data Hadoop system.

How Hadoop 2.x Major Components Work

Hadoop 2.x components follow this architecture to interact with each other and to work in parallel in a reliable, highly available and fault-tolerant manner.

Hadoop 2.x Components High-Level Architecture

[Figure: Hadoop 2.x components high-level architecture diagram]

  • All Master Nodes and Slave Nodes contain both MapReduce and HDFS components.
  • One Master Node has two components:
    1. Resource Manager (YARN or MapReduce v2)
    2. HDFS

    Its HDFS component is also known as the NameNode. The NameNode is used to store metadata.

  • In Hadoop 2.x, some more nodes act as Master Nodes, as shown in the above diagram. Each of these 2nd-level Master Nodes has 3 components:
    1. Node Manager
    2. Application Master
    3. Data Node
  • Each of these 2nd-level Master Nodes in turn has one or more Slave Nodes, as shown in the above diagram.
  • These Slave Nodes have two components:
    1. Node Manager
    2. HDFS

    Their HDFS component is also known as the Data Node. The Data Node is used to store our application’s actual Big Data. These nodes do not contain the Application Master component.

Hadoop 2.x Components In-detail Architecture

[Figure: Hadoop 2.x components in-detail architecture diagram]

Hadoop 2.x Architecture Description

    Resource Manager:

  • Resource Manager is a Per-Cluster Level Component.
  • Resource Manager is further divided into two components:
    1. Scheduler
    2. Application Manager
  • The Resource Manager’s Scheduler:
    1. Is responsible for allocating the required resources to applications (that is, to each per-application Application Master).
    2. Does only scheduling.
    3. Does not care about monitoring or tracking the status of those applications.
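
The "scheduling only" role can be sketched with a toy model in Python. This is an illustration under simplifying assumptions, not the YARN scheduler: the `Node` and `ToyScheduler` classes are hypothetical, memory is the only resource, and the first node with enough free memory wins. Note that the scheduler keeps no record of what happens to an application after the allocation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A cluster node with some free memory (hypothetical model)."""
    name: str
    free_memory_mb: int


@dataclass
class ToyScheduler:
    """Hands out memory on nodes; keeps no record of application status."""
    nodes: list = field(default_factory=list)

    def allocate(self, app_id, memory_mb):
        """Return the name of the first node that fits the request, else None."""
        for node in self.nodes:
            if node.free_memory_mb >= memory_mb:
                node.free_memory_mb -= memory_mb  # reserve the memory on that node
                return node.name
        return None  # no node can satisfy the request right now
```

With two nodes of 1024 MB and 4096 MB, a 2048 MB request is placed on the larger node; what the application then does with that memory is outside the scheduler's concern.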

    Application Master:

  • Application Master is a per-application level component. It is responsible for:
    1. Managing the life cycle of its assigned application.
    2. Interacting with both the Resource Manager’s Scheduler and the Node Manager.
    3. Interacting with the Scheduler to acquire the required resources.
    4. Interacting with the Node Manager to execute assigned tasks and monitor the status of those tasks.
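
The interaction pattern of the Application Master can be sketched as follows. This is a toy Python model, not the YARN API: `StubScheduler`, `StubNodeManager` and `ToyApplicationMaster` are hypothetical names, each node offers a fixed number of free containers, and every launched task is assumed to succeed immediately.

```python
class StubScheduler:
    """Stand-in for the Resource Manager's Scheduler (hypothetical)."""

    def __init__(self, free_containers):
        self.free_containers = dict(free_containers)  # node name -> free containers

    def request_container(self):
        """Return a node name with a free container, or None if the cluster is full."""
        for node, free in self.free_containers.items():
            if free > 0:
                self.free_containers[node] -= 1
                return node
        return None


class StubNodeManager:
    """Stand-in for a Node Manager that runs tasks and reports their status."""

    def __init__(self):
        self.status = {}

    def launch(self, task):
        self.status[task] = "SUCCEEDED"  # toy assumption: tasks finish at once

    def task_status(self, task):
        return self.status.get(task, "UNKNOWN")


class ToyApplicationMaster:
    """Drives one application: acquires resources, runs tasks, tracks status."""

    def __init__(self, scheduler, node_managers):
        self.scheduler = scheduler
        self.node_managers = node_managers  # node name -> StubNodeManager

    def run(self, tasks):
        report = {}
        for task in tasks:
            node = self.scheduler.request_container()  # acquire resources from the Scheduler
            if node is None:
                report[task] = "PENDING"  # nothing free; a real system would retry later
                continue
            node_manager = self.node_managers[node]
            node_manager.launch(task)                      # execute the task via the Node Manager
            report[task] = node_manager.task_status(task)  # monitor the task's status
        return report
```

Running three tasks on a cluster with only two free containers leaves the third task in the "PENDING" state in this sketch.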

    Node Manager:

  • Node Manager is a Per-Node Level component.
  • It is responsible for:
    1. Managing the life cycle of each Container.
    2. Monitoring each Container’s resource utilization.
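
The two Node Manager responsibilities can be illustrated with a small Python sketch. It is a toy model rather than YARN code: the `ToyNodeManager` class, its state names ("RUNNING", "COMPLETE") and the memory-only accounting are assumptions made for this example.

```python
class ToyNodeManager:
    """Tracks container life cycle and memory use on one node (hypothetical model)."""

    def __init__(self, total_memory_mb):
        self.total_memory_mb = total_memory_mb
        self.containers = {}  # container id -> {"state": ..., "memory_mb": ...}

    def start_container(self, container_id, memory_mb):
        """Life cycle: start a container if the node still has enough free memory."""
        if self.used_memory_mb() + memory_mb > self.total_memory_mb:
            raise ValueError("not enough free memory on this node")
        self.containers[container_id] = {"state": "RUNNING", "memory_mb": memory_mb}

    def stop_container(self, container_id):
        """Life cycle: mark a container as finished so its memory is released."""
        self.containers[container_id]["state"] = "COMPLETE"

    def used_memory_mb(self):
        """Monitoring: memory currently held by running containers."""
        return sum(
            c["memory_mb"] for c in self.containers.values() if c["state"] == "RUNNING"
        )

    def utilization(self):
        """Monitoring: fraction of the node's memory in use."""
        return self.used_memory_mb() / self.total_memory_mb
```

On a 4096 MB node, starting containers of 1024 MB and 2048 MB gives a utilization of 0.75; stopping the first one releases its memory again.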

    Container:

  • Each Master Node or Slave Node contains a set of Containers. In this diagram, the main node’s Name Node does not show the Containers. However, it also contains a set of Containers.
  • A Container is a portion of a node’s resources, such as memory and CPU, that is allocated to an application.
  • In Hadoop 2.x, a Container is similar to a slot in Hadoop 1.x. We will see the major differences between these two components, Slots vs Containers, in my coming posts.
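
As a rough illustration, the sketch below models a Container as a bundle of memory and virtual cores on one node, and grants requests of different sizes while the node still has room. It is a toy Python model, not YARN code: the `Container` class, the `pack` function and the request tuples are hypothetical, and real schedulers apply queue and fairness policies that are ignored here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Container:
    """A slice of one node's resources granted to an application (toy model)."""
    node: str
    memory_mb: int
    vcores: int


def pack(node, capacity_mb, capacity_vcores, requests):
    """Grant (memory_mb, vcores) requests in order while the node has room."""
    granted = []
    for memory_mb, vcores in requests:
        if memory_mb <= capacity_mb and vcores <= capacity_vcores:
            granted.append(Container(node, memory_mb, vcores))
            capacity_mb -= memory_mb      # shrink what is left on the node
            capacity_vcores -= vcores
    return granted
```

For example, on a node with 8192 MB and 4 virtual cores, the requests `[(4096, 2), (2048, 1), (4096, 2), (1024, 1)]` yield three Containers of different sizes; the second 4096 MB request is skipped because only 2048 MB is left at that point.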

NOTE:-

  • Resource Manager is a per-cluster component, whereas Application Master is a per-application component.
  • Both the Hadoop 1.x and Hadoop 2.x architectures follow the Master-Slave architecture model.

NOTE:-
Both the Hadoop 1.x and 2.x architecture posts (my previous post and this post) are still in progress, but you can read them once to get some idea. I’m going to investigate the Hadoop 2 architecture in detail and will update the images and descriptions accordingly on Monday.

That’s all about the Hadoop 2.x architecture and how its major components work. Now we have a clear picture of both the Hadoop 1.x and Hadoop 2.x systems.

It’s time to compare Hadoop 1.x and Hadoop 2.x to find out the major drawbacks of Hadoop 1.x, the major benefits of Hadoop 2.x, and why the complete architecture was redesigned. Please read my next post for this useful information.

Please drop me a comment if you like my post or have any issues/suggestions.

Comments

  1. Anuj says:

    Here are the responsibilities of Application Manager:

    The ApplicationsManager is responsible for maintaining a collection of submitted applications. After application submission, it first validates the application’s specifications and rejects any application that requests unsatisfiable resources for its ApplicationMaster (i.e., there is no node in the cluster that has enough resources to run the ApplicationMaster itself). It then ensures that no other application was already submitted with the same application ID—a scenario that can be caused by an erroneous or a malicious client. Finally, it forwards the admitted application to the scheduler. This component is also responsible for recording and managing finished applications for a while before they are completely evacuated from the ResourceManager’s memory. When an application finishes, it places an ApplicationSummary in the daemon’s log file. Finally, the ApplicationsManager keeps a cache of completed applications long after applications finish to support users’ requests for application data (via web UI or command line). The configuration property yarn.resourcemanager.max-completed-applications controls the maximum number of such finished applications that the ResourceManager remembers at any point of time. The cache is a first-in, first-out list, with the oldest applications being moved out to accommodate freshly finished applications.

    https://stackoverflow.com/a/30967803

  2. Dilip says:

    Node Manager and Data Node components are Slave components. How come they are in Master Nodes??

    1. Shikhar Nigam says:

      I also have the same question, seems like there is some mistake in the shown diagram

  3. Rohit Singh says:

    Nice post RamBabu

  4. krish says:

    potential typo : Scheduler does not manage status the application

  5. Venkat says:

    Hi Rambabu,
    It was good explanation,but I need two more explanations required here.
    1)Responsibility of Application Manager
    2)Why 2nd level Master Nodes came into picture as we have Master Node in 1st level,what is major differnece between first level and second level Master Nodes.

    Thanks
    Venkat

  6. Arul says:

    Thank you very much for your great effort for collecting all document and publicising as nice article.

  7. Hareesh says:

    Hi Rambabu,

    Really Great work. thanks for your effort.

    a small typo please correct it. in below headings, it should be 2x.

    Hadoop 1.x Components In-detail Architecture
    Hadoop 1.x Architecture Description

    1. Rambabu says:

      Thank you. Updated those typo errors.

      Ram

  8. RaghuNathChowdary.Kolla says:

    HI
    The explanation is good

  9. Raghu says:

    Hi

    I liked your explanations. I am waiting for 2.x detailed architecture. Once done can you please send me on raghuram1656@gmail. com
