Before reading this post, please go through my previous post at “Hadoop 1.x: Architecture and How it Works” to get basic knowledge about Hadoop.
In this post, we are going to discuss the Apache Hadoop 2.x architecture and how its components work in detail.
Post’s Brief Table of Contents
- Hadoop 2.x Architecture
- Hadoop 2.x Major Components
- How Hadoop 2.x Major Components Work
Hadoop 2.x Architecture
Apache Hadoop 2.x and later versions use the following architecture. This is the Hadoop 2.x high-level architecture. We will discuss the detailed low-level architecture in the coming sections.
- Hadoop Common Module is the Hadoop base API (a JAR file) for all Hadoop components. All other components work on top of this module.
- HDFS stands for Hadoop Distributed File System. It is also known as HDFS V2 because it is part of Hadoop 2.x with some enhanced features. It is used as the distributed storage system in the Hadoop architecture.
- YARN stands for Yet Another Resource Negotiator. It is a new component in the Hadoop 2.x architecture. It is also known as “MR V2”.
- MapReduce is a batch processing, or distributed data processing, module. It is also known as “MR V1” because it comes from Hadoop 1.x, with some updated features.
- All remaining Hadoop ecosystem components work on top of these three major components: HDFS, YARN and MapReduce. We will discuss the Hadoop ecosystem components in detail in my coming posts.
Compared to Hadoop 1.x, the Hadoop 2.x architecture is designed completely differently. It adds one new component, YARN, and also updates the responsibilities of the HDFS and MapReduce components.
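To make the "batch processing" idea behind MapReduce a bit more concrete, here is a minimal in-memory sketch of the map, shuffle and reduce steps using the classic word count. It is plain Python, not the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this illustration, and nothing here is distributed across nodes.

```python
from collections import defaultdict


def map_phase(line):
    # Emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Group all values by key, which is what happens between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Sum the counts collected for one word.
    return key, sum(values)


def word_count(lines):
    # Run map over every line, then shuffle, then reduce each group.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "YARN schedules Hadoop jobs"]))
```

In a real cluster the map and reduce steps run as many parallel tasks on different nodes; the sketch only shows the data flow between the steps.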
Hadoop 2.x Major Components
Hadoop 2.x has the following three major components:
- HDFS
- YARN
- MapReduce
These three are also known as the three pillars of Hadoop 2. The key change here is YARN. It is a real game-changing component in the Big Data Hadoop system.
How Hadoop 2.x Major Components Work
Hadoop 2.x components follow this architecture to interact with each other and to work in parallel in a reliable, highly available and fault-tolerant manner.
Hadoop 2.x Components High-Level Architecture
- All Master Nodes and Slave Nodes contain both MapReduce and HDFS components.
- One Master Node has two components:
- Resource Manager (YARN or MapReduce v2)
- HDFS
Its HDFS component is also known as the NameNode. The NameNode is used to store metadata.
- Node Manager
- Application Master
- Data Node
- Node Manager
Its HDFS component is also known as the Data Node. The Data Node stores our application's actual Big Data. These nodes do not contain the Application Master component.
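The split between the NameNode (metadata only) and the Data Nodes (actual data) can be sketched with a small toy model. This is only an illustration under stated assumptions, not how HDFS is implemented: the class and function names, the tiny block size and the round-robin placement are all invented for the example, and replication is left out entirely.

```python
class DataNode:
    """Toy slave-side storage: holds the actual block contents."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes


class NameNode:
    """Toy master-side metadata: records where blocks live, holds no data."""

    def __init__(self, node_names):
        self.node_names = list(node_names)
        self.metadata = {}  # file name -> list of (block_id, data node name)

    def allocate(self, filename, block_count):
        # Round-robin placement is an assumption made for this sketch.
        locations = [
            (f"{filename}#{i}", self.node_names[i % len(self.node_names)])
            for i in range(block_count)
        ]
        self.metadata[filename] = locations
        return locations

    def locate(self, filename):
        return self.metadata[filename]


def put(name_node, data_nodes, filename, data, block_size=4):
    # Client side: split the data, ask the NameNode where each block goes,
    # then store each block on the chosen Data Node.
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    locations = name_node.allocate(filename, len(chunks))
    for (block_id, node_name), chunk in zip(locations, chunks):
        data_nodes[node_name].blocks[block_id] = chunk


def get(name_node, data_nodes, filename):
    # Client side: look up block locations, then read blocks in order.
    return b"".join(
        data_nodes[node_name].blocks[block_id]
        for block_id, node_name in name_node.locate(filename)
    )


if __name__ == "__main__":
    nodes = {name: DataNode(name) for name in ("dn1", "dn2")}
    name_node = NameNode(nodes)
    put(name_node, nodes, "f.txt", b"hello world")
    print(name_node.locate("f.txt"))
    print(get(name_node, nodes, "f.txt"))
```

Note that the `NameNode` object never touches the file bytes; it only remembers which Data Node holds which block.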
Hadoop 2.x Components In-detail Architecture
Hadoop 2.x Architecture Description
- Resource Manager is a Per-Cluster Level Component.
- The Resource Manager is further divided into two components:
- Scheduler
- Application Manager
- The Scheduler is responsible for allocating the required resources to applications (that is, to each per-application Application Master).
- It does only scheduling.
- It does not care about monitoring or tracking of those applications.
- The Application Master manages the life cycle of its assigned application.
- It interacts with both the Resource Manager’s Scheduler and the Node Manager.
- It interacts with the Scheduler to acquire the required resources.
- It interacts with the Node Manager to execute assigned tasks and monitor the status of those tasks.
- The Node Manager manages the life cycle of each Container.
- It monitors each Container’s resource utilization.
- The Resource Manager is a per-cluster component, whereas the Application Master is a per-application component.
- Both Hadoop 1.x and Hadoop 2.x Architectures follow Master-Slave Architecture Model.
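The interaction described above (an Application Master asking the Scheduler for resources, then using a Node Manager to run work in a Container) can be sketched as a small simulation. This is a toy model with assumed names and a first-fit, memory-only allocation policy chosen just for illustration; it is not the YARN API, and real YARN tracks resources through messages between the components rather than direct method calls.

```python
class NodeManager:
    """Per-node agent: launches containers and tracks their resource usage."""

    def __init__(self, name, memory_mb):
        self.name = name
        self.free_mb = memory_mb
        self.containers = {}  # container_id -> memory_mb in use

    def launch(self, container_id, memory_mb):
        self.free_mb -= memory_mb
        self.containers[container_id] = memory_mb

    def release(self, container_id):
        self.free_mb += self.containers.pop(container_id)


class Scheduler:
    """Resource Manager part that only hands out resources; no monitoring."""

    def __init__(self, node_managers):
        self.node_managers = node_managers
        self.next_id = 0

    def allocate(self, memory_mb):
        # First-fit over nodes is an assumption made for this sketch.
        for node_manager in self.node_managers:
            if node_manager.free_mb >= memory_mb:
                self.next_id += 1
                return f"container_{self.next_id}", node_manager
        return None  # no node has enough free memory right now


class ApplicationMaster:
    """Per-application component: acquires containers and tracks its tasks."""

    def __init__(self, scheduler):
        self.scheduler = scheduler
        self.status = {}   # task -> "PENDING", "RUNNING" or "DONE"
        self.running = {}  # task -> (container_id, node_manager)

    def run_task(self, task, memory_mb):
        grant = self.scheduler.allocate(memory_mb)
        if grant is None:
            self.status[task] = "PENDING"
            return False
        container_id, node_manager = grant
        node_manager.launch(container_id, memory_mb)
        self.running[task] = (container_id, node_manager)
        self.status[task] = "RUNNING"
        return True

    def finish_task(self, task):
        container_id, node_manager = self.running.pop(task)
        node_manager.release(container_id)
        self.status[task] = "DONE"


if __name__ == "__main__":
    nodes = [NodeManager("nm1", 1024), NodeManager("nm2", 512)]
    app_master = ApplicationMaster(Scheduler(nodes))
    app_master.run_task("map-0", 768)
    app_master.run_task("map-1", 512)
    app_master.run_task("reduce-0", 512)  # stays pending until memory frees up
    app_master.finish_task("map-0")
    app_master.run_task("reduce-0", 512)
    print(app_master.status)
```

The point of the split is visible even in this toy: the `Scheduler` never looks at task status, while the `ApplicationMaster` owns the status of its own tasks.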
Both the Hadoop 1.x and 2.x architecture posts (my previous post and this post) are still in progress, but you can read them to get some idea. I’m going to investigate the Hadoop 2 architecture in detail and will update the images and descriptions accordingly on Monday.
That’s all about the Hadoop 2.x architecture and how its major components work. We now have a clear picture of both the Hadoop 1.x and Hadoop 2.x systems.
It’s time to compare Hadoop 1.x and Hadoop 2.x to find out the major drawbacks of Hadoop 1.x, the major benefits of Hadoop 2.x, and why the complete architecture was redesigned. Please read my next post for this information.
Please drop me a comment if you like my post or have any issues/suggestions.