Before reading this post, please go through my previous posts to get some basic knowledge about Big Data and Hadoop 1.x and 2.x.
In this post, we are going to discuss the differences between Hadoop 1.x and Hadoop 2.x, the drawbacks or limitations of the Hadoop 1.x architecture, and how the Hadoop 2.x architecture solves those limitations, in detail.
At the time of writing, the latest Apache Hadoop version is 2.7.0.
Hadoop V.1.x Components
Apache Hadoop V.1.x has the following two major components:
- HDFS (HDFS V1)
- MapReduce (MR V1)
In Hadoop V.1.x, these two are also known as the Two Pillars of Hadoop.
Hadoop V.2.x Components
Apache Hadoop V.2.x has the following three major components:
- HDFS V.2
- YARN (MR V2)
- MapReduce (MR V1)
In Hadoop V.2.x, these three are also known as the Three Pillars of Hadoop.
Hadoop 1.x Limitations
Hadoop 1.x has many limitations or drawbacks. The main drawback is the tightly coupled MapReduce component in its architecture, which means it supports only MapReduce-based batch data processing applications.
Hadoop 1.x has the following limitations/drawbacks:
- It is only suitable for batch processing of huge amounts of data that are already in the Hadoop system.
- It is not suitable for real-time data processing.
- It is not suitable for data streaming.
- It supports up to 4,000 nodes per cluster.
- It has a single component, the JobTracker, to perform many activities such as resource management, job scheduling, job monitoring, job re-scheduling, etc.
- The JobTracker is a single point of failure.
- It does not support multi-tenancy.
- It supports only one NameNode and one namespace per cluster.
- It does not support horizontal scalability (the NameNode cannot be scaled out).
- It runs only Map/Reduce jobs.
- It follows a slots concept in MapReduce to allocate resources (memory, CPU). It has static Map and Reduce slots: once resources are assigned to one type of slot, they cannot be reused by the other type, even when some slots are idle.
For example: suppose 10 Map and 10 Reduce tasks are running with 10 + 10 slots to perform a computation. All Map slots are busy doing their tasks, but all Reduce slots are idle. We cannot use these idle Reduce slots for any other purpose.
NOTE: In summary, the Hadoop 1.x system is a single-purpose system; we can use it only for MapReduce-based applications.
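The static-slot limitation above can be illustrated with a small Python sketch. This is a toy model, not real Hadoop code: the function names and numbers are hypothetical, chosen only to contrast static Map/Reduce slots with a YARN-style shared container pool.

```python
# Toy model of resource utilization (hypothetical, for illustration only).
# Hadoop 1.x: slots are statically typed as Map or Reduce slots.
# YARN: generic containers can run any pending task.

def hadoop1_running_tasks(map_tasks, reduce_tasks, map_slots, reduce_slots):
    """Static slots: Map tasks use only Map slots, Reduce tasks only Reduce slots."""
    return min(map_tasks, map_slots) + min(reduce_tasks, reduce_slots)

def yarn_running_tasks(map_tasks, reduce_tasks, containers):
    """Shared containers: any pending task can occupy any free container."""
    return min(map_tasks + reduce_tasks, containers)

# 20 Map tasks waiting, 0 Reduce tasks; 10 + 10 static slots vs 20 containers.
v1 = hadoop1_running_tasks(map_tasks=20, reduce_tasks=0, map_slots=10, reduce_slots=10)
v2 = yarn_running_tasks(map_tasks=20, reduce_tasks=0, containers=20)
print(v1)  # 10 -- the 10 Reduce slots sit idle even though Map tasks are waiting
print(v2)  # 20 -- every container is put to work
```

The same amount of hardware runs twice as many tasks in the YARN-style model, which is exactly the cluster-utilization improvement Hadoop 2.x claims.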
Differences between Hadoop 1.x and Hadoop 2.x
If we observe the components of Hadoop 1.x and 2.x, the Hadoop 2.x architecture has one extra, new component: YARN (Yet Another Resource Negotiator).
It is the game-changing component for the Big Data Hadoop system.
Hadoop 2.x re-architects Hadoop 1.x with new components and APIs to solve the Hadoop 1.x limitations. As shown in the diagram below, the Hadoop 1.x JobTracker component is divided into two components:
- Resource Manager: manages resources in the cluster.
- Application Master: manages applications such as MapReduce, Spark, etc.
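As a concrete sketch of this split, a minimal `yarn-site.xml` tells every NodeManager where the Resource Manager runs; applications then get their own per-job Application Master at runtime. The hostname below is a placeholder, and only two commonly set properties are shown.

```xml
<!-- Minimal yarn-site.xml sketch; "rm-host.example.com" is a placeholder. -->
<configuration>
  <property>
    <!-- Where the cluster-wide Resource Manager runs. -->
    <name>yarn.resourcemanager.hostname</name>
    <value>rm-host.example.com</value>
  </property>
  <property>
    <!-- Auxiliary service so MapReduce jobs can shuffle data between phases. -->
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```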
How Hadoop 2.x solves Hadoop 1.x Limitations
Hadoop 2.x resolves most of the Hadoop 1.x limitations with its new architecture:
- by decoupling the MapReduce component's responsibilities into separate components;
- by introducing the new YARN component for resource management;
- by decoupling component responsibilities, it supports multiple namespaces, multi-tenancy, high availability, and high scalability.
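The multiple-namespace support comes from HDFS Federation in Hadoop 2.x, where several independent NameNodes each own a namespace over a shared pool of DataNodes. A minimal `hdfs-site.xml` sketch is below; the nameservice IDs (`ns1`, `ns2`) and hostnames are placeholders.

```xml
<!-- Minimal hdfs-site.xml sketch for HDFS Federation (two namespaces).
     "ns1"/"ns2" and the hostnames are placeholder values. -->
<configuration>
  <property>
    <!-- Comma-separated list of nameservices (one per namespace). -->
    <name>dfs.nameservices</name>
    <value>ns1,ns2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1</name>
    <value>nn1.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns2</name>
    <value>nn2.example.com:8020</value>
  </property>
</configuration>
```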
Hadoop 2.x YARN Benefits
Hadoop 2.x YARN has the following benefits:
- High scalability
- High availability
- Supports multiple programming models
- Supports multi-tenancy
- Supports multiple namespaces
- Improved cluster utilization
- Supports horizontal scalability
That’s all about the differences between Hadoop 1.x and Hadoop 2.x. We will discuss some more Big Data and Hadoop basics in my coming posts.
Please drop me a comment if you like my post or have any issues/suggestions.