Differences between Hadoop 1.x and Hadoop 2.x, Hadoop 1.x Limitations and Hadoop 2.x YARN Benefits

Before reading this post, please go through my previous posts to get some basic knowledge of BigData Hadoop 1.x and 2.x.

  1. BigData Hadoop 1.x Architecture and Components
  2. BigData Hadoop 2.x Architecture and Components

In this post, we are going to discuss the differences between Hadoop 1.x and Hadoop 2.x, the drawbacks or limitations of the Hadoop 1.x architecture, and how the Hadoop 2.x architecture solves those limitations.

At the time of writing, the latest Apache Hadoop version is 2.7.0.

Hadoop V.1.x Components

Apache Hadoop V.1.x has the following two major components:

  1. HDFS (HDFS V1)
  2. MapReduce (MR V1)

In Hadoop V.1.x, these two are also known as the Two Pillars of Hadoop.

[Figure: Hadoop 1.x components]

Hadoop V.2.x Components

Apache Hadoop V.2.x has the following three major components:

  1. HDFS V.2
  2. YARN (MR V2)
  3. MapReduce (same programming model as MR V1, now running on YARN)

In Hadoop V.2.x, these three are also known as the Three Pillars of Hadoop.

[Figure: Hadoop 2.x components]

Hadoop 1.x Limitations

Hadoop 1.x has many limitations or drawbacks. The main drawback is that the MapReduce component is built into its architecture, so it supports only MapReduce-based batch processing applications.

Hadoop 1.x has the following limitations/drawbacks:

  • It is suitable only for batch processing of huge amounts of data that are already in the Hadoop system.
  • It is not suitable for real-time data processing.
  • It is not suitable for data streaming.
  • It supports up to 4,000 nodes per cluster.
  • It has a single component, the JobTracker, which performs many activities: resource management, job scheduling, job monitoring, re-scheduling jobs, etc.
  • The JobTracker is a single point of failure.
  • It does not support multi-tenancy.
  • It supports only one NameNode and one namespace per cluster.
  • It has limited horizontal scalability: the single NameNode and the single JobTracker cap how far a cluster can grow.
  • It runs only Map/Reduce jobs.
  • It follows the slots concept in MapReduce (not in HDFS) to allocate resources (memory, CPU). It has static Map and Reduce slots. That means a slot configured for Map tasks cannot run Reduce tasks and vice versa, even when slots of the other type are idle.
  • For example: suppose a cluster is configured with 10 Map slots and 10 Reduce slots. All Map slots are busy, but all Reduce slots are idle. We cannot use these idle Reduce slots to run the Map tasks that are still waiting.
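
The static-slot limitation can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not Hadoop code: the function name `busy_slots` and the workload of 25 waiting map tasks are hypothetical, chosen to match the 10 + 10 slot example.

```python
def busy_slots(map_slots, reduce_slots, map_tasks, reduce_tasks):
    """Count occupied slots when each slot type only accepts its own task type."""
    running_maps = min(map_slots, map_tasks)
    running_reduces = min(reduce_slots, reduce_tasks)
    return running_maps + running_reduces


# Hypothetical workload: 25 map tasks are waiting, no reduce task is ready yet.
total_slots = 10 + 10
busy = busy_slots(map_slots=10, reduce_slots=10, map_tasks=25, reduce_tasks=0)
idle = total_slots - busy
waiting_maps = 25 - min(10, 25)
print(f"busy={busy} idle={idle} map tasks still waiting={waiting_maps}")
```

In this model half of the slots stay idle while map tasks queue up, which is the under-utilization described in the last two bullets.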

NOTE:- In summary, a Hadoop 1.x system is a single-purpose system. We can use it only for MapReduce-based applications.

Differences between Hadoop 1.x and Hadoop 2.x

If we compare the components of Hadoop 1.x and 2.x, the Hadoop 2.x architecture has one new, extra component: YARN (Yet Another Resource Negotiator).

It is the game-changing component of the BigData Hadoop system.

  • New Components and API: As shown in the diagram below, Hadoop 1.x has been re-architected and a new component introduced to solve the Hadoop 1.x limitations.

    [Figure: Hadoop 1.x vs. Hadoop 2.x]

  • Hadoop 1.x Job Tracker: As shown in the diagram below, the Hadoop 1.x JobTracker component is divided into two components:

    1. Resource Manager: manages the resources of the cluster.
    2. Application Master: manages a single application (a MapReduce job, a Spark application, etc.); each application gets its own Application Master.

    [Figure: Hadoop 1.x JobTracker split in Hadoop 2.x]

  • Hadoop 1.x supports only one namespace for managing the HDFS filesystem, whereas Hadoop 2.x supports multiple namespaces.
  • Hadoop 1.x supports one and only one programming model: MapReduce. With the YARN component, Hadoop 2.x supports multiple programming models and engines, such as MapReduce, interactive, streaming, graph, Spark, Storm, etc.
  • Hadoop 1.x has a lot of limitations in scalability. Hadoop 2.x has overcome them with its new architecture.
  • Hadoop 2.x supports multi-tenancy, but Hadoop 1.x does not.
  • Hadoop 1.x MapReduce uses a fixed slots mechanism to allocate resources to tasks (slots are not an HDFS storage feature), whereas Hadoop 2.x YARN uses variable-sized containers.
  • Hadoop 1.x supports a maximum of 4,000 nodes per cluster, whereas Hadoop 2.x supports more than 10,000 nodes per cluster.
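
To make the slots-versus-containers point more concrete, here is a rough sketch of variable-sized allocation on one node. It is a simplified illustration, not YARN's actual scheduler: the function name `allocate_containers`, the 8192 MB node size and the request sizes are all assumptions made for this example.

```python
def allocate_containers(node_memory_mb, requests_mb):
    """Grant requests in arrival order while the node still has enough free memory.

    Returns the list of granted container sizes and the memory left over.
    """
    granted = []
    free = node_memory_mb
    for request in requests_mb:
        if request <= free:
            granted.append(request)
            free -= request
    return granted, free


# Hypothetical node with 8 GB and four requests of different sizes.
granted, free = allocate_containers(8192, [1024, 2048, 4096, 2048])
print(f"granted={granted} free_mb={free}")
```

Unlike fixed Map and Reduce slots, each request here takes only the memory it asks for, regardless of what kind of task will run inside the container.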

How Hadoop 2.x solves Hadoop 1.x Limitations

Hadoop 2.x has resolved most of the Hadoop 1.x limitations by using a new architecture:

  • By decoupling the responsibilities of the MapReduce component (resource management and job processing) into different components.
  • By introducing the new YARN component for resource management.
  • By decoupling these responsibilities, it supports multiple namespaces, multi-tenancy, higher availability and higher scalability.
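
The decoupling described above can be pictured with two toy classes. This is a sketch under stated assumptions and not the YARN API: the class and method names (`ResourceManager.request_container`, `ApplicationMaster.launch_task`, and so on) are hypothetical, and only memory is tracked.

```python
class ResourceManager:
    """Toy stand-in for a cluster-wide resource manager: tracks free memory only."""

    def __init__(self, total_memory_mb):
        self.free_mb = total_memory_mb

    def request_container(self, memory_mb):
        # Grant the container only if the cluster still has enough memory.
        if memory_mb > self.free_mb:
            return False
        self.free_mb -= memory_mb
        return True

    def release_container(self, memory_mb):
        self.free_mb += memory_mb


class ApplicationMaster:
    """Toy per-application master: asks for containers and tracks its own tasks."""

    def __init__(self, name, resource_manager):
        self.name = name
        self.rm = resource_manager
        self.running = []  # memory sizes of the containers this application holds

    def launch_task(self, memory_mb):
        if self.rm.request_container(memory_mb):
            self.running.append(memory_mb)
            return True
        return False

    def finish_all(self):
        for memory_mb in self.running:
            self.rm.release_container(memory_mb)
        self.running.clear()


# Two different kinds of applications share one hypothetical 4 GB cluster.
rm = ResourceManager(4096)
mr_app = ApplicationMaster("mapreduce-job", rm)
spark_app = ApplicationMaster("spark-job", rm)

first = mr_app.launch_task(2048)       # granted
second = spark_app.launch_task(1024)   # granted
third = spark_app.launch_task(2048)    # refused: only 1024 MB is free
mr_app.finish_all()                    # the MapReduce job returns its memory
retry = spark_app.launch_task(2048)    # granted now
print(first, second, third, retry, rm.free_mb)
```

The resource manager in this sketch knows nothing about MapReduce or Spark; it only hands out capacity, which is what lets different kinds of applications share one cluster.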

Hadoop 2.x YARN Benefits

Hadoop 2.x YARN has the following benefits.

  • High Scalability
  • High Availability
  • Supports Multiple Programming Models
  • Supports Multi-Tenancy
  • Supports Multiple Namespaces (through HDFS Federation in HDFS V.2)
  • Improved Cluster Utilization
  • Supports Horizontal Scalability

That’s all about the differences between Hadoop 1.x and Hadoop 2.x. We will discuss some more BigData and Hadoop basics in my coming posts.

Please drop me a comment if you like my post or have any issues/suggestions.

Comments

  1. ravi says:

    can any one let me know the command to start jobhistory server in hadoop 2.x?

    1. Rambabu says:

      Are you talking about “Starting MapReduce JobHistory Server”?

    2. Rambabu says:

      We use this command (run it from the sbin directory of the Hadoop 2.x installation):
      $ ./mr-jobhistory-daemon.sh start historyserver

  2. Aravind says:

    Hi,
    Can you let me know how the below operations happen in Hadoop 2.x?

    1. Computation
    2. Reading Data

    Also, could you let me know how the drawbacks of Hadoop 1.x, like scalability, single point of failure, multiple namespaces etc., are solved?

  3. veeru says:

    Brother, I can see only 4 posts no matter how much I search. I need the next posts also. I recently joined a Hadoop course. Could you please post the next posts as well (https://www.journaldev.com/8806/differences-between-hadoop1-and-hadoop2)? If you have already posted them, how can I view them? I saw on LinkedIn that you studied at Andhra University, which is why I am making this request in Telugu.

    1. Rambabu says:

      Hi Veeru, thanks for reading my posts.
      As I am working on other tutorials, I don’t have enough time to deliver the next Hadoop posts. I plan to deliver complete BigData Hadoop Ecosystem tutorials soon.
      I will start on the pending Hadoop posts soon.
      Many thanks,
      Ram
