Install Hadoop on Ubuntu

In this lesson, we will see how we can get started with Apache Hadoop by installing it on our Ubuntu machine. Installing and running Apache Hadoop can be tricky and that’s why we’ll try to keep this lesson as simple and informative as possible.

In this installation guide, we will use a machine running Ubuntu 17.10 (GNU/Linux 4.13.0-37-generic x86_64):

Ubuntu Version

Also, if you just want to explore Hadoop quickly, read CloudEra Hadoop VMWare Single Node Environment Setup.

Prerequisite for Installing Hadoop on Ubuntu

Before we can start installing Hadoop, we need to update Ubuntu with the latest software patches available:

sudo apt-get update && sudo apt-get -y dist-upgrade

Next, we need to install Java on the machine, as Java is the main prerequisite for running Hadoop. Hadoop 3.x, which we install below, requires Java 8 or later; only older Hadoop 2.x releases ran on earlier Java versions. Let’s install Java 8 for this lesson:

sudo apt-get -y install openjdk-8-jdk-headless
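Before moving on, it helps to confirm that the JDK actually landed on the PATH. The snippet below is only a small sketch; require_cmd is a helper name made up for this lesson, not something shipped with Ubuntu or Hadoop.

```shell
# Hypothetical helper: succeeds only if the given command is on the PATH.
require_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if require_cmd java; then
    # java prints its version banner to stderr, so redirect it to stdout
    java -version 2>&1 | head -n 1
else
    echo "java not found; re-run the apt-get install step above"
fi
```

If the first branch prints a version line mentioning 1.8, the Java 8 install worked.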

To install Hadoop, make a directory and move inside it:

mkdir jd-hadoop && cd jd-hadoop

Installing Hadoop on Ubuntu

Now that we’re ready with the basic setup for Hadoop on our Ubuntu machine, let’s download Hadoop installation files so that we can work on its configuration as well:

wget http://mirror.cc.columbia.edu/pub/software/apache/hadoop/common/hadoop-3.0.1/hadoop-3.0.1.tar.gz
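Since the archive comes from a mirror, it may be worth checking its integrity before extracting it. The sketch below assumes a SHA-512 digest is published for the release you downloaded; verify_sha512 is a hypothetical helper, and the digest itself is deliberately not reproduced here because it has to be copied from the official Apache download page.

```shell
# Hypothetical helper: compare a file's SHA-512 digest with an expected value.
# The expected value is normalised to lowercase hex without spaces, because
# published digests are sometimes uppercase and split into groups.
verify_sha512() {
    file="$1"
    expected="$(printf '%s' "$2" | tr 'A-F' 'a-f' | tr -d ' \n')"
    actual="$(sha512sum "$file" | awk '{print $1}')"
    [ "$actual" = "$expected" ]
}

# Usage (the digest below is a placeholder, not a real value):
# verify_sha512 hadoop-3.0.1.tar.gz "<digest copied from apache.org>" \
#     && echo "checksum OK" || echo "checksum MISMATCH"
```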

We’re going to use Hadoop 3.0.1 in this lesson. The latest version is listed on the Apache Hadoop releases page. Once the file is downloaded, run the following command to extract the archive:

tar xvzf hadoop-3.0.1.tar.gz

This might take a few moments as the archive is large. Once the command finishes, Hadoop should be extracted in your current directory:

Hadoop Unarchived

Adding Hadoop user account

We will create a separate Hadoop user on our machine to keep Hadoop’s files isolated from our regular user accounts. First, create a user group:

addgroup hadoop

You should see something like this:

Ubuntu Adding User Group

Now we can add a new user to this group:

useradd -G hadoop jdhadoopuser

Note that I am running all commands as a root user. Now, we have a user called jdhadoopuser in the hadoop group.
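To double-check that the user really ended up in the group, something like the following can be used. This is a minimal sketch; in_group is a hypothetical helper built on the standard id command.

```shell
# Hypothetical helper: succeeds if user $1 belongs to group $2.
in_group() {
    # id -nG prints the user's group names separated by spaces
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF -- "$2"
}

if in_group jdhadoopuser hadoop; then
    echo "jdhadoopuser is in the hadoop group"
else
    echo "jdhadoopuser is missing or not in the hadoop group"
fi
```

Note that useradd without the -m option does not create a home directory for the new user, which matters if you later log in as jdhadoopuser.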

Finally, we’ll give jdhadoopuser sudo privileges. To do this, open the /etc/sudoers file with this command:

sudo visudo

Now, enter this as the last line in the file:

jdhadoopuser ALL=(ALL) ALL

The file should now look like this:

Making root user

Hadoop Single Node Setup: Standalone Mode

In standalone mode on a single node, Hadoop runs as a single Java process. This mode is usually used only for debugging and not for production. With this mode, we can run simple MapReduce programs which process a small amount of data.

Rename the extracted hadoop-3.0.1 directory to just hadoop:

mv /root/jd-hadoop/hadoop-3.0.1 /root/jd-hadoop/hadoop

Now, give ownership of this directory to jdhadoopuser:

chown -R jdhadoopuser:hadoop /root/jd-hadoop/hadoop

A better location for Hadoop is the /usr/local/ directory, so let’s move it there:


mv hadoop /usr/local/
cd /usr/local/

Now, edit the .bashrc file to add Hadoop and Java to path using this command:

vi ~/.bashrc

Add these lines to the end of the .bashrc file:


# Configure Hadoop and Java Home
export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

export PATH=$PATH:$HADOOP_HOME/bin
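The JAVA_HOME value above is specific to the OpenJDK 8 package on amd64. On other setups, the path can be derived from wherever the java binary actually lives. The following is a rough sketch; java_home_from_bin is a made-up helper name, and it assumes the resolved path ends in /bin/java, optionally preceded by /jre as in OpenJDK 8 packages.

```shell
# Hypothetical helper: strip the trailing /bin/java (and a /jre component,
# if present) from a resolved java binary path.
java_home_from_bin() {
    p="${1%/bin/java}"
    p="${p%/jre}"
    printf '%s\n' "$p"
}

# Print a candidate JAVA_HOME for the java currently on the PATH.
if command -v java >/dev/null 2>&1; then
    java_home_from_bin "$(readlink -f "$(command -v java)")"
fi
```

If the printed path differs from the one used in this lesson, use the printed one in both .bashrc and hadoop-env.sh.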

Run source ~/.bashrc (or open a new shell) so that these changes take effect. Next, we need to tell Hadoop itself where Java is installed. We can do this by providing the path in the hadoop-env.sh file. The location of this file can differ between Hadoop versions. To find where it is, run the following command from the directory that contains the hadoop directory:

find hadoop/ -name hadoop-env.sh

In the directory that the command prints, we can see the file we need:

Hadoop Env file


Now, from that directory, edit the file:

vi hadoop-env.sh

On the last line, enter the following and save it:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
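If you would rather not edit the file by hand, the same line can be appended from the shell. The sketch below uses a hypothetical helper, append_once, so that re-running it does not add the line twice; the hadoop-env.sh path in the usage comment assumes the etc/hadoop layout used by Hadoop 3.x and should be replaced by whatever the find command printed.

```shell
# Hypothetical helper: append a line to a file only if that exact line
# is not already present.
append_once() {
    line="$1"
    file="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage (path is an assumption; adjust to your installation):
# append_once 'export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64' \
#     /usr/local/hadoop/etc/hadoop/hadoop-env.sh
```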

Testing Hadoop Installation on Ubuntu

We can now test the Hadoop installation by running a sample application that ships with Hadoop, a word count example JAR. Just execute the following command:

hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.1.jar wordcount /usr/local/hadoop/README.txt /root/jd-hadoop/Output

Once the above command finishes, we see the file part-r-00000 in the output directory:

Output file


If you want, you can see the content of this file with the following command:

cat part-r-00000
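The wordcount output has one word per line followed by a tab and its count, so the usual text tools work on it. As a small sketch, the hypothetical helper top_words below lists the most frequent words; the name and the default of 10 are choices made for this example.

```shell
# Hypothetical helper: print the N most frequent words (default 10) from a
# wordcount output file whose lines have the form "word<TAB>count".
top_words() {
    sort -t "$(printf '\t')" -k2,2nr "$1" | head -n "${2:-10}"
}

# Usage, with the output directory from the command above:
# top_words /root/jd-hadoop/Output/part-r-00000 5
```

Keep in mind that Hadoop refuses to write into an output directory that already exists, so remove /root/jd-hadoop/Output before running the wordcount job a second time.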

Since this example ran successfully, Hadoop has been installed correctly on your system!

Conclusion

In this lesson, we saw how we can install Apache Hadoop on an Ubuntu server and start executing sample programs with it. Read more Big Data Posts to gain deeper knowledge of available Big Data tools and processing frameworks.
