HDFS Commands

Filed Under: Java

In this lesson on Apache Hadoop HDFS commands, we will go through the most common commands used for Hadoop administration and for managing files on a Hadoop cluster.

HDFS Commands

HDFS commands can be run on any Hadoop cluster, or you’re free to use any of the VMs offered by Hortonworks, Cloudera, etc.

In this guide, we will make use of Ubuntu 17.10 (GNU/Linux 4.13.0-37-generic x86_64) machine:

(Screenshot: Ubuntu version)

Finally, we will make use of Hadoop v3.0.1 for this lesson:

(Screenshot: Hadoop version)

Let’s get started.

Hadoop HDFS Commands

We will start with some very basic help commands and go into more detail as we go through this lesson.

Getting all HDFS Commands

The simplest help command for Hadoop HDFS is the following, which lists all the available commands along with guidance on how to use them:

hadoop fs -help

Let’s see the output for this command:

(Screenshot: hadoop fs -help output)


The output is quite long, as it prints all the available commands along with a brief description of how to use each of them.

Help on specific Hadoop command

The information printed by the last command was extensive since it covered every command, which makes finding help for a specific command tricky. Here is how to narrow your search:

hadoop fs -help ls

Let’s see the output of this command:

(Screenshot: hadoop fs -help ls output)

Usage of specific Hadoop command

To know the syntax of a command, we don’t need to go anywhere apart from the terminal itself. To see how a command is used, use the -usage option:

hadoop fs -usage ls

Let’s see the output of this command:

(Screenshot: hadoop fs -usage ls output)


Apart from usage, it also shows all possible options for the command specified.

Listing fs files and directories

To list all the files and subdirectories under the default (home) directory, just use the following command:

hadoop fs -ls

Let’s see the output for this command:

(Screenshot: listing all files)


We ran this command as the root user, which is why the output lists the contents of root’s home directory.
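The -ls command also accepts an explicit path and a couple of handy flags. A small sketch, assuming a running Hadoop cluster (the path is only an example):

```shell
# Assumes a running Hadoop cluster with the hadoop CLI on the PATH.
# List an explicit path recursively (-R) with human-readable sizes (-h):
hadoop fs -ls -R -h /user
```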

Making HDFS Directory

We can make a new directory for Hadoop File System using the following command:

hadoop fs -mkdir /root/journaldev_bigdata

Note that if you create a new directory inside the /user/ directory, Hadoop will have read and write permissions on it; on directories created elsewhere, it only has read permission by default.
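The -mkdir command also supports a -p flag which, like the Unix mkdir -p, creates any missing parent directories. A quick sketch, assuming a running cluster (the nested paths are hypothetical):

```shell
# Assumes a running Hadoop cluster; the nested paths are hypothetical.
# -p creates intermediate directories as needed:
hadoop fs -mkdir -p /root/journaldev_bigdata/logs/2018
# Confirm the nested layout:
hadoop fs -ls -R /root/journaldev_bigdata
```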

Copying file from Local file System to Hadoop FS

To copy a file from Local file System to Hadoop FS, we can use a simple command:

hadoop fs -copyFromLocal derby.log /root/journaldev_bigdata

Let’s see the output for this command:

(Screenshot: copying a file from the local FS to HDFS)


If you want to move the file instead of copying it, use the -moveFromLocal option.
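For completeness, the opposite direction exists as well: -copyToLocal copies a file from HDFS back to the local file system. A sketch, assuming a running cluster and reusing the file names from this lesson (the /tmp destination is hypothetical):

```shell
# Assumes a running Hadoop cluster; file names follow this lesson's example.
# Move a local file into HDFS (the local copy is removed afterwards):
hadoop fs -moveFromLocal derby.log /root/journaldev_bigdata
# Copy a file from HDFS back to the local file system:
hadoop fs -copyToLocal /root/journaldev_bigdata/derby.log /tmp/derby.log
```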

Disk Usage

We can see the disk usage of files under HDFS in a given directory with a simple option as shown:

hadoop fs -du /root/journaldev_bigdata/

Let’s see the output for this command:

(Screenshot: disk usage of a directory in HDFS)
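The -du command also takes -s to print a single summed total and -h for human-readable sizes. For example, assuming the directory from this lesson:

```shell
# Assumes a running Hadoop cluster; path from this lesson's example.
# -s sums the whole directory into one line, -h prints sizes like 1.4 K:
hadoop fs -du -s -h /root/journaldev_bigdata/
```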


If you simply want to check the overall capacity and free space of the complete HDFS, run the df command:

hadoop fs -df -h

Let’s see the output for this command:

(Screenshot: disk usage of the complete HDFS)

Empty Trash Data

When we are sure that no files in the trash are usable, we can empty the trash in HDFS by deleting all files with the following command:

hadoop fs -expunge

This simply deletes all trashed data in HDFS and produces no output.
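Relatedly, if you already know a file should never land in the trash, -rm accepts a -skipTrash option that deletes it permanently in one step. A sketch with a hypothetical path:

```shell
# Assumes a running Hadoop cluster; the path is hypothetical.
# Delete permanently, bypassing the .Trash directory:
hadoop fs -rm -skipTrash /root/journaldev_bigdata/old_file.log
```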

Modifying replication factor for a file

As we already know, the replication factor is the number of copies of a file maintained across the Hadoop cluster in HDFS. We can modify the replication factor of a file using the following command (the -w flag makes the command wait until the replication completes):

hadoop fs -setrep -w 1 /root/journaldev_bigdata/derby.log

Let’s see the output of this command:

(Screenshot: modifying the replication factor in HDFS)
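To verify the new replication factor without reading through -ls output, the -stat command accepts a format string in which %r prints a file’s replication. For example, with the file from this lesson:

```shell
# Assumes a running Hadoop cluster; path from this lesson's example.
# %r in the format string prints the file's replication factor:
hadoop fs -stat %r /root/journaldev_bigdata/derby.log
```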

Updating Hadoop Directory permissions

If you face permission related issues in Hadoop, run the following command:

hadoop fs -chmod 700 /root/journaldev_bigdata/

With this command, you can set the permissions on an HDFS directory and restrict its access.
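HDFS uses the same octal permission notation as Linux: 700 grants read, write, and execute to the owner only. The mapping can be demonstrated on the local file system (the /tmp path below is purely illustrative):

```shell
# Local-only demonstration of the octal mode 700 (owner rwx, group/other none).
mkdir -p /tmp/journaldev_perm_demo
chmod 700 /tmp/journaldev_perm_demo
stat -c '%a %A' /tmp/journaldev_perm_demo   # prints: 700 drwx------
rmdir /tmp/journaldev_perm_demo
```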

Removing HDFS Directory

We can remove an entire HDFS directory using the rm command:

hadoop fs -rm -r /root/journaldev_bigdata

Let’s see the output for this command:

(Screenshot: removing a directory from HDFS)

That’s all for a quick roundup on Hadoop HDFS commands.
