Introduction to Hadoop Distributed File System (HDFS)

This article will cover basic Hadoop Distributed File System (HDFS) commands in Linux. You’ll learn how to create and list directories, move, delete, read files and more.

Hadoop FS command line

The Hadoop FS command line is a simple way to access and interface with HDFS. Below are some basic HDFS commands in Linux, including operations like creating directories, moving files, deleting files, reading files, and listing directories.

To use HDFS commands, start the Hadoop services using the following command:

sbin/start-all.sh

To check if Hadoop is up and running:

jps
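If Hadoop is running, jps reports the Java processes on the machine, which should include the Hadoop daemons. The output will look roughly like the following (the process IDs will vary, and the exact set of daemons depends on your configuration):

$ jps
2451 NameNode
2653 DataNode
2871 SecondaryNameNode
3109 ResourceManager
3312 NodeManager
3590 Jps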

The sections below cover several basic HDFS commands. A fuller list of file system commands, as printed by the -help command, is given at the end of the article.

mkdir:

Create a directory in HDFS, similar to the Unix mkdir command.

Options:
-p : Do not fail if the directory already exists

$ hadoop fs -mkdir [-p] <path> ...
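For example, to create a nested directory in a single step (the path here is only illustrative):

$ hadoop fs -mkdir -p /user/data/input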

ls:

List the files and directories under a given path in HDFS, similar to the Unix ls command. For a recursive listing of directories and files, use the -R option (the older -lsr form is deprecated).

Options:
-d : List the directories as plain files
-h : Format the sizes of files to a human-readable manner instead of number of bytes
-R : Recursively list the contents of directories

$ hadoop fs -ls [-d] [-h] [-R] [<path> ...]
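For example, to recursively list everything under a (hypothetical) directory, with human-readable sizes:

$ hadoop fs -ls -h -R /user/data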

copyFromLocal:

Copy files from the local file system to HDFS, similar to the -put command. The command fails if a file already exists at the destination; add the -f flag to overwrite it.

Options:

-p : Preserves access and modification time, ownership and the mode
-f : Overwrites the destination

$ hadoop fs -copyFromLocal [-f] [-p] <localsrc> ... <dst>
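For example, to copy a local file into HDFS, overwriting any existing copy (the file and directory names are illustrative):

$ hadoop fs -copyFromLocal -f sample.txt /user/data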

copyToLocal:

Copy files from HDFS to local file system, similar to -get command.

$ hadoop fs -copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>

cat:

Display contents of a file, similar to Unix cat command.

$ hadoop fs -cat /user/data/sampletext.txt

cp:

Copy files from one directory to another within HDFS, similar to Unix cp command.

$ hadoop fs -cp /user/data/sample1.txt /user/hadoop1
$ hadoop fs -cp /user/data/sample2.txt /user/test/in1

mv:

Move files from one directory to another within HDFS, similar to Unix mv command.

$ hadoop fs -mv /user/hadoop/sample1.txt /user/text/

rm:

Remove a file from HDFS, similar to the Unix rm command. On its own, this command does not delete directories; for a recursive delete, use -rm -r.

Options:
-r : Recursively remove directories and their contents
-skipTrash : Bypass the trash and delete the file immediately
-f : Do not report an error if the file does not exist

$ hadoop fs -rm [-f] [-r] [-skipTrash] <src> ...

getmerge:

Merge the files in a directory on HDFS into a single file on the local file system. This is one of the most useful commands when reading the combined contents of a MapReduce or Pig job's output files.

$ hadoop fs -getmerge <src> <localdst>
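For example, to merge everything under a (hypothetical) job output directory into one local file:

$ hadoop fs -getmerge /user/data/output merged.txt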

setrep:

Change the replication factor of a file to a specific value, instead of the default replication factor. If the path is a directory, the command recursively changes the replication factor of every file in the directory tree.

Options:
-w : Wait for the replication to complete (this can potentially take a long time)
-R : Accepted for backwards compatibility; it has no effect

$ hadoop fs -setrep [-R] [-w] <rep> <path> ...
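For example, to set the replication factor of a (hypothetical) file to 3 and wait until re-replication completes:

$ hadoop fs -setrep -w 3 /user/data/sample1.txt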

touchz:

Creates an empty file in HDFS.

$ hadoop fs -touchz URI

test:

Test whether an HDFS path exists, whether it is a file or a directory, and whether it is empty. The result is returned as the command's exit status.

Options:
-d : Return 0 if the path is a directory
-e : Return 0 if the path exists
-f : Return 0 if the path is a file
-s : Return 0 if the path is not empty
-z : Return 0 if the file is zero length

$ hadoop fs -test -[defsz] <path>
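For example, to check whether a (hypothetical) file exists — -test reports its result through the exit status rather than printing anything:

$ hadoop fs -test -e /user/data/sample1.txt
$ echo $?

An exit status of 0 means the file exists.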

appendToFile:

Appends the contents of all given local files to the provided destination file on HDFS. The destination file will be created if it doesn’t already exist.

$ hadoop fs -appendToFile <localsrc> ... <dst>
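For example, to append two local files to a (hypothetical) file on HDFS, creating it if it does not yet exist:

$ hadoop fs -appendToFile local1.txt local2.txt /user/data/combined.txt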

chmod:

Change the permissions of a file, similar to the Linux shell's chmod command but with a few exceptions.

<MODE> : The same as the mode used for the shell command; the only letters recognized are 'rwxXt'

<OCTALMODE> : The mode specified in 3 or 4 digits. Unlike the shell command, it is not possible to specify only part of the mode.

Options:
-R : Modify the files recursively

$ hadoop fs -chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH
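For example, to recursively give the owner full access, the group read and execute access, and everyone else no access (the path is illustrative):

$ hadoop fs -chmod -R 750 /user/data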

chown:

Change the owner and group of a file, similar to the Linux shell's chown command but with a few exceptions.

Options:
-R : Modify the files recursively

$ hadoop fs -chown [-R] [OWNER][:[GROUP]] PATH
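For example, to recursively hand a directory over to a (hypothetical) user and group:

$ hadoop fs -chown -R hduser:hadoop /user/data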

df:

Show the capacity (free and used space) of the file system. If the file system has multiple partitions and no path is specified, the status of the root partition is reported.

Options:
-h : Format the sizes of files to a human-readable manner instead of number of bytes

$ hadoop fs -df [-h] [<path> ...]

du:

Show the size of each file and directory under the given path.

Options:
-s : Show total summary size
-h : Format the sizes of files to a human-readable manner instead of number of bytes

$ hadoop fs -du [-s] [-h] <path> ...

tail:

Show the last 1KB of the file.

Options:
-f : Show appended data as the file grows

$ hadoop fs -tail [-f] <file>
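For example, to follow a (hypothetical) file as data is appended to it, much like Unix tail -f:

$ hadoop fs -tail -f /user/data/sample1.txt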

List of HDFS Commands:

Run the -help command to get the full list of supported file system commands:

user@ubuntu1:~$ hadoop fs -help
Usage: hadoop fs [generic options]
	[-appendToFile <localsrc> ... <dst>]
	[-cat [-ignoreCrc] <src> ...]
	[-checksum <src> ...]
	[-chgrp [-R] GROUP PATH...]
	[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
	[-chown [-R] [OWNER][:[GROUP]] PATH...]
	[-copyFromLocal [-f] [-p] <localsrc> ... <dst>]
	[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-count [-q] <path> ...]
	[-cp [-f] [-p | -p[topax]] <src> ... <dst>]
	[-createSnapshot <snapshotDir> [<snapshotName>]]
	[-deleteSnapshot <snapshotDir> <snapshotName>]
	[-df [-h] [<path> ...]]
	[-du [-s] [-h] <path> ...]
	[-expunge]
	[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-getfacl [-R] <path>]
	[-getfattr [-R] {-n name | -d} [-e en] <path>]
	[-getmerge [-nl] <src> <localdst>]
	[-help [cmd ...]]
	[-ls [-d] [-h] [-R] [<path> ...]]
	[-mkdir [-p] <path> ...]
	[-moveFromLocal <localsrc> ... <dst>]
	[-moveToLocal <src> <localdst>]
	[-mv <src> ... <dst>]
	[-put [-f] [-p] <localsrc> ... <dst>]
	[-renameSnapshot <snapshotDir> <oldName> <newName>]
	[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
	[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
	[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
	[-setfattr {-n name [-v value] | -x name} <path>]
	[-setrep [-R] [-w] <rep> <path> ...]
	[-stat [format] <path> ...]
	[-tail [-f] <file>]
	[-test -[defsz] <path>]
	[-text [-ignoreCrc] <src> ...]
	[-touchz <path> ...]
	[-usage [cmd ...]]
