How to check size of HDFS directory

+1 vote

For Linux filesystems we use du -sh. Is there a similar way to check a directory's size on HDFS?

May 3, 2018 in Big Data Hadoop by kurt_cobain
• 9,350 points
48,485 views

12 answers to this question.

+1 vote

You can view the size of the files and directories in a specific directory with the du command. The command shows the space, in bytes, used by the files that match the file pattern you specify. If the pattern matches a single file, you get the length of that file. The syntax of the du command is as follows:

hdfs dfs -du -h "/path/to/hdfs/directory"


Note the following about the output of the du -h command:

The first column shows the actual size (raw size) of the files that users have placed in the various HDFS directories.

The second column shows the space those files actually consume in HDFS, i.e. the raw size multiplied by the replication factor.
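A rough sketch of what that output looks like (the paths and sizes below are made up, assuming a replication factor of 3):

1.2 G    3.6 G    /user/hadoop/logs
256.5 M  769.5 M  /user/hadoop/data.csv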

Hope this will answer your query to some extent.


answered May 3, 2018 by nitinrawat895
• 11,380 points
0 votes
hdfs dfs -du [-s] [-h] URI [URI …]

answered Dec 7, 2018 by Nishant
0 votes

hadoop fs -du -s -h /path/to/dir

answered Dec 7, 2018 by abhijeet
0 votes

To get the size in GB, you can try this:

hdfs dfs -du PATHTODIRECTORY | awk '/^[0-9]+/ { print int($1/(1024**3)) " [GB]\t" $2 }'
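If you only want a single aggregated total rather than one line per child, a variant with -s should work (a sketch; /path/to/dir is a placeholder, and $NF is used for the path so it works whether -du prints two or three columns):

hdfs dfs -du -s /path/to/dir | awk '{ printf "%.1f GB\t%s\n", $1/(1024**3), $NF }'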
answered Dec 7, 2018 by Narayan
0 votes
hadoop fs -du /user/hadoop/dir1 \
    /user/hadoop/file1 \
    hdfs://domain.com/user/hadoop/dir1 
answered Dec 7, 2018 by Nisha
0 votes
hdfs dfs -du -s -h /$DirectoryName
answered Dec 7, 2018 by Chunnu
0 votes

The following command reports filesystem-level usage, including the percentage of capacity used:

sudo -u hdfs hadoop fs -df
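For illustration (the NameNode address and numbers below are hypothetical), adding -h gives human-readable units:

$ sudo -u hdfs hadoop fs -df -h /
Filesystem               Size    Used  Available  Use%
hdfs://namenode:8020  100.0 G  40.0 G     55.0 G   40%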
answered Dec 7, 2018 by Khush
0 votes

To check the size under a particular directory:

sudo -u hdfs hadoop fs -du -h /user
answered Dec 7, 2018 by Bunty
0 votes
hdfs dfs -du -s dir_name
answered Dec 7, 2018 by Yadav
0 votes

Another way to show size in GB:

hadoop fs -dus  /path/to/dir  |   awk '{print $2/1024**3 " G"}' 
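Note that -dus is deprecated on recent Hadoop releases in favour of -du -s, where the size is the first field rather than the second, so the equivalent would be (placeholder path):

hadoop fs -du -s /path/to/dir | awk '{print $1/1024**3 " G"}'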
answered Dec 7, 2018 by Anil
+1 vote

The syntax is the same as on Linux. Use the following command:

hadoop fs -du -s [DIR_NAME]
answered Jun 6, 2019 by Sowmya
0 votes

Hi,

You can check the size of an HDFS directory using the filesystem shell command shown below.

$ hadoop fs -du -s -h /path/to/dir
answered Dec 16, 2020 by MD
• 95,460 points
