What is the use of sequence files in Hadoop?

I have read about the sequence file format in a few blogs. Since I am still new to Hadoop, I don't really understand the application or purpose of sequence files. It would be really helpful if someone could explain what a sequence file actually is and where it is used in Hadoop.
Apr 6, 2018 in Big Data Hadoop by Damon Salvatore

1 answer to this question.

Sequence files are binary files containing serialized key/value pairs. One advantage is built-in compression: you can compress a sequence file at the record level (each record's value is compressed on its own) or at the block level (several records are compressed together). Also, because sequence files are binary, they are generally faster to read and write than plain text files.
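To get a feel for why the two compression levels differ, here is a rough sketch in plain Python. It does not use Hadoop or the real sequence file format; the sample records, the batch size of 100, and the use of `zlib` are all assumptions made only for illustration. It compares compressing every value separately with compressing batches of keys and values together.

```python
import zlib

# Made-up records: (key, value) pairs of small, similar log lines.
records = [
    (f"file-{i:04d}.log".encode(),
     f"2018-04-06 INFO request {i} served in 12 ms\n".encode())
    for i in range(1000)
]

def record_level_size(recs):
    """Compress each value on its own; keys stay uncompressed."""
    return sum(len(k) + len(zlib.compress(v)) for k, v in recs)

def block_level_size(recs, block_records=100):
    """Compress the keys and the values of each batch of records together."""
    total = 0
    for start in range(0, len(recs), block_records):
        batch = recs[start:start + block_records]
        keys = b"".join(k for k, _ in batch)
        values = b"".join(v for _, v in batch)
        total += len(zlib.compress(keys)) + len(zlib.compress(values))
    return total

raw_size = sum(len(k) + len(v) for k, v in records)
record_size = record_level_size(records)
block_size = block_level_size(records)
print("raw:", raw_size, "record-level:", record_size, "block-level:", block_size)
```

With many small, similar records like these, the batch approach comes out smaller, because the compressor can reuse patterns across records. Actual ratios in Hadoop depend on the codec and the data.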

Problem With Small Files and Hadoop

One of the main problems that the sequence file format solves is processing too many small files in Hadoop. Hadoop is not good at handling a large number of small files, because the NameNode keeps the metadata for every file and block in memory, so many small files generate a lot of overhead for the NameNode. Besides this memory overhead, the second problem is the number of mappers: by default each file gets at least one mapper, so many files smaller than a block mean many short-lived map tasks.
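A back-of-the-envelope calculation makes both problems concrete. The sketch below is only an estimate under stated assumptions: the figure of roughly 150 bytes of NameNode heap per file or block object is a commonly quoted rule of thumb (not something measured here), the block size is assumed to be 128 MB, and one map task per block is assumed, with at least one per file.

```python
BYTES_PER_OBJECT = 150           # assumed rule of thumb, not a measured value
BLOCK_SIZE = 128 * 1024 * 1024   # assumed HDFS block size (128 MB)
GIB = 1024 ** 3

def estimate(num_files, file_size):
    """Rough NameNode object count, heap use, and map task count."""
    # Every file occupies at least one block, even when it is tiny.
    blocks_per_file = max(1, (file_size + BLOCK_SIZE - 1) // BLOCK_SIZE)
    objects = num_files * (1 + blocks_per_file)  # one file object plus its blocks
    return {
        "objects": objects,
        "heap_bytes": objects * BYTES_PER_OBJECT,
        "map_tasks": num_files * blocks_per_file,
    }

# Hypothetical workload: 10 million files of 100 KB each.
num_small, small_size = 10_000_000, 100 * 1024
small = estimate(num_small, small_size)

# The same data packed into container files of about 1 GiB each.
total_bytes = num_small * small_size
num_packed = (total_bytes + GIB - 1) // GIB
packed = estimate(num_packed, GIB)

print("small files:", small)
print("packed     :", packed)
```

Under these assumptions, the same data goes from millions of NameNode objects and map tasks to a few thousand once it is packed into larger files.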

Solution: Sequence File

Sequence files let you solve this small-files problem. As discussed, a sequence file contains key-value pairs, so you can use one to hold many small files: the key can be unique file metadata, such as filename+timestamp, and the value is the content of the ingested file. This way you can club many small files into a single file and use that file as input for MapReduce. This is why sequence files are often used in custom-written MapReduce programs.
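The clubbing idea can be sketched without Hadoop at all. The plain-Python example below is not the real sequence file on-disk format and does not use the Hadoop API; the length-prefixed layout, the `pack` and `unpack` helpers, and the sample file names and timestamps are all hypothetical. It only shows the principle: each small file becomes one record whose key is filename+timestamp and whose value is the file content.

```python
import io
import struct

def pack(files, out):
    """Append each (key, content) pair to `out` as a length-prefixed record."""
    for key, content in files:
        key_bytes = key.encode("utf-8")
        # 8-byte header: key length and value length as big-endian unsigned ints.
        out.write(struct.pack(">II", len(key_bytes), len(content)))
        out.write(key_bytes)
        out.write(content)

def unpack(stream):
    """Yield (key, content) pairs in the order they were written."""
    while True:
        header = stream.read(8)
        if len(header) < 8:
            break  # end of the container
        key_len, value_len = struct.unpack(">II", header)
        key = stream.read(key_len).decode("utf-8")
        yield key, stream.read(value_len)

# Made-up small files: key is "filename+timestamp", value is the file content.
small_files = [
    ("a.txt+1522972800", b"first small file\n"),
    ("b.txt+1522972860", b"second small file\n"),
    ("c.txt+1522972920", b""),
]

container = io.BytesIO()   # stands in for the single large file
pack(small_files, container)
container.seek(0)
restored = list(unpack(container))
print(restored == small_files)
```

In a real job you would write the records with Hadoop's `SequenceFile` writer instead, typically with a text key and a bytes value, so that MapReduce can read the packed file directly.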

Let me know if anything is still unclear.

answered Apr 6, 2018 by Ashish
In the above answer, how are we clubbing the multiple small text files into a single key-value sequence file? Kindly explain.
Hi @sujith,

You can go through the link below.

https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/io/SequenceFile.html
