Difference between Text and String in Hadoop

0 votes

What is the difference between org.apache.hadoop.io.Text and java.lang.String in the Apache Hadoop framework? Why does Hadoop introduce a new Text class instead of using String? I have tried to work out the difference but still don't understand it. Can anyone explain it with suitable examples?

Aug 8, 2019 in Big Data Hadoop by nitinrawat895
• 11,380 points

1 answer to this question.

0 votes

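Text is Hadoop's Writable for UTF-8 character sequences; you can think of it as the Writable equivalent of java.lang.String. The binary representation of a Text object is a variable-length integer containing the number of bytes in the UTF-8 representation of the string, followed by the UTF-8 bytes themselves.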
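To make that layout concrete, here is a minimal sketch that uses only the JDK, not Hadoop itself. TextLayoutSketch and serialize are hypothetical names, and the sketch assumes the encoded string is at most 127 bytes, the range in which Hadoop's variable-length integer takes a single byte holding the value.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; it is not part of Hadoop.
class TextLayoutSketch {

    // Returns a length prefix followed by the UTF-8 bytes of s.
    // Assumption: the encoding is at most 127 bytes long, so the
    // variable-length integer prefix fits in one byte.
    static byte[] serialize(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        if (utf8.length > 127) {
            throw new IllegalArgumentException("sketch only covers a one-byte length prefix");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(utf8.length);           // prefix: number of bytes, not number of chars
        out.write(utf8, 0, utf8.length);  // payload: the UTF-8 bytes
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = serialize("hadoop");
        System.out.println(bytes.length); // 1 prefix byte + 6 payload bytes
        System.out.println(bytes[0]);     // the prefix value, 6
    }
}
```

Note that the prefix counts bytes, so a single character such as "ß" is stored with a prefix of 2.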

Text is a replacement for the UTF8 class, which was deprecated because it didn’t support strings whose encoding was over 32,767 bytes and because it used Java’s modified UTF-8.

Text, by contrast, uses standard UTF-8, which can make it easier to interoperate with other tools that understand UTF-8.
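The gap between the two encodings can be seen with the JDK alone. The sketch below compares standard UTF-8 (String.getBytes) with the JDK's modified UTF-8 as written by DataOutputStream.writeUTF. Utf8VariantsSketch and its method names are hypothetical, and the sketch says nothing about how the old UTF8 class was implemented internally.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; it is not part of Hadoop.
class Utf8VariantsSketch {

    // Size of the string in standard UTF-8.
    static int standardUtf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Size of the string in the JDK's modified UTF-8.
    static int modifiedUtf8Length(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size() - 2; // writeUTF prepends a two-byte length header
    }

    public static void main(String[] args) {
        String supplementary = "\uD801\uDC00"; // U+10400, outside the Basic Multilingual Plane
        System.out.println(standardUtf8Length(supplementary)); // 4
        System.out.println(modifiedUtf8Length(supplementary)); // 6: each surrogate is encoded separately
    }
}
```

A tool that expects standard UTF-8 would reject or misread the six-byte form, which is the interoperability concern raised above.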

The main differences from String, in brief:

Indexing: Because of its emphasis on using standard UTF-8, there are some differences between Text and the Java String class. Indexing for the Text class is in terms of position in the encoded byte sequence, not the Unicode character in the string, or the Java char code unit (as it is for String).

For instance, Text's charAt() takes a byte offset and returns an int representing a Unicode code point, unlike the String variant, which takes a char index and returns a char.
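The following JDK-only sketch shows how far the two index spaces drift apart for a string that mixes one-, two-, three-, and four-byte characters. IndexingSketch and byteOffsetOf are hypothetical helpers; the byte offsets they compute are the positions a byte-indexed API would expect.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; it is not part of Hadoop.
class IndexingSketch {

    // Byte offset, in the UTF-8 encoding of s, at which the char at
    // charIndex begins. charIndex must not split a surrogate pair.
    static int byteOffsetOf(String s, int charIndex) {
        return s.substring(0, charIndex).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // 'A' (1 byte), U+00DF (2 bytes), U+6771 (3 bytes), U+10400 (4 bytes)
        String s = "\u0041\u00DF\u6771\uD801\uDC00";
        System.out.println(s.length());                                // 5 chars
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length); // 10 bytes

        // Walk the string by code point and print both kinds of index.
        for (int i = 0; i < s.length(); i += Character.charCount(s.codePointAt(i))) {
            System.out.printf("char index %d, byte offset %d, code point U+%04X%n",
                    i, byteOffsetOf(s, i), s.codePointAt(i));
        }

        // String.charAt(3) yields only the high surrogate, not the full character.
        System.out.printf("charAt(3) = U+%04X%n", (int) s.charAt(3));
    }
}
```

The last character starts at char index 3 but at byte offset 6, and String.charAt(3) returns only half of it.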

Iteration: Iterating over the Unicode characters in Text is complicated by the use of byte offsets for indexing since you can’t just increment the index.
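A rough sketch of such an iteration, again using only the JDK and assuming the input is valid UTF-8: the loop reads the lead byte to learn how long the current character is and advances the offset by that amount. Utf8IterationSketch, sequenceLength, and codePoints are hypothetical names, not Hadoop methods.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; it is not part of Hadoop.
class Utf8IterationSketch {

    // Number of bytes in the UTF-8 sequence that starts with this lead byte.
    static int sequenceLength(byte lead) {
        int b = lead & 0xFF;
        if (b < 0x80) return 1;
        if ((b & 0xE0) == 0xC0) return 2;
        if ((b & 0xF0) == 0xE0) return 3;
        if ((b & 0xF8) == 0xF0) return 4;
        throw new IllegalArgumentException("not the first byte of a UTF-8 sequence");
    }

    // Collects the code points stored in the first `length` bytes of utf8.
    static List<Integer> codePoints(byte[] utf8, int length) {
        List<Integer> result = new ArrayList<>();
        int offset = 0;
        while (offset < length) {
            int n = sequenceLength(utf8[offset]);
            String one = new String(utf8, offset, n, StandardCharsets.UTF_8);
            result.add(one.codePointAt(0));
            offset += n; // advance by the character's byte length, not by 1
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u0041\u00DF\u6771\uD801\uDC00".getBytes(StandardCharsets.UTF_8);
        for (int cp : codePoints(utf8, utf8.length)) {
            System.out.println(Integer.toHexString(cp)); // 41, df, 6771, 10400
        }
    }
}
```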

Mutable: Another difference with String is that Text is mutable (like all Writable implementations in Hadoop, except NullWritable, which is a singleton). You can reuse a Text instance by calling one of the set() methods on it.

Resorting to String: Text doesn’t have as rich an API for manipulating strings as java.lang.String, so in many cases you need to convert the Text object to a String. This is done in the usual way, using the toString() method.
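The reuse-then-convert pattern can be sketched without Hadoop on the classpath. MutableUtf8Sketch below is a hypothetical stand-in, not Hadoop's Text; it only borrows the method names set(), getLength(), getBytes(), and toString(), and the real class may size its internal buffer differently.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a mutable UTF-8 holder; it is not Hadoop's Text.
class MutableUtf8Sketch {
    private byte[] bytes = new byte[0];
    private int length;

    // Overwrites the current value, reusing the buffer when it is big enough.
    void set(String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        if (utf8.length > bytes.length) {
            bytes = new byte[utf8.length]; // grow only when needed
        }
        System.arraycopy(utf8, 0, bytes, 0, utf8.length);
        length = utf8.length;
    }

    int getLength() { return length; }

    // Backing array; only the first getLength() bytes are valid.
    byte[] getBytes() { return bytes; }

    @Override
    public String toString() {
        return new String(bytes, 0, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        MutableUtf8Sketch t = new MutableUtf8Sketch();
        t.set("hadoop");
        t.set("pig");                            // same object, new value
        System.out.println(t.getLength());       // 3
        System.out.println(t.getBytes().length); // 6: stale bytes remain past getLength()
        // Convert to String to get the richer java.lang.String API.
        System.out.println(t.toString().toUpperCase());
    }
}
```

Two practical consequences follow from mutability: always pair getBytes() with getLength(), and take a copy (for example with toString()) before keeping a value that the framework may overwrite on the next record.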

I hope this helps.

answered Aug 8, 2019 by ravikiran
• 4,620 points
