SerDe input and output

–1 vote

How are the InputFormat and OutputFormat linked to the SerDe in Hive?

Dec 27, 2018 in Big Data Hadoop by digger

1 answer to this question.

0 votes

Input Processing

Hive's execution engine (henceforth just "the engine") first uses the configured InputFormat to read a record of data. The record is the value object returned by the InputFormat's RecordReader.

The engine then invokes SerDe.deserialize() to deserialize the record. Nothing requires the object returned by this method to be fully deserialized; a SerDe may return an object that parses its fields only when they are accessed.
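To illustrate the point that deserialize() need not return a fully deserialized object, here is a minimal sketch of a lazily parsed row. It does not use Hive's classes: LazyRow, deserialize() and parseCount() are hypothetical names made up for this example, and the comma-delimited format is an assumption. Hive's own LazySimpleSerDe follows the same general idea, but its real implementation differs.

```java
import java.util.ArrayList;
import java.util.List;

public class LazyRowSketch {

    /** Hypothetical lazily parsed row: keeps the raw text and splits it on first access. */
    static final class LazyRow {
        private final String raw;
        private List<String> fields;   // stays null until a field is first requested
        private int parseCount = 0;    // how many times the raw text was actually parsed

        LazyRow(String raw) {
            this.raw = raw;
        }

        String getField(int index) {
            if (fields == null) {
                // First access: parse the whole record once and cache the result.
                fields = new ArrayList<>();
                for (String part : raw.split(",", -1)) {
                    fields.add(part);
                }
                parseCount++;
            }
            return fields.get(index);
        }

        boolean isParsed() {
            return fields != null;
        }

        int parseCount() {
            return parseCount;
        }
    }

    /** Toy deserialize(): cheap, because it only wraps the raw record. */
    static LazyRow deserialize(String rawRecord) {
        return new LazyRow(rawRecord);
    }

    public static void main(String[] args) {
        LazyRow row = deserialize("7,Asha,Pune");
        System.out.println(row.isParsed());   // false: nothing has been parsed yet
        System.out.println(row.getField(2));  // Pune
        System.out.println(row.isParsed());   // true: parsing happened on first access
    }
}
```

A query that never touches a row's fields pays no parsing cost for that row, which is the practical reason a SerDe might defer the work.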

The engine also obtains the ObjectInspector to use by invoking SerDe.getObjectInspector(). This must be a subclass of StructObjectInspector, since a record representing a row of input data is essentially a struct type.

The engine passes the deserialized object and the object inspector to all operators, which use them to get the data they need from the record. The object inspector knows how to extract individual fields from a deserialized record. For example, StructObjectInspector has a method called getStructFieldData() that returns a given field of the record. This is the mechanism for accessing individual fields.
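The input path described above can be sketched with a small self-contained model. This is not Hive's API: RowInspector, RowDeserializer, CsvDeserializer and readField() are hypothetical stand-ins, and the real getStructFieldData() takes a StructField reference rather than a field name. The sketch only shows the division of labour, where the operator treats the row as opaque and asks the inspector for fields.

```java
import java.util.Arrays;
import java.util.List;

public class InputPathSketch {

    /** Hypothetical stand-in for Hive's StructObjectInspector. */
    interface RowInspector {
        List<String> fieldNames();

        Object getStructFieldData(Object row, String fieldName);
    }

    /** Hypothetical stand-in for the read side of a SerDe. */
    interface RowDeserializer {
        Object deserialize(String rawRecord);

        RowInspector getObjectInspector();
    }

    /** Toy SerDe for comma-delimited text with a fixed column list (an assumed format). */
    static final class CsvDeserializer implements RowDeserializer {
        private final List<String> columns;

        CsvDeserializer(List<String> columns) {
            this.columns = columns;
        }

        @Override
        public Object deserialize(String rawRecord) {
            // The row representation is private to this SerDe: here, a String[].
            return rawRecord.split(",", -1);
        }

        @Override
        public RowInspector getObjectInspector() {
            return new RowInspector() {
                @Override
                public List<String> fieldNames() {
                    return columns;
                }

                @Override
                public Object getStructFieldData(Object row, String fieldName) {
                    int index = columns.indexOf(fieldName);
                    if (index < 0) {
                        throw new IllegalArgumentException("Unknown field: " + fieldName);
                    }
                    // Only the inspector knows that the row is a String[].
                    return ((String[]) row)[index];
                }
            };
        }
    }

    /** Plays the role of an operator: it sees only the opaque row and the inspector. */
    static Object readField(RowDeserializer serde, String rawRecord, String fieldName) {
        Object row = serde.deserialize(rawRecord);
        RowInspector inspector = serde.getObjectInspector();
        return inspector.getStructFieldData(row, fieldName);
    }

    public static void main(String[] args) {
        RowDeserializer serde = new CsvDeserializer(Arrays.asList("id", "name", "city"));
        // In Hive the raw record would come from the InputFormat's RecordReader.
        System.out.println(readField(serde, "7,Asha,Pune", "name")); // Asha
    }
}
```

Because the operator never casts the row itself, the SerDe is free to change its internal row representation without affecting the code that reads fields.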

Output Processing

Output is analogous to input. The engine passes the deserialized object representing a record, together with the corresponding ObjectInspector, to SerDe.serialize(). In this context, serialization means converting the record object to an object of the type expected by the OutputFormat that will perform the write. To do this conversion, serialize() can use the passed ObjectInspector to read the individual fields of the record and convert them to the appropriate type.
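The output side can be sketched the same way. Again this is a simplified model rather than Hive's API: RowInspector, ArrayRowInspector and this serialize() are hypothetical, and a tab-delimited String stands in for the type an OutputFormat would expect (in Hive that would be a Writable).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OutputPathSketch {

    /** Hypothetical stand-in for Hive's StructObjectInspector. */
    interface RowInspector {
        List<String> fieldNames();

        Object getStructFieldData(Object row, String fieldName);
    }

    /** Inspector for rows that happen to be stored as Object[] in column order. */
    static final class ArrayRowInspector implements RowInspector {
        private final List<String> columns;

        ArrayRowInspector(List<String> columns) {
            this.columns = columns;
        }

        @Override
        public List<String> fieldNames() {
            return columns;
        }

        @Override
        public Object getStructFieldData(Object row, String fieldName) {
            int index = columns.indexOf(fieldName);
            if (index < 0) {
                throw new IllegalArgumentException("Unknown field: " + fieldName);
            }
            return ((Object[]) row)[index];
        }
    }

    /**
     * Toy serialize(): reads each field through the inspector and builds the
     * output type, here a single tab-delimited line.
     */
    static String serialize(Object row, RowInspector inspector) {
        List<String> parts = new ArrayList<>();
        for (String name : inspector.fieldNames()) {
            parts.add(String.valueOf(inspector.getStructFieldData(row, name)));
        }
        return String.join("\t", parts);
    }

    public static void main(String[] args) {
        RowInspector inspector = new ArrayRowInspector(Arrays.asList("id", "name", "city"));
        Object row = new Object[] {7, "Asha", "Pune"};
        // Prints the three fields separated by tab characters.
        System.out.println(serialize(row, inspector));
    }
}
```

Note that serialize() never inspects the row directly. It relies on the inspector that was passed in, so the same method works for any row representation that comes with a matching inspector.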

answered Dec 27, 2018 by Omkar
