How to find incorrect records in a file in Hive

Suppose a JSON file contains 1,000 records and all of them are loaded into a Hive table. One of those records is incorrect. How can I find the bad record?
Jul 25, 2019 in Big Data Hadoop by Robby
2,660 views

1 answer to this question.

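A value with the wrong datatype causes the generated MapReduce job to crash. The ignore.malformed.json SerDe property does not seem to fix it, presumably because the record is still well-formed JSON and only the type of one field is wrong.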

Here is the sample data, mixed2.json:

{"f1":"hello", "f2":7}
{"f1":"goodbye", "f2":8}
{"f1":"this", "f2":9}
{"f1":"that", "f2":"ten"}

Here is the sample Hive script, mixed2.hive. The first query (on f1) works. The other two (on f2 and on *) crash. It would be nicer to get NULL for the bad value instead. Note that the get_json_object() function returns the bad value as a string, so it prints "ten".

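-- Recreate the table; f2 is declared as int.
drop table mixed2;
create table mixed2 (f1 string, f2 int)
row format serde 'org.openx.data.jsonserde.JsonSerDe'
with serdeproperties ("ignore.malformed.json" = "true")
stored as textfile;

-- Note: "load data inpath" moves the file from /tmp into the table directory.
load data inpath '/tmp/mixed2.json' overwrite into table mixed2;

select f1 from mixed2;  -- works
select f2 from mixed2;  -- crashes on the "ten" record
select * from mixed2;   -- crashes on the "ten" record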
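
Since get_json_object() hands back the raw value as a string, one way to locate the bad record without going through the JSON SerDe at all is to load each line as plain text and test whether the extracted value casts to int. The following is only a sketch: the table name mixed2_lines is hypothetical, and it assumes a fresh copy of the file is available at /tmp/mixed2.json and that each record sits on its own line.

```sql
-- Hypothetical staging table: one raw JSON document per row, no JSON SerDe.
create table mixed2_lines (line string)
stored as textfile;

-- Assumes a fresh copy of the file at this path, since load data inpath moves it.
load data inpath '/tmp/mixed2.json' overwrite into table mixed2_lines;

-- get_json_object() returns f2 as a string; cast() yields NULL when that
-- string is not a valid integer, so only rows with a non-numeric f2 remain.
select line
from mixed2_lines
where get_json_object(line, '$.f2') is not null
  and cast(get_json_object(line, '$.f2') as int) is null;
```

With the sample data, this should return only the record whose f2 is "ten". Rows where f2 is missing altogether are skipped by the first condition; drop it if those should count as bad records too.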

Declare the column as string instead of int. The SerDe can read the numbers into strings, and you can then CAST them to int in Hive. CAST returns NULL for a value that cannot be converted, such as "ten", which is also how the bad record can be found.
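
A minimal sketch of that approach, using the same sample file and SerDe. The table name mixed2_str is an assumption made for illustration, and the file is again assumed to be back at /tmp/mixed2.json.

```sql
-- Hypothetical table: same SerDe as above, but f2 is declared as string.
create table mixed2_str (f1 string, f2 string)
row format serde 'org.openx.data.jsonserde.JsonSerDe'
stored as textfile;

load data inpath '/tmp/mixed2.json' overwrite into table mixed2_str;

-- All rows are readable now; a non-numeric f2 shows up as NULL after the cast.
select f1, cast(f2 as int) as f2_int
from mixed2_str;

-- Only the bad rows: f2 is present but does not convert to int.
select f1, f2
from mixed2_str
where f2 is not null
  and cast(f2 as int) is null;
```

The second query is the one that answers the original question. The first shows that the rest of the data stays usable while the bad value is reported as NULL instead of crashing the job.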

Irregularities like this can be handled to some extent, but if the schema changes entirely, the data cannot be loaded at all.

answered Jul 25, 2019 by Ritu
