How to keep duplicates in Hive when using collect_set()

0 votes

I want to keep duplicate values in Hive when I aggregate with collect_set(). Example:

hash_id  | num_of_cats
======================
abcdef   | 5
abcdef   | 4
abcdef   | 3
fndflka  | 1
fndflka  | 2
fndflka  | 3
djsb33   | 7
djsb33   | 7
djsb33   | 7

should return:

hash_agg | cats_aggregate
============================
abcdef   | Array<int>(5,4,3)
fndflka  | Array<int>(1,2,3)
djsb33   | Array<int>(7,7,7)
Nov 2, 2018 in Big Data Hadoop by slayer
• 29,370 points
2,518 views

1 answer to this question.

0 votes
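collect_set() removes duplicates by design. Use COLLECT_LIST() instead (available since Hive 0.13.0), which keeps every value in the group, duplicates included. Note that Hive does not guarantee the order of elements within each array.

SELECT
    hash_id AS hash_agg,
    COLLECT_LIST(num_of_cats) AS cats_aggregate  -- keeps duplicates, unlike COLLECT_SET
FROM
    <tablename>
WHERE
    <condition>  -- optional filter; drop the WHERE clause if not needed
GROUP BY
    hash_id
;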
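
To illustrate the difference between the two aggregates, here is a minimal Python sketch that emulates them on the sample rows from the question. This is not Hive code: the `collect_list` and `collect_set` functions below are hypothetical Python helpers written only to mirror the semantics, and they preserve input order, which Hive itself does not promise.

```python
from collections import defaultdict

# Sample rows from the question: (hash_id, num_of_cats)
rows = [
    ("abcdef", 5), ("abcdef", 4), ("abcdef", 3),
    ("fndflka", 1), ("fndflka", 2), ("fndflka", 3),
    ("djsb33", 7), ("djsb33", 7), ("djsb33", 7),
]

def collect_list(rows):
    """Emulates GROUP BY key + COLLECT_LIST(value): keeps every value."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return dict(groups)

def collect_set(rows):
    """Emulates GROUP BY key + COLLECT_SET(value): drops repeated values."""
    groups = defaultdict(list)
    for key, value in rows:
        if value not in groups[key]:
            groups[key].append(value)
    return dict(groups)

as_list = collect_list(rows)
as_set = collect_set(rows)

print(as_list["djsb33"])  # duplicates kept
print(as_set["djsb33"])   # duplicates removed
```

For the `djsb33` group, the list version keeps all three 7s while the set version collapses them to a single 7, which is why the question's expected output needs COLLECT_LIST.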
answered Nov 2, 2018 by Omkar
• 69,220 points
