Hadoop Hive partitioning

0 votes
When and why do we use Hive partitioning?
Dec 14, 2018 in Big Data Hadoop by digger
• 26,740 points
1,128 views

1 answer to this question.

0 votes

Partitioning:

Hive is one of the preferred tools for querying large datasets, and partitioning matters most when a query would otherwise have to scan the full table.

When a table is not partitioned, Hive reads every file in the table's data directory and only then applies the query filters. Because all of the data has to be read regardless of the filter, queries on large tables become slow and expensive.
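As a minimal sketch of that situation, consider the HiveQL below. The table name page_views_flat and its columns are hypothetical, chosen only for illustration.

```sql
-- Hypothetical unpartitioned table: all rows live in files directly
-- under the single table directory.
CREATE TABLE page_views_flat (
  user_id   BIGINT,
  url       STRING,
  view_date STRING,
  country   STRING
)
STORED AS TEXTFILE;

-- Hive has no way to skip files here: it reads the whole table
-- directory and applies the view_date filter to every row it reads.
SELECT COUNT(*)
FROM page_views_flat
WHERE view_date = '2018-12-14';
```

Even though the query asks for a single day, its cost grows with the total size of the table rather than with the size of that day's data.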

Users very often filter data on specific column values. To apply partitioning in Hive effectively, they need to understand the domain of the data they are analyzing.

With this knowledge, it becomes easy to identify the frequently queried columns, and Hive's partitioning feature can then be applied to those columns.
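For example, if most queries filter on a date, that column is a natural partition key. The sketch below assumes a hypothetical sales table in which sale_date is the frequently filtered column; the names are illustrative only.

```sql
-- Hypothetical table. sale_date is assumed to be the column that
-- most queries filter on, so it is declared as the partition column.
-- Note that a partition column appears only in PARTITIONED BY,
-- not in the regular column list.
CREATE TABLE sales (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      DOUBLE
)
PARTITIONED BY (sale_date STRING)
STORED AS TEXTFILE;
```

A column with a moderate number of distinct values usually works best here, since each distinct value becomes its own directory and a very large number of tiny partitions adds overhead of its own.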

Because partitions are horizontal slices of the data, a large dataset can be separated into more manageable chunks.

When to use Hive partitioning:

Partitioning is recommended when you want the data in a Hive table split into separate sections based on the values of one or more columns.

The rows of the dataset are segregated by the value of the partition column, and each group is stored in its own partition. When a query filters on the partition column, only the required partitions are read, which reduces the time the query takes to return a result.
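A short end-to-end sketch of this behaviour follows. The web_logs table, its columns, and the sample rows are hypothetical, and INSERT ... VALUES assumes Hive 0.14 or later.

```sql
-- Hypothetical partitioned table, keyed by log_date.
CREATE TABLE web_logs (
  ip  STRING,
  url STRING
)
PARTITIONED BY (log_date STRING);

-- Each insert targets one partition. Hive stores the rows in a
-- subdirectory of the table directory named after the partition value,
-- for example log_date=2018-12-14.
INSERT INTO TABLE web_logs PARTITION (log_date = '2018-12-14')
VALUES ('10.0.0.1', '/index.html');

INSERT INTO TABLE web_logs PARTITION (log_date = '2018-12-15')
VALUES ('10.0.0.2', '/about.html');

-- Lists the partitions that currently exist for the table.
SHOW PARTITIONS web_logs;

-- The filter is on the partition column, so Hive only needs to read
-- the log_date=2018-12-14 directory and can skip the others.
SELECT url
FROM web_logs
WHERE log_date = '2018-12-14';
```

Running EXPLAIN on the final query is one way to check which partitions Hive actually plans to read.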

answered Dec 14, 2018 by Omkar
• 69,220 points
