How to select a partition in Hive

0 votes

Could you please explain how to select a column for a partition?

Feb 19, 2019 in Big Data Hadoop by Karan
10,318 views

1 answer to this question.

0 votes

Follow these steps:

A. Create Database

------------------

create database retail123;

B. Select Database

------------------

use retail123;

C. Create table for storing transactional records

-------------------------------------------------

create table txnrecords(txnno INT, txndate STRING, custno INT, amount DOUBLE,
category STRING, product STRING, city STRING, state STRING, spendby STRING)
row format delimited
fields terminated by ','
stored as textfile;

D. Load the data into the table

-------------------------------

LOAD DATA LOCAL INPATH 'txns1.txt' OVERWRITE INTO TABLE txnrecords;
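
If you want to confirm that the load worked, a quick sanity check (assuming txns1.txt loaded cleanly) is to peek at a few rows:

-- sample a handful of rows from the freshly loaded table
select * from txnrecords limit 5;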

E. Describing metadata or schema of the table

---------------------------------------------

describe txnrecords;
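
describe lists only the columns; if you also want the storage location, file format, and (for partitioned tables) partition details, DESCRIBE FORMATTED shows them as well:

-- extended metadata: location, SerDe, input/output format, table properties
describe formatted txnrecords;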

F. Counting the number of records

-------------------------

select count(*) from txnrecords;

G. Counting total spending by category of products

--------------------------------------------------

select category, sum(amount) from txnrecords group by category;

H. Total spending for 10 customers

----------------------------------

select custno, sum(amount) from txnrecords group by custno limit 10;
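
Note that LIMIT 10 without an ORDER BY returns an arbitrary 10 customers. If the intent is the top 10 spenders, a sketch (same table, just adding an ordering) would be:

-- order customers by total spend, highest first, then take 10
select custno, sum(amount) as total
from txnrecords
group by custno
order by total desc
limit 10;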

I. Create partitioned table

---------------------------

create table txnrecsByCat(txnno INT, txndate STRING, custno INT, amount DOUBLE,
product STRING, city STRING, state STRING, spendby STRING)
partitioned by (category STRING)
clustered by (state) INTO 10 buckets
row format delimited
fields terminated by ','
stored as textfile;
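
Once the table exists, SHOW PARTITIONS lists its partitions; at this point the list is empty, since data only goes in during step K:

-- list the partitions of the new table (empty until data is loaded)
show partitions txnrecsByCat;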

J. Configure Hive to allow dynamic partitions

-------------------------------------

A query that scans all partitions of a large table can still trigger an enormous MapReduce job, and Hive's strict mode (hive.mapred.mode=strict) guards against that by rejecting queries on partitioned tables that do not filter on a partition column. The settings below are a different knob: the next step uses a dynamic partition insert (PARTITION(category) with no fixed value), and by default Hive runs dynamic-partition inserts in strict mode, which requires at least one static partition key. Setting the mode to nonstrict lets Hive derive every partition value from the data, and bucketing enforcement is enabled because the target table is clustered into buckets:

set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.enforce.bucketing=true;

K. Load data into the partitioned table

----------------------------------

from txnrecords txn
INSERT OVERWRITE TABLE txnrecsByCat PARTITION(category)
select txn.txnno, txn.txndate, txn.custno, txn.amount, txn.product, txn.city,
txn.state, txn.spendby, txn.category
DISTRIBUTE BY category;

select * from txnrecsByCat;
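
Finally, to select a single partition (which is what the question asks), filter on the partition column in the WHERE clause; Hive then reads only that partition's directory instead of scanning the whole table. The category value below is just a placeholder for one of the values present in your data:

-- partition pruning: only the matching partition directory is read
-- 'SomeCategory' is a placeholder value, replace it with a real category
select * from txnrecsByCat where category = 'SomeCategory';

Running show partitions txnrecsByCat; again at this point shows the partition directories that the dynamic insert created.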
answered Feb 19, 2019 by Omkar
• 69,220 points
