How to create smaller table from big table in HIVE

Question

I currently have a Hive table that has 1.5 billion rows. I would like to create a smaller table (using the same table schema) with about 1 million rows from the original table. Ideally, the new rows would be randomly sampled from the original table, but getting the top 1M or bottom 1M of the original table would be ok, too. How would I do this?

digger · Answer 1 · Sep 24, 2018

You could probably best use Hive's built-in sampling methods.

INSERT OVERWRITE TABLE my_table_sample 
SELECT * FROM my_table 
TABLESAMPLE (1m ROWS) t;

This syntax was introduced in Hive 0.11. If you are running an older version of Hive, you'll be confined to using the PERCENT syntax like so.

INSERT OVERWRITE TABLE my_table_sample 
SELECT * FROM my_table 
TABLESAMPLE (1 PERCENT) t;

answered Sep 24, 2018 by digger
• 26,740 points

How to create smaller table from big table in HIVE

Your comment on this question:

1 answer to this question.

Your answer

Your comment on this answer:

Related Questions In Big Data Hadoop

How to create a Hive table from sequence file stored in HDFS?

How to create a parquet table in hive and store data in it from a hive table?

How to create a managed table in Hive?

How to save Spark dataframe as dynamic partitioned table in Hive?

Hadoop Mapreduce word count Program

hadoop fs -put command?

Hadoop dfs -ls command?

Is there a way to copy data from one one Hadoop distributed file system(HDFS) to another HDFS?

How to transpose/pivot data in hive?

How to execute python script in hadoop file system (hdfs)?

Subscribe to our Newsletter, and get personalized recommendations.

TRENDING CERTIFICATION COURSES

TRENDING MASTERS COURSES

COMPANY

WORK WITH US

DOWNLOAD APP

CATEGORIES

CATEGORIES

TRENDING BLOG ARTICLES

TRENDING BLOG ARTICLES