Why is Hive called a data warehouse?

0 votes
Hive is called a data warehouse, but why? It is basically used to run queries on top of HDFS, so my storage is still HDFS. I know we can create tables and run DDL statements in Hive. Is that the reason it is called a data warehouse?
Jul 26, 2019 in Big Data Hadoop by Will

1 answer to this question.

0 votes

Hive is a data warehouse infrastructure tool for processing structured data in Hadoop. It sits on top of Hadoop to summarise big data, and it makes querying and analysis easy.

A little history about Apache Hive will help you understand why it came into existence. When Facebook started gathering data and ingesting it into Hadoop, the data was coming in at a rate of tens of GB per day back in 2006. In 2007 it grew to 1 TB/day, and within a few years it increased to around 15 TB/day. Initially, Python scripts were written to ingest the data into Oracle databases, but with the increasing data rate and the growing diversity in the sources and types of incoming data, this was becoming difficult. The Oracle instances were filling up fast, and it was time to develop a new kind of system that could handle large amounts of data. Facebook first built Hive so that people who already had SQL skills could use the new system with minimal changes compared with working in a traditional RDBMS.

The main features of Hive are:

  • It stores the table schema in a database and keeps the processed data in HDFS, which is why it is called a data warehouse tool.
  • It is designed for OLAP.
  • It provides an SQL-type language for querying, called HiveQL or HQL.
  • It is familiar, fast, scalable and extensible.
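
To make the first point concrete, here is a toy sketch in Python of the "schema in one place, data files in another" idea. It is not how Hive is implemented: the `metastore` dict, `create_external_table`, `scan`, and `total_by` are hypothetical names made up for this illustration, a temporary local directory stands in for an HDFS path, and the `page_views` data is invented.

```python
import csv
import os
import tempfile

# Toy "metastore": table name -> column definitions and data location.
# In Hive this metadata lives in a relational database; a dict stands in here.
metastore = {}


def create_external_table(name, columns, location):
    """Register a schema over files that already exist; no data is moved."""
    metastore[name] = {"columns": columns, "location": location}


def scan(name):
    """Apply the registered schema to the raw files at read time."""
    table = metastore[name]
    for filename in sorted(os.listdir(table["location"])):
        path = os.path.join(table["location"], filename)
        with open(path, newline="") as f:
            for row in csv.reader(f):
                # Pair each raw string with its column name and type.
                yield {
                    col: cast(value)
                    for (col, cast), value in zip(table["columns"], row)
                }


def total_by(name, key, measure):
    """OLAP-style aggregate, roughly: SELECT key, SUM(measure) ... GROUP BY key."""
    totals = {}
    for record in scan(name):
        totals[record[key]] = totals.get(record[key], 0) + record[measure]
    return totals


# Stand-in for an HDFS directory that holds raw delimited files.
data_dir = tempfile.mkdtemp()
with open(os.path.join(data_dir, "part-0000.csv"), "w", newline="") as f:
    csv.writer(f).writerows([["IN", "10"], ["US", "5"], ["IN", "7"]])

create_external_table("page_views", [("country", str), ("views", int)], data_dir)
result = total_by("page_views", "country", "views")
print(result)  # {'IN': 17, 'US': 5}
```

The files never change when the table is defined. Only the metadata is new, and the query layer interprets the files through it. That combination of a catalog of table definitions plus analytical queries over bulk data is what earns Hive the warehouse label, even though the bytes still sit in HDFS.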
answered Jul 26, 2019 by Joshua
