Spark vs Hive LLAP Question

0 votes
I have done a lot of research on Hive and Spark SQL. I still don't understand why Spark SQL is needed to build applications when Hive already does everything using execution engines like Tez, Spark, and LLAP. Note: LLAP is much faster than the other execution engines.

Spark SQL connects to Hive using HiveContext and does not support any transactions.

Hive, on the other hand, handles transactions, which Spark SQL does not.
Jul 16, 2019 in Big Data Hadoop by Vishnu
3,672 views

1 answer to this question.

0 votes

While Apache Hive and Spark SQL serve the same purpose, querying data, each does the task in a different way. Hive is designed as a convenient SQL interface for querying data stored in HDFS, whereas a traditional database like MySQL is designed for online operations requiring many reads and writes.

Apache Hive:
Apache Hive is an open-source data warehouse system built on top of Hadoop. It helps with analyzing and querying large datasets stored in HDFS. Without Hive, we would have to write complex MapReduce jobs; with Hive, we only need to submit SQL-like (HiveQL) queries. Hive is mainly targeted at users who are already comfortable with SQL.

Spark SQL:
In Spark, we use Spark SQL for structured data processing. It gives Spark more information about the structure of the data and the computations being performed, and Spark can use this extra information to apply additional optimizations. We can interact with Spark SQL in several ways, such as plain SQL queries, the DataFrame API, and the Dataset API.
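
As a minimal sketch of what this looks like in Scala (the file path, column names, and app name below are made-up placeholders, not anything from the question):

```scala
import org.apache.spark.sql.SparkSession

object SparkSqlSketch {
  def main(args: Array[String]): Unit = {
    // Build a local SparkSession; appName and master are placeholders.
    val spark = SparkSession.builder()
      .appName("spark-sql-sketch")
      .master("local[*]")
      .getOrCreate()

    // Read structured data into a DataFrame (hypothetical path and schema).
    val orders = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/tmp/orders.csv")

    // Query it through the DataFrame API ...
    orders.groupBy("country").count().show()

    // ... or through plain SQL against a temporary view.
    orders.createOrReplaceTempView("orders")
    spark.sql("SELECT country, COUNT(*) AS cnt FROM orders GROUP BY country").show()

    spark.stop()
  }
}
```

Either style ends up in the same optimized execution plan, which is the point of the "extra information" mentioned above.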

Usage
Apache Hive:

  • Schema flexibility and evolution.
  • Tables in Apache Hive can be partitioned and bucketed.
  • JDBC/ODBC drivers are available, so Hive can be queried from external tools (see the sketch after this list).
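
For example, here is a hedged sketch of using the HiveServer2 JDBC driver from Scala; the host, port, credentials, and the partitioned/bucketed table `web_logs` are assumptions for illustration only:

```scala
import java.sql.DriverManager

object HiveJdbcSketch {
  def main(args: Array[String]): Unit = {
    // Register the HiveServer2 JDBC driver (shipped in the hive-jdbc jar).
    Class.forName("org.apache.hive.jdbc.HiveDriver")

    // Host, port, database, and user below are placeholders.
    val conn = DriverManager.getConnection(
      "jdbc:hive2://localhost:10000/default", "hive_user", "")
    val stmt = conn.createStatement()

    // A hypothetical partitioned and bucketed table, as mentioned above.
    stmt.execute(
      """CREATE TABLE IF NOT EXISTS web_logs (user_id STRING, url STRING)
        |PARTITIONED BY (dt STRING)
        |CLUSTERED BY (user_id) INTO 32 BUCKETS
        |STORED AS ORC""".stripMargin)

    // Ordinary HiveQL query over the same connection.
    val rs = stmt.executeQuery("SELECT dt, COUNT(*) FROM web_logs GROUP BY dt")
    while (rs.next()) println(s"${rs.getString(1)} -> ${rs.getLong(2)}")

    rs.close(); stmt.close(); conn.close()
  }
}
```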

Spark SQL:

  • It lets us run SQL queries over structured data.
  • Through Spark SQL, it is possible to read data from an existing Hive installation (see the sketch after this list).
  • When we run Spark SQL from another programming language (Scala, Java, Python, or R), the results come back as a Dataset/DataFrame.
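
Assuming Spark is built with Hive support and can see the cluster's hive-site.xml, reading an existing Hive table might look like the following sketch (the database and table names are hypothetical):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SparkReadsHiveSketch {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport() wires Spark SQL to the Hive metastore;
    // this assumes hive-site.xml is on the classpath of the Spark app.
    val spark = SparkSession.builder()
      .appName("spark-reads-hive-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // "sales.transactions" is a hypothetical existing Hive table.
    val df: DataFrame = spark.sql("SELECT product, amount FROM sales.transactions")

    // The result is an ordinary DataFrame, so the full Spark API is available.
    df.filter("amount > 100").groupBy("product").sum("amount").show()

    spark.stop()
  }
}
```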

Limitations
Apache Hive:

  • It does not offer real-time queries or row-level updates.
  • It does not provide acceptable latency for interactive data browsing.
  • Hive does not support online transaction processing.
  • In Apache Hive, the latency for queries is generally very high.

Spark SQL:

  • It does not support the union data type.
  • It does not raise an error when data exceeds the declared varchar size.
  • It does not support Hive transactional (ACID) tables.
  • It has no support for the Char type.
  • It does not support timestamps in Avro tables.

Conclusion
Hence, we cannot say that Spark SQL is a replacement for Hive, nor the other way around. Spark SQL is more Spark-API- and developer-friendly, and SQL makes programming in Spark easier, while Hive's ability to switch execution engines (MapReduce, Tez, Spark) makes it efficient for querying huge datasets. Which one to use depends entirely on your goals; the usage and limitations of each are discussed above.

answered Jul 16, 2019 by Karan

Related Questions In Big Data Hadoop

0 votes
3 answers

How to connect Spark to a remote Hive server?

JDBC is not required here. Create a hive ...READ MORE

answered Mar 8, 2019 in Big Data Hadoop by Vijay Dixon
• 190 points
12,754 views
0 votes
1 answer

Bucketing vs Partitioning in HIve

Partition divides large amount of data into ...READ MORE

answered Jul 9, 2018 in Big Data Hadoop by Data_Nerd
• 2,390 points
27,053 views
0 votes
2 answers

Which of these will vanish: Flink vs Spark?

At first glance, Flink and Spark would ...READ MORE

answered Aug 13, 2018 in Big Data Hadoop by kurt_cobain
• 9,350 points
1,270 views
0 votes
1 answer

How to save Spark dataframe as dynamic partitioned table in Hive?

Hey, you can try something like this: df.write.partitionBy('year', ...READ MORE

answered Nov 6, 2018 in Big Data Hadoop by Omkar
• 69,220 points
8,511 views
+1 vote
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 11,380 points
11,015 views
0 votes
1 answer

hadoop.mapred vs hadoop.mapreduce?

org.apache.hadoop.mapred is the Old API  org.apache.hadoop.mapreduce is the ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 11,380 points
2,528 views
+2 votes
11 answers

hadoop fs -put command?

Hi, You can create one directory in HDFS ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by nitinrawat895
• 11,380 points
108,739 views
–1 vote
1 answer

How we can run spark SQL over hive tables in our cluster?

Open spark-shell. scala> import org.apache.spark.sql.hive._ scala> val hc = ...READ MORE

answered Dec 26, 2018 in Big Data Hadoop by Omkar
• 69,220 points
1,521 views
0 votes
1 answer