Although Apache Hive and Spark SQL perform the same task, retrieving data, each goes about it in a different way. Hive is designed as a convenient SQL-like interface for querying data stored in HDFS, while Spark SQL is Spark's module for working with structured data.
Apache Hive:
Apache Hive is an open source data warehouse system built on top of Hadoop. It helps analyze and query large datasets stored in Hadoop files. Without Hive, we would have to write complex MapReduce jobs; with Hive, we only need to submit SQL-like queries. Hive is mainly targeted at users who are already comfortable with SQL.
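A minimal sketch of submitting a SQL query to Hive instead of writing a MapReduce job. This assumes a running HiveServer2 endpoint on localhost:10000 and a hypothetical `page_views` table, and uses the third-party PyHive client as one possible way to connect; none of these specifics come from the article itself.

```python
# Sketch: query Hive through HiveServer2 with the PyHive client.
# The host/port and the `page_views` table are assumptions for illustration.
from pyhive import hive

conn = hive.connect(host="localhost", port=10000)
cursor = conn.cursor()

# Hive translates this SQL-like query into the underlying execution jobs,
# so we never write MapReduce code ourselves.
cursor.execute(
    "SELECT country, COUNT(*) AS views "
    "FROM page_views "
    "GROUP BY country"
)
for country, views in cursor.fetchall():
    print(country, views)

cursor.close()
conn.close()
```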
Spark SQL:
In Spark, we use Spark SQL for structured data processing. Spark SQL gives Spark more information about the structure of the data and about the computations being performed, and Spark can use this extra information to apply additional optimizations. We can interact with Spark SQL in several ways, such as SQL queries, the DataFrame API, and the Dataset API.
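A small sketch of the two interaction styles mentioned above, using PySpark. The data and column names are made up for illustration; note that in Python the structured API is the DataFrame (the typed Dataset API is available only in Scala and Java).

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-sql-sketch").getOrCreate()

# Build a tiny DataFrame with illustrative data.
people = spark.createDataFrame(
    [("Alice", 34), ("Bob", 45)], ["name", "age"]
)

# Register it as a temporary view so it can be queried with SQL.
people.createOrReplaceTempView("people")

# The same query expressed through SQL and through the DataFrame API;
# both go through the same optimizer.
spark.sql("SELECT name FROM people WHERE age > 40").show()
people.filter(people.age > 40).select("name").show()

spark.stop()
```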
Usage
Apache Hive:
- Schema flexibility and evolution.
- Tables in Apache Hive can be partitioned and bucketed (as sketched after this list).
- JDBC/ODBC drivers are available, so Hive can be used from standard client and BI tools.
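A sketch of what the partitioning and bucketing mentioned above looks like as Hive DDL, submitted over the same kind of PyHive connection as before. The table, columns, partition key, and bucket count are all hypothetical.

```python
# Sketch: create a partitioned, bucketed Hive table via HiveServer2.
# Table name, columns, and bucket count are assumptions for illustration.
from pyhive import hive

conn = hive.connect(host="localhost", port=10000)
cursor = conn.cursor()

# Partition by day (one directory per dt value) and hash-bucket rows
# by user_id into 16 buckets, stored as ORC files.
cursor.execute(
    "CREATE TABLE IF NOT EXISTS page_views ("
    "  user_id BIGINT,"
    "  url STRING"
    ") "
    "PARTITIONED BY (dt STRING) "
    "CLUSTERED BY (user_id) INTO 16 BUCKETS "
    "STORED AS ORC"
)

cursor.close()
conn.close()
```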
Spark SQL:
- It performs SQL queries on structured data.
- Through Spark SQL, it is possible to read data from an existing Hive installation.
- When Spark SQL is used from another programming language, the result comes back as a Dataset/DataFrame (see the sketch after this list).
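A minimal sketch of the last two points: querying an existing Hive installation from Spark SQL and getting the result back as a DataFrame. It assumes Spark was built with Hive support and can reach the Hive metastore, and the `page_views` table is again hypothetical.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("spark-sql-on-hive")
    .enableHiveSupport()   # connect to the existing Hive metastore
    .getOrCreate()
)

# spark.sql returns a DataFrame, which can be transformed further
# with the DataFrame API or collected to the driver.
views_by_day = spark.sql(
    "SELECT dt, COUNT(*) AS views FROM page_views GROUP BY dt"
)
views_by_day.show()

spark.stop()
```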
Limitations
Apache Hive:
- It does not offer real-time queries or row-level updates.
- It provides acceptable, but not optimal, latency for interactive data browsing.
- Hive does not support online transaction processing (OLTP).
- The latency of Hive queries is generally very high.
Spark SQL:
- It does not support the union type.
- No error is raised for oversized values of the varchar type.
- It does not support transactional tables.
- It does not support the char type.
- It does not support timestamps in Avro tables.
Conclusion
Hence, we cannot say that Spark SQL is a replacement for Hive, nor the other way around. We have seen that Spark SQL is more Spark-API and developer friendly, and that SQL makes programming in Spark easier, while Hive's ability to switch execution engines makes it efficient for querying huge datasets. In the end, which one to use depends entirely on our goals. Apart from that, we have discussed the usage as well as the limitations of both above.