Which performs better for joins: Spark SQL or the Dataset/DataFrame API?

I'm a bit curious: when I'm reading data from HBase and analyzing it with Spark, which one is faster, a Spark SQL join or a DataFrame join?
Apr 19, 2018 in Apache Spark by Ashish

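DataFrames and Spark SQL perform almost the same, although for analysis involving aggregation and sorting, Spark SQL had a slight advantage. The similarity is expected: a SQL query and the equivalent DataFrame operations are both turned into a logical plan and optimized by the same Catalyst optimizer before execution.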

Hope this helps

answered Apr 19, 2018 by kurt_cobain
