Trending questions in Apache Spark

0 votes
1 answer

Invalid syntax in Spark

There's a problem with your syntax. There ...

Jan 31, 2019 in Apache Spark by Omkar
• 69,220 points
2,130 views
0 votes
1 answer

Changing port for Block Managers

By default, the port of which the ...

Mar 10, 2019 in Apache Spark by Siri
494 views
0 votes
1 answer

Why is Spark map output compressed?

Spark thinks that it is a good ...

Feb 24, 2019 in Apache Spark by Wasim
1,083 views
0 votes
1 answer

Companion objects in Scala

When a singleton object is named the ...

Feb 24, 2019 in Apache Spark by Uma
1,069 views
0 votes
1 answer

Where can I get spark-terasort.jar, not the .scala file, to run Spark TeraSort on Windows?

Hi! I found 2 links on GitHub where ...

Feb 13, 2019 in Apache Spark by Omkar
• 69,220 points
1,343 views
0 votes
1 answer

Unresolved dependency issue on sbt package command

Check if you are able to access ...

Jan 3, 2019 in Apache Spark by Omkar
• 69,220 points
2,970 views
0 votes
2 answers

How to use RDD filter with other function?

val x = sc.parallelize(1 to 10, 2)   // ...

Aug 17, 2018 in Apache Spark by zombie
• 3,790 points
9,676 views
0 votes
1 answer

Changing column position in a Spark DataFrame

Yes, you can reorder the DataFrame columns. You need ...

Apr 19, 2018 in Apache Spark by Ashish
• 2,650 points
13,767 views
0 votes
1 answer

Error using double map.

You have forgotten to mention the case ...

Feb 11, 2019 in Apache Spark by Omkar
• 69,220 points
733 views
+1 vote
1 answer

Spark interview

Preparing for an interview? We have something ...

Feb 7, 2019 in Apache Spark by Edureka
• 2,960 points
818 views
0 votes
1 answer

Error while using Spark SQL filter API

You have to use "===" instead of ...

Feb 4, 2019 in Apache Spark by Omkar
• 69,220 points
804 views
0 votes
1 answer

Query regarding Spark split logic

First, import the data in Spark and ...

Feb 9, 2019 in Apache Spark by Omkar
• 69,220 points
521 views
0 votes
1 answer

Languages supported by Apache Spark?

Apache Spark supports the following four languages: Scala, ...

Sep 3, 2018 in Apache Spark by nitinrawat895
• 11,380 points
7,169 views
–1 vote
1 answer

Not able to use sc in Spark shell

Seems like master and worker are not ...

Jan 3, 2019 in Apache Spark by Omkar
• 69,220 points
1,760 views
0 votes
1 answer

How to get ID of a map task in Spark?

You can access task information using TaskContext: import org.apache.spark.TaskContext sc.parallelize(Seq[Int](), ...

Nov 20, 2018 in Apache Spark by Frankie
• 9,830 points
3,457 views
–1 vote
1 answer

Deciding the number of SparkContext objects

How many spark context objects you should ...

Jan 16, 2019 in Apache Spark by Omkar
• 69,220 points
775 views
0 votes
1 answer

Spark and Scala auxiliary constructor doubt

println("Slayer") is an anonymous block and gets ...

Jan 8, 2019 in Apache Spark by Omkar
• 69,220 points
739 views
0 votes
1 answer

Is there an API for implementing graphs in Spark?

GraphX is the Spark API for graphs and ...

Jan 5, 2019 in Apache Spark by Frankie
• 9,830 points
741 views
0 votes
1 answer

How to add third-party Java JARs for use in PySpark?

You can add external JARs as arguments ...

Jul 4, 2018 in Apache Spark by nitinrawat895
• 11,380 points

edited Nov 19, 2021 by Sarfaraz
8,694 views
0 votes
1 answer

How to open/stream .zip files through Spark?

You can try and check this below ...

Nov 20, 2018 in Apache Spark by Frankie
• 9,830 points
2,489 views
0 votes
1 answer

Filter, Option or FlatMap in Spark

If, for option 2, you mean have ...

Nov 9, 2018 in Apache Spark by Frankie
• 9,830 points
2,752 views
+1 vote
2 answers

Apache Spark vs Apache Spark 2

Spark 2 doesn't differ much architecture-wise from ...

Apr 24, 2018 in Apache Spark by kurt_cobain
• 9,350 points
9,182 views
+1 vote
1 answer

How can I write a text file to HDFS, not from an RDD, in a Spark program?

Yes, you can go ahead and write ...

May 29, 2018 in Apache Spark by Shubham
• 13,490 points
8,459 views
0 votes
1 answer

Is 'sparkline' a method?

I suggest you check 2 things: that jquery.sparkline.js is actually ...

Nov 9, 2018 in Apache Spark by Frankie
• 9,830 points
1,275 views
0 votes
1 answer

How can I minimize data transfers when working with Spark?

Minimizing data transfers and avoiding shuffling helps ...

Sep 19, 2018 in Apache Spark by zombie
• 3,790 points
3,118 views
0 votes
1 answer

How to find max value in pair RDD?

Use Array.maxBy method: val a = Array(("a",1), ("b",2), ...

May 26, 2018 in Apache Spark by nitinrawat895
• 11,380 points
7,987 views
0 votes
1 answer

What are the levels of parallelism in Spark Streaming?

> In order to reduce the processing ...

Jul 27, 2018 in Apache Spark by zombie
• 3,790 points
4,887 views
0 votes
1 answer

When running Spark on YARN, do I need to install Spark on all nodes of the YARN cluster?

No, it is not necessary to install ...

Jun 14, 2018 in Apache Spark by nitinrawat895
• 11,380 points
6,259 views
0 votes
1 answer

Difference between SparkContext, JavaSparkContext, SQLContext, and SparkSession?

Yes, there is a difference between the ...

Jul 4, 2018 in Apache Spark by nitinrawat895
• 11,380 points
5,323 views
0 votes
1 answer

Is there any way to check the Spark version?

There are 2 ways to check the ...

Apr 19, 2018 in Apache Spark by nitinrawat895
• 11,380 points
8,554 views
0 votes
1 answer

Internal working of Spark

Spark revolves around the concept of a ...

Oct 11, 2018 in Apache Spark by nitinrawat895
• 11,380 points
921 views
+1 vote
2 answers

Hadoop 3 compatibility with older versions of Hive, Pig, Sqoop and Spark

Hadoop 3 is not widely used in ...

Apr 20, 2018 in Apache Spark by kurt_cobain
• 9,350 points
5,939 views
0 votes
1 answer

Persistence Levels in Spark

Spark has various persistence levels to store ...

Jun 8, 2018 in Apache Spark by kurt_cobain
• 9,350 points
6,034 views
0 votes
1 answer

Efficient way to read specific columns from a Parquet file in Spark

As Parquet is a column-based storage ...

Apr 20, 2018 in Apache Spark by kurt_cobain
• 9,350 points
7,860 views
0 votes
1 answer

In what kind of use cases has Spark outperformed Hadoop in processing?

I can list some but there can ...

Sep 19, 2018 in Apache Spark by zombie
• 3,790 points
1,130 views
0 votes
1 answer

What happens to RDD when one of the nodes goes down?

Whenever a node goes down, Spark knows ...

Sep 3, 2018 in Apache Spark by nitinrawat895
• 11,380 points
1,814 views
0 votes
1 answer

How to stop INFO messages displaying on Spark console?

Just do the following: Edit your conf/log4j.properties file ...

Aug 21, 2018 in Apache Spark by nitinrawat895
• 11,380 points
2,353 views
0 votes
1 answer

When not to use foreachPartition and mapPartitions?

With mapPartitions() or foreachPartition(), you can only ...

Apr 30, 2018 in Apache Spark by Data_Nerd
• 2,390 points
7,159 views
0 votes
1 answer

Does Spark provide the storage layer too?

No, it doesn’t provide a storage layer but ...

Sep 3, 2018 in Apache Spark by nitinrawat895
• 11,380 points
1,533 views
0 votes
2 answers

Which cluster type should I choose for Spark?

Spark is agnostic to the underlying cluster ...

Aug 21, 2018 in Apache Spark by zombie
• 3,790 points
2,023 views
0 votes
1 answer

Functions of Spark SQL?

Spark SQL is capable of: Loading data from ...

Sep 3, 2018 in Apache Spark by nitinrawat895
• 11,380 points
1,508 views
0 votes
1 answer

Ways to create RDD in Apache Spark

There are two popular ways using which ...

Jun 19, 2018 in Apache Spark by nitinrawat895
• 11,380 points
4,061 views
0 votes
1 answer

What do we mean by an RDD in Spark?

The full form of RDD is a ...

Jun 18, 2018 in Apache Spark by nitinrawat895
• 11,380 points
4,047 views
+1 vote
1 answer

Getting null values in Spark DataFrame while reading data from HBase

Can you share the screenshots for the ...

Jul 31, 2018 in Apache Spark by kurt_cobain
• 9,350 points
2,319 views
0 votes
1 answer

What is the difference between Apache Spark SQLContext vs HiveContext?

Spark 2.0+: Spark 2.0 provides native window functions ...

May 26, 2018 in Apache Spark by nitinrawat895
• 11,380 points
4,598 views
+1 vote
3 answers

Which cluster type should I choose for Spark?

In my opinion, start with a standalone ...

Jun 27, 2018 in Apache Spark by nitinrawat895
• 11,380 points
1,570 views
0 votes
1 answer

How to stop messages from being displayed on Spark console?

In your log4j.properties file you need to ...

Apr 24, 2018 in Apache Spark by kurt_cobain
• 9,350 points
5,558 views
0 votes
1 answer

How is RDD in Spark different from Distributed Storage Management? Can anyone help me with this?

Some of the key differences between an RDD and ...

Jul 26, 2018 in Apache Spark by zombie
• 3,790 points
1,520 views
0 votes
1 answer

What makes Spark faster than MapReduce?

Let's first look at mapper-side differences. Map ...

Jul 27, 2018 in Apache Spark by Neha
• 6,300 points
1,488 views
0 votes
1 answer

How to convert an RDD object to a DataFrame in Spark

SQLContext has a number of createDataFrame methods ...

May 30, 2018 in Apache Spark by nitinrawat895
• 11,380 points
3,954 views