Spark error throws stack overflow when union a lot

0 votes

Getting this error throws stackoverflow when union a lot of RDD. When I use "++" to combine a lot of RDDs, I got error stack over flow error.

This the the way I generated:

val collection = (for (
  path <- files
) yield sc.textFile(path)).reduce(_ union _)

Can anyone say how to resolve this?

Jul 31, 2019 in Apache Spark by Sunny
4,966 views

1 answer to this question.

0 votes

Hey,

Use SparkContext.union(...) instead to union many RDDs at once

You don't want to do it one at a time like that since RDD.union() creates a new step in the lineage (an extra set of stack frames on any computation) for each RDD, whereas SparkContext.union() makes it all at once. This will ensure not getting a stack overflow error.

Since RDD.union() creates a new step in the lineage (an extra set of stack frames on any computation) for each RDD, whereas SparkContext.union() makes it all at once. 

answered Jul 31, 2019 by Gitika
• 65,770 points

Related Questions In Apache Spark

0 votes
1 answer
0 votes
1 answer
0 votes
2 answers

Error : split value is not a member of org.apache.spark.sql.Row

var d=rdd2col.rdd.map(x=>x.split(",")) or val names=rd ...READ MORE

answered Aug 5, 2020 in Apache Spark by Ramkumar Ramasamy.
11,928 views
0 votes
1 answer

Error : split value is not a member of org.apache.spark.sql.Row

spark.read.csv is used when loading into a ...READ MORE

answered Jul 22, 2019 in Apache Spark by Firoz
3,085 views
+1 vote
2 answers
+1 vote
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 11,380 points
11,028 views
0 votes
1 answer

hadoop.mapred vs hadoop.mapreduce?

org.apache.hadoop.mapred is the Old API  org.apache.hadoop.mapreduce is the ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 11,380 points
2,535 views
+2 votes
11 answers

hadoop fs -put command?

Hi, You can create one directory in HDFS ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by nitinrawat895
• 11,380 points
108,830 views
+1 vote
1 answer

Error: value textfile is not a member of org.apache.spark.SparkContext

Hi, Regarding this error, you just need to change ...READ MORE

answered Jul 4, 2019 in Apache Spark by Gitika
• 65,770 points
4,358 views
0 votes
3 answers

Filtering a row in Spark DataFrame based on matching values from a list

Use the function as following: var notFollowingList=List(9.8,7,6,3,1) df.filter(col("uid").isin(notFollowingList:_*)) You can ...READ MORE

answered Jun 6, 2018 in Apache Spark by Shubham
• 13,490 points
92,729 views
webinar REGISTER FOR FREE WEBINAR X
REGISTER NOW
webinar_success Thank you for registering Join Edureka Meetup community for 100+ Free Webinars each month JOIN MEETUP GROUP