Spark throws a stack overflow error when unioning a lot of RDDs

I get a stack overflow error when I union a lot of RDDs. It happens when I use "++" to combine a large number of RDDs.

This is how I build the combined RDD:

val collection = (for (
  path <- files
) yield sc.textFile(path)).reduce(_ union _)

Can anyone tell me how to resolve this?

Jul 31, 2019 in Apache Spark by Sunny
• 5,325 views

1 answer to this question.

Hey,

Use SparkContext.union(...) instead to union many RDDs at once.

You don't want to union them one at a time like that. Each call to RDD.union() adds a new step to the lineage (an extra set of stack frames on any computation), so reducing over many RDDs builds a lineage as deep as the number of RDDs, and walking it recursively can overflow the stack. SparkContext.union() combines all of the RDDs in a single step, so the lineage stays shallow and the stack overflow error should not occur.
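As a minimal sketch of that change, the snippet below builds one RDD per file and passes the whole sequence to SparkContext.union(). The object name, the local master setting, and the file paths are assumptions made for illustration; substitute your own `files` collection and cluster configuration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object UnionManyFiles {
  def main(args: Array[String]): Unit = {
    // Assumption: a local master for illustration; use your own configuration.
    val conf = new SparkConf().setAppName("union-many-files").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Hypothetical input paths; replace with the real `files` collection.
    val files: Seq[String] = Seq("data/part-0.txt", "data/part-1.txt")

    // One RDD per file, as in the question.
    val rdds: Seq[RDD[String]] = files.map(path => sc.textFile(path))

    // A single union over the whole sequence instead of reduce(_ union _),
    // so the lineage gains one step rather than one step per file.
    val collection: RDD[String] = sc.union(rdds)

    println(collection.count())
    sc.stop()
  }
}
```

If every input is a plain text file, another option is to skip the union entirely: sc.textFile also accepts a comma-separated list of paths, so `sc.textFile(files.mkString(","))` reads them all into one RDD.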


answered Jul 31, 2019 by Gitika
• 65,770 points
