How do you load this multiline data in Spark as a single record?

TSV file data sample below.

Question:
I am attempting to load a TSV file into Spark; sample data is shown below. The issue is that the CallerAddress field, which is enclosed in double quotes, contains a line break, so the record is split across two lines. Notice the opening double quote in "101 ... on the first line and the closing double quote in STE 3305" on the next line. How do you load this in Spark as a single record?

---------------------------------------------

CID  CallLocation CallerLocation CallerAddress CallerCity CallerState CallerZip CallDateUTC  Status CallDuration  DateKey

211258030  GA, ATLANTA ATLANTA, GA "101  MARIETTA ST NW

STE 3305" ATLANTA GA 30303 2020-11-06 14:49:19 Answered 180 20201106
Nov 21, 2020 in Apache Spark by Ruben

1 answer to this question.

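Hi @Ruben,

I think you can add an escape option to get this working properly. You can set this option at the time of reading the file. Because the quoted field spans two lines, you will likely also need the multiLine option, so that Spark treats the line break inside the quotes as part of the field rather than as the end of the record.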
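As a rough illustration of setting such options at read time, here is a minimal PySpark-style sketch. It assumes the file is tab-delimited with a header row and that an existing SparkSession is passed in; the names TSV_OPTIONS and load_calls and the file name calls.tsv are made up for this example. The option keys (sep, header, quote, escape, multiLine) are standard options of Spark's CSV reader, but the exact values you need may differ for your data.

```python
# Reader options for a tab-separated file whose quoted fields may contain
# line breaks. The values below are assumptions about the sample file.
TSV_OPTIONS = {
    "sep": "\t",          # fields are separated by tabs
    "header": "true",     # first line holds the column names
    "quote": '"',         # CallerAddress is wrapped in double quotes
    "escape": '"',        # treat a doubled quote inside a quoted field as a literal quote
    "multiLine": "true",  # allow a quoted field to span several physical lines
}


def load_calls(spark, path):
    """Read the TSV at `path` using an existing SparkSession `spark`.

    `load_calls` is a hypothetical helper name used only in this sketch.
    """
    return spark.read.options(**TSV_OPTIONS).csv(path)


# Possible usage (requires pyspark; the path is hypothetical):
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.getOrCreate()
# df = load_calls(spark, "calls.tsv")
# df.select("CID", "CallerAddress").show(truncate=False)
```

With multiLine enabled, the line break stays inside the CallerAddress value, so you may still want to clean it up afterwards (for example with regexp_replace) if you need the address on one line.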
answered Nov 23, 2020 by MD
