Read multiple xml files in Spark

Spark is used for streaming. Suppose there are 500 xml files. How to read 500 xml files in spark?

Jul 25, 2019 in Apache Spark by Zeba
• 5,831 views

1 answer to this question.

You can do this using globbing. See the Spark dataframeReader "load" method. Load can take a single path string, a sequence of paths, or no argument for data sources that don't have paths (i.e. not HDFS or S3 or other file systems).

val df = sqlContext.read.format("com.databricks.spark.xml")

.option("inferschema","true")

.option("rowTag", "address") //the root node of your xml to be treated as row

.load("/path/to/files/*.xml")

load can take a long string with comma separated paths

.load("/path/to/files/File1.xml, /path/to/files/File2.xml")

answered Jul 25, 2019 by Jack

Related Questions In Apache Spark

0 votes

1 answer

Efficient way to read specific columns from parquet file in spark

As parquet is a column based storage ...READ MORE

answered Apr 20, 2018 in Apache Spark by kurt_cobain
• 9,350 points • 8,210 views

0 votes

1 answer

Not able to preserve shuffle files in Spark

You lose the files because by default, ...READ MORE

answered Feb 24, 2019 in Apache Spark by Rana
• 1,676 views

0 votes

1 answer

Spark: Read from Hive, store in HDFS

Below is an example of reading data ...READ MORE

answered Jul 26, 2019 in Apache Spark by Lohit
• 2,943 views

+1 vote

1 answer

How to read a data from text file in Spark?

Hey, You can try this: from pyspark import SparkContext SparkContext.stop(sc) sc ...READ MORE

answered Aug 6, 2019 in Apache Spark by Gitika
• 65,770 points • 5,177 views

+1 vote

2 answers

How do I get number of columns in each line from a delimited file??

Instead of spliting on '\n'. You should ...READ MORE

answered Aug 7, 2019 in Apache Spark by ashish
• 5,905 views

+1 vote

1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 11,380 points • 11,383 views

0 votes

1 answer

hadoop.mapred vs hadoop.mapreduce?

org.apache.hadoop.mapred is the Old API org.apache.hadoop.mapreduce is the ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 11,380 points • 2,828 views

+2 votes

11 answers

hadoop fs -put command?

Hi, You can create one directory in HDFS ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by nitinrawat895
• 11,380 points • 110,931 views

+5 votes

11 answers

Concatenate columns in apache spark dataframe

its late but this how you can ...READ MORE

answered Mar 21, 2019 in Apache Spark by anonymous
• 73,003 views

+1 vote

1 answer

How can I write a text file in HDFS not from an RDD, in Spark program?

Yes, you can go ahead and write ...READ MORE

answered May 29, 2018 in Apache Spark by Shubham
• 13,490 points • 8,691 views

Subscribe to our Newsletter, and get personalized recommendations.

REGISTER FOR FREE WEBINAR

Thank you for registering Join Edureka Meetup community for 100+ Free Webinars each month JOIN MEETUP GROUP