Copy a file from local to HDFS from a Spark job in YARN mode

Question

How can I copy a file from the local filesystem to HDFS from a Spark job running in YARN mode? In other words, I am looking for the equivalent of the hdfs dfs -put command inside Spark. I have a file on the local filesystem that I need to preprocess, then put into HDFS, and then apply the transformation logic to.

score 0 · Answer 1 · Jul 16, 2019

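Please refer to the code below:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path

// Picks up core-site.xml / hdfs-site.xml from the classpath
val hadoopConf = new Configuration()
// Handle to the default filesystem (HDFS on a configured cluster)
val hdfs = FileSystem.get(hadoopConf)

// Source on the local filesystem, destination on HDFS
val srcPath = new Path("/home/edureka/Documents/data")
val destPath = new Path("hdfs:///tranferrred_data")

// Equivalent of hdfs dfs -put
hdfs.copyFromLocalFile(srcPath, destPath)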

You can include the snippet above in any Spark job, adapted to your requirements, and then use spark-submit to deploy the application to the cluster. Each time the application runs, the local data is copied to HDFS first, and you can then perform your transformations on it. Note that the copy runs on the driver, so the source path must exist on the machine where the driver runs; in YARN cluster mode that is a cluster node, not necessarily the machine you submitted from.
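If it helps, here is a slightly more defensive variant as a rough sketch. The object name LocalToHdfs and the helper copyLocalToFs are made up for illustration and are not part of any Hadoop or Spark API. It checks that the source exists on the host running the code, and it uses the four-argument copyFromLocalFile overload so an existing destination is overwritten on re-runs. The paths are the same ones used above.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object LocalToHdfs {

  // Hypothetical helper (not a Hadoop API): copies a local file or directory
  // to the given filesystem and returns the destination path.
  def copyLocalToFs(fs: FileSystem,
                    localSrc: String,
                    dest: String,
                    overwrite: Boolean = true): Path = {
    val src = new Path(localSrc)
    val dst = new Path(dest)

    // Fail early if the source is missing on the host where this code runs
    // (the driver, for a Spark job).
    val localFs = FileSystem.getLocal(fs.getConf)
    require(localFs.exists(src), s"Local source not found on this host: $localSrc")

    // delSrc = false keeps the local copy; overwrite controls whether an
    // existing destination is replaced.
    fs.copyFromLocalFile(false, overwrite, src, dst)
    dst
  }

  def main(args: Array[String]): Unit = {
    // Default filesystem from the Hadoop configuration on the classpath
    val fs = FileSystem.get(new Configuration())
    val dst = copyLocalToFs(fs, "/home/edureka/Documents/data", "hdfs:///tranferrred_data")
    println(s"Copied to $dst")
  }
}
```

Inside a Spark job you could pass FileSystem.get(spark.sparkContext.hadoopConfiguration) as the first argument instead of building a new Configuration, assuming spark is your SparkSession, so the copy uses the same Hadoop settings as the rest of the job.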

You might need the dependencies below:

libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.6.0"
libraryDependencies += "org.apache.commons" % "commons-io" % "1.3.2"
libraryDependencies += "org.apache.hadoop" % "hadoop-hdfs" % "2.6.0"