To generate the output file, you can use the method saveAsTextFile(<hdfs_path>).
The example below walks through the full workflow.
Create the project skeleton:
Follow the standard sbt folder structure, then run sbt package to build the jar file required by spark-submit.
Project folder → { [ src → main → scala → source code.scala ] | [ build.sbt ] }
From the web console, run the commands below to create the project structure and add the source code and build file.
$ mkdir wordpro
$ cd wordpro
$ vi build.sbt   ==> add the build file contents shown below
==========================================================
build.sbt
name := "WordcountFirstapp"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0"
==========================================================
$ mkdir src
$ cd src
$ mkdir main
$ cd main
$ mkdir scala
$ cd scala
$ vi wordpro.scala
======================================================================
Add the code below and save it:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD.rddToPairRDDFunctions

object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("WordCount")
    val sc = new SparkContext(conf)

    // Read the input file from HDFS, one element per line
    val test = sc.textFile("hdfs:///user/edureka_361253/wordsam.txt")

    test.flatMap(line => line.split(" "))  // split each line into words
      .map(word => (word, 1))              // pair each word with a count of 1
      .reduceByKey(_ + _)                  // sum the counts per word
      .saveAsTextFile("hdfs:///user/edureka_361253/sampleOp")  // write the result to HDFS

    sc.stop()
  }
}
================================================================
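As an aside, the flatMap / map / reduceByKey chain can be tried on an ordinary Scala collection first, without a cluster. The sketch below is not part of the job itself: LocalWordCount and countWords are made-up names for illustration, and groupBy stands in for reduceByKey, which plain Scala collections do not have.

```scala
// Illustration only: mirrors the steps of the Spark job on an in-memory Seq.
object LocalWordCount {
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" "))    // one element per word
      .map(word => (word, 1))              // pair each word with a count of 1
      .groupBy { case (word, _) => word }  // stand-in for reduceByKey: group the pairs by word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // add up the 1s for each word
}
```

For example, LocalWordCount.countWords(Seq("to be or", "not to be")) counts "to" and "be" twice each, and "or" and "not" once each.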
Now build the project to create the jar file with sbt package:
In the terminal, cd to the project folder (wordpro) and run sbt package.
After the build, the project and target folders are created inside the project folder.
Once the build has finished, run the job with spark-submit.
Syntax:
spark-submit --class <class/object name> --deploy-mode <xyz> --master <abc> <complete jar path>
Use the command below:
spark2-submit --class WordCount --deploy-mode client --master yarn /mnt/home/edureka_361253/wordpro/target/scala-2.10/wordcountfirstapp_2.10-1.0.jar
Once the job has run, check the output folder where the result was saved (hdfs:///user/edureka_361253/sampleOp).
Note: If you run the same job again, it will throw an error because the output folder already exists. Either delete the folder first (hdfs dfs -rm -r <hdfs_path>), or change the output folder name in the code and rebuild.
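One possible way to avoid editing the folder name by hand on every run is to build the output path at runtime. This is only a sketch: OutputPaths and uniqueOutputPath are hypothetical names, not part of Spark, and the timestamp suffix is just one naming choice.

```scala
// Hypothetical helper (an assumption, not a Spark API): appends a timestamp
// so that each run writes to a folder that does not exist yet.
object OutputPaths {
  def uniqueOutputPath(base: String, nowMillis: Long = System.currentTimeMillis()): String =
    s"${base}_$nowMillis"
}
```

In the job above, the last step would then read .saveAsTextFile(OutputPaths.uniqueOutputPath("hdfs:///user/edureka_361253/sampleOp")). The trade-off is that old output folders pile up in HDFS and have to be cleaned up separately.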
Hope this helps.