Spark2-submit does not generate output file

0 votes

Hi.

Attached are my Scala code, my build.sbt, and the spark2-submit output. The job does not generate an output file for me. Also, where would the output file be created if I ran the same code from spark-shell?

/* code works fine */
import org.apache.spark.sql.types._
import org.apache.spark.storage.StorageLevel
import scala.io.Source
import scala.collection.mutable.HashMap
import java.io.File
import org.apache.spark.sql.Row
import scala.collection.mutable.ListBuffer
import org.apache.spark.util.IntParam
import org.apache.spark.util.StatCounter
import org.apache.spark.rdd._
import org.apache.spark.SparkContext._
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.{SparkConf, SparkContext}

object module5_sol32 {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("mod5sol")
    val sc: SparkContext = new SparkContext(conf)
    val sqlContext: SQLContext = new SQLContext(sc)

    /*val file1 = sc.textFile("AppleStore.csv")
    file1.flatMap(line => line.split(",")).map(word => (word, 1)).reduceByKey(_ + _).saveAsTextFile("Ans1")
    */

    val df: DataFrame = sqlContext.read.format("csv").option("header", "true").load("AppleStore.csv")
    df.registerTempTable("Apple5")
    var dfsize = sqlContext.sql("select size_bytes Size, (size_bytes/1024) In_MB, ((size_bytes/1024)/1024) In_GB from Apple5").write.format("csv").save("Ans3AppleStore.csv")
  }
}


/* sbt package */
[edureka_400169@ip-20-0-41-190 wordpro]$ cat build.sbt
name := "sparkLearning"

version := "1.0"

scalaVersion := "2.11.8"

val sparkVersion = "1.6.1"

libraryDependencies ++= Seq(

"org.apache.spark" % "spark-core_2.10" % sparkVersion,

"org.apache.spark" % "spark-sql_2.10" % sparkVersion

)

-------------------- spark2-submit --------------------
spark2-submit --class module5_sol32 --deploy-mode client --master yarn /mnt/home/edureka_400169/wordpro/target/scala-2.11/sparklearning_2.11-1.0.jar
18/12/20 22:54:45 INFO spark.SparkContext: Running Spark version 2.1.0.cloudera2
18/12/20 22:54:46 INFO spark.SecurityManager: Changing view acls to: edureka_400169
18/12/20 22:54:46 INFO spark.SecurityManager: Changing modify acls to: edureka_400169
18/12/20 22:54:46 INFO spark.SecurityManager: Changing view acls groups to: 
18/12/20 22:54:46 INFO spark.SecurityManager: Changing modify acls groups to: 
18/12/20 22:54:46 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(edureka_400169); groups with view permissions: Set(); users  with modify permissions: Set(edureka_400169); groups with modify permissions: Set()
18/12/20 22:54:46 INFO util.Utils: Successfully started service 'sparkDriver' on port 40117.
18/12/20 22:54:46 INFO spark.SparkEnv: Registering MapOutputTracker
18/12/20 22:54:46 INFO spark.SparkEnv: Registering BlockManagerMaster
18/12/20 22:54:46 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
18/12/20 22:54:46 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
18/12/20 22:54:46 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-924c371b-e398-423f-87b8-ae4f2dbca78b
18/12/20 22:54:46 INFO memory.MemoryStore: MemoryStore started with capacity 100.8 MB
18/12/20 22:54:46 INFO spark.SparkEnv: Registering OutputCommitCoordinator
18/12/20 22:54:46 INFO spark.SparkContext: Added JAR file:/mnt/home/edureka_400169/wordpro/target/scala-2.11/sparklearning_2.11-1.0.jar at spark://20.0.41.190:40117/jars/sparklearning_2.11-1.0.jar with timestamp 1545346486544
18/12/20 22:54:47 INFO yarn.Client: Requesting a new application from cluster with 3 NodeManagers
18/12/20 22:54:47 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (4096 MB per container)
18/12/20 22:54:47 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
18/12/20 22:54:47 INFO yarn.Client: Setting up container launch context for our AM
18/12/20 22:54:47 INFO yarn.Client: Setting up the launch environment for our AM container
18/12/20 22:54:47 INFO yarn.Client: Preparing resources for our AM container
18/12/20 22:54:48 INFO yarn.Client: Uploading resource file:/tmp/spark-e36ee555-370a-4ee2-b140-ced8ee51e436/__spark_conf__3198462750408085652.zip -> hdfs://nameservice1/user/edureka_400169/.sparkStaging/application_1528714825862_69872/__spark_conf__.zip
18/12/20 22:54:48 INFO spark.SecurityManager: Changing view acls to: edureka_400169
18/12/20 22:54:48 INFO spark.SecurityManager: Changing modify acls to: edureka_400169
18/12/20 22:54:48 INFO spark.SecurityManager: Changing view acls groups to: 
18/12/20 22:54:48 INFO spark.SecurityManager: Changing modify acls groups to: 
18/12/20 22:54:48 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(edureka_400169); groups with view permissions: Set(); users  with modify permissions: Set(edureka_400169); groups with modify permissions: Set()
18/12/20 22:54:48 INFO yarn.Client: Submitting application application_1528714825862_69872 to ResourceManager
18/12/20 22:54:48 INFO impl.YarnClientImpl: Submitted application application_1528714825862_69872
18/12/20 22:54:48 INFO cluster.SchedulerExtensionServices: Starting Yarn extension services with app application_1528714825862_69872 and attemptId None
18/12/20 22:54:49 INFO yarn.Client: Application report for application_1528714825862_69872 (state: ACCEPTED)
18/12/20 22:54:49 INFO yarn.Client: 
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: root.default
         start time: 1545346391093
         final status: UNDEFINED
         tracking URL: http://ip-20-0-21-161.ec2.internal:8088/proxy/application_1528714825862_69872/
         user: edureka_400169
18/12/20 22:54:50 INFO yarn.Client: Application report for application_1528714825862_69872 (state: ACCEPTED)
18/12/20 22:54:50 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null)
18/12/20 22:54:50 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> ip-20-0-21-161.ec2.internal,ip-20-0-21-196.ec2.internal, PROXY_URI_BASES -> http://ip-20-0-21-161.ec2.internal:8088/proxy/application_1528714825862_69872,http://ip-20-0-21-196.ec2.internal:8088/proxy/application_1528714825862_69872), /proxy/application_1528714825862_69872
18/12/20 22:54:51 INFO yarn.Client: Application report for application_1528714825862_69872 (state: RUNNING)
18/12/20 22:54:51 INFO yarn.Client: 
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: 20.0.31.210
         ApplicationMaster RPC port: 0
         queue: root.default
         start time: 1545346391093
         final status: UNDEFINED
         tracking URL: http://ip-20-0-21-161.ec2.internal:8088/proxy/application_1528714825862_69872/
         user: edureka_400169
18/12/20 22:54:51 INFO cluster.YarnClientSchedulerBackend: Application application_1528714825862_69872 has started running.
18/12/20 22:54:51 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 38539.
18/12/20 22:54:51 INFO netty.NettyBlockTransferService: Server created on 20.0.41.190:38539
18/12/20 22:54:51 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/12/20 22:54:51 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 20.0.41.190, 38539, None)
18/12/20 22:54:51 INFO storage.BlockManagerMasterEndpoint: Registering block manager 20.0.41.190:38539 with 100.8 MB RAM, BlockManagerId(driver, 20.0.41.190, 38539, None)
18/12/20 22:54:51 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 20.0.41.190, 38539, None)
18/12/20 22:54:51 INFO storage.BlockManager: external shuffle service port = 7337
18/12/20 22:54:51 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, 20.0.41.190, 38539, None)
18/12/20 22:54:51 INFO util.log: Logging initialized @6779ms
18/12/20 22:54:51 INFO scheduler.EventLoggingListener: Logging events to hdfs://nameservice1/user/spark/applicationHistory/application_1528714825862_69872
18/12/20 22:54:54 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (20.0.31.210:55368) with ID 2
18/12/20 22:54:54 INFO storage.BlockManagerMasterEndpoint: Registering block manager ip-20-0-31-210.ec2.internal:33474 with 366.3 MB RAM, BlockManagerId(2, ip-20-0-31-210.ec2.internal, 33474, None)
18/12/20 22:54:54 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (20.0.31.210:55366) with ID 1
18/12/20 22:54:54 INFO storage.BlockManagerMasterEndpoint: Registering block manager ip-20-0-31-210.ec2.internal:34025 with 366.3 MB RAM, BlockManagerId(1, ip-20-0-31-210.ec2.internal, 34025, None)
18/12/20 22:54:54 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
18/12/20 22:54:54 INFO internal.SharedState: spark.sql.warehouse.dir is not set, but hive.metastore.warehouse.dir is set. Setting spark.sql.warehouse.dir to the value of hive.metastore.warehouse.dir ('/user/hive/warehouse').
18/12/20 22:54:54 INFO internal.SharedState: Warehouse path is '/user/hive/warehouse'
18/12/20 22:54:54 INFO hive.HiveUtils: Initializing HiveMetastoreConnection version 1.1.0 using file:/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/hadoop/../hive/lib/commons-logging-1.1.3.jar:... (2 to 3 pages of Hive/Hadoop classpath JARs omitted)
18/12/20 22:54:55 INFO hive.metastore: Connected to metastore.
18/12/20 22:54:55 INFO metadata.Hive: Registering function userdate UnixtimeToDate
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.DataFrameReader.load(Ljava/lang/String;)Lorg/apache/spark/sql/DataFrame;
        at module5_sol32$.main(m5sol3.scala:31)
        at module5_sol32.main(m5sol3.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:744)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
18/12/20 22:54:55 INFO spark.SparkContext: Invoking stop() from shutdown hook
18/12/20 22:54:55 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
18/12/20 22:54:55 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
18/12/20 22:54:55 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
18/12/20 22:54:55 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
 services=List(),
 started=false)
18/12/20 22:54:55 INFO cluster.YarnClientSchedulerBackend: Stopped
18/12/20 22:54:55 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/12/20 22:54:55 INFO memory.MemoryStore: MemoryStore cleared
18/12/20 22:54:55 INFO storage.BlockManager: BlockManager stopped
18/12/20 22:54:55 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
18/12/20 22:54:55 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/12/20 22:54:55 INFO spark.SparkContext: Successfully stopped SparkContext
18/12/20 22:54:55 INFO util.ShutdownHookManager: Shutdown hook called
18/12/20 22:54:55 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-e36ee555-370a-4ee2-b140-ced8ee51e436
[edureka_400169@ip-20-0-41-190 scala-2.11]$ 
 
Feb 24, 2019 in Apache Spark by Shri
4,860 views

1 answer to this question.

0 votes

To generate an output file, you can use the method saveAsTextFile(<hdfs_path>) on the RDD.

Refer to the example below:

Create the project skeleton -

Please follow the correct folder structure → then run sbt package to build the jar file required for spark-submit.

Project folder → { [ src → main → scala → source code.scala ]  [ build.sbt ] }
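
Assuming the folder and file names used in the steps below, the layout looks like this:

wordpro/
├── build.sbt
└── src/
    └── main/
        └── scala/
            └── wordpro.scala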

From the web console, run the commands below to create the project structure and add the source code and build file:

$ mkdir wordpro

$ cd wordpro

$ vi build.sbt   ==> add the build file

==========================================================

build.sbt

name := "WordcountFirstapp"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0"

==========================================================

$ mkdir src

$ cd src

$ mkdir main 

$ cd main 

$ mkdir scala

$ cd scala 

$ vi wordpro.scala

======================================================================

Add the code and save it:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD.rddToPairRDDFunctions

object WordCount {
  def main(args: Array[String]) = {
    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)

    val test = sc.textFile("hdfs:///user/edureka_361253/wordsam.txt")

    test.flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .saveAsTextFile("hdfs:///user/edureka_361253/sampleOp")

    sc.stop
  }
}

================================================================

Now build the project to create the jar file - sbt package

Go to the terminal → cd to the project folder → run sbt package

After the build, the target folder is created inside the project folder.

Once the build is finished, use the spark-submit command.

Syntax:

spark-submit --class <class/object name> --deploy-mode <deploy mode> --master <master> <complete jar path>

Use the command below:

spark2-submit --class WordCount --deploy-mode client --master yarn /mnt/home/edureka_361253/wordpro/target/scala-2.10/wordcountfirstapp_2.10-1.0.jar

Once executed, check the output folder where we saved the results.
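
Note that saveAsTextFile creates an output folder, not a single file: inside it there will be part-xxxxx files plus a _SUCCESS marker. Assuming the HDFS path used above, you can inspect it with, for example:

$ hdfs dfs -ls /user/edureka_361253/sampleOp

$ hdfs dfs -cat /user/edureka_361253/sampleOp/part-00000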

Note: If you try to run the same job again, it will throw an error because the output folder already exists. You would need to change the output folder name (or remove the existing folder) and rebuild.
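
For the DataFrame write in your original program, the same approach works once the build matches the cluster's Spark version. Your log shows the cluster running Spark 2.1.0 while your build.sbt pulls in Spark 1.6.1 (_2.10 artifacts with scalaVersion 2.11.8), which is what causes the NoSuchMethodError on DataFrameReader.load. Below is a minimal sketch of the Spark 2.x equivalent; it assumes build.sbt is switched to "org.apache.spark" %% "spark-sql" % "2.1.0" with scalaVersion := "2.11.8", and the HDFS output path is only an example - adjust it to your own directory:

import org.apache.spark.sql.SparkSession

object module5_sol32 {
  def main(args: Array[String]): Unit = {
    // Spark 2.x entry point (replaces SparkContext + SQLContext)
    val spark = SparkSession.builder().appName("mod5sol").getOrCreate()

    // Read the CSV with a header row, as in your code
    val df = spark.read.format("csv").option("header", "true").load("AppleStore.csv")
    df.createOrReplaceTempView("Apple5")

    // Same query as in the question, written out to an example HDFS path
    spark.sql("select size_bytes Size, (size_bytes/1024) In_MB, ((size_bytes/1024)/1024) In_GB from Apple5")
      .write.format("csv")
      .save("hdfs:///user/edureka_400169/Ans3AppleStore")

    spark.stop()
  }
}

Like saveAsTextFile, df.write...save() produces a folder of part files rather than a single CSV file.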

Hope this helps.

answered Feb 24, 2019 by Esha
