Why doesn't my Spark YARN client run on all available worker machines?

0 votes

I am running an application on a Spark cluster in YARN client mode with 4 nodes. Apart from the master node there are three worker nodes available, but Spark executes the application on only two of them. The workers are selected at random; there aren't any specific workers that get picked each time the application is run.

For the worker that is not being used, the following lines are printed in the logs:

INFO Client:54

     client token: N/A
      diagnostics: N/A
      ApplicationMaster host: 192.168.0.67
      ApplicationMaster RPC port: 0
      queue: default
      start time: 1550748030360
      final status: UNDEFINED
      tracking URL: http://aiserver:8088/proxy/application_1550744631375_0004/
      user: root

Here is the spark-submit command:

spark-submit --master yarn --class com.i2c.chprofiling.App App.jar --num-executors 4 --executor-cores 3 --conf "spark.locality.wait.node=0"
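One thing worth checking here: spark-submit treats everything after the primary application JAR as arguments to the main class, not as spark-submit options. In the command above, `--num-executors`, `--executor-cores`, and `--conf` come after `App.jar`, so Spark never sees them and falls back to defaults (typically 2 executors), which would explain only two workers being used. A corrected ordering would be:

```shell
# All spark-submit flags must come BEFORE the application JAR;
# anything after App.jar is passed to the main class as args.
spark-submit \
  --master yarn \
  --class com.i2c.chprofiling.App \
  --num-executors 4 \
  --executor-cores 3 \
  --conf "spark.locality.wait.node=0" \
  App.jar
```

With the flags in this position, YARN is actually asked for 4 executors and can spread containers across all three workers (resources permitting).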

Why doesn't my Spark Yarn client runs on all available worker machines?

Feb 22, 2019 in Apache Spark by Uzair Ahmad

edited Feb 22, 2019 by Omkar

Have you tried using the --deploy-mode cluster option?

I tried using cluster mode but I get the following exception:

diagnostics: Application application_1550748865132_0022 failed 2 times due to AM Container for appattempt_1550748865132_0022_000002 exited with  exitCode: 13
For more detailed output, check application tracking page:http://aiserver:8088/cluster/app/application_1550748865132_0022Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1550748865132_0022_02_000001
Exit code: 13
Stack trace: ExitCodeException exitCode=13:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
        at org.apache.hadoop.util.Shell.run(Shell.java:482)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)


Container exited with a non-zero exit code 13
Failing this attempt. Failing the application.
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: default
         start time: 1550819137278
         final status: FAILED
         tracking URL: http://aiserver:8088/cluster/app/application_1550748865132_0022
         user: root


Any help will be highly appreciated

Have you set the master in your code to be local?

new SparkConf().setMaster("local[*]")

No, it is set to "yarn".
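A common cause of exit code 13 in cluster mode is a master hard-coded in the driver that conflicts with what spark-submit passes in. As a minimal sketch (the app name is illustrative; only the class `com.i2c.chprofiling.App` appears in the thread), leaving the master out of the code entirely and letting `--master` / `--deploy-mode` on the command line decide avoids that conflict in both client and cluster mode:

```scala
import org.apache.spark.sql.SparkSession

object App {
  def main(args: Array[String]): Unit = {
    // Do not call .master(...) here; spark-submit's --master and
    // --deploy-mode flags control where the driver and executors run.
    val spark = SparkSession.builder()
      .appName("chprofiling")   // illustrative name
      .getOrCreate()

    // ... application logic ...

    spark.stop()
  }
}
```

This sketch needs the spark-sql dependency on the classpath; it is not runnable standalone.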

Try --master yarn-client. This fixes most occurrences of exit code 13.

Hi, I receive the above-mentioned exception only when using --deploy-mode cluster, as you suggested. In client mode I don't get this exception, but not all of my available worker nodes are being utilized. Please refer to the description.
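It may also help to confirm that all three NodeManagers are actually registered and healthy, and to compare the resources each one advertises: if two nodes can satisfy the requested executors, YARN has no reason to place containers on the third. A hedged diagnostic sketch (`<node-id>` is a placeholder taken from the listing output):

```shell
# List all NodeManagers with their state and available resources.
yarn node -list -all

# Inspect one node in detail (memory, vcores, health report).
yarn node -status <node-id>
```

If the unused worker shows as UNHEALTHY or is missing from the list, the scheduling behaviour is explained by the cluster state rather than by the spark-submit options.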
