How to change the Spark session configuration in PySpark

Question

I am trying to change the default configuration of my Spark session, but it is not working.

from pyspark.sql import SparkSession

spark_session = (
    SparkSession.builder
    .master("ip")
    .enableHiveSupport()
    .getOrCreate()
)

spark_session.conf.set("spark.executor.memory", '8g')
spark_session.conf.set('spark.executor.cores', '3')
spark_session.conf.set('spark.cores.max', '3')
spark_session.conf.set("spark.driver.memory",'8g')
sc = spark_session.sparkContext

But if I pass the configuration to spark-submit, it works fine for me.

spark-submit --master ip --executor-cores=3 --driver-memory 8G sample.py

Shubham · Answer 1 · May 29, 2018

Your code does not actually change the PySpark configuration. To see this, open the pyspark shell and check the current settings:

sc.getConf().getAll()

Now run your code and check the settings in the pyspark shell again; they will be unchanged.

You first have to create a configuration object, and then create the SparkContext from that object:

import pyspark

config = pyspark.SparkConf().setAll([('spark.executor.memory', '8g'), ('spark.executor.cores', '3'), ('spark.cores.max', '3'), ('spark.driver.memory', '8g')])
sc.stop()  # stop the context the shell created before building a new one
sc = pyspark.SparkContext(conf=config)

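After that, the new settings apply to the recreated context. One caveat: in client mode, spark.driver.memory cannot be changed this way, because the driver JVM is already running by then. Pass it with --driver-memory on the command line or set it in the default properties file instead.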

Thanks.

answered May 29, 2018 by Shubham

score 0 · Answer 2 · Dec 10, 2018

Adding to Shubham's answer: after updating the configuration, you have to stop the Spark session and create a new one.

spark.sparkContext.stop()

spark = SparkSession.builder.config(conf=conf).getOrCreate()
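
As a rough sketch of that stop-and-recreate step, the two lines above could be wrapped in a small helper. The names build_conf_pairs and restart_session are hypothetical, chosen only for illustration, and the pyspark imports are done lazily inside the function on the assumption that pyspark is installed wherever it is actually called.

```python
def build_conf_pairs(settings):
    """Turn a dict of settings into the (key, value) string pairs SparkConf.setAll accepts."""
    return [(key, str(value)) for key, value in settings.items()]


def restart_session(spark, settings):
    """Stop the given session and return a new one built from `settings`.

    Hypothetical helper: it assumes pyspark is installed and that `spark`
    is the currently active SparkSession.
    """
    # Imported here so build_conf_pairs stays usable without Spark installed
    from pyspark import SparkConf
    from pyspark.sql import SparkSession

    conf = SparkConf().setAll(build_conf_pairs(settings))
    spark.stop()  # also stops the underlying SparkContext
    return SparkSession.builder.config(conf=conf).getOrCreate()
```

It would be called as spark = restart_session(spark, {'spark.executor.memory': '8g', 'spark.executor.cores': 3}). Settings that are fixed when the driver JVM starts, such as spark.driver.memory in client mode, are not expected to change through this route.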

answered Dec 10, 2018 by Hilight

score 0 · Answer 3 · Dec 10, 2018

This should work, where conf1 is the SparkConf object holding your settings:

spark = SparkSession.builder.config(conf=conf1).getOrCreate()
sc = spark.sparkContext

answered Dec 10, 2018 by Shikar

score 0 · Answer 4 · Dec 10, 2018

You can load properties dynamically. First create a new, empty conf, and then pass your settings at run time:

from pyspark import SparkConf, SparkContext
sc = SparkContext(conf=SparkConf())

spark-submit --master ip --executor-cores=3 --driver-memory 8G sample.py
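
If the list of settings grows, one option is to generate the command rather than type it by hand. The sketch below is only an illustration: build_submit_command is a made-up helper name, "ip" is the placeholder master from the question, and it relies on spark-submit accepting arbitrary properties as --conf KEY=VALUE.

```python
import shlex


def build_submit_command(script, master, settings):
    """Build a spark-submit argument list with one --conf KEY=VALUE per setting."""
    command = ["spark-submit", "--master", master]
    for key, value in settings.items():
        command += ["--conf", f"{key}={value}"]
    command.append(script)
    return command


# The same four properties used in the question
settings = {
    "spark.executor.memory": "8g",
    "spark.executor.cores": "3",
    "spark.cores.max": "3",
    "spark.driver.memory": "8g",
}

command = build_submit_command("sample.py", "ip", settings)
print(shlex.join(command))  # a shell-ready command line
```

The returned list can also be handed straight to subprocess.run, which avoids shell quoting problems.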

answered Dec 10, 2018 by Vini

Gitika · Answer 5 · Dec 14, 2020

You aren't actually overwriting anything with this code. To see for yourself, try the following.

As soon as you start the pyspark shell, type:

sc.getConf().getAll()

This will show you all of the current config settings. Then try your code and do it again. Nothing changes.
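
To make the before-and-after comparison less error-prone than reading two long lists by eye, the two snapshots can be diffed. This is a minimal sketch: diff_settings is a hypothetical helper, and the sample lists are hand-written stand-ins for what sc.getConf().getAll() returns (a list of key/value pairs), not real Spark output.

```python
def diff_settings(before, after):
    """Compare two snapshots shaped like the output of sc.getConf().getAll().

    Returns {key: (old_value, new_value)} for every key whose value differs.
    A key missing from one snapshot appears with None on that side.
    """
    old, new = dict(before), dict(after)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(set(old) | set(new))
        if old.get(key) != new.get(key)
    }


# Illustrative snapshots; in the shell these would come from sc.getConf().getAll()
before = [("spark.executor.memory", "1g"), ("spark.master", "local[*]")]
after = [
    ("spark.executor.memory", "8g"),
    ("spark.master", "local[*]"),
    ("spark.executor.cores", "3"),
]

for key, (old_value, new_value) in diff_settings(before, after).items():
    print(f"{key}: {old_value} -> {new_value}")
```

An empty result means the configuration did not change between the two snapshots.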

What you should do instead is create a new configuration and use that to create a SparkContext. Do it like this:

import pyspark

conf = pyspark.SparkConf().setAll([('spark.executor.memory', '8g'), ('spark.executor.cores', '3'), ('spark.cores.max', '3'), ('spark.driver.memory', '8g')])
sc.stop()  # stop the context the shell created before building a new one
sc = pyspark.SparkContext(conf=conf)

Then you can check yourself just like above with: