How to replace null values in Spark DataFrame

+1 vote


I want to remove null values from a CSV file, so I tried the following.

val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("/usr/local/spark/cars.csv")

After loading the file, the DataFrame looks as shown below. Now I want to remove the null values.
image

So I did this:

df.na.fill("e",Seq("blank"))
But the null values didn't change. Can anyone help me?

May 31, 2018 in Apache Spark by kurt_cobain
• 9,350 points

edited Dec 15, 2020 by MD 75,351 views

8 answers to this question.

0 votes

This is simple: you need to assign the result to a new DataFrame. I'm using the DataFrame df that you defined earlier.

val newDf = df.na.fill("e",Seq("blank"))

DataFrames are immutable structures. Each time you perform a transformation whose result you want to keep, you need to assign the transformed DataFrame to a new value.
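
A minimal sketch of this point, assuming Spark 2.x or later, a local SparkSession, and a small made-up DataFrame standing in for cars.csv (the object name, the sample rows, and the nullCounts helper are hypothetical). It shows that calling fill() without keeping its result leaves df unchanged:

```scala
import org.apache.spark.sql.SparkSession

object FillImmutabilityDemo {
  // Returns (nulls left in df, nulls left in newDf) for the "blank" column.
  def nullCounts(spark: SparkSession): (Long, Long) = {
    import spark.implicits._

    // Hypothetical stand-in for cars.csv: one row has a null in "blank".
    val df = Seq[(String, Option[String])](
      ("ford", None),
      ("bmw", Some("x"))
    ).toDF("make", "blank")

    // The result of this call is discarded, so df itself is not modified.
    df.na.fill("e", Seq("blank"))

    // Keeping the returned DataFrame is what makes the change visible.
    val newDf = df.na.fill("e", Seq("blank"))

    (df.filter($"blank".isNull).count(), newDf.filter($"blank".isNull).count())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("fill-demo").getOrCreate()
    val (before, after) = nullCounts(spark)
    println(s"nulls in df: $before, nulls in newDf: $after")
    spark.stop()
  }
}
```
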


answered May 31, 2018 by nitinrawat895
• 11,380 points
0 votes
val map = Map("comment" -> "a", "blank" -> "a2")

df.na.fill(map).show()
answered Dec 10, 2018 by Sute
0 votes
val df1 = df.na.fill("e", Seq("blank"))
answered Dec 10, 2018 by Shanti
0 votes
// Java API
String[] colNames = {"NameOfColumn"};
dataframe = dataframe.na().fill("ValueToBeFilled", colNames);
answered Dec 10, 2018 by Sada
0 votes
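import org.apache.spark.sql.functions.udf

// Returns None for a null input instead of throwing a NullPointerException
def isEvenOption(n: Integer): Option[Boolean] = {
  val num = Option(n).getOrElse(return None)
  Some(num % 2 == 0)
}

val isEvenOptionUdf = udf[Option[Boolean], Integer](isEvenOption)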

Source: Dealing with null in Spark

answered Dec 10, 2018 by Mohan
To remove the rows that contain nulls, use drop():

DF.na.drop()
  .show(false)

drop() removes every row of the DF that contains a null value.
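
As a hedged sketch of the drop() variants, assuming Spark 2.x or later and a made-up three-row DataFrame (the object name, column names, and the rowCounts helper are hypothetical): drop() with no arguments removes a row if any column is null, drop("all") removes it only if every column is null, and drop(Seq(...)) checks only the listed columns.

```scala
import org.apache.spark.sql.SparkSession

object NaDropDemo {
  // Returns the row counts after drop(), drop("all") and drop(Seq("make")).
  def rowCounts(spark: SparkSession): (Long, Long, Long) = {
    import spark.implicits._

    // Hypothetical data: one partly null row, one fully null row, one complete row.
    val df = Seq[(Option[String], Option[String])](
      (Some("ford"), None),
      (None, None),
      (Some("bmw"), Some("x"))
    ).toDF("make", "blank")

    val anyNull  = df.na.drop()            // drops rows with a null in any column
    val allNull  = df.na.drop("all")       // drops rows where every column is null
    val makeNull = df.na.drop(Seq("make")) // drops rows with a null in "make" only

    (anyNull.count(), allNull.count(), makeNull.count())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("drop-demo").getOrCreate()
    println(rowCounts(spark))
    spark.stop()
  }
}
```
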
0 votes

Hi, I hope this helps. You can set the nullValue option when reading the file. It tells the CSV reader which string in the file should be treated as null.

option("nullValue","defaultvalue")

val df = sqlContext.read.format("com.databricks.spark.csv").option("nullValue","defaultvalue").option("header", "true").load("/usr/local/spark/cars.csv")

answered Feb 5, 2019 by Srinivasreddy
• 140 points
Sir, can you please explain this code?
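
To illustrate what the option does, here is a small sketch, assuming Spark 2.x or later with the built-in CSV reader. The temporary file, the "NA" marker, and the object and helper names are made up for the example. Cells equal to the nullValue string are read as null. They are not replaced with a default, so a fill() or drop() call is still needed afterwards.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object NullValueOptionDemo {
  // Writes a tiny CSV file and returns how many "blank" cells are read as null.
  def nullsInBlank(spark: SparkSession): Long = {
    // Hypothetical file contents: "NA" marks a missing value.
    val path = Files.createTempFile("cars", ".csv")
    Files.write(path, "make,blank\nford,NA\nbmw,x\n".getBytes(StandardCharsets.UTF_8))

    val df = spark.read
      .option("header", "true")
      .option("nullValue", "NA") // cells containing exactly "NA" become null
      .csv(path.toString)

    df.filter(df("blank").isNull).count()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("nullvalue-demo").getOrCreate()
    println(nullsInBlank(spark))
    spark.stop()
  }
}
```
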
0 votes
In Spark 2.x with PySpark, you can call df.dropna() directly to drop rows that contain nulls from the DataFrame. In Scala, the equivalent is df.na.drop().
answered Mar 29, 2020 by gaurav
0 votes

Hi,

In Spark, the fill() function of the DataFrameNaFunctions class replaces NULL values in DataFrame columns with zero (0), an empty string, a space, or any other constant literal value.

// Replace nulls in all numeric columns with 0
df.na.fill(0)
  .show(false)

// Replace nulls only in specific columns
df.na.fill(0, Array("population"))
  .show(false)
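
One detail worth checking against the original question: fill() only touches columns whose type matches the replacement value, so a string value such as "e" has no effect on a numeric column. The sketch below assumes Spark 2.x or later and a made-up two-column DataFrame (the object name, the sample rows, and the remainingNulls helper are hypothetical):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object FillByTypeDemo {
  // Hypothetical sample: a nullable string column and a nullable integer column.
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq[(Option[String], Option[Int])](
      (Some("Pune"), None),
      (None, Some(100))
    ).toDF("city", "population")
  }

  // Counts the rows that still have a null in either column.
  def remainingNulls(df: DataFrame): Long =
    df.filter(df("city").isNull || df("population").isNull).count()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("fill-types").getOrCreate()
    val df = sample(spark)

    df.na.fill(0).show(false)         // fills "population" only; "city" keeps its null
    df.na.fill("unknown").show(false) // fills "city" only; "population" keeps its null

    // A Map gives each column its own replacement value of the matching type.
    df.na.fill(Map("city" -> "unknown", "population" -> 0)).show(false)

    spark.stop()
  }
}
```
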
answered Dec 15, 2020 by MD
• 95,460 points
