Create dataframe for Avro file

Question

I have an Avro file and want to run some operations on it. Is it possible to work with Avro files using a DataFrame?

score 0 · Answer 1 · Jul 22, 2019

Yes, you can work with Avro files using DataFrames. The easiest way to work with Avro data files in Spark applications is the DataFrame API. Importing the Databricks spark-avro library adds an avro method to the DataFrame reader and writer obtained from SQLContext, so Avro files can be read and written directly:

Scala Example with Function

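import org.apache.spark.sql.SQLContext
import com.databricks.spark.avro._

// sc is an existing SparkContext (for example, the one provided by spark-shell)
val sqlContext = new SQLContext(sc)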

// The Avro records are converted to Spark types, filtered, and
// then written back out as Avro records
val df = sqlContext.read.avro("input_dir")
df.filter("age > 5").write.avro("output_dir")

You can also specify "com.databricks.spark.avro" in the format method:

Scala Example with Format

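import org.apache.spark.sql.SQLContext

// The com.databricks.spark.avro._ import is not needed here, because the
// data source is selected by name; sc is an existing SparkContext
val sqlContext = new SQLContext(sc)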
val df = sqlContext.read.format("com.databricks.spark.avro").load("input_dir")
df.filter("age > 5").write.format("com.databricks.spark.avro").save("output_dir")
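On Spark 2.4 and later, Avro support ships as a built-in external module, and the short format name "avro" can be used instead of the Databricks package name. The following is a minimal sketch of the same filter job under that assumption: it presumes the spark-avro module is on the classpath (for example via --packages), and the application name "AvroExample" is a hypothetical placeholder.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: Spark 2.4+ started with the spark-avro module available, e.g.
//   --packages org.apache.spark:spark-avro_2.12:<your Spark version>
// "AvroExample" is an illustrative application name, not from the original answer.
val spark = SparkSession.builder().appName("AvroExample").getOrCreate()

// Read the Avro files, keep rows with age > 5, and write the result back as Avro
val df = spark.read.format("avro").load("input_dir")
df.filter("age > 5").write.format("avro").save("output_dir")
```

In spark-shell a SparkSession named spark already exists, so the builder line can be skipped there.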