I am new to Hadoop and programming, and I am a little confused about Avro schema evolution. I will explain what I understand about Avro so far.
Avro is a serialization tool that stores binary data with its JSON schema at the top of the file. The schema looks like this:
{
"namespace":"com.trese.db.model",
"type":"record",
"doc":"This Schema describes about Product",
"name":"Product",
"fields":[
{"name":"product_id","type": "long"},
{"name":"product_name","type": "string","doc":"This is the name of the product"},
{"name":"cost","type": "float", "aliases":["price"]},
{"name":"discount","type": "float", "default":5}
]
}
Now my question is: why do we need schema evolution? I have read that we can use a default in the schema for new fields, but if we add a new schema to the file, the earlier schema will be overwritten. We cannot have two schemas for a single file.
Another question is: what are reader and writer schemas, and how do they help?
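To make both questions concrete, here is a toy sketch in plain Python of how I picture an old record being read with a newer schema. It does not use the Avro library at all. The `resolve` helper, the older schema without `discount`, and the sample record are all hypothetical, made up only to illustrate my understanding that a reader field is matched by name or alias and that a missing field is filled from its default.

```python
import json

# Assumed older schema the data was written with (no "discount", field called "price").
WRITER_SCHEMA = json.loads("""
{"type": "record", "name": "Product", "fields": [
  {"name": "product_id", "type": "long"},
  {"name": "product_name", "type": "string"},
  {"name": "price", "type": "float"}
]}
""")

# Newer schema used by the reading application (fields as in my schema above).
READER_SCHEMA = json.loads("""
{"type": "record", "name": "Product", "fields": [
  {"name": "product_id", "type": "long"},
  {"name": "product_name", "type": "string"},
  {"name": "cost", "type": "float", "aliases": ["price"]},
  {"name": "discount", "type": "float", "default": 5}
]}
""")


def resolve(record, writer_schema, reader_schema):
    """Hypothetical helper: view a record written with writer_schema through reader_schema."""
    written = {f["name"] for f in writer_schema["fields"]}
    result = {}
    for field in reader_schema["fields"]:
        # A reader field matches a writer field by its name or by one of its aliases.
        candidates = [field["name"]] + field.get("aliases", [])
        match = next((n for n in candidates if n in written), None)
        if match is not None:
            result[field["name"]] = record[match]
        elif "default" in field:
            # Field is new in the reader schema: fall back to its default.
            result[field["name"]] = field["default"]
        else:
            raise ValueError(f"no value or default for field {field['name']!r}")
    return result


old_record = {"product_id": 1, "product_name": "pen", "price": 2.5}
new_view = resolve(old_record, WRITER_SCHEMA, READER_SCHEMA)
print(new_view)
# {'product_id': 1, 'product_name': 'pen', 'cost': 2.5, 'discount': 5}
```

If this picture is right, the old file keeps its own schema and the new schema lives only in the reading application, so nothing in the file is overwritten. Is that what reader and writer schemas mean?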