How to append data to a parquet file

Question

I am trying to append some data to my parquet file and for that, I'm using the following code:

ParquetWriter<GenericRecord> parquetWriter = new ParquetWriter<>(path, writeSupport, CompressionCodecName.SNAPPY, BLOCK_SIZE, PAGE_SIZE);

final GenericRecord record = new GenericData.Record(avroSchema);

parquetWriter.write(record);

But this creates a new file instead of appending to the existing one. How can I append to the existing file?

Omkar · Answer 1 · Jan 11, 2019

Try using the Spark API to append to the file. For example, in PySpark:

df.write.mode('append').parquet('parquet_data_file')


How can I achieve this using Java's ParquetWriter API?

commented Feb 4, 2020 by anonymous

It creates a second Parquet file; it does not append data to the existing one.

commented Mar 13, 2020 by anonymous

Hi,

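It will append the data. Are you saying that it creates new partitions? In append mode Spark adds new part files to the output directory rather than rewriting the existing ones, so seeing additional files is expected.

Follow the example below; it should give you some idea.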

val data = Seq(("James ","","Smith","36636","M",3000),
  ("Michael ","Rose","","40288","M",4000),
  ("Robert ","","Williams","42114","M",4000),
  ("Maria ","Anne","Jones","39192","F",4000),
  ("Jen","Mary","Brown","","F",-1)
);

val columns= Seq("firstname","middlename","lastname","dob","gender","salary");

import spark.sqlContext.implicits._

val df = data.toDF(columns:_*)

df.write.parquet("/user/people.parquet")

val parqDF = spark.read.parquet("/user/people.parquet")

parqDF.show()
+---------+----------+--------+-----+------+------+
|firstname|middlename|lastname|  dob|gender|salary|
+---------+----------+--------+-----+------+------+
|  Robert |          |Williams|42114|     M|  4000|
|   Maria |      Anne|   Jones|39192|     F|  4000|
|      Jen|      Mary|   Brown|     |     F|    -1|
|   James |          |   Smith|36636|     M|  3000|
| Michael |      Rose|        |40288|     M|  4000|
+---------+----------+--------+-----+------+------+

df.write.mode("append").parquet("/user/people.parquet")

val parqDF = spark.read.parquet("/user/people.parquet")

parqDF.show()
+---------+----------+--------+-----+------+------+
|firstname|middlename|lastname|  dob|gender|salary|
+---------+----------+--------+-----+------+------+
|  Robert |          |Williams|42114|     M|  4000|
|   Maria |      Anne|   Jones|39192|     F|  4000|
|      Jen|      Mary|   Brown|     |     F|    -1|
|  Robert |          |Williams|42114|     M|  4000|
|   Maria |      Anne|   Jones|39192|     F|  4000|
|      Jen|      Mary|   Brown|     |     F|    -1|
|   James |          |   Smith|36636|     M|  3000|
| Michael |      Rose|        |40288|     M|  4000|
|   James |          |   Smith|36636|     M|  3000|
| Michael |      Rose|        |40288|     M|  4000|
+---------+----------+--------+-----+------+------+
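
On the Java question above: a closed Parquet file cannot be extended in place, because its metadata lives in a footer that is written when the file is closed. As far as I know, the Java ParquetWriter has no append mode either, so the usual workaround mirrors what Spark does in the example: treat the path as a directory and write each new batch to a new part file. Below is a minimal sketch of the path-picking step using only the Java standard library. The class name ParquetAppendSketch, the helper nextPartFile, and the part-NNNNN.parquet naming scheme are assumptions made for illustration; they are not part of any Parquet API.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

// Hypothetical helper: picks a fresh part-file path inside a dataset directory.
class ParquetAppendSketch {

    private static final String PREFIX = "part-";
    private static final String SUFFIX = ".parquet";

    /** Returns the next unused part-NNNNN.parquet path in datasetDir (created if missing). */
    static Path nextPartFile(Path datasetDir) throws IOException {
        Files.createDirectories(datasetDir);
        int maxIndex = -1;
        try (DirectoryStream<Path> parts =
                 Files.newDirectoryStream(datasetDir, PREFIX + "*" + SUFFIX)) {
            for (Path p : parts) {
                String name = p.getFileName().toString();
                String digits = name.substring(PREFIX.length(), name.length() - SUFFIX.length());
                try {
                    maxIndex = Math.max(maxIndex, Integer.parseInt(digits));
                } catch (NumberFormatException ignored) {
                    // File uses a different naming scheme; leave it alone.
                }
            }
        }
        return datasetDir.resolve(String.format(Locale.ROOT, "part-%05d.parquet", maxIndex + 1));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("people.parquet");
        Path target = nextPartFile(dir);
        // A real program would open its Parquet writer on `target` here
        // and write the new batch of records before closing it.
        System.out.println("Next batch goes to: " + target);
    }
}
```

The returned path would then take the place of path in the ParquetWriter constructor from the question. Readers such as spark.read.parquet can load the whole directory as a single dataset, which is why the appended rows show up in the output above. Note that this sketch does nothing to coordinate concurrent writers.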