I ran the following script with two files as input, and the output was split into two files, part-m-00000 and part-m-00001. I can't understand why. Could someone please explain? Note: each file is only 8.2 MB.
-- Register the jar that contains the encryptField UDF
REGISTER PIG/PigUDF.jar;
-- Load both input files with the same schema (Pig string literals use single quotes)
A = LOAD 'PIG/HealthCare/Input/healthcare_Sample_dataset1.csv' USING PigStorage(',') AS (patientID:int, name:chararray, date:chararray, phoneNumber:chararray, eMail:chararray, SSN:chararray, gender:chararray, disease:chararray, age:chararray);
B = LOAD 'PIG/HealthCare/Input/healthcare_Sample_dataset2.csv' USING PigStorage(',') AS (patientID:int, name:chararray, date:chararray, phoneNumber:chararray, eMail:chararray, SSN:chararray, gender:chararray, disease:chararray, age:chararray);
-- Combine the two relations
C = UNION A, B;
-- Encrypt every field except patientID and age
D = FOREACH C GENERATE patientID, com.kamran.pig.udf.encryptField(name,'12345678abcdefgh'), com.kamran.pig.udf.encryptField(date,'12345678abcdefgh'), com.kamran.pig.udf.encryptField(phoneNumber,'12345678abcdefgh'), com.kamran.pig.udf.encryptField(eMail,'12345678abcdefgh'), com.kamran.pig.udf.encryptField(SSN,'12345678abcdefgh'), com.kamran.pig.udf.encryptField(gender,'12345678abcdefgh'), com.kamran.pig.udf.encryptField(disease,'12345678abcdefgh'), age;
STORE D INTO 'PIG/HealthCare/Output/HealthCareOutput.csv';