Converting a text file to ORC:
Using Spark, the text file is first loaded into a DataFrame, which is then written out in ORC format. Below is the Scala program.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.Row
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.types.{StructType, StructField, StringType}

object OrcConv {
  def main(args: Array[String]) {
    val conf = new SparkConf().setMaster("local").setAppName("OrcConv")
    val sc = new SparkContext(conf)
    val sqlContext = new HiveContext(sc)

    // Load the comma-delimited text file as an RDD of lines
    val file = sc.textFile("path")

    // Build the schema: two string columns, "name" and "age"
    val schemaString = "name age"
    val schema = StructType(schemaString.split(" ")
      .map(fieldName => StructField(fieldName, StringType, true)))

    // Parse each line into a Row that matches the schema
    val rowRDD = file.map(_.split(",")).map(p => Row(p(0), p(1)))

    // Create the DataFrame and write it out in ORC format
    val fileSchemaRDD = sqlContext.createDataFrame(rowRDD, schema)
    fileSchemaRDD.write.orc("path")
  }
}
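The two split calls do the real work in the program above: one turns the schema string into field names, the other turns each input line into column values. As a quick sanity check, here is what they produce in plain Scala, with no Spark required (the sample line "alice,30" is made up for illustration):

```scala
object ParseDemo {
  // Field names derived from the space-separated schema string
  val schemaString = "name age"
  val fieldNames: Array[String] = schemaString.split(" ")

  // A hypothetical input line in the same comma-delimited layout
  val line = "alice,30"
  val parts: Array[String] = line.split(",")

  def main(args: Array[String]): Unit = {
    println(fieldNames.mkString(","))   // name,age
    println(parts(0) + " " + parts(1))  // alice 30
  }
}
```

Each Row in the program is built by positionally pairing these parsed values with the schema fields, which is why the order of columns in schemaString must match the order of values in the file.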
To convert the text file to JSON instead, only one change is needed in the above program: replace the statement
fileSchemaRDD.write.orc("path")
with
fileSchemaRDD.write.json("path")
Hope this helps!!
If you want to learn more about Scala, consider joining a Scala certification course.
Thank you!