In a Spark DataFrame how can I flatten the struct

0 votes
Can anyone help me in understanding that how can I flatten the struct in Spark Data frame?
May 24, 2018 in Apache Spark by code799
6,171 views

2 answers to this question.

0 votes
You can go ahead and use the flatMap method.
answered May 24, 2018 by Shubham
• 13,490 points
+1 vote
// Collect data from input avro file and create dataset
            Dataset<Row> inputRecordsCollection = spark.read().format("avro").load(inputFile);

            // Flatten data of nested schema file to remove nested fields
            inputRecordsCollection.createOrReplaceTempView("inputFileTable");
            

          //function call to flatten data  

           String fileSQL = flattenSchema(inputRecordsCollection.schema(), null);
            Dataset<Row> inputFlattRecords = spark.sql("SELECT " + fileSQL + " FROM inputFileTable");
            inputFlattRecords.show(10);

public static String flattenSchema(StructType schema, String prefix) {
        final StringBuilder selectSQLQuery = new StringBuilder();

        for (StructField field : schema.fields()) {
            final String fieldName = field.name();

            if (fieldName.startsWith("@")) {
                continue;
            }

            String colName = prefix == null ? fieldName : (prefix + "[0]." + fieldName);
            String colNameTarget = colName.replace("[0].", "_");

            DataType dtype = field.dataType();
            if (dtype.getClass().equals(ArrayType.class)) {
                dtype = ((ArrayType) dtype).elementType();

            }
            if (dtype.getClass().equals(StructType.class)) {
                selectSQLQuery.append(flattenSchema((StructType) dtype, colName));
            } else {
                selectSQLQuery.append(colName);
                selectSQLQuery.append(" as ");
                selectSQLQuery.append(colNameTarget);
            }

            selectSQLQuery.append(",");
        }

        if (selectSQLQuery.length() > 0) {
            selectSQLQuery.deleteCharAt(selectSQLQuery.length() - 1);
        }

        return selectSQLQuery.toString();

    }
answered Jul 4, 2019 by Dhara dhruve

Related Questions In Apache Spark

+1 vote
1 answer

How can I write a text file in HDFS not from an RDD, in Spark program?

Yes, you can go ahead and write ...READ MORE

answered May 29, 2018 in Apache Spark by Shubham
• 13,490 points
8,532 views
0 votes
1 answer
0 votes
1 answer
+1 vote
2 answers

How can I convert Spark Dataframe to Spark RDD?

Assuming your RDD[row] is called rdd, you ...READ MORE

answered Jul 9, 2018 in Apache Spark by zombie
• 3,790 points
20,785 views
0 votes
1 answer

Spark: How can i create temp views in user defined database instead of default database?

You can try the below code: df.registerTempTable(“airports”) sqlContext.sql(" create ...READ MORE

answered Jul 14, 2019 in Apache Spark by Ishan
4,549 views
0 votes
1 answer

How do I access the Map Task ID in Spark?

You can access task information using TaskContext: import org.apache.spark.TaskContext sc.parallelize(Seq[Int](), ...READ MORE

answered Jul 23, 2019 in Apache Spark by ravikiran
• 4,620 points
1,722 views
+1 vote
2 answers
0 votes
3 answers

How to connect Spark to a remote Hive server?

JDBC is not required here. Create a hive ...READ MORE

answered Mar 8, 2019 in Big Data Hadoop by Vijay Dixon
• 190 points
12,874 views
0 votes
3 answers

How to transpose Spark DataFrame?

Please check the below mentioned links for ...READ MORE

answered Jan 1, 2019 in Apache Spark by anonymous
20,056 views
0 votes
1 answer

How to convert rdd object to dataframe in spark

SqlContext has a number of createDataFrame methods ...READ MORE

answered May 30, 2018 in Apache Spark by nitinrawat895
• 11,380 points
3,996 views
webinar REGISTER FOR FREE WEBINAR X
REGISTER NOW
webinar_success Thank you for registering Join Edureka Meetup community for 100+ Free Webinars each month JOIN MEETUP GROUP