Python XML file to pandas dataframe

Question

How do I convert an XML file to a pandas dataframe?

score 0 · Answer 1 · Aug 1, 2019

Here's some example code:

import pandas as pd
import xml.etree.ElementTree as et

xtree = et.parse("student.xml")
xroot = xtree.getroot()

df_cols = ["name", "email", "grade", "age"]
rows = []

for node in xroot:
    # "name" is read from the element's attributes; the rest are child elements.
    s_name = node.attrib.get("name")
    # find() returns None when the tag is missing, so test its result, not node.
    email, grade, age = node.find("email"), node.find("grade"), node.find("age")
    s_mail = email.text if email is not None else None
    s_grade = grade.text if grade is not None else None
    s_age = age.text if age is not None else None
    rows.append([s_name, s_mail, s_grade, s_age])

# DataFrame.append was removed in pandas 2.0, so build out_df once from all rows.
out_df = pd.DataFrame(rows, columns=df_cols)

1. The code above imports pandas and ElementTree.

ElementTree parses the XML document into a tree structure that is easy to work with.

2. et.parse() reads the XML file into a tree, stored in xtree, and getroot() returns its root element, xroot.

Every element of the tree (the root included) has a tag that describes it.

3. df_cols holds the column names, which match the fields in the XML that we want to store in the dataframe.

out_df is the dataframe that holds those columns.

4. The for loop visits each child of the root and stores its values in variables, i.e. s_name, s_mail, etc.

Here find() returns the first child with a particular tag, or None if there is no such child.

5. Each set of values is added to out_df as one row, so the result has one row per child of the root.
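
The steps above can be tried end to end without a file on disk. The following is a minimal sketch, assuming student.xml has a root element whose children each carry a name attribute and email, grade, and age sub-elements. The sample data, the SAMPLE_XML constant, and the xml_to_df helper are made up for illustration and are not part of the original answer.

```python
import xml.etree.ElementTree as et

import pandas as pd

# Hypothetical sample data: the layout of student.xml is assumed here.
SAMPLE_XML = """<students>
    <student name="John">
        <email>john@example.com</email>
        <grade>A</grade>
        <age>16</age>
    </student>
    <student name="Alice">
        <email>alice@example.com</email>
        <grade>B</grade>
    </student>
</students>"""

DF_COLS = ["name", "email", "grade", "age"]


def xml_to_df(xml_text):
    """Build one dataframe row per child of the root (illustrative helper)."""
    xroot = et.fromstring(xml_text)
    rows = []
    for node in xroot:
        # The first column comes from the "name" attribute.
        row = [node.attrib.get("name")]
        # The remaining columns come from child elements with matching tags.
        for tag in DF_COLS[1:]:
            child = node.find(tag)  # first child with this tag, or None
            row.append(child.text if child is not None else None)
        rows.append(row)
    return pd.DataFrame(rows, columns=DF_COLS)


df = xml_to_df(SAMPLE_XML)
print(df)
```

The second student has no age element, so that cell is left empty instead of raising an error. All extracted values are strings; if numbers are needed, a column can be converted afterwards, for example with pd.to_numeric(df["age"]).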