How to parse Excel-format XML using Python


I have an Excel workbook saved in XML format and I would like to parse it using Python.

Can you please let me know how I can parse this XML using Python?

The workbook has two sheets, Data_XY and Data_AB.

The Data_XY sheet has two columns, X and Y, and a few rows.

After parsing this sheet, I want the X values and Y values in dictionaries.

Below is the XML of the same workbook:

<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:o="urn:schemas-microsoft-com:office:office"
 xmlns:x="urn:schemas-microsoft-com:office:excel"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:html="http://www.w3.org/TR/REC-html40">
 <DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
  <Author>Hikmat Bahadur Rawal</Author>
  <LastAuthor>Hikmat Bahadur Rawal</LastAuthor>
  <Created>2015-06-05T18:17:20Z</Created>
  <LastSaved>2020-09-24T07:16:51Z</LastSaved>
  <Version>16.00</Version>
 </DocumentProperties>
 <OfficeDocumentSettings xmlns="urn:schemas-microsoft-com:office:office">
  <AllowPNG/>
 </OfficeDocumentSettings>
 <ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
  <WindowHeight>12650</WindowHeight>
  <WindowWidth>22260</WindowWidth>
  <WindowTopX>32767</WindowTopX>
  <WindowTopY>32767</WindowTopY>
  <RefModeR1C1/>
  <ProtectStructure>False</ProtectStructure>
  <ProtectWindows>False</ProtectWindows>
 </ExcelWorkbook>
 <Styles>
  <Style ss:ID="Default" ss:Name="Normal">
   <Alignment ss:Vertical="Bottom"/>
   <Borders/>
   <Font ss:FontName="Calibri" x:Family="Swiss" ss:Size="11" ss:Color="#000000"/>
   <Interior/>
   <NumberFormat/>
   <Protection/>
  </Style>
  <Style ss:ID="s62">
   <Borders>
    <Border ss:Position="Bottom" ss:LineStyle="Continuous" ss:Weight="1"/>
    <Border ss:Position="Left" ss:LineStyle="Continuous" ss:Weight="1"/>
    <Border ss:Position="Right" ss:LineStyle="Continuous" ss:Weight="1"/>
    <Border ss:Position="Top" ss:LineStyle="Continuous" ss:Weight="1"/>
   </Borders>
  </Style>
  <Style ss:ID="s63">
   <Borders>
    <Border ss:Position="Bottom" ss:LineStyle="Continuous" ss:Weight="1"/>
    <Border ss:Position="Left" ss:LineStyle="Continuous" ss:Weight="1"/>
    <Border ss:Position="Right" ss:LineStyle="Continuous" ss:Weight="1"/>
    <Border ss:Position="Top" ss:LineStyle="Continuous" ss:Weight="1"/>
   </Borders>
   <Interior ss:Color="#ED7D31" ss:Pattern="Solid"/>
  </Style>
 </Styles>
 <Worksheet ss:Name="Data_XY">
  <Table ss:ExpandedColumnCount="2" ss:ExpandedRowCount="6" x:FullColumns="1"
   x:FullRows="1" ss:DefaultRowHeight="14.5">
   <Row>
    <Cell ss:StyleID="s63"><Data ss:Type="String">X</Data></Cell>
    <Cell ss:StyleID="s63"><Data ss:Type="String">Y</Data></Cell>
   </Row>
   <Row>
    <Cell ss:StyleID="s62"><Data ss:Type="Number">1</Data></Cell>
    <Cell ss:StyleID="s62"><Data ss:Type="Number">4</Data></Cell>
   </Row>
   <Row>
    <Cell ss:StyleID="s62"><Data ss:Type="Number">2</Data></Cell>
    <Cell ss:StyleID="s62"><Data ss:Type="Number">5</Data></Cell>
   </Row>
   <Row>
    <Cell ss:StyleID="s62"><Data ss:Type="Number">3</Data></Cell>
    <Cell ss:StyleID="s62"><Data ss:Type="Number">6</Data></Cell>
   </Row>
   <Row>
    <Cell ss:StyleID="s62"><Data ss:Type="Number">4</Data></Cell>
    <Cell ss:StyleID="s62"><Data ss:Type="Number">7</Data></Cell>
   </Row>
   <Row>
    <Cell ss:StyleID="s62"><Data ss:Type="Number">5</Data></Cell>
    <Cell ss:StyleID="s62"><Data ss:Type="Number">8</Data></Cell>
   </Row>
  </Table>
  <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
   <PageSetup>
    <Header x:Margin="0.3"/>
    <Footer x:Margin="0.3"/>
    <PageMargins x:Bottom="0.75" x:Left="0.7" x:Right="0.7" x:Top="0.75"/>
   </PageSetup>
   <Selected/>
   <Panes>
    <Pane>
     <Number>3</Number>
     <ActiveRow>17</ActiveRow>
     <ActiveCol>5</ActiveCol>
    </Pane>
   </Panes>
   <ProtectObjects>False</ProtectObjects>
   <ProtectScenarios>False</ProtectScenarios>
  </WorksheetOptions>
 </Worksheet>
 <Worksheet ss:Name="Data_AB">
  <Table ss:ExpandedColumnCount="2" ss:ExpandedRowCount="6" x:FullColumns="1"
   x:FullRows="1" ss:DefaultRowHeight="14.5">
   <Row>
    <Cell ss:StyleID="s63"><Data ss:Type="String">A</Data></Cell>
    <Cell ss:StyleID="s63"><Data ss:Type="String">B</Data></Cell>
   </Row>
   <Row>
    <Cell ss:StyleID="s62"><Data ss:Type="Number">1</Data></Cell>
    <Cell ss:StyleID="s62"><Data ss:Type="Number">7</Data></Cell>
   </Row>
   <Row>
    <Cell ss:StyleID="s62"><Data ss:Type="Number">2</Data></Cell>
    <Cell ss:StyleID="s62"><Data ss:Type="Number">8</Data></Cell>
   </Row>
   <Row>
    <Cell ss:StyleID="s62"><Data ss:Type="Number">3</Data></Cell>
    <Cell ss:StyleID="s62"><Data ss:Type="Number">9</Data></Cell>
   </Row>
   <Row>
    <Cell ss:StyleID="s62"><Data ss:Type="Number">4</Data></Cell>
    <Cell ss:StyleID="s62"><Data ss:Type="Number">10</Data></Cell>
   </Row>
   <Row>
    <Cell ss:StyleID="s62"><Data ss:Type="Number">5</Data></Cell>
    <Cell ss:StyleID="s62"><Data ss:Type="Number">11</Data></Cell>
   </Row>
  </Table>
  <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
   <PageSetup>
    <Header x:Margin="0.3"/>
    <Footer x:Margin="0.3"/>
    <PageMargins x:Bottom="0.75" x:Left="0.7" x:Right="0.7" x:Top="0.75"/>
   </PageSetup>
   <Panes>
    <Pane>
     <Number>3</Number>
     <ActiveRow>8</ActiveRow>
     <ActiveCol>6</ActiveCol>
    </Pane>
   </Panes>
   <ProtectObjects>False</ProtectObjects>
   <ProtectScenarios>False</ProtectScenarios>
  </WorksheetOptions>
 </Worksheet>
</Workbook>
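
One possible way to get the X and Y values out of XML like the above is the standard-library `xml.etree.ElementTree` module. The following is only a minimal sketch: the function name `read_sheet_columns` and the trimmed `SAMPLE` string are made up for illustration, and it assumes the first row of the sheet is the header, every cell has a `<Data>` child, and no cell uses `ss:Index` to skip columns.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the xmlns:ss declaration on <Workbook>.
SS = "urn:schemas-microsoft-com:office:spreadsheet"
NS = {"ss": SS}

# Trimmed-down stand-in for the real file (illustration only).
SAMPLE = """<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
 <Worksheet ss:Name="Data_XY">
  <Table>
   <Row>
    <Cell><Data ss:Type="String">X</Data></Cell>
    <Cell><Data ss:Type="String">Y</Data></Cell>
   </Row>
   <Row>
    <Cell><Data ss:Type="Number">1</Data></Cell>
    <Cell><Data ss:Type="Number">4</Data></Cell>
   </Row>
   <Row>
    <Cell><Data ss:Type="Number">2</Data></Cell>
    <Cell><Data ss:Type="Number">5</Data></Cell>
   </Row>
  </Table>
 </Worksheet>
</Workbook>"""


def _convert(data):
    """Turn a <Data> element into an int, float, or str based on ss:Type."""
    text = data.text or ""
    if data.get(f"{{{SS}}}Type") == "Number":
        number = float(text)
        return int(number) if number.is_integer() else number
    return text


def read_sheet_columns(root, sheet_name):
    """Return {header: [values, ...]} for the worksheet named sheet_name."""
    for sheet in root.findall("ss:Worksheet", NS):
        # Attributes with the ss: prefix are stored under the full namespace URI.
        if sheet.get(f"{{{SS}}}Name") != sheet_name:
            continue
        rows = [
            [_convert(d) for d in row.findall("ss:Cell/ss:Data", NS)]
            for row in sheet.findall("ss:Table/ss:Row", NS)
        ]
        header, *body = rows  # assumption: first row holds the column names
        return {name: [r[i] for r in body] for i, name in enumerate(header)}
    raise KeyError(sheet_name)


root = ET.fromstring(SAMPLE)
print(read_sheet_columns(root, "Data_XY"))  # {'X': [1, 2], 'Y': [4, 5]}
```

For a file on disk, `ET.parse("workbook.xml").getroot()` (the filename is hypothetical) would give the same kind of `root`, and passing "Data_AB" instead of "Data_XY" should return the A and B columns in the same shape.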

Sep 24, 2020 in Python by Hikmat
Hi,
I also have an Excel-convertible XML file, and I need to read the data from the XML and convert it into a PySpark DataFrame. Did you get an answer for this?
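
For the DataFrame case, one option might be to first flatten a worksheet into a list of column names plus a list of row tuples, since that is a shape PySpark's `createDataFrame` accepts. The sketch below only does the standard-library part. The helper name `sheet_to_rows` and the `SAMPLE` string are hypothetical, it assumes every cell below the header row is numeric, and the PySpark call in the final comment is not executed here.

```python
import xml.etree.ElementTree as ET

# Namespace URI used by Excel's XML Spreadsheet format for the ss: prefix.
SS = "urn:schemas-microsoft-com:office:spreadsheet"
NS = {"ss": SS}

# Small stand-in for a real workbook (illustration only).
SAMPLE = """<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
 <Worksheet ss:Name="Data_AB">
  <Table>
   <Row>
    <Cell><Data ss:Type="String">A</Data></Cell>
    <Cell><Data ss:Type="String">B</Data></Cell>
   </Row>
   <Row>
    <Cell><Data ss:Type="Number">1</Data></Cell>
    <Cell><Data ss:Type="Number">7</Data></Cell>
   </Row>
   <Row>
    <Cell><Data ss:Type="Number">2</Data></Cell>
    <Cell><Data ss:Type="Number">8</Data></Cell>
   </Row>
  </Table>
 </Worksheet>
</Workbook>"""


def sheet_to_rows(root, sheet_name):
    """Return (column_names, row_tuples) for the worksheet named sheet_name."""
    sheet = next(
        (ws for ws in root.findall("ss:Worksheet", NS)
         if ws.get(f"{{{SS}}}Name") == sheet_name),
        None,
    )
    if sheet is None:
        raise KeyError(sheet_name)
    raw = [
        [d.text for d in row.findall("ss:Cell/ss:Data", NS)]
        for row in sheet.findall("ss:Table/ss:Row", NS)
    ]
    columns = raw[0]  # assumption: first row is the header
    # Assumption: all remaining cells are numeric, so float() is safe.
    records = [tuple(float(v) for v in r) for r in raw[1:]]
    return columns, records


columns, records = sheet_to_rows(ET.fromstring(SAMPLE), "Data_AB")
print(columns, records)  # ['A', 'B'] [(1.0, 7.0), (2.0, 8.0)]

# With PySpark installed and a SparkSession named `spark` (both assumptions),
# the result could then be handed over roughly like this:
#   df = spark.createDataFrame(records, schema=columns)
```

Keeping the parsing separate from Spark means the XML handling can be checked on its own before a cluster is involved.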

