Extracting tabular data from multiple xlsx files with inconsistent structures

0 votes

I have several Excel spreadsheets with a variety of information, including tables. From every file, I must be able to access a specific table. I thought of using pandas skiprows, but each sheet has a different line where the table is located, both at the beginning and end of the sheet. For the sake of the example below, I need to retrieve the final table with the headers "Well," "Content," etc. and convert it to a dataframe. To be consistent, the table's row in this example is 115; nevertheless, this varies from file to file. The distance from the sheet's finish is also erratic and irregular. Any help is much appreciated!

enter image description here

I have looked into openpyxl but didn't find anything which would isolate the table based on header values. I have also looked into pd.read_excel skiprows and/or indexing the dataframe using iloc. The issue here is the inconsistent position of the table, and variable size of the table.

Mar 25, 2023 in Others by Kithuzzz
• 38,000 points
647 views

1 answer to this question.

0 votes

For a cell that is consistent across all files (in this case, "Basic Settings" is always the same distance away from the bottom of the table), I was able to solve this problem by first getting the index of the first row and then the index of the nearest row below. As follows:

#defining the header index by getting the first column heading
header_index = raw_table[raw_table[0].eq('Well')].index.values[0]
#defining the footer dimensions based on the first consistent title 
footer_index = raw_table[raw_table[0].eq('Basic settings')].index.values[0]
#Slicing the table according to the indices determined above, the footer is *strong text*3 
#below the end, so subtracting 3 
cropped_table = raw_table[header_index:footer_index-3]'''
answered Mar 25, 2023 by narikkadan
• 63,600 points

Related Questions In Others

0 votes
0 answers

Copying values from multiple excel files to a single one with python

I want to replicate the values from ...READ MORE

Apr 11, 2023 in Others by Kithuzzz
• 38,000 points
917 views
0 votes
1 answer

How to convert data from txt files to Excel files using python

Hi , there are few steps to ...READ MORE

answered Feb 16, 2022 in Others by Edureka
• 13,690 points
14,024 views
0 votes
1 answer

How to open .xlsx files in MS Excel from VS Code?

Hello, to open xlxs files, or files ...READ MORE

answered Feb 17, 2022 in Others by gaurav
• 23,260 points
4,142 views
0 votes
1 answer

What is the easiest way to convert an Excel spreadsheet with tabular data to JSON?

Assuming you really mean easiest and are not necessarily ...READ MORE

answered Dec 29, 2022 in Others by narikkadan
• 63,600 points
13,424 views
0 votes
2 answers
+1 vote
2 answers

how can i count the items in a list?

Syntax :            list. count(value) Code: colors = ['red', 'green', ...READ MORE

answered Jul 7, 2019 in Python by Neha
• 330 points

edited Jul 8, 2019 by Kalgi 4,434 views
0 votes
1 answer
0 votes
1 answer

Create unique rows in Excel with limited data to be used in multiple columns

This setup isn't readily generalizable, though since ...READ MORE

answered Oct 14, 2022 in Others by narikkadan
• 63,600 points
730 views
0 votes
1 answer

Word Mail Merge with Excel data has to be saved in different files with custom names

Try this: Public Sub Mail_Merge() On Error GoTo ErrH Dim ...READ MORE

answered Nov 6, 2022 in Others by narikkadan
• 63,600 points
1,783 views
webinar REGISTER FOR FREE WEBINAR X
REGISTER NOW
webinar_success Thank you for registering Join Edureka Meetup community for 100+ Free Webinars each month JOIN MEETUP GROUP