All the data from column B to column K are numbers stored as text in an Excel file.
I have uploaded the Excel file to Dropbox as a sample to test with.
sample data text
Download it and save it as /tmp/tsm.xlsx.
tsm.xlsx for testing
After reading it into a DataFrame, I find that the values in the last column, K, are of type str, whereas the values in columns B through J are all numeric:
import pandas as pd

sexcel = '/tmp/tsm.xlsx'
df = pd.read_excel(sexcel, sheet_name='ratios_annual')
row_num = len(df)
# Print the Python type of each value in the last two columns (K and J).
for i in range(row_num):
    print('the data type in last column--K is', type(df.iloc[i, -1]))
    print('the data type in column--J is', type(df.iloc[i, -2]))
the data type in last column--K is <class 'str'>
the data type in column--J is <class 'numpy.float64'>
the data type in last column--K is <class 'str'>
the data type in column--J is <class 'numpy.float64'>
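The same check can be run on every column at once instead of only the last two. This is a minimal sketch, not a fix: the helper name `column_value_types` and the small `demo` frame are hypothetical stand-ins, and with the real file `df` from `pd.read_excel` would be passed in instead.

```python
import pandas as pd


def column_value_types(df: pd.DataFrame) -> pd.DataFrame:
    """Count the Python type names of the values in each column."""
    counts = {
        col: df[col].map(lambda v: type(v).__name__).value_counts()
        for col in df.columns
    }
    # One column per DataFrame column, one row per type name seen.
    return pd.DataFrame(counts).fillna(0).astype(int)


# Hypothetical stand-in for the sheet: J holds floats, K holds strings.
demo = pd.DataFrame({"J": [1.5, 2.5], "K": ["3.5", "4.5"]})
summary = column_value_types(demo)
print(summary)
```

A column that shows a non-zero count in the `str` row was read as text; `df.dtypes` gives a coarser view of the same thing, where such a column appears as `object`.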
When viewing the file in Excel, it is clear that all of the numbers in columns B through K are stored as text. Why are they read into the DataFrame with different types?
Please examine the sample data that you can download.