Why is the data type of the last column str instead of float?

Question

All the data from column B to column K are numbers stored as text in the Excel file.

I have uploaded the Excel file to Dropbox as a sample to test with.
sample data text

Download it and save it as /tmp/tsm.xlsx.

After reading it into a dataframe, I found that the data type of the last column, K, is str, whereas columns B through J are all read as numbers:

import pandas as pd

sexcel = '/tmp/tsm.xlsx'
df = pd.read_excel(sexcel, sheet_name='ratios_annual')

# Print the Python type of each cell in the last two columns (K and J).
for i in range(len(df)):
    print('the data type in last column--K is', type(df.iloc[i, -1]))
    print('the data type in  column--J is', type(df.iloc[i, -2]))

the data type in last column--K is <class 'str'>
the data type in  column--J is <class 'numpy.float64'>
the data type in last column--K is <class 'str'>
the data type in  column--J is <class 'numpy.float64'>

When viewing the file in Excel, it is clear that all of the numbers in columns B through K are stored as text. Why are they read into the dataframe with different types?
Please examine the sample data, which you can download.

narikkadan · Answer 1 · Feb 13, 2023

Have you tried converting that column to float explicitly? By default, pandas tries to infer the data types, but if the data is imperfect (e.g. the column has a stray character somewhere), it may get them wrong. Explicitly converting that column to float may help you find the problem.
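One way to try this, as a minimal sketch: the small dataframe below is a made-up stand-in for the sheet (the real file is not reproduced here), with a column K that holds numbers as text plus one deliberately bad cell. It uses pd.to_numeric with errors="coerce" so that unparseable cells become NaN and can be listed, rather than the conversion stopping at the first failure.

```python
import pandas as pd

# Hypothetical stand-in for the sheet: K holds numbers stored as text,
# and one cell ("0.56x") is an invented example of a stray character.
df = pd.DataFrame({
    "J": [1.5, 2.5, 3.5],
    "K": ["0.12", "0.34", "0.56x"],
})

# errors="coerce" turns cells that cannot be parsed into NaN instead of raising.
converted = pd.to_numeric(df["K"], errors="coerce")

# Cells that were not empty originally but failed to convert are the suspects.
bad_cells = df.loc[converted.isna() & df["K"].notna(), "K"]
print(bad_cells)

# Once the suspects are understood, keep the numeric version of the column.
df["K"] = converted
print(df.dtypes)
```

If bad_cells comes back empty for the real column, the text cells are all parseable and the converted column can simply be used as is. Calling df["K"].astype(float) instead would raise a ValueError on the first cell that cannot be parsed, which also points at the problem but only one value at a time.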