I'm trying to use Apache Airflow and the pandas read_excel function to read several .xls files that are stored on a NAS.
This is the code I'm using:
df = pd.read_excel('folder/sub_folder_1/sub_folder_2/file_name.xls', sheet_name='April', usecols=[0,1,2,3], dtype=str, engine='xlrd')
This worked for a time, but recently I have been getting this error for several of those files:
Excel 2007 xlsb file; not supported
[...]
xlrd.biffh.XLRDError: Excel 2007 xlsb file; not supported
These files are obviously .xls files, but my code seems to be mistaking them for unsupported .xlsb files. I'd prefer a way to indicate that these are .xls files or, alternatively, a way to read .xlsb files.
I'm not sure if this is important, but these files are updated by an outside team. If they changed a parameter without my knowledge, I could be seeing a different error, but I doubt it.
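To rule out the possibility that some files were re-saved in a different format while keeping the .xls extension, one check I could run is to look at the file contents instead of the name. The following is only a minimal sketch under my own assumptions: the function name detect_excel_format and the ENGINE_FOR_FORMAT mapping are made up for illustration, and the check relies on legacy .xls files starting with the OLE2 signature while .xlsx and .xlsb files are ZIP archives containing xl/workbook.xml or xl/workbook.bin respectively.

```python
import zipfile

OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # signature of legacy .xls containers
ZIP_MAGIC = b"PK\x03\x04"  # .xlsx and .xlsb workbooks are ZIP archives


def detect_excel_format(path):
    """Guess the workbook format from the file contents, ignoring the extension."""
    with open(path, "rb") as fh:
        header = fh.read(8)
    if header.startswith(OLE2_MAGIC):
        return "xls"
    if header.startswith(ZIP_MAGIC):
        with zipfile.ZipFile(path) as zf:
            names = set(zf.namelist())
        if "xl/workbook.bin" in names:
            return "xlsb"
        if "xl/workbook.xml" in names:
            return "xlsx"
    return "unknown"


# Hypothetical mapping from the detected format to a pandas engine name.
# Each engine is a separate package that has to be installed.
ENGINE_FOR_FORMAT = {"xls": "xlrd", "xlsx": "openpyxl", "xlsb": "pyxlsb"}
```

If this reports "xlsb" for the failing files, the error would be accurate and the extension misleading. In that case I assume something like pd.read_excel(path, engine=ENGINE_FOR_FORMAT[detect_excel_format(path)], ...) could pick the reader per file, provided the pyxlsb package is installed and the pandas version supports that engine.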
Can someone please help me with this?