Copying values from multiple excel files to a single one with python

Question

I want to replicate the values from a folder of numerous excel files that has values in columns a through g into a single excel file.

The following piece of code has been tested by me:

`  import os
    import pandas as pd

   # Specify the input folder containing Excel files
    input_folder = 'C:/Users/User/Desktop/FilesToProcess'

   # Create an empty DataFrame to store the selected data
    selected_data = pd.DataFrame()

   # Loop through all Excel files in the input folder
    for file_name in os.listdir(input_folder):
    if file_name.endswith('.xlsx'):
   # Read in the Excel file and select the desired columns
    df = pd.read_excel(os.path.join(input_folder, file_name), usecols='A:G')
   # Append the selected data to the main DataFrame
    selected_data = selected_data.append(df)[df.columns.tolist()]

   # Specify the output file name and sheet name
    output_file = 'C:/Users/User/Desktop/output1.xlsx'
    output_sheet_name = 'Sheet1'

   # Write the selected columns to a new Excel file
    with pd.ExcelWriter(output_file) as writer:
    selected_data.to_excel(writer, sheet_name=output_sheet_name, index=False)`

Although it initially functions as intended, after a few files, the columns become messed up. It starts copying values in columns > G specifically. Due to the nature of the data, I am unable to provide a result example.

narikkadan · Answer 1 · Apr 11, 2023

You most likely have a few columns in a few files that don't adhere to your standards. For instance, the column name in some excel files can be "Column1," whereas in others it might be "Column 1" or "Column 1." When you concatenate, pandas will produce new columns as a result, leaving some of the columns in some rows empty.