The --split-by argument of sqoop import is not a command but an option: it names the table column used to generate splits for the import. Sqoop divides the range of values in that column into one slice per mapper, so each mapper copies a separate portion of the rows into the cluster.
Its purpose is import performance: because the mappers read disjoint ranges at the same time, the import runs in parallel and finishes faster.
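A typical invocation looks like `sqoop import --connect <jdbc-url> --table orders --split-by order_id --num-mappers 4`, where the table and column names are only placeholders. The Python below is a simplified sketch of the splitting idea for an integer column, assuming the column's minimum and maximum are already known. It is not Sqoop's actual splitter code, and the helpers `compute_splits` and `split_queries` are hypothetical names used for illustration.

```python
from typing import List, Tuple


def compute_splits(min_val: int, max_val: int, num_mappers: int) -> List[Tuple[int, int]]:
    """Divide the inclusive range [min_val, max_val] into up to num_mappers
    contiguous, non-overlapping ranges of (nearly) equal width.
    Simplified illustration only, not Sqoop's real algorithm."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be >= 1")
    if max_val < min_val:
        raise ValueError("max_val must be >= min_val")
    total = max_val - min_val + 1
    # Never create more splits than there are distinct values.
    num_splits = min(num_mappers, total)
    size, extra = divmod(total, num_splits)
    splits = []
    lo = min_val
    for i in range(num_splits):
        # The first `extra` splits get one additional value each.
        hi = lo + size - 1 + (1 if i < extra else 0)
        splits.append((lo, hi))
        lo = hi + 1
    return splits


def split_queries(table: str, column: str, splits: List[Tuple[int, int]]) -> List[str]:
    """Build one bounded SELECT per split; each would be run by a separate mapper."""
    return [
        f"SELECT * FROM {table} WHERE {column} >= {lo} AND {column} <= {hi}"
        for lo, hi in splits
    ]


if __name__ == "__main__":
    # Hypothetical table "orders" whose split-by column order_id spans 1..100.
    for query in split_queries("orders", "order_id", compute_splits(1, 100, 4)):
        print(query)
```

Because the ranges are equal in width rather than in row count, a split-by column whose values are unevenly distributed leaves some mappers with far more rows than others, which erodes the parallel speedup.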