When should Data Binning be used in data processing?

Question

In data pre-processing, Data Binning is a technique for converting the continuous values of a feature into categorical ones. For example, the values of an age feature in a dataset are sometimes replaced with one of several intervals, such as:

[10,25),
[25,40),
[40,55].
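
To make the replacement concrete, here is a minimal sketch in plain Python that maps an age to one of the three intervals above. The names `EDGES`, `LABELS` and `bin_age` and the sample ages are made up for illustration, and returning `None` for out-of-range ages is just one possible choice.

```python
from bisect import bisect_right

# Interval edges taken from the example: [10,25), [25,40), [40,55]
EDGES = [10, 25, 40, 55]
LABELS = ["[10,25)", "[25,40)", "[40,55]"]

def bin_age(age):
    """Return the interval label for an age, or None if it falls outside 10..55."""
    if age < EDGES[0] or age > EDGES[-1]:
        return None
    # bisect_right counts the edges that are <= age, so subtracting 1 gives
    # the index of the interval whose left edge is at or below the age.
    # The last interval is closed on the right, so age == 55 is clamped into it.
    idx = min(bisect_right(EDGES, age) - 1, len(LABELS) - 1)
    return LABELS[idx]

# Hypothetical sample values, including both boundaries and one outlier
ages = [10, 24, 25, 39.5, 55, 70]
binned = [bin_age(a) for a in ages]
print(binned)
```

Note that 24 and 10 end up with the same label: after binning, the model can no longer tell them apart.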

When is the best time to use Data Binning? Does it (always) lead to better results in a prediction system, or is it a matter of trial and error?

Nandini · Answer 1 · Mar 3, 2022

Mostly by trial and error. When you bin a continuous variable, you automatically discard some information, because all values inside a bin become indistinguishable. Many algorithms prefer to make a prediction from a continuous input, and many bin the continuous data themselves. If your continuous variable is noisy, meaning the values were not recorded precisely, binning can be a good idea, since it can help to reduce the effect of that noise. Equal-width binning and equal-frequency binning are two examples of binning strategies. When your continuous variable is unevenly distributed (for example, heavily skewed), I would advise against equal-width binning, because most values can end up in just one or two bins.
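
To illustrate the difference between the two strategies, here is a rough sketch in plain Python. The helper names and the `incomes` values are invented for this example, and the equal-frequency edges are simply taken at evenly spaced positions in the sorted data, which is only one of several ways to compute them.

```python
from bisect import bisect_right

def equal_width_edges(values, n_bins):
    """Split the range [min, max] into n_bins intervals of the same width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]

def equal_frequency_edges(values, n_bins):
    """Pick edges at evenly spaced positions in the sorted data,
    so each bin holds roughly the same number of values."""
    s = sorted(values)
    n = len(s)
    return [s[min(i * n // n_bins, n - 1)] for i in range(n_bins + 1)]

def bin_counts(values, edges):
    """Count how many values fall into each bin (last bin closed on the right)."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = min(bisect_right(edges, v) - 1, len(counts) - 1)
        counts[idx] += 1
    return counts

# Hypothetical, heavily skewed data: one extreme value stretches the range
incomes = [20, 22, 25, 27, 30, 32, 35, 40, 45, 50, 60, 1000]

width_counts = bin_counts(incomes, equal_width_edges(incomes, 3))
freq_counts = bin_counts(incomes, equal_frequency_edges(incomes, 3))
print("equal width:    ", width_counts)
print("equal frequency:", freq_counts)
```

On this data, equal-width binning puts 11 of the 12 values into the first bin and leaves the middle bin empty, while equal-frequency binning puts 4 values in each bin. With many repeated values, the equal-frequency edges in this sketch can coincide, so real data may need extra handling.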
