Batch normalization is a technique commonly employed with very deep neural networks. As training proceeds, the parameters of earlier layers keep changing, so the distribution of inputs arriving at each later layer shifts from one mini-batch to the next, and that layer must keep chasing a moving target.
In a shallow network, however, there are few layers for these shifts to compound through, so the fluctuations stay within a narrow range and the moving-target problem is mild. For shallow neural networks you can therefore usually train without batch normalization and still get good results.
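To make the mechanics concrete, here is a minimal sketch of a batch-normalization forward pass in NumPy. The function name, the `eps` constant, and the `gamma`/`beta` parameters here are illustrative assumptions rather than the API of any particular library: each feature is normalized by its mini-batch mean and variance and then rescaled by learnable parameters, which keeps the layer's input distribution roughly stable from one mini-batch to the next.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize a mini-batch x of shape (batch_size, num_features).

    gamma and beta are learnable per-feature scale and shift parameters;
    eps guards against division by zero.
    """
    mu = x.mean(axis=0)                    # per-feature mean over the mini-batch
    var = x.var(axis=0)                    # per-feature variance over the mini-batch
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero-mean, unit-variance activations
    return gamma * x_hat + beta            # learned rescale restores capacity

# Illustrative usage: activations from a hidden layer with 4 features
x = np.random.randn(32, 4) * 3.0 + 5.0     # mini-batch with shifted, scaled statistics
gamma = np.ones(4)
beta = np.zeros(4)
out = batch_norm_forward(x, gamma, beta)
print(out.mean(axis=0), out.std(axis=0))   # roughly 0 means and unit stds per feature
```

At inference time, implementations typically replace the mini-batch statistics with running averages accumulated during training; that detail is omitted here to keep the sketch focused on why normalizing per mini-batch stabilizes a layer's inputs.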