Why?

ML models can overfit, i.e. fit the training data too closely (noise included) and then perform poorly on unseen data. Regularisation discourages that.

How?

It’s a penalty term added to the loss that discourages the model from placing too much weight on specific features (for example in Logistic Regression), effectively forcing it to learn a “simpler” function.
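A minimal sketch of what that penalty looks like, assuming an L2 (ridge) penalty on a logistic regression loss; the function and variable names here are illustrative, not from any particular library:

```python
import numpy as np

def logistic_loss_l2(w, X, y, lam):
    """Binary cross-entropy plus an L2 penalty on the weights.

    The lam * sum(w**2) term grows with the magnitude of the weights,
    so minimising the total loss pulls the weights toward zero and
    discourages relying too heavily on any single feature.
    """
    z = X @ w
    p = 1.0 / (1.0 + np.exp(-z))          # sigmoid predictions
    bce = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    penalty = lam * np.sum(w ** 2)        # L2 (ridge) penalty
    return bce + penalty

# Toy data: 4 samples, 2 features.
X = np.array([[0.5, 1.0], [1.5, -0.5], [-1.0, 2.0], [2.0, 0.5]])
y = np.array([0.0, 1.0, 0.0, 1.0])
w = np.array([3.0, -2.0])                 # deliberately large weights

plain = logistic_loss_l2(w, X, y, lam=0.0)
regularised = logistic_loss_l2(w, X, y, lam=0.1)
# The penalty only ever adds to the loss, so large weights cost more
# when lam > 0: here the difference is exactly 0.1 * (3**2 + 2**2) = 1.3.
```

Setting `lam = 0` recovers the unregularised loss; larger `lam` trades training fit for smaller weights (an L1 penalty, `lam * np.sum(np.abs(w))`, would instead push some weights to exactly zero).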

Dropout:

  • Random neurons (e.g. 20%) are ignored during training.
  • The remaining neurons must compensate, which pushes the network toward more distributed, redundant representations instead of relying on any single neuron.
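The mechanism above can be sketched as "inverted dropout", the common formulation where survivors are rescaled during training so nothing changes at inference time (the helper name and shapes are illustrative):

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero out roughly a `rate` fraction of units
    during training and scale the survivors by 1 / (1 - rate), so the
    expected activation is unchanged and inference needs no rescaling."""
    if not training or rate == 0.0:
        return activations
    keep = rng.random(activations.shape) >= rate   # Bernoulli keep-mask
    return activations * keep / (1.0 - rate)

rng = np.random.default_rng(0)
a = np.ones(1000)
out = dropout(a, rate=0.2, rng=rng)
# About 20% of the units are zeroed; the survivors become 1 / 0.8 = 1.25,
# so the mean activation stays close to 1. With training=False the input
# passes through untouched.
```

Because a different random subset of neurons is dropped on every forward pass, no single neuron can be relied on, which is exactly what forces the distributed representations mentioned above.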