LASSO and Ridge Regularization: Simply Explained

Bassem Essam
Published in Nerd For Tech
Apr 25, 2021 · 3 min read

After understanding the basic principles of linear regression and gradient descent, it is time to move forward a bit and review some techniques that improve the performance of ordinary linear regression models. The most common techniques are LASSO regularization (L1 regularization) and Ridge regularization (L2 regularization). First, we need to know what “regularization” means. Simply put, regularization is the process of adding information to prevent over-fitting.

The over-fitting problem occurs when the model’s error is minimal during the training phase, but its performance on the testing data points is poor. That means the model does not generalize and cannot be used in production.

LASSO Regularization:

LASSO stands for Least Absolute Shrinkage and Selection Operator. As mentioned in the regularization definition, it is the process of adding information to prevent the over-fitting problem, so a small modification is made to the cost function of ordinary least squares, as shown below.

Cost Function of LASSO Regression Model
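For reference, the LASSO cost is ordinary least squares plus an L1 penalty on the coefficients (writing the coefficients as β, a notational choice):

```latex
J(\beta) = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```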

To understand the effect of this additional “penalty term”, let’s assume that the best-fit line passes through all data points, so the sum of squared errors is zero. The additional term (λ |slope|) must then be minimized to minimize the cost function. Minimizing the slope makes the line less steep, so it no longer passes through every data point, and this helps prevent over-fitting.
The next question we should ask is “what is λ?”. It is the regularization parameter, a positive real number. If λ is high, it can shrink some coefficients all the way to zero. This is the main added value of LASSO: it can suppress the coefficients of useless or redundant features (e.g., highly correlated ones) and hence performs feature selection for our linear regression model. On the other hand, if λ is equal to zero, the loss function turns back into ordinary least squares. The regularization parameter λ can be determined by cross-validation in a way that avoids under-fitting (when λ is too high) and over-fitting (when λ is too low).
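A short sketch of this zeroing effect (synthetic data; the alpha value, scikit-learn’s name for λ, is an illustrative choice):

```python
# Sketch: LASSO drives the coefficients of uninformative features to zero.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(42)
n = 200
X = rng.normal(size=(n, 5))
# Only the first two features actually drive the target.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.5, n)

lasso = Lasso(alpha=0.5)  # alpha plays the role of λ
lasso.fit(X, y)
print(lasso.coef_)
# Typically the three useless features come out exactly 0,
# so LASSO has performed feature selection for us.
```

Replacing `Lasso` with `LassoCV` would pick alpha by cross-validation automatically, in the spirit of the paragraph above.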

Ridge Regularization (L2 Regularization):

Ridge regularization is a variation on the same idea as LASSO; the term added to the cost function is shown below.

Cost Function of Ridge Regression Model
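For reference, the Ridge cost replaces the absolute value in the penalty with a square (writing the coefficients as β, a notational choice):

```latex
J(\beta) = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2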

In Ridge regularization, the coefficients can approach zero but never reach it exactly, because the penalty squares the coefficient (slope). Ridge regularization therefore cannot be used for feature selection. So when should we use LASSO and when Ridge?
If you have many features with high correlation and you need to take away the useless ones, then LASSO is the better solution.
If the number of features is greater than the number of observations, or many features exhibit multi-collinearity, Ridge regularization is the better solution.
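As a sketch of the multi-collinearity case (synthetic data, assumed alpha), Ridge keeps the coefficients stable where plain least squares lets them explode in opposite directions:

```python
# Sketch: Ridge stays stable when two features are nearly identical copies.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(7)
n = 50
x = rng.normal(size=n)
# Second column is an almost exact duplicate of the first (collinear).
X = np.column_stack([x, x + rng.normal(0, 1e-4, n)])
y = 2.0 * x + rng.normal(0, 0.1, n)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)
print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
# OLS typically assigns huge coefficients of opposite sign;
# Ridge splits the true effect of 2 roughly evenly between the copies.
```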

  • Elastic-Net regression is a mix of LASSO and Ridge: both penalty terms (L1 regularization and L2 regularization) are added to the cost function used to compute the model coefficients.
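A minimal sketch of the mix (synthetic data; alpha and the L1/L2 split are illustrative choices):

```python
# Sketch: Elastic-Net combines the L1 and L2 penalties in one model.
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 2.0 * X[:, 0] + rng.normal(0, 0.3, 100)

# l1_ratio balances the penalties: 1.0 is pure LASSO, 0.0 is pure Ridge.
enet = ElasticNet(alpha=1.0, l1_ratio=0.5)
enet.fit(X, y)
print(enet.coef_)
# The first coefficient survives (shrunk); the rest stay near zero.
```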

