Understanding linear regression well is one of the best ways to build a solid foundation for machine learning and deep learning algorithms. Even if you have long experience deploying advanced ML/DL models, it is worth refreshing the basics from time to time.
The goal of a linear regression model is to predict a dependent variable from one or more independent variables. For example, suppose we have a data set of the salaries of a group of engineers along with their years of experience. Our task is to train a model on this data set so that it can predict the salary of an engineer who is not included in the training data set.
The scatter plot of the data set is shown below.

As shown in the plot, the data points suggest a linear relationship between the salary (dependent variable) and the years of experience (independent variable). Our task is to find the line that best fits these data points. But how can we determine which line is the best fit?

Multiple possible lines to fit the data points

As we can see, many possible lines can fit the training data set.
The least-squares approach is used to determine the best-fitting line. In this technique, a total error (built from the vertical distances between each data point and the line) is calculated for each possible line. The line with the minimum error is our best-fitting line. This is the simple idea behind obtaining a general fitting line that can be used to predict any new data point introduced to our model.
Let’s dive a bit into the details of a simple linear regression model.
We have to square the errors so that negative and positive errors do not cancel each other out. Let us express the fitting line by the following equation:

$$y = ax + b$$

where y is the dependent variable, x is the independent variable, a is the slope of the line, and b is the intercept (the point where the line crosses the y-axis).

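Hence, the error (called the sum of squared residuals, SSR) can be expressed as follows:

$$SSR = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

where $y_i$ is the observed value and $\hat{y}_i$ is the value the line predicts for $x_i$. Substituting the equation of the line for $\hat{y}_i$ gives:

$$SSR = \sum_{i=1}^{n} \left(y_i - (a x_i + b)\right)^2$$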
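To make the formula concrete, here is a minimal sketch of the sum of squared residuals for a candidate line y = ax + b. The function name `sum_squared_residuals` and the small data set are hypothetical and chosen only for illustration; they are not the data behind the plots in this post.

```python
def sum_squared_residuals(xs, ys, a, b):
    """Return the sum of (y_i - (a * x_i + b)) ** 2 over all data points."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data: years of experience and salary (in thousands).
years = [1, 2, 3, 4, 5]
salary = [42, 44, 51, 54, 61]

# Two candidate lines; the one with the smaller value fits the data better.
ssr_line_1 = sum_squared_residuals(years, salary, a=5, b=35)
ssr_line_2 = sum_squared_residuals(years, salary, a=3, b=40)

print(ssr_line_1)  # 8
print(ssr_line_2)  # 49
```

On this toy data the first line (slope 5, intercept 35) has the smaller sum of squared residuals, so it is the better fit of the two.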

We can find the slope (a) that gives the minimum sum of squared residuals by plotting the sum of squared residuals against a range of candidate slope values.

The slope of the best-fitting line is the slope that corresponds to the minimum sum of squared residuals.
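The slope search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy data, the fixed intercept of 36, and the grid of candidate slopes from 0 to 10 are all hypothetical choices, and a real search would need a range that suits the data.

```python
def sum_squared_residuals(xs, ys, a, b):
    """Return the sum of (y_i - (a * x_i + b)) ** 2 over all data points."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data: years of experience and salary (in thousands).
years = [1, 2, 3, 4, 5]
salary = [42, 44, 51, 54, 61]

# Assumption: the intercept is held fixed while only the slope is varied.
fixed_intercept = 36

# Candidate slopes 0.0, 0.1, ..., 10.0.
slopes = [i / 10 for i in range(0, 101)]

# One (slope, SSR) pair per candidate; these are the points of the curve.
curve = [(a, sum_squared_residuals(years, salary, a, fixed_intercept))
         for a in slopes]

# The best slope is the one at the bottom of the curve.
best_slope, best_ssr = min(curve, key=lambda pair: pair[1])

print(best_slope)  # 4.8
```

Plotting the pairs in `curve` would give the bowl-shaped graph of slope against sum of squared residuals, with `best_slope` at its lowest point.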

Using the same technique, we can try many values of the intercept (b) and find the value with the minimum sum of squared residuals. Because the slope and the intercept are two variables, the result can be shown as a 3-D graph of the sum of squared residuals against both of them, and the global minimum of that surface gives the best-fitting line.
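As a rough sketch of this two-variable search, the code below evaluates the sum of squared residuals on a grid of slope and intercept values and keeps the pair with the smallest result. The toy data and both grids are hypothetical, and an exhaustive grid search is used here only because it mirrors the 3-D picture; it is not meant as the way the minimum is found in practice.

```python
def sum_squared_residuals(xs, ys, a, b):
    """Return the sum of (y_i - (a * x_i + b)) ** 2 over all data points."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data: years of experience and salary (in thousands).
years = [1, 2, 3, 4, 5]
salary = [42, 44, 51, 54, 61]

# Assumed search grids: slopes 0.0 to 10.0 in steps of 0.1,
# intercepts 30 to 40 in steps of 1.
slopes = [i / 10 for i in range(0, 101)]
intercepts = list(range(30, 41))

# Each (slope, intercept) pair is one point on the 3-D surface;
# min() picks the tuple with the lowest SSR, i.e. the bottom of the surface.
best_ssr, best_slope, best_intercept = min(
    (sum_squared_residuals(years, salary, a, b), a, b)
    for a in slopes
    for b in intercepts
)

print(best_slope, best_intercept)  # 4.8 36
```

The grid only finds the best pair among the values it contains, so a finer or wider grid may be needed for other data sets.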

The questions that arise at this point are: what if we have more than one independent variable, as is usual in real-world cases? What are the assumptions of linear regression models? How can we find the global minimum in practice?

The answers to these questions are very important for understanding the complete picture of linear regression, and I will try to answer them in upcoming posts.

You can reach me at: https://www.linkedin.com/in/bassemessam/
