Gradient Descent: Simply Explained With A Tutorial

Bassem Essam
4 min read · Mar 3, 2021

In the previous blog post, Linear Regression, a general overview of simple linear regression was given. Now it’s time to learn how to train your simple linear regression model and get the line that best fits your data set.

Gradient Descent is simply a technique for finding the point of minimum error (the sum of squared residuals), which corresponds to the coefficient (a) and intercept (b) of the best-fit line in the line equation y = ax + b.
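For reference, that idea can be written as an explicit gradient descent update on a and b. The snippet below is only a minimal sketch of the update rule, not the approach used in this tutorial; the learning rate lr and number of steps are illustrative, and for unscaled data like the salary set they would need tuning.

import numpy as np

def gradient_descent(x, y, lr=0.0001, n_steps=1000):
    a, b = 0.0, 0.0
    for _ in range(n_steps):
        residuals = y - (a * x + b)          # errors of the current line y = a*x + b
        grad_a = -2 * np.sum(x * residuals)  # derivative of the sum of squared residuals w.r.t. a
        grad_b = -2 * np.sum(residuals)      # derivative of the sum of squared residuals w.r.t. b
        a -= lr * grad_a                     # step downhill
        b -= lr * grad_b
    return a, b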

Let’s re-invent the wheel and determine the coefficients of our linear regression model with a few lines of code. After that, we will compare our result with the output of scikit-learn’s linear regression model.

1- Finding coefficients of simple linear regression

First, we will import a simple data set of the salaries of a group of engineers versus their years of experience. Our linear model should predict the salary for any new engineer data point.
The data set can be downloaded from the link in the References section below.

import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
df=pd.read_csv("Salary_Data.csv")

Now let’s see a plot of our data points. The plot shows that there is a linear relationship between years of experience and salary.

plt.scatter(df['YearsExperience'],df['Salary'])
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()

Next, let’s draw 100 trial lines through our data points and calculate the sum of squared residuals for each one.

slopes=np.linspace(-20000,20000,100) # slopes of the trial lines
x_line=np.linspace(0,12,100) # x-axis scale
intercept=np.linspace(-1*(df['Salary'].mean()),df['Salary'].mean(),100) # the intercepts of our lines vary around the mean of the y data points
ss_residuals=[] # a list to store the sum of squared residuals for each line
for i,slope in enumerate(slopes):
    y_line=slope*x_line+intercept[i] # the equation of each trial line
    # calculating the sum of squared residuals of each line
    residuals=0
    for j,salary in enumerate(df['Salary']):
        error=(salary-(slope*df['YearsExperience'][j]+intercept[i]))**2
        residuals=residuals+error
    ss_residuals.append(residuals)
    plt.plot(x_line,y_line) # draw every trial line
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.scatter(df['YearsExperience'],df['Salary'])
plt.show()
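As a side note, the same sums of squared residuals can also be computed without the nested loops, using numpy broadcasting. This is an optional sketch, equivalent to the loop above:

x = df['YearsExperience'].to_numpy()                 # shape (n_points,)
y = df['Salary'].to_numpy()
preds = slopes[:, None] * x + intercept[:, None]     # shape (100, n_points), one row per trial line
ss_residuals_vec = ((y - preds) ** 2).sum(axis=1)    # sum of squared residuals for each trial line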

The 100 trial lines that pass through our data points are shown below.

Now that we have the sum of squared residuals for the different lines, let’s plot them against the slopes.

plt.plot(slopes,ss_residuals,'-')
plt.xlabel('slope')
plt.ylabel('Sum of Square Residuals')
plt.show()

From this graph, we can see that the error declines gradually until the global minimum point, and then rises again.

The point with the minimum sum of squared residuals gives the slope of the best-fit line. In order to find the minimum, we use the following lines of code to find the index of this point in the ss_residuals list and the corresponding slope.

# the while loop continues until it reaches the index of the minimum point
j=0
while ss_residuals[j] != min(ss_residuals):
    j=j+1
print(j)

The index of the minimum value in the ss_residuals list is 70.
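The while loop does the job, but since numpy is already imported, np.argmin gives the same index in one line (a small optional shortcut):

j=int(np.argmin(ss_residuals)) # index of the smallest sum of squared residuals
print(j, slopes[j], intercept[j])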

Let’s plot our best-fit line with the data points to view the result.

slopes=np.linspace(-20000,20000,100)
x_line=np.linspace(0,12,100)
intercept=np.linspace(-1*(df['Salary'].mean()),df['Salary'].mean(),100)
y_line1=slopes[70]*x_line+intercept[70] # the best-fit line found by our search
plt.plot(x_line,y_line1,'-b')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.scatter(df['YearsExperience'],df['Salary'])
plt.show()
slopes[70]
>> 8282.828282828283
intercept[70]
>> 31475.989898989894

Great, we have our best-fit line, but is it the best? We should compare our invented wheel with the modern wheel :)

2- Applying the scikit-learn linear model

In scikit-learn it is much easier: we fit our data set to the model and read off the coefficients.

from sklearn.linear_model import LinearRegression
lm=LinearRegression(fit_intercept=True)
X=(np.array(df['YearsExperience'])).reshape(-1,1)
lm.fit(X, df['Salary'])
lm.coef_
>> array([9449.96232146])
lm.intercept_
>> 25792.20019866871
lm.score(X,df['Salary'])
>> 0.9569566641435086
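Since the original goal was to predict the salary of a new engineer, here is a quick usage example of the fitted model (the 5 years of experience is just an illustrative input):

lm.predict(np.array([[5]])) # predicted salary for 5 years of experience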

We can see that both sets of coefficients are reasonably close. Let’s plot both lines together to compare them.

The score of the model (the R² value) is about 0.96, which is great for a best-fit line.

slopes=np.linspace(-20000,20000,100)
x_line=np.linspace(0,12,100)
intercept=np.linspace(-1*(df['Salary'].mean()),df['Salary'].mean(),100)
y_line1=slopes[70]*x_line+intercept[70] # the line found by our search
y_line=9449.96232146*x_line+25792.20019866871 # the scikit-learn line
plt.plot(x_line,y_line,'-r')
plt.plot(x_line,y_line1,'-b')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.scatter(df['YearsExperience'],df['Salary'])
plt.show()

Cool, but the two lines still look a bit different. Our approximation can be improved by increasing the number of trial lines, i.e. a finer grid of slopes and intercepts.
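As a final sanity check (not part of the walkthrough above), the exact least-squares line can also be obtained in closed form with np.polyfit, and it should match the scikit-learn coefficients:

slope_ols, intercept_ols = np.polyfit(df['YearsExperience'], df['Salary'], 1)
print(slope_ols, intercept_ols) # should be close to lm.coef_ and lm.intercept_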

Thank you for reading!

You can reach me at : https://www.linkedin.com/in/bassemessam/

References:

https://www.kaggle.com/karthickveerakumar/salary-data-simple-linear-regression?select=Salary_Data.csv
