# Gradient Descent, Simply Explained With a Tutorial

In the previous blog, Linear Regression, a general overview of simple linear regression was given. Now it's time to learn how to train your simple linear regression model and find the line that fits your data set.

Gradient Descent is simply a technique to find the point of minimum error (the sum of squared residuals), which gives the coefficient (a) and intercept (b) of the best-fit line in the line equation y = ax + b.
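Before walking through the tutorial, here is a minimal sketch of the gradient-descent update itself on a tiny synthetic data set. The variable names, learning rate, and iteration count are illustrative choices, not from the tutorial; the idea is just to show the coefficients moving downhill on the sum of squared residuals.

```python
import numpy as np

# Tiny synthetic data that follows y = 3x + 2, so we know the answer.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3.0 * x + 2.0

a, b = 0.0, 0.0   # start from an arbitrary line
lr = 0.02         # learning rate (step size)

for _ in range(5000):
    y_pred = a * x + b
    # Gradients of the sum of squared residuals with respect to a and b
    grad_a = -2.0 * np.sum(x * (y - y_pred))
    grad_b = -2.0 * np.sum(y - y_pred)
    a -= lr * grad_a
    b -= lr * grad_b

print(round(a, 3), round(b, 3))  # → 3.0 2.0
```

Each step moves (a, b) a small amount opposite to the gradient of the error, so the pair settles at the bottom of the error surface.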

Let's re-invent the wheel and determine the coefficients of our linear regression model with a few lines of code. After that, we will compare our result with the output of scikit-learn's linear regression model.


# 1- Finding coefficients of simple linear regression

First, we will import a simple data set of the salary of a group of engineers versus their years of experience. Our linear model should predict the salary of any new data point of an engineer.

```python
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np

df = pd.read_csv("Salary_Data.csv")
```

Now let's look at a plot of our data points. We can see that there is a linear relationship between years of experience and salary.

```python
plt.scatter(df['YearsExperience'], df['Salary'])
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()
```

First, let's generate 100 trial lines with different slopes and intercepts, and compute the sum of squared residuals for each one.

```python
slopes = np.linspace(-20000, 20000, 100)  # slopes of trial lines
x_line = np.linspace(0, 12, 100)          # x-axis scale
# The intercepts of our lines vary around the mean of the y data points.
intercept = np.linspace(-1 * (df['Salary'].mean()), df['Salary'].mean(), 100)
ss_residuals = []  # a list to store the SS Residuals of each line

for i, slope in enumerate(slopes):
    y_line = slope * x_line + intercept[i]  # the equation of each line
    residuals = 0
    # calculating the SS Residuals of each line
    for j, salary in enumerate(df['Salary']):
        error = (salary - (slope * df['YearsExperience'][j] + intercept[i])) ** 2
        residuals = residuals + error
    ss_residuals.append(residuals)
    plt.plot(x_line, y_line)
    plt.xlabel('Years of Experience')
    plt.ylabel('Salary')

plt.scatter(df['YearsExperience'], df['Salary'])
plt.show()
```

The 100 trial lines that pass through our data points are shown below.

Now that we have the sum of squared residuals for the different lines, let's plot it against the slopes.

```python
plt.plot(slopes, ss_residuals, '-')
plt.xlabel('slope')
plt.ylabel('Sum of Square Residuals')
plt.show()
```

In this graph, we see that the error declines gradually until the global minimum point, and then rises again.

The point with the minimum sum of squared residuals corresponds to the slope of the best-fit line. To find the minimum, we use the following lines of code to locate the index of this point in the ss_residuals list, and from it the corresponding slope.

```python
# the while loop continues until it reaches the minimum point
j = 0
while ss_residuals[j] != min(ss_residuals):
    j = j + 1
print(j)
```

The index of the minimum value in the ss_residuals list is 70.
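As a side note, NumPy can find that index in one call with `np.argmin`, which would replace the while loop above. A quick sketch with a stand-in list (the values here are illustrative, not the actual residuals):

```python
import numpy as np

ss = [9.0, 4.0, 1.0, 2.5, 7.0]  # stand-in residual sums
j = int(np.argmin(ss))          # index of the smallest value
print(j)  # → 2
```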

Let's plot our best-fit line with the data points to view the result.

```python
best_slope = slopes[70]       # slope of the line with minimum SS Residuals
best_intercept = intercept[70]
y_line1 = best_slope * x_line + best_intercept
plt.plot(x_line, y_line1, '-b')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.scatter(df['YearsExperience'], df['Salary'])
plt.show()

best_slope
>> 8282.828282828283
best_intercept
>> 31475.989898989894
```

Great, we have our best-fit line, but is it really the best? We should compare our re-invented wheel with the modern wheel :)
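For reference, the exact minimum of the sum of squared residuals also has a closed form: a = cov(x, y) / var(x) and b = mean(y) − a·mean(x). A minimal sketch on synthetic data (the data and variable names are mine, not the salary data set):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])  # exactly y = 2x + 1

x_mean, y_mean = x.mean(), y.mean()
# slope = covariance of x and y divided by variance of x
a = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b = y_mean - a * x_mean
print(a, b)  # → 2.0 1.0
```

This is what the search over trial lines is approximating, so any gap between our answer and scikit-learn's comes from the coarseness of the grid.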

# 2- Apply scikit-learn Linear model

In scikit-learn it is much easier: we fit our data set to a model and read off the coefficients.

```python
from sklearn.linear_model import LinearRegression

lm = LinearRegression(fit_intercept=True)
X = np.array(df['YearsExperience']).reshape(-1, 1)
lm.fit(X, df['Salary'])

lm.coef_
>> array([9449.96232146])
lm.intercept_
>> 25792.20019866871
lm.score(X, df['Salary'])
>> 0.9569566641435086
```

We can see that the scikit-learn coefficients are in the same range as ours. Let's plot both lines together to compare them.

The score of the model is about 0.95, which indicates a good fit.
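The score reported by scikit-learn is the R² statistic, 1 − SS_res / SS_tot. A minimal sketch of computing it by hand (the numbers here are synthetic, not the salary data):

```python
import numpy as np

y_true = np.array([2.0, 4.0, 6.0, 8.0])
y_pred = np.array([2.5, 3.5, 6.5, 7.5])

ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2 = 1.0 - ss_res / ss_tot
print(r2)  # → 0.95
```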

```python
y_line1 = slopes[70] * x_line + intercept[70]        # our best-fit line
y_line = 9449.96232146 * x_line + 25792.20019866871  # scikit-learn's line
plt.plot(x_line, y_line, '-r')
plt.plot(x_line, y_line1, '-b')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.scatter(df['YearsExperience'], df['Salary'])
plt.show()
```

Cool. The two lines still look somewhat different, but our result can be improved by increasing the number of trial lines.