Linear Regression

Shubham Saket
5 min read · May 18, 2021

Simple & effective…

Linear Regression can be defined as modelling a line that describes the relationship between the response variable and the explanatory variables.

  • Response: The variable we are trying to predict (a continuous variable).
  • Explanatory Variable(s): The input variables in the regression analysis.

Assumptions of Linear Regression:

  • The explanatory variables should be independent of and uncorrelated with each other.
  • The error terms are uncorrelated with each other.
  • The error term has constant variance.

When to use Linear Regression?

  • Graphical Analysis : If a plot of the label against the inputs shows a linear trend.
  • Technical Indicator : If the Pearson correlation coefficient between an input variable and the output variable is near -1 or 1. We can check the correlation coefficient values with a correlation heatmap (see the sketch after this list).
  • To reduce the chances of overfitting, we check which explanatory variables are correlated with each other and remove one of each correlated pair from training.
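As a rough sketch of this check, assuming a pandas DataFrame with made-up columns area, rooms and price (price being the label), the Pearson correlations and the heatmap could be produced like this:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical dataset: the column names and values are only for illustration.
df = pd.DataFrame({
    "area":  [50, 60, 80, 100, 120],
    "rooms": [1, 2, 2, 3, 4],
    "price": [110, 130, 170, 210, 250],
})

# Pearson correlation between every pair of columns.
corr = df.corr(method="pearson")
print(corr)

# Heatmap of the correlation matrix: values near -1 or 1 between an input
# and the label suggest a linear relationship, while a high correlation
# between two inputs suggests dropping one of them before training.
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
plt.show()
```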

How does a line fit?

OLS (Ordinary Least Squares Method):

For each data point, consider the vertical distance l1 between the fitted line and the actual value. For a good estimate we would want l1 to be as small as possible, and similarly for all the data points.

Summing up all the errors gives us an idea of how well the line fits our dataset:

Total Error = Σ_k (y_k − ŷ_k)

where k refers to the kth data point in the set and ŷ_k is the value the line predicts for it.

The problem with this total-error logic is that the negative and positive errors cancel each other out, so we do not get a proper estimate of the error. One idea to solve this issue is to use the absolute value of the difference:

Total Error = Σ_k |y_k − ŷ_k|

  • Can we do better?

Every machine learning algorithm tries to minimize a cost or error function. So if we want to stop the algorithm from making large errors, we can penalize it by multiplying each error by a variable weight: the larger the error, the larger the weight.

In OLS we use the error value itself as the weight, so the equation becomes:

Total Error = Σ_k (y_k − ŷ_k)²

  • Can we increase the power of the error from 2 to 4?

We can increase the power to any even number we want, as long as we can compute the values. There is a trade-off here between the improvement in results and the computational complexity. Squaring the errors provides sufficiently good results.
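To make the comparison concrete, here is a tiny sketch with made-up residuals showing how the plain sum of errors cancels out while the absolute and squared versions do not:

```python
import numpy as np

# Hypothetical residuals (y_actual - y_predicted) for five data points.
errors = np.array([2.0, -2.0, 1.0, -1.0, 3.0])

total_error = errors.sum()              # positives and negatives cancel
total_abs_error = np.abs(errors).sum()  # cancellation fixed
total_sq_error = (errors ** 2).sum()    # larger errors penalized more

print(total_error)      # 3.0  -> misleadingly small
print(total_abs_error)  # 9.0
print(total_sq_error)   # 19.0
```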

To find the minimum of the error, we substitute the equation of the line (ŷ = mx + c) into the total error function and differentiate it with respect to m and c.

After differentiating, we equate the derivatives to zero to get:

m = Σ_k (x_k − x̄)(y_k − ȳ) / Σ_k (x_k − x̄)²

c = ȳ − m·x̄

where x̄ and ȳ are the means of the x and y values.

By substituting the values of m and c into the equation y = mx + c, we get our required regression line.

We compute similarly when there is more than one explanatory variable.
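For the single-variable case, a minimal sketch of the closed-form m and c above could look like this (the data points are made up for illustration):

```python
import numpy as np

# Hypothetical data points.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

x_mean, y_mean = x.mean(), y.mean()

# Closed-form OLS estimates for the line y = m*x + c.
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
c = y_mean - m * x_mean

print(m, c)          # slope and intercept of the fitted line
y_pred = m * x + c   # predictions from the regression line
```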

  • In the OLS method we have to compute over the whole dataset at once, which means the larger the dataset, the more time the computation takes.

Gradient Descent:

An alternative way to regress a line is to update the values of the parameters m and c iteratively.

  • We start by assuming m = 0 and c = 0. Then we calculate the derivatives of the total error with respect to m and c:

    D_m = −2 · Σ_k x_k (y_k − ŷ_k)
    D_c = −2 · Σ_k (y_k − ŷ_k)

  • We update the values:

    m = m − L·D_m
    c = c − L·D_c

Here L is the learning rate, which we keep small so that we do not overshoot. Choosing it is tricky: if we take a very small value of L, the number of iterations increases, while if we take a large value, we might overshoot and never reach the minimum.

The stopping condition for the iteration is when there is no significant change between the previous total error value and the new total error value.
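Putting the update rule and the stopping condition together, a minimal gradient-descent sketch for the same single-variable setting could look like this (the learning rate, tolerance and data are illustrative choices, not fixed values from the post):

```python
import numpy as np

# Hypothetical data points (same as in the OLS sketch above).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

m, c = 0.0, 0.0   # start with m = 0 and c = 0
L = 0.01          # learning rate, kept small to avoid overshooting
prev_error = float("inf")

for _ in range(100_000):
    y_pred = m * x + c

    # Derivatives of the total squared error with respect to m and c.
    d_m = -2.0 * np.sum(x * (y - y_pred))
    d_c = -2.0 * np.sum(y - y_pred)

    # Update step.
    m -= L * d_m
    c -= L * d_c

    # Stop when the total error no longer changes significantly.
    error = np.sum((y - y_pred) ** 2)
    if abs(prev_error - error) < 1e-9:
        break
    prev_error = error

print(m, c)  # should end up close to the closed-form OLS values
```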

Goodness of fit:

After we get our regression line, we can measure how our model is performing using the following measures of fit.

  • R-squared : It is defined as the proportion of the total variance in the dataset that the model explains, i.e. one minus the ratio of the unexplained variance after modelling to the total variance. The total variance is calculated by taking the mean of the data as the predicted value. Its value ranges from 0 to 1.
  • Adjusted R-squared : The value of R-squared increases as we increase the number of parameters. To keep the model simpler and reduce the chance of overfitting, we adjust the R-squared value by penalizing models with a larger number of parameters (see the sketch below).
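A minimal sketch of both measures, assuming y holds the actual values, y_pred the model's predictions and p the number of explanatory variables (all names are illustrative):

```python
import numpy as np

def r_squared(y, y_pred):
    # Residual (unexplained) variance vs. total variance around the mean.
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(y, y_pred, p):
    # Penalize R-squared for the number of explanatory variables p.
    n = len(y)
    r2 = r_squared(y, y_pred)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
```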

I hope this post helped you in learning the basic concepts of Linear Regression.
