Sum of Squared Residuals: closed form

Consider how the input is correlated with the output. The points appear to follow a linear relationship.

The red dashed lines represent the residuals. A large residual means the point is far from the line, while a small residual means it’s close. How do you think we can find the “best” line that minimizes these residuals?

1. Cost Function

The cost function to minimize is the sum of squared errors between the observed (\(y_i\)) and predicted (\(\hat{y}_i\)) values:

Cost = \( \sum_{i=1}^n \big(y_i - \hat{y}_i\big)^2 \)

It is also written as the mean squared error; dividing by \(n\) rescales the cost but does not change the minimizing \(m\) and \(b\):

\( J(m, b) = \frac{1}{n} \sum_{i=1}^n \left( y_i - \hat{y}_i \right)^2 \)

Substitute the predicted value \( \hat{y}_i = mx_i + b \):

Cost = \( \sum_{i=1}^n \big(y_i - (mx_i + b)\big)^2 \)

You can visualize the cost as a 3D surface over the slope and intercept: it forms a bowl whose lowest point corresponds to the best-fitting line.
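To make this concrete, here is a minimal sketch (assuming NumPy and Matplotlib, with a small hypothetical dataset) that evaluates the cost over a grid of slope and intercept values and plots the resulting surface:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical, roughly linear data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

def cost(m, b):
    """Sum of squared residuals for the line y_hat = m*x + b."""
    return np.sum((y - (m * x + b)) ** 2)

# Evaluate the cost over a grid of (m, b) values
ms = np.linspace(0.0, 4.0, 100)
bs = np.linspace(-3.0, 3.0, 100)
M, B = np.meshgrid(ms, bs)
Z = np.array([[cost(m, b) for m in ms] for b in bs])

# The surface is a bowl; its lowest point is the optimal (m, b)
ax = plt.figure().add_subplot(projection="3d")
ax.plot_surface(M, B, Z, cmap="viridis")
ax.set_xlabel("slope m")
ax.set_ylabel("intercept b")
ax.set_zlabel("cost")
plt.show()
```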

2. Partial Derivative with Respect to \(b\)

\( \frac{\partial}{\partial b} \text{Cost} = \frac{\partial}{\partial b} \sum_{i=1}^n \big(y_i - (mx_i + b)\big)^2 \)

Apply the chain rule to the squared term (only the terms involving \(b\) matter):

\( \frac{\partial}{\partial b} \big(y_i - (mx_i + b)\big)^2 = 2 \big(y_i - (mx_i + b)\big) \cdot \frac{\partial}{\partial b} \big(- (mx_i + b)\big) \)
\( \frac{\partial}{\partial b} \big(y_i - (mx_i + b)\big)^2 = 2 \big(y_i - (mx_i + b)\big) \cdot ( -1 ) \)
\( \frac{\partial}{\partial b} \big(y_i - (mx_i + b)\big)^2 = -2 \big(y_i - (mx_i + b)\big) \)

Applying the derivative across the summation:

\( \frac{\partial}{\partial b} \text{Cost} = -2 \sum_{i=1}^n \big(y_i - (mx_i + b)\big) \)

Set the derivative to zero to minimize the cost:

\( \sum_{i=1}^n \big(y_i - (mx_i + b)\big) = 0 \)

Solve for \(b\):

\( \sum_{i=1}^n y_i - \sum_{i=1}^n (mx_i + b) = 0 \)
\( \sum_{i=1}^n y_i = \sum_{i=1}^n (mx_i + b) \)

Expand the right-hand side:

\( \sum_{i=1}^n y_i = m \sum_{i=1}^n x_i + b \sum_{i=1}^n 1 \)
\(\sum_{i=1}^n y_i = m \sum_{i=1}^n x_i + b \cdot n \)
\( b \cdot n = \sum_{i=1}^n y_i - m \sum_{i=1}^n x_i \)
\( b = \frac{\sum_{i=1}^n y_i - m \sum_{i=1}^n x_i}{n} \)

Dividing each sum by \(n\) gives a mean, so this is equivalently \( b = \bar{y} - m\bar{x} \) (used later in the derivation).

Note

\( \frac{\partial}{\partial b} \big(- (mx_i + b)\big) = -1 \)
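As a quick sanity check, the analytic derivative can be compared against a numerical finite-difference approximation; a sketch with the same hypothetical data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
m, b = 1.5, 0.5  # an arbitrary test point

def cost(m, b):
    return np.sum((y - (m * x + b)) ** 2)

# Analytic: dCost/db = -2 * sum(y_i - (m*x_i + b))
analytic = -2 * np.sum(y - (m * x + b))

# Central finite difference in b
eps = 1e-6
numeric = (cost(m, b + eps) - cost(m, b - eps)) / (2 * eps)

print(analytic, numeric)  # the two values should agree closely
```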

3. Partial Derivative with Respect to \(m\)

Now, take the derivative of the cost function with respect to \(m\):

\( \frac{\partial}{\partial m} \text{Cost} = \frac{\partial}{\partial m} \sum_{i=1}^n \big(y_i - (mx_i + b)\big)^2 \)

Apply the chain rule to the squared term (only the terms involving \(m\) matter):

\( \frac{\partial}{\partial m} \big(y_i - (mx_i + b)\big)^2 = 2 \big(y_i - (mx_i + b)\big) \cdot \frac{\partial}{\partial m} \big(- (mx_i + b)\big) \)
\( \frac{\partial}{\partial m} \big(y_i - (mx_i + b)\big)^2 = 2 \big(y_i - (mx_i + b)\big) \cdot (-x_i) \)
\( \frac{\partial}{\partial m} \big(y_i - (mx_i + b)\big)^2 = -2x_i \big(y_i - (mx_i + b)\big) \)

Applying the derivative across the summation:

\( \frac{\partial}{\partial m} \text{Cost} = -2 \sum_{i=1}^n x_i \big(y_i - (mx_i + b)\big) \)

Set the derivative to zero to minimize the cost:

\( \sum_{i=1}^n x_i \big(y_i - (mx_i + b)\big) = 0 \)

Solve for \(m\):

\( \sum_{i=1}^n x_i y_i - m \sum_{i=1}^n x_i^2 - b \sum_{i=1}^n x_i = 0 \)

Rearranging terms:

\[ m \sum_{i=1}^n x_i^2 = \sum_{i=1}^n x_i y_i - b \sum_{i=1}^n x_i \]
\[ m = \frac{\sum_{i=1}^n x_i y_i - b \sum_{i=1}^n x_i}{\sum_{i=1}^n x_i^2} \]

Note

\( \frac{\partial}{\partial m} \big(- (mx_i + b)\big) = -x_i \)
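Setting both partial derivatives to zero yields two linear equations in \(m\) and \(b\) (the normal equations). As a sketch, they can be solved directly as a 2×2 linear system (same hypothetical data as before):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# From dCost/dm = 0:  m*sum(x^2) + b*sum(x) = sum(x*y)
# From dCost/db = 0:  m*sum(x)   + b*n      = sum(y)
A = np.array([[np.sum(x**2), np.sum(x)],
              [np.sum(x),    n]])
rhs = np.array([np.sum(x * y), np.sum(y)])

m, b = np.linalg.solve(A, rhs)
print(m, b)
```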

Let’s look at both \(b\) and \(m\):

\[ b = \frac{\sum_{i=1}^n y_i - m \sum_{i=1}^n x_i}{n} \]
\[ m = \frac{\sum_{i=1}^n x_i y_i - b \sum_{i=1}^n x_i}{\sum_{i=1}^n x_i^2} \]

Each expression depends on the other, so substitute \(b\) into the formula for \(m\) to obtain a closed form that depends only on the data.

Rearrange the numerator of \(m\)

Substituting \( b = \bar{y} - m\bar{x} \) into the numerator:

\( \sum_{i=1}^n x_i y_i - b \sum_{i=1}^n x_i = \sum_{i=1}^n x_i y_i - (\bar{y} - m\bar{x}) \sum_{i=1}^n x_i \)

Expand the numerator

\( \sum_{i=1}^n x_i y_i - (\bar{y} - m\bar{x}) \sum_{i=1}^n x_i = \sum_{i=1}^n x_i y_i - \bar{y} \sum_{i=1}^n x_i + m\bar{x} \sum_{i=1}^n x_i \)

Simplified numerator, using \( \sum_{i=1}^n x_i = n \bar{x} \) (see fundamentals below):

\( \sum_{i=1}^n x_i y_i - \bar{y} \sum_{i=1}^n x_i + m \bar{x} \sum_{i=1}^n x_i = \sum_{i=1}^n x_i y_i - n\bar{x}\bar{y} + mn\bar{x}^2 \)

Key simplification: \( \sum_{i=1}^n x_i y_i - n\bar{x}\bar{y} = \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) \), so the numerator becomes

\( \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) + mn\bar{x}^2 \)

Rearrange the denominator of \(m\)

By the same kind of identity, \( \sum_{i=1}^n x_i^2 = \sum_{i=1}^n (x_i - \bar{x})^2 + n\bar{x}^2 \).

Replacing the numerator and denominator with their rearranged forms:

\( m = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) + mn \bar{x}^2}{\sum_{i=1}^n (x_i - \bar{x})^2 + n \bar{x}^2} \)

How \( mn\bar{x}^2 \) and \( n\bar{x}^2 \) balance each other

Multiply both sides by the denominator:

\( m \sum_{i=1}^n (x_i - \bar{x})^2 + mn\bar{x}^2 = \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) + mn\bar{x}^2 \)

The \( mn\bar{x}^2 \) terms cancel, so they never affect the value of the slope; try any slope, say \(m = 2\) or \(m = 100\), and the result is unchanged. Because these terms balance each other, we can drop them from the slope expression, leaving a form focused on the relationship between the deviations of \(x\) and \(y\) from their respective means:

\( m = \frac{\text{Cov}(x, y)}{\text{Var}(x)} = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2} \)
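A short sketch (same hypothetical data) that computes the slope as covariance over variance, confirms that the dropped \( mn\bar{x}^2 \) and \( n\bar{x}^2 \) terms do not change the result, and cross-checks against NumPy's np.polyfit:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)
x_bar, y_bar = x.mean(), y.mean()

cov = np.sum((x - x_bar) * (y - y_bar))  # numerator: sum of co-deviations
var = np.sum((x - x_bar) ** 2)           # denominator: sum of squared deviations
m = cov / var
b = y_bar - m * x_bar

# Plugging m back into the pre-cancellation expression returns the same slope,
# showing that the m*n*x_bar^2 and n*x_bar^2 terms balance out
m_check = (cov + m * n * x_bar**2) / (var + n * x_bar**2)

print(m, m_check)           # identical
print(np.polyfit(x, y, 1))  # [slope, intercept] from NumPy for comparison
```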

Fundamentals used in the above derivation

Mean of x

\(\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \)

Mean of y

\( \bar{y} = \frac{1}{n} \sum_{i=1}^n y_i \)

Sum of deviations from the mean is zero

\( \sum_{i=1}^n (x_i - \bar{x}) = \sum_{i=1}^n x_i - n\bar{x} = n\bar{x} - n\bar{x} = 0 \)

Equivalently, \( \sum_{i=1}^n x_i = n\bar{x} \), the identity used in the numerator and denominator rearrangements above.
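These identities are easy to confirm numerically; a minimal sketch with hypothetical values:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x_bar = x.mean()

print(np.sum(x), len(x) * x_bar)  # sum of x_i equals n * x_bar
print(np.sum(x - x_bar))          # sum of deviations from the mean is ~0
```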

Apply the closed form to find a line fitting linearly correlated points

Github notebook
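For reference, here is a self-contained sketch of applying the closed form to noisy, linearly correlated points (assuming NumPy and Matplotlib; the generating line y = 2x + 1 and the noise level are hypothetical choices for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt

# Generate linearly correlated points: y = 2x + 1 plus Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

# Closed-form solution derived above
x_bar, y_bar = x.mean(), y.mean()
m = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b = y_bar - m * x_bar
print(f"fitted line: y = {m:.3f}x + {b:.3f}")  # should be close to y = 2x + 1

# Plot the data and the fitted line
plt.scatter(x, y, s=15, label="data")
plt.plot(x, m * x + b, "r--", label="closed-form fit")
plt.legend()
plt.show()
```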
