Sum of Squared Residuals: closed form

Consider how the input is correlated with the output. The relationship looks linear.

The red dashed lines represent the residuals. A large residual means the point is far from the line, while a small residual means it’s close. How do you think we can find the “best” line that minimizes these residuals?

1. Cost Function

The cost function to minimize is the sum of squared errors between the observed (\(y_i\)) and predicted (\(\hat{y}_i\)) values:

Cost = \( \sum_{i=1}^n \big(y_i - \hat{y}_i\big)^2 \)

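It is often averaged over the \(n\) points (the mean squared error). Dividing by \(n\) does not change which \(m\) and \(b\) minimize it: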

\( J(m, b) = \frac{1}{n} \sum_{i=1}^n \left( y_i - \hat{y}_i \right)^2 \)

Substitute the predicted value \( \hat{y}_i = mx_i + b \):

Cost = \( \sum_{i=1}^n \big(y_i - (mx_i + b)\big)^2 \)
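As a minimal sketch of the cost above, the function below evaluates the sum of squared residuals for a candidate line. The function name `sum_squared_residuals` and the sample data are assumptions made for illustration; they are not from the original post.

```python
def sum_squared_residuals(xs, ys, m, b):
    """Return the sum of squared residuals for the line y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical sample data that roughly follows y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 9.0, 10.8]

# A line close to the data gives a small cost; a poor line gives a large one.
cost_good = sum_squared_residuals(xs, ys, m=2.0, b=1.0)
cost_bad = sum_squared_residuals(xs, ys, m=0.5, b=0.0)
print(cost_good, cost_bad)
```

The closer the candidate line is to the points, the smaller the returned value, which is exactly the quantity the rest of the derivation minimizes.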

You can visualize the cost as a surface in 3D, plotted against the slope and the intercept
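To get a feel for that surface without plotting, one option is to evaluate the cost on a small grid of slopes and intercepts and look for the lowest point. This is only a sketch: the grid ranges and the data (points lying exactly on y = 2x + 1) are assumptions chosen so the minimum is easy to recognize.

```python
def cost(xs, ys, m, b):
    """Sum of squared residuals for the line y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

slopes = [i * 0.5 for i in range(0, 9)]        # 0.0, 0.5, ..., 4.0
intercepts = [i * 0.5 for i in range(-2, 7)]   # -1.0, -0.5, ..., 3.0

# The "surface": one cost value per (slope, intercept) pair.
surface = {(m, b): cost(xs, ys, m, b) for m in slopes for b in intercepts}

# The grid point with the lowest cost sits at the bottom of the bowl.
best = min(surface, key=surface.get)
print(best, surface[best])
```

A grid search like this only approximates the minimum in general; the derivation that follows finds it exactly.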

2. Partial Derivative with Respect to \(b\)

\( \frac{\partial}{\partial b} \text{Cost} = \frac{\partial}{\partial b} \sum_{i=1}^n \big(y_i - (mx_i + b)\big)^2 \)

Apply the chain rule to each squared term (only the terms involving \(b\) contribute to the inner derivative):

\( \frac{\partial}{\partial b} \big(y_i - (mx_i + b)\big)^2 = 2 \big(y_i - (mx_i + b)\big) \cdot \frac{\partial}{\partial b} \big(- (mx_i + b)\big) \)

\( \frac{\partial}{\partial b} \big(y_i - (mx_i + b)\big)^2 = 2 \big(y_i - (mx_i + b)\big) \cdot ( -1 ) \)

\( \frac{\partial}{\partial b} \big(y_i - (mx_i + b)\big)^2 = -2 \big(y_i - (mx_i + b)\big) \)

Applying derivative across summation

\( \frac{\partial}{\partial b} \text{Cost} = -2 \sum_{i=1}^n \big(y_i - (mx_i + b)\big) \)

Set the derivative to 0 to minimize the cost

\( \sum_{i=1}^n \big(y_i - (mx_i + b)\big) = 0 \)

Solve for b

\( \sum_{i=1}^n y_i - \sum_{i=1}^n (mx_i + b) = 0 \)

\( \sum_{i=1}^n y_i = \sum_{i=1}^n (mx_i + b) \)

Expand the right hand side

\( \sum_{i=1}^n y_i = m \sum_{i=1}^n x_i + b \sum_{i=1}^n 1 \)

\(\sum_{i=1}^n y_i = m \sum_{i=1}^n x_i + b \cdot n \)

\( b \cdot n = \sum_{i=1}^n y_i - m \sum_{i=1}^n x_i \)

\( b = \frac{\sum_{i=1}^n y_i - m \sum_{i=1}^n x_i}{n} \)

Note

\( \frac{\partial}{\partial b} \big(- (mx_i + b)\big) = -1 \)
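The result for \(b\) can be checked numerically: for any fixed slope, choosing the intercept from the formula above should make the residuals sum to zero. The sketch below assumes a hypothetical helper name `intercept_for_slope` and made-up data.

```python
def intercept_for_slope(xs, ys, m):
    """Intercept that minimizes the cost for a fixed slope m."""
    n = len(xs)
    return (sum(ys) - m * sum(xs)) / n


# Hypothetical sample data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 9.0, 10.8]

m = 1.5  # an arbitrary slope, not necessarily the best one
b = intercept_for_slope(xs, ys, m)

# With this b, the derivative condition sum(y_i - (m*x_i + b)) = 0 holds.
residual_sum = sum(y - (m * x + b) for x, y in zip(xs, ys))
print(b, residual_sum)
```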

3. Partial Derivative with Respect to \(m\)

Now, take the derivative of the cost function with respect to \(m\):

\( \frac{\partial}{\partial m} \text{Cost} = \frac{\partial}{\partial m} \sum_{i=1}^n \big(y_i - (mx_i + b)\big)^2 \)

Apply the chain rule to each squared term (only the terms involving \(m\) contribute to the inner derivative):

\( \frac{\partial}{\partial m} \big(y_i - (mx_i + b)\big)^2 = 2 \big(y_i - (mx_i + b)\big) \cdot \frac{\partial}{\partial m} \big(- (mx_i + b)\big) \)

\( \frac{\partial}{\partial m} \big(y_i - (mx_i + b)\big)^2 = 2 \big(y_i - (mx_i + b)\big) \cdot (-x_i) \)

\( \frac{\partial}{\partial m} \big(y_i - (mx_i + b)\big)^2 = -2x_i \big(y_i - (mx_i + b)\big) \)

Applying derivative across summation

\( \frac{\partial}{\partial m} \text{Cost} = -2 \sum_{i=1}^n x_i \big(y_i - (mx_i + b)\big) \)

Set the derivative to 0 to minimize the cost

\( \sum_{i=1}^n x_i \big(y_i - (mx_i + b)\big) = 0 \)

Solve for m by distributing \(x_i\) over each term

\(\sum_{i=1}^n x_i y_i - \sum_{i=1}^n mx_i^2 - \sum_{i=1}^n bx_i = 0\)

Rearranging terms

\[ \tiny m \sum_{i=1}^n x_i^2 = \sum_{i=1}^n x_i y_i - b \sum_{i=1}^n x_i \]

\[ \tiny m = \frac{\sum_{i=1}^n x_i y_i - b \sum_{i=1}^n x_i}{\sum_{i=1}^n x_i^2} \]

Note

\( \frac{\partial}{\partial m} \big(- (mx_i + b)\big) = -x_i \)
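The same kind of check works for \(m\): for any fixed intercept, the slope given by the formula above should make the \(x\)-weighted residuals sum to zero. The helper name `slope_for_intercept` and the data are assumptions for illustration.

```python
def slope_for_intercept(xs, ys, b):
    """Slope that minimizes the cost for a fixed intercept b."""
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return (sum_xy - b * sum(xs)) / sum_xx


# Hypothetical sample data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 9.0, 10.8]

b = 0.5  # an arbitrary intercept, not necessarily the best one
m = slope_for_intercept(xs, ys, b)

# With this m, the condition sum(x_i * (y_i - (m*x_i + b))) = 0 holds.
weighted_residual_sum = sum(x * (y - (m * x + b)) for x, y in zip(xs, ys))
print(m, weighted_residual_sum)
```

Each formula still depends on the other unknown, which is why the next step combines them.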

Let’s look at both \(b\) and \(m\)

\[ \tiny b = \frac{\sum_{i=1}^n y_i - m \sum_{i=1}^n x_i}{n} \]

\[ \tiny m = \frac{\sum_{i=1}^n x_i y_i - b \sum_{i=1}^n x_i}{\sum_{i=1}^n x_i^2} \]

Rearrange numerator of \(m\)

Replace \(b\) with \( \bar{y} - m\bar{x} \), which is the expression for \(b\) above with each term of the numerator divided by \(n\)

\( \sum_{i=1}^n x_i y_i - b \sum_{i=1}^n x_i = \sum_{i=1}^n x_i y_i - (\bar{y} - m\bar{x}) \sum_{i=1}^n x_i \)

Expand the numerator

\( \sum_{i=1}^n x_i y_i - (\bar{y} - m\bar{x}) \sum_{i=1}^n x_i = \sum_{i=1}^n x_i y_i - \bar{y} \sum_{i=1}^n x_i + m\bar{x} \sum_{i=1}^n x_i \)

Simplified numerator

\( \sum_{i=1}^n x_i y_i - \bar{y} \sum_{i=1}^n x_i + m \bar{x} \sum_{i=1}^n x_i \)

Since \( \sum_{i=1}^n x_i = n \bar{x} \) (see the fundamentals below), this becomes

\( \sum_{i=1}^n x_i y_i - n\bar{x}\bar{y} + m n \bar{x}^2 \)

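Key Simplifications

Expanding the product of deviations and using \( \sum_{i=1}^n x_i = n\bar{x} \) and \( \sum_{i=1}^n y_i = n\bar{y} \) gives

\( \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^n x_i y_i - \bar{y} \sum_{i=1}^n x_i - \bar{x} \sum_{i=1}^n y_i + n\bar{x}\bar{y} = \sum_{i=1}^n x_i y_i - n\bar{x}\bar{y} \)

so the numerator equals \( \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) + mn\bar{x}^2 \)

Rearrange denominator of \(m\)

\( \sum_{i=1}^n (x_i - \bar{x})^2 = \sum_{i=1}^n x_i^2 - 2\bar{x} \sum_{i=1}^n x_i + n\bar{x}^2 = \sum_{i=1}^n x_i^2 - n\bar{x}^2 \)

so the denominator is \( \sum_{i=1}^n x_i^2 = \sum_{i=1}^n (x_i - \bar{x})^2 + n\bar{x}^2 \)

Replace with the rearranged numerator and denominator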

\( m = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) + mn \bar{x}^2}{\sum_{i=1}^n (x_i - \bar{x})^2 + n \bar{x}^2} \)
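The two identities behind this rearrangement can be confirmed numerically. The sketch below, on assumed sample data, compares the raw sums with their mean-centered forms; the variable names are illustrative only.

```python
# Hypothetical sample data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 9.0, 10.8]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Raw sums, as they appear in the first expression for m.
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_xx = sum(x * x for x in xs)

# Mean-centered sums, as they appear in the rearranged expression.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)

# Both differences should be zero up to floating-point rounding.
numerator_gap = (sum_xy - n * x_bar * y_bar) - s_xy
denominator_gap = sum_xx - (s_xx + n * x_bar ** 2)
print(numerator_gap, denominator_gap)
```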

How \( mn\bar{x}^2 \) and \( n\bar{x}^2 \) balance each other

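Let’s take two examples, one with slope 2 and one with slope 100. In both cases the terms \( mn\bar{x}^2 \) and \( n\bar{x}^2 \) don’t affect the value of the slope.

To see why, multiply both sides of the expression for \(m\) by its denominator: \( m \sum_{i=1}^n (x_i - \bar{x})^2 + mn\bar{x}^2 = \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) + mn\bar{x}^2 \). The term \( mn\bar{x}^2 \) appears on both sides and cancels, so \( mn\bar{x}^2 \) and \( n\bar{x}^2 \) balance each other. We can therefore drop them from the slope expression, which makes it more intuitive and focused on the relationship between the deviations of x and y from their respective means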
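The two examples can be reproduced with a short sketch. It assumes data lying exactly on lines with slope 2 and slope 100 (the intercept of 1 and the helper name `slope_both_ways` are illustrative choices), and computes the slope with and without the extra terms.

```python
def slope_both_ways(xs, ys):
    """Return the slope without and with the extra n*x_bar^2 terms."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)

    m = s_xy / s_xx                  # slope from deviations only
    extra_top = m * n * x_bar ** 2   # the m*n*x_bar^2 term
    extra_bottom = n * x_bar ** 2    # the n*x_bar^2 term
    m_full = (s_xy + extra_top) / (s_xx + extra_bottom)
    return m, m_full


xs = [1.0, 2.0, 3.0, 4.0]

# Example 1: points on y = 2x + 1. Example 2: points on y = 100x + 1.
m2, m2_full = slope_both_ways(xs, [2 * x + 1 for x in xs])
m100, m100_full = slope_both_ways(xs, [100 * x + 1 for x in xs])
print(m2, m2_full)
print(m100, m100_full)
```

In both cases the two values agree, because the added terms scale the numerator and denominator by the same factor.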

\( m = \frac{\text{Cov}(x, y)}{\text{Var}(x)} = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2} \)

Fundamentals used in above derivation

Mean of x

\(\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \)

Mean of y

\( \quad \bar{y} = \frac{1}{n} \sum_{i=1}^n y_i \)

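Sum of deviations from the mean is zero

\( \sum_{i=1}^n (x_i - \bar{x}) = \sum_{i=1}^n x_i - n\bar{x} = 0 \), which is the same as \( \sum_{i=1}^n x_i = n\bar{x} \)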

Apply Closed Form to find a line fitting linearly correlated points

Github notebook
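The linked notebook is not reproduced here. As a stand-in, the following is a minimal sketch of the closed form derived above, written in plain Python. The function name `closed_form_fit` and the sample data are assumptions for illustration, not the notebook's contents.

```python
def closed_form_fit(xs, ys):
    """Fit y = m*x + b by minimizing the sum of squared residuals.

    Uses m = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
    and  b = y_bar - m * x_bar.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are identical; slope is undefined")

    m = s_xy / s_xx
    b = y_bar - m * x_bar
    return m, b


# Hypothetical, roughly linear data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 9.0, 10.8]

m, b = closed_form_fit(xs, ys)
print(f"slope={m:.4f}, intercept={b:.4f}")

# Residuals of the fitted line sum to zero, as the derivative with
# respect to b requires.
residual_sum = sum(y - (m * x + b) for x, y in zip(xs, ys))
```

The guard against identical x values reflects the denominator of the slope formula: with no spread in x, the variance is zero and no unique line exists.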
