The analysis-of-variance (ANOVA) approach, whose main purpose is to assess the quality of the estimated regression, is based on the so-called partition of the sums of squares, whose formula is as follows [Walpole et al., p. 415]:
\[ \sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 \hspace{1cm} (1.1) \]
In short form, the above formula is written as

SST = SSR + SSE

where SST is the total sum of squares, SSR is the regression sum of squares, and SSE is the error sum of squares.
The purpose of this short article is to provide the proof for the above formula — the so-called partition of sums of squares — for the case of regression involving a single independent variable x.
First of all, we start from the expansion of the left-hand side of formula (1.1),
\[ \sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n}(y_i - \hat{y}_i + \hat{y}_i - \bar{y})^2 = \sum_{i=1}^{n}\left[(\hat{y}_i - \bar{y}) + \hat{\epsilon}_{i}\right]^2 \hspace{1cm} (1.2) \]
where the residuals are indicated by the epsilon character, \(\hat{\epsilon}_i = y_i - \hat{y}_i\). Recall that the formula for the fitted regression line (involving a single independent variable x) is
\[ \hat{y} = a + bx \hspace{1cm} (1.3) \]
Parameters a and b are estimated by the method of least squares, which minimizes the error sum of squares SSE; at the minimum, the derivatives of SSE with respect to a and b are both equal to zero:
\[ SSE = \sum_{i=1}^{n}\hat{\epsilon}_{i}^2 = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \sum_{i=1}^{n}(y_i - a - bx_i)^2 \hspace{1cm} (1.4) \]

\[ \frac{\partial (SSE)}{\partial a} = -2 \sum_{i=1}^{n}(y_i - a - bx_i) = -2 \sum_{i=1}^{n} \hat{\epsilon}_i = 0 \implies \bar{y} = a + b\bar{x} \hspace{1cm} (1.5) \]

\[ \frac{\partial (SSE)}{\partial b} = -2 \sum_{i=1}^{n}(y_i - \hat{y}_i)x_i = -2 \sum_{i=1}^{n} \hat{\epsilon}_i x_i = 0 \implies \sum_{i=1}^{n} \hat{\epsilon}_i x_i = 0 \hspace{1cm} (1.6) \]
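For completeness, solving the normal equations (1.5) and (1.6) jointly gives the familiar closed-form least-squares estimates (a standard result, not derived explicitly in this article):

```latex
\[ b = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad a = \bar{y} - b\bar{x} \]
```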
Equation (1.2) can be rewritten, expanding the square, as:

\[ \sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n}\hat{\epsilon}_{i}^2 + 2 \sum_{i=1}^{n} (\hat{y}_i - \bar{y})\hat{\epsilon}_{i} \hspace{1cm} (1.7) \]

Substituting \(\hat{y}_i = a + bx_i\) from (1.3), the cross term becomes

\[ 2 \sum_{i=1}^{n} (\hat{y}_i - \bar{y})\hat{\epsilon}_{i} = 2(a - \bar{y}) \sum_{i=1}^{n} \hat{\epsilon}_{i} + 2b \sum_{i=1}^{n} x_i \hat{\epsilon}_{i} \]
Recalling equation (1.6),

\[ \sum_{i=1}^{n} x_i \hat{\epsilon}_{i} = 0 \]
Furthermore, equation (1.5) implies that the sum of the residuals is exactly zero:

\[ \sum_{i=1}^{n} \hat{\epsilon}_{i} = \sum_{i=1}^{n} (y_i - a - bx_i) = 0 \]
We replace those values in (1.7): the cross term vanishes and we get

\[ \sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n}\hat{\epsilon}_{i}^2 \]
which is exactly what we wanted to prove.
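As a sanity check, the identity can also be verified numerically. The sketch below (the data points are illustrative, not taken from the article) fits a least-squares line in plain Python and confirms that SST equals SSR plus SSE to machine precision:

```python
# Numerical check of the partition of sums of squares, SST = SSR + SSE,
# for a simple least-squares regression. Data values are illustrative only.

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 5.8]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares estimates obtained from the normal equations (1.5)-(1.6):
# b = Sxy / Sxx and a = y_bar - b * x_bar
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) \
    / sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar

y_hat = [a + b * x for x in xs]  # fitted values

sst = sum((y - y_bar) ** 2 for y in ys)               # total sum of squares
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)          # regression sum of squares
sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # error sum of squares

print(f"SST       = {sst:.6f}")
print(f"SSR + SSE = {ssr + sse:.6f}")
assert abs(sst - (ssr + sse)) < 1e-9  # the partition holds
```

The two residual conditions used in the proof, \(\sum \hat{\epsilon}_i = 0\) and \(\sum x_i \hat{\epsilon}_i = 0\), can be checked on the same fit in exactly the same way.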
References
- Walpole R.E., Myers R.H., Myers S.L., Ye K. Probability & Statistics for Engineers & Scientists, Eighth Edition. Pearson Prentice Hall, 2007. ISBN 0-13-187711-9.

