Briefly, ŷ=a+bx is the regression equation fitted to a set of (x,y) data pairs, so ŷ is the estimated output value for a given input value x. The hat symbol distinguishes an estimated value from an actual one: the y values in the dataset are observed values, whereas ŷ is the value the fitted line predicts.
So how are a and b calculated? Here are the formulas:
Slope (gradient, or rate of change): b = (n∑xᵢyᵢ − ∑xᵢ∑yᵢ) / (n∑xᵢ² − (∑xᵢ)²).
Intercept: a = (∑yᵢ − b∑xᵢ) / n.
Every sum runs over 1 ≤ i ≤ n, where n is the number of data pairs.
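The two formulas translate almost directly into code. The following is a minimal sketch in plain Python; the function name `fit_line` and its argument names are illustrative choices, not something the text defines.

```python
def fit_line(xs, ys):
    """Return (a, b) for the line y-hat = a + b*x, using the summation formulas."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # Denominator of the slope formula: n*sum(x^2) - (sum(x))^2
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical, so the slope is undefined")

    b = (n * sum_xy - sum_x * sum_y) / denom  # slope
    a = (sum_y - b * sum_x) / n               # intercept
    return a, b
```

Note that the intercept is computed from the slope, so b has to be evaluated first.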
We need a table to illustrate this:
| x | y | x² | xy |
|---|---|---|---|
| 53 | 68 | 2809 | 3604 |
| 112 | 196 | 12544 | 21952 |
| 435 | 272 | 189225 | 118320 |
| 509 | 344 | 259081 | 175096 |
| 642 | 396 | 412164 | 254232 |
| 955 | 452 | 912025 | 431660 |
| ∑xᵢ = 2706 | ∑yᵢ = 1728 | ∑xᵢ² = 1787848 | ∑xᵢyᵢ = 1004864 |
| (∑xᵢ)² = 7322436 | ∑xᵢ∑yᵢ = 4675968 | n∑xᵢ² = 10727088 | n∑xᵢyᵢ = 6029184 |
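The columns and totals in the table can be rechecked with a few lines of Python. This is only a sketch: the variable names and the `totals` dictionary are illustrative, and the data are the six (x, y) pairs listed in the table.

```python
xs = [53, 112, 435, 509, 642, 955]
ys = [68, 196, 272, 344, 396, 452]
n = len(xs)

# Per-row columns of the table
x_squared = [x * x for x in xs]
xy = [x * y for x, y in zip(xs, ys)]

# Totals and derived quantities from the last two rows of the table
totals = {
    "sum_x": sum(xs),
    "sum_y": sum(ys),
    "sum_x2": sum(x_squared),
    "sum_xy": sum(xy),
}
totals["sum_x_squared"] = totals["sum_x"] ** 2              # (sum of x)^2
totals["sum_x_times_sum_y"] = totals["sum_x"] * totals["sum_y"]
totals["n_sum_x2"] = n * totals["sum_x2"]
totals["n_sum_xy"] = n * totals["sum_xy"]

for name, value in totals.items():
    print(f"{name} = {value}")
```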
Substituting these figures into the formulas with n=6 gives b=(6029184−4675968)/(10727088−7322436)=1353216/3404652≈0.3975 and, using the unrounded value of b, a=(1728−b×2706)/6≈108.7451.
Take x=435: then ŷ=108.7451+0.3975×435≈282, compared with the observed value y=272.
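The same check can be written as a small prediction helper. The sketch below uses the rounded coefficients quoted in the text; the names `predict` and `residual` are illustrative.

```python
A = 108.7451  # intercept, rounded as in the text
B = 0.3975    # slope, rounded as in the text

def predict(x, a=A, b=B):
    """Estimated output y-hat = a + b*x for a given input x."""
    return a + b * x

x_value = 435
observed_y = 272
y_hat = predict(x_value)

# Residual: observed value minus estimated value
residual = observed_y - y_hat

print(f"y-hat({x_value}) = {y_hat:.2f}, observed = {observed_y}, residual = {residual:.2f}")
```

A negative residual means the fitted line overestimates y at that point.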
(The correlation coefficient is r≈0.952, which indicates a strong positive linear relationship between x and y.)
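The text does not say how r was obtained. Assuming it is the usual Pearson product-moment correlation coefficient, it can be computed from the same kind of sums, plus ∑yᵢ². The function name `pearson_r` below is an illustrative choice.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from raw sums."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when x or y is constant")
    return numerator / denominator

xs = [53, 112, 435, 509, 642, 955]
ys = [68, 196, 272, 344, 396, 452]
print(f"r = {pearson_r(xs, ys):.3f}")
```

The numerator is the same as in the slope formula, which is why r and b always share the same sign.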