| Content | Guidance, Clarification & Syllabus Links |
|---|---|
| Equation of the regression line of x on y. | Students should know the form of the regression line and understand what it represents. |
| Use of the equation for prediction purposes. | Students should be aware they cannot reliably predict y from x using the x-on-y equation. |
1. Understanding the Regression Line of x on y
The regression line of x on y is a mathematical model used when our goal is to estimate or predict values of x given values of y.
It takes the form:
x = a + by
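This line is derived using the least squares principle: it minimises the sum of the squared differences between the observed x-values and the x-values predicted by the model. On a scatter diagram with x on the horizontal axis, these differences are horizontal distances from the points to the line, not vertical ones.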
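For reference, the least squares coefficients of the line x = a + by can be written as below. The symbols S_xy and S_yy are notation assumed here for illustration; they do not appear in the notes above, and a formula booklet may use different symbols.

```latex
b = \frac{S_{xy}}{S_{yy}}, \qquad a = \bar{x} - b\,\bar{y},
\qquad \text{where} \quad
S_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y}), \quad
S_{yy} = \sum_i (y_i - \bar{y})^2 .
```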
It is essential to understand that:
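- this line minimises errors in predicting x, not y;
- the slope b depends on the correlation between x and y (b = r × s_x / s_y, where r is the correlation coefficient and s_x, s_y are the standard deviations of x and y);
- the two regression lines are different in general: the regression line of x on y is not just a rearrangement of the regression line of y on x (they coincide only when the correlation is perfect, r = ±1).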
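The points above can be checked numerically. The following is a minimal sketch, assuming a small made-up dataset (the values of `xs` and `ys` are hypothetical and chosen only for illustration); the function names are also illustrative.

```python
from statistics import mean

def regression_x_on_y(xs, ys):
    """Least-squares line x = a + b*y (minimises squared errors in x)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    b = s_xy / s_yy
    a = x_bar - b * y_bar
    return a, b

def regression_y_on_x(xs, ys):
    """Least-squares line y = c + d*x, obtained by swapping the roles of x and y."""
    return regression_x_on_y(ys, xs)

# Hypothetical paired data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = regression_x_on_y(xs, ys)  # x on y: x = a + b*y
c, d = regression_y_on_x(xs, ys)  # y on x: y = c + d*x

# Rearranging the x-on-y line gives y = -a/b + (1/b)*x.
# Its slope 1/b differs from d unless the correlation is perfect.
print("x on y:", a, b)
print("y on x:", c, d)
print("slope of rearranged x-on-y line:", 1 / b)
```

For this dataset the rearranged x-on-y line has slope 1, while the y-on-x line has slope about 0.6, so the two lines are not the same line.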

https://upload.wikimedia.org/wikipedia/commons/b/be/Normdist_regression.png
Suppose we have paired data values (x, y).
If the regression line of x on y is found to be:
x = 3.2 + 1.5y
Then for y = 4 we estimate:
x = 3.2 + 1.5(4) = 9.2
2. Using the Regression Line for Predictions
The purpose of the regression line of x on y is to estimate the value of x for a given value of y.
However, predictions are only meaningful under certain conditions:
- The relationship between x and y must be approximately linear.
- The value of y used for prediction must lie within the range of the observed data (to avoid extrapolation).
- The regression line of x on y must only be used to predict x, not y.
A common mistake in exams is using the x on y regression line to estimate y.
If the question asks for a predicted y-value, the regression line of y on x must be used instead.
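To see why this matters, here is a rough sketch comparing the two approaches on a small made-up dataset (the data and the helper name `fit_line` are assumptions for illustration only). Rearranging the x on y line to solve for y gives a different answer from the y on x line.

```python
from statistics import mean

def fit_line(inputs, outputs):
    """Least-squares line: outputs = intercept + slope * inputs."""
    in_bar, out_bar = mean(inputs), mean(outputs)
    s_io = sum((i - in_bar) * (o - out_bar) for i, o in zip(inputs, outputs))
    s_ii = sum((i - in_bar) ** 2 for i in inputs)
    slope = s_io / s_ii
    intercept = out_bar - slope * in_bar
    return intercept, slope

# Hypothetical paired data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(ys, xs)  # x on y: x = a + b*y
c, d = fit_line(xs, ys)  # y on x: y = c + d*x

x_given = 5
wrong_y = (x_given - a) / b  # misuse: x-on-y line rearranged to give y
right_y = c + d * x_given    # correct: y-on-x line

print("rearranged x-on-y estimate of y:", wrong_y)
print("y-on-x estimate of y:", right_y)
```

With this data the two estimates differ noticeably (roughly 6 versus 5.2), and only the second minimises the errors in y.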
3. The Danger of Extrapolation
Extrapolation occurs when we use values of y that are outside the observed data range.
The regression model is only valid within the limits of the data we have collected.
Beyond that range, predictions become unreliable because:
- the relationship may not remain linear;
- future trends may change;
- the model cannot account for new behaviour outside the dataset.
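One simple safeguard is to check whether the y-value lies inside the observed range before trusting an estimate. The sketch below uses the example line x = 3.2 + 1.5y from earlier; the observed y-values and the function name `predict_x` are hypothetical, chosen only to illustrate the check.

```python
def predict_x(a, b, y, observed_ys):
    """Estimate x from the line x = a + b*y.

    Returns (estimate, within_range), where within_range is False
    when y lies outside the observed data (extrapolation).
    """
    y_min, y_max = min(observed_ys), max(observed_ys)
    within_range = y_min <= y <= y_max
    return a + b * y, within_range

# Hypothetical observed y-values, for illustration only
observed_ys = [1, 2, 4, 6, 7]
a, b = 3.2, 1.5  # coefficients of the example line x = 3.2 + 1.5y

estimate_in, ok_in = predict_x(a, b, 4, observed_ys)     # y = 4 is inside [1, 7]
estimate_out, ok_out = predict_x(a, b, 12, observed_ys)  # y = 12 is outside [1, 7]

for y, est, ok in [(4, estimate_in, ok_in), (12, estimate_out, ok_out)]:
    label = "interpolation" if ok else "extrapolation, treat with caution"
    print(f"y = {y}: x is about {est:.1f} ({label})")
```

The line still returns a number for y = 12, which is exactly the danger: the arithmetic works even when the model no longer applies.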
Regression lines are commonly used in economics for forecasting growth or inflation.
Misusing regression lines beyond data bounds can lead to poor financial predictions and failed policy decisions.
Regression analysis is excellent for an IA if you investigate a real dataset.
Make sure to:
- justify why a linear model is suitable,
- assess whether the correlation is strong enough,
- discuss limitations such as extrapolation.
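As a rough aid for the correlation point, the product-moment correlation coefficient r can be computed directly from the data. This is a minimal sketch on a made-up dataset (the values and the function name `pearson_r` are assumptions for illustration); what counts as "strong enough" depends on context and is not decided by the code.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Product-moment correlation coefficient of paired data."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical paired data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
print(f"r = {r:.3f}, r squared = {r * r:.3f}")
```

A value of r close to +1 or -1 supports using a linear model; a value near 0 suggests that neither regression line will give useful predictions.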
To what extent can prediction models provide knowledge about the future?
Are mathematical predictions inherently reliable, or do they depend on assumptions that may not hold?