SL 4.10 — Regression Line of x on y & Prediction

Content Guidance, Clarification & Syllabus Links
Equation of the regression line of x on y. Students should know the form of the regression line and understand what it represents.
Use of the equation for prediction purposes. Students should be aware they cannot reliably predict y from x using the x-on-y equation.

1. Understanding the Regression Line of x on y

The regression line of x on y is a mathematical model used when our goal is to estimate or predict values of x given values of y.
It takes the form:

x = a + by

This line is derived using the least squares principle: the coefficients a and b are chosen to minimise the sum of the squared horizontal distances (measured in the x-direction) between the observed x-values and the x-values predicted by the model.
It is essential to understand that:

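  • this line minimises errors in predicting x, not y;
  • the slope b depends on the correlation between x and y: b = r·(s_x / s_y), where r is the correlation coefficient and s_x, s_y are the standard deviations of x and y;
  • the regression line is not symmetric: in general, the regression line of x on y is not the same line as the regression line of y on x (they coincide only when the correlation is perfect), so one cannot be obtained by rearranging the other.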
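
The points above can be checked numerically. The following is a minimal sketch, not a prescribed method: it assumes the standard least-squares formulas (slope = S_xy / S_yy for x on y, intercept = x̄ − b·ȳ), and the data values and the helper name fit_line are invented purely for illustration.

```python
from statistics import mean

def fit_line(predictor, response):
    """Least-squares line response = a + b * predictor; returns (a, b).

    Hypothetical helper: minimises the sum of squared errors in `response`.
    """
    p_bar, r_bar = mean(predictor), mean(response)
    s_pr = sum((p - p_bar) * (r - r_bar) for p, r in zip(predictor, response))
    s_pp = sum((p - p_bar) ** 2 for p in predictor)
    b = s_pr / s_pp
    a = r_bar - b * p_bar
    return a, b

# Hypothetical paired data, invented for illustration only.
x = [2.0, 4.0, 5.0, 7.0, 9.0]
y = [1.0, 2.0, 4.0, 4.0, 6.0]

a_xy, b_xy = fit_line(y, x)  # x on y: x = a_xy + b_xy * y
a_yx, b_yx = fit_line(x, y)  # y on x: y = a_yx + b_yx * x

print(f"x on y: x = {a_xy:.3f} + {b_xy:.3f}y")
print(f"y on x: y = {a_yx:.3f} + {b_yx:.3f}x")

# If the two lines were the same line, 1 / b_xy would equal b_yx.
print(f"1 / b_xy = {1 / b_xy:.3f}, b_yx = {b_yx:.3f}")
```

For this dataset the reciprocal of the x-on-y slope does not match the y-on-x slope, which shows that rearranging one equation does not give the other.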

[Figure: scatter plot with a fitted regression line. Source: Wikipedia, "Regression analysis", https://upload.wikimedia.org/wikipedia/commons/b/be/Normdist_regression.png]

Example:
Suppose we have paired data values (x, y).
If the regression line of x on y is found to be:

x = 3.2 + 1.5y

Then for y = 4 we estimate:
x = 3.2 + 1.5(4) = 9.2
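
The same substitution can be written as a short function. This is only a sketch of the worked example; the function name predict_x and its default arguments are illustrative choices, with a = 3.2 and b = 1.5 taken from the line above.

```python
def predict_x(y, a=3.2, b=1.5):
    """Estimate x from y using the x-on-y line x = a + b*y."""
    return a + b * y

estimate = predict_x(4)
print(f"Estimated x for y = 4: {estimate:.1f}")  # prints 9.2
```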

2. Using the Regression Line for Predictions

The purpose of the regression line of x on y is to estimate the value of x that corresponds to a given value of y.
However, predictions are only meaningful under certain conditions:

  • The relationship between x and y must be approximately linear.
  • The value of y used for prediction must lie within the range of the observed y-values (interpolation, not extrapolation).
  • The regression line of x on y must only be used to predict x, not y.

🧠 Examiner Tip:
The most common mistake in exams is using the x on y regression line to estimate y.
If the question asks for a predicted y-value, the regression line of y on x must be used instead.

3. The Danger of Extrapolation

Extrapolation occurs when we use values of y that are outside the observed data range.
The regression model is only valid within the limits of the data we have collected.
Beyond that range, predictions become unreliable because:

  • the relationship may not remain linear;
  • future trends may change;
  • the model cannot account for new behaviour outside the dataset.
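
One practical safeguard is to check whether the y-value lies inside the observed range before trusting the estimate. The sketch below assumes the coefficients from the example in Section 1 (a = 3.2, b = 1.5); the observed y-values and the function name predict_x_checked are hypothetical.

```python
def predict_x_checked(y, a, b, y_observed):
    """Return (estimate, is_extrapolation) for the x-on-y line x = a + b*y.

    is_extrapolation is True when y lies outside the observed y-range.
    """
    y_min, y_max = min(y_observed), max(y_observed)
    is_extrapolation = not (y_min <= y <= y_max)
    return a + b * y, is_extrapolation

# Hypothetical observed y-values, invented for illustration only.
y_observed = [1.0, 2.5, 4.0, 5.5, 7.0]

for y_new in (4.0, 12.0):
    x_hat, flagged = predict_x_checked(y_new, 3.2, 1.5, y_observed)
    note = "extrapolation, treat with caution" if flagged else "within data range"
    print(f"y = {y_new}: x estimate = {x_hat:.1f} ({note})")
```

The function still returns a number for y = 12, but the flag signals that the data give no evidence the linear relationship holds that far out.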

🌍 Real-World Connection:
Regression lines are commonly used in economics for forecasting growth or inflation.
Misusing regression lines beyond data bounds can lead to poor financial predictions and failed policy decisions.

📊 IA Spotlight:
Regression analysis is excellent for an IA if you investigate a real dataset.
Make sure to justify:

  • why a linear model is suitable,
  • whether correlation is strong enough,
  • and discuss limitations such as extrapolation.
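
To support the second point, the strength of the linear correlation can be quantified with Pearson's correlation coefficient r. The sketch below computes r from its definition; the dataset is hypothetical and the function name pearson_r is an illustrative choice. Values of r close to 1 or −1 indicate a strong linear relationship, while values near 0 indicate a weak one.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical paired data, invented for illustration only.
x = [2.0, 4.0, 5.0, 7.0, 9.0]
y = [1.0, 2.0, 4.0, 4.0, 6.0]

r = pearson_r(x, y)
print(f"r = {r:.3f}")
```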

🔍 TOK Perspective:
To what extent can prediction models provide knowledge about the future?
Are mathematical predictions inherently reliable, or do they depend on assumptions that may not hold?