SL 4.11 — hypothesis testing, Chi Square tests and t-tests

Term / concept	Definition / short explanation
Null hypothesis (H₀)	The default claim to be tested (e.g., “no association”, “population mean = μ₀”). We assume H₀ unless the data give strong evidence otherwise.
Alternative hypothesis (H₁)	The claim we suspect may be true instead of H₀ (e.g., “μ ≠ μ₀”, “association exists”). Can be one- or two-sided.
Significance level (α)	The threshold probability for rejecting H₀ (common values 0.05, 0.01, 0.10). If p ≤ α, reject H₀.
p-value	Probability (under H₀) of obtaining data at least as extreme as observed. A small p is evidence against H₀.
χ² statistic	Σ (Observed − Expected)² / Expected, summed across all cells; compares observed counts to those expected under H₀.
Degrees of freedom (df)	For χ² goodness-of-fit df = k − 1 (k categories). For contingency table df = (rows − 1)(cols − 1).

📌 1. Formulating hypotheses (H₀ and H₁)

Follow these rules when writing H₀ and H₁:

State H₀ as an equality or “no effect” claim (e.g., H₀: p = 0.5, H₀: no association).
State H₁ as the alternative (e.g., H₁: p ≠ 0.5, H₁: association exists).
Decide one-tailed vs two-tailed before seeing the data (affects p-value interpretation).

🔍 TOK Perspective

Consider how the wording of hypotheses shapes evidence. Does rejecting H₀ demonstrate the alternative is true, or only that H₀ is unlikely under the data observed?

📌 2. Significance levels and p-values (interpretation)

Decision rule: choose α before testing; if p ≤ α → reject H₀; if p > α → fail to reject H₀.
p-value meaning: not the probability H₀ is true; rather, how surprising the data are if H₀ were true.
Reporting: give numeric p-value and conclusion in context (e.g., “There is evidence at the 5% level that …”).
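
The decision rule above can be written as a tiny helper. This is a minimal sketch; the function name `decide` and the example p-values are illustrative choices, not part of the syllabus.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule: reject H0 when p <= alpha."""
    if not (0.0 <= p_value <= 1.0 and 0.0 < alpha < 1.0):
        raise ValueError("p_value must be in [0, 1] and alpha in (0, 1)")
    if p_value <= alpha:
        return "reject H0"
    return "fail to reject H0"


print(decide(0.032))       # p below alpha
print(decide(0.20, 0.05))  # p above alpha
```

Note that the function never returns "accept H₀": a large p-value only means the data are not surprising under H₀.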

🌍 Real-World Connection

Medical trials report p-values when testing new treatments. Policymakers must interpret small p with caution — consider effect size and sample design, not p alone.

📌 3. χ² goodness-of-fit test (categorical data)

Purpose: compare observed counts to expected counts under a specified probability model.

State H₀ (e.g., “data follow the claimed distribution”) and H₁ (“do not follow”).
Compute expected counts: Expected = n × p_category for each category.
Calculate χ² = Σ (O − E)² / E across categories.
Degrees of freedom df = k − 1 (k = number of categories). If m parameters are estimated from the data, df = k − 1 − m.
Find p-value from χ² distribution with df (use technology in exam). Compare to α.

Worked example — goodness-of-fit

A six-sided die is rolled 120 times; observed counts for faces 1–6 are: 18, 20, 19, 24, 20, 19. Test H₀: die is fair (p = 1/6 each) at α = 0.05.

Expected per face E = 120 × 1/6 = 20. χ² = Σ (O − 20)²/20 = ((−2)² + 0 + (−1)² + 4² + 0 + (−1)²)/20 = (4+0+1+16+0+1)/20 = 22/20 = 1.1.

df = 6 − 1 = 5. p ≈ 0.954 (use GDC). Since p > 0.05, fail to reject H₀; there is no evidence that the die is unfair.
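
The die example can be checked without a GDC. The sketch below uses only the Python standard library; `chi2_sf` is a hypothetical helper written for this note (it evaluates the χ² upper-tail probability in closed form for whole-number df) and is not a GDC or library function.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc term plus a finite correction sum
    tail = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range((df - 1) // 2):
        tail += term
        term *= x / (2 * j + 3)
    return tail


observed = [18, 20, 19, 24, 20, 19]      # counts for faces 1 to 6
n = sum(observed)                        # 120 rolls
expected = [n / 6] * 6                   # fair die: 20 per face
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p = chi2_sf(chi2, df)
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p:.3f}")
```

In an exam the GDC's χ² goodness-of-fit function does the same job; writing out the O and E lists as above is what earns the method marks.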

📌 4. χ² test for independence (contingency tables)

Purpose: test whether two categorical variables are independent.

Form contingency table of observed counts O_ij (rows × columns).
Compute expected counts E_ij = (row total × column total) / grand total.
Compute χ² = Σ_cells (O_ij − E_ij)² / E_ij.
Degrees of freedom df = (r − 1)(c − 1). Use technology to get p-value and conclusion.
Check expected counts: as a rule of thumb, every expected count should be at least 5; if several are below 5, interpret χ² with caution (consider Fisher’s exact test for small tables).
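
The steps above can be sketched as a short function. The 2×2 counts in `table` are hypothetical, invented only to show the mechanics, and `chi2_independence` is an illustrative name; the p-value is left to technology, as in the steps.

```python
def chi2_independence(observed):
    """Return (chi2, df, expected) for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    # E_ij = (row total x column total) / grand total
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# Hypothetical observed counts, for illustration only
table = [[20, 30], [30, 20]]
chi2, df, expected = chi2_independence(table)
small = [e for row in expected for e in row if e < 5]
print("chi2 =", chi2, "df =", df)
print("expected =", expected)
print("expected counts below 5:", small)
```

Returning the expected table alongside the statistic makes the "expected ≥ 5" check a one-line follow-up rather than a separate calculation.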

📐 IA Spotlight

Use contingency tables when investigating relationships in survey data (e.g., gender vs. preference). Show how expected counts are computed and discuss limitations when expected counts are small.

Worked example — independence (2×2)

Surveyed 100 students for (A) studies online (Yes/No) and (B) prefers recorded lectures (Yes/No). Observed:

Table (marginal totals): rows = Online study Yes (30), No (70); columns = Prefers recorded Yes (40), No (60).

Expected for cell (Yes, Yes): E = (row total 30 × column total 40) / 100 = 12. Compute χ² across 4 cells (use GDC). df = (2−1)(2−1)=1. Compare p-value to α.
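
Since only the row and column totals are given here, the expected counts are the part that can be computed directly. A minimal sketch, with dictionary labels chosen for readability (they are not from the original table):

```python
row_totals = {"online: yes": 30, "online: no": 70}
col_totals = {"recorded: yes": 40, "recorded: no": 60}
grand_total = 100

# E = (row total x column total) / grand total for each of the 4 cells
expected = {
    (r, c): rt * ct / grand_total
    for r, rt in row_totals.items()
    for c, ct in col_totals.items()
}

for cell, value in expected.items():
    print(cell, value)

print("all expected counts at least 5:", all(v >= 5 for v in expected.values()))
```

The χ² statistic itself still requires the four observed cell counts, which would be entered into the GDC matrix.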

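Yates continuity correction: sometimes applied to 2×2 tables (df = 1) with small counts, because the uncorrected χ² statistic tends to overstate significance there. Technology does not always apply it automatically, so check what your GDC reports and mention the continuity correction if counts are small.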

📌 5. The t-test (comparing two means) — SL perspective

SL conditions: two independent (unpaired) samples; population variances unknown and assumed equal → use pooled two-sample t-test. Technology computes t and p.

Hypotheses: example H₀: μ₁ = μ₂; H₁: μ₁ ≠ μ₂ (two-sided) or >/< for one-sided.
Assumptions: both populations approx normal (especially important for small samples), independent samples, equal variances (pooled t-test).
Test statistic (pooled): technology computes t with df = n₁ + n₂ − 2; report p and conclude.
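
For reference, this is the standard pooled calculation that the technology carries out. The notation (sample means x̄₁, x̄₂, sample standard deviations s₁, s₂, pooled standard deviation s_p) is assumed here, since the notes leave the formula to the GDC.

```latex
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2},
\qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
\mathrm{df} = n_1 + n_2 - 2
```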

🧠 Examiner Tip

Always state H₀ and H₁ clearly (equation / inequality).
Show method: write which test was used (e.g., “pooled two-sample t-test”) and justify it (independence, approx normal, equal variances).
Include numeric result and context: show t, df (if asked), p-value and interpret in plain language (conclusion about means in context).

📌 6. Use of technology and practical advice

In examinations use GDC or software to compute χ², t and p-values — display key intermediate values (observed & expected counts, t-statistic) for clarity.
Always check assumptions: expected counts in χ², normality and equal variances for t-test. If assumptions fail, mention limitations.
For small sample counts in χ² (expected < 5), note that results may be unreliable and consider alternative tests (Fisher exact for 2×2).

❤️ CAS Link

Run a small community survey (e.g., about transport choices) and use χ² tests to check associations. Present results and discuss limitations of small expected counts.

Worked example — two-sample t-test (illustrative)

Sample A: n₁=12, mean = 50, s = 5. Sample B: n₂=14, mean = 46, s = 6. Test H₀: μ₁=μ₂ at α = 0.05.

Use technology (two-sample t-test with pooled variance): pooled t ≈ 1.83, df = 24, p ≈ 0.08 (two-sided) → p is above 0.05, so fail to reject H₀; there is insufficient evidence that the means differ (report the exact p and interpret in context).
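
The pooled test can also be computed from the summary statistics alone. The sketch below is one possible standard-library implementation, not what a GDC does internally: `t_pdf`, `t_two_sided_p` and `pooled_t` are hypothetical helper names, and the p-value is obtained by integrating the t density numerically with Simpson's rule.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-sided p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * (h / 3) * total  # P(-|t| <= T <= |t|)
    return 1 - central


def pooled_t(n1, mean1, s1, n2, mean2, s2):
    """Pooled two-sample t statistic, df and two-sided p from summary statistics."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df  # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se
    return t, df, t_two_sided_p(t, df)


t, df, p = pooled_t(12, 50, 5, 14, 46, 6)
print(f"t = {t:.2f}, df = {df}, p = {p:.3f}")
```

On a GDC the equivalent is the two-sample t-test with the pooled option switched on, entering the same six summary values.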

🌐 EE Focus

Explore statistical testing choices in an EE: comparing χ² vs Fisher for small counts, or studying the robustness of t-tests to non-normality with simulations.

📌 Quick summary & checklist

Write H₀ and H₁ clearly, state α.
Choose correct test: χ² goodness-of-fit (categorical vs model), χ² independence (contingency), t-test for two means (SL conditions).
Check assumptions (expected counts, normality, equal variances). Use technology for calculations and give contextual interpretation of p.
When small expected counts appear, mention Yates/Fisher as appropriate and highlight limitations.

📝 Paper tips — hypothesis tests

Label everything: show O and E tables, state df, give χ², p and conclusion in context.
When using technology: still present the formula or intermediate E-values to earn method marks.
Always interpret: end with a one-line sentence linking conclusion to the real-world context of the question.
At the end of the question: conclude with “We do/do not have enough evidence to reject the null hypothesis” (very important to remember).

📌 SL 4.11 — Hypothesis Testing, Chi-Square Tests & t-Tests

Multiple Choice Questions

MCQ 1
A hypothesis test is carried out at the 5% significance level. The p-value obtained is 0.032.
Which conclusion is correct?

A. Accept H₀ because the p-value is small
B. Reject H₀ because the p-value is less than 0.05
C. Accept H₁ because the p-value is greater than 0.05
D. Do not reject H₀ because the result is inconclusive

Answer & Explanation

Correct answer: B

In hypothesis testing, the decision rule using the p-value is:

If p-value ≤ α, reject H₀.

Here, the p-value is 0.032 and the significance level is α = 0.05.
Since 0.032 < 0.05, the result is considered statistically significant.

This means the observed data are unlikely under the assumption that H₀ is true,
so we state “we have enough evidence to reject the null hypothesis”.

MCQ 2
Which of the following situations is most appropriate for a chi-square test for independence?

A. Comparing two population means with unknown variance
B. Testing whether a die is fair
C. Testing whether two categorical variables are related
D. Estimating a confidence interval for a mean

Answer & Explanation

Correct answer: C

A chi-square test for independence is used when:

Both variables are categorical
Data are presented in a contingency table
We want to see whether the variables are associated or independent

Options A and D involve means (a t-test and a confidence interval, respectively), not categorical data.
Option B refers to a chi-square goodness-of-fit test, not a test for independence.

MCQ 3
In a one-sample t-test, which assumption is required?

A. The population standard deviation must be known
B. The population must be normally distributed
C. The sample size must be greater than 30
D. The data must be categorical

Answer & Explanation

Correct answer: B

A one-sample t-test is used when the population standard deviation is unknown.

The key assumption is that the population distribution is normal, especially for small sample sizes.
If the sample size is large, the test is more robust, but normality is still the formal assumption in IB.

Short Answer Questions

Short Question 1
Explain what is meant by a Type I error in hypothesis testing.

Model Answer

A Type I error occurs when the null hypothesis H₀ is rejected even though it is actually true.

In other words, it is a false positive result, where the test suggests evidence for an effect or difference
that does not truly exist.

The probability of making a Type I error, given that H₀ is true, is equal to the chosen significance level α.
For example, if α = 0.05 and H₀ is true, there is a 5% chance of wrongly rejecting it.
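
A small simulation can make this concrete. The sketch below repeatedly rolls a genuinely fair die 120 times and applies a χ² goodness-of-fit test at α = 0.05, so every rejection is a Type I error. The function names, the seed and the number of trials are arbitrary illustrative choices; 11.07 is the standard table critical value for df = 5 at the 5% level.

```python
import random


def chi2_fair_die(counts):
    """Chi-square goodness-of-fit statistic against a fair six-sided die."""
    expected = sum(counts) / 6
    return sum((o - expected) ** 2 / expected for o in counts)


def type_one_rate(trials=2000, rolls=120, critical=11.07, seed=1):
    """Fraction of simulated fair-die experiments in which H0 is wrongly rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _trial in range(trials):
        counts = [0] * 6
        for _roll in range(rolls):
            counts[rng.randrange(6)] += 1
        if chi2_fair_die(counts) >= critical:
            rejections += 1
    return rejections / trials


rate = type_one_rate()
print("observed Type I error rate:", rate)
```

The printed rate should land near 0.05 rather than exactly on it, because it is itself an estimate from a finite number of simulated experiments.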

Short Question 2
State two conditions required for a chi-square test to be valid.

Model Answer

First, all expected frequencies in the contingency table should be sufficiently large,
typically at least 5, to ensure the chi-square approximation is valid.

Second, the observations must be independent, meaning that each individual or outcome
contributes to only one cell of the table.

If these conditions are not met, the conclusions of the test may not be reliable.

Long Answer Questions

Long Question 1 — One-Sample t-Test

A manufacturer claims that the mean lifetime of a certain type of battery is 120 hours.
A random sample of 10 batteries has a mean lifetime of 114 hours with a sample standard deviation of 8 hours.

(a) State the null and alternative hypotheses.
(b) Explain why a t-test is appropriate.
(c) Determine the test statistic.
(d) State the conclusion at the 5% significance level.

Full Worked Solution

(a) Hypotheses

H₀: μ = 120
H₁: μ < 120

The alternative hypothesis reflects suspicion that the true mean lifetime is less than the advertised value.

(b) Choice of test

The population standard deviation is unknown and the sample size is small.
Therefore, assuming battery lifetimes are approximately normally distributed, a one-sample t-test is appropriate.

(c) Test statistic

t = (x̄ − μ₀) / (s / √n)

t = (114 − 120) / (8 / √10) ≈ −2.37

(d) Conclusion

Using the GDC, the one-tailed p-value corresponding to t ≈ −2.37 with 9 degrees of freedom is approximately 0.02, which is less than 0.05.

Since p-value < 0.05, we reject H₀.
There is sufficient evidence at the 5% level to suggest that the mean battery lifetime is less than 120 hours.
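
The battery calculation can be reproduced from the summary statistics. This is a sketch using only the standard library: `t_pdf` and `t_cdf` are hypothetical helpers that integrate the t density numerically (Simpson's rule), standing in for the GDC's built-in t-test.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t: float, df: int, steps: int = 2000) -> float:
    """P(T <= t), using Simpson's rule on [0, |t|] and symmetry (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = (h / 3) * total  # P(0 <= T <= |t|)
    return 0.5 + area if t >= 0 else 0.5 - area


n, x_bar, s, mu0 = 10, 114, 8, 120
t_stat = (x_bar - mu0) / (s / math.sqrt(n))
df = n - 1
p = t_cdf(t_stat, df)  # left-tailed test, since H1 is mu < 120
print(f"t = {t_stat:.2f}, df = {df}, one-tailed p = {p:.3f}")
print("reject H0" if p <= 0.05 else "fail to reject H0")
```

Because H₁ is one-sided (μ < 120), the p-value is the lower-tail area only; a two-sided test would double it.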

Long Question 2 — Chi-Square Test for Independence

A school records whether students prefer online or in-person learning, classified by gender.
The results are shown in a contingency table.

(a) State the null and alternative hypotheses.
(b) Explain how expected frequencies are calculated.
(c) Describe how the test statistic is obtained.
(d) Interpret a decision to reject H₀.

Full Worked Solution

(a) Hypotheses

H₀: Gender and learning preference are independent.
H₁: Gender and learning preference are not independent.

(b) Expected frequencies

Expected frequency = (row total × column total) / grand total.

This represents the frequency we would expect if the variables were truly independent.

(c) Test statistic

The chi-square statistic is calculated using:

χ² = Σ ( (Observed − Expected)² / Expected )

Each cell’s contribution is summed to obtain the final test statistic.

(d) Interpretation

If H₀ is rejected, this indicates there is a statistically significant association
between gender and learning preference.

This does not imply causation, only that the variables are related in the population.
