Key Statistical Concepts
This topic introduces the foundational language and ideas used in statistics. Students must understand the
difference between populations and samples, types of data, and potential sources of bias when interpreting
real-world data.
Population and Sample
A population refers to the entire group being studied. A sample is a smaller
subset taken from the population. Since it is usually impossible to survey every member of a population, samples
allow us to make inferences about the whole.
A simple random sample gives every member of the population an equal chance of being selected. This reduces
selection bias and makes conclusions drawn from the sample more reliable.
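A minimal Python sketch of drawing a simple random sample is shown here. The population of student ID numbers, the sample size of 20, and the function name `simple_random_sample` are all assumptions made for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return n distinct members, each equally likely to be chosen."""
    rng = random.Random(seed)  # seeded only so the example is repeatable
    return rng.sample(population, n)

# Hypothetical population: student ID numbers 1 to 500
population = list(range(1, 501))
sample = simple_random_sample(population, 20, seed=1)

print(len(sample))       # number of members selected
print(sorted(sample))    # the selected ID numbers
```

Because `random.sample` selects without replacement, no student can appear in the sample twice.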
Types of Data
- Qualitative data: Non-numerical categories (e.g., eye color, brand of phone).
- Quantitative data: Numerical values that can be analyzed statistically. Quantitative data are either discrete or continuous.
- Discrete data: Countable values (e.g., number of goals scored).
- Continuous data: Measurable values on a continuum (e.g., height, temperature).
Reliability of Data Sources
Not all data are trustworthy. Students must question:
- Where the data came from
- How they were collected
- Whether the sample is representative of the population
- Whether missing or inaccurately recorded data may affect results
Poor sampling or biased data collection can lead to incorrect or misleading statistical conclusions.
Interpretation of Outliers
Outliers are values that lie far outside the overall pattern of a dataset. In SL, an outlier is defined as:
A value more than 1.5 × IQR (interquartile range) from the nearest quartile, that is, a value below
Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR.
Outliers can signal unusual events, errors in recording, or legitimate extreme values. Students must interpret
them in context rather than automatically removing them.
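The 1.5 × IQR rule can be sketched in Python as follows. This is an illustration under stated assumptions: the quartiles are taken as the medians of the lower and upper halves of the ordered data (a GDC or spreadsheet may use a slightly different quartile method), and the goals data and function names are invented for the example.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumes at least two values. When the number of values is odd,
    the middle value is left out of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[-half:]
    return median(lower), median(upper)

def find_outliers(values):
    """Return the values lying more than 1.5 x IQR beyond Q1 or Q3."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]

# Hypothetical data: goals scored by nine players in a season
goals = [2, 3, 3, 4, 4, 5, 5, 6, 15]
print(quartiles(goals))      # (3.0, 5.5)
print(find_outliers(goals))  # [15]
```

Here Q1 = 3 and Q3 = 5.5, so IQR = 2.5 and the upper boundary is 5.5 + 3.75 = 9.25. The value 15 lies above it and is flagged. Whether it should be kept still depends on context, for example whether it is a star striker or a recording error.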
Sampling Techniques and Their Effectiveness
Different sampling methods are used depending on the goals and constraints of a study. Understanding these helps
evaluate the strength of conclusions drawn from data.
- Simple random sampling: Every member has an equal chance of being chosen.
- Convenience sampling: Data are collected from the sources that are easiest to reach; the result is often biased.
- Systematic sampling: Selecting every k-th individual from a list (e.g., every 10th person).
- Quota sampling: Selecting a set number from each subgroup, without requiring the selection to be random.
- Stratified sampling: Dividing the population into subgroups (strata) and taking a random sample from each in proportion to its size.
Stratified sampling often gives the most representative sample because it guarantees that each subgroup of the
population is included, provided the subgroups are relevant to the question being studied.
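Two of the methods above, systematic and stratified sampling, can be sketched in Python. This is a rough illustration, not a prescribed procedure: the school year groups, their sizes, and the function names are hypothetical, and the stratified version rounds each subgroup's share, so the total can differ slightly from the requested size when the proportions are not exact.

```python
import random

def systematic_sample(population, k, seed=None):
    """Pick a random start among the first k members, then every k-th member."""
    rng = random.Random(seed)
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, n, seed=None):
    """Randomly sample each subgroup in proportion to its size.

    strata maps a subgroup name to the list of its members.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    chosen = {}
    for name, members in strata.items():
        size = round(n * len(members) / total)  # proportional share
        chosen[name] = rng.sample(members, size)
    return chosen

# Hypothetical school: 300 students in Year 12 and 200 in Year 13
strata = {
    "Year 12": [f"Y12-{i}" for i in range(300)],
    "Year 13": [f"Y13-{i}" for i in range(200)],
}
chosen = stratified_sample(strata, 50, seed=1)
print({name: len(members) for name, members in chosen.items()})
# {'Year 12': 30, 'Year 13': 20}

every_tenth = systematic_sample(list(range(100)), 10, seed=1)
print(len(every_tenth))  # 10
```

Year 12 makes up 300 of the 500 students, so it receives 30 of the 50 places. A quota sample would use the same subgroup sizes but fill them without random selection, which is why it is more open to bias.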
🌍 Real-World Connection
- Opinion polls during elections rely heavily on sampling methods.
- Medical research often uses stratified sampling so that results can be applied across different demographic groups.
- Sports analytics depend on identifying outliers to detect exceptional performance or potential errors.
🔍 TOK Perspective
- Why are mathematics and statistics sometimes treated as separate subjects? What does this say about how we classify knowledge?
- To what extent can statistics be manipulated to influence public opinion?
- If statistics provide numerical certainty, why do two studies sometimes produce contradictory conclusions?
📊 IA Spotlight
- Use random sampling from online datasets to compare variability in different populations.
- Investigate whether outliers significantly affect mean vs. median in real-world contexts (e.g., housing prices).
- Analyse bias by comparing results of convenience sampling vs. stratified sampling on the same question.
🌐 EE Focus
- An extended exploration into the mathematics of sampling error, margin of error, and confidence intervals.
- Investigate historical cases where poor sampling led to incorrect predictions (e.g., 1936 Literary Digest poll).
❤️ CAS Ideas
- Survey your school using random or stratified sampling and present findings to administration.
- Create a data-awareness campaign about misleading statistics on social media.
🧠 Examiner Tip
- Always justify whether an outlier should be removed. Context matters more than calculation.
- When describing data, explicitly mention whether it is discrete or continuous.
- Know at least one strength and one limitation of each sampling method.
📱 GDC Use
- Use statistical menus to detect outliers automatically using IQR.
- Graph box plots to analyse spread and skewness visually.
- Input samples to compare the effect of removing vs. keeping outliers.
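The comparison of keeping and removing an outlier can also be tried away from the GDC. The short Python sketch below uses invented house prices (in thousands) as an assumption for illustration. The value 950 lies above Q3 + 1.5 × IQR = 280 + 82.5 = 362.5, so it counts as an outlier under the rule given earlier.

```python
from statistics import mean, median

# Hypothetical house prices, in thousands
prices = [210, 225, 240, 250, 265, 280, 950]
without_outlier = [p for p in prices if p != 950]

print("With outlier:    mean =", round(mean(prices), 1), " median =", median(prices))
print("Without outlier: mean =", mean(without_outlier), " median =", median(without_outlier))
```

Removing the single extreme value lowers the mean from about 345.7 to 245, while the median only moves from 250 to 245. This is why the median is usually the better measure of centre for skewed data.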