SL 4.2 — Presentation of Data and Distribution Diagrams

This topic focuses on how to organise and display data so that patterns,
clusters and outliers can be seen clearly. You should understand how to construct and interpret
frequency tables, histograms, cumulative frequency graphs and box-and-whisker diagrams.

Frequency distributions (tables)

Discrete and continuous data

For discrete data (for example, number of goals, number of siblings), each distinct value can be
listed along with its frequency.
For continuous data (for example, height, reaction time), individual values are grouped into
class intervals, such as 150 ≤ height < 160.
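For discrete data, a frequency table is simply a count of how often each distinct value occurs. A minimal Python sketch follows. The goal counts are hypothetical data made up for illustration.

```python
from collections import Counter

# Hypothetical data: number of goals scored in 12 matches
goals = [0, 2, 1, 3, 1, 0, 2, 1, 4, 1, 2, 0]

# Counter maps each distinct value to its frequency
freq = Counter(goals)

print("Goals  Frequency")
for value in sorted(freq):
    print(f"{value:>5}  {freq[value]:>9}")

# The frequencies must add up to the number of observations
assert sum(freq.values()) == len(goals)
```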

Class intervals and frequency

A frequency distribution table shows:

each value or class interval, and
the corresponding number of observations (frequency).

In IB exams, class intervals are usually given as inequalities without gaps,
ensuring that every data value belongs to exactly one class.
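The "no gaps" rule can be sketched in code by assigning each value to exactly one half-open class, start + k×width ≤ x < start + (k+1)×width. The function name `grouped_frequency` and the height data are hypothetical. Note that 160.0 falls into 160 ≤ height < 170, not the class below it.

```python
def grouped_frequency(data, start, width, n_classes):
    """Count values in classes start + k*width <= x < start + (k+1)*width."""
    counts = [0] * n_classes
    for x in data:
        k = int((x - start) // width)  # index of the class containing x
        if 0 <= k < n_classes:
            counts[k] += 1
        else:
            raise ValueError(f"{x} lies outside the table")
    return counts

# Hypothetical heights in cm
heights = [152.3, 158.0, 160.0, 163.7, 149.9, 171.2, 165.5, 159.9, 168.0, 175.4]

counts = grouped_frequency(heights, start=140, width=10, n_classes=4)
for k, f in enumerate(counts):
    low = 140 + k * 10
    print(f"{low} <= height < {low + 10}: {f}")
```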

🌍 Real-world connection

Frequency tables are used in sciences (for recording repeated measurements),
economics (income brackets), and social sciences
(age groups, survey responses). They are often the first step before creating
histograms or box plots.

Histograms

Basic idea

A histogram is a diagram used mainly for continuous data.
The horizontal axis shows the class boundaries and the vertical axis shows the
frequency. Each class interval is represented by a bar whose height equals its frequency. Adjacent
bars touch because there are no gaps between the class intervals.
In this course you only need frequency histograms with equal class widths
(frequency density histograms are not required).
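The sketch below works out the bars of an equal-width frequency histogram as (left, right, height) triples. It checks that neighbouring bars share a boundary and then prints a rough sideways text version. The helper `histogram_bars` and the frequencies are hypothetical, and real graphs would normally be drawn by hand or on a GDC.

```python
def histogram_bars(start, width, freqs):
    """Return (left, right, height) for each bar of an equal-width frequency histogram."""
    return [(start + k * width, start + (k + 1) * width, f) for k, f in enumerate(freqs)]

# Hypothetical frequencies for 140 <= h < 150, ..., 170 <= h < 180 (cm)
bars = histogram_bars(140, 10, [1, 3, 4, 2])

# Each bar's right edge is the next bar's left edge, so the bars touch
assert all(right_prev == left_next
           for (_, right_prev, _), (left_next, _, _) in zip(bars, bars[1:]))

# Sideways text rendering: one row per class, row length = frequency
for left, right, height in bars:
    print(f"{left}-{right} | " + "#" * height)
```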

Reading a histogram

The shape shows whether the distribution is symmetric, skewed, unimodal or bimodal.
The highest bars indicate where data values are most common.
Isolated bars, separated from the main body of the data by empty classes, may indicate outliers or rare events.
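Two of these readings can be sketched in code: finding the modal class (the tallest bar) and counting "peaks" as a rough guide to unimodal versus bimodal shape. The peak count is a crude illustrative heuristic, not an IB rule. A flat top of two equal adjacent bars is not counted as a peak. All names and data below are hypothetical.

```python
def modal_classes(labels, freqs):
    """Return the class label(s) with the highest frequency."""
    top = max(freqs)
    return [label for label, f in zip(labels, freqs) if f == top]

def count_peaks(freqs):
    """Count bars strictly taller than both neighbours (missing neighbours count as -1)."""
    peaks = 0
    for i, f in enumerate(freqs):
        left = freqs[i - 1] if i > 0 else -1
        right = freqs[i + 1] if i < len(freqs) - 1 else -1
        if f > left and f > right:
            peaks += 1
    return peaks

# Hypothetical waiting times (minutes)
labels = ["0-10", "10-20", "20-30", "30-40", "40-50", "50-60"]
freqs = [2, 7, 3, 2, 6, 1]

print("Modal class:", modal_classes(labels, freqs))
print("Peaks:", count_peaks(freqs))  # two peaks suggest a bimodal shape
```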

🌍 Real-world connection

Histograms are widely used to check whether data may be approximately
normally distributed, for example when analysing test scores,
measurement errors in physics experiments, or daily returns in finance.

Cumulative frequency and cumulative frequency graphs

Cumulative frequency

The cumulative frequency for a class is the total number of data values
up to and including that class. It shows how many observations lie below a given boundary.

To construct a cumulative frequency table, add frequencies progressively down the table.
For grouped data, use the upper class boundary of each class.
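Adding frequencies progressively is a running total. A minimal sketch using `itertools.accumulate` on hypothetical grouped data:

```python
from itertools import accumulate

# Hypothetical grouped data: classes 140 <= h < 150, ..., 170 <= h < 180 (cm)
upper_boundaries = [150, 160, 170, 180]
frequencies = [1, 3, 4, 2]

# Running totals: each entry adds the current class frequency to all earlier ones
cumulative = list(accumulate(frequencies))

for ub, cf in zip(upper_boundaries, cumulative):
    print(f"height < {ub}: {cf}")

# The last cumulative frequency equals the total number of observations
assert cumulative[-1] == sum(frequencies)
```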

Cumulative frequency graph

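A cumulative frequency graph (also called an ogive) plots cumulative frequency (vertical) against the upper class boundary (horizontal).
Start the graph at the lower boundary of the first class with a cumulative frequency of 0, then join the points with a smooth curve or straight line segments.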

From the graph you can estimate:

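Median: the value on the horizontal axis where the cumulative frequency is half of the total.
Lower quartile (Q₁): the value where the cumulative frequency is 25% of the total.
Upper quartile (Q₃): the value where the cumulative frequency is 75% of the total.
Percentiles: e.g. the 90th percentile is the value below which 90% of the data lie.
Range and interquartile range: IQR = Q₃ − Q₁, using Q₁ and Q₃ read from the graph. For grouped data the minimum and maximum cannot be read exactly, so the range can only be estimated from the lower boundary of the first class and the upper boundary of the last class.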
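Reading values off the graph amounts to finding where the cumulative frequency reaches a target and interpolating along that segment. The sketch below assumes the points are joined by straight line segments. A hand-drawn smooth curve would give slightly different readings. The helper names and the test-score data are hypothetical.

```python
from itertools import accumulate

def ogive_points(lower_start, width, freqs):
    """Points of a cumulative frequency graph, starting at (first lower boundary, 0)."""
    boundaries = [lower_start + k * width for k in range(len(freqs) + 1)]
    cumulative = [0] + list(accumulate(freqs))
    return list(zip(boundaries, cumulative))

def value_at(points, target_cf):
    """Horizontal value where the straight-line ogive reaches target_cf."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y1 > y0 and y0 <= target_cf <= y1:
            return x0 + (target_cf - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("target outside the range of cumulative frequencies")

# Hypothetical test scores (out of 60), classes 0 <= s < 10, ..., 50 <= s < 60
points = ogive_points(0, 10, [4, 12, 20, 24, 12, 8])
n = points[-1][1]  # total frequency

median = value_at(points, 0.50 * n)
q1 = value_at(points, 0.25 * n)
q3 = value_at(points, 0.75 * n)
p90 = value_at(points, 0.90 * n)
print(f"median ~ {median:.1f}, Q1 ~ {q1:.1f}, Q3 ~ {q3:.1f}, "
      f"IQR ~ {q3 - q1:.1f}, P90 ~ {p90:.1f}")
```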

📊 IA spotlight

Cumulative frequency graphs are excellent for comparing two groups, such as
reaction times of different age ranges or test scores of two different classes.
You can use medians and quartiles from ogives to comment on central tendency and
spread in your Internal Assessment.

Box-and-whisker diagrams

Constructing a box plot

A box-and-whisker diagram (box plot) summarises a distribution using
five key values:

Minimum value (or lower end, excluding outliers),
Lower quartile Q₁,
Median,
Upper quartile Q₃,
Maximum value (or upper end, excluding outliers).

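The box spans from Q₁ to Q₃ and contains the middle 50% of the data; a vertical line inside the box marks the median.
The whiskers extend to the smallest and largest values that are not outliers.
A value is an outlier if it lies more than 1.5 × IQR below Q₁ or more than 1.5 × IQR above Q₃; outliers are marked with a cross.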
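A five-number summary with the 1.5 × IQR outlier rule can be sketched as follows. The quartiles use a common hand method: take the median of the lower and upper halves, leaving out the overall median when n is odd. Your GDC or other software may use a slightly different convention and give slightly different quartiles. The function names and data are hypothetical, and at least two values are needed.

```python
from statistics import median

def quartiles(data):
    """Q1, median, Q3 via the median-of-halves method (median excluded when n is odd)."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return median(lower), median(xs), median(upper)

def box_plot_summary(data):
    """Whisker ends, quartiles, median, IQR and outliers for a box plot."""
    xs = sorted(data)
    q1, med, q3 = quartiles(xs)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in xs if x < low_fence or x > high_fence]
    inliers = [x for x in xs if low_fence <= x <= high_fence]
    return {
        "whisker_low": inliers[0], "q1": q1, "median": med, "q3": q3,
        "whisker_high": inliers[-1], "iqr": iqr, "outliers": outliers,
    }

# Hypothetical data set with one unusually large value
data = [3, 5, 6, 7, 8, 8, 9, 10, 11, 25]
s = box_plot_summary(data)
print(s)
```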

Comparing distributions with box plots

Box plots are particularly useful for comparing two or more data sets.
When comparing, comment on:

Median: which group tends to have higher or lower values?
IQR: which group has more variation in the middle 50% of data?
Range: how wide is the entire spread?
Symmetry: is the median centred in the box (approximately symmetric) or closer to one side (skewed)?
Outliers: are there any extreme values that may affect interpretation?
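The numerical part of such a comparison can be sketched with Python's `statistics.quantiles`. Here `method="inclusive"` is an interpolation convention that may not match your GDC's quartiles exactly. What matters when comparing is that both groups are summarised the same way. The scores are hypothetical.

```python
from statistics import quantiles

def summary(data):
    """Median, IQR and range, with quartiles from the 'inclusive' interpolation method."""
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    return {"median": med, "iqr": q3 - q1, "range": max(data) - min(data)}

# Hypothetical test scores for two classes
class_a = [12, 15, 15, 17, 18, 20, 21, 22, 24, 30]
class_b = [10, 13, 16, 19, 21, 23, 25, 27, 28, 29]

a, b = summary(class_a), summary(class_b)
for key in ("median", "iqr", "range"):
    if a[key] > b[key]:
        higher = "A"
    elif b[key] > a[key]:
        higher = "B"
    else:
        higher = "neither"
    print(f"{key:>6}: A = {a[key]}, B = {b[key]} (higher: {higher})")
```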

🔍 TOK perspective

Box plots compress large data sets into just a few numbers.
This makes patterns easier to see, but also hides detail.
To what extent does summarising data improve understanding, and to what extent might it
oversimplify reality?

Indications of normality

A roughly symmetric box plot, where the median is near the centre of the box and the whiskers are of
similar length, suggests that the data may follow a distribution close to normal.
Strong skewness or several outliers indicate that the data are unlikely to be normally distributed.
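The visual check above can be turned into a rough numerical test. It asks whether the two halves of the box are about equal, and whether the two whiskers are about equal. The 25% tolerance is an arbitrary illustrative choice, not an IB criterion, and the function name is hypothetical.

```python
def looks_symmetric(minimum, q1, med, q3, maximum, tolerance=0.25):
    """Rough check that box halves and whiskers have similar lengths.

    tolerance is an arbitrary illustrative threshold on the relative difference.
    """
    def close(x, y):
        return abs(x - y) <= tolerance * max(x, y)

    box_balanced = close(med - q1, q3 - med)
    whiskers_balanced = close(q1 - minimum, maximum - q3)
    return box_balanced and whiskers_balanced

print(looks_symmetric(10, 20, 25, 30, 40))  # balanced box and whiskers
print(looks_symmetric(10, 12, 14, 22, 45))  # long right side: skewed
```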

🌍 Connection to other subjects

In science, histograms and box plots summarise laboratory measurements;
in geography, they display rainfall or population data;
in business and economics, they are used to present income distributions,
sales figures and financial risk.

🧠 Examiner tip

Always label axes clearly and show class boundaries for histograms and cumulative frequency graphs.
When comparing box plots, refer specifically to median, IQR, range and any outliers instead of vague phrases like “more spread out”.
Check that your frequencies add up to the total number of data values.

📱 GDC use

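Use the statistics menu of your GDC to draw histograms and box-and-whisker plots quickly; a cumulative frequency graph can be drawn by plotting the upper class boundaries against the cumulative frequencies as a line graph.
Let the GDC compute the median, quartiles and IQR. On papers where a GDC is not allowed, find these by hand and sketch neat graphs with clearly labelled axes.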