Title: Statistics
1Statistics
What are statistics and why do we use them?
Statistics help to make sense of numbers that
have been collected.
For example, if there were a survey into the size
of feet at your school, after asking everyone
their size you would end up with hundreds of
random numbers!
Using statistics you could sort the numbers out
to find
- the most common size
- the mean
- the range of sizes.
There are many other, more complex ways of
evaluating data, which we go into in
Intermediate 2.
2Statistics
At Intermediate 1 level you covered basic
statistics including
- finding the mean, median, range and mode from a
set of numbers
- finding the mean, median, range and mode from a
frequency table
- finding the probability of an event occurring.
In Intermediate 2 you build on this work to
calculate other values that are used to evaluate
data.
Before we go any further into the new work, we
will go over the Intermediate 1 work.
3Statistics
The range is used to measure how widely spread a
set of values is.
range = highest value - lowest value
Example
Stephen played 12 holes at his local golf club
and recorded his scores. What was his range of
scores?
4, 3, 4, 6, 5, 3, 8, 6, 9, 2, 3, 7
range = highest value - lowest value
      = 9 - 2
      = 7
Example
The next day he played another 12 holes. What
is his range of scores now?
2, 9, 10, 9, 13, 12, 12, 11, 1, 3, 4, 2
range = highest value - lowest value
      = 13 - 1
      = 12
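The two golf examples above can be checked with a short Python sketch. The function name `value_range` and the variable names are illustrative only and are not part of the original slides.

```python
def value_range(values):
    """Return the range: highest value minus lowest value."""
    return max(values) - min(values)


# Stephen's scores from the two examples above
first_round = [4, 3, 4, 6, 5, 3, 8, 6, 9, 2, 3, 7]
second_round = [2, 9, 10, 9, 13, 12, 12, 11, 1, 3, 4, 2]

print(value_range(first_round))   # 9 - 2 = 7
print(value_range(second_round))  # 13 - 1 = 12
```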
4Statistics
The average number from a set of numbers can be
calculated using three different methods.
1. The mode is the most common number.
Example A group of pupils were asked how many
kilometres they could run.
mode = 4
5Statistics
2. The mean is found by adding all the numbers
together, then dividing by the number of pieces
of data.
Example 11 people were asked how much pocket
money they got. What is the mean amount?
3, 4, 2, 5, 3, 6, 8, 5, 5, 7, 7
mean = (3 + 4 + 2 + 5 + 3 + 6 + 8 + 5 + 5 + 7 + 7) ÷ 11
     = 55 ÷ 11
     = 5
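As a rough check of the pocket-money example, the sketch below computes the mean by hand (total divided by count). It also finds the mode of the same data with `collections.Counter`, since the mode example on the previous slide has no data listed. The names `mean_of` and `mode_of` are made up for this illustration.

```python
from collections import Counter


def mean_of(values):
    """Add all the numbers together, then divide by how many there are."""
    return sum(values) / len(values)


def mode_of(values):
    """Return the most common value (the first one found if there is a tie)."""
    return Counter(values).most_common(1)[0][0]


pocket_money = [3, 4, 2, 5, 3, 6, 8, 5, 5, 7, 7]

print(sum(pocket_money))      # 55
print(mean_of(pocket_money))  # 5.0
print(mode_of(pocket_money))  # 5 (it appears three times)
```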
6Statistics
3. The median is the middle number.
To see which number is in the middle you have to
put them in order.
Example Julie saved up some of her pocket money
over 11 weeks for an iPod Touch. What is the
median amount she saved each week?
3, 4, 2, 5, 3, 6, 8, 4, 5, 6, 7
Rearrange 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 8
median = 5
Once you've rearranged the numbers, count them to
make sure you haven't missed any of them out.
There are 11 numbers here, so the median will be
the 6th number.
7Statistics
If you have an even number of amounts, the median
will be between two numbers. To calculate this
value, find the mean of the two middle numbers.
Example Linda saved up some of her pocket
money over a 10-week period for a Wii.
What was the median amount that she saved?
3, 4, 2, 6, 3, 6, 8, 4, 6, 7
Rearrange 2, 3, 3, 4, 4, 6, 6, 6, 7, 8
There are 10 numbers here, so the median will be
between the 5th and 6th numbers.
The median lies between 4 and 6. To calculate
this value, find the mean of these two numbers.
median = (4 + 6) ÷ 2 = 5
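The odd and even cases can be handled by one small function. This is a minimal sketch of the method described above (sort, then take the middle value or the mean of the two middle values). The name `median_of` is an assumption for illustration.

```python
def median_of(values):
    """Return the middle value of the sorted data.

    With an even number of values, return the mean of the two middle values.
    """
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


julie = [3, 4, 2, 5, 3, 6, 8, 4, 5, 6, 7]  # 11 values: the 6th is the median
linda = [3, 4, 2, 6, 3, 6, 8, 4, 6, 7]     # 10 values: between the 5th and 6th

print(median_of(julie))  # 5
print(median_of(linda))  # 5.0
```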
8Statistics
Write in your jotters the range, mode, median and
mean of the following sets of numbers.
1) The following are the distances jumped in a
   school sports day (in metres).
   4, 3, 5, 6, 4, 5, 6, 7, 8, 12, 6
2) The following numbers are the maths scores in an
   S2 class.
   17, 12, 13, 12, 14, 15, 16, 17, 18, 17, 17, 20
3) The following are times for the 100m sprint (in
   seconds).
   24.5, 19.86, 21.15, 15.04, 15.10, 16.80, 20, 19.86, 14.22
4) The following are minimum temperatures (°C) in
   Glasgow measured over one week.
   0, 3, 1, 0, 0, 0, 4, 3, 2, 3
9Statistics
Now check that your answers are correct.
1) Range = 9, mode = 6, median = 6, mean = 6
2) Range = 8, mode = 17, median = 16.5, mean = 15.67
3) Range = 10.28, mode = 19.86, median = 19.86, mean = 18.5
4) Range = 4, mode = 0, median = 1.5, mean = 1.6
10Frequency tables
It is possible to calculate the mean, mode and
median from a frequency table by adding a third
column to it.
Number of items (x)   Frequency (f)   f × x
1                     7               1 × 7 = 7
2                     4               2 × 4 = 8
3                     3               3 × 3 = 9
4                     4               4 × 4 = 16
5                     4               5 × 4 = 20
The values for the third column are found by
multiplying the values in column 1 (x) with the
values in column 2 (f).
The mode is the most common number: 1.
The number 1 appears seven times.
Because there are 22 numbers (7 + 4 + 3 + 4 + 4),
the median will be between the 11th and 12th
numbers. The 11th number is 2 and the 12th is 3,
so the median is 2.5.
11Frequency tables
Number of items (x)   Frequency (f)   f × x
1                     7               1 × 7 = 7
2                     4               2 × 4 = 8
3                     3               3 × 3 = 9
4                     4               4 × 4 = 16
5                     4               5 × 4 = 20
Totals                22              60
To calculate the mean you use this formula
mean = Σfx ÷ Σf = 60 ÷ 22 = 2.7 (to 1 decimal place)
Σ stands for "sum of"
12Frequency tables
Copy and complete the frequency table to work out
the mean, mode and median of the number of cars
in a group of pupils' homes.
Number of cars (x)   Frequency (f)   f × x
1                    4
2                    9
3                    10
4                    6
5                    1
Totals
13Frequency tables
Now check that your answers are correct.
Number of cars (x)   Frequency (f)   f × x
1                    4               4
2                    9               18
3                    10              30
4                    6               24
5                    1               5
Totals               30              81
Mean = 2.7   Mode = 3   Median = 3
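The frequency-table method can be sketched in Python as follows, using the cars table above. The function name `frequency_table_stats` and the list-of-pairs layout are assumptions made for this illustration. The median is found by writing the table back out as an ordered list, which is fine for small tables.

```python
def frequency_table_stats(table):
    """table is a list of (x, f) pairs in increasing order of x.

    Returns (total f, total f × x, mean, mode, median).
    """
    total_f = sum(f for _, f in table)
    total_fx = sum(x * f for x, f in table)
    mean = total_fx / total_f

    # The mode is the x value with the highest frequency.
    mode = max(table, key=lambda row: row[1])[0]

    # Expand the table into the ordered list of values to find the median.
    values = [x for x, f in table for _ in range(f)]
    middle = total_f // 2
    if total_f % 2 == 1:
        median = values[middle]
    else:
        median = (values[middle - 1] + values[middle]) / 2

    return total_f, total_fx, mean, mode, median


cars = [(1, 4), (2, 9), (3, 10), (4, 6), (5, 1)]
total_f, total_fx, mean, mode, median = frequency_table_stats(cars)
print(total_f, total_fx)   # 30 81
print(round(mean, 1))      # 2.7
print(mode, median)        # 3 3.0
```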
14Frequency tables
Copy and complete the frequency table to work out
the mean, mode and median of the shoe sizes of S1
pupils.
Shoe size (x)   Frequency (f)   f × x
3               1
4               12
5               11
6               14
7               8
Totals
15Frequency tables
Now check that your answers are correct.
Shoe size (x)   Frequency (f)   f × x
3               1               3
4               12              48
5               11              55
6               14              84
7               8               56
Totals          46              246
Mean = 5.3   Mode = 6   Median = 5
16Cumulative frequency
A cumulative frequency column can be added to a
frequency table to keep a running total of the
frequencies.
A group of parents were asked how many children
they each had.
Number of children (x)   Frequency (f)   Cumulative frequency
1                        6               6
2                        7               13   (6 + 7)
3                        5               18   (13 + 5)
4                        3               21   (18 + 3)
5                        1               22   (21 + 1)
Total                    22
The 21 tells you that 21 parents had 4 or fewer
children.
You can easily work out the median from a
cumulative frequency column.
There were 22 parents asked, so the median is
between the 11th and 12th people asked.
6 parents had 1 child and 13 had 2 or fewer, so
the 11th and 12th parents both had 2 children.
Therefore the median is 2 children.
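The running total and the median lookup described above can be sketched with `itertools.accumulate`. The helper names below are hypothetical. The lookup finds the first row whose cumulative frequency reaches the position being searched for.

```python
from itertools import accumulate


def cumulative_frequencies(frequencies):
    """Return the running total of the frequencies."""
    return list(accumulate(frequencies))


def median_from_cumulative(values, frequencies):
    """Find the median using only the cumulative frequency column."""
    cumulative = cumulative_frequencies(frequencies)
    n = cumulative[-1]

    # 1-based positions of the middle value (odd n) or middle two values (even n).
    positions = [(n + 1) // 2] if n % 2 == 1 else [n // 2, n // 2 + 1]

    def value_at(position):
        # The first row whose running total reaches the position holds that value.
        for x, running_total in zip(values, cumulative):
            if position <= running_total:
                return x

    found = [value_at(p) for p in positions]
    return sum(found) / len(found)


children = [1, 2, 3, 4, 5]
parents = [6, 7, 5, 3, 1]

print(cumulative_frequencies(parents))            # [6, 13, 18, 21, 22]
print(median_from_cumulative(children, parents))  # 2.0
```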
17Cumulative frequency
The number of pairs of shoes owned by 5th year
girls is shown below. Copy and complete the
cumulative frequency table.
Number of pairs of shoes (x)   Frequency (f)   Cumulative frequency
4                              9
5                              12
6                              8
7                              11
8                              6
1) How many girls owned fewer than 6 pairs of
   shoes?
2) What was the median number of pairs of shoes
   owned by the girls?
18Cumulative frequency
Now check that your answers are correct.
Number of pairs of shoes (x)   Frequency (f)   Cumulative frequency
4                              9               9
5                              12              21
6                              8               29
7                              11              40
8                              6               46
Total                          46
1) 21 girls
2) Median = 6
19Cumulative frequency
A group of 4th year boys were asked how many
hours a week they spent playing computer games.
Copy and complete the cumulative frequency table.
Number of hours (x)   Frequency (f)   Cumulative frequency
5                     2
6                     5
7                     12
8                     17
9                     20
1) How many boys played games for less than 8
   hours a week?
2) What was the median number of hours spent
   playing computer games?
20Cumulative frequency
Now check that your answers are correct.
Number of hours (x)   Frequency (f)   Cumulative frequency
5                     2               2
6                     5               7
7                     12              19
8                     17              36
9                     20              56
Total                 56
1) 19 boys
2) Median = 8
21Quartiles
To order a set of numbers into quartiles, we
first of all have to put the numbers in order
from the lowest to the highest.
10, 11, 11, 12, 12, 12, 13, 13, 13, 14, 14, 14,
15, 15, 30
Q1 = 12, Q2 = 13, Q3 = 14
The median splits the numbers into two equal
parts and is the second Quartile, Q2
To find the other two quartiles, Q1 and Q3, you
calculate the median of the lower and upper
halves.
The median of the lower half is called Q1.
The median of the upper half is called Q3.
The quartiles must divide the numbers into four
groups with the same number of values in each
group, in this case groups of three.
22Quartiles
If you have a larger group of numbers, it might
not be so easy to find which number to look to
for the median and the quartiles.
The following rule will help you decide, no
matter how many numbers you have.
1. Divide the number of values by 4.
2. Your answer will tell you how many numbers
will be in each group.
3. The remainder will tell you how many extra
values there are. This will be 0, 1, 2 or 3.
Example 1
12 numbers: 2, 3, 3, 3, 4, 6, 7, 7, 8, 8, 8, 9
12 ÷ 4 = 3 r 0, therefore there will be 3 in each
quarter, with 0 extra values to be fitted in.
Q1 = 3, Q2 = 6.5, Q3 = 8
23Quartiles
Example 2
13 numbers: 0, 1, 2, 2, 2, 2, 3, 5, 6, 7, 7, 7, 9
13 ÷ 4 = 3 r 1, therefore there will be 3 in each
quarter, with 1 extra value to be fitted in
symmetrically.
Q1 = 2, Q2 = 3, Q3 = 7
Example 3
14 numbers: 0, 0, 0, 1, 1, 2, 2, 3, 4, 4, 5, 5, 5, 6
14 ÷ 4 = 3 r 2, therefore there will be 3 in each
quarter, with 2 extra values to be fitted in
symmetrically.
Q1 = 1, Q2 = 2.5, Q3 = 5
24Quartiles
Example 4
15 numbers: 0, 1, 2, 3, 3, 3, 3, 4, 5, 5, 6, 7, 7, 8, 8
15 ÷ 4 = 3 r 3, therefore there will be 3 in each
quarter, with 3 extra values to be fitted in
symmetrically.
Q1 = 3, Q2 = 4, Q3 = 7
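The four examples can be reproduced with the "median of each half" method from the earlier quartiles slide. In this sketch the middle value is left out of both halves when there is an odd number of values. That choice matches the examples here, but other textbooks and software use slightly different quartile conventions. The function names are illustrative.

```python
def median_of_sorted(ordered):
    """Median of a list that is already in order."""
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]          # values before the median position
    upper = ordered[half + n % 2:]  # skip the middle value when n is odd
    return (median_of_sorted(lower),
            median_of_sorted(ordered),
            median_of_sorted(upper))


example_1 = [2, 3, 3, 3, 4, 6, 7, 7, 8, 8, 8, 9]
example_2 = [0, 1, 2, 2, 2, 2, 3, 5, 6, 7, 7, 7, 9]
example_3 = [0, 0, 0, 1, 1, 2, 2, 3, 4, 4, 5, 5, 5, 6]
example_4 = [0, 1, 2, 3, 3, 3, 3, 4, 5, 5, 6, 7, 7, 8, 8]

for data in (example_1, example_2, example_3, example_4):
    print(len(data), quartiles(data))
```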
25Five-figure summary
A five-figure summary is a summary of a set of
numbers.
The five figures are the three quartiles (Q1, Q2
and Q3) together with the highest and lowest
numbers.
10, 11, 11, 12, 12, 12, 13, 13, 13, 14, 14, 14,
15, 15, 30
Using the previous example showing how to
calculate the quartiles, the five-figure summary
is as follows:
lowest = 10, Q1 = 12, Q2 = 13, Q3 = 14, highest = 30
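A five-figure summary can be put together in a few lines. This is a minimal sketch that uses `statistics.median` from the Python standard library and the same median-of-halves quartile convention as the slides. The function name and the dictionary keys are assumptions for illustration.

```python
from statistics import median


def five_figure_summary(values):
    """Return the lowest value, Q1, Q2, Q3 and highest value."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]        # half below the median position
    upper = ordered[(n + 1) // 2:]   # half above it (middle value excluded if n is odd)
    return {
        "lowest": ordered[0],
        "Q1": median(lower),
        "Q2": median(ordered),
        "Q3": median(upper),
        "highest": ordered[-1],
    }


numbers = [10, 11, 11, 12, 12, 12, 13, 13, 13, 14, 14, 14, 15, 15, 30]
print(five_figure_summary(numbers))
```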
26Five-figure summary
In your jotters, write down the five-figure
summary for each set of numbers.
1) 14 pupils in S4 were asked their shoe size.
   4, 5, 3, 6, 4, 7, 4, 5, 6, 6, 4, 7, 8, 9
2) A group of paper boys were asked how much
   they earn a week.
   13, 14, 15, 12, 16, 17, 18, 18, 18, 19, 29, 39, 38, 37, 36, 37
4) A group at the swimming pool were asked their
   ages.
   12, 8, 7, 19, 23, 25, 20, 14
27Five-figure summary
Now check that your answers are correct.
1) Minimum = 3, Maximum = 9, Q1 = 4, Q2 = 5.5, Q3 = 7
2) Minimum = 12, Maximum = 39, Q1 = 15.5, Q2 = 18, Q3 = 36.5
3) Minimum = 0, Maximum = 42, Q1 = 12, Q2 = 22, Q3 = 28
4) Minimum = 7, Maximum = 25, Q1 = 10, Q2 = 16.5, Q3 = 21.5
28The range
Up until now, when we calculated the range of a
set of numbers, we took the lowest number from
the highest.
In certain situations, however, this will not
give an accurate reflection of the spread of the
numbers.
For example, here are the ages of a group of
children in the scouts and their leader.
10, 11, 12, 14, 13, 15, 13, 12, 11, 12, 14, 15,
14, 13, 30
The range here is the highest (30) take away the
lowest (10): 30 - 10 = 20
All of the children are aged between 10 and 15.
The leader of the group is 30 and this gives a
false impression of how widely spread the ages
are.
The range only uses the two end ages and
disregards all the others.
Another measure of spread is the
semi-interquartile range, which takes into
account more of the numbers to give a more
accurate and relevant result.
29The semi-interquartile range
Now that we know how to work out the quartiles,
we can calculate the semi-interquartile range.
Using the example of the scout group, we found
that the range was 20 years.
However, because one person was so much older
than the rest, this was not an accurate
reflection of the range of ages.
10, 11, 11, 12, 12, 12, 13, 13, 13, 14, 14, 14,
15, 15, 30
Q1 = 12, Q2 = 13, Q3 = 14
To calculate the semi-interquartile range, you
find the difference between the upper quartile Q3
and the lower quartile Q1, and then halve your
answer:
semi-interquartile range = (Q3 - Q1) ÷ 2
                         = (14 - 12) ÷ 2
                         = 1
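The sketch below puts the two measures of spread side by side for the scout group. It uses the quartiles found earlier for this data (Q1 = 12, Q3 = 14). The function name is illustrative.

```python
def semi_interquartile_range(q1, q3):
    """Half of the difference between the upper and lower quartiles."""
    return (q3 - q1) / 2


ages = [10, 11, 11, 12, 12, 12, 13, 13, 13, 14, 14, 14, 15, 15, 30]

full_range = max(ages) - min(ages)       # uses only the two end values
siqr = semi_interquartile_range(12, 14)  # uses the quartiles instead

print(full_range)  # 20
print(siqr)        # 1.0
```

The leader's age of 30 raises the range to 20 but has no effect on the semi-interquartile range.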
30The semi-interquartile range
This measure of spread is often preferred to just
taking the lowest from the highest as you do for
the range.
The reason for this is that it takes into account
more of the numbers in the data and it also
disregards what can sometimes be extreme high or
low numbers that are not typical of the data.
If you ever forget the formula for calculating
the semi-interquartile range, you could
construct it by breaking down the words.
The interquartile range is the range between the
upper and lower quartiles. "Semi" means half, so
divide your answer by 2.
31The semi-interquartile range
1) A group of 20 pupils were asked how much
pocket money they got each week.
7, 7, 2, 2, 3, 4, 5, 9, 10, 3, 4, 5,
6, 6, 7, 7, 7, 6, 3, 3
In your jotters, write down the five-figure
summary, range and semi-interquartile range.
2) A group of 15 pupils were asked what their
shoe size was.
12, 9, 2, 3, 7, 8, 5, 5, 6, 8, 9, 10,
5, 4, 6
In your jotters, write down the five-figure
summary, range and semi-interquartile range.
32The semi-interquartile range
Now check that your answers are correct.
1) minimum = 2, maximum = 10, Q1 = 3, Q2 = 5.5, Q3 = 7
   range = 8, semi-interquartile range = 2
2) minimum = 2, maximum = 12, Q1 = 5, Q2 = 6, Q3 = 9
   range = 10, semi-interquartile range = 2
33Comparing sets of data
Very often the reason for using statistics is to
compare two or more sets of results.
Once you have statistics for two or more sets of
data, you can make statements based on the
results.
Example
As part of a school project, pupils from two
schools were asked how much pocket money they
received each week.
Quahog School: 8, 6, 5, 4, 6, 5, 7, 10, 10, 7, 8, 9, 7
Springfield Elementary: 12, 7, 3, 4, 2, 3, 4, 4, 5, 2, 6, 3, 4
Quahog: Mean = 7.08, Median = 7.00
Springfield: Mean = 4.54, Median = 4.00
34Comparing sets of data
By calculating the mean and median from each set
of data, what statements can be made about how
much each child receives?
Quahog: Mean = 7.08, Median = 7.00
Springfield: Mean = 4.54, Median = 4.00
By looking at the mean and median of both sets of
data, we can see that the children at Quahog are
given more pocket money on average than the
children that go to Springfield Elementary. The
mean and median are similar in each school, which
suggests that they are both a good indication of
the average given to each child.
35Comparing sets of data
By comparing the mean, median and range of the
following sets of data, what statements can be
made about the data?
1) Two companies that produce boxes of paper
   clips claim that they provide their
   customers with more paper clips in each box.
   The boxes cost the same from each company.
   Clips R Us: 102, 106, 101, 100, 99, 92, 96, 100, 101, 110, 90
   Pippas Clippas: 87, 120, 104, 102, 100, 98, 97, 100, 101, 102, 95
2) A teacher wanted to compare the marks of her
   two first-year classes. What conclusions can
   you make about the scores?
   Class 2A: 18, 19, 17, 18, 17, 17, 18, 18, 19, 17, 20, 16, 13, 12
   Class 2B: 20, 20, 19, 3, 2, 4, 6, 10, 11, 3, 2, 15, 16, 17
36Comparing sets of data
Now check that your answers are correct.
1) Clips R Us: Mean = 99.7, Median = 100, Range = 20
   Pippas Clippas: Mean = 100.5, Median = 100, Range = 33
By comparing the median we can see no difference
in the results. The mean shows that Pippas Clippas
have slightly more on average in each box. However,
the range is much bigger, meaning that the amount
in each box could vary by a fairly large amount
in comparison to Clips R Us.
2) Class 2A: Mean = 17.1, Median = 17.5, Range = 8
   Class 2B: Mean = 10.6, Median = 10.5, Range = 18
The mean for each class tells us that class 2A
achieved a higher mark on average than 2B and the
median backs this up. The range in 2B is very
high, suggesting that while some people did very
well, others did very poorly. The range in 2A
shows that the scores that each pupil achieved
were closer together, suggesting that in this
class pupils are closely matched in ability.
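The figures for the paper-clip question can be reproduced with a short sketch that uses `statistics.mean` and `statistics.median` from the Python standard library. The `summarise` helper and its dictionary layout are assumptions for illustration, and the mean is rounded to 1 decimal place to match the answers above.

```python
from statistics import mean, median


def summarise(values):
    """Return the mean (to 1 decimal place), median and range of a data set."""
    return {
        "mean": round(mean(values), 1),
        "median": median(values),
        "range": max(values) - min(values),
    }


clips_r_us = [102, 106, 101, 100, 99, 92, 96, 100, 101, 110, 90]
pippas_clippas = [87, 120, 104, 102, 100, 98, 97, 100, 101, 102, 95]

print(summarise(clips_r_us))
print(summarise(pippas_clippas))
```

Putting the summaries side by side makes the comparison easier to state. The averages are almost the same, so the range is what separates the two companies.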
37Box plots
Once you have a five-figure summary you can
represent the information on a box plot.
Using the example earlier, on the scout trip we
calculated that the quartiles were 12, 13 and 14.
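10, 11, 11, 12, 12, 12, 13, 13, 13, 14, 14, 14,
15, 15, 30
Q1 = 12, Q2 = 13, Q3 = 14
We can see from the list that the lowest number
is 10 and the highest number is 30.
This information can be represented on a box plot.
The box runs from Q1 (12) to Q3 (14) with a line
at the median (13), and the whiskers extend to the
lowest (10) and highest (30) values.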
38Standard deviation
So far we have looked at two methods for checking
the spread of numbers: the range and the
semi-interquartile range.
The last measure of spread of data we are going
to look at is called standard deviation.
The reason that we need to use another method is
because of the limitations of the range and
semi-interquartile range, which are
- the range only uses the two end values, ignoring
  every other value
- the semi-interquartile range totally disregards
  the two end values.
The standard deviation takes into account all of
the numbers, which makes it a more complete
measure of spread.
When you work out the standard deviation you
obtain a number.
This number tells you roughly how far away, on
average, the values are from the mean.
39Standard deviation
To work out the standard deviation of a group of
numbers we are going to divide the calculation
into four steps.
Example The following group of numbers is how
late the bus was (in minutes) each day as
George went to work one week.
23, 15, 7, 8, 7 Calculate the mean and
the standard deviation.
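Step 1 Calculate the mean. Each value when we
use standard deviation is represented with an x.
mean = (the sum of all the x values) ÷ (the
number of values)
We are going to be using some new notation for
this:
x̄ = Σx ÷ n
x̄ is the mean, Σx is the sum of all the x values
and n is the number of values.
40Standard deviation
23, 15, 7, 8, 7
x̄ = (23 + 15 + 7 + 8 + 7) ÷ 5 = 60 ÷ 5 = 12
Step 2 Now we draw a table to see how far
each value is from the mean.
x     x - x̄            (x - x̄)²
23    23 - 12 = 11     121
15    15 - 12 = 3      9
7     7 - 12 = -5      25
8     8 - 12 = -4      16
7     7 - 12 = -5      25
Some of the differences are negative, so if we
added them up the negatives would cancel out the
positives.
To get round this problem we square each value.
The negatives disappear and we add an extra
column to the table.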
41Standard deviation
Step 3 We now find the mean of the numbers in
the last column. For standard deviation we
divide the total by the number of values
minus 1. In this case 5 - 1 = 4.
(121 + 9 + 25 + 16 + 25) ÷ 4 = 196 ÷ 4 = 49
Step 4 Remember that we squared the numbers in
step 2 so now we must find the square root of
49.
√49 = 7
This number is called the standard deviation and
is a measure of how far the values are from the
mean.
The formula for standard deviation is
s = √( Σ(x - x̄)² ÷ (n - 1) )
in this case s = 7
When the standard deviation is low it means the
scores are close to the mean. When it is high it
means they are spread out from the mean. In this
case it is a high number in relation to the mean,
so the numbers are spread out from the mean.
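The four steps can be followed one line at a time in code. This is a minimal sketch of the method above, dividing by the number of values minus 1. The function name is an assumption for illustration.

```python
from math import sqrt


def standard_deviation(values):
    """Standard deviation following the four steps in the worked example."""
    n = len(values)
    mean = sum(values) / n                               # Step 1: the mean
    squared_diffs = [(x - mean) ** 2 for x in values]    # Step 2: (x - mean) squared
    mean_square = sum(squared_diffs) / (n - 1)           # Step 3: divide by n - 1
    return sqrt(mean_square)                             # Step 4: square root


minutes_late = [23, 15, 7, 8, 7]
print(standard_deviation(minutes_late))  # 7.0
```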
42Standard deviation
Use this formula to calculate the standard
deviation of the following sets of data in your
jotters.
1) The ages of four people who climbed Everest
   are
   28, 43, 50, 27
2) The following times show the 0 to 60
   acceleration of different BMWs
   6.0, 5.2, 10.7, 9.6, 8.3, 11.5, 7.5
3) The following scores were recorded at a golf
   competition
   68, 72, 70, 71, 69
43Standard deviation
Now check that your answers are correct.
44Standard deviation
There is one final formula that can be used to
find the standard deviation from a set of
numbers.
You will have noticed that in the previous
examples when you calculated the mean at the
beginning, it gave an easy-to-use number,
i.e. the mean was either a whole number or a
decimal number to 1 decimal place.
If you calculate the mean and you have a number
with many decimal places, you can use an
alternative formula:
s = √( (Σx² - (Σx)² ÷ n) ÷ (n - 1) )
This still gives the same answer as the one we
found before, but this formula is easier to use
for numbers that have more decimal places.
45Standard deviation
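Example Calculate the mean and standard
deviation of the following numbers.
22, 23, 21, 20, 20.4, 21.3
x        x²
22       484
23       529
21       441
20       400
20.4     416.16
21.3     453.69
Totals
127.7    2723.85
x̄ = 127.7 ÷ 6 = 21.28333
s = √( (2723.85 - 127.7² ÷ 6) ÷ 5 ) = 1.09 (to 2 decimal places)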
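The sketch below assumes the alternative formula is the usual computational form, s = √((Σx² - (Σx)² ÷ n) ÷ (n - 1)), which only needs the totals of the x and x² columns. The function name is illustrative. The result is compared with `statistics.stdev` from the Python standard library, which also divides by n - 1.

```python
from math import sqrt
from statistics import stdev


def standard_deviation_from_totals(values):
    """Standard deviation using the totals of x and x squared."""
    n = len(values)
    sum_x = sum(values)                   # total of the x column
    sum_x2 = sum(x * x for x in values)   # total of the x² column
    return sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))


numbers = [22, 23, 21, 20, 20.4, 21.3]

print(round(standard_deviation_from_totals(numbers), 2))  # 1.09
print(round(stdev(numbers), 2))                           # 1.09
```

Because the mean never has to be subtracted from each value, there is no need to carry 21.28333 through every row of the table.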
46Standard deviation
Use this formula to calculate the standard
deviation of the following sets of data in your
jotters.
1) The reaction times of four drivers were tested
   0.23, 0.85, 0.42, 0.94
2) The BMI values of a group of S5 pupils were
   recorded as follows
   17.7, 22.42, 21.2, 23, 16.99, 18.4
3) A group of S6 students were asked at what age
   they thought they would get married
   33, 32, 34, 34, 35
47Standard deviation
Now check that your answers are correct.
48Probability
Probability is the likelihood of an event
happening.
To calculate the probability of an event
happening, the following formula can be used.
P(event) = number of favourable outcomes ÷
number of possible outcomes
Example If you were to roll a dice what would
the probability be that it would land on a 2?
P(2) = number of 2s on the dice ÷ total
numbers on the dice = 1/6
49Probability
Example If you were to roll a dice, what is the
probability that you would roll
an odd number?
P(odd) = number of odd numbers on the dice ÷
total numbers on the dice = 3/6 = 1/2
Example If you were to pick a random card out
from a pack of cards, what is the
probability that you would pick out the
number 4?
P(4) = number of 4s in a pack of cards ÷
total number of cards in a pack = 4/52 = 1/13
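The probability formula can be tried out with exact fractions. This sketch assumes a standard six-sided dice and a standard 52-card pack containing four 4s. The `probability` helper is a made-up name, and `fractions.Fraction` simplifies each answer automatically.

```python
from fractions import Fraction


def probability(favourable, possible):
    """Number of favourable outcomes divided by number of possible outcomes."""
    return Fraction(favourable, possible)


dice = [1, 2, 3, 4, 5, 6]  # assumed: a standard six-sided dice

p_two = probability(dice.count(2), len(dice))
p_odd = probability(sum(1 for face in dice if face % 2 == 1), len(dice))
p_four = probability(4, 52)  # assumed: four 4s in a 52-card pack

print(p_two)   # 1/6
print(p_odd)   # 1/2
print(p_four)  # 1/13
```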
50Probability
In your jotters, calculate the probability of the
following events happening.
1) There are 52 cards in a pack. What is the
   probability that you pick out a red card?
2) A bag full of bank notes has 14 £1 notes, 6 £5
   notes, 3 £10 notes and 1 £20 note. What is the
   probability that a £5 note would be randomly
   picked out?
3) There are 49 numbers in the National Lottery.
   What is the probability that the first ball
   that rolls out is a multiple of 4?
51Probability
Now check that your answers are correct.