Title: Measuring spread
1Measuring spread.
- A Learning Object produced by
- Sidney Tyrrell, Coventry University,
- as part of her National Teaching Fellowship
project.
2Why measure spread?
- Data can be interpreted to give us information.
- Information is what we need for taking good
decisions.
3Lots of data is too much to take in
- so we need to summarise it,
- usually by finding an average, which you
should think of as a typical value,
- and by finding a measure of spread.
4An average by itself
- does not give the whole picture.
- Try writing down 4 data sets each with a mean
of 4 and a median of 4.
- Make them as different as possible.
5Four data sets with a mean and median of 4
4 4 4
0 4 8
0 0 4 4 14
-92 4 100
The essential difference between the sets is
the spread of the numbers.
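As a quick check of this idea, here is a minimal Python sketch. It takes three small data sets that share a mean and median of 4 and uses the gap between the largest and smallest value as a rough indicator of spread. The variable names are illustrative only and are not part of the presentation.

```python
from statistics import mean, median

# Three data sets that share the same mean and median.
data_sets = [
    [4, 4, 4],
    [0, 4, 8],
    [-92, 4, 100],
]

summaries = []
for values in data_sets:
    # Gap between the largest and smallest value, as a rough spread indicator.
    spread = max(values) - min(values)
    summaries.append((mean(values), median(values), spread))
    print(values, "mean:", mean(values), "median:", median(values), "spread:", spread)
```

The averages agree for every set, while the spread figure differs widely.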
6A difference in spread
[Figure: the values of the four data sets: 4 4 4; 0 4 8; 0 0 4 4 14; -92 4 100]
11We can measure spread using
- The range
- The interquartile range - IQR
- The standard deviation
12The range
- is the difference between the largest and
smallest values.
- For the data set 4 4 4 the range is 4 - 4 = 0
- For the data set 0 4 8 the range is 8 - 0 = 8
13The range
- is the difference between the largest and
smallest values.
- For the data set -92 4 100 the range is
100 - (-92) = 100 + 92 = 192
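The range calculation can be sketched in a few lines of Python. The function name value_range is a hypothetical choice for this illustration.

```python
def value_range(values):
    """Return the difference between the largest and smallest values."""
    return max(values) - min(values)


print(value_range([4, 4, 4]))      # 0
print(value_range([0, 4, 8]))      # 8
print(value_range([-92, 4, 100]))  # 192
```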
14The range: a warning
- It is easy to calculate, but
- because it is found from the extreme values it
may not be helpful,
- and may give quite the wrong idea about the data.
15The interquartile range
- Quartiles divide the data into quarters.
- 2 3 5 7 10 11 15 16
16The central quartile is the median
17Statisticians disagree about how to calculate
exactly where the quartiles lie.
18But agree that
- quartiles divide the data into quarters.
- 2 3 5 7 10 11 15 16
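To see how quartile conventions can differ, the sketch below applies two methods from Python's standard library to the data set above. These two methods are just examples of competing conventions. They are not the definitions used in this presentation.

```python
from statistics import quantiles

data = [2, 3, 5, 7, 10, 11, 15, 16]

# Two quartile conventions offered by the standard library.
exclusive = quantiles(data, n=4, method="exclusive")
inclusive = quantiles(data, n=4, method="inclusive")

print("exclusive:", exclusive)
print("inclusive:", inclusive)
```

Both conventions give the same median, 8.5, but place the lower quartile at 3.5 and 4.5 respectively.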
19One definition is that
- The lower quartile is the median of the lower
half of the data set.
- 2 3 5 7 10 11 15 16
The lower quartile is 4
20With an odd number of numbers
- the lower quartile is the median of the lower
half of the data below the median itself.
- 2 3 5 7 9 10 11 15 16
The lower quartile is 4
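The "median of the lower half" definition can be sketched as follows. The function name lower_quartile is hypothetical, and the code assumes the convention described above: with an odd number of values, the median itself is left out of the lower half.

```python
from statistics import median


def lower_quartile(values):
    """Median of the lower half; the middle value is left out when the count is odd."""
    ordered = sorted(values)
    half = len(ordered) // 2  # for an odd count this stops just before the median
    return median(ordered[:half])


print(lower_quartile([2, 3, 5, 7, 10, 11, 15, 16]))     # 4.0
print(lower_quartile([2, 3, 5, 7, 9, 10, 11, 15, 16]))  # 4.0
```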
21The interquartile range
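- is the distance between the lower
quartile and the upper quartile.
- 2 3 5 7 10 11 15 16
- Taking each quartile as the median of its half, the lower quartile is 4 and the
upper quartile is 13, so the interquartile range is 13 - 4 = 9.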
22The interquartile range
- gives the spread of the middle 50% of the data,
often the bit you are most interested in.
- It is not affected by any extreme values.
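The sketch below illustrates the point about extreme values. It computes the interquartile range using the median-of-halves definition, then repeats the calculation with the largest value replaced by 160. That altered data set is a hypothetical example, not from the presentation, and the function names are illustrative.

```python
from statistics import median


def quartiles(values):
    """Lower and upper quartiles as medians of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = median(ordered[:half])
    upper = median(ordered[-half:])  # top half; skips the middle value if the count is odd
    return lower, upper


def interquartile_range(values):
    lower, upper = quartiles(values)
    return upper - lower


data = [2, 3, 5, 7, 10, 11, 15, 16]
with_extreme = [2, 3, 5, 7, 10, 11, 15, 160]  # hypothetical: 16 replaced by 160

print("IQR:", interquartile_range(data), "range:", max(data) - min(data))
print("IQR:", interquartile_range(with_extreme), "range:", max(with_extreme) - min(with_extreme))
```

The interquartile range stays at 9 in both cases, while the range jumps from 14 to 158.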
23The interquartile range
- is used in constructing boxplots.
- The quartiles define the ends of the box.
[Figure: a boxplot above an axis marked from 0 to 40 in steps of 5]
24The standard deviation, s
- is an important measure of spread for
statisticians
- and frequently misunderstood.
- You can think of it as a typical deviation from
the mean.
- Hence standard deviation.
25To calculate the standard deviation
- you need to start by finding the deviations or
differences.
- From what?
- From the mean.
26An example: take the numbers 7, 8, 9
- Find the differences from the mean
- The mean is 8
- The differences are
9 - 8 = 1
8 - 8 = 0
7 - 8 = -1
27Now you have the differences
- You need to find a standard deviation, which
suggests averaging them.
- This could be tricky as -1 + 0 + 1 = 0
- Will the answer always be 0?
Yes it will, whatever 3 numbers you choose!
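The claim that the differences from the mean always add up to 0 can be checked with a short sketch. Exact fractions are used so that rounding does not hide the result. Apart from 7, 8, 9, the data sets are hypothetical examples, and the function name is illustrative.

```python
from fractions import Fraction


def deviations_from_mean(values):
    """Return each value minus the mean, using exact fractions."""
    mean = Fraction(sum(values), len(values))
    return [value - mean for value in values]


# 7, 8, 9 is the example from the slides; the other sets are made up.
totals = []
for values in ([7, 8, 9], [2, 3, 10], [1, 2, 4]):
    total = sum(deviations_from_mean(values))
    totals.append(total)
    print(values, "sum of differences:", total)
```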
28We get round this problem
- By squaring the differences first
- Then averaging them.
29square the differences
7 - 8 = -1    (-1)^2 = 1
8 - 8 = 0     (0)^2 = 0
9 - 8 = 1     (1)^2 = 1
30find the average
(1 + 0 + 1) / 3 = 2 / 3
31the square root of 2/3 is 0.816.
This is the population standard deviation, 0.816.
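The steps so far can be followed in a short Python sketch: find the differences from the mean, square them, average them, and take the square root. The variable names are illustrative. The final line compares the result with the standard library's pstdev as a cross-check.

```python
from math import sqrt
from statistics import pstdev

values = [7, 8, 9]

mean = sum(values) / len(values)                  # 8.0
squared = [(x - mean) ** 2 for x in values]       # [1.0, 0.0, 1.0]
population_sd = sqrt(sum(squared) / len(values))  # square root of 2/3

print(round(population_sd, 3))  # 0.816
print(abs(population_sd - pstdev(values)) < 1e-12)
```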
32The population standard deviation
- is used where your data is the complete
population being studied.
33Usually we are finding the standard deviation of
a sample, a subset of the population.
- The sample standard deviation, s, is calculated
slightly differently.
- At the stage where we find the average of the
squared deviations we divide by one less than the
sample size.
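In symbols, the two calculations can be written as below. Apart from s, the notation is an assumption and does not come from the slides: the x_i are the data values, N and n are the number of values, \mu is the population mean, \bar{x} is the sample mean, and \sigma is the population standard deviation.

```latex
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}
```

The only difference is the divisor: N for a complete population, n - 1 for a sample.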
34The same example: the numbers 7, 8, 9
- Find the differences from the mean
- The mean is 8
- The differences are
9 - 8 = 1
8 - 8 = 0
7 - 8 = -1
35square the differences
(-1)^2 = 1
(0)^2 = 0
(1)^2 = 1
36find the average
This time we divide by one less than the sample
size.
37find the average
divide by one less than the sample size
(1 + 0 + 1) / 2 = 2 / 2 = 1
38the square root of 1 is 1
This is the sample standard deviation.
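The sample calculation can be sketched the same way, dividing by one less than the sample size. The variable names are illustrative, and the standard library's stdev is used only as a cross-check.

```python
from math import sqrt
from statistics import stdev

values = [7, 8, 9]
n = len(values)

mean = sum(values) / n
squared = [(x - mean) ** 2 for x in values]
sample_sd = sqrt(sum(squared) / (n - 1))  # divide by one less than the sample size

print(sample_sd)  # 1.0
print(abs(sample_sd - stdev(values)) < 1e-12)
```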
39The variance
- Is the square of the standard deviation.
- The name reminds us it is a measure of
variability in the data.
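A brief sketch of the relationship between the variance and the standard deviation, using Python's standard library on the numbers 7, 8, 9. The variable names are illustrative.

```python
from statistics import pstdev, pvariance, stdev, variance

values = [7, 8, 9]

# The variance is the standard deviation squared, for both versions.
population_pair = (pvariance(values), pstdev(values) ** 2)
sample_pair = (variance(values), stdev(values) ** 2)

print("population:", population_pair)
print("sample:", sample_pair)
```

Each pair agrees up to floating-point rounding: about 0.667 for the population version and 1 for the sample version.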
40Calculate the standard deviation by
- finding the differences from the mean,
- squaring the differences,
- averaging them,
- and then taking the square root.
41Finally a reminder
- Why measure spread?
- Finding the spread helps us understand our data
better,
- which should lead to better decisions and
analysis.