Basic Statistics in Public Health - PowerPoint PPT Presentation

Provided by: swlondonpu

Transcript and Presenter's Notes

1
Basic Statistics in Public Health
2

Types of data
Descriptive statistics
Confidence Intervals

3
TYPES OF DATA
Age?
Sex?
Social class?
4
TYPES OF DATA
BMI?
Obese or not?
Underweight / normal / overweight / obese?
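
The three questions on this slide can all be answered from one measurement. As a minimal sketch, the Python below shows BMI as a numerical value, as an ordered category, and as a yes/no variable. The cut-offs (18.5, 25, 30) are the commonly used WHO adult thresholds and are an assumption here, as are the function names and the example values; none of them come from the slides.

```python
def bmi_category(bmi: float) -> str:
    """Map a numerical BMI (kg/m^2) to an ordered category.

    Cut-offs are the commonly used WHO adult thresholds (an assumption).
    """
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    if bmi < 30.0:
        return "overweight"
    return "obese"


def is_obese(bmi: float) -> bool:
    """Collapse the same measurement to a binary (yes/no) variable."""
    return bmi >= 30.0


# Hypothetical BMI values, for illustration only
for bmi in (17.0, 22.4, 27.9, 31.5):
    print(bmi, bmi_category(bmi), is_obese(bmi))
```

Each step from numerical to ordered category to yes/no discards information, so it is usually safest to record the numerical value and group it later.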
5
Type of data?
Source: Compendium of Clinical and Health
Indicators / Health Surveys for England
6
Type of data?
Source: Compendium of Clinical and Health
Indicators / Health Surveys for England
7
SUMMARISING DATA
8
Summarising Numerical Data

Measures of central tendency / location-
Mean
Median
Mode

Measures of spread / variability-
Range
Interquartile range
Variance
Standard deviation

9
Mean? Median?
3, 4, 5, 6, 7
9, 10, 20, 21
1, 2, 3, 4, 990

10
Mean? Median?
3, 4, 5, 6, 7: mean = 5, median = 5
9, 10, 20, 21: mean = 15, median = 15
1, 2, 3, 4, 990: mean = 200, median = 3
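
The three small datasets above can be checked with Python's standard `statistics` module. This is only an illustrative sketch; the variable names are my own.

```python
from statistics import mean, median

# The three datasets from the slide
datasets = [
    [3, 4, 5, 6, 7],
    [9, 10, 20, 21],
    [1, 2, 3, 4, 990],
]

for data in datasets:
    # For an even number of values, median() averages the two middle values
    print(data, "mean =", mean(data), "median =", median(data))
```

In the third dataset the single extreme value (990) pulls the mean up to 200, while the median stays at 3.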
11
Mean or Median?

For symmetric data, mean = median
Choose the mean (easily understood, better
statistical properties)

For skewed data, mean is drawn towards the tail
of distribution
Median can be better reflection of centre of data

12
Positively skewed data ...
Median = 2
Mean = 3.8
13
Negatively skewed data ...
Median = 40
Mean = 38.9
14
And the Mode?

Mode = most frequently occurring value
Hardly ever used
Depends on how the data are grouped, and not
always unique
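
Both drawbacks can be seen in a short sketch using `statistics.multimode` (Python 3.8+). The ages and the 10-year banding are hypothetical, chosen only for illustration.

```python
from statistics import multimode

# Hypothetical ages, for illustration only
ages = [23, 25, 25, 31, 31, 40]

# Not always unique: 25 and 31 each occur twice
print(multimode(ages))

# Depends on grouping: the same ages in 10-year bands (20-29, 30-39, ...)
bands = [10 * (age // 10) for age in ages]
print(multimode(bands))
```

Ungrouped, the data have two modes; grouped into 10-year bands, the single mode is the 20-29 band.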

15
The Range

Range = maximum - minimum
In practice, usually present as (min, max)

Poor measure of spread-
very dependent on sample size
affected by outliers
but people often want to know it, so present it as an
extra

16
Interquartile Range (IQR)

Rank data from smallest to largest
Lower Quartile has 1/4 of values smaller than it
Upper Quartile has 1/4 of values larger than it
Interquartile Range = Upper Quartile - Lower
Quartile
But usually written as (Lower Quartile, Upper
Quartile)

Better than range - not influenced by outliers or
sample size
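
A small sketch comparing the two measures of spread, using `statistics.quantiles` from the Python standard library. The data are hypothetical, and the helper name `spread` is my own. Different software packages use slightly different conventions for quartiles, so exact values may differ a little elsewhere.

```python
from statistics import quantiles


def spread(data):
    """Return the range as (min, max) and the IQR as (lower, upper quartile)."""
    lower_q, _median, upper_q = quantiles(data, n=4)  # Python's default method
    return {"range": (min(data), max(data)), "iqr": (lower_q, upper_q)}


# Hypothetical data: 1..11, then the same data with one extreme value
values = list(range(1, 12))
with_outlier = values[:-1] + [990]

print(spread(values))
print(spread(with_outlier))
```

Replacing the largest value with 990 changes the range from (1, 11) to (1, 990), but leaves the quartiles untouched.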

17
Interquartile Range (IQR) - Note

Note - quartiles are, strictly, observations
There are 3 quartiles -
lower quartile
median
upper quartile

But the word "Quartiles" is now often used to
mean "Quarters" - i.e. the 4 groups of ranked
observations
Similarly "Quintiles" is often used to mean
"Fifths" - i.e. the 5 groups of ranked
observations

18
Variance

Step 1: Calculate Deviations = the difference
between each observation and the mean of the data
Step 2: Square these Deviations
Step 3: Average the Squared Deviations -
this is the Variance
(Strictly, divide by n-1, not n)

19
Standard Deviation (SD)

Step 4: Take the square root of the Variance -
this is the Standard Deviation

This returns the statistic to the same units as
the data

Both Variance and Standard deviation use all of
the data
But as a result, can be over-influenced by
outliers
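
Steps 1 to 4 can be written out directly. The sketch below follows them one line at a time, using the n-1 divisor, and compares the result with the `statistics` module. The function name and the choice of example data are my own.

```python
from math import sqrt
from statistics import mean, stdev, variance


def sample_variance(data):
    """Variance by the steps on the slides, dividing by n - 1."""
    m = mean(data)
    deviations = [x - m for x in data]       # Step 1: deviations from the mean
    squared = [d ** 2 for d in deviations]   # Step 2: square the deviations
    return sum(squared) / (len(data) - 1)    # Step 3: average (n - 1 divisor)


data = [3, 4, 5, 6, 7]
var = sample_variance(data)
sd = sqrt(var)                               # Step 4: square root of the variance

print("variance =", var, "(library:", variance(data), ")")
print("SD =", sd, "(library:", stdev(data), ")")
```

For this dataset the deviations are -2, -1, 0, 1, 2, the squared deviations sum to 10, and dividing by n-1 = 4 gives a variance of 2.5.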

20
Symmetric data ...

Summarise using Mean and Standard Deviation

Skewed data ...
Summarise using Median and Interquartile Range
21
Useful Fact
If a dataset is Normally distributed, or at least
fairly symmetrical, then the central 95% of the
data will be included in the range
Mean +/- 2 Standard Deviations.
Sometimes called a "reference range" or "normal range".
Strictly, Mean +/- 1.96 Standard Deviations.
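
The multipliers 2 and 1.96 can be checked against the standard Normal distribution with `statistics.NormalDist` (Python 3.8+). This is a sketch only: the helper `reference_range` and the example mean and SD are hypothetical.

```python
from statistics import NormalDist

z = NormalDist()  # standard Normal: mean 0, SD 1

# Proportion of a Normal distribution within +/- 2 SD and +/- 1.96 SD
coverage_2 = z.cdf(2) - z.cdf(-2)
coverage_196 = z.cdf(1.96) - z.cdf(-1.96)
print(round(coverage_2, 4), round(coverage_196, 4))


def reference_range(mean, sd, multiplier=1.96):
    """Return (mean - multiplier*SD, mean + multiplier*SD)."""
    return mean - multiplier * sd, mean + multiplier * sd


# Hypothetical measurement with mean 120 and SD 10
print(reference_range(120, 10))
```

Two standard deviations cover slightly more than 95% of a Normal distribution, which is why 1.96 is the strict value.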
22
Central 95% of data
[Figure: the central 95% of the data lies between mean - 2SD and mean + 2SD, with 2.5% in each tail]
23
CONFIDENCE INTERVALS
24
Obesity data, England, 2006

Age-standardised percentage obese = 24.1%
95% Confidence Interval = 23.2% to 25.0%

?
25
Sample estimates of Population values

The obesity data was based on a sample
But has this sample given the right answer?
First need to eliminate bias, e.g. take a random
sample
But even when samples are unbiased, different
samples will still give different answers - this
is known as sampling error or random variation

26
We would like to know: how imprecise might the
sample estimate be, just as a result of sampling
variation? i.e. how far away might the sample
estimate be from the true population value?

Depends on
Sample size
Variability of data (SD)
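
For a sample mean, these two factors combine in the standard error, SD / sqrt(n). A minimal sketch, with a hypothetical SD and hypothetical sample sizes:

```python
from math import sqrt


def standard_error(sd, n):
    """Standard error of a sample mean: SD divided by the square root of n."""
    return sd / sqrt(n)


# Hypothetical SD of 10 with increasing sample sizes
for n in (25, 100, 400):
    print("n =", n, "standard error =", standard_error(10, n))
```

Quadrupling the sample size halves the standard error, and doubling the SD doubles it.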

27
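A 95% confidence interval provides a measure of
the precision of a sample estimate-
Loosely, there is a 95% probability that the true
population value lies within the 95% confidence
interval.
(Strictly, 95% of intervals calculated this way from
repeated samples would contain the true population value)
Narrow 95% CI = precise estimate
Wide 95% CI = imprecise estimate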
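
As an illustration, the sketch below calculates a 95% confidence interval for a simple percentage using the Normal approximation, p +/- 1.96 * sqrt(p(1-p)/n). The counts are hypothetical. This is not how the age-standardised obesity interval in these slides was produced: the slides do not give the method, and age-standardised estimates from survey data need a different calculation.

```python
from math import sqrt


def proportion_ci(events, n, z=1.96):
    """Approximate 95% CI for a proportion (Normal approximation)."""
    p = events / n
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# Hypothetical samples with the same 24.1% prevalence but different sizes
low, high = proportion_ci(241, 1000)
low_big, high_big = proportion_ci(2410, 10000)

print("n = 1000:  %.1f%% to %.1f%%" % (100 * low, 100 * high))
print("n = 10000: %.1f%% to %.1f%%" % (100 * low_big, 100 * high_big))
```

The larger sample gives the narrower interval, i.e. the more precise estimate.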
28

Age-standardised percentage obese = 24.1%
95% Confidence Interval = 23.2% to 25.0%

We are 95% confident that the true
age-standardised percentage obese for England,
2006, is somewhere between 23.2% and 25.0%.
29
FOR DISCUSSION

Why do we calculate Confidence Intervals when our
estimates are based on total population data,
e.g. SMRs for cancer?

30
Presenting 95% Confidence Intervals on graphs
Self-reported smoking status in women (%), by
ethnic group with 95% confidence intervals
(England, 2004)
31
Interpreting 95% Confidence Intervals from graphs

What can you say about the true smoking
prevalence for the general population?
For which ethnic groups is the prevalence of
smoking significantly different from 25%?
Is the prevalence of smoking significantly
different between the Black Caribbean and Black
African populations?
Is the prevalence of smoking significantly
different between the Pakistani and Bangladeshi
populations?

What can you say about the true smoking
prevalence for the general population?
We are 95% confident that it lies between
about 22% and 24%

For which ethnic groups is the prevalence of
smoking significantly different from 25%?
All except Black Caribbean and Irish

Is the prevalence of smoking significantly
different between the Black Caribbean and Black
African populations?
Almost certainly, since the Confidence Intervals
don't overlap

Is the prevalence of smoking significantly
different between the Pakistani and Bangladeshi
populations?
We can't tell: when Confidence Intervals
overlap slightly, there might still be a
significant difference!

34
Note: In general, it is better to perform a
statistical significance test than to look for
overlapping or non-overlapping confidence
intervals!
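
A sketch of why the test is preferable, using a two-proportion z-test as one possible significance test (the slides do not name a specific test). The counts are hypothetical and are not the smoking data from the graph. Both the intervals and the test rely on the Normal approximation.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(events, n, z=1.96):
    """Approximate 95% CI for a proportion (Normal approximation)."""
    p = events / n
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def two_proportion_p_value(events1, n1, events2, n2):
    """Two-sided p-value from a two-proportion z-test (pooled standard error)."""
    p1, p2 = events1 / n1, events2 / n2
    pooled = (events1 + events2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical groups: 100 of 1000 smokers vs 130 of 1000 smokers
ci_a = proportion_ci(100, 1000)
ci_b = proportion_ci(130, 1000)
intervals_overlap = ci_a[1] > ci_b[0]
p_value = two_proportion_p_value(100, 1000, 130, 1000)

print("Group A CI:", ci_a)
print("Group B CI:", ci_b)
print("Intervals overlap:", intervals_overlap)
print("p-value:", round(p_value, 3))
```

With these numbers the two intervals overlap slightly, yet the p-value is below 0.05, so judging by overlap alone would miss a statistically significant difference.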
35
Food for thought