Title: Basic Statistics in Public Health
Provided by: swlondonpu

1
Basic Statistics in Public Health
2
  • Types of data
  • Descriptive statistics
  • Confidence Intervals

3
TYPES OF DATA
Age?
Sex?
Social class?
4
TYPES OF DATA
BMI?
Obese or not?
Underweight / normal / overweight / obese?
5
Type of data?
Source: Compendium of Clinical and Health
Indicators / Health Surveys for England
6
Type of data?
Source: Compendium of Clinical and Health
Indicators / Health Surveys for England
7
SUMMARISING DATA
8
Summarising Numerical Data
  • Measures of central tendency / location:
    • Mean
    • Median
    • Mode
  • Measures of spread / variability:
    • Range
    • Interquartile range
    • Variance
    • Standard deviation

9
  • Mean? Median?
  • 3, 4, 5, 6, 7
  • 9, 10, 20, 21
  • 1, 2, 3, 4, 990

10
  • Mean? Median?
  • 3, 4, 5, 6, 7: mean = 5, median = 5
  • 9, 10, 20, 21: mean = 15, median = 15
  • 1, 2, 3, 4, 990: mean = 200, median = 3
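
The answers above can be checked with a short sketch using Python's standard `statistics` module. The function name `summarise` is an illustrative choice, not something from the slides.

```python
from statistics import mean, median

# The three small datasets from the slide above.
DATASETS = [
    [3, 4, 5, 6, 7],
    [9, 10, 20, 21],
    [1, 2, 3, 4, 990],
]


def summarise(values):
    """Return (mean, median) for a list of numbers."""
    return mean(values), median(values)


for values in DATASETS:
    avg, mid = summarise(values)
    print(f"{values}: mean = {avg}, median = {mid}")
```

In the third dataset the single value 990 pulls the mean up to 200, while the median stays at 3.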
11
Mean or Median?
  • For symmetric data, mean = median
    • Choose the mean (easily understood, better
      statistical properties)
  • For skewed data, the mean is drawn towards the tail
    of the distribution
    • The median can be a better reflection of the centre
      of the data

12
Positively skewed data ...
Median = 2
Mean = 3.8
13
Negatively skewed data ...
Median = 40
Mean = 38.9
14
And the Mode?
  • Mode = most frequently occurring value
  • Hardly ever used
  • Depends on how the data are grouped, and not
    always unique
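
A minimal sketch of both problems with the mode, using hypothetical values chosen only for illustration. `modal_band` is an assumed helper name; `multimode` is from Python's standard `statistics` module.

```python
from collections import Counter
from statistics import multimode


def modal_band(values, width):
    """Return the lower edge of the most common band of the given width."""
    bands = Counter((v // width) * width for v in values)
    return bands.most_common(1)[0][0]


# Hypothetical values, chosen only to illustrate the two problems.
tied = [1, 1, 2, 3, 3]
ages = [20, 21, 22, 23, 26, 30, 31, 32, 35, 36, 37]

print(multimode(tied))       # every value that shares the top frequency
print(modal_band(ages, 10))  # modal 10-year band
print(modal_band(ages, 5))   # modal 5-year band
```

With these values the mode is not unique (1 and 3 tie), and the modal age band is 30-39 with 10-year bands but 20-24 with 5-year bands.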

15
The Range
  • Range = maximum - minimum
  • In practice, usually presented as (min, max)
  • Poor measure of spread:
    • very dependent on sample size
    • affected by outliers
  • But people often want to know it, so present it as an
    extra

16
Interquartile Range (IQR)
  • Rank data from smallest to largest
  • Lower Quartile has 1/4 of values smaller than it
  • Upper Quartile has 1/4 of values larger than it
  • Interquartile Range = Upper Quartile - Lower
    Quartile
  • But usually written (Lower Quartile, Upper
    Quartile)
  • Better than range - not influenced by outliers or
    sample size
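
A rough sketch comparing the range and the IQR on hypothetical data, with and without an outlier. It assumes the default ("exclusive") method of `statistics.quantiles`; other software may use slightly different quartile conventions and give slightly different values.

```python
from statistics import quantiles


def value_range(values):
    """Range = maximum - minimum."""
    return max(values) - min(values)


def quartiles(values):
    """Return (lower quartile, median, upper quartile)."""
    q1, q2, q3 = quantiles(values, n=4)
    return q1, q2, q3


def iqr(values):
    """Interquartile range = upper quartile - lower quartile."""
    q1, _, q3 = quartiles(values)
    return q3 - q1


# Hypothetical data: the second list replaces the largest value with an outlier.
clean = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 900]

for data in (clean, with_outlier):
    print(f"range = {value_range(data)}, IQR = {iqr(data)}")
```

With these values the range jumps from 8 to 899 when the outlier is introduced, while the IQR stays at 5.0.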

17
Interquartile Range (IQR) - Note
  • Note: quartiles are, strictly, observations
  • There are 3 quartiles:
    • lower quartile
    • median
    • upper quartile
  • But the word "Quartiles" is now often used to
    mean "Quarters", i.e. the 4 groups of ranked
    observations
  • Similarly, "Quintiles" is often used to mean
    "Fifths", i.e. the 5 groups of ranked
    observations

18
Variance
  • Step 1: Calculate the deviations, i.e. the difference
    between each observation and the mean of the data
  • Step 2: Square these deviations
  • Step 3: Average the squared deviations
    • this is the Variance
    • (Strictly, divide by n-1, not n)
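
The three steps can be sketched as follows. The function name `variance_by_steps` and its `sample` flag are illustrative assumptions. The library functions `variance` (divides by n-1) and `pvariance` (divides by n) are used only as a cross-check.

```python
from statistics import mean, pvariance, variance


def variance_by_steps(values, sample=True):
    """Variance following the three steps; divides by n-1 when sample=True."""
    centre = mean(values)
    deviations = [x - centre for x in values]     # Step 1: deviations from the mean
    squared = [d ** 2 for d in deviations]        # Step 2: square them
    divisor = len(values) - 1 if sample else len(values)
    return sum(squared) / divisor                 # Step 3: average them


data = [3, 4, 5, 6, 7]
print(variance_by_steps(data), variance(data))                  # divide by n-1
print(variance_by_steps(data, sample=False), pvariance(data))   # divide by n
```

For the values 3, 4, 5, 6, 7 the squared deviations sum to 10, giving 2.5 when dividing by n-1 and 2.0 when dividing by n.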

19
Standard Deviation (SD)
  • Step 4: Take the square root of the Variance
    • this is the Standard Deviation
    • this returns the statistic to the same units as
      the data
  • Both Variance and Standard Deviation use all of
    the data
  • But as a result, they can be over-influenced by
    outliers
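
A small sketch of Step 4 and of the outlier problem, reusing two of the datasets from the mean/median slide. `standard_deviation` is an illustrative name; `stdev` from the standard library is the cross-check.

```python
from math import sqrt
from statistics import stdev, variance


def standard_deviation(values):
    """Step 4: square root of the (n-1) variance."""
    return sqrt(variance(values))


symmetric = [3, 4, 5, 6, 7]
with_outlier = [1, 2, 3, 4, 990]

print(f"SD, symmetric data: {standard_deviation(symmetric):.2f}")
print(f"SD, one outlier:    {standard_deviation(with_outlier):.2f}")
```

The symmetric data give an SD of about 1.58. The single value 990 pushes the SD of the second dataset above 400, even though four of its five values lie between 1 and 4.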

20
Symmetric data ...
  • Summarise using Mean and Standard Deviation

Skewed data ...
  • Summarise using Median and Interquartile Range
21
Useful Fact
If a dataset is Normally distributed, or at least
fairly symmetrical, then approximately the central 95% of the
data will be included in the range Mean ± 2
Standard Deviations. This is sometimes called a
"reference range" or "normal range". Strictly, it is
Mean ± 1.96 Standard Deviations.
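
A sketch of where 1.96 comes from, assuming the data really are Normally distributed. The mean of 100 and SD of 15 are hypothetical, and `reference_range` is an illustrative name.

```python
from statistics import NormalDist


def reference_range(mean, sd, coverage=0.95):
    """Return (low, high) covering the central `coverage` of a Normal distribution."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # about 1.96 for 95%
    return mean - z * sd, mean + z * sd


# Hypothetical mean and SD, for illustration only.
low, high = reference_range(mean=100, sd=15)
dist = NormalDist(mu=100, sigma=15)

print(f"95% reference range: {low:.1f} to {high:.1f}")
print(f"share within mean ± 1.96 SD: {dist.cdf(high) - dist.cdf(low):.4f}")
print(f"share within mean ± 2 SD:    {dist.cdf(130) - dist.cdf(70):.4f}")
```

Under the Normal assumption, Mean ± 2 SD covers slightly more than 95% of the data (about 95.4%), which is why 1.96 is the strict multiplier.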
22
Central 95% of data
[Figure: Normal distribution curve with the central 95% of data between mean - 2SD and mean + 2SD, and 2.5% in each tail]
23
CONFIDENCE INTERVALS
24
Obesity data, England, 2006
  • Age-standardised percentage obese: 24.1%
  • 95% Confidence Interval: 23.2% to 25.0%

?
25
Sample estimates of Population values
  • The obesity data were based on a sample
  • But has this sample given the "right" answer?
  • First we need to eliminate bias, e.g. by taking a random
    sample
  • But even when samples are unbiased, different
    samples will still give different answers: this
    is known as sampling error or random variation

26
We would like to know: how imprecise might the
sample estimate be, just as a result of sampling
variation? I.e. how far away might the sample
estimate be from the true population value?
  • This depends on:
    • Sample size
    • Variability of the data (SD)
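
The slides do not give a formula here. As an assumption, the sketch below uses the usual standard error of a sample mean, SD / sqrt(n), to show how both factors act. The numbers are hypothetical.

```python
from math import sqrt


def standard_error_of_mean(sd, n):
    """Standard error of a sample mean: SD divided by the square root of n."""
    return sd / sqrt(n)


# Hypothetical values: same SD with a larger sample, then same n with a larger SD.
print(standard_error_of_mean(sd=10, n=100))
print(standard_error_of_mean(sd=10, n=400))
print(standard_error_of_mean(sd=20, n=100))
```

Quadrupling the sample size halves the standard error, while doubling the SD doubles it.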

27
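A 95% confidence interval provides a measure of
the precision of a sample estimate.
Loosely, there is a 95% probability that the true
population value lies within the 95% confidence
interval. Strictly, 95% of intervals calculated in
this way from repeated samples would contain the
true population value.
Narrow 95% CI = precise estimate. Wide 95%
CI = imprecise estimate.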
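
A minimal sketch of narrow versus wide intervals, assuming the simple Normal-approximation (Wald) interval for a proportion. The sample sizes are hypothetical, and published age-standardised figures are calculated by a different method than this one.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sqrt(p * (1 - p) / n)
    return p - margin, p + margin


# Hypothetical samples with the same observed proportion (24.1%).
small = proportion_ci(241, 1_000)
large = proportion_ci(2_410, 10_000)

print(f"n = 1,000:  {small[0]:.1%} to {small[1]:.1%}")
print(f"n = 10,000: {large[0]:.1%} to {large[1]:.1%}")
```

Both intervals are centred on the same estimate, but the larger sample gives the narrower, more precise interval.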
28
  • Age-standardised percentage obese: 24.1%
  • 95% Confidence Interval: 23.2% to 25.0%

We are 95% confident that the true
age-standardised percentage obese for England,
2006, is somewhere between 23.2% and 25.0%.
29
FOR DISCUSSION
  • Why do we calculate Confidence Intervals when our
    estimates are based on total population data,
    e.g. SMRs for cancer?

30
Presenting 95% Confidence Intervals on graphs
Self-reported smoking status in women (%), by
ethnic group with 95% confidence intervals
(England, 2004)
31
Interpreting 95% Confidence Intervals from graphs
  • What can you say about the true smoking
    prevalence for the general population?
  • For which ethnic groups is the prevalence of
    smoking significantly different from 25%?
  • Is the prevalence of smoking significantly
    different between the Black Caribbean and Black
    African populations?
  • Is the prevalence of smoking significantly
    different between the Pakistani and Bangladeshi
    populations?

32
  • What can you say about the true smoking
    prevalence for the general population?
    • We are 95% confident that it lies
      between about 22% and 24%
  • For which ethnic groups is the prevalence of
    smoking significantly different from 25%?
    • All except Black Caribbean and Irish

33
  • Is the prevalence of smoking significantly
    different between the Black Caribbean and Black
    African populations?
    • Almost certainly, since the Confidence Intervals
      don't overlap
  • Is the prevalence of smoking significantly
    different between the Pakistani and Bangladeshi
    populations?
    • We can't tell: when Confidence Intervals
      overlap slightly, there might still be a
      significant difference!

34
Note: In general, it is better to perform a
statistical significance test than to look for
overlapping or non-overlapping confidence
intervals!
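
A rough illustration of why, using hypothetical counts (not the smoking data): two groups whose 95% confidence intervals overlap slightly, yet whose difference is significant at the 5% level. It assumes a Wald interval for each proportion and a pooled two-proportion z-test. Both function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sqrt(p * (1 - p) / n)
    return p - margin, p + margin


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference in proportions; returns (z, p_value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 200 of 1,000 in group A, 240 of 1,000 in group B.
ci_a = proportion_ci(200, 1_000)
ci_b = proportion_ci(240, 1_000)
z, p_value = two_proportion_z_test(200, 1_000, 240, 1_000)

print(f"Group A: {ci_a[0]:.3f} to {ci_a[1]:.3f}")
print(f"Group B: {ci_b[0]:.3f} to {ci_b[1]:.3f}")
print(f"Intervals overlap: {ci_a[1] > ci_b[0]}")
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

Here the upper limit for group A is above the lower limit for group B, so the intervals overlap, but the test gives a p-value below 0.05.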
35
Food for thought
  • What is the difference between a 95% confidence
    interval and a 95% reference range?