Title: Basis of statistical Inference 1
1 Basis of statistical Inference 1
- Epidemiology in Human Populations 202.251
- A/Prof. Cord Heuer
- EpiCentre, Massey University
2 Data types
- Qualitative data: nominal scale
- Gender, location, socio-economic criteria
- Quantitative data
- Continuous and ratio: any value (weight, height, time)
- Binary data: yes/no (pregnant, exposed, diseased, dead)
- Discrete data: no subdivision (number of disease events, number of babies, number of traffic violations)
- Ordinal: there is a rank order (mild, moderate, severe; negative, suspicious, positive)
3 Simple statistics
- Measures of central tendency
- Mean: sum of all values / number of observations
- Median: middle value of the ordered observations (average of the two middle values when n is even)
4 Skewed distribution
[Figure: skewed distribution with the mean and the median marked]
5 Simple statistics
- Measures of variation
- Standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}}$
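6 Simple statistics of a sample
- Example
- Values: 3, 4, 6, 7
- Mean: (3 + 4 + 6 + 7) / 4 = 5
- Median: (4 + 6) / 2 = 5
- Deviations from the mean: -2, -1, 1, 2
- Standard deviation: $s = \sqrt{(4 + 1 + 1 + 4) / 3} = 1.83$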
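The figures for this four-value sample can be checked with a short Python sketch. It assumes the standard-library `statistics` module, whose `stdev` divides by n - 1; that choice reproduces the Std of 1.83 quoted on the normal-curve slide. The variable names are illustrative only.

```python
import statistics

values = [3, 4, 6, 7]  # the sample from the slide

mean = statistics.mean(values)      # sum of all values / number of observations
median = statistics.median(values)  # average of the two middle values (4 and 6)
sd = statistics.stdev(values)       # sample standard deviation, divides by n - 1

print(f"mean = {mean}, median = {median}, SD = {sd:.2f}")
```

With a symmetric sample like this one the mean and the median coincide; in a skewed sample they would differ.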
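7 Normal bell curve
[Figure: normal curve for mean 5, Std 1.83; y-axis: probability]
- $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$
- $\mu$ = mean and $\sigma$ = standard deviation: the equation describes an infinite number of bell curves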
8 Standard normal curve
- Convert all values to a single standard
- Centre around zero: subtract the mean
- Scale to 1 stdev: divide by the standard deviation, $z = \frac{x - \mu}{\sigma}$
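The two steps (centre, then scale) can be sketched in Python as follows. The helper name `standardize` is hypothetical, and the sketch assumes the sample mean and sample SD (n - 1) stand in for the population values; it reuses the values 3, 4, 6, 7 from the earlier example.

```python
import statistics

def standardize(values):
    """Return z-scores: subtract the mean, then divide by the SD (n - 1 version)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(x - mean) / sd for x in values]

z_scores = standardize([3, 4, 6, 7])
print([round(z, 2) for z in z_scores])
```

After standardizing, the z-scores have mean 0 and standard deviation 1, whatever the original units were.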
9 Standard normal curve
- Properties
- Symmetrical about the y-axis
- Area under the curve = 1
- Standard deviation of z = 1
- Curve asymptotically approaches the x-axis
- Extends to infinity in both directions
- Highest point: about 0.4 (at z = 0)
- We can now use a table to convert any observed value x to a z, and z to the probability of coming from a specific population!
10 Standard normal curve
- Example
- A chicken dealer at the market grabs a bird of 2.2 kg and offers it to you at the given price. Is the bird really an average bird of the flock?
- What is the probability that the bird has the given weight or less?
- Mean 3.5 kg, stdev 0.8 kg
- z = (2.2 - 3.5) / 0.8 = -1.625
- Only a 5% chance, so you are 95% sure he was cheating! Does this mean he is really cheating?
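Instead of a printed z-table, the probability in the chicken example can be looked up with Python's standard-library `statistics.NormalDist`. This is a minimal sketch that assumes flock weights are normally distributed with the mean and stdev given on the slide; the variable names are illustrative.

```python
from statistics import NormalDist

flock = NormalDist(mu=3.5, sigma=0.8)  # flock mean and stdev from the slide (kg)
bird = 2.2                             # weight of the offered bird (kg)

z = (bird - flock.mean) / flock.stdev  # standardized weight
p = flock.cdf(bird)                    # P(weight <= 2.2 kg)

print(f"z = {z:.3f}, P(weight <= {bird} kg) = {p:.3f}")
```

The probability comes out at roughly 5%, which matches the slide. A small probability only says that such a light bird is unusual for this flock; it does not prove that the dealer chose it deliberately.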
11 Inference on continuous data
Population parameters: $\mu$ = true population mean, $\sigma$ = population standard deviation
Sample ($n \ll N$): $\bar{x}$ = sample mean, $s$ = standard deviation
Use sample statistics to make inferences about population parameters
12 Sampling distribution of mean
- Describes the distribution of means of repeated samples
[Diagram: repeated samples drawn from one population]
Sample 1: mean1, s1
Sample 2: mean2, s2
Sample 3: mean3, s3
...
Sample k: meank, sk
13 Distributions for statistics
- Population: $\mu$, $\sigma^2$
- 3 sample statistics
- 1. Sample mean $\bar{x}$
- 2. Standard deviation SD
- 3. Standard deviation of the mean = standard error: $SE = SD / \sqrt{n}$
14 Our example
- Mean = 5
- Standard deviation = 1.83
- Sample size n = 4
- Standard error = 1.83 / sqrt(4) ≈ 0.91
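A short Python sketch of the standard error calculation, assuming that "our example" is the four-value sample 3, 4, 6, 7 from the earlier slide and that SE = SD / sqrt(n), the same pattern used later for the weight example (9.2 / sqrt(150)).

```python
import math
import statistics

values = [3, 4, 6, 7]          # assumed to be the sample behind "our example"
n = len(values)                # sample size, n = 4
sd = statistics.stdev(values)  # sample standard deviation (n - 1)
se = sd / math.sqrt(n)         # standard error of the mean

print(f"n = {n}, SD = {sd:.2f}, SE = {se:.2f}")
```

The SE is smaller than the SD because a mean of several observations varies less than a single observation does.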
15 Central limit theorem
- The set of means from all possible samples of size n = sampling distribution
- Mean of the means = $\mu$
- SD of the means = SE
- If the population is normal, the sampling distribution is normal
- Even if the population is not normal, the sampling distribution is approximately normal as long as n is large enough (about 30 or more)
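The theorem can be illustrated with a small simulation. The sketch below is not from the slides: it assumes a deliberately skewed (exponential) population with mean 1 and SD 1, draws many samples of size 30, and compares the mean and SD of the sample means with $\mu$ and $\sigma / \sqrt{n}$. The seed and the number of repeats are arbitrary choices.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 30                  # size of each sample
k = 5000                # number of repeated samples

# Assumed population: exponential with rate 1 (mean 1, SD 1), clearly not normal
sample_means = []
for _ in range(k):
    sample = [rng.expovariate(1.0) for _ in range(n)]
    sample_means.append(statistics.mean(sample))

mean_of_means = statistics.mean(sample_means)  # should be close to the population mean
sd_of_means = statistics.stdev(sample_means)   # should be close to sigma / sqrt(n)
expected_se = 1.0 / math.sqrt(n)

print(f"mean of means = {mean_of_means:.3f} (population mean 1.0)")
print(f"SD of means   = {sd_of_means:.3f} (sigma / sqrt(n) = {expected_se:.3f})")
```

A histogram of `sample_means` would look close to a bell curve even though the individual observations are strongly skewed.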
16 Sample mean
- A random sample of n = 150 adult weights has a sample mean of 80.7 and an SD of 9.2
- What is the mean of the entire population?
- $\mu$ is somewhere near $\bar{x}$
17 How close is $\bar{x}$ to $\mu$?
- What is the difference between the sample mean and the true value of the mean ($\mu$)?
- $\bar{x} - \mu$ = deviation from the true mean
- Scale this deviation to units of $SE_{mean}$: $z = \frac{\bar{x} - \mu}{SE}$
- Solve for $\mu$: $\mu = \bar{x} - z \cdot SE$
- Z is the standard normal distribution of all sampling distributions, with mean 0 and SE 1
18 Our sample
- A random sample of n = 150 adult weights has a sample mean of 80.7 and an SD of 9.2
- The mean of the entire population is
- $\bar{x} - z \cdot SD / \sqrt{150} < \mu < \bar{x} + z \cdot SD / \sqrt{150}$
- The value of z depends on the level of confidence
- So, choose your z for sufficient confidence
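Choosing z for a given confidence level can be done with the inverse of the standard normal cumulative distribution. The sketch below uses `statistics.NormalDist().inv_cdf` from the Python standard library; the helper name `z_for_confidence` is hypothetical.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Two-sided critical z: the central `level` of the standard normal lies within +/- z."""
    return NormalDist().inv_cdf(0.5 + level / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_for_confidence(level):.3f}")
```

Higher confidence needs a larger z and therefore gives a wider interval.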
19 Confidence Interval
- 95% of data points lie between z = -1.96 and z = 1.96
[Figure: standard normal curve; x-axis: Z, y-axis: probability]
20 Confidence interval
- 95% of data points lie between z = -1.96 and z = 1.96
[Figure: standard normal curve with the confidence interval marked]
21 Our example
- For z = -1.96 and z = 1.96 the 95% CI is
- SE = 9.2 / sqrt(150) = 0.75
- mean = 80.7
- 95% confidence interval
- 80.70 - 1.96 × 0.75 < $\mu$ < 80.70 + 1.96 × 0.75
- 79.23 < $\mu$ < 82.17
- Write as (79.23, 82.17)
- About 95 of 100 intervals calculated this way from repeated samples will contain the true mean, provided sampling was unbiased
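The interval for the weight example can be reproduced with a few lines of Python. This is a sketch using the slide's numbers (n = 150, mean 80.7, SD 9.2, z = 1.96); the variable names are illustrative.

```python
import math

n, mean, sd = 150, 80.7, 9.2  # sample size, sample mean and SD from the slide
z = 1.96                      # critical z for 95% confidence

se = sd / math.sqrt(n)        # standard error of the mean
lower = mean - z * se         # lower 95% confidence limit
upper = mean + z * se         # upper 95% confidence limit

print(f"SE = {se:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```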
22 $\mu = 75$
About 5 of 100 samples will have 95% confidence intervals that do not include 75, hence $\alpha = 0.05$
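The "5 of 100" statement can be checked by simulation. The sketch below assumes a normal population with true mean 75 (from the slide) and an SD of 9.2 borrowed from the weight example; the SD, the sample size of 150, the seed and the number of repeats are assumptions made for illustration only.

```python
import math
import random
import statistics

rng = random.Random(1)   # fixed seed so the run is repeatable
mu, sigma = 75.0, 9.2    # true mean from the slide; sigma is an assumed value
n, repeats, z = 150, 2000, 1.96

misses = 0
for _ in range(repeats):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    # Count the samples whose 95% confidence interval does not include the true mean
    if not (m - z * se < mu < m + z * se):
        misses += 1

miss_rate = misses / repeats
print(f"share of intervals missing mu = {miss_rate:.3f}")
```

The share of intervals that miss the true mean should come out near 0.05, which is the $\alpha$ of a 95% confidence interval.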