Title: The Basics
1 Introduction
- Three distributions
- Sampling distributions, the central limit theorem, confidence intervals
- Testing hypotheses: the t test, Type I and II errors, the p value
- Testing hypotheses: the two sample t test
- One and two tailed tests
2 1. Three basic distributions
Normal, Poisson and Binomial
3 Normal Distribution
- Described by two parameters: mean and variance
4 Useful properties
5 Poisson Distribution
- Discrete: describes counts
- Described by one parameter: the mean
- Probability of r events occurring per unit time or space (r = 0, 1, 2, ...)
6 Shape
- Depends upon the size of the mean
- If the mean > 5, can be approximated with a normal distribution
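The approximation rule above can be checked numerically. This is a minimal sketch using only the Python standard library; the function names and the continuity correction (comparing P(r) with the Normal area between r - 0.5 and r + 0.5) are choices made for this illustration, not something taken from the slides.

```python
import math
from statistics import NormalDist


def poisson_pmf(r: int, mean: float) -> float:
    """Exact Poisson probability of observing r events."""
    return math.exp(-mean) * mean**r / math.factorial(r)


def normal_approx(r: int, mean: float) -> float:
    """Normal area between r - 0.5 and r + 0.5, using a Normal curve
    with the same mean and variance as the Poisson (both equal `mean`)."""
    nd = NormalDist(mu=mean, sigma=math.sqrt(mean))
    return nd.cdf(r + 0.5) - nd.cdf(r - 0.5)


def max_abs_error(mean: float) -> float:
    """Largest gap between exact and approximate probabilities."""
    upper = int(mean + 10 * math.sqrt(mean)) + 1
    return max(abs(poisson_pmf(r, mean) - normal_approx(r, mean))
               for r in range(upper + 1))


for mean in (1, 5, 20):
    print(mean, round(max_abs_error(mean), 4))
```

The worst-case error shrinks as the mean grows, which is the sense in which the Normal becomes a usable stand-in for larger means.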
7 Binomial Distribution
- Discrete: describes proportions
- Described by two parameters: n (number of events, trials, sample size, etc.) and p (the probability of a certain outcome)
- Probability of the proportion r/n occurring
8 Shape
- When p is close to 0.5, can be approximated by the normal distribution
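A similar numerical check works for the Binomial. The sketch below compares exact Binomial probabilities with a Normal curve of matching mean np and variance np(1 - p); the value n = 20 and the helper names are assumptions chosen only for illustration.

```python
import math
from statistics import NormalDist


def binomial_pmf(r: int, n: int, p: float) -> float:
    """Exact probability of r outcomes of interest in n trials."""
    return math.comb(n, r) * p**r * (1 - p) ** (n - r)


def normal_area(r: int, n: int, p: float) -> float:
    """Normal area between r - 0.5 and r + 0.5 with mean np, variance np(1-p)."""
    nd = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
    return nd.cdf(r + 0.5) - nd.cdf(r - 0.5)


def worst_error(n: int, p: float) -> float:
    """Largest gap between exact and approximate probabilities over r = 0..n."""
    return max(abs(binomial_pmf(r, n, p) - normal_area(r, n, p))
               for r in range(n + 1))


# Same n, two values of p: symmetric (0.5) versus strongly skewed (0.1).
for p in (0.5, 0.1):
    print(p, round(worst_error(20, p), 4))
```

With p = 0.5 the distribution is symmetric and the approximation is close; with p = 0.1 it is skewed and the error is visibly larger at the same n.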
9 - These three probability distributions are the basis of describing much continuous and categorical data
- Under certain conditions, the Poisson and Binomial distributions can be approximated using the Normal
10 2. Populations and samples: calculating a confidence interval
Consider this in the context of the simplest model: estimating a mean
11 Population: mean μ, variance σ²
Sample: mean ȳ, variance s²
Take a sample of size n
- Our view
- ȳ and s² are known
- Need to use these to guess μ and σ²
- God's view
- μ and σ² are known
- ȳ and s² are unpredictable
12 True μ
Possible ȳ
Observed ȳ
Inferred μ
This is what we are doing by constructing a confidence interval
13 Sample: n independent data points
Calculate a mean
Calculate a variance
14 Variance
DF = number of independent pieces of information
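The two calculations can be sketched as follows. The data values are hypothetical, and the divisor n - 1 is the degrees of freedom for a single estimated mean, consistent with the later slides.

```python
def sample_mean(ys):
    """Mean of the sample: sum of the observations divided by n."""
    return sum(ys) / len(ys)


def sample_variance(ys):
    """Sample variance s^2: sum of squared deviations from the mean,
    divided by the degrees of freedom (n - 1), because one piece of
    information has been used up in estimating the mean."""
    n = len(ys)
    ybar = sample_mean(ys)
    sum_of_squares = sum((y - ybar) ** 2 for y in ys)
    return sum_of_squares / (n - 1)


data = [4.0, 7.0, 5.0, 8.0]  # hypothetical observations
print(sample_mean(data), sample_variance(data))
```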
15 How can we use ȳ and s² to infer a likely value for μ?
Imagine a meta-experiment
(Diagram: a population from which repeated samples, each of size n, are drawn)
16 Distribution of ȳ
Often ȳ will be close to μ
Occasionally ȳ will be some distance from μ
17 True μ
Possible ȳ
Observed ȳ
Inferred μ
18 True μ
Possible ȳ
Observed ȳ
Inferred μ
19 True μ
Possible ȳ
Inferred μ
Observed ȳ
20 True μ
Possible ȳ
Inferred μ
Observed ȳ
The higher the sample size, the better the estimate of μ
The lower the value of σ², the better the estimate of μ
21 Distribution of ȳ
- Mean: μ
- Variance: σ²/n
- Shape: Normal
22 Standard error
= standard deviation of the distribution of ȳ that would be obtained through a meta-experiment
= standard deviation of a parameter distribution
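A meta-experiment is easy to imitate on a computer. The sketch below draws many samples of size n from a Normal population, records each sample mean, and compares the standard deviation of those means with σ/√n. The population values (mean 100, standard deviation 15), the sample size and the number of repeats are all hypothetical.

```python
import math
import random
import statistics


def meta_experiment(mu, sigma, n, repeats, seed=1):
    """Repeat the same experiment many times and return the list of
    sample means, one per repeat."""
    rng = random.Random(seed)
    means = []
    for _ in range(repeats):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


MU, SIGMA, N = 100.0, 15.0, 25  # hypothetical population and sample size
means = meta_experiment(MU, SIGMA, N, repeats=4000)

observed_se = statistics.stdev(means)   # spread of the sample means
expected_se = SIGMA / math.sqrt(N)      # sigma / sqrt(n)
print(round(observed_se, 2), expected_se)
```

The spread of the simulated means should sit close to σ/√n, which is the quantity the standard error estimates from a single sample.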
23 Why do we assume the shape is Normal?
- Because of the central limit theorem
In a psychology experiment, people's reaction times to operate a push button in response to a signal are measured. 10% of the time, the person missed.
The population μ = 148 ms
24 Aim: estimate μ by sampling from the population
- Simulate five meta-experiments
- n = 4, n = 8, n = 16, n = 32, n = 128
- When n = 4, the reaction times picked from the population could be four hits, four misses or any combination in between.
- Could use the binomial distribution to calculate the probabilities of these combinations.
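The binomial calculation mentioned in the last bullet can be sketched like this, taking the 10% miss rate from the experiment description and n = 4. The function name is an assumption for this illustration.

```python
import math

P_MISS = 0.10  # from the slides: the person missed 10% of the time


def prob_misses(k: int, n: int, p_miss: float = P_MISS) -> float:
    """Binomial probability of exactly k misses (and n - k hits) in n trials."""
    return math.comb(n, k) * p_miss**k * (1 - p_miss) ** (n - k)


n = 4
probs = [prob_misses(k, n) for k in range(n + 1)]
for k, prob in enumerate(probs):
    print(f"{n - k} hits, {k} misses: {prob:.4f}")
```

The probability of four hits is 0.9 to the power 4, about 0.66, which dominates the other combinations.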
25 n = 4
- First peak is four hits: p = 0.66, 115 ms
- Next peak: 3 hits and 1 miss, 200 ms
- And so on
Very peculiar
26 n = 8
Still rather peculiar
27 n = 16
Getting there
28 n = 32
Normalish
29 n = 128
Spot on
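One way to put a number on "peculiar" versus "spot on" is the skewness of the distribution of the sample mean, which is 0 for a Normal distribution. The sketch below is a deliberate simplification: it treats every hit as exactly 115 ms and every miss as exactly 445 ms. The 445 ms figure is an assumption, chosen only so that the population mean comes out at the 148 ms quoted earlier; the real reaction times would also vary within hits and within misses.

```python
import math

HIT_MS = 115.0    # from the slides: mean of a sample of four hits
MISS_MS = 445.0   # assumption: makes 0.9 * 115 + 0.1 * MISS_MS = 148
P_MISS = 0.10


def mean_distribution(n):
    """Exact distribution of the sample mean for samples of size n,
    as (sample mean, probability) pairs for k = 0..n misses."""
    dist = []
    for k in range(n + 1):
        prob = math.comb(n, k) * P_MISS**k * (1 - P_MISS) ** (n - k)
        mean = ((n - k) * HIT_MS + k * MISS_MS) / n
        dist.append((mean, prob))
    return dist


def skewness(dist):
    """Third central moment divided by variance ** 1.5 (0 for a Normal)."""
    mu = sum(m * p for m, p in dist)
    var = sum(p * (m - mu) ** 2 for m, p in dist)
    third = sum(p * (m - mu) ** 3 for m, p in dist)
    return third / var**1.5


sizes = (4, 8, 16, 32, 128)
skews = [skewness(mean_distribution(n)) for n in sizes]
for n, sk in zip(sizes, skews):
    print(n, round(sk, 3))
```

The skewness falls steadily towards zero as n increases, in line with the central limit theorem.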
30 Summary of the three distributions
31 Put this information together
Distribution of ȳ
μ
32 96% of the time (in our meta-experiment) ȳ lies
between
and
Rearranging, and making it a 95% interval.
33 - Problem: we don't know σ.
- We have an estimate of σ, which is s
- How good is this estimate?
- That depends upon the number of independent pieces of information we have about s, i.e. the degrees of freedom
- Which in this case is n - 1
34 t distributions compensate for uncertainty in s
When df = ∞, the t distribution = a Normal distribution
35 Use t tables to find out how many standard errors you need to encompass 95% of the distribution.
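In place of printed tables, the critical value can be found numerically. The sketch below uses only the standard library: it integrates the t density with Simpson's rule and searches by bisection for the number of standard errors that encloses 95% of the distribution. The function names, step count and search range are choices made for this illustration; a statistics package would normally be used instead.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def central_area(t: float, df: float, steps: int = 2000) -> float:
    """Area under the t density between -t and +t (Simpson's rule on
    0..t, doubled because the density is symmetric)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3


def t_crit(df: float, coverage: float = 0.95) -> float:
    """Number of standard errors needed to enclose `coverage` of the
    t distribution, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if central_area(mid, df) < coverage:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for df in (2, 7, 30, 120):
    print(df, round(t_crit(df), 3))
```

The values shrink towards the Normal figure of about 1.96 as the degrees of freedom increase.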
36 Giving the final formula for a Confidence Interval
ȳ ± t_crit × s/√n
where t_crit has n - 1 degrees of freedom
37 The General Formula
Parameter estimate ± t_crit × standard error of the parameter
where the required df for t_crit comes from the unexplained variation
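As a worked illustration of the confidence interval for a mean, the sketch below uses eight hypothetical observations, so the critical value is the standard table entry for 7 degrees of freedom (2.365 for a 95% interval). The data and names are assumptions for this example.

```python
import math
import statistics

T_CRIT_7DF = 2.365  # two-sided 95% critical value of t with 7 df (t tables)


def confidence_interval(ys, t_crit):
    """Estimate +/- t_crit * standard error, for a sample mean."""
    n = len(ys)
    ybar = statistics.fmean(ys)
    s = statistics.stdev(ys)            # uses n - 1 degrees of freedom
    standard_error = s / math.sqrt(n)
    return ybar - t_crit * standard_error, ybar + t_crit * standard_error


data = [3, 5, 4, 6, 5, 7, 4, 6]  # hypothetical sample, n = 8
lower, upper = confidence_interval(data, T_CRIT_7DF)
print(round(lower, 2), round(upper, 2))
```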
38 As sample size increases:
- ȳ becomes closer to μ
- (because the variance of the sampling distribution is σ²/n)
- s becomes closer to σ
- (with df = n - 1)
- The shape of the sampling distribution becomes more Normal
- (due to the central limit theorem)
- The t distribution becomes very close to the Normal distribution
39 3. Testing hypotheses
40 One sample t test
- Can be used to test the hypothesis that μ = a specific value
- Can be used with paired data, to ask if a mean difference is significantly different from zero.
42 Null hypothesis H0: μ = 0
Our sampling distribution
0
Units of the x axis are standard errors
Is our ȳ here? Or is it here?
43 How many standard errors is our observed value of 4.64 away from our hypothesised value of 0?
Answer: t = (4.64 - 0) / standard error,
with 7 degrees of freedom
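The calculation can be sketched as below. The eight paired differences are hypothetical (they are not the lecture's data), and the two critical values are standard t table entries for 7 degrees of freedom.

```python
import math
import statistics

# Two-sided critical values of t with 7 df, from t tables
T_CRIT_P05 = 2.365   # p = 0.05
T_CRIT_P02 = 2.998   # p = 0.02


def one_sample_t(ys, mu0=0.0):
    """Number of standard errors between the sample mean and the
    hypothesised value mu0, with its degrees of freedom."""
    n = len(ys)
    standard_error = statistics.stdev(ys) / math.sqrt(n)
    t = (statistics.fmean(ys) - mu0) / standard_error
    return t, n - 1


differences = [2, 9, -3, 7, 4, 10, -1, 8]  # hypothetical paired differences
t, df = one_sample_t(differences)
print(round(t, 3), df)

if abs(t) > T_CRIT_P02:
    print("p < 0.02")
elif abs(t) > T_CRIT_P05:
    print("0.02 < p < 0.05")
else:
    print("p > 0.05")
```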
45 The General Formula
t_s = (parameter estimate - hypothesised value) / standard error of the parameter
where the required df for t_s comes from the unexplained variation
46 So we reject the null hypothesis with 0.02 < p < 0.05
The p value is the probability of getting that test statistic, or something more extreme, under the null hypothesis (i.e. if the null hypothesis is true).
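That definition can be turned directly into a simulation: generate many experiments in which the null hypothesis really is true, and count how often the test statistic is at least as extreme as the one observed. The sketch below assumes Normal data, a two-tailed test, n = 8, and a hypothetical observed t of 2.65; none of these values come from the slides.

```python
import math
import random


def t_statistic(ys, mu0=0.0):
    """One sample t: (mean - mu0) / standard error."""
    n = len(ys)
    ybar = sum(ys) / n
    s2 = sum((y - ybar) ** 2 for y in ys) / (n - 1)
    return (ybar - mu0) / math.sqrt(s2 / n)


def simulated_p_value(t_obs, n, repeats=20000, seed=1):
    """Proportion of null experiments (true mean 0) whose |t| is at
    least as large as the observed |t|."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(repeats):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(t_statistic(sample)) >= abs(t_obs):
            extreme += 1
    return extreme / repeats


p = simulated_p_value(t_obs=2.65, n=8)  # hypothetical observed t, 7 df
print(p)
```

The result is a statement about how often data this extreme arise when the null hypothesis is true, not about how likely the null hypothesis is.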
47 The p value
When we conclude that we have a significant result with a p value of 0.03, what exactly is that probability measuring?
NOT:
- The null hypothesis is true with probability 0.03
- There is a probability of 0.03 that there is no difference
48 E = all possible experiments
A = the set of experiments for which the null hypothesis is true
B = the set of experiments for which the null hypothesis is rejected
The p value = the overlap of A and B
49 Errors
Type I
- Rejecting the null hypothesis when it is true
- Convention sets this at 0.05 for any one test
- Under our control!
Type II
- Failing to reject the null hypothesis when it is false
- Influenced by a host of factors, including the power of the statistical test used, experimental design, etc.
- Can't be measured absolutely (but can relatively)
Power = 1 - probability of a Type II error
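Both error rates can be estimated by simulation. The sketch below runs a one sample, two-tailed t test (n = 8, critical value 2.365 from t tables for 7 df) on simulated Normal data: first with a true mean of 0, where every rejection is a Type I error, then with a true mean one standard deviation away from 0, where the rejection rate is the power. The effect size, sample size and repeat count are hypothetical.

```python
import math
import random

T_CRIT_7DF = 2.365  # two-sided 5% critical value of t with 7 df (t tables)


def rejects_h0(ys):
    """Two-tailed one sample t test of H0: mean = 0 at the 5% level."""
    n = len(ys)
    ybar = sum(ys) / n
    s2 = sum((y - ybar) ** 2 for y in ys) / (n - 1)
    t = ybar / math.sqrt(s2 / n)
    return abs(t) > T_CRIT_7DF


def rejection_rate(true_mean, n=8, repeats=10000, seed=1):
    """Proportion of simulated experiments in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = sum(
        rejects_h0([rng.gauss(true_mean, 1.0) for _ in range(n)])
        for _ in range(repeats)
    )
    return rejections / repeats


type_i_rate = rejection_rate(true_mean=0.0)   # H0 true: rejections are errors
power = rejection_rate(true_mean=1.0)         # H0 false: rejections are correct
type_ii_rate = 1 - power
print(type_i_rate, power, type_ii_rate)
```

The Type I rate stays near the chosen 0.05 whatever the design, while the Type II rate depends on the effect size and sample size that were assumed.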
50 (Diagram: within E, set A = H0 true and set B = H0 rejected)
- The null hypothesis is true with probability 0.03
- There is a probability of 0.03 that there is no difference
This is equivalent to saying that set A is of size 0.03: incorrect
51 4. Two sample t test
- Are two groups different?
Sample: ȳ1, s1² and ȳ2, s2²
Population: μ1, σ1² and μ2, σ2²
Does μ1 = μ2?
52 Null hypothesis: μ1 - μ2 = 0
Sampling distribution is now that of ȳ1 - ȳ2
μ1 - μ2
Variance(A - B) = Variance(A) + Variance(B), for independent A and B
53 Two sample t tests are a simple extension of one sample t tests.
54 and also with confidence intervals
Parameter estimate ± t_crit × standard error of the parameter
Degrees of freedom? Two means have been estimated: n1 + n2 - 2
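A minimal sketch of the two sample calculation follows. It assumes the two groups share a common variance, which is pooled across groups; that assumption is what gives n1 + n2 - 2 degrees of freedom. The standard error of the difference adds the variances of the two sample means. The data are hypothetical.

```python
import math
import statistics


def two_sample_t(y1, y2):
    """Pooled-variance two sample t statistic for H0: mu1 - mu2 = 0.
    Returns (t, degrees of freedom, standard error of the difference)."""
    n1, n2 = len(y1), len(y2)
    df = n1 + n2 - 2  # two means have been estimated
    pooled_var = ((n1 - 1) * statistics.variance(y1)
                  + (n2 - 1) * statistics.variance(y2)) / df
    # Variance of a difference of independent means = sum of their variances
    se_diff = math.sqrt(pooled_var / n1 + pooled_var / n2)
    t = (statistics.fmean(y1) - statistics.fmean(y2)) / se_diff
    return t, df, se_diff


group1 = [5, 7, 6, 8]  # hypothetical data
group2 = [3, 4, 5, 4]  # hypothetical data
t, df, se_diff = two_sample_t(group1, group2)
print(round(t, 3), df, round(se_diff, 3))
```

A confidence interval for μ1 - μ2 follows the general formula: the difference in sample means plus or minus the critical t (with these degrees of freedom) times `se_diff`.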
55 5. One and two tailed tests
Two tails
Null hypothesis H0: μ1 - μ2 = 0
Alternative hypothesis HA: μ1 - μ2 ≠ 0
56 Two tailed test
Reject H0
Reject H0
Standard Errors
57 One tail
Null hypothesis: μ1 - μ2 = 0
Alternative hypothesis: μ1 < μ2
58 One tailed test
Reject H0
Standard Errors
59 Deciding on HA after you know the direction will double your chances of making a Type I error
Reject H0
Reject H0
Standard Errors
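The doubling can be shown with a short calculation. For simplicity this sketch uses the Normal distribution (the large-df limit of t) rather than a t distribution; the names are assumptions for this illustration.

```python
from statistics import NormalDist

z = NormalDist()  # standard Normal, standing in for t with large df
ALPHA = 0.05

one_tail_crit = z.inv_cdf(1 - ALPHA)       # all 5% in one tail
two_tail_crit = z.inv_cdf(1 - ALPHA / 2)   # 2.5% in each tail

# Planned two tailed test: reject if |z| exceeds the two tailed cut-off.
two_tailed_rate = 2 * (1 - z.cdf(two_tail_crit))

# Planned one tailed test: direction fixed in advance, one tail only.
one_tailed_rate = 1 - z.cdf(one_tail_crit)

# Direction chosen after seeing the data: the one tailed cut-off is
# effectively applied to whichever tail the result falls in.
post_hoc_rate = 2 * (1 - z.cdf(one_tail_crit))

print(round(one_tail_crit, 3), round(two_tail_crit, 3))
print(round(two_tailed_rate, 3), round(one_tailed_rate, 3),
      round(post_hoc_rate, 3))
```

Choosing the tail after the event rejects a true null hypothesis 10% of the time rather than the nominal 5%.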
60 Relationship between CIs and hypothesis testing
The distribution of ȳ1 - ȳ2
The null distribution
ȳ1 - ȳ2
0
61 Next week
- ANOVA
- Regression
- General Linear Models an introduction
- More than 1 variable
- Interactions