The Basics - PowerPoint PPT Presentation

Slides: 62

Provided by: RHA100

1
Introduction

The Basics

Three distributions
Sampling distributions, the central limit theorem, confidence intervals
Testing hypotheses: the t test, Type I and II errors, the p value
Testing hypotheses: the two-sample t test
One- and two-tailed tests

2
1. Three basic distributions
Normal, Poisson and Binomial
3
Normal Distribution

Described by two parameters: mean and variance

4
Useful properties

Standardising

Shape

5
Poisson Distribution

Discrete: describes counts

Described by one parameter: the mean

So mean = variance

Probability of r events occurring per unit time or space (r = 0, 1, 2, ...).
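As a minimal sketch of the Poisson formula, P(r) = e^(-m) m^r / r! for a mean count m, the snippet below computes the probabilities and checks numerically that the mean and variance agree. The value m = 2 and the function name poisson_pmf are assumptions chosen for illustration; they do not come from the slides.

```python
import math

def poisson_pmf(r: int, mean: float) -> float:
    """Probability of exactly r events when the expected count is `mean`."""
    return math.exp(-mean) * mean ** r / math.factorial(r)

mean = 2.0  # assumed example value, not from the slides

# r = 0..59 covers essentially all of the probability when the mean is 2.
probs = [poisson_pmf(r, mean) for r in range(60)]

dist_mean = sum(r * p for r, p in enumerate(probs))
dist_var = sum((r - dist_mean) ** 2 * p for r, p in enumerate(probs))

print(f"mean = {dist_mean:.4f}, variance = {dist_var:.4f}")
```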

6
Shape

Depends upon the size of the mean.

If mean > 5, can approximate with a Normal distribution

7
Binomial Distribution

Discrete: describes proportions

Described by two parameters: n (number of events, trials, sample size, etc.) and p (the probability of a certain outcome).

Probability of the proportion r/n occurring

Mean = np, Variance = npq (where q = 1 - p)
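The mean and variance formulas can be checked directly from the binomial probabilities. This is a small sketch using n = 10 and p = 0.1, one of the cases in this section; the function name binomial_pmf is introduced here for illustration only.

```python
import math

def binomial_pmf(r: int, n: int, p: float) -> float:
    """Probability of exactly r outcomes of interest in n independent trials."""
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)

n, p = 10, 0.1
q = 1 - p

probs = [binomial_pmf(r, n, p) for r in range(n + 1)]

# Mean and variance of the count r, computed from the probabilities themselves.
dist_mean = sum(r * pr for r, pr in enumerate(probs))
dist_var = sum((r - dist_mean) ** 2 * pr for r, pr in enumerate(probs))

print(f"mean = {dist_mean:.4f} (np = {n * p:.4f})")
print(f"variance = {dist_var:.4f} (npq = {n * p * q:.4f})")
```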

8
Shape

Depends upon p

n = 10
p = 0.1, 0.5, 0.9

When p is close to 0.5, can be approximated by the Normal distribution

These three probability distributions are the
basis of describing much continuous and
categorical data

Under certain conditions, the Poisson and
Binomial distributions can be approximated using
the Normal

10
2. Populations and samples: calculating a confidence interval
Consider this in the context of the simplest model: estimating a mean
11
Population: mean μ, variance σ²
Sample: mean ȳ, variance s²
Take a sample of size n
Our view: ȳ and s² are known. Need to use these to guess μ and σ²

God's view:
μ and σ² are known
ȳ and s² are unpredictable

12
True μ
Possible ȳ
Observed ȳ
Inferred μ
This is what we are doing by constructing a
confidence interval
13
Sample n independent data points
Calculate a mean
Calculate a variance
14
Variance
DF = number of independent pieces of information
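The formulas for these slides were not captured in the transcript. As a hedged sketch, the standard definitions are ȳ = (sum of y)/n and s² = (sum of squared deviations from ȳ)/(n - 1), where n - 1 is the degrees of freedom left after estimating the mean. The data values below are hypothetical.

```python
import statistics

# Hypothetical sample of n independent data points (not from the slides).
y = [112, 120, 108, 117, 125, 110, 119, 113]

n = len(y)
y_bar = sum(y) / n                                   # sample mean

sum_of_squares = sum((v - y_bar) ** 2 for v in y)    # squared deviations from the mean
df = n - 1                                           # one df is used up estimating the mean
s2 = sum_of_squares / df                             # sample variance

print(f"mean = {y_bar}, variance = {s2:.3f}, df = {df}")

# The standard library uses the same n - 1 divisor.
assert abs(s2 - statistics.variance(y)) < 1e-9
```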
15
How can we use ȳ and s² to infer a likely value for μ?
Imagine a meta-experiment
[Figure: a population from which several samples, each of size n, are drawn]
16
Distribution of ȳ
Often ȳ will be close to μ
Occasionally ȳ will be some distance from μ
17
True μ
Possible ȳ
Observed ȳ
Inferred μ
18
True μ
Possible ȳ
Observed ȳ
Inferred μ
19
True μ
Possible ȳ
Inferred μ
Observed ȳ
20
True μ
Possible ȳ
Inferred μ
Observed ȳ
The higher the sample size, the better the estimate of μ
The lower the value of σ², the better the estimate of μ
21
Distribution of ȳ

Mean: μ
Variance: σ²/n
Shape: Normal

22
Standard error

Variance = σ²/n

Standard error = σ/√n

= standard deviation of the distribution of ȳ that would be obtained through a meta-experiment
= standard deviation of a parameter distribution
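The meta-experiment can be imitated by simulation: draw many samples of size n, record each sample mean, and compare the standard deviation of those means with σ/√n. This is only a sketch; the population values (μ = 100, σ = 15), the sample size and the number of repeats are assumptions chosen for illustration.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

MU, SIGMA = 100.0, 15.0   # assumed population mean and standard deviation
n = 16                    # size of each sample
n_samples = 5000          # number of samples in the meta-experiment

# One sample mean per simulated experiment.
means = [
    statistics.fmean([random.gauss(MU, SIGMA) for _ in range(n)])
    for _ in range(n_samples)
]

sd_of_means = statistics.stdev(means)     # spread of the sampling distribution
theoretical_se = SIGMA / math.sqrt(n)     # sigma / sqrt(n)

print(f"mean of sample means = {statistics.fmean(means):.2f}")
print(f"sd of sample means = {sd_of_means:.3f}, sigma/sqrt(n) = {theoretical_se:.3f}")
```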
23
Why do we assume the shape is Normal?

Because of the central limit theorem

In a psychology experiment, people's reaction times to operate a push button in response to a signal are measured. 10% of the time, the person missed.
The population μ = 148 ms
24
Aim: estimate μ by sampling from the population

Simulate five meta-experiments:
n = 4, n = 8, n = 16, n = 32, n = 128.

When n = 4, the reaction times picked from the population could be four hits, four misses or any combination in between.

Could use the binomial distribution to calculate the probabilities of these combinations.
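A short sketch of that binomial calculation for n = 4, taking the probability of a miss as 0.10 (the 10% quoted for this experiment). The probability of four hits comes out at 0.9^4, which matches the 0.66 quoted for the first peak. The variable names are introduced here for illustration.

```python
import math

P_MISS = 0.10   # probability that a single trial is a miss
n = 4           # reaction times per sample

# Probability of each possible number of misses in a sample of four.
probs = {
    misses: math.comb(n, misses) * P_MISS ** misses * (1 - P_MISS) ** (n - misses)
    for misses in range(n + 1)
}

for misses, prob in probs.items():
    print(f"{n - misses} hits, {misses} misses: {prob:.4f}")
```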

25
n = 4

First peak is four hits: p = 0.66, 115 ms
Next peak: 3 hits and 1 miss, 200 ms
And so on

Very peculiar
26
n = 8
Still rather peculiar
27
n = 16
Getting there
28
n = 32
Normalish
29
n = 128
Spot on
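The five meta-experiments can be approximated with a short simulation. This is a sketch under stated assumptions, not the original simulation: hits are taken to average 115 ms (from the first peak), the miss mean of 445 ms is inferred so that 0.9 × 115 + 0.1 × 445 = 148 ms, and both standard deviations are invented. Skewness of the sample means is used as a rough measure of how far the sampling distribution is from Normal.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable

P_MISS = 0.10
HIT_MEAN, HIT_SD = 115.0, 10.0     # 115 ms from the slides; SD is an assumption
MISS_MEAN, MISS_SD = 445.0, 30.0   # mean inferred from mu = 148 ms; SD is an assumption

def reaction_time() -> float:
    """One reaction time drawn from the two-component population."""
    if random.random() < P_MISS:
        return random.gauss(MISS_MEAN, MISS_SD)
    return random.gauss(HIT_MEAN, HIT_SD)

def skewness(values) -> float:
    """Standardised third moment: 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - m) / sd) ** 3 for v in values])

skew_by_n = {}
for n in (4, 8, 16, 32, 128):
    # 2000 simulated experiments, each giving one sample mean.
    means = [
        statistics.fmean([reaction_time() for _ in range(n)])
        for _ in range(2000)
    ]
    skew_by_n[n] = skewness(means)
    print(f"n = {n:3d}: mean of means = {statistics.fmean(means):6.1f} ms, "
          f"skewness = {skew_by_n[n]:.2f}")
```

The skewness should shrink towards zero as n grows, which is the central limit theorem at work.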
30
Summary of the three distributions
31
Put this information together
Distribution of ȳ
μ
32
96% of the time (in our meta-experiment) ȳ lies
between
and
Rearranging, and making it a 95% interval.
33

Problem: we don't know σ.
We have an estimate of σ, which is s
How good is this estimate?
That depends upon the number of independent pieces of information we have about s, i.e. the degrees of freedom
Which in this case is n - 1

34
t distributions compensate for uncertainty in s
When df = ∞, the t distribution = a Normal distribution
35
Use t tables to find out how many standard errors you need to encompass 95% of the distribution.
36
Giving the final formula for a confidence interval:
ȳ ± t_crit × s/√n
Where t_crit has n - 1 degrees of freedom
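A minimal sketch of a 95% confidence interval for a mean, computed as ȳ ± t_crit × s/√n. The critical values are two-tailed 95% points copied from standard t tables; the reaction times and the names T_CRIT_95 and confidence_interval_95 are hypothetical and used only for illustration.

```python
import math
import statistics

# Two-tailed 95% critical values of t from standard tables, keyed by df.
T_CRIT_95 = {4: 2.776, 7: 2.365, 8: 2.306, 9: 2.262, 15: 2.131, 30: 2.042}

def confidence_interval_95(sample):
    """Return (lower, upper) limits of a 95% CI for the population mean."""
    n = len(sample)
    y_bar = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)  # s / sqrt(n)
    t_crit = T_CRIT_95[n - 1]                                  # df = n - 1
    return y_bar - t_crit * standard_error, y_bar + t_crit * standard_error

reaction_times_ms = [112, 120, 108, 117, 125, 110, 119, 113]  # hypothetical data
lower, upper = confidence_interval_95(reaction_times_ms)
print(f"95% CI: {lower:.2f} to {upper:.2f} ms")
```

The lookup table only covers a few values of df; other sample sizes would need the corresponding entry from a t table.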
37
The General Formula
Parameter estimate ± t_crit × standard error of the parameter
Where the required df for t_crit comes from the unexplained variation
38
As sample size increases.

ȳ becomes closer to μ
(because the variance of the sampling distribution is σ²/n)
s becomes closer to σ
(with df = n - 1)
The shape of the sampling distribution becomes
more Normal
(due to the central limit theorem)
The t distribution becomes very close to the
Normal distribution

39
3. Testing hypotheses

t tests

40
One sample t test

Can be used to test the hypothesis that μ = a specific value
Can be used with paired data, to ask if a mean
difference is significantly different from zero.

42
Null hypothesis H0: μ = 0
Our sampling distribution
0
Units of the x axis are standard errors
Is our ȳ here? Or is it here?
43
How many standard errors is our observed value of 4.64 away from our hypothesised value of 0?
Answer: t = (4.64 - 0) / standard error, with 7 degrees of freedom
45
The General Formula
t_s = (parameter estimate - hypothesised value) / standard error of the parameter
Where the required df for t_s comes from the unexplained variation
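As a hedged illustration of the general formula, the sketch below computes a one-sample t statistic for eight paired differences and compares it with tabulated critical values for 7 degrees of freedom. The differences are hypothetical and are not the data behind the 4.64 in these slides; the critical values 2.365 (p = 0.05) and 2.998 (p = 0.02) are the two-tailed points from standard t tables.

```python
import math
import statistics

# Hypothetical paired differences (n = 8, so df = 7); not the slides' data.
differences = [1.2, 3.4, -0.5, 2.8, 4.1, 0.9, 2.2, 3.1]
hypothesised_mean = 0.0

n = len(differences)
estimate = statistics.fmean(differences)
standard_error = statistics.stdev(differences) / math.sqrt(n)

# Number of standard errors between the estimate and the hypothesised value.
t_s = (estimate - hypothesised_mean) / standard_error
df = n - 1

T_CRIT_05, T_CRIT_02 = 2.365, 2.998   # two-tailed critical values for 7 df

print(f"t = {t_s:.2f} with {df} degrees of freedom")
print("significant at p < 0.05:", abs(t_s) > T_CRIT_05)
print("significant at p < 0.02:", abs(t_s) > T_CRIT_02)
```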
46
So we reject the Null hypothesis with 0.02 < p < 0.05
The p value is the probability of getting that test statistic, or something more extreme, under the Null hypothesis (i.e. if the Null hypothesis is true).
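This definition can be made concrete by simulation: generate many experiments in which the Null hypothesis really is true and count how often the test statistic is at least as extreme as the one observed. The sketch assumes Normal data, n = 8 (7 degrees of freedom) and a hypothetical observed t of 2.6, which lies between the tabulated critical values 2.365 and 2.998.

```python
import math
import random

random.seed(3)  # fixed seed so the sketch is repeatable

def t_statistic(sample, mu0=0.0):
    """One-sample t: (mean - mu0) / (s / sqrt(n))."""
    n = len(sample)
    y_bar = sum(sample) / n
    s2 = sum((y - y_bar) ** 2 for y in sample) / (n - 1)
    return (y_bar - mu0) / math.sqrt(s2 / n)

observed_t = 2.6       # hypothetical observed test statistic (7 df)
n = 8
n_experiments = 20000

# Experiments simulated with the Null hypothesis true (population mean 0).
extreme = 0
for _ in range(n_experiments):
    null_sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    if abs(t_statistic(null_sample)) >= observed_t:   # two-tailed comparison
        extreme += 1

p_value = extreme / n_experiments
print(f"simulated two-tailed p value = {p_value:.3f}")
```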
47
The p value
When we conclude that we have a significant
result with p value of 0.03, what exactly is that
probability measuring?
NOT

The null hypothesis is true with probability 0.03
There is a probability of 0.03 that there is no
difference

48
E: all possible experiments
A: the set of experiments for which the null hypothesis is true
B: the set of experiments for which the null hypothesis is rejected
The p value: the overlap of A and B
49
Errors

Type I
Rejecting the null hypothesis when it is true
Convention sets this at 0.05 for any one test
Under our control!

Type II
Failing to reject the null hypothesis when it is false
Influenced by a host of factors, including the power of the statistical test used, experimental design, etc.
Can't be measured absolutely (but can relatively)

Power = 1 - probability of Type II error
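Both error rates can be estimated by simulation. The sketch below runs a one-sample t test (n = 8, two-tailed critical value 2.365 for 7 df from standard tables) on many simulated experiments, first with the Null hypothesis true and then with a true mean one standard deviation away from zero. The sample size, effect size and number of repeats are assumptions chosen for illustration.

```python
import math
import random

random.seed(4)  # fixed seed so the sketch is repeatable

T_CRIT = 2.365  # two-tailed 5% critical value of t for 7 df, from tables

def rejects_null(sample):
    """True if a one-sample t test of H0: mean = 0 rejects at the 5% level."""
    n = len(sample)
    y_bar = sum(sample) / n
    s2 = sum((y - y_bar) ** 2 for y in sample) / (n - 1)
    return abs(y_bar / math.sqrt(s2 / n)) > T_CRIT

def rejection_rate(true_mean, n=8, n_experiments=20000):
    """Proportion of simulated experiments in which H0 is rejected."""
    rejected = sum(
        rejects_null([random.gauss(true_mean, 1.0) for _ in range(n)])
        for _ in range(n_experiments)
    )
    return rejected / n_experiments

type_one_rate = rejection_rate(true_mean=0.0)   # H0 true: every rejection is a Type I error
power = rejection_rate(true_mean=1.0)           # H0 false: rejections are correct
type_two_rate = 1 - power

print(f"Type I error rate = {type_one_rate:.3f}")
print(f"Power = {power:.3f}, Type II error rate = {type_two_rate:.3f}")
```

The Type I rate stays near the 0.05 set by convention, whereas the power depends on the assumed effect size and sample size.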
50
[Venn diagram: E, containing A (H0 true) and B (H0 rejected)]

The null hypothesis is true with probability 0.03
There is a probability of 0.03 that there is no difference

This is equivalent to saying that set A is of size 0.03: incorrect
51
4. Two sample t test

Are two groups different?

Sample: ȳ1, s1² and ȳ2, s2²
Population: μ1, σ1² and μ2, σ2²
Does μ1 = μ2?
52
Null hypothesis: μ1 - μ2 = 0
Sampling distribution is now that of ȳ1 - ȳ2
μ1 - μ2
Variance(A - B) = Variance(A) + Variance(B), for independent A and B
53
Two sample t tests are a simple extension of one
sample t tests.
54
and also with confidence intervals
Parameter estimate ± t_crit × standard error of the parameter
Degrees of freedom? Two means have been estimated: n1 + n2 - 2
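A sketch of the two-sample case, assuming a pooled variance estimate (the usual choice when the degrees of freedom are n1 + n2 - 2). The standard error of the difference adds the variances of the two means, and the same standard error serves both the t statistic and the confidence interval. The data are hypothetical; 2.306 is the two-tailed 95% critical value of t for 8 df from standard tables.

```python
import math
import statistics

T_CRIT_95_DF8 = 2.306  # two-tailed 95% critical value of t for 8 df, from tables

# Hypothetical measurements for two independent groups (not from the slides).
group1 = [10.1, 11.4, 9.8, 12.0, 10.7]
group2 = [12.9, 13.5, 11.8, 14.2, 12.6]

n1, n2 = len(group1), len(group2)
df = n1 + n2 - 2                      # two means have been estimated

diff = statistics.fmean(group1) - statistics.fmean(group2)

# Pooled variance: both groups contribute to one estimate of the variance.
pooled_var = (
    (n1 - 1) * statistics.variance(group1) + (n2 - 1) * statistics.variance(group2)
) / df

# Variance of the difference = variance of mean 1 + variance of mean 2.
se_diff = math.sqrt(pooled_var / n1 + pooled_var / n2)

t_s = diff / se_diff                  # test of H0: mu1 - mu2 = 0
lower = diff - T_CRIT_95_DF8 * se_diff
upper = diff + T_CRIT_95_DF8 * se_diff

print(f"difference = {diff:.2f}, t = {t_s:.2f} with {df} df")
print(f"95% CI for mu1 - mu2: {lower:.2f} to {upper:.2f}")
```

In this example the interval excludes zero and the t statistic exceeds the critical value, so the two approaches lead to the same conclusion.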
55
5. One and two tailed tests
Two tails
Null hypothesis H0: μ1 - μ2 = 0
Alternative hypothesis HA: μ1 - μ2 ≠ 0
56
Two tailed test
Reject H0
Reject H0
Standard Errors
57
One tail
Null hypothesis: μ1 - μ2 = 0
Alternative hypothesis: μ1 < μ2
58
One tailed test
Reject H0
Standard Errors
59
Deciding on HA after you know the direction will double your chances of making a Type I error
Reject H0
Reject H0
Standard Errors
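The doubling can be seen in a small simulation. For simplicity this sketch uses a Normal (z) test statistic measured in standard errors rather than t, which is an assumption; the argument is the same for t. With the Null hypothesis true, a one-tailed test planned in advance rejects about 5% of the time, while choosing the tail after seeing the sign of the result rejects about 10% of the time.

```python
import random
from statistics import NormalDist

random.seed(5)  # fixed seed so the sketch is repeatable

ALPHA = 0.05
z_crit = NormalDist().inv_cdf(1 - ALPHA)   # one-tailed 5% critical value, about 1.645

n_experiments = 100000
planned = 0     # HA (mu1 < mu2) fixed before seeing the data
post_hoc = 0    # tail chosen after seeing which way the difference went

for _ in range(n_experiments):
    z = random.gauss(0.0, 1.0)   # test statistic when H0 is true
    if z < -z_crit:
        planned += 1
    if abs(z) > z_crit:          # whichever tail the result falls in is "tested"
        post_hoc += 1

planned_rate = planned / n_experiments
post_hoc_rate = post_hoc / n_experiments
print(f"Type I rate, HA chosen in advance: {planned_rate:.3f}")
print(f"Type I rate, HA chosen after the fact: {post_hoc_rate:.3f}")
```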
60
Relationship between CIs and hypothesis testing
The distribution of ȳ1 - ȳ2
The null distribution
ȳ1 - ȳ2
0
61
Next week