Lecture 5: Introduction to Hypothesis Tests

1
Lecture 5: Introduction to Hypothesis Tests

Quantitative Methods Module I
Gwilym Pryce
g.pryce_at_socsci.gla.ac.uk

2
Notices

Register
Class Reps and Staff Student committee.

3
Aims & Objectives

Aim
To introduce hypothesis testing
Objectives
By the end of this session, students should be able to:
Understand the 4 steps of hypothesis testing
Run a hypothesis test on a mean from a large sample
Run a hypothesis test on a mean from a small sample

4
Plan

1. Statistical Significance
2. The four steps of hypothesis testing
3. Hypotheses about the population mean
3.1 when you have large samples
3.2 when you have small samples

5
1. Significance

Does not refer to importance but to real differences in fact between our observed sample mean and our assumption about the population mean.
P = significance level = the chance of observing a sample mean at least as extreme as ours, given that our assumption about the population (denoted by H0) is true.
So if we find that this probability is small, it might lead us to question our assumption about the population mean.

I.e. if our sample mean is a long way from our assumed population mean then either:
it is a freak sample,
or our assumption about the population mean is wrong.
If we conclude that it is our assumption about μ that is wrong and reject H0, then we have to bear in mind that there is a chance that H0 was in fact true.
In other words, if we reject H0 whenever P ≤ 0.05, then on roughly one in every twenty occasions when H0 is in fact true we will reject it wrongly.

Obviously, as the sample mean moves further away from our assumption (H0) about the population mean, we have stronger evidence that H0 is false.
If P is very small, say 0.001, then there is only 1 chance in a thousand of a sample mean at least as extreme as ours occurring if H0 is true.
This also means that if we reject H0 only when P ≤ 0.001, then there is only a one in a thousand chance of rejecting H0 when it is in fact true (i.e. of being guilty of a Type I error).

There is a tradition (initiated by the English scientist R. A. Fisher, 1890-1962) of rejecting H0 if the probability of incorrectly rejecting it is ≤ 0.05.
If P ≤ 0.05 then we say that H0 can be rejected at the 5% significance level.
If P > 0.05, then, argued Fisher, the chances of incorrectly rejecting H0 are too high to allow us to do so.
The probability of a sample mean at least as extreme as our observed value occurring will be determined not just by the difference between the sample mean and our assumed value of μ, but also by the standard deviation of the distribution and the size of our sample.
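
The last point can be checked numerically. The sketch below is only an illustration, not part of the original slides: the helper name two_tailed_p and the figures (sample mean 34.5, assumed mean 30, and the three s and n combinations) are assumptions chosen for the example. It holds the gap between sample mean and assumed mean fixed and changes only s and n.

```python
from statistics import NormalDist


def two_tailed_p(x_bar, mu0, s, n):
    """Two tailed P for a large sample z test on a mean (hypothetical helper)."""
    z = (x_bar - mu0) / (s / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Same 4.5 unit gap between sample mean and assumed mean each time;
# only the standard deviation s and the sample size n change.
cases = [(20.0, 200), (20.0, 50), (40.0, 200)]
results = {(s, n): two_tailed_p(34.5, 30.0, s, n) for s, n in cases}

for (s, n), p in results.items():
    print(f"s = {s}, n = {n}: P = {p:.4f}")
```

A smaller sample or a larger standard deviation both widen the standard error, so the same gap gives a larger P.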

9
Type I and Type II errors

P = significance level = the chance of incorrectly rejecting H0 when it is in fact true.
This is called a Type I error.
So sig = Pr(Type I error) = Pr(false rejection).
If we accept H0 when in fact the alternative hypothesis is true, this is called a Type II error.
On this course we shall be concerned only with Type I errors.
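
A small simulation can make the Type I error rate concrete. This is a sketch under stated assumptions rather than anything from the slides: the function name type_i_error_rate, the population values (mean 30, standard deviation 20, n = 200), the number of trials and the seed are all illustrative, and the population standard deviation is treated as known to keep the code short. Every sample is drawn from a population where H0 is true, so each rejection is a false rejection.

```python
import random
from statistics import NormalDist


def type_i_error_rate(mu0=30.0, sigma=20.0, n=200, alpha=0.05,
                      trials=2000, seed=1):
    """Share of samples drawn under a true H0 that a two tailed z test rejects."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    se = sigma / n ** 0.5  # standard error, using the known sigma (simplification)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / se
        p = 2 * (1 - std_normal.cdf(abs(z)))
        if p <= alpha:
            rejections += 1  # H0 is true here, so this is a Type I error
    return rejections / trials


rate = type_i_error_rate()
print(f"Share of true H0s rejected at the 5% level: {rate:.3f}")
```

The printed share should sit close to 0.05, i.e. about one false rejection in every twenty samples.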

10
2. The four steps of hypothesis testing

Last week we looked at confidence intervals, which establish the range of values of the population mean for a given level of confidence.
E.g. we are 90% confident that the population mean age of HoHs in repossessed dwellings in the Great Depression lay between 32.17 and 36.83 years (s = 20).
Based on a sample of 200 with mean 34.5 yrs.
But what if we want to use our sample to test a specific hypothesis we may have about the population mean?
E.g. does μ = 30 years?
If μ does equal 30 years, then how likely are we to select a sample with a mean as extreme as 34.5 years?
I.e. 4.5 years more or 4.5 years less than the pop mean?
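
As a check on the confidence interval quoted on this slide, the following sketch recomputes it from n = 200, mean 34.5 and s = 20. The variable names are illustrative, and the large sample normal approximation is assumed.

```python
from statistics import NormalDist

n, x_bar, s = 200, 34.5, 20.0

se = s / n ** 0.5                    # standard error of the mean
z_crit = NormalDist().inv_cdf(0.95)  # 90% confidence leaves 5% in each tail
lower = x_bar - z_crit * se
upper = x_bar + z_crit * se

print(f"90% CI: {lower:.2f} to {upper:.2f}")
```

The hypothesised value of 30 years lies outside this interval, which already hints at how the test of μ = 30 will turn out.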

11
(No Transcript)
12
One tailed test: P = how likely are we to select a sample with a mean age at least as great as 34.5?
13
How do we find the proportion of sample means
greater than 34.5?

Because all sampling distributions for the mean (assuming large n) are normal, we can convert points on them to the standard normal curve,
e.g. for 34.5:
z = (34.5 - 30)/(20/√200)
  = 4.5 / 1.4
  ≈ 3.2
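
The same conversion can be sketched in code, which also gives the proportion of sample means above 34.5 directly from the standard normal curve. Variable names are illustrative only.

```python
from statistics import NormalDist

x_bar, mu0, s, n = 34.5, 30.0, 20.0, 200

se = s / n ** 0.5                  # 20 / sqrt(200), about 1.41
z = (x_bar - mu0) / se             # about 3.2, as on the slide
p_upper = 1 - NormalDist().cdf(z)  # area under the curve to the right of z

print(f"z = {z:.2f}, P(sample mean >= 34.5 | mu = 30) = {p_upper:.5f}")
```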

14
(No Transcript)
15
(No Transcript)
16
Upper tailed test
17
Two tailed test
18
3. Steps to Hypothesis tests

1. Specify the null and alternative hypotheses and say whether it's a two, lower, or upper tailed test.
2. Specify the threshold significance level α and the appropriate test statistic formula.
3. Specify the decision rule (reject H0 if P < α).
4. Compute P and state the conclusion.

19
P values for one and two tailed tests

Use diagrams to explain how we know the following are true:
Upper Tail Test: population mean > specified value
H1: μ > μ0, then P = Prob(z > zi)
Lower Tail Test: population mean < specified value
H1: μ < μ0, then P = Prob(z < zi)
Two Tail Test: population mean ≠ specified value
H1: μ ≠ μ0, then P = 2 x Prob(z > |zi|)
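
The three rules can be written as one small function. This is a minimal sketch: the name p_value and the tail labels "upper", "lower" and "two" are assumptions made for the example, and zi = 1.96 is just a convenient test value.

```python
from statistics import NormalDist


def p_value(z_i, tail):
    """P for an observed z_i under the standard normal curve (hypothetical helper)."""
    std_normal = NormalDist()
    if tail == "upper":   # H1: mu > mu0
        return 1 - std_normal.cdf(z_i)
    if tail == "lower":   # H1: mu < mu0
        return std_normal.cdf(z_i)
    if tail == "two":     # H1: mu != mu0, both tails counted
        return 2 * (1 - std_normal.cdf(abs(z_i)))
    raise ValueError("tail must be 'upper', 'lower' or 'two'")


for tail in ("upper", "lower", "two"):
    print(f"{tail:>5} tailed P for z = 1.96: {p_value(1.96, tail):.4f}")
```

Note that the upper and lower tail values for the same zi always sum to 1, and the two tailed value is twice the smaller tail.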

20
(No Transcript)
21
(No Transcript)
22
E.g. The obesity threshold for men of a particular height is defined as weighing over 187 lbs. The mean weight of men in your sample with this height is 190.5 lbs, sd = 13.7 lbs, n = 94. Are the men in your sample typically obese?

Test the hypothesis that the average man in the population is obese.
How do we write Step 1?
Because H1: μ > μ0, then P = Prob(z > zi)
So this is an upper tailed test and we write:
H0: μ = 187 lbs
H1: μ > 187 lbs

23
How do we write Step 2? (α and appropriate test statistic formula)

Large sample

24
How do we write Step 3?
25
How do we write Step 4?
26

The upper tail significance level is given by SIGZ_UTL = 0.00663
What can we conclude from this?
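
The four steps for the obesity example can be traced in a few lines. This is a sketch rather than the macro used in the lecture: the threshold α = 0.05 is an assumption (the slide for Step 2 has no transcript), and the variable names are illustrative.

```python
from statistics import NormalDist

# Step 1: H0: mu = 187 lbs, H1: mu > 187 lbs (upper tailed test)
mu0 = 187.0
x_bar, s, n = 190.5, 13.7, 94

# Step 2: significance threshold (assumed here) and large sample z statistic
alpha = 0.05
z = (x_bar - mu0) / (s / n ** 0.5)

# Step 3: decision rule is to reject H0 if P < alpha
# Step 4: compute P from the upper tail and state the conclusion
p = 1 - NormalDist().cdf(z)
reject_h0 = p < alpha

print(f"z = {z:.3f}, upper tail P = {p:.5f}, reject H0: {reject_h0}")
```

Under these assumptions P agrees with the SIGZ_UTL figure above and is well below 0.05, so H0 is rejected in favour of μ > 187 lbs.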

27
E.g. Test the hypothesis that male super heroes/villains tend to be c. six foot tall.

1st, you need to convert the scale: 6 ft = 182.88 cm
2nd, you need to run descriptive stats on height to get n, x-bar, and s:
n = 29
x-bar = 181.72 cm
s = 8.701 cm

28
H_L1M n(29) x_bar(181.72) m(182.88) s(8.701).

Compare this output with that of the large sample 95% confidence interval and interpret.
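
For readers without the macro, the comparison can be sketched by hand. The code below does not reproduce the H_L1M output; it assumes a two tailed test (H1: μ ≠ 182.88 cm) and applies the large sample z formulas to the figures above, with illustrative variable names.

```python
from statistics import NormalDist

n, x_bar, mu0, s = 29, 181.72, 182.88, 8.701
std_normal = NormalDist()

se = s / n ** 0.5
z = (x_bar - mu0) / se
p_two = 2 * (1 - std_normal.cdf(abs(z)))  # two tailed P (assumed H1: mu != mu0)

# Large sample 95% confidence interval for the population mean height
z_crit = std_normal.inv_cdf(0.975)
lower, upper = x_bar - z_crit * se, x_bar + z_crit * se

print(f"z = {z:.3f}, two tailed P = {p_two:.3f}")
print(f"95% CI: {lower:.2f} to {upper:.2f} cm")
```

The two views agree: P is far above 0.05 and 182.88 cm falls inside the interval, so the sample gives no grounds for rejecting the six foot hypothesis.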

29
Hypotheses about the population mean when you
have small samples

This is exactly the same as the large sample case, except that one uses the t-distribution, provided that x is normally distributed.
Many statisticians use t rather than z even when the sample size is large since
(i) strictly speaking, when the SE of the mean is estimated from the sample, the test statistic has a t rather than z distribution
(ii) t tends towards the z distribution when n is large

30
E.g. re-run the hypothesis test on the height of super heroes using a t test:
H_S1M n(29) x_bar(181.72) m(182.88) s(8.701).
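
A hand-rolled version of the t test is sketched below so that the result can be set beside the z based one. It is not the H_S1M macro: the helper names t_pdf and t_upper_tail are hypothetical, a two tailed test is assumed, and the tail area of the t distribution is found by Simpson's rule using only the standard library instead of a statistics package.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, integrating the density on [0, t] by Simpson's rule."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


n, x_bar, mu0, s = 29, 181.72, 182.88, 8.701
t_stat = (x_bar - mu0) / (s / math.sqrt(n))

p_t = 2 * t_upper_tail(abs(t_stat), df=n - 1)    # two tailed, t with n - 1 df
p_z = 2 * (1 - NormalDist().cdf(abs(t_stat)))    # same statistic, normal curve

print(f"t = {t_stat:.3f}, P (t, 28 df) = {p_t:.3f}, P (z) = {p_z:.3f}")
```

The t based P is slightly larger than the z based one because the t distribution has fatter tails at small n, but the conclusion is unchanged here.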