FPP 28 - PowerPoint PPT Presentation

Provided by: Garrit3
Transcript and Presenter's Notes

Title: FPP 28


1
Chi-square test
  • FPP 28

2
More types of inference for nominal variables
  • Nominal data is categorical with more than two
    categories
  • Compare observed frequencies of nominal variable
    to hypothesized probabilities
  • One categorical variable with more than two
    categories
  • Chi-squared goodness of fit test
  • Test if two nominal variables are independent
  • Two categorical variables with at least one
    having more than two categories
  • Chi-squared test of independence

3
Goodness of fit test
  • Do people admit themselves to hospitals more
    frequently close to their birthday?
  • Data from a random sample of 200 people admitted
    to hospitals

Days from birthday   Number of admissions
within 7             11
8-30                 24
31-90                69
91+                  96
4
Goodness of fit test
  • Assume there is no birthday effect, that is,
    people admit randomly. Then,
  • Pr (within 7) = .0411, Pr (8 - 30) = .1260,
    Pr (31 - 90) = .3288, Pr (91+) = .5041
  • So, in a sample of 200 people, we'd expect
    200 × .0411 = 8.22 to be in within 7
    200 × .1260 = 25.20 to be in 8 - 30
    200 × .3288 = 65.76 to be in 31 - 90
    200 × .5041 = 100.82 to be in 91+
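
The expected counts above can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original slides. The day counts per category (15, 46, 120, 184 out of 365) are an assumption chosen because they reproduce the probabilities on the slide; the variable names are illustrative.

```python
# Observed admissions from the slide, keyed by days from birthday.
observed = {"within 7": 11, "8-30": 24, "31-90": 69, "91+": 96}

# Assumption: each probability is (number of days in the category) / 365.
# "within 7" counts the birthday itself plus 7 days on either side.
days = {"within 7": 15, "8-30": 46, "31-90": 120, "91+": 184}
probs = {k: d / 365 for k, d in days.items()}

n = sum(observed.values())  # sample size
expected = {k: n * p for k, p in probs.items()}

for k in observed:
    print(f"{k:>9}: Pr = {probs[k]:.4f}, expected = {expected[k]:.2f}")
```

The expected counts need not be whole numbers; they are long-run averages under the null hypothesis, not actual tallies.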

5
Goodness of fit test
  • If admissions are random, we expect the sample
    frequencies and hypothesized probabilities to be
    similar
  • But, as always, the sample frequencies are
    affected by chance error
  • So, we need to see whether the sample frequencies
    could have been a plausible result from a chance
    error if the hypothesized probabilities are true.
  • Let's build a hypothesis test

6
Goodness of fit test
  • Hypotheses
  • Claim (alternative hyp.) is that admission
    probabilities change according to days since
    birthday
  • Opposite of claim (null hyp.) is that the
    probabilities are in accordance with random
    admissions.
  • H0: Pr (within 7) = .0411, Pr (8 - 30) = .1260,
    Pr (31 - 90) = .3288, Pr (91+) = .5041
  • HA: probabilities differ from those in H0.

7
Goodness of fit test: Test statistic
  • Chi-squared test statistic:
    X2 = sum over all cells of (Obs - Exp)^2 / Exp

8
Goodness of fit test: Test statistic
Cell       Obs   Exp   Dif   Dif^2   Dif^2/Exp
Within 7
8-30
31-90
91+
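
The cell values of this table did not survive in the transcript. As a hedged sketch (not from the original slides), the columns can be recomputed from the observed counts and the null probabilities given earlier. Using the rounded probabilities, the statistic comes out at about 1.39, close to the 1.397 quoted in the slides; the small gap is presumably due to rounding or to how the software computes it.

```python
labels = ["Within 7", "8-30", "31-90", "91+"]
observed = [11, 24, 69, 96]
probs = [0.0411, 0.1260, 0.3288, 0.5041]  # null probabilities from the slides

n = sum(observed)
x2 = 0.0

print(f"{'Cell':<10}{'Obs':>6}{'Exp':>9}{'Dif':>8}{'Dif^2':>9}{'Dif^2/Exp':>11}")
for label, obs, p in zip(labels, observed, probs):
    exp = n * p              # expected count under H0
    dif = obs - exp          # observed minus expected
    contrib = dif ** 2 / exp # this cell's contribution to X2
    x2 += contrib
    print(f"{label:<10}{obs:>6}{exp:>9.2f}{dif:>8.2f}{dif ** 2:>9.2f}{contrib:>11.3f}")

print(f"X2 = {x2:.3f}")
```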
9
Goodness of fit test: Calculate p-value
  • Under the null hypothesis, X2 has approximately
    a chi-squared distribution with degrees of
    freedom equal to the number of categories minus 1.
  • In this case, df = 4 - 1 = 3.

10
Goodness of fit test: Calculate p-value
  • To get a p-value, calculate the area under the
    chi-squared curve to the right of 1.397
  • Using JMP, this area is 0.703. If the null
    hypothesis is true, there is a 70% chance of
    observing a value of X2 as or more extreme than
    1.397
  • Using the table, the p-value is between 0.70 and
    0.90
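
For readers without JMP or a table, the tail area can be sketched in plain Python. The helper below is illustrative and not from the slides; it uses a closed-form expression for the upper tail that holds only for 3 degrees of freedom. For other degrees of freedom a statistics library would be the usual route. It should give roughly 0.70 for X2 = 1.397, in line with the value reported above.

```python
import math

def chi2_upper_tail_df3(x):
    """Area to the right of x under the chi-squared curve with 3 df.

    Closed form valid for df = 3 only:
    erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

p_value = chi2_upper_tail_df3(1.397)
print(f"p-value = {p_value:.3f}")
```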

11
Chi-squared table
12
JMP output admissions
13
Goodness of fit test: Judging p-value
  • The .70 is a large p-value, indicating that the
    difference between the observed and expected
    counts could well occur by random chance when the
    null hypothesis is true. Therefore, we cannot
    reject the null hypothesis. There is not enough
    evidence to conclude that admission rates change
    according to days from birthday.

14
Independence test
  • Is birth order related to delinquency?
  • Nye (1958) randomly sampled 1154 high school
    girls and asked if they had been delinquent.

Birth order   Delinquent   Not delinquent
Eldest        24           450
In Between    29           312
Youngest      35           211
Only          23           70
15
Sample conditional frequencies
  • Proportion delinquent for each birth order status
  • Based on conditional frequencies, it appears that
    youngest and only children are more often
    delinquent
  • Could these sample frequencies have plausibly
    occurred by chance if there is no relationship
    between birth order and delinquency?

Birth order   Proportion delinquent
Oldest        .05
Middle        .085
Youngest      .14
Only          .25
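
These proportions can be checked against the counts in the Nye data. A small sketch, not from the slides, assuming the two count columns are (delinquent, not delinquent); that reading is an inference, chosen because it reproduces the proportions above.

```python
# Assumption: each pair is (delinquent, not delinquent).
counts = {
    "Oldest": (24, 450),
    "Middle": (29, 312),
    "Youngest": (35, 211),
    "Only": (23, 70),
}

rates = {}
for order, (yes, no) in counts.items():
    rates[order] = yes / (yes + no)  # proportion delinquent within this birth order
    print(f"{order:<9}{rates[order]:.3f}")

total = sum(yes + no for yes, no in counts.values())
print(f"Total sample size: {total}")
```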
16
Test of independence
  • Hypotheses
  • Want to show that there is some relationship
    between birth order and delinquency.
  • Opposite is that there is no relationship.
  • H0: birth order and delinquency are
    independent.
  • HA: birth order and delinquency are
    dependent.

17
Implications of independence
  • Expected counts
  • Under independence,
  • Pr(oldest and delinquent) = Pr(oldest) ×
    Pr(delinquent)
  • Estimate Pr(oldest) as marginal frequency of
    oldest: 474/1154
  • Estimate Pr(delinquent) as marginal frequency of
    delinquent: 111/1154
  • Hence, estimate Pr(oldest and delinquent) as
    (474/1154) × (111/1154)
  • The expected number of oldest and delinquent,
    under independence, equals
    1154 × (474/1154) × (111/1154) = 45.59
  • This is repeated for all the other cells in the
    table
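
The same calculation for every cell can be sketched as follows. This is an illustration, not code from the slides; it assumes the count pairs are (delinquent, not delinquent) and uses the shortcut expected = row total × column total / n, which is algebraically the same as n × Pr(row) × Pr(column).

```python
# Assumption: each pair is (delinquent, not delinquent).
observed = {
    "Oldest": (24, 450),
    "In Between": (29, 312),
    "Youngest": (35, 211),
    "Only": (23, 70),
}

n = sum(sum(row) for row in observed.values())
col_totals = [sum(row[j] for row in observed.values()) for j in range(2)]

# Expected count under independence: row total * column total / n.
expected = {
    order: tuple(sum(row) * col / n for col in col_totals)
    for order, row in observed.items()
}

for order, (e_yes, e_no) in expected.items():
    print(f"{order:<11}{e_yes:>8.2f}{e_no:>9.2f}")
```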

18
Test of independence
  • Expected counts
  • Next we compare the observed counts with the
    expected to get a test statistic

Birth order   Delinquent   Not delinquent
Oldest        45.59        428.41
In Between    32.80        308.20
Youngest      23.66        222.34
Only          8.95         84.05
19
  • Use the X2 statistic as the test statistic:
    X2 = sum over all cells of (Obs - Exp)^2 / Exp

20
Test of independence
  • Calculate the p-value
  • X2 has a chi-squared distribution with degrees
    of freedom df = (number of rows - 1) × (number of
    columns - 1)
  • In delinquency problem, df = (4 - 1) × (2 - 1)
    = 3.
  • The area under the chi-squared curve to the right
    of 42.245 is less than .0001. There is only a
    very small chance of getting an X2 as or more
    extreme than 42.245.
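
As a rough check on the numbers above, the statistic and its tail area can be recomputed from the observed table. This sketch is not from the slides. It assumes the count pairs are (delinquent, not delinquent), and the tail-area formula is a closed form that applies only when df = 3.

```python
import math

# Assumption: each pair is (delinquent, not delinquent).
observed = {
    "Oldest": (24, 450),
    "In Between": (29, 312),
    "Youngest": (35, 211),
    "Only": (23, 70),
}

n = sum(sum(row) for row in observed.values())
col_totals = [sum(row[j] for row in observed.values()) for j in range(2)]

x2 = 0.0
for row in observed.values():
    row_total = sum(row)
    for j, obs in enumerate(row):
        exp = row_total * col_totals[j] / n  # expected count under independence
        x2 += (obs - exp) ** 2 / exp

df = (len(observed) - 1) * (len(col_totals) - 1)

# Upper-tail area of the chi-squared distribution; closed form for df = 3 only.
p_value = math.erfc(math.sqrt(x2 / 2)) + math.sqrt(2 * x2 / math.pi) * math.exp(-x2 / 2)

print(f"X2 = {x2:.3f}, df = {df}, p-value = {p_value:.2e}")
```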

22
JMP output for chi-squared test
  • This is a small p-value. It is unlikely we'd
    observe data like this if the null hypothesis is
    true. There does appear to be an association
    between delinquency and birth order.

23
Chi-squared test details
  • Requires simple random samples.
  • Works best when expected frequencies in each cell
    are at least 5.
  • Should not have zero counts
  • How one specifies categories can affect results.

24
Chi-squared test items
  • What do I do when expected counts are less than
    5?
  • Try to get more data. Barring that, you can
    collapse categories.
  • Example: Is baldness related to heart disease?
    (see JMP for data set)

Baldness   Disease   Number of people
None       Yes       251
None       No        331
Little     Yes       165
Little     No        221
Some       Yes       195
Some       No        185
Much       Yes       50
Much       No        34
Extreme    Yes       2
Extreme    No        1

  • Combine the extreme and much categories:

Baldness          Disease   Number of people
Much or extreme   Yes       52
Much or extreme   No        35

  • This changes the question slightly, since we have
    a new category.
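
The effect of collapsing on the expected counts can be illustrated with a short sketch. The helper names `expected_counts` and `collapse` are hypothetical, introduced here for illustration only; the data are the baldness counts listed above, stored as (disease yes, disease no).

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / n."""
    n = sum(sum(row) for row in table.values())
    col_totals = [sum(row[j] for row in table.values()) for j in range(2)]
    return {label: tuple(sum(row) * c / n for c in col_totals)
            for label, row in table.items()}

def collapse(table, labels, new_label):
    """Merge the rows named in `labels` into a single row called `new_label`."""
    merged = tuple(sum(table[name][j] for name in labels) for j in range(2))
    kept = {name: row for name, row in table.items() if name not in labels}
    kept[new_label] = merged
    return kept

baldness = {
    "None": (251, 331),
    "Little": (165, 221),
    "Some": (195, 185),
    "Much": (50, 34),
    "Extreme": (2, 1),
}

collapsed = collapse(baldness, ["Much", "Extreme"], "Much or extreme")

smallest_before = min(min(row) for row in expected_counts(baldness).values())
smallest_after = min(min(row) for row in expected_counts(collapsed).values())
print(f"Smallest expected count before collapsing: {smallest_before:.2f}")
print(f"Smallest expected count after collapsing:  {smallest_after:.2f}")
```

Before collapsing, the "Extreme" row has expected counts below 5; after merging it with "Much", every expected count clears that threshold.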

25
Chi-squared test
  • for collapsed data for baldness example
  • Based on p-value, baldness and heart disease are
    not independent.
  • We see that increasing baldness is associated
    with increased incidence of heart disease.
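
The test output for this example did not survive in the transcript, so the figures below are recomputed rather than quoted. This is a hedged sketch on the collapsed table, with counts stored as (disease yes, disease no) and a tail-area formula that is valid only for df = 3. It also prints the proportion with heart disease in each baldness group, so the direction of the association can be inspected directly.

```python
import math

# Collapsed baldness table: (heart disease yes, heart disease no).
table = {
    "None": (251, 331),
    "Little": (165, 221),
    "Some": (195, 185),
    "Much or extreme": (52, 35),
}

n = sum(yes + no for yes, no in table.values())
col_totals = [sum(row[j] for row in table.values()) for j in range(2)]

x2 = 0.0
for label, row in table.items():
    row_total = sum(row)
    for j, obs in enumerate(row):
        exp = row_total * col_totals[j] / n  # expected count under independence
        x2 += (obs - exp) ** 2 / exp
    print(f"{label:<16} proportion with disease = {row[0] / row_total:.3f}")

df = (len(table) - 1) * (len(col_totals) - 1)  # (rows - 1) x (columns - 1)

# Upper-tail area of the chi-squared distribution; closed form for df = 3 only.
p_value = math.erfc(math.sqrt(x2 / 2)) + math.sqrt(2 * x2 / math.pi) * math.exp(-x2 / 2)
print(f"X2 = {x2:.2f}, df = {df}, p-value = {p_value:.4f}")
```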