CHI SQUARE (χ²) - PowerPoint PPT Presentation


About This Presentation

Description:

CHI SQUARE (χ²): Dangerous Curves Ahead! PowerPoint PPT presentation


Slides: 27

Provided by: LisaM174

Transcript and Presenter's Notes

1
CHI SQUARE (χ²)
Dangerous Curves Ahead!
2
Why Chi Square (χ²)?

We want to compare two variables, but:
Not all variables are interval-level, so we cannot use regression.
Hypothesis tests for difference of means and difference of proportions only allow us to compare two groups on one value.
We need something else...

3
Imagine a bag that contains 90 white marbles and 10 black marbles. If you drew 10 marbles, how many would you expect to come up white, and how many black? We expect 9 white marbles and 1 black. But there is some probability that we will get 8 white and 2 black, and some probability that we will get 7 white and 3 black.
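To put numbers on "some probability", here is a short Python sketch. It assumes the 10 marbles are drawn without replacement, so the number of black marbles drawn follows a hypergeometric distribution; the helper name `prob_black` is hypothetical, not from the slides.

```python
from math import comb

# Assumption: 10 marbles are drawn WITHOUT replacement from a bag of
# 90 white and 10 black, so the number of black marbles drawn follows
# a hypergeometric distribution.
WHITE, BLACK, DRAWN = 90, 10, 10
TOTAL = WHITE + BLACK


def prob_black(k: int) -> float:
    """Probability of drawing exactly k black marbles in DRAWN draws."""
    return comb(BLACK, k) * comb(WHITE, DRAWN - k) / comb(TOTAL, DRAWN)


# Expected number of black marbles: DRAWN * BLACK / TOTAL
expected_black = DRAWN * BLACK / TOTAL
print(f"expected: {DRAWN - expected_black:.0f} white / {expected_black:.0f} black")

for k in range(4):
    print(f"{DRAWN - k} white / {k} black: p = {prob_black(k):.3f}")
```

The 9 white / 1 black split is the single most likely outcome, but 8/2 and 7/3 both have real, nonzero probability, which is exactly the chance variation the test has to account for.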

4
What do we do?

We can compare what we would expect by chance to what we actually observed.
We can make a probabilistic statement about the chances of observing what we did, based on our expectations.
Finally, we test the hypothesis that there is no real difference between what we observed and what we expected (using the 6 steps of hypothesis testing).

         Expected   Observed
White    9          ???
Black    1          ???
5
Basic Assumption of the Null Hypothesis

There is no difference in the population; the difference you observe is just the chance variation of your sample.
$\text{Expected score} - \text{Observed score} = 0 \pm SE$
We are comparing the observed values (the frequencies actually observed in our sample, written fo) to a set of frequencies expected by chance (written fe).

6
Chi Square (χ²)

The test statistic for testing hypotheses that compare 2 or more nominal categories.
The chi-square statistic compares nominal values in a cross-tabulation table, making what are called row-by-column comparisons, or r x c tables.

7
A Nominal variable

is a categorical variable with mutually exclusive categories. For example, gender, where male = 1 and female = 2 (the numbers are labels, not quantities).

8
Approval for President Obama by Race
              BLACKS   WHITES
APPROVE       69       156
DISAPPROVE    21       144
9
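The formula for χ² is
$\chi^2 = \sum \frac{(O - E)^2}{E}$
OR, sometimes written
$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$
where fo is the observed frequency and fe the expected frequency of each category in each cell of a table.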
10
O, or fo, is what we observe in our sample: the observed frequency. NOTE that χ² works with frequencies (counts) in each cell. E, or fe, is the expected frequency: the number of people who would show up in each cell IF the null hypothesis were true, that is, if there were no racial difference in approval and the frequencies were due solely to chance.
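The slides do not spell out how fe is found for a cross-tabulation. The usual rule is fe = (row total × column total) / grand total. A minimal Python sketch applying it to the approval table (the function name `expected_frequencies` is an illustrative assumption):

```python
# Observed counts from the "Approval for President Obama by Race" table.
# Rows: APPROVE, DISAPPROVE; columns: BLACKS, WHITES.
observed = [
    [69, 156],
    [21, 144],
]


def expected_frequencies(table):
    """Expected cell counts under the null of no association:
    fe = (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_frequencies(observed)
for label, row in zip(["APPROVE", "DISAPPROVE"], expected):
    print(label, [round(x, 2) for x in row])
```

With 90 Black and 300 White respondents and 225 approvals overall, the null predicts about 51.9 approving Black respondents, against the 69 observed.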
11
For each cell in the table, we compare what we observe to what we should expect by chance:

Subtract the expected frequency (fe) from the observed frequency (fo) for each cell.
Square each of these deviations.
Divide each squared deviation by the expected frequency of that cell.
Finally, sum these quotients, (fo - fe)² / fe, over all cells to get χ².
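The four steps translate directly into a small function. This is a sketch, not the deck's own code: `chi_square` is a hypothetical helper that takes flat lists of observed and expected counts (an r x c table would be flattened row by row first). The example uses the 8 white / 2 black draw from the marble slide.

```python
def chi_square(observed, expected):
    """Chi-square statistic: the sum of (fo - fe)^2 / fe over all cells.

    Both arguments are flat sequences of cell counts in the same order;
    flatten an r x c table row by row before calling.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    total = 0.0
    for fo, fe in zip(observed, expected):
        deviation = fo - fe        # subtract fe from fo
        squared = deviation ** 2   # square the deviation
        total += squared / fe      # divide by fe and add to the running sum
    return total


# The 8 white / 2 black draw from the marble slide, against 9 / 1 expected.
print(chi_square([8, 2], [9, 1]))
```

Here (8 - 9)²/9 + (2 - 1)²/1 = 1/9 + 1, about 1.11. Note how the small expected count for black marbles makes that cell dominate the total.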

12
The Chi Square statistic tests

Whether the difference between what you observe and what chance would predict is due to sampling error.
The greater the deviation of what we observe from what we would expect by chance, the less likely it is that the difference is due to chance alone.

13
DIFFERENCE BETWEEN EXPENSIVE AND CHEAP BEER

Consumer Reports routinely finds that many people who claim they can taste the difference can't; they are influenced by the label.
How would you test the idea that people cannot really tell the difference, and that they are really responding to the price label information?
How do we disentangle the label effect from taste?

14
What is the null? No difference. We expect rootbeer 1 = rootbeer 2 = rootbeer 3.
Study Design
Sample 150 rootbeer drinkers. Place before them 3 bottles: one labeled with the name of a well-known high-priced rootbeer, another with a medium-priced rootbeer, and the third with a low-priced rootbeer. The bottles are counterbalanced to control for order effects. All 150 subjects taste each rootbeer and state a preference.
15
The Full Table
               High Priced RootBeer   Medium Priced RootBeer   Low Priced RootBeer
Observed (fo)  77                     41                       32
Expected (fe)  50                     50                       50
16
Step 1. Hypothesis. Null: the proportions preferring each rootbeer should be equal IF the rootbeers are indeed equal and preferences are not influenced by the label. Here, chance would predict 50 people in each group (150 / 3) if the label did not matter. If the label does not matter, the ratio of O to E should be the same across all 3 comparisons; that is, the O/E ratios in each column should be the same. Our alternative hypothesis is that preferences will follow the status of the rootbeer: rootbeer 1 > rootbeer 2 > rootbeer 3.
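A short Python sketch of the Step 1 logic, using the observed counts from the full table: under the null, each of the k = 3 labels expects 150 / 3 = 50 preferences, and every O/E ratio should sit near 1. Variable names are illustrative.

```python
# Rootbeer taste test: observed preferences from the full table.
labels = ["High priced", "Medium priced", "Low priced"]
observed = [77, 41, 32]

n = sum(observed)          # 150 subjects
k = len(observed)          # 3 price labels
expected = [n / k] * k     # null: equal preference, 50 per label

# Under the null, every O/E ratio should be close to 1.
ratios = [fo / fe for fo, fe in zip(observed, expected)]
for label, ratio in zip(labels, ratios):
    print(f"{label}: O/E = {ratio:.2f}")
```

The ratios come out to 1.54, 0.82 and 0.64, far from equal; Steps 2 to 6 decide whether that gap is bigger than chance would produce.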
17
Step 2. The Distribution. Since we are interested in the effect of one nominal variable on another nominal variable, the χ² distribution is appropriate: we are doing a row-by-column (r x c) analysis.
Step 3. Level of Significance. Set alpha at .05 for 95% confidence.
18
Step 4. Determine the Critical Value of χ². The chi-square distribution changes shape with the degrees of freedom, just as the t distribution does. The degrees of freedom change as a function of the number of comparisons made.
19
Formula for degrees of freedom of χ²:
$df = (r - 1)(c - 1)$
where r = number of rows and c = number of columns.
For a one-way chi-square like the rootbeer test, use df = k - 1, where k is the number of categories. We have 3 categories, so df = 3 - 1 = 2.
Step 5. Decision. Let's fill in the table.

20
RootBeer        Hi Priced   Med Priced   Lo Priced
Observed        77          41           32
Expected        50          50           50
O - E           27          -9           -18
(O - E)²        729         81           324
(O - E)² / E    14.58       1.62         6.48
χ² = Σ(O - E)² / E = 14.58 + 1.62 + 6.48 = 22.68
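The same table can be rebuilt column by column in a minimal Python sketch (labels and print layout are illustrative), which is a handy way to check the hand arithmetic:

```python
labels = ["Hi Priced", "Med Priced", "Lo Priced"]
observed = [77, 41, 32]
expected = [50, 50, 50]

rows = []
for label, o, e in zip(labels, observed, expected):
    diff = o - e                          # O - E
    rows.append((label, diff, diff ** 2, diff ** 2 / e))

for label, diff, sq, contrib in rows:
    print(f"{label:<10}  O-E = {diff:>3}  (O-E)^2 = {sq:>3}  (O-E)^2/E = {contrib:.2f}")

chi2 = sum(row[3] for row in rows)        # sum the last column
print(f"chi-square = {chi2:.2f}")
```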
21
Look up our χ² value of 22.68 in a chi-square table at 2 df. We find that 22.68 is beyond even the .01 significance level. The probability is p < .0005; that is, if the null hypothesis were true, fewer than 5 in 10,000 samples of the same size would produce a difference this big just by chance.
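For 2 df the table lookup can be checked exactly: a chi-square variable with 2 degrees of freedom is exponential with mean 2, so P(χ² ≥ x) = e^(-x/2). A minimal sketch (the helper name `chi2_sf_df2` is hypothetical and valid only for df = 2):

```python
import math


def chi2_sf_df2(x: float) -> float:
    """Upper-tail probability P(chi-square >= x) for exactly 2 df.

    A chi-square variable with 2 df is exponential with mean 2,
    so its upper tail is exp(-x / 2).
    """
    return math.exp(-x / 2)


p = chi2_sf_df2(22.68)
print(f"p = {p:.6f}")
print("p < .0005:", p < 0.0005)
```

This gives p ≈ 0.000012, comfortably below .0005.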
22
Step 6. Interpret. The chi-square value of 22.68 is beyond the critical value of 5.991 (alpha = .05, 2 df). Therefore, reject the null hypothesis of equality. People do respond to price label information.
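The critical value of 5.991 can be recovered from the same closed form: setting e^(-x/2) = alpha gives x = -2 ln(alpha). A sketch of the Step 6 decision, valid only for df = 2 (k = 3 categories, df = k - 1):

```python
import math

alpha = 0.05
k = 3              # rootbeer price labels
df = k - 1         # one-way chi-square: df = k - 1

if df != 2:
    raise ValueError("the closed form below holds only for df = 2")

# For 2 df, P(chi-square >= x) = exp(-x / 2); solving exp(-x / 2) = alpha
# gives the critical value x = -2 * ln(alpha).
critical = -2 * math.log(alpha)
chi2_stat = 22.68

print(f"critical value = {critical:.3f}")
print("reject the null" if chi2_stat > critical else "fail to reject the null")
```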
23
Summing up the properties of the χ² Distribution

The χ² distribution ranges from zero to some positive value, i.e., from no difference to some big difference.
The χ² distribution is not symmetrical, but skewed to the right, from zero to a large positive χ². Chi square measures how far the observed values depart from zero difference. Its value depends on the number of comparisons made, that is, the number of df. Note that the critical value of chi square gets bigger as the df get bigger: the more comparisons made, the more likely you are to find differences by chance, so df corrects for this.
There are many different χ² distributions. Like the t distribution, χ² varies with degrees of freedom.
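To see the critical value grow with df without a printed table, here is a standard-library Python sketch. It computes the chi-square upper tail from the series for the regularized lower incomplete gamma function, then finds the alpha = .05 critical value by bisection. The helper names are hypothetical; in practice `scipy.stats.chi2.sf` and `scipy.stats.chi2.ppf` do the same job.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(chi-square >= x) with df degrees of freedom.

    Uses the series for the regularized lower incomplete gamma function
    P(a, z) with a = df / 2 and z = x / 2; adequate for moderate x.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return 1.0 - lower


def chi2_critical(alpha: float, df: int) -> float:
    """Value x with chi2_sf(x, df) == alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:   # widen the bracket until it holds the root
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


for df in range(1, 6):
    print(df, round(chi2_critical(0.05, df), 3))
```

For df = 1 through 5 this should reproduce the familiar table values 3.841, 5.991, 7.815, 9.488 and 11.070, rising steadily with df.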

24
Another Example

Levels of political activism by ideology
Are conservative college students more likely to participate in activism on campus?
If this is true, we should see a disproportionate number of conservative student activists. If not, the distribution of activists by ideology should reflect chance alone.

25
Student Activists
                Observed   Expected
Conservative    33         20
Liberal         7          20
Total           40         40
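Null hypothesis: activists are split evenly by ideology (20 conservative, 20 liberal).
Alternative hypothesis: conservatives are overrepresented among activists.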
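The slides stop before running this test, so here is a sketch applying the same steps to the activist table. For 1 df, a chi-square variable is a squared standard normal, so P(χ² ≥ x) = erfc(sqrt(x / 2)), which Python's `math.erfc` provides; everything else mirrors the rootbeer calculation.

```python
import math

# Student activists table: observed counts and the expected 20 / 20 split.
observed = [33, 7]      # conservative, liberal
expected = [20, 20]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # one-way chi-square: k - 1 = 1

# For 1 df, a chi-square variable is a squared standard normal, so
# P(chi-square >= x) = erfc(sqrt(x / 2)).
p = math.erfc(math.sqrt(chi2 / 2))

print(f"chi-square = {chi2:.2f}, df = {df}, p = {p:.6f}")
```

This gives χ² = 16.9 with 1 df, well beyond the .05 critical value of 3.841.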
26