Title: ChiSquare
1Chi-Square
2Uses of chi-square
- a mathematical distribution
- a test for categorical data
- (more generally) a test for the fit of data to a
theory
3The Chi-Square Distribution
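$$f(\chi^2; k) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\,(\chi^2)^{k/2-1}\,e^{-\chi^2/2}$$
where
$$\Gamma(z) = \int_0^\infty t^{z-1} e^{-t}\,dt$$
or
$$\Gamma(z) = (z-1)!$$
for z a positive integer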
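As a rough illustration, the chi-square density with k degrees of freedom can be evaluated directly from the gamma function. The function name `chi2_pdf` and the evaluation points are assumptions made for this sketch, not part of the lecture.

```python
import math


def chi2_pdf(x, k):
    """Chi-square density with k degrees of freedom, for x > 0 (hypothetical helper)."""
    if x <= 0:
        return 0.0
    # x^(k/2 - 1) * exp(-x/2) / (2^(k/2) * Gamma(k/2))
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))


# Evaluate a few members of the family at arbitrary points.
for k in (1, 2, 4, 8):
    print(k, [round(chi2_pdf(x, k), 4) for x in (0.5, 2.0, 8.0)])
```

For integer arguments `math.gamma(z)` equals `(z - 1)!`, so the same function covers both forms of the gamma function.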
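4(matched in beauty only by Student's t)
$$f(t; \nu) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1+\frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}$$
where $\nu$ is the degrees of freedom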
5the chi-square family
[Figure: chi-square densities for k = 1, 2, 4 and 8]
6Continuous vs. Discrete distributions
[Figure: chi-square density, k = 8]
- continuous distributions: these have probability density
- discrete distributions: these have probability
7from probabilities to densities
[Figure: chi-square density, k = 8]
8(familiar?) question
Given a large set of data converted to z-scores,
the mean and variance will obviously be 0 and 1.
Now if we square all our scores and compute the
mean, what will it be?
9chi-square and z
[Figure: distribution of the squared z-scores; mean = 0.989, sd = 1.390]
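The question about squared z-scores can be checked with a small simulation. This is only a sketch: it draws values from a standard normal distribution instead of standardizing a real data set, and the seed and sample size are arbitrary choices.

```python
import random
import statistics

random.seed(1)  # arbitrary seed, only for reproducibility

# Stand-in for "a large set of data converted to z-scores".
z = [random.gauss(0, 1) for _ in range(100_000)]
z_squared = [value * value for value in z]

mean_sq = statistics.fmean(z_squared)
sd_sq = statistics.stdev(z_squared)
print(round(mean_sq, 3), round(sd_sq, 3))
```

The mean of the squared scores should land near 1 and their standard deviation near the square root of 2 (about 1.41), which is in line with the values reported on the slide.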
10chi-square and z (cont.)
so $z^2$ is distributed as $\chi^2_1$
and, further,
$\sum_{i=1}^{k} z_i^2$ is distributed as $\chi^2_k$
(and this is the really useful bit)
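A quick simulation can make the "sum of squares" claim plausible. The sketch below assumes independent standard normal scores and an arbitrary k = 4; a chi-square variable on k degrees of freedom has mean k and variance 2k, so those are the two numbers to look for.

```python
import random
import statistics

random.seed(2)  # arbitrary seed
k = 4           # number of squared z-scores per sum (arbitrary choice)

# Each entry is the sum of k independent squared standard normal scores.
sums = [sum(random.gauss(0, 1) ** 2 for _ in range(k)) for _ in range(50_000)]

mean_sum = statistics.fmean(sums)
var_sum = statistics.variance(sums)
print(round(mean_sum, 2), round(var_sum, 2))
```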
11why is this useful?
$\sum_{i=1}^{k} \left(\frac{x_i - \mu}{\sigma}\right)^2$ is distributed as $\chi^2_k$
Because, basically, it tells us the sampling
distribution of the difference between what we
expected and what we observed. As is always the
case, if the form of a sampling distribution is
known, a lot of work is saved.
12example: how well do these data fit my theory?
13how well do these data fit my theory?
- let $\mu$ stand for the expected value
- calculate $s^2$ in the regular way
- calculate
and compare it to the known sampling distribution
14What does that tell us?
"How well do these data fit my theory?" turned
into "Given my theory, what are the odds of
observing data this bad or worse?" or, more
technically, "Assuming that my theory is correct,
what is the probability of observing a departure
from the theory at least as large as that
represented by the data?"
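The "this bad or worse" question can be sketched by brute force. Everything specific below is hypothetical: the theory (scores normally distributed with mean 100 and standard deviation 15), the six data values, and the helper names `departure` and `simulated_p_value`. The idea is simply to measure the departure as a sum of squared z-scores and count how often data generated under the theory depart at least as much.

```python
import random


def departure(data, mu, sigma):
    """Sum of squared standardized deviations from the theory's expected value."""
    return sum(((x - mu) / sigma) ** 2 for x in data)


def simulated_p_value(observed, k, draws=20_000, seed=0):
    """Share of simulated departures (k squared z-scores) at least as large as observed."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        simulated = sum(rng.gauss(0, 1) ** 2 for _ in range(k))
        if simulated >= observed:
            hits += 1
    return hits / draws


data = [112, 95, 130, 88, 104, 121]  # hypothetical observations
mu, sigma = 100, 15                  # hypothetical theory
stat = departure(data, mu, sigma)
p = simulated_p_value(stat, k=len(data))
print(round(stat, 2), round(p, 3))
```

In practice the simulation is unnecessary, because the sampling distribution is known to be chi-square; the loop only shows what the tabled probability stands for.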
15Interpretation
- a large $\chi^2$ is bad news for your theory
- a small $\chi^2$ is good
16Have we been here before?
- yes
- substitute "null hypothesis" for "my theory"
- flip the interpretation of large and small values of
the statistic
17The chi-square goodness-of-fit test
(and here we go again)
Does the sign on our rear door have any effect?
[Table: observed and expected frequencies of use for the rear and front doors]
18The one-way calculation
[Table: observed and expected frequencies of use for the rear and front doors]
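$\chi^2 = \frac{3.5^2}{13.5} + \frac{3.5^2}{13.5} = 1.81$
19chi-square calc. (cont.)
[Figure: chi-square distribution with k = 1; the area beyond $\chi^2 = 1.81$ is p = 0.18]
so if people ordinarily use both doors with
equal frequency, then we would see behavior at
least this aberrant almost 1 time out of 5.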
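The one-way calculation can be reproduced in a few lines. The observed counts of 17 and 10 are an assumption: they are inferred from the expected frequency of 13.5 per door and the deviation of 3.5 that appear in the calculation, and which door received which count is not recoverable here. For one degree of freedom the tail probability can be obtained from the complementary error function, so no statistics library is needed.

```python
import math

observed = [17, 10]  # assumed counts, inferred from E = 13.5 and |O - E| = 3.5
expected = [sum(observed) / len(observed)] * len(observed)  # equal use of both doors

# Pearson chi-square: sum of (O - E)^2 / E over the categories.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# With 1 df, P(chi-square > x) = P(|z| > sqrt(x)) = erfc(sqrt(x / 2)).
p = math.erfc(math.sqrt(chi2 / 2))
print(round(chi2, 2), round(p, 2))
```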
20Contingency Table Analysis
$E_{ij} = \frac{R_i C_j}{N}$
[Table: observed frequencies, with expected frequencies in parentheses]
21Expected frequencies
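$p(\text{low}) = R/N = 177/358$
$p(\text{guilty}) = C/N = 258/358$
if the two classifications are independent,
$p(\text{low, guilty}) = (R/N)(C/N) = RC/N^2$
$E(\text{low, guilty}) = p(\text{low, guilty}) \cdot N = RC/N$
which is the general rule $E_{ij} = R_i C_j / N$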
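The arithmetic behind the expected frequency for the (low, guilty) cell can be checked directly from the marginal totals quoted above. The variable names are illustrative only.

```python
row_total = 177  # total for the "low" row
col_total = 258  # total for the "guilty" column
n = 358          # grand total

# Under independence, the joint probability is the product of the marginals.
p_low = row_total / n
p_guilty = col_total / n
p_joint = p_low * p_guilty

expected_from_probability = p_joint * n
expected_from_rule = row_total * col_total / n  # E = R * C / N

print(round(expected_from_probability, 2), round(expected_from_rule, 2))
```

Both routes give the same expected count, roughly 127.56, which is the point of the shortcut formula.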
22Contingency table chi-square
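$\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$
is distributed on $(R-1)(C-1)$ df
[Figure: chi-square distribution with k = 1; 0.99 of the area lies below 6.6 and 0.01 above it; the observed $\chi^2 = 35.93$ lies far beyond (hm)]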
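A sketch of the full contingency-table calculation follows. The four cell counts are an assumption: the table itself is not reproduced in the text, and these values were chosen because they agree with the quoted marginal totals (177, 258, N = 358) and with the reported statistic. The labels for the second row and second column are likewise hypothetical.

```python
import math

# Assumed 2 x 2 table: rows = (low, other level), columns = (guilty, not guilty).
observed = [[153, 24],
            [105, 76]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency for each cell: E_ij = R_i * C_j / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(row_totals))
    for j in range(len(col_totals))
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Tail probability via erfc; this shortcut is valid only because df = 1 here.
p = math.erfc(math.sqrt(chi2 / 2))
print(round(chi2, 2), df, p)
```

The statistic is far beyond the 0.01 critical value of about 6.6, so the tail probability is tiny.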
23other stuff
- continuity correction (it exists)
- small expected values (be careful)
- Assumptions
  - independence
  - normality (watch for small expected frequencies)
  - including non-occurrences (make sure the data you
    analyze total N)
24Likelihood Ratio Tests
(not yet)
25Measures of Association
- Cramér's phi - chi-square (un)weighted by N
- Odds ratios - good for reporting low relative
frequencies
odds ratio = (a/b)/(c/d) = 1.8
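Both measures are one-liners once the table is in hand. In this sketch the chi-square value and N are the ones reported earlier for the contingency table, which is assumed to be 2 x 2. The cell counts a, b, c, d are hypothetical and were picked only so that the odds ratio comes out at the 1.8 shown above.

```python
import math

# Cramér's phi: chi-square divided by N times the smaller of (rows - 1, cols - 1).
chi2, n = 35.93, 358
rows, cols = 2, 2  # assumed table size
cramers_phi = math.sqrt(chi2 / (n * min(rows - 1, cols - 1)))

# Odds ratio from a 2 x 2 table with cells a, b (first row) and c, d (second row).
a, b, c, d = 18, 10, 10, 10  # hypothetical counts
odds_ratio = (a / b) / (c / d)

print(round(cramers_phi, 3), round(odds_ratio, 2))
```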
26That variance thing
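Sample variance is unbiased, but is distributed
as
$$s^2 \sim \frac{\sigma^2\,\chi^2_{N-1}}{N-1}$$
where $N$ is the sample size and $\sigma^2$ is the population variance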
s1.0 s1.06
s1.0 s1.02
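The consequence of that skewed sampling distribution can be seen in a small simulation. This is a sketch under assumed settings (normal population with variance 1, samples of size 5, arbitrary seed); `sample_variance` is a helper written for this example.

```python
import random
import statistics


def sample_variance(values):
    """Unbiased sample variance (divides by N - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


random.seed(3)   # arbitrary seed
sigma = 1.0      # population standard deviation (assumed)
sample_size = 5  # small N makes the skew easy to see

variances = [
    sample_variance([random.gauss(0, sigma) for _ in range(sample_size)])
    for _ in range(20_000)
]

mean_var = statistics.fmean(variances)
median_var = statistics.median(variances)
share_below = sum(1 for v in variances if sigma ** 2 > v) / len(variances)
print(round(mean_var, 3), round(median_var, 3), round(share_below, 3))
```

The average of the sample variances sits close to the population variance, which is what unbiased means, yet more than half of the individual samples underestimate it because the distribution is positively skewed.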