Title: Fitting probability models to frequency data
1Fitting probability models to frequency data
2Review - proportions
- Data: a discrete nominal variable with two states (success and failure)
- You can do two things:
- Estimate a parameter with a confidence interval
- Test a hypothesis
3Estimating a proportion
4Confidence interval for a proportion
where Z = 1.96 for a 95% confidence interval
The Agresti-Coull method
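As a rough illustration of the Agresti-Coull method, the sketch below uses the common "add two successes and two failures" form: p' = (X + 2)/(n + 4), with interval p' ± Z·sqrt(p'(1 - p')/(n + 4)). The function name and the example counts (10 successes in 20 trials) are assumptions for illustration, not values from the slides.

```python
import math

def agresti_coull_interval(successes, n, z=1.96):
    """Approximate confidence interval for a proportion (plus-four form)."""
    # Adjusted estimate: add 2 successes and 2 failures.
    p_adj = (successes + 2) / (n + 4)
    margin = z * math.sqrt(p_adj * (1 - p_adj) / (n + 4))
    return p_adj - margin, p_adj + margin

# Hypothetical data: 10 successes in 20 trials.
lower, upper = agresti_coull_interval(10, 20)
print(f"95% CI: {lower:.3f} to {upper:.3f}")
```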
5Hypothesis testing
- Want to know something about a population
- Take a sample from that population
- Measure the sample
- What would you expect the sample to look like under the null hypothesis?
- Compare the actual sample to this expectation
6(Figure: outcomes labelled "not so weird" and "weird")
7Sample -> Test statistic
Null hypothesis -> Null distribution
Compare: how unusual is this test statistic?
P > 0.05: fail to reject H0
P < 0.05: reject H0
8Binomial test
9Test statistic
- For the binomial test, the test statistic is the
number of successes
10Binomial test
11The binomial distribution
12Binomial distribution, n = 20, p = 0.5
(Figure; horizontal axis: x)
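The probabilities in this distribution can be computed from the standard binomial formula, Pr[X] = C(n, X) p^X (1 - p)^(n - X). A minimal sketch for n = 20 and p = 0.5 follows; the function name is an assumption for illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 20, 0.5
probabilities = [binomial_pmf(x, n, p) for x in range(n + 1)]
for x, pr in enumerate(probabilities):
    print(f"{x:2d}  {pr:.5f}")
```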
13Binomial distribution, n = 20, p = 0.5
(Figure; horizontal axis: x, with the test statistic marked)
14P-value
- P-value: the probability of obtaining the data, or data showing as great or greater a difference from the null hypothesis, if the null hypothesis were true
15P-value
- Add up the probabilities from the null distribution
- Start at the test statistic, and go towards the tail
- Multiply by 2 for a two-tailed test
16Binomial distribution, n = 20, p = 0.5
$P = 2\,(\Pr[16] + \Pr[17] + \Pr[18] + \Pr[19] + \Pr[20])$
(Figure; horizontal axis: x)
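The sum on this slide can be checked directly. The sketch below assumes an observed test statistic of 16 successes in n = 20 trials with a null proportion of 0.5, and doubles the upper-tail probability for a two-tailed test; the variable names are illustrative.

```python
from math import comb

n, p0, observed = 20, 0.5, 16

# Upper tail: Pr[16] + Pr[17] + Pr[18] + Pr[19] + Pr[20] under the null.
upper_tail = sum(
    comb(n, x) * p0**x * (1 - p0)**(n - x) for x in range(observed, n + 1)
)

# Two-tailed test: multiply by 2 (capped at 1).
p_value = min(1.0, 2 * upper_tail)
print(f"P = {p_value:.4f}")
```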
18N = 20, p0 = 0.5
This is a pain.
19Calculating P-values
- By hand
- Use computer software such as JMP or Excel
- Use tables
21Discrete distribution
- A probability distribution describing a discrete
numerical random variable
22Discrete distribution
- A probability distribution describing a discrete
numerical random variable
- Examples:
- Number of heads from 10 flips of a coin
- Number of flowers in a square meter
- Number of disease outbreaks in a year
23χ² Goodness-of-fit test
- Compares counts to a discrete probability
distribution
24Hypotheses for χ² test
25Test statistic for χ² test
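For reference, the standard χ² goodness-of-fit statistic sums, over the categories i, the squared deviation of each observed count from its expected count, divided by the expected count:

```latex
\chi^2 = \sum_{i} \frac{(\mathrm{Observed}_i - \mathrm{Expected}_i)^2}{\mathrm{Expected}_i}
```

Here Expected_i is the count predicted for category i under the null hypothesis.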
28Hypotheses for day of birth example
30The calculation for Sunday
32The sampling distribution of χ² by simulation
(Figure; axes: Frequency and χ²)
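One way such a simulated sampling distribution could be produced is sketched below. It assumes, purely for illustration, 350 observations falling into 7 equally likely categories under the null hypothesis. The sample size, the equal probabilities, the number of replicates, and the seed are all assumptions, not values from the slides.

```python
import random
from collections import Counter

def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

random.seed(1)                       # arbitrary seed, for repeatability
n_categories, n_obs, n_reps = 7, 350, 2000
expected = [n_obs / n_categories] * n_categories

simulated = []
for _ in range(n_reps):
    # Draw one sample under the null: every category equally likely.
    draws = random.choices(range(n_categories), k=n_obs)
    counts = Counter(draws)
    observed = [counts[c] for c in range(n_categories)]
    simulated.append(chi_square(observed, expected))

mean_chi2 = sum(simulated) / n_reps
frac_extreme = sum(s >= 12.59 for s in simulated) / n_reps
print(f"mean = {mean_chi2:.2f}, fraction >= 12.59: {frac_extreme:.3f}")
```

With these settings the mean should land near 6, the degrees of freedom for 7 categories, and roughly 5% of the simulated values should reach 12.59.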
33Sampling distribution of χ² by the χ² distribution
34Degrees of freedom
- The number of degrees of freedom specifies which
of a family of distributions to use as the
sampling distribution
35Degrees of freedom for χ² test
df = (Number of categories) - 1 - (Number of
parameters estimated from the data)
36Degrees of freedom for day of birth
df = 7 - 1 - 0 = 6
37Finding the P-value
38Critical value
The value of the test statistic where P = α.
39Critical value for df = 6 and α = 0.05: 12.59
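The value 12.59 can be checked without tables. For an even number of degrees of freedom, the upper-tail probability of the χ² distribution has a closed form: P(χ² ≥ x) = e^(-x/2) · Σ (x/2)^j / j!, summed over j = 0 to df/2 - 1. The sketch below applies it with df = 6; the function name is illustrative, and the formula as written is only valid for even df.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(chi-square with even df >= x), using the closed-form series."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2
    return math.exp(-half) * sum(
        half**j / math.factorial(j) for j in range(df // 2)
    )

alpha_at_critical = chi2_upper_tail_even_df(12.59, 6)
print(f"P(chi2 >= 12.59 | df = 6) = {alpha_at_critical:.4f}")
```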
42P < 0.05, so we can reject the null hypothesis.
Babies in the US are not born randomly with respect to the day of the week.
44Assumptions of χ² test
- No more than 20% of categories have Expected < 5
- No category with Expected ≤ 1
45χ² test as approximation of binomial test
- If the number of data points is large, then a χ² goodness-of-fit test can be used in place of a binomial test.
- See text for an example.
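To give a feel for how close the approximation can be, the sketch below runs both tests on made-up data: 60 successes in 100 trials with a null proportion of 0.5. The data are hypothetical. With two categories the χ² test has df = 1, and for df = 1 the upper-tail probability is erfc(sqrt(χ²/2)).

```python
import math
from math import comb

n, successes, p0 = 100, 60, 0.5      # hypothetical data

# Chi-square goodness-of-fit with two categories (success, failure).
observed = [successes, n - successes]
expected = [n * p0, n * (1 - p0)]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
p_chi2 = math.erfc(math.sqrt(chi2 / 2))   # upper tail for df = 1

# Exact two-tailed binomial test: double the upper tail.
upper = sum(comb(n, x) * p0**x * (1 - p0)**(n - x)
            for x in range(successes, n + 1))
p_binomial = min(1.0, 2 * upper)

print(f"chi2 = {chi2:.2f}, P (chi2) = {p_chi2:.4f}, P (binomial) = {p_binomial:.4f}")
```

The two P-values come out close but not identical, which is the sense in which the χ² test is an approximation.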
46The Poisson distribution
- Another discrete probability distribution
- Describes the number of successes in blocks of
time or space, when successes happen
independently of each other and occur with equal
probability at every point in time or space
48Poisson distribution
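For reference, the standard Poisson probability of observing X successes in a block of time or space, when the mean number of successes per block is μ, is:

```latex
\Pr[X] = \frac{e^{-\mu}\,\mu^{X}}{X!}
```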
49Example: Number of goals per side in World Cup
Soccer
Q: Is the outcome of a soccer game (at this
level) random? In other words, is the number of
goals per team distributed as expected by pure
chance?
50World Cup 2002 scores
51Number of goals for a team (World Cup 2002)
52What's the mean, μ?
53Poisson with μ = 1.26
54Finding the Expected
Too small!
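Expected counts under a Poisson model are the Poisson probabilities multiplied by the number of observations, and categories whose expected counts are too small get pooled. The sketch below uses μ = 1.26 from the slides. The number of observations (128, i.e. two sides in each of 64 matches) and the choice to pool everything from 4 goals upward are assumptions made for illustration.

```python
import math

def poisson_pmf(x, mu):
    """Poisson probability of exactly x events when the mean is mu."""
    return math.exp(-mu) * mu**x / math.factorial(x)

mu = 1.26          # mean goals per side, from the slides
n_sides = 128      # assumed: 64 matches x 2 sides
pool_from = 4      # assumed: pool 4 or more goals into one category

# Expected counts for 0, 1, 2, 3 goals.
expected = [n_sides * poisson_pmf(x, mu) for x in range(pool_from)]
# The pooled tail gets whatever probability is left over.
expected.append(n_sides - sum(expected))

labels = [str(x) for x in range(pool_from)] + [f"{pool_from}+"]
for label, e in zip(labels, expected):
    print(f"{label:>2} goals: expected {e:.1f}")
```

Pooling this way leaves five categories, which is consistent with the 5 used in the degrees-of-freedom calculation for this example.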
55Calculating χ²
56Degrees of freedom for Poisson
df = (Number of categories) - 1 - (Number of
parameters estimated from the data)
57Degrees of freedom for Poisson
df = (Number of categories) - 1 - (Number of
parameters estimated from the data)
Estimated one parameter, μ
58Degrees of freedom for Poisson
df = (Number of categories) - 1 - (Number of
parameters estimated from the data) = 5 - 1 - 1 = 3
59Critical value
60Comparing χ² to the critical value
So we cannot reject the null hypothesis. There
is no evidence that the score of a World Cup
Soccer game is not Poisson distributed.
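The comparison can also be phrased as a P-value. For df = 3 the upper-tail probability of the χ² distribution has the closed form P(χ² ≥ x) = erfc(sqrt(x/2)) + sqrt(2x/π) · e^(-x/2). The sketch below checks that formula against 7.815, the standard 5% critical value for df = 3 (a textbook table value, not one printed on the slides). The function name is illustrative, and the observed χ² for the World Cup data is not reproduced here.

```python
import math

def chi2_upper_tail_df3(x):
    """P(chi-square with 3 degrees of freedom >= x)."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))

critical_value = 7.815   # standard table value for df = 3, alpha = 0.05
tail = chi2_upper_tail_df3(critical_value)
print(f"P(chi2 >= {critical_value} | df = 3) = {tail:.4f}")
```

Any observed χ² below the critical value gives a tail probability above 0.05, which corresponds to failing to reject the null hypothesis.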