1
Lecture 4: Fitting distributions & goodness of fit
  • Goodness of fit
  • Testing goodness of fit
  • Testing normality
  • An important note on testing normality!

2
Goodness of fit
  • measures the extent to which some empirical
    distribution fits the distribution expected
    under the null hypothesis

3
Goodness of fit: the underlying principle
  • If the match between observed and expected is
    poorer than would be expected on the basis of
    measurement precision, then we should reject the
    null hypothesis.

[Figure: observed vs. expected frequency distributions of fork length
(frequency 0-30 on the y-axis, fork length 20-60 on the x-axis)]
4
Testing goodness of fit: the Chi-square
statistic (X²)
  • Used for frequency data, i.e. the number of
    observations/results in each of n categories
    compared to the number expected under the null
    hypothesis.
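
As a rough illustration (not from the slides; the counts below are made up), X² is the sum of squared deviations of observed from expected counts, each scaled by the expected count:

```python
import numpy as np

# Hypothetical observed counts in 4 categories, and the counts
# expected under a null hypothesis of equal frequencies
observed = np.array([18, 30, 35, 17])
expected = np.array([25, 25, 25, 25])

# Chi-square goodness-of-fit statistic: X^2 = sum((O - E)^2 / E)
X2 = np.sum((observed - expected) ** 2 / expected)
print(X2)  # 9.52 for these made-up counts
```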

5
How to translate X² into p?
  • Compare to the χ² distribution with n - 1 degrees
    of freedom.
  • If p is less than the desired α level, reject the
    null hypothesis.
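
A minimal sketch of that step, assuming SciPy and taking n as the number of categories (as on the slide), so df = n - 1:

```python
from scipy import stats

X2 = 9.52      # statistic from the previous sketch (4 categories)
df = 4 - 1     # n - 1 degrees of freedom

# p = upper-tail area of the chi-square distribution beyond X^2
p = stats.chi2.sf(X2, df)
print(p)       # reject H0 if p < alpha

# scipy.stats.chisquare(observed, expected) does both steps at once
```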

6
Testing goodness of fit: the log likelihood-ratio
Chi-square statistic (G)
  • Similar to X², and usually gives similar results.
  • In some cases, G is more conservative (i.e. will
    give higher p values).
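
A sketch of G for the same made-up counts, assuming SciPy; `power_divergence` with `lambda_="log-likelihood"` computes the log likelihood-ratio statistic:

```python
import numpy as np
from scipy import stats

observed = np.array([18, 30, 35, 17])
expected = np.array([25, 25, 25, 25])

# G = 2 * sum(O * ln(O / E)), compared to the same chi-square distribution
G, p = stats.power_divergence(observed, expected, lambda_="log-likelihood")

# the same value computed by hand
G_manual = 2 * np.sum(observed * np.log(observed / expected))
```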

7
χ² versus the distribution of X² or G
  • For both X² and G, p values are calculated
    assuming a χ² distribution...
  • ...but as n decreases, both deviate more and more
    from χ².

[Figure: the χ² distribution compared with the sampling distributions
of X²/G for small and very small n]
8
Assumptions (X² and G)
  • n is larger than 30.
  • Expected frequencies are all larger than 5.
  • Test is quite robust except when there are only 2
    categories (df = 1).
  • For 2 categories, both X² and G overestimate χ²,
    leading to rejection of the null hypothesis with
    probability greater than α, i.e. the test is
    liberal.

9
What if n is too small, there are only 2
categories, etc.?
  • Collect more data, thereby increasing n.
  • If there are more than 2 categories, combine
    categories.
  • Use a correction factor.
  • Use another test.

10
Corrections for 2 categories
  • For 2 categories, both X² and G overestimate χ²,
    leading to rejection of the null hypothesis with
    probability greater than α (i.e. the test is
    liberal).
  • Continuity correction: add 0.5 to observed
    frequencies.
  • Williams correction: divide the test statistic (G or
    X²) by a correction factor q.
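
The slide leaves the Williams divisor implicit; the sketch below assumes the form q = 1 + (a² - 1)/(6nν) for a categories, n observations and ν = a - 1 degrees of freedom. That particular formula, and the counts used, are my assumptions, not taken from the slide:

```python
import numpy as np
from scipy import stats

observed = np.array([60, 40])          # hypothetical 2-category counts
expected = np.array([50.0, 50.0])
n, a = observed.sum(), len(observed)

# Williams correction: divide G (or X^2) by q before looking up p.
# q = 1 + (a^2 - 1) / (6 * n * nu), with nu = a - 1, is assumed here.
G = 2 * np.sum(observed * np.log(observed / expected))
q = 1 + (a**2 - 1) / (6 * n * (a - 1))
p = stats.chi2.sf(G / q, df=a - 1)
```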

11
The binomial test
  • Used when there are 2 categories.
  • No assumptions.
  • Calculate the exact probability of obtaining N - k
    individuals in category 1 and k individuals in
    category 2, for k = 0, 1, 2, ..., N.

[Figure: binomial distribution with p = 0.5, N = 10; probability vs.
number of observations (0-10)]
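
A sketch of the exact calculation, assuming SciPy (`binomtest` requires SciPy ≥ 1.7); the observed split of 9 vs. 1 is a made-up example:

```python
from scipy import stats

N, p0 = 10, 0.5

# Exact probability of k observations in category 2 (and N - k in
# category 1) for every k = 0, 1, ..., N
probs = [stats.binom.pmf(k, N, p0) for k in range(N + 1)]

# Exact two-sided binomial test for a hypothetical observed split of 9 vs. 1
result = stats.binomtest(9, n=N, p=p0)
print(result.pvalue)
```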
12
An example: sex ratio of beavers
  • H0: sex ratio is 1:1, so p = 0.5 = q
  • p(0 males or 0 females) = .00195
  • p(1 male and 9 females, or 9 males and 1 female) = .0195
  • p(9 or more individuals of the same sex) = .0215, or
    2.15%
  • therefore, reject H0
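
The slide's probabilities can be reproduced from the binomial pmf; a sketch assuming N = 10 beavers under H0: p = q = 0.5:

```python
from scipy import stats

N, p0 = 10, 0.5
pmf = lambda k: stats.binom.pmf(k, N, p0)

p_all_same = pmf(0) + pmf(10)      # 0 males or 0 females: 0.00195
p_nine_one = pmf(1) + pmf(9)       # 9 of one sex and 1 of the other: 0.0195
p_tail = p_all_same + p_nine_one   # 9 or more of the same sex: 0.0215
```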

13
Multinomial test
  • Simple extension of the binomial test to more than 2
    categories.
  • Must specify 2 probabilities, p and q, for the null
    hypothesis, with p + q + r = 1.0.
  • No assumptions...
  • ...but so tedious that in practice X² is used.

14
Multinomial test: segregation ratios
  • Hypothesis: both parents Aa, therefore the
    segregation ratio is 1 AA : 2 Aa : 1 aa.
  • So under H0, p = .25, q = .50, r = .25
  • For N = 60, p < .001
  • Therefore, reject H0.
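
A sketch for N = 60 offspring, using the X² approximation that the slides recommend in practice; the observed counts below are hypothetical, since the slide does not list them:

```python
import numpy as np
from scipy import stats

# H0: 1 AA : 2 Aa : 1 aa, i.e. p = .25, q = .50, r = .25
null_probs = np.array([0.25, 0.50, 0.25])
N = 60
observed = np.array([30, 25, 5])     # hypothetical counts for AA, Aa, aa
expected = null_probs * N            # 15, 30, 15

X2, p = stats.chisquare(observed, expected)

# exact multinomial probability of this particular outcome under H0
p_outcome = stats.multinomial.pmf(observed, n=N, p=null_probs)
```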

15
Goodness of fit: testing normality
  • Since normality is an assumption of many parametric
    statistical tests, testing for normality is often
    required.
  • Tests for normality include X² or G,
    Kolmogorov-Smirnov, Shapiro-Wilk, and Lilliefors.

16
Cumulative distributions
  • Areas under the normal probability density
    function and the cumulative normal distribution
    function

17
X² or G test for normality
  • Put data in classes (histogram) and compute the
    expected frequency of each class from the normal
    distribution.
  • Calculate X².
  • Requires large samples (k_min = 10) and is not
    powerful because of the loss of information.

[Figure: observed histogram vs. frequencies expected under the
hypothesis of a normal distribution]
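
A sketch of the binned approach, assuming SciPy/NumPy and simulated data; the expected class frequencies come from a normal distribution fitted with the sample mean and SD, and ddof accounts for those two estimated parameters:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=45, scale=5, size=200)   # e.g. simulated fork lengths

# Bin the data and compute expected frequencies from the fitted normal
counts, edges = np.histogram(x, bins=8)
mu, sd = x.mean(), x.std(ddof=1)
expected = np.diff(stats.norm.cdf(edges, mu, sd)) * len(x)
expected *= counts.sum() / expected.sum()   # make the totals match

# df = (number of classes) - 1 - 2 estimated parameters
X2, p = stats.chisquare(counts, expected, ddof=2)
```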
18
Non-statistical assessments of normality
  • Do a normal probability plot of normal equivalent
    deviates (NEDs) versus X.
  • If the line appears more or less straight, then the
    data are approximately normally distributed.
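
A sketch using SciPy's probability plot, which plots the ordered observations against normal quantiles (the same idea as NEDs versus X):

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(size=50)              # example data

# An approximately straight line suggests the data are roughly normal
stats.probplot(x, dist="norm", plot=plt)
plt.show()
```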

19
Kolmogorov-Smirnov goodness of fit
  • Compares observed cumulative distribution to
    expected cumulative distribution under the null
    hypothesis.
  • p is based on D_max, the maximum absolute difference
    between observed and expected cumulative relative
    frequencies.
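
A sketch assuming SciPy, with a fully specified (extrinsic) normal null; the data, mean and SD passed in are illustrative assumptions, not estimated from the sample:

```python
import numpy as np
from scipy import stats

x = np.array([4.8, 5.1, 5.3, 5.4, 5.6, 5.7, 5.9, 6.0])   # example data

# D_max = largest absolute difference between the empirical CDF and the
# hypothesized normal CDF; p is derived from D_max
D_max, p = stats.kstest(x, "norm", args=(5.5, 0.4))       # assumed mu, sigma
```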

20
An example: wing length in flies
  • 10 flies with wing lengths: 4, 4.5, 4.9, 5.0,
    5.1, 5.3, 5.5, 5.6, 5.7, 5.8, 5.9, 6.0
  • cumulative relative frequencies: .1, .2, .3, .4,
    .5, .6, .7, .8, .9, 1.0

21
Lilliefors test
  • The KS test is conservative for tests in which the
    expected distribution is based on sample
    statistics.
  • Lilliefors corrects for this to produce a more
    reliable test.
  • Should be used when the null hypothesis is intrinsic
    (parameters estimated from the sample) rather than
    extrinsic.
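
A sketch assuming the `lilliefors` function in statsmodels (statsmodels.stats.diagnostic), which estimates the normal parameters from the sample and adjusts the p value accordingly:

```python
import numpy as np
from statsmodels.stats.diagnostic import lilliefors

rng = np.random.default_rng(3)
x = rng.normal(loc=5.3, scale=0.5, size=30)   # example data

# Intrinsic null hypothesis: mean and SD are estimated from the sample,
# so the plain KS p value would be too conservative; this corrects it
D_max, p = lilliefors(x, dist="norm")
```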

22
An important note on testing normality!
  • When N is small, most tests have low power.
  • Hence, very large deviations are required in
    order to reject the null.
  • When N is large, power is high.
  • Hence, very small deviations from normality will
    be sufficient to reject the null.
  • So, exercise common sense!