Testing observations against an expected distribution

Slides: 14
Provided by: ericvann

Transcript and Presenter's Notes

1
Testing observations against an expected
distribution
You mate two banana slugs that are yellow, but
heterozygous for a recessive pink allele, and
observe 40 offspring.
How can you test whether these results disagree
with a model where pink color is recessive to
yellow color?
Stats terminology:
H0: The data are consistent with the model
HA: The data are not consistent with the model
Goal: Can we reject H0?
2
What is the Chi-square test?
  • The chi-square test is used to test if a sample
    of data came from a population with a specific
    distribution. [1]
  • From Wikipedia [2]: It tests a null hypothesis that
    the relative frequencies of occurrence of
    observed events follow a specified frequency
    distribution. The events are assumed to be
    independent and have the same distribution, and
    the outcomes of each event must be mutually
    exclusive. A simple example is the hypothesis
    that an ordinary six-sided die is "fair", i.e.,
    all six outcomes occur equally often.

[1] NIST/SEMATECH e-Handbook of Statistical Methods,
http://www.itl.nist.gov/div898/handbook/eda/section3/eda35f.htm
[2] Wikipedia, Pearson's chi-square test,
http://en.wikipedia.org/wiki/Pearson%27s_chi-square_test
3
Pearson chi-square test
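  • Summed over the k different outcomes:
    $\chi^2 = \sum_{i=1}^{k} (O_i - E_i)^2 / E_i$,
    where O_i is the observed count and E_i the
    expected count for outcome i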
  • What does this chi-square value mean?
  • How can we determine if this chi-square value is
    significant?
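
The Pearson statistic sums (observed - expected)^2 / expected over the k outcomes. A minimal sketch in Python follows; the function name `pearson_chi_square` is made up for illustration, and the counts are the slug cross used elsewhere in this deck (28 yellow and 12 pink observed, 30 and 10 expected under a 3:1 ratio).

```python
def pearson_chi_square(observed, expected):
    """Return sum((O - E)^2 / E) over all outcome categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Slug cross: 28 yellow and 12 pink observed, 30 and 10 expected.
chi2 = pearson_chi_square([28, 12], [30, 10])
print(round(chi2, 3))  # prints 0.533
```

The statistic alone says nothing about significance; it still has to be compared against the chi-square distribution for the right number of degrees of freedom.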

4
Chi-square tests Example 1
  • You mate two banana slugs that are yellow, but
    heterozygous for a recessive pink allele, and
    observe 40 offspring (28 yellow and 12 pink)
  • Given these phenotypic classes, do the data
    differ significantly from the expectation for a
    Mendelian dominant/recessive allele?
  • Expect a 3:1 dominant:recessive ratio, so
    40 x 3/4 = 30 yellow and 40 x 1/4 = 10 pink
  • How do we know if this chi-square value means the
    data were significantly different from our
    expected values?
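
Worked out by hand, using the counts given on the degrees-of-freedom slide (28 yellow and 12 pink observed) against the 3:1 expectation of 30 and 10, the chi-square value for this example would be:

```latex
\chi^2 = \frac{(28 - 30)^2}{30} + \frac{(12 - 10)^2}{10}
       = \frac{4}{30} + \frac{4}{10}
       \approx 0.133 + 0.400
       = 0.533
```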

5
What is a significant chi-square result?
Significance (aka p-value): how often would you
expect to find a chi-square value at least that
large by chance if H0 were true?
Exact method: You can determine the probability
from the chi-square distribution using a statistics
program/package (or one of the calculators on
various websites found via Google).
Excel: CHIDIST(chi-square value, degrees of freedom)
Typical usage: Compare the chi-square value against
known critical values, i.e. the chi-square values
that correspond to certain useful probabilities
(0.01, 0.001, etc.)
Links: http://www.stat.tamu.edu/west/applets/chisqdemo1.html
will let you see the distribution for various df
(degrees of freedom)
Graph source: HyperStat
(http://davidmlane.com/hyperstat/A100557.html)
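
As a rough illustration of the exact method without a statistics package: for 1 degree of freedom the upper-tail chi-square probability has a closed form in terms of the complementary error function, which Python's standard library provides as `math.erfc`. The helper name `p_value_df1` is hypothetical, and the sketch only covers df = 1; other df values need a general routine or a statistics library.

```python
import math


def p_value_df1(chi2):
    """Upper-tail chi-square probability for 1 degree of freedom.

    Uses the identity P(X >= x) = erfc(sqrt(x / 2)), valid for df = 1 only.
    """
    return math.erfc(math.sqrt(chi2 / 2.0))


# Chi-square value of the 40-offspring slug example (28:12 vs. 30:10).
p = p_value_df1(0.5333)
print(round(p, 2))  # prints 0.47
```

This plays the same role as the right-tail probability that CHIDIST returns for 1 degree of freedom.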
6
Sidetrack Degrees of freedom
  • Degrees of freedom refers to how many values you
    are free to choose in your dataset
  • For 1 variable (e.g., genotypes): since the total
    has to equal N, if you have k categories
    (genotypes), once you know the number of
    observations for the first k-1 categories, the
    kth category must be N minus their sum
  • In the previous example you have 40 individuals,
    and once you know there are 28 yellow then there
    have to be 12 pink; thus, there is only 1 value
    you can choose, or 1 degree of freedom
  • Why it matters: with more degrees of freedom,
    larger chi-square values arise by chance, so the
    same chi-square value is less significant (its
    p-value is higher)
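
To see the effect of degrees of freedom numerically, the sketch below evaluates the upper-tail probability of one chi-square value under df = 1, 2 and 4, using closed-form expressions that hold for those specific df values. The function names and the test value 5.0 are arbitrary choices for illustration.

```python
import math


def p_df1(x):
    """Upper-tail probability for df = 1."""
    return math.erfc(math.sqrt(x / 2.0))


def p_df2(x):
    """Upper-tail probability for df = 2."""
    return math.exp(-x / 2.0)


def p_df4(x):
    """Upper-tail probability for df = 4."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


x = 5.0  # the same chi-square value evaluated under different df
for df, p in ((1, p_df1(x)), (2, p_df2(x)), (4, p_df4(x))):
    print(df, round(p, 3))
```

The p-value grows with df, so a chi-square value that would be significant with 1 degree of freedom need not be significant with 4.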

7
Example critical values
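
The table from this slide did not survive in the transcript. As a stand-in, the sketch below regenerates critical values with the Python standard library only. It assumes integer df, builds the upper-tail probability from a standard recurrence for the incomplete gamma function, and inverts it by bisection; `chi2_sf` and `critical_value` are names invented here, not part of any library.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # exact tail for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # exact tail for df = 1
    # Recurrence Q(a + 1, y) = Q(a, y) + y^a e^(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def critical_value(alpha, df):
    """Chi-square value whose upper-tail probability equals alpha (bisection)."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:   # widen the bracket until it contains the root
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


for df in (1, 2, 3, 4):
    row = [round(critical_value(a, df), 3) for a in (0.05, 0.025, 0.01, 0.001)]
    print(df, row)
```

Each printed row lists the critical values for significance levels 0.05, 0.025, 0.01 and 0.001 at that df.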
8
Example 1
  • Given the observed phenotypic classes (28 yellow,
    12 pink), do the data differ significantly from
    the expectation for a Mendelian
    dominant/recessive allele?
  • Expect a 3:1 dominant:recessive ratio, so
    40 x 3/4 = 30 yellow and 40 x 1/4 = 10 pink
  • The data have 1 degree of freedom, so look up the
    table row for df = 1 (actual p = 0.47); the
    result is not significant, so H0 is not rejected

9
Sample size matters!
  • Expect a 3:1 dominant:recessive ratio, so
    400 x 3/4 = 300 and 400 x 1/4 = 100
  • Critical value is 5.024 for 0.025 significance
    with 1 degree of freedom (actual p-value is
    0.021)
  • Thus, the result is now significant at a 0.025
    significance threshold (even though the ratio of
    yellow:pink is the same as in the previous
    example)
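
The sample-size effect can be checked with a few lines of Python. The observed counts for the larger sample are not in the transcript; 280 yellow and 120 pink are assumed here because they keep the 28:12 ratio and approximately reproduce the quoted p-value. The helper `chi_square_3_to_1` is a hypothetical name.

```python
import math


def chi_square_3_to_1(yellow, pink):
    """Chi-square value and df = 1 p-value against a 3:1 yellow:pink expectation."""
    n = yellow + pink
    expected = (0.75 * n, 0.25 * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip((yellow, pink), expected))
    p = math.erfc(math.sqrt(chi2 / 2.0))  # closed form, valid for df = 1 only
    return chi2, p


# Same 7:3 ratio at two sample sizes (the 280:120 counts are an assumption).
for counts in ((28, 12), (280, 120)):
    chi2, p = chi_square_3_to_1(*counts)
    print(counts, round(chi2, 3), round(p, 3))
```

Multiplying every count by 10 multiplies the chi-square value by 10 as well, which is why the same ratio becomes significant with more offspring.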

10
Example 2 Two traits
  • Analyzing 2 traits (A/a and B/b)
  • Given the observed phenotypic class frequencies,
    do the data differ significantly from a
    two-trait Mendelian dominant/recessive model?
  • If that hypothesis is true, you expect a 9:3:3:1
    phenotypic ratio
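
Expected counts under a 9:3:3:1 ratio follow directly from the total number of offspring. The sketch below uses the total of 160 mentioned on the next slide; the observed counts for this example are not in the transcript, so only the expected side is shown. `expected_counts` is a hypothetical helper name.

```python
def expected_counts(total, ratio):
    """Split a total into expected counts proportional to the given ratio."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


expected = expected_counts(160, (9, 3, 3, 1))
degrees_of_freedom = len(expected) - 1  # the last class is fixed by the total

print(expected)             # prints [90.0, 30.0, 30.0, 10.0]
print(degrees_of_freedom)   # prints 3
```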

11
Example 2 Two traits
  • In this case, there are 3 degrees of freedom (as
    there are 4 possible phenotypic classes, but the
    4th must be 160 minus the sum of the first three)
  • Thus, you would reject the model at the 0.01
    significance level, but not at the 0.001
    significance level (actual p-value = 0.0024)

12
Multiple hypothesis testing
  • What does a significance of 0.01 mean?
  • You expect to observe a spurious result as
    significant as your result 1 out of every 100
    times
  • What happens if you test many thousands of
    different genes/traits/etc?

13
  • As an example: you want to find differentially
    expressed genes on a microarray
  • Take a typical Affymetrix human expression
    microarray: 20,000 genes
  • Using some algorithm, you find 300 genes
    significantly enriched at p < 0.01
  • The problem: p < 0.01 means that 1 of every 100
    genes will be that significant by chance; thus,
    you would expect 0.01 x 20,000 = 200 genes to be
    that significant just by random chance! In other
    words, about two-thirds of your 300 genes are
    expected to be false positives that probably
    won't be interesting for further study
  • Multiple hypothesis correction: many different
    methods depending on your specific topic/question
    (more complicated than required for this class;
    for more info consult a statistics
    person/textbook)
  • Simplest (and most conservative) method:
    Bonferroni correction
  • If you are testing 20,000 genes, multiply every
    p-value by 20,000 to get an adjusted p-value
  • Thus, if you want an adjusted error rate of 0.01,
    you would look for genes with an original
    p < 5x10^-7
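
The arithmetic behind the microarray example and the Bonferroni correction can be sketched as follows. The variable names are illustrative, and the false-positive estimate assumes, as the slide does, that none of the 20,000 genes is truly differentially expressed. Capping the adjusted p-value at 1 is a common convention rather than something stated in the slides.

```python
n_tests = 20_000   # genes on the array
alpha = 0.01       # per-test significance threshold
n_hits = 300       # genes called significant

# Expected number of genes passing the threshold by chance alone.
expected_false_positives = alpha * n_tests
false_fraction = expected_false_positives / n_hits

# Bonferroni: per-test threshold that keeps the adjusted error rate at alpha.
bonferroni_threshold = alpha / n_tests


def bonferroni_adjust(p, n):
    """Adjusted p-value: original p times the number of tests, capped at 1."""
    return min(1.0, p * n)


print(expected_false_positives)
print(round(false_fraction, 2))
print(bonferroni_threshold)
print(bonferroni_adjust(1e-7, n_tests))
```

A gene with an original p-value of 1e-7 gets an adjusted p-value of about 0.002 and still passes a 0.01 cutoff, while a gene at p = 0.001 is adjusted to 1 and does not.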