STANDARD DISTRIBUTIONS - Examples/Extensions - PowerPoint PPT Presentation

About This Presentation

Slides: 18
Provided by: markush

Transcript and Presenter's Notes


1
STANDARD DISTRIBUTIONS - Examples/Extensions
  • GENETIC LINKAGE and MAPPING
  • Linkage Phase - chromatid associations of
    alleles of linked loci: alleles on the same
    chromosome are in coupling, on different
    (homologous) chromosomes in repulsion
  • Genetic Recombination - define R.F.
    (recombination fraction, from gametes or
    phenotypes). Homologous case - the greater the
    distance, the greater the chance of recombining.
    High interference is a problem for multiple-locus
    models. R.F.s between loci are not additive, so a
    Mapping Function is needed.
  • Haldane's Mapping Function
  • Assume crossovers occur randomly along
    chromosome length with average number $\lambda$;
    model as Poisson
  • $P\{\text{No crossover}\} = e^{-\lambda}$ and
    $P\{\text{Crossover}\} = 1 - e^{-\lambda}$

2
Examples - continued
  • $P\{\text{recombinant}\} = 0.5 \times P\{\text{Crossover}\}$
    (each pair of homologs with one crossover
    results in one-half recombinant gametes)
  • Define Expected No. recombinants as mapping
    function ($m = 0.5\lambda$, with $\lambda$ the
    average number of crossovers)
  • R.F. $r = 0.5\,(1 - e^{-2m})$ (form of Haldane's M.F.)
  • with inverse $m = -0.5 \ln(1 - 2r)$,
    converting an estimated R.F. to Haldane's
    map distance
  • Thus, for locus order ABC:
    $m_{AC} = m_{AB} + m_{BC}$
    (since $m_{AB} = -0.5 \ln(1 - 2r_{AB})$, etc.)
  • Substituting for each of these gives the
    usual relationship between R.F.s (no
    interference situation):
    $r_{AC} = r_{AB} + r_{BC} - 2\,r_{AB}\,r_{BC}$
  • Net Effect - transform to straight line: $m_{AC}$ vs
    $m_{AB}$ or $m_{BC}$
  • In practice - too simple; only applies to
    specific conditions; may not relate directly to
    physical distance (common M.F. problem).
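
The mapping function and its inverse above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names and the two R.F. values are hypothetical, and distances are taken to be in Morgans.

```python
import math

def crossover_prob(lam):
    """P{at least one crossover} when the crossover count is Poisson(lam)."""
    return 1.0 - math.exp(-lam)

def haldane_rf(m):
    """Recombination fraction r for a Haldane map distance m (Morgans)."""
    return 0.5 * (1.0 - math.exp(-2.0 * m))

def haldane_distance(r):
    """Inverse mapping: map distance m for a recombination fraction r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * r)

# Hypothetical estimated R.F.s for adjacent intervals A-B and B-C (order ABC).
r_ab, r_bc = 0.10, 0.20

# Map distances are additive along the chromosome ...
m_ac = haldane_distance(r_ab) + haldane_distance(r_bc)

# ... whereas the R.F.s are not: r_ac is smaller than r_ab + r_bc.
r_ac = haldane_rf(m_ac)
```

Converting back from `m_ac` reproduces the no-interference relationship $r_{AC} = r_{AB} + r_{BC} - 2\,r_{AB}\,r_{BC}$, which is one way to check an implementation.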

3
Example expanded - LINKAGE/MAPPING
  • Genetic Map - models the linear arrangement of a
    group of genes / markers (easily identified
    genetic features, e.g. a change in a known gene,
    or a piece of DNA with no known function). Map
    based on homologous recombination during meiosis.
    If two or more markers are located close together
    on a chromosome, their alleles are usually
    inherited together through meiosis.
  • 4 steps after marker data obtained:
    (1) Pairwise linkage - all 2-locus combinations
    (based on observed and expected frequencies of
    genotypic classes).
    (2) Grouping markers into Linkage Groups (based
    on R.F.s, significance level etc.). No. of
    linkage groups should approach the haploid no. of
    chromosomes for the organism.
    (3) Ordering of markers (key step,
    computationally demanding, precision important).
    (4) Estimation of multipoint R.F.s (physical
    distance = no. of DNA base pairs between two
    genes, vs. map distance = transformation of R.F.).
  • Ultimate physical map = DNA sequence (restriction
    map also common)

4
Example
  • RECOMBINANTS and MULTINOMIAL
  • Binomial: No. of recombinant gametes produced by
    a heterozygous parent for a 2-locus model, with
    $\theta = P\{\text{gamete recombinant}\}$ (R.F.)
  • So for r recombinants in a sample of n:
    $P\{R = r\} = \binom{n}{r}\,\theta^{r}(1 - \theta)^{n - r}$
  • Multinomial: 3-locus model (A,B,C) - 4 possible
    classes of gametes (non-recombinants, AB
    recombinants, BC recombinants and double
    recombinants, for loci ordered ABC).
    Joint probability distribution for r.v.s
    counting number in each class:
    $P\{a, b, c, d\} = \frac{n!}{a!\,b!\,c!\,d!}\,P_1^{a} P_2^{b} P_3^{c} P_4^{d}$
  • where $a + b + c + d = n$ and $P_1, P_2, P_3, P_4$ are
    probabilities of observing a member of each of 4
    classes respectively
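
A short Python sketch of the two probability models above may help. The helper names are hypothetical, and the class probabilities in `gamete_class_probs` assume no interference (recombination in A-B independent of B-C), which the slide itself does not state.

```python
from math import comb, factorial

def binomial_pmf(r, n, theta):
    """P{r recombinants in n gametes} for recombination fraction theta."""
    return comb(n, r) * theta**r * (1 - theta)**(n - r)

def multinomial_pmf(counts, probs):
    """Joint probability of the class counts (a, b, c, d) given class probs."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)          # exact integer multinomial coefficient
    prob = float(coef)
    for c, p in zip(counts, probs):
        prob *= p**c
    return prob

def gamete_class_probs(r_ab, r_bc):
    """Class probabilities for locus order ABC, ASSUMING no interference.

    Order: non-recombinant, AB recombinant, BC recombinant, double recombinant.
    """
    return [
        (1 - r_ab) * (1 - r_bc),
        r_ab * (1 - r_bc),
        (1 - r_ab) * r_bc,
        r_ab * r_bc,
    ]

# Hypothetical data: 100 gametes classified into the four classes.
probs = gamete_class_probs(0.10, 0.20)
p_obs = multinomial_pmf((70, 9, 19, 2), probs)
```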

5
Sampling and Sampling Distributions
  • Random Sampling. If the same element cannot be
    selected more than once, we say that the sample
    is drawn without replacement; otherwise, the
    sample is said to be drawn with replacement. The
    usual convention in sampling is that lower case
    letters are used to designate the sample
    characteristics. Thus if the sample size is n,
    its elements are designated $x_1, x_2, \ldots, x_n$,
    its mean is $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
    and its modified variance is
    $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
  • The corresponding parent population
    characteristics are N (or infinity), $\bar{X}$ and $S^2$.
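
As a small illustration of the sample characteristics, Python's standard `statistics` module computes the mean and both forms of the variance. The data values are hypothetical, and "modified variance" is taken here to mean the version with divisor n - 1.

```python
import statistics

sample = [4.1, 5.3, 4.8, 6.0, 5.2]     # hypothetical measurements
n = len(sample)

x_bar = statistics.mean(sample)        # sample mean
s2 = statistics.variance(sample)       # divisor n - 1 ("modified" variance)
v_n = statistics.pvariance(sample)     # divisor n, for comparison
```

The two variances differ only by the factor n / (n - 1), so the distinction matters mainly for small samples.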

6
Central Limit Theorem
  • Suppose that we repeatedly draw random
    samples of size n (with replacement) from a
    distribution with mean $\mu$ and variance $\sigma^2$. Let
    $\bar{x}_1, \bar{x}_2, \ldots$ be the
    collection of sample averages and let
    $z_i = \frac{\bar{x}_i - \mu}{\sigma/\sqrt{n}}$,
    where the collection $\{z_i\}$ is
    called the sampling distribution of means.
  • Central Limit Theorem. If $X_1, X_2, \ldots, X_n$ are a
    random sample of r.v. X (mean $\mu$, variance $\sigma^2$),
    then, in the limit, as $n \to \infty$, the sampling
    distribution of means has a Standard Normal
    distribution, N(0,1)
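
The theorem can be checked empirically with a small simulation. The sketch below is an illustration only: the parent distribution (Exponential with mean 1, chosen because it is clearly non-Normal), the sample size, the number of repetitions and the seed are all assumptions.

```python
import math
import random
import statistics

def standardized_means(n, reps, seed=0):
    """Draw `reps` samples of size n from Exponential(1); return the
    standardized sample means z = (x_bar - mu) / (sigma / sqrt(n))."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0               # mean and s.d. of Exponential(1)
    z = []
    for _ in range(reps):
        x_bar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        z.append((x_bar - mu) / (sigma / math.sqrt(n)))
    return z

z = standardized_means(n=50, reps=2000)
z_mean = statistics.mean(z)            # should be near 0
z_var = statistics.variance(z)         # should be near 1
frac_within = sum(abs(v) <= 1.96 for v in z) / len(z)   # near 0.95
```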

7
Probabilities for sampling distribution
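  • Statements for large n:
    $P\{a \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le b\} \approx P\{a \le U \le b\}$
  • U = standardized Normal deviate
  • and, in particular,
    $P\{-1.96 \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le 1.96\} \approx 0.95$
  • In general, the closer the r.v. X is to Normal, the
    faster the approximation approaches U. Generally
    $n \ge 30$ $\Rightarrow$ Large sample theory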

8
Attribute and Proportionate Sampling
  • If the sample elements are a measurement of some
    characteristic, we are said to have attribute
    sampling.
  • If all the sample elements are 1 or 0
    (success/failure, agree/do-not-agree), we have
    proportionate sampling.
  • For proportionate sampling, the sample average
    and the sample proportion are synonymous,
    just as are the mean $\mu$ and proportion p (or P)
    for the parent population. From our results on
    the binomial distribution, the sample variance is
    p (1 - p) and the variance of the parent
    distribution is P (1 - P).

9
Probability Statements
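  • If X and Y are independent Binomially distributed
    r.v.s with parameters (n, p) and (m, p) respectively,
    then $X + Y \sim B(n + m, p)$ (show e.g. by m.g.f.s)
  • So, $Y = X_1 + X_2 + \ldots + X_n \sim B(n, p)$ for the IID
    $X_i \sim B(1, p)$.
  • Since we know $\mu_Y = np$, $\sigma_Y = \sqrt{npq}$
    (with $q = 1 - p$) and, clearly, $\bar{X} = Y/n$,
    then $\frac{Y - np}{\sqrt{npq}} \to U$ for large n
  • and, further,
    $\frac{Y/n - p}{\sqrt{pq/n}} \to U$, the
    sampling distribution of a proportion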

10
Probability Statements - C.L.T. and Approximation
summary
  • General form of theorem - an infinite sequence of
    independent r.v.s, with means and variances as
    before; then the approximation $\to U$ for n large
    enough. Note: no condition on the form of the
    distribution of the X's (raw data)
  • Strictly - for approximations of discrete
    distributions, can improve by considering a
    correction for continuity
  • e.g. for $Y \sim B(n, p)$ and $q = 1 - p$:
    $P\{Y \le k\} \approx P\{U \le \frac{k + 0.5 - np}{\sqrt{npq}}\}$
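
To see what the correction for continuity buys, the sketch below compares an exact Binomial tail probability with its Normal approximation, with and without the half-unit shift. The values of n, p and k are arbitrary illustrations, and the function names are hypothetical.

```python
import math

def normal_cdf(u):
    """Standard Normal c.d.f. via the error function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def binom_cdf(k, n, p):
    """Exact P{Y <= k} for Y ~ B(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def binom_cdf_normal(k, n, p, continuity=True):
    """Normal approximation to P{Y <= k}, optionally continuity-corrected."""
    shift = 0.5 if continuity else 0.0
    return normal_cdf((k + shift - n * p) / math.sqrt(n * p * (1 - p)))

n, p, k = 30, 0.4, 10                  # illustrative values only
exact = binom_cdf(k, n, p)
corrected = binom_cdf_normal(k, n, p, continuity=True)
plain = binom_cdf_normal(k, n, p, continuity=False)
```

For these values the corrected approximation lies much closer to the exact probability than the uncorrected one.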

11
Generalising Sampling Distn. concept
  • For the sampling distribution of any statistic:
    we say that a sample characteristic is an
    unbiased estimator of the parent population
    characteristic if the mean of the corresponding
    sampling distribution is equal to the parent
    characteristic.
  • Lemma. The sample average (proportion) is an
    unbiased estimator of the parent average
    (proportion).
  • Recall: Sampling without replacement from a
    finite population - Hypergeometric. The quantity
    $\sqrt{(N - n)/(N - 1)}$ is called the finite
    population correction (fpc). If the parent
    population is infinite, or if sampling with
    replacement, the fpc = 1.
  • Lemma. $E\{s\} = S \times \text{fpc}$.
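
A minimal sketch of the finite population correction follows. The `standard_error_mean` helper applies the fpc to the usual standard error $S/\sqrt{n}$; that usage is a standard textbook result rather than something stated on the slide, and both function names are hypothetical.

```python
import math

def fpc(N, n):
    """Finite population correction sqrt((N - n) / (N - 1))."""
    if N < 2 or not 1 <= n <= N:
        raise ValueError("need N >= 2 and 1 <= n <= N")
    return math.sqrt((N - n) / (N - 1))

def standard_error_mean(S, n, N=None):
    """Standard error of the sample mean.

    N=None means an infinite population (or sampling with replacement),
    for which the fpc is 1.
    """
    se = S / math.sqrt(n)
    return se if N is None else se * fpc(N, n)

# Sampling 100 of 1000 elements without replacement shrinks the s.e. slightly.
se_infinite = standard_error_mean(2.0, 100)
se_finite = standard_error_mean(2.0, 100, N=1000)
```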

12
NON-PARAMETRICS/DISTRIBUTION FREE
  • Standard Probability Distributions - do not apply
    to data or sampling distributions or test
    statistics - uncertainty because of small or
    unreliable data sets, non-independence etc.
    Parameter estimation - not the key issue.
  • Example- or Empirical-basis. Weaker assumptions.
    Less information.
  • e.g. median, not mean. Simple hypothesis
    testing rather than estimation.
  • Counts - nominal, ordinal (natural
    non-parametric).
  • Power and efficiency
  • Nonparametric Hypothesis Tests - main focus of
    interest here (standard parallels to the
    parametric case).
  • e.g. H.T.s of locus orders have complex test
    statistic distributions, for which we need to
    construct empirical probability distributions.
    Usually obtained under the null hypothesis,
    using re-sampling techniques, e.g. permutation
    tests, bootstrap, jackknife
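
As a generic illustration of building an empirical null distribution by re-sampling, here is a two-sample permutation test in Python. It uses a simple difference in means as the test statistic, not the locus-order statistic mentioned above; the data, function name, number of permutations and seed are all hypothetical.

```python
import random
import statistics

def permutation_test(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in means.

    Under the null hypothesis the group labels are exchangeable, so the
    null distribution is built by repeatedly shuffling the pooled data.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(x) - statistics.mean(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:len(x)])
                   - statistics.mean(pooled[len(x):]))
        if diff >= observed - 1e-12:   # tolerance for float round-off
            hits += 1
    # Add-one adjustment so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)

p_separated = permutation_test([1, 2, 3, 4, 5], [10, 11, 12, 13, 14])
p_identical = permutation_test([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
```

The bootstrap and jackknife follow the same pattern, differing in how the re-samples are drawn (with replacement, or leaving one observation out).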

13
LIKELIHOOD - DEFINITIONS
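  • Suppose X can take a set of values $x_1, x_2, \ldots$ with
    $P\{X = x\} = f(x, \theta)$
  • where $\theta$ is a vector of parameters
    affecting the observed x's
  • e.g. $\theta = (\mu, \sigma^2)$. So we can say
    something about $P\{X\}$ if we know $\theta$
  • But this is not usually the case, i.e. we observe
    the x's, knowing nothing of $\theta$
  • Assuming the x's are a random sample of size n from
    a known distribution, then the likelihood for $\theta$ is
    $L(\theta) = \prod_{i=1}^{n} f(x_i, \theta)$
  • Finding the most likely $\theta$ for given data is
    equivalent to Maximising the Likelihood function,
  • where the M.L.E. is $\hat{\theta}$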
14
LIKELIHOOD SCORE and INFO. CONTENT
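  • The Log-likelihood is a support function $S(\theta)$
    evaluated at a point, $\theta'$ say
  • Support function for any other point, say $\theta''$,
    can be obtained approx. using the Taylor
    expansion
    $S(\theta'') \approx S(\theta') + (\theta'' - \theta')\,\frac{dS}{d\theta}\Big|_{\theta'} + \frac{1}{2}(\theta'' - \theta')^2\,\frac{d^2S}{d\theta^2}\Big|_{\theta'}$
  • and this is the basis of the Newton-Raphson
    iteration for the M.L.E.
  • SCORE = first derivative of support function
    w.r.t. the parameter, $\frac{dS}{d\theta}$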

  • or, numerically,
    $\frac{dS}{d\theta} \approx \frac{S(\theta + \delta) - S(\theta)}{\delta}$ for small $\delta$
  • INFORMATION CONTENT $= -\frac{d^2S}{d\theta^2}$,
    evaluated at (i) an arbitrary point = Observed
    Info.; (ii) the support function maximum =
    Expected Info.


15
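Example - Binomial variable (e.g. use of Score,
Expected Info. Content to determine type of
mapping population and sample size for genomics
experiments)
  • Likelihood function
    $L(p) = \binom{n}{x}\,p^{x}(1 - p)^{n - x}$
  • Log-likelihood
    $\ln L(p) = \ln\binom{n}{x} + x \ln p + (n - x)\ln(1 - p)$
  • Assuming n constant, then the first term is invariant
    w.r.t. p, so $S(\theta)$ at point p is
    $S(p) = x \ln p + (n - x)\ln(1 - p)$
  • Maximising w.r.t. p gives M.L.E. $\hat{p} = x/n$,
    with SCORE $\frac{dS}{dp} = \frac{x}{p} - \frac{n - x}{1 - p}$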
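
The Binomial example can be used to sketch the Newton-Raphson iteration built from the score and the information content. This is an illustration under stated assumptions: the information is taken as the negative second derivative of the support, the data (x = 30 successes in n = 100) and the starting value are hypothetical, and 0 < x < n is required so that the maximum lies inside (0, 1).

```python
def score(p, x, n):
    """First derivative of the support S(p) = x ln p + (n - x) ln(1 - p)."""
    return x / p - (n - x) / (1.0 - p)

def information(p, x, n):
    """Negative second derivative of the support at p."""
    return x / p**2 + (n - x) / (1.0 - p)**2

def newton_raphson_mle(x, n, p0=0.2, tol=1e-10, max_iter=50):
    """Iterate p <- p + score / information until the step is negligible."""
    p = p0
    for _ in range(max_iter):
        step = score(p, x, n) / information(p, x, n)
        # Keep the iterate strictly inside (0, 1), where the support is defined.
        p = min(max(p + step, 1e-9), 1.0 - 1e-9)
        if abs(step) < tol:
            break
    return p

p_hat = newton_raphson_mle(x=30, n=100)      # closed form is x / n
info_at_mle = information(p_hat, 30, 100)    # equals n / (p_hat (1 - p_hat))
```

Since the information at the maximum is n / (p(1 - p)), it grows linearly with n, which is what makes it usable for sample-size planning.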

16
Bayesian Estimation - in context
  • Parametric Estimation - in the classical
    approach, $f(x, \theta)$ for a r.v. X of density $f(x)$,
    with $\theta$ the unknown parameter, indicates the
    dependency of the distribution on the parameter
    to be estimated.
  • Bayesian Estimation - $\theta$ is a random variable, so
    it is appropriate to consider the density as
    conditional and write $f(x \mid \theta)$
  • Given a random sample $X_1, X_2, \ldots, X_n$, the
    sample random variables can be considered jointly
    distributed with the parameter r.v. $\theta$. So, joint pdf
    $f(x_1, \ldots, x_n, \theta) = f(x_1 \mid \theta) \cdots f(x_n \mid \theta)\,\pi(\theta)$,
    with $\pi(\theta)$ the prior density of $\theta$
  • Objective - to form an estimator that gives a
    value of $\theta$ dependent on observations of the
    sample random variables. Thus the conditional density
    of $\theta$ given $X_1, X_2, \ldots, X_n$ also plays a role.
    This is the posterior density

17
Bayes - contd.
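  • Posterior Density
    $f(\theta \mid x_1, \ldots, x_n) = \frac{f(x_1, \ldots, x_n, \theta)}{f(x_1, \ldots, x_n)}$
  • Relationship - prior and posterior:
    $f(\theta \mid x_1, \ldots, x_n) \propto \pi(\theta)\,\prod_{i=1}^{n} f(x_i \mid \theta)$
  • where $\pi(\theta)$ = prior density of $\theta$
  • Value: Close to MLE for large n, or for small n
    if sample values compatible with the prior
    distribution; strong sample basis; simpler to
    calculate.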
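
The prior-posterior relationship can be illustrated with a conjugate example. The sketch below assumes a Beta(a, b) prior on a Binomial proportion, which is a choice made here for convenience and is not taken from the slides; the posterior is then Beta(a + x, b + n - x). The posterior mean is used as the Bayes estimate, which is the estimator under squared-error loss.

```python
def beta_binomial_posterior(x, n, a=1.0, b=1.0):
    """Posterior Beta parameters after x successes in n trials,
    ASSUMING a Beta(a, b) prior on the proportion."""
    return a + x, b + (n - x)

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Small sample: the prior pulls the estimate away from the M.L.E. x / n.
a_small, b_small = beta_binomial_posterior(x=3, n=10, a=2.0, b=2.0)
bayes_small = beta_mean(a_small, b_small)    # (2 + 3) / (4 + 10)
mle_small = 3 / 10

# Large sample: the data dominate and the estimate is close to the M.L.E.
a_large, b_large = beta_binomial_posterior(x=3000, n=10000, a=2.0, b=2.0)
bayes_large = beta_mean(a_large, b_large)
```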