Title: STANDARD DISTRIBUTIONS - Examples/Extensions
1. STANDARD DISTRIBUTIONS - Examples/Extensions
- GENETIC LINKAGE and MAPPING
- Linkage Phase - chromatid associations of alleles at linked loci: alleles on the same chromosome are in coupling, those on different chromosomes in repulsion.
- Genetic Recombination - define the R.F. (recombination fraction, from gametes or phenotypes) for the homologous case: the greater the distance between loci, the greater the chance of recombination. High interference is a problem for multiple-locus models, and R.F.s between loci are not additive, so a Mapping Function is needed.
- Haldane's Mapping Function - assume crossovers occur randomly along the chromosome length, with average number λ, and model the count as Poisson, so that P(No crossover) = e^(-λ) and P(Crossover) = 1 - e^(-λ).
2. Examples - continued
- P(recombinant) = 0.5 × P(Crossover), since each pair of homologs with one crossover yields one-half recombinant gametes.
- Define the expected no. of recombinants as the mapping function: m = 0.5λ.
- R.F.: r = 0.5(1 - e^(-2m)) (the form of Haldane's M.F.), with inverse m = -0.5 ln(1 - 2r), which converts an estimated R.F. into a Haldane map distance.
- Thus, for locus order A-B-C, m_AC = m_AB + m_BC (since m_AB = -0.5 ln(1 - 2r_AB), etc.). Substituting for each of these gives the usual relationship between R.F.s in the no-interference situation: r_AC = r_AB + r_BC - 2 r_AB r_BC.
- Net effect - transform to a straight line, m_AC vs m_AB + m_BC (see the numerical sketch below).
- In practice this is too simple: it applies only under specific conditions and may not relate directly to physical distance (a common M.F. problem).
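To make the additivity concrete, here is a minimal Python sketch (the R.F. values r_ab and r_bc are invented for illustration): Haldane map distances add across adjacent intervals even though R.F.s do not, and the implied r_AC matches the no-interference formula.

```python
import math

def haldane_m(r):
    """Haldane map distance (Morgans) from a recombination fraction r < 0.5."""
    return -0.5 * math.log(1.0 - 2.0 * r)

def haldane_r(m):
    """Inverse: recombination fraction from a Haldane map distance m."""
    return 0.5 * (1.0 - math.exp(-2.0 * m))

# Invented example R.F.s for adjacent intervals A-B and B-C
r_ab, r_bc = 0.10, 0.15

m_ab, m_bc = haldane_m(r_ab), haldane_m(r_bc)
m_ac = m_ab + m_bc                 # map distances are additive
r_ac = haldane_r(m_ac)             # implied A-C recombination fraction

# Agrees with the no-interference relation r_AC = r_AB + r_BC - 2 r_AB r_BC
print(m_ac, r_ac, r_ab + r_bc - 2 * r_ab * r_bc)
```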
3. Example expanded - LINKAGE/MAPPING
- Genetic Map - models the linear arrangement of a group of genes/markers (easily identified genetic features, e.g. a change in a known gene, or a piece of DNA with no known function). The map is based on homologous recombination during meiosis: if two or more markers are located close together on a chromosome, their alleles are usually inherited together through meiosis.
- 4 steps after marker data are obtained. (1) Pairwise linkage - all 2-locus combinations (based on observed and expected frequencies of genotypic classes). (2) Grouping markers into Linkage Groups (based on R.F.s, significance level etc.); the no. of linkage groups should equal the haploid no. of chromosomes for the organism (a toy grouping sketch follows below). (3) Ordering of markers (the key step; computationally demanding, precision important). (4) Estimation of multipoint R.F.s (physical distance = no. of DNA base pairs between two genes, vs map distance = a transformation of the R.F.). The ultimate physical map is the DNA sequence (a restriction map is also common).
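A toy Python sketch of step (2): markers are treated as linked when their pairwise R.F. is below a threshold, and linkage groups are taken as the connected components. Marker names, R.F. values and the 0.3 threshold are all invented for illustration.

```python
def linkage_groups(markers, rf, threshold=0.3):
    """Group markers into linkage groups by connectivity of pairwise R.F.s."""
    groups, assigned = [], set()
    for m in markers:
        if m in assigned:
            continue
        # grow one connected component from marker m
        group, frontier = {m}, [m]
        while frontier:
            cur = frontier.pop()
            for other in markers:
                # unknown pairs default to 0.5 (unlinked)
                if other not in group and rf.get(frozenset((cur, other)), 0.5) < threshold:
                    group.add(other)
                    frontier.append(other)
        assigned |= group
        groups.append(sorted(group))
    return groups

rf = {frozenset(("A", "B")): 0.10, frozenset(("B", "C")): 0.20,
      frozenset(("A", "C")): 0.26, frozenset(("D", "E")): 0.05}
print(linkage_groups(["A", "B", "C", "D", "E"], rf))  # [['A','B','C'], ['D','E']]
```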
4. Example
- RECOMBINANTS and MULTINOMIAL
- Binomial: the no. of recombinant gametes produced by a heterozygous parent under a 2-locus model, with θ = P(gamete recombinant), i.e. the R.F. So for r recombinants in a sample of n,
  P(r) = C(n, r) θ^r (1 - θ)^(n - r)
- Multinomial: 3-locus model (A, B, C), with 4 possible classes of gametes (non-recombinants, A-B recombinants, B-C recombinants and double recombinants across loci A-B-C). The joint probability distribution for the r.v.s counting the numbers a, b, c, d in each class (see the sketch below) is
  P(a, b, c, d) = [n! / (a! b! c! d!)] P1^a P2^b P3^c P4^d
  where a + b + c + d = n and P1, P2, P3, P4 are the probabilities of observing a member of each of the 4 classes respectively.
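A minimal Python sketch of the multinomial probability above (the class probabilities P1..P4 and the counts are invented for illustration):

```python
from math import factorial

def multinomial_prob(counts, probs):
    """P(a, b, c, d) = [n!/(a! b! c! d!)] P1^a P2^b P3^c P4^d, n = sum of counts."""
    n = sum(counts)
    coef = factorial(n)
    for k in counts:
        coef //= factorial(k)
    p = float(coef)
    for k, pk in zip(counts, probs):
        p *= pk ** k
    return p

# e.g. 20 gametes: 14 non-recombinant, 3 A-B rec., 2 B-C rec., 1 double rec.
print(multinomial_prob((14, 3, 2, 1), (0.70, 0.15, 0.10, 0.05)))
```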
5. Sampling and Sampling Distributions
- Random Sampling. If the same element cannot be selected more than once, we say that the sample is drawn without replacement; otherwise, the sample is said to be drawn with replacement. The usual convention in sampling is that lower-case letters designate the sample characteristics. Thus if the sample size is n, its elements are designated x1, x2, …, xn, its mean is
  x̄ = (1/n) Σ xi
  and its modified variance is
  s² = [1/(n - 1)] Σ (xi - x̄)²
- The corresponding parent population characteristics are N (or infinity), X̄ and S².
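A minimal Python sketch of these sample characteristics (the data values are invented):

```python
def sample_mean(xs):
    return sum(xs) / len(xs)

def sample_var(xs):
    """Modified sample variance, with the n - 1 denominator."""
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

xs = [4.1, 5.0, 3.8, 4.6, 5.3]
print(sample_mean(xs), sample_var(xs))
```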
6. Central Limit Theorem
- Suppose that we repeatedly draw random samples of size n (with replacement) from a distribution with mean μ and variance σ². Let x̄1, x̄2, … be the collection of sample averages and let
  u_i = (x̄_i - μ) / (σ/√n)
  where the collection {u_i} is called the sampling distribution of means.
- Central Limit Theorem. If X1, X2, …, Xn are a random sample of the r.v. X (mean μ, variance σ²), then, in the limit as n → ∞, the sampling distribution of means has a Standard Normal distribution, N(0, 1). (A simulation sketch follows below.)
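A simulation sketch of the theorem (distribution and n chosen arbitrarily): standardized means of samples from a decidedly non-Normal exponential distribution behave like N(0, 1).

```python
import random, statistics

random.seed(1)
mu, sigma, n = 1.0, 1.0, 30          # exponential(rate 1): mean 1, s.d. 1

u = []
for _ in range(10_000):
    xbar = sum(random.expovariate(1.0) for _ in range(n)) / n
    u.append((xbar - mu) / (sigma / n ** 0.5))

# Mean near 0, s.d. near 1, and ~95% of values within +/-1.96
print(statistics.mean(u), statistics.stdev(u),
      sum(abs(x) < 1.96 for x in u) / len(u))
```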
7. Probabilities for sampling distribution
- Statements for large n:
  P(a < X̄ < b) ≈ P( (a - μ)/(σ/√n) < U < (b - μ)/(σ/√n) )
- U = standardized Normal deviate
- and, in particular,
  P( |X̄ - μ| < 1.96 σ/√n ) ≈ 0.95
- In general, the closer the r.v. X is to Normal, the faster the approximation approaches U. Generally n ≥ 30 ⇒ large-sample theory.
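A minimal Python sketch of such a probability statement (μ, σ, n and the interval are invented), using the standard Normal c.d.f. Φ built from the error function:

```python
from math import erf, sqrt

def phi(z):
    """Standard Normal c.d.f. Phi(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, sigma, n = 50.0, 10.0, 36
se = sigma / sqrt(n)

a, b = 48.0, 53.0
# P(a < Xbar < b) ~= Phi((b - mu)/se) - Phi((a - mu)/se)
print(phi((b - mu) / se) - phi((a - mu) / se))
```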
8. Attribute and Proportionate Sampling
- If the sample elements are a measurement of some characteristic, we are said to have attribute sampling.
- If all the sample elements are 1 or 0 (success/failure, agree/do-not-agree), we have proportionate sampling.
- For proportionate sampling, the sample average x̄ and the sample proportion p are synonymous, just as are the mean μ and proportion P for the parent population. From our results on the binomial distribution, the sample variance is p(1 - p) and the variance of the parent distribution is P(1 - P).
9. Probability Statements
- If X and Y are independent Binomially distributed r.v.s with parameters (n, p) and (m, p) respectively, then X + Y ~ B(n + m, p) (shown e.g. by m.g.f.s).
- So Y = X1 + X2 + … + Xn ~ B(n, p) for IID Xi ~ B(1, p).
- Since we know μ_Y = np and σ_Y = √(npq), clearly, for large n,
  (Y - np)/√(npq) ≈ U
- and, further, for the sampling distribution of a proportion P̂ = Y/n (see the sketch below),
  (P̂ - p)/√(pq/n) ≈ U
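A minimal Python sketch of the Normal approximation to a proportion (p, n and the cut-off 0.5 are invented):

```python
from math import erf, sqrt

def phi(z):
    """Standard Normal c.d.f. Phi(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

p, n = 0.4, 100
se = sqrt(p * (1 - p) / n)   # s.d. of the sampling distribution of P-hat

# P(P-hat > 0.5) under the large-sample Normal approximation
print(1.0 - phi((0.5 - p) / se))
```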
10. Probability Statements - C.L.T. and Approximation summary
- General form of the theorem: for an infinite sequence of independent r.v.s, with means and variances as before, the approximation → U for n large enough. Note: there is no condition on the form of the distribution of the Xs (the raw data).
- Strictly, for approximations of discrete distributions, the approximation can be improved by a correction for continuity, e.g. for Y ~ B(n, p),
  P(Y ≤ y) ≈ Φ( (y + 0.5 - np)/√(npq) )
  (compared numerically in the sketch below).
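A minimal Python sketch of the continuity correction (the Binomial parameters are invented): the corrected Normal approximation should sit closer to the exact Binomial probability than the uncorrected one.

```python
from math import comb, erf, sqrt

def phi(z):
    """Standard Normal c.d.f. Phi(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

n, p, y = 40, 0.3, 15
q = 1 - p

exact = sum(comb(n, k) * p ** k * q ** (n - k) for k in range(y + 1))
plain = phi((y - n * p) / sqrt(n * p * q))
corrected = phi((y + 0.5 - n * p) / sqrt(n * p * q))

print(exact, plain, corrected)   # corrected should be closer to exact
```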
11. Generalising the Sampling Distn. concept
- This applies to the sampling distribution of any statistic. We say that a sample characteristic is an unbiased estimator of the parent population characteristic if the mean of the corresponding sampling distribution is equal to the parent characteristic.
- Lemma. The sample average (proportion) is an unbiased estimator of the parent average (proportion).
- Recall: sampling without replacement from a finite population - Hypergeometric. The quantity √((N - n)/(N - 1)) is called the finite population correction (fpc). If the parent population is infinite, or if sampling is with replacement, the fpc = 1.
- Lemma. E[s] = S × fpc (the fpc is computed in the sketch below).
12. NON-PARAMETRICS / DISTRIBUTION-FREE
- Standard probability distributions do not apply to the data, the sampling distributions or the test statistics - uncertainty because of small or unreliable data sets, non-independence etc. Parameter estimation is not the key issue.
- Example- or empirical-basis. Weaker assumptions. Less information.
- e.g. median, not mean. Simple hypothesis testing rather than estimation.
- Counts - nominal, ordinal (naturally non-parametric).
- Power and efficiency.
- Nonparametric Hypothesis Tests - the main focus of interest here (with standard parallels to the parametric case).
- e.g. H.T.s of locus orders have complex test-statistic distributions, for which empirical probability distributions must be constructed. These are usually obtained, assuming the null hypothesis, using re-sampling techniques, e.g. permutation tests, bootstrap, jackknife (a permutation-test sketch follows below).
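A minimal Python sketch of a permutation test for a difference in medians between two groups (data invented). Under H0 the group labels are exchangeable, so repeatedly re-labelling the pooled data builds an empirical null distribution for the test statistic.

```python
import random

random.seed(2)

def median(xs):
    ys = sorted(xs)
    m = len(ys) // 2
    return ys[m] if len(ys) % 2 else 0.5 * (ys[m - 1] + ys[m])

g1 = [3.1, 4.0, 2.8, 5.2, 4.4]
g2 = [5.9, 6.3, 4.8, 7.0, 5.5]
observed = abs(median(g1) - median(g2))

pooled, n1 = g1 + g2, len(g1)
count, n_perm = 0, 10_000
for _ in range(n_perm):
    random.shuffle(pooled)   # re-label under H0
    if abs(median(pooled[:n1]) - median(pooled[n1:])) >= observed:
        count += 1

print(count / n_perm)        # empirical p-value
```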
13. LIKELIHOOD - DEFINITIONS
- Suppose X can take a set of values x1, x2, … with
  P(X = x_i) = f(x_i; θ)
  where θ is a vector of parameters affecting the observed xs (e.g. a mean and variance). So we can say something about P(X) if we know θ, say.
- But this is not usually the case, i.e. we observe the xs knowing nothing of θ.
- Assuming the xs are a random sample of size n from a known distribution, the likelihood for θ is
  L(θ; x1, …, xn) = Π_i f(x_i; θ)
- Finding the most likely θ for the given data is equivalent to maximising the likelihood function (see the sketch below), where the M.L.E. is
  θ̂ = arg max_θ L(θ; x1, …, xn)
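A minimal Python sketch of maximising a likelihood (the data are invented, and a Poisson model is used since its M.L.E. is known to be the sample mean): evaluate the log-likelihood on a grid of θ values and take the maximiser.

```python
import math

xs = [2, 4, 3, 5, 1, 3, 2, 4]

def log_likelihood(lam):
    """Poisson log-likelihood: sum of -lam + x*ln(lam) - ln(x!)."""
    return sum(-lam + x * math.log(lam) - math.lgamma(x + 1) for x in xs)

grid = [0.01 * k for k in range(1, 1001)]   # candidate theta values
lam_hat = max(grid, key=log_likelihood)

print(lam_hat, sum(xs) / len(xs))           # both ~3.0, the sample mean
```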
14. LIKELIHOOD SCORE and INFO. CONTENT
- The log-likelihood is a support function S(θ), evaluated at a point, θ1 say.
- The support function for any other point, say θ2, can be obtained approximately using the Taylor expansion
  S(θ2) ≈ S(θ1) + S′(θ1)(θ2 - θ1) + 0.5 S″(θ1)(θ2 - θ1)²
  and this is the basis of the Newton-Raphson iteration for the M.L.E. (sketched below).
- SCORE = first derivative of the support function w.r.t. the parameter,
  S′(θ) = dS/dθ, or numerically S′(θ) ≈ [S(θ + Δθ) - S(θ)] / Δθ
- INFORMATION CONTENT = -S″(θ), evaluated at (i) an arbitrary point: Observed Info.; (ii) the support function maximum: Expected Info.
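A minimal Python sketch of the Newton-Raphson iteration for an M.L.E., using numerical first and second derivatives of the support function (data invented; a Poisson model again, so the iteration should converge to the sample mean):

```python
import math

xs = [2, 4, 3, 5, 1, 3, 2, 4]

def S(lam):
    """Support (log-likelihood) function for a Poisson parameter."""
    return sum(-lam + x * math.log(lam) - math.lgamma(x + 1) for x in xs)

def newton_raphson(theta, h=1e-4, tol=1e-8, max_iter=50):
    for _ in range(max_iter):
        score = (S(theta + h) - S(theta - h)) / (2 * h)             # S'(theta)
        info = (S(theta + h) - 2 * S(theta) + S(theta - h)) / h**2  # S''(theta)
        step = score / info
        theta -= step                                               # theta - S'/S''
        if abs(step) < tol:
            break
    return theta

print(newton_raphson(1.0))   # ~3.0, the sample mean
```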
15. Example - Binomial variable (e.g. use of the Score and Expected Info. Content to determine the type of mapping population and sample size for genomics experiments)
- Likelihood function
  L(p) = C(n, r) p^r (1 - p)^(n - r)
- Log-likelihood
  ln L = ln C(n, r) + r ln p + (n - r) ln(1 - p)
- Assuming n constant, the first term is invariant w.r.t. p, leaving the support function S(p) at the point p.
- Maximising w.r.t. p gives the M.L.E. p̂ = r/n, with SCORE
  S′(p) = r/p - (n - r)/(1 - p)
  (a sample-size sketch follows below).
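A minimal Python sketch of one way such results feed experimental design (the target precision is invented): the Expected Information for the Binomial is I(p) = n / (p(1 - p)), so Var(p̂) ≈ 1/I(p), and inverting gives the sample size needed for a target standard error of the R.F. estimate.

```python
def expected_info(n, p):
    """Expected Information for Binomial p from n observations."""
    return n / (p * (1.0 - p))

def sample_size(p, target_se):
    """n such that s.e.(p-hat) = sqrt(p(1-p)/n) equals target_se."""
    return p * (1.0 - p) / target_se ** 2

p = 0.1                                  # anticipated recombination fraction
print(expected_info(200, p))             # information from n = 200 gametes
print(sample_size(p, target_se=0.02))    # n needed for s.e.(p-hat) = 0.02
```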
16. Bayesian Estimation - in context
- Parametric Estimation - in the classical approach, writing f(x; θ) for a r.v. X of density f(x), with θ the unknown parameter, indicates the dependency of the distribution on the parameter to be estimated.
- Bayesian Estimation - θ is a random variable, so it is appropriate to consider the density as conditional and to write f(x | θ).
- Given a random sample X1, X2, …, Xn, the sample random variables can be considered jointly distributed with the parameter r.v. θ. So the joint pdf is
  f(x1, …, xn, θ) = f(x1, …, xn | θ) π(θ)
- Objective - to form an estimator that gives a value of θ dependent on observations of the sample random variables. Thus the conditional density of θ given X1, X2, …, Xn also plays a role. This is the posterior density.
17. Bayes - contd.
- Posterior Density
  f(θ | x1, …, xn) = f(x1, …, xn | θ) π(θ) / ∫ f(x1, …, xn | θ) π(θ) dθ
- Relationship between prior and posterior: posterior ∝ likelihood × prior,
  where π(θ) = prior density of θ.
- Value: close to the MLE for large n, or for small n if the sample values are compatible with the prior distribution; strong sample basis; simpler to calculate. (A conjugate-prior sketch follows below.)
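A minimal Python sketch of the large-n behaviour (the Beta prior parameters and counts are invented): a Beta(a, b) prior with a Binomial likelihood gives a Beta(a + r, b + n - r) posterior, whose mean approaches the M.L.E. r/n as n grows.

```python
def posterior_mean(a, b, r, n):
    """Mean of the Beta(a + r, b + n - r) posterior for Binomial p."""
    return (a + r) / (a + b + n)

a, b = 2.0, 8.0        # Beta(2, 8) prior: prior mean 0.2
for r, n in [(3, 10), (30, 100), (300, 1000)]:
    print(n, posterior_mean(a, b, r, n), r / n)   # posterior mean -> MLE 0.3
```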