How many genes? Mapping mouse traits, cont.

Provided by: Melani145

1
How many genes? Mapping mouse traits, cont.
  • Lecture 3, Statistics 246
  • January 27, 2004

2
Inferring linkage and mapping markers
  • We now turn to deciding when two marker loci are
    linked, and if so, estimating the map distance
    between them. Then we go on and create a full
    (marker) map of each chromosome, relative to
    which we can map trait genes. With these
    preliminaries completed, we can map trait loci.

3
The LOD score
  • Suppose that we have two marker loci, and we
    don't know whether or not they are linked. A
    natural way to address this question is to carry
    out a formal test of the null hypothesis H: r = 1/2
    against the alternative K: r < 1/2, using the
    marker data from our cross. The test
    statistic almost always used in this context is
    log10 of the ratio of the likelihood at the
    maximum likelihood estimate $\hat{r}$ to that at the
    null, r = 1/2, i.e.
  • $\mathrm{LOD} = \log_{10} \frac{L(\hat{r})}{L(1/2)}$

4
Calculating the LOD score
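  • Recall that the (log) likelihood here is based
    on the multinomial distribution for the
    allocation of n = 132 intercross mice into their
    nine 2-locus genotypic categories. As we saw
    earlier, it can be written
  • $\log_{10} L(r) = \sum_{i=1}^{9} n_i \log_{10} p_i(r) + \text{const}$,
  • and so we take the difference between this
    function evaluated at the maximum likelihood
    estimate $\hat{r}$ and at r = 1/2, which is
  • $\mathrm{LOD} = \sum_{i=1}^{9} n_i \log_{10} \frac{p_i(\hat{r})}{q_i}$,
  • where n_i is the number of mice in category i,
    p_i(r) is the probability of that category, and
    q_i = p_i(1/2) is 1/16, 1/8 or 1/4, depending on i.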
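The LOD calculation just described can be sketched in a few lines of Python. This is a minimal illustration, not the analysis behind these slides: the nine category probabilities are the standard 2-locus intercross ones (they reduce to 1/16, 1/8 and 1/4 at r = 1/2), the maximization is a plain grid search, and the counts are hypothetical numbers chosen only so that they total 132.

```python
import math

def two_locus_probs(r):
    """Probabilities of the nine 2-locus F2 genotypes, keyed by (locus 1, locus 2)."""
    p, q = (1 - r) / 2, r / 2        # parental and recombinant gamete probabilities
    probs = {}
    for g1 in "AHB":
        for g2 in "AHB":
            if g1 == "H" and g2 == "H":
                probs[g1, g2] = 2 * p * p + 2 * q * q   # two parental or two recombinant gametes
            elif g1 == "H" or g2 == "H":
                probs[g1, g2] = 2 * p * q               # exactly one recombinant gamete
            elif g1 == g2:
                probs[g1, g2] = p * p                   # AA or BB: both gametes parental
            else:
                probs[g1, g2] = q * q                   # AB or BA: both gametes recombinant
    return probs

NULL = two_locus_probs(0.5)          # the q_i: 1/16, 1/8 or 1/4

def lod(counts, r):
    """Sum over categories of n_i * log10(p_i(r) / q_i); empty categories are skipped."""
    probs = two_locus_probs(r)
    return sum(n * math.log10(probs[g] / NULL[g]) for g, n in counts.items() if n > 0)

def max_lod(counts, grid=1000):
    """Grid search for the maximizing r in (0, 1/2]; returns (r_hat, LOD at r_hat)."""
    rs = [0.5 * k / grid for k in range(1, grid + 1)]
    r_hat = max(rs, key=lambda r: lod(counts, r))
    return r_hat, lod(counts, r_hat)

# Hypothetical counts for 132 mice (not data from the cross).
counts = {("A", "A"): 30, ("A", "H"): 4, ("A", "B"): 0,
          ("H", "A"): 5, ("H", "H"): 58, ("H", "B"): 3,
          ("B", "A"): 0, ("B", "H"): 4, ("B", "B"): 28}
r_hat, best = max_lod(counts)
print(f"r_hat = {r_hat:.3f}, LOD = {best:.1f}")
```

A grid search is used here only because it is easy to check by eye; an iterative method such as EM would normally be used for the intercross.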

5
Null probabilities of 2-locus genotypes
L1 \ L2    A       H       B
A          1/16    1/8     1/16
H          1/8     1/4     1/8
B          1/16    1/8     1/16

This is just putting r = 1/2 in an earlier
table.
Exercise: Suggest some different test statistics
to discriminate between the null H and the
alternative K. How do they perform in comparison
to the LOD?

6
Using the LOD score
  • Normal statistical practice would have us
    setting a type 1 error rate in a given context (cross,
    sample size), and determining the cut-off for the
    LOD which would achieve approximately the desired
    error rate under the null hypothesis.
  • This approach is rarely adopted in genetics,
    where tradition dictates the use of more
    stringent thresholds, which take into account
    the multiple testing common in linkage mapping.
    The tradition was originally motivated by a Bayesian
    argument, and in fact, Bayesian approaches to
    linkage analysis are increasingly popular. Let
    us make use of Bayes' formula in the form
  • log10(posterior odds) = log10(prior odds) + LOD,
  • where the odds are for linkage. With 20
    chromosomes, which we might assume are approximately
    the same size, and not too long, the prior
    probability of two random loci being on the same
    chromosome, and hence linked, is about 1/20. In
    order to overcome these prior odds against
    linkage, and achieve reasonable posterior odds,
    say 100:1, we would want a LOD of at least 3.
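The arithmetic behind this argument is easy to check. The sketch below simply applies the odds form of Bayes' formula quoted above; the function names are illustrative. With a prior probability of 1/20 the prior odds are 1:19, so a LOD of exactly 3 gives posterior odds of roughly 50:1, and reaching 100:1 takes a LOD of about 3.3.

```python
import math

def posterior_odds(prior_prob, lod):
    """Posterior odds for linkage: prior odds multiplied by 10**LOD."""
    prior_odds = prior_prob / (1 - prior_prob)
    return prior_odds * 10 ** lod

def lod_needed(prior_prob, target_odds):
    """LOD required to move the prior odds up to the target posterior odds."""
    prior_odds = prior_prob / (1 - prior_prob)
    return math.log10(target_odds / prior_odds)

print(posterior_odds(1 / 20, 3))     # about 52.6, i.e. roughly 50:1
print(lod_needed(1 / 20, 100))       # about 3.28
```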

7
Linkage groups
  • And so it has come to pass that a LOD must be
    > 3 to get people's attention. We'll be a little
    more precise later.
  • The next step is to define what are called
    linkage groups. These partition the markers into
    classes, every pair of markers being either
    closely linked (i.e. r ≈ 0), or being connected
    by a chain of markers, each consecutive pair of
    which is closely linked. In practice, we might
    define closely linked to be something like
  • a) $\hat{r}$ < c1, and b) LOD($\hat{r}$) > c2, where
    e.g. c1 = 0.2, c2 = 3.
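Forming linkage groups in this sense amounts to finding the connected components of a graph whose edges join closely linked pairs. Here is a minimal sketch using a union-find structure. The marker names and the pairwise values are hypothetical, and the dictionaries `rf` and `lod` are assumed to hold the estimated recombination fraction and the LOD for each pair.

```python
def linkage_groups(markers, rf, lod, c1=0.2, c2=3.0):
    """Group markers that are joined by a chain of closely linked pairs."""
    parent = {m: m for m in markers}

    def find(m):
        # Follow parent links to the group representative, compressing the path.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for i, a in enumerate(markers):
        for b in markers[i + 1:]:
            if rf[a, b] < c1 and lod[a, b] > c2:     # "closely linked"
                parent[find(a)] = find(b)

    groups = {}
    for m in markers:
        groups.setdefault(find(m), []).append(m)
    return sorted(groups.values())

# Hypothetical example: M1-M2 and M2-M3 are close, M4-M5 are close,
# and every other pair looks unlinked (r = 0.5, LOD = 0).
markers = ["M1", "M2", "M3", "M4", "M5"]
close = {("M1", "M2"): (0.10, 12.0), ("M2", "M3"): (0.15, 8.0), ("M4", "M5"): (0.05, 20.0)}
rf, lod = {}, {}
for i, a in enumerate(markers):
    for b in markers[i + 1:]:
        rf[a, b], lod[a, b] = close.get((a, b), (0.5, 0.0))

groups = linkage_groups(markers, rf, lod)
print(groups)
```

Note that M1 and M3 end up in the same group even though that pair is not itself closely linked: they are connected through M2, which is exactly the chain condition in the definition.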

8
Forming linkage groups, cont.
  • When one tries to form linkage groups, it is
    not unusual to have to vary c1 and c2 a little,
    until all markers fall into a group of more than
    just one marker. When this is done, it is hoped
    that the linkage groups correspond to
    chromosomes. If the chromosome number of the
    species is known, and that coincides with the
    number of linkage groups, this is a reasonable
    presumption. But much can happen to dash this
    hope: one may have two linkage groups
    corresponding to different arms of the same
    chromosome, and not know that; one can have a
    marker at the end of one chromosome linked to a
    marker at the end of another chromosome, though
    this should be rare if there is plenty of data;
    and so on.

9
Ordering linkage groups
  • Next we want to order the markers in a
    linkage group (ideally, on a chromosome). How do
    we do that? An initial ordering can be done by
    starting with one of the markers, M1 say, from the most
    distant pair, here distance being recombination
    fraction, or map distance. Call M2 the closest
    marker to M1 and continue in this way.
  • Now we want to confirm our ordering. One way
    is to calculate a (maximized) log likelihood for
    every ordering, and select the one with the
    largest log likelihood. But if we have (say) 11
    markers on a chromosome, this is 11! ≈ 4 × 10^7
    orders. What people often do is take moving
    k-tuples of markers, and optimize the order of
    each, e.g. with k = 3 or 4. Whichever strategy
    one adopts, multi-locus (i.e. > 2 loci) methods are
    needed.
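The initial ordering can be sketched as a greedy nearest-neighbour walk. This is one reading of "continue in this way", in which each new marker is the unplaced marker closest to the one placed last; the marker names and distances are hypothetical. The last line counts the orders that an exhaustive search would face (an order and its reverse describe the same map, so half of them are redundant).

```python
import math

def initial_order(markers, dist):
    """Start at one end of the most distant pair, then repeatedly add the nearest unplaced marker."""
    pairs = [(a, b) for a in markers for b in markers if a != b]
    start, _ = max(pairs, key=lambda pair: dist[frozenset(pair)])
    order = [start]
    remaining = [m for m in markers if m != start]
    while remaining:
        nearest = min(remaining, key=lambda m: dist[frozenset((order[-1], m))])
        order.append(nearest)
        remaining.remove(nearest)
    return order

# Hypothetical positions (in cM) used only to build a distance table.
position = {"D1": 0, "D2": 12, "D3": 25, "D4": 40}
markers = ["D3", "D1", "D4", "D2"]                       # deliberately out of order
dist = {frozenset((a, b)): abs(position[a] - position[b])
        for a in markers for b in markers if a != b}

order = initial_order(markers, dist)
print(order)
print(math.factorial(11))        # 39916800 orderings of 11 markers
```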

10
Likelihoods for 3-locus data
  • Suppose that we have 3 markers M1, M2 and
    M3, in that order. How do we calculate the log
    likelihood of the associated 3-locus marker data
    from our intercross?
  • Recalling the discussion preceding the
    Punnett square of the last lecture, the parental
    haplotypes here are a1a2a3 and b1b2b3, while
    there would be no fewer than 6 forms of recombinant
    haplotypes:
  • the four single recombinants a1a2b3, a1b2b3,
    b1b2a3 and b1a2a3,
    and the two double recombinants a1b2a3 and
    b1a2b3.
  • Proceeding as before, we calculate the
    probability of each of these in terms of the
    recombination fractions r1 and r2 across
    intervals M1-M2 and M2-M3, respectively. For
    simplicity, we assume the Poisson model, with
    independence of recombination across disjoint
    intervals. For example, a1a2a3 would have
    probability (1 - r1)(1 - r2)/2, a1a2b3 would have
    probability (1 - r1)r2/2, while a1b2a3 would
    have probability r1r2/2, so that the 8 haplotype
    probabilities sum to 1.
  • We would do this for every one of the 8
    paternal and 8 maternal haplotypes, and then
    collect them up to assign probabilities for each
    of the 3^3 = 27 3-locus genotypes (AAA, AAH, ..., BBB),
    and maximize the multinomial likelihood in the
    parameters r1 and r2. This is just as in the
    2-locus case.
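The bookkeeping in the last two bullets can be written out directly. In this sketch each of the 8 haplotypes produced by an F1 parent gets probability 1/2 times a factor r or 1 - r for each interval, so that the 8 probabilities sum to 1; the 64 paternal-maternal combinations are then collapsed into the 27 genotypes. Independence across intervals is assumed, as on the slide, and the function names are illustrative.

```python
from itertools import product

def haplotype_probs(r1, r2):
    """Probability of each 3-locus haplotype from an F1 parent, assuming no interference."""
    probs = {}
    for h in product("ab", repeat=3):
        p1 = r1 if h[0] != h[1] else 1 - r1      # recombination in interval M1-M2?
        p2 = r2 if h[1] != h[2] else 1 - r2      # recombination in interval M2-M3?
        probs[h] = 0.5 * p1 * p2
    return probs

def genotype_probs(r1, r2):
    """Collapse the 8 x 8 paternal-maternal haplotype pairs into the 27 genotypes."""
    hap = haplotype_probs(r1, r2)
    geno = {}
    for (h1, p1), (h2, p2) in product(hap.items(), repeat=2):
        g = "".join("A" if x == y == "a" else "B" if x == y == "b" else "H"
                    for x, y in zip(h1, h2))
        geno[g] = geno.get(g, 0.0) + p1 * p2
    return geno

geno = genotype_probs(0.1, 0.2)
print(len(geno), round(sum(geno.values()), 12))
print(geno["AAA"], geno["HHH"])
```

Maximizing the multinomial likelihood over r1 and r2 would then proceed as in the 2-locus case, with these 27 probabilities in place of the nine.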

11
Multilocus linkage: loci > 3
  • It should have become clear by now that the
    strategy just outlined is not going to work too
    easily when there are (say) 11 loci in a linkage
    group.
  • In that case, haplotypes are strings of the
    form a1a2b3...a10b11, where there are just 2
    parental and 2^11 - 2 distinct recombinant
    haplotypes. The number of paternal-maternal haplotype
    combinations is the square of the total number of
    haplotypes, 2^11, and
    they must be mapped into 3^11 11-locus genotypes,
    and a multinomial MLE carried out to estimate 10
    recombination fractions. What can be done?
  • In 1987 the first large scale human genetic
    map was published, and at the same time a new
    algorithm was announced for both human pedigrees
    and experimental crosses, such as our intercross.
    This algorithm made use of hidden Markov models,
    and for the first time allowed full likelihood
    calculations in our current context without the
    exponential blow-up just described.
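To give a flavour of how a hidden Markov model avoids the blow-up, here is a minimal forward-algorithm sketch for one intercross mouse. It is an illustration under stated assumptions, not the published 1987 algorithm: the hidden state at each marker is the pair (paternal allele origin, maternal allele origin), so there are 4 states; recombination is assumed independent across intervals; genotypes are observed without error; and "-" marks a missing genotype. The cost grows linearly with the number of markers.

```python
from itertools import product

STATES = list(product("ab", repeat=2))       # (paternal origin, maternal origin)
OBSERVED = {("a", "a"): "A", ("b", "b"): "B", ("a", "b"): "H", ("b", "a"): "H"}

def transition(s, t, r):
    """P(state t at the next marker | state s): the two meioses recombine independently."""
    p = 1.0
    for x, y in zip(s, t):
        p *= r if x != y else 1 - r
    return p

def emit(state, obs):
    """1 if the hidden state is compatible with the observed genotype, else 0."""
    return 1.0 if obs == "-" or OBSERVED[state] == obs else 0.0

def likelihood(genotypes, rfs):
    """Probability of one mouse's ordered marker genotypes (a string over A, H, B, -),
    given the recombination fractions between adjacent markers."""
    alpha = {s: 0.25 * emit(s, genotypes[0]) for s in STATES}
    for obs, r in zip(genotypes[1:], rfs):
        alpha = {t: emit(t, obs) * sum(alpha[s] * transition(s, t, r) for s in STATES)
                 for t in STATES}
    return sum(alpha.values())

print(likelihood("AA", [0.1]))                   # (1 - r)^2 / 4
print(likelihood("AHHBBBHHAAA", [0.1] * 10))     # 11 markers, no enumeration of 3^11 genotypes
```

The log likelihood for the whole cross would be the sum of the logs of these per-mouse values, to be maximized over the recombination fractions.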

12
Multilocus mapping: no details
  • I'm not going to cover this topic in detail this
    year, as I discussed it a few years ago, and
    those interested can read it there:
  • www.stat.berkeley.edu/users/terry/Classes/s260.1998/index.html
  • We will meet hidden Markov models again pretty
    soon, as they have become a common feature of
    statistical genetics and computational biology
    since the early 1980s.
  • Now suppose that we have ordered our marker
    loci as just described, either by maximizing the
    likelihood within linkage groups over all orders,
    or by doing so in moving windows of size 3-5. How
    do we look at the result?

13
Checking the map, after removal of bad markers
Top triangle is a transform of the recombination
fraction, namely -4(1 + log2 r). Bottom triangle
contains the LOD scores at the maximum likelihood
estimate of recombination fraction. Notice the
bad bits in the top left-hand and bottom right-hand corners.
est.rf, plot.rf (from an R package)

14
Checking existing genetic maps
  • As indicated earlier, the markers in our cross
    came from MIT, and they were already mapped.
    Most researchers would simply use the
    pre-existing map, as this would usually (but not
    always) be based on many more recombinations than
    could be expected in a single cross. Why might we
    not just do the same?
  • Well, existing maps are rarely completely
    error-free, and one should always look at one's
    own data.
  • An added benefit of looking at one's own data
    in relation to an existing map is that this
    should bring to light markers with large
    numbers of genotyping errors, assuming the map is
    correct.

15
Interplay between error detection and maps
  • Genotyping errors in mouse crosses can usually
    only be detected through the appearance of unusual
    numbers of close recombination events.
  • This depends entirely on the quality of the map.
  • The availability of the mouse genome sequence
    allows us to check genetic maps against the
    physical maps: we locate the (unique) PCR primers
    for our microsatellite markers. This has brought
    a new era in the quality of maps (including human
    genetic maps!).
  • The next slide depicts the genetic map we used.

16
Locations of our markers
After a commercial, we move on to mapping coat
color genes.

17
R

18
R/qtl
Authors: Karl Broman, Hao Wu, Gary Churchill,
Saunak Sen, Brian Yandell

19
Benefits of using R/qtl
  • Lots of graphics
  • Good error detection with accompanying graphics
  • Single- and two-QTL mapping (and interaction
    terms)
  • Choice of several input formats
  • Includes Mapmaker format
  • Many alternatives for mapping methods
  • Many different models for phenotypes, e.g.
    standard normal, nonparametric model, binary
    traits

20
Why map coat color genes in our C57/BL6 x NOD F2
intercross?
  • the locations of these genes are known
  • even with a modest number of mice we should be
    able to map these genes easily
  • it is a useful check that everything is as it
    should be with our data
  • and finally, it is a good exercise for us.
  • Exercise. Look up the agouti and albino loci at
    the Mouse Genome Informatics database.

21
Recall our earlier Punnett square

22
Segregation data at a random marker
  • Phenotype by genotype at D12Mit51
  • (complete data only)

            A     B     H
  Agouti    19    18    35
  Black      8     3    18
  White      9     7    12

23
Mapping a segregating trait
  • We turn now to mapping the two coat color
    genes segregating in our cross, beginning with
    the albino locus, and then the agouti locus. To
    do so, we need a genetic model, that is, we need
    to know or guess the relation between genotypes
    at our trait loci and phenotypes, which is
    embodied in the notion of a penetrance function.
  • Looking at the preceding table, the albino
    trait segregates just as though governed by a
    recessive gene, so we postulate a locus with a
    recessive and a dominant allele for it. Although
    this is not precisely the case for the non-agouti
    trait, it is almost, and we do likewise.
  • Later we will consider their interaction.

24
Probabilities of albino-marker genotypes (×4)
  • Recall that the NOD mouse (A) is homozygous
    for the albino allele, while the C57/BL6 (B) is
    homozygous for the non-albino allele. We can
    collapse an earlier table to get

Colour \ M    A               H               B
Albino        (1-r)^2         2r(1-r)         r^2
Full color    1 - (1-r)^2     2 - 2r(1-r)     1 - r^2

Here r is the recombination fraction between a marker and the
albino locus.

25
Segregation data at the marker closest to Tyrc
  • Phenotype by genotype at D7Mit126
  • at 50 cM (the Tyrc locus is at 44 cM)

            A     B     H
  Agouti     3    19    47
  Black      0    10    19
  White     21     0     1
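As an illustration of how the albino-marker table turns these counts into a LOD, the sketch below pools agouti and black mice as "full color", evaluates the multinomial log likelihood on a grid of r values, and compares the maximum with r = 1/2. The common factor of 4 in the table cancels in the ratio. This is a simple single-marker calculation written for this transcript; the value it prints need not agree with the genome-scan LOD shown on later slides, which may have been computed differently.

```python
import math

def cell_probs(r):
    """Albino-by-marker table (times 4) for marker genotypes A, H, B."""
    albino = {"A": (1 - r) ** 2, "H": 2 * r * (1 - r), "B": r ** 2}
    marker_total = {"A": 1.0, "H": 2.0, "B": 1.0}
    full = {g: marker_total[g] - albino[g] for g in albino}
    return {"albino": albino, "full": full}

# Counts at D7Mit126 from the table above; full color = agouti + black.
counts = {"albino": {"A": 21, "H": 1, "B": 0},
          "full": {"A": 3 + 0, "H": 47 + 19, "B": 19 + 10}}

def log10_lik(r):
    """Multinomial log10 likelihood up to a constant; empty cells are skipped."""
    probs = cell_probs(r)
    return sum(n * math.log10(probs[colour][g])
               for colour, row in counts.items() for g, n in row.items() if n > 0)

grid = [0.5 * k / 1000 for k in range(1, 1001)]      # r in (0, 1/2]
r_hat = max(grid, key=log10_lik)
lod = log10_lik(r_hat) - log10_lik(0.5)
print(f"r_hat = {r_hat:.3f}, LOD = {lod:.1f}")
```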

26
Mapping the albino locus
Plot of LOD score at each marker along the genome

27
Chromosome 7 genotypes for the albino mice.
A = homozygous NOD, B = homozygous B6, H =
heterozygote. Genotypes are read down.
Pale blue shading is conserved NOD
haplotype. D7Mit128 is near the Tyrc locus.

28
Honesty in advertising, and LOD thresholds
  • There is more material in preparation here.
  • Please revisit this space in a day or so.

29
Approximate probabilities of agouti-marker
genotypes (×4)
  • Recall that the C57/BL6 (B) is homozygous for
    non-agouti, while the NOD (A) is homozygous
    agouti. Ignoring the 1/16 of the intercross who
    would exhibit the non-agouti trait (and be black)
    if they weren't albino, we get the following
    approximate table, where 1/16 of the mice will be
    misclassified. Here r is the recombination
    fraction between a marker and the agouti locus.

Colour \ M    A           H               B
Non-black     1 - r^2     2 - 2r(1-r)     1 - (1-r)^2
Black         r^2         2r(1-r)         (1-r)^2
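The 1/16 figure implies overall coat color proportions that can be checked against the data. Assuming the two loci segregate independently and both traits are recessive, 1/4 of the mice are white, 3/4 × 1/4 = 3/16 are black and 9/16 are agouti. The sketch below compares these proportions with the phenotype totals from the D12Mit51 table shown earlier (72 agouti, 29 black, 28 white) using a Pearson chi-square statistic; the comparison is added here as an illustration and is not part of the original analysis.

```python
# Expected proportions under two independent recessive loci (albino masks black).
expected_prop = {"Agouti": 9 / 16, "Black": 3 / 16, "White": 4 / 16}

# Row totals of the phenotype-by-genotype table at D12Mit51.
observed = {"Agouti": 19 + 18 + 35, "Black": 8 + 3 + 18, "White": 9 + 7 + 12}

n = sum(observed.values())
expected = {k: n * p for k, p in expected_prop.items()}
chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)

for k in observed:
    print(f"{k:7s} observed {observed[k]:3d}   expected {expected[k]:6.2f}")
print(f"chi-square = {chi2:.2f} on 2 degrees of freedom")
```

The 5% critical value for chi-square on 2 degrees of freedom is 5.99, so a statistic well below that is compatible with the assumed model.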
30
Segregation data at the marker closest to the
agouti locus
  • Phenotype by genotype at D2Mit48
  • at 87 cM (agouti locus is at 89 cM)

            A     B     H
  Agouti    24     2    46
  Black      0    28     1
  White      5     6    14

31
Mapping the agouti locus
Plot of LOD score at each marker along the genome

32
Chromosome 2 genotypes for the black progeny.
Mauve shading indicates conserved C57/BL6
haplotype. Marker D2Mit48 is very close to the
agouti locus.

33
Conclusion: single locus mapping
  • agouti locus (A, a alleles) on Chr 2 at 89.9 cM
  • albino locus (C, c alleles) on Chr 7 at 44 cM (now
    known as the Tyrc gene)
  • In the data set:
  • at 89 cM on Chr 2 with a LOD score > 20
  • Marker D2Mit48 (8th marker on Chr 2)
  • at 43 cM on Chr 7 with a LOD score > 20
  • Marker D7Mit126 (4th marker on Chr 7)

The method worked for agouti, even though 1/16th
of the mice were misclassified.

34
Acknowledgement
  • These last 3 lectures would not have been
    possible without the very substantial input of
    Melanie Bahlo and Tom Brodnicki of the Walter
    and Eliza Hall Institute of Medical Research,
    Melbourne, Australia.
  • Tom (together with people from the WEHI mouse
    facility) carried out the cross, and did all the
    phenotyping, while Melanie did all the data
    analysis presented, and contributed a lot to the
    presentation. Overall, responsibility for the
    presentation (especially all the errors!) remains
    mine.

35
General exercise
Go through the last 3 lectures and redo as many of the
calculations as you can for the case of a
backcross rather than an intercross. You will
find it all simpler, and in every case, closed
form expressions appear, where we needed
iterative methods for the intercross.
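As a starting point for the exercise, here is a sketch of the simplest closed form: the two-marker LOD in a backcross. It assumes each of n backcross mice can be scored as recombinant or non-recombinant between the two markers, so the likelihood is binomial, the estimate is the observed recombinant fraction (capped at 1/2), and the LOD is k log10(r) + (n - k) log10(1 - r) + n log10(2) evaluated at that estimate. The counts in the example are hypothetical.

```python
import math

def backcross_lod(n_recombinant, n_total):
    """Closed-form estimate of r and LOD for two markers in a backcross."""
    r_hat = min(n_recombinant / n_total, 0.5)        # estimate restricted to [0, 1/2]

    def term(count, prob):
        # count * log10(prob), with the convention 0 * log10(0) = 0
        return count * math.log10(prob) if count > 0 else 0.0

    lod = (term(n_recombinant, r_hat)
           + term(n_total - n_recombinant, 1 - r_hat)
           + n_total * math.log10(2))
    return r_hat, lod

print(backcross_lod(8, 100))     # hypothetical: 8 recombinants among 100 mice
print(backcross_lod(0, 10))      # no recombinants: LOD = 10 * log10(2), about 3.01
```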