Parametric linkage analysis and lod scores

1
Parametric linkage analysis and lod scores
  • Steve Horvath
  • Depts. of Human Genetics & Biostatistics
  • UCLA

2
Contents
  • the big picture: meiotic mapping techniques
  • genetic distances and genetic maps
  • map functions
  • LOD (log of the odds) score analysis
  • 2-point analysis
  • testing for linkage between a marker and an
    affectation status locus
  • example: rare, fully penetrant, dominant
    Mendelian disease
  • more general disease models
  • parameters in parametric linkage analysis
  • multipoint analysis algorithms for LOD scores
  • significance levels, thresholds and false
    positives

3
The big picture: locating (mapping) disease
genes
4
Meiotic mapping makes it possible to identify DNA
segments that contain disease genes
[Figure: reverse genetics, from trait to DNA (traits 1, 2, 3)]
  • Mapping is part of the positional cloning
    strategy.
  • It works well for Mendelian diseases, which
    correspond to rare, highly penetrant disease
    alleles.

5
Different ways of expressing the goal of genomics
  • goal: find stretches of DNA that are risk factors
    for a disease.
  • known as reverse genetics if you start with the
    phenotype (e.g. affectation status)
  • a.k.a. positional cloning (Collins FS)
  • 3-step procedure (adapted):
  • first, meiotic mapping (linkage, linkage
    disequilibrium)
  • second, physical mapping (includes sequencing)
  • third, find the mutation and verify its
    functional role

6
Different kinds of meiotic mapping methods
  • parametric (better: model-based) lod score
    analysis
  • single point
  • multipoint
  • non-parametric (better: model-free) linkage
    analysis
  • allele sharing methods
  • key concept: identity by descent
  • confusing factoid: non-parametric methods are
    sometimes equivalent to parametric methods (Knapp
    M, 1993?)
  • association studies, linkage disequilibrium
    mapping
  • family-based methods (TDT, FBAT)
  • population-based methods (chi-square test,
    log-linear model)

7
What do meiotic mapping methods have in common?
  • based on meiosis
  • made possible through the violation of Mendel's
    law of independent assortment
  • crossing-over effects, recombination, ...
  • recombination fraction θ
  • requires genetic markers, and sometimes the
    distances between them (genetic map)
  • usually test the hypothesis of no linkage,
    H0: θ = 1/2
  • but sometimes test for no linkage disequilibrium

8
What is parametric linkage analysis?
  • A meiotic mapping technique based on
    constructing a disease gene transmission model to
    explain the inheritance of a disease in
    pedigrees.
  • Meaning will become clear....

9
Genetic markers
  • desirable properties of genetic markers
  • locus-specific
  • polymorphic in the studied population
  • many heterozygotes
  • easily genotyped
  • quality measures for markers
  • heterozygosity: homozygotes are uninformative!
  • or Polymorphism Information Content (PIC):
  • probability that the parent is heterozygous ×
    probability that the offspring is informative

10
Important co-dominant genetic markers
  • microsatellites
  • variations in the number of tandem repeats
  • high level of polymorphism
  • even distribution across the genome
  • 2nd generation map
  • SNPs
  • single nucleotide polymorphisms
  • bi-allelic codominant marker
  • heterozygosity is at most 50 percent
  • 3rd generation map
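
A minimal Python sketch of the two marker quality measures named on the slides. The heterozygosity formula assumes Hardy-Weinberg proportions and the PIC formula is the usual Botstein et al. form; neither formula appears on the slides, and the allele frequencies below are made-up illustrations.

```python
def heterozygosity(freqs):
    """Expected heterozygosity H = 1 - sum(p_i^2), assuming Hardy-Weinberg proportions."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism Information Content:
    PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    n = len(freqs)
    cross = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    return heterozygosity(freqs) - cross


# Hypothetical markers (illustrative frequencies only).
snp = [0.5, 0.5]               # bi-allelic SNP with equally frequent alleles
microsatellite = [0.1] * 10    # 10 equally frequent repeat alleles

for name, freqs in [("SNP", snp), ("microsatellite", microsatellite)]:
    print(name, round(heterozygosity(freqs), 3), round(pic(freqs), 3))
```

For a bi-allelic marker the heterozygosity is 2p(1 - p), which peaks at p = 0.5; that is where the 50 percent ceiling for SNPs comes from.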

11
Genetic distances and genetic maps
Will be very relevant for multipoint linkage
studies.
12
The recombination fraction is a measure of
distance between 2 loci
  • recombination fraction θ = the probability that a
    recombinant gamete is transmitted
  • If two loci are on different chromosomes, they
    will segregate independently
  • recombination fraction θ = 0.5
  • If two loci are right next to each other, they
    will segregate together during meiosis
  • recombination fraction θ = 0
  • terminology
  • θ < 0.5: the loci are linked
  • θ = 0.5: the loci are far apart (they are not
    linked)

13
Genetic distance (unit is Morgan) = expected no.
of crossover points per gamete
  • notation: let a and b be 2 points in the genome.

  • Nab = number of chiasmata between them
  • chiasmata = crossing-over points
  • Definition: the genetic (map) distance is
    d = E(Nab)/2
  • Why the factor of 2? We want the no. of
    crossovers per gamete, and each chiasma involves
    only 2 of the 4 chromatids.
  • Example: if on average there are 49 crossovers
    per cell in meiosis,
  • then the total genetic map distance = 49/2 = 24.5
    Morgans
  • 1 Morgan = 100 centimorgans

14
There is a relationship between crossing over and
recombination fraction
  • Mather's formula: θ = 0.5 P(Nab > 0)
  • for small distances, d is approximately equal to θ,
  • since in this case E(Nab) ≈ P(Nab > 0)
  • P(Nab > 0) is related to d = E(Nab)/2
  • different probability models for Nab lead to
    different relationships between θ and d.
  • each sensible relationship between θ and d is
    called a map function
  • Great reference: Lange K, Mathematical and
    Statistical Methods for Genetic Analysis,
    Springer

15
The mathematical relationship between
recombination fraction and genetic distance is
called a mapping function
  • Haldane's mapping function
  • d = -0.5 ln(1 - 2θ)
  • the distance d is measured in Morgans (multiply
    by 100 for centimorgans)
  • exact if crossovers occur at random (no
    interference)
  • Kosambi's mapping function
  • d = 0.25 ln[(1 + 2θ)/(1 - 2θ)]
  • again the distance is measured in Morgans
  • suitable if there is (crossover) interference:
  • one crossover prevents another from taking place
    nearby
  • widely used

16
  • Note: for both mapping functions,
  • if θ = 0.5, d = infinite Morgans (infinite
    distance)
  • if θ = 0, d = 0 M (0 distance)
  • if θ = 0.27, Haldane gives d = 0.39 Morgans = 39 cM,
    Kosambi gives d = 0.30 Morgans = 30 cM
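
The two mapping functions are easy to check numerically. The following Python sketch implements them together with their inverses, with distances in Morgans; the function names are illustrative and not taken from any linkage package.

```python
import math


def haldane_distance(theta):
    """Haldane: d = -0.5 * ln(1 - 2*theta), in Morgans (no interference)."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * theta)


def kosambi_distance(theta):
    """Kosambi: d = 0.25 * ln((1 + 2*theta) / (1 - 2*theta)), in Morgans."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    return 0.25 * math.log((1.0 + 2.0 * theta) / (1.0 - 2.0 * theta))


def haldane_theta(d):
    """Inverse of Haldane's function: theta = 0.5 * (1 - exp(-2d))."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def kosambi_theta(d):
    """Inverse of Kosambi's function: theta = 0.5 * tanh(2d)."""
    return 0.5 * math.tanh(2.0 * d)


theta = 0.27
print("Haldane cM:", round(100 * haldane_distance(theta), 1))
print("Kosambi cM:", round(100 * kosambi_distance(theta), 1))
```

Both functions return 0 at θ = 0 and grow without bound as θ approaches 0.5, which is why θ = 0.5 itself is rejected.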

17
Men are genetically shorter than women
  • Total male map length = 2851 cM
  • Total female map length = 4296 cM (excluding the X)
  • Thus, over the 3000 Mb (megabases) autosomal genome:
  • 1 male cM averages 1.05 Mb
  • 1 female cM averages 0.88 Mb

18
Meiotic versus physical maps
  • meiotic maps measure distances in genetic
    distances, i.e. centimorgan
  • pretty coarse and often inaccurate
  • problem 1: which marker order?
  • problem 2: which mapping function?
  • physical maps measure distances in base pairs
  • extremely high resolution allows you to find the
    actual mutation
  • Connection between the 2 maps
  • rule of thumb: 1 cM equals 1 million base pairs
  • but this thumb is very crooked!!!

19
Computing the lod score
20
The likelihood
  • likelihood = probability of the data given the
    parameters
  • likelihoods are useful for estimation and for
    testing
  • example: phase-known, fully informative case
  • observed data: R = no. of recombinants, NR = no. of
    non-recombinants
  • parameter: the recombination fraction
    θ = Pr(recombination)
  • likelihood is proportional to θ^R (1 - θ)^NR
  • maximum likelihood estimate of θ: R/(R + NR)
  • use the log of the likelihood for mathematical
    convenience
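
As a small worked example of the phase-known case, the Python sketch below evaluates the log10 likelihood ratio against θ = 1/2 (the lod score) and the maximum likelihood estimate. The counts R = 1 and NR = 9 are hypothetical, and capping the estimate at 0.5 is a common convention assumed here rather than something stated on the slide.

```python
import math


def lod_phase_known(theta, r, nr):
    """log10 of [theta^r * (1 - theta)^nr] / [0.5^(r + nr)], for 0 < theta <= 0.5."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    log_lik = r * math.log10(theta) + nr * math.log10(1.0 - theta)
    log_lik_null = (r + nr) * math.log10(0.5)
    return log_lik - log_lik_null


def mle_theta(r, nr):
    """Maximum likelihood estimate R / (R + NR), capped at 0.5 (assumed convention)."""
    return min(r / (r + nr), 0.5)


# Hypothetical counts: 1 recombinant and 9 non-recombinants.
r, nr = 1, 9
theta_hat = mle_theta(r, nr)
print("theta_hat:", theta_hat)
print("lod at theta_hat:", round(lod_phase_known(theta_hat, r, nr), 3))
```

Working on the log scale turns the product of powers into a weighted sum, which is the mathematical convenience the slide refers to.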

21
Advantages of max. likelihood estimation
  • advantages
  • asymptotically most efficient,
  • high precision
  • asymptotically consistent
  • it will converge closer and closer to the true
    value
  • asymptotically unbiased
  • corresponding likelihood ratio test enjoys
    similar optimality criteria

22
How to compute lod scores?
Lod scores are computed for each pedigree (i) as
  Z_i(θ) = log10[ L_i(θ) / L_i(θ = 1/2) ]
For a given value of θ, pedigree-specific lod
scores are summed across the F families to yield
an overall lod score
  Z(θ) = Z_1(θ) + ... + Z_F(θ)
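
The summation across families can be sketched in a few lines of Python. This assumes, purely for illustration, that every family is phase-known and fully informative, so that each pedigree-specific lod score has a closed form; the three (R, NR) count pairs are hypothetical.

```python
import math


def lod_phase_known(theta, r, nr):
    """Pedigree-specific lod score for r recombinants and nr non-recombinants."""
    return (r * math.log10(theta)
            + nr * math.log10(1.0 - theta)
            - (r + nr) * math.log10(0.5))


def total_lod(theta, families):
    """Overall lod score: sum of the pedigree-specific lod scores at the same theta."""
    return sum(lod_phase_known(theta, r, nr) for r, nr in families)


# Hypothetical (recombinant, non-recombinant) counts for F = 3 families.
families = [(1, 9), (0, 6), (2, 8)]

# Evaluate the overall lod score on a grid of theta values and locate the peak.
grid = [i / 100 for i in range(1, 51)]
curve = [(theta, total_lod(theta, families)) for theta in grid]
best_theta, best_lod = max(curve, key=lambda pair: pair[1])
print("peak at theta =", best_theta, "with lod =", round(best_lod, 2))
```

Summing lod scores is valid because the families are independent, so their likelihoods multiply and the logs add.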


23
Example lod score calculation
[Pedigree drawing]
Message: disease status is not required....
24
2 point parametric linkage analysis
25
2 point parametric linkage analysis
  • Setting
  • genotype of 1 marker locus is known for family
    members
  • the genotypes of the other locus (disease
    susceptibility locus) are unknown
  • but the disease locus phenotype (affectation
    status) is known
  • GOAL
  • test whether the disease locus and marker are
    linked
  • Q: Why is it important?
  • A: If they are linked, the disease locus must be
    close to the marker, i.e. we have localized the
    disease gene.

26
Test for linkage is carried out in 3 steps
Step 1: use the disease status to infer the
underlying disease locus genotypes
Step 2: count the number of recombinations and
non-recombinations for the different possible
paternal phases
Step 3: compute the lod score and check whether it
is bigger than 3.0
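
Steps 2 and 3 can be sketched in Python for the case where the phase of the transmitting parent is unknown. The usual approach, assumed here and not spelled out on the slide, is to average the likelihood over the two possible phases with weight 1/2 each; the counts used are hypothetical.

```python
import math


def lod_phase_unknown(theta, k, n):
    """Lod score for n informative meioses when the parental phase is unknown.

    k is the number of recombinants counted under phase 1; under phase 2 the
    same offspring would be counted as n - k recombinants.
    """
    lik_phase1 = theta ** k * (1.0 - theta) ** (n - k)
    lik_phase2 = theta ** (n - k) * (1.0 - theta) ** k
    lik = 0.5 * (lik_phase1 + lik_phase2)   # each phase has prior probability 1/2
    if lik == 0.0:
        return float("-inf")
    return math.log10(lik) - n * math.log10(0.5)


# Hypothetical family: 10 offspring, none recombinant under phase 1.
k, n = 0, 10
grid = [i / 100 for i in range(0, 51)]
best_theta = max(grid, key=lambda t: lod_phase_unknown(t, k, n))
best_lod = lod_phase_unknown(best_theta, k, n)
print("max lod:", round(best_lod, 2), "at theta =", best_theta)
print("bigger than 3.0?", best_lod > 3.0)
```

With these counts the maximum is 9 × log10(2), about 2.71, so ten phase-unknown meioses without any recombinant still fall short of 3.0, whereas the same data with known phase would just reach it.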
27
DATA for a single pedigree
rare, fully penetrant, dominant disease
Grandpa: unaffected, marker genotype 22
Grandma: affected, marker genotype 11
Father: affected
28
Step 1-3
  • STEP 1
  • we assume that the disease locus carries 2
    alleles
  • since the disease is fully penetrant, the
    genotypes of the unaffecteds must equal dd
  • the genotype of the grandma is Dd or DD. Since
    the disease is rare, it is probably Dd.
  • thus we get the same pedigree as described
    earlier
  • STEPS 2-3 were already carried out earlier.

29
Parameters in parametric linkage analysis
30
Glitch for non-Mendelian diseases
  • the relation between disease locus genotypes and
    affectation status is in general very complex and
    can no longer be solved by inspection
  • need powerful statistical and computational
    methods
  • start with likelihood (easy to write down)
  • compute the likelihood (hard)

31
Most general form of the likelihood of pedigree
data
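  L = Σ_G Π_i Pen(X_i | G_i) Π_j Prior(G_j) Π_(k,l,m) Tran(G_m | G_k, G_l)

  • the sum is over all possible genotype assignments
    G; X_i is the phenotype and G_i the genotype of
    person i
  • the product over j is taken over all founders
    (priors are specified by allele frequencies)
  • the product over (k,l,m) is taken over all
    parent-offspring triples.
  • transmission probabilities depend on θ
  • for multiple markers (multipoint analysis), need
    to specify a mapping function, e.g., Kosambi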
32
Marker parameters
  • notation: marker alleles denoted here by 1, 2,
    ...
  • relation between marker genotype and phenotype
  • usually known (example: ABO blood group)
  • SNPs and microsatellites are codominant: the
    relation is trivial
  • allele frequencies p1, p2, ...
  • if parents are unavailable, the results may
    depend critically on getting them right. This
    also matters for homozygosity mapping.
  • vary between different populations
  • but can be estimated from the pedigree data
  • genetic marker map for multiple markers
  • marker order
  • genetic distance
  • increasingly accurate because of DNA sequencing

33
Disease locus parameters
  • notation: often 2 alleles, D (bad) and d (normal)
  • allele frequencies pD and pd
  • penetrances = P(affected | genotype)
  • fDD = P(affected | genotype DD)
  • fDd = P(affected | genotype Dd)
  • fdd = P(affected | genotype dd)
  • liability classes
  • fancy terminology for letting penetrances vary
    between individuals
  • example: different penetrances for men and
    women,
  • or age dependence: young versus old

34
The biology is modeled through penetrance values
  • fully penetrant, dominant disease, no
    phenocopies
  • fDD = fDd = 1, fdd = 0
  • fully penetrant, recessive disease, no
    phenocopies
  • fDD = 1, fDd = fdd = 0
  • no effect
  • fDD = fDd = fdd
  • incomplete penetrance: fDD < 1
  • definition: phenocopies are affecteds without
    disease genes
  • phenocopies are present if fdd > 0
  • for the experts: imprinting is modeled by using 4
    penetrances and keeping track of maternally and
    paternally transmitted alleles
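
The penetrance values combine with allele frequencies through Bayes' rule to give genotype probabilities for an affected person. The Python sketch below assumes Hardy-Weinberg genotype frequencies and a hypothetical disease allele frequency of 0.001; both are illustration choices, not values from the slides.

```python
def genotype_priors(p_disease_allele):
    """Hardy-Weinberg genotype frequencies (an assumption of this sketch)."""
    p = p_disease_allele
    q = 1.0 - p
    return {"DD": p * p, "Dd": 2.0 * p * q, "dd": q * q}


def posterior_given_affected(p_disease_allele, penetrance):
    """Return (P(genotype | affected), population prevalence) by Bayes' rule."""
    priors = genotype_priors(p_disease_allele)
    joint = {g: priors[g] * penetrance[g] for g in priors}
    prevalence = sum(joint.values())
    posterior = {g: joint[g] / prevalence for g in joint}
    return posterior, prevalence


# Fully penetrant dominant disease without phenocopies: fDD = fDd = 1, fdd = 0.
dominant = {"DD": 1.0, "Dd": 1.0, "dd": 0.0}
posterior, prevalence = posterior_given_affected(0.001, dominant)
print({g: round(prob, 4) for g, prob in posterior.items()})
print("prevalence:", round(prevalence, 6))
```

For a rare dominant disease almost all of the posterior mass falls on Dd, which is the argument behind treating an affected founder as a probable heterozygote.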

35
2-point versus multipoint linkage analysis
36
Two point mapping
  • computerized lod score analysis is the best way to
    analyze complex pedigrees for linkage with
    Mendelian traits
  • use computer software, e.g., Mendel
  • the result of a linkage analysis is a table of
    lod scores at various recombination fractions
  • the result can be plotted to give curves
  • regions with lod > 3 are linked and those with
    lod < -2 are excluded
  • the curve will peak at the most likely
    recombination fraction

37
Output of a 2 point linkage analysis
[Figure: lod score curve with regions marked "significant" and "excluded"]
  • Equivalently, consider the table

    θ     0.01   0.10   0.20   0.30   0.35   0.40   0.45   0.50
    lod   -5.0   -2.0    1.0    3.3    4.0    3.0    1.0    0.0
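
The table above can be read off programmatically. This Python sketch finds the peak and classifies each recombination fraction, using strict thresholds of lod > 3 for linkage and lod < -2 for exclusion; the -2 exclusion cutoff is the conventional value and is an assumption here.

```python
thetas = [0.01, 0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.50]
lods = [-5.0, -2.0, 1.0, 3.3, 4.0, 3.0, 1.0, 0.0]


def summarize(thetas, lods, linked_above=3.0, excluded_below=-2.0):
    """Return the peak (theta, lod) plus the theta values judged linked or excluded."""
    peak_index = max(range(len(lods)), key=lambda i: lods[i])
    peak = (thetas[peak_index], lods[peak_index])
    linked = [t for t, z in zip(thetas, lods) if z > linked_above]
    excluded = [t for t, z in zip(thetas, lods) if z < excluded_below]
    return peak, linked, excluded


peak, linked, excluded = summarize(thetas, lods)
print("most likely theta:", peak[0], "with lod", peak[1])
print("linked at theta =", linked)
print("excluded at theta =", excluded)
```

Values that sit exactly on a threshold, such as lod = 3.0 at θ = 0.40, are left unclassified by the strict inequalities.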

38
Multipoint mapping is more efficient than two
point mapping
  • idea: analyze data for more than 2 loci
    simultaneously
  • helps overcome limited informativeness of
    markers
  • especially relevant for SNPs
  • peak heights depend crucially on the precise
    distances between markers and the mapping
    function: problematic
  • highest peak marks the most likely location
  • powerful method for scanning the genome in 20-Mb
    segments

39
Standard lod score analysis is not without
problems
  • genotyping errors and misdiagnosis: loss of power
  • lead to spurious recombinants, which inflate the
    length of the genetic map
  • multi-locus maps can detect such errors by
    checking for double recombinants
  • locus heterogeneity is always a pitfall
  • mutations in unlinked loci may produce the same
    clinical phenotype
  • use Genehunter or Homog to test for homogeneity
  • computational difficulties limit the pedigrees
    that can be analyzed (na
    not really....)

40
Comparing different multipoint linkage analysis
algorithms
41
Limitations of the different methods
 
Slide from webpage http://watson.hgen.pitt.edu/docs/simwalk2.html
 
42
Computation times of the algorithms.
General-Pedigree Linkage Analysis Packages
 
 
43
Critical values for linkage tests
44
Distinction between pointwise (nominal) and
genome-wide significance
  • pointwise p-value = probability of exceeding the
    observed value at a given point, under H0: θ = 1/2
  • genome-wide p-value = probability that the
    observed value will be exceeded anywhere in the
    genome
  • reality check about p-values
  • if the p-value is below the significance level,
    the finding is significant
  • the smaller the p-value, the higher the
    statistical significance
  • genome-wide p-value > pointwise p-value

45
Lod score thresholds should ensure a .05
genome-wide false positive rate
  • genome-wide false positive rate α = chance of a
    false positive result occurring anywhere during a
    whole genome scan
  • for single point, classically want lod > 3.0
  • multipoint threshold for a Mendelian disease: 3.3
    (Lander and Schork 1994)
  • multipoint threshold for a complex disease:
    3.3-4.0 (depends on the study design, Lander and
    Kruglyak 1995)
  • pointwise p-value for significant linkage:
    5 × 10^(-5)
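
Lod thresholds and pointwise p-values can be connected with a standard large-sample approximation, sketched here in Python. The assumption, which is not stated on the slides, is that 2 ln(10) × lod behaves like a one-sided chi-square statistic with 1 degree of freedom under H0: θ = 1/2; the function names are illustrative.

```python
import math


def lod_to_chi2(lod):
    """Convert a lod score (log base 10) to a likelihood ratio chi-square statistic."""
    return 2.0 * math.log(10.0) * lod


def pointwise_p(lod):
    """Approximate one-sided pointwise p-value for a lod score (large-sample assumption)."""
    if lod <= 0.0:
        return 1.0
    chi2 = lod_to_chi2(lod)
    # For 1 degree of freedom, P(chi2_1 > x) = erfc(sqrt(x / 2)).
    # Halve it because theta can only deviate from 1/2 in one direction.
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))


for lod in (3.0, 3.3, 4.0):
    print(lod, f"{pointwise_p(lod):.1e}")
```

Under this approximation a lod score of 3.3 corresponds to a pointwise p-value of about 5 × 10^(-5), in line with the value quoted for significant linkage, and a lod score of 3.0 to about 10^(-4).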

46
How to relate the pointwise (αP) to the
genome-wide false positive rate (αG).
  • conservative Bonferroni correction
  • αP = αG/(no. of potential pointwise tests)
  • Example: no. of potential pointwise tests = no. of
    potential SNPs = 1 million, αG = 0.05, so αP =
    5 × 10^(-8)
  • ignores dependencies (linkage) between markers
  • Lander and Kruglyak 1995 found the asymptotic
    relation
  • αG(T) ≈ [C + 9.2 ρ G T] αP(T)
  • T = threshold lod score
  • C = number of chromosomes = 23
  • ρ = crossover rate, depends on relationship being
    studied, e.g., sibs
  • G = length of the genome in Morgans = 33
  • for sib pairs use 3.6 for IBD testing and 4.0 for
    IBS testing
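
Both corrections can be tried out numerically. The Python sketch below computes the Bonferroni pointwise level and evaluates the asymptotic genome-wide rate as (C + 9.2 ρ G T) times the pointwise rate. It assumes a one-sided chi-square approximation for the pointwise rate and uses a crossover rate of 1 purely as an illustrative value; neither choice is taken from the slides.

```python
import math


def bonferroni_pointwise(alpha_genomewide, n_tests):
    """Pointwise level that keeps the genome-wide rate below alpha_genomewide."""
    return alpha_genomewide / n_tests


def pointwise_rate(lod):
    """One-sided pointwise false positive rate for a lod threshold (assumed approximation)."""
    chi2 = 2.0 * math.log(10.0) * lod
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))


def genomewide_rate(lod, rho, n_chromosomes=23, genome_morgans=33.0):
    """Asymptotic genome-wide rate: (C + 9.2 * rho * G * T) * pointwise rate at T."""
    factor = n_chromosomes + 9.2 * rho * genome_morgans * lod
    return factor * pointwise_rate(lod)


print("Bonferroni:", bonferroni_pointwise(0.05, 1_000_000))
# rho = 1.0 is an illustrative assumption for the crossover rate.
for threshold in (3.0, 3.3, 3.6):
    print(threshold, round(genomewide_rate(threshold, rho=1.0), 3))
```

With these inputs a threshold of 3.3 gives a genome-wide rate close to 0.05, while the Bonferroni level for one million tests is far stricter because it ignores the dependence between neighboring markers.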

47
Linkage findings are controversial because of the
high false positive rate.
  • The smart money knows:
  • want to see a lod score > 4 (or even 5)
  • meiotic mapping techniques fail at detecting
    complex disease genes
  • if the disease is complex, it is a false
    positive.
  • if the effect is real, 2 point linkage analysis
    performs pretty well
  • How to avoid arguments over a finding?
  • replicate the finding in a different sample
  • find the mutation