Gene mapping: Linkage and association methods - PowerPoint PPT Presentation

Slides: 42
Provided by: cscu9


1
Gene mapping: Linkage and association methods
  • Disease gene mapping is one of the main purposes
    for genotyping
  • Two major approaches: linkage analysis and
    association analysis

2
Linkage analysis
  • Try to localize genes affecting specific
    phenotypes
  • Search for co-segregation of disease and marker
    alleles

3
Basics of Linkage Analysis
  1. Idea of Linkage Analysis
  2. Types of Linkage Analysis
  3. Parametric Linkage Analysis
  4. Conclusions

4
Basics of Linkage Analysis
  1. Idea of Linkage Analysis
  2. Types of Linkage Analysis
  3. Parametric Linkage Analysis
  4. Conclusions

5
Linkage Analysis
  • One of the two main approaches in gene mapping.
  • Uses pedigree data.

6
Genetic linkage and linkage analysis
  • Two loci are linked if they lie close enough to
    each other on the same chromosome that they tend
    to be inherited together.
  • The task of linkage analysis is to find markers
    that are linked to the hypothetical disease locus.
  • Complex diseases in focus → usually need to
    search for one gene at a time.
  • Requires mathematical modelling of meiosis.

7
Meiosis and crossover
  • The number of crossovers is thought to follow a
    Poisson distribution.
  • Their locations are generally random and
    independent of each other.
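If crossovers are modelled exactly as stated above (a Poisson process with independent, uniformly random locations, i.e. no crossover interference), two loci end up recombined whenever an odd number of crossovers falls between them. This gives Haldane's map function $\theta = \tfrac{1}{2}(1 - e^{-2d})$ for a map distance $d$ in Morgans, and shows why $\theta$ never exceeds 0.5. A minimal sketch comparing the formula with a simulation; the function names are illustrative, not from the lecture:

```python
import math
import random

def haldane_theta(d_morgans: float) -> float:
    """Recombination fraction under Poisson crossovers (Haldane's map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))

def simulate_theta(d_morgans: float, n_meioses: int = 50_000, seed: int = 1) -> float:
    """Share of simulated meioses with an odd number of crossovers between two loci."""
    rng = random.Random(seed)
    recombinant = 0
    for _ in range(n_meioses):
        # Crossovers occur at rate 1 per Morgan: exponential gaps between events.
        position = rng.expovariate(1.0)
        crossovers = 0
        while position < d_morgans:
            crossovers += 1
            position += rng.expovariate(1.0)
        # An odd number of crossovers between the loci gives a recombinant gamete.
        recombinant += crossovers % 2
    return recombinant / n_meioses

for d in (0.05, 0.2, 0.5, 2.0):
    print(f"d = {d} M: formula {haldane_theta(d):.3f}, simulated {simulate_theta(d):.3f}")
```

Real meioses show crossover interference, so Haldane's function is only an approximation; other map functions (e.g. Kosambi's) try to account for it.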

8
The simple idea
  • Recombination fraction $\theta$: always
    $0 \le \theta \le 0.5$
  • Task: find the $\theta$ that maximises
    $L(\theta \mid \text{data})$
  • Obtain a measure for the degree of evidence in
    favour of linkage (LOD score)
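In the simplest setting (an assumption for illustration: $n$ phase-known, fully informative meioses, $R$ of them recombinant), the likelihood is $L(\theta \mid \text{data}) = \theta^R (1-\theta)^{n-R}$, which is maximised at $\hat\theta = R/n$, truncated to 0.5 because of the bound on $\theta$. The sketch below uses a grid search rather than the closed form, since the same idea carries over to pedigree likelihoods that have no closed-form maximum. Counts and names are hypothetical:

```python
import math

def log_likelihood(theta: float, recombinants: int, meioses: int) -> float:
    """log L(theta | data) for phase-known, fully informative meioses."""
    nonrecombinants = meioses - recombinants
    ll = 0.0
    if recombinants > 0:
        if theta == 0.0:
            return -math.inf  # a recombinant is impossible when theta = 0
        ll += recombinants * math.log(theta)
    if nonrecombinants > 0:
        ll += nonrecombinants * math.log(1.0 - theta)
    return ll

def mle_theta(recombinants: int, meioses: int, steps: int = 5000) -> float:
    """Grid search for the theta in [0, 0.5] that maximises the likelihood."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda t: log_likelihood(t, recombinants, meioses))

# Hypothetical data: 3 recombinants among 20 meioses.
print(mle_theta(3, 20), min(3 / 20, 0.5))
```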


9
Markers and inheritance
  • Polymorphic loci whose locations are known
  • Most often SNPs or microsatellites
  • Inherited within the chromosomes

10
Markers and information
  • If two individuals carry the same allele label,
    they share the allele IBS (identical by state).
  • If two individuals carry an allele with the same
    (grand)parental origin, they share the allele IBD
    (identical by descent).
  • IBS sharing can easily be deduced from genotypes.
  • IBD sharing requires more information. One can
    try to deduce IBD sharing based on family
    structure and inheritance.
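IBS sharing is a function of the two genotypes alone, which is why it is easy to deduce. A minimal sketch, assuming genotypes are given as unordered pairs of allele labels (the function name is illustrative); IBD sharing cannot be computed this way because it also depends on the pedigree:

```python
from collections import Counter

def ibs_count(genotype1: tuple[int, int], genotype2: tuple[int, int]) -> int:
    """Number of alleles (0, 1 or 2) two unordered genotypes share identical by state."""
    shared = Counter(genotype1) & Counter(genotype2)  # multiset intersection
    return sum(shared.values())

# Two children with genotypes 1,2 and 1,3 share one allele IBS.
print(ibs_count((1, 2), (1, 3)))
```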

11
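Markers and information
[Pedigree figure: parents with genotypes 1,2 and 2,3;
children with genotypes 1,2 and 1,3]
The children share allele 1 IBS.
They also share it IBD: both received allele 1 from
the 1,2 parent.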
12
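Markers and information
[Pedigree figure: parents with genotypes 1,2 and 1,3;
children with genotypes 1,2 and 1,3]
The children share allele 1 IBS.
They do not share alleles IBD: the 1,2 child got
allele 1 from the 1,3 parent, while the 1,3 child got
it from the 1,2 parent.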
13
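Markers and information
[Pedigree figure: parents with genotypes 1,1 and 2,3;
children with genotypes 1,2 and 1,3]
The children share allele 1 IBS.
They either share or do not share it IBD: both got
allele 1 from the 1,1 parent, but it cannot be told
which of that parent's two copies each child got.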
14
Building blocks of linkage analysis
  • Marker maps
  • Pedigree structures
  • Genotypes
  • Phenotypes
15
Building blocks of linkage analysis
  • Information about the disease model (in
    parametric analysis):
    - $f(aa)$, probability of a homozygote for the
      disease allele a being affected
    - $f(Aa)$, probability of a heterozygote being
      affected
    - $f(AA)$, probability of a non-carrier being
      affected (phenocopy rate)
  • Assumed disease allele frequency
  • Marker allele frequencies
  • Information about environmental variables
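To see how these model components interact, one can compute the population prevalence $K = \sum_g P(g)\,f(g)$ and, by Bayes' rule, the genotype distribution among affected individuals, writing $f(g)$ for the probability that genotype $g$ is affected. A minimal sketch assuming a single diallelic disease locus (a = disease allele, A = normal allele) in Hardy-Weinberg equilibrium; the model values and names are hypothetical:

```python
def genotype_freqs(q: float) -> dict[str, float]:
    """Hardy-Weinberg genotype frequencies; q = frequency of the disease allele a."""
    p = 1.0 - q
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}

def prevalence(q: float, penetrance: dict[str, float]) -> float:
    """P(affected) = sum over genotypes g of P(g) * f(g)."""
    freqs = genotype_freqs(q)
    return sum(freqs[g] * penetrance[g] for g in freqs)

def genotype_given_affected(q: float, penetrance: dict[str, float]) -> dict[str, float]:
    """P(g | affected) by Bayes' rule."""
    freqs = genotype_freqs(q)
    k = prevalence(q, penetrance)
    return {g: freqs[g] * penetrance[g] / k for g in freqs}

# Hypothetical dominant model: f(aa) = f(Aa) = 0.9, phenocopy rate f(AA) = 0.01.
model = {"aa": 0.9, "Aa": 0.9, "AA": 0.01}
print(prevalence(0.01, model))
print(genotype_given_affected(0.01, model))
```

With a rare disease allele, even a 1 % phenocopy rate makes a sizeable share of affected individuals non-carriers, which is one reason the phenocopy rate is part of the model.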

16
Basics of Linkage Analysis
  1. Idea of Linkage Analysis
  2. Types of Linkage Analysis
  3. Parametric Linkage Analysis
  4. Conclusions

17
Types of linkage analysis
  • Parametric vs. non-parametric
  • Dichotomous vs. continuous phenotypes
  • Elston-Stewart vs. Lander-Green vs. heuristic
  • Two-point vs. multipoint
  • Genome scan vs. candidate gene

18
Basics of Linkage Analysis
  1. Idea of Linkage Analysis
  2. Types of Linkage Analysis
  3. Parametric Linkage Analysis
  4. Conclusions

19
Maximum likelihood estimation
  • A common approach in statistical estimation
  • Define hypotheses
  • Generate likelihood function
  • Estimate
  • Test hypotheses
  • Draw statistical conclusions

20
Hypotheses in linkage analysis
  • $H_0$: $\theta = 0.5$
  • the disease locus is not linked to the marker(s)
  • $H_A$: $\theta < 0.5$
  • the disease locus is linked to the marker(s)

21
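Likelihood function for a single nuclear family
  • $L_j = \sum_{g_F} P(g_F)\,P(y_F \mid g_F) \sum_{g_M} P(g_M)\,P(y_M \mid g_M) \prod_i \sum_{g_{O_i}} P(g_{O_i} \mid g_F, g_M)\,P(y_{O_i} \mid g_{O_i})$

$g$: genotypes, $y$: phenotypes; $P(g)$ are genotype
probabilities, $P(y \mid g)$ phenotype probabilities
($F$ = father, $M$ = mother, $O_i$ = $i$-th offspring)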
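The nested sums can be evaluated directly for a small case. A minimal sketch, assuming a single diallelic disease locus with no marker (genotypes coded as the number of disease alleles 0, 1, 2), Hardy-Weinberg founder probabilities $P(g)$, Mendelian transmission $P(g_{O_i} \mid g_F, g_M)$ and penetrances as $P(y \mid g)$; all names are hypothetical:

```python
from itertools import product

def founder_prob(g: int, q: float) -> float:
    """P(g) under Hardy-Weinberg; g = number of disease alleles (0, 1, 2)."""
    return [(1 - q) ** 2, 2 * q * (1 - q), q ** 2][g]

def transmission(g_child: int, g_father: int, g_mother: int) -> float:
    """Mendelian P(g_child | g_father, g_mother)."""
    pf, pm = g_father / 2, g_mother / 2  # P(parent transmits a disease allele)
    return [(1 - pf) * (1 - pm), pf * (1 - pm) + (1 - pf) * pm, pf * pm][g_child]

def phenotype_prob(affected: bool, g: int, penetrance: list[float]) -> float:
    """P(y | g); penetrance[g] = P(affected | g disease alleles)."""
    return penetrance[g] if affected else 1 - penetrance[g]

def nuclear_family_likelihood(y_father: bool, y_mother: bool, y_children: list[bool],
                              q: float, penetrance: list[float]) -> float:
    total = 0.0
    for g_f, g_m in product(range(3), repeat=2):  # sums over g_F and g_M
        term = (founder_prob(g_f, q) * phenotype_prob(y_father, g_f, penetrance)
                * founder_prob(g_m, q) * phenotype_prob(y_mother, g_m, penetrance))
        for y_o in y_children:  # product over offspring
            # Sum over each child's genotype given the parental genotypes.
            term *= sum(transmission(g_o, g_f, g_m) * phenotype_prob(y_o, g_o, penetrance)
                        for g_o in range(3))
        total += term
    return total

# Hypothetical family: unaffected parents, two affected children.
print(nuclear_family_likelihood(False, False, [True, True], 0.01, [0.01, 0.9, 0.9]))
```

In a real linkage analysis the genotypes would be joint disease-marker genotypes, and $\theta$ would enter through the transmission probabilities.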
22
Several independent families
  • The likelihood functions of multiple independent
    families are combined:
  • $L = \prod_j L_j$ or $\log L = \sum_j \log L_j$
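In practice the log form is used because the product of many small family likelihoods quickly underflows floating-point arithmetic. A minimal illustration with hypothetical values:

```python
import math

# Hypothetical per-family likelihoods L_j for 100 independent families.
family_likelihoods = [1e-5] * 100

product_form = math.prod(family_likelihoods)  # 1e-500 underflows to 0.0
log_form = sum(math.log10(lik) for lik in family_likelihoods)  # log10 L = -500
print(product_form, log_form)
```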

23
Testing of hypotheses
  • Compute values of the likelihood function under
    the null and alternative hypotheses.
  • Their relationship is expressed by the LOD score,
    $\text{LOD}(\theta) = \log_{10}[L(\theta)/L(0.5)]$
    (essentially derived from the likelihood ratio
    test statistic).
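For phase-known meioses (an illustrative assumption), $\text{LOD}(\theta) = \log_{10}[L(\theta)/L(0.5)]$ has a simple form, and because independent families multiply their likelihoods, their LOD scores simply add. A minimal sketch scanning $\theta$ over a grid for hypothetical family data:

```python
import math

def lod(theta: float, recombinants: int, meioses: int) -> float:
    """Two-point LOD score for phase-known meioses: log10[L(theta) / L(0.5)]."""
    r, n = recombinants, meioses
    log10_l = (r * math.log10(theta) if r > 0 else 0.0) + (n - r) * math.log10(1.0 - theta)
    return log10_l - n * math.log10(0.5)

# Hypothetical (recombinants, meioses) for three independent families.
families = [(0, 5), (1, 6), (0, 4)]
grid = [i / 100 for i in range(1, 51)]  # theta = 0.01 ... 0.50
# LOD scores of independent families add up, just like their log-likelihoods.
total = {t: sum(lod(t, r, n) for r, n in families) for t in grid}
best = max(total, key=total.get)
print(best, round(total[best], 2))
```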

24
On significance levels
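  • The p-value is the probability, assuming the null
    hypothesis is true, of a test statistic at least
    as extreme as the one observed; the significance
    level is the probability of rejecting a null
    hypothesis even though it was true.
  • A LOD-score threshold of 3 corresponds to a
    single-test p-value of approximately 0.0001.
  • Often, the significant regions pointed out are
    quite large, 10-40 cM (tens of millions of base
    pairs).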
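The LOD 3 ≈ 0.0001 correspondence can be checked with the usual large-sample argument (an assumption of this sketch, not a derivation from the lecture): the likelihood ratio statistic is $2\ln(10)\cdot\text{LOD} \approx 4.6\,\text{LOD}$, and because $\theta \le 0.5$ puts the null value on the boundary, its null distribution is a 50:50 mixture of zero and $\chi^2_1$, so the p-value is half the $\chi^2_1$ upper tail:

```python
import math

def lod_to_pvalue(lod: float) -> float:
    """Pointwise p-value for a two-point LOD score (one-sided, 1 df)."""
    chi2 = 2 * math.log(10) * lod  # likelihood ratio statistic
    # P(chi^2_1 > x) = erfc(sqrt(x / 2)); halve it for the boundary mixture.
    return 0.5 * math.erfc(math.sqrt(chi2 / 2))

print(lod_to_pvalue(3.0))
```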

25
[Figure: LOD score plotted against the recombination fraction $\theta$ (0 to 0.5)]
LOD > 3 is taken as evidence of linkage.
26
Basics of Linkage Analysis
  1. Idea of Linkage Analysis
  2. Types of Linkage Analysis
  3. Parametric Linkage Analysis
  4. Conclusions

27
Conclusions
  • Linkage analysis is a pedigree-based approach to
    gene mapping.
  • Parametric vs. nonparametric methods.
  • Hypothesis-driven vs. explorative analysis.
  • Meta-analysis (integration of several studies
    into one big study) is becoming increasingly
    popular.

28
Fine mapping and association analysis
  • What to do after a successful linkage analysis?
  • How can the linked region be narrowed down to
    where the disease susceptibility locus actually
    lies?
  • Outline of the rest of the lecture:
  • Allelic association
  • $\chi^2$ test
  • LD mapping

29
Allelic association
  • An example: a leukaemia study, in which a number
    of affected and healthy control persons have been
    contacted for DNA samples
  • A candidate gene has been suggested: GSTM1, which
    functions in the metabolism of benzene
  • GSTM1 has two different alleles, 1 and 2, where
  • a person is positive for allele 1 if his or her
    genotype is 1/1 or 1/2
  • a person is null if his or her genotype is 2/2
  • The numbers of leukaemic and control individuals
    either positive or null with respect to allele 1
    are compared by a $\chi^2$ test in order to find
    out whether there is a statistically significant
    difference

30
Allelic association
  • Results: observed frequencies
  • Expected frequencies

31
Test statistic
  • The observed frequencies are compared to the
    expected frequencies (null hypothesis $H_0$:
    carrier status and disease occurrence are
    independent of each other).
  • Test statistic:
    $\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$
  • where
  • $o_i$ is the observed frequency for class $i$,
    $e_i$ the expected frequency for class $i$
  • $k$ is the number of classes
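For a contingency table, the expected count of each cell under independence is (row total × column total) / grand total. A minimal sketch of the whole statistic; the counts are hypothetical, not the study's data:

```python
def chi_square(observed: list[list[float]]) -> float:
    """Pearson chi-square for an r x s table; expected counts from the margins."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under H0
            stat += (o - e) ** 2 / e
    return stat

# Hypothetical 2 x 2 table: rows = cases / controls, columns = positive / null.
print(round(chi_square([[10, 20], [30, 40]]), 4))
```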

32
Allelic association
  • Now, $\chi^2 = 111.39$.
  • Degrees of freedom for the test:
    $df = (r-1)(s-1)$, where $r$ is the number of
    rows and $s$ the number of columns
  • Here, $df = (2-1)(2-1) = 1$
  • The $\chi^2$ value is then compared to the
    critical values of the null $\chi^2$ distribution
    (for the given df)
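With one degree of freedom, the exact p-value is easy to get because a $\chi^2_1$ variable is the square of a standard normal one, so $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$. A minimal sketch (function names are illustrative):

```python
import math

def degrees_of_freedom(rows: int, cols: int) -> int:
    """df = (r - 1)(s - 1) for an r x s contingency table."""
    return (rows - 1) * (cols - 1)

def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability P(chi^2_1 > x), using chi^2_1 = Z^2."""
    return math.erfc(math.sqrt(x / 2))

print(degrees_of_freedom(2, 2), chi2_sf_df1(111.39))
```

For the value above the p-value is far below any of the conventional significance levels.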

33
$\chi^2$-distribution critical values for chosen
significance levels

df \ p    0.10    0.05    0.025   0.01    0.005
 1        2.71    3.84    5.02    6.63    7.88
 2        4.61    5.99    7.38    9.21   10.60
 3        6.25    7.81    9.35   11.34   12.84
 4        7.78    9.49   11.14   13.28   14.86
 5        9.24   11.07   12.83   15.09   16.75
 6       10.64   12.59   14.45   16.81   18.55
 7       12.02   14.07   16.01   18.48   20.28
 8       13.36   15.51   17.53   20.09   21.96
 9       14.68   16.92   19.02   21.67   23.59
10       15.99   18.31   20.48   23.21   25.19
11       17.28   19.68   21.92   24.73   26.76

When the observed value of the test statistic is
greater than the critical value (for the chosen
significance level) given in the table, the null
hypothesis can be rejected.
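The table entries can be reproduced from the $\chi^2$ distribution itself, since $P(\chi^2_{df} > x) = 1 - P(df/2, x/2)$ with $P$ the regularized lower incomplete gamma function. A minimal standard-library sketch (an assumption of convenience; SciPy's `scipy.stats.chi2.sf` computes the same quantity more robustly):

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """P(chi^2_df > x) via the series for the regularized lower incomplete gamma.

    Adequate for moderate x such as the table values; for very large x the
    subtraction 1 - P loses precision.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= z / (a + n)  # next series term z^n / (a (a+1) ... (a+n))
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return 1.0 - lower

# The df = 4 row: each critical value should give roughly its column's p.
for crit in (7.78, 9.49, 11.14, 13.28, 14.86):
    print(crit, round(chi2_sf(crit, 4), 4))
```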
34
Allelic association
  • The value we obtained, $\chi^2 = 111.39$, exceeds
    all critical values for df = 1 given in the table.
    We conclude that $H_0$ can be rejected; thus,
    there is a statistically significant difference
    between the affected and healthy individuals with
    respect to GSTM1 genotypes.
  • The relative frequencies of null and positive
    genotypes lead to the same conclusion.
  • It seems that different GSTM1 genotypes, by
    changing the benzene metabolism, considerably
    affect the probability of getting leukaemia.

35
  • Note: compared to linkage analysis, which is
    based on the observed inheritance patterns in
    pedigrees, association analysis studies the
    correlation between allele presence and a disease
    at the population level.
  • We find an allele or a haplotype overrepresented
    in affected individuals.
  • BUT statistical correlation does not imply a
    causal relationship!
  • Quite often, the associating allele or haplotype
    is not the cause of the disease itself, but is
    merely correlated with the presence of the actual
    susceptibility gene on the same chromosome. It is
    then said to be in linkage disequilibrium (LD)
    with the disease gene.
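Linkage disequilibrium is commonly quantified from haplotype frequencies by $D = p_{AB} - p_A p_B$, its normalized form $D' = D / D_{\max}$ (Lewontin) and $r^2 = D^2 / (p_A p_a p_B p_b)$. A minimal sketch, assuming haplotype frequencies are already known (in practice they are estimated from genotypes, e.g. with an EM algorithm) and that both loci are polymorphic; names are illustrative:

```python
def ld_measures(p_ab: float, p_a: float, p_b: float) -> tuple[float, float, float]:
    """Return (D, D', r^2) for allele A at one locus and allele B at another.

    p_ab: frequency of haplotypes carrying both A and B;
    p_a, p_b: allele frequencies of A and B (both strictly between 0 and 1).
    D' keeps the sign of D.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2

# Hypothetical haplotype frequencies: AB 0.5, Ab 0.1, aB 0.1, ab 0.3.
print(ld_measures(0.5, 0.6, 0.6))
```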

36
[Figure: the original mutation arises on one chromosome in the founder population and is followed over time to the current generation and an affected pedigree (labels A, B, C)]
37
LD mapping
  • The marker itself is NOT the reason for the
    disease, but it is located near the disease
    susceptibility gene, and there is a correlation
    between the presence of a certain marker allele
    and the disease gene allele (LD).
  • The correlation, i.e. LD, is based on the founder
    effect: the disease allele arose a long time ago
    on a certain ancestral chromosome, and the
    majority of disease alleles existing today
    descend from that original mutation.
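Ignoring random forces such as genetic drift, selection and new mutations, the founder association erodes at a predictable rate: the expected share of present-day disease chromosomes with no recombination between the disease locus and a marker at recombination fraction $\theta$ since the founder is $(1-\theta)^t$ after $t$ generations (and in an infinite random-mating population $D_t = (1-\theta)^t D_0$). A minimal sketch with hypothetical function names, showing why only tightly linked markers keep the founder signal over many generations:

```python
import math

def founder_segment_retention(theta: float, generations: int) -> float:
    """Expected share of disease chromosomes still carrying the founder marker segment."""
    return (1.0 - theta) ** generations

def generations_to_halve(theta: float) -> float:
    """Generations until half of the founder association is expected to be lost."""
    return math.log(0.5) / math.log(1.0 - theta)

for theta in (0.001, 0.01, 0.05):
    print(theta, round(founder_segment_retention(theta, 100), 3),
          round(generations_to_halve(theta)))
```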

38
LD mapping: utilizing the founder effect
39
Data
[Figure: example data with disease status (a, c) and marker alleles coded 1/2 at SNP1, S2, ... around the disease locus; ? marks unknown values]

40
Many approaches, several programs
  • old-fashioned allele association with some
    simple test (problem: multiple testing)
  • TDT; modelling of the LD process: Bayesian, EM
    algorithm, integrated linkage and LD
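The TDT is listed above without detail. In its standard form it looks only at parents heterozygous for the tested allele: if $b$ counts transmissions of that allele to affected children and $c$ transmissions of the other allele, then $(b-c)^2/(b+c)$ is compared to $\chi^2_1$. A minimal sketch with hypothetical counts and an illustrative function name:

```python
import math

def tdt(transmitted: int, untransmitted: int) -> tuple[float, float]:
    """Transmission/disequilibrium test from heterozygous parents of affected children.

    transmitted: times the tested allele was passed on to an affected child;
    untransmitted: times the parent's other allele was passed on instead.
    Returns the McNemar-type statistic and its 1-df chi-square p-value.
    """
    b, c = transmitted, untransmitted
    stat = (b - c) ** 2 / (b + c)
    return stat, math.erfc(math.sqrt(stat / 2))

print(tdt(60, 40))  # hypothetical counts
```

Because it compares transmitted and untransmitted alleles within families, the TDT is robust to the population structure that can produce spurious case-control associations.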

41
Limitations: LD is a random process
  • The amount of LD changes continuously but
    slowly, driven by natural forces such as
  • genetic drift
  • population structure
  • natural selection
  • new mutations
  • founder effect
  • Because of these forces, even if two pairs of
    loci are exactly the same distance apart, their
    amount of LD may vary a lot.
  • → This limits the accuracy of LD mapping, though
    it is still much more accurate than linkage
    analysis in pinpointing the location of a disease
    gene.