Title: Statistical Issues in Human Genetics
1Statistical Issues in Human Genetics
- Jonathan L. Haines Ph.D.
- Center for Human Genetics Research
- Vanderbilt University Medical Center
2COMMON COMPLEX DISEASE
Environment
Genes
4What Can The Genes Tell Us?
- Give us a better understanding of the underlying biology of the trait in question
- Serve as direct targets for better treatments
  - Pharmacogenetics
  - Interventions
- Give us better predictions of who might develop disease
- Give us better predictions of the course of the disease
- Lead to knowledge that can help find a cure or prevention
5- Watson and Crick started it all in 1953 with the description of the structure of DNA
- The 53-year anniversary of the paper will be in April.
- Both won the Nobel Prize
7The DNA Between Individuals is 99.9% Identical. All differences are in the 0.1% of DNA that varies.
A C C G T C C A G G
A C C G T G C A G G
It's hard to believe sometimes!
8HUMAN CHROMOSOMES
9Single-Nucleotide Polymorphisms (SNPs): One of the most common types of variation
1st Chromosome: GATCCTGTAGCT
2nd Chromosome: GATCCTCTAGCT
The two chromosomes differ at a single position (G/C).
[Figure: the G and C versions of the sequence compared between Affected and Normal individuals]
- Extremely frequent across the genome (1/400 bp): high resolution
- Easy to genotype: high-throughput techniques
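A minimal sketch of how the single-base difference on this slide can be located by comparing two aligned sequences. The function name `snp_positions` is hypothetical, and the sketch assumes the two sequences are already aligned and of equal length, which real genotyping data would not guarantee.

```python
def snp_positions(seq_a, seq_b):
    """Return (1-based position, base in seq_a, base in seq_b) for each mismatch.

    Assumes the two sequences are already aligned and have equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [
        (index + 1, base_a, base_b)
        for index, (base_a, base_b) in enumerate(zip(seq_a, seq_b))
        if base_a != base_b
    ]


# The two chromosome sequences shown on the slide.
first_chromosome = "GATCCTGTAGCT"
second_chromosome = "GATCCTCTAGCT"

differences = snp_positions(first_chromosome, second_chromosome)
print(differences)
```

For the slide's sequences this reports one difference, at position 7, where G on the first chromosome is replaced by C on the second.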
10What are We Looking For?
Earth : Human Genome
City : Chromosome
Street : Band
Address : Gene (DNA)
11Haystack: 640 cubic yards; Human genome: 3,000 MB
Needle: 1/100 cubic inch; Single base: 1 x 10^-6 MB
It really is like finding a needle in a haystack! (and a very BIG haystack, at that)
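The needle-in-a-haystack comparison can be checked with a little arithmetic. The sketch below assumes that the 640 cubic yards and 1/100 cubic inch describe the haystack and the needle, and that 3,000 MB and 1 x 10^-6 MB describe the genome and a single base; the variable names are illustrative only.

```python
# One cubic yard is 36 x 36 x 36 cubic inches.
CUBIC_INCHES_PER_CUBIC_YARD = 36 ** 3

haystack_cubic_inches = 640 * CUBIC_INCHES_PER_CUBIC_YARD
needle_cubic_inches = 1 / 100

genome_megabases = 3000.0
target_megabases = 1e-6  # one base pair expressed in megabases

# How many "needles" fit in the haystack, and how many bases are in the genome.
haystack_ratio = haystack_cubic_inches / needle_cubic_inches
genome_ratio = genome_megabases / target_megabases

print(f"haystack / needle : {haystack_ratio:.3e}")
print(f"genome / base     : {genome_ratio:.3e}")
```

Both ratios come out at roughly three billion, so under these assumptions the analogy is close to scale.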
12The Genome Sequence is not THE answer!
13Disease Gene Discovery In Complex Disease
1. Define Phenotype: a. Consistency b. Accuracy
2. Define the Genetic Component: a. Twin Studies b. Adoption Studies c. Family Studies d. Heritability e. Segregation Analysis
3. Define Experimental Design
4. Ascertain Families: a. Case-Control b. Singleton c. Sib Pairs d. Affected Relative Pairs
5. Collect Data: a. Family Histories b. Clinical Results c. Risk Factors d. DNA Samples
6. Perform Genotype Generation: a. Genomic Screen b. Candidate Gene
7. Analyze Data: a. Model-dependent (lod score) b. Model-independent (sib-pair, relative pair) c. Association studies (case-control, family-based)
8. Identify, Test, and Localize Regions of Interest
9. Bioinformatics and Gene Identification
10. Identify Susceptibility Variation(s)
11. Define Interactions: a. Gene-Gene b. Gene-Environment
14CLASSES OF HUMAN GENETIC DISEASE
- Diseases of Simple Genetic Architecture
  - Can tell how trait is passed in a family: follows a recognizable pattern
  - One gene per family
  - Often called Mendelian disease
  - Usually quite rare in population
  - Causative gene
- Diseases of Complex Genetic Architecture
  - No clear pattern of inheritance
  - Moderate to strong evidence of being inherited
  - Common in population: cancer, heart disease, dementia, etc.
  - Involves many genes or genes and environment
  - Susceptibility genes
16Modes of Inheritance
- Autosomal Dominant
  - Huntington disease
- Autosomal Recessive
  - Cystic fibrosis
- X-linked
  - Duchenne muscular dystrophy
- Mitochondrial
  - Leber optic atrophy
- Additive
  - HLA-DR in multiple sclerosis
- Combinations of the above
  - Retinitis pigmentosa (RP, 39 loci), nonsyndromic deafness
17Linkage Analysis
- Traces the segregation of the trait through a family
- Traces the segregation of the chromosomes through a family
- Statistically measures the correlation of the segregation of the trait with the segregation of the chromosome
18A SAMPLE PEDIGREE
The RED chromosome is key
19Measures of Linkage: Parametric vs. Non-Parametric
- Two major approaches toward linkage analysis
- Parametric: Defines a genetic model of the action of the trait locus (loci). This allows more complete use of the available data (inheritance patterns and phenotype information).
  - The historical approach toward linkage analysis. Development was driven by the need to map simple Mendelian diseases
  - Quite powerful when the model is correctly defined
- Non-Parametric: Uses either a partial genetic model or no genetic model. Relies on estimates of allele/haplotype/region sharing across relatives. Makes far fewer assumptions about the action of the underlying trait locus (loci).
20Linkage Analysis
- Families
  - Affected sibpairs
  - Affected relative pairs
  - Extended families
- Traits
  - Qualitative (affected or not)
  - Quantitative (ordinal, continuous)
- There are numerous different methods that can be applied
- These methods differ dramatically depending on the types of families and traits
21Recombination: Nature's way of making new combinations of genetic variants
[Figure: four panels, A through D]
A. A diploid cell.
B. DNA replication and pairing of homologous chromosomes to form a bivalent.
C. Chiasmata are formed between the chromatids of homologous chromosomes.
D. Recombination is complete by the end of prophase I.
22Linkage Analysis in Humans
- Measure the rate of recombination between two or more loci on a chromosome
- Can be done with any loci, but the primary application is to find the location of a trait variant by measuring linkage to known marker variants.
23LOD Score Analysis
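The likelihood ratio (LR) as defined by Morton (1955):
$\mathrm{LR} = \dfrac{L(\text{pedigree} \mid \theta = x)}{L(\text{pedigree} \mid \theta = 0.50)}$
where $\theta$ represents the recombination fraction and $0 \le x \le 0.49$.
When all meioses are scorable, the LR is constructed as
$\mathrm{LR} = \dfrac{\theta^{R}\,(1-\theta)^{NR}}{(0.5)^{R+NR}}$
where $R$ and $NR$ are the numbers of recombinant and non-recombinant meioses.
The LOD score ($z$) is $\log_{10}(\mathrm{LR})$.
$z(\theta)$ is the lod score at a particular value of the recombination fraction.
$z(\hat{\theta})$ is the maximum lod score, which occurs at the MLE of the recombination fraction, $\hat{\theta}$.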
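A minimal sketch of the lod score calculation for the simplest case, where every meiosis can be scored as recombinant or non-recombinant. It assumes the standard binomial form of the likelihood, theta^R (1 - theta)^NR compared against 0.5^(R + NR); the function names, the grid search over theta, and the counts in the example (2 recombinants in 10 meioses) are illustrative assumptions, not data from the talk.

```python
import math


def lod_score(recombinants, non_recombinants, theta):
    """log10 likelihood ratio of linkage at theta versus no linkage (theta = 0.5).

    Assumes every meiosis is scorable, so the likelihood is binomial.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    total = recombinants + non_recombinants
    if theta == 0.0 and recombinants > 0:
        # Any observed recombinant rules out complete linkage.
        return float("-inf")
    log_numerator = non_recombinants * math.log10(1.0 - theta)
    if recombinants > 0:
        log_numerator += recombinants * math.log10(theta)
    return log_numerator - total * math.log10(0.5)


def max_lod(recombinants, non_recombinants, steps=50):
    """Grid search for the theta in [0, 0.5] that maximises the lod score."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    best_theta = max(grid, key=lambda t: lod_score(recombinants, non_recombinants, t))
    return best_theta, lod_score(recombinants, non_recombinants, best_theta)


# Hypothetical counts: 2 recombinants among 10 scorable meioses.
theta_hat, z_max = max_lod(2, 8)
print(f"theta_hat = {theta_hat:.2f}, maximum lod = {z_max:.3f}")
```

With these counts the maximum falls at theta = 0.20, the observed fraction of recombinants, and the lod score at theta = 0.5 is zero by construction.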
25Study Designs
- Linkage Analysis
  - Large Families
  - Small Families
- Association Studies
  - Family-Based
  - Case-Control
26Linkage vs. Association
- Linkage: Shared within Families
- Association: Shared across Families
27TESTING CANDIDATE GENES
Disease: 5/20
Normal: 5/20
Gene is not important
28TESTING CANDIDATE GENES
Disease: 10/20
Normal: 5/20
Gene may be important
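The two candidate-gene slides can be turned into a small worked comparison. The sketch below assumes that "10/20" means 10 of 20 individuals carry the variant, and it uses a plain Pearson chi-square on the 2x2 table without continuity correction; the function names are hypothetical and this is one reasonable way to compare the groups, not necessarily the method the author had in mind.

```python
def odds_ratio(case_carriers, case_noncarriers, control_carriers, control_noncarriers):
    """Odds of carrying the variant in cases relative to controls."""
    return (case_carriers * control_noncarriers) / (case_noncarriers * control_carriers)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Slide 27: 5/20 carriers among disease, 5/20 among normal.
equal_groups = chi_square_2x2(5, 15, 5, 15)

# Slide 28: 10/20 carriers among disease, 5/20 among normal.
unequal_groups = chi_square_2x2(10, 10, 5, 15)
unequal_odds = odds_ratio(10, 10, 5, 15)

print(f"5/20 vs 5/20  : chi-square = {equal_groups:.2f}")
print(f"10/20 vs 5/20 : chi-square = {unequal_groups:.2f}, odds ratio = {unequal_odds:.1f}")
```

With only 20 individuals per group, the second comparison gives a chi-square of about 2.67, below the 3.84 needed for significance at the 0.05 level with 1 df, which fits the slide's cautious wording that the gene "may be" important.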
29Two Basic Study Designs for Association Analysis
- Case-Control
  - Advantages
    - Power
    - Ascertainment
  - Disadvantages
    - Sensitivity to assumptions
    - Matching
- Family-Based
  - Parent-child Trio
  - Discordant sibpairs
  - Advantages
    - Use existing samples
    - Robustness to assumptions
  - Disadvantages
    - Ascertainment
    - Power
30METHODS FOR FAMILY-BASED ASSOCIATION STUDIES
- Sibship
  - SDT
  - WSDT
  - FBAT
- Pedigree
  - Transmit
  - PDT
  - FBAT
- Parent-Child
  - AFBAC
  - TDT
  - HHRR
  - QTDT
- Sibpair
  - S-TDT
  - DAT
31TRANSMISSION DISEQUILIBRIUM TEST (TDT)
- Examines transmission of alleles to affected individuals
- Requires
  - Linkage (transmission through meioses) and
  - Association (specific alleles)
- Test of linkage if association assumed
- Test of association if linkage assumed
- Test of linkage AND association if neither assumed
- Uses the non-transmitted alleles, effectively, as the control group. Can make a pseudocontrol by creating a genotype from the two non-transmitted alleles
- Requires phenotype only for the child
32TDT calculation
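[Figure: trio pedigree with genotypes 12, 12, and 11, beside a 2x2 table of transmitted alleles (1, 2) against non-transmitted alleles (1, 2)]
With ≥5 per cell, this follows a χ² distribution with 1 df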
33TDT
Parents: 12 and 12; child: 11

                      Transmitted
                        1    2
Not transmitted    1    0    0
                   2    2    0
34TDT
Parents: 22 and 12; child: 12

                      Transmitted
                        1    2
Not transmitted    1    0    0
                   2    1    1
35TDT
Parents: 22 and 11; child: 12

                      Transmitted
                        1    2
Not transmitted    1    1    0
                   2    0    1
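The three trio slides above can be reproduced with a short tally. This is a sketch only: the function names are hypothetical, genotypes are written as pairs of allele labels, and the order of the two parents is arbitrary because the slides do not say which parent is which. Homozygous parents are counted here because the slides count them, although they land on the diagonal of the table and do not affect the TDT statistic.

```python
from collections import Counter


def transmissions(parent_a, parent_b, child):
    """Return one (transmitted, non_transmitted) allele pair per parent."""
    first, second = child
    # Try both ways of assigning the child's two alleles to the two parents.
    for from_a, from_b in ((first, second), (second, first)):
        if from_a in parent_a and from_b in parent_b:
            pairs = []
            for parent, sent in ((parent_a, from_a), (parent_b, from_b)):
                kept = parent[1] if parent[0] == sent else parent[0]
                pairs.append((sent, kept))
            return pairs
    raise ValueError("child genotype is inconsistent with the parents")


def tally(trios):
    """Count (transmitted, non_transmitted) pairs over a list of trios."""
    table = Counter()
    for parent_a, parent_b, child in trios:
        table.update(transmissions(parent_a, parent_b, child))
    return table


# The three trios from the slides: (parent, parent, child).
slide_trios = [
    ((1, 2), (1, 2), (1, 1)),
    ((2, 2), (1, 2), (1, 2)),
    ((2, 2), (1, 1), (1, 2)),
]

combined = tally(slide_trios)
for (sent, kept), count in sorted(combined.items()):
    print(f"transmitted {sent}, not transmitted {kept}: {count}")
```

Tallying each trio separately gives the same cell counts as the corresponding slide; the combined table simply adds them together.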
36TDT Example
[Table: transmitted alleles (1, 2) against non-transmitted alleles (1, 2)]
$\mathrm{TDT} = \dfrac{(42 - 25)^2}{42 + 25} = 4.31$
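The example's arithmetic can be sketched in a few lines. The statistic compares the two off-diagonal cells of the transmitted versus non-transmitted table, here 42 and 25; the function names are hypothetical, and the p-value uses the identity that a chi-square variable with 1 df is a squared standard normal, so its upper tail is `erfc(sqrt(x / 2))`.

```python
import math


def tdt_statistic(b, c):
    """TDT chi-square from the two discordant cell counts b and c."""
    if b + c == 0:
        raise ValueError("no informative transmissions")
    return (b - c) ** 2 / (b + c)


def chi_square_1df_p_value(statistic):
    """Upper-tail probability of a chi-square distribution with 1 df."""
    return math.erfc(math.sqrt(statistic / 2.0))


statistic = tdt_statistic(42, 25)
p_value = chi_square_1df_p_value(statistic)
print(f"TDT = {statistic:.2f}, p = {p_value:.3f}")
```

The statistic matches the 4.31 on the slide and exceeds 3.84, the 0.05 critical value for 1 df.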
38Analysis of Case-Control Data
- Standard epidemiological approaches can be used
  - Qualitative trait
    - Logistic regression
  - Quantitative trait
    - Linear regression
- The usual concerns about matching apply, but one must also worry about false positives from population substructure
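A small sketch of how population substructure can produce a false positive in case-control data. All counts below are hypothetical, chosen only for illustration: two subpopulations differ in both variant frequency and disease frequency, and within each one the variant has no relationship to disease. The function and variable names are also illustrative.

```python
def odds_ratio(case_carriers, case_noncarriers, control_carriers, control_noncarriers):
    """Odds of carrying the variant in cases relative to controls."""
    return (case_carriers * control_noncarriers) / (case_noncarriers * control_carriers)


# Hypothetical counts per subpopulation:
# (case carriers, case non-carriers, control carriers, control non-carriers)
subpopulations = {
    "A": (80, 20, 40, 10),  # variant common, many cases sampled
    "B": (10, 40, 20, 80),  # variant rare, few cases sampled
}

within = {name: odds_ratio(*counts) for name, counts in subpopulations.items()}

# Pool the two subpopulations as if they were one homogeneous sample.
pooled_counts = [sum(column) for column in zip(*subpopulations.values())]
pooled = odds_ratio(*pooled_counts)

for name, value in within.items():
    print(f"subpopulation {name}: odds ratio = {value:.2f}")
print(f"pooled sample   : odds ratio = {pooled:.2f}")
```

In this made-up example the odds ratio is 1.0 inside each subpopulation but 2.25 in the pooled sample, so an analysis that ignores the substructure would report an association that is not there.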
39Incorporating Genetics into Your Studies
- Obtain appropriate IRB approval
  - DNA studies are quite common
  - Template language exists for IRB approval and consent forms
  - Genetic Studies Ascertainment Core (GSAC) can help
  - Kelly Taylor ktaylor_at_chgr.mc.vanderbilt.edu
- Collect family history information
- Obtain DNA sample
  - Venipuncture
  - Buccal wash/swab
  - Finger stick
- Extract/Store DNA
  - DNA Resources Core can help
  - Cara Sutcliffe cara_at_chgr.mc.vanderbilt.edu
- http://chgr.mc.vanderbilt.edu/
40What Can The Genes Tell Us?
- Give us a better understanding of the underlying biology of the trait in question
- Serve as direct targets for better treatments
  - Pharmacogenetics
  - Interventions
- Give us better predictions of who might develop disease
- Give us better predictions of the course of the disease
- Lead to knowledge that can help find a cure or prevention