Title: SNPs and Variation
1SNPs and Variation
2- Most variation at the DNA level comes in the form of small insertions and deletions and single nucleotide polymorphisms (SNPs).
- A substitutional variation must occur in at least 1% of the population to be considered a SNP.
- A haplotype is a cluster of SNPs.
3High-throughput Sequencing
4SNP
[Figure: a SNP site with alleles A and C, showing homozygous and heterozygous (A/C) genotypes.]
A substitutional variation must occur in at least 1% of the population to be considered a SNP.
5Haplotype: a cluster of SNPs
6The Nature of Single Nucleotide Polymorphisms
- Classification of SNPs
- Nucleotide change by way of transition (purine to purine, or pyrimidine to pyrimidine) or transversion (purine to pyrimidine, or pyrimidine to purine).
- 1. Noncoding SNP
- 2. Coding SNP
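The transition/transversion distinction above can be sketched in a few lines of Python. The function name `classify_substitution` is hypothetical and chosen only for illustration; the purine and pyrimidine sets are standard.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    # A transition stays within one chemical class; a transversion crosses classes.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]:
        print(f"{ref}>{alt}: {classify_substitution(ref, alt)}")
```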
7Noncoding SNP
[Figure: gene structure from 5' to 3', with labels: flanking regions, GC boxes, CAAT box, TATA box, TSS, initiation codon, GT/AG splice sites, stop codon, poly(A)-addition site (AATAAA).]
8Coding SNP
[Figure: gene structure from 5' to 3', with labels: flanking regions, GC boxes, CAAT box, TATA box, TSS, initiation codon, GT/AG splice sites, stop codon, poly(A)-addition site (AATAAA).]
9Coding SNP
103.1 (Part 1) Human promoter SNPs that affect
gene expression
113.1 (Part 2) Human promoter SNPs that affect
gene expression
12SNP Database
15SNP database of disease-related loci.
http://pga.mbt.washington.edu/
16Distribution of SNPs
- Population genetics studies the distribution of genetic variants.
- Quantitative genetics studies the relationship between variants and phenotypes.
- Quantitative trait locus (QTL): a region of DNA that is associated with a particular phenotypic trait. The QTLs for a trait are often found on different chromosomes. Knowing the number of QTLs that explain variation in the phenotypic trait tells us about the genetic architecture of the trait.
17- Neutral theory: most SNPs are maintained in natural populations as a result of a balance between mutation and genetic drift.
- The dynamics of genotypic variation are described by the balance of mutation, drift, migration, and selection.
- Neutral theory supplies the null hypothesis when patterns of variation are assessed.
18- Three key concepts for characterizing SNP variation:
- 1. Allele frequency distribution
- 2. Linkage disequilibrium
- 3. Population stratification
193.2 (Part 1) Nucleotide diversity in natural
populations
874 SNPs from 75 candidate human hypertension
loci.
203.2 (Part 2) Nucleotide diversity in natural
populations
LD decays over time at a rate that depends on the recombination rate r.
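As a rough illustration of this decay, the sketch below uses the standard population-genetics result for a large, randomly mating population, D_t = D_0 (1 - r)^t, where D is the LD coefficient and t counts generations. That formula is not stated on the slide and is brought in here as an assumption; the function name is hypothetical.

```python
def ld_after_generations(d0: float, r: float, t: int) -> float:
    """LD coefficient after t generations, assuming D_t = D_0 * (1 - r)**t."""
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination fraction r must lie in [0, 0.5]")
    if t < 0:
        raise ValueError("t must be a non-negative number of generations")
    return d0 * (1.0 - r) ** t


if __name__ == "__main__":
    # Tightly linked sites (small r) keep their LD far longer than loosely linked ones.
    for r in (0.001, 0.01, 0.1):
        decay = [round(ld_after_generations(0.25, r, t), 4) for t in (0, 10, 100)]
        print(f"r={r}: D at t=0, 10, 100 -> {decay}")
```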
21Linkage disequilibrium and Haplotype Maps
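- 1. Non-random assortment of alleles.
- 2. Typically extends over kilobases.
- 3. Measures are based on a two-SNP system, A/a and B/b.
There are 4 possible haplotypes for SNP sites A/a and B/b: AB, Ab, aB, and ab. When the sites are in linkage disequilibrium, the probability of each haplotype differs from the product of its allele frequencies:
- $P_{AB} \neq P_A P_B$
- $P_{Ab} \neq P_A P_b = P_A (1 - P_B)$
- $P_{aB} \neq P_a P_B = (1 - P_A) P_B$
- $P_{ab} \neq P_a P_b = (1 - P_A)(1 - P_B)$
[Figure: chromosome recombination.]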
22D Coefficient
- We can measure the non-randomness of two loci by means of a deviation, D, defined as follows:
- $D = P_{AB} - P_A P_B$, or equivalently $D = P_{AB} P_{ab} - P_{Ab} P_{aB}$
- $P_{AB} = P_A P_B + D$
- $P_{Ab} = P_A (1 - P_B) - D$
- $P_{aB} = (1 - P_A) P_B - D$
- $P_{ab} = (1 - P_A)(1 - P_B) + D$
- These two SNPs are in linkage equilibrium iff $D = 0$.
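A minimal sketch of the D coefficient, computed both ways from four haplotype frequencies so the two definitions can be checked against each other. The function name and the example frequencies are hypothetical.

```python
def ld_coefficient(p_AB: float, p_Ab: float, p_aB: float, p_ab: float):
    """Return D computed from allele frequencies and from haplotype products."""
    total = p_AB + p_Ab + p_aB + p_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab  # frequency of allele A
    p_B = p_AB + p_aB  # frequency of allele B
    d_from_alleles = p_AB - p_A * p_B
    d_from_haplotypes = p_AB * p_ab - p_Ab * p_aB
    return d_from_alleles, d_from_haplotypes


if __name__ == "__main__":
    # Made-up frequencies in which AB and ab are over-represented.
    print(ld_coefficient(0.4, 0.1, 0.1, 0.4))
    # Made-up frequencies at linkage equilibrium (each equals a product of allele frequencies).
    print(ld_coefficient(0.25, 0.25, 0.25, 0.25))
```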
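23Let $D = P_{AB} - P_A P_B$ and $D' = D / D_{max}$, where $D_{max}$ stands for the absolute maximal possible value of D.
[Figure: number line for D and D', with 0, $-P_A P_B$, and $P_A P_B$ marked.]
- Unlike D, the range of D' does not depend on allele frequencies: $-1 \le D' \le 1$.
- D' allows us to compare LD at different combinations of loci.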
24- D' is constrained between -1 and 1.
- D' = 1 (perfect positive LD between SNP alleles)
- D' = 0 (linkage equilibrium between SNP alleles)
- D' = -1 (perfect negative LD between SNP alleles)
- D' = 0.87 (strong positive LD between SNP alleles)
- D' = 0.12 (weak positive LD between SNP alleles)
- Other measures of LD:
- $r^2$ or $\Delta^2$
- Chi-square test.
- Fisher exact test (for small samples).
- P value.
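The normalized measures can be sketched as follows. This assumes the usual convention that D_max is min(P_A P_b, P_a P_B) when D is positive and min(P_A P_B, P_a P_b) when D is negative, and that r^2 = D^2 / (P_A P_a P_B P_b). Neither formula is spelled out on the slides, and the function name is hypothetical.

```python
def ld_measures(p_AB: float, p_A: float, p_B: float):
    """Return (D, D', r^2) for two biallelic sites A/a and B/b."""
    if not (0.0 < p_A < 1.0 and 0.0 < p_B < 1.0):
        raise ValueError("both sites must be polymorphic (0 < frequency < 1)")
    p_a, p_b = 1.0 - p_A, 1.0 - p_B
    d = p_AB - p_A * p_B
    # Largest |D| that these allele frequencies allow, given the sign of D.
    if d >= 0:
        d_max = min(p_A * p_b, p_a * p_B)
    else:
        d_max = min(p_A * p_B, p_a * p_b)
    d_prime = d / d_max  # keeps the sign of D
    r_squared = d * d / (p_A * p_a * p_B * p_b)
    return d, d_prime, r_squared


if __name__ == "__main__":
    # Made-up frequencies for illustration only.
    print(ld_measures(0.40, 0.5, 0.5))
    print(ld_measures(0.00, 0.5, 0.5))
```

The sketch does not check that `p_AB` is feasible for the given allele frequencies, so infeasible inputs can yield |D'| greater than 1.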
25- A set of closely linked SNPs located on one chromosome.
DNA sequences (SNP 1, SNP 2, and SNP 3 fall at positions 4, 12, and 15):
GATATTCGTACGGA-T
GATGTTCGTACTGAAT
GATATTCGTACGGA-T
GATATTCGTACGGAAT
GATGTTCGTACTGAAT
GATGTTCGTACTGAAT
Haplotypes: AG- 2/6, GTA 3/6, AGA 1/6
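The haplotype counts on this slide can be reproduced from the six aligned sequences by collecting the columns that vary. This is only a sketch: it treats every variable column as a SNP site, counts the gap character as an allele, and uses hypothetical function names.

```python
from collections import Counter

SEQUENCES = [
    "GATATTCGTACGGA-T",
    "GATGTTCGTACTGAAT",
    "GATATTCGTACGGA-T",
    "GATATTCGTACGGAAT",
    "GATGTTCGTACTGAAT",
    "GATGTTCGTACTGAAT",
]


def variable_sites(seqs):
    """Return 0-based column indices at which the aligned sequences differ."""
    return [i for i, column in enumerate(zip(*seqs)) if len(set(column)) > 1]


def haplotype_counts(seqs):
    """Count haplotypes, each formed from the bases at the variable sites."""
    sites = variable_sites(seqs)
    return Counter("".join(seq[i] for i in sites) for seq in seqs)


if __name__ == "__main__":
    print(variable_sites(SEQUENCES))
    print(haplotype_counts(SEQUENCES))
```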
263.3 Distribution of linkage disequilibrium
across the human lipoprotein lipase (LPL) gene
Significance is inferred with the Fisher exact test. Blue box: significant (p < 0.001). Yellow box: nonsignificant. White: unknown.
There are 66 SNP sites in this gene.
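For a single pair of SNP sites, a Fisher exact test on the 2x2 table of haplotype counts can be sketched with the standard library alone. The table layout (rows A/a, columns B/b) and the example counts are assumptions made for illustration, and the two-sided p-value here sums the probabilities of all tables no more likely than the observed one. In practice a tested routine such as `scipy.stats.fisher_exact` would normally be preferred.

```python
from math import comb


def fisher_exact_two_sided(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> float:
    """Two-sided Fisher exact p-value for the table [[n_AB, n_Ab], [n_aB, n_ab]]."""
    row1, row2 = n_AB + n_Ab, n_aB + n_ab
    col1 = n_AB + n_aB
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(n_AB)
    # Sum every table that is no more likely than the observed one.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(p_value, 1.0)


if __name__ == "__main__":
    # Hypothetical haplotype counts for one pair of sites.
    print(fisher_exact_two_sided(40, 10, 10, 40))
```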
27Haplotype Blocks in Human Genome
- The human genome has been shown to contain regions of high LD interspersed with regions of low LD.
- Recombination occurs frequently in low-LD regions.
- High-LD regions can form haplotype blocks.
- The International HapMap Project aims to build a haplotype map across the human genome.
[Figure: a chromosome divided into recombination hot spots (low-LD regions, D < 0) and haplotype blocks (high-LD regions, D > 0).]
283.4 (Part 1) Distribution of LD in the human
genome
293.4 (Part 2) Distribution of LD in the human
genome
313.5 Tagging SNPs and the haplotype clusters
Tagging SNPs capture most of the nucleotide
variation included in haplotypes.
323.6 Haplotype structure in the human lipoprotein
lipase (LPL) gene
(82 unrelated humans)
Only 8 of all haplotypes are unique regarding
racial groups
333.7 Human diversity and population structure
Genotyping of 377 microsatellites in 1056
individuals.
34Applications of SNP Technology
- Population Genetics
- Recombination Mapping
- QTL Mapping
- Linkage Disequilibrium Mapping
35SNP Discovery
- Resequencing
- SNP identification
- SNP verification
- Sequence-free Polymorphism Detection
36Resequencing
- SNPs are discovered by comparing sequences derived from different chromosomes.
- For a site with two alleles, the probability of detecting a SNP is given by
- $P = 1 - (1 - p)^{2n}$
- where p is the frequency of the rare allele and n is the number of individuals compared.
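The detection formula is easy to explore numerically. The sketch below evaluates P = 1 - (1 - p)^(2n) and searches for the smallest n that reaches a target probability. Both function names are hypothetical, and the factor 2n assumes diploid individuals, each contributing two chromosomes.

```python
def detection_probability(p: float, n: int) -> float:
    """P = 1 - (1 - p)**(2n): chance the rare allele appears among 2n chromosomes."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 - (1.0 - p) ** (2 * n)


def individuals_needed(p: float, target: float) -> int:
    """Smallest n for which the detection probability reaches the target."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    n = 1
    while detection_probability(p, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.10):
        print(p, round(detection_probability(p, 10), 3), individuals_needed(p, 0.95))
```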
37SNP identification
- Two major methods can be used to identify SNPs.
- 1. Comparison of whole-genome sequence with cDNA (EST) sequences that come from different strands.
- 2. Comparison of genomic sequences derived from different individuals.
- There is no way to determine whether a difference is the result of a sequencing error in the original sequences.
39SNP verification
- PCR amplification of the specific fragment that includes the SNP in a sample of at least 10 individuals, followed by resequencing.
- Sequencing by hybridization, using a sequencing chip.
403.15 Sequencing by hybridization
One chip per individual
25-mer probe; N represents the variation site.
41Sequence-free Polymorphism Detection
- Denaturing High Performance Liquid Chromatography (DHPLC)
- Denaturing Gradient Gel Electrophoresis (DGGE) and Single-Strand Conformation Polymorphism (SSCP)
- Targeting Induced Local Lesions IN Genomes (TILLING)
42DHPLC
43DGGE and SSCP analysis
[Figure labels: acrylamide gel; urea gradient; genotypes; single-strand secondary structures. W and C are complementary strands.]
44GC-clamp
45AT, CG, AG, CT: four types of GC clamps
47TILLING
Ethyl methanesulfonate (EMS)
Pools of rows and columns
483.17 (Part 2) Tilling
49SNP Genotyping
- Low-Technology Methods
- Minisequencing Methods
- Homogeneous Fluorogenic Dye-Based Methods
- Haplotype Phasing Methods
50Low-Technology Methods
- All current SNP genotyping methods except the Invader assay depend on specific amplification of the DNA sequence surrounding the site to be genotyped, which is generally achieved using PCR.
- 1. PCR-RFLP method
- 2. dCAPS method
- 3. ASO method
513.18 (Part 1) Bulked segregant mapping of
mutations using Snip-SNPs
PCR-RFLP method
1/5 to 1/2 of all SNPs are contained in palindromic segments
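PCR-RFLP works when one allele of a SNP creates or destroys a restriction site, so that the two alleles give different digestion patterns. A minimal sketch, assuming the EcoRI recognition sequence GAATTC and two made-up amplicon sequences that differ at a single base (all function names are hypothetical):

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence over the alphabet ACGT."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def is_palindromic(site: str) -> bool:
    """A site is palindromic if it equals its own reverse complement."""
    return site.upper() == reverse_complement(site)


def site_positions(seq: str, site: str):
    """0-based start positions of every occurrence of site in seq."""
    seq, site = seq.upper(), site.upper()
    width = len(site)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == site]


def rflp_distinguishes(allele_1: str, allele_2: str, site: str) -> bool:
    """True if digestion at this site would cut the two alleles differently."""
    return site_positions(allele_1, site) != site_positions(allele_2, site)


if __name__ == "__main__":
    ECORI = "GAATTC"
    allele_1 = "TTGACGAATTCAGGCT"  # hypothetical amplicon carrying the site
    allele_2 = "TTGACGAGTTCAGGCT"  # same amplicon with an A>G SNP inside the site
    print(is_palindromic(ECORI))
    print(site_positions(allele_1, ECORI), site_positions(allele_2, ECORI))
    print(rflp_distinguishes(allele_1, allele_2, ECORI))
```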
523.18 (Part 2) Bulked segregant mapping of
mutations using Snip-SNPs
533.19 dCAPS
dCAPS method
54ASO method
55Minisequencing Methods
- A philosophically different approach to genotyping is to actually resequence each allele, but only one or a few bases, in a sample of individuals.
- 1. Single-base extension
- 2. Pyrosequencing
563.20 (Part 1) Single-base extension methods
Single-base extension
573.20 (Part 2) Single-base extension methods
583.21 Pyrosequencing
Pyrosequencing
59Homogeneous Fluorogenic Dye-Based Methods
- Homogeneous assays are those that are carried out in a single reaction in solution.
- 1. TaqMan
- 2. Molecular beacons
- 3. Dye-labeled oligonucleotide ligation
- 4. The Invader assay
603.22 (Part 1) Fluorogenic dye-based genotyping
methods
TaqMan
613.22 (Part 2) Fluorogenic dye-based genotyping
methods
Molecular beacons
623.22 (Part 3) Fluorogenic dye-based genotyping
methods
Dye-labeled oligonucleotide ligation
633.23 (Part 1) The Invader assay
The Invader assay
643.23 (Part 2) The Invader assay
653.23 (Part 3) The Invader assay
66Haplotype Phasing Methods
- Consider two adjacent SNPs, A/T and A/C: the haplotypes could be AA and TC, or TA and AC.
SNPs  --------A/T-----------------------A/C--------
      --------A-------------------------A----------
      --------T-------------------------C----------
or
      --------T-------------------------A----------
      --------A-------------------------C----------
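The ambiguity shown above grows quickly: k heterozygous sites admit 2^(k-1) distinct haplotype pairs. The sketch below only enumerates those candidate pairs from unphased genotypes; it does not implement any particular phasing method, and the function name is hypothetical.

```python
from itertools import product


def possible_phasings(genotypes):
    """Enumerate the haplotype pairs consistent with unphased genotypes.

    genotypes: list of (allele_1, allele_2) tuples, one per SNP site.
    Returns a set of sorted (haplotype, haplotype) pairs.
    """
    if not genotypes:
        return set()
    first, rest = genotypes[0], genotypes[1:]
    phasings = set()
    # Fix the orientation of the first site, then flip each later site or not.
    for flips in product((False, True), repeat=len(rest)):
        hap_1, hap_2 = [first[0]], [first[1]]
        for (x, y), flip in zip(rest, flips):
            if flip:
                x, y = y, x
            hap_1.append(x)
            hap_2.append(y)
        phasings.add(tuple(sorted(("".join(hap_1), "".join(hap_2)))))
    return phasings


if __name__ == "__main__":
    # The two adjacent SNPs from the slide: A/T and A/C.
    for pair in sorted(possible_phasings([("A", "T"), ("A", "C")])):
        print(pair)
```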