Genetic Variations presentation

Title: Genetic Variations

1
Genetic Variations

Lakshmi K Matukumalli

2
Human Mouse Comparison
3
Structural Variations
Ploidy changes (e.g., Down syndrome)
Inversions
Translocations
Segmental duplications
4
Molecular Variations
Single nucleotide polymorphisms
Short indels
Simple sequence repeats
- Microsatellite (2-9 bp core repeat)
- Minisatellite (10-60 bp core repeat)
Copy number variants
Loss of heterozygosity
5
Type of polymorphisms
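Insertion/deletion polymorphism (indel), e.g., TAACGGTA with or without GG
Nonsynonymous polymorphism, e.g., GAG (Glu) to GUG (Val)
Synonymous polymorphism, e.g., GAU (Asp) to GAC (Asp)
Single-nucleotide polymorphism (SNP), e.g., T/C
[Figure: gene diagram marking where each type of polymorphism can occur. Labels: 5' flanking region, promoter, 5' untranslated region, ATG (start), coding, intron, coding, End, 3' untranslated region, 3' flanking region, transcript]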
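
The synonymous versus nonsynonymous distinction can be checked mechanically against the standard genetic code, in which GAU and GAC both encode Asp while GAG encodes Glu and GUG encodes Val. The following is a minimal sketch and not part of the original slides; the name classify_codon_change is hypothetical, and the function assumes a single complete codon on each side.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon, alt_codon):
    """Return 'synonymous' or 'nonsynonymous' for a codon substitution."""
    # Accept either RNA (U) or DNA (T) spelling.
    ref = ref_codon.upper().replace("U", "T")
    alt = alt_codon.upper().replace("U", "T")
    if CODON_TABLE[ref] == CODON_TABLE[alt]:
        return "synonymous"
    return "nonsynonymous"


print(classify_codon_change("GAU", "GAC"))  # Asp -> Asp
print(classify_codon_change("GAG", "GUG"))  # Glu -> Val
```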
6
Choosing the Technology
7
Extent of Variation (Human Genome)

> 5 million SNPs (dbSNP)
Recent genome analysis of a diploid individual
showed 4.1 million DNA variants, encompassing
12.3 Mb.
- 3,213,401 single nucleotide polymorphisms
(SNPs),
- 53,823 block substitutions (2-206 bp),
- 292,102 heterozygous insertion/deletion events
(indels) (1-571 bp),
- 559,473 homozygous indels (1-82,711 bp),
- 90 inversions,
- Plus segmental duplications and copy number
variations.
Non-SNP DNA variation accounts for 22% of all
events; however, these events involve 74% of all
variant bases. This suggests an important role for
non-SNP genetic alterations in defining the
diploid genome structure.
Moreover, 44% of genes were heterozygous for one
or more variants.
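
The share of non-SNP events can be recomputed from the counts listed on this slide. This is a small arithmetic check, not part of the original slides; it assumes that only the five enumerated categories are counted (segmental duplications and copy number variations have no counts here), and it comes out at roughly 22%.

```python
# Variant counts as listed on the slide.
counts = {
    "SNPs": 3_213_401,
    "block substitutions": 53_823,
    "heterozygous indels": 292_102,
    "homozygous indels": 559_473,
    "inversions": 90,
}

total_events = sum(counts.values())
non_snp_events = total_events - counts["SNPs"]
non_snp_share = non_snp_events / total_events

print(total_events)              # about 4.1 million variants
print(f"{non_snp_share:.1%}")    # share of events that are not SNPs
```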

8
Importance of SNPs and other variants

Study genetic variation in diverse populations of
any species to
- understand evolutionary origins and history,
- estimate population size, breeding structure, or
life-history characters,
- study migration within and between sub-populations,
- understand the evolutionary basis for the
maintenance of genetic variation and speciation.
Applications
- Genetic association of traits
- Effects on gene expression (e.g., synonymous vs
nonsynonymous changes, TF binding sites)
- DNA fingerprinting or sample tracking

9
Fine Mapping with SNP Markers

Advantages of SNPs as genetic markers
as compared to microsatellites.
High abundance
Distribution throughout the genome
Ease of genotyping
Improved accuracy
Availability of high-throughput
multiplex genotyping platforms

10
SNP Discovery - Sanger sequencing (EST)
11
SNP Discovery - Diploids (heterozygous loci)
12
SNP-PHAGE (Software package)
SNP Pipeline for Haplotype Analysis and GEnbank
(dbSNP) submissions.

Important steps are
Primer development
Primer testing
Sequencing
Base calling
Sequence assembly
Polymorphism analysis
Haplotype analysis
GenBank submission of confirmed polymorphisms

13
Application of Machine Learning in SNP Discovery
Objective: Reduce human intervention by training a
machine learning (ML) program on an expert-annotated
dataset and using it to differentiate good
polymorphisms from bad ones.

Steps
Parameter selection
Parameter optimization
Testing
Implementation
Results
Achieved substantial improvement in accuracy
compared with using PolyBayes or PolyPhred
alone.

14
SNP Discovery using next generation sequencers

Short sequences, 23-35 bp long, at a fraction of
the cost.
Reduced Representation Sequencing
Digest genomic DNA with restriction enzyme
Screen based on in silico digestion
Size select based on
Repetitive DNA
Number of fragments
Sequencing platform
Allows targeted deep sequencing of pools of DNA
Randomly distributed
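
The in silico digestion and size-selection steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the presentation: the function names digest and size_select, the example sequence, the GATC site, and the size window are all hypothetical, and only the forward strand of a linear sequence is scanned.

```python
def digest(sequence, site, cut_offset=0):
    """Cut `sequence` at every occurrence of the recognition `site`.

    `cut_offset` is the cut position relative to the start of the site
    (0 cuts immediately before the site).
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        if cut > start:
            fragments.append(sequence[start:cut])
            start = cut
        pos = sequence.find(site, pos + 1)
    if start < len(sequence):
        fragments.append(sequence[start:])
    return fragments


def size_select(fragments, min_len, max_len):
    """Keep only fragments whose length falls inside the selection window."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


genome = "AAGATCTTTTGATCCC"              # toy sequence, for illustration only
fragments = digest(genome, "GATC")        # example 4-bp recognition site
selected = size_select(fragments, 6, 8)   # example size window
print(fragments)
print(selected)
```

Running the same two functions over candidate enzymes and size windows gives the fragment counts needed to screen a digest before any sequencing is done.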

Cost / Mb
ABI: 880
454: 160
Solexa: 5
15
SNP Discovery - Bioinformatics

Strategies to maximize performance
High quality score stringencies
For each read
At base for putative SNP
Require single map location of a 23-bp tag (and
4-bp restriction site)
Allow only a single base-pair difference in the
match for a putative SNP
Reduces repeat content
Reduces gene family/paralog false positives
Require 2 copies of each allele (the assembly can
count as 1)
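
The filtering rules above can be expressed as a single predicate over a candidate SNP. This is a hedged sketch only: the CandidateSNP record, its field names, and the quality thresholds of 20 are assumptions made for illustration, since the slides do not give the data layout or the cut-off values used.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CandidateSNP:
    """Hypothetical summary of one putative SNP (field names are assumptions)."""
    map_locations: int       # genome positions the 23-bp tag maps to
    mismatches: int          # base differences between tag and reference
    min_read_quality: int    # lowest read-level quality among supporting reads
    snp_base_quality: int    # quality score at the putative SNP base
    allele_read_counts: Dict[str, int] = field(default_factory=dict)
    assembly_allele: Optional[str] = None  # allele seen in the genome assembly


def passes_filters(c, min_read_q=20, min_base_q=20):
    """Apply the stringency rules; the threshold defaults are illustrative."""
    if c.min_read_quality < min_read_q or c.snp_base_quality < min_base_q:
        return False
    if c.map_locations != 1:      # require a single map location
        return False
    if c.mismatches > 1:          # allow only one base difference
        return False
    copies = dict(c.allele_read_counts)
    if c.assembly_allele is not None:
        # The assembly can count as one copy of its allele.
        copies[c.assembly_allele] = copies.get(c.assembly_allele, 0) + 1
    return len(copies) == 2 and all(n >= 2 for n in copies.values())


good = CandidateSNP(1, 1, 30, 30, {"C": 1, "T": 2}, assembly_allele="C")
repeat = CandidateSNP(3, 1, 30, 30, {"C": 2, "T": 2})
print(passes_filters(good), passes_filters(repeat))
```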

16
Predicted and Observed Minor Allele Frequency
17
Population Genetics

Population genetics is the study of the
allele frequency distribution and its change under
the influence of the four evolutionary forces:
natural selection, genetic drift, mutation and
gene flow. It attempts to explain such phenomena as
adaptation and speciation.
(www.wikipedia.org)

Variation
18
Population Genetics
Neutral theory: The rate at which new genetic
variants are formed is equal to the loss of
genetic diversity due to drift.
Genotypes: CC, CT, TT
Alleles: C and T
Genotyping of a population of 1000 individuals
for a SNP resulted in 100, 500 and 400 genotypes
for CC, CT and TT respectively.
Genotype frequencies: CC (0.1), CT (0.5) and TT (0.4)
Allele frequencies:
C ($p$) = (200 + 500)/2000 = 0.35 (minor allele, MAF)
T ($q$) = (500 + 800)/2000 = 0.65 (major allele)
Hardy-Weinberg Equilibrium (HWE): Expected
genotype frequencies are $p^2$, $2pq$ and $q^2$
(about 122, 455 and 422 individuals out of 1000 for
CC, CT and TT)
HWE deviations: drift, selection, admixture, etc.
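
The worked example on this slide can be reproduced in a few lines. This is a minimal sketch, not part of the original slides; the function names are hypothetical, and the chi-square goodness-of-fit comparison is an added step that the slide does not show.

```python
def allele_frequencies(n_cc, n_ct, n_tt):
    """Return (p, q): frequencies of alleles C and T from genotype counts."""
    n = n_cc + n_ct + n_tt
    p = (2 * n_cc + n_ct) / (2 * n)
    q = (2 * n_tt + n_ct) / (2 * n)
    return p, q


def hwe_expected_counts(p, q, n):
    """Expected CC, CT, TT counts under Hardy-Weinberg equilibrium."""
    return p * p * n, 2 * p * q * n, q * q * n


def chi_square(observed, expected):
    """Pearson chi-square statistic comparing observed and expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = (100, 500, 400)                 # CC, CT, TT counts from the slide
p, q = allele_frequencies(*observed)
expected = hwe_expected_counts(p, q, sum(observed))
stat = chi_square(observed, expected)

print(p, q)
print(expected)
print(round(stat, 2))
```

With these counts the statistic is about 9.8, which exceeds 3.84, the 5% critical value for one degree of freedom, so this example population shows an excess of heterozygotes relative to HWE expectations.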
19
Fst
Useful to partition genetic variation into
components:
- within populations
- between populations
- among populations
Sewall Wright's fixation index (Fst) is a useful
index of genetic differentiation and of the overall
effect of population substructure.
It measures the reduction in heterozygosity (H)
expected with non-random mating at any one level of
population hierarchy relative to another, more
inclusive hierarchical level.
$F_{ST} = \frac{H_{Total} - H_{subpop}}{H_{Total}}$
Fst ranges between a minimum of 0 and a maximum of 1:
- 0: no genetic differentiation
- << 0.5: little genetic differentiation
- >> 0.5: moderate to great genetic differentiation
- 1.0: populations fixed for different alleles
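
The fixation index can be computed directly from subpopulation allele frequencies. The sketch below is an illustration under simplifying assumptions that the slides do not state: one biallelic locus, equally sized subpopulations, and expected heterozygosity taken as 2p(1 - p). The function names are hypothetical.

```python
def heterozygosity(p):
    """Expected heterozygosity at a biallelic locus with allele frequency p."""
    return 2 * p * (1 - p)


def fst(subpop_freqs):
    """Fst = (H_total - H_subpop) / H_total for equally sized subpopulations."""
    p_bar = sum(subpop_freqs) / len(subpop_freqs)
    h_total = heterozygosity(p_bar)
    h_subpop = sum(heterozygosity(p) for p in subpop_freqs) / len(subpop_freqs)
    if h_total == 0:
        raise ValueError("locus is monomorphic in the total population")
    return (h_total - h_subpop) / h_total


print(fst([0.5, 0.5]))   # identical frequencies: no differentiation
print(fst([0.2, 0.8]))   # intermediate differentiation
print(fst([0.0, 1.0]))   # fixed for different alleles
```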
20
Genotype Phenotype Association (Significance of
Haplotypes)
21
Haplotype inference

The solution to the haplotype phasing problem is
not straightforward due to resolution ambiguity

Computational and statistical algorithms for
addressing ambiguity in Haplotype Phasing
1) parsimony
2) phylogeny
3) maximum-likelihood
4) Bayesian inference
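
The resolution ambiguity can be made concrete by enumerating every haplotype pair that is consistent with one unphased genotype. This is a brute-force sketch for illustration only and is not one of the four algorithm families listed above; the function name and the input layout (one pair of single-letter alleles per site) are assumptions.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """List the distinct haplotype pairs consistent with an unphased genotype.

    `genotype` is a list of (allele, allele) tuples, one per site.
    """
    pairs = set()
    for choice in product((0, 1), repeat=len(genotype)):
        hap1 = "".join(site[c] for site, c in zip(genotype, choice))
        hap2 = "".join(site[1 - c] for site, c in zip(genotype, choice))
        # Sort so that (hap1, hap2) and (hap2, hap1) count as the same pair.
        pairs.add(tuple(sorted((hap1, hap2))))
    return sorted(pairs)


two_sites = [("C", "T"), ("A", "G")]
three_sites = [("C", "T"), ("A", "G"), ("G", "T")]
print(compatible_haplotype_pairs(two_sites))
print(len(compatible_haplotype_pairs(three_sites)))
```

An individual heterozygous at k sites is compatible with 2^(k-1) haplotype pairs, which is why genotype data alone cannot determine phase and the statistical approaches listed above are needed.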

22
Linkage disequilibrium (LD)

Non-random association of alleles at two or more
loci, not necessarily on the same chromosome.
LD is generally caused by interactions between
genes; genetic linkage and the rate of
recombination; random drift or non-random mating;
and population structure.

Let A and B be two loci, each segregating two
alleles: A1 and A2 with frequencies $p_1$ and $p_2$ at
A, and B1 and B2 with frequencies $q_1$ and $q_2$ at B.

        B1                         B2                         Total
A1      $p_{11} = p_1 q_1 + D$     $p_{12} = p_1 q_2 - D$     $p_1$
A2      $p_{21} = p_2 q_1 - D$     $p_{22} = p_2 q_2 + D$     $p_2$
Total   $q_1$                      $q_2$                      1
23
Linkage disequilibrium (cont)

$D = p_{11} - p_1 q_1$
D depends on the allele frequencies at A and B.
D' is a scaled version of D.

24
Linkage disequilibrium (cont)

Squared correlation coefficient

$r^2 = \frac{D^2}{p_1 p_2 q_1 q_2}$
The measure preferred by population geneticists
Is independent of allele frequencies
Ranges between 0 and 1
$r^2 = 1$ implies the markers provide exactly the
same information
$r^2 = 0$ when they are in perfect equilibrium
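
The three LD measures can be computed together from one haplotype frequency and the two allele frequencies. This is a minimal sketch rather than the presenter's code: the function name ld_measures is hypothetical, both loci are assumed to be polymorphic, and D' uses the usual scaling of D by its maximum attainable value given the allele frequencies, which the slides mention but do not spell out.

```python
def ld_measures(p11, p1, q1):
    """Return (D, D', r^2) for two biallelic loci.

    p11 is the frequency of the A1-B1 haplotype; p1 and q1 are the
    frequencies of alleles A1 and B1.
    """
    p2, q2 = 1 - p1, 1 - q1
    d = p11 - p1 * q1
    # Largest |D| attainable with these allele frequencies.
    if d >= 0:
        d_max = min(p1 * q2, p2 * q1)
    else:
        d_max = min(p1 * q1, p2 * q2)
    d_prime = d / d_max if d_max > 0 else 0.0
    r2 = d * d / (p1 * p2 * q1 * q2)
    return d, d_prime, r2


print(ld_measures(0.50, 0.5, 0.5))   # complete association
print(ld_measures(0.25, 0.5, 0.5))   # linkage equilibrium
print(ld_measures(0.30, 0.6, 0.3))   # D' near 1 while r^2 is well below 1
```

The last call shows why the two scaled measures are reported separately: with unequal allele frequencies at the two loci, D' can reach its maximum even though one marker does not fully predict the other.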
27
Linkage disequilibrium (cont)

Visualizing LD

28
Linkage disequilibrium (cont)

Visualizing LD
