Single Nucleotide Polymorphism

1
Single Nucleotide Polymorphism
Anshu Bhardwaj, Research Fellow
Centre for Cellular & Molecular Biology, Hyderabad
8th November, 2003
2
Single Nucleotide Polymorphism
Single base-pair differences occurring in a
population with a frequency of >1%
...C C A T T G A C...
...G G T A A C T G...

...C C G T T G A C...
...G G C A A C T G...
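The definition above can be illustrated with a minimal Python sketch. It compares the two top strands shown on the slide and applies the >1% frequency criterion. The function names and the example frequencies are illustrative assumptions, not part of the original presentation.

```python
def snp_positions(seq_a: str, seq_b: str) -> list:
    """Return 0-based positions where two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def meets_snp_frequency(minor_allele_frequency: float, threshold: float = 0.01) -> bool:
    """Apply the frequency criterion from the slide (more than 1% of the population)."""
    return minor_allele_frequency > threshold


# Top strands from the slide: ...CCATTGAC... versus ...CCGTTGAC...
allele_1 = "CCATTGAC"
allele_2 = "CCGTTGAC"
print(snp_positions(allele_1, allele_2))   # positions of the single-base difference
print(meets_snp_frequency(0.05))           # hypothetical minor-allele frequency of 5%
```

A single differing position is only a candidate; under the definition it counts as a SNP once the less common allele exceeds the frequency threshold in the population.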
3
SNPs can be found in:
  • NON-CODING REGION
    - 5' and 3' UTRs
    - Introns
    - Splice sites
  • CODING REGION
    - Non-synonymous: amino acid substitution
    - Synonymous: silent
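To make the coding-region distinction concrete, here is a small Python sketch that classifies a single-base change within a codon as synonymous or non-synonymous using the standard genetic code. The function name and the example codons are illustrative assumptions; the slides do not give specific codons.

```python
# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_coding_snp(ref_codon: str, alt_codon: str) -> str:
    """Label a codon change as 'synonymous' (silent) or 'non-synonymous'."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return "synonymous" if ref_aa == alt_aa else "non-synonymous"


# Hypothetical examples: GAA and GAG both encode glutamate; GTG encodes valine.
print(classify_coding_snp("GAA", "GAG"))
print(classify_coding_snp("GAG", "GTG"))
```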
4
Single base-pair differences occurring in a
population with a frequency of >1%
5
GENOTYPIC FREQUENCY: Relative distribution of
genotypes in a population for a particular locus
6
ALLELIC FREQUENCY: The relative abundance of an
allele of a particular gene with reference to its
other alleles
Let $p = f(M)$ and $q = f(N)$. Thus, $p = f(MM) + \tfrac{1}{2} f(MN)$
and $q = f(NN) + \tfrac{1}{2} f(MN)$.
7
ALLELIC FREQUENCY: The relative abundance of an
allele of a particular gene with reference to its
other alleles

Location     MM (%)   MN (%)   NN (%)   p      q
Greenland    83.5     15.6     0.9      0.92   0.08

Let $p = f(M)$ and $q = f(N)$. Thus, $p = f(MM) + \tfrac{1}{2} f(MN)$
and $q = f(NN) + \tfrac{1}{2} f(MN)$.
GENOTYPIC FREQUENCY: Relative distribution of
genotypes in a population for a particular locus
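The allele-frequency relation can be checked against the Greenland row with a short Python sketch. The function name is an illustrative assumption. Applied to the genotype percentages on the slide, the formula gives roughly p = 0.913 and q = 0.087, which the slide lists as 0.92 and 0.08.

```python
def allele_frequencies(f_mm: float, f_mn: float, f_nn: float):
    """Compute allele frequencies p = f(M) and q = f(N) from genotype frequencies.

    Inputs may be fractions or percentages; they are normalised by their sum.
    """
    total = f_mm + f_mn + f_nn
    p = (f_mm + 0.5 * f_mn) / total   # each MN heterozygote carries one M allele
    q = (f_nn + 0.5 * f_mn) / total   # and one N allele
    return p, q


# Genotype percentages for Greenland, taken from the slide.
p, q = allele_frequencies(83.5, 15.6, 0.9)
print(f"p = {p:.3f}, q = {q:.3f}")
```

Because every individual carries exactly two alleles at the locus, p and q always sum to 1.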
8
WHY SNPs?
  • SNPs are distributed non-randomly throughout the
    genome
  • On average, a significant SNP is found for
    every 1 kb of the human genome, resulting in
    approximately 3 million SNPs
  • Large number
  • Unambiguous assay techniques
  • High levels of polymorphism in the population
  • Most of the phenotypic differences arise from
    SNPs in genes, but these form only a small
    fraction of the total number

9
dbSNP DENSITY DISTRIBUTION IN HUMAN
  • Mean Density: 0.001765 SNPs per base
    (17.652 SNPs per 10 kb)
  • Mean Spacing: 566.5118 bases per SNP
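The two figures on this slide are reciprocals of each other, which a few lines of Python can confirm. The function name is an illustrative assumption; only the numbers come from the slide.

```python
def mean_spacing_from_density(snps_per_10kb: float) -> float:
    """Convert a SNP density (SNPs per 10 kb) into mean spacing (bases per SNP)."""
    density_per_base = snps_per_10kb / 10_000
    return 1.0 / density_per_base


spacing = mean_spacing_from_density(17.652)
print(f"{spacing:.1f} bases per SNP")
```

The small difference from the quoted 566.5118 comes from rounding in the published density.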

10
SNP Discovery
  • SNP Discovery refers to the initial
    identification of new SNPs
  • The established method is electrophoresis (DNA
    sequencing) with subsequent data analysis. Some
    indirect discovery techniques (e.g., dHPLC, SSCP)
    only indicate that a SNP (or other mutation)
    exists
  • DNA sequencing of multiple individuals is used
    to determine the point and type of polymorphism

11
SNP Validation
  • SNP Validation refers to genetic validation, the
    process of ensuring that the SNP is not due to
    sequencing error
  • Confirmation of SNPs found in discovery
  • Larger numbers of individual samples to get
    statistical data on occurrence in the population

12
THE EXPERIMENTAL APPROACH
  • RESTRICTION FRAGMENT LENGTH POLYMORPHISM
  • SINGLE-STRANDED CONFORMATIONAL POLYMORPHISM
  • DENATURING HIGH-PERFORMANCE LIQUID CHROMATOGRAPHY
  • HYBRIDIZATION METHOD
  • MALDI-TOF METHOD

SEQUENCING & ALIGNMENT THEREAFTER
14
IN SILICO SNP PREDICTION
POLYBAYES
SEAN SNP Prediction Program
SNP Finder
16
Restriction Fragment Length Polymorphisms
Botstein et al. (1980)
CHANGES IN MIGRATION PATTERNS THAT REPRESENT
ALLELIC VARIATION
[Figure: panels A, B and C showing homologs 1 and 2, the probe, and 3 kb, 2 kb and 1 kb fragments]
CAN BE USED TO DETECT SNPs DIFFERENTIALLY IN
HOMOZYGOUS & HETEROZYGOUS INDIVIDUALS
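The idea behind RFLP detection can be sketched in Python: a SNP inside a restriction site abolishes the cut, so the fragment pattern differs between genotypes. This is a toy illustration, not the protocol from the slide. The short sequences, the choice of the EcoRI site (GAATTC, cut after the first G) and the function names are all assumptions, and fragment lengths are in bases rather than kilobases.

```python
def digest(sequence: str, site: str = "GAATTC", cut_offset: int = 1) -> list:
    """Return fragment lengths after cutting at every occurrence of the site."""
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset            # EcoRI cuts between G and AATTC
        fragments.append(cut - start)
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(len(sequence) - start)
    return fragments


def band_pattern(homolog_1: str, homolog_2: str) -> list:
    """Distinct fragment sizes expected on a gel for a diploid individual."""
    return sorted(set(digest(homolog_1)) | set(digest(homolog_2)), reverse=True)


# Hypothetical alleles: the second carries a SNP (A to G) inside the site.
allele_with_site = "A" * 10 + "GAATTC" + "T" * 14
allele_without_site = "A" * 10 + "GAGTTC" + "T" * 14

print(band_pattern(allele_with_site, allele_with_site))        # homozygous, site present
print(band_pattern(allele_without_site, allele_without_site))  # homozygous, site absent
print(band_pattern(allele_with_site, allele_without_site))     # heterozygous
```

The heterozygote shows both the uncut fragment and the two cut fragments, which is how the three genotypes are told apart.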
17
MALDI-TOF METHOD
Matrix-assisted laser desorption/ionization
time-of-flight
19
SEQUENCING METHOD
20
POLYBAYES: BAYESIAN INFERENCE ENGINE TO CALCULATE
THE PROBABILITY THAT A GIVEN SITE IS POLYMORPHIC
  • FRAGMENT CLUSTERING
  • PARALOGUE IDENTIFICATION
  • MULTIPLE ALIGNMENT

21
SNP DETECTION IN REDUNDANT SEQUENCE DATA
SEQUENCE CLUSTERING → CLUSTER REFINEMENT →
MULTIPLE ALIGNMENT → SNP DETECTION
22
The PolyBayes Approach
  • Use genomic sequence as reference
    - cluster and align all available sequences
    - remove repeats/paralogs
  • Use Bayesian statistics to
    - distinguish polymorphic sites from artifacts
    - estimate likelihood
  • Marth GT, Korf I, Yandell MD, Yeh RT, Gu Z,
    Zakeri H, Stitziel NO, Hillier L, Kwok P-Y,
    Gish WR. A general approach to single-nucleotide
    polymorphism discovery. Nature Genet. 1999;
    23: 452-456.

24
1. Known repeat sequences are masked using
   RepeatMasker
2. FRAGMENT CLUSTERING
   (a) WU-BLAST used to search against dbEST
   (b) Sequence traces processed with PHRED
       (base calls and quality values)
   (c) Distinct groups of matching ESTs registered
       as clusters
3. Each cluster member aligned pair-wise to the
   genomic anchor sequence with CROSS_MATCH
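The base-quality values produced by PHRED feed the later probability calculations, so it helps to see how a quality score maps to an error probability. The sketch below uses the standard Phred definition, Q = -10 log10(P_error). The function names and the example quality values are illustrative assumptions.

```python
def phred_to_error_probability(quality: float) -> float:
    """Convert a Phred quality score Q into the probability that the base call is wrong."""
    return 10 ** (-quality / 10)


def expected_errors(qualities) -> float:
    """Expected number of miscalled bases in a read, summing per-base error probabilities."""
    return sum(phred_to_error_probability(q) for q in qualities)


# Hypothetical quality values for a short stretch of a read.
read_qualities = [20, 20, 30, 40]
for q in read_qualities:
    print(q, phred_to_error_probability(q))
print(expected_errors(read_qualities))
```

A quality of 20 corresponds to a 1 in 100 chance of error, and each additional 10 points lowers that chance tenfold.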
25
PARALOGUE IDENTIFICATION
1. Paralogues may give rise to false SNP
   predictions, which points to difficulties during
   marker development
2. Calculate the probability $P_{NAT}$ that a cluster
   member is derived from the genomic region
3. Distinguish between
   - less accurate sequences that nevertheless
     originate from the same underlying genomic
     location
   - more accurate sequences with high-quality
     discrepancies that are likely to be paralogous
4. Using a threshold value $P_{NAT,MIN}$, paralogous
   cluster members are removed
26
$D_{NAT} = L \cdot P_{POLY,2} + E$, with $P_{POLY,2} = 0.001$
$D_{PAR} = L \cdot P_{PAR} + E$, with $P_{PAR} = 0.02$
$d$ = number of discrepancies
$P(\mathrm{MODEL}_{NAT} \mid d)$
$P_{NAT,MIN} = 0.75$
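One way to read the quantities on this slide is sketched below in Python. Several things here are assumptions rather than statements from the slide: that L is the alignment length, that E is the expected number of sequencing errors, that the expected discrepancy counts are L times the rate plus E, that the observed count d follows a Poisson distribution under each model, and that the two models have equal prior weight. Only the rates 0.001 and 0.02 and the threshold 0.75 come from the slide.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of observing k events when the expected count is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def probability_native(d: int, length: int, expected_seq_errors: float,
                       p_poly2: float = 0.001, p_par: float = 0.02,
                       prior_native: float = 0.5) -> float:
    """Posterior probability that a cluster member is native rather than paralogous."""
    d_nat = length * p_poly2 + expected_seq_errors   # expected discrepancies if native
    d_par = length * p_par + expected_seq_errors     # expected discrepancies if paralogous
    weight_nat = prior_native * poisson_pmf(d, d_nat)
    weight_par = (1.0 - prior_native) * poisson_pmf(d, d_par)
    return weight_nat / (weight_nat + weight_par)


P_NAT_MIN = 0.75  # threshold from the slide

# Hypothetical 500-base alignments with one expected sequencing error each.
for observed in (1, 10):
    p_nat = probability_native(observed, length=500, expected_seq_errors=1.0)
    verdict = "keep" if p_nat >= P_NAT_MIN else "remove as likely paralogue"
    print(observed, round(p_nat, 4), verdict)
```

Under these assumptions a member with few discrepancies is kept, while one with many high-quality discrepancies falls below the threshold and is removed.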
27
MULTIPLE ALIGNMENT
  1. Depth of coverage
  2. The base-quality values of the sequences
  3. The a priori expected rate of polymorphic sites
    in the region
  • $P_{SNP}$ = PROBABILITY THAT THE SITE IS POLYMORPHIC
  • DISTRIBUTION OF PROBABILITY SCORES EXHIBITS A
    HIGH LEVEL OF SPECIFICITY

28
THRESHOLD VALUE: $P_{SNP}$ = 0.4
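A much simplified Bayesian calculation shows how depth of coverage, base quality and a prior polymorphism rate can combine into a per-site probability. This is a toy model and not the PolyBayes algorithm itself. The two-hypothesis setup (one true base versus an equal mixture of two bases), the uniform weighting over bases, the prior of 0.003 and all function names are assumptions made for illustration; only the 0.4 threshold is taken from the slide.

```python
from itertools import combinations

BASES = "ACGT"


def base_likelihood(observed: str, true_base: str, error_prob: float) -> float:
    """Probability of reading `observed` when the true base is `true_base`."""
    return 1.0 - error_prob if observed == true_base else error_prob / 3.0


def snp_probability(reads, prior_polymorphic: float = 0.003) -> float:
    """Posterior probability that a site is polymorphic.

    `reads` is a list of (base, phred_quality) pairs from one alignment column.
    """
    calls = [(base, 10 ** (-quality / 10)) for base, quality in reads]

    # Hypothesis 1: the site is monomorphic with a single true base.
    monomorphic = 0.0
    for allele in BASES:
        likelihood = 1.0
        for base, err in calls:
            likelihood *= base_likelihood(base, allele, err)
        monomorphic += likelihood / len(BASES)

    # Hypothesis 2: two alleles segregate, each read drawn from either with equal chance.
    pairs = list(combinations(BASES, 2))
    polymorphic = 0.0
    for a1, a2 in pairs:
        likelihood = 1.0
        for base, err in calls:
            likelihood *= 0.5 * base_likelihood(base, a1, err) + 0.5 * base_likelihood(base, a2, err)
        polymorphic += likelihood / len(pairs)

    numerator = prior_polymorphic * polymorphic
    return numerator / (numerator + (1.0 - prior_polymorphic) * monomorphic)


THRESHOLD = 0.4  # threshold from the slide

mixed_column = [("A", 30)] * 3 + [("G", 30)] * 3   # hypothetical high-quality disagreement
uniform_column = [("A", 30)] * 6                    # hypothetical full agreement
print(snp_probability(mixed_column) > THRESHOLD)
print(snp_probability(uniform_column) > THRESHOLD)
```

In this toy model, high-quality reads that split between two bases push the probability far above the threshold, while a column of identical calls stays close to zero.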
29
THE POLYBAYES SOFTWARE
30
OTHER SNP PREDICTION & SNP FINDING SOFTWARE
  • SEAN: Search for localized SNPs and predict SNPs
    (http://zebrafish.doc.ic.ac.uk/Sean/)
  • SNP Finder: For analyzing user-submitted
    trace data
    (http://gai.nci.nih.gov/)

35
SIGNIFICANCE OF SNPs
  • IN DISEASE DIAGNOSIS
  • IN FINDING PREDISPOSITION TO DISEASES
  • IN DRUG DISCOVERY & DEVELOPMENT
  • IN DRUG RESPONSES
  • INVESTIGATION OF MIGRATION PATTERNS
  • ALL THESE ASPECTS WILL HELP IN TAILORING
    MEDICATION & DIAGNOSIS AT THE INDIVIDUAL LEVEL

36
SNP Screening
  • Two different screening strategies
    - Many SNPs in a few individuals
    - A few SNPs in many individuals
  • Different strategies will require different
    tools
  • Important in determining markers for complex
    genetic states

37
SNP genotyping methods for detecting genes
contributing to susceptibility or resistance to
multifactorial diseases and adverse drug reactions
=> case-control association analysis
Example sequences from the case and control groups:
.GCCGTTGAC. .GCCATTGAC. .GCCATTGAC. .GCCATTGAC.
Frequencies compared between case and control:
  • allele frequency: A, G
  • genotype frequency: AA, AG, GG
  • haplotype frequency: SNP1, SNP2, SNP3
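A case-control comparison of allele frequencies is often done with a chi-square test on a 2x2 table of allele counts. The Python sketch below shows one such test. The slides do not say which statistic was used, so the choice of test, the function names and the genotype counts are all assumptions for illustration.

```python
import math


def allele_counts(genotype_counts: dict) -> tuple:
    """Turn genotype counts for AA, AG and GG into (A count, G count)."""
    a_count = 2 * genotype_counts["AA"] + genotype_counts["AG"]
    g_count = 2 * genotype_counts["GG"] + genotype_counts["AG"]
    return a_count, g_count


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple:
    """Pearson chi-square statistic and p-value (1 degree of freedom) for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))   # survival function for 1 degree of freedom
    return chi2, p_value


# Hypothetical genotype counts; not data from the presentation.
cases = {"AA": 20, "AG": 20, "GG": 10}
controls = {"AA": 10, "AG": 20, "GG": 20}

case_a, case_g = allele_counts(cases)
control_a, control_g = allele_counts(controls)
chi2, p_value = chi_square_2x2(case_a, case_g, control_a, control_g)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

The same table layout extends to genotype or haplotype counts, although those comparisons have more categories and therefore more degrees of freedom.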
38
HAPLOTYPE
A set of closely linked genetic markers present
on one chromosome which tend to be inherited
together (not easily separable by recombination)
39
SNP-Haplotype
DNA Sequence
  1  GATATTCGTACGGA-T  BLACK EYE
  2  GATGTTCGTACTGAAT  BROWN EYE
  3  GATATTCGTACGGA-T  BLACK EYE
  4  GATATTCGTACGGAAT  BLUE EYE
  5  GATGTTCGTACTGAAT  BROWN EYE
  6  GATGTTCGTACTGAAT  BROWN EYE
Haplotypes
  AG-  2/6 (BLACK EYE)
  GTA  3/6 (BROWN EYE)
  AGA  1/6 (BLUE EYE)
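The haplotype tally on this slide can be reproduced with a short Python sketch that picks out the variable columns of the six aligned sequences and counts the resulting combinations. The pairing of each sequence with an eye colour follows the order in which they appear on the slide, which is an assumption, as are the function names.

```python
from collections import Counter

# Six aligned sequences from the slide, paired with phenotypes in slide order.
samples = [
    ("GATATTCGTACGGA-T", "BLACK EYE"),
    ("GATGTTCGTACTGAAT", "BROWN EYE"),
    ("GATATTCGTACGGA-T", "BLACK EYE"),
    ("GATATTCGTACGGAAT", "BLUE EYE"),
    ("GATGTTCGTACTGAAT", "BROWN EYE"),
    ("GATGTTCGTACTGAAT", "BROWN EYE"),
]


def variable_columns(aligned):
    """0-based alignment columns where at least two sequences differ."""
    return [i for i in range(len(aligned[0])) if len({seq[i] for seq in aligned}) > 1]


def haplotypes(aligned):
    """Concatenate each sequence's characters at the variable columns."""
    columns = variable_columns(aligned)
    return ["".join(seq[i] for i in columns) for seq in aligned]


sequences = [seq for seq, _ in samples]
haplotype_counts = Counter(haplotypes(sequences))
for haplotype, count in haplotype_counts.items():
    print(haplotype, f"{count}/{len(sequences)}")
```

Each distinct combination of alleles at the variable sites is one haplotype, and its frequency can then be compared with the phenotype labels.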
40
HAPLOTYPE CORRELATION WITH PHENOTYPE
  • The haplotype-centric approach combines the
    information of adjacent SNPs into composite
    multilocus haplotypes.
  • Haplotypes are not only more informative but
    also capture the regional linkage disequilibrium
    (LD) information, which makes the approach
    robust and powerful.
  • Associating haplotype frequencies with the
    frequencies of the desired phenotypes in the
    population will help in utilizing the maximum
    potential of SNPs as markers.

41
ADVANTAGES
  1. SNPs ARE THE MOST FREQUENT FORM OF DNA VARIATION
  2. THEY ARE THE DISEASE-CAUSING MUTATIONS IN MANY
    GENES
  3. THEY ARE ABUNDANT & HAVE SLOW MUTATION RATES
  4. EASY TO SCORE
  5. MAY WORK AS THE NEXT GENERATION OF GENETIC
    MARKERS

42
LIMITATIONS
1. EXPERIMENTAL DETECTION OF SNPs REQUIRES
IMPLEMENTATION OF EXPENSIVE TECHNOLOGIES
2. NEED FOR LARGE POPULATION DATASETS FOR
ASSOCIATION STUDIES
43
Some important SNP database Resources
1. dbSNP (http://www.ncbi.nlm.nih.gov/SNP/)
   LocusLink (http://www.ncbi.nlm.nih.gov/LocusLink/list.cgi)
2. TSC (http://snp.cshl.org/)
3. SNPper (http://snpper.chip.org/bio/)
4. JSNP (http://snp.ims.u-tokyo.ac.jp/search.html)
5. GeneSNPs (http://www.genome.utah.edu/genesnps/)
6. HGVbase (http://hgvbase.cgb.ki.se/)
7. PolyPhen (http://dove.embl-heidelberg.de/PolyPhen/)
   OMIM (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM)
8. Human SNP database (http://www-genome.wi.mit.edu/snp/human/)
Feb. 25, 2003 SI Hung