Title: Single Nucleotide Polymorphisms
1Single Nucleotide Polymorphisms
- Jennifer Lyon
- Eskind Biomedical Library
- May 1, 2009
- CRC Workshop Series
2Types of Genetic Variations
- Single Nucleotide Polymorphisms (SNP)
- Single base pair changes
- GTCATTCGATT
- GTCAGTCGATT
- Indels
- Small insertion/deletions
- CTT---GATC
- CTTACGGATC
- Small variable repeats (microsatellites)
- ACGACGACGACGACGACG (6 copies)
- ACGACGACGACGACGACGACG (7 copies)
- Variable long tandem repeats (can be dozens to hundreds to thousands)
- Chromosomal aberrations: translocations, inversions, etc.
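As a rough illustration of the first and third variation types, the sketch below compares the two example sequences base by base and counts microsatellite repeat units. It assumes the sequences are already aligned to the same length; the function names are made up for this example.

```python
def find_substitutions(ref: str, alt: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, ref base, alt base) for every mismatch.

    Assumes both sequences are already aligned and equally long.
    """
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, r, a) for i, (r, a) in enumerate(zip(ref, alt), start=1) if r != a]


def count_repeats(seq: str, unit: str) -> int:
    """Count back-to-back copies of `unit` starting at the first base of `seq`."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    count = 0
    while seq.startswith(unit, count * len(unit)):
        count += 1
    return count


if __name__ == "__main__":
    # The SNP example from the slide: one base differs.
    print(find_substitutions("GTCATTCGATT", "GTCAGTCGATT"))
    # The microsatellite example from the slide.
    print(count_repeats("ACGACGACGACGACGACG", "ACG"))
    print(count_repeats("ACGACGACGACGACGACGACG", "ACG"))
```

Real variant callers work from alignments and quality scores; this only shows the idea of a single-base difference versus a repeat-count difference.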
3Focusing on SNPs
- Types of SNPs
- SNP nomenclature
- Resources for SNPs
- Examples and Challenges in Finding SNPs
- http://learn.genetics.utah.edu/content/health/pharma/snips/
4SNP Types
- SNPs can be categorized in a number of ways; the most common are by location and function (relative to a gene)
- Intragenic SNPs are often categorized by function: are they in a coding region, an intron, part of the mRNA, or outside the mRNA but still in the gene locus (e.g., in the promoter)?
- Extragenic SNPs may be considered simply genomic, or might be labeled relative to the nearest gene, i.e., 5' or 3' to a gene
- An extragenic SNP may affect regulatory regions important in gene expression or other DNA functions such as DNA replication.
5SNP Functional Categories
- coding nonsynonymous
- Missense, nonsense, frame shift
- coding synonymous
- Intronic
- splice site
- mRNA UTR
- 5' UTR or 3' UTR
- (gene) locus region (5' or 3' to the gene)
- "near gene" usually means within 2000 bp of the gene
- genomic/extragenic (distant from any gene)
6Coding Nonsynonymous SNPs
http://www.ncbi.nlm.nih.gov/Class/NAWBIS/Modules/Variation/powerpoint/variation_files/frame.html
7Coding Non-Synonymous SNPs
- Nonsense
- Change an amino acid codon to a stop codon
- Results in a shortened protein
- Frame Shift
- Are really single-base indels
- Dropping or adding one base shifts the triplet reading frame, altering all downstream amino acids and usually resulting in an earlier stop codon
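The synonymous / missense / nonsense distinction can be sketched by translating the codon before and after the change. This is a minimal example using the standard genetic code; the function name and the sample codons are illustrative and not taken from the slides.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-base change inside one codon.

    A change from a stop codon to an amino acid is not treated separately
    here; it is simply reported as "missense".
    """
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("codons must be exactly three bases long")
    if sum(r != a for r, a in zip(ref_codon, alt_codon)) != 1:
        raise ValueError("expected exactly one base difference")
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


if __name__ == "__main__":
    print(classify_coding_snp("CTT", "CTC"))  # Leu -> Leu
    print(classify_coding_snp("ACC", "ATC"))  # Thr -> Ile
    print(classify_coding_snp("TGG", "TGA"))  # Trp -> stop
```

Frame shifts are not covered: they change the number of bases, so the codon-by-codon comparison above no longer applies.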
8SNP Nomenclature
- The Human Genome Variation Society (http://www.hgvs.org/mutnomen/recs.html) has proposed some guidelines for SNP nomenclature, but at the moment, there is minimal consistency.
- Different sources will refer to the same SNP in different ways
- While dbSNP identifiers (rs12345678) are becoming common, they are not required of publishing authors and not used in all cases.
9SNPs at Base-Pair Level
- The base-pair change is given in various forms
- A/C T→G C>T 432G>C T73C
- The HGVS nomenclature recommendations
- "c." for a coding DNA sequence (like c.76A>T)
- "g." for a genomic sequence (like g.476A>T)
- "m." for a mitochondrial sequence (like m.8993T>C)
- "r." for an RNA sequence (like r.76a>u)
10Position, position, position!
- The big issue with SNPs is identifying their location (numerically).
- Position can be specified
- Number location within a specific sequence
- Relative to another genetic landmark
- Start site for a coding region of a gene
- Start or end of an exon or intron
- Relative to a marker
- Published articles are not always clear on this!!!
- Different resources may use different landmarks/numbering
- Numbering is always relative to the chosen sequence
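The point that numbering depends on the chosen landmark can be shown with a toy calculation. The coordinates below are invented for illustration, and the sketch assumes a forward-strand gene with no introns between the landmark and the SNP; it follows the common convention that the landmark base is 1 and there is no position 0.

```python
def relative_position(genomic_pos: int, landmark_pos: int) -> int:
    """Number a base relative to a landmark, counting the landmark as 1.

    Bases before the landmark get negative numbers and position 0 is
    skipped, so the base immediately before the landmark is -1.
    """
    if genomic_pos >= landmark_pos:
        return genomic_pos - landmark_pos + 1
    return genomic_pos - landmark_pos


if __name__ == "__main__":
    snp = 15075                # hypothetical genomic coordinate of a SNP
    coding_start = 15000       # hypothetical start of the coding region
    exon_start = 15060         # hypothetical start of an exon
    print("relative to coding start:", relative_position(snp, coding_start))
    print("relative to exon start:  ", relative_position(snp, exon_start))
```

The same base gets a different number for each landmark, which is why a published position is only meaningful together with its reference.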
11Coding SNPs
- These are easier because they can be identified by the amino acid position rather than the base-pair position
- Most common nomenclature uses either 3-letter or single-letter amino acid codes
- Asn332Asp OR A95V
- The HGVS recommendation is similar
- "p." for a protein sequence (like p.Lys76Asn)
- Amino acid (protein) coding sequence positions are becoming more consistent, but are not always so
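Because the same amino acid change may appear with 3-letter or single-letter codes, a small normalizer helps when comparing names across sources. This is a sketch covering the 20 standard amino acids only; the function name is hypothetical.

```python
import re

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(change: str) -> str:
    """Normalize names like p.Lys76Asn, Asn332Asp or A95V to the form K76N."""
    match = re.fullmatch(r"(?:p\.)?([A-Za-z]+)(\d+)([A-Za-z]+)", change.strip())
    if match is None:
        raise ValueError(f"unrecognized amino acid change: {change!r}")

    def code(residue: str) -> str:
        if len(residue) == 1:
            return residue.upper()
        try:
            return THREE_TO_ONE[residue.capitalize()]
        except KeyError:
            raise ValueError(f"unknown amino acid code: {residue!r}") from None

    ref, position, alt = match.groups()
    return f"{code(ref)}{position}{code(alt)}"


if __name__ == "__main__":
    for name in ["p.Lys76Asn", "Asn332Asp", "A95V", "THR164ILE"]:
        print(name, "->", to_one_letter(name))
```

Normalizing both names before comparing them avoids missing a match simply because two sources chose different code styles.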
12Database of SNPs (dbSNP)
- dbSNP
- is the international central repository for both single base nucleotide substitutions and short deletion and insertion polymorphisms
- accepts data submissions from scientists
- is integrated with NCBI's Entrez system
13dbSNP Content
- The SNP database has two major classes of content
- Submitted data, i.e., original observations of sequence variation: Submitted SNPs (SS), identified by ss numbers (ss5586300)
- Computed/curated data: Reference SNP Clusters (Ref SNP), identified by rs numbers (rs4986582)
14Reference SNP Clusters
- Ref SNP clusters are computer-generated and curated by NCBI staff
- Ref SNP clusters define a non-redundant set of SNPs
- All individual SNPs submitted by a researcher are given a submitter SNP number (ss), and redundant (repetitive) submitter SNPs are then combined into a RefSNP cluster record with a unique rs number
- Ref SNP clusters may contain multiple submitted SNPs
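The idea of merging redundant submissions into one cluster can be pictured with a toy grouping step. This is not how NCBI actually builds RefSNP clusters; it simply groups made-up submissions that report the same chromosome, position, and allele set, and all identifiers and coordinates below are hypothetical.

```python
from collections import defaultdict


def cluster_submissions(submissions):
    """Group submitted SNP ids that describe the same location and alleles.

    Each submission is (ss_id, chromosome, position, alleles); alleles are
    compared as a set so that "CT" and "TC" count as the same variation.
    """
    clusters = defaultdict(list)
    for ss_id, chromosome, position, alleles in submissions:
        key = (chromosome, position, frozenset(alleles.upper()))
        clusters[key].append(ss_id)
    return dict(clusters)


if __name__ == "__main__":
    # Hypothetical submissions from different labs.
    submitted = [
        ("ss0000001", "7", 100, "CT"),
        ("ss0000002", "7", 100, "TC"),
        ("ss0000003", "7", 250, "AG"),
    ]
    for key, members in cluster_submissions(submitted).items():
        print(key, "->", members)
```

Each resulting group corresponds to what the slide calls a cluster: one non-redundant record that may contain several submitted SNPs.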
15Searching dbSNP
- dbSNP is searched like any other Entrez db
- Specialized fields include
Field                  Tag         Notes
Allele                 Allele      Uses IUPAC codes for bases
Chromosomal Location   CHRPOS      Uses chromosomal base-pair locations
Contig Position        ctpos       Uses contig base-pair locations
Function Class         Func        Includes coding synonymous, missense, nonsense, intron, utr, etc.
SNP Class              SNP_Class   Includes snp, indel, mixed
16SNP Limits Page
17Creating a Complex Search
- Retrieve all synonymous coding reference SNPs for the human norepinephrine transporter gene (Slc6a2) from dbSNP
- Search Strategy
- human[orgn] AND Slc6a2[gene] AND coding synonymous[FUNC]
- Note: To use the [gene] (gene name) field, it is necessary to have the official gene name or gene symbol as per the HUGO Gene Nomenclature Committee. Entrez Gene can be used to find these.
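The same search can also be prepared programmatically. In Entrez, field tags are written in square brackets after each term, and NCBI's E-utilities expose an esearch endpoint that accepts a database name and a query term. The sketch below only builds the query string and URL without sending a request; the field tags are the ones listed on these slides and may not match the tags the current dbSNP accepts, and the helper names are made up.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(organism: str, gene: str, function_class: str) -> str:
    """Combine three fielded terms with AND, using the tags from the slides."""
    return f"{organism}[orgn] AND {gene}[gene] AND {function_class}[FUNC]"


def build_esearch_url(term: str, db: str = "snp", retmax: int = 20) -> str:
    """Return an esearch URL for the term; nothing is sent over the network."""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{ESEARCH_URL}?{query}"


if __name__ == "__main__":
    term = build_term("human", "Slc6a2", "coding synonymous")
    print(term)
    print(build_esearch_url(term))
```

Building the term separately from the URL makes it easy to paste the same string into the dbSNP search box to check it by hand first.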
18dbSNP Output Graphical Display
19dbSNP - Live
- Let's look at a dbSNP reference SNP page
- http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=3743788
20Finding SNPs - Challenges
- If an rs number is available, start with it
- Not all rs numbers have information in all databases
- Another database of interest is the Online Mendelian Inheritance in Man (OMIM)
- OMIM doesn't always provide rs numbers even when there is one
- dbSNP records may or may not link to OMIM, even if the SNP is in an OMIM record
21Example 1
- rs1800888
- (C>T) → Thr164Ile in ADRB2 gene
- HGVS nomenclature
- NP_000015.1:p.T164I
- To find in OMIM
- Search with rs1800888 yields nothing
- Search with ADRB2[gene] finds the record
- Look at allelic variants: .0003 BETA-2-ADRENORECEPTOR AGONIST, REDUCED RESPONSE TO ADRB2, THR164ILE
- It is a match
22Example 2
- rs2740574
- A/G SNP located 5' to CYP3A4
- HGVS nomenclature
- NT_007933.14:g.24616372C>T
- To find in OMIM
- Search with rs2740574 yields nothing
- Search with gene name CYP3A4 finds the record
- Find list of allelic variants - .0001 CYP3A4 PROMOTER POLYMORPHISM CYP3A4, A-G PROMOTER
- Compare info in dbSNP to info in OMIM (look at sequence)
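Example 2 lists the same SNP as A/G in one place and as C to T in the HGVS name. One plausible reading, and it is an assumption here rather than something the slide states, is that the two notations report opposite DNA strands. The sketch below checks whether two allele pairs agree either directly or after complementing; the function names are illustrative.

```python
# Watson-Crick complement for DNA bases, in both upper and lower case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement_alleles(alleles):
    """Return the alleles as they would read on the opposite strand."""
    return {allele.translate(COMPLEMENT) for allele in alleles}


def same_snp_alleles(alleles_a, alleles_b) -> bool:
    """True if the two allele sets match on the same or the opposite strand."""
    a = {allele.upper() for allele in alleles_a}
    b = {allele.upper() for allele in alleles_b}
    return a == b or complement_alleles(a) == b


if __name__ == "__main__":
    print(same_snp_alleles({"A", "G"}, {"C", "T"}))
    print(same_snp_alleles({"A", "G"}, {"A", "C"}))
```

A match on alleles alone is only a hint. Confirming that two records describe the same SNP still requires comparing the flanking sequence, as the slide suggests.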
23Other Databases
- OMIM - NCBI
- HapMap - International HapMap Project
- ALFRED - Allele Frequency Database
- HGVbaseG2P - Human Genome Variation database of Genotype-to-Phenotype information
- PharmGKB - Pharmacogenomics Knowledgebase
- F-SNP - Functional SNPs