Title: Human Genetic Variation
1Human Genetic Variation
SNPs
2Human Genetic Identity
- 99.9% identical
- 3,196,800,000 nucleotides identical
- 3,200,000 nucleotides different
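The two counts above follow from simple arithmetic. A minimal sketch, assuming a genome size of 3.2 billion nucleotides (the sum of the two figures on the slide) and 99.9% identity; the variable names are illustrative only:

```python
# Assumption: genome size of 3.2 billion nucleotides, implied by the slide's two counts.
GENOME_SIZE = 3_200_000_000
# 99.9% expressed as integer parts per thousand to avoid floating-point rounding.
IDENTITY_PER_MILLE = 999

identical = GENOME_SIZE * IDENTITY_PER_MILLE // 1000
different = GENOME_SIZE - identical

print(f"{identical:,} nucleotides identical")
print(f"{different:,} nucleotides different")
```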
3Human Genetic Variation
- Single base differences in genomes between any two individuals: 2-5 million
- Amino acid differences in proteomes between any two individuals: about 100,000
4Relevance: Variation Bears Difference
- Inherited contribution to phenotypic variation
- Physiological and anatomical differences based on molecular differences
- Inheritance vs. Alterations
- Exception: trauma, environmental impacts
6Variation Types
- Macro
- Chromosome numbers
- Segmental duplications, rearrangements, and deletions
- Medium
- Sequence Repeats
- Transposable Elements
- Short Deletions, Sequence and Tandem Repeats
- Micro
- Single Nucleotide Polymorphisms (SNPs)
- Single Nucleotide Insertions and Deletions (Indels)
7Relevance
- Identify inherited contribution to
- Disease risk
- Reactions to environmental triggers
- Reactions to treatments
- Cognitive abilities
- Requirements
- Forensics (incl. WTC)
- Ontology Phylogeny
- Evolution
8SNPs in Biomedicine
- Medical Conditions
- Diagnosis and treatment
- Gene therapy
- Pharmacogenomics
- Individualized drugs
- Fewer drug side effects
- Faster clinical trials
9Most common diseases are caused by a combination
of genes and environment
Stroke
Manic-depression
Myocardial Infarction
Breast cancer
Hypertension
Diabetes
High Cholesterol
Obesity
Schizophrenia
Inflammatory Bowel Disease
10Traditional Approach
- Linkage or recombinatorial mapping
- Successful for single gene disorders
- Little success for common complex traits, such as
heart disease, diabetes, asthma, mental disorders
11Mendelian disease genetics
genotype
disease state
Linkage analysis powerful because genetic risk
factors are highly penetrant
12Complex Disease Genetics
phenotype1
other genes
phenotype2
disease state
genotype
phenotype3
phenotype4
phenotype5
environment
13CHALLENGE
Find genetic susceptibility factors for common
disease, drug and pathogen response
- To understand the fundamental basis of disease
- To identify at-risk populations
- To identify targets for chemical intervention/drug development
- To predict outcomes and treatment efficacy
14Complex Disease Genetics
- Despite a significant genetic component, traditional approaches (e.g., linkage analysis) have little or no power because the genetic risk factors are individually too modest or incompletely penetrant
- If traditional approaches aren't succeeding, where do we turn?
15Raw Genome Data
Biological variation vs. sequence variation
16Most abundant type: SNPs (Single Nucleotide Polymorphisms)
GATTTAGATCGCGATAGAG
GATTTAGATCTCGATAGAG
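The two example sequences on this slide differ at a single base. A minimal sketch of how such a difference can be located by position-wise comparison, assuming the sequences are already aligned and of equal length (the function name `find_differences` is illustrative, not from the slides):

```python
def find_differences(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, base in seq_a, base in seq_b) for each mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


# The two alleles shown on the slide.
allele_1 = "GATTTAGATCGCGATAGAG"
allele_2 = "GATTTAGATCTCGATAGAG"

differences = find_differences(allele_1, allele_2)
print(differences)  # [(11, 'G', 'T')]
```

A single mismatch like this is only a candidate; whether it counts as a SNP depends on how often each allele occurs in the population.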
17SNPs in Overlapping Genomic Sequences
Overlapping BACs from library
50% of overlaps contain polymorphisms
18SNPs: What they are (and aren't)
- Single base pair variations among allelic sequences
- Least abundant allele has frequency > 1%
- Not all single base pair differences are SNPs
- Not all point mutations are SNPs (InDels)
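The frequency criterion can be written as a simple check. This is a sketch under stated assumptions: the allele counts are invented for illustration, the 1% threshold is the one given on the slide, and the function names are hypothetical:

```python
def minor_allele_frequency(allele_counts: dict[str, int]) -> float:
    """Frequency of the least abundant observed allele (0.0 if fewer than two alleles)."""
    observed = [count for count in allele_counts.values() if count > 0]
    if len(observed) < 2:
        return 0.0
    return min(observed) / sum(observed)


def is_snp(allele_counts: dict[str, int], threshold: float = 0.01) -> bool:
    """Apply the slide's criterion: least abundant allele has frequency > 1%."""
    return minor_allele_frequency(allele_counts) > threshold


# Hypothetical counts from 1,000 sampled chromosomes.
common_variant = {"G": 950, "T": 50}   # minor allele at 5%
rare_variant = {"G": 999, "T": 1}      # minor allele at 0.1%

print(is_snp(common_variant))  # True
print(is_snp(rare_variant))    # False
```

The second case is a single base pair difference that does not meet the frequency criterion, which is why not every such difference is called a SNP.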
19Types of SNPs
- Causative SNPs
- coding SNPs (synonymous/non-synonymous)
- non-coding SNPs
- (Read: "Making Sense of Nonsense", NRG 5/02)
- Linked SNPs
- usually intra- and intergenic SNPs
20Occurrence
- Ca. 1/1300 bp in genomic DNA from two equivalent chromosomes
- Ca. 1/300 bp in whole populations
- In intergenic regions, introns, and exons
- Functionally constrained DNA is less diverse
- 50% of CDS SNPs are synonymous
- Frequency varies by variation type
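As a rough consistency check, the per-base-pair rates above can be scaled to the whole genome. The sketch assumes a genome size of 3.2 billion nucleotides (the figure implied earlier in the deck) and treats the rates as uniform across the genome, which the slide itself notes is not strictly true:

```python
# Assumption: genome size of 3.2 billion nucleotides, taken from the deck's identity figures.
GENOME_SIZE = 3_200_000_000

# Rates from the slide: one SNP per 1300 bp (two chromosomes), one per 300 bp (population).
BP_PER_SNP_TWO_CHROMOSOMES = 1300
BP_PER_SNP_POPULATION = 300

between_two = GENOME_SIZE // BP_PER_SNP_TWO_CHROMOSOMES
in_population = GENOME_SIZE // BP_PER_SNP_POPULATION

print(f"{between_two:,} expected differences between two equivalent chromosome sets")
print(f"{in_population:,} expected polymorphic sites in whole populations")
```

The first figure, roughly 2.5 million, falls inside the range of 2-5 million single base differences quoted for any two individuals.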
23TSC SNPdb
- A/G C/T 33%
- A/C A/T C/G G/T 8%
- A/C/G A/C/T A/G/T C/G/T 0.006%
- A/C/G/T 0.002%
24Transversions c→a g→t c→g g→c t→a a→t
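A transversion swaps a purine (A, G) for a pyrimidine (C, T) or vice versa, whereas A/G and C/T changes stay within one class and are transitions. A minimal sketch of that classification (the function name is illustrative):

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base substitution as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    bases = {ref, alt}
    if ref == alt or not bases <= (PURINES | PYRIMIDINES):
        raise ValueError("expected two different bases from A, C, G, T")
    # Both bases in the same chemical class means a transition.
    same_class = bases <= PURINES or bases <= PYRIMIDINES
    return "transition" if same_class else "transversion"


for ref, alt in [("a", "g"), ("c", "t"), ("c", "a"), ("g", "t"), ("c", "g"), ("t", "a")]:
    print(f"{ref}->{alt}: {classify_substitution(ref, alt)}")
```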
25SNPs - Identification
- Sequence comparison
- PCR methods
- Microarrays
- De Novo vs. Database mining
26In Silico SNP Identification
- Mine existing sequence resources
- Genomic sequences
- ESTs
- BAC-end sequences
- Cost-effective genome-wide SNP discovery by examining regions of redundant sequence coverage
- Rarer alleles more difficult to spot
- Frequency below error rate
- Allele copies available in dbs
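The idea of mining redundant coverage can be sketched as a column-by-column comparison of overlapping sequences. This is an illustration under stated assumptions, not the method of any particular pipeline: the reads are invented, already aligned, and of equal length, and the `min_copies` threshold is a hypothetical stand-in for an error-rate filter:

```python
from collections import Counter


def candidate_snps(aligned_reads: list[str], min_copies: int = 2) -> dict[int, Counter]:
    """Return {1-based column: base counts} where at least two alleles
    are each supported by min_copies or more reads."""
    calls = {}
    for column, bases in enumerate(zip(*aligned_reads), start=1):
        counts = Counter(base for base in bases if base in "ACGT")
        supported = [base for base, n in counts.items() if n >= min_copies]
        if len(supported) >= 2:
            calls[column] = counts
    return calls


# Hypothetical overlapping reads covering the same region.
reads = [
    "GATTTAGATCGCGATAGAG",
    "GATTTAGATCGCGATAGAG",
    "GATTTAGATCTCGATAGAG",
    "GATTTAGATCTCGATAGAG",
    "GATTTAGCTCGCGATAGAG",  # lone mismatch at column 8, seen in one read only
]

calls = candidate_snps(reads)
print(calls)  # {11: Counter({'G': 3, 'T': 2})}
```

The lone mismatch at column 8 is not reported because a single copy cannot be told apart from a sequencing error. A rare allele has the same problem, since few copies of it are available in the databases.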
28GATTTAGATCGCGATAGAG
GATTTAGATCTCGATAGAG
- Rare Alleles
- ---o--------------------
- -----o------------------
- -------o----------------
- -----------o------------
- ---------------o--------
- -------------------o----
- Many
- Common Alleles
- ----o-------------------
- ----o-------------------
- ----o-------------------
- --------------------o---
- --------------------o---
- --------------------o---
- Few
29Pharmacogenomics
- The use of DNA sequence information to measure and predict the reaction of individuals to drugs
- Personalized drugs
- Faster clinical trials through selected trial populations
- Fewer drug side effects
30Processes
Drugs
Patients
- Responders
- Non-responders
- Toxic responders
- Absorption
- Distribution
- Activation
- Metabolism
- Excretion
Goals
- Right Drug
- Right Dose
- Right Patient
31SNPs vs. Haplotypes
GATTTAGATCGCGATAGAG
GATTTAGATCTCGATAGAG
CGGGTATCGATTTAGATCGCGATAGAGTTGCCTACA
CGAGTATCGATTTAGATCTCGATAGAGTTGTCTACA
Many polymorphisms make a type
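The longer pair of sequences on this slide differs at several positions, and the combination of alleles at those positions is what distinguishes one haplotype from the other. A minimal sketch, assuming the sequences are aligned and of equal length (the function name `haplotypes` is illustrative):

```python
def haplotypes(sequences: list[str]) -> tuple[list[int], list[str]]:
    """Return the 1-based variable positions and, per sequence,
    the string of alleles found at those positions."""
    variable = [
        index
        for index, column in enumerate(zip(*sequences))
        if len(set(column)) > 1
    ]
    positions = [index + 1 for index in variable]
    alleles = ["".join(seq[index] for index in variable) for seq in sequences]
    return positions, alleles


# The two longer sequences shown on the slide.
seq_1 = "CGGGTATCGATTTAGATCGCGATAGAGTTGCCTACA"
seq_2 = "CGAGTATCGATTTAGATCTCGATAGAGTTGTCTACA"

positions, alleles = haplotypes([seq_1, seq_2])
print(positions)  # [3, 19, 31]
print(alleles)    # ['GGC', 'ATT']
```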
32- Responders
- tcgaggaacagggctcttaaaaatgctttatccgcttag
- tcgaggaacagggctcttaaaaatgctttctccgcttgg
- tagagcaacagggctctaaaaaatgctttctccgcttag
- Non-responders
- tagtgaaacagggctctgaaaactgctttatccgattcg
- tagtggaatagggctctgaaaactgctttatccgattgg
- tcgtggaacagggctctgaaaactgctttgtccgattgg
33- Responders
- -c-a-g--c--------t----a------a----c--a-
- -c-a-g--c--------t----a------c----c--g-
- -a-a-c--c--------a----a------c----c--a-
- Non-responders
- -a-t-a--c--------g----c------a----a--c-
- -a-t-g--t--------g----c------a----a--g-
- -c-t-g--c--------g----c------g----a--g-
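The reduced view above can be derived from the full sequences by masking every column in which all six individuals agree. The sketch below does that and then lists the columns where responders and non-responders share no base at all. Function names are illustrative, and the sequences are assumed to be aligned and of equal length:

```python
def mask_invariant(sequences: list[str], fill: str = "-") -> list[str]:
    """Replace bases in columns where all sequences agree with the fill character."""
    variable = [len(set(column)) > 1 for column in zip(*sequences)]
    return [
        "".join(base if is_variable else fill for base, is_variable in zip(seq, variable))
        for seq in sequences
    ]


def discriminating_positions(group_a: list[str], group_b: list[str]) -> list[int]:
    """1-based columns where the two groups have no base in common."""
    return [
        index + 1
        for index, (col_a, col_b) in enumerate(zip(zip(*group_a), zip(*group_b)))
        if not set(col_a) & set(col_b)
    ]


responders = [
    "tcgaggaacagggctcttaaaaatgctttatccgcttag",
    "tcgaggaacagggctcttaaaaatgctttctccgcttgg",
    "tagagcaacagggctctaaaaaatgctttctccgcttag",
]
non_responders = [
    "tagtgaaacagggctctgaaaactgctttatccgattcg",
    "tagtggaatagggctctgaaaactgctttatccgattgg",
    "tcgtggaacagggctctgaaaactgctttgtccgattgg",
]

masked = mask_invariant(responders + non_responders)
for line in masked:
    print(line)

separating = discriminating_positions(responders, non_responders)
print(separating)  # [4, 18, 23, 35]
```

In this small example, four of the nine variable positions separate the two groups cleanly. The other five vary within a group and so carry no clear signal about drug response.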
34SNPFinder
- SNPFinder at the Cancer Genome Anatomy Project
allows you to search for SNPs (single-nucleotide
polymorphisms). You may upload your ABI or
SCF-format chromatograms, which are basecalled,
assembled, and searched for SNPs. Sequences may
optionally be assembled with UniGene data of your
choosing. An account is required to use SNPFinder;
it is free for academic users.
35Sites to view SNPs
- UCSC Browser (http://genome.ucsc.edu)
- SNP Consortium (http://snp.cshl.org/)
- NCBI's dbSNP (www.ncbi.nlm.nih.gov/SNP/)
36Variation Data at NCBI
- Bioinformatics infrastructure
- Databases of sequence variation
- Haplotypes and variations as genome annotation
- Functional variants in genetic disease / pharmacogenomics
- Evolutionary Biology
- Forensic Biology
- mass casualty analysis
38dbSNP - variation and polymorphism
Variant position A/G
39Roles for variations in genome analysis
- Physical Mapping: enriched marker set
- Population Structure: haplotype analysis, evolutionary studies
- Association Studies: population stratification, metabolic pathways
- Functional Analysis: pharmacogenomics, protein structure
40Variations mapped onto protein structures
3,868 variations (6% of coding region variations)
have been mapped to a 3D structure (Summer 2002)
42Haplotypes in NCBI MapViewer