Title: Don Seto
1 Don Seto, Dept. of Bioinformatics and Computational Biology, dseto@gmu.edu, Sept 8, 2008
2 Binf 732 Genomics: DNA sequencing and analysis applications I
- Historical perspectives
- DNA chemistry and biochemistry (basic research)
- Molecular biology and the Central Dogma
- Recombinant DNA technology (applied research)
- DNA sequencing methodology
- Why genomics?
- Genome sequencing strategies
- Instrumentation
- Good data/bad data
- Applications of sequence data
- Instrumentation (technology development)
- Data processing: signal resolution, signal to noise, base-calling, sequence assembly, QC
- Viruses and genomics (new insights)
3 Binf 732 Genomics: DNA sequencing and analysis applications II
- Genome annotation
- Applications of genome methodology, small scale
- Model organisms, surrogate for human biology?
- Big science: large-scale and high-throughput sequencing
- Industrial-strength: Human Genome Project
- Next generation DNA analysis technology: two examples
- Cancer genomics
4 Genomics: what is it?
- It's a new, and changing/adapting, field
- ornl.gov: Genomics, the study of genes and their function
- Answers.com: The study of all of the nucleotide sequences, including structural genes, regulatory sequences and noncoding DNA segments, in the chromosomes of an organism
- Wiki (genomics): The study of an organism's entire genome; in contrast, the investigation of single genes, their functions and roles does not fall into the definition of genomics
- Cites US EPA's definition
- Study of all the genes of a cell or tissue at the DNA (genotype), mRNA (transcriptome) or protein (proteome) levels; also ???-ome
- F. Sanger, et al. sequenced complete genomes of a bacteriophage, φX174 (5,386 nt; 1977), and a human mitochondrion (16,569 bp; 1981)
- Now known as the Cambridge Reference Sequence (CRS)
- Reference for studies on human evolution, population genetics and mito disease
- Established techniques of sequencing, genome mapping, data storage and bioinformatics analyses
5 This is Genomics!
- NYT Jan 22, 2007: "Close-ups of the Genome, Species by Species by Species"
- Circos interactive site/resource
- Outer band represents each species' first chromosome
- Numbers represent millions of bp on the chromosome
- Bar charts tell how many bp, 0-1M, match part of a human chromosome
- Line charts show what is similar to each of the other five genomes
- Lines join the 200 regions on each chromosome most similar to human; thicker = more similar
- OTHER types of comparisons
- e.g., BRCA1: green lines represent protein similarity
6 DNA and recombinant DNA, short history of....
- Miescher 1868. First isolation of nucleic acids, from salmon, as "nuclein" (R. Altmann 1889)
- Levene 1919. Identified four bases, sugar and phosphate as components; P-S-B order
- Griffith 1928. S. pneumoniae "inert" factor is infectious -> death is functional
- Chargaff 1940s. Base ratios A=T, G=C, Pur=Pyr
- Avery, MacLeod, McCarty 1944. S. pneumoniae factor is nuclease sensitive; follow-up
- Lederberg 1945; Wollman and Jacob 1955. Conjugation / bacterial sex
- Hershey, Chase 1952. T2 DNA injected, protein coat outside
- Watson, Crick, Wilkins, Franklin 1953. Double helix structure, implications for replication
- Kornberg, et al. 1958. Biochemical basis of DNA replication
- Meselson and Stahl 1958. Semi-conservative replication
- Matthaei and Nirenberg; Khorana 1961-1965. Genetic code
- 1971. Specific cleavage of SV40 by RE. Danna and Nathans, PNAS (1971) 68:2913
- Kelly and Smith, JMB (1970) 51:393
- 1972. DNA cloning
- 1975. Asilomar Conference on Recombinant DNA (organized by Paul Berg)
- (Ref: H. Judson. The Eighth Day of Creation. 1979)
7 Nucleic acids chemistry
- Biochemistry: mRNA, tRNA and rRNA (snRNA, iRNA), and DNA
- Roles: information, structure, mediators (regulatory, signal, energy)
9 Monomers linked
- Properties and constraints
- Physical
- Chemical
- Biological
10 Chargaff's Rules
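Chargaff's base ratios (A = T, G = C, and therefore purines = pyrimidines) follow from complementary pairing in double-stranded DNA, not from any single strand. The minimal sketch below, assuming an unambiguous A/C/G/T input strand and standard Watson-Crick pairing (the function names are hypothetical), counts bases over both strands of the duplex:

```python
from collections import Counter

# Watson-Crick complements; assumes only A, C, G, T appear in the input
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5'->3'."""
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))


def duplex_counts(strand: str) -> Counter:
    """Count bases over BOTH strands of the double-stranded molecule."""
    return Counter(strand.upper()) + Counter(reverse_complement(strand))


def chargaff_ratios(strand: str) -> dict:
    """A/T, G/C and purine/pyrimidine ratios for the duplex built from `strand`."""
    c = duplex_counts(strand)
    return {
        "A/T": c["A"] / c["T"],
        "G/C": c["G"] / c["C"],
        "Pur/Pyr": (c["A"] + c["G"]) / (c["C"] + c["T"]),
    }


if __name__ == "__main__":
    # Made-up example strand
    print(chargaff_ratios("ATGCGCGATTACAGGT"))
```

All three duplex ratios come out as 1.0, even though this single strand on its own has 5 G and only 3 C; the rule describes the double helix.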
11 Bridging chemistry to biology: Structure of the DNA double helix
James Watson, 1928-present
Francis Crick, 1916-2004
- http://www.achievement.org/autodoc/photocredit/achievers/wat0-001
- DNA photo: Wiki
12 DNA: very stable, and contains highly specific, conserved information
- How to ensure fidelity?
- How to access it without losing it?
13 RNA secondary structures
14 Double helix is a problem; so are secondary structures
15 Importance of secondary and/or local structures as biochemistry
- AAV genome is 4,680 nucleotides
- Integration into chr 19 at a unique site
- DNA and secondary structure
- Rep proteins bind to integration site
- R. Owens et al., 1994
16 Physical chemistry of DNA replication, RNA transcription
17 Mediation by proteins emulates physical chemistry (proteins interact with nucleic acids)
18 Bridging chemistry to biology
- Buchner
- biology (extracts) to synthesize biochemicals
- Approach to research
- Freezer of proteins, as reagents
19 Bridging chemistry to biology
- Cell-free extract of yeast cells, as "press juice", ferments sugar
- -> Living yeast cells not needed for fermentation
(Buchner, 1860-1917)
20 Proteins: structure and function, roles in the cell
- Paradigm change: proteins not as a research topic, but
- As reagents and tools, and
- Processes as well
21 Metabolic pathways within cells
22 How is the genome information accessed?
- Recognition sites for gene expression
- Same for replication?
- Proteins bridge DNA and the molecular biology of the cell
23 Initiation of bacterial chromosome replication
24 Genomics of initiation of bacterial chromosome replication
- Replication initiation of broad host range plasmid RK2, oriV
- Amino acid comparison of DnaA proteins: 47.1% to 83.7% similarity
- Most conserved regions: domain III (nucleotide binding region) and domain IV (DNA binding site)
- Domain II longer in S. coelicolor than E. coli: added role?
- Arrows indicate protein-protein interacting domains
- http://www.jbc.org/cgi/content/full/275/24/18454/F1
25 Proteins interact very specifically with DNA sequences
26 DNA-binding proteins have several roles
- Modifiers
- Activators
- Gene expression
- Other processes
- Repressors
- ditto
- Recruiters
- Stabilizers
27 Replication proteins as a macromolecular assembly (today, the replisome), involved in the central processes of the cell, noted as the Central Dogma of Molecular Biology
28 Biochemistry inside the cell
29 Central Dogma of Molecular Biology: seemingly often refuted by new understandings
- Describes information flow inside the cell
- How can this be the Central Dogma if it does not hold up?
- L. Moran blog: http://sandwalk.blogspot.com/2007/01/central-dogma-of-molecular-biology.html
30 Central Dogma of Molecular Biology: seemingly often refuted
- Many younger scientists of the Recombinant DNA, Molecular Biology and Genomics era
- read/studied Watson's Molecular Biology of the Gene
- not many have read the original papers
- L. Moran blog: http://sandwalk.blogspot.com/2007/01/central-dogma-of-molecular-biology.html
31 Central Dogma of Molecular Biology: perception corrected
- http://sandwalk.blogspot.com/2007/01/central-dogma-of-molecular-biology.html
32 Central Dogma of Molecular Biology: perception corrected
- http://sandwalk.blogspot.com/2007/01/central-dogma-of-molecular-biology.html
33 Elongation
- Components, e.g., proteins and nucleic acids, and small molecules, may be used as reagents
- once their biochemical roles are understood, and once they can be isolated and stabilized
- Molecular biological processes, e.g., replication and transcription, may be used as reagents
- Both reagents may be modified, once understood
- ex., Mn2+ for Mg2+; dNTP vs. ddNTP in replication
- Along with the same from Genetics, Cell Biology, etc.
34 DNA and recombinant DNA, short history of....
35 Basic science to applied science, laboratory to industry: bacteriophage host range restriction system
- Or restriction-modification system
- 1950s: bacteriophage can infect and replicate in one strain of bacterium and not another
- i.e., the infected strain inhibited, or restricted, the growth of viruses grown first in another strain
- Due to a sequence-specific restriction enzyme
- Recognition site 4-6 bp long and often a palindromic sequence
- Paired with its own modification enzyme to protect its own DNA
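"Palindromic" here means the site reads the same on both strands, 5'->3', so it equals its own reverse complement. A minimal sketch of that test and of a site scan follows; EcoRI's GAATTC (6 bp) and HaeIII's GGCC (4 bp) are used as well-known examples, and the DNA sequence and function names are made up for illustration:

```python
# Complement table for str.translate; assumes plain A/C/G/T sequences
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement: the other strand read 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """A recognition site is palindromic if it equals its own reverse complement."""
    return site.upper() == revcomp(site)


def find_sites(seq: str, site: str) -> list:
    """0-based start positions of every (possibly overlapping) match of `site`."""
    seq, site = seq.upper(), site.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]


if __name__ == "__main__":
    dna = "TTGAATTCGGCCAAGAATTC"  # made-up sequence
    for name, site in [("EcoRI", "GAATTC"), ("HaeIII", "GGCC")]:
        print(name, site, is_palindromic(site), find_sites(dna, site))
```

Because a palindromic site is identical on both strands, one scan of the top strand finds every site; a non-palindromic site would also need a scan with its reverse complement.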
36 Where did molecular biotechnology begin? Basic science to applied science, laboratory to industry
- With H. Smith and W. Arber, Nobel Prize '78
- Molecular scissors as phage host range restriction
- Fundamental tool for recombinant DNA technology
37 Where did molecular biotechnology begin? Basic science to applied science, laboratory to industry
- Invented a method of cloning genetically engineered molecules in foreign cells
- initiated what is now the multi-billion-dollar biotechnology industry
- Collaboration began at a conference (on bacterial plasmids) in Hawaii in 1972
- rock, paper, scissors
- http://web.mit.edu/invent/iow/boyercohen.html
38 Application of basic science discovery -> recombinant DNA technology
- Nov 1972: Honolulu meeting on plasmids
- Collaboration
- H. W. Boyer: isolated an enzyme which cut DNA at specific sites
- S. Cohen: method to introduce antibiotic-carrying plasmid into bacteria; method of isolating and cloning genes carried by plasmids
- 1973: series of experiments resulting in a method to select and replicate specific foreign genes in bacteria
- Feb 1975: Asilomar, in Pacific Grove, CA; goal to estimate risk of biohazard and formulate guidelines
- Dec 1980: first of three patents on gene cloning to Stanford and U. Calif.
- April 1976: Genentech incorporated (Boyer, R. A. Swanson)
- 29-year-old venture capitalist
- 1977: W. Rutter et al. cloned rat insulin gene
- 1981: Founded Chiron
- 1986: First recombinant vaccine to receive FDA approval
- Chiron-Merck hep B vaccine
- in retrospect, first cancer vaccine
- http://bancroft.berkeley.edu/Exhibits/Biotech/25
39 What does this represent? (Impact beyond the laboratory bench: applied, clinical applications)
- 1980: founded as Amgen (Applied Molecular GENetics)
- based on recombinant DNA and molecular biology
- 1983 Amgen
- 1983: F-K Lin clones human erythropoietin
- recombinant as Epogen (epoetin alfa)
- 1985: L. M. Souza clones human granulocyte colony-stimulating factor (G-CSF)
- recombinant as Neupogen (filgrastim)
- 1987: first Epo patent; 1989: first Neupo patent
- 1992: sales > $1B; 1996: sales > $2B; 1999: sales > $3B
- 2006: stock falls; patent issues, pipeline
40 What does this represent? (An example of the integration of molecular biotechnology and society) (Impact beyond clinical applications: economy and financials)
- 07/26/08, NYT: Amgen's experimental bone drug,
- widely considered to be crucial to the company's future,
- has succeeded in its most important clinical trial,
- sending the company's shares up sharply: $53.92 to above $61
- Sales up to several billion dollars per year
- 44M Americans over 50 have osteoporosis
- Previously, stock lost 50% due to falling sales of its anemia drugs,
- after some studies linked the drugs to worsening of cancer, death and cardiovascular problems
- http://seekingalpha.com/article/87572-amgen-gets-much-needed-denosumab-boost
- http://blog.seattlepi.nwsource.com/thelifesciencesblog/archives/141562.asp
41 What does this represent? (Impact from recombinant DNA and genomics) (Applied science)
- "Targeting the RANK/RANKL/OPG signaling pathway: A novel approach in the management of osteoporosis" (N. A. T. Hamdy, Curr Opin Invest Drugs 8:299 (07))
- RANK, RANKL and OPG are members of the TNF receptor superfamily
- Amgen's experimental bone drug, denosumab
- Three-year study with 7,800 postmenopausal women with osteoporosis
- Reduced risks of spine and hip fractures, compared with placebo
- Smaller studies showed it builds bone mineral density, but left open whether this translates to a reduced risk of fractures
- Saw a surprising reduction of hip fractures, which are rarer than spine fractures (thus, a statistically significant effect is harder to show)
- Hip fractures: costly and potentially lethal medical problems
- Earlier studies: higher rate of serious infection and cancer; no mention in news release
- Denosumab is a mAb blocking the action of RANK ligand, a protein involved in bone equilibrium
- RANKL: receptor activator of nuclear factor kappa B ligand
- Not initiated from academia, but from internal studies of genes in mice with particularly dense bones
- Made mice based on superfamily data
- http://seekingalpha.com/article/87572-amgen-gets-much-needed-denosumab-boost
- http://blog.seattlepi.nwsource.com/thelifesciencesblog/archives/141562.asp
- http://www.the-scientist.com/article/display/54849/
42 What does this represent? (Impact from recombinant DNA and genomics) (Basic science)
- Not initiated from academia, but from internal studies of genes in mice with particularly dense bones
- 1994: S. Simonet (Thousand Oaks, CA) engineered five transgenic mice overexpressing a previously unknown protein, osteoprotegerin (OPG)
- Looked and behaved normally, but x-rays showed thicker pelvic and vertebral bones
- Used this protein because its DNA sequence matched a family of cytokine receptors involved in cell death (TNFR)
- But it differs in that it is secreted, i.e., it lacks the transmembrane-spanning sequence
- 1998: Snow Brand Milk Co. (Japan) independently identified OPG
- 1998: Both groups also discovered the binding partner,
- Similar to TNFR, called RANK, discovered at Immunex (Seattle) in 1997
- Phase 1 trials with OPG, then switched to RANKL
- OPG prevents RANKL from binding RANK; denosumab binds and neutralizes RANKL directly
- "Silver bullet"
- Immunex (W. C. Dougall): 13 years on the RANKL project
- Now, studying its applications in bone cancer,
- giant cell tumor of the bone dramatically shrunk
- http://seekingalpha.com/article/87572-amgen-gets-much-needed-denosumab-boost
- http://blog.seattlepi.nwsource.com/thelifesciencesblog/archives/141562.asp
- http://www.the-scientist.com/article/display/54849/
43 What does this represent? (Impact from recombinant DNA and genomics) (Applied science)
- "Targeting the RANK/RANKL/OPG signaling pathway: A novel approach in the management of osteoporosis" (N. A. T. Hamdy, Curr Opin Invest Drugs 8:299 (07))
- RANK, RANKL and OPG are members of the TNF receptor superfamily
- Made transgenic mice based on superfamily data
- What's a superfamily? TNF to bone growth??? An orphan receptor? Mice to man???
- -> Genomics and Bioinformatics
44 Addendum: DNA sequencing (applied biology)
Basic research versus applied research; Technology (Instrumentation); example: translation of DNA replication and DNA synthesis to DNA sequencing..... molecular biology, genomics, bioinformatics
- signal transduction field
45
- Originally, and simply, The Cell
- Cells come in all shapes, sizes, functions
- So, if we understand the cell, we can use it
46 Huge range of cells given one genome blueprint
47 Across all life forms: diversity and commonality
48 What's in that blueprint? How do we get to it, read it? Use it???
- What's a superfamily? TNF to bone growth??? An orphan receptor? Mice to man???
49 What's in that blueprint? Once we read it, we find you and the fly are very similar in genes
- What's a superfamily? TNF to bone growth??? An orphan receptor? Mice to man???
50 Model organisms' value
51 Where did genomics begin? Basic science to applied science, laboratory to industry
- Protein sequencing
- DNA sequencing; also Maxam and Gilbert
- Automation of DNA sequencing; also et al.
52 DNA sequencing methodologies ca. 1977! The Chemistries
- Maxam-Gilbert
- General
- base modification by general and specific
chemicals - depurination or depyrimidination
- single-strand excision
- not amenable to automation
- Sanger
- Specific
- DNA replication-based
- substitution of substrate with chain-terminator version
- more efficient
- automation
53 DNA sequencing: Maxam-Gilbert
54 DNA sequencing: Maxam-Gilbert close-up, chemistry of reactions
55 DNA sequencing: Maxam-Gilbert close-up, chemistry of reactions
Note: 4 tubes -> 4 lanes
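Reading a Maxam-Gilbert gel means combining the four lanes: each base leaves a band in a characteristic set of lanes. The sketch below assumes the standard four-reaction set (G, A+G, C+T, C) for the four tubes and an idealized gel where band position i is simply the i-th base from the labeled end; it ignores real fragment-length offsets, and the function names are hypothetical:

```python
# Lanes in which each base produces a band, for the assumed G / A+G / C+T / C reaction set
LANES_FOR_BASE = {
    "G": frozenset({"G", "A+G"}),
    "A": frozenset({"A+G"}),
    "C": frozenset({"C+T", "C"}),
    "T": frozenset({"C+T"}),
}
# Inverse lookup: a lane pattern identifies exactly one base
BASE_FOR_LANES = {lanes: base for base, lanes in LANES_FOR_BASE.items()}


def simulate_gel(end_labeled_strand: str) -> dict:
    """Idealized gel: band position (1 = nearest the labeled end) -> lanes showing a band."""
    return {i: LANES_FOR_BASE[b] for i, b in enumerate(end_labeled_strand.upper(), start=1)}


def read_gel(gel: dict) -> str:
    """Read from the shortest fragment upward, decoding each lane pattern into a base."""
    return "".join(BASE_FOR_LANES[frozenset(gel[pos])] for pos in sorted(gel))


if __name__ == "__main__":
    gel = simulate_gel("GATCCAGT")
    print(read_gel(gel))
```

The need to compare four lanes per position is one reason the method was awkward to automate.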
56 versus bio-based methods
- Sanger method, or
- dideoxynucleotide chain-termination chemistry
57 DNA biochemistry: biochemistry of the replication fork
58 DNA replication: chemistry of the replication fork
59 DNA replication: chemistry of the replication fork; problems with chemistry
60 Modify DNA replication biochemistry with nucleotide analogs
dideoxycytidine triphosphate (ddCTP), one of the four ddNTPs
61 If the last base added is a dideoxy, no extension
(Figure: dNTP vs. ddNTP structure; purine or pyrimidine base on the sugar-phosphate; 3'-OH in the dNTP vs. 3'-H in the ddNTP)
- Dideoxy chain termination method
- Sanger method
- The bio approach
62 DNA sequencing: replication reaction terminations give ladders, with labels at the fixed primer end
63 DNA sequence analysis protocol: The bench
64 Shotgun cloning and DNA sequencing method vs. primer walking strategies (tiled, etc.)
- Reduction
- Chromosome (Mb) to
- YAC (>100 kb), BAC (100 kb) to
- cosmid (40 kb)
- to M13 (1 kb)
65 DNA re-sequencing method
66 Before whole-genome sequencing analyses: candidate gene hunting; also, one strategy for genome sequencing
- Mapmaking: chromosome mapping
- Chromosome walks
- Isolation of candidate disease gene (early strategy)
- Clone and sequence: bioinformatics
67 Resequencing DNA methodology: the mitochondria
- Applied Biosys, Innovations, July 08
- mitoSEQr System
- PCR-based resequencing system
- Identification of sequence variations
- entire mito genome
- And control region
- Methodology
- Overlapping regions amplified with specific
primer pairs - Tailed with universal M13 sequences
- Generates resequencing amplicons
- Identifying mitochondrial mutations
- Heteroplasmic mutations in affected tissues
- C. Franceschi, G. Romeo, E. Bonora, G. Gasparre
- Role of mitochondria in diseases, including cancer and Alzheimer's
- Oncocytoma: characterized by proliferation of mitochondria
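The resequencing design above (overlapping regions amplified with specific primer pairs, each tailed with universal M13 sequences) can be sketched as a tiling calculation. This is not the mitoSEQr design: the 16,569 bp genome length (the revised CRS), the 1,000 bp amplicons with 200 bp overlaps, the commonly used M13(-21) forward and M13 reverse tail sequences, and all function names are assumptions for illustration. Because the mitochondrial genome is circular, the last amplicon is allowed to run past the end and wrap back to position 0:

```python
M13_FWD_TAIL = "TGTAAAACGACGGCCAGT"  # M13(-21) forward universal sequence (assumed)
M13_REV_TAIL = "CAGGAAACAGCTATGAC"   # M13 reverse universal sequence (assumed)


def tile_amplicons(genome_len: int, amplicon_len: int, overlap: int) -> list:
    """(start, end) pairs, 0-based and end-exclusive, tiling a CIRCULAR genome.

    An end greater than genome_len means the amplicon wraps past position 0.
    The last start is >= genome_len - step, so its wrap-around overlap with the
    first amplicon is at least `overlap`.
    """
    step = amplicon_len - overlap
    if step <= 0:
        raise ValueError("overlap must be shorter than the amplicon")
    return [(start, start + amplicon_len) for start in range(0, genome_len, step)]


def tail_primers(gene_specific_fwd: str, gene_specific_rev: str) -> tuple:
    """Prepend the universal tails so every amplicon can be sequenced with M13 primers."""
    return M13_FWD_TAIL + gene_specific_fwd, M13_REV_TAIL + gene_specific_rev


if __name__ == "__main__":
    tiles = tile_amplicons(16569, 1000, 200)
    print(len(tiles), tiles[0], tiles[-1])
    # Hypothetical gene-specific primers, for illustration only
    print(tail_primers("ACGTACGTACGTACGTAC", "TTGGCCAATTGGCCAATT"))
```

The universal tails are the design point: a single pair of sequencing primers serves every amplicon, whatever its gene-specific primers.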
68 Resequencing DNA methodology: the mitochondria
69 Tracking the Woolly Mammoth: "Out of America: Ancient DNA Evidence for a New World Origin of Late Quaternary Woolly Mammoths"
- R. Debruyne, H. N. Poinar, et al. Current Biology, Sept 08
- (NYT 09/04/08)
- Siberian woolly mammoth wasn't really Siberian
- Origins to 6M years ago, with a common ancestor to the African elephant
- Sequenced mitoDNA from 160 mammoth samples from across Eurasia and North America
- Identified several clades, some endemic to Siberia and other parts of Asia, others to N. Am.
- Separated by 1.5M years of eastward migration to N. Am.
70 Tracking the Woolly Mammoth: "Out of America: Ancient DNA Evidence for a New World Origin of Late Quaternary Woolly Mammoths"
- R. Debruyne, H. N. Poinar, et al. Current Biology, Sept 08
- At some point in the past 150,000 years,
- N. Am. mammoths migrated back to Siberia over the Bering Strait
- At the time of the reverse migration, the endemic Siberian population was crashing
- 40,000 years ago, N. Am. mammoths dominated Siberia
- Siberian mammoths died out on their own (genetic drift) or were out-competed?
- The mammoth that went extinct in Siberia about 10,000 years ago was not of Siberian lineage
- Common to think of the Bering Strait as a one-way route
- Camels went N. Am. to Asia
- http://mutex.gmu.edu:2119/cgi/content/full/2008/904/2
71 Tracking the Woolly Mammoth: "Mammoth Sequences: A Hunt for DNA from the Extinct Titans of the Klondike"
- Sci Am, Sept 08: a different type of scientific research tool
- Core drill designed for punching holes in concrete
- Used to dig into ice dated to 100,000 years old
- Retrieve frozen soil from the Pleistocene
- Paleomammalogist R. MacPhee, AMNH, NYC
- Water leaking into crater and freezing, remaining frozen,
- might hold DNA from mammoths, flora and fauna
- May answer the long-standing question of whether two species of mammoth, rather than just one, roamed the Americas at the end of the last ice age
72 Evolution of simple technology: throughput considerations
- Microbiology and molecular biology
- Insert sizes and vectors
- Plasmid
- Plasmid-derived, RE-defined fragments
- 100-500 base inserts
- M13-based: 1 µg template, linear amp
- 1 kb inserts
- M13 or any: 0.1 µg (100 ng) template, cycle amp
- now, ng quantities, cycle amp plus detection technology
- Labware
- eppy tubes / test tubes / 50 cc conical tubes
- microtiter plates / deep well plates
- 48 to 96 to 384 to ...
- stacked plates
- Automation, robotics
73 The Biology: preparation of sequence template, M13 replication
http://www.biochem.arizona.edu/classes/bioc471/pages/Lecture5/Lecture5.html
74 Preparation of sequence template: M13 modification
- Cloning sites
- Universal primers
http://wine1.sb.fsu.edu/bch5425/lect33/lect33.htm
75 M13 DNA prep
76 DNA sequencing: ladders terminate randomly, allowing reads
77 DNA sequencing in practice: resolution of ladders
(Figure: template, polymerase and primer in four tubes, each with dCTP, dTTP, dGTP and dATP plus one terminator: 1 ddATP, 2 ddGTP, 3 ddTTP, 4 ddCTP; extension, then electrophoresis)
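The four-tube ladder can be simulated to show why reading the gel from the bottom up gives the sequence. This is an idealized sketch, assuming a complete ladder (a termination at every position), a hypothetical 20-nt primer, and made-up function names; the bases read are those of the newly synthesized strand, i.e., the complement of the template:

```python
def sanger_lanes(new_strand: str, primer_len: int = 20) -> dict:
    """Fragment lengths per ddNTP tube/lane for the strand synthesized after the primer.

    A fragment ends wherever the lane's dideoxy base was incorporated; its length
    counts the primer plus the bases added up to and including the terminator.
    """
    lanes = {"ddA": [], "ddC": [], "ddG": [], "ddT": []}
    for i, base in enumerate(new_strand.upper(), start=1):
        lanes["dd" + base].append(primer_len + i)
    return lanes


def read_lanes(lanes: dict) -> str:
    """Read the gel from the shortest band up; the lane of each band names the base."""
    bands = sorted((length, lane[-1]) for lane, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


if __name__ == "__main__":
    lanes = sanger_lanes("ATGCCGTA")
    print(lanes)
    print(read_lanes(lanes))
```

Each base needs its own tube and lane here; that constraint is what the single-lane dye-terminator chemistry removes.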
78 Manual radioactive sequencing (high-resolution denaturing PAGE)
- Steps
- Remove sandwich from hot chambers
- Separate top plate of 1-3mm gel
- Denature with acetic acid mix
- Bind to old film
- Wrap in saran
- Assemble sandwich with intensifier screen
- and x-ray cassette box
- Expose overnight at -20 °C
- Develop film
- Assess
- (Repeat?)
- Read film
- Enter data
- Discard wastes
- Other problems....
79
- Wiki
- http://dnasequencing.wordpress.com/2007/10/26/chain-termination-methods/
81 Semi-automated fluorescent DNA sequencing: primer label
- Fred Sanger et al., 1977
- Maxam and Gilbert, 1977
- Leroy Hood et al., 1986
- Applied Biosystems Inc., 1987
- J. M. Prober et al., at DuPont, 2000
- H. Swerdlow et al., 1990, 1991
- B. L. Karger et al., 1993
82 DNA sequencing upgrade, second iteration: dye terminator label
- Disadvantages of primer labels
- four separate sequencing reactions
- tedious manually
- limited to certain regions: custom oligos, or
- limited to cloned inserts behind universal priming sites
- Advantages: TBD
- Solution: dye terminators
- DuPont Company sold the technology to ABI as being of limited use
83 Semi-automated fluorescent DNA sequencing: terminator label
Note: excitation vs. emission
84 Semi-automated fluorescent DNA sequencing: terminator label, sequencing chemistry
- modification of the biochemistry to accommodate
- Pre-PCR-based
85 Semi-automated fluorescent DNA sequencing: terminator label
Note: 1 tube -> 1 lane. Also, a 4x increase in throughput
(Figure: one tube with template, polymerase, primer, dCTP, dTTP, dGTP and dATP plus all four terminators, ddATP, ddGTP, ddTTP and ddCTP; extension, then electrophoresis)
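With dye terminators, each fragment carries the color of the base that stopped it, so all four reactions share one tube and one lane. A minimal sketch of the read-out and of the 4x lane saving follows. The color-to-base table is illustrative only (real dye sets and their display colors vary by chemistry), and the function names are hypothetical:

```python
# Illustrative dye-to-base assignment; an assumption, not a specific vendor's chemistry
DYE_TO_BASE = {"green": "A", "blue": "C", "black": "G", "red": "T"}


def read_single_lane(peaks) -> str:
    """peaks: (fragment_length, dye_color) pairs detected in ONE lane or capillary.

    Sorting by fragment length orders the terminations; the dye names the base.
    """
    return "".join(DYE_TO_BASE[color] for _, color in sorted(peaks))


def lanes_needed(templates: int, dye_terminator: bool = True) -> int:
    """Gel lanes needed: 1 per template with dye terminators, 4 with primer labels."""
    return templates * (1 if dye_terminator else 4)


if __name__ == "__main__":
    print(read_single_lane([(23, "red"), (21, "green"), (22, "blue"), (24, "black")]))
    print(lanes_needed(96), lanes_needed(96, dye_terminator=False))
```

The same gel that held 24 primer-labeled templates (96 lanes / 4) holds 96 dye-terminator templates.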
86 DNA sequencing instrumentation: equipment/automation
87 Biotech Generation 1 Auto Sequencer: value of instrumentation
88 ABI series 370, 373 and 377
- semi-automated
- ca. 1989
- higher throughput operations
- bioinformatics limitations -> opportunities
89 ABI 377: April 06 retirement planned
- technology moves on
- new Big Science (paradigm shift)
- capabilities vs costs
90 Second generation: capillary electrophoresis, Sanger-based chemistry
- ABI/Applied Biosystems
- 1-cap 310
- 4-cap 3100, 3130
- 16-cap 3100, 3130xl
- 48-cap 3730
- 96-cap 3730xl
- Amersham
- MegaBACE 96-cap
- Beckman
- CEQ 8000 16-cap
91 Cap array screen dump
92 Multi-capillary array: The Skin
93 Third generation, ex., Shimadzu, Ltd. (DNA sequencing technology)
- NEW ORLEANS, March 19, 2002, PittCon
- Shimadzu Ltd.: faster and more economical DNA sequencer
- 10 times faster and 90 percent cheaper to run than the current state-of-the-art
- GenoMEMS, an MA spinoff that has developed a microfabrication technology, based on Whitehead Inst. technology
- Microelectromechanical system (MEMS) technology: microfabricated electrical and mechanical components
- Five million bases per day
- Read lengths of 800 bases
- Target release date 2003
- Still, Sanger chemistry-based fluorescent
- TODAY (2005): Solexa, 454, etc. Looking for the $1,000 genome, the $100,000 genome
- Archon X Prize for Genomics (X Prize Foundation, 10/4/06)
- $10M prize for the first team to successfully sequence 100 human genomes in 10 days with accuracy < 1 error per 100,000 bases, at a recurring cost of no more than $10,000 per genome
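A quick worked calculation puts the prize terms against the 2002 Shimadzu figure of five million bases per day. The ~3.0 Gb haploid human genome size is an outside assumption, and the sketch deliberately uses 1x (no redundancy), so it understates what a real project would need; function names are hypothetical:

```python
import math

GENOME_BP = 3.0e9               # approximate haploid human genome (assumption)
PRIZE_GENOMES, PRIZE_DAYS = 100, 10
MAX_ERROR_RATE = 1 / 100_000    # < 1 error per 100,000 bases
SHIMADZU_BASES_PER_DAY = 5e6    # 2002 announcement


def required_bases_per_day(genomes=PRIZE_GENOMES, days=PRIZE_DAYS, fold=1.0, genome_bp=GENOME_BP):
    """Bases per day needed; fold=1.0 ignores the x-fold redundancy real projects use."""
    return genomes * genome_bp * fold / days


def phred_for_error_rate(p):
    """Quality value equivalent of an error probability: QV = -10 log10(P)."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    need = required_bases_per_day()
    print(f"{need:.1e} bases/day needed at 1x")
    print(f"{need / SHIMADZU_BASES_PER_DAY:,.0f}x the 2002 Shimadzu throughput")
    print(f"accuracy target is about QV {phred_for_error_rate(MAX_ERROR_RATE):.0f}")
```

At 1x the prize already needs 3 x 10^10 bases per day, 6,000 times the 2002 instrument, with an accuracy equivalent to about QV 50.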
94 Signal capture, signal-noise, resolution, de-convolution, sequence data assembly
95 Sequencing artifacts (difficult templates vs. signal/noise)
- Hardware remedy
- Gel length, thickness
- Gel composition
- Bioware remedy
- Primer
- Vector
- Polymerase
- Reagents and additives
- Radioisotope
- Reaction time
- Reaction temperature
- Modify physical conditions
- Run time
- Temperature
- Film exposure conditions
- Some are unavoidable
- GC-rich
- Repetitive sequences
96 Sequencing artifacts
- http://www.nshtvn.org/ebook/molbio/Current%20Protocols/CPMB/mb0704a.pdf
97 Sequencing artifacts
- http://www.nshtvn.org/ebook/molbio/Current%20Protocols/CPMB/mb0704a.pdf
98 Good data/bad data
- High quality
- Good spacing
- Good heights
- Symmetrical peaks
- No or low background
99 Good data/bad data
- Good quality
- Good spacing
- Good heights
- Symmetrical peaks
- Low but more background
100 Good data/bad data
- Poor quality (physical)
- Poor spacing
- Poor heights
- Asymmetrical peaks
- More background?
101 Good data/bad data
- Sudden drop
- Template folding (chemistry)
- (local sequence)
102 Good data/bad data
- Sudden drop
- Template folding
- (local sequence)
- Resolution through better chemistry, biochemistry, instrumentation, conditions
103 Good data/bad data
- Stutters (biochemistry)
- Template folding or
- GC-rich or polyN runs
- (local sequence)
104 Local sequence and effects (not all sequences look/act alike)
- Resolution through better chemistry, biochemistry, instrumentation, conditions
105 Difficult templates
- Resolution through better chemistry, biochemistry, instrumentation, conditions
106 Difficult templates
- Resolution through better chemistry, biochemistry, instrumentation, conditions
107 Difficult templates versus real data: microsatellites
108 Difficult templates versus real data: SNP
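A heterozygous SNP shows up in a trace as two overlapping peaks of similar height at one position; a sketch of flagging such positions with the standard IUPAC ambiguity codes follows. The 0.5 secondary-to-primary peak-height threshold and the function name are assumptions, and a fixed threshold cannot by itself tell a real SNP from a difficult-template artifact or a contaminated reaction, which is exactly the judgment these slides are about:

```python
# IUPAC ambiguity codes for two-base mixtures
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("GC"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def call_position(heights: dict, het_ratio: float = 0.5) -> str:
    """heights: peak height for each of A, C, G, T at one trace position.

    Returns the single base, or an IUPAC code when the second-highest peak
    reaches het_ratio of the highest (threshold is an assumption).
    """
    ranked = sorted(heights, key=heights.get, reverse=True)
    first, second = ranked[0], ranked[1]
    if heights[first] > 0 and heights[second] / heights[first] >= het_ratio:
        return IUPAC[frozenset(first + second)]
    return first


if __name__ == "__main__":
    print(call_position({"A": 1000, "G": 900, "C": 20, "T": 10}))  # double peak
    print(call_position({"A": 1000, "G": 100, "C": 20, "T": 10}))  # clean peak
```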
109 Difficult templates versus real data: coinfections of viruses
- Interpretation
- Experience
- Keep sending out for re-sequencing due to "contamination" of the reaction
110 Difficult templates versus real data: SNP (GTHR)
111 Difficult templates versus real data: SNP
F. Umehara, et al. Am J Hum Genet, Nov 00: Desert hedgehog mutation
- Patient with 46,XY; Yp to Xp
- Female phenotype, male karyotype
- Partial gonadal dysgenesis (PGD) with polyneuropathy; CGD: Swyer's syndrome
- Sex reversal in XY female
- Immature female genitalia, blind vagina and immature uterus, plus a testis on one side and a streak gonad on the other
- Homozygous missense ATG->ACG at the initiating Met of exon 1 of the DHH gene
112 DNA sequencing applications: heterogeneity
- Unbiased molecular genomic/genetic diagnostics
- Cystic fibrosis
- 24 most common mutations, screening 43,849 chromosomes
- 66% at one site
- Of the remaining 23 mutations, the next highest frequency is 2.4% at one site
- Ranging down to 0.1%, accounting for 10 of the 24 sites
- Generalized Thyroid Hormone Resistance (GTHR)
- Dominant negative mutation
- ADHD?
113 Applications: molecular diagnostics (localized mutations)
114
- Wiki
- http://dnasequencing.wordpress.com/2007/10/26/chain-termination-methods/
115 DNA sequencing: photochemistry. Fluorescence-based labels as alternatives (UV, IR, etc.)
116 Optimization of dyes
117 ABI 370-series screen dump
118 Bioinformatics part one: pixel refinement, lane bleeding
119 ABI 377 envelope: 96 lanes
120 DNA sequencing: computation
- Input from sequencer
- peak intensities
- normalize intensities
- apply mobility corrections
- predict bands
- call bases
- Output to user
- DNA sequence
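The computation steps listed above (peak intensities, normalize, mobility corrections, predict bands, call bases) can be illustrated with a toy base caller. This is a simplified sketch, not ABI's or Phred's algorithm: the per-dye mobility shifts, the 0.2 peak threshold and the function names are assumptions, and real callers also model peak spacing, cross-talk between dyes and peak shape:

```python
BASES = "ACGT"


def normalize(trace):
    """Scale one dye channel so its tallest point is 1.0."""
    peak = max(trace) or 1.0
    return [x / peak for x in trace]


def shift(trace, k):
    """Shift a channel by k scans (positive = later), padding with zeros."""
    if k >= 0:
        return [0.0] * k + trace[:len(trace) - k]
    return trace[-k:] + [0.0] * (-k)


def call_bases(traces, mobility_shift=None, min_height=0.2):
    """traces: base -> list of intensities per scan, all channels the same length."""
    mobility_shift = mobility_shift or {}
    # Normalize each dye channel, then correct its (assumed) mobility offset
    chans = {b: shift(normalize(traces[b]), mobility_shift.get(b, 0)) for b in BASES}
    n = len(chans["A"])
    # Combined signal: the strongest channel at each scan
    top = [max(chans[b][i] for b in BASES) for i in range(n)]
    calls = []
    for i in range(1, n - 1):
        # Predict a band at each local maximum above the threshold
        if top[i] >= min_height and top[i] > top[i - 1] and top[i] >= top[i + 1]:
            # Call the base whose channel is brightest at that band
            calls.append(max(BASES, key=lambda b: chans[b][i]))
    return "".join(calls)


if __name__ == "__main__":
    traces = {
        "A": [0, 1, 0, 0, 0, 0, 0, 0, 0],
        "C": [0, 0, 0, 1, 0, 0, 0, 0, 0],
        "G": [0, 0, 0, 0, 0, 0, 1, 0, 0],  # G dye runs one scan slow
        "T": [0, 0, 0, 0, 0, 0, 0, 1, 0],
    }
    print(call_bases(traces))             # G and T peaks merge into one band
    print(call_bases(traces, {"G": -1}))  # mobility correction resolves them
```

Without the mobility correction the slow G peak sits next to the T peak and one base is lost, which is why the correction step comes before band prediction.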
121 Base-calling issues
122 ABI 377 data
123 To get to nice data -> signal de-convolution and processing
124 DNA sequencing: computation
125 Signal de-convolution and processing
126 Base-calling and matrix issues
- POP-6 vs. POP-4: misapplication of resin; 50 cm vs. 80 cm capillaries
- Base-calling issues
- SNP issues
127 Post-processing raw data
- One fragment, two fragments, three
- Now, have a handful-plus of fragments
- Now what?
128 Assembling sequence data
129 DNA sequence assembly software
- GCG (Wisconsin Pkg / Genetics Computer Group)
- DNASTAR
- GAP4 (Genome Assembly Program) / Staden Pkg
- ABI versions
- Phred/Phrap
- Sequencher
- 2008: more?
130 Assemblers, a snapshot: GAP4
131 Quality issues in base-calling
- Base-calling software, with quality scores
- Phred
- TraceTuner
- QV = -10 log10(P), where P is the estimated probability that the base call is wrong
- ex., if you want a 1/1000 error rate (0.1%)
- QV = 30 = (-10) x (-3)
- A score < 10 means a base-calling error rate above 10%
- A score of 20 is considered good, at 1%
- A score > 30 is considered excellent, at < 0.1%
- Bermuda Standards: 1/10,000; GenBank now, 2006
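The quality-value formula converts directly in both directions, and summing the per-base error probabilities gives the expected number of wrong calls in a read. A minimal sketch (function names are hypothetical):

```python
import math


def phred_quality(p_error: float) -> float:
    """QV = -10 * log10(P), P = estimated probability the base call is wrong."""
    return -10 * math.log10(p_error)


def error_probability(qv: float) -> float:
    """Inverse of the QV formula: P = 10 ** (-QV / 10)."""
    return 10 ** (-qv / 10)


def expected_errors(qvs) -> float:
    """Expected number of wrong calls in a read: sum of per-base error probabilities."""
    return sum(error_probability(q) for q in qvs)


if __name__ == "__main__":
    print(phred_quality(1 / 1000))       # the 1/1000 example: QV 30
    print(error_probability(20))         # "good": 1% error
    print(expected_errors([30, 30, 20]))
```

By the same formula, the Bermuda 1/10,000 standard corresponds to QV 40.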
132 Assemblers, a snapshot: Phred/Phrap QA/QC
133 Assemblers, a snapshot: Sequencher
- Mac and icon-based
- final screen
134 DNA sequence assembly: assembly of fragments
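Assembly joins fragments through shared overlapping sequence. The toy greedy overlap-merge below shows the idea; it is not how Phrap, GAP4 or Sequencher work (they align with base qualities, tolerate errors and consider both strands). It assumes error-free reads from one strand, a hypothetical minimum overlap of 3 bases, and made-up function names:

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (>= min_len), else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of contigs with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads wholly contained in another read; they add no new sequence
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            break  # no sufficient overlaps left: remaining contigs stay as islands
        merged = contigs[i] + contigs[j][k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (i, j)] + [merged]
    return contigs


if __name__ == "__main__":
    reads = ["ATGGCGTA", "GCGTACGTT", "ACGTTAGC"]  # made-up reads
    print(greedy_assemble(reads))
```

When every fragment joins, the result is a single contig; reads with no sufficient overlap remain as separate contigs, the "orphans/islands" a finished assembly should not have.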
135 Ad 1 assembly: collection of fragments
136 Ad 1 assembly at 98%
137 Done! Consensus
- Joined contigs, no orphans/islands
138 DNA sequence assembly: editing
139 Sequence assembly: overlapping fragments for contigs
- x-fold redundancy for accuracy
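Fold redundancy is total sequenced bases divided by target length, and the depth at each position shows where redundancy is thin. A minimal sketch; the 36,000 bp target (roughly an adenovirus genome), the 600-base reads and the function names are assumptions for illustration:

```python
import math


def fold_coverage(n_reads: int, read_len: int, target_len: int) -> float:
    """Average x-fold redundancy: total sequenced bases / target length."""
    return n_reads * read_len / target_len


def reads_needed(target_len: int, read_len: int, fold: float) -> int:
    """Reads required to reach a given average fold coverage."""
    return math.ceil(fold * target_len / read_len)


def depth_profile(placements, target_len: int) -> list:
    """Per-position depth from (start, length) read placements, 0-based."""
    depth = [0] * target_len
    for start, length in placements:
        for pos in range(max(start, 0), min(start + length, target_len)):
            depth[pos] += 1
    return depth


if __name__ == "__main__":
    n = reads_needed(36_000, 600, 8)
    print(n, fold_coverage(n, 600, 36_000))
    print(depth_profile([(0, 5), (3, 5)], 8))
```

An 8-fold average still leaves positions covered by fewer reads; the depth profile shows where those weak spots are.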
140 Sequence assembly: 2:1 rule
- For accuracy, local sequence considerations
141 The destination or the journey?
To be continued (preview)
142 Viruses Rule the Deep Sea (The Butterfly Effect)
- Phrase for the more technical "sensitive dependence on initial conditions" in chaos theory
- Edward Lorenz, 1917-2008
- 1963 NY Acad Sci paper: as a shortcut when rerunning a computer model for weather prediction, he used 0.506 instead of 0.506127, giving a completely different weather scenario
- Viruses good, viruses bad?
- Proposed earlier: a Jekyll and Hyde role, killing biomass and sustaining it
- Viruses and their significance in marine systems: 15-20 years old
- Proof: Nature, Aug 28, 2008. R. Danovaro, et al.
- Viruses in the deepest ocean environments are strong regulators of the deep-sea biosphere
- Infecting and killing bacteria and other prokaryotes
- Main producers of organic material that sustains life at 1,000 meters
- Viruses are by far the most abundant life form in the ocean; this study:
- Generating biomass, as a major contribution to the carbon cycle and other geochemical processes
- Virus-induced deaths as 80% of bacterial deaths
- Very large amount of carbon reaching the sea floor through pathways that were thought to be minor
- 232 samples of sediment from the deep sea
- Viruses surprisingly abundant and reproducing locally, rather than migrating from the surface
- 65% of earth is dominated by deep-sea or benthic ecosystems
- Estimate 0.37-0.63 gigatons of C per year; oceans absorb billions of tons of atmospheric CO2 per year
- Viral shunt: killed organism eaten by another
- The Scientist, Aug 08
- http://www.terradaily.com/reports/Viruses_are_hidden_drivers_of_oceans_nutrient_cycle_999.html
143 Human genome, follow-up ("The past is never dead. It's not even past.")
- Aug 08: 15 years of the Human Genome Project
- 8% of the human genome comprises cryptic viral genomes (06 Aug 08, molecular fingerprint of inactivation)
- "molecular equivalents of mounted trophies", insects preserved in genomic amber, "DNA fossils"
- Human endogenous retroviruses (HERVs) during 550M years of vertebrate evolution
- HERVs attack germline cells, become integrated into the genome
- http://www.washingtonpost.com/wp-dyn/content/article/2008/08/31/AR2008083101759.html
144 Human genome, follow-up (The past is never dead. It's not even past)
- Unlike HIV, HERVs outlive the infected organism: they are endogenous
- Best preserved: HERV-K113, integrated ca. 200,000 years ago, long after the human-chimp divergence
- Parts of a few have become incorporated into human genes, taking on new roles
- Their proteins helped mold the immune system
- Syncytin, a protein that helps cells fuse together in the placenta, comes from the envelope gene of a HERV
- In the past two years, labs in France and the US independently reconstructed a functioning HERV-K from pieces in the human genome (PD Bieniasz et al.)
- This summer ('08), both showed that the gene sequences carry similar fingerprints of APOBEC3, a human enzyme that mutated them into submission
- http://www.washingtonpost.com/wp-dyn/content/article/2008/08/31/AR2008083101759.html
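The APOBEC3 "fingerprint" is an excess of G-to-A substitutions in the viral plus strand. APOBEC3 enzymes deaminate C to U in the minus-strand DNA made during reverse transcription. The downstream-base tally below reflects the commonly reported context preferences: GG for APOBEC3G and GA for APOBEC3F. This minimal sketch assumes a HERV copy and a reference sequence that are already aligned to equal length, with '-' marking gaps. The function names are hypothetical, and this is not the method used in the studies cited above.

```python
from collections import Counter


def substitution_counts(ref: str, query: str) -> Counter:
    """Count substitutions such as 'G>A' between two aligned sequences (gaps skipped)."""
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    for r, q in zip(ref.upper(), query.upper()):
        if r != q and "-" not in (r, q):
            counts[f"{r}>{q}"] += 1
    return counts


def g_to_a_fraction(ref: str, query: str) -> float:
    """Share of all substitutions that are G>A (0.0 if there are none)."""
    counts = substitution_counts(ref, query)
    total = sum(counts.values())
    return counts["G>A"] / total if total else 0.0


def g_to_a_context(ref: str, query: str) -> Counter:
    """Tally G>A changes by the reference base immediately downstream (GG, GA, ...)."""
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    ref, query = ref.upper(), query.upper()
    context = Counter()
    for i in range(len(ref) - 1):
        if ref[i] == "G" and query[i] == "A":
            context["G" + ref[i + 1]] += 1
    return context
```

A hypermutated copy shows a G>A fraction well above what random mutation would give, skewed toward one dinucleotide context.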
145 Human genome, follow-up (The past is never dead. It's not even past)
- HERVs as junk DNA? e.g., serving no function other than as remnants of past infections
- M Dewannieux, et al. Genome Research Oct 06 (T Heidmann)
- Reconstructed an infectious version of a virus that integrated into the genome 5M years ago
- Named "Phoenix", an ancestral (?) version of HERV-K
- HERV-K is a young virus (<5M years) and contains a complete set of genes
- Proposed roles: control of gene expression; found near immune-system and disease genes
- Also linked to cancers and male infertility (N Bannert and R Kurth, PNAS 04)
- (Also sheep pregnancy and placenta development: M Palmarini, T Spencer, et al., PNAS 06)
- Originally from Genome Research, via http://www.washingtonpost.com/wp-dyn/content/article/2008/08/31/AR2008083101759.html
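The slide does not say how Phoenix was rebuilt. One common way to approximate an ancestral retroviral sequence from many degraded genomic copies is a majority-rule consensus. This works because each copy has picked up different mutations since integration, so the original base usually wins at each position. The sketch assumes pre-aligned, equal-length input with '-' for gaps. The function name is hypothetical, and this is not the published reconstruction pipeline.

```python
from collections import Counter


def majority_consensus(aligned, gap="-"):
    """Column-wise majority-rule consensus of equal-length aligned sequences.

    Ties go to the character seen first in the column (Counter.most_common
    keeps first-encountered order for equal counts); columns whose winner
    is a gap are dropped from the output.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*aligned):
        base, _ = Counter(ch.upper() for ch in column).most_common(1)[0]
        if base != gap:
            consensus.append(base)
    return "".join(consensus)
```

For example, three copies `ATGAAC`, `ATGGAC` and `ACGAAC`, each with one different point mutation, give back `ATGAAC`.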
146 Old Viruses Resurrected Through DNA (ELSI)
- NYT Nov 06
- Reconstruction of extinct ("lost") viruses
- '02: chemically synthesized polio genome (Cello, Science 02)
- '05: US government scientists reconstructed the 1918 influenza virus genome
- '06: French scientists reconstructed a virus that infected primate ancestors: the Phoenix virus
- T Heidmann, et al. (Genome Research)
- Built and re-inserted into human cells
- (Some) infectious particles produced
- Plan to study the role of HERVs in cancer
- Alternate view on bringing it back to life: we don't know what this class of viruses does... it's a dangerous thing, and a potent biological weapon
- Systematic crippling of all future generations
- http://www.nytimes.com/2006/11/07/science/07virus.
- Photo: http://elliottback.com/wp/archives/2006/11/08/the-phoenix-virus-resurrected-rna-retroviruses/
147 Human genome, follow-up (The past is never dead. It's not even past)
- HERVs attack germline cells and become integrated into the genome
- Parts of a few have become incorporated into human genes, taking on new roles
- Syncytin, a protein that helps cells fuse together in the placenta
- From the envelope gene of a HERV
- Jan 08: tissue from women with preeclampsia or intrauterine growth restriction (both threaten fetal health) had abnormal amounts of syncytin
- Proteins derived from HERV genes, or antibodies against these proteins, are common in testicular tumors, breast cancer tissue and melanomas
- Do HERVs cause cancer, are they an effect of it, or both, or neither???
- Mice, chickens: remnant retroviral env proteins; equivalent proteins are made and attach to/block the receptors that retroviruses use for binding
- Sheep: lung or nasal tumors are caused by retroviruses whose ancestors entered the genome before the sheep/goat divergence, 5M years ago
- http://www.washingtonpost.com/wp-dyn/content/article/2008/08/31/AR2008083101759.html
148 Human genome, follow-up (The past is never dead. It's not even past)
- Jaagsiekte sheep retrovirus (JSRV) causes a contagious lung cancer in sheep, ovine pulmonary adenocarcinoma
- Retroviruses have played a critical role in understanding oncogenes
- Distinct from the classical mechanisms of retroviral oncogenesis (insertional activation of, or virus capture of, a host oncogene), the native envelope (Env) structural protein is itself the oncogene
- M Palmarini, et al.: wild species had versions of two retroviruses that differ from the domesticated versions
- The domesticated versions have a mutation that impedes infection by cancer-causing viruses
- They argue that domestication of wild sheep 9,000 years ago, in the presence of the cancer-causing virus, selected for the non-cancerous mutant
- M De Las Heras, JM Sharp. Eur Resp J Dec 01
- Evidence for a protein immunologically related to JSRV in some human lung tumors
- Review: M Palmarini and H Fan. JNCI 01
- Review: S-L Liu and AD Miller, http://www.nature.com/onc/journal/v26/n6/abs/1209850a.html
149 Mini-evolution of a protein: syncytin (Fear of the unknown)
- A Smallwood, et al. BioOne: maternally imprinted PEG10 and SGCE, separated from the Syncytin (HERV-W) gene at 7q21.3, are implicated in choriocarcinoma and Silver-Russell syndrome
- A Malassine ... T Heidmann. Placenta 07: expression of human endogenous retrovirus HERV-FRD, which encodes fusogenic envelope proteins (syncytin 2), observed in human placenta
- A Muir ... A Moffett. JGV 06: human endogenous retrovirus-W envelope (syncytin) expressed in trophoblast
- The placenta is unique amongst normal tissues in transcribing numerous different human endogenous retroviruses at high levels
- Syncytin is expressed widely in normal cells as well as in choriocarcinoma cell lines
- HERVs arose from ancient germ-cell infections by exogenous retroviruses
- Most HERVs were inactivated by accumulated mutations; a small number of HERV genes retained ORFs
- I Knerr ... W Rascher. Mol Hum Reprod Jun 04: placental syncytin, first described in 2000 as a fusogenic glycoprotein derived from a human endogenous retroviral envelope gene
- Stable integrated retroviral elements within the human genome have been known for many years; their biological significance is obscure, and they are usually designated as irrelevant or even harmful
- Syncytin, however, demonstrates tissue-specific expression and distinctive receptor interaction during trophoblast cell differentiation and syncytium formation
- http://www.washingtonpost.com/wp-dyn/content/article/2008/08/31/AR2008083101759.html
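Most HERVs were inactivated by mutations, while a few genes retained open reading frames. A simple ORF scan makes this concrete. The sketch assumes a forward-strand scan with the standard stop codons and reports stretches from an ATG to an in-frame stop that span at least `min_codons` sense codons. A single point mutation that creates an in-frame stop shortens or abolishes such an ORF. The function name and default threshold are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_orfs(seq: str, min_codons: int = 100):
    """Return (start, end, frame) for forward-strand ORFs.

    An ORF runs from an ATG to the first in-frame stop codon; `end` is
    exclusive and includes the stop. Only ORFs whose coding part (ATG
    through the last codon before the stop) has at least `min_codons`
    codons are reported. ATGs nested inside a reported ORF are skipped.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                j = i
                # walk codon by codon until an in-frame stop or the end of the sequence
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > len(seq):
                    break  # no stop downstream in this frame
                if (j - i) // 3 >= min_codons:
                    orfs.append((i, j + 3, frame))
                i = j + 3  # resume scanning after the stop codon
            else:
                i += 3
    return orfs
```

For example, `CCATGAAATTTGGGTAACC` carries a four-codon ORF in frame 2. Changing `TTT` to `TGA`, a single-base change, cuts it to two codons.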
151 PCR: linear amplification DNA sequencing
- http://www3.appliedbiosystems.com/cms/groups/portal/documents/web_content/cms_051956.gif
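Linear amplification (a single primer, as in cycle sequencing) differs sharply from exponential PCR amplification (two primers), and idealized arithmetic shows the gap. The sketch assumes a constant per-cycle efficiency and no reagent depletion. The function names and the `efficiency` parameter are hypothetical.

```python
def linear_products(templates: float, cycles: int, efficiency: float = 1.0) -> float:
    """Single primer: each original template is copied at most once per cycle,
    and the copies never serve as templates, so new product grows linearly.
    Returns the number of new products (original templates not counted)."""
    return templates * efficiency * cycles


def exponential_copies(templates: float, cycles: int, efficiency: float = 1.0) -> float:
    """Two primers: every strand made becomes a template in the next cycle, so
    the total number of copies (originals included) multiplies by
    (1 + efficiency) each cycle."""
    return templates * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    for n in (5, 10, 25):
        print(n, linear_products(1, n), exponential_copies(1, n))
```

After 25 perfect cycles, one template yields 25 linear products but about 3.4 x 10^7 PCR copies. The linear signal stays proportional to the starting template.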