artwork: commons.wikimedia.org - PowerPoint PPT Presentation

About This Presentation
Provided by: Rober336
Learn more at: https://www.cs.usfca.edu
Transcript and Presenter's Notes

1
Biological Sequence Determination
protein
RNA
DNA
Robert M. Horton, PhD, MS rmhorton_at_cybertory.org
artwork: commons.wikimedia.org
2
(No Transcript)
3
Sequencing
context: technological, biological
  • protein
  • RNA
  • DNA
  • old methods

concepts: chemistry, enzymes, physics, computers
  • classical sequencing (Sanger)
  • automation, base calling, quality scoring
  • shotgun sequencing, assembly, finishing

contemporary: microfluidics, microfabrication
  • "next generation"
  • methods: pyrosequencing, CRT, SOLiD
  • applications: resequencing, epigenetics, RNA-Seq

contemplation of the future
  • third generation
  • SMRT, nanopores, etc.
4
Protein Sequencing
Why Proteins?
Digestible (pepsin, trypsin, chymotrypsin)
Important
Small
Chemically distinguishable (purifiable)
Insulin: Fred Sanger, Nobel Prize, 1958
5
Classes of RNA
  • mRNA
      • modified bases (cap with m7G, 2'-O-methylation)
      • splicing
      • polyadenylation
  • tRNA
      • modified bases (GMe, GMe2, CMe, T, Ψ, UH2, I, IMe)
  • rRNA
      • prokaryotic 70S: 50S (5S, 23S) + 30S (16S)
      • eukaryotic 80S: 60S (5S, 5.8S, 28S) + 40S (18S)
  • 7SL
      • RNA of Signal Recognition Particle (SRP)
      • homologous to Alu SINE (11% of human genome)
  • snRNA
      • spliceosomes (U1, U2, U4, U5, U6)
  • snoRNA
      • pre-rRNA processing (U3)
      • guide 2'-O-methylation
      • guide pseudouridylation
  • RNAi
      • siRNA (short interfering RNA)
      • miRNA (microRNA)
      • post-transcriptional gene silencing
      • 3' UTR, conserved
  • piRNA
      • transcriptional silencing of retrotransposons

... et cetera ...
6
DNA Sequencing
  • 1977
  • The modern era of DNA sequencing begins

7
Chemical Sequencing of DNA (Maxam-Gilbert)
February 1977
Two steps:
  • Damage bases (specific, partial)
  • Cleave backbone
Four reactions: A, A+G, C, C+T
http://nobelprize.org/nobel_prizes/chemistry/laureates/1980/gilbert-lecture.pdf
8
(Sanger Sequencing)
Chain Termination Sequencing
2',3'-dideoxy TTP
  • Sanger F, Nicklen S, Coulson AR. DNA sequencing with chain-terminating inhibitors. PNAS 74:5463-7, December 1977.

9
Primer Extension
Bacterial DNA polymerase I adds nucleotides to the 3' end of the primer to complement the 5'-overhanging template.
Each strand is an ordered sequence with a direction.
Arrows indicate the 5' to 3' direction (DNA grows biochemically in this direction).
(pyrophosphate released)
10
Sanger sequencing
Individual reactions with one dNTP partially
poisoned with dideoxynucleotides (ddATP, ddCTP,
ddGTP, ddTTP)
  • Decades of improvements
      • automated
      • fluorescence
          • four colors
          • one lane
      • dye terminators
          • one reaction
      • capillaries
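The chain-termination idea on this slide can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the four ddNTP reactions are idealized (every possible termination product is present, with no noise), fragment lengths are counted from the first incorporated base rather than from the primer, and the function names are invented for illustration.

```python
def termination_fragments(synthesized: str) -> dict:
    """Return, for each ddNTP reaction, the lengths of the terminated fragments.

    `synthesized` is the newly made strand (5' to 3'). In the ddA reaction,
    for example, chains can only stop where an A was incorporated.
    """
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict) -> str:
    """Read the sequence by ordering all fragments from shortest to longest."""
    by_length = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in by_length)


if __name__ == "__main__":
    lanes = termination_fragments("GATTACA")
    for base in "ACGT":
        print(base, lanes[base])
    print(read_gel(lanes))
```

With four-color fluorescence the four lanes collapse into one, but the readout is the same: order the fragments by size and note which terminator ends each one.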

11
Automated Sanger sequencing
trace base calls
quality scores
12
Quality Score
$q = -10 \log_{10}(p)$
p: predicted error probability
A 1/1000 probability of error gives a q score of 30
uses:
  • data quality monitoring
  • assembly
  • consensus
  • finishing criteria
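The relation between quality score and error probability on this slide can be checked with a few lines of Python. This is a minimal sketch; the function names are chosen here for illustration and are not taken from any base-calling package.

```python
import math


def error_prob_to_quality(p: float) -> float:
    """q = -10 * log10(p), for an error probability 0 < p <= 1."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in the interval (0, 1]")
    return -10.0 * math.log10(p)


def quality_to_error_prob(q: float) -> float:
    """Inverse relation: p = 10 ** (-q / 10)."""
    return 10.0 ** (-q / 10.0)


if __name__ == "__main__":
    for p in (0.1, 0.01, 0.001, 0.0001):
        print(p, round(error_prob_to_quality(p)))
```

Each additional 10 points of quality corresponds to a tenfold lower predicted error probability.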
13
Sequencing Strategy
Primer walking (serial)
Shotgun Sequencing (parallel)
14
Universal Primers
15
Assembly
16
read length affects assembly
17
Next-Generation Sequencing
  • Pyrosequencing (454/Roche)
  • Cycles of Reversible Termination
    (Solexa/Illumina)
  • Ligation (ABI SOLiD)

"Third-Generation" Sequencing
  • SMRT (Pacific Biosciences)

18
pyrosequencing

pyrophosphate (released by dNTP incorporation) + APS (adenosine 5'-phosphosulfate) → ATP + sulfate
enzyme: ATP sulfurylase
19
pyrosequencing
luciferin + O2 (oxygen) + ATP → oxyluciferin + AMP + pyrophosphate + light
enzyme: firefly luciferase
20
pyrosequencing
more biochemistry
problem: pyrophosphate recycling
solution: apyrase breaks down ATP to AMP + 2 Pi (or wash out solution)
problem: luciferase can use dATP
solution: use an analog suitable for polymerase but not luciferase
21
pyrosequencing
flowgram
Ronaghi M. Genome Res 11:3-11, 2001
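A flowgram can be mimicked with a toy simulation. The sketch below assumes an idealized, noise-free instrument in which the light signal is exactly proportional to the number of bases incorporated in each flow. The cyclic flow order "TACG" and the function names are assumptions made for illustration.

```python
def flowgram(read: str, flow_order: str = "TACG", n_flows: int = 8) -> list:
    """Return (nucleotide, signal) pairs for an idealized pyrosequencing run.

    `read` is the strand being synthesized. Each flow offers one dNTP; the
    signal is the length of the homopolymer run incorporated in that flow.
    """
    signals = []
    pos = 0
    for i in range(n_flows):
        nuc = flow_order[i % len(flow_order)]
        run = 0
        while pos < len(read) and read[pos] == nuc:
            run += 1
            pos += 1
        signals.append((nuc, run))
    return signals


def call_bases(signals: list) -> str:
    """Turn a flowgram back into a base sequence."""
    return "".join(nuc * count for nuc, count in signals)


if __name__ == "__main__":
    flows = flowgram("TTAGGGC")
    print(flows)
    print(call_bases(flows))
```

In real data the signal for long homopolymer runs is not cleanly separated into integer steps, which is why homopolymer length errors are characteristic of this chemistry.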
22
Emulsion PCR
water droplet in oil
one primer bound to solid bead
individual template molecule
23
Emulsion PCR
DNA anchored to bead all comes from the same
template molecule
"polony" "PCR colony"
24
pyrosequencing
Alternatives to chemiluminescence
  • heat (thermosequencing)
  • pH change ("Ion Torrent")

25
Cycles of Reversible Termination
  • Illumina/Solexa
  • Helicos

(figure panels: Helicos, Illumina)
Metzker M. Sequencing Technologies - The Next Generation. Nature Reviews Genetics 11:31-46, 2010.
26
Short Read Alignment
27
FASTQ Format
maq.sourceforge.net/fastq.shtml
q = chr((Q <= 93 ? Q : 93) + 33)
Q = ord(q) - 33
Quality characters for Q = 0 to 60:
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]
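The FASTQ quality encoding on this slide (quality value plus 33, capped at 93, stored as one ASCII character) can be sketched in Python as follows. The function names are illustrative, and negative quality values are not handled, as in the formula itself.

```python
def quality_to_char(Q: int) -> str:
    """Encode a quality value as a single character: chr(min(Q, 93) + 33)."""
    return chr(min(Q, 93) + 33)


def char_to_quality(q: str) -> int:
    """Decode a single quality character: ord(q) - 33."""
    return ord(q) - 33


def decode_quality_string(qualities: str) -> list:
    """Decode the whole quality line of a FASTQ record."""
    return [char_to_quality(c) for c in qualities]


if __name__ == "__main__":
    print(quality_to_char(0), quality_to_char(30), quality_to_char(60))
    print(decode_quality_string("!5?I]"))
```

The offset of 33 keeps every encoded value in the printable ASCII range, so a quality string has exactly one character per base.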
28
Paired End Tags
Mme I
TCCRAC (20/18)
29
Illumina Genome Analyzer
  • Library Preparation

30
Illumina Genome Analyzer
  • Bridge Amplification forms "Polonies"

31
Illumina Genome Analyzer
  • Cycles of Reversible Termination

32
Ligation-based Sequencing
  • SOLiD (ABI)
  • Complete Genomics
  • Polonator (Church Lab)

33
SOLiD
Sequencing by Oligonucleotide Ligation and
Detection
3'- ATNNNZZZ-5'
artwork is from the pamphlet Dibase Sequencing
and Color Space Analysis
34
(No Transcript)
35
(No Transcript)
36
SOLiD
37
SOLiD Dibase Encoding
AT CG GC TA
AC CA GT TG
AA CC GG TT
AG CT GA TC
38
SOLiD Dibase Encoding
color space
base space
Each color sequence can represent four different
base sequences. The base sequence is one unit
longer than the color sequence. You need to know
one base to tell which sequence is represented.
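The relationship between color space and base space can be sketched in Python. One compact way to reproduce the dibase groupings listed on the encoding slide is to number the bases A=0, C=1, G=2, T=3 and take the XOR of adjacent bases. The numeric color labels 0 to 3 are an assumption made here for illustration; the slide only shows which dinucleotides share a color.

```python
BASES = "ACGT"  # index of each base: A=0, C=1, G=2, T=3


def encode(bases: str) -> list:
    """Base space to color space: one color per adjacent pair of bases."""
    idx = [BASES.index(b) for b in bases]
    return [a ^ b for a, b in zip(idx, idx[1:])]


def decode(first_base: str, colors: list) -> str:
    """Color space to base space, given the one known starting base."""
    current = BASES.index(first_base)
    out = [first_base]
    for color in colors:
        current ^= color
        out.append(BASES[current])
    return "".join(out)


if __name__ == "__main__":
    colors = encode("ATGGC")
    print(colors)
    # The same colors decode to four different sequences,
    # one for each possible first base.
    for first in BASES:
        print(first, decode(first, colors))
```

The color sequence has one fewer element than the base sequence, and only the choice of first base selects among the four candidates.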
39
SOLiD Dibase Encoding
single color change is probably an error
SNP causes two color changes
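A small Python sketch shows why a SNP changes two adjacent colors. It assumes the same illustrative numbering as the dibase groupings (A=0, C=1, G=2, T=3, color = XOR of neighboring bases); the example sequences and helper names are made up for the demonstration.

```python
BASES = "ACGT"  # A=0, C=1, G=2, T=3 (illustrative numbering)


def to_colors(bases: str) -> list:
    """Encode a base sequence as dibase colors."""
    idx = [BASES.index(b) for b in bases]
    return [a ^ b for a, b in zip(idx, idx[1:])]


def color_differences(a: list, b: list) -> list:
    """Positions at which two color sequences of equal length differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    reference = "ACGTACGT"
    variant = "ACGAACGT"  # single base substitution at position 3 (T to A)
    print(to_colors(reference))
    print(to_colors(variant))
    print(color_differences(to_colors(reference), to_colors(variant)))
```

An interior base takes part in two overlapping dinucleotides, so substituting it changes both colors. An isolated single color change cannot be explained by one substitution and is therefore more likely a measurement error.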
40
Single-Molecule, Real-Time (SMRT) Sequencing
  • High throughput
  • Parallelism (small reactions)
  • Speed (immediate results)
  • Long reads
  • Read individual templates from mixtures
  • Haplotyping

41
SMRT Sequencing
42
Simulated SMRT Sequencing Data
43
Platform Comparisons
Xu M, Fujita D, and Hanagata N. Perspectives and Challenges of Emerging Single-Molecule DNA Sequencing Technologies. Small 5(23):2638-2649, 2009
44
Other Technologies
  • Mass spectrometry
  • TEM
  • STM
  • nanonozzle probes
  • nanopores (protein, graphene)
  • ionic current blockage
  • transverse tunneling currents
  • exonuclease

45
Targeted Exome Capture
nimblegen.com
46
Bonus Slides
47
Selenocysteine tRNA
48
Omics
  • transcriptome
  • exome
  • kinome

49
Plus and Minus Method
(circa 1975)
"minus": polymerase stops at missing base
"plus": T4 DNA polymerase 3' exonuclease stalled by dNTP
Sanger F, Coulson AR. J Mol Biol. 94(3):441-8, 1975
50
pyrosequencing
Animation: http://www.pyrosequencing.com/DynPage.aspx?id=7454
51
Bioinformatics Classics
  • Needleman SB, Wunsch CD. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48:443-453, 1970.
  • Smith TF, Waterman MS. Identification of common molecular subsequences. J Mol Biol 147:195-197, 1981.

52
Automated Base Calling
  • 1. identify idealized peak locations
      • assume locally even spacing
  • 2. find observed peaks
  • 3. match observed to expected
      • omit and split as necessary
  • 4. add "good" unmatched peaks

53
Error Probabilities
  • predictive
      • does not require knowing actual sequence
  • valid
      • the set of bases assigned to probability p should have an actual error rate of p
  • discriminating
      • helps to distinguish correct vs. incorrect base calls
      • 1,000,000 base calls with 10,000 errors (p = 0.01)
      • better if we can break it into two 500,000 sets
          • p = 0.018 in one set (9,000 errors)
          • p = 0.002 in second set (1,000 errors)
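The arithmetic behind the discrimination example can be checked directly. This is a small sketch using the numbers from the slide; the function name is illustrative.

```python
def error_rate(errors: int, calls: int) -> float:
    """Empirical error rate of a set of base calls."""
    return errors / calls


if __name__ == "__main__":
    whole = error_rate(10_000, 1_000_000)    # all calls lumped together
    worse_half = error_rate(9_000, 500_000)  # subset holding most of the errors
    better_half = error_rate(1_000, 500_000)
    print(whole, worse_half, better_half)
    # Both halves together still contain all 10,000 errors.
    print(9_000 + 1_000 == 10_000)
```

Splitting does not change the total number of errors, but it tells a downstream program which half of the calls deserves more trust.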

54
Error Probability Calibration
'Given a set of parameters and a training set of
reads for which it is known which base-calls are
correct and which are errors, find a way of
associating parameter values to error
probabilities that has (near) maximum
discrimination power for small r.'
55
Phred Quality Score Parameters
Empirical.
Small values tend to correspond to more accurate
base-calls.
Window-based parameters smooth out error
probabilities.
  • Peak spacing (7 peak window)
      • largest / smallest peak-to-peak spacing
  • Uncalled/called ratio (7 peak window)
      • amplitude of largest uncalled / smallest called peak
  • Uncalled/called ratio (3 peak window)
  • Peak resolution
      • -1 × (bases to the next unresolved base)

56
Lookup Table Production
  • Select a range of 50 threshold values for each of the 4 parameters.
      • These 50 values are chosen so that each increment contains approximately the same number of bases in the training set.
  • For each 4-tuple of parameter thresholds (50^4 = 6,250,000)
      • find the set of bases defined by these thresholds
      • compute empirical error rates
  • The parameter set with the lowest error rate goes into the table.
      • if multiple 4-tuples give the same rate, choose the largest set
  • These bases are removed, and the process is repeated until all bases are represented in the table.
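The greedy table-building procedure above can be sketched in Python on toy data. This is not Phred's actual code: it uses two parameters and three thresholds instead of 4 and 50, it assumes a base belongs to a threshold set when every parameter is at or below its threshold, and all names and training data are invented for illustration.

```python
from itertools import product


def equal_frequency_thresholds(values, n_bins):
    """Thresholds chosen so each increment holds about the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    picks = {ordered[max(0, (k * n) // n_bins - 1)] for k in range(1, n_bins + 1)}
    return sorted(picks)  # the largest threshold is always the maximum value


def within(params, cutoffs):
    """True if every parameter is at or below its threshold."""
    return all(v <= c for v, c in zip(params, cutoffs))


def build_lookup_table(training, n_bins=3):
    """training: list of (params, is_error). Returns [(cutoffs, error_rate), ...]."""
    n_params = len(training[0][0])
    thresholds = [
        equal_frequency_thresholds([p[i] for p, _ in training], n_bins)
        for i in range(n_params)
    ]
    remaining = list(training)
    table = []
    while remaining:
        best_key, best_cutoffs = None, None
        for cutoffs in product(*thresholds):
            errors = [err for params, err in remaining if within(params, cutoffs)]
            if not errors:
                continue
            # lowest error rate first; ties broken in favor of the largest set
            key = (sum(errors) / len(errors), -len(errors))
            if best_key is None or key < best_key:
                best_key, best_cutoffs = key, cutoffs
        table.append((best_cutoffs, best_key[0]))
        remaining = [(p, e) for p, e in remaining if not within(p, best_cutoffs)]
    return table


def lookup(table, params):
    """Error rate of the first table row whose thresholds cover the parameters."""
    for cutoffs, rate in table:
        if within(params, cutoffs):
            return rate
    return None


if __name__ == "__main__":
    toy = [((1, 1), False), ((1, 2), False), ((2, 1), False),
           ((2, 2), True), ((3, 3), True), ((3, 1), False)]
    table = build_lookup_table(toy)
    for row in table:
        print(row)
    print(lookup(table, (2, 2)))
```

Because the largest thresholds always cover every remaining base, each pass removes at least one base and the loop terminates. The brute-force search over all threshold tuples is only practical here because the toy grid is tiny.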

57
Post-translational Modification
(or co-translational)
  • acylation (at O, N, or S)
      • acetylation (acetate, CH3CO2-)
      • myristoylation (myristate, a C14 fatty acid)
      • palmitoylation (palmitate, a C16 fatty acid)
  • alkylation
      • methylation
      • isoprenylation
  • phosphorylation
      • signal transduction
  • ADP-ribosylation
      • signal transduction
      • cholera toxin
  • glycosylation (glycoproteins)
      • mucin, cellular interaction, structural
      • N-linked
          • asparagine
      • O-linked
          • serine, threonine, hydroxylysine, hydroxyproline
  • iodination
      • thyroid hormone
  • hydroxylation
      • hydroxylysine in collagen
  • covalently bound enzyme cofactors
      • FAD, biotin, etc.
  • ubiquitination

... and many more
58
Wandering Spot Method
ca. 1970s, RNA or DNA
partial digestion, 2D separation
Horizontal: base composition
Vertical: size
This is an RNase T1 fragment, so it ends in G
Fuke M, Busch H. Nucleic Acids Res. 4:339-352, 1977.
59
Enzymatic vs Chemical Partial Cleavage of RNA
Sequence-specific RNases:
  • Phy M: A+U
  • A: pyrimidine-specific (C+U)
  • U2: A or A+G
  • T1: degrades after G residues
  • V1: degrades paired bases
Peattie DA. PNAS 76:1760-1764, 1979.
(figure panels: enzymatic, chemical)
60
Modified Nucleotides in tRNA
(post-transcriptional)
  • pseudouridine (Ψ)
  • dihydrouridine (UH2)
  • inosine (I)
  • methylinosine (IMe)
  • methylguanine (GMe)
  • dimethylguanine (GMe2)
  • methylcytosine (CMe)
  • ribothymine (T)

61
Nucleotide Ambiguity Codes
(IUPAC)
Unambiguous: A, C, G, T, U
2-fold degenerate:
  M: A or C
  R: A or G (puRine)
  W: A or T (Weak)
  S: C or G (Strong)
  Y: C or T (pYrimidine)
  K: G or T
3-fold degenerate:
  V: A, C or G (not T)
  H: A, C or T (not G)
  D: A, G or T (not C)
  B: C, G or T (not A)
4-fold degenerate:
  X: A, C, G or T
  N: A, C, G or T
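The ambiguity codes translate naturally into a lookup table. The Python sketch below expands a degenerate DNA sequence into every concrete sequence it can represent and tests whether a sequence matches a degenerate pattern. The function names are illustrative and U is left out for simplicity; the example pattern TCCRAC is the Mme I recognition site from the Paired End Tags slide.

```python
from itertools import product

# DNA ambiguity codes, mapped to the bases each one can stand for
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT",
    "X": "ACGT", "N": "ACGT",
}


def expand(pattern: str) -> list:
    """List every unambiguous sequence represented by a degenerate pattern."""
    choices = [IUPAC[code] for code in pattern.upper()]
    return ["".join(combo) for combo in product(*choices)]


def matches(pattern: str, sequence: str) -> bool:
    """True if an unambiguous sequence is compatible with the pattern."""
    if len(pattern) != len(sequence):
        return False
    return all(base in IUPAC[code] for code, base in zip(pattern.upper(), sequence.upper()))


if __name__ == "__main__":
    print(expand("TCCRAC"))
    print(matches("TCCRAC", "TCCGAC"), matches("TCCRAC", "TCCTAC"))
```

The number of expansions is the product of the degeneracies, so `expand` is only sensible for patterns with a handful of ambiguous positions.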
62
Automated Base Calling
Phred: third-party base caller with better accuracy than ABI's; open source(ish)
Ewing B, Hillier L, Wendl MC, Green P. Base-Calling of Automated Sequencer Traces Using Phred. I. Accuracy Assessment. Genome Res. 8:175-185, 1998
Ewing B and Green P. Base-Calling of Automated Sequencer Traces Using Phred. II. Error Probabilities. Genome Res. 8:186-194, 1998
63
Shotgun Sequencing
Staden R. A strategy of DNA sequencing employing computer programs. Nucleic Acids Research 6(7):2601-2610, 1979
With modern fast sequencing techniques and
suitable computer programs it is now possible to
sequence whole genomes without the need of
restriction maps. This paper describes computer
programs that can be used to order both sequence
gel readings and clones. A method of coding for
uncertainties in gel readings is described. These
programs are available on request.
The whole of the DNA to be sequenced is
shotgunned into a suitable vector and cloned.
Ideally the cloned fragments would be of at least
200 bases in length. The clones are then
sequenced and the computer used to collate the
data. Collation involves searching for overlaps
in the data.
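The idea of collating reads by searching for overlaps can be sketched in Python. This toy greedy assembler is not Staden's program: it assumes error-free reads, all from the same strand, and it repeatedly merges the pair with the longest exact suffix-prefix overlap. The read sequences and the minimum overlap of 3 are arbitrary choices for the example.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len: int = 3) -> list:
    """Merge reads pairwise, best overlap first, until no overlaps remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing left to join; the remaining contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    shotgun_reads = ["CGTACC", "ATGCGT", "ACCTGA"]
    print(greedy_assemble(shotgun_reads))
```

Real assemblers also have to cope with sequencing errors, reverse-complement reads, reads contained within other reads, and repeats, which is where quality scores and longer read lengths become important.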
64
2D gel electrophoresis
65
cybertory.org/exercises/primerDesign
66
(No Transcript)
67
Protein Sequencing
Edman Degradation
phenylisothiocyanate
invented ca. 1950s; automated ca. 1973
proceeds from N-terminus; reads 50-70 aa
http://en.wikipedia.org/wiki/Edman_degradation
Mass Spectrometry
Precise determination of molecular weights of
peptides
A few amino acids can ID a spot on 2D gel
68
(Sec)
modified from Wikimedia commons