Title: Max Bachour
Jessica Chen
Shotgun or 454 sequencing
- High-throughput sequencing technique that collects a large amount of data at a fast rate.
- Works by breaking a genome or large strand of DNA into small, overlapping fragments (for example, by partial digestion).
- These small fragments are sequenced, and fragments that overlap are matched together.
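The overlap-matching idea can be illustrated with a minimal sketch. It assumes error-free reads from one strand and uses a simple greedy merge; the function names `overlap` and `greedy_assemble` are hypothetical, and real assemblers are considerably more involved.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while True:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            # No remaining overlaps: what is left are the contigs.
            return reads
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]


fragments = ["ATGGCGT", "GCGTACC", "TACCTTGA"]
print(greedy_assemble(fragments))
```

With these three toy fragments, each read overlaps the next by four bases, so they merge into a single contig.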
Steps Behind 454 sequencing
- The genome is fragmented and the fragments are denatured.
- Each fragment is attached to a microbead, one fragment per bead, and amplified on that bead.
- Each bead is placed in a well of a fiber-optic slide.
- Packing beads are placed in all the wells.
Steps Behind 454 sequencing
- A solution of one nucleotide is flooded onto the tray.
- If the added base is the next one in the sequence, it is incorporated into the strand being built on the bead's single-stranded DNA.
- When a nucleotide is added to the DNA, two phosphates are released (as pyrophosphate).
- Enzymes in the packing beads convert the pyrophosphate to ATP, and the ATP is then used to produce light.
Steps Behind 454 sequencing
- A computer and camera detect light in a certain well as a certain base is added to the tray.
- The base is washed off and the process is repeated with another base.
- The end product is a large number of sequenced fragments.
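The flow-and-detect cycle above can be sketched as a small simulation. This is an idealized illustration, not the instrument's software: the flow order "TACG", the function names, and the choice to represent the read as the strand being synthesized are all assumptions made for the example. The light signal is modeled as the number of bases incorporated in one flow.

```python
def flowgram(read: str, flow_order: str = "TACG", cycles: int = 4):
    """Simulate idealized light signals for one well.

    In each flow a single nucleotide is offered. The signal is the number
    of bases incorporated (more than 1 for a run of identical bases),
    or 0 if the offered base is not the next one in the sequence.
    """
    pos = 0
    signals = []
    for _ in range(cycles):
        for base in flow_order:
            n = 0
            while pos < len(read) and read[pos] == base:
                n += 1
                pos += 1
            signals.append((base, n))  # camera reading for this flow
    return signals


def call_bases(signals) -> str:
    """Turn per-flow signals back into a base sequence."""
    return "".join(base * n for base, n in signals)


signals = flowgram("TTAGGC")
print(signals[:4])
print(call_bases(signals))
```

Flows that give no light contribute nothing to the called sequence, which is why the same four bases can be cycled repeatedly.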
Genome Sequence Analysis
- Contig assembly
- Identifying open reading frames (ORFs) using gene prediction programs
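As a rough illustration of what an open reading frame is, the sketch below scans the three forward frames for an ATG followed in-frame by a stop codon. It is a deliberately simple stand-in, not how gene prediction programs work internally; the function name `find_orfs` and the `min_codons` threshold are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3):
    """Return (start, end, frame) for ATG...stop ORFs on the forward strand.

    start and end are 0-based, end-exclusive, and include the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


contig = "CCATGAAATTTGGGTAACC"
for start, end, frame in find_orfs(contig):
    print(frame, contig[start:end])
```

A fuller version would also scan the reverse complement, since genes can lie on either strand.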
What is the initial problem with assembly?
[Diagram labels: Sequenced fragmented DNA; CONTIG 1; CONTIG 2; Incorrectly Assembled DNA Sequence]
How is this problem solved?
[Diagram labels: Sequenced fragmented DNA; Masked DNA Sequence; Assembled DNA Sequence; CONTIG 3, CONTIG 1, CONTIG 5, CONTIG 4, CONTIG 2]
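The slide labels suggest that repeats are masked before assembly. A minimal sketch of that idea follows, assuming the repeat sequences are already known and that masked positions are written as N; the names `mask_repeats` and `overlap` are hypothetical, and the two reads are made-up examples from unrelated regions that happen to share a repeat.

```python
def mask_repeats(read: str, repeats: list[str], mask_char: str = "N") -> str:
    """Replace every occurrence of a known repeat with mask characters."""
    for rep in repeats:
        read = read.replace(rep, mask_char * len(rep))
    return read


def overlap(a: str, b: str, min_len: int = 4) -> int:
    """Longest suffix of a matching a prefix of b, ignoring masked bases."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]) and "N" not in b[:n]:
            return n
    return 0


repeat = "CACACACA"
read_a = "GGTTAG" + repeat   # ends in the repeat
read_b = repeat + "TTCGAG"   # unrelated region that starts with the repeat

print(overlap(read_a, read_b))  # unmasked: the repeat looks like an overlap
masked_a = mask_repeats(read_a, [repeat])
masked_b = mask_repeats(read_b, [repeat])
print(overlap(masked_a, masked_b))  # masked: no overlap is reported
```

Without masking, the shared repeat would join the two unrelated reads into one incorrect contig; with masking, they stay separate.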
How do we identify genes?
- Use gene prediction programs (Fgenesh, Genscan, GeneMark) to determine potential genes and any repeat sequences.
- Enter the contig as the input.
- Which of the predicted genes are most likely to be real genes?
- Use BLAST to find out.
How do we use BLAST?
- Run tblastn with the predicted protein sequence of every predicted gene against an EST database (ESTDB).
- Why ESTDB? It is a record of all known/identified mRNA (cDNA library).
- Why tblastn? The amino acid sequence is more likely to be conserved.
- Also use blastn and blastp.
- blastp: determine expression of the gene.
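Conceptually, tblastn compares a protein query against a nucleotide database translated in all six reading frames. The sketch below shows only that translation step, using the standard genetic code; it is not BLAST itself, and the function names and the toy EST sequence are made up for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate from the first base; unknown codons become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> dict[int, str]:
    """Translations for frames +1..+3 and -1..-3."""
    seq = seq.upper()
    rc = revcomp(seq)
    frames = {}
    for k in range(3):
        frames[k + 1] = translate(seq[k:])
        frames[-(k + 1)] = translate(rc[k:])
    return frames


est = "TTACCATTTCAT"  # toy EST stored as the reverse strand of ATGAAATGGTAA
for frame, protein in sorted(six_frames(est).items()):
    print(frame, protein)
```

Here the protein only appears in frame -1, which is the kind of situation a negative frame in a tblastn hit describes.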
11Analyzing BLAST data
Gene 1
Protein sequence MFVVQYLGSSRSWTSCSHSSKPGVDSRGRAEPHLAVGRSSLLGRVQTGLKGGGMKDSDLT
GDSSLARANQSMGICKSEGTVDRRLKSQVSQLLLGLLLIRLEGLLATCMTGPHGDAGAGS
THK
>gb|FC457105.1| UCRVU04_CCNI646_g1 Cowpea 524B Mixed Tissue and Conditions cDNA
Library UCRVU04-1 Vigna unguiculata cDNA clone CCNI646, mRNA
sequence.
Length = 807
Score = 215 bits (548), Expect(2) = 2e-55, Method: Compositional matrix adjust.
Identities = 110/112 (98%), Positives = 110/112 (98%), Gaps = 0/112 (0%)
Frame = -1
Query 12 SWTSCSHSSKPGVDSRGRAEPHLAVGRSSLLGRVQTGLKGGGMKDSDLTGDSSLARANQS 71
SWTSCSHS KPGVDSRGRAEPHLAVGRSSLLGRVQTGLKGGGMKDSDLTGDSSLARANQS
Sbjct 438 SWTSCSHSKPGVDSRGRAEPHLAVGRSSLLGRVQTGLKGGGMKDSDLTGDSSLARANQS 259
Query 72 MGICKSEGTVDRRLKSQVSQLLLGLLLIRLEGLLATCMTGPHGDAGAGSTHK 123
MGICK EGTVDRRLKSQVSQLLLGLLLIRLEGLLATCMTGPHGDAGAGSTHK
Sbjct 258 MGICKEGTVDRRLKSQVSQLLLGLLLIRLEGLLATCMTGPHGDAGAGSTHK 103
- Critical data
- e-value
- match
- EST source
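The numbers in the hit above can be checked with a few lines of arithmetic. The values are taken from the report; the helper names and the cutoffs in `is_strong_hit` (E-value at most 1e-10, identity at least 90%) are hypothetical choices for illustration, not thresholds stated in the slides.

```python
def percent_identity(identities: int, aligned: int) -> float:
    """Identical positions as a percentage of the alignment length."""
    return 100.0 * identities / aligned


def subject_codons(sbjct_start: int, sbjct_end: int) -> int:
    """Number of codons spanned on the nucleotide subject (either strand)."""
    return (abs(sbjct_start - sbjct_end) + 1) // 3


def is_strong_hit(evalue: float, pct_identity: float,
                  max_evalue: float = 1e-10,
                  min_identity: float = 90.0) -> bool:
    """Hypothetical filter: low E-value and high identity."""
    return evalue <= max_evalue and pct_identity >= min_identity


pct = percent_identity(110, 112)   # Identities = 110/112
codons = subject_codons(438, 103)  # Sbjct runs from 438 down to 103
query_residues = 123 - 12 + 1      # Query runs from 12 to 123

print(f"{pct:.1f}% identity over {codons} codons")
print(codons == query_residues)
print(is_strong_hit(2e-55, pct))
```

The subject coordinates decrease because the match is on the reverse strand, and the 336 nucleotides spanned correspond to the 112 aligned amino acids of the query.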
Advantages and Disadvantages
- Fast sequencing at a high volume
- Cheap compared to other methods
- Much higher coverage protection
- Repetitive sequences can mislead the assembly program into treating unrelated sequences as connected.
- More prone to errors and missing sequences
Drastically changed genomics in a very short amount of time