What data - PowerPoint PPT Presentation

Slides: 29
Provided by: Jeffer74

Transcript and Presenter's Notes

Title: What data


1
What data?
  • Today: Structural Genomics
  • Friday: Database Resources

2
Structural Genomics
  • Characterizing and locating the entire set of
    genes in a genome.

3
Structural Genomic Strategies #1
  • Ordered Approach
  • Order clones along the genome, then sequence,
  • not dependent on acceleration of sequencing
    capacity,
  • not dependent on advanced computer analysis,
  • not dependent on as-yet-undeveloped sequencing
    technologies,
  • heavy up-front demand for human labor.

4
Sequence-Ready Ordered Approach
Stitch, then sequence.

5
Structural Genomic Strategies #2
  • Shotgun Approach
  • Sequence first, then order,
  • dependent on advances in computer analysis and
    sequencing technologies,
  • dependent on automation rather than manual labor.
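To make "sequence first, then order" concrete, here is a minimal simulation of shotgun sampling. Error-free reads are drawn from random start positions of a random toy genome. The observed depth is then compared with the expected coverage c = N*L/G (N reads of length L over a genome of length G) and with the Lander-Waterman estimate e^(-c) for the fraction of bases left uncovered. The genome size, read length, read count, and function names are illustrative assumptions. Real reads carry sequencing errors and come from both strands.

```python
import math
import random

def shotgun_reads(genome: str, read_len: int, n_reads: int, seed: int = 0):
    """Sample n_reads error-free reads of length read_len from uniform random starts.
    Returns (start, read) pairs; the start is what assembly must later recover."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(genome) - read_len + 1) for _ in range(n_reads)]
    return [(s, genome[s:s + read_len]) for s in starts]

def coverage_stats(genome_len: int, read_len: int, starts):
    """Return (mean depth, fraction of bases covered by no read)."""
    depth = [0] * genome_len
    for s in starts:
        for i in range(s, s + read_len):
            depth[i] += 1
    mean_cov = sum(depth) / genome_len
    uncovered = sum(1 for d in depth if d == 0) / genome_len
    return mean_cov, uncovered

if __name__ == "__main__":
    rng = random.Random(42)
    genome = "".join(rng.choice("ACGT") for _ in range(20_000))
    read_len, n_reads = 500, 320          # c = 320 * 500 / 20_000 = 8x
    reads = shotgun_reads(genome, read_len, n_reads)
    mean_cov, uncovered = coverage_stats(len(genome), read_len, [s for s, _ in reads])
    c = n_reads * read_len / len(genome)
    print(f"expected {c:.1f}x, observed mean {mean_cov:.2f}x")
    print(f"uncovered: observed {uncovered:.5f}, e^-c = {math.exp(-c):.5f}")
```

The observed uncovered fraction will differ somewhat from e^(-c). Uniform start positions slightly under-sample the two ends of the toy genome.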

6
Genomes (T.A. Brown)
  • The big question is whether the 70 million
    sequences could be assembled correctly...
  • ...if the conventional shotgun approach is used,
    which makes no reference to a genome map, then
    the answer is certainly no.

Wrong: the answer is yes. But the truth is that
with a map, the sequence is better.

7
First, the Genome(s): Celera
  • 5 individuals
  • two males / three females
  • one African American
  • one Asian Chinese
  • one Hispanic Mexican
  • two Caucasians
  • take tissue and sperm samples, immortalize,
  • extract DNA,
  • shred and package in vectors.

8
Bacterial Artificial Chromosomes (BACs)
  • F plasmid ancestry,
  • maintain bacterial replication system and copy
    number control system.

9
Science 291 (5507), 1304-1351
8 September 1999 to 25 June 2000

10
Single-Strand PCR
dNTPs
5' - ATACATACTACTAACTAACTAA - 3'
3' - TATGTATGATGATTGATTGATT - 5'
Template
1 Primer
Taq Polymerase w/ Buffer
Cycles

Polymerization until Taq falls off; linear
amplification.

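Why one primer gives only linear amplification comes down to simple arithmetic. The following sketch assumes 100% efficiency per cycle. With one primer, only the original template strands are copied, each gaining one new strand per cycle. With two primers, every strand, including new products, serves as a template, so the total doubles each cycle. The function names are hypothetical.

```python
def linear_copies(templates: int, cycles: int) -> int:
    """Total strands after single-primer (linear) amplification at an assumed 100%
    efficiency: only the original templates are copied, one new strand each per cycle."""
    return templates * (1 + cycles)

def exponential_copies(templates: int, cycles: int) -> int:
    """Total strands after two-primer PCR at an assumed 100% efficiency: every strand,
    including newly made ones, is copied each cycle, so the count doubles."""
    return templates * 2 ** cycles

if __name__ == "__main__":
    for n in (1, 10, 25):
        print(f"{n:>2} cycles: linear {linear_copies(1, n)}, exponential {exponential_copies(1, n):,}")
```

Under these idealized assumptions, one template yields 26 strands after 25 linear cycles versus 33,554,432 with exponential PCR.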
11
Cycle Sequencing: Chain Termination
ddNTPs
dNTPs
5' - ATACATAC - 3'
3' - TATGTATGATGATTGATTGATT - 5'
Template
1 Primer
Taq Polymerase w/ Buffer
Cycles

Polymerization until Taq incorporates a ddNTP,
which terminates the chain; linear amplification.
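Chain termination can be sketched by simulating which fragment lengths end in which dye-labelled ddNTP, then reading the sequence back in order of fragment length, as a four-color trace does. This is a simplified model with these assumptions:

- Termination happens at each extension step with a fixed probability `p_dd`.
- The primer length is given.
- Enough molecules are simulated that every position is represented.

The function and parameter names are hypothetical.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def synthesize(template_3to5: str) -> str:
    """Strand the polymerase makes, written 5'->3': the base-by-base complement
    of a template written 3'->5' (as on the slide)."""
    return "".join(COMPLEMENT[b] for b in template_3to5)

def terminated_fragments(template_3to5: str, primer_len: int = 0,
                         n_molecules: int = 2000, p_dd: float = 0.1, seed: int = 0):
    """Simulate dye-terminator chain termination.

    Extension starts after a primer of primer_len bases. At each step the polymerase
    incorporates a labelled ddNTP with probability p_dd (an assumed ddNTP:dNTP ratio),
    which stops the chain. Returns the set of (fragment_length, terminal_base) seen.
    """
    rng = random.Random(seed)
    new_strand = synthesize(template_3to5)
    observed = set()
    for _ in range(n_molecules):
        for length, base in enumerate(new_strand[primer_len:], start=primer_len + 1):
            if rng.random() < p_dd:
                observed.add((length, base))  # ddNTP has no 3'-OH, so the chain stops
                break
    return observed

def read_sequence(fragments) -> str:
    """Sort fragments by length (as electrophoresis does) and read each terminal dye."""
    return "".join(base for _, base in sorted(fragments))

if __name__ == "__main__":
    template = "TATGTATGATGATTGATTGATT"  # 3'->5' template from the slide
    print(read_sequence(terminated_fragments(template, primer_len=8)))
```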
12
Fluorescent ddNTPs
13
(No Transcript)
14
ABI 3700
  • Automated,
  • capillary electrophoresis,
  • 15 minutes of maintenance a day,
  • 65 full-time staff.

15
Systems Biology
16
Mate Pairs
  • BAC end sequencing:
  • sequence both ends of the BAC insert using
    vector-derived primers.
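Because the two end reads are primed from opposite sides of the insert, they come from opposite strands and lie a known, approximate distance apart. That distance is the information the Scaffolder later uses. The following is a minimal sketch assuming error-free reads. `mate_pair` is a hypothetical helper, and the second read is modelled as the reverse complement of the insert's far end.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A<->T, C<->G, read backwards)."""
    return seq.translate(COMPLEMENT)[::-1]

def mate_pair(insert: str, read_len: int):
    """Reads from both ends of a cloned insert (hypothetical helper).
    The forward read starts at the left end. The reverse read is primed from the
    right end, so it reads the opposite strand. Returns (fwd, rev, insert_len)."""
    fwd = insert[:read_len]
    rev = reverse_complement(insert[-read_len:])
    return fwd, rev, len(insert)
```

Keeping the insert length with the pair records how far apart the two reads should land once both are placed in the assembly.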

17
Science 291 (5507), 1304-1351
8 September 1999 to 25 June 2000

18
(No Transcript)
19
Whole Genome Assembly
20
WGA
  • 1. Screener,
  • 2. Overlapper,
  • 3. Unitigger,
  • 4. Scaffolder,
  • 5. Repeat Resolver.

21
Screener
  • ...finds and masks microsatellite repeats,
    known repeat regions and ribosomal DNA,
  • marks the rest for overlapping.
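Here is a toy Screener. It soft-masks (lowercases) simple tandem repeats and exact hits to a repeat library, then reports the unmasked spans that would go on to overlapping. The following are all illustrative assumptions:

- the thresholds (repeat units of 1 to 6 bp, at least 5 copies);
- the lowercase masking convention;
- exact-match library search;
- the function names.

A real screener aligns reads against curated repeat and rDNA libraries rather than matching exactly.

```python
import re

def mask_microsatellites(seq: str, max_unit: int = 6, min_copies: int = 5) -> str:
    """Soft-mask (lowercase) tandem repeats of a 1..max_unit bp unit occurring at
    least min_copies times in a row. Expects an uppercase ACGT sequence."""
    # Lazy unit ({1,n}?) tries the shortest repeating unit first; \1 repeats it.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    return pattern.sub(lambda m: m.group(0).lower(), seq)

def mask_known_repeats(seq: str, library) -> str:
    """Soft-mask every exact occurrence of each library sequence (e.g. rDNA or a
    known repeat). Exact matching stands in for alignment-based screening."""
    masked = list(seq)
    upper = seq.upper()
    for rep in library:
        start = upper.find(rep)
        while start != -1:
            for i in range(start, start + len(rep)):
                masked[i] = masked[i].lower()
            start = upper.find(rep, start + 1)
    return "".join(masked)

def overlap_candidates(masked: str, min_len: int = 1):
    """(start, end) spans of unmasked (uppercase) sequence passed on for overlapping."""
    return [(m.start(), m.end())
            for m in re.finditer(r"[ACGTN]+", masked)
            if m.end() - m.start() >= min_len]
```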

22
Overlapper
  • ...looks for end-to-end overlaps of at least 40
    bp with no more than 6 differences in match.

What's the significance?
...a one in 10^17 event.

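The following is a brute-force sketch of the Overlapper's test, using the slide's thresholds as defaults. It finds the longest suffix of one read that matches a prefix of another over at least 40 bp with at most 6 mismatches. The function names are hypothetical, and the sketch makes these simplifications:

- Differences are counted as substitutions only, with no insertions or deletions.
- Only one orientation is tried.

A production overlapper would also check the reverse complement and use seeded search instead of testing every overlap length.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def best_overlap(left: str, right: str, min_len: int = 40, max_diff: int = 6) -> int:
    """Length of the longest suffix(left)/prefix(right) overlap that is at least
    min_len bases long with at most max_diff substitutions; 0 if none qualifies."""
    for length in range(min(len(left), len(right)), min_len - 1, -1):
        if hamming(left[-length:], right[:length]) <= max_diff:
            return length
    return 0
```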
23
But(t)!
  • ...the Screener doesn't catch all of the
    low-frequency repeats,
  • ...so most of the Overlapper's overlaps are
    bogus (repeat-induced).

24
What Now?
  • ...some uniquely assembled contigs (unitigs) are
    readily identifiable,
  • all of the assembled sequences match over all of
    the known sequence,

- and -
  • ...are consistent with an 8x coverage,
  • over-collapsed assemblies are identified and
    broken down into unitigs when possible.
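One way to tell a unique unitig from an over-collapsed repeat is to compare its read count with what the coverage predicts. The sketch below is a Poisson log-likelihood ratio, similar in spirit to the arrival-rate ("A-") statistic described for the Celera assembler. The function names and the threshold of 10 are illustrative assumptions, not the assembler's actual code.

```python
import math

def arrival_log_odds(unitig_len, reads_in_unitig, total_reads, genome_len):
    """Log-odds that a unitig is unique sequence rather than a collapsed two-copy repeat.

    Assumed model: read starts are Poisson with rate total_reads / genome_len, so a
    unique unitig expects lam = unitig_len * total_reads / genome_len reads and a
    two-copy collapse expects 2 * lam. log[Pois(k; lam) / Pois(k; 2*lam)]
    simplifies to lam - k * ln 2.
    """
    lam = unitig_len * total_reads / genome_len
    return lam - reads_in_unitig * math.log(2)

def classify_unitig(unitig_len, reads_in_unitig, total_reads, genome_len, threshold=10.0):
    """Call a unitig 'unique' when the log-odds clear an (illustrative) threshold."""
    score = arrival_log_odds(unitig_len, reads_in_unitig, total_reads, genome_len)
    return "unique" if score >= threshold else "possibly over-collapsed"
```

For example, take 8x coverage with 500-bp reads, which is 0.016 read starts per bp or 16,000 reads per Mb. A 10-kb unitig then expects about 160 reads. With 160 reads the score is about 49 (unique), while 320 reads give about -62 (likely a collapsed repeat).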

25
Scaffolder
  • ...makes contigs of the contigs (orders and
    orients them into scaffolds),
  • uses mate-pair information.
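Here is a toy version of mate-pair scaffolding. It counts the mate pairs that link two contigs, keeps links with enough support, and chains contigs so that each has at most one neighbour on each side. The sketch makes these simplifications, so it is not Celera's Scaffolder:

- Contig orientations are assumed already known.
- Link distances are ignored.
- `min_support=2` is an illustrative threshold.
- Greedy chaining replaces a full scaffold search.

```python
from collections import Counter

def _chain_tail(start, successor):
    """Follow successor links from start to the last contig in its chain."""
    node = start
    while node in successor:
        node = successor[node]
    return node

def build_scaffolds(contigs, mate_links, min_support=2):
    """Greedy mate-pair scaffolding (simplified sketch).

    contigs:     iterable of contig ids.
    mate_links:  one (left_id, right_id) tuple per mate pair whose reads fall in
                 different contigs; orientation is assumed already resolved, so
                 each tuple means 'left precedes right'.
    min_support: agreeing mate pairs needed before a link is trusted (assumed).
    Returns a list of scaffolds, each an ordered list of contig ids.
    """
    support = Counter(mate_links)
    successor, predecessor = {}, {}
    for (a, b), n in support.most_common():   # best-supported links first
        if n < min_support or a == b:
            continue
        if a in successor or b in predecessor:
            continue                           # one neighbour per side only
        if _chain_tail(b, successor) == a:
            continue                           # would close a cycle
        successor[a] = b
        predecessor[b] = a
    scaffolds = []
    for c in contigs:
        if c in predecessor:
            continue                           # not the start of a chain
        chain = [c]
        while chain[-1] in successor:
            chain.append(successor[chain[-1]])
        scaffolds.append(chain)
    return scaffolds
```

Requiring more than one agreeing mate pair protects against a single chimeric clone or misplaced read joining unrelated contigs.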

26
Repeat Resolver
  • ...most of the remaining gaps were due to repeats.

...91% sequence, 9% gaps.
  • Gaps:
  • average 2.43 kb,
  • over 50% < 500 bp,
  • over 62% < 1 kb,
  • no gap larger than 100 kb.
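The gap figures on this slide are simple summary statistics. Here is a minimal sketch of how they are computed from a list of gap lengths in base pairs. The function name and the example gap lengths in the test are hypothetical, not Celera's data.

```python
def gap_summary(gap_lengths_bp):
    """Mean gap size, share of gaps under 500 bp and under 1 kb, and largest gap."""
    n = len(gap_lengths_bp)
    return {
        "mean_kb": sum(gap_lengths_bp) / n / 1000,
        "pct_under_500bp": 100 * sum(g < 500 for g in gap_lengths_bp) / n,
        "pct_under_1kb": 100 * sum(g < 1000 for g in gap_lengths_bp) / n,
        "max_kb": max(gap_lengths_bp) / 1000,
    }
```

A mean well above the median, as with an average of 2.43 kb while over half the gaps are under 500 bp, indicates a long tail of a few large gaps.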

27
Scaffolds
28
Friday
  • Databases,
  • Database Mining.