Today - PowerPoint PPT Presentation

Provided by: jeffy8
Transcript and Presenter's Notes

Title: Today


1
Today
  • Please read
  • Science 291 1304-1315

2
Map First then sequence
Sequence First then map
3
Project Comparisons (NYT 10/3/2002)
  • Decoding the genome of Plasmodium falciparum, the
    most dangerous of the four single-cell parasites
    that cause malaria, took six years and cost about
    $20 million, paid for by the Wellcome Trust of
    London, the National Institutes of Health in
    Bethesda, Md., and other sources. Dr. Malcolm J.
    Gardner of the Institute for Genomic Research in
    Rockville, Md., led a large team of scientists
    there and at the Sanger Centre near Cambridge in
    England. Completion of the falciparum genome was
    first announced at a conference in Las Vegas in
    February.
  • The genome of Anopheles gambiae, the primary
    carrier of the parasite, was begun more recently
    and took a mere 15 months even though its genome
    is far larger: some 278 million units of DNA
    encoding 14,000 genes compared with the
    parasite's 23 million units of DNA and 5,268
    genes. The mosquito team was led by Dr. Robert A.
    Holt of Celera Genomics in Rockville. The $14
    million cost was borne by the National Institutes
    of Health, by Genoscope in France and other
    sources.
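
The two projects can be compared with a little arithmetic. The sketch below uses only the figures quoted on this slide; it assumes the costs are in US dollars and takes "six years" as 72 months. The dictionary layout and the `summarize` helper are illustrative names, not part of the lecture.

```python
# Figures as quoted on the slide; costs are assumed to be US dollars.
projects = {
    "Plasmodium falciparum": {
        "bases": 23_000_000, "genes": 5_268, "months": 72, "cost": 20_000_000,
    },
    "Anopheles gambiae": {
        "bases": 278_000_000, "genes": 14_000, "months": 15, "cost": 14_000_000,
    },
}


def summarize(p):
    """Derive per-base cost, throughput and gene density for one project."""
    return {
        "cost_per_base": p["cost"] / p["bases"],
        "bases_per_month": p["bases"] / p["months"],
        "genes_per_mbp": p["genes"] / (p["bases"] / 1e6),
    }


summary = {name: summarize(p) for name, p in projects.items()}

for name, s in summary.items():
    print(
        f"{name}: {s['cost_per_base']:.3f} per base, "
        f"{s['bases_per_month']:,.0f} bases/month, "
        f"{s['genes_per_mbp']:.0f} genes/Mbp"
    )
```

On these numbers the mosquito project is both cheaper per base and much faster per month, while the parasite genome is the more gene-dense of the two.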

Hybrid
WGA
4
Human Genome Project Dissenters: My Brush with
Greatness?
  • 1992: Two years into the HGP, two of the project's
    biggest critics were:
  • Sydney Brenner believed that the HGP should
    focus on human EST collections, and sequence the
    genome of a simple vertebrate (Fugu).
  • Craig Venter believed that the clone-by-clone
    approach was not the most efficient way to
    proceed, suggested shotgun approaches, and
    even believed a whole-genome approach was
    feasible.

They were both right.
5
Sydney Brenner
  • 2002 Nobel Prize (Medicine/Physiology)
  • Sydney Brenner and John E. Sulston, Britain
  • H. Robert Horvitz, United States
  • for discoveries concerning how genes regulate
    organ development and a process of programmed
    cell death.

Dr. Carol Trent's Ph.D. advisor! Dr. Trent's
work is a significant part of the body of
research that warranted the prize.
6
Expressed Sequence Tags (ESTs)
Brenner was right.
  • End sequenced cDNAs
  • (complementary DNA)
  • cDNA: synthetic DNA transcribed from an mRNA
    template,
  • through the action of an RNA-dependent DNA
    polymerase called reverse transcriptase.
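
As a rough illustration of what reverse transcription does to a sequence, the sketch below builds the first-strand cDNA for an mRNA and then takes a read from each end of a cDNA insert, which is the idea behind an EST. The function names, the example sequences, and the read length are hypothetical; real ESTs come from sequencing cloned, double-stranded cDNA.

```python
# Base pairing used by reverse transcriptase: RNA template -> DNA product.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna: str) -> str:
    """Return the first-strand cDNA, written 5'->3', for an mRNA given 5'->3'.

    The product is antiparallel to the template, so the mRNA is read
    in reverse while each base is complemented.
    """
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna.upper()))


def end_reads(insert: str, read_len: int):
    """Return one read from each end of a cDNA insert (a toy 'EST' pair)."""
    return insert[:read_len], insert[-read_len:]


cdna = first_strand_cdna("AUGGCC")
print(cdna)                          # GGCCAT
print(end_reads("GGCCATTTAA", 3))    # ('GGC', 'TAA')
```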

Online Primer est.html
7
  • Still Sequencing cDNAs,
  • first and easiest look into any genome,
  • useful in understanding genomic sequence (gene
    finding),
  • helps determine splice site variants,
  • shorter than genomic clones, fits in plasmids,
  • etc.

8
Tissue-specific ESTs are very useful.
9
Whole Genome Assembly
Venter was right.
  • 1995: 1.8 Mbp Haemophilus influenzae genome
    sequenced,
  • 1996 onward: Mycoplasma, E. coli and others,
  • 1999: Chromosome 2 of Arabidopsis,
  • 2000: Drosophila (120 Mbp) genome,
  • Human, Mosquito, etc.
  • Lots of genomes, several applications...

WGA of bacterial, viral populations...
10
(No Transcript)
11
  • 1 year, 120 megabases,
  • Assembly algorithms could generate accurate
    genomic sequences,
  • Interim assemblies (or mapping) were not
    necessary.

24 MARCH 2000 VOL 287 SCIENCE
12
Big Biology
13
Think About This
  • the plasmid library construction is the first
    critical step in shotgun sequencing,
  • if the DNA libraries are not uniform in size,
    non-chimeric, and do not randomly represent the
    genome, then the subsequent steps cannot
    accurately reconstruct the genome sequence.
  • We used automated high-throughput DNA sequencing
    and the computational infrastructure to enable
    efficient tracking of enormous amounts of
    sequence information (27.3 million sequence
    reads, 14.9 billion bp of sequence).
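
The read and base counts above are enough for a quick sanity check on read length and coverage. The sketch below assumes a genome size of about 2.9 billion bp, which is not stated on the slide, and uses the standard Poisson (Lander-Waterman) approximation that a fraction of roughly e^(-c) of the genome is left unsampled at coverage c.

```python
import math

total_reads = 27.3e6           # sequence reads, from the slide
total_bases = 14.9e9           # bp of sequence, from the slide
assumed_genome_size = 2.9e9    # ASSUMPTION: approximate genome size in bp

# Average read length implied by the two totals.
mean_read_length = total_bases / total_reads

# Sequence coverage: how many times, on average, each base was read.
coverage = total_bases / assumed_genome_size

# Poisson approximation for the fraction of bases never sampled.
uncovered_fraction = math.exp(-coverage)

print(f"mean read length  ~ {mean_read_length:.0f} bp")
print(f"sequence coverage ~ {coverage:.1f}x")
print(f"unsampled bases   ~ {uncovered_fraction:.2%}")
```

The coverage figure moves directly with the assumed genome size, so it should be read as an estimate rather than a reported result.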

14
Whose DNA?
  • 21 enrolled donors,
  • age, sex, ethnographic group,
  • one African-American,
  • one Asian-Chinese,
  • one Hispanic-Mexican,
  • two Caucasians.

15
Whose, Mostly?
16
(No Transcript)
17
back to humans
Individuals, Libraries,
Sequence coverage, Clone coverage, Other?
What to know?
543 bp average sequence read
8 September 1999 to 25 June 2000
18
(No Transcript)
19
WGA Outline
20
Sequence Tagged Sites (STS) and Mapping
Shear chromosome, lots of different times
Mapping Reagent
PCR Primer Pairs are tested for co-function.
Frequency of co-function is proportional to
linkage.
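
A minimal sketch of the co-function idea: each sample of the mapping reagent is recorded as the set of STS markers whose primer pairs amplified in it, and marker pairs that show up together more often are taken to be closer. The panel data, marker names, and the `co_function_frequency` helper are all hypothetical.

```python
from itertools import combinations


def co_function_frequency(panel, a, b):
    """Fraction of samples containing either marker in which both amplify."""
    either = [sample for sample in panel if a in sample or b in sample]
    if not either:
        return 0.0
    both = sum(1 for sample in either if a in sample and b in sample)
    return both / len(either)


# Hypothetical panel: one set of amplifying markers per sheared sample.
panel = [
    {"STS1", "STS2"},
    {"STS1", "STS2", "STS3"},
    {"STS3"},
    {"STS1", "STS2"},
    {"STS2", "STS3"},
    {"STS1"},
]

for a, b in combinations(["STS1", "STS2", "STS3"], 2):
    print(a, b, round(co_function_frequency(panel, a, b), 2))
```

In this toy panel STS1 and STS2 co-function most often, which would place them closest together; STS1 and STS3 co-function least often.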
21
Whole Genome Assembly
  • 1. Screener,
  • 2. Overlapper,
  • 3. Unitigger/Discriminator,
  • 4. Scaffolder,
  • 5. Repeat Resolver.

22
Screener
  • ...finds and masks microsatellite repeats,
    known repeated regions and ribosomal DNA,
  • masked regions not used to make contigs,
  • marks the rest for overlapping.
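
One part of the screening step, masking microsatellites, can be sketched with a regular expression. This is a simplification under stated assumptions: it only finds perfect tandem repeats of a 1 to 6 base unit in uppercase A/C/G/T reads, and the `min_copies` threshold is an arbitrary choice. Matching against known repeat families and ribosomal DNA would need a reference library and is not shown.

```python
import re


def mask_microsatellites(read: str, min_copies: int = 5, max_unit: int = 6) -> str:
    """Replace perfect tandem repeats with N so they cannot seed overlaps.

    A repeat is a unit of 1..max_unit bases occurring at least
    min_copies times in a row.
    """
    pattern = re.compile(r"([ACGT]{1,%d})\1{%d,}" % (max_unit, min_copies - 1))
    return pattern.sub(lambda m: "N" * len(m.group(0)), read)


read = "ACGT" + "CA" * 8 + "TTGAC"
print(mask_microsatellites(read))
```

Masking with N keeps read coordinates unchanged, so the unmasked flanks can still be passed on for overlapping.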

23
Overlapper
  • ...looks for end-to-end overlaps of at least 40
    bp with no more than 6% differences in the match,

What's the significance?
...a one in 10^17 event.
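
The overlap criterion can be sketched as a brute-force suffix/prefix comparison. This is not the assembler's actual algorithm: it counts substitutions only (no insertions or deletions), treats the difference limit as a fraction of the overlap length, and tries every length in turn, whereas a production overlapper would index seeds to avoid comparing all reads against all reads. The toy genome and read coordinates are made up.

```python
import random


def best_overlap(a: str, b: str, min_len: int = 40, max_diff: float = 0.06) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b`.

    A match needs at least `min_len` bases and at most `max_diff`
    (as a fraction of the overlap) mismatching positions. Returns 0
    when no acceptable overlap exists.
    """
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        suffix, prefix = a[-length:], b[:length]
        mismatches = sum(x != y for x, y in zip(suffix, prefix))
        if mismatches <= max_diff * length:
            return length
    return 0


# Toy data: two reads drawn from a random sequence, sharing 50 bases.
genome = "".join(random.Random(0).choices("ACGT", k=200))
a = genome[0:120]
b = genome[70:190]
print(best_overlap(a, b))
```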
24
Good News
  • ... uniquely assembled contigs (unitigs) are
    readily identifiable,
  • all of the assembled sequences match over all of
    the known sequence,

- and -
...are consistent with an 8x sequence coverage.
25
Unitigs
But(t)
...the Screener doesn't include all of the
low-frequency repeats, ...so a majority of
the Overlapper outputs turned out to be bogus.
26
What Now?
  • over-collapsed assemblies are identified and
    broken down into unitigs when possible...
  • these too-large contig sets are sent to the
    Unitigger/Discriminator.

27
Unitigger ...differentiates between a true
overlap and an overlap that includes more than
one locus.
28
Discriminator
29
Discriminator
...may yield u-unitigs.
Unitigger/Discriminator Output: correctly
assembled contigs covering 73.6% of the genome.
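
The slides do not say how the Discriminator scores a contig, so the following is only one plausible coverage-based test, offered as an assumption. If read starts are treated as Poisson, a unique region of a given length expects lam = length * total_reads / genome_size reads, while two collapsed copies expect 2 * lam. The log of the likelihood ratio for observing k reads is then lam - k * ln 2: positive values favour a unique contig, negative values favour a collapsed repeat. The function name and the 2.9 billion bp genome size are hypothetical.

```python
import math


def unique_log_odds(contig_len, reads_in_contig, total_reads, genome_size):
    """ln[P(k reads | unique) / P(k reads | two collapsed copies)].

    With Poisson read starts the ratio simplifies to exp(lam) / 2**k,
    where lam is the read count expected for a single-copy region.
    """
    lam = contig_len * total_reads / genome_size
    return lam - reads_in_contig * math.log(2)


TOTAL_READS = 27.3e6     # from the slides
GENOME_SIZE = 2.9e9      # ASSUMPTION: approximate genome size in bp

# A 10 kb contig holding about the expected number of reads.
print(unique_log_odds(10_000, 94, TOTAL_READS, GENOME_SIZE))

# The same contig holding about twice as many reads.
print(unique_log_odds(10_000, 190, TOTAL_READS, GENOME_SIZE))
```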
30
Scaffolder
  • ...contigs the contigs,
  • uses mate-pair information; two or more
    consistent mate-pair matches yield 1 in 10^10
    odds of being chance.
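
The "two or more mate pairs" rule can be sketched as a simple link count between contigs. This simplified version ignores read orientation and insert-size consistency, which a real scaffolder would also check; the read names, contig names, and the `scaffold_links` helper are hypothetical.

```python
from collections import Counter


def scaffold_links(mate_pairs, read_to_contig, min_links=2):
    """Count mate pairs joining different contigs; keep well-supported links."""
    links = Counter()
    for left, right in mate_pairs:
        c1 = read_to_contig.get(left)
        c2 = read_to_contig.get(right)
        # Skip unplaced reads and pairs that fall inside a single contig.
        if c1 is None or c2 is None or c1 == c2:
            continue
        links[tuple(sorted((c1, c2)))] += 1
    return {pair: n for pair, n in links.items() if n >= min_links}


read_to_contig = {
    "r1": "A", "r2": "B", "r3": "A", "r4": "B",
    "r5": "B", "r6": "C", "r7": "A", "r8": "A",
}
mate_pairs = [("r1", "r2"), ("r3", "r4"), ("r5", "r6"), ("r7", "r8")]

print(scaffold_links(mate_pairs, read_to_contig))   # {('A', 'B'): 2}
```

Contigs A and B are joined because two mate pairs support the link; the single pair between B and C is not enough on its own.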

31
Repeat Resolver ...most of the remaining gaps
were due to repeats.
  • Rocks: use contig sets with a low Discriminator
    value to fill gaps; find two or more mate
    pairs with unambiguous matches in the scaffold
    near the gap (2 kb, 10 kb or 50 kb) (1 in 10^7),
  • Stones: find mate-pair matches 2 kb, 10 kb and
    50 kb from the gap, place the mate in the gap,
    and check to see if it's consistent with other
    placed sequences.
32
If That Doesn't Work
  • ...find a mate-pair that spans the gap, and
    sequence it,

Chromosome Walking
33
Weds.
  • Questions about WGA,
  • CSA,
  • Comparisons,
  • Quality Control, etc.