Title: Today
1 Today
- Please read
- Science 291 1304-1315
2 Map first, then sequence
Sequence first, then map
3 Project Comparisons (NYT, 10/3/2002)
- Decoding the genome of Plasmodium falciparum, the
most dangerous of the four single-cell parasites
that cause malaria, took six years and cost about
$20 million, paid for by the Wellcome Trust of
London, the National Institutes of Health in
Bethesda, Md., and other sources. Dr. Malcolm J.
Gardner of the Institute for Genomic Research in
Rockville, Md., led a large team of scientists
there and at the Sanger Centre near Cambridge in
England. Completion of the falciparum genome was
first announced at a conference in Las Vegas in
February.
- The genome of Anopheles gambiae, the primary
carrier of the parasite, was begun more recently
and took a mere 15 months even though its genome
is far larger (some 278 million units of DNA
encoding 14,000 genes, compared with the
parasite's 23 million units of DNA and 5,268
genes). The mosquito team was led by Dr. Robert A.
Holt of Celera Genomics in Rockville. The $14
million cost was borne by the National Institutes
of Health, by Genoscope in France, and other
sources.
Hybrid
WGA
4 Human Genome Project Dissenters
My Brush with Greatness?
- 1992: Two years into the HGP, two of the project's
biggest critics were:
- Sydney Brenner, who believed that the HGP should
focus on human EST collections and sequence the
genome of a simple vertebrate (Fugu).
- Craig Venter, who believed that the clone-by-clone
approach was not the most efficient way to
proceed, suggested shotgun approaches instead, and
even believed a whole-genome approach was
feasible.
They were both right.
5 Sydney Brenner
- 2002 Nobel Prize (Medicine/Physiology)
- Sydney Brenner and John E. Sulston, Britain
- H. Robert Horvitz, United States
- for discoveries concerning how genes regulate
organ development and a process of programmed
cell death.
Dr. Carol Trent's Ph.D. advisor! Dr. Trent's
work is a significant part of the body of
research that warranted the prize.
6 Expressed Sequence Tags (ESTs)
Brenner was right.
- End-sequenced cDNAs (complementary DNAs).
- cDNA: synthetic DNA transcribed from an mRNA
template through the action of an RNA-dependent DNA
polymerase called reverse transcriptase.
Online Primer est.html
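The cDNA definition above can be made concrete with a small sketch. It assumes (not stated on the slide) an mRNA written 5' to 3' in A/C/G/U, and it models an EST simply as the first `read_len` bases read from either end of the cDNA clone; the function names are hypothetical.

```python
# Sketch: first-strand cDNA synthesis and single-pass EST reads.
# Assumptions: mRNA is given 5'->3' using A/C/G/U; an EST is modeled as the
# first `read_len` bases read from one end of the cDNA clone.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}

def first_strand_cdna(mrna: str) -> str:
    """First-strand cDNA (5'->3') copied by reverse transcriptase:
    the reverse complement of the mRNA template, in DNA letters."""
    return "".join(COMPLEMENT[base] for base in reversed(mrna.upper()))

def sense_cdna(mrna: str) -> str:
    """Sense-strand cDNA: the mRNA sequence with T in place of U."""
    return mrna.upper().replace("U", "T")

def ests(mrna: str, read_len: int) -> tuple[str, str]:
    """Hypothetical (5' EST, 3' EST) pair: one read from each end of the clone."""
    five_prime = sense_cdna(mrna)[:read_len]           # read into the 5' end
    three_prime = first_strand_cdna(mrna)[:read_len]   # read back from the 3' end
    return five_prime, three_prime

mrna = "AUGGCCAUUGUAAUGGGCCGCUGAAAAAAAA"
print(first_strand_cdna(mrna))
print(ests(mrna, 10))
```

Because reverse transcription is often primed on the poly(A) tail, the 3' read in this toy example starts with a run of Ts.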
7 Still Sequencing cDNAs
- first and easiest look into any genome,
- useful in understanding genomic sequence (gene
finding),
- helps determine splice-site variants,
- shorter than genomic clones, so they fit in plasmids,
- etc.
8 Tissue-specific ESTs are very useful.
9 Whole Genome Assembly
Venter was right.
- 1995: 1.8 Mbp Haemophilus influenzae genome
sequenced,
- 1996 on: Mycoplasma, E. coli, and others,
- 1999: chromosome 2 of Arabidopsis,
- 2000: Drosophila (120 Mbp) genome,
- human, mosquito, etc.
- Lots of genomes, several applications...
WGA of bacterial and viral populations...
11
- 1 year, 120 megabases,
- assembly algorithms could generate accurate
genomic sequences,
- interim assemblies (or mapping) were not
necessary.
Science, Vol. 287, 24 March 2000
12 Big Biology
13 Think About This
- The plasmid library construction is the first
critical step in shotgun sequencing.
- If the DNA libraries are not uniform in size,
non-chimeric, and do not randomly represent the
genome, then the subsequent steps cannot
accurately reconstruct the genome sequence.
- We used automated high-throughput DNA sequencing
and the computational infrastructure to enable
efficient tracking of enormous amounts of
sequence information (27.3 million sequence
reads; 14.9 billion bp of sequence).
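The demand for random, uniform libraries matters because the standard back-of-the-envelope coverage model (Lander-Waterman) assumes reads are sampled uniformly at random. A minimal sketch using the read and base counts above; the human genome length of about 2.9 Gbp is an assumption that does not appear on these slides, and the estimates are idealized.

```python
import math

# Figures quoted on the slide.
reads = 27.3e6        # sequence reads
total_bp = 14.9e9     # bases of sequence
# Assumption (not on the slide): haploid human genome of about 2.9 Gbp.
genome_bp = 2.9e9

def sequence_coverage(total_bases: float, genome_len: float) -> float:
    """Average depth c = (N * L) / G, i.e. total sequenced bases / genome length."""
    return total_bases / genome_len

def lander_waterman(c: float, n_reads: float) -> tuple[float, float]:
    """Idealized estimates for uniform random sampling: the fraction of the
    genome hit by no read (e^-c) and the expected number of contigs
    (N * e^-c), ignoring the minimum-overlap requirement."""
    uncovered = math.exp(-c)
    return uncovered, n_reads * uncovered

c = sequence_coverage(total_bp, genome_bp)
uncovered, contigs = lander_waterman(c, reads)
print(f"coverage ~{c:.2f}x, uncovered ~{uncovered:.2%}, contigs ~{contigs:,.0f}")
print(f"implied mean read length ~{total_bp / reads:.0f} bp")
```

Under these assumptions the sequence coverage is about 5.1x and roughly 0.6% of the genome would be missed by every read. The implied mean read length (about 546 bp) is close to the 543 bp average quoted for the project. Non-random or chimeric libraries break the model's assumptions, which is the point of the slide.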
14 Whose DNA?
- 21 enrolled donors,
- age, sex, ethnographic group,
- one African-American,
- one Asian-Chinese,
- one Hispanic-Mexican,
- two Caucasians.
15 Whose, Mostly?
17 Back to Humans
Individuals, Libraries,
Sequence coverage, Clone coverage, Other?
What to know?
543 bp average sequence read
8 September 1999 to 25 June 2000
19 WGA Outline
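20 Sequence Tagged Sites (STSs) and Mapping
Shear the chromosome many different times.
Mapping reagent
PCR primer pairs (STSs) are tested for co-function
(co-amplification from the same fragment).
The frequency of co-function is proportional to
linkage: the closer two STSs lie, the less often
shearing separates them.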
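The co-function idea can be sketched as a pairwise co-retention score over a mapping panel. The panel below is hypothetical, and the score (fragments containing both markers divided by fragments containing either) is a simple stand-in chosen for illustration. Real radiation-hybrid or HAPPY mapping uses likelihood models instead.

```python
from itertools import combinations

# Hypothetical mapping panel: each set is one sheared fragment and the STS
# markers whose PCR primer pairs amplify from it.
panel = [
    {"sts1", "sts2"},
    {"sts1", "sts2", "sts3"},
    {"sts2", "sts3"},
    {"sts3", "sts4"},
    {"sts1", "sts2"},
    {"sts4"},
]

def co_retention(panel: list[set[str]], a: str, b: str) -> float:
    """Fraction of fragments carrying a or b that carry both. Nearby markers
    are rarely split by shearing, so a higher value suggests tighter linkage."""
    either = sum(1 for frag in panel if a in frag or b in frag)
    both = sum(1 for frag in panel if a in frag and b in frag)
    return both / either if either else 0.0

markers = sorted(set().union(*panel))
for a, b in combinations(markers, 2):
    print(a, b, round(co_retention(panel, a, b), 2))
```

In this toy panel sts1 and sts2 co-amplify most often, so they would be placed closest together.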
21 Whole Genome Assembly
- 1. Screener
- 2. Overlapper
- 3. Unitigger/Discriminator
- 4. Scaffolder
- 5. Repeat Resolver
22 Screener
- ...finds and masks microsatellite repeats,
known repeated regions, and ribosomal DNA,
- masked regions are not used to make contigs,
- marks the rest for overlapping.
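A minimal sketch of what screening does to a single read. The thresholds are hypothetical, not the actual Screener settings. A microsatellite is taken to be a 1-6 bp unit repeated at least five times in tandem. Known repeats and rDNA are supplied as a list of exact upper-case strings, and masked bases become 'N'.

```python
import re

# Hypothetical thresholds: a 1-6 bp unit repeated at least MIN_COPIES times.
MIN_COPIES = 5
MICROSATELLITE = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (MIN_COPIES - 1))

def mask_read(read: str, known_repeats: list[str]) -> str:
    """Replace microsatellites and exact copies of known repeats (e.g. from a
    repeat or rDNA library) with 'N' so later stages will not use them."""
    upper = read.upper()
    seq = list(upper)
    for m in MICROSATELLITE.finditer(upper):
        seq[m.start():m.end()] = "N" * (m.end() - m.start())
    for rep in known_repeats:
        if not rep:
            continue
        start = upper.find(rep)
        while start != -1:
            seq[start:start + len(rep)] = "N" * len(rep)
            start = upper.find(rep, start + 1)
    return "".join(seq)

def unmasked_segments(masked: str) -> list[str]:
    """The stretches left unmasked: what gets marked for overlapping."""
    return [s for s in masked.split("N") if s]

masked = mask_read("GGT" + "CA" * 6 + "TTG", [])
print(masked, unmasked_segments(masked))
```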
23 Overlapper
- ...looks for end-to-end overlaps of at least 40
bp with no more than 6% differences in match.
What's the significance?
...a one in 10^17 event.
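The overlap rule can be sketched as a suffix-prefix test between two reads. This is a simplification: only substitutions are counted, whereas a real overlapper also has to allow insertions and deletions and compare every read against every other read. The 40 bp and 6% defaults come from the slide. The function name is hypothetical.

```python
def best_overlap(a: str, b: str, min_len: int = 40, max_diff: float = 0.06) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b` with at
    most `max_diff` mismatches per base (substitutions only), or 0 if no such
    overlap is at least `min_len` bp long."""
    longest = min(len(a), len(b))
    for length in range(longest, min_len - 1, -1):
        suffix, prefix = a[len(a) - length:], b[:length]
        mismatches = sum(x != y for x, y in zip(suffix, prefix))
        if mismatches <= max_diff * length:
            return length
    return 0

s = "ACG" * 15   # 45 bp shared between the two reads
print(best_overlap("T" * 10 + s, s + "T" * 10))
```

A 45 bp overlap may contain at most 2 substitutions under the 6% rule (0.06 x 45 = 2.7).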
24 Good News
- ...uniquely assembled contigs (unitigs) are
readily identifiable,
- all of the assembled sequences match over all of
the known sequence,
- and they are consistent with 8x sequence coverage.
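One common way to formalize "uniquely assembled" is to treat the overlaps as a directed graph and to chain reads only while every join is unambiguous. The sketch below follows that idea with hypothetical read names. It is not Celera's code, and it leaves out the coverage-consistency check, which is a separate step.

```python
from collections import defaultdict

def unitigs(reads: list[str], overlaps: list[tuple[str, str]]) -> list[list[str]]:
    """Group reads into unitigs: maximal chains where every join is
    unambiguous (left read has exactly one successor, right read exactly
    one predecessor). A branch, e.g. a repeat shared by two loci, ends a chain."""
    succ = defaultdict(list)
    pred = defaultdict(list)
    for a, b in overlaps:          # (a, b): a suffix of a overlaps a prefix of b
        succ[a].append(b)
        pred[b].append(a)

    def unique_join(a: str, b: str) -> bool:
        return len(succ[a]) == 1 and len(pred[b]) == 1

    used = set()

    def walk(start: str) -> list[str]:
        chain = [start]
        used.add(start)
        while len(succ[chain[-1]]) == 1:
            nxt = succ[chain[-1]][0]
            if nxt in used or not unique_join(chain[-1], nxt):
                break
            chain.append(nxt)
            used.add(nxt)
        return chain

    chains = []
    # Pass 1: start at reads with no unambiguous left neighbour.
    for r in reads:
        extends_left = len(pred[r]) == 1 and unique_join(pred[r][0], r)
        if r not in used and not extends_left:
            chains.append(walk(r))
    # Pass 2: anything left lies on a simple cycle; break it at any read.
    for r in reads:
        if r not in used:
            chains.append(walk(r))
    return chains

reads = ["r1", "r2", "r3", "r4", "r5", "r6"]
overlaps = [("r1", "r2"), ("r2", "r3"), ("r3", "r4"), ("r3", "r5"), ("r4", "r6")]
print(unitigs(reads, overlaps))
```

Read r3 overlaps both r4 and r5, so the chain stops there and r4 and r5 start new unitigs.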
25 Unitigs
But(t)
...the Screener doesn't catch all of the
low-frequency repeats, ...so a majority of
the Overlapper outputs turned out to be bogus.
26 What Now?
- over-collapsed assemblies are identified and
broken down into unitigs when possible...
- these too-large contig sets are sent to the
Unitigger/Discriminator.
27 Unitigger
...differentiates between a true overlap and an
overlap that includes more than one locus.
28 Discriminator
29 Discriminator
...may yield U-unitigs.
Unitigger/Discriminator output: correctly
assembled contigs covering 73.6% of the genome.
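"Consistent with 8x coverage" can be made quantitative with a Poisson log-odds score. Myers and colleagues describe an "arrival-rate" A-statistic of this form for the Celera assembler. The code below is only a sketch of that idea, not their implementation. The 1 Mbp genome and 500 bp reads are hypothetical, and no cutoff value is taken from these slides.

```python
import math

def a_statistic(contig_len: float, reads_in_contig: int,
                total_reads: float, genome_len: float) -> float:
    """Log-odds that a contig is single-copy rather than two collapsed copies,
    assuming read starts arrive as a Poisson process at
    rate = total_reads / genome_len per base:
        ln(P(k | unique) / P(k | two copies)) = rate * len - k * ln 2
    Positive favours 'unique'; strongly negative flags an over-collapsed repeat."""
    rate = total_reads / genome_len
    return rate * contig_len - reads_in_contig * math.log(2)

# Hypothetical numbers: 8x coverage with 500 bp reads over a 1 Mbp genome.
genome_len, read_len, coverage = 1_000_000, 500, 8
total_reads = coverage * genome_len / read_len               # 16,000 reads
print(a_statistic(10_000, 160, total_reads, genome_len))     # as many reads as 8x predicts
print(a_statistic(10_000, 320, total_reads, genome_len))     # twice as many: likely collapsed
```

The first contig scores positive and the second scores negative, which is the kind of signal that separates U-unitigs from over-collapsed repeats.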
30 Scaffolder
- ..."contigs" the contigs (orders and orients them
into scaffolds),
- uses mate-pair information: two or more
consistent mate-pair matches yield 1 in 10^10
odds of being chance.
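A minimal sketch of the "two or more consistent mate pairs" rule. Several simplifications are assumptions, not taken from the slides. Contigs are already in forward orientation. Each mate pair that spans two contigs is reduced to a (left, right, gap estimate) link. "Consistent" means within a hypothetical 500 bp of the median gap.

```python
from collections import defaultdict
from statistics import median

def bundle_links(links, min_links=2, tolerance=500):
    """links: (left_contig, right_contig, gap_estimate_bp), one per mate pair
    whose two reads land in different contigs. Keep a join only when at least
    `min_links` pairs agree on the gap to within `tolerance` bp of the median."""
    by_pair = defaultdict(list)
    for left, right, gap in links:
        by_pair[(left, right)].append(gap)
    joins = {}
    for pair, gaps in by_pair.items():
        mid = median(gaps)
        agreeing = [g for g in gaps if abs(g - mid) <= tolerance]
        if len(agreeing) >= min_links:
            joins[pair] = median(agreeing)
    return joins

def scaffolds(contigs, joins):
    """Chain contigs into linear scaffolds, stopping at conflicting joins."""
    right, left = defaultdict(list), defaultdict(list)
    for a, b in joins:
        right[a].append(b)
        left[b].append(a)
    result, used = [], set()
    for c in contigs:
        # Skip contigs that will be reached from an unambiguous left neighbour.
        if c in used or (len(left[c]) == 1 and len(right[left[c][0]]) == 1):
            continue
        chain = [c]
        used.add(c)
        while len(right[chain[-1]]) == 1:
            nxt = right[chain[-1]][0]
            if nxt in used or len(left[nxt]) != 1:
                break
            chain.append(nxt)
            used.add(nxt)
        result.append(chain)
    # Contigs never reached (e.g. on a cycle of links) stay unscaffolded.
    result.extend([c] for c in contigs if c not in used)
    return result

links = [("c1", "c2", 300), ("c1", "c2", 350), ("c2", "c3", 1000),
         ("c2", "c3", 1100), ("c3", "c4", 200)]
joins = bundle_links(links)
print(joins, scaffolds(["c1", "c2", "c3", "c4"], joins))
```

The single c3-c4 link is not enough evidence, so c4 stays on its own.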
31 Repeat Resolver
...most of the remaining gaps were due to repeats.
Rocks: use contig sets with low Discriminator
values to fill gaps,
- find two or more mate pairs with unambiguous
matches in the scaffold near the gap (2 kb, 10 kb,
or 50 kb) (1 in 10^7).
Stones:
- find mate-pair matches 2 kb, 10 kb, and 50 kb
from the gap, place the mate in the gap, and check
whether it's consistent with other placed
sequences.
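The Rocks step can be sketched as a voting rule. Each mate pair anchored uniquely in the scaffold predicts where the candidate contig (the "rock") should start. The rock is placed only if at least two predictions agree and it then fits inside the gap. This version ignores orientation and uses hypothetical names, coordinates, and a 300 bp agreement tolerance.

```python
def predicted_start(anchor_pos: int, insert_size: int, offset_in_rock: int) -> int:
    """Scaffold coordinate where the rock would start if a read anchored at
    `anchor_pos` has its mate `insert_size` bp downstream (e.g. a 2, 10, or
    50 kb library), landing `offset_in_rock` bp into the rock."""
    return anchor_pos + insert_size - offset_in_rock

def place_rock(predicted_starts, rock_len, gap_start, gap_end,
               min_links=2, tolerance=300):
    """Return a consensus start if at least `min_links` predictions agree
    within `tolerance` bp and the rock fits in [gap_start, gap_end); else None."""
    preds = sorted(predicted_starts)
    best = []
    for i, p in enumerate(preds):
        window = [q for q in preds[i:] if q - p <= tolerance]
        if len(window) > len(best):
            best = window
    if len(best) < min_links:
        return None
    start = round(sum(best) / len(best))
    if gap_start <= start and start + rock_len <= gap_end:
        return start
    return None

preds = [predicted_start(4000, 10_000, 2500), predicted_start(9500, 2000, 0), 11_600]
print(place_rock(preds, rock_len=3000, gap_start=10_000, gap_end=16_000))
```

Stones would apply the same kind of check to individual reads, one mate at a time.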
32 If That Doesn't Work
- ...find a mate pair that spans the gap and
sequence it.
Chromosome Walking
33 Weds.
- Questions about WGA,
- CSA,
- Comparisons,
- Quality Control, etc.