Title: Mo17 shotgun project
1. Mo17 shotgun project
- Goal: sequence Mo17 gene space with inexpensive new technologies
- Datasets in progress
- Four phases of 454-FLX sequencing to a maximum of 12X
- Include 3 kb paired-end sequencing (for short-range structural variation)
- Ultra-short-read Solexa or ABI SOLiD (for polishing)
- Preparation of methyl-spanning linkers to augment IBM map integration, detect rearrangements (Sanger end-sequence)
- (Ideally would add Mo17 BAC-ends from DuPont, if available)
2. Shotgun
- Independent of tiling path
- - Can detect non-repetitive gene space even within otherwise complex regions that may not be in the tiling path
- Disadvantages of short reads
- - Can't expect to recover repetitive sequences
3. Four Phases of Sequencing Complete in 2007
- Sequencing contract established with 454/Roche. Four phases, including collaborative runs at no cost in Phases II-IV.
- Phase I underway (30 FLX runs). Library QC and initial assessment of data quality.
- 10 FLX runs totaling 1 Gb (0.4X)
- 20 FLX-pair runs spanning 12 Gb (5X span in 3 kb inserts)
- Assess quality, coverage, contamination, chimerism, accuracy
- Phase II (80 runs plus 30 runs from Roche, total 110 runs). Rough draft stage.
- 40 FLX-pair runs spanning 36 Gb (total 48 Gb, 10X span)
- 70 FLX runs for 7 Gb (total 8 Gb, 3.5X sequence)
- Assess rough draft assembly (3 methods), compare with B73 and sorghum
4. Phases III and IV
- Phase III (50 runs + 20 contributed)
- 20 FLX-pair runs (total spanning cover 20X)
- 50 FLX runs (total 13 Gb sequence, 5.5X)
- Draft assembly. Rough annotation. Assessment of structural variation based on 20X clone cover. Assessment complete by end of 2007.
- Phase IV (60 runs + 30 contributed)
- 90 FLX runs (to reach total 22 Gb, 10X)
- Data collection complete by end of 2007.
- Early 2008: final assembly. Integration with MSSL ends and IBM map. Proceed to annotation and full analysis.
- Note: later phases may use the next FLX release with longer read lengths. To be conservative, sequence from FLX-pair reads is not included in sequence coverage estimates.
- Total sequencing cost for Phases I-IV: 1.6M
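The cumulative sequence totals quoted for Phases I-IV can be reproduced with simple arithmetic. The following is a minimal sketch, not a project tool. It assumes a yield of 0.1 Gb per single-read FLX run (back-calculated from "10 FLX runs totaling 1 Gb") and a genome size of about 2.4 Gb (back-calculated from "12 Gb (5X span)"); neither value is stated directly on the slides, and all names are illustrative.

```python
# Assumptions (not stated on the slides): per-run yield and genome size are
# back-calculated from "10 FLX runs totaling 1 Gb" and "12 Gb (5X span)".
GB_PER_FLX_RUN = 1.0 / 10   # Gb of sequence per single-read FLX run
GENOME_SIZE_GB = 12.0 / 5   # about 2.4 Gb

# Single-read FLX runs added in each phase, as listed on the slides.
# FLX-pair runs are left out, as the slides do for sequence coverage.
FLX_RUNS = [("Phase I", 10), ("Phase II", 70), ("Phase III", 50), ("Phase IV", 90)]


def cumulative_coverage(runs, gb_per_run=GB_PER_FLX_RUN, genome_gb=GENOME_SIZE_GB):
    """Return (phase, cumulative Gb, fold coverage) for each phase in order."""
    total_gb = 0.0
    rows = []
    for phase, n_runs in runs:
        total_gb += n_runs * gb_per_run
        rows.append((phase, total_gb, total_gb / genome_gb))
    return rows


if __name__ == "__main__":
    for phase, gb, fold in cumulative_coverage(FLX_RUNS):
        print(f"{phase}: {gb:.0f} Gb cumulative, about {fold:.1f}X")
```

Under these assumptions the cumulative totals come out at 1, 8, 13, and 22 Gb, matching the slides. The computed fold coverage depends on the assumed genome size, so it only approximates the rounded figures quoted above.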
5. 454-FLX reads are typically either mostly masked, or mostly clean
29% of reads have < 1/4 of positions masked
58% of reads have > 2/3 of positions masked
[Figure: fraction of each read masked by over-represented 16-mers; axis from 0 to 1.0]
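Masking reads by over-represented k-mers can be sketched as follows. This is a minimal illustration, not the project's pipeline: the count threshold `min_count`, the function names, and the toy reads are assumptions, and the demo uses k=4 for brevity where the slides use 16-mers.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def masked_fraction(read, counts, k, min_count):
    """Fraction of positions in `read` covered by a k-mer seen >= min_count times."""
    if not read:
        return 0.0
    masked = [False] * len(read)
    for i in range(len(read) - k + 1):
        if counts[read[i:i + k]] >= min_count:
            for j in range(i, i + k):
                masked[j] = True
    return sum(masked) / len(read)


def classify(fraction):
    """Bin a read using the cut-offs quoted on the slide (< 1/4 and > 2/3)."""
    if fraction < 0.25:
        return "mostly clean"
    if fraction > 2 / 3:
        return "mostly masked"
    return "intermediate"


if __name__ == "__main__":
    # Toy reads: two copies of a simple repeat and one unique sequence.
    reads = ["ACACACACAC", "ACACACACAC", "GATTCGCTAA"]
    counts = count_kmers(reads, k=4)
    for read in reads:
        frac = masked_fraction(read, counts, k=4, min_count=3)
        print(read, round(frac, 2), classify(frac))
```

In this toy input the repeat reads are fully masked and the unique read is untouched, which mirrors the two-peaked pattern the slide describes.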
6. Mo17 454 unique full-length alignments vs. B73 MAGIs show high quality of unique alignments
[Figure labels: residual repeats in MAGIs with multiple hits in 454 data; unique full alignments]
7. SNPs and indels of 454 reads relative to MAGIs are consistent with a few percent variation between Mo17 and B73 (combines variation with sequencing errors)
[Figure: frequency of reads by SNPs or indels per base]
8. Multiple assembly alternate plans
- Divide and conquer
- Reduce 100 million reads to 50K unique gene spaces of thousands of reads each (10 kb) by clustering based on various comparisons
- Plan A: de novo clustering of masked reads
- Plan B: map to B73, assemble (de novo for remainder)
- Plan C: sorghum-assisted
- Use various assemblers to lay out and produce consensus for each cluster (454 assembly team engaged)
- Polish sequence with Solexa or SOLiD for accuracy
- Link with MSSL pairs, integrate with map
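Plan B of the divide-and-conquer step (map to B73, then assemble each cluster separately) can be sketched as a simple binning of read placements. This is a minimal sketch, assuming reads have already been aligned to B73 by some external tool. The tuple layout, the function name, and the use of fixed 10 kb bins (taken from the "(10 kb)" on the slide) are illustrative assumptions, not the project's method.

```python
from collections import defaultdict

BIN_SIZE = 10_000  # assumed cluster width, from the "(10 kb)" on the slide


def cluster_by_reference_bin(placements, bin_size=BIN_SIZE):
    """Group reads into clusters keyed by (reference sequence, bin index).

    `placements` is an iterable of (read_id, ref_name, ref_start) tuples,
    with ref_name and ref_start set to None for reads without a placement.
    Unplaced reads are returned separately for de novo handling.
    """
    clusters = defaultdict(list)
    unplaced = []
    for read_id, ref_name, ref_start in placements:
        if ref_name is None or ref_start is None:
            unplaced.append(read_id)
        else:
            clusters[(ref_name, ref_start // bin_size)].append(read_id)
    return dict(clusters), unplaced


if __name__ == "__main__":
    # Hypothetical placements: three reads on "chr1" and one unplaced read.
    toy = [
        ("r1", "chr1", 1_200),
        ("r2", "chr1", 9_800),
        ("r3", "chr1", 10_050),
        ("r4", None, None),
    ]
    clusters, unplaced = cluster_by_reference_bin(toy)
    print(clusters)
    print("unplaced:", unplaced)
    # Scale check from the slide: 100 million reads over 50K clusters.
    print("reads per cluster:", 100_000_000 // 50_000)
```

Each resulting cluster could then be handed to an assembler independently, and the unplaced reads correspond to the "de novo for remainder" part of Plan B.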
9. Backup analyses vs. B73 reference
- SNP/variation detection by alignment to B73 sequence
- - 454/Solexa/SOLiD (various successful models in other species at JGI, elsewhere)
- Structural variation detection via paired-end placements
- - Needs to be tolerant of chimerism rate
- - Model of successful human structural analysis done with 454 (unpublished)
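Structural variation detection from paired-end placements, with tolerance for chimeric pairs, can be sketched as below. This is an illustration under stated assumptions, not the method used in the project or in the human analysis mentioned above. The 3 kb expected insert comes from the slides, while the tolerance, the minimum number of supporting pairs, the grouping window, and all names are hypothetical.

```python
from collections import defaultdict

EXPECTED_INSERT = 3_000  # 3 kb paired-end library, per the slides
TOLERANCE = 1_000        # hypothetical allowed deviation in bp
MIN_SUPPORT = 2          # hypothetical: a lone discordant pair is treated as a possible chimera
WINDOW = 5_000           # hypothetical window for grouping nearby discordant pairs


def discordant_pairs(pairs, expected=EXPECTED_INSERT, tolerance=TOLERANCE):
    """Yield pairs whose span on the reference deviates from the expected insert.

    `pairs` is an iterable of (pair_id, ref_name, left_pos, right_pos), with
    both mates placed on the same reference sequence.
    """
    for pair_id, ref_name, left, right in pairs:
        span = abs(right - left)
        if abs(span - expected) > tolerance:
            # A span longer than expected suggests sequence missing from the
            # sample relative to the reference; a shorter span suggests extra.
            kind = "deletion-like" if span > expected else "insertion-like"
            yield pair_id, ref_name, min(left, right), kind


def candidate_variants(pairs, min_support=MIN_SUPPORT, window=WINDOW):
    """Keep only discordant signals supported by several pairs in one window."""
    support = defaultdict(list)
    for pair_id, ref_name, start, kind in discordant_pairs(pairs):
        support[(ref_name, start // window, kind)].append(pair_id)
    return {key: ids for key, ids in support.items() if len(ids) >= min_support}


if __name__ == "__main__":
    # Hypothetical placements on "chr1".
    toy = [
        ("p1", "chr1", 1_000, 4_000),     # concordant
        ("p2", "chr1", 20_000, 28_000),   # discordant, supported by p3
        ("p3", "chr1", 20_500, 28_600),
        ("p4", "chr1", 50_000, 90_000),   # lone discordant pair, dropped
    ]
    print(candidate_variants(toy))
```

Requiring more than one supporting pair is one simple way to tolerate a nonzero chimerism rate, since independent chimeric clones are unlikely to agree on the same location.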
10. Timeline
- Phase I in progress, complete by end of month. Analysis to OK Phase II: 10 days.
- Phase II: October
- Phase III: November
- Phase IV: December
- 454 sequencing complete by end of year
11. 58% of each BAC is masked by over-represented 16-mers
12. Outreach: Dick McCombie
13. Types of Outreach
- Public presentations
- Collaborations
- CSHL DNA Learning Center
14. Public Presentations
15. Collaborations
- The Maize Genetics and Genomics Database (MaizeGDB)
- - Letter for Carolyn Lawrence, MaizeGDB
- - MaizeGDB web site text, links to data
- Gramene
- EBI Ensembl
- Affymetrix Maize Pilot Expression Array Project
- Optical map
- TWINSCAN
- Vmatch
- Full-Length cDNA Project
16. CSHL DNA Learning Center
http://www.dnalc.org/maize/maize.html