The Medicago truncatula genome: a progress report

1
The Medicago truncatula genome: a progress report
Dr. Bruce A. Roe
Advanced Center for Genome Technology
Department of Chemistry and Biochemistry
University of Oklahoma
broe_at_ou.edu
www.genome.ou.edu
Plant and Animal Genome, San Diego, January 11, 2004
Photos by Steve Hughes, Genetic Resource Centre (PIRSA-SARDI), Adelaide, Australia.
http://www.fao.org/ag/AGP/AGPC/doc/gallery/pictures/meditrunc/meditrunc.htm
2
Why sequence the Medicago genome?
  • An important forage crop
  • A genetically tractable model legume
  • A relatively small (500 Mbp) diploid genome
  • Active legume research community
  • Medicago Research Consortium
  • Large collection of ESTs
  • Excellent BAC library
  • Integrated physical and genetic map
  • Large number of BAC-end sequences


3
Sequence Pipeline at the University of Oklahoma
Genome Center, OU-ACGT
DNA
GenBank
Sequencing (ABI 3700)
Growing subclones (HiGro™)
Subclone isolation II (VPrep™)
DNA shearing (Hydroshear™)
Data assembly and analysis
Thermocycling (ABI 9700)
Subclone isolation I (Mini-Staccato™)
Colony picking (QPix II™)
Closure
Miscellaneous liquid handling
Primer synthesis
4
Subclone Isolation (Mini-Staccato™)
  • This Zymark robot has a 384-cannula array, four
    built-in shakers, three attached storage racks,
    built-in barcoding and a Twister II robotic arm.
  • This automation has allowed us to perform the DNA
    isolation completely unattended from as many as
    eighty 384-well plates of bacterial cells per
    day.

5
Subclone Isolation (Mini-Staccato™)
  • Once all three solutions have been added, the
    plates are transferred from the SciClone
    workspace deck to a storage rack by the Twister
    II robotic arm.

6
Subclone Isolation and Sequencing Reaction
Pipetting (Velocity 11 VPrep)
  • Liquid handling station with 384-channel pipettor
    head
  • Four movable shelves on either side of the
    pipettor head
  • Used for subclone isolation, sequencing reaction
    set-up and clean-up.

7
Data assembly and Analysis
Phred/Phrap/Consed
Exgap
Sun V880 server
  • 32 GB RAM running Solaris 8 OS and 3 TB of data
    stored on RAID-5 arrays with autoloader tape
    backup
  • Also 12 workstations, each with 1 GB RAM

8
Initial WGS Skimming for 500 Mb Medicago
truncatula genome
  • Collected 25,000 end-sequences from 12,500
    plasmid-based WGS clones.
  • Of these 25,000 sequences, 1,000 have homology
    with Medicago truncatula ESTs.
  • URL: http://www.genome.ou.edu/medicago.html
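
The survey numbers can be related through the usual definition of fold coverage (sequenced bases divided by genome size). The following is a minimal Python sketch of that arithmetic using only figures quoted in this transcript (a 500 Mb genome, 25,000 end-sequences, 1,000 EST hits, and the 0.005-fold coverage given on the assembly slides). The function and constant names are illustrative assumptions, not part of the OU-ACGT pipeline, and no read length is assumed because the slides do not state one.

```python
GENOME_SIZE_BP = 500_000_000   # genome size quoted on this slide
END_SEQUENCES = 25_000         # end-sequences collected
EST_HITS = 1_000               # end-sequences with homology to ESTs
STATED_COVERAGE = 0.005        # fold coverage quoted on the assembly slides


def fold_coverage(total_bases, genome_size):
    """Fold coverage = total sequenced bases / genome size."""
    return total_bases / genome_size


def bases_for_coverage(coverage, genome_size):
    """Total sequenced bases implied by a given fold coverage."""
    return coverage * genome_size


est_fraction = EST_HITS / END_SEQUENCES
implied_bases = bases_for_coverage(STATED_COVERAGE, GENOME_SIZE_BP)

print(f"Fraction of end-sequences with EST homology: {est_fraction:.1%}")
print(f"Bases implied by {STATED_COVERAGE}-fold coverage: {implied_bases:,.0f}")
```

In other words, about one end-sequence in 25 matched a known EST, and 0.005-fold coverage of a 500 Mb genome corresponds to roughly 2.5 Mb of sequence.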

9
Phrap assembly of our Medicago truncatula whole
genome shotgun survey sequencing data at
0.005-fold genomic sequence coverage
10
DotPlot of a Phrap assembled whole genome shotgun
contig showing multiple repeated regions
11
DotPlot of a Phrap assembled whole genome shotgun
contig showing 4 repeated blocks of 600 bases
12
Yet another genomic contig showing extensive
repeated regions
Contig 1931
13
>Contig1931
TTTACGTCCCCGTAGTGAACTATTTCCTAAGTTGACT
AGTCAATTAGGTG ATAGTTCGTCCGGATGACGTACCGCCGTGAACCCGA
TATGAGAATTTCAT GTGGTGCATCCTTCTATGTTTGATAAGGTCATTTT
GAACGGTCGGATTGA ACGTGGCTGGTGTCGTTCACGATAGAGGCACGTT
TAGGTCCCTACGGTGA ACTAGTTCCTAAGTTGACTAGTCAATTAGGTGA
TAGTTTGTCCGGATGAC GTACCTCCGTGAACCCGATCTGAGAAATTCAA
GTTTCTGCATCCTTCTAT GTTTGATAAGGTCATTTTGAACGGTCGGATT
GAAGGTGGCTGGTGTTCTT CACATTCTAGGCACGTTTAGGTTCCCGCGG
TGAACTAGTTCCTAAGTTGA CTAGTCAATTAGGTGATAGTTCGTCCGGA
TGACCTACCTCCGTGAACCCG ATATTAGAAATTCAAGTTTCTGCATCCT
TCTATGTTTGATAAGGTCATTT TGAACGGTCAGATTGAACGTGGCTGGT
GTCGTTCACGATCTAGGCACGTT TAGGTCCCCGCAGTGAACTAGTTCCT
AAGTTGACTAGTCAATTAGGTGAT AGTTTGTCCGGATGACGTGACTCCG
TAAAGCCAGTATGAGAACTTCTAGT TTCTGCATCCTTTTATGTTTGATA
AGGTCATTTTGAACGGTGGGATTGAA CGTTGTTGGTGTCGTTCACGATC
TAGGCACGTTTAGGTCCCCGCAGTGAA CTAGTTCCTTAGTTGACTAGTC
AATTAGGTGATAGTTCGTCCGGATGACG TATCTCCGTCAGCCCGATCTG
AGAAATTCAAATTTCTGCATCCTTCTATG TTTGATAAGGTCATTTTGAA
CGGTCGGATTGAACGTGGCTGGTGTCGTGC ACGATCAAGGCACGTTTAG
GTCCCCGCAGCGAACTAGTTCCTAAGTTGAC TAGTCAATTAGGTGATAC
CTTGTCCGGATGACGTACCTCCGTGAACCCGA TCTGAGAAATTCAAGTT
TCTGCATCCTTCTATGTTTGATAAGGTCATTTT GAACGGTTGGATTGAA
CATGGCTGGTGTCGTTCACGATCTAGGCACGTTT AGGTCCCCGCAGTGA
ACTAGTTCCTAAGTTGACTAGTCAATTAGGTGATA GTTCGTCTGGATGA
CGTACCTCCTTGAACCCAATATGAGAAATTCAATTT TCTTCATCCTTCT
ATGTTTGATAAGGTCATTTTGAACGGTCGGATTGAAC GTGCCTGGTGTC
GTTCACGATCGAGGCACGTTTAGGTCCCCGCAGTGAAC . . .
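
The dot plots on the preceding slides show repeated blocks as diagonals offset from the main diagonal. A rough way to get at the same information without plotting is to tally the distances between recurring k-mers; the most common distance approximates the repeat period. The sketch below is an illustration under that assumption, not the method used to make the slides. The function names, the default `k=12`, and the toy input are all hypothetical.

```python
from collections import Counter, defaultdict


def kmer_offsets(seq, k=12):
    """Tally distances between consecutive occurrences of each k-mer."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    offsets = Counter()
    for hits in positions.values():
        for first, second in zip(hits, hits[1:]):
            offsets[second - first] += 1
    return offsets


def dominant_period(seq, k=12):
    """Return the most frequent repeat spacing, or None if no k-mer recurs."""
    offsets = kmer_offsets(seq, k)
    return offsets.most_common(1)[0][0] if offsets else None


# Toy input: a hypothetical 40-base unit repeated four times (not real data).
unit = "ACGTTGCAAGCTTAGGCATCGATCCGTAAGCTTGACCGTA"
print(dominant_period(unit * 4))
```

Applied to a real contig, exact k-mer matching will miss diverged copies, so a smaller `k` or an alignment-based dot plot would be needed for imperfect repeats such as those in Contig 1931.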
14
Summary of our Medicago truncatula WGS Sequencing
Assembly with only 0.005-fold Genomic Sequence
Coverage
  • The largest contig (21,157 bp) contained the 26S
    rRNA genes
  • 19 smaller contigs (105,455 bp total) were from
    the chloroplast genome
  • The remaining 500 contigs, ranging in size from
    2,000 to 12,000 bp, contained highly repetitive
    DNA that appeared to be unique to Medicago, as it
    had no significant homology in the GenBank database
  • We concluded that a more directed strategy was
    needed

15
Mapped BAC approach in collaboration with Doug
Cook and DJ Kim at U.C. Davis with funding from
the Noble Foundation, Ardmore, OK
16
The first 1000 Medicago truncatula BACs
  • Initially concentrated on BACs with known
    biological markers and in regions of biological
    interest that were supplied to us by the UC Davis
    group.
  • Requests for sequencing specific BACs were
    directed to Doug Cook and DJ Kim at UC Davis, and
    they supplied us with the BACs once these BACs
    had been characterized.
  • Once the BACs were received, we created the
    shotgun libraries, isolated the sequencing
    templates and obtained the working draft sequence,
    followed by closure and finishing.
  • All data was made publicly available in GenBank
    within 24 hours of sequence assembly.

17
UC Davis -------- Oklahoma University
18
(No Transcript)
19
The next 750 Medicago truncatula BACs
  • With recent NSF funding, we will be sequencing
    BACs from chromosomes 1, 4, 6, and 8 with the goal
    of completing the sequence of the euchromatic
    regions of these chromosomes over the next 3
    years.
  • Chromosomes 2 and 7 will be sequenced at TIGR,
    chromosome 3 at The Sanger Institute and
    chromosome 5 at Genoscope.
  • All data will be released immediately as before.

20
(No Transcript)
21
(No Transcript)
22
(No Transcript)
23
(No Transcript)
24
myosin-like protein
Gene density: 1 gene per 10 kb
25
(No Transcript)
26
(No Transcript)
27
Gene Size Distribution (All Sequence Data)
(FgenesH vs. Genscan)
28
Exon Size Distribution (All Sequence Data)
(FgenesH vs. Genscan)
29
Intron Size Distribution (All Sequence Data)
(FgenesH vs. Genscan)
30
Gene Density of the 450 Mb Medicago truncatula genome

                                         FgeneSH         Genscan
Total number of genes                    13,397          11,488
Total length of genes                    30,793,326      51,687,528
Total exon length                        15,794,243      14,400,445
Total number of exons                    59,808          55,792
Total intron length                      14,999,083      37,287,083
Total number of introns                  46,412          44,305
Base Pairs Sequenced                     87,423,457      87,423,457
Gene Space (Gene Length/BP Sequenced)    35%             59%
Gene Density (Genes/200Mb)               30,649          26,281
                                         1 gene/6.5 kb   1 gene/7.6 kb

Arabidopsis: 25,498 protein coding genes
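
The derived rows of this table (gene space, kb per gene, and genes per 200 Mb) follow directly from the gene counts, total gene lengths, and the number of base pairs sequenced. A minimal Python sketch of that arithmetic, using only the figures on this slide; the function and variable names are illustrative assumptions:

```python
BASES_SEQUENCED = 87_423_457

# Gene counts and total gene lengths as quoted on the slide.
predictions = {
    "FgeneSH": {"genes": 13_397, "gene_length": 30_793_326},
    "Genscan": {"genes": 11_488, "gene_length": 51_687_528},
}


def gene_space_percent(gene_length, bases_sequenced):
    """Percentage of sequenced bases that fall inside predicted genes."""
    return 100 * gene_length / bases_sequenced


def kb_per_gene(genes, bases_sequenced):
    """Average spacing between predicted genes, in kb."""
    return bases_sequenced / genes / 1000


def genes_per_region(genes, bases_sequenced, region_bp=200_000_000):
    """Gene count extrapolated to a region of region_bp bases."""
    return genes * region_bp / bases_sequenced


summary = {
    name: {
        "gene_space_percent": gene_space_percent(p["gene_length"], BASES_SEQUENCED),
        "kb_per_gene": kb_per_gene(p["genes"], BASES_SEQUENCED),
        "genes_per_200mb": genes_per_region(p["genes"], BASES_SEQUENCED),
    }
    for name, p in predictions.items()
}

for name, row in summary.items():
    print(name, {key: round(value, 1) for key, value in row.items()})
```

The extrapolation assumes the gene density of the sequenced, mainly gene-rich BACs holds across the whole 200 Mb region, which is the same assumption the slide makes.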
31
Medicago GC Content for 90 Mb of Genomic BAC
Clones Sequenced (mainly from gene rich regions)
32
Metabolic Overview of Medicago 13,396 FgeneSH
predicted genes using the COG Database
33
Metabolic Overview (detailed view) of
Medicago 13,396 FgeneSH predicted genes using the
COG Database
34
Gene Duplication: Three copies of the
phosphoglycerate kinase gene in one BAC
35
Gene Duplication: Three copies of phosphoglycerate kinase in one BAC

AC138448.fg.10 MATKRSVGTLKEAELKGKRVFVRVDLNVPLDDNLNITDDTRIRAAVPTIKYLTGYGAKVILSSHL-----
AC138448.fg.11 MA-KKSVGDLSGAELKGKKVFVRADLNVPLDDNQNITDDTRIRAAIPTIKYLIQNGAKVILSSHL-----
AC138448.fg.8  MATKRSVGTLKEGELKGKRVFVRVDLNVPLDDNLNITDDTRIRAAVPTIKYLTGYGAKVILSSHLEIYKT

AC138448.fg.10 ------------------------------------------GRPKGVTPKYSLKPLVPRLSELLGTQVK
AC138448.fg.11 ------------------------------------------GRPKGVTPKYSLAPLVPRLSELIGIEVI
AC138448.fg.8  EVSVSEYNLAVSEYKLAISDTYRYRIRVRHDSSPFLEYRGSQGRPKGVTPKYSLKPLVPRLSELLETQVK

AC138448.fg.10 IADDSIGEEVEKLVAQIPEGGVLLLENVRFHKEEEKNDPEFAKKLASLADLYVNDAFGTAHRAHASTEGV
AC138448.fg.11 KAEDSIGPEVEKLVASLPDGGVLLLENVRFYKEEEKNDPEHAKKLAALADLYVNDAFGTAHRAHASTEGV
AC138448.fg.8  ISDDCIGEEVEKLVAQIPEGGVLLLENVRFHKEEEKNEPEFAKKLASLADLYVNDAFGTAHRAHASTEGV

AC138448.fg.10 AKYLKPSVAGFLMQKELDYLVGAVSNPKKPFAAIVGGSKVSSKIGVIESLLEKVDILLLGGGMIFTFYKA
AC138448.fg.11 TKYLKPSVAGFLLQKELDYLVGAVSSPKRPFAAIVGGSKVSSKIGVIESLLEKVDILLLGGGMIFTFYKA
AC138448.fg.8  AKYLKPSVAGFLMQKELDYLVGAVSNPKKPFAAIVGGSKVSSKIGVIESLLEKVDILLLGGGMIYTFYKA

AC138448.fg.10 QGYAVGSSLVEEDKLDLATTLIEKAKAKGVSLLLPTDVVIADKFAADANDKIVPASSIPDGWMGLDIGPD
AC138448.fg.11 QGLAVGSSLVEEDKLELATTLIAKAKAKGVSLLLPSDVVIADKFAPDANSQIVPASAIPDGWMGLDIGPD
AC138448.fg.8  QGYSIGSSLVEEDKLDLATSLMEKAKAKGVSLLLPTDVVIADKFSADANDKIVPASSIPDGWMGLDIGPD

AC138448.fg.10 SIKTFNEALDKSQTIIWNGPMGVFEFDKFAAGTEAIAKKLAEVSGKGVTTIIGGGDSVAAVEKVGLADKM
AC138448.fg.11 SIKTFNEALDTTQTIIWNGPMGVFEFDKFAVGTESIAKKLADLSGKGVTTIIGGGDSVAAVEKVGVADVM
AC138448.fg.8  SIKTFNEALDKSQTIIWNGPMGVFEFDKFAAGTEAIAKKLAEVSGKGVTTIIGGGDSVAAVEKVGLADKM

AC138448.fg.10 SHISTGGGASLELLEGKPLPGVLALDDA       401 amino acids
AC138448.fg.11 SHISTGGGASLELLEGKELPGVLALDEATPVAV  405 amino acids, differs at 42 positions
AC138448.fg.8  SHISTGGGASLELLEGKPLPGVLALDDA       448 amino acids, differs at 6 positions
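
Counts such as "differs at N positions" depend on how aligned columns are compared. The Python sketch below shows one plausible convention, tallying mismatched residues and gapped columns separately for two equal-length aligned strings. The slide does not say how its counts of 42 and 6 were derived, so this is an assumption for illustration rather than a reproduction of those numbers; the function name is hypothetical.

```python
def count_differences(aligned_a, aligned_b, gap="-"):
    """Return (mismatches, gap_columns) for two equal-length aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    mismatches = 0
    gap_columns = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            gap_columns += 1      # column where either sequence has a gap
        elif a != b:
            mismatches += 1       # column with two different residues
    return mismatches, gap_columns


# Last alignment block from the slide, trimmed to the 28 columns shared by both.
fg10_tail = "SHISTGGGASLELLEGKPLPGVLALDDA"
fg11_tail = "SHISTGGGASLELLEGKELPGVLALDEA"
print(count_differences(fg10_tail, fg11_tail))
```

Whether gapped columns count as differences changes the totals considerably for fg.8, which carries a long insertion relative to the other two copies.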
36
25 kb region
5 kb region
37
PIP of M. truncatula BAC AC121240 vs. A. thaliana
Chr.2
38
Medicago truncatula Summary and Conclusions
  • Average predicted gene density of 1 gene per 6.5
    to 7.6 kb by FgeneSH and Genscan, respectively.
  • Genome characteristics such as GC content,
    intron/exon size and conserved unique 5' splice
    sites reveal Medicago-specific characteristics
  • The sequence of the Medicago truncatula genome
    shows homology to the sequenced Arabidopsis
    thaliana genome but expansion, rearrangements and
    duplications are evident.

39
Data Release and Preliminary Annotation
  • All our sequence data is available through links
    on our web site to GenBank and on our ftp site at
    URL ftp.genome.ou.edu/medicago
  • Keyword and BLAST searches can be done on our web
    site at URL http://www.genome.ou.edu/medicago.html
  • Additional annotation via the Genome Browser
    database is available on our web site at URL
    http://www.genome.ou.edu/medicago_table.html
  • E-mail suggestions for additional annotation to
    Bruce Roe at broe_at_ou.edu

40
Three Year Plan
  • Obtain the contiguous sequence of the gene-rich
    regions of four of the 8 Medicago truncatula
    chromosomes at OU, with the remaining four being
    completed by our international partners at TIGR,
    Sanger, and Genoscope.
  • This information will serve as a solid foundation
    for anticipated comparative and functional legume
    genomics.

41
(No Transcript)
42
The ACGT Team
43
(No Transcript)
44
Conserved Intron/Exon Boundary Features by a
FELINES Analysis of 181,444 Medicago truncatula
ESTs in GenBank vs Genomic Sequence

             Size Range       Mean Length
Exons        6 - 5,789 nt     268 nt
Introns      20 - 3,921 nt    429 nt

Intron Conserved Splice Site Sequence Elements    Percent
Introns w/ 5' GU                                  99.21
Introns w/ 5' GC                                  0.36
Introns w/ 5' AU                                  0.31
Introns w/ U12 branch sites instead of A12        0.13

Compared to 0.5 - 2.5% in fungi, and 0.5% in mammals, with an EST minimum identity of 90%

S. Drabenstot, D. Kupfer, J. White, D. Dyer, B.
Roe, K. Buchanan and J. Murphy. FELINES: A
Utility for Extracting and Examining EST-Defined
Introns and Exons. Nucleic Acids Research 31(22),
e141 (2003).
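
The percentages above come from classifying each EST-defined intron by the first two bases at its 5' end. The Python sketch below shows that kind of tally on a handful of made-up intron sequences. It is not the FELINES implementation; the function name and the example sequences are hypothetical.

```python
from collections import Counter


def donor_class_percentages(introns):
    """Percentage of introns per 5' dinucleotide, reported in RNA letters."""
    counts = Counter(
        intron[:2].upper().replace("T", "U")   # DNA GT -> RNA GU, AT -> AU
        for intron in introns
        if len(intron) >= 2
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {donor: 100 * n / total for donor, n in counts.items()}


# Hypothetical intron sequences (DNA, 5' to 3'), for illustration only.
example_introns = [
    "GTAAGTTTCTGCATCCTTAG",
    "GTATGCTTGATAAGGTCCAG",
    "GCAAGTATTTGAACGGTTAG",
    "ATATCCTTTAGGTCCCCGAC",
]
print(donor_class_percentages(example_introns))
```

On real data the input would be the intron sequences extracted from EST-to-genome alignments, and the 90% minimum identity filter mentioned above would be applied before counting.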
45
Consensus Logogram of the 5' GU vs the 5' AU Class
of Introns in Medicago truncatula determined by
FELINES