Sharing Genomic Data and Annotations using GFF3 format

1
Sharing Genomic Data and Annotations using GFF3 format
Dina Sulakhe and Natalia Maltsev
Bioinformatics Group, MCS, Argonne National Laboratory
Computation Institute, University of Chicago
2
What are we going to talk about?
  • GFF3 overview
  • GFF3 standards
      • for sharing annotations
      • for cross-referencing the data
  • Extending GFF3
      • adding annotations from public databases
      • adding user annotations
      • sharing and exchanging annotations using
        Web-services
  • GFF3 genomes repository at Argonne
      • downloads
      • Web-services

3
GFF3 overview (Lincoln Stein, 2004)
  • A tab-delimited flat-file representation of
    genomic features
  • The GFF3 format:
      • provides a mechanism for representing the
        hierarchical grouping of genomic features and
        sub-features (via the Parent attribute)
      • separates the ideas of group membership (Parent)
        and feature name/ID (Name, ID)
      • enforces the use of controlled vocabularies by
        imposing constraints on the definitions of
        genomic features (the type column must be a
        Sequence Ontology term or accession)
      • allows a single feature (e.g. an exon) to belong
        to more than one group at a time (several
        comma-separated Parent values)
      • provides an explicit convention for pairwise
        alignments (the Target and Gap attributes)
      • provides an explicit convention for features that
        occupy disjoint regions (several lines sharing
        one ID)
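To make these properties concrete, here is a minimal Python sketch (not the authors' code) that splits each feature line into the nine tab-separated GFF3 columns, percent-decodes the column 9 attributes, and groups features through the reserved ID and Parent tags. The GFF3 fragment is made up for illustration and is not the example from the original slide; it shows an exon shared by two mRNAs (multiple Parent values) and a CDS split over two disjoint regions that share one ID. The function names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import unquote

# Made-up GFF3 fragment for illustration only; it is not the slide's example.
EXAMPLE = """\
##gff-version 3
ctg1\tdemo\tgene\t1000\t9000\t.\t+\t.\tID=gene1;Name=demo%3Bgene
ctg1\tdemo\tmRNA\t1050\t9000\t.\t+\t.\tID=mRNA1;Parent=gene1
ctg1\tdemo\tmRNA\t1050\t9000\t.\t+\t.\tID=mRNA2;Parent=gene1
ctg1\tdemo\texon\t1050\t1500\t.\t+\t.\tID=exon1;Parent=mRNA1,mRNA2
ctg1\tdemo\tCDS\t1201\t1500\t.\t+\t0\tID=cds1;Parent=mRNA1
ctg1\tdemo\tCDS\t3000\t3902\t.\t+\t0\tID=cds1;Parent=mRNA1
"""

COLUMNS = ("seqid", "source", "type", "start", "end",
           "score", "strand", "phase", "attributes")


def parse_attributes(field):
    """Split column 9 into a dict of tag -> list of percent-decoded values."""
    attrs = {}
    if field == ".":
        return attrs
    for pair in field.split(";"):
        if not pair:
            continue
        tag, _, value = pair.partition("=")
        # unquote() decodes %XX escapes and, unlike unquote_plus, leaves '+' alone.
        attrs[unquote(tag)] = [unquote(v) for v in value.split(",")]
    return attrs


def parse_gff3(text):
    """Return one dict per feature line, skipping comments and directives."""
    features = []
    for line in text.splitlines():
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section ends the feature table
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 columns, got {len(cols)}: {line!r}")
        rec = dict(zip(COLUMNS, cols))
        rec["start"], rec["end"] = int(rec["start"]), int(rec["end"])
        rec["attributes"] = parse_attributes(rec["attributes"])
        features.append(rec)
    return features


def build_hierarchy(features):
    """Map parent ID -> child labels, and ID -> every (start, end) it covers."""
    children = defaultdict(set)
    parts = defaultdict(list)  # several lines with one ID = one disjoint feature
    for f in features:
        attrs = f["attributes"]
        fid = attrs.get("ID", [None])[0]
        label = fid or f"{f['type']}:{f['start']}-{f['end']}"
        if fid is not None:
            parts[fid].append((f["start"], f["end"]))
        for parent in attrs.get("Parent", []):
            children[parent].add(label)
    return children, parts
```

For example, `build_hierarchy(parse_gff3(EXAMPLE))` lists `exon1` under both `mRNA1` and `mRNA2` and returns two intervals for `cds1`.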

4
An Example
5
PUMA2/GNARE Systems
  • PUMA2 (http://compbio.mcs.anl.gov/puma2) is an
    interactive integrated environment for
    high-throughput genetic sequence analysis and
    metabolic reconstruction of public genomes, with a
    Grid-based computational backend
  • GNARE (http://compbio.mcs.anl.gov/gnare) is PUMA2
    for the analysis of user-submitted genomes
  • PUMA2 contains:
      • information integrated from over 25 genomic,
        metabolic, structural and taxonomic databases
        (RefSeq, UniProt, iProClass, PDB, KEGG, EMP, CATH,
        NCBI Taxonomy, Phenotypes, etc.)
      • pre-computed analyses of publicly available
        complete and nearly complete genomes (517
        bacterial, 41 archaeal, 24 eukaryotic, 638
        mitochondrial and 2127 viral genomes) in the
        interactive PUMA2 framework
      • automated metabolic reconstructions for 300
        completely sequenced organisms
      • GNARE User Models: a framework for the analysis
        of genomes provided by users (Shewanella
        federation, Apicomplexa genomes, strains of
        B. anthracis, Yersinia, Staphylococcus,
        Haemophilus, etc.)
      • a suite of unique tools for the evolutionary
        analysis of enzymes and metabolic networks
        (Chisel, PhyloBlocks, etc.) developed by our group
      • PUMA2 satellite databases: Pathos (GLRCE
        biodefence), TarGet (MCSG structural biology),
        Sentra (prokaryotic signal transduction),
        SubUnit, Physiological Profiles, MetaGenomes
        (PNNL Hanford Site), etc.

6
GFF3 genomes repository at Argonne
ftp://ftp.mcs.anl.gov/pub/compbio/PUMA2/gff/gff_files/
  • All completely sequenced genomes from RefSeq are
    converted into GFF3 format.
  • GFF3 files for 8419 bacterial, eukaryotic,
    mitochondrial, viral, etc. genomes can be
    downloaded from
    ftp://ftp.mcs.anl.gov/pub/compbio/PUMA2/gff/gff_files/
  • The file names correspond to the NCBI-RefSeq
    accession numbers, e.g.
    ftp://ftp.mcs.anl.gov/pub/compbio/PUMA2/gff/gff_files/NC_006815.gff
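Because each file is named after its RefSeq accession, a download URL can be derived directly from the accession. The Python sketch below does that using the base directory given on this slide; the accession pattern (two capital letters, an underscore, digits, no version suffix) is an assumption based on the `NC_006815` example, the helper names are hypothetical, and the FTP server may no longer be online.

```python
import re
import urllib.request

# Base directory as given on the slide; the server may no longer be reachable.
BASE_URL = "ftp://ftp.mcs.anl.gov/pub/compbio/PUMA2/gff/gff_files/"

# Assumed accession shape, modeled on the slide's NC_006815 example.
ACCESSION_RE = re.compile(r"[A-Z]{2}_\d+")


def gff_url(accession):
    """Build the repository URL for one RefSeq accession (<accession>.gff)."""
    if not ACCESSION_RE.fullmatch(accession):
        raise ValueError(f"not a RefSeq-style accession: {accession!r}")
    return f"{BASE_URL}{accession}.gff"


def fetch_gff(accession, timeout=30):
    """Download one GFF3 file as text; urllib.request handles ftp:// URLs."""
    with urllib.request.urlopen(gff_url(accession), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `fetch_gff("NC_006815")` would return the text of the file named on the slide, provided the server still answers.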

7
Future Plans: Annotations
  • In 2007 we will supplement the genome GFF3
    annotations for public genomes with
      • additional annotations from public databases
        (e.g. NCBI, UniProt, Integr8, GenomeNet, etc.), and
      • annotations from our own analysis tools (e.g.
        Chisel and PUMA2_FP) and from other analysis tools
  • We will also supplement the GFF3 files for RefSeq
    genomes with annotations provided by users via the
    GNARE system
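Supplementing an existing GFF3 file with extra annotations amounts to appending tag=value pairs to column 9 of the matching feature lines. The GFF3 specification reserves attribute tags that start with an uppercase letter and leaves lowercase tags free for applications, and it requires `;`, `=`, `&`, `,`, `%`, tab and newline to be percent-encoded inside attribute values. The Python sketch below follows those rules; the input format (`by_id`, mapping a feature ID to new tags) and the `user_note` tag are hypothetical, not part of PUMA2 or GNARE.

```python
from urllib.parse import unquote

# Percent-encodings GFF3 requires inside column 9 values ('%' itself included).
_ESCAPES = str.maketrans({"%": "%25", ";": "%3B", "=": "%3D", "&": "%26",
                          ",": "%2C", "\t": "%09", "\n": "%0A", "\r": "%0D"})


def escape_value(value):
    return value.translate(_ESCAPES)


def feature_id(attr_field):
    """Return the decoded ID from a column 9 string, or None if absent."""
    for pair in attr_field.split(";"):
        tag, _, value = pair.partition("=")
        if tag == "ID":
            return unquote(value)
    return None


def add_annotations(gff_line, extra):
    """Append values from `extra` (tag -> list of str) to column 9 of one line."""
    cols = gff_line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("not a 9-column GFF3 feature line")
    attrs = {}  # tag -> already-escaped value string, in original order
    if cols[8] != ".":
        for pair in filter(None, cols[8].split(";")):
            tag, _, value = pair.partition("=")
            attrs[tag] = value
    for tag, values in extra.items():
        new = ",".join(escape_value(v) for v in values)
        # Extend an existing tag instead of repeating it on the same line.
        attrs[tag] = f"{attrs[tag]},{new}" if tag in attrs else new
    cols[8] = ";".join(f"{t}={v}" for t, v in attrs.items()) or "."
    return "\t".join(cols)


def merge_user_annotations(lines, by_id):
    """Attach user annotations keyed by feature ID (hypothetical input format)."""
    merged = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9:
            merged.append(line)  # directives, comments and FASTA pass through
            continue
        fid = feature_id(cols[8])
        merged.append(add_annotations(line, by_id[fid]) if fid in by_id else line)
    return merged
```

Keying the merge on the feature ID keeps the original coordinates and hierarchy untouched; only column 9 grows.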

8
Future Plans: Sharing Annotations and
Cross-referencing the Data
  • The GFF3 format can be used by different annotation
    centers to share annotations and cross-references
  • We plan to build services (Web-services and
    Web-interfaces) that allow users to
      • submit and share their annotations via the PUMA2
        GFF3 converter
      • extract public annotations from the PUMA2
        integrated database, as well as user-submitted
        annotations, in GFF3 format
      • customize the GFF3 output (e.g. include only the
        fields of interest to a user, or only information
        from a particular resource)
  • Cross-references to various databases (e.g.
    NCBI-RefSeq, PIR, SwissProt, UniProt, and others)
    will be included as feature data in the GFF3 files
  • We will explore the use of ontologies for extending
    the GFF3 format (we need your advice!)
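GFF3 already defines a reserved `Dbxref` attribute for database cross-references, written as comma-separated `DBTAG:ID` values, so cross-references can travel as ordinary feature data. The Python sketch below adds such references to a feature line and shows two simple customizations in the spirit of this slide: keeping only selected column 9 tags, and keeping only features from chosen sources (column 2). All function names are hypothetical, and the accessions used with it should come from real database records.

```python
# Percent-encodings GFF3 requires inside column 9 values.
_ESCAPES = str.maketrans({"%": "%25", ";": "%3B", "=": "%3D", "&": "%26",
                          ",": "%2C", "\t": "%09", "\n": "%0A", "\r": "%0D"})


def dbxref_value(xrefs):
    """Format {database: [ids]} as comma-separated DBTAG:ID pairs for Dbxref."""
    return ",".join(f"{db}:{acc}".translate(_ESCAPES)
                    for db, ids in xrefs.items() for acc in ids)


def with_dbxrefs(gff_line, xrefs):
    """Add cross-references to a feature line, extending any existing Dbxref."""
    cols = gff_line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("not a 9-column GFF3 feature line")
    new = dbxref_value(xrefs)
    if not new:
        return "\t".join(cols)
    pairs = [] if cols[8] == "." else [p for p in cols[8].split(";") if p]
    for i, pair in enumerate(pairs):
        if pair.startswith("Dbxref="):
            pairs[i] = f"{pair},{new}"
            break
    else:
        pairs.append(f"Dbxref={new}")
    cols[8] = ";".join(pairs)
    return "\t".join(cols)


def select_attributes(gff_line, keep):
    """Keep only the column 9 tags named in `keep` (a user's fields of interest)."""
    cols = gff_line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("not a 9-column GFF3 feature line")
    pairs = [p for p in cols[8].split(";") if p and p.partition("=")[0] in keep]
    cols[8] = ";".join(pairs) or "."
    return "\t".join(cols)


def from_sources(lines, sources):
    """Yield directives/comments plus feature lines whose column 2 is in `sources`."""
    for line in lines:
        cols = line.split("\t")
        if line.startswith("#") or len(cols) != 9 or cols[1] in sources:
            yield line
```

Chaining `from_sources` and `select_attributes` over a file gives a per-user view of the same underlying GFF3 data without maintaining separate copies.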

9
Future Plans: Data Distribution (GFF3 Genomes
Repository at Argonne)
  • All the feature data collected and computed by
    the PUMA2 project for publicly available genomes
    will be distributed in GFF3 format.
  • We will distribute the data through
      • Web-services
      • a Web interface (HTTP)
      • FTP downloads
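As a rough illustration of HTTP distribution, the sketch below serves one GFF3 file per RefSeq accession using only Python's standard `http.server` module. It is not the planned PUMA2 service: the `/gff/<accession>` URL scheme, the local `gff_files` directory, and the plain-text content type are all assumptions made for the example.

```python
import re
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

# Hypothetical layout: one <accession>.gff file per genome under GFF_DIR.
GFF_DIR = Path("gff_files")
ROUTE = re.compile(r"/gff/([A-Z]{2}_\d+)")  # e.g. /gff/NC_006815


def resolve(path, root=GFF_DIR):
    """Map a request path to a GFF3 file, or None if it does not match."""
    m = ROUTE.fullmatch(path)
    return root / f"{m.group(1)}.gff" if m else None


class GFFHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = resolve(self.path)
        if target is None or not target.is_file():
            self.send_error(404, "unknown accession")
            return
        body = target.read_bytes()
        self.send_response(200)
        # Plain text is assumed here as a safe default content type.
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, run `HTTPServer(("localhost", 8000), GFFHandler).serve_forever()` and request `http://localhost:8000/gff/NC_006815`. Accepting only paths that fully match the accession pattern also keeps requests from reaching files outside the data directory.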

10
Acknowledgements
  • Our team
  • Globus: Ian Foster, Mike Wilde, Nika Nefedova,
    Jens Voeckler
  • Condor: Zach Miller, Miron Livny
  • OSG, TeraGrid
  • MCS: Rick Stevens, systems, Susan Coghlan, and
    many others