ChEBI: The story so far

Provided by: paulad7

1
ChEBI The story so far
2
Private Data
Public Data
3
The state of affairs of bioinformatics in 2002
  • Bioinformatics is booming
  • Human Genome sequence rough draft announced June
    2000
  • Free resources and free data

4
A different story for chemoinformatics
  • Private data and private software

5
Too hard to solve? Let's put our heads in the sand
6
Bioinformatics data too large to keep track of
chemical compounds
  • 100,000 protein entries in Swiss-Prot (2002)
  • 20 million entries in the EMBL Database (2002)
  • Small databases unable to keep track
  • ENZYME resource: 3,500 enzymatic reactions

7
New initiatives start up
  • PubChem: chemical repository with millions of entries,
    focused on screening assays
  • ChEBI: manually annotated database serving as a
    nomenclature reference and compound database, with
    tens of thousands of entries

8
Principles of foundation
  • December 2002 email exchanges within the EBI to
    address the issue of chemistry
  • Three principles outlined

9
  • Nothing held in the database may be
    proprietary or derived from a proprietary source
    that would limit its free distribution/availability
    to anyone.

10
Every data item in the database should be fully
traceable and explicitly referenced to the
original source/version.
11
Although the EBI will provide a web interface,
the entirety of the data should be available to
all without constraint as, for example, SQL table
dumps, ASCII tables, and XML (e.g. DAML+OIL)
12
We make a start using existing resources
  • Integrate three resources: KEGG Compound, IntEnz
    and the Chemical Ontology
  • Annotation starts summer 2003
  • Focus on nomenclature

13
Our first release was modest, but it was a start
  • 21 July 2004
  • 2,783 annotated entities
  • Data: ChEBI name, ChEBI ID, IUPAC names,
    synonyms, formula, cross-references

14
We introduce structures - Sep 2005
  • Molfiles
  • InChI (IUPAC International Chemical Identifier)
  • SMILES (Simplified Molecular Input Line Entry
    System)
  • Image (PNG)

15
Marvin in ChEBI
16
We start editing the chemical ontology - Dec 2005
17
Web Services - Oct 2006
  • Programmatic access to a ChEBI entry
  • SOAP-based Java implementation
  • Clients currently available in Java and Perl
  • Four methods for accessing data: getLiteEntity,
    getCompleteEntity, getOntologyParents,
    getOntologyChildren
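getOntologyParents and getOntologyChildren return only an entry's direct neighbours in the ontology, so a client that wants every ancestor has to walk the graph itself. A minimal sketch, assuming a hypothetical `get_parents` callable standing in for a getOntologyParents-style call (the real SOAP method's signature and return type are not described on the slide), with a toy in-memory ontology and made-up IDs in place of the service:

```python
from collections import deque
from typing import Callable, Iterable


def all_ancestors(entry_id: str, get_parents: Callable[[str], Iterable[str]]) -> set[str]:
    """Breadth-first walk up the ontology, collecting every ancestor once.

    `get_parents` is a hypothetical stand-in for a getOntologyParents-style
    call: given one entry ID, it returns the IDs of its direct parents.
    """
    seen: set[str] = set()
    queue = deque([entry_id])
    while queue:
        current = queue.popleft()
        for parent in get_parents(current):
            # Entries can share ancestors, so skip parents already visited.
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Toy ontology with hypothetical IDs, used in place of the web service.
toy_parents = {
    "X:3": ["X:2", "X:4"],
    "X:2": ["X:1"],
    "X:4": ["X:1"],
    "X:1": [],
}

print(sorted(all_ancestors("X:3", lambda i: toy_parents.get(i, []))))
# ['X:1', 'X:2', 'X:4']
```

The `seen` set means a shared ancestor (here `X:1`, reachable through two parents) is queried and reported only once, which matters when each lookup is a remote call.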

18
Automated Cross-References - Aug 2007
Current databases: UniProtKB, Reactome,
BioModels, IntAct, SABIO-RK, PubChem and
ArrayExpress
19
Chemical Structure Searching - May 2008
20
After all this, where are we?
21
(No Transcript)
22
(No Transcript)
23
Annotation is linear
24
Diversity of users
  • Constant challenge of balancing our users' varied
    interests.

25
Our positives
  • Nomenclature database
  • Manually annotated data
  • Attention to detail
  • Free and accessible
  • Loyal users

26
Our not-so-positives
  • Size is an issue for some people
  • Not well integrated into other bioinformatics
    resources
  • Limited community interaction
  • No software publicly available to manipulate the
    database

27
Involve the community
  • Create a web-based submission tool
  • Users can easily submit their entities one at a
    time
  • Also allow bulk submission from other
    resources

28
Improvements to data depth
  • Addition of more cross-references (PDB, MACIE?)
  • Addition of more chemical attributes? Which
    chemical attributes?
  • Text-mining projects to extract relevant chemical
    information from patents and journals
  • European Patent Office

29
Going Open Source
  • Commercial software packages will be replaced
    with open-source alternatives
  • Long-term goal: allow people to create a free
    local installation of ChEBI
  • Distribution of data in useful formats such as
    CML and SDF

30
Acknowledgements
  • IntEnz Team: Rafael Alcántara, Volker Ast,
    Kristian Axelsen, Anne Morgat
  • EPO Collaborators: Hélène Courrier, Stephane Nauche,
    Jeremy Parsons
  • Database supporters: ArrayExpress, IntAct, Reactome,
    SABIO-RK, RSC, GO, RESID, etc.
  • ChEBI Team: Paula de Matos, Kirill Degtyarenko,
    Marcus Ennis, Janna Hastings, Christoph Steinbeck
  • Alumni: Michael Darsow, Mickael Guedj, Alan McNaught,
    Martin Zbinden
  • ChEBI supporters: Rolf Apweiler, Michael Ashburner,
    Henning Hermjakob, Janet Thornton

31
Requirements for submitting data to ChEBI
  • Disclaimer: this is only a summary of a chat I
    had with the ChEBI coordinator last night,
    so no promises!
  • Information needed to submit a compound:
    structure; name and synonyms; registry number(s);
    database accession(s); mapping to the ChEBI ontology
  • ChEBI is currently quite busy with ongoing projects
    but would consider taking submissions.
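The submission checklist maps naturally onto a simple record that can be checked before anything is sent. A minimal sketch, assuming a hypothetical `CompoundSubmission` class and `missing_fields` helper (ChEBI's actual submission format is not described here), and treating synonyms as optional, which is also an assumption:

```python
from dataclasses import dataclass, field


@dataclass
class CompoundSubmission:
    """Hypothetical record holding the fields listed on the slide."""
    structure: str = ""  # e.g. a molfile or SMILES string
    name: str = ""
    synonyms: list[str] = field(default_factory=list)
    registry: str = ""  # registry number(s)
    database_accessions: list[str] = field(default_factory=list)
    ontology_parents: list[str] = field(default_factory=list)  # mapping to the ChEBI ontology


def missing_fields(sub: CompoundSubmission) -> list[str]:
    """Return the names of required fields that are still empty."""
    # Assumption: every listed item except synonyms is mandatory.
    required = ["structure", "name", "registry", "database_accessions", "ontology_parents"]
    return [name for name in required if not getattr(sub, name)]


draft = CompoundSubmission(structure="CCO", name="ethanol")
print(missing_fields(draft))
# ['registry', 'database_accessions', 'ontology_parents']
```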

32
What could be done within APO-SYS
  • From Pekka's talk, I gathered that there are
    about 5,000 to 10,000 compounds in these siRNA
    libraries.
  • Question: who else is dealing with compounds in
    APO-SYS?
  • One could use ChEBI's web service with InChI
    to identify which compounds are already in the
    database.
  • ChEBI can do targeted curation, provided there is
    funding for the curation team.
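The InChI-based check amounts to splitting a compound library into entries ChEBI already has and entries that would need curation. A minimal sketch, assuming a hypothetical `is_in_chebi` callable standing in for an InChI lookup against ChEBI's web service (no real endpoint or method signature is implied); a toy local set of InChIs replaces the service here:

```python
from typing import Callable, Mapping


def split_by_presence(
    library: Mapping[str, str],
    is_in_chebi: Callable[[str], bool],
) -> tuple[list[str], list[str]]:
    """Split compound names into (already in ChEBI, candidates for curation).

    `library` maps a compound name to its standard InChI string.
    `is_in_chebi` is a hypothetical stand-in for an InChI lookup
    against ChEBI's web service.
    """
    present: list[str] = []
    missing: list[str] = []
    for name, inchi in library.items():
        if is_in_chebi(inchi):
            present.append(name)
        else:
            missing.append(name)
    return present, missing


# Toy data: a local set plays the role of the remote lookup.
known_inchis = {"InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"}  # ethanol
library = {
    "ethanol": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
    "methanol": "InChI=1S/CH4O/c1-2/h2H,1H3",
}

present, missing = split_by_presence(library, known_inchis.__contains__)
print(present, missing)  # ['ethanol'] ['methanol']
```

Passing the lookup in as a function keeps the splitting logic independent of how the query is actually made, so the same code works against a local data dump or a remote service.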
