Title: July 2005 Henning Hermjakob
1 Proteomics Services team: Standards, Data and Tools for Proteomics
July 2005, Henning Hermjakob
2
- Proteomics 2005
- Proteomics results are perfectly compatible, but only if they come from the same lab and the same software
- Fragmentation of proteomics data
- Publish and vanish
- Urgent need for standardisation
- Engineering 1850
- Nuts and bolts fit perfectly together, but only if they originate from the same factory
- Standardisation proposal in 1864 by William Sellers
- It was not generally accepted until after WWII, though
3 HUPO Proteomics Standards Initiative
- Develop data format standards
- Data representation and annotation standards
- Involve data producers, database providers, software producers, publishers
4 PSI work groups
PSI-GPS: General Proteomics Schema
PSI-PTM
PSI-MI: Molecular Interactions
PSI-MS: Mass Spectrometry
5 MGED collaboration
FuGE: Functional Genomics Experiment model
MGED: MIAME, MAGE-OM (microarray standard)
PSI-GPS: General Proteomics Schema
PSI-PTM
PSI-MI: Molecular Interactions
PSI-MS: Mass Spectrometry
6 PSI work groups: MI
FuGE: Functional Genomics Experiment model
MGED: MIAME, MAGE-OM (microarray standard)
PSI-GPS: General Proteomics Schema
PSI-PTM
PSI-MI: Molecular Interactions
PSI-MS: Mass Spectrometry
7 PSI-MI XML format
- Community standard for Molecular Interactions
- XML schema and detailed controlled vocabularies
- Exchange format, not internal format
- Jointly developed by major data providers: BIND, CellZome, DIP, GSK, HPRD, Hybrigenics, IntAct, MINT, MIPS, Serono, U. Bielefeld, U. Bordeaux, U. Cambridge, and others
- Version 1.0 published in February 2004: The HUPO PSI Molecular Interaction Format - A community standard for the representation of protein interaction data. Henning Hermjakob et al., Nature Biotechnology 2004, 22, 177-183.
8 PSI-MI 2.5
- Currently in beta stage
- To be released for HUPO conference Munich, August 2005
- More interactor types: DNA, RNA, small molecules
- More annotation detail
- Better (protein) identifier handling
9 PSI-MI XML benefits
- Collecting and combining data from different sources has become easier
- Standardized annotation through PSI-MI ontologies
- Tools from different organizations can be chained, e.g. analysis of IntAct data in Cytoscape
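The benefit of combining data from different sources can be illustrated with a minimal sketch. It assumes each source has already been parsed into simple records; the `Interaction` class, the accession strings, and the method labels are hypothetical illustrations and do not reproduce the PSI-MI schema or its controlled vocabularies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """Hypothetical, simplified record; not the PSI-MI schema."""
    interactor_a: str
    interactor_b: str
    detection_method: str
    source: str


def interaction_key(rec):
    # Treat A-B and B-A as the same undirected pair.
    pair = tuple(sorted((rec.interactor_a, rec.interactor_b)))
    return (pair, rec.detection_method)


def merge_sources(*sources):
    """Map each distinct interaction to the set of sources reporting it."""
    merged = {}
    for records in sources:
        for rec in records:
            merged.setdefault(interaction_key(rec), set()).add(rec.source)
    return merged


# Made-up example records from two hypothetical providers.
source_one = [Interaction("P00001", "P00002", "method x", "source one")]
source_two = [
    Interaction("P00002", "P00001", "method x", "source two"),
    Interaction("P00001", "P00003", "method y", "source two"),
]

merged = merge_sources(source_one, source_two)
for (pair, method), reported_by in sorted(merged.items()):
    print(pair, method, sorted(reported_by))
```

Because both sources use the same identifiers and the same annotation terms, the merge reduces to a dictionary lookup; without a shared format each source would need its own mapping step first.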
10 IntAct project
- EU framework 5, 2002-2004
- Coordinated by EBI
- 8 partners across Europe
- Production mode since summer 2003
- Open source code, public data
- http://www.ebi.ac.uk/intact
- To be continued
Partners: SDU, EBI, GSK, MPI, SIB, UBX, CNB, HUJI
19 Data exchange and collaborations
- UniProt: >10,000 cross-references and CC blocks
- Reactome: import and maintenance of complexes (in progress)
- MSD: import of structural complexes, exchange of interaction domain data (in progress)
- MINT, Rome: joint curation and development
- UniProt, GO: joint curation
20 IntAct statistics
21 The IMEx consortium
- International Molecular-Interaction Exchange consortium
- BIND, DIP, IntAct, MINT, MIPS will establish an exchange of user-submitted data in PSI-MI format from the beginning of 2005 onwards to provide a network of stable, comprehensive resources for molecular interaction data
- Aims: consistent body of public data, avoid redundant curation
22 PSI work groups: MS
FG-OM: Functional Genomics Experiment model
MGED: MIAME, MAGE-OM (microarray standard)
PSI-GPS: General Proteomics Schema
PSI-MI: Molecular Interactions
PSI-MS: Mass Spectrometry
23 PSI-MS based data flow
Diagram elements: mass spectrometer A, proprietary format, converter, mzData, search engine A, mzIdent, mass spectrometer B, search engine B, public repository
24 Mass spectrometry: mzData
- mzData format as common instrument output format
- Format beta version accepted in Nice, April 2004
- EBI workshop July 2004
- Version 1.05 released January 4, 2005
- Controlled vocabularies developed jointly with ASTM
- Key concept: request direct vendor support to avoid version problems due to vendor API changes
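The idea of a common instrument output format can be sketched as follows. This is only an illustration under stated assumptions: the two "vendor" text layouts, the `Spectrum` record, and all function names are invented here, and the record is far simpler than the real mzData schema.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Spectrum:
    """Hypothetical common record; much simpler than the real mzData schema."""
    spectrum_id: str
    peaks: List[Tuple[float, float]]  # (m/z, intensity)


def parse_vendor_a(text: str) -> Spectrum:
    # Assumed layout: first line is the id, then "mz<TAB>intensity" per line.
    lines = text.strip().splitlines()
    peaks = [tuple(float(x) for x in line.split("\t")) for line in lines[1:]]
    return Spectrum(lines[0], peaks)


def parse_vendor_b(text: str) -> Spectrum:
    # Assumed layout: "id;mz:intensity,mz:intensity,..."
    spectrum_id, body = text.strip().split(";", 1)
    peaks = [tuple(float(x) for x in pair.split(":")) for pair in body.split(",")]
    return Spectrum(spectrum_id, peaks)


# One converter per proprietary layout; everything downstream sees Spectrum only.
CONVERTERS: Dict[str, Callable[[str], Spectrum]] = {
    "vendor_a": parse_vendor_a,
    "vendor_b": parse_vendor_b,
}


def to_common(vendor: str, raw: str) -> Spectrum:
    return CONVERTERS[vendor](raw)


def base_peak(spectrum: Spectrum) -> Tuple[float, float]:
    """Downstream tool written once against the common record."""
    return max(spectrum.peaks, key=lambda peak: peak[1])


spec_a = to_common("vendor_a", "scan1\n100.5\t20\n200.25\t80")
spec_b = to_common("vendor_b", "scan2;100.5:20,200.25:80")
print(base_peak(spec_a), base_peak(spec_b))
```

Each converter in this sketch depends on a layout that a vendor could change at any time, which is the version problem the slide mentions; output written directly by the vendor in the common format would make the converter table unnecessary.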
25 Announced mzData support
- SIB Aldente (next release)
- GeneBio Phenyx (current release)
- Matrix Science Mascot (in release 2.1)
- Bruker Daltonics (next release)
- Kratos (next release)
- Thermo Electron (next release)
- Agilent, ABI, ... (exploratory phase)
- Proteome Systems Ltd. (converter)
- X! Tandem (current release)
26 PSI-MS based data flow
Diagram elements: mass spectrometer A, proprietary format, converter, mzData, search engine A, mzIdent, mass spectrometer B, search engine B, public repository
27 Mass spectrometry: mzIdent
- mzIdent format as common search engine output format
- Suggested in Nice, April 2004
- Aim: facilitate comparison and archiving of search engine output, in particular in comparative projects like the HUPO PPP
- Current beta version released October 2004
- Update in workshop at ISB, Seattle, July 2005
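Once search engine output is available in one common form, comparing engines becomes a set operation. The sketch below assumes each engine's results have already been reduced to (spectrum id, peptide) pairs; the function name, the pair representation, and the example values are hypothetical and not part of mzIdent.

```python
def compare_identifications(engine_a, engine_b):
    """Split (spectrum id, peptide) pairs into shared and engine-specific sets."""
    a, b = set(engine_a), set(engine_b)
    return {"both": a & b, "only_a": a - b, "only_b": b - a}


# Made-up identifications from two hypothetical search engines.
engine_a_hits = [("scan1", "PEPTIDEK"), ("scan2", "SAMPLER")]
engine_b_hits = [
    ("scan1", "PEPTIDEK"),
    ("scan2", "SIMPLER"),
    ("scan3", "ANOTHERK"),
]

report = compare_identifications(engine_a_hits, engine_b_hits)
for category in ("both", "only_a", "only_b"):
    print(category, sorted(report[category]))
```

A real comparison would also need scores and thresholds per engine; the point here is only that a shared output format removes the per-engine parsing step.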
28 PSI-MS based data flow
Diagram elements: mass spectrometer A, proprietary format, converter, mzData, search engine A, mzIdent, mass spectrometer B, search engine B, public repository
29 PRIDE
- Proteomics IDEntification database
- Collaboration with U. Gent (Lennart Martens)
- Status: beta today, production tomorrow :-)
- Public repository for HUPO Plasma Proteome Project
- Implements mzData
- http://www.ebi.ac.uk/pride
31 PRIDE collaborations
- HUPO Plasma Proteome Project, ca. 10,000 IDs
- HUPO Brain Proteome Project
- HUPO Liver Proteome Project ??
- CellMapBase, U. Montreal
- U. Gent
32 Medium term vision
- Collaborate with regional or project centers for data collection and analysis
- Establish data exchange and collaboration similar to PSI-MI/IMEx between PeptideAtlas, GPMDB, PRIDE, ...
- Provide a set of compatible, synchronized, public resources for protein identification data
33 PSI work groups: GPS
FuGE: Functional Genomics Experiment model
MGED: MIAME, MAGE-OM (microarray standard)
PSI-GPS: General Proteomics Schema
PSI-MI: Molecular Interactions
PSI-MS: Mass Spectrometry
34 General Proteomics Standards (GPS)
- Capture all relevant aspects of a proteomics experiment
- Iterative, long-term development
- Building on PEDRo work
- Taylor et al. A systematic approach to modeling, capturing, and disseminating proteomics experimental data. Nat Biotechnol. 2003 Mar;21(3):247-54.
35 GPS principles
- Modular approach, technology by technology, with shared common components
- Each domain covered by: guidelines (MIAME-style), model (MAGE-style), ontologies
- First documents coming up now (Gels)
36 DAS: Distributed Annotation System
- Simple, lightweight protocol for annotation of sequences
- Sequence from reference server
- Feature annotation from local annotation servers
- Client-side data integration and visualisation
- Tried and tested
- Broad array of software available
- Used in: BioSapiens, Transfog
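The division of labour described above (sequence from one reference server, features from several annotation servers, integration on the client) can be sketched without any network access. Everything in this example is an assumption for illustration: the `Feature` record, the server names, and the sequence are made up, and no real DAS request or response format is reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical feature record; positions are 1-based and inclusive."""
    start: int
    end: int
    label: str
    server: str


def integrate(sequence, *feature_sets):
    """Client-side step: pool features from all servers onto one sequence."""
    merged = []
    for features in feature_sets:
        for feature in features:
            # Drop features that do not fit the reference sequence.
            if 1 <= feature.start <= feature.end <= len(sequence):
                merged.append(feature)
    return sorted(merged, key=lambda f: (f.start, f.end, f.server))


def render(sequence, features):
    """Plain-text view: one track line per feature under the sequence."""
    lines = [sequence]
    for f in features:
        track = (
            "." * (f.start - 1)
            + "#" * (f.end - f.start + 1)
            + "." * (len(sequence) - f.end)
        )
        lines.append(f"{track} {f.label} [{f.server}]")
    return "\n".join(lines)


# Stand-ins for what a reference server and two annotation servers would return.
reference_sequence = "MKTAYIAKQR"
server_a_features = [Feature(1, 3, "signal", "server_a")]
server_b_features = [
    Feature(5, 8, "domain", "server_b"),
    Feature(9, 12, "out of range", "server_b"),
]

features = integrate(reference_sequence, server_a_features, server_b_features)
view = render(reference_sequence, features)
print(view)
```

The annotation servers never need to know about each other in this arrangement; only the client combines their output, which is what keeps each server simple.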
37 Available DAS infrastructure
- UniProt reference server
- UniProt annotation servers
- Many third party annotation servers, though currently focussed on human
- Open source packages for local installation of annotation servers
- Dasty client
39 Biochemical nomenclature: IntEnz
- Integrated relational Enzyme database
- Integration of a number of enzyme resources in a relational database
- Facilitate the searching and curation of data via a relational database as opposed to HTML/flat files
40 Biochemical nomenclature: ChEBI
- Chemical Entities of Biological Interest
- Provide a standard of biochemical compounds as a reference for other databases
- Integration of existing sources
- Instant reference for non-chemists
- Xrefs to/from Reactome
41 ChEBI data integration
Diagram elements: web interface, ChEBI, data dumps, chemical ontology by Michael Ashburner, manual curation of ChEBI data only, etc.
42 Proteomics Services Team: Summary
- Development of standards for proteomics: HUPO Proteomics Standards Initiative
- Implementation of repositories: IntAct, PRIDE
- International data exchange: Protein DAS, IMEx
- Biochemical reference: ChEBI, IntEnz
43 Acknowledgements
- ChEBI, IntEnz
- Paula de Matos
- Kirill Degtyarenko
- Markus Ennis
- Martin Zbinden
- Astrid Fleischmann
- EU Temblor grant QLRI-CT-2001-00015
- EU BioSapiens grant LHSG-CT-2003-503265
- YOU!
- HUPO analysis
- Michael Mueller
- PSI
- All participants
- Luisa Montecchi-Palazzi
- Chris Taylor
- Weimin Zhu
- Sandra Orchard
- Gary Bader, MSKCC
- Lukasz Salvinski, UCLA
- Randall Julian, Lilly
- Rolf Apweiler
- IntAct
- Samuel Kerrien
- Sugath Mudali
- Catherine Leroy
- Jyoti Khadake
- Karine Robbe
- Dave Thorneycroft
- Xavier Brochet
- Alexandre Liban
- Rafael Gimenez
- UniProt and GOA teams!
- Pride
- Phil Jones
- Richard Cote
- Lennart Martens, U Gent
- William Derache
- DAS
- Nisha Vinod
44 Resources
- http://www.ebi.ac.uk/intact
- http://www.ebi.ac.uk/pride
- http://www.ebi.ac.uk/dasty
- http://www.ebi.ac.uk/chebi
- http://www.ebi.ac.uk/intenz
- http://psidev.sf.net
- IntAct - an open source molecular interaction database. H. Hermjakob et al. Nucl. Acids Res. 2004, 32: D452-D455.
- The HUPO PSI Molecular Interaction Format - A community standard for the representation of protein interaction data. H. Hermjakob et al. Nature Biotechnology 2004, 22: 177-183.
- Dasty and UniProt DAS: a perfect pair for protein feature visualization. Jones P, Vinod N, et al. Bioinformatics. 2005 Jul 15;21(14):3198-9.