Title: Standards for Proteomics Data
XML Standards for Proteomics Data
- Andrew Jones, Dr Jonathan Wastling and Dr Ela Hunt
- Department of Computing Science and the Institute of Biomedical and Life Sciences, University of Glasgow
Proteomics
1. 2D-PAGE to separate proteins
2. Image analysis to determine the volume of protein spots
3. Mass spectrometry (MS) to characterise protein spots
4. Database searches to identify proteins
[Figure: workflow 2D-PAGE → Image Analysis → Mass Spectrometry → Database Search]
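Steps 3 and 4 are often combined in peptide mass fingerprinting: the protein in a spot is digested, the peptide masses are measured by MS, and those masses are compared against peptides predicted from database sequences. The slides do not say which search method the group uses, so the following is only a minimal sketch under stated assumptions: trypsin cleavage after K or R (not before P) with no missed cleavages, no modifications, singly protonated [M+H]+ ions, a fixed tolerance in Da, and a plain count of matched peaks as the score. All function names are hypothetical, and real search engines use probability-based scoring rather than raw counts.

```python
"""Minimal peptide mass fingerprint (PMF) matcher: an illustrative sketch only."""

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # one water per peptide (free N- and C-termini)
PROTON = 1.00728   # assumption: singly protonated ions, [M+H]+


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_is_pro = i + 1 < len(sequence) and sequence[i + 1] == "P"
        if residue in "KR" and not next_is_pro:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide without K/R
    return peptides


def mh_mass(peptide):
    """Theoretical monoisotopic [M+H]+ mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER + PROTON


def count_matches(observed, sequence, tol=0.5):
    """Number of observed peak masses within tol Da of any theoretical peptide."""
    theoretical = [mh_mass(p) for p in tryptic_digest(sequence)]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in observed)


def rank_proteins(observed, database, tol=0.5):
    """Rank entries of a hypothetical {name: sequence} database by matched peaks."""
    scores = {name: count_matches(observed, seq, tol) for name, seq in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

For example, `rank_proteins([mh_mass("AK")], {"p1": "AKPGRAK", "p2": "GGGGG"})` ranks `p1` first with one matched peak, because the K in `AKP` is not cleaved and `AK` is produced only at the C-terminus.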
Proteomics Data Issues
- Many different instruments are used for data collection
- A great variety of software is used for analysis
- Access to external databases is needed
  - for protein identification
  - for protein characterisation after identification
- High-throughput techniques generate very large data sets
[Figure: data sources: Instruments (scanner, MS); Software (image analysis, MS viewer); Databases (genome, microarray, publications, more...)]
A Standard Model for Proteomics
- Improve management of laboratory workflows
- Data integration: link local data to external data sources
- Development of public databases, enabling:
  - queries over protocols, raw data and analysis
  - experiments to be reproduced or re-analysed by other research groups
  - co-analysis of proteome data with genome, transcriptome and other resources
Biological Collaborators
- Parasitology research group
  - investigating the host-parasite response with Toxoplasma gondii
- Ras/Raf pathway research at the Beatson Institute
- Functional Genomics Facility at the IBLS
Functional Genomics Facility: http://www.gla.ac.uk/departments/ibls/ASU/fgf/
MAGE Model for Proteomics
- The MAGE model has been developed to store microarray protocols, data and analysis
- A similar model for proteomics will facilitate integration between microarray and proteome data
- Some aspects of the model require only minor modifications to be applicable to proteomics
- We are developing a new representation of 2D gel analysis and MS data
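As an illustration of what an XML representation of gel analysis and MS data might look like, the sketch below builds one document with Python's standard ElementTree module. The element names Experiment, gelImage, spots and spot are taken from the XML dictionary in the indexing figure later in this talk; volume, msData, peak and all attributes are hypothetical placeholders, not the group's actual schema.

```python
import xml.etree.ElementTree as ET


def build_gel_document(gel_id, spots):
    """Build a hypothetical XML record for one 2D gel and its analysed spots.

    spots: list of dicts with keys "id", "x", "y", "volume" and an optional
    "peaks" list of m/z values measured by MS for that spot.
    """
    root = ET.Element("Experiment")
    gel = ET.SubElement(root, "gelImage", id=gel_id)
    spots_el = ET.SubElement(gel, "spots")
    for s in spots:
        # Gel coordinates and volume come from image analysis.
        spot = ET.SubElement(spots_el, "spot", id=s["id"], x=str(s["x"]), y=str(s["y"]))
        ET.SubElement(spot, "volume").text = str(s["volume"])
        # MS peaks are attached only to spots that were characterised by MS.
        peaks = s.get("peaks")
        if peaks:
            ms = ET.SubElement(spot, "msData")
            for mz in peaks:
                ET.SubElement(ms, "peak", mz=str(mz))
    return root
```

Calling `ET.tostring(doc, encoding="unicode")` on the returned element gives the serialised XML, which could then be stored or validated against a schema. Nesting MS data under the spot it came from keeps the link between image analysis and mass spectrometry explicit.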
Experimental Protocols in MAGE
- The MAGE model is extensible
- A protocol is represented as an ordered list of events, materials and hardware
- Few changes are required to focus on protein extraction rather than mRNA production
[Figure: MAGE protocol packages: Protocol, ArrayDesign, BioEvent, BioMaterial, BioAssay, Array]
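The idea of a protocol as an ordered list of events, each with its materials and hardware, can be sketched with Python dataclasses. This is not the MAGE object model itself; the class names, fields and example steps below are hypothetical. The sketch shows why moving from mRNA production to protein extraction needs few changes: the structure stays the same and only the content of the steps differs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProtocolStep:
    """One event in a protocol, with the materials and hardware it uses."""
    action: str
    materials: List[str] = field(default_factory=list)
    hardware: List[str] = field(default_factory=list)


@dataclass
class Protocol:
    """A protocol as an ordered list of steps; list order is execution order."""
    name: str
    steps: List[ProtocolStep] = field(default_factory=list)

    def add_step(self, action, materials=(), hardware=()):
        """Append a step and return self so that calls can be chained."""
        self.steps.append(ProtocolStep(action, list(materials), list(hardware)))
        return self

    def all_materials(self):
        """Materials in order of first use, without duplicates."""
        seen = []
        for step in self.steps:
            for material in step.materials:
                if material not in seen:
                    seen.append(material)
        return seen
```

For example, `Protocol("protein extraction").add_step("lyse cells", ["lysis buffer"], ["sonicator"]).add_step("centrifuge lysate", hardware=["centrifuge"])` records two illustrative steps in execution order; an mRNA protocol would use the same classes with different steps.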
Experimental Protocols for 2D Gels
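[Figure: protocol packages adapted for 2D gels: Protocol, 2D_PAGE_Setup, BioEvent, BioMaterial, BioAssay, 2D_PAGE (2D_PAGE_Setup and 2D_PAGE take the place of MAGE's ArrayDesign and Array)]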
Proteomics Data Model
- Image analysis identifies the spots observable on the gel
- It is important to store both the raw data and the analysis from MS
- A separate package covers cross-gel analysis, e.g. time series
[Figure: data packages 2D_PAGE, Protein_Spots, Data_Analysis, Multiple_Analysis, MS_Setup, MS_Data and BioSequence, with a link from the Protocol packages]
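Cross-gel analysis such as a time series needs spot volumes that are comparable between gels. One common approach, assumed here rather than taken from the slides, is to express each spot volume as a percentage of the total spot volume on its gel and then compute fold changes for spots matched across gels. The sketch assumes spot matching has already been done, so that the same spot ID refers to the same protein on every gel; each gel is a hypothetical dict of spot ID to raw volume, ordered by time point.

```python
def normalise_volumes(gel):
    """Express each raw spot volume as a percentage of the total spot volume on that gel."""
    total = sum(gel.values())
    if total == 0:
        return {spot: 0.0 for spot in gel}
    return {spot: 100.0 * vol / total for spot, vol in gel.items()}


def time_series(gels, spot_id):
    """Normalised volume of one matched spot on each gel (None where the spot is absent)."""
    return [normalise_volumes(gel).get(spot_id) for gel in gels]


def fold_changes(gels, spot_id):
    """Fold change of a matched spot relative to the first gel in the series.

    Returns None for a time point where the spot is missing, and None for
    every point if the baseline value is missing or zero.
    """
    series = time_series(gels, spot_id)
    baseline = series[0] if series else None
    return [None if v is None or not baseline else v / baseline for v in series]
```

Normalising within each gel removes differences in total protein load and staining between gels before spots are compared, which is why the per-gel step comes before any statistics across the series.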
Proteomics Model
- Experimental protocol packages require few changes from MAGE
- The new data model includes MS data and statistical analysis between gels
- The model incorporates storage of external database searches
[Figure: proteomics model packages: Protocol, BioEvent, 2D_PAGE_Setup, BioMaterial, BioAssay, 2D_PAGE, Experiment, Protein_Spots, Multiple_Analysis, Data_Analysis, MS_Setup, MS_Data, BioSequence, Annotation, Audit Security, Common, BQS, Description, Measurement]
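Storing external database searches mainly means keeping enough provenance to repeat or re-analyse an identification: which database and release was searched, with which parameters, and which hits were returned for which spot. The record below is a minimal sketch of that idea; the class and field names, the XML element names and the numeric score are all hypothetical, not part of the described model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional
import xml.etree.ElementTree as ET


@dataclass
class DatabaseHit:
    accession: str   # identifier of the matched entry in the external database
    score: float     # search score; higher is assumed to be better


@dataclass
class DatabaseSearch:
    """Provenance of one search of an external database for one protein spot."""
    spot_id: str
    database: str    # name of the external database searched
    release: str     # database version, needed to repeat the search later
    parameters: Dict[str, str] = field(default_factory=dict)
    hits: List[DatabaseHit] = field(default_factory=list)

    def best_hit(self) -> Optional[DatabaseHit]:
        """Highest-scoring hit, or None if the search returned nothing."""
        return max(self.hits, key=lambda h: h.score, default=None)

    def to_xml(self) -> ET.Element:
        """Serialise the search record as a hypothetical XML element."""
        el = ET.Element("databaseSearch", spot=self.spot_id,
                        database=self.database, release=self.release)
        for key, value in sorted(self.parameters.items()):
            ET.SubElement(el, "parameter", name=key, value=value)
        for hit in self.hits:
            ET.SubElement(el, "hit", accession=hit.accession, score=str(hit.score))
        return el
```

Recording the release and parameters alongside the hits is what allows another group to re-run the search, or to re-analyse the same MS data against a newer release.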
Proteomics Database and Indexing Technology
- A prototype database for proteomics has been developed
- We are developing a specialised index structure for XML to improve query performance
- The performance of the index has so far been tested with 800MB of protein data¹
[Figure: data stores and XML index; a data path tree with numbered nodes, and an XML dictionary mapping 1 Experiment, 2 gelImage, 3 spots, 4 spot]
1. Protein Information Resource - http://pir.georgetown.edu/
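The slide does not describe the internals of the index. As a hypothetical illustration only, the two components in the figure, an XML dictionary that maps tag names to integers and a data path tree, can be combined into a simple path index: each distinct root-to-element tag path becomes one tree node, and each node lists the stored elements that have that path. The class and method names are assumptions; assigning integer codes in the order tags are first seen happens to reproduce the dictionary in the figure for a document shaped like Experiment/gelImage/spots/spot.

```python
import xml.etree.ElementTree as ET


class PathIndex:
    """Hypothetical path index: a tag dictionary plus a tree of root-to-element paths."""

    def __init__(self):
        self.dictionary = {}                           # tag name -> integer code (1, 2, 3, ...)
        self.tree = {"children": {}, "postings": []}   # root of the data path tree
        self.elements = []                             # stand-in data store: element id -> Element

    def _code(self, tag):
        # Assign integer codes in the order tags are first seen.
        if tag not in self.dictionary:
            self.dictionary[tag] = len(self.dictionary) + 1
        return self.dictionary[tag]

    def add_document(self, root):
        # Pre-order traversal; each stack entry is (element, path-tree node of its parent).
        stack = [(root, self.tree)]
        while stack:
            element, parent = stack.pop()
            node = parent["children"].setdefault(
                self._code(element.tag), {"children": {}, "postings": []})
            node["postings"].append(len(self.elements))  # element id in the data store
            self.elements.append(element)
            # Push children in reverse so they are popped in document order.
            for child in reversed(list(element)):
                stack.append((child, node))

    def lookup(self, path):
        """Elements whose root-to-element tag path equals path, e.g. '/Experiment/gelImage'."""
        node = self.tree
        for tag in path.strip("/").split("/"):
            code = self.dictionary.get(tag)
            if code is None or code not in node["children"]:
                return []
            node = node["children"][code]
        return [self.elements[i] for i in node["postings"]]
```

Finding the node for a path takes one dictionary step per path component, regardless of how many documents have been indexed. A production index would also need wildcards, predicates and on-disk storage, none of which this sketch attempts.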
Related Research
- Databases
  - SWISS-2DPAGE, LIMS systems
- Standards
  - Proteomics Standards Initiative (PSI)
    - standards for protein-protein interactions and mass spectrometry
  - PEDRo system with PEML (Proteomics Experiment Markup Language)
PSI: http://psidev.sourceforge.net/
Work In Progress
- Working towards an XML standard for proteomics
- Creating standards for capturing the statistical processing of large data sets
- Developing XML indexing technology to improve data integration and query power
- Developing a proteome database that uses XML indexing and a standard model
Contact: jonesa@dcs.gla.ac.uk
Bioinformatics Research Centre - www.brc.dcs.gla.ac.uk
Acknowledgements: researchers in Jonathan Wastling's lab for input into the model, and Dr Ashwin Kotiwaliwale at the Beatson for the collaboration on the prototype database.
The Functional Genomics Facility is supported by a 2.4M Wellcome Trust grant. My research is supported by an MRC Bioinformatics PhD studentship; Ela Hunt is supported by an MRC Fellowship.