CVA for NMR data - PowerPoint PPT Presentation

Title:

CVA for NMR data

Description:

Context-dependent (the complement of mRNAs varies with ... Aspergil. Extensions Phosph. Extensions. PPI Validation Analysis Client. Protein ID Client ... – PowerPoint PPT presentation

Slides: 41
Provided by: hele105
Category:
Tags: cva | nmr | aspergil | data


Transcript and Presenter's Notes

Title: CVA for NMR data


1
(No Transcript)
2
From Yeast to Mouse to Humans: From Manchester
to Hinxton and beyond
  • Steve Oliver
  • Professor of Genomics
  • Faculty of Life Sciences
  • The University of Manchester
  • http://www.cogeme.man.ac.uk
  • http://www.ispider.ac.uk

Faculty of Life Sciences
3
Functional Genomics
   
4
Tie everything back to the genome
5
Mind your Ps (& Qs)
PEDRo Pedro Pierre
6
PEDRo Model Life History
  • Developed in COGEME Project (Consortium for
    Genomics of Microbial Eukaryotes) within BBSRC
    Investigating Gene Function (IGF) Initiative.
  • Published in Nature Biotech, after feedback from
    wider community.
  • No complete data sets at point of publication.
  • Recent activities
  • Collecting data from multiple (mostly IGF) sites
    that conform to the PEDRo model.
  • Developing database containing PEDRo data (Pierre)

7
The PEDRo UML schema in reduced form
8
The nature of proteomics experiment data
  • Sample generation
  • Origin of sample
  • hypothesis, organism, environment, preparation,
    paper citations
  • Sample processing
  • Gels (1D/2D) and columns
  • images, gel type and ranges, band/spot
    coordinates
  • stationary and mobile phases, flow rate,
    temperature, fraction details
  • Mass Spectrometry
  • machine type, ion source, voltages
  • In Silico analysis
  • peak lists, database name and version, partial
    sequence, search parameters, search hits,
    accession numbers

9
Implementing the schema
[Diagram labels: UML Model, XML Schema, Pedro, RDB Schema,
XML Data, XML Database, Pierre; arrows labelled "map",
"read", "map" and "type/load"]

XML Schema (fragment):
    <xs:element name="Column">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="description" type="xs:string"/>
          <xs:element name="manufacturer" type="xs:string"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>

RDB Schema (fragment):
    CREATE TABLE LCColumn (
      id integer PRIMARY KEY,
      description varchar(200) NOT NULL,
      manufacturer varchar(100) NOT NULL,
      part_number varchar(50) NOT NULL
    )

XML Data (fragment):
    <Sample>
      <sample_id>D0117</sample_id>
      <sample_date>2001-07-09</sample_date>
      <experimenter>David Stead, Zhikang Yin</experimenter>
    </Sample>
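
The path from XML data to a relational store shown on this slide can be sketched in a few lines of Python. This is only an illustration, not the PEDRo or Pierre code: the `sample` table, its column names and the `load_sample` helper are hypothetical, and the separator between the two experimenter names in the slide's record is assumed.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Sample record from the slide; the ", " between the names is an assumption.
SAMPLE_XML = """
<Sample>
  <sample_id>D0117</sample_id>
  <sample_date>2001-07-09</sample_date>
  <experimenter>David Stead, Zhikang Yin</experimenter>
</Sample>
"""


def load_sample(conn, xml_text):
    """Parse one <Sample> element and insert it into a hypothetical table."""
    root = ET.fromstring(xml_text)
    row = (
        root.findtext("sample_id"),
        root.findtext("sample_date"),
        root.findtext("experimenter"),
    )
    # Hypothetical table layout, invented for this sketch.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sample ("
        "sample_id TEXT PRIMARY KEY, "
        "sample_date TEXT NOT NULL, "
        "experimenter TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO sample VALUES (?, ?, ?)", row)
    return row


conn = sqlite3.connect(":memory:")
loaded = load_sample(conn, SAMPLE_XML)
stored = conn.execute("SELECT sample_id, sample_date FROM sample").fetchall()
```

Keeping the XML as the exchange format and deriving the table from it mirrors the slide's idea that one model drives both representations.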
10
Modelling goals for PEDRo
  • Enough detail to
      • Allow results of different experiments to be
        analysed/compared.
      • Allow suitability of experiment design and
        implementation decisions to be assessed.
      • Allow protein identifications to be rerun in
        future with new databases or software.
  • Not detailed enough
      • To allow experiments to be rerun.

11
IGF Datasets currently in Database
12
There must be carrots as well as sticks
13
Proteomics ain't the only show in town
14
INTEGRATION
15
Why integrate data?
  • These 200 proteins are overexpressed in my mouse
    model compared to WT. What are the interaction
    partners of these proteins?
  • Data is stored at a variety of sites and formats.
  • Databases designed mainly for browsing
  • (MIPS, SGD, BIND, SCPD, KEGG).
  • Need databases that allow complex queries.
  • Need to be easily usable by biologists.
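
The kind of question on this slide (given a list of proteins, what are their interaction partners?) is simple once the data sit in one queryable structure. The sketch below assumes a flat list of pairwise interactions; the protein names and both helper functions are hypothetical and stand in for whatever an integrated database would provide.

```python
from collections import defaultdict


def build_partner_index(interactions):
    """Map each protein to the set of proteins it interacts with."""
    index = defaultdict(set)
    for a, b in interactions:
        index[a].add(b)
        index[b].add(a)
    return index


def partners_of(proteins, index):
    """Return the sorted interaction partners for each query protein."""
    return {p: sorted(index.get(p, ())) for p in proteins}


# Hypothetical toy data, not taken from any real data set.
interactions = [("P1", "P2"), ("P2", "P3"), ("P4", "P1")]
index = build_partner_index(interactions)
result = partners_of(["P1", "P5"], index)
```

Here `result` holds the partners of `P1` and an empty list for `P5`, which has no recorded interactions.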

16
Genome Information Management System (GIMS)
Paton NW, Khan SA, Hayes A, Moussouni F, Brass A,
Eilbeck K, Goble CA, Hubbard SJ, Oliver SG
(2000) Conceptual modelling of genomic
information. Bioinformatics 16, 548-557.
17
Database implementation
  • Uses the object database FastObjects.
  • All database classes and analysis programs are
    written in Java.
  • Allows close integration of the programming
    language with the database.
  • Allows fast access to database data from
    application programs.
  • Allows data to be stored in a way that reflects
    the underlying mechanisms in the organism.
  • Very flexible and extensible.

18
(No Transcript)
19
GIMS User Interface
  • Java application.
  • Can download from http://img.cs.man.ac.uk/gims
  • Communicates with database via RMI.
  • On start-up, application is sent information
    about database classes and canned queries.
  • Very flexible.
  • Allows user to browse database, ask canned
    queries, and store and combine data sets.
  • Can save results as txt, html or xml.

20
(No Transcript)
21
Cross-validation of high-throughput data is
essential
22
Evaluating protein-interaction data
von Mering C, Krause R, Snel B, Cornell M,
Oliver SG, Fields S, Bork P (2002) Comparative
assessment of large-scale data sets of
protein-protein interactions. Nature 417,
399-403.
Cornell M, Paton NW, Oliver SG (2004) A
critical and integrated view of the yeast
interactome. Comp. Funct. Genom. 5, 382-402.
23
Set of confirmed Y2H interactions
Confirmation of an interaction requires
  • Identification in more than one Y2H screen, OR
  • The reverse interaction must have been
    identified, OR
  • The two proteins must have been identified in the
    same protein complex (from either classical or
    high-throughput affinity purification
    studies).

A total of 451 reliable interactions,
involving 581 proteins, have been identified
from a combined data set comprising 5214
interactions and 4025 proteins.
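
The three confirmation criteria above can be written as a small filter. This is a sketch under stated assumptions, not the procedure used in the cited papers: observations are taken to be `(screen, bait, prey)` tuples, "more than one screen" is counted on the unordered pair, self-interactions are skipped, and all names and data are hypothetical.

```python
from collections import defaultdict


def confirmed_interactions(observations, complexes):
    """Return the unordered pairs that meet at least one criterion."""
    screens = defaultdict(set)  # unordered pair -> screens reporting it
    directed = set()            # (bait, prey) pairs exactly as reported
    for screen, bait, prey in observations:
        if bait == prey:
            continue  # self-interactions are ignored in this sketch
        screens[frozenset((bait, prey))].add(screen)
        directed.add((bait, prey))

    confirmed = set()
    for pair, seen_in in screens.items():
        a, b = sorted(pair)
        multiple_screens = len(seen_in) > 1
        reciprocal = (a, b) in directed and (b, a) in directed
        co_complexed = any(pair <= set(c) for c in complexes)
        if multiple_screens or reciprocal or co_complexed:
            confirmed.add((a, b))
    return confirmed


# Hypothetical toy data.
observations = [
    ("screen1", "A", "B"),
    ("screen2", "A", "B"),  # same pair found in a second screen
    ("screen1", "C", "D"),
    ("screen1", "D", "C"),  # reverse interaction also found
    ("screen1", "E", "F"),  # both proteins are in a known complex
    ("screen1", "G", "H"),  # meets none of the criteria
]
complexes = [{"E", "F", "X"}]
result = confirmed_interactions(observations, complexes)
```

Because the criteria are joined by OR, a pair is kept as soon as any one of them holds; `G`-`H` is the only pair dropped here.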
24
   
25
Quantitative comparison of interaction datasets.
26
GENOME
TRANSCRIPTOME
You don't get owt for nowt
PROTEOME
METABOLOME
27
Yeast ain't the only show in town either!
28
GIMS empowers the biologist
29
Resources at the centre
Workflows that could be used to generate this data
People who have registered an interest in these
data
Related Data
Provenance record on how the data were produced
Ontologies describing data
30
Biologists/Clinicians at the centre
Workflows they wrote or used
People they collaborate with
31
myGrid
  • EPSRC UK e-Science pilot project.
  • Open Source Upper Middleware for Bioinformatics.
  • (Web) Service-based architecture -> Grid services.

www.mygrid.org.uk
32
iSPIDER: A Pilot Grid for Integrative Proteomics
  • In Silico Proteome Integrated Data Environment
    Resource

33
Diversity of proteome data
gels
sequences
    >A01562
    MAPKATYLIGAADKFHW
    >A01567
    MAQQPKEMLNILADKFHWFLYC
Other data: species, PTMs, pathways, functional
annotation, transcriptome data
Structures/folds
mass spec
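
Of the data types on this slide, the sequence records are the easiest to read programmatically. A minimal FASTA reader might look like the sketch below; `parse_fasta` is a hypothetical helper, and the input assumes that the trailing `HWFLYC` on the slide continues the second sequence.

```python
def parse_fasta(text):
    """Return a dict mapping each record identifier to its sequence."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The identifier is the first token after ">".
            parts = line[1:].split()
            current = parts[0] if parts else ""
            records[current] = []
        elif current is not None:
            # Sequence lines may wrap, so collect and join them.
            records[current].append(line)
    return {name: "".join(chunks) for name, chunks in records.items()}


# The two records from the slide, with the second one wrapped over two lines.
fasta_text = """>A01562
MAPKATYLIGAADKFHW
>A01567
MAQQPKEMLNILADKF
HWFLYC
"""
sequences = parse_fasta(fasta_text)
```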
34
Integration problems
  • Lack of specific middleware
  • Existing resources not wrapped
  • Lack of data standards
  • Standards for proteomics, incl. MS and protein
    identification are emerging
  • Data not modelled
  • New challenges from proteomics
  • Data not captured/modelled
  • Data not captured
  • No mature repositories/databases for some
    proteome data
  • But there are lots of data

35
Aims
  • To develop an integrated platform of proteomic
    data resources enabled as Grid/Web services
  • Integrate existing proteome resources, enabling
    them as Grid/Web services.
  • To develop novel, proteome-specific databases as
    part of iSPIDER delivered as Grid/Web and
    browser-based services
  • A repository for experimental proteome data
  • A proteome protein identification server and
    database
  • A phosphoproteome specific database
  • To develop middleware support for distributed
    querying, workflows and other integrated data
    analysis tasks
  • Demonstrate effectiveness of the resulting
    infrastructure in proteomics studies, including
  • Visualisation clients for proteomic data e.g. LRF
    data
  • Analyses for fungal species of industrial
    interest
  • Protein structural/functional trends in
    experimental proteomics e.g. linking domain
    structural patterns

36
Integrated Proteomics Informatics Platform -
Architecture
[Architecture diagram; labels as they appear]
ISPIDER Proteomics Clients: Vanilla Query Client,
PPI Validation Analysis Client, Protein ID Client
Web services
ISPIDER Proteomics Grid Infrastructure
Existing E-Science Infrastructure
Public Proteomic Resources
Existing Resources
iSPIDER Resources
Work-package labels on the diagram: WP1-WP6
KEY: WS = Web services, GS = genome sequence, TR =
transcriptomic data, PS = protein structure, PF =
protein family, FA = functional annotation, PPI =
protein-protein interaction data, WP = Work
Package
37
Existing infrastructure and skills
  • myGRID
  • OGSA-DQP
  • AutoMed
  • PSI/Pedro infrastructure/standards
  • Protein id tools at Manchester
  • 3 primary data integration strategies
  • Workflows
  • DQP using OGSA-DAI
  • Heterogeneous schema integration technologies

38
Need to share and cross-validate analytical
procedures
39
Workflow Components
  • Freefluo: workflow engine to run workflows
  • Scufl: Simple Conceptual Unified Flow Language
  • Taverna: writing, running workflows and
    examining results
  • SOAPLAB: makes applications available
40
Security & Ethics