Title: DIAMONDS project
1 DIAMONDS project
- DELIVERABLE 5.2 draft 2.0
- Definition of the essential design
characteristics of the toolbox GUI
- BILBAO TECHNICAL MEETING
- June 20th and 21st
2 GENERAL INDEX
- BLOCK DIAGRAM
- General View
- Workbench operation
- Block Diagram modules
- Tools assignments
- WEB SERVICES
- Usual Operation
- Diamonds context
- Example
- CORE DATABASE
- XML LANGUAGES
- DEVELOPMENT STAGES AND TIMING PROPOSAL
3 BLOCK DIAGRAM. General view
VISUALIZATION
4 BLOCK DIAGRAM. Modules
In the present module we classify proteins into
sets based on function similarity.
This module will be available to predict residue
substitutions to modulate (or change) the binding
specificity and therefore change the regulation
of the interaction network.
Uploads any user data to the workbench.
Specially designed for uploading the results of
wet-lab experiments.
For microarray expression data, we can execute
different analysis options available in the
platform
Promoters will be scanned for known and unknown
regulatory elements using three different methods
that use different strategies
This module should allow the visualization of:
- Any data stored by the user during the whole
process
- Any other data (not the user's property) stored in
GIN-DB
- Some new external user data
From a gene sequence we could infer homologies in
other species or in some other genes in the same
species. This module is structured in a single
block.
The objective of this module is to reconstruct
the evolutionary relationships between cell cycle
genes.
This module looks for the annotation of
information related to pathways, specifically
metabolic pathways and signal transduction
pathways.
UPLOADING DATA
VISUALIZATION
This module is used for the prediction of
interaction modes and key interactive residues,
using the Y2H and TAP data generated within the
project, complemented with public data on protein
interactions.
- Differential Expression Analysis
- Detection of genes periodically expressed
- Microarray clustering
- Expression Profile Alignment
- Saco Patterns
- Gibbs Sampler
- Transfac DB matching
- Specify type of file
- Microarray Data
- Y2H, TAP
- Affinity Chromatography
- Phenotype Information
- GUS fusion promoters
- ChIP
Genes with significant diff expression
INTERACTION MODES AND KEY INTERACTIVE RESIDUES
optional
VISUALIZATION
These methods are complementary
Genes periodically expressed
- Clustering
- Graph based
- Hierarchical
homologies
Expression Profile Alignment
Gibbs Sampler
Differential Expression Analysis
Detection of genes periodically expressed
Microarray clustering
Saco Patterns
Transfac DB matching
species
Identifies patterns significantly
over-represented in the upstream regions relative
to a background set of upstream regions from the
same organism.
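As a rough illustration of comparing upstream regions against a background set, the following Python sketch computes how much more often a pattern occurs in a target set than in the background. The function names, sequences, and pattern are hypothetical, and the simple frequency ratio stands in for whatever significance test the actual method uses, which is not specified here.

```python
def fraction_with_pattern(sequences, pattern):
    """Fraction of sequences that contain the pattern at least once."""
    hits = sum(1 for seq in sequences if pattern in seq.upper())
    return hits / len(sequences)


def enrichment(target, background, pattern):
    """Ratio of pattern frequency in the target set to the background set."""
    bg = fraction_with_pattern(background, pattern)
    fg = fraction_with_pattern(target, pattern)
    return fg / bg if bg > 0 else float("inf")


# Hypothetical upstream regions (illustrative only).
target = ["ACGTTGACGT", "TTGACGAA", "CCTGACGG"]
background = ["TGACGAAA", "CCCCCCCC", "AAAAAAAA", "GGGGTTTT"]
ratio = enrichment(target, background, "TGACG")
```

A ratio well above 1 suggests the pattern is worth testing formally for over-representation.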
The known transcription factor binding sites in
the TRANSFAC database will be matched against the
same upstream regions.
EXTERNAL SOURCE
Replicate variation and t-tests with
Benjamini-Hochberg correction
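The multiple-testing step named above is usually called the Benjamini-Hochberg procedure. The following is a minimal Python sketch of it, assuming one p-value per gene has already been obtained from a t-test. The function name and the p-values are hypothetical, and the platform's own implementation may differ.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices of hypotheses rejected at false discovery rate alpha."""
    m = len(p_values)
    # Rank hypotheses by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    # Every hypothesis ranked at or below the cutoff is rejected.
    return sorted(order[:cutoff])


# Hypothetical per-gene p-values (illustrative only).
p_vals = [0.001, 0.045, 0.035, 0.20, 0.012]
significant = benjamini_hochberg(p_vals)
```

The returned indices identify the genes reported as having significant differential expression.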
To detect periodically expressed genes in the
time series (transcriptome, proteome).
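One common way to look for periodicity in a time series is a Fourier periodogram. The sketch below is an assumption about how such detection could work, not a description of the method used in the platform. The function names and the synthetic series are hypothetical.

```python
import cmath
import math


def periodogram(series):
    """Power at each positive Fourier frequency k = 1 .. n // 2."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    power = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power.append(abs(coeff) ** 2 / n)
    return power


def dominant_period(series):
    """Period (in sampling intervals) of the strongest frequency component."""
    power = periodogram(series)
    k = max(range(len(power)), key=power.__getitem__) + 1
    return len(series) / k


# Synthetic expression profile with a period of 8 time points.
profile = [math.sin(2 * math.pi * t / 8) for t in range(16)]
period = dominant_period(profile)
```

A gene would then be flagged as periodic only if its peak power is large relative to a suitable null model, a step this sketch leaves out.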
Similarity search, hierarchical clustering,
flat-flat clustering comparison, etc.
Template matching or alignment mode.
Looks for over-representation of degenerate
patterns.
MODELLING TYPES
Data clusters
Several approaches for inferring molecular
phylogenies will be applied
EXTERNAL SOURCE
- Boolean
- ODE
- PLDE
- Stochastic equations
- Hybrid modelling
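To make the first modelling type concrete, here is a minimal Python sketch of a synchronous Boolean network simulation. The three nodes and their update rules are hypothetical and carry no biological meaning; they only show the mechanics.

```python
def step(state, rules):
    """Synchronously update every node from the current state."""
    return {node: rule(state) for node, rule in rules.items()}


def simulate(state, rules, n_steps):
    """Return the list of states visited, starting with the initial one."""
    trajectory = [state]
    for _ in range(n_steps):
        state = step(state, rules)
        trajectory.append(state)
    return trajectory


# Hypothetical three-node network (illustrative rules only).
rules = {
    "A": lambda s: not s["C"],        # C represses A
    "B": lambda s: s["A"],            # A activates B
    "C": lambda s: s["A"] and s["B"], # A and B together activate C
}

start = {"A": True, "B": False, "C": False}
trajectory = simulate(start, rules, 3)
```

The other listed model types (ODE, PLDE, stochastic, hybrid) replace the Boolean update rule with continuous or probabilistic dynamics.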
The main objectives are the identification of
possible partner proteins to build
three-dimensional models of the corresponding
complexes, their domain decomposition, and the
careful construction of corresponding alignments
between target and sequences to model.
The amount of information stored in GIN-DB will
be vast, so there should be filtering mechanisms
that make it easy for the user to select the
information to visualize.
Residue substitution effect prediction
- Microarray data
- Specify chip used
- Initial preprocessing block
- normalization, hybridisation, expression index
calculation
- Pairwise Distance methods
- Maximum Likelihood
- Maximum Parsimony
- Bayesian statistics
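As an illustration of the first approach in the list above, the following Python sketch computes a pairwise distance matrix from aligned sequences using the simple p-distance (the proportion of differing sites). The sequence names and the alignment are hypothetical, and real analyses typically use corrected distances and dedicated phylogenetics software.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore alignment columns where either sequence has a gap.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def distance_matrix(alignment):
    """Map each unordered pair of sequence names to its p-distance."""
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


# Hypothetical aligned sequences (illustrative only).
alignment = {"sp1": "ACGTACGT", "sp2": "ACGTACGA", "sp3": "ACGAAC-A"}
matrix = distance_matrix(alignment)
```

A distance-based tree method such as neighbour joining would take this matrix as its input.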
Genes are matched to:
- The KEGG description of known cellular pathways.
- The TRANSPATH database for signal transduction.
Quantification (optional): Later, the user can
optionally introduce some quantitative
properties for each node of the cluster
previously obtained.
Alignment results
Genes periodically expressed
Data clusters
Genes with significant diff. expression
Regulatory elements
Regulatory elements
Regulatory elements
Output can be useful in other modules, like
phylogenetic trees
Alignment results
Changing some residues in protein complex models
can change the protein function or structure,
generating new complex models
TEMPLATE MATCHING allows mining gene expression
time series for the patterns that best fit a
template expression profile provided by the user.
ALIGNMENT MODE allows finding the best time
alignment between two sets of gene expression
time series.
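Template matching can be sketched in Python as ranking gene profiles by their similarity to the user's template. Pearson correlation is used here as an assumed similarity measure, since the actual scoring function is not stated. The gene names and profiles are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def best_matches(profiles, template, top=1):
    """Names of the genes whose profiles correlate best with the template."""
    ranked = sorted(
        profiles, key=lambda gene: pearson(profiles[gene], template), reverse=True
    )
    return ranked[:top]


# Hypothetical expression time series (illustrative only).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 2.0],
    "geneB": [3.0, 2.0, 1.0, 2.0],
    "geneC": [1.0, 1.0, 1.0, 1.0],
}
template = [0.0, 1.0, 2.0, 1.0]
matches = best_matches(profiles, template)
```

Alignment mode additionally needs a time-warping step, which this sketch does not cover.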
5
RESIDUE SUBSTITUTIONS
UPLOADING WET-LAB DATA
INTERACT MODES KEY INTERACTIVE RESIDUES
PROTEIN CLASSIFICATION
EXPRESSION DATA ANALYSIS
VISUALIZATION
PATHWAYS ANNOTATION
MODELLING SIMULATION
GENE HOMOLOGIES
PROMOTER ANALYSIS
species
PHYLOGENETIC TREES
6 BLOCK DIAGRAM. Background modules
Annotation
Text Mining and Data Curation
Performed automatically and in a transparent way
for the user.
7 BLOCK DIAGRAM. Background modules
Annotation
Text Mining and Data Curation
Performed by Pub Gene prior to the platform
launch.
Text Mining
Pub Gene is going to extract relevant information
related to the cell cycle, mined from the relevant
literature. The information will be mined from a
great variety of data sources: Medline, Entrez
Gene, HomoloGene, LocusLink, UniGene, Ensembl,
TIGR, GoldenPath, RefSeq/GenBank, UniProt,
PIR/NREF, ENZYME, etc.
Data Curation
In addition, Pub Gene will build a curation tool
that participants with expert knowledge in the
different research fields should use to curate
the first round of data.
8 BLOCK DIAGRAM. Workbench operation
VISUALIZATION
9 BLOCK DIAGRAM. Workbench operation
FLEXIBLE and EXTENDABLE
10 BLOCK DIAGRAM. Workbench operation
INFORMATION AVAILABLE
o Input data files
o Processes done
o Result files
LOGIN
UserName
PASSWORD
11 BLOCK DIAGRAM. Workbench operation
DIAMONDS project
INFORMATION AVAILABLE
o Input data files
Expression Data Analysis
o Processes done
o Result files
Promoter Analysis
DONE
data1
data2
Microarray data
12 BLOCK DIAGRAM. Workbench operation
DIAMONDS project
INFORMATION AVAILABLE
o Input data files
Expression Data Analysis
o Processes done
o Result files
DONE
data1
data3
Promoter Analysis
13 BLOCK DIAGRAM. Workbench operation
DIAMONDS project
INFORMATION AVAILABLE
o Input data files
Expression Data Analysis
o Processes done
o Result files
- Metabolic Pathways Extraction
- Signal Transduction Pathways Extraction
DONE
data1
data4
Promoter Analysis
14 BLOCK DIAGRAM. Workbench operation
DIAMONDS project
INFORMATION AVAILABLE
o Input data files
Expression Data Analysis
o Processes done
o Result files
Promoter Analysis
15 BLOCK DIAGRAM. Workbench input files
INTRODUCE INPUT FILE
- External user data
- File uploaded
- Result file
data1
EXTERNAL USER DATA
- Browse button
- Specify File and Type of file
- Generate folder structure
data1
Y2H file
16 BLOCK DIAGRAM. Workbench input files
INTRODUCE INPUT FILE
- External user data
- File uploaded
- Result file
data1
FILE UPLOADED
- Input File has been previously uploaded using
the UPLOADING DATA option.
- Select the file in the package explorer
17 BLOCK DIAGRAM. Workbench input files
INTRODUCE INPUT FILE
- External user data
- File uploaded
- Result file
data3
RESULT FILE
- The input file already exists as a result file of
another, previously completed process.
- Select the file in the package explorer
- A new folder structure must be created for the
input data selected
18 BLOCK DIAGRAM. Workbench files
19 WEB SERVICES. Definitions
PLATFORM IMPLEMENTATION
Web services offer a single uniform method for
application integration through the Internet. They
provide a model for accessing software systems
over the web by pointing to their web address,
while their public interfaces and bindings are
defined and described using a standard XML format.
WSDL: Web Services Description Language
Defines the web service interface and the
exchange of messages between the provider and the
requester.
SOAP: Simple Object Access Protocol
XML-based protocol for stateless message
exchange. Generally built on top of HTTP.
UDDI: Universal Description, Discovery and
Integration
Designed to provide a description and definition
of web services in a central repository.
20 WEB SERVICES. Usual operation
SERVER (Web Service Provider): Entity which
produces and offers the web service
CLIENT (Web Service Consumer): Consumer
1) Development and description (WSDL)
WSDL file: Identifies the service and indicates
the way to use it and the protocols allowed.
2) Register
UDDI registry: Universal Description, Discovery
and Integration Registry
3) Get the web service address, URI (Uniform
Resource Identifier)
4) Get the Web Service description directly from
the server
5) Build Proxy
6) Build Client
7) Invoke methods (SOAP, HTTP)
SOAP
- Envelope
- Set of encoding rules
- Convention for RPCs and responses
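As a rough illustration of the SOAP envelope used when invoking a method, the following Python sketch builds a SOAP 1.1 request message with the standard library. The method name, service namespace, and parameter are hypothetical; only the envelope namespace is the standard SOAP 1.1 one.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def build_envelope(method, service_ns, params=None):
    """Return a SOAP 1.1 request envelope as a string."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # The RPC call is an element named after the method, in the service namespace.
    call = ET.SubElement(body, f"{{{service_ns}}}{method}")
    for name, value in (params or {}).items():
        ET.SubElement(call, f"{{{service_ns}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical method, namespace, and parameter (illustrative only).
xml_text = build_envelope(
    "GetInputData", "http://example.org/diamonds", {"fileName": "data1"}
)
```

In practice the client proxy generated from the WSDL file produces this message automatically and sends it over HTTP.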
21 WEB SERVICES. DIAMONDS context
PARTNER
PARTNER
WEB PORTAL
PARTNER
PARTNER
22 WEB SERVICES. Example creation
VISUAL STUDIO
Files generated
sample
sample
Global.cs: This file has several handler methods
for events raised on general objects.
Service1.asmx: This is the page a user should
request in a web browser. It invokes the source
code.
Service1.asmx.cs: Web service source code, invoked
by the asmx file.
23 WEB SERVICES. Example source code
using System;
using System.Collections;
using System.ComponentModel;
using System.Data;
using System.Diagnostics;
using System.Web;
using System.Web.Services;

namespace sample
{
    /// <summary>
    /// Summary description for Service1.
    /// </summary>
    public class Service1 : System.Web.Services.WebService
    {
        public Service1()
        {
            // CODEGEN: This call is required by the ASP.NET Web Services Designer
            InitializeComponent();
        }

        #region Component Designer generated code

        // Required by the Web Services Designer
        private IContainer components = null;

        private void InitializeComponent()
        {
        }

        /// <summary>
        /// Clean up any resources being used.
        /// </summary>
        protected override void Dispose(bool disposing)
        {
            if (disposing && components != null)
            {
                components.Dispose();
            }
            base.Dispose(disposing);
        }

        #endregion

        // WEB SERVICES EXAMPLE
        [WebMethod]
        public string HelloWorld()
        {
            return "Hello";
        }
    }
}
24 WEB SERVICES. Example service use
25 WEB SERVICES. Example
26 CORE DATABASE
Must store all data involved in the process:
input, results, models, images, etc.
Two storage modes: public and private.
All queries to the database are made by the web
portal, not by the component tools.
PARTNER a
WEB PORTAL
PROCESSING
CORE DB
Input data
PARTNER b
Ask for input data
Input data
Output data
Input data
Store output data
PROCESSING
Output data
ADVANTAGE: Centralizes the queries, which
lowers the chance of inconsistencies.
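The centralized access pattern can be sketched in Python as follows. All class and method names are hypothetical, the database is an in-memory stand-in, and the rule that private data is visible only to its owner is an assumption about what the two storage modes mean.

```python
class CoreDatabase:
    """In-memory stand-in for the core database (hypothetical)."""

    def __init__(self):
        self._records = {}

    def put(self, name, owner, data, public):
        self._records[name] = {"owner": owner, "data": data, "public": public}

    def get(self, name):
        return self._records[name]


class WebPortal:
    """Single entry point: component tools never query the database directly."""

    def __init__(self, db):
        self._db = db

    def store_output(self, user, name, data, public=False):
        self._db.put(name, user, data, public)

    def ask_for_input(self, user, name):
        record = self._db.get(name)
        # Assumed rule: private records are readable only by their owner.
        if not record["public"] and record["owner"] != user:
            raise PermissionError(f"{name} is private")
        return record["data"]


portal = WebPortal(CoreDatabase())
portal.store_output("partner_a", "data1", [1, 2, 3])             # private
portal.store_output("partner_a", "data2", "shared", public=True)  # public
```

Because every read and write passes through the portal, access rules and consistency checks live in one place.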
27 XML LANGUAGES
It is appropriate and necessary to have a UNIQUE
and uniform interchange language inside the
platform.
XML LANGUAGE
- MAGE-ML: microarray data
- GXL: pathways
- SBML: models
- DOT, SVG: graphs
- FUGE-ML: functional genomics
- BSML: gene sequences
- BIOML: biosequence features
- CML: molecular information
28 XML LANGUAGES
1. Specify the complete set of data
2. DB design
3. Extract XML language from the DB design
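Steps 2 and 3 can be illustrated with a small Python sketch that reads a table definition from a database and derives an XML layout from it. The table, its columns, and the element and attribute names are hypothetical, and SQLite is used only as a convenient stand-in for the core database.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml_layout(conn, table):
    """Describe a table's columns as an XML element layout."""
    root = ET.Element(table)
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    # The table name is interpolated directly, so it must be a trusted value.
    rows = conn.execute(f"PRAGMA table_info({table})")
    for _cid, name, col_type, notnull, _default, _pk in rows:
        field = ET.SubElement(root, "field", name=name, type=col_type)
        field.set("required", "true" if notnull else "false")
    return root


# Hypothetical DB design (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE experiment ("
    "id INTEGER PRIMARY KEY, chip TEXT NOT NULL, result_file TEXT)"
)
layout = table_to_xml_layout(conn, "experiment")
layout_text = ET.tostring(layout, encoding="unicode")
```

Deriving the XML layout from the database design keeps the interchange format and the stored data in step.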
PARTNER a
SERVER
Web Service Provider
SERVER
Web Service Provider
WEB PORTAL
XML SCHEMA
CORE DB
Data extracted
Input data
CLIENT
WEB SERVICE
Web Service Consumer
PARTNER b
Ask for input data
SERVER
Input data
Output data
Web Service Provider
Input data
Application
Store output data
Data generated
Output data
29 DEVELOPMENT STAGES AND TIMING PROPOSAL
NEXT 6 MONTHS
1. INPUT/OUTPUT SPECIFICATION. 2 months (July and August)
2. DB DESIGN. 2.5 months (Sept, Oct and half Nov)
3. XML LANGUAGE DEVELOPMENT. 1 month (half Nov and half Dec)
YEAR 2006
- FINISH WEB SERVICE DEVELOPMENT.
- INTEGRATION STAGE.
- PLATFORM DEVELOPMENT.
- TESTING.
30 QUESTIONS AND COMMENTS
?