Semantic Aggregation, Integration, and Inference of Pathway Data

1
Semantic Aggregation, Integration, and Inference
of Pathway Data
(Pedantic Aggravation, Irritation, and
Interference)
  • Co-Destructors
  • Joanne Luciano, PhD
  • jluciano_at_biopathways.org
  • Jeremy Zucker
  • zucker_at_research.dfci.harvard.edu

ISMB 2005 Tutorial, Detroit, Michigan, June 25th, 2005
http://www.biopathways.org/ismb2005tutorial-am6/
2
Overview
  • Introduction (45 minutes)
  • Time Out (15 minutes)
  • Workshop Case Studies Exercises (2 hrs 15
    minutes)
  • Subdivide into groups of triads and dyads with 5
    minute breaks in between case studies
  • Case Study I (40 minutes)
  • Case Study II (40 minutes)
  • Case Study III (40 minutes)
  • Time Out (15 minutes)
  • Lessons Learned (30 minutes)
  • Lessons Not Yet Learned (take home)

3
Introduction (45 minutes)
  • Semantic Aggregation, Integration and Inference
    of Pathway Data
  • Pathway Data (domain)
  • Why do we care? (motivation)
  • What is it?
  • What does it look like?
  • Definitions and Disclaimers
  • Methodology

4
Pathway Data: Why do we care?
  • Pathway Research has Broad Impact
  • Drug Discovery (pathway of target, safety)
  • Basic Science (identify pathways)
  • Disease Research (cancer pathways)
  • Environmental Research (microbial research)
  • Combine knowledge from multiple sources
  • Whole is greater than the sum of its parts
  • Biological knowledge is fragmented
  • Need database to manage resources

5
Different types of pathways (different strokes
for different folks, it's OK.)
Glycolysis
Protein-Protein
Apoptosis
Lac Operon
Molecular Interaction Networks
Gene Regulation
Signaling Pathways
Metabolic Pathways
The Main Categories
6
Different representations of the same pathways
<!ELEMENT reaction (substrate,product)>
<!ATTLIST reaction name %keggid.type; #REQUIRED>
<!ATTLIST reaction type %reaction-type.type; #REQUIRED>
<!ELEMENT substrate EMPTY>
<!ATTLIST substrate name %keggid.type; #REQUIRED>
<!ELEMENT product EMPTY>
<!ATTLIST product name %keggid.type; #REQUIRED>
Starts at α-D-Glucose 1P
KEGG Reference Pathway: GLYCOLYSIS
7
Different representations of the same pathways
reactions.dat: This file lists all chemical
reactions in the PGDB.
Attributes: UNIQUE-ID, TYPES, COMMON-NAME, ACTIVATORS,
BASAL-TRANSCRIPTION-VALUE, DBLINKS, DELTAG0,
DEPRESSORS, EC-LIST, EC-NUMBER,
ENZYMATIC-REACTION, EQUILIBRIUM-CONSTANT,
IN-PATHWAY, INHIBITORS, LEFT, MOVED-IN,
MOVED-OUT, OFFICIAL-EC?, REACTANTS, REQUIREMENTS,
RIGHT, SIGNAL, SPECIES, SPONTANEOUS?,
STIMULATORS, SYNONYMS
Starts at β-D-glucose 6-phosphate
BioCyc Reference Pathway: GLYCOLYSIS
8
Different representations of the same pathways
<reaction name="R_alpha_D_glucose_6_phosphate_D_fructose_6_phosphate" id="R_163457">
  <listOfReactants>
    <speciesReference species="R_30537_alpha_D_Glucose_6_phosphate" />
  </listOfReactants>
  <listOfProducts>
    <speciesReference species="R_29512_D_Fructose_6_phosphate" />
  </listOfProducts>
  <listOfModifiers>
    <modifierSpeciesReference species="R_163455_glucose_6_phosphate_isomerase_dimer_name_copied_from_complex_in_Homo_sapiens_" />
  </listOfModifiers>
</reaction>
Instance counts: DatabaseObject 41245, Event 8285,
Reaction 6598, ConcreteReaction 4034,
GenericReaction 2564
Reactome Pathway: GLYCOLYSIS
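To make the point about machine-readable representations concrete, here is a minimal sketch that reads the Reactome reaction fragment shown on this slide with Python's standard XML parser. The function name summarize_reaction and the dictionary layout are illustrative assumptions, not part of any Reactome or SBML tooling, and the fragment is handled on its own rather than as a complete SBML document.

```python
import xml.etree.ElementTree as ET

# Reaction fragment as shown on the slide (a fragment, not a full SBML model).
SBML_REACTION = """
<reaction name="R_alpha_D_glucose_6_phosphate_D_fructose_6_phosphate" id="R_163457">
  <listOfReactants>
    <speciesReference species="R_30537_alpha_D_Glucose_6_phosphate" />
  </listOfReactants>
  <listOfProducts>
    <speciesReference species="R_29512_D_Fructose_6_phosphate" />
  </listOfProducts>
  <listOfModifiers>
    <modifierSpeciesReference species="R_163455_glucose_6_phosphate_isomerase_dimer_name_copied_from_complex_in_Homo_sapiens_" />
  </listOfModifiers>
</reaction>
"""


def summarize_reaction(xml_text):
    """Return the id, reactants, products and modifiers of one reaction element."""
    reaction = ET.fromstring(xml_text.strip())

    def species_in(list_tag, ref_tag):
        # Collect the 'species' attribute of every reference in the given list.
        return [ref.get("species") for ref in reaction.findall(f"{list_tag}/{ref_tag}")]

    return {
        "id": reaction.get("id"),
        "reactants": species_in("listOfReactants", "speciesReference"),
        "products": species_in("listOfProducts", "speciesReference"),
        "modifiers": species_in("listOfModifiers", "modifierSpeciesReference"),
    }


summary = summarize_reaction(SBML_REACTION)
print(summary["id"], summary["reactants"], summary["products"])
```

The same reaction in KEGG or BioCyc form would need a different parser and a different notion of "species", which is the integration problem the tutorial goes on to discuss.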
9
Different representations of the same pathways
Does not compute. Pretty, but useless.
Reactions clickable, but...
Starts at Glucose (but it doesn't matter)
BioCarta Reference Pathway: GLYCOLYSIS
10
Pathway Data (domain)
How bad is it? Pathway Databases
So many pathway databases, so little time.
Graphic from Mike Cary and Gary Bader
11
Definitions and Disclaimers
  • Aggregation
  • 2 (or more) data sources, different schemas. How
    are they related? By creating explicit
    cross-references between them.
  • Integration
  • 2 (or more) data sources, same schema. How does
    it all fit together? A standard schema, semantic
    mapping, and instance merging (entity resolution)
    are required.
  • Inference
  • 1 (or more) data sources, one schema. Creating
    new instances or new relationships from existing
    data and rules.
  • (Evidence code type kind of inference)
  • Disclaimer
  • Controlled Vocabulary scope this tutorial

12
Our methodology
  • Define the goal of the project
  • What questions are you trying to answer?
  • Take stock of current information resources
  • Experts
  • Tools
  • Data sources
  • Scope the work to get from B to A
  • Aggregation, Integration, or Inference?
  • Data Cleaning

13
Assembling Knowledge: Aggregation, Integration,
Inference
"When it comes to data cleaning, there's no such
thing as a free lunch." --Tim Berners-Lee
Some tasks are specific to a use case, some are
common to more than one, and there's no escaping
others.
14
Time Out (15 minutes)
15
Workshop Case Studies Exercises (2 hrs 15
minutes)
  • Break into groups of triads and dyads
  • Aggregation Workshop (45 minutes)
  • Aggregation The Siderean Demo
  • Group Exercise 1 Pedantic Aggravation
  • Integration Workshop (45 minutes)
  • Integration BioPAX Initiative
  • Group Exercise 2 Pedantic Irritation
  • Inference Workshop (45 minutes)
  • Inference Flux Balance Analysis
  • Group Exercise 3 Pedantic Interference

16
Methodology
  • Define the goal of the integration
  • How will the integrated data be used?
  • This defines the level of integration from
    syntactic through semantic
  • Take stock of current resources
  • This defines your starting point
  • Database sources, programmers, lab access,
    collaborators
  • Scope the work to get from B to A
  • Data Profiling
  • Resource Profiling

17
Case Study I: The Siderean Demo (Aggregation)
  • Question: What drugs can be used as candidates
    for treating B-cell lymphoma patients?
  • By comparing gene expression patterns between
    patients with and without B-cell lymphoma, a top
    biomarker was found: BRKCB-1

18
Seamark Demonstration: Identification of new
drug candidates for BRKCB-1
1. Differentiate different forms of disease
2. Identify patient subgroups
3. Identify top biomarkers
4. Identify function
5. Identify biological and chemical properties and disease associations of biomarker
6. Identify documents
7. Identify role in metabolic pathways
8. Identify compounds that interact
9. Identify and compare function in other organisms
10. Identify any prior art
Gene
GO.rdf
Enzyme
Enzymes.rdf
19
Seamark Demonstration Identification of new
drug candidates for BRKCB-1
Gene
GO.rdf
GO2Enzyme.rdf
Enzyme
Enzymes.rdf
20
Seamark Demonstration Identification of new
drug candidates for BRKCB-1
Gene
MIM Id
OMIM.rdf
GO.rdf
GO2Enzyme.rdf
Enzyme
Enzymes.rdf
21
Seamark Demonstration Identification of new
drug candidates for BRKCB-1
GO2OMIM.rdf
Gene
MIM Id
OMIM.rdf
GO.rdf
GO2Enzyme.rdf
Enzyme
Enzymes.rdf
24
Aggregation Methodology
  • For each data source:
  • Motivation: Why is this database included? What
    will it allow the user to do?
  • Source: Where can the original data be
    found/downloaded?
  • Original format: RDF, XML, flat file
  • Link values: What predicate classes found in the
    data will be used to link to other data sources?
    What sources will it link to?
  • Transformation: How was the data changed to
    expose the links?
  • Notes: Instructive examples of usage, caveats,
    etc.

25
Aggregation Methodology
  • GO to Probe Set
  • Motivation: Links the Probe Set to the Genes
  • Source: Affymetrix, GeneCruiser
  • Original format: text
  • Link values: identified 42 genes to drive the
    demo
  • Transformation: Manually created from output of
    GeneCruiser
  • Notes: Scalable solution still needed.

26
GO to Probe Set
27
GO to Enzymes
  • Motivation: Links between GO and Enzyme
  • Source: ec2go.txt
  • Original format: flat file
  • Link values: GO id and EC number are referenced
  • Transformation: loaded the flat file into a
    relational table and issued an SQL query to
    create RDF
  • Notes: Need authoritative LSID
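
The transformation step above can be sketched without a database. The snippet below assumes the mapping file uses the usual "EC:x > GO:term name ; GO:id" line layout with "!" comment lines, and it emits N-Triples-style statements. The example.org URIs, the hasEnzyme predicate and the function names are hypothetical placeholders (the slide itself notes that an authoritative LSID is still needed), and the sample lines are illustrative rather than copied from a specific release of the file.

```python
SAMPLE_EC2GO = [
    "! illustrative sample lines in ec2go layout",
    "EC:5.3.1.9 > GO:glucose-6-phosphate isomerase activity ; GO:0004347",
    "EC:1.1.1.1 > GO:alcohol dehydrogenase (NAD+) activity ; GO:0004022",
]


def parse_ec2go(lines):
    """Yield (ec_number, go_id) pairs from ec2go-style mapping lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and comments
        left, _, right = line.partition(" > ")
        go_id = right.rsplit(";", 1)[-1].strip()  # the GO id follows the last ';'
        yield left.strip(), go_id


def to_ntriples(pairs, predicate="http://example.org/demo#hasEnzyme"):
    """Render each pair as one triple linking a GO term to an EC number.

    All URIs here are placeholders, not authoritative identifiers.
    """
    for ec_number, go_id in pairs:
        go_uri = "http://example.org/go/" + go_id.replace(":", "_")
        ec_uri = "http://example.org/ec/" + ec_number.split(":", 1)[1]
        yield f"<{go_uri}> <{predicate}> <{ec_uri}> ."


pairs = list(parse_ec2go(SAMPLE_EC2GO))
triples = list(to_ntriples(pairs))
for triple in triples:
    print(triple)
```

The link values (GO id and EC number) become the subject and object of each statement, which is what lets an aggregator join GO.rdf to Enzymes.rdf.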

28
Gene Ontology to Enzymes
29
Group Exercise I: You-do-it Aggregation
  • Aggregation or Aggravation?

30
Time Out (15 minutes)
35
Case study II: Integration. The BioPAX initiative
  • A. How to create a standard pathway
    representation?
  • B. Resources
  • Pathway Databases/data providers
  • Granting agencies/program managers
  • Software tools/tool developers
  • Ontologies/ontology experts
  • Leadership/dedicated group of users

36
Case study II: Integration. The BioPAX initiative
  • A. How to create a standard pathway
    representation?
  • B. Resources
  • Pathway Databases/data providers
  • Granting agencies/program managers
  • Software tools/tool developers
  • Ontologies/ontology experts
  • Leadership/dedicated group of lusers

37
Methodology part C: Take over the world or have
the world take over itself?
  • Develop bridging technologies
  • Develop pathway representation standard within
    the Life Science community (BioPAX) (Social
    Engineering!)
  • Utilize Semantic Web Integration Technologies
    (LSID, RDF/OWL)

38
Exchange Formats in Pathway Data Space (Scope)
Graphic from Mike Cary and Gary Bader
39
BioPAX Objectives
  • Accommodate existing database representations
  • Integration and exchange of pathway data
  • Interchange through a common (standard)
    representation
  • Provide a basis for future databases
  • Enable development of tools for searching and
    reasoning over the data

40
BioPAX Motivation
>180 DBs and tools
Application
Database
User
Before BioPAX
With BioPAX
Common format will make data more accessible,
promoting data sharing and distributed curation
efforts
41
BioPAX: Biological PAthway eXchange
  • An abstract data model for biological pathway
    integration
  • Initiative arose from the community

42
Biological pathways of the Cell: What is a
Pathway?
Glycolysis
Apoptosis
Lac Operon
Protein-Protein
Molecular Interaction Networks
Gene Regulation
Metabolic Pathways
Signaling Pathways
BioPAX Level 1
BioPAX Level 2
43
Data integration with BioPAX
  • Multiple kinds of pathway databases
  • metabolic
  • molecular interactions
  • signal transduction
  • Constructs designed for integration
  • DB References
  • XRefs (Publication, Unification, Relationship)
  • synonyms
  • provenance
  • OWL DL to enable reasoning

44
BioPAX Biochemical Reaction
OWL (schema)
Instances (Individuals) (data)
phosphoglucose isomerase
5.3.1.9
45
BioPAX uses other ontologies
  • Use pointers to existing ontologies to provide
    supplemental annotation where appropriate
  • Cellular location → GO Component
  • Cell type → Cell.obo
  • Organism → NCBI taxon DB
  • Incorporate other standards where appropriate
  • Chemical structure → SMILES, CML, InChI

46
BioPAX Ontology Overview
a set of interactions
parts
how the parts are known to interact
Level 1 v1.0 (July 7th, 2004)
47
BioPAX Ontology Top Level
  • Pathway
  • A set of interactions
  • E.g. Glycolysis, MAPK, Apoptosis
  • Interaction
  • A set of entities and some relationship between
    them
  • E.g. Reaction, Molecular Association, Catalysis
  • Physical Entity
  • A building block of simple interactions
  • E.g. Small molecule, Protein, DNA, RNA

Graphic from Gary Bader
48
BioPAX Ontology Root
  • Root class Entity
  • Any concept referred to as a discrete biological
    unit when describing pathways. This is the root
    class for all biological concepts in the
    ontology, which include pathways, interactions
    and physical entities
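
The top-level split described on the last two slides (a root Entity with Pathway, Interaction and Physical Entity beneath it) can be mirrored in a few lines of code. This is only an illustrative sketch: the Python class and field names are assumptions chosen for readability and are not the BioPAX OWL class or property names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    """Root: any discrete biological unit referred to when describing pathways."""
    name: str


@dataclass
class PhysicalEntity(Entity):
    """A building block of simple interactions (small molecule, protein, DNA, RNA)."""


@dataclass
class Interaction(Entity):
    """A set of entities and some relationship between them."""
    participants: List[Entity] = field(default_factory=list)


@dataclass
class Pathway(Entity):
    """A set of interactions."""
    components: List[Interaction] = field(default_factory=list)


def physical_entity_names(pathway):
    """List the names of all physical entities taking part in a pathway."""
    return [
        participant.name
        for interaction in pathway.components
        for participant in interaction.participants
        if isinstance(participant, PhysicalEntity)
    ]


g6p = PhysicalEntity("glucose 6-phosphate")
f6p = PhysicalEntity("fructose 6-phosphate")
isomerization = Interaction("G6P to F6P", participants=[g6p, f6p])
glycolysis = Pathway("Glycolysis", components=[isomerization])
print(physical_entity_names(glycolysis))
```

An interaction here cannot be described meaningfully without its participants, which matches the definition given on the following slide.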

49
Metabolic Pathways
  • Interaction sub-classes
  • Definition
  • An entity that defines a single biochemical
    interaction between two or more entities.
  • An interaction cannot be defined without the
    entities it relates.

50
Metabolic Pathways
  • Interaction sub-classes
  • Definition: Two terms exist under interaction:
    control and conversion. In future BioPAX levels,
    this list may be extended to include other
    classes, such as genetic interactions.

Examples: Enzyme catalysis controls a biochemical
reaction, transport catalysis controls transport,
a small molecule that inhibits a pathway by an
unknown mechanism controls the pathway.
51
Group Exercise II: Semantic Integration
  • Mapping of a BioCarta metabolic pathway
    to BioPAX.

52
Case study III: Inference of a metabolic flux
model from an annotated genome
  • A. How to infer steady-state flux distributions
    in single-cell organisms?
  • B. Information sources
  • Stoichiometric matrix
  • Thermodynamic constraints
  • Nutrient uptake rates
  • Biomass composition
  • Gene-protein-reaction associations

53
Case study III: A. How to infer steady-state
metabolic flux distributions in single-cell
organisms?
  • B. Information sources
  • Stoichiometric matrix
  • Thermodynamic constraints
  • Nutrient uptake rates
  • Biomass composition
  • Gene-protein-reaction associations

54
What is Metabolic Flux Analysis?
  • Starts with the metabolic network
  • Assumes steady-state behavior
  • Constrain with Thermodynamics
  • Add Nutrient conditions
  • Choose an objective: biomass growth
  • Predicts growth rate for mutant and wild-type
    organisms under different conditions.

55
Start with the metabolic network
56
Stoichiometric Matrix: Representation of the
metabolic network
R1: → A
R2: A → B
R3: A → C
R4: B + E → 2D
R5: → E
R6: 2B → C + F
R7: C → D
R8: D →
R9: F →
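As a small worked example, the nine reactions R1 to R9 of this slide can be turned into a stoichiometric matrix with one row per metabolite and one column per reaction. The dictionary encoding and the function name are assumptions made for this sketch. Substrates get negative coefficients and products positive ones.

```python
# Each reaction maps to (substrates, products) with stoichiometric coefficients.
REACTIONS = {
    "R1": ({}, {"A": 1}),                    # -> A (uptake)
    "R2": ({"A": 1}, {"B": 1}),              # A -> B
    "R3": ({"A": 1}, {"C": 1}),              # A -> C
    "R4": ({"B": 1, "E": 1}, {"D": 2}),      # B + E -> 2D
    "R5": ({}, {"E": 1}),                    # -> E (uptake)
    "R6": ({"B": 2}, {"C": 1, "F": 1}),      # 2B -> C + F
    "R7": ({"C": 1}, {"D": 1}),              # C -> D
    "R8": ({"D": 1}, {}),                    # D -> (drain)
    "R9": ({"F": 1}, {}),                    # F -> (drain)
}
METABOLITES = ["A", "B", "C", "D", "E", "F"]


def stoichiometric_matrix(reactions, metabolites):
    """Build S with rows = metabolites and columns = reactions (in dict order)."""
    names = list(reactions)
    matrix = [[0] * len(names) for _ in metabolites]
    for column, name in enumerate(names):
        substrates, products = reactions[name]
        for metabolite, coefficient in substrates.items():
            matrix[metabolites.index(metabolite)][column] -= coefficient
        for metabolite, coefficient in products.items():
            matrix[metabolites.index(metabolite)][column] += coefficient
    return matrix


S = stoichiometric_matrix(REACTIONS, METABOLITES)
for metabolite, row in zip(METABOLITES, S):
    print(metabolite, row)
```

With six metabolites and nine reactions, S has six rows and nine columns, so the steady-state system has more unknown fluxes than balance equations.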
57
What is a metabolic flux?
Source fluxes
Metabolite Pool
Sink fluxes
58
What is a metabolic flux?
For a reaction of stoichiometry R2: A → B, the
rate of reaction, or flux, is equal to
[equation image not in transcript]
For a reaction of stoichiometry R4: B + E →
2D, the flux is equal to
[equation image not in transcript]
59
What is a metabolic flux?
For a reaction of stoichiometry R4: B + E →
2D, the rate of reaction, or flux, is equal to
[equation image not in transcript]
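The rate expressions themselves are not present in this transcript. As a hedged reminder rather than a recovery of the slide, the standard definition of reaction rate divides each concentration change by its stoichiometric coefficient. This assumes the reaction in question is the only one acting on those species.

```latex
% R2: A -> B
v_2 = -\frac{d[A]}{dt} = \frac{d[B]}{dt}

% R4: B + E -> 2D
v_4 = -\frac{d[B]}{dt} = -\frac{d[E]}{dt} = \frac{1}{2}\,\frac{d[D]}{dt}
```

If mass-action kinetics is further assumed (an assumption suggested by the rate constants k1, k2, k3 on the next slide, not stated here), these would read v_2 = k_2 [A] and v_4 = k_4 [B][E], where k_4 is a hypothetical rate constant.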
60
At steady-state, nonlinear dynamics simplify to
linear fluxes.
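[Figure: the same small network drawn twice, once with kinetic labels (species Aext, A, B, C; rate constants k1, k2, k3; P1, P2, P3) and once with fluxes v1, v2, v3.]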
61
At steady-state, the sum of the fluxes that
produce a metabolite is equal to the sum of the
fluxes that consume it.
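In matrix form this balance is S · v = 0. The sketch below checks that condition for the R1 to R9 example network used in this tutorial. The matrix is written out by hand from those reactions, and the three flux vectors are illustrative values chosen for this example, not data from the slides.

```python
METABOLITES = ["A", "B", "C", "D", "E", "F"]

# Rows follow METABOLITES; columns are the fluxes v1..v9 of reactions R1..R9.
S = [
    [1, -1, -1,  0, 0,  0,  0,  0,  0],   # A
    [0,  1,  0, -1, 0, -2,  0,  0,  0],   # B
    [0,  0,  1,  0, 0,  1, -1,  0,  0],   # C
    [0,  0,  0,  2, 0,  0,  1, -1,  0],   # D
    [0,  0,  0, -1, 1,  0,  0,  0,  0],   # E
    [0,  0,  0,  0, 0,  1,  0,  0, -1],   # F
]


def net_production(matrix, fluxes):
    """Return production minus consumption for each metabolite (S times v)."""
    return [sum(c * f for c, f in zip(row, fluxes)) for row in matrix]


def is_steady_state(matrix, fluxes, tolerance=1e-9):
    """True when every metabolite is produced exactly as fast as it is consumed."""
    return all(abs(value) <= tolerance for value in net_production(matrix, fluxes))


# Illustrative flux vectors (v1..v9); the values are assumptions for this example.
through_b = [10, 10, 0, 10, 10, 0, 0, 20, 0]   # A -> B, then B + E -> 2D
through_c = [10, 0, 10, 0, 0, 0, 10, 10, 0]    # A -> C -> D
unbalanced = [10, 10, 0, 0, 0, 0, 0, 0, 0]     # B is made but never consumed

for label, v in [("through_b", through_b), ("through_c", through_c), ("unbalanced", unbalanced)]:
    print(label, is_steady_state(S, v), net_production(S, v))
```

Both balanced vectors satisfy the same six equations, which illustrates why extra constraints and an objective are needed to pick one flux distribution.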
[Figure: network of Aext, A, B and C with fluxes v1, v2 and v3.]
62
Stoichiometric Matrix: more unknowns than
equations
63
How to determine the metabolic capabilities of a
network?
[Figure: example network with metabolites A to F and fluxes v1 to v9; v1 and v5 are labelled Uptake, v8 is labelled Biomass, v9 is labelled Waste.]
64
Using elementary modes to study the steady-state
behavior
[Figure: three copies of the example network (metabolites A to F, fluxes v1 to v9), presumably each highlighting a different elementary mode.]
65
How to draw conclusions about the behavior of the
metabolic network?
[Figure: example network with metabolites A to F and fluxes v1 to v9; v1 and v5 are labelled Uptake, v8 is labelled Biomass, v9 is labelled Waste.]
66
Optimal wild-type flux distribution
[Figure: example network labelled "Optimal Growth Flux"; flux values shown on the diagram: 10, 10, 10, 10, 20.]
67
Optimal mutant flux distribution
[Figure: example network with one reaction marked STOP; flux values shown on the diagram: 10, 10, 10, 10.]
68
Suboptimal mutant flux distribution
[Figure: example network with one reaction marked STOP; flux values shown on the diagram: 6.7, 3.3, 10, 6.7, 3.3, 6.7, 3.3.]
69
Case III Integrated Metabolic flux model
  • good flux balance model
  • implicit schema
  • literature curated biochemical reactions
  • 904 enzymatic reactions
  • gene, enzyme-reaction associations

70
Model vs. Exper., Glucose limited
(fluxes in mmol/gr DM h normalized to glucose
uptake flux)
(Segrè, Vitkup and Church, PNAS 2002)
71
[Figure: three panels plotted against ν_i (exper): Low Glucose Limited, High Glucose Limited, Nitrogen Limited. Correlation coefficients as extracted, in the same order: 0.91, 0.97, 0.78.]
72
Max growth (optimal)
Min Adjust. (suboptimal)
[Figure: scatter plot of v_i (theor) against v_i (exper), both axes running from -50 to 250, points numbered 1 to 17. Corr. coeff. = 0.564, P-value = 0.007.]
73
The power of a model lies in its ability to
distinguish between competing hypotheses
74
Bugs in model assumptions revealed by comparison
to experiment
75
Data Profiling of Flux Model
  • Incorrect constraints (reversibility)
  • Incorrect Nutrient conditions
  • Incorrect Biomass composition
  • Incorrect protein function predictions

76
Data profiling of Flux Predictions
  • Incorrect hypothesis
  • (FBA vs MOMA vs ROOM)
  • Incorrect network architecture
  • (Gene knockouts)
  • Incorrect modeling assumptions
  • (steady state assumption, gene expression
    profiles)

77
Fixing the problems you find
  • Requires different amounts of time, money, and
    expertise
  • Enzyme Genomics project
  • Community annotation projects
  • Adopt-a-Genome project
  • High-throughput experiments
  • Pathway hole filling algorithms

78
Syntactic Bugs revealed by mass balance
  • maltodextrin + phosphate → maltotetraose +
    glucose-1-phosphate
  • maltotetraose + glucose-1-phosphate → glycogens +
    phosphate
  • glycogens + phosphate → maltodextrin +
    glucose-1-phosphate
  • --------------------------------------------------
  • Net: phosphate → glucose-1-phosphate !!!
  • Solution: Check that each reaction is balanced
  • Molecular weights: WATER = 18 daltons
  • Chemical formulae: (C 6) (H 12) (O 6)
  • Atomic structure
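
A balance check based on chemical formulae can be sketched in a few lines. The parser below handles only simple formulas without brackets or charges, the function names are illustrative, and the neutral formulas in the table (for example H3PO4 for phosphate) are assumptions made for this example rather than values taken from a pathway database.

```python
import re
from collections import Counter

# Neutral formulas assumed for illustration only.
FORMULAS = {
    "glucose": "C6H12O6",
    "water": "H2O",
    "phosphate": "H3PO4",
    "glucose-1-phosphate": "C6H13O9P",
}


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H12O6' (no brackets, no charges)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_total(side, formulas):
    """Sum the atom counts over one side of a reaction: {compound: coefficient}."""
    total = Counter()
    for compound, coefficient in side.items():
        for element, count in parse_formula(formulas[compound]).items():
            total[element] += coefficient * count
    return total


def imbalance(substrates, products, formulas):
    """Return {element: products minus substrates} for elements that do not balance."""
    left = side_total(substrates, formulas)
    right = side_total(products, formulas)
    return {e: right[e] - left[e] for e in set(left) | set(right) if right[e] != left[e]}


# Balanced: glucose-1-phosphate + water -> glucose + phosphate
balanced = imbalance({"glucose-1-phosphate": 1, "water": 1},
                     {"glucose": 1, "phosphate": 1}, FORMULAS)
# The net "reaction" from the slide: phosphate -> glucose-1-phosphate
net_bug = imbalance({"phosphate": 1}, {"glucose-1-phosphate": 1}, FORMULAS)
print(balanced, net_bug)
```

An empty result means the reaction is balanced. The net reaction from the slide is reported as gaining six carbons from nowhere, which is exactly the kind of syntactic bug the check is meant to catch.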

4-(1-D-ribitylamino)-5-amino-2,6-dihydroxypyrimidine
5-amino-6-ribitylamino-2,4(1H,3H)-pyrimidinedione
79
Semantic bugs revealed by chemical structure
EcoCyc 7.5 Pathway: Riboflavin and FMN and FAD
biosynthesis
No place to go!
4-(1-D-ribitylamino)-5-amino-2,6-dihydroxypyrimidine
80
Semantic bugs revealed by chemical structure
EcoCyc 8.0 Pathway: Riboflavin and FMN and FAD
biosynthesis
Synonyms
4-(1-D-ribitylamino)-5-amino-2,6-dihydroxypyrimidine
81
Bugs in Network structure revealed by Forward and
Backward chaining
Known Nutrient set
Fired Reaction
Unfired Reaction
Essential compounds
Missing essential compound
Biomass
82
Bugs in Network structure revealed by Forward and
Backward chaining
Unproduced metabolite
Precursor metabolite
Essential compounds
Missing essential compound
Biomass
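Forward chaining over a reaction network can be sketched as a simple fixed-point loop: starting from the known nutrient set, fire every reaction whose substrates are all available, add its products, and repeat until nothing new fires. The toy network, the nutrient set and the list of essential compounds below are hypothetical and chosen only to show an unfired reaction and a missing essential compound.

```python
# Toy network: reaction name -> (substrates, products). Illustrative only.
REACTIONS = {
    "R2": (["A"], ["B"]),
    "R3": (["A"], ["C"]),
    "R4": (["B", "E"], ["D"]),
    "R6": (["B"], ["C", "F"]),
    "R7": (["C"], ["D"]),
}


def forward_chain(reactions, nutrients):
    """Fire reactions whose substrates are all available until nothing changes."""
    available = set(nutrients)
    fired = set()
    changed = True
    while changed:
        changed = False
        for name, (substrates, products) in reactions.items():
            if name not in fired and set(substrates) <= available:
                fired.add(name)
                available |= set(products)
                changed = True
    return available, fired


nutrients = {"A"}              # known nutrient set (assumed)
essential = {"D", "E"}         # compounds required for biomass (assumed)

available, fired = forward_chain(REACTIONS, nutrients)
unfired = set(REACTIONS) - fired
missing = essential - available
print(sorted(available), sorted(unfired), sorted(missing))
```

Here R4 never fires because E is neither a nutrient nor produced by any reaction, so E shows up as a missing essential compound. Backward chaining would start from the essential compounds instead and search for reactions that produce them.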
83
Case III Semantic Aggregation Case study
  • Prochlorococcus marinus MED4
  • Most abundant species in the ocean
  • Responsible for a significant portion of
    photosynthetic carbon fixation.
  • Iron hypothesis: Possible solution to global
    warming?
  • Need to understand details of metabolic network

84
Group Exercise III Inference
  • Metabolic Inference
  • Inference of a metabolic reaction from a pathway

85
Time Out (15 minutes)
86
Lessons Learned (30 minutes)
  • What did you learn?
  • Discussion
  • A good representation is the key to good problem
    solving. --Patrick Winston
  • Standard is better than best. --Gerald J Sussman
  • The great thing about standards is that there
    are so many from which to choose. --Unknown
  • Above all, one must develop a feeling for the
    organism. --Barbara McClintock
  • Someone does it once, everybody benefits. --Eric
    Miller, W3C Semantic Web Activity Lead
  • Remember: people, process, technology. However,
    without people there isn't any process or
    technology, so it's all social engineering.

87
Discussions (30 minutes)
  • Does Inference subsume Integration?
  • Does Integration subsume Aggregation?
  • Overlaying Microarray data on to pathways
  • BioPAX used to make SBML more easily integratable
  • Schema level / instance level problems

88
Bridging Chemistry and Molecular Biology
  • Different views have different semantics: lenses
  • When there is a correspondence between objects,
    a semantic binding is possible

Uniprot:P49841
Apply Correspondence Rule: if ?target.xref.lsid ==
?bpx:prot.xref.lsid then ?target.correspondsTo.?bpx:prot
Source: Eric Neumann, Haystack BioDASH Demo
http://www.w3.org/2005/04/swls/BioDash/Demo/
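The correspondence rule amounts to a join on a shared identifier: whenever a target and a pathway protein carry the same LSID cross-reference, a correspondsTo link is asserted. The sketch below expresses that with plain dictionaries. The record layout, the record ids and the LSID strings are hypothetical, and this is not how the Haystack demo itself is implemented.

```python
def apply_correspondence_rule(targets, proteins):
    """Link each target to every protein record that shares an LSID cross-reference."""
    proteins_by_lsid = {}
    for protein in proteins:
        for lsid in protein["xref_lsids"]:
            proteins_by_lsid.setdefault(lsid, []).append(protein["id"])

    links = []
    for target in targets:
        for lsid in target["xref_lsids"]:
            for protein_id in proteins_by_lsid.get(lsid, []):
                links.append((target["id"], "correspondsTo", protein_id))
    return links


# Hypothetical records; the LSID strings are placeholders in an assumed format.
targets = [
    {"id": "target:1", "xref_lsids": ["urn:lsid:example.org:uniprot:P49841"]},
    {"id": "target:2", "xref_lsids": ["urn:lsid:example.org:uniprot:Q00000"]},
]
proteins = [
    {"id": "bpx:protein42", "xref_lsids": ["urn:lsid:example.org:uniprot:P49841"]},
]

links = apply_correspondence_rule(targets, proteins)
print(links)
```

Only the first target gains a link, because the second has no protein record with a matching cross-reference. The rule is only as good as the identifiers, which is why authoritative LSIDs come up repeatedly in this tutorial.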
89
SBML integration using BioPAX
  • Use BioPAX to address SBML's data integration
    issues
  • Different data types, same representation
  • Same data, different representations
  • External references
  • Synonyms
  • Provenance

90
A problem: same representation, different
semantics (SBML)
Protein-Protein Interaction
<reaction id="pyruvate_dehydrogenase_cplx">
  <listOfReactants>
    <speciesRef species="PdhA"/>
    <speciesRef species="PdhB"/>
  </listOfReactants>
  <listOfProducts>
    <speciesRef species="Pyruvate_dehydrogenase_E1"/>
  </listOfProducts>
</reaction>

Biochemical Reaction
<reaction id="pyruvate_dehydrogenase_rxn">
  <listOfReactants>
    <speciesRef species="NADP"/>
    <speciesRef species="CoA"/>
    <speciesRef species="pyruvate"/>
  </listOfReactants>
  <listOfProducts>
    <speciesRef species="NADPH"/>
    <speciesRef species="acetyl-CoA"/>
    <speciesRef species="CO2"/>
  </listOfProducts>
  <listOfModifiers>
    <modifierSpeciesRef species="pyruvate_dehydrogenase_E1"/>
  </listOfModifiers>
</reaction>
91
SBML annotated with BioPAX
<sbml xmlns:bp="http://www.biopax.org/release1/biopax-release1.owl"
      xmlns:owl="http://www.w3.org/2002/07/owl#"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <listOfSpecies>
    <species id="PdhA" metaid="PdhA">
      <annotation>
        <bp:protein rdf:ID="PdhA"/>
      </annotation>
    </species>
    <species id="NADP" metaid="NADP">
      <annotation>
        <bp:smallMolecule rdf:ID="NADP"/>
      </annotation>
    </species>
  </listOfSpecies>
  <listOfReactions>
    <reaction id="pyruvate_dehydrogenase_cplx">
      <annotation>
        <bp:complexAssembly rdf:ID="pyruvate_dehydrogenase_cplx"/>
      </annotation>

Species is a protein; the protein is PdhA.
Species is a small molecule; the small molecule is NADP.
92
BioPAX External References
<species id="pyruvate" metaid="pyruvate">
  <annotation xmlns:bp="http://biopax.org/release1/biopax-release1.owl">
    <bp:smallMolecule rdf:ID="pyruvate">
      <bp:Xref>
        <bp:unificationXref rdf:ID="unificationXref119">
          <bp:DB>LIGAND</bp:DB>
          <bp:ID>c00022</bp:ID>
        </bp:unificationXref>
      </bp:Xref>
    </bp:smallMolecule>
  </annotation>
</species>

93
BioPAX Synonyms
<species id="pyruvate" metaid="pyruvate">
  <annotation xmlns:bp="http://biopax.org/release1/biopax_release1.owl">
    <bp:smallMolecule rdf:ID="pyruvate">
      <bp:SYNONYMS>2-oxo-propionic acid</bp:SYNONYMS>
      <bp:SYNONYMS>2-oxopropanoate</bp:SYNONYMS>
      <bp:SYNONYMS>BTS</bp:SYNONYMS>
      <bp:SYNONYMS>pyruvic acid</bp:SYNONYMS>
    </bp:smallMolecule>
  </annotation>
</species>
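
Once the synonyms sit inside the annotation, a consumer can pull them out with any namespace-aware XML parser. The sketch below uses Python's standard library on the pyruvate fragment from this slide. The rdf namespace declaration is added on the annotation element so that the fragment parses on its own (an assumption of this example), and the function name is illustrative.

```python
import xml.etree.ElementTree as ET

BP = "http://biopax.org/release1/biopax_release1.owl"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Fragment from the slide, with xmlns:rdf declared locally so it is self-contained.
SPECIES_XML = f"""
<species id="pyruvate" metaid="pyruvate">
  <annotation xmlns:bp="{BP}" xmlns:rdf="{RDF}">
    <bp:smallMolecule rdf:ID="pyruvate">
      <bp:SYNONYMS>2-oxo-propionic acid</bp:SYNONYMS>
      <bp:SYNONYMS>2-oxopropanoate</bp:SYNONYMS>
      <bp:SYNONYMS>BTS</bp:SYNONYMS>
      <bp:SYNONYMS>pyruvic acid</bp:SYNONYMS>
    </bp:smallMolecule>
  </annotation>
</species>
"""


def synonyms_of(xml_text):
    """Return all bp:SYNONYMS values found anywhere under the species element."""
    species = ET.fromstring(xml_text.strip())
    # ElementTree addresses namespaced tags as '{namespace-uri}localname'.
    return [element.text for element in species.iter(f"{{{BP}}}SYNONYMS")]


synonyms = synonyms_of(SPECIES_XML)
print(synonyms)
```

Matching species across models by synonym is weaker than matching by unification cross-reference, so a sketch like this would normally be a fallback rather than the primary key.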

94
Lessons Not Yet Learned (Take home exercise)
95
Feedback
  • Our goal is to have you walk away with a clear
    understanding of how to approach any database
    integration project
  • To provide
  • A methodology to scope and plan the project
  • An understanding of what to expect
  • Some specific examples to illustrate what is
    common to all integration projects (data
    cleaning) and what is specific to a particular
    task (i.e., examples to give you a sense of it)
  • Some first hand experience at pedantic
    aggravation, irritation and interference
  • How did we do? Please let us know how we can
    improve this tutorial.

96
Thank You. Joanne and Jeremy