Semantic Aggregation, Integration, and Inference of Pathway Data

1
Semantic Aggregation, Integration, and Inference
of Pathway Data
(Pedantic Aggravation, Irritation, and
Interference)
  • Co-Destructors
  • Joanne Luciano, PhD
  • jluciano_at_biopathways.org
  • Jeremy Zucker
  • zucker_at_research.dfci.harvard.edu

ISMB 2005 Tutorial, Detroit, Michigan, June 25th,
2005
2
Overview
  • Introduction (45 minutes)
  • Time Out (15 minutes)
  • Workshop: Case Studies and Exercises (2 hrs 15
    minutes)
  • Subdivide into groups of triads and dyads
  • Case Study I (45 minutes)
  • Case Study II (45 minutes)
  • Case Study III (45 minutes)
  • Time Out (15 minutes)
  • Lessons Learned (30 minutes)
  • Lessons Not Yet Learned (take home)

3
Introduction (45 minutes)
  • Semantic Aggregation, Integration and Inference
    of Pathway Data
  • Pathway Data (domain)
  • What is it?
  • What does it look like?
  • Why do we care? (motivation)
  • Definitions and Disclaimers
  • Strategies

4
Pathway Data (domain)
What is it? Pathway Databases
So many pathway databases, so little time.
Graphic from Mike Cary and Gary Bader
5
Different types of pathways (different strokes
for different folks; it's OK)
Glycolysis
Protein-Protein
Apoptosis
Lac Operon
Molecular Interaction Networks
Gene Regulation
Signaling Pathways
Metabolic Pathways
The Main Categories
6
Different representations of the same pathways
<!ELEMENT reaction (substrate, product)>
<!ATTLIST reaction name %keggid.type; #REQUIRED>
<!ATTLIST reaction type %reaction-type.type; #REQUIRED>
<!ELEMENT substrate EMPTY>
<!ATTLIST substrate name %keggid.type; #REQUIRED>
<!ELEMENT product EMPTY>
<!ATTLIST product name %keggid.type; #REQUIRED>
Starts at α-D-Glucose 1P
KEGG Reference Pathway GLYCOLYSIS
7
Different representations of the same pathways
reactions.dat: This file lists all chemical
reactions in the PGDB.
Attributes: UNIQUE-ID, TYPES, COMMON-NAME,
ACTIVATORS, BASAL-TRANSCRIPTION-VALUE, DBLINKS,
DELTAG0, DEPRESSORS, EC-LIST, EC-NUMBER,
ENZYMATIC-REACTION, EQUILIBRIUM-CONSTANT,
IN-PATHWAY, INHIBITORS, LEFT, MOVED-IN, MOVED-OUT,
OFFICIAL-EC?, REACTANTS, REQUIREMENTS, RIGHT,
SIGNAL, SPECIES, SPONTANEOUS?, STIMULATORS,
SYNONYMS
Starts at β-D-glucose 6-phosphate
BioCYC Reference Pathway GLYCOLYSIS
8
Different representations of the same pathways
<reaction name="R_alpha_D_glucose_6_phosphate_D_fructose_6_phosphate" id="R_163457">
  <listOfReactants>
    <speciesReference species="R_30537_alpha_D_Glucose_6_phosphate" />
  </listOfReactants>
  <listOfProducts>
    <speciesReference species="R_29512_D_Fructose_6_phosphate" />
  </listOfProducts>
  <listOfModifiers>
    <modifierSpeciesReference species="R_163455_glucose_6_phosphate_isomerase_dimer_name_copied_from_complex_in_Homo_sapiens_" />
  </listOfModifiers>
</reaction>
Object counts by class:
DatabaseObject 41245
Event 8285
Reaction 6598
ConcreteReaction 4034
GenericReaction 2564
Reactome Pathway GLYCOLYSIS
9
Different representations of the same pathways
Does not compute. Pretty, but useless
Reactions clickable but...
Starts at Glucose (but it doesn't matter)
BioCarta Reference Pathway GLYCOLYSIS
10
Pathway Data Why do we care?
  • Pathway Research has Broad Impact
  • Drug Discovery (pathway of target, safety)
  • Basic Science (identify pathways)
  • Disease Research (cancer pathways)
  • Environmental Research (microbial research)
  • Combine knowledge from multiple sources
  • Whole is greater than the sum of its parts
  • Biological knowledge is fragmented
  • Need database to manage resources

11
Definitions and Disclaimers
  • Aggregation
  • 2 (or more) data sources, different data models,
    common link between (among) them.
  • Integration
  • 2 (or more) data sources, same data model,
    semantic mapping and instance merging required.
  • Inference
  • 1 (or more) data sources, one data model,
    creating new instances or new relationships.
  • (Evidence code type kind of inference)
  • Disclaimer
  • Controlled Vocabulary scope this tutorial

12
Assembling Knowledge: Aggregation, Integration,
Inference
"When it comes to data cleaning, there's no such
thing as a free lunch." (Tim Berners-Lee)
Some tasks are specific to a use case, some are
common to more than one, and there's no escaping
others.
13
Bridging Chemistry and Molecular Biology
  • Different Views have different semantics Lenses
  • When there is a correspondence between objects,
    a semantic binding is possible

Uniprot:P49841
Apply Correspondence Rule:
if ?target.xref.lsid == ?bpxprot.xref.lsid
then ?target.correspondsTo.?bpxprot
Source: Eric Neumann, Haystack BioDASH Demo
http://www.w3.org/2005/04/swls/BioDash/Demo/
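A minimal Python sketch of the correspondence rule above, assuming each record carries a set of cross-reference identifiers. The `Record` class, the record names, and the second identifier are hypothetical; only the accession P49841 comes from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One object in a view, e.g. a drug target or a BioPAX protein."""
    name: str
    xref_lsids: set
    corresponds_to: list = field(default_factory=list)


def apply_correspondence_rule(targets, bpx_proteins):
    """Bind each target to every protein that shares at least one xref LSID."""
    for target in targets:
        for prot in bpx_proteins:
            if target.xref_lsids & prot.xref_lsids:
                target.corresponds_to.append(prot)
    return targets


# Hypothetical records for illustration.
targets = [Record("target-1", {"uniprot:P49841"})]
proteins = [
    Record("protein-a", {"uniprot:P49841"}),
    Record("protein-b", {"uniprot:EXAMPLE"}),
]
apply_correspondence_rule(targets, proteins)
bound = [p.name for p in targets[0].corresponds_to]
print(bound)
```

The binding is only as good as the shared identifiers: two records that describe the same protein but cite different databases will not be linked by this rule.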
14
Seamark Demonstration Identification of new
drug candidates
  1. Differentiate different forms of disease
  2. Identify patient subgroups
  3. Identify top biomarkers
  4. Identify function
  5. Identify biological and chemical properties and
     disease associations of biomarker
  6. Identify documents
  7. Identify role in metabolic pathways
  8. Identify compounds that interact
  9. Identify and compare function in other organisms
  10. Identify any prior art
15
SBML integration using BioPAX
  • Use BioPAX to address SBML's data integration
    issues
  • Different data types, same representation
  • Same data, different representations
  • External references
  • Synonyms
  • Provenance

16
A problem: same representation, different
semantics (SBML)
  • Protein-Protein Interaction

    <reaction id="pyruvate_dehydrogenase_cplx">
      <listOfReactants>
        <speciesRef species="PdhA"/>
        <speciesRef species="PdhB"/>
      </listOfReactants>
      <listOfProducts>
        <speciesRef species="Pyruvate_dehydrogenase_E1"/>
      </listOfProducts>
    </reaction>

  • Biochemical Reaction

    <reaction id="pyruvate_dehydrogenase_rxn">
      <listOfReactants>
        <speciesRef species="NADP"/>
        <speciesRef species="CoA"/>
        <speciesRef species="pyruvate"/>
      </listOfReactants>
      <listOfProducts>
        <speciesRef species="NADPH"/>
        <speciesRef species="acetyl-CoA"/>
        <speciesRef species="CO2"/>
      </listOfProducts>
      <listOfModifiers>
        <modifierSpeciesRef species="pyruvate_dehydrogenase_E1"/>
      </listOfModifiers>
    </reaction>
17
SBML annotated with BioPAX
    <sbml xmlns:bp="http://www.biopax.org/release1/biopax-release1.owl#"
          xmlns:owl="http://www.w3.org/2002/07/owl#"
          xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <listOfSpecies>
        <species id="PdhA" metaid="PdhA">
          <annotation>
            <bp:protein rdf:ID="PdhA"/>
          </annotation>
        </species>
        <species id="NADP" metaid="NADP">
          <annotation>
            <bp:smallMolecule rdf:ID="NADP"/>
          </annotation>
        </species>
      </listOfSpecies>
      <listOfReactions>
        <reaction id="pyruvate_dehydrogenase_cplx">
          <annotation>
            <bp:complexAssembly rdf:ID="pyruvate_dehydrogenase_cplx"/>
          </annotation>

Species is a protein; the protein is PdhA.
Species is a small molecule; the small molecule is NADP.
18
BioPAX External References
    <species id="pyruvate" metaid="pyruvate">
      <annotation
          xmlns:bp="http://biopax.org/release1/biopax-release1.owl#">
        <bp:smallMolecule rdf:ID="pyruvate">
          <bp:Xref>
            <bp:unificationXref rdf:ID="unificationXref119">
              <bp:DB>LIGAND</bp:DB>
              <bp:ID>c00022</bp:ID>
            </bp:unificationXref>
          </bp:Xref>
        </bp:smallMolecule>
      </annotation>
    </species>

19
BioPAX Synonyms
    <species id="pyruvate" metaid="pyruvate">
      <annotation xmlns:bp="http://biopax.org/release1/biopax_release1.owl#">
        <bp:smallMolecule rdf:ID="pyruvate">
          <bp:SYNONYMS>2-oxo-propionic acid</bp:SYNONYMS>
          <bp:SYNONYMS>2-oxopropanoate</bp:SYNONYMS>
          <bp:SYNONYMS>BTS</bp:SYNONYMS>
          <bp:SYNONYMS>pyruvic acid</bp:SYNONYMS>
        </bp:smallMolecule>
      </annotation>
    </species>
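A minimal sketch of how a tool might read such annotations back out of an SBML species, using only the Python standard library. The embedded fragment merges the external-reference and synonym slides into one species for illustration; the function name `describe_species` is hypothetical, and the namespace URI is assumed to end in `#`.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs (trailing '#' as in RDF/OWL conventions).
BP = "http://biopax.org/release1/biopax-release1.owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

SBML_FRAGMENT = f"""
<species xmlns:bp="{BP}" xmlns:rdf="{RDF}" id="pyruvate" metaid="pyruvate">
  <annotation>
    <bp:smallMolecule rdf:ID="pyruvate">
      <bp:SYNONYMS>pyruvic acid</bp:SYNONYMS>
      <bp:SYNONYMS>2-oxopropanoate</bp:SYNONYMS>
      <bp:Xref>
        <bp:unificationXref rdf:ID="unificationXref119">
          <bp:DB>LIGAND</bp:DB>
          <bp:ID>c00022</bp:ID>
        </bp:unificationXref>
      </bp:Xref>
    </bp:smallMolecule>
  </annotation>
</species>
"""


def describe_species(xml_text):
    """Return the BioPAX class, synonyms, and unification xrefs of a species."""
    species = ET.fromstring(xml_text)
    ns = {"bp": BP}
    entity = species.find("annotation/*")      # first annotated BioPAX entity
    kind = entity.tag.split("}", 1)[1]         # strip the '{namespace}' prefix
    synonyms = [s.text for s in entity.findall("bp:SYNONYMS", ns)]
    xrefs = [
        (x.findtext("bp:DB", namespaces=ns), x.findtext("bp:ID", namespaces=ns))
        for x in entity.iter(f"{{{BP}}}unificationXref")
    ]
    return {"id": species.get("id"), "kind": kind,
            "synonyms": synonyms, "xrefs": xrefs}


info = describe_species(SBML_FRAGMENT)
print(info)
```

The BioPAX class recovered here (`smallMolecule`) is what lets a consumer tell a metabolite from a protein even though SBML represents both as `species`.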

20
Strategies
How do we get to a Standard Pathway Representation?
(Game plan: take over the world, or have the
world take over itself?)
  • Develop bridging technologies
  • Develop pathway representation standard within
    the Life Science community (BioPAX) (Social
    Engineering!)
  • Utilize Semantic Web Integration Technologies
    (LSID, RDF/OWL)

21
Exchange Formats in Pathway Data Space (Scope)
Graphic from Mike Cary and Gary Bader
22
BioPAX Objectives
  • Accommodate existing database representations
  • Integration and exchange of pathway data
  • Interchange through a common (standard)
    representation
  • Provide a basis for future databases
  • Enable development of tools for searching and
    reasoning over the data

23
BioPAX Motivation
>180 DBs and tools
Application
Database
User
Before BioPAX
With BioPAX
Common format will make data more accessible,
promoting data sharing and distributed curation
efforts
24
BioPAX Biological PAthway eXchange
  • A data exchange ontology and format for
    biological pathway integration, aggregation and
    inference
  • Initiative arose from the community

25
Biological pathways of the Cell: What is a
Pathway?
Glycolysis
Apoptosis
Lac Operon
Protein-Protein
Molecular Interaction Networks
Gene Regulation
Metabolic Pathways
Signaling Pathways
BioPAX Level 1
BioPAX Level 2
26
Aggregation, Integration, Inference
  • Multiple kinds of pathway databases
  • metabolic
  • molecular interactions
  • signal transduction
  • Constructs designed for integration
  • DB References
  • XRefs (Publication, Unification, Relationship)
  • synonyms
  • provenance
  • OWL DL to enable reasoning

27
BioPAX Biochemical Reaction
OWL (schema)
Instances (Individuals) (data)
phosphoglucose isomerase
5.3.1.9
28
BioPAX Ontology
  • Conceptual framework based upon existing DB
    schemas
  • aMAZE, BIND, EcoCyc, WIT, KEGG, Reactome, etc.
  • Allows wide range of detail, multiple levels of
    abstraction
  • BioPAX ontology in OWL (XML)
  • Designed for pathway database integration
  • Database ID
  • Unification X-REF
  • Relationship X-REF
  • Publication X-REF
  • Synonyms
  • Provenance

29
BioPAX uses other ontologies
  • Use pointers to existing ontologies to provide
    supplemental annotation where appropriate
  • Cellular location → GO Component
  • Cell type → Cell.obo
  • Organism → NCBI taxon DB
  • Incorporate other standards where appropriate
  • Chemical structure → SMILES, CML, InChI

30
BioPAX Ontology Overview
a set of interactions
parts
how the parts are known to interact
Level 1 v1.0 (July 7th, 2004)
31
BioPAX Ontology Top Level
  • Pathway
  • A set of interactions
  • E.g. Glycolysis, MAPK, Apoptosis
  • Interaction
  • A set of entities and some relationship between
    them
  • E.g. Reaction, Molecular Association, Catalysis
  • Physical Entity
  • A building block of simple interactions
  • E.g. Small molecule, Protein, DNA, RNA

Graphic from Gary Bader
32
BioPAX Ontology Root
  • Root class: Entity
  • Any concept referred to as a discrete biological
    unit when describing pathways. This is the root
    class for all biological concepts in the
    ontology, which include pathways, interactions
    and physical entities

33
Metabolic Pathways
  • Interaction sub-classes
  • Definition
  • An entity that defines a single biochemical
    interaction between two or more entities.
  • An interaction cannot be defined without the
    entities it relates.

34
Metabolic Pathways
  • Interaction sub-classes
  • Definition: Two terms exist under interaction:
    control and conversion. In future BioPAX levels,
    this list may be extended to include other
    classes, such as genetic interactions.

Examples: Enzyme catalysis controls a biochemical
reaction; transport catalysis controls transport;
a small molecule that inhibits a pathway by an
unknown mechanism controls the pathway.
35
BioPAX as a solution to Aggregation, Integration,
Inference
  • Multiple kinds of pathway databases
  • metabolic
  • molecular interactions
  • signal transduction
  • gene regulatory
  • Constructs designed for integration
  • DB References
  • XRefs (Publication, Unification, Relationship)
  • Synonyms
  • Provenance (not yet implemented)
  • OWL DL to enable reasoning

36
Time Out (15 minutes)
37
Workshop: Case Studies and Exercises (2 hrs 15
minutes)
  • Break into groups of triads and dyads
  • Case Study I (45 minutes)
  • Use Case 1: Inference of a Metabolic Flux Model
    from an Annotated Genome
  • Group Exercise 1
  • Case Study II (45 minutes)
  • Use Case 2: Integration of a metabolic flux model
    from two sources
  • Group Exercise 2
  • Case Study III (45 minutes)
  • Use Case 3: Multi-source aggregation, Validation
    and Testing
  • Group Exercise 3

38
Methodology
  • Define the goal of the integration
  • How will the integrated data be used?
  • This defines the level of integration from
    syntactic through semantic
  • Take stock of current resources
  • This defines your starting point
  • Database sources, programmers, lab access,
    collaborators
  • Scope the work to get from B to A
  • Data Profiling
  • Resource Profiling

39
3 Case Studies
  • Case study I: Semantic Inference of metabolic
    pathway data from an annotated genome.
  • Case study II: Semantic Integration of a
    metabolic flux model from two sources.
  • Case study III: Semantic Aggregation of pathway
    data from multiple sources.

40
Case Study I: Inference of a Metabolic Flux Model
from an Annotated Genome
  • Objective: To apply biological knowledge to
    constrain the possible behaviors of a metabolic
    network.
  • Resources: Annotated Genome, Transport DB,
    Pathway databases, experimental community,
    published literature

41
Genes make RNA make Protein
[Figure: Gene1-Gene9 are transcribed to RNA 1-RNA 9
and translated to proteins P1-P9. Legend: Gene, RNA,
Protein, Transporter, Enzyme. Arrows: Transcription,
Translation.]
42
Proteins catalyze biochemical reactions
[Figure: proteins P1-P9 shown as transporters and
enzymes acting on metabolites A-F between the
Periplasm and the Cytoplasm. Legend: Metabolites A-F,
Transporter, Enzyme, Catalyzes, Reaction.]
43
Biochemical reactions comprise a metabolic
network
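[Figure: network of reactions R1-R9 over metabolites
A-F. R1 (uptake of A) and R5 (uptake of E) feed the
network, R8 is the biomass reaction, and R9 is the
waste reaction.]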
44
Metabolic Inference Subgoals
  • Infer genes from sequence and homology
  • Infer enzymatic reactions from Enzyme Commission
    (EC) numbers
  • Infer metabolic reaction network from enzymatic
    reactions and metabolites.
  • Infer pathway holes using network debugging
    algorithms
  • Propose candidate enzymes using pathway-hole
    filling algorithms
  • Add experimentally verified candidates to the
    annotated genome
  • Lather, rinse, repeat
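The second subgoal can be sketched as a table lookup. This is only an illustration, assuming a hypothetical EC-to-reaction table and hypothetical gene names; EC 5.3.1.9 (phosphoglucose isomerase) is the one EC number that appears in this tutorial.

```python
# Hypothetical lookup table; a real one would come from a pathway database.
EC_TO_REACTION = {
    "5.3.1.9": "D-glucose 6-phosphate <=> D-fructose 6-phosphate",
}


def infer_reactions(gene_annotations):
    """Split annotated genes into inferred reactions and unresolved genes.

    gene_annotations maps a gene name to its EC number (or None).
    """
    inferred, unresolved = {}, []
    for gene, ec_number in gene_annotations.items():
        reaction = EC_TO_REACTION.get(ec_number)
        if reaction is None:
            unresolved.append(gene)   # no EC number, or EC number unknown
        else:
            inferred[gene] = reaction
    return inferred, unresolved


# Hypothetical annotated genome with one resolvable gene.
genes = {"geneA": "5.3.1.9", "geneB": None, "geneC": "9.9.9.9"}
inferred, unresolved = infer_reactions(genes)
print(inferred)
print(unresolved)
```

The unresolved list is the starting point for the later subgoals: those genes are candidates for pathway-hole filling or experimental follow-up.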

45
Data Profiling of the Annotated Genome
  • Orphaned genes
  • Orphaned enzymes
  • Misannotated genes
  • Misannotated enzymes
  • Sequencing errors
  • BLAST Algorithm errors

46
Schema Level Errors
Biochemical reaction
Biochemical reaction
Enzyme (protein) that catalyzes the biochemical
reaction
Gene that codes for the gene product (protein
enzyme)
47
Semantic bugs revealed by chemical structure
EcoCyc 7.5 Pathway Riboflavin and FMN and FAD
biosynthesis
No place to go!
4-(1-D-ribitylamino)-5-amino-2,6-dihydroxypyrimidi
ne
48
Semantic bugs revealed by chemical structure
EcoCyc 8.0 Pathway Riboflavin and FMN and FAD
biosynthesis
Synonyms
4-(1-D-ribitylamino)-5-amino-2,6-dihydroxypyrimidi
ne
49
Data Profiling of Pathway/Genome Database
  • Unbalanced Reactions
  • Pathway holes
  • Unproducible metabolites
  • Generalized Metabolites
  • Unconsumable metabolites (toxins)
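As one example of such a check, an unbalanced reaction can be detected by comparing element totals on each side. This is a sketch that assumes every metabolite has a known elemental formula; the function names are hypothetical, and the second reaction is deliberately made up to be unbalanced.

```python
from collections import Counter


def element_totals(side, formulas):
    """Sum element counts over one side of a reaction.

    side maps metabolite name to stoichiometric coefficient.
    """
    totals = Counter()
    for metabolite, coeff in side.items():
        for element, count in formulas[metabolite].items():
            totals[element] += coeff * count
    return totals


def is_balanced(left, right, formulas):
    """A reaction is balanced when both sides have the same element totals."""
    return element_totals(left, formulas) == element_totals(right, formulas)


# Glucose 6-phosphate and fructose 6-phosphate are isomers (C6H13O9P).
FORMULAS = {
    "glucose-6-phosphate": {"C": 6, "H": 13, "O": 9, "P": 1},
    "fructose-6-phosphate": {"C": 6, "H": 13, "O": 9, "P": 1},
    "glucose": {"C": 6, "H": 12, "O": 6},
}

isomerase_ok = is_balanced({"glucose-6-phosphate": 1},
                           {"fructose-6-phosphate": 1}, FORMULAS)
# Made-up reaction that silently loses the phosphate group.
broken_ok = is_balanced({"glucose-6-phosphate": 1}, {"glucose": 1}, FORMULAS)
print(isomerase_ok, broken_ok)
```

Generalized metabolites (for example "an alcohol") have no fixed formula, so a check like this has to skip or flag them rather than report them as unbalanced.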

50
Bugs in Network structure revealed by Forward and
Backward chaining
[Figure legend: Known nutrient set, Fired reaction,
Unfired reaction, Essential compounds, Missing
essential compound, Biomass.]
51
Bugs in Network structure revealed by Forward and
Backward chaining
[Figure legend: Unproduced metabolite, Precursor
metabolite, Essential compounds, Missing essential
compound, Biomass.]
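Forward chaining can be sketched in a few lines: start from the nutrient set, fire every reaction whose substrates are all available, add its products, and repeat until nothing changes. The sketch below uses the internal reactions of the toy network from the earlier slides (R2, R3, R4, R6, R7, R8, R9) and ignores stoichiometric coefficients. Treating D as the essential compound is an assumption made for illustration.

```python
# Toy network: reaction -> (substrates, products). Uptake reactions are
# replaced by an explicit nutrient set.
REACTIONS = {
    "R2": (["A"], ["B"]),
    "R3": (["A"], ["C"]),
    "R4": (["B", "E"], ["D"]),
    "R6": (["B"], ["C", "F"]),
    "R7": (["C"], ["D"]),
    "R8": (["D"], []),
    "R9": (["F"], []),
}


def forward_chain(reactions, nutrients):
    """Return (reachable metabolites, fired reactions) from a nutrient set."""
    available = set(nutrients)
    fired = set()
    changed = True
    while changed:
        changed = False
        for name, (substrates, products) in reactions.items():
            if name not in fired and all(s in available for s in substrates):
                fired.add(name)
                available.update(products)
                changed = True
    return available, fired


ESSENTIAL = {"D"}  # assumed essential compound for this illustration

avail_a, fired_a = forward_chain(REACTIONS, {"A"})
unfired_a = set(REACTIONS) - fired_a
missing_a = ESSENTIAL - avail_a

avail_e, fired_e = forward_chain(REACTIONS, {"E"})
missing_e = ESSENTIAL - avail_e
print(sorted(unfired_a), sorted(missing_a), sorted(missing_e))
```

With only A supplied, R4 never fires because E is absent, yet D is still reachable through R7. With only E supplied, nothing fires and D is reported as a missing essential compound. Backward chaining runs the same idea in reverse, from the essential compounds toward the precursors that cannot be produced.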
52
Case Study II: Integration of a metabolic flux
model from two sources
  • What is metabolic flux analysis?
  • How does one build a metabolic flux model?
  • What can go wrong in building a metabolic flux
    model?

53
What is Metabolic Flux Analysis?
  • Starts with the metabolic network
  • Assumes steady-state behavior
  • Constrain with Thermodynamics
  • Add Nutrient conditions
  • Choose an objective: biomass growth
  • Predicts growth rate for mutant and wild-type
    organisms under different conditions.

54
Start with the metabolic network
55
Stoichiometric Matrix Representation of the
metabolic network
R1: → A
R2: A → B
R3: A → C
R4: B + E → 2D
R5: → E
R6: 2B → C + F
R7: C → D
R8: D →
R9: F →
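A sketch of how this reaction list becomes a stoichiometric matrix, with one row per metabolite and one column per reaction. It assumes the list reads R4: B + E → 2D and R6: 2B → C + F, with consumed metabolites negative and produced metabolites positive. Plain Python lists are used to keep the example dependency-free.

```python
METABOLITES = ["A", "B", "C", "D", "E", "F"]

# Reaction -> {metabolite: coefficient}; negative = consumed, positive = produced.
REACTIONS = {
    "R1": {"A": 1},
    "R2": {"A": -1, "B": 1},
    "R3": {"A": -1, "C": 1},
    "R4": {"B": -1, "E": -1, "D": 2},
    "R5": {"E": 1},
    "R6": {"B": -2, "C": 1, "F": 1},
    "R7": {"C": -1, "D": 1},
    "R8": {"D": -1},
    "R9": {"F": -1},
}


def stoichiometric_matrix(metabolites, reactions):
    """Build S as a list of rows; S[i][j] is metabolite i in reaction j."""
    return [[coeffs.get(m, 0) for coeffs in reactions.values()]
            for m in metabolites]


S = stoichiometric_matrix(METABOLITES, REACTIONS)
for metabolite, row in zip(METABOLITES, S):
    print(metabolite, row)
```

The matrix has 6 rows and 9 columns, so the steady-state condition gives 6 equations in 9 unknown fluxes.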
56
What is a metabolic flux?
Source fluxes
Metabolite Pool
Sink fluxes
57
What is a metabolic flux?
For a reaction of stoichiometry R2: A → B, the
rate of reaction, or flux, is equal to
For a reaction of stoichiometry R4: B + E →
2D, the flux is equal to
58
What is a metabolic flux?
For a reaction of stoichiometry R4: B + E →
2D, the rate of reaction, or flux, is equal to
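The rate expressions themselves did not survive in this transcript. As a hedged illustration only, assuming simple mass-action kinetics with rate constants $k_2$ and $k_4$ (which may not be the form shown on the original slides), the two fluxes would read:

```latex
v_2 = k_2\,[A], \qquad v_4 = k_4\,[B]\,[E]
```

Under this assumption $v_4$ is nonlinear in the concentrations, which is why the steady-state simplification on the next slide matters.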
59
At steady-state, nonlinear dynamics simplify to
linear fluxes.
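[Figure: two panels showing metabolites Aext, A, B,
and C. One is labeled with rate constants k1-k3 and
enzymes P1-P3, the other with fluxes v1-v3.]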
60
At steady-state, the sum of the fluxes that
produce a metabolite is equal to the sum of the
fluxes that consume it.
[Figure: metabolites Aext, A, B, and C connected by
fluxes v1, v2, and v3.]
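As a worked instance, assuming the figure shows $v_1$ producing A from Aext while $v_2$ and $v_3$ consume A to make B and C, the balance on A is:

```latex
\frac{d[A]}{dt} = v_1 - v_2 - v_3 = 0
\quad\Longrightarrow\quad
v_1 = v_2 + v_3
```

Writing one such balance per metabolite gives the matrix form $S\,v = 0$, where $S$ is the stoichiometric matrix and $v$ is the vector of fluxes.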
61
Stoichiometric Matrix: more unknowns than
equations
62
How to determine the metabolic capabilities of a
network?
[Figure: the metabolic network over metabolites A-F
with fluxes v1-v9. v1 is uptake of A, v5 is uptake
of E, v8 is biomass, and v9 is waste.]
63
Using Elementary modes to study the steady-state
behavior
[Figure: three copies of the metabolic network
(metabolites A-F, fluxes v1-v9), each highlighting
a different elementary mode.]
64
How to make predictions about the behavior of the
metabolic network?
[Figure: the metabolic network over metabolites A-F
with fluxes v1-v9. v1 is uptake of A, v5 is uptake
of E, v8 is biomass, and v9 is waste.]
65
Optimal wild-type flux distribution
[Figure: the metabolic network (fluxes v1-v9)
annotated with the optimal growth flux. Flux labels
shown: 10, 10, 10, 10, and 20.]
66
Optimal mutant flux distribution
[Figure: the metabolic network with flux v2 blocked
(STOP). Flux labels shown: 10, 10, 10, and 10.]
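The optimal wild-type and mutant distributions can be reproduced on the toy network with a small sketch. A real flux balance analysis solves a linear program; here a brute-force search over an integer grid stands in for the solver so the example needs no external library. The uptake limit of 10 on v1 and v5, the non-negativity of every flux, and the function names are assumptions made for illustration.

```python
from itertools import product


def fluxes_from_free(v3, v4, v6):
    """Steady-state fluxes implied by three free fluxes (solving S v = 0)."""
    v2 = v4 + 2 * v6                 # balance on B: v2 = v4 + 2*v6
    return {
        "v1": v2 + v3,               # balance on A: v1 = v2 + v3
        "v2": v2,
        "v3": v3,
        "v4": v4,
        "v5": v4,                    # balance on E: v5 = v4
        "v6": v6,
        "v7": v3 + v6,               # balance on C: v7 = v3 + v6
        "v8": 2 * v4 + v3 + v6,      # balance on D: v8 = 2*v4 + v7
        "v9": v6,                    # balance on F: v9 = v6
    }


def best_growth(uptake_limit=10, knocked_out=()):
    """Grid search for the steady state that maximizes the biomass flux v8."""
    best = None
    for v3, v4, v6 in product(range(uptake_limit + 1), repeat=3):
        v = fluxes_from_free(v3, v4, v6)
        if v["v1"] > uptake_limit or v["v5"] > uptake_limit:
            continue                 # violates an assumed uptake limit
        if any(v[name] != 0 for name in knocked_out):
            continue                 # a knocked-out reaction must carry no flux
        if best is None or v["v8"] > best["v8"]:
            best = v
    return best


wild_type = best_growth()
mutant = best_growth(knocked_out=("v2",))
print(wild_type)
print(mutant)
```

Under these assumptions the wild type routes everything through v2 and v4 for a biomass flux of 20, while blocking v2 forces the flux through v3 and v7 and halves the biomass flux to 10.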
67
Suboptimal mutant flux distribution
[Figure: the metabolic network with flux v2 blocked
(STOP). Flux labels shown: 10, 6.7, 6.7, 6.7, 3.3,
3.3, and 3.3.]
68
Case II: Palsson JR904
  • good flux balance model
  • implicit schema
  • literature curated biochemical reactions
  • 904 enzymatic reactions
  • gene, enzyme-reaction associations

69
Case II: What sources of data are available to
build a Metabolic Flux model?
  • Annotated Genome
  • Literature
  • Pathway Databases
  • Experimental measurements

70
Model vs. Exper., Glucose limited
(fluxes in mmol/gr DM h normalized to glucose
uptake flux)
(Segrè, Vitkup and Church, PNAS 2002)
71
[Figure: three panels plotted against v_i (exper):
Low Glucose Limited, High Glucose Limited, and
Nitrogen Limited. Correlation coefficients shown:
0.91, 0.97, and 0.78.]
72
[Figure: scatter plot of v_i (theor) against
v_i (exper), both axes running from -50 to 250,
with numbered data points. Panel titles: Max growth
(optimal) and Min Adjust. (suboptimal). Corr. coeff.
0.564, P-value 0.007.]
73
The power of a model lies in its ability to
distinguish between competing hypotheses
74
Case II: EcoCyc
  • good schema
  • Flux balance model doesn't work

75
What happens if the steady-state behavior of the
model fails to reproduce the steady-state
behavior of the organism?
[Figure: workflow diagram with the components
Genome, PathoLogic, Pathway/Genome Database,
Transporter prediction, BioCyc to SBML, Model
Definition (SBML), Nutrients and Objective,
FBA/MOMA, and Flux prediction.]
76
What happens if the steady-state behavior of the
model fails to reproduce the steady-state
behavior of the organism?
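[Figure: the same workflow diagram with a Network
Debugging step added. Components: Genome,
PathoLogic, Pathway/Genome Database, Transporter
prediction, BioCyc to SBML, Model Definition (SBML),
Nutrients and Objective, FBA/MOMA, Network
Debugging, and Flux prediction.]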
77
Case II: EcoCyc/JR904
  • Best of both worlds
  • Biological Objective: From nutrients, create all
    essential compounds required for growth
  • True test of metabolic databases: Is the data
    good enough to predict growth rate under
    different nutrient conditions and the effect of
    gene knockouts?

78
Case II: Schema level integration
  • Translation from BioCyc ontology to BioPAX
    ontology
  • Translation of implicit JR904 schema to BioPAX
    ontology
  • Integration of JR904 concepts with BioPAX
    ontology (flux limits)

79
Case II: Instance level
  • EcoCyc <-> JR904: Gene names
  • EcoCyc <-> JR904: Enzyme names
  • EcoCyc <-> JR904: Reaction names
  • EcoCyc <-> JR904: Reversibility/flux limits
  • EcoCyc <-> JR904: Gene->protein associations
  • EcoCyc <-> JR904: protein->enzyme complex
    associations
  • EcoCyc <-> JR904: enzyme->reaction
    associations
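A sketch of what one of these instance-level mappings might look like: pair reaction records from two sources by normalized name or synonym, then flag pairs that disagree on reversibility. The record layout, identifiers, and reversibility values below are all hypothetical and are not taken from EcoCyc or JR904.

```python
def normalize(name):
    """Case-fold and drop punctuation so spelling variants compare equal."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def match_by_name(source_a, source_b):
    """Pair records whose name or any synonym normalizes to the same key."""
    index = {}
    for rec in source_b:
        for name in [rec["name"], *rec.get("synonyms", [])]:
            index.setdefault(normalize(name), rec)
    pairs, unmatched = [], []
    for rec in source_a:
        keys = [normalize(n) for n in [rec["name"], *rec.get("synonyms", [])]]
        hit = next((index[k] for k in keys if k in index), None)
        if hit is None:
            unmatched.append(rec)
        else:
            pairs.append((rec, hit))
    return pairs, unmatched


def reversibility_conflicts(pairs):
    """List matched pairs whose reversibility flags disagree."""
    return [(a["name"], b["name"]) for a, b in pairs
            if a["reversible"] != b["reversible"]]


# Hypothetical records for illustration only.
source_a = [
    {"name": "RXN-A", "synonyms": ["Phosphoglucose isomerase"], "reversible": True},
    {"name": "RXN-B", "synonyms": [], "reversible": False},
]
source_b = [
    {"name": "rxn_a", "synonyms": ["phosphoglucose-isomerase"], "reversible": False},
]

pairs, unmatched = match_by_name(source_a, source_b)
conflicts = reversibility_conflicts(pairs)
print(conflicts, [r["name"] for r in unmatched])
```

Name matching is the weakest form of identity. Where both sources carry unification cross-references, matching on those first and falling back to names is likely to give fewer false pairings.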

80
Data Profiling of Flux Model
  • Incorrect constraints (reversibility)
  • Incorrect Nutrient conditions
  • Incorrect Biomass composition
  • Incorrect protein function predictions

81
Data profiling of Flux Predictions
  • Incorrect hypothesis
  • (FBA vs MOMA vs ROOM)
  • Incorrect network architecture
  • (Gene knockouts)
  • Incorrect modeling assumptions
  • (steady state assumption,
    gene expression profiles)

82
Fixing the problems you find
  • Requires different amounts of time, money, and
    expertise
  • Enzyme Genomics project
  • Community annotation projects
  • Adopt-a-Genome project
  • High-throughput experiments
  • Pathway hole filling algorithms

83
Case III: Semantic Aggregation Case study
  • Prochlorococcus marinus MED4
  • Most abundant species in the ocean
  • Responsible for a significant portion of
    photosynthetic carbon fixation.
  • Iron hypothesis: Possible solution to global
    warming?
  • Need to understand details of metabolic network

84
Case III: Multi-source aggregation
  • Public
  • KEGG (metabolism)
  • BioCyc (metabolism)
  • WIT (metabolism)
  • TransportDB (transport proteins)
  • Local
  • RNA expression (microarrays)
  • protein expression (mass spec)

85
Case III: Goal
  • Constrain metabolic flux model with
    experimental measurements
  • RNA expression
  • Protein expression
  • Metabolite concentrations
  • Flux measurements

86
Case III: Aggregation Problems
  • Higher Level: Orphan enzymes
  • Schema Level: Bridge ontologies
  • Instance Level: Object identity problem
  • Simulation Level: Underdetermined system

87
Case III: Multi-source aggregation, Validation and
Testing
  • Joint-learning from multiple sources
  • Semantic test suite for data validation
  • Network debugging algorithms

88
Time Out (15 minutes)
89
Lessons Learned (30 minutes)
  • What did you learn?
  • Discussion
  • "A good representation is the key to good problem
    solving." (Patrick Winston)
  • "Standard is better than best." (Gerald J. Sussman)
  • "The great thing about standards is that there
    are so many from which to choose." (Unknown)
  • "Above all, one must develop a feeling for the
    organism." (Barbara McClintock)
  • "Someone does it once, everybody benefits." (Eric
    Miller, W3C Semantic Web Activity Lead)
  • Remember: people, process, technology. However,
    without people there isn't any process or
    technology, so it's all social engineering.

90
Lessons Not Yet Learned (Take home exercise)
91
Feedback
  • Our goal is to have you walk away with a clear
    understanding of how to approach any database
    integration project
  • To provide
  • A methodology to scope and plan the project
  • An understanding of what to expect
  • Some specific examples to illustrate what is
    common to all integration projects (data
    cleaning) and what is specific to a particular task.
    (i.e. to provide you with examples to give a
    sense of it)
  • Some first hand experience at pedantic
    aggravation, irritation and interference
  • How did we do? Please let us know how we can
    improve this tutorial.

92
Thank You! Joanne and Jeremy