Title: Knowledge Representation Chapter 10
1Knowledge Representation Chapter 10
2Outline
- KR Introduction
- Ontological Engineering
- Categories and Objects
- Actions, Situations, and Events
- Mental Events and Mental Objects
- Reasoning Systems for Categories
- Reasoning with Default Information
- Truth Maintenance Systems
- Bio-Ontologies
3KR Introduction
- General problem in Computer Science
- Solutions: data structures
- words
- arrays
- records
- list
- More specific problem in AI
- Solutions: knowledge structures
- lists
- trees
- procedural representations
- logic and predicate calculus
- rules
- semantic nets and frames
- scripts
4Kinds of Knowledge
Things we need to talk about and reason about
what do we know?
- Objects
- Descriptions
- Classifications
- Events
- Time sequence
- Cause and effect
- Relationships
- Among objects
- Between objects and events
- Meta-knowledge
Distinguish between knowledge and its
representation
5Representation Mappings
[Diagram labels: Facts, Internal Representation, English Representation, Reasoning Programs]
- Knowledge Level
- Symbol Level
- Mappings are not one-to-one
- Never get it complete or exactly right
6Ontological Engineering
- Like knowledge engineering but applies to general-purpose knowledge bases
- Ultimate goal is to represent everything in the world!!
- Result is an upper ontology
Anything/Root
  AbstractObjects
    Sets
      Categories
    Numbers
    RepresentationalObjects
      Sentences
      Measurements
        Times
        Weights
  GeneralizedEvents
    Interval
      Moments
    Places
    PhysicalObjects
      Things
        Animals
        Agents
        Humans (under both Animals and Agents)
      Stuff
        Solid
        Liquid
        Gas
    Processes
7Special- and General-purpose Ontologies
- Special-purpose ontology
- Designed to represent a specific domain of knowledge
- genetics (GO)
- immune system (IMGT)
- mathematics (Tom Gruber)
- General-purpose ontology
- Should be applicable in any special-purpose domain
- Unifies different domains of knowledge
- Upper ontology provides highest level framework - all other concepts follow
8Cyc Upper Ontology
- Cycorp released 3,000 upper-level concepts into the public domain
- Cyc Upper Ontology satisfies two important criteria
- It is universal: every concept can be linked to it
- It is articulate: distinctions are necessary and sufficient for most purposes
9Categories - Representation
- Two choices for representation
- Predicate
- Basketball(b)
- Object
- Basketballs
- Member(b, Basketballs)
- Subset(Basketballs, Balls)
10Categories - Organizing
- Inheritance
- All instances of the category Food are edible
- Fruit is a subclass of Food
- Apples is a subclass of Fruit
- Therefore, Apples are edible
- The Class/Subclass relationships among Food, Fruit and Apples form a taxonomy
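The inheritance chain on this slide can be sketched in a few lines of Python. This is only an illustration: the dictionaries `SUBCLASS_OF` and `PROPERTIES` and the helper names are hypothetical and not part of any particular KR system.

```python
# Hypothetical toy taxonomy: each category maps to its direct superclass.
SUBCLASS_OF = {"Apples": "Fruit", "Fruit": "Food"}

# Properties asserted directly on a category.
PROPERTIES = {"Food": {"edible"}}


def superclasses(category):
    """Yield the category itself, then each ancestor up the taxonomy."""
    while category is not None:
        yield category
        category = SUBCLASS_OF.get(category)


def inherited_properties(category):
    """Collect properties from the category and all of its ancestors."""
    props = set()
    for c in superclasses(category):
        props |= PROPERTIES.get(c, set())
    return props


print(list(superclasses("Apples")))
print(inherited_properties("Apples"))
```

Apples picks up "edible" only because it is asserted on Food and the subclass links are followed upward.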
11Categories - Partitioning
- Disjoint: the categories have no members in common
- Exhaustive Decomposition: every member of the category is included in at least one of the subcategories
- Partition: a disjoint exhaustive decomposition
17Categories - Partitioning
- Disjoint({Animals,Vegetables})
- Disjoint(s) ⇔ (∀c1,c2 c1∈s ∧ c2∈s ∧ c1≠c2 ⇒ Intersection(c1,c2) = {})
- ExhaustiveDecomposition({Americans,Canadians,Mexicans},NorthAmericans)
- ExhaustiveDecomposition(s,c) ⇔ (∀i i∈c ⇔ ∃c2 c2∈s ∧ i∈c2)
- Partition({Males,Females},Animals)
- Partition(s,c) ⇔ Disjoint(s) ∧ ExhaustiveDecomposition(s,c)
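As a rough check of these three definitions, categories can be modelled as Python sets of members. The member names below are made up for illustration, and the function names simply mirror the predicates on the slide.

```python
def disjoint(s):
    """Disjoint(s): no two distinct categories in s share a member."""
    cats = list(s)
    return all(
        not (cats[i] & cats[j])
        for i in range(len(cats))
        for j in range(i + 1, len(cats))
    )


def exhaustive_decomposition(s, c):
    """ExhaustiveDecomposition(s,c): i is in c iff i is in some category of s."""
    return set().union(*s) == set(c)


def partition(s, c):
    """Partition(s,c): a disjoint exhaustive decomposition."""
    return disjoint(s) and exhaustive_decomposition(s, c)


# Hypothetical members, chosen only to exercise the definitions.
males = {"Rex", "Tom"}
females = {"Daisy"}
animals = {"Rex", "Tom", "Daisy"}

print(partition([males, females], animals))
print(partition([males, {"Daisy", "Tom"}], animals))
```

The second call fails the disjointness test because "Tom" appears in both subcategories, even though the decomposition is still exhaustive.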
18Categories - More
- PartOf
- PartOf(Bucharest,Romania)
- PartOf(Romania,EasternEurope)
- PartOf(EasternEurope,Europe)
- PartOf(Europe,Earth)
- Composite Objects
- Biped(a) ⇒ ∃c1,c2,b Leg(c1) ∧ Leg(c2) ∧ Body(b) ∧ PartOf(c1,a) ∧ PartOf(c2,a) ∧ PartOf(b,a) ∧ Attached(c1,b) ∧ Attached(c2,b) ∧ c1≠c2 ∧ [∀c3 Leg(c3) ∧ PartOf(c3,a) ⇒ (c3=c1 ∨ c3=c2)]
19Categories And More
- Count Nouns and Mass Nouns
- How many aardvarks? How many butters!?!
- x∈Butter ∧ PartOf(y,x) ⇒ y∈Butter
- Intrinsic and Extrinsic Properties
- Intrinsic properties belong to the very substance of the object, e.g. flavor, color, density, boiling point, etc.
- Extrinsic properties change if the object is changed (cut in half), e.g. weight, length, shape, etc.
20Actions, Situations and Events
21Situation Calculus
- The states resulting from executing actions
- Ontology
- Situations: logical terms describing the initial situation and all situations that result from executing actions on a given situation
- Result(a,s)
- Fluents: functions and predicates that may be different in different situations
- Age(Wumpus,S0) is the Wumpus's age in situation S0
- Atemporal or eternal: functions and predicates that are constant across all situations
- Gold(G1)
22Situation Calculus Actions
- Each action described by two axioms
- Possibility Axiom
- Preconditions ⇒ Poss(a,s)
- Effect Axiom
- Poss(a,s) ⇒ changes that result from taking action
23Situation Calculus - Example
- Possibility Axioms
- At(Agent,x,s) ∧ Adjacent(x,y) ⇒ Poss(Go(x,y),s).
- Gold(g) ∧ At(Agent,x,s) ∧ At(g,x,s) ⇒ Poss(Grab(g),s).
- Holding(g,s) ⇒ Poss(Release(g),s).
- Effect Axioms
- Poss(Go(x,y),s) ⇒ At(Agent,y,Result(Go(x,y),s)).
- Poss(Grab(g),s) ⇒ Holding(g,Result(Grab(g),s)).
- Poss(Release(g),s) ⇒ ¬Holding(g,Result(Release(g),s)).
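A minimal sketch of how the possibility and effect axioms for Go and Grab might be applied, assuming a small hypothetical grid world stored as a set of ground facts. The tuple encoding, the `facts` set and the helper names are illustrative assumptions. Only the effect axiom for Go is implemented, so anything it does not mention is simply absent from the new situation.

```python
S0 = "S0"
# Hypothetical adjacency relation and initial facts for a two-square world.
ADJACENT = {((1, 1), (1, 2)), ((1, 2), (1, 1))}
facts = {("At", "Agent", (1, 1), S0), ("At", "G1", (1, 2), S0), ("Gold", "G1")}


def result(action, s):
    """Result(a,s): the situation term produced by doing a in s."""
    return ("Result", action, s)


def poss_go(x, y, s):
    """At(Agent,x,s) and Adjacent(x,y) imply Poss(Go(x,y),s)."""
    return ("At", "Agent", x, s) in facts and (x, y) in ADJACENT


def poss_grab(g, s):
    """Gold(g), At(Agent,x,s) and At(g,x,s) imply Poss(Grab(g),s)."""
    places = {f[2] for f in facts if f[0] == "At" and f[3] == s}
    return ("Gold", g) in facts and any(
        ("At", "Agent", x, s) in facts and ("At", g, x, s) in facts
        for x in places
    )


def go(x, y, s):
    """Effect axiom for Go: records only the agent's new location."""
    if not poss_go(x, y, s):
        raise ValueError("Go is not possible in this situation")
    s1 = result(("Go", x, y), s)
    facts.add(("At", "Agent", y, s1))
    return s1


s1 = go((1, 1), (1, 2), S0)
print(("At", "Agent", (1, 2), s1) in facts)  # True
print(poss_grab("G1", s1))                   # False
```

The second result is False because nothing recorded for the new situation says where the gold is: the effect axiom describes what changed, not what stayed the same.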
24Go for the Gold!
- GOAL: Bring the gold from [1,2] to [1,1]
- At(Agent,[1,1],S0) ∧ At(G1,[1,2],S0).
- ¬Holding(G1,S0).
- Gold(G1).
- Adjacent([1,1],[1,2]) ∧ Adjacent([1,2],[1,1]).
- Do It
- Go([1,1],[1,2]).
- Result
- At(Agent,[1,2],Result(Go([1,1],[1,2]),S0)).
- Now, can I grab the gold?
- Grab(G1).
30The Frame Problem
- Result
- At(Agent,[1,2],Result(Go([1,1],[1,2]),S0)).
- Now, can I grab the Gold?
- Grab(G1).
- What in the knowledge base allows me to go from my Result (above) to Grab(G1)?
- Nothing: the effect axioms say what changes, but not that the gold is still at [1,2] after the move
31The Frame Problem
- How do we represent all the things in the world that stay the same?
- Represent all things at all situations: the representational frame problem
- Project the results of a sequence of actions: the inferential frame problem
32Representational Frame Problem
- Successor-State Axiom
- Action is possible ⇒ (Fluent is true in result state ⇔ Action's effect made it true ∨ It was true before and action left it alone).
- Truth value of each fluent in the next state depends on action and truth value in the current state
- Poss(a,s) ⇒ (At(Agent,y,Result(a,s)) ⇔ a = Go(x,y) ∨ (At(Agent,y,s) ∧ a ≠ Go(y,z))).
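The successor-state axiom for At(Agent,y,...) can be read as a small function of the action and the old truth value. The sketch below assumes actions are encoded as tuples such as `("Go", x, y)` (a made-up encoding) and that the action is already known to be possible.

```python
def at_agent_after(y, action, was_at_y):
    """Successor-state axiom for At(Agent, y, Result(a, s)).

    True iff the action moved the agent to y, or the agent was
    already at y and the action did not move it away from y.
    """
    is_go = action[0] == "Go"
    moved_to_y = is_go and action[2] == y      # a = Go(x, y)
    moved_from_y = is_go and action[1] == y    # a = Go(y, z)
    return moved_to_y or (was_at_y and not moved_from_y)


go = ("Go", (1, 1), (1, 2))
grab = ("Grab", "G1")

print(at_agent_after((1, 2), go, was_at_y=False))   # True: moved there
print(at_agent_after((1, 1), go, was_at_y=True))    # False: moved away
print(at_agent_after((1, 1), grab, was_at_y=True))  # True: left alone
```

Unlike a bare effect axiom, the single function covers both the change and the non-change cases, which is what addresses the representational frame problem for this fluent.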
33Time and Event Calculus
- Event Calculus: based on points in time
- Fluents hold at points in time as opposed to holding in situations
- A fluent is true at a point in time if the fluent was initiated by an event at some time in the past and was not terminated by an intervening event.
34Event Calculus
- Initiates(e,f,t) and Terminates(e,f,t)
- Event Calculus Axiom
- T(f,t2) ⇔ ∃e,t Happens(e,t) ∧ Initiates(e,f,t) ∧ (t<t2) ∧ ¬Clipped(f,t,t2)
- Clipped(f,t,t2) ⇔ ∃e,t1 Happens(e,t1) ∧ Terminates(e,f,t1) ∧ (t<t1) ∧ (t1<t2)
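The two axioms translate almost directly into Python. As a simplifying assumption, `initiates` and `terminates` here are sets of (event, fluent) pairs that do not depend on time, and the light-switch narrative is a made-up example.

```python
def clipped(f, t, t2, happens, terminates):
    """Clipped(f,t,t2): some event terminating f happens strictly between t and t2."""
    return any(t < t1 < t2 and (e, f) in terminates for e, t1 in happens)


def holds(f, t2, happens, initiates, terminates):
    """T(f,t2): f was initiated at some earlier t and not clipped since."""
    return any(
        t < t2 and (e, f) in initiates
        and not clipped(f, t, t2, happens, terminates)
        for e, t in happens
    )


# Hypothetical narrative: the light is switched on at 1 and off at 5.
happens = [("TurnOn", 1), ("TurnOff", 5)]
initiates = {("TurnOn", "LightOn")}
terminates = {("TurnOff", "LightOn")}

print(holds("LightOn", 3, happens, initiates, terminates))  # True
print(holds("LightOn", 7, happens, initiates, terminates))  # False
```

Because both inequalities in Clipped are strict, the fluent still holds at exactly t = 5 and stops holding only afterwards.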
35Event Calculus - more
- Can be extended to handle
- indirect effects
- continuous change
- nondeterministic effects
- causal constraints
- . . .
36Generalized Events
- Combines aspects of space and time calculus
- Allows representation of events occurring in a space-time continuum
- World War II is an event that happened in various geographic locations during a specific period of time within the 20th century.
37Processes
- Discrete Events: the event is a whole, and a part of the event is no longer the same event
- Processes can include subintervals: a part of a plane flight is still a member of the Flying class (aka liquid events)
- Stated more precisely: any subinterval of a process is also a member of the same process category.
38Intervals
- Moment: has temporal duration of zero
- Extended Interval: has temporal duration of greater than zero
- Partition({Moments,ExtendedIntervals},Intervals)
- Member(i,Moments) ⇔ Duration(i) = Seconds(0).
39Intervals Ontology
- Meet(i,j) ⇔ Time(End(i)) = Time(Start(j)).
- Before(i,j) ⇔ Time(End(i)) < Time(Start(j)).
- After(j,i) ⇔ Before(i,j).
- During(i,j) ⇔ Time(Start(j)) ≤ Time(Start(i)) ∧ Time(End(i)) ≤ Time(End(j)).
- Overlap(i,j) ⇔ ∃k During(k,i) ∧ During(k,j).
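These interval relations are easy to try out with numeric endpoints. The sketch assumes an interval is just a (start, end) pair of numbers; the `Interval` type and the sample values are illustrative.

```python
from collections import namedtuple

Interval = namedtuple("Interval", "start end")


def meet(i, j):
    return i.end == j.start


def before(i, j):
    return i.end < j.start


def after(j, i):
    return before(i, j)


def during(i, j):
    return j.start <= i.start and i.end <= j.end


def overlap(i, j):
    # Some interval k lies within both i and j exactly when the later
    # start does not come after the earlier end (k may be a moment).
    return max(i.start, j.start) <= min(i.end, j.end)


morning = Interval(6, 12)
lunch = Interval(12, 13)
meeting = Interval(9, 10)

print(meet(morning, lunch), before(meeting, lunch), during(meeting, morning))
print(overlap(morning, lunch), overlap(meeting, lunch))
```

With the non-strict During used on the slide, two intervals that merely meet still count as overlapping, since the shared moment is during both.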
40Mental Events and Mental Objects
- Knowledge about beliefs, specifically about those beliefs held by an agent
- Which agent knows about the geography of Maine?
- Provides an agent the ability to reason about beliefs of agents
- However, need to define propositional attitudes, such as Believes, Knows and Wants, as relations where the second argument is referentially opaque (no substitution of equal terms)
41Reasoning Systems for Categories
- Categories are KR building blocks
- Two primary systems for reasoning
- Semantic Networks
- Graphical aids for visualizing knowledge
- Mechanisms for inferring properties of objects based on category membership
- Description Logics
- Formal language for constructing and combining category definitions
- Algorithms for classifying objects and determining subsumption relationships
42Semantic Networks
- Graphical notation with underlying logical representation
- A form of logic, but not FOL
- Capable of representing objects, relations, quantification, ...
- Convenient representation of inheritance
- Multiple Inheritance (sometimes)
- Inverse links
- Extendable using procedural attachments
43Semantic Networks - More
- Can only express binary relationships, making it more difficult to express n-ary predicates, e.g. Fly(Shankar,NewYork,NewDelhi,Monday)
- Negation, disjunction, nested function symbols, and existential quantification are missing
- Some SNs include procedural attachments
- Represents default values: assertions may be overridden by more specific values
44Semantic Networks
[Figure: semantic network with nodes Mammals, Persons, Females, Males, Mary and John; link labels SubsetOf, HasMother, SisterOf and Legs (with values 2 and 1)]
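Default values with overriding can be sketched with a dictionary-based network. The wiring below is an assumption loosely based on the node and link names in the figure: which node each link attaches to, and the `MemberOf` links, are guesses made for illustration.

```python
# Hypothetical network: each node maps link labels to values or other nodes.
NETWORK = {
    "Persons": {"SubsetOf": "Mammals", "Legs": 2},
    "Females": {"SubsetOf": "Persons"},
    "Males": {"SubsetOf": "Persons"},
    "Mary": {"MemberOf": "Females"},
    "John": {"MemberOf": "Males", "Legs": 1},
}


def lookup(node, attribute):
    """Return the most specific value of attribute, walking up the hierarchy."""
    while node is not None:
        slots = NETWORK.get(node, {})
        if attribute in slots:
            return slots[attribute]
        # Follow MemberOf from an individual, SubsetOf from a category.
        node = slots.get("MemberOf") or slots.get("SubsetOf")
    return None


print(lookup("Mary", "Legs"))  # 2, inherited from Persons
print(lookup("John", "Legs"))  # 1, the more specific value wins
```

The lookup stops at the first node that carries the attribute, so a value stored on an individual overrides the category default.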
45Description Logics
- Notations to make it easier to describe definitions and properties of categories
- Taxonomic structure is organizing principle
- Subsumption: determine if one category is a subset of another
- Classification: determine the category in which an object belongs
- Consistency: determine if membership criteria are logically satisfiable
46Description Logics
- CLASSIC was one of the first languages (Borgida et al., 1989)
- All bachelors are unmarried adult males.
- DL
- Bachelor = And(Unmarried,Adult,Male).
- FOL
- Bachelor(x) ⇔ Unmarried(x) ∧ Adult(x) ∧ Male(x)
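For concepts built only from And over primitive categories, subsumption reduces to a subset test. The sketch below makes that assumption explicitly; it is not how CLASSIC is implemented, and the helper names `primitive`, `And` and `subsumes` are made up here.

```python
def primitive(name):
    """A primitive concept is represented as a one-element set of conjuncts."""
    return frozenset([name])


def And(*concepts):
    """Conjunction of concepts: the union of their primitive conjuncts."""
    return frozenset().union(*concepts)


def subsumes(general, specific):
    """general subsumes specific if every conjunct of general appears in specific."""
    return general <= specific


Unmarried, Adult, Male = primitive("Unmarried"), primitive("Adult"), primitive("Male")
Bachelor = And(Unmarried, Adult, Male)

print(subsumes(Male, Bachelor))              # True: every bachelor is male
print(subsumes(Bachelor, And(Adult, Male)))  # False: adult males may be married
```

Richer constructors such as AtLeast, All or Fills need a real structural comparison, which this set-based encoding deliberately leaves out.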
47Description Logics
- What does this DL statement say?
- And(Man,AtLeast(3,Son), AtMost(2,Daughter),
All(Son,And(Unemployed,Married,
All(Spouse,Doctor))), All(Daughter,And(Professor,
Fills(Department,Physics,Math)))).
48Description Logics - More
- Emphasis on tractability of inference
- Inference happens by
- Describing the problem instance
- Asserting the instance into the KB to be handled by the subsumption apparatus
- FOL cannot predict solution time
- DLs solve in time polynomial in size of KB
- DLs usually lack disjunction and negation (for time/speed considerations)
49Current Description Logic
- DAML+OIL
- DARPA Agent Markup Language + Ontology Inference Layer (OIL)
- Comes out of DARPA initiative
- OIL from University of Manchester
- http://www.w3.org/TR/daml+oil-reference
- OWL
- Web Ontology Language
- A language for the semantic web
- Next generation DAML+OIL
- Flavors: OWL-Lite, OWL-DL and OWL (full)
- W3C recommendation as of Feb 10, 2004
- http://www.w3.org/TR/2004/REC-owl-features-20040210/
50Reasoning with Default Information
- Open and Closed worlds
- Open World: information provided is not assumed to be complete, therefore inferences may result in sentences whose truth value is unknown
- Closed World: information provided is assumed complete, therefore ground sentences not asserted to be true are assumed false
- Negation as Failure: a negative literal, not P, can be proved true if the proof of P fails
51Nonmonotonic Logics Circumscription
- Version of closed-world assumption
- Specify predicates that are almost always false
- Default rule stating that birds fly
- Bird(x) ∧ ¬Abnormal(x) ⇒ Flies(x)
- Abnormal() is circumscribed: reasoner assumes ¬Abnormal() unless Abnormal() is known to be true
- Circumscription is a model preference logic: notion of preferred models in KB
52Nonmonotonic Logics Default Logic
- Default rules express contingencies
- Bird(x) : Flies(x) / Flies(x)
- If Bird(x) is true, and Flies(x) is consistent with KB, then conclude Flies(x) (by default)
- Default rule form is
- P : J1, ..., Jn / C
- P: Prerequisite, J: Justifications, C: Conclusion
- If any J can be proven false, then C cannot be concluded
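A single default rule P : J1, ..., Jn / C can be applied with a few lines of Python. As a loud simplification, "consistent with the KB" is approximated here by checking that the negation of each justification is not literally present in the KB; the string encoding of literals with a leading "~" is an assumption of this sketch.

```python
def negate(literal):
    """Negate a ground literal written as 'P' or '~P'."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def apply_default(kb, prerequisite, justifications, conclusion):
    """Return the KB extended with the conclusion if the default applies."""
    consistent = all(negate(j) not in kb for j in justifications)
    if prerequisite in kb and consistent:
        return kb | {conclusion}
    return kb


tweety = {"Bird(Tweety)"}
opus = {"Bird(Opus)", "~Flies(Opus)"}

print(apply_default(tweety, "Bird(Tweety)", ["Flies(Tweety)"], "Flies(Tweety)"))
print(apply_default(opus, "Bird(Opus)", ["Flies(Opus)"], "Flies(Opus)"))
```

The second KB is returned unchanged because the justification Flies(Opus) is contradicted, so the default is blocked.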
61Truth Maintenance Systems
- Designed to handle Belief Revision
- Let's say our KB contains sentence P
- But P is found to be incorrect/untrue
- So, we want to say Tell(KB,¬P)
- First, though, Retract(KB,P) to avoid P ∧ ¬P
- What if P ⇒ Q? What happens to Q?
- Retract Q?
- But what if we also have R ⇒ Q?
- Therefore
62Truth Maintenance Systems
- Rollback mechanism doesn't scale up
- Justification-based Truth Maintenance System (JTMS)
- Includes in the KB the set of sentences from which the sentence was inferred
- Sentences are in or out, based on truth value of supporting sentences
- Assumption-based Truth Maintenance System (ATMS)
- Maintains a set of supporting sentences, representing all states
- Sentence holds in just those cases where all assumptions in one of the assumption sets hold
67Justification-based TMS
- Each sentence in KB includes all sentences that made it true
- Given P ⇒ Q, Q has justification {P, P ⇒ Q}
- What if Q has the following justifications, and we Retract(P)?
- {P, P ⇒ Q}
- {P, P ∨ R ⇒ Q}
- {R, R ∨ P ⇒ Q}
- Sentences that comprise Justifications are marked in or out (not removed from KB) for efficiency
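The question on this slide can be worked through with a toy justification store. Sentences are plain strings, `premises` is the set of sentences currently marked "in" without needing support, and `justifications` maps a derived sentence to its alternative supporting sets. All of these names and the string encoding are assumptions of the sketch, not a description of a real JTMS.

```python
def is_in(sentence, premises, justifications, seen=frozenset()):
    """A sentence is 'in' if it is a premise or one justification is fully 'in'."""
    if sentence in premises:
        return True
    if sentence in seen:  # guard against circular support
        return False
    seen = seen | {sentence}
    return any(
        all(is_in(s, premises, justifications, seen) for s in support)
        for support in justifications.get(sentence, [])
    )


justifications = {
    "Q": [{"P", "P=>Q"}, {"P", "PvR=>Q"}, {"R", "RvP=>Q"}],
}
premises = {"P", "R", "P=>Q", "PvR=>Q", "RvP=>Q"}

print(is_in("Q", premises, justifications))                # True
print(is_in("Q", premises - {"P"}, justifications))        # True, via R
print(is_in("Q", premises - {"P", "R"}, justifications))   # False
```

Retracting P knocks out the first two justifications, but Q stays in because the third one depends only on R. Nothing is deleted, so restoring P later needs no re-derivation.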
68Assumption-based TMS
- Designed to make Belief Revision efficient
- Represents all states at the same time
- Each sentence in the KB has a set of assumption sets
- For each sentence in the KB, the sentence holds when all assumptions in one of its assumption sets hold
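The ATMS rule on this slide amounts to a subset test over a sentence's assumption sets. The label for Q below is a made-up example, and `holds` is an illustrative name.

```python
def holds(assumption_sets, current_assumptions):
    """True if every assumption in at least one assumption set currently holds."""
    return any(env <= current_assumptions for env in assumption_sets)


# Hypothetical label: Q follows from P alone, or from R and S together.
label_q = [frozenset({"P"}), frozenset({"R", "S"})]

print(holds(label_q, {"P"}))        # True
print(holds(label_q, {"R"}))        # False: S is missing
print(holds(label_q, {"R", "S"}))   # True
```

Because the label already covers every way of supporting Q, switching between sets of assumptions needs no retraction or re-derivation, only another subset check.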
69Ontologies in Practice: The Bio-Ontologies Consortium
70Outline
- Motivation
- The problem
- The solution
- Exchange Languages Evaluation
- Initial Evaluation
- Second-level Evaluation
- Conclusions/Recommendations
- Future Work
71Motivation
- Explosive and uncontrolled growth of Bioinformation
- It is increasingly important in the life sciences to integrate information across scientific disciplines and business areas
- Terminology in the domain of molecular biology is inconsistent - information searches can be incomplete and inaccurate
- Definitions and descriptions of life sciences objects differ among data sources - significant time and effort is required to integrate those data sources
72What is DNA Topoisomerase?
UMLS says it's:
- EC 5.99.1.2
- DNA Nicking-Closing Protein
- DNA Relaxing Enzyme
- DNA Relaxing Protein
- DNA Topoisomerase
- DNA Topoisomerase I
- DNA Type 1 Topoisomerase
- DNA Untwisting Enzyme
- DNA Untwisting Protein
- Omega Protein
- Topoisomerase I
- Type I DNA Topoisomerase
- Nicking-closing enzyme
- Relaxing enzyme
- Untwisting enzyme
- w-Protein
- Swivelase
73Motivation - Shared Ontologies
- Ontologies in the life sciences currently exist, but not in a coordinated/shared manner
- Shared ontologies provide benefits
- sharing the work
- database integration
- exchange of biological data
- developing shared understandings
- differences can provide focus on interesting problems
74The Solution Ontologies
- "An ontology is a specification of a conceptualization."
- "An ontology is a description of the concepts and relationships that can exist for an agent or a community of agents. ... A common ontology defines the vocabulary with which queries and assertions are exchanged among agents."
- T.R. Gruber (1993)
75Goals of Ontologies
- Provide standardized vocabularies for text mining and information retrieval
- Formalized ontologies are expressed in a common language (or a small number of languages), facilitating representation and exchange of ontological knowledge
- Building common ontologies will establish shared understandings within the community ⇒ so, create a consortium as a forum to develop these ontologies
76Bio-Ontologies Consortium Goals
- Enable interoperability/exchange of life sciences information
- Establish a consortium for promoting and sharing open-source ontologies in the Life Sciences
- Establish user community for sharing experiences with designing and building ontologies for the Life Sciences
- Develop synergies with the Knowledge Management community to target tools/languages to life sciences ontologies
- Create a permanent portal for the exchange of ontologies and ontology building tools
77Bio-Ontologies Consortium Activity
- Enable interoperability/exchange of life sciences information
- Successful exchange depends on
- Common, shared definitions
- Common language to describe definitions
- Therefore, select a language, or a small set of languages, for the exchange of life sciences ontologies
78Select Candidate Languages (1)
- Ontolingua
- Long-standing effort in KR community
- Based on work for common interchange language
- CycL
- Significant effort in KR community
- Largest commercial vendor of ontological tools
- OML/CKML
- XML based language
- new language, so possible to influence development
- OPM
- OO model to describe single- and multi-DB schemas
- tool used in bioinformatic community
79Select Candidate Languages (2)
- XML and XML/RDF
- Web-based language
- Significant work going on to extend expressivity
- UML
- Widely used modeling tool in commercial marketplace
- Based on OO concepts (supported by industry)
- OKBC
- API for accessing distributed Knowledge Bases
- Current work by KR community
- ASN.1
- Early representation language for Bioinformatics
- ODL
- De facto standard for OO databases
80Evaluation Criteria (1)
- Language Support and Standardization
- Does the language have a formal specification?
- What support (documentation, tutorials, tech support, ...) is available?
- Does the language implement a standard? If so, who controls this standard?
- Data model/capabilities
- How rich is the expressiveness of the language, i.e., does the language support negation, conjunction, disjunction, relations, ...
81Evaluation Criteria (2)
- Performance
- Scalability to real-world problems
- Stability (languages with tools/environments)
- Other Issues/Pragmatics
- Current users of the language
- Domains in which the language has been applied
- Connection to data sources (knowledge sources, storage formats (relational, OO, ...))
82Initial Evaluation - Results
- Keys to acceptance
- Rich expressive power
- Stability and history of use
- Approachable/understandable syntax
- Open to collaboration
- Keys to non-acceptance
- Proprietary language
- Wedded to a commercial system
83Initial Evaluation - Results
84Next Level Evaluation
- Two languages stood out as strong candidates
- Ontolingua
- OML/CKML
- Conduct experiments to represent biological entities
- select two life sciences ontologies
- Ecocyc Gene Ontology
- GeneClinics data model/ontology
- represent each ontology in both Ontolingua and OML
85Gene Ontology - Ontolingua (1)
(DEFINE-CLASS Genes (?X)
  "The class of all genes is divided into several subclasses. Genes
  whose function is unknown or known only approximately are grouped
  into the classes ORFs and Unclassified-Genes, respectively. Genes of
  known function have been classified using two orthogonal
  classification schemes developed by Monica Riley. One scheme
  classifies genes according to the physiological role of their product
  class (Physiological-Roles); the other scheme classifies genes
  according to the function of their product, such as enzymes and
  transport proteins (Product-Types)."
  :DEF (AND (DNA-Segments ?X)))

(DEFINE-FUNCTION CENTISOME-POSITION (?FRAME) :-> ?VALUE
  "This slot lists the map position of this gene on the chromosome in
  centisome units."
  :DEF (AND (Genes ?FRAME) (NUMBER ?VALUE)))

(DEFINE-RELATION CITATIONS (?FRAME ?VALUE)
  "This slot lists general citations pertaining to the object
  containing the slot. Each value of the slot is a citation of the form
  reference-id."
  :DEF (AND (Organisms ?FRAME) (STRING ?VALUE)))

(DEFINE-RELATION COMMENT (?FRAME ?VALUE)
  "The Comment slot stores a general comment about the object that
  contains the slot."
  :DEF (AND (THING ?FRAME) (STRING ?VALUE)))

(DEFINE-FUNCTION COMMON-NAME (?FRAME) :-> ?VALUE
  "The primary name by which an object is known to scientists -- a
  widely used and familiar name (in some cases arbitrary choices must
  be made)."
  :DEF (AND (Organisms ?FRAME) (STRING ?VALUE)))

(DEFINE-RELATION EVIDENCE (?FRAME ?VALUE)
  "Describes evidence for the defined function of this object.
  Currently we distinguish between function that is determined
  experimentally, and function that is determined through computational
  sequence analysis."
  :DEF (AND (Genes ?FRAME)
            ((ONE-OF EXPERIMENT SEQUENCE-ANALYSIS) ?VALUE)))
86Gene Ontology - Ontolingua (2)
(DEFINE-RELATION HISTORY (?FRAME ?VALUE)
  "Contains a textual history of changes made to this frame. Each item
  is either a string or a note frame."
  :DEF (AND (THING ?FRAME) ((OR STRING Notes) ?VALUE)))

(DEFINE-FUNCTION INTERRUPTED? (?FRAME) :-> ?VALUE
  "The value of this slot is T for genes that are interrupted, i.e.,
  those that have an early stop codon inserted."
  :DEF (AND (Genes ?FRAME) (BOOLEAN ?VALUE)))

(DEFINE-FUNCTION LEFT-END-POSITION (?FRAME) :-> ?VALUE
  :DEF (AND (DNA-Segments ?FRAME) (NUMBER ?VALUE)))

(DEFINE-RELATION PRODUCT (?FRAME ?VALUE)
  "This slot lists the product of a gene, which could be a polypeptide
  or a tRNA. Multiple products will be recorded in the case that
  several chemically modified forms of the protein product exist."
  :DEF (AND (Genes ?FRAME) ((OR Polypeptides RNA) ?VALUE)))

(DEFINE-RELATION PRODUCT-STRING (?FRAME ?VALUE)
  "This slot holds a text string that describes the product of this
  gene; this slot is only used when EcoCyc does not describe the gene
  product as a frame (such as a polypeptide frame)."
  :DEF (AND (Genes ?FRAME) (STRING ?VALUE)))

(DEFINE-RELATION PRODUCT-TYPES (?FRAME ?VALUE)
  "Describes the type of the gene product, e.g., is it an enzyme, an
  RNA, etc."
  :DEF (AND (Genes ?FRAME)
            ((ONE-OF ENZYME REGULATOR LEADER MEMBRANE TRANSPORT
                     STRUCTURAL RNA PHENOTYPE FACTOR CARRIER) ?VALUE)))
87Gene Ontology - Ontolingua (3)
(DEFINE-FUNCTION RIGHT-END-POSITION (?FRAME) :-> ?VALUE
  :DEF (AND (DNA-Segments ?FRAME) (NUMBER ?VALUE)))

(DEFINE-RELATION SYNONYMS (?FRAME ?VALUE)
  "One or more secondary names for an object -- names that a scientist
  might attempt to use to retrieve the object. The Synonyms should
  include any name a user might use to try to retrieve an object."
  :DEF (AND (Generalized-Reactions ?FRAME) (STRING ?VALUE)))

(DEFINE-FUNCTION TRANSCRIPTION-DIRECTION (?FRAME) :-> ?VALUE
  "This slot specifies the direction along the chromosome in which this
  gene is transcribed; allowable values are + or -."
  :DEF (AND (DNA ?FRAME) ((ONE-OF "+" "-") ?VALUE)))
88Gene Ontology - OML/CKML (1)
<CKML>
<Ontology id="Riley's Gene Classes" version="1.0">
<comment> This OML ontology defines an encoding of the gene classification system developed by Monica Riley. </comment>
<extends ontology="http://www.ckml.org/ontology/" prefix="CKML"/>
<Object type="Genes">
<comment> The class of all genes is divided into several subclasses. Genes whose function is unknown or known only approximately are grouped into the classes ORFs and Unclassified-Genes, respectively. Genes of known function have been classified using two orthogonal classification schemes developed by Monica Riley. One scheme classifies genes according to the physiological role of their product class (Physiological-Roles); the other scheme classifies genes according to the function of their product, such as enzymes and transport proteins (Product-Types). </comment>
</Object>
<Function type="LEFT-END-POSITION" srcType="Genes" tgtType="data.Real"/>
<Function type="INTERRUPTED?" srcType="Genes" tgtType="data.Boolean">
<comment> The value of this slot is T for genes that are interrupted, i.e., those that have an early stop codon inserted. </comment>
</Function>
<BinaryRelation type="HISTORY" srcType="CKMLObject" tgtType="data.String">
<comment> Contains a textual history of changes made to this frame. Each item is either a string or a note frame. </comment>
</BinaryRelation>
<Theory genus="Evidence">
<Object type="EXPERIMENT"/>
<Object type="SEQUENCE-ANALYSIS"/>
</Theory>
89Gene Ontology - OML/CKML (2)
<BinaryRelation type="EVIDENCE" srcType="Genes" tgtType="Evidence">
<comment> Describes evidence for the defined function of this object. Currently we distinguish between function that is determined experimentally, and function that is determined through computational sequence analysis. </comment>
</BinaryRelation>
<Function type="CENTISOME-POSITION" srcType="Genes" tgtType="data.Real">
<comment> This slot lists the map position of this gene on the chromosome in centisome units. </comment>
</Function>
<BinaryRelation type="CITATIONS" srcType="CKMLObject" tgtType="data.String">
<comment> This slot lists general citations pertaining to the object containing the slot. Each value of the slot is a citation of the form reference-id. </comment>
</BinaryRelation>
<BinaryRelation type="COMMENT" srcType="CKMLObject" tgtType="data.String">
<comment> The Comment slot stores a general comment about the object that contains the slot. </comment>
</BinaryRelation>
<Function type="COMMON-NAME" srcType="CKMLObject" tgtType="data.String">
<comment> The primary name by which an object is known to scientists -- a widely used and familiar name (in some cases arbitrary choices must be made). </comment>
</Function>
<Theory genus="Transcription-Direction">
<Object type="+"/>
<Object type="-"/>
</Theory>
<Function type="TRANSCRIPTION-DIRECTION" srcType="Genes" tgtType="Transcription-Direction">
<comment> This slot specifies the direction along the chromosome in which this gene is transcribed; allowable values are + or -. </comment>
</Function>
<BinaryRelation type="PRODUCT" srcType="Genes" tgtType="Polypeptides"/>
90Gene Ontology - OML/CKML (3)
<BinaryRelation type="SYNONYMS" srcType="CKMLObject" tgtType="data.String">
<comment> One or more secondary names for an object -- names that a scientist might attempt to use to retrieve the object. The Synonyms should include any name a user might use to try to retrieve an object. </comment>
</BinaryRelation>
<BinaryRelation type="PRODUCT-STRING" srcType="Genes" tgtType="data.String">
<comment> This slot holds a text string that describes the product of this gene; this slot is only used when EcoCyc does not describe the gene product as a frame (such as a polypeptide frame). </comment>
</BinaryRelation>
<Theory genus="Product-Types">
<Object type="ENZYME"/>
<Object type="REGULATOR"/>
<Object type="LEADER"/>
<Object type="MEMBRANE"/>
<Object type="TRANSPORT"/>
<Object type="STRUCTURAL"/>
<Object type="RNA"/>
<Object type="PHENOTYPE"/>
<Object type="FACTOR"/>
<Object type="CARRIER"/>
</Theory>
<BinaryRelation type="PRODUCT-TYPES" srcType="Genes" tgtType="Product-Types">
<comment> Describes the type of the gene product, e.g., is it an enzyme, an RNA, etc. </comment>
</BinaryRelation>
<Function type="RIGHT-END-POSITION" srcType="Genes" tgtType="data.Real"/>
91Gene Ontology - OML/CKML (4)
<Collection.Object>
<Genes id="EG10707" text="pheA">
<LEFT-END-POSITION tgt="2735765"/>
<CENTISOME-POSITION tgt="58.97035d0"/>
<TRANSCRIPTION-DIRECTION tgt="+"/>
<RIGHT-END-POSITION tgt="2736925"/>
</Genes>
</Collection.Object>
<Collection.BinaryRelation>
<EVIDENCE src="EG10707" tgt="EXPERIMENT"/>
<NAMES src="EG10707" tgt="pheA"/>
<NAMES src="EG10707" tgt="b2599"/>
<PRODUCT src="EG10707" tgt="CHORISMUTPREPHENDEHYDRAT-MONOMER"/>
<PRODUCT-STRING src="EG10707" tgt="chorismate mutase-P and prephenate dehydratase"/>
</Collection.BinaryRelation>
92Experiments - Results (Ecocyc)
- OML representation
- OML's expressive capabilities captured most aspects of gene ontology
- some limitations in expressive capability: no facets, cardinality or multiple collection types
- terminology differences and definitions not modular
- Ontolingua representation
- Ontolingua expressed all of gene ontology
- Lisp syntax of Ontolingua not readily approachable
93Experiments - Results (GeneClinics)
- OML representation
- Expressive capabilities adequate to the job
- OML/CKML is based on conceptual graphs and may have more expressive capabilities in the long term
- Ontolingua representation
- Ontolingua is based on frame semantics, which more closely aligns with relational and OO data models
- Lisp syntax not acceptable to larger community
- Both languages would benefit from life sciences examples
94Conclusions and Recommendations
- The language most suitable for the exchange of life sciences ontologies should have the following key characteristics
- Frame-based representation
- Long history of work with frame-based representation model
- Mappings between this model and relational and/or OO data sources are easily expressed
- XML-based syntax
- Critical for exchange among physically dispersed community
- New tools being developed in XML community
- Lots of momentum in the web-based community
95Current Efforts
- Developed specification for an XML-based exchange language: XOL (XML Ontology Language), based on Ontolingua (Karp/Chaudhri)
- Frame-based semantics for OML/CKML
- Developing process for submission of life sciences ontologies to the Bio-Ontologies Consortium
96Other Ontology Efforts
- Gene Ontology Consortium (http://genome-www.stanford.edu/GO/)
- BioPathways Consortium (http://www.3rdmill.com/BioPathways)
- mmCIF (http://ndbserver.rutgers.edu/mmcif)
97Bio-Ontologies Consortium - Future Work
- Content development
- Elicit and review ontology submissions
- Synergies with OMG
- Provide public-domain ontologies to the Life Sciences community and encourage use of those ontologies
- Bio-Ontologies 2000