Title: Formal Aspects Of Protege
1Formal Aspects Of Protege
- William Grosso
- Stanford Medical Informatics
- Stanford University
2Overview
- Interoperability is important
- HPKB: DARPA project with many participants
- Protégé-2000: lots of developers in many locations
- Ray can't write code fast enough!
- Interoperability requires common ground
- Shared semantics for common constructs
- The new Knowledge Model
3Proposed HPKB Scenario
[Diagram labels: PSM (three boxes); SRI, MIT, SMI; Knowledge Base(s) in a KB Server; Shared Ontologies; Situation Data]
4Knowledge Bases in HPKB
- Ontologies are ways to share well-defined information
- Define knowledge structure
- Useful as a coupling mechanism
- Knowledge Bases serve multiple roles
- Repositories of shared knowledge
- Community blackboards (with semantics).
5Interoperability requires Semantics
- As long as all the developers are in the same building, things can be underspecified
- Rely on group knowledge and established practice
- Larger working groups (over time, space, or in numbers of people) can require more precise specifications
6Knowledge Models
- Formal specification of the way knowledge is represented
- Precise, human-readable definitions of structures in a language
- Frequently unwritten
- Implied by the documentation
- Deduced via experience
7Knowledge Models at SMI
- Work spurred by the OKBC Specification
- Defining the Protégé Knowledge Model
- Comparing it to other knowledge models
- Goal: enable Protégé tools to interoperate with knowledge-based systems from other labs
- Goal is knowledge reuse
- Implicit hypothesis: understanding knowledge models will facilitate interoperation
8Example: Protégé and Loom
- Protégé: a suite of tools to simplify knowledge base design and construction
- Design ontologies, create KA tools to acquire instances
- Explicitly adopts the notion of external PSMs in order to focus on KA
- Loom: an environment for knowledge-based system construction
- Everything is done inside the Loom environment
9Frame-Based Knowledge Models
- Both Protégé and Loom use frame-based knowledge models
- Classes, instances, slots, facets, ...
- We expect differences over things like default values and models of time
- But the knowledge models differ on more mundane notions as well
10What's a Slot?
- Protégé/Win
- Slots are not part of the global namespace
- Define attributes of a frame
- Cannot be referred to independently of either a class or an instance
- Which slots are attached to an instance is part of the class definition
- Loom
- Slots are part of the global namespace
- Defined by defrelation construct
- Have attributes
- domain, range,
- Slots can be reified
- Instances of a slot class correspond to a
specific relation (between two instances)
11What's an Instance?
- Protégé/Win
- Every instance is a direct instance of a single specified class
- Automatically has the own slots defined by the class
- No other slots allowed
- Direct instance typing cannot change.
- To change type at all, need to do explicit
operations on the class
- Loom
- Type of an instance does not have to be specified
- Classifier deduces instance types
- Types of instances can change (without being explicitly set)
- Instances can be direct instances of more than one class
12Interoperation ?
- Two different development environments
- Two different user models
- Two different approaches to KA
- Two different knowledge models
- Both frame based
- Disagree on the definitions of commonly used structures
- Solution: adopt the OKBC knowledge model
13Protégé-2000 Is Like HPKB
- Ray can't write the code fast enough
- Therefore someone else has to write it
- Protégé-2000 allows everyone to customize it using Java components
- If we glue together components written at multiple labs, and knowledge bases produced by many different people, we might inadvertently introduce the same issues
14Components
Central Framework: provided by SMI. Plumbing that cannot be replaced or augmented.
Widgets: mediate between the knowledge base and the user. They display small pieces of the knowledge base in a way that the user can understand and manipulate. SMI provides a generic set of default widgets.
Storage Model: every running application uses a storage model for persistence. SMI currently provides two (CLIPS format and RDBMS format).
15Widgets
- Widgets can be added to the platform (using JavaBeans)
- There is a well-defined Widget API for building new widgets and adding them to a project
- Widgets can now be arbitrarily complex
- Dialogs are used to configure widgets
- State is stored in a separate knowledge base (the project knowledge base)
16Storage Models
- Protege/Win stored knowledge bases in a CLIPS-compatible format
- The goal for Protege-2000 is to use a wide variety of persistence mechanisms
- CLIPS format is still useful
- OKBC servers are important
- Relational databases could be useful
- To do this, we need to isolate the persistence mechanism as a component
17Axioms and Constraints
- Protege/Win used a frame-based language
- Protege-2000 keeps the emphasis on frames, but adds in a constraint language
- Based on KIF
- Compatible with OKBC
18The Actual Knowledge Model
19Knowledge Models
- Formal specification of the way knowledge is represented
- Precise, human-readable definitions of structures in a language
- Gives guarantees of what must hold in the knowledge base
- Other things may be true, in addition to what the knowledge model guarantees
- Protege adopts the OKBC knowledge model
20The Role of Logic
- Frames are intuitive for humans
- Concept / instance distinction dates back to Plato
- But they're not very well-defined
- What Minsky meant by frame is not what Winograd meant by frame (and is certainly not what Plato meant by form)
- We use logic to formalize the definitions
- Make the underlying assumptions explicit
21KIF
- Knowledge Interchange Format
- Developed in the early 1990s as a standard syntax for first-order logic
- Entirely ASCII and somewhat LISPy
- (forall ?x (exists ?y (......)))
- Currently a draft standard
- http://logic.stanford.edu/kif/dpans.html
- Slight peculiarity: relations are multiple arity
22Frames
- A Frame is simply a symbol
- A symbol is simply a 0-ary relation
- That is, it can be an argument to a function or a predicate
- That is, it is something we can make assertions about
- Types of frames include most of the traditional modelling constructs (classes, instances, slots, ...)
23Classes
- Classes are frames (are symbols ....)
- Classes are also unary predicates
- KIF allows multiple arity predicates
- That is, classes are sets (the set of instances)
- Members of the set are instances of the class.
- You can assert things about the class (using the fact that the class is a frame)
- You can reason about the elements of the associated set
24Defining Subclasses
- Subclass usually means two things
- All instances of the subclass are instances of the superclass
- Anything that is true of the superclass (as a class) is true of the subclass
- The first of these is simply subset
(=> (subclass-of ?S ?P)
    (forall ?F (=> (?S ?F) (?P ?F))))
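The subset half of the definition can be illustrated with a small sketch. This is not part of Protégé or OKBC; the `extensions` table and the function name are hypothetical, and classes are modelled simply as Python sets of instance names. Note that the formula above is an implication, so the check below tests only a necessary condition for subclass-of.

```python
# Hypothetical toy data: each class name maps to the set of its instances.
extensions = {
    "Person": {"alice", "bob", "carol"},
    "Patient": {"alice", "bob"},
}


def satisfies_subset_condition(sub, sup, ext):
    """Return True if every instance of `sub` is also an instance of `sup`.

    This is the set-theoretic consequence of (subclass-of sub sup);
    it says nothing about the frame-level (substitutability) half.
    """
    return ext[sub] <= ext[sup]


print(satisfies_subset_condition("Patient", "Person", extensions))
print(satisfies_subset_condition("Person", "Patient", extensions))
```

The first call holds because both patients are persons; the second fails because carol is a Person but not a Patient.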
25Multiple Inheritance
- Easy to define in this model
- For set aspects, subclass simply means subset
- A set can be a subset of more than one class
- As frames, enforce substitutability
- Any sentence that can be asserted about the superclass, as a class, ought to be true of the subclass
- Winds up being a union of logical statements
26Slots
- Slots are frames (are symbols ...)
- Slots are also binary predicates (taking a frame and a value)
- Slots also have associated predicates
- binary (take a slot and a frame, formalize the notion of attachment): template-slot-of, slot-of
- ternary (take a slot, a frame, and a value): template-slot-value, slot-value
27Attaching a Slot
- Slots are frames that get attached to other frames
- Attaching a slot to a class, for example
- You can attach a slot as either a template slot or an own slot
- Template slots define information that can be propagated to elements of a class (and via inheritance)
- Own slots are strictly local information
28Slot Propagation
[Diagram: propagation of template (T) and own (O) slots along instance-of and subclass-of links. Template slots propagate; own slots go to /dev/null.]
29Restating this in KIF
(=> (template-slot-value ?S ?C ?V)
    (and (template-slot-of ?S ?C)
         (=> (instance-of ?I ?C) (holds ?S ?I ?V))
         (=> (subclass-of ?X ?C) (template-slot-value ?S ?X ?V))))
30Restating this in English
If V is a template slot value of S on the class C, then we know the following three things:
1. S has been attached to C as a template slot
2. V is an own slot value for all instances I of C
3. V is a template slot value for all subclasses X of C
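The three consequences can be sketched as forward propagation in a toy frame store. The `ToyKB` class and its attribute names are hypothetical and are not the Protégé or OKBC API. The sketch propagates only at assertion time, so instances or subclasses added later would not pick up the value, and it assumes an acyclic class hierarchy.

```python
from collections import defaultdict


class ToyKB:
    """Hypothetical, minimal frame store used only for illustration."""

    def __init__(self):
        self.subclasses = defaultdict(set)  # class -> direct subclasses
        self.instances = defaultdict(set)   # class -> direct instances
        self.template_slots = set()         # (slot, class) attachments
        self.template_values = set()        # (slot, class, value)
        self.own_values = set()             # (slot, instance, value)

    def add_template_slot_value(self, slot, cls, value):
        # 1. Having a template slot value implies the slot is attached.
        self.template_slots.add((slot, cls))
        self.template_values.add((slot, cls, value))
        # 2. The value becomes an own slot value on every instance.
        for inst in self.instances[cls]:
            self.own_values.add((slot, inst, value))
        # 3. The value becomes a template slot value on every subclass
        #    (recursion also reaches instances of subclasses).
        for sub in self.subclasses[cls]:
            self.add_template_slot_value(slot, sub, value)


kb = ToyKB()
kb.subclasses["Person"].add("Patient")
kb.instances["Patient"].add("alice")
kb.add_template_slot_value("species", "Person", "human")
print(sorted(kb.template_slots))
print(sorted(kb.own_values))
```

A real engine could equally compute these facts lazily at query time; eager propagation is used here only because it mirrors the three numbered statements directly.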
31Restating this in Swedish
Om V är värdet på en mallegenskap S på klassen C, så vet vi följande tre saker:
1. S har kopplats till C som en mallegenskap
2. V är ett eget värde på egenskapen för alla instanser I av C
3. V är värdet på mallegenskapen för alla underklasser X av C
32Instances
- An instance is a frame
- The idea of instance is, more or less, a GUI
notion (and has no implications for the knowledge
model)
33Facets
- Facets are frames (and symbols ...)
- Facets are also ternary predicates (taking a frame, a slot, and a value)
- Facets also have associated predicates
- ternary (take a slot, a frame, and a facet; formalize the notion of attachment): template-facet-of, facet-of
- 4-ary (take a slot, a frame, a facet and a value)
34Facet Restrictions
- Template facets can only be attached to template slots
- Having a value implies attachment
- Similarly for own slots
(=> (template-facet-of ?F ?S ?C)
    (template-slot-of ?S ?C))
(=> (template-facet-value ?F ?S ?C ?V)
    (template-facet-of ?F ?S ?C))
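Read as constraints on a stored knowledge base, the two implications above can be checked mechanically. The following is a minimal sketch, assuming the three relations are held as plain sets of tuples; the function name and the sample data are hypothetical.

```python
def facet_violations(template_slot_of, template_facet_of, template_facet_value):
    """Return a list of violations of the two facet restrictions.

    template_slot_of:     set of (slot, cls)
    template_facet_of:    set of (facet, slot, cls)
    template_facet_value: set of (facet, slot, cls, value)
    """
    violations = []
    # A template facet may only be attached where the slot is attached.
    for facet, slot, cls in sorted(template_facet_of):
        if (slot, cls) not in template_slot_of:
            violations.append(("facet-without-slot", facet, slot, cls))
    # Having a facet value implies the facet is attached.
    for facet, slot, cls, _value in sorted(template_facet_value):
        if (facet, slot, cls) not in template_facet_of:
            violations.append(("value-without-facet", facet, slot, cls))
    return violations


slots = {("age", "Person")}
facets = {("NUMERIC-MINIMUM", "age", "Person"), ("NUMERIC-MINIMUM", "dose", "Drug")}
values = {("NUMERIC-MINIMUM", "age", "Person", "0"),
          ("NUMERIC-MAXIMUM", "age", "Person", "150")}
for violation in facet_violations(slots, facets, values):
    print(violation)
```

In the sample data the facet on `dose` is flagged because `dose` is not attached to `Drug`, and the `NUMERIC-MAXIMUM` value is flagged because that facet was never attached.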
35Facet Propagation
- Facets are attached to (frame, slot) pairs
- Whenever a slot propagates from one frame to another, the facets are carried along
[Diagram: facet propagation along subclass-of for template (T) and own (O) slots, with /dev/null as one target]
36Canonical Facets
- The standard facets are local constraints (that is, they apply at a single (frame, slot) pair)
VALUE-TYPE, CARDINALITY, NUMERIC-MINIMUM, NUMERIC-MAXIMUM
(=> (VALUE-TYPE ?S ?F ?C)
    (and (class ?C)
         (=> (holds ?S ?F ?V) (instance-of ?V ?C))))
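Because the VALUE-TYPE facet is local, checking it only requires looking at the values of one slot on one frame. A minimal sketch follows, assuming the knowledge base is available as plain Python sets; all names and data here are hypothetical, and the check is done under a closed-world reading (an instance-of fact that is not recorded is treated as false).

```python
def value_type_holds(slot, frame, cls, classes, slot_values, instance_of):
    """Check the two consequences of (VALUE-TYPE slot frame cls).

    classes:     set of class names
    slot_values: set of (slot, frame, value)
    instance_of: set of (value, cls)
    """
    if cls not in classes:
        return False  # the declared type must itself be a class
    # Every value of this slot on this frame must be an instance of cls.
    return all(
        (value, cls) in instance_of
        for s, f, value in slot_values
        if s == slot and f == frame
    )


classes = {"Person", "Drug"}
instance_of = {("aspirin", "Drug"), ("bob", "Person")}
slot_values = {("takes", "alice", "aspirin"), ("takes", "carol", "bob")}

print(value_type_holds("takes", "alice", "Drug", classes, slot_values, instance_of))
print(value_type_holds("takes", "carol", "Drug", classes, slot_values, instance_of))
```

The first frame passes because its only value is a Drug; the second fails because bob is recorded only as a Person.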
37OKBC Revisited
- Protégé-2000 knowledge-bases are OKBC-compliant
- Protégé-2000 is not OKBC generic
- There are OKBC knowledge bases that Protégé-2000 cannot handle
- It's close, though!
- Differences are KA related
- Instances are instances of exactly one class
- The role slot
38Desiderata for a Constraint Language
- William Grosso
- Stanford Medical Informatics
- Stanford University
39Overview
- Examples of Constraints
- Design Desiderata
- The Constraint Language
- Implementation Decisions
- The Default Implementation
- Dimensions for Evolution
40Desiderata for the Language
41The Big Modular Picture of Protege
[Diagram labels: Widgets (six boxes); Core Protege Framework; Storage Model; Constraint Engine; Actual KB]
42Full and formal semantics
- Widgets can include widgets for acquiring specific types of constraints
- Multiple constraint engines are possible
- Performing different checks at different times
- Replacing one engine with another
- The entire KB gets stored out to some server
- Without formal semantics (a logical theory), this is just not possible
43Compatibility with the OKBC knowledge model
- OKBC does not specify an axiom language
- OKBC is specified as a set of relations in KIF
- Classes are unary predicates, slots are binary predicates, ...
- All of these relations should immediately be accessible from within the constraint language
- And the constraint engine should give them the right semantics
44Ease of Translation
- Important goal: we want to be able to use Protege as a front-end to a wide variety of knowledge base servers
- This means that the constraint language ought to be easily translated into a wide variety of constraint languages
- At the very least, figuring out what can be translated ought to be easy
45Supported by a reasonable default implementation
- KMG will provide a default implementation of the constraint language
- Not very efficient
- But good semantics for KA
- Good enough to bootstrap the process
- As we learn more about constraints, and how they
are used, we hope that people with real expertise
will step forward
46A Deficient Syllogism
Major Premise: Interoperability requires formal semantics (and knowledge models based on mathematical logic)
Minor Premise: Humans don't easily adapt to formal languages
Conclusion: Widgets !!!!!!!
47Human Readability is a Red Herring
- The casual user interacts with forms
- The expert user knows about classes and instances
- Very few users know about the underlying logical formalism
- If we design widgets for acquiring constraints, then the user will never see the constraint language
48The Constraint Language
49A Single Constraint Language
- Constraint language is really an interlingua for communication
- Between widgets and the framework
- Between the framework and the storage model
- If we want all the components to evolve
independently and communicate gracefully, we need
to fix a single constraint language
50Logic
- We decided on a variant of KIF
- We use the KIF connectives and the KIF syntax
- Not all the KIF constants and predicates are included
- Our theory of arithmetic is much smaller
- (defrelation ...) is omitted
- For now?
51Sorted Logic
- Two new constructs in the language
- defset: allows the user to define a bag of values
- Similar to the notion of class, but with no support in the ontology tab
- Useful for enumerated types
- defrange: all variables must have their types declared
- Types can include things like "is a target of slot name"
52Reified Constraints
- There is a knowledge-base for constraints
- Acquiring a constraint is really acquiring an instance of Constraint
- You can annotate sentences and relations with useful information
- You can store constraints out to a vanilla frame-based system
- To a simple KB server, a constraint is just another frame
53The Constraint KB
- To use constraints, you must include the constraint knowledge base
- Will also contain default implementation of engine (as a tab widget)
- Will also include Java code for the standard relations
- Will also include widgets for constraint acquisition
- Won't include any instances
55Constraints and Axioms
- Constraints and Axioms use the syntax of logic but have different semantics
- Axioms can be used to assert new knowledge
- Constraints are restrictions on existing knowledge
- (forall ?x (exists ?y (rel-name ?x ?y)))
- Asserted as an axiom: it's reasonable to create a Skolem constant and bind it to ?y
- Asserted as a constraint: might not want to skolemize
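The difference between the two readings can be made concrete with a small sketch. It is illustrative only: the function names, the `sk-` naming scheme, and the toy data are assumptions, not PAL or OKBC behaviour. The constraint reading reports the frames that lack a partner; the axiom reading invents a fresh constant for each of them. To keep the example finite, the universal quantifier ranges over the original domain only, so the new constants are not themselves required to have partners.

```python
def check_as_constraint(domain, rel):
    """Constraint reading: list every x with no recorded y such that (x, y) is in rel."""
    return [x for x in domain if not any((x, y) in rel for y in domain)]


def assert_as_axiom(domain, rel):
    """Axiom reading: add a fresh constant as the missing y for each such x.

    Returns a new (domain, rel) pair; the inputs are left untouched.
    """
    new_domain, new_rel = list(domain), set(rel)
    for i, x in enumerate(check_as_constraint(domain, rel)):
        skolem = f"sk-{i}"  # hypothetical naming scheme for fresh constants
        new_domain.append(skolem)
        new_rel.add((x, skolem))
    return new_domain, new_rel


domain = ["a", "b"]
rel = {("a", "b")}

print(check_as_constraint(domain, rel))
print(assert_as_axiom(domain, rel))
```

With this data the constraint reading flags `b` as incomplete, while the axiom reading silently extends the knowledge base, which is usually not what a knowledge-acquisition tool wants.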
56Multiple Interpretations of a Single Theory
- No engine can return true when OKBC would return false
- In model-theoretic terms: if an engine thinks there is a model, then there must be one
- But engines are free to overlook models
57New functions and predicates are implemented
procedurally
- KIF has the (defrelation ...) construct to define new relations
- Our point of view: a relation is, almost always, something that should be defined in the ontology
- The exceptions (mostly n-ary relations) should be annotated explicitly and defined procedurally
59Universal Implementation Decisions
60The Language is defined in a Knowledge-Base
- PAL: Protege Axiom Language
- The PAL knowledge-base contains
- The constraint ontology
- The default relations
- And the Java code that implements them
- The default implementation
- Once again, taking advantage of knowledge-base
inclusion
61Enforcement of constraints is not necessarily real-time
- When the user loads (or saves) a knowledge-base, it should be consistent
- It's not always possible for the user to have a consistent KB while editing
- And, even if it were possible, it might be inconvenient.
- Therefore, the user should decide when to check constraints
62Enforcement via plug-ins (and tabs)
- The basic way users will interact with constraint engines will be via tabs and widgets
- We want to enable special types and categories of constraints to be annotated
- Basic mechanism: subclassing Constraint
- We want to have multiple possible engines, depending on context and user preference
- Constraint tabs are just another way of interacting with the KB.
63Two Important Consequences of these Decisions
64What is a knowledge base ?
- Used to be classes and instances
- Now also includes widgets
- Java code !
- Now also includes constraints
- Instances with an interpretation beyond the standard meaning associated to frames
- Custom pieces of Java code that implement new relations (possibly domain specific) for the constraint language
65We have evolved from OKBC to some extent
- If we use the ontology as a type system, it is convenient to have the types be mutually exclusive (instances are instances of a single class)
- The role predicate
66The Default Implementation
67Model-checking, rather than theorem proving
- Make strong closed world assumptions
- Main goals
- Detect incomplete entry of information
- Check entered information for inconsistencies
68Envisioned Constraints are mostly Local
- The more false this assumption is, the worse the engine will perform (the better a traditional theorem prover would perform?)
69Dimensions for Evolution
70Richer axiom ontology
- Subclassing our ontology to provide more detailed information
- Hints to enforcement engines
- "This is best validated using subroutine x" or "This statement is complexity level gamma"
- Statement could be generated by a widget
- Your widget, in your domain, generating PAL statements for my engine to check
- Formal semantics necessary
- Engines might let the user check a subset of the theory
71More Predicates and Functions
- Not many are included in the default implementation
- Mostly for reasoning about types, arithmetic, and slot values (taking transitive closures)
- Over time, we hope that people will implement predicates and pass the code to us (for inclusion as part of the Protege distribution)
- Note also that relations don't have to be general; you can add knowledge-base specific relations
72Other engines
- In particular, a theorem prover ?
- Can GSAT be used as a preprocessing step ?
- How about the work on ALL ?
73Support for Knowledge-Acquisition
- The knowledge-model is done
- The axiom language is done (as a spec)
- Engines are a mere matter of programming (similar things have been done for 25 years now)
- What's left?
74Subclassing the PAL Ontology to provide hooks for
widgets ?
- CONSTRAINT only provides two slots (pragmatics and sentence)
- How about other slots?
- Evaluation cost (for different engines) ?
- Evaluation hints ?
- What widget generated the axiom ?
75No A is a B
- A statement that is often enforced by defining separate classes
- But often not
- No hemophiliac should be taking Lasix
- Do we really want Hemophiliac as a subclass of Person?
- Do we really want Lasix_Taker as a subclass of Patient?
76Let's write it in PAL
(forall ?P (=> (and (Person ?P)
                    (has-disease ?P Hemophilia))
               (not (taking-drug ?P Lasix))))
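A model-checking engine would evaluate a statement like this directly against the stored instances rather than trying to prove it. The sketch below is a hand-written check for this one constraint, not a PAL interpreter. The function name and the patient data are hypothetical, the disease and drug are passed as parameters, and a closed-world assumption is made: a fact that is not recorded is treated as false.

```python
def constraint_violators(persons, has_disease, taking_drug, disease, drug):
    """Return the persons who have `disease` and are recorded as taking `drug`.

    persons:     set of person names (the extension of Person)
    has_disease: set of (person, disease)
    taking_drug: set of (person, drug)
    """
    return sorted(
        p for p in persons
        if (p, disease) in has_disease and (p, drug) in taking_drug
    )


persons = {"alice", "bob", "carol"}
has_disease = {("alice", "Hemophilia"), ("bob", "Hemophilia")}
taking_drug = {("bob", "Lasix"), ("carol", "Lasix")}

print(constraint_violators(persons, has_disease, taking_drug, "Hemophilia", "Lasix"))
```

Only bob violates the constraint: alice has the disease but is not taking the drug, and carol takes the drug but does not have the disease. No Hemophiliac or Lasix_Taker class is needed for the check.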
77This is really a Venn Diagram
[Venn diagram: two sets of Person instances, each defined by matching against a partially filled-out instance, with an empty intersection]
78Widgets play a role here
- Widget is placed on screen to mediate between humans and KB
- Widget generates PAL statements
- Engine interprets PAL statements
- User may or may not ever see PAL
79Things that are done
- The knowledge model is done
- The constraint language is done
- The default implementation is designed and (partially) implemented
80Things that we will do
- Finish the default implementation
- Publish a full spec (as a Tech Report) ?
- Serve as a clearinghouse for engines and widgets