The Ocelot Frame Knowledge Representation System

1
The Ocelot Frame Knowledge Representation System
  • Peter D. Karp, Ph.D.
  • Bioinformatics Research Group
  • SRI International
  • pkarp@ai.sri.com

2
Frame Knowledge Representation Systems
  • Long history of development in the AI knowledge
    representation community
  • Distant cousin of object-oriented databases
    (convergent evolution)
  • Background reading on frame systems:
  • P. Karp, The design space of frame knowledge
    representation systems
  • http://www.ai.sri.com/pubs/files/236.pdf
  • P. Karp, Distinguishing Knowledge Bases and Data
    Bases: Who's on First and What's on Second
  • http://www.ai.sri.com/pubs/files/1397.pdf

3
Ocelot Information
  • P.D. Karp et al., A collaborative environment for
    authoring large knowledge bases, J Intelligent
    Information Systems 13:155-94, 1999.
  • http://www.ai.sri.com/pkarp/pubs/99jiis.pdf
  • Ocelot Users Guide
  • http://www.ai.sri.com/pkarp/ocelot/

4
Pathway Tools Architecture
  • (Architecture diagram; layers listed from top to
    bottom)
  • Pathway Genome Navigator: Web Mode, Desktop Mode
  • Protein Editor, Pathway Editor, Reaction Editor
  • Lisp API, PerlCyc API, JavaCyc API
  • Generic Frame Protocol
  • Ocelot DBMS
  • Storage: Oracle or MySQL, or Disk File

5
Ocelot Data Model
  • Ocelot database
  • Also known as: DB, Knowledge Base, KB, PGDB
  • An Ocelot database is a collection of frames and
    slots

6
Ocelot Frames
  • Two kinds of frames:
  • Classes: Genes, Pathways, Biosynthetic Pathways
  • Instances (objects): trpA, TCA cycle
  • A symbolic frame name (id, key) uniquely
    identifies each frame
  • Examples: EG10223, TRP, Proteins
  • Classes have Superclass(es), Subclass(es),
    Instance(s)
  • Instances have one or more parent classes
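
The frame structure on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Ocelot's implementation (Ocelot is written in Common Lisp): the names `Frame`, `KB`, `ancestors`, and `instances_of` are hypothetical, and the instance name `example-biosynthesis` is invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    name: str        # symbolic frame name (id, key)
    is_class: bool   # True for a class frame, False for an instance frame
    parents: list = field(default_factory=list)  # superclasses or parent classes
    slots: dict = field(default_factory=dict)


class KB:
    """Hypothetical in-memory knowledge base: a collection of frames."""

    def __init__(self):
        self.frames = {}

    def add(self, name, is_class, parents=()):
        # A frame name must uniquely identify one frame.
        if name in self.frames:
            raise ValueError(f"duplicate frame name: {name}")
        for parent in parents:
            if not self.frames[parent].is_class:
                raise ValueError(f"parent {parent} is not a class")
        self.frames[name] = Frame(name, is_class, list(parents))
        return self.frames[name]

    def ancestors(self, name):
        """All classes reachable by following parent links upward."""
        seen = []
        stack = list(self.frames[name].parents)
        while stack:
            parent = stack.pop()
            if parent not in seen:
                seen.append(parent)
                stack.extend(self.frames[parent].parents)
        return seen

    def instances_of(self, cls):
        """Direct and inherited instances of a class, sorted by name."""
        return sorted(
            name for name, frame in self.frames.items()
            if not frame.is_class and cls in self.ancestors(name)
        )


kb = KB()
kb.add("Genes", is_class=True)
kb.add("Pathways", is_class=True)
kb.add("Biosynthetic-Pathways", is_class=True, parents=["Pathways"])
kb.add("trpA", is_class=False, parents=["Genes"])
kb.add("TCA-cycle", is_class=False, parents=["Pathways"])
kb.add("example-biosynthesis", is_class=False, parents=["Biosynthetic-Pathways"])

print(kb.instances_of("Pathways"))
```

Because `Biosynthetic-Pathways` is a subclass of `Pathways`, its instances are also reported as instances of `Pathways`.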

7
Slots
  • Encode attributes and properties of a frame
  • Examples: molecular weight, gene coordinates,
    comments
  • Represent relationships between frames
  • For a relationship, the value of the slot is the
    identifier of another frame

8
Slots
  • Number of values
  • Single valued
  • Multivalued: sets or lists
  • Slot values
  • Integer, real, string, symbol (frame name)
  • Every slot is described by a slot frame
    (slotunit) in a KB that defines meta information
    about that slot
  • Datatype, classes it pertains to, constraints
  • Enumerations
  • Two slots are inverses if they encode opposite
    relationships
  • Example: slot Product in class Genes and slot
    Gene in class Polypeptides
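
A rough sketch of how slot frames and inverse slots might behave, assuming a simplified model: the class `SlotKB`, its methods, the frame names, and the numeric values are all hypothetical and chosen for illustration; only the slot names Product and Gene come from the slide.

```python
class SlotKB:
    """Hypothetical store in which each slot is described by metadata."""

    def __init__(self):
        self.slot_frames = {}  # slot name -> metadata (the "slot frame")
        self.values = {}       # (frame name, slot name) -> list of values

    def define_slot(self, name, datatype, multivalued=True, inverse=None):
        self.slot_frames[name] = {
            "datatype": datatype,
            "multivalued": multivalued,
            "inverse": inverse,
        }

    def put(self, frame, slot, value, _from_inverse=False):
        meta = self.slot_frames[slot]
        if not isinstance(value, meta["datatype"]):
            raise TypeError(f"{slot} expects {meta['datatype'].__name__}")
        vals = self.values.setdefault((frame, slot), [])
        if not meta["multivalued"]:
            vals.clear()  # a single-valued slot keeps only the newest value
        if value not in vals:
            vals.append(value)
        # Mirror the value into the inverse slot of the referenced frame.
        # Simplification: stale inverse values are not retracted.
        inverse = meta["inverse"]
        if inverse and not _from_inverse:
            self.put(value, inverse, frame, _from_inverse=True)

    def get(self, frame, slot):
        return list(self.values.get((frame, slot), []))


slot_kb = SlotKB()
slot_kb.define_slot("Product", str, inverse="Gene")  # in class Genes
slot_kb.define_slot("Gene", str, inverse="Product")  # in class Polypeptides
slot_kb.define_slot("Molecular-Weight", float, multivalued=False)

slot_kb.put("gene-G", "Product", "polypeptide-P")
slot_kb.put("polypeptide-P", "Molecular-Weight", 3.0)
slot_kb.put("polypeptide-P", "Molecular-Weight", 4.0)

print(slot_kb.get("polypeptide-P", "Gene"))
print(slot_kb.get("polypeptide-P", "Molecular-Weight"))
```

Setting Product on the gene fills in Gene on the polypeptide, so the relationship needs to be asserted only once.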

9
Ocelot Data Model
  • Frame data model compared to relational model
  • Minimizes size of schema relative to semantic
    complexity
  • Inheritance lets us define new classes by
    modifying existing classes
  • Relational normalization breaks multivalued
    attributes into separate tables; this is not
    needed in the frame data model

10
Ocelot Schema
  • Schema is stored within the DB
  • Schema is self-documenting
  • Slot frames define metadata about slots
  • Schema evolution facilitated by:
  • Easy addition/removal of slots, or alteration of
    slot datatypes
  • Flexible data formats that do not require
    dumping/reloading of data
  • New versions of Pathway Tools include a schema
    upgrade function
  • Updates schema to match that of new MetaCyc
    version
  • Transforms data into new schema

11
Ocelot Storage System Architecture
  • Persistent storage via disk files or Oracle or
    MySQL
  • Oracle or MySQL (RDBMS KBs)
  • Concurrent development by multiple users
  • Incrementally fault in frames as referenced by
    the application
  • Incrementally save modified frames only
  • Stores complete transaction history of PGDB
  • Disk files
  • Updating by a single user at a time
  • Read in entirety at start of session
  • Write in entirety at every save

12
  • (Figure: multiple users connecting to a single
    MySQL server)

13
Ocelot Storage Subsystem
  • RDBMS KBs
  • RDBMS schema is independent of application schema
  • DBMS is submerged within Ocelot, invisible to
    users
  • Frames transferred from DBMS to Ocelot
  • On demand
  • By background prefetcher
  • Memory cache
  • Persistent disk cache speeds up access over the
    Internet

14
Ocelot Frame Faulting
  • When a frame is referenced by Pathway Tools:
  • Look in Ocelot virtual memory first
  • If not found, look in disk cache
  • If still not found, look in RDBMS
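
The lookup order above can be sketched as a three-tier read-through cache. This is a minimal illustration in which plain Python dicts stand in for Ocelot virtual memory, the persistent disk cache, and the RDBMS; `FaultingStore`, `get_frame`, and `trace` are hypothetical names, and copying a fetched frame into the disk cache is an assumption.

```python
class FaultingStore:
    """Hypothetical three-tier lookup: memory, then disk cache, then RDBMS."""

    def __init__(self, rdbms, disk_cache=None):
        self.memory = {}  # frames already faulted into this session
        self.disk_cache = {} if disk_cache is None else disk_cache
        self.rdbms = rdbms
        self.trace = []   # (frame name, tier that answered), for illustration

    def get_frame(self, name):
        if name in self.memory:
            self.trace.append((name, "memory"))
            return self.memory[name]
        if name in self.disk_cache:
            frame = self.disk_cache[name]
            source = "disk cache"
        else:
            frame = self.rdbms[name]  # raises KeyError for an unknown frame
            source = "rdbms"
            self.disk_cache[name] = frame  # assumed: keep a local copy
        self.memory[name] = frame
        self.trace.append((name, source))
        return frame


rdbms = {"F": {"Comment": "stored in the database"}}
shared_disk_cache = {}

session1 = FaultingStore(rdbms, shared_disk_cache)
session1.get_frame("F")  # first reference goes to the RDBMS
session1.get_frame("F")  # second reference is served from memory

session2 = FaultingStore(rdbms, shared_disk_cache)
session2.get_frame("F")  # a later session finds the frame in the disk cache

print(session1.trace, session2.trace)
```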

15
Ocelot RDBMS Transaction History
  • RDBMS KBs store complete transaction history
  • Stored as sequences of GFP operations executed by
    the user or by Pathway Tools
  • Right-click > Show > Changes in pop-up window
  • Used to compute gene last-curated date
  • Can be used to open a PGDB in an earlier state
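
One way to picture a history stored as sequences of operations is a log that can be replayed up to any transaction. The sketch below assumes a simplified operation vocabulary (`create`, `put`, `delete`) standing in for GFP operations; the log layout, the dates, and the function names are hypothetical.

```python
def apply_op(kb, op):
    """Apply one logged operation to a dict-based KB (frame -> slots)."""
    kind = op[0]
    if kind == "create":
        _, frame = op
        kb[frame] = {}
    elif kind == "put":
        _, frame, slot, value = op
        kb[frame][slot] = value
    elif kind == "delete":
        _, frame = op
        del kb[frame]
    else:
        raise ValueError(f"unknown operation: {kind}")


def replay(log, upto):
    """Rebuild the KB as it stood after transaction number `upto`."""
    kb = {}
    for txn in log:
        if txn["id"] > upto:
            break
        for op in txn["ops"]:
            apply_op(kb, op)
    return kb


def last_changed(log, frame):
    """Date of the most recent transaction that touched `frame`."""
    stamp = None
    for txn in log:
        if any(op[1] == frame for op in txn["ops"]):
            stamp = txn["date"]
    return stamp


# Illustrative log; ids, dates, and frame names are made up.
log = [
    {"id": 1, "date": "2001-01-05",
     "ops": [("create", "G"), ("put", "G", "Product", "P")]},
    {"id": 2, "date": "2001-02-10",
     "ops": [("put", "G", "Comment", "curated")]},
    {"id": 3, "date": "2001-03-01",
     "ops": [("create", "H")]},
]

print(replay(log, upto=1))
print(last_changed(log, "G"))
```

Replaying a prefix of the log corresponds to opening the PGDB in an earlier state, and scanning the log for a gene's frame gives something like a last-curated date.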

16
Ocelot RDBMS Concurrency Control
  • When user A saves updates:
  • Ocelot queries all transactions that occurred
    since A last saved or since the start of A's
    session
  • Ocelot compares the operations in those
    transactions with the updates made by A
  • If conflicts are found, save does not occur and
    conflicts are reported to the user
  • If no conflicts, save proceeds
  • Other users' transactions are evaluated into A's
    session
  • Refresh

17
Ocelot Update Conflicts
  • Examples of conflicting updates:
  • User A deletes frame F; User B modifies a slot
    value in frame F
  • User A changes MW of protein P from 3 to 4;
    User B changes MW of protein P from 3 to 5
  • Examples of updates that don't conflict:
  • User A updates frame E; User B updates frame F
  • User A updates the value of P.MW; User B
    updates the value of P.pI
  • Users A and B both delete all values of P.MW
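
The examples on this slide suggest a simple conflict test, sketched below. The rules are inferred from the five examples rather than taken from Ocelot's code, and the operation tuples (`delete-frame`, `set`, `clear`) and the functions `conflicts` and `try_save` are hypothetical. Treating two identical operations as compatible is an assumption generalized from the last example.

```python
def conflicts(op_a, op_b):
    """Return True if two operations by different users cannot both apply.

    Operations are tuples: ("delete-frame", frame),
    ("set", frame, slot, value), or ("clear", frame, slot).
    """
    if op_a[1] != op_b[1]:
        return False  # different frames never conflict
    if op_a == op_b:
        return False  # identical operations lead to the same result
    if op_a[0] == "delete-frame" or op_b[0] == "delete-frame":
        return True   # one user deleted a frame the other modified
    if op_a[2] != op_b[2]:
        return False  # same frame, different slots
    return True       # same slot, different outcomes


def try_save(my_ops, committed_by_others):
    """Compare A's updates with transactions committed since A last saved."""
    found = [(a, b) for a in my_ops for b in committed_by_others
             if conflicts(a, b)]
    return len(found) == 0, found


# The slide's examples, in the same order.
print(conflicts(("delete-frame", "F"), ("set", "F", "MW", 4)))      # conflict
print(conflicts(("set", "P", "MW", 4), ("set", "P", "MW", 5)))      # conflict
print(conflicts(("set", "E", "MW", 4), ("set", "F", "MW", 4)))      # no conflict
print(conflicts(("set", "P", "MW", 4), ("set", "P", "pI", 6.1)))    # no conflict
print(conflicts(("clear", "P", "MW"), ("clear", "P", "MW")))        # no conflict
```

If `try_save` reports conflicts, the save is refused and the conflicting pairs can be shown to the user; otherwise the save proceeds.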

18
Revert KB Operation
  • Undoes all changes in current session

19
Pathway Tools / BioCyc Software/Database Bundles
  • Each downloadable Pathway Tools configuration
    contains a combination of PGDBs
  • Those PGDBs are loaded into Lisp virtual memory
  • Build process
  • Start Common Lisp
  • Load in all Pathway Tools compiled Lisp code into
    virtual memory
  • Load in all PGDBs for that configuration into
    virtual memory
  • Save virtual memory image as binary executable
    file

20
Full BioCyc or Tier 1/2/3 Configuration
  • 507 PGDBs loaded into virtual memory

21
BioCyc at 10,000 Genomes
  • Scalability of current approach is limited
  • New approach: for full BioCyc, store PGDBs not in
    virtual memory but in Franz AllegroCache
  • AllegroCache is a Common Lisp object-oriented
    database
  • Implementation now in hand for Ocelot
  • We have done extensive performance testing
  • Performance looks good to 10,000 PGDBs