Ontologies: BioOntologies: Their Creation and Design

1
Ontologies: Bio-Ontologies: Their Creation and Design
  • Dr. Peter Karp
  • SRI, http://www.ai.sri.com/pkarp/
  • Dr. Robert Stevens and Professor Carole Goble
  • University of Manchester, UK
  • http://img.cs.man.ac.uk/tambis

2
Advertisement
  • The Fourth Annual Bio-Ontologies Meeting
  • "Sharing Experiences and Spreading Best Practice"
  • Sponsored by
  • GlaxoSmithKline Pharmaceuticals
  • Tivoli Gardens, Copenhagen, Denmark,
  • 26th July 2001
  • Organised by Richard Chen, Carole Goble, Robert
    Stevens, Peter Karp, Pat Hayes, Robin McEntire
    and Eric Neumann.
  • http://img.cs.man.ac.uk/stevens/workshop01

3
Outline
  • What is an ontology?
  • Motivation for ontologies in bioinformatics
  • Definition of an ontology
  • Naming the parts: comparing the types
  • Knowledge representation
  • Building an ontology
  • Methodologies, principles and pitfalls
  • Running example: a macromolecule fragment
  • Ontology Tools
  • Development tools

4
Ontologies: Definitions, Components, Subtypes
5
Outline
  • Motivations for ontologies in bioinformatics
  • Definition of ontology
  • Principles and pitfalls of ontology design
  • GKB Editor ontology development tool

6
Definition of an Ontology
  • Conceptualization of a domain of interest
  • Concepts, relations, attributes, constraints,
    objects, values
  • An ontology is a specification of a
    conceptualization
  • Formal notation
  • Documentation
  • A variety of forms, but includes
  • A vocabulary of terms
  • Some specification of the meaning of the terms
  • Ontologies are defined for reuse

7
Roles of Ontologies in Bioinformatics
  • Success of many biological DBs depends on
  • High fidelity ontologies
  • Clearly communicating their ontologies
  • Prevent errors on data entry and interpretation
  • Common framework for multidatabase queries
  • Controlled vocabularies for genome annotation
  • Riley ontology, GO
  • EC numbers

8
Roles of Ontologies in Bioinformatics
  • Information-extraction applications
  • Reuse is a core aspect of ontologies
  • Reuse of existing ontologies faster than
    designing new ones
  • Reuse decreases semantic heterogeneity of DBs
  • Schema-driven Software
  • Knowledge-acquisition tools
  • Query tools

9
Definitions
  • Data Model
  • Primitive data structuring mechanism in which an
    ontology is expressed
  • Relational data model, object-oriented data
    model, frame data model
  • Ontology
  • Domain specific conceptualization expressed
    within some data model

10
Components of an Ontology
  • Concepts
  • AKA Class, Set, Type, Predicate
  • Gene, Reaction, Macromolecule
  • Taxonomy of concepts
  • Generalization ordering among concepts
  • Concept A is a parent of concept B iff every
    instance of B is also an instance of A
  • Superset / subset
  • A kind of vs a part of
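
The parent/child rule on this slide can be read as a subset test over instance sets. The following is a minimal Python sketch under that reading; the instance names are made-up placeholders, and the test only compares the instances listed here, whereas the ontological definition ranges over all possible instances.

```python
# Hypothetical extensions (instance sets) for three concepts.
extension = {
    "Macromolecule": {"trpA-protein", "lacZ-mRNA", "glycogen-1"},
    "Protein": {"trpA-protein"},
    "RNA": {"lacZ-mRNA"},
}

def is_parent(a: str, b: str) -> bool:
    """A is a parent of B iff every instance of B is also an instance of A."""
    return extension[b] <= extension[a]

print(is_parent("Macromolecule", "Protein"))  # True
print(is_parent("Protein", "RNA"))            # False
```

A part-of relation would not pass this test: a residue is part of a protein, but the set of residues is not a subset of the set of proteins.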

11
Taxonomy of Concepts
12
Components of an Ontology
  • Objects
  • AKA Instances, members of the set
  • trpA Gene, Reaction 1.1.2.4
  • Strictly speaking, an ontology with instances is
    a knowledge base
  • Relations and Attributes
  • AKA Slots, properties
  • Product of Gene, Map-Position of Gene
  • Reactants of Reaction, Keq of Reaction
  • Values
  • The Product of the trpA Gene is
    tryptophan-synthetase
  • trpA.Product = tryptophan-synthetase

13
Components of an Ontology
  • Constraints and other meta information about
    relations
  • Slot: Product
  • Value type: Polypeptide or RNA
  • Domain: Genes
  • Slot: Map-Position
  • Value type: Number
  • Domain: Genes
  • Cardinality: At-Most 1
  • Range: 0 < X < 100
  • General Axioms
  • Nucleic acids < 20 residues are oligonucleotides
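
The constraints listed for the Map-Position slot can be checked mechanically. The sketch below is one possible Python encoding, not the representation used by any particular frame system; the `Slot` class and `violations` function are hypothetical names, and the domain test is a plain string comparison that ignores subclasses.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Slot:
    name: str
    domain: str                      # class the slot applies to
    value_types: tuple               # allowed Python types of a value
    at_most: Optional[int] = None    # cardinality upper bound, if any
    in_range: Optional[Callable[[float], bool]] = None

# Map-Position as described on the slide: a number, on Genes,
# at most one value, strictly between 0 and 100.
MAP_POSITION = Slot(
    name="Map-Position",
    domain="Genes",
    value_types=(int, float),
    at_most=1,
    in_range=lambda x: 0 < x < 100,
)

def violations(slot: Slot, frame_class: str, values: list) -> list:
    """Return human-readable constraint violations for one slot filling."""
    problems = []
    if frame_class != slot.domain:
        problems.append(f"{slot.name} does not apply to {frame_class}")
    if slot.at_most is not None and len(values) > slot.at_most:
        problems.append(f"{slot.name} allows at most {slot.at_most} value(s)")
    for v in values:
        if not isinstance(v, slot.value_types):
            problems.append(f"{v!r} is not a valid value type")
        elif slot.in_range is not None and not slot.in_range(v):
            problems.append(f"{v!r} is outside the allowed range")
    return problems

print(violations(MAP_POSITION, "Genes", [28.3]))  # []
```

Checks of this kind are what lets an ontology prevent errors at data entry rather than after the fact.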

14
More on Concepts
  • Primitive properties are necessary
  • Globular protein must have hydrophobic core, but
    a protein with a hydrophobic core need not be a
    globular protein
  • Defined properties are necessary and sufficient
  • Eukaryotic cells must have a nucleus. Every cell
    that contains a nucleus must be Eukaryotic.
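
The difference between the two kinds of property shows up in what a classifier may conclude. A minimal Python sketch, with hypothetical function names, in which a necessary-only property can rule membership out but never confirm it:

```python
from typing import Optional

def classify_globular(has_hydrophobic_core: bool) -> Optional[bool]:
    """Primitive concept: the property is necessary only.

    Returns False when membership is ruled out and None when it
    remains unknown; it can never return True.
    """
    return None if has_hydrophobic_core else False

def classify_eukaryotic(has_nucleus: bool) -> bool:
    """Defined concept: the property is necessary and sufficient."""
    return has_nucleus

print(classify_globular(True))    # None
print(classify_eukaryotic(True))  # True
```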

15
Ontology Subtypes: Expressiveness
  • Controlled vocabulary
  • List of terms
  • Taxonomy
  • Terms in a generalization hierarchy
  • DB schemas (relational and object-oriented)
  • More implementation specific
  • No instance information
  • Limited constraints
  • Frame knowledge bases
  • Description Logics

16
Ontology Subtypes
  • Database schema
  • Concepts, relations, constraints
  • Perhaps no taxonomy
  • At most hundreds of concepts
  • Taxonomy
  • Concepts, taxonomy, perhaps a few relations
  • Thousands of concepts
  • Knowledge base
  • Concepts, relations, constraints, objects, values
  • Hundreds to hundreds of thousands of concepts and
    objects

17
Ontology Subtypes
  • Generic (a.k.a. upper, core or reference)
  • common high level concepts
  • Physical, Abstract, Structure, Substance
  • useful for ontology re-use
  • important when generating or analysing natural
    language expressions
  • Domain-oriented
  • domain specific (e.g. E.coli)
  • domain generalisations (e.g. gene function)
  • Task-oriented
  • task specific (e.g. annotation analysis)
  • task generalisations (e.g. problem solving)

18
Knowledge Representation
  • Ontologies are best delivered in some computable
    representation
  • Variety of choices with different
  • Expressiveness
  • The range of constructs that can be used to
    formally, flexibly, explicitly and accurately
    describe the ontology
  • Ease of use
  • Computational complexity
  • Is the language computable in real time?
  • Rigour
  • Satisfiability and consistency of the
    representation
  • Systematic enforcement mechanisms
  • Unambiguous, clear and well defined semantics
  • A subclassOf B: don't be fooled by syntax!

19
Languages
  • Vocabularies using natural language
  • Hand crafted, flexible but difficult to evolve,
    maintain and keep consistent, with poor semantics
  • Gene Ontology
  • Object-based KR: frames
  • Extensively used, good structuring, intuitive.
    Semantics defined by OKBC standard
  • EcoCyc (uses Ocelot) and RiboWeb (uses
    Ontolingua)
  • Logic-based: Description Logics
  • Very expressive, model is a set of theories, well
    defined semantics
  • Automatic derived classification taxonomies
  • Concepts are defined and primitive
  • Expressivity vs. computational complexity balance
  • TAMBIS Ontology (uses FaCT)

20
Vocabularies: Gene Ontology
  • Hand crafted with simple tree-like structures
  • Position of each concept and its relationships
    wholly determined by a person
  • Flexible but
  • Maintenance and consistency preservation
    difficult and arduous
  • Poor semantics
  • Single hierarchies are limiting

21
Frame Data Model
  • Frames
  • Classes: Genes, Reactions
  • Instances
  • Relationships
  • Slots: Chromosome, map-position, citations,
    reactants, products, Keq
  • Facets: Chromosome is single-valued, instance of
    class Chromosomes; Citations is multiple-valued,
    set of strings
  • Ontolingua is the most famous frame system
  • All frames asserted into taxonomy by hand
  • All concepts are primitive

22
Description Logics
  • Describe knowledge in terms of concepts and
    relations
  • Concept defined in terms of other roles and
    concepts
  • Enzyme = protein which catalyses reaction
  • Reason that enzyme is a kind of protein
  • Model built up incrementally and descriptively
  • Uses logical reasoning to figure out
  • Automatically derived (and evolved)
    classifications
  • Consistency -- concept satisfaction

23
Frames and Logics
  • Frames
  • Rich set of language constructs
  • Impose restrictive constraints on how they are
    combined or used to define a class
  • Only support primitive concepts
  • Taxonomy hand-crafted
  • Description logics
  • Limited set of language constructs
  • Primitives combined to create defined concepts
  • Taxonomy for defined concepts established through
    logical reasoning
  • Expressivity vs. computational complexity
  • Less intuitive
  • Ideal: both! Current OIL activity uses a mixture.
    Logics provide reasoning services for frame
    schemes.

24
Ontology Exchange
  • To reuse an ontology we need to share it with
    others in the community
  • Exchanging ontologies requires a language with
  • common syntax
  • clear and explicit shared meaning
  • Tools for parsing, delivery, visualising etc
  • Exchanging the structure, semantics or
    conceptualisation?

25
Ontology Exchange Languages
  • XOL: eXtensible Ontology Language
  • XML markup
  • Frame based
  • Rooted in OKBC
  • http://www.ai.sri.com/pkarp/xol/
  • OIL: Ontology Interface Layer / Ontology Inference
    Layer
  • Gives a semantics to RDF-Schema
  • http://www.ontoknowledge.org/oil

26
OIL Ontology Metadata (Dublin Core)
  • Ontology-container
  • title macromolecule fragment
  • creator robert stevens
  • subject macromolecule generic ontology
  • description example for a tutorial
  • description.release 2.0
  • publisher R Stevens
  • type ontology
  • format pseudo-xml
  • identifier http://www.ontoknowledge.org/oil/oil.pdf
  • source http://img.cs.man.ac.uk/stevens/tambis-oil.html
  • language OIL
  • language en-uk
  • relation.haspart http://www.ontoRus.com/bio/mmole.onto

27
The Three Roots of OIL
  • Description Logics: formal semantics and reasoning
    support
  • Frame-based systems: epistemological modelling
    primitives
  • Web languages: XML- and RDF-based syntax
28
OIL primitive ontology definitions
  • slot-def has-backbone
  • inverse is-backbone-of
  • slot-def has-component
  • inverse is-component-of
  • properties transitive
  • class-def nucleic-acid
  • class-def rna subclass-of nucleic-acid
  • slot-constraint has-backbone
  • value-type ribophosphate
  • class-def ribophosphate
  • class-def deoxyribophosphate
  • subclass-of NOT ribophosphate

29
OIL defined ontology definitions
  • class-def defined dna
  • subclass-of nucleic-acid AND NOT rna
  • slot-constraint has-backbone
  • value-type deoxyribophosphate
  • class-def defined enzyme
  • subclass-of protein
  • slot-constraint catalyse
  • has-value reaction
  • class-def defined kinase
  • subclass-of protein
  • slot-constraint catalyse
  • has-value phosphorylation-reaction
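
Because enzyme and kinase are both defined classes, a reasoner can place kinase under enzyme even though nobody asserted that link. The Python sketch below imitates this with a toy structural comparison; it is not how a Description Logic reasoner such as FaCT works internally, and the link from phosphorylation-reaction to reaction is an assumption added for the example.

```python
# Primitive (asserted) hierarchy: child -> parents.
PRIMITIVE_PARENTS = {
    "protein": set(),
    "reaction": set(),
    "phosphorylation-reaction": {"reaction"},  # assumed, not on the slide
}

# Defined classes: necessary-and-sufficient conditions.
DEFINED = {
    "enzyme": {"subclass_of": {"protein"},
               "has_value": {"catalyse": "reaction"}},
    "kinase": {"subclass_of": {"protein"},
               "has_value": {"catalyse": "phosphorylation-reaction"}},
}

def primitive_subsumes(general: str, specific: str) -> bool:
    """True if `specific` equals `general` or has it as an asserted ancestor."""
    if general == specific:
        return True
    return any(primitive_subsumes(general, p) for p in PRIMITIVE_PARENTS[specific])

def subsumes(general: str, specific: str) -> bool:
    """Structural test: every condition of `general` is implied by `specific`."""
    g, s = DEFINED[general], DEFINED[specific]
    parents_ok = all(
        any(primitive_subsumes(gp, sp) for sp in s["subclass_of"])
        for gp in g["subclass_of"]
    )
    slots_ok = all(
        slot in s["has_value"] and primitive_subsumes(filler, s["has_value"][slot])
        for slot, filler in g["has_value"].items()
    )
    return parents_ok and slots_ok

print(subsumes("enzyme", "kinase"))  # True: kinase is classified under enzyme
print(subsumes("kinase", "enzyme"))  # False
```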

30
OIL in XML
  • OIL has a DTD, an XML Schema and a mapping to
    RDF-Schema. See web site for details
  • <slot-def>
  • <slot-name has-component/>
  • <inverse> <slot-name is-component-of/>
    </inverse>
  • <properties> <transitive/> </properties>
  • </slot-def>
  • <class-def> <class-name nucleic-acid/>
    </class-def>
  • <class-def>
  • <class-name rna/>
  • <subclass-of> <class name nucleic-acid/>
    </subclass-of>
  • <slot-constraint>
  • <slot-name has-backbone/>
  • <value-type> <class name ribophosphate/>
    </value-type>
  • </slot-constraint>
  • </class-def>
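
An XML serialisation means ordinary XML tooling can read an ontology. The Python sketch below parses a well-formed stand-in for the slide's pseudo-XML with the standard library; the element layout and the `name="..."` attribute form are assumptions made so the example parses, not the actual OIL DTD.

```python
import xml.etree.ElementTree as ET

# Well-formed stand-in for the slide's pseudo-XML (assumed layout).
DOC = """
<ontology>
  <slot-def>
    <slot name="has-component"/>
    <inverse><slot name="is-component-of"/></inverse>
    <properties><transitive/></properties>
  </slot-def>
  <class-def><class name="nucleic-acid"/></class-def>
  <class-def>
    <class name="rna"/>
    <subclass-of><class name="nucleic-acid"/></subclass-of>
    <slot-constraint>
      <slot name="has-backbone"/>
      <value-type><class name="ribophosphate"/></value-type>
    </slot-constraint>
  </class-def>
</ontology>
"""

root = ET.fromstring(DOC)

# Class name -> list of asserted superclasses.
parents = {}
for class_def in root.findall("class-def"):
    name = class_def.find("class").get("name")
    parents[name] = [c.get("name") for c in class_def.findall("subclass-of/class")]

# Names of slots declared transitive.
transitive_slots = [
    s.find("slot").get("name")
    for s in root.findall("slot-def")
    if s.find("properties/transitive") is not None
]

print(parents)           # {'nucleic-acid': [], 'rna': ['nucleic-acid']}
print(transitive_slots)  # ['has-component']
```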

31
OIL Remarks
  • Tools
  • Protégé II editor
  • FaCT reasoner
  • Other projects
  • Semantic Web projects (http://www.semanticweb.org)
  • Agents for the web projects (e.g. DAML)
  • A knowledge representation language and inference
    mechanism for the web

32
OIL Features
  • Based on standard frame languages
  • Extends expressive power with DL style logical
    constructs
  • Still has frame look and feel
  • Can still function as a basic frame language
  • OIL core language restricted in some respects so
    as to allow for reasoning support
  • No constructs with ill defined semantics
  • No constructs that compromise decidability
  • Has both XML and RDF(S) based syntax

33
OIL Features
  • Semantics clearly defined by mapping to very
    expressive Description Logic, e.g.
  • slot-constraint reverse-transcribe-from
    has-value mRNA or (part-of has-value mRNA)
  • ≡ ∃eats.meat ⊓ ∃eats.fish
  • Note the importance of clear semantics
  • ∃eats.(meat ⊓ fish)
  • is inconsistent (assuming meat and fish are
    disjoint)
  • Mapping can also be used to provide reasoning
    support from a Description Logic system (e.g.,
    FaCT)
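
The point about clear semantics is that "eats some meat and eats some fish" and "eats something that is both meat and fish" are different concepts, and only the second clashes with disjointness. The Python sketch below shows this by brute-force search over a tiny space of candidate models; it is an illustration only, and a real DL reasoner decides satisfiability quite differently.

```python
from itertools import chain, combinations

CLASSES = ("meat", "fish")
DISJOINT = [{"meat", "fish"}]  # meat and fish share no instances

def powerset(items):
    items = list(items)
    return chain.from_iterable(combinations(items, n) for n in range(len(items) + 1))

# Every kind of object allowed by the disjointness axiom.
allowed_kinds = [
    frozenset(kind) for kind in powerset(CLASSES)
    if not any(pair <= set(kind) for pair in DISJOINT)
]

def satisfiable(concept) -> bool:
    """Search for a set of eaten objects that makes `concept` true."""
    return any(concept(set(eaten)) for eaten in powerset(allowed_kinds))

def some_meat_and_some_fish(eaten):   # eats some meat AND eats some fish
    return any("meat" in k for k in eaten) and any("fish" in k for k in eaten)

def some_meat_fish_hybrid(eaten):     # eats some (meat AND fish)
    return any({"meat", "fish"} <= k for k in eaten)

print(satisfiable(some_meat_and_some_fish))  # True
print(satisfiable(some_meat_fish_hybrid))    # False
```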

34
Why Reasoning Support?
  • Key feature of OIL core language is availability
    of reasoning support
  • Reasoning intended as design support tool
  • Check logical consistency of classes
  • Compute implicit class hierarchy
  • May be less important in small local ontologies
  • Can still be useful tool for design and
    maintenance
  • More important with larger ontologies/multiple
    authors
  • Valuable tool for integrating and sharing
    ontologies
  • Use definitions/axioms to establish
    inter-ontology relationships
  • Check for consistency and (unexpected) implied
    relationships
  • Already shown to be useful technique for DB
    schema integration

35
Classifying by Reasoning
36
Finding Inconsistencies
37
Changing Classifications
38
DAML+OIL
  • OIL merged with DAML
  • Originally retained frame syntax
  • DAML more concerned with deployment than with
    building and managing
  • OIL mapped to DAML+OIL, but not reliably reversed
  • FRAME look and feel may return
  • Web ontology language

39
Building Ontologies
40
Building Ontologies
  • No field of Ontological Engineering equivalent to
    Knowledge or Software Engineering
  • No standard methodologies for building
    ontologies
  • Such a methodology would include
  • a set of stages that occur when building
    ontologies
  • guidelines and principles to assist in the
    different stages
  • an ontology life-cycle which indicates the
    relationships among stages.
  • Gruber's guidelines for constructing ontologies
    are well known.

41
The Development Lifecycle
  • Two kinds of complementary methodologies emerged
  • Stage-based, e.g. TOVE [Uschold96]
  • Iterative evolving prototypes, e.g. MethOntology
    [Gomez Perez94].
  • Most have TWO stages
  • Informal stage
  • ontology is sketched out using either natural
    language descriptions or some diagram technique
  • Formal stage
  • ontology is encoded in a formal knowledge
    representation language, that is machine
    computable
  • An ontology should ideally be communicated to
    people and unambiguously interpreted by software
  • the informal representation helps the former
  • the formal representation helps the latter.

42
A Provisional Methodology
  • A skeletal methodology and life-cycle for
    building ontologies
  • Inspired by the software engineering V-process
    model
  • The overall process moves through a life-cycle.

  • The left side charts the processes in building
    an ontology
  • The right side charts the guidelines, principles
    and evaluation used to quality assure the
    ontology
43
The V-model Methodology
  • Ontology in Use
  • Evaluation: coverage, verification, granularity
  • Identify purpose and scope
  • Knowledge acquisition
  • User Model
  • Conceptualisation principles: commitment,
    conciseness, clarity, extensibility, coherency
  • Conceptualisation
  • Integrating existing ontologies
  • Conceptualisation Model
  • Encoding/Representation principles: encoding
    bias, consistency, house styles and standards,
    reasoning system exploitation
  • Encoding
  • Representation
  • Implementation Model
44
The ontology building life-cycle
[Diagram labels: Identify purpose and scope; Knowledge acquisition;
Building; Language and representation; Conceptualisation;
Integrating existing ontologies; Available development tools;
Encoding; Evaluation]
45
User Model: Identify purpose and scope
  • Decide what applications the ontology will
    support
  • EcoCyc: pathway engineering, qualitative
    simulation of metabolism, computer-aided
    instruction, reference source
  • TAMBIS: retrieval across a broad range of
    bioinformatics resources
  • The use to which an ontology is put affects its
    content and style
  • Impacts re-usability of the ontology

46
User Model: Knowledge Acquisition
  • Specialist biologists; standard text books;
    research papers; other ontologies and database
    schemas.
  • Motivating scenarios and informal competency
    questions: informal questions the ontology must
    be able to answer
  • Evaluation
  • Fitness for purpose
  • Coverage and competency

47
Ontology Scenario
  • A molecule ontology
  • Describes the molecules stored in bioinformatics
    databases and annotated therein
  • It should cover the molecules and other chemicals
    described in the resources
  • The ontology will be used for querying and
    annotating information in bioinformatics
    resources.

48
Competency Questions
  • Cover the macromolecules found in molecular
    biology resources and courses
  • Should accommodate various views on the
    macromolecules
  • Should cover the queries people want to ask of
    macromolecules
  • In reality, these questions need more detail, e.g.
    give me tRNA genes with anticodon x, from
    aardvark.

49
Acquiring Knowledge
  • Find your knowledge!
  • An important source is your head, but
  • Use text books, glossaries (many of which lie on
    the web) and domain experts
  • Use other ontologies: what did they include and
    how did they do it?
  • Record your sources of knowledge.
  • Use your competency questions

50
Starting Concept List
  • Chemicals: atom, ion, molecule, compound,
    element
  • Molecular-compound, ionic-compound,
    ionic-molecular-compound,
  • Ionic-macromolecular-compound and
    ionic-small-macromolecular-compound
  • Protein, peptide, polyprotein, enzyme,
    holo-protein, apo-protein,
  • Nucleic acid: DNA, RNA, tRNA, mRNA, snRNA,

51
Conceptualisation Model: Conceptualisation
  • Identify the key concepts, their properties and
    the relationships that hold between them
  • Which ones are essential?
  • What information will be required by the
    applications?
  • Structure domain knowledge into explicit
    conceptual models.
  • Identify natural language terms to refer to such
    concepts, relations and attributes

52
Conceptualisation Sketch
[Diagram nodes: Chemical, Atom, Element, Compound, Molecule, Ion,
Metal, Non-Metal, Molecular Compound, Molecular Element,
Ionic Compound, Ionic Molecule, Metalloid, Ionic Molecular Compound]
53
Molecule Conceptualisation Sketch
[Diagram nodes: Ionic Macromolecular Compound, Macromolecule,
Small Molecule, Nucleic Acid, Protein, Polysaccharide, Peptide,
DNA, RNA, Enzyme, Starch, Glycogen, mRNA, tRNA, rRNA, snRNA]
54
Conceptualisation Model: Naming
  • Determine naming conventions
  • Consistent naming for classes and slots
  • EcoCyc
  • Classes are capitalized, hyphenated, plural
  • Slot names are uppercase
  • A quality ontology captures relevant biological
    distinctions with high fidelity

55
Conceptualisation Model: Pitfalls
  • Pitfall: Missing ontological elements
  • Missing classes (Swiss-Prot: protein complexes)
  • Lack of Lipid and Cofactor in example ontology
  • Missing attributes: genetic code identifier
  • Confuse 1:1 with 1:Many, or 1:Many with Many:Many
  • Cofactor as an attribute of reaction as well as
    protein
  • Important data is stored within text/comment
    fields
  • Pitfall: Extra ontological elements
  • Pitfall: Stop-over: when do I stop over-elaborating?
  • Pitfall: Relevance: do I really need all this
    detail?

56
Conceptualisation: Partonomy
  • Part-of relationships very important
  • Several kinds of part-of: component-of,
    region-of, mixture-of
  • Alpha-helix is a region of a protein, but a
    protein is a component of a complex
  • Care in placing transitivity

57
Integrating Existing Ontologies
  • Reuse or adapt existing ontologies when possible
  • Save time
  • Correctness
  • Facilitate interoperation
  • Reuse GO to give the example ontology Function,
    Process and Location
  • Integration of ontologies
  • Ontologies have to be aligned
  • Hindered by poor documentation and argumentation
  • Hindered by implicit assumptions
  • Shared generic upper level ontologies should make
    integration easier

58
Encoding: Implementation Toolkit
  • Construct ontology using an ontology-development
    system
  • Does the data model have the right expressivity?
  • Is it just a taxonomy or are relationships
    needed?
  • Is multiple parentage needed? Inverse
    relationships?
  • What types of constraints are needed?
  • Are reasoning services needed?
  • What are the authoring features of the development
    tool?
  • Can ontology be exported to a DBMS schema?
  • Can ontology be exported to an ontology exchange
    language?
  • Is simultaneous updating by multiple authors
    needed?
  • Size limitations of development tool?

59
Encoding
  • Encode sketch in KRL
  • Use OIL: a frame syntax with reasoning support
    if we want it
  • Wide range of expressivity (see cofactor example
    later)
  • Hand-craft a hierarchy: implement the sketch
    made earlier
  • This hand-crafted version can be migrated to a
    more descriptive form later.

60
Initial Encoding
  • class-def chemical
  • subclass-of substance
  • class-def molecule
  • subclass-of chemical
  • class-def compound
  • subclass-of chemical
  • class-def molecular-compound
  • subclass-of molecule and compound

61
Encoding: Ontology Implementation Pitfalls
  • Pitfall: Semantic ambiguity
  • Multiple ways to encode the same knowledge
  • Meaning of class definitions unclear
  • Pitfall: Encoding bias
  • Encoding the ontology changes the ontology

62
Encoding: Ontology Implementation Pitfalls
  • Pitfall: Redundancy (lack of normalization)
  • Exact same information repeated
  • Presence of computationally derivable information
  • Date of birth and age
  • Sequence length
  • DNA sequence and reverse complement
  • More effort required for entry and update
  • In a KB, partial updates lead to inconsistency
  • OK if redundant information is maintained
    automatically
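
One way to avoid this kind of redundancy is to store only the base fact and compute the rest on demand. The Python sketch below does so for the three examples on the slide; `DnaRecord` and `age_on` are hypothetical names, and the complement table covers only unambiguous upper-case DNA bases.

```python
from dataclasses import dataclass
from datetime import date

COMPLEMENT = str.maketrans("ACGT", "TGCA")

@dataclass
class DnaRecord:
    sequence: str  # the only stored sequence fact

    @property
    def length(self) -> int:
        # Derived: never stored, so it cannot go stale.
        return len(self.sequence)

    @property
    def reverse_complement(self) -> str:
        # Complement each base, then reverse the strand.
        return self.sequence.translate(COMPLEMENT)[::-1]

def age_on(date_of_birth: date, today: date) -> int:
    """Age in whole years, derived from the stored date of birth."""
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    return today.year - date_of_birth.year - (0 if had_birthday else 1)

record = DnaRecord("ATGC")
print(record.length, record.reverse_complement)  # 4 GCAT
```

With this layout an update to `sequence` is the only edit needed, so a partial update cannot leave the length or reverse complement inconsistent.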

63
Encoding: The Interaction Problem
  • Task influences what knowledge is represented and
    how it's represented
  • Molecular biology: chemical and physical
    properties of proteins
  • Bioinformatics: accession number, function, gene
  • Underlying perspectives mean they may not be
    reconcilable
  • If an ontology has too many conflicting tasks it
    can end up compromised (TaO experience)

64
Evaluate it - A guide for reusability
  • Conciseness
  • No redundancy
  • Appropriateness: e.g. protein molecules at atomic
    resolution when amino acid level would do
  • Clarity
  • Consistency
  • Satisfiability: it doesn't contradict itself
  • Molecule and Compound disjoint, but
    molecular-compound is (molecule and compound)
  • Commitment
  • Do I have to buy into a load of stuff I don't
    really need or want just to get the bit I do?

65
Documentation: Make Ontology Understandable!
  • Produce clear informal and formal documentation
  • An ontology that cannot be understood will not be
    reused
  • Genbank feature table
  • NCBI ASN.1 definitions
  • There exists a space of alternative ontology
    design decisions
  • Semantics / Granularity
  • Terminology
  • Pitfall: Neglecting to record design rationale

66
Molecules Revisited
[Diagram nodes: Non-Ionic Macromolecular Compound,
Ionic Macromolecular Compound, Macromolecule, Small Molecule,
Nucleic Acid, Protein, Polysaccharide, Peptide, DNA, RNA, Enzyme,
Starch, Glycogen, mRNA, tRNA, rRNA, snRNA]
67
More Encoding
  • class-def chemical
  • subclass-of substance
  • class-def defined molecule
  • subclass-of chemical
  • Slot-constraint contains-bond min-cardinality 1
    has-value covalent-bond
  • class-def defined compound
  • subclass-of chemical
  • Slot-constraint has-atom-types greater-than 1
  • class-def defined molecular-compound
  • subclass-of molecule and compound

68
Cofactor Knowledge
  • Gather knowledge about cofactors, coenzymes and
    prosthetic groups from glossaries and
    dictionaries etc.
  • Note that definitions are inconsistent and even
    contradictory.
  • Synthesise knowledge and make judgements.

69
Encoding Cofactor
  • Class-def defined cofactor
  • Subclass-of metal-ion or small-organic-molecule
  • Slot-constraint binds-to has-value protein
  • Class-def defined coenzyme
  • Subclass-of cofactor
  • Slot-constraint binds-loosely-to has-value
    protein
  • Class-def defined prosthetic-group
  • Subclass-of cofactor and (not metal-ion)
  • Slot-constraint binds-strongly-to has-value
    protein

70
Cofactor Discussion
  • Classifies as a kind of chemical
  • Taken from IUPAC definition document: not a
    child of organic-molecule and metal-ion
  • Can express both disjunction and negation in OIL
  • Uses a slot hierarchy in describing binds-to.

71
More Discussion
  • Can we define sufficiency conditions for peptide?
  • Mass and length are not easy to use in a definition:
    a protein is > 100 kDa
  • What about a 99 kDa protein?

72
Publish the Ontology
  • Formal and informal specifications
  • Intended domain of application
  • Design rationale
  • Limitations
  • See EcoCyc paper in ISMB-93/Bioinformatics 00
  • See TAMBIS paper in Bioinformatics 99

73
Ontological Pitfalls
  • Stop-over: when do I stop over-elaborating?
  • Proteins → amino acid residues → side chains →
    physical chemical properties ...
  • Relevance
  • Do we need to mention all the types of nucleic
    acid?

74
Ontology-Development Tools
75
Ontology Development Tools
  • Development environments
  • Ontology Libraries
  • Ontology publishing and exchange
  • Across all representational forms (logic, frame,
    etc.)
  • Web compliant
  • Ontology delivery
  • Ontology servers

76
Development Environments
  • Considerations depend on ontology subtype!
  • Expressiveness of data model
  • Authoring features
  • DBMS export capabilities
  • Ontology-exchange language export capabilities
  • Distributed authoring
  • Size limitations
  • WebOnto
  • Ontosaurus
  • GKB Editor
  • Protégé II
  • Ontolingua
  • GRAIL toolkit etc
  • Wondertools

77
GKB Editor: Ontology Development Toolkit
  • Graphical editor for KBs and ontologies
  • Ontologies stored in Ocelot object-oriented
    knowledge base
  • Expressive, scalable, distributed
  • EcoCyc ontology contains 1K classes, 15K
    instances
  • Knowledge is graphically portrayed in 3 viewers
  • All operations are schema driven
  • See http://www.ai.sri.com/gkb/user-man.html

78
Ocelot Capabilities
  • Frame data model
  • KBs and ontologies stored in files or Oracle
  • Oracle KBs and ontologies
  • Better scalability -- frame faulting on demand
    and in background
  • Concurrency control system coordinates changes by
    multiple users
  • Transaction logging (recall operation history)
  • GFP API provides programmatic interface

79
Distributed Ontology Development
[Diagram: Users 1 to 4 connected via the Internet to an Oracle Server]
80
Frame Data Model
  • Classes
  • Genes, Reactions
  • Slots
  • chromosome, map-position, citations
  • reactants, products, Keq
  • Facets
  • chromosome is single-valued, instance of class
    Chromosomes
  • citations is multiple valued, set of strings

81
GKB Editor
  • Taxonomy Viewer
  • Create/delete classes and instances
  • Browse class taxonomy
  • Alter class/subclass links
  • Frame editor
  • Add/remove slots to/from classes
  • Create/delete/edit slot values for instances
  • Frame relationships viewer
  • View and update a network of relationships among
    instances

82
GKB Editor Operations
  • Operations: Add, remove, replace, rename
  • Objects: Classes, instances, slots, values,
    facets, annotations
  • Editing of sets, multisets, lists
  • Modification of class hierarchy and class
    definitions
  • Extensive customization of shape, color, font

83
Summary
  • A definition of ontology as a characterisation of
    conceptualisation -- capturing the things we know
    about a domain
  • The knowledge within an ontology can be applied
    to a variety of tasks
  • Building an ontology -- process and life-cycle
  • Influences on the choice of encoding language
  • The desirability of tools for the building,
    management and exchange of ontologies

84
Final remarks
  • The use of ontologies is growing within the
    bio-molecular world
  • They are a high-cost, but high-benefit solution
    to a variety of problems confronting the
    bioinformatics community.

85
Some References (1)
  • Review
  • Stevens R., Goble C.A. and Bechhofer, S.
    Ontology-based Knowledge Representation for
    Bioinformatics. Accepted for Briefings in
    Bioinformatics.
  • Bio-ontologies Systems
  • Karp P. D. An ontology for biological function
    based on molecular interactions. Bioinformatics
    2000, 16: 269-285.
  • Ashburner et al. Gene Ontology: Tool for the
    Unification of Biology. Nature Genetics, Vol 25,
    pages 25-29.
  • R. Altman, M. Bada, X.J. Chai, M. Whirl Carillo,
    R.O. Chen, and N.F. Abernethy. RiboWeb: An
    Ontology-Based System for Collaborative Molecular
    Biology. IEEE Intelligent Systems, 14(5):68-76,
    1999.
  • P.G. Baker, C.A. Goble, S. Bechhofer, N.W. Paton,
    R. Stevens, and A. Brass. An Ontology for
    Bioinformatics Applications. Bioinformatics,
    15(6):510-520, 1999.
  • R.O. Chen, R. Felciano, and R.B. Altman.
    RiboWeb: Linking Structural Computations to a
    Knowledge Base of Published Experimental Data.
    In Proceedings of the 5th International
    Conference on Intelligent Systems for Molecular
    Biology, pages 84-87. AAAI Press, 1997.
  • Guarino, N. 1992. Concepts, Attributes and
    Arbitrary Relations: Some Linguistic and
    Ontological Criteria for Structuring Knowledge
    Bases. Data & Knowledge Engineering, 8: 249-261.
  • Guarino, N., Carrara, M., and Giaretta, P. 1994a.
    An Ontology of Meta-Level Categories. In J.
    Doyle, E. Sandewall and P. Torasso (eds.),
    Principles of Knowledge Representation and
    Reasoning: Proceedings of the Fourth
    International Conference (KR94). Morgan Kaufmann,
    San Mateo, CA: 270-280.
  • P. Karp and S. Paley. Integrated Access to
    Metabolic and Genomic Data. Journal of
    Computational Biology, 3(1):191-212, 1996.
  • P. Karp, M. Riley, S. Paley, A. Pellegrini-Toole,
    and M. Krummenacker. EcoCyc: Electronic
    Encyclopedia of E. coli Genes and Metabolism.
    Nucleic Acids Research, 27(1):55-58, 1999.
  • S. Schulze-Kremer. Ontologies for Molecular
    Biology. In Proceedings of the Third Pacific
    Symposium on Biocomputing, pages 693-704. AAAI
    Press, 1998.
  • P.G. Baker, A. Brass, S. Bechhofer, C. Goble, N.
    Paton, and R. Stevens. TAMBIS: Transparent Access
    to Multiple Bioinformatics Information Sources.
    An Overview. In Proceedings of the Sixth
    International Conference on Intelligent Systems
    for Molecular Biology, pages 25-34. AAAI Press,
    June 28-July 1, 1998.

86
Some References (2)
  • Ontology development and exchange
  • T.R. Gruber. Towards Principles for the Design of
    Ontologies Used for Knowledge Sharing. In Roberto
    Poli and Nicola Guarino, editors, International
    Workshop on Formal Ontology, Padova, Italy, 1993.
    Available as technical report KSL-93-04,
    Knowledge Systems Laboratory, Stanford
    University: ftp.ksl.stanford.edu/pub/KSL_Reports/KSL-93-04.ps.

87
More References (3)
  • I. Horrocks, D. Fensel, J. Broekstra, M. Crubezy,
    S. Decker, M. Erdmann, W. Grosso, C. Goble, F.
    Van Harmelen, M. Klein, M. Musen, S. Staab, and
    R. Studer. The ontology interchange language OIL:
    The grease between ontologies.
    http://www.cs.vu.nl/~dieter/oil.
  • R. Jasper and M. Uschold. A Framework for
    Understanding and Classifying Ontology
    Applications. In Twelfth Workshop on Knowledge
    Acquisition Modeling and Management KAW'99, 1999.
  • M. Uschold and M. Gruninger. Ontologies:
    Principles, Methods and Applications. Knowledge
    Engineering Review, 11(2), June
  • Guarino, N. and Welty, C. Identity, Unity, and
    Individuality: Towards a Formal Toolkit for
    Ontological Analysis, in H. Werner (Ed.),
    Proceedings of ECAI-2000: The European Conference
    on Artificial Intelligence, IOS Press, Amsterdam,
    August 2000, 219-223.