Integration of Information Extraction with an Ontology

1
Integration of Information Extraction with an
Ontology
  • KMi
  • Knowledge Media Institute
  • M. Vargas-Vera, J. Domingue, Y. Kalfoglou, E. Motta
    and S. Buckingham Shum

2
Introduction
  • Ontology -> Information Extractor
  • English text (NLP)
  • Group of tools forming their IE system
  • KMi Ontology
  • From UMass
  • Marmot
  • Crystal
  • Badger
  • OCML preprocessor

3
Presentation Layout
  • Background on tool origins and area of work
  • Description of tool integration
  • Coping with ambiguity
  • Description of output
  • Population of Ontology
  • Future Work

4
UMass: University of Massachusetts Amherst
  • Marmot, Crystal, Badger
  • Classifies text by recognizing extraction
    patterns and semantic features associated with
    slots in predefined frames.

5
Testing Area: KMi Planet
  • Web-based news server
  • Story Library
  • Collections of news stories and postings
  • Ontology Library
  • Ontologies stored for use in extracting
    information from the story library.
  • Uses OCML
  • myPlanet
  • myPlanet uses cue phrases, defined as research
    areas, to query KMi Planet through the ontology
    library and the information extraction tools
    we're about to talk about

6
The Ontology Library
  • 40 different types of events or activities
    can be described by the ontology library.
  • Event type 3: demonstration-of-technology
  • technology-being-demonstrated (technology) (Info
    Extraction)
  • has-duration (duration) (30 min)
  • start-time (time-point) (3:30pm)
  • end-time (time-point) (4pm)
  • has-location (a place) (room 120 TMCB BYU campus)
  • other-agents-involved (list of person(s)) (Dr.
    Embley)
  • main-agent (list of person(s)) (Brian Goodrich)
  • location-at-start (a place) (room 120 TMCB BYU
    campus)
  • location-at-end (a place) (room 120 TMCB BYU
    campus)
  • medium-used (equipment) (multimedia projector,
    ppt)
  • subject-of-the-demo (title) (Integration of
    Information Extraction with an Ontology)
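
The slot list above can be read as a typed frame. The following is a minimal Python sketch, assuming a plain dataclass can stand in for the OCML class definition. The class name `DemonstrationOfTechnology`, the field types, and the constant `ROOM` are illustrative assumptions and are not taken from the KMi ontology; only the slot names and example values come from the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DemonstrationOfTechnology:
    """Illustrative frame for event type 3; slot names follow the slide."""
    technology_being_demonstrated: str
    subject_of_the_demo: str
    start_time: str
    end_time: str
    has_duration: str
    has_location: str
    location_at_start: str
    location_at_end: str
    main_agent: List[str] = field(default_factory=list)
    other_agents_involved: List[str] = field(default_factory=list)
    medium_used: List[str] = field(default_factory=list)


# Example values from the slide (the presenter's own talk as the event).
ROOM = "room 120 TMCB BYU campus"
demo = DemonstrationOfTechnology(
    technology_being_demonstrated="Info Extraction",
    subject_of_the_demo="Integration of Information Extraction with an Ontology",
    start_time="3:30pm",
    end_time="4pm",
    has_duration="30 min",
    has_location=ROOM,
    location_at_start=ROOM,
    location_at_end=ROOM,
    main_agent=["Brian Goodrich"],
    other_agents_involved=["Dr. Embley"],
    medium_used=["multimedia projector", "ppt"],
)
```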

7
Marmot
  • Natural Language Processor
  • Noun, Verb, and Prepositional Phrases
  • John Domingue Wed, 15 Oct 1997.
  • David Brown, University for Industry visits the
    OU.
  • <ex> 2 1
  • SUBJ(1) DAVID BROWN COMMA UNIVERSITY
  • PP (2) FOR INDUSTRY
  • VB (3) VISITS
  • OBJ1(4) THE OU
  • PUNC(5) PERIOD
  • </ex>
  • <ex> 1 1
  • SUBJ(1) JOHN DOMINGUE
  • ADVP(2) @WED_COMMA_15_OCT_1997@
  • PUNC(3) PERIOD
  • </ex>
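
To make the segmented output above easier to work with, it can be read into (tag, position, words) triples. This is a rough sketch only: the line shape `TAG(n) WORDS` is inferred from the example on this slide, and the function name `parse_segments` is hypothetical, not part of Marmot. Lines that do not fit the pattern, such as the sentence delimiters, are skipped.

```python
import re
from typing import List, Tuple

# Assumed line shape, inferred from the slide: TAG, optional space, (n), words.
SEGMENT = re.compile(r"^\s*([A-Z0-9]+)\s*\((\d+)\)\s+(.*\S)\s*$")


def parse_segments(lines: List[str]) -> List[Tuple[str, int, str]]:
    """Return (tag, position, words) for every line that looks like a segment."""
    segments = []
    for line in lines:
        match = SEGMENT.match(line)
        if match:  # delimiter lines and blanks do not match and are ignored
            segments.append((match.group(1), int(match.group(2)), match.group(3)))
    return segments


sample = [
    "SUBJ(1) DAVID BROWN COMMA UNIVERSITY",
    "PP (2) FOR INDUSTRY",
    "VB (3) VISITS",
    "OBJ1(4) THE OU",
    "PUNC(5) PERIOD",
]
segments = parse_segments(sample)
```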

8
Crystal
  • Dictionary Induction Tool
  • Uses keywords to annotate text with semantic
    tags.
  • Visitor (<VI> David Brown <VI>)
  • Place (<PL> the OU <PL>)
  • Specific-to-general, data-driven search
  • Relaxes constraints on initial definitions until
    it finds the most specific definition that covers
    all instances of the word in the text.
  • Retains results for future use
  • Tested on over 300 stories; 100% precision
    and recall
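
The specific-to-general idea can be illustrated with a toy version: start from the constraints of one instance and drop every constraint that some other instance does not satisfy. This is a simplified sketch of the description on the slide, not Crystal's actual induction algorithm. The feature strings and the function name `most_specific_covering` are invented for illustration.

```python
from typing import List, Set


def most_specific_covering(instances: List[Set[str]]) -> Set[str]:
    """Relax an initial definition until it covers every instance.

    The initial definition is the full constraint set of the first instance
    (the most specific choice). Each further instance removes the constraints
    it does not satisfy, so the result is the most specific conjunctive
    definition that still covers all instances.
    """
    if not instances:
        raise ValueError("need at least one instance")
    definition = set(instances[0])
    for features in instances[1:]:
        definition = {c for c in definition if c in features}
    return definition


# Hypothetical feature sets for two sentences using the verb "visits".
examples = [
    {"subj:person", "verb:visits", "obj:place", "pp:for"},
    {"subj:person", "verb:visits", "obj:place"},
]
definition = most_specific_covering(examples)
```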

9
Badger
  • (fairly certain whoever wrote this section did
    not speak English as a first language)

Matches sentences from the text against the concept
nodes passed from Crystal. Selects the best match by
the maximum number of features matching the concept
node. Can remove irrelevant sentences from the
problem set.
  • http://rockape.qgl.org/crap/badger.swf
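
The matching rule described on this slide (pick the concept node with the most matching features, discard sentences that match nothing) can be sketched as follows. The representation of sentences and concept nodes as sets of feature strings, the node names, and the function `best_concept_node` are all assumptions made for illustration; Badger's real data structures are not shown in the slides.

```python
from typing import Dict, Optional, Set


def best_concept_node(
    sentence_features: Set[str], concept_nodes: Dict[str, Set[str]]
) -> Optional[str]:
    """Return the concept node sharing the most features with the sentence.

    Returns None when no node shares any feature, which this sketch treats
    as "irrelevant sentence, remove from the problem set".
    """
    best_name, best_score = None, 0
    for name, node_features in concept_nodes.items():
        score = len(sentence_features & node_features)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical concept nodes and sentence features.
nodes = {
    "visit": {"subj:person", "verb:visits", "obj:place"},
    "demonstration": {"subj:person", "verb:demonstrates"},
}
match = best_concept_node({"subj:person", "verb:visits", "obj:place"}, nodes)
irrelevant = best_concept_node({"verb:rains"}, nodes)
```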

10
Coping with Ambiguity
  • Query the list of institutions
  • Return from the list of institutions: no match
  • Query the list of projects
  • Return from the list of projects: match
  • No discussion of whether this was automatically
    done by the extractor or manually by the users.
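
The lookup sequence on this slide amounts to trying one ontology list after another until a name is found. Since the slides do not say whether this was automatic or manual, the sketch below simply assumes an automatic lookup. The function name `resolve_type` and the example entries (in particular "Example Project") are invented for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def resolve_type(
    name: str, lookups: Sequence[Tuple[str, List[str]]]
) -> Optional[str]:
    """Try each (type, known names) list in order; return the first type that matches."""
    wanted = name.casefold()
    for type_name, known_names in lookups:
        if any(wanted == known.casefold() for known in known_names):
            return type_name
    return None  # no list contains the name


# Hypothetical ontology lists, queried in the order shown on the slide.
lookups = [
    ("institution", ["the OU", "University for Industry"]),
    ("project", ["Example Project"]),
]
resolved = resolve_type("Example Project", lookups)
```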

11
OCML Code Translator (Operational Conceptual
Modeling Language)
  • Tokenises Badger output, finds the corresponding
    concept node (CN) definitions, and extracts all
    the objects found in the story

12
Ontology Maintenance
  • Uses Badger (lexicon) and Crystal (concept) output
    to automatically update the Ontology library
    whenever a new story is added to the Story library
  • Some stories cannot be used for an automatic
    update, for one of two reasons:
  • There is not enough information in the story
  • There is no current template that matches the
    sentence concepts.
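
The update rule and its two failure cases can be sketched as a small function. This is a hedged illustration only: the dictionary-based templates, the status strings, and the name `try_populate` are assumptions, and the slides do not show how the OCML translator actually stores instances.

```python
from typing import Dict, List, Set


def try_populate(
    event_type: str,
    slots: Dict[str, str],
    templates: Dict[str, Set[str]],
    library: List[dict],
) -> str:
    """Add an instance to the library if a template exists and is fully filled."""
    required = templates.get(event_type)
    if required is None:
        return "no template"  # no current template matches the concepts
    if required - set(slots):
        return "not enough information"  # the story left required slots empty
    library.append({"type": event_type, **slots})
    return "added"


# Hypothetical template: a visit needs a visitor and a place.
templates = {"visit": {"visitor", "place"}}
library: List[dict] = []
status_ok = try_populate(
    "visit", {"visitor": "David Brown", "place": "the OU"}, templates, library
)
status_missing = try_populate("visit", {"visitor": "David Brown"}, templates, library)
status_unknown = try_populate("award", {"recipient": "John Domingue"}, templates, library)
```

Stories that return anything other than "added" would be left for manual handling.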

13
Conclusion
  • IE system created using Marmot, Crystal, Badger
    and the OCML translator.
  • Obtained good results on KMi stories.

Assessment
  • Sporadic periods of quality technical writing,
    interspersed with nearly impenetrable English
  • A borrowing of tools, translated to OCML and
    ported for KMi

14
Future Work
  • Deriving the type of an object when it does not
    match a predefined template.
  • Automatic creation of new classes and subclasses.
  • Using this IE tool in other domains (need new
    training data?)
  • Trying out a new Machine Learning algorithm in
    Crystal and comparing performance.
  • Using the IE tool with hypertext.
  • Saving Badger's output in XML
  • Creating a more visual GUI for the ontologies.