Ontologies at Your Service

1
Ontologies at Your Service
  • Yigal Arens
  • Eduard Hovy
  • USC/ISI

2
DGRC
  • Purpose: Make Digital Government Happen!
  • Advance information systems research
  • Bring the benefits of cutting-edge IT research to
    government systems
  • Help educate government and the community

3
The problem and the solution
  • Problem: FedStats brings together thousands of
    databases from over seventy Federal agencies
      • data is duplicated and near-duplicated,
      • even government personnel have trouble finding
        and interpreting one another's data!

Research challenge: Provide access to multiple
databases, for both sophisticated and casual
users, in an easy-to-use and easy-to-understand
manner, without distorting the data
  • Solution: Create a framework that can provide
    easy, fast, and/or standardized access
      • need a method of standardizing many databases,
      • need a multi-database access engine,
      • need a powerful user interface.

4
Why use an ontology?
Ontology: a taxonomized set of terms with
definitions and axioms, used by humans,
databases, and systems.
  • Cognitive Reasons
      • Investigate human knowledge organization.
      • Build a platform for human processes: reminding,
        generalization, learning.
  • System Building Reasons
      • Standardize terminology: avoid inconsistency.
      • Assist knowledge transfer: link data across
        domains.
      • Facilitate interoperability: let systems work in
        new domains.

5
SENSUS: two uses
  • Plug in domain models and databases: multi-DB
    access
  • Link to words of different languages: translation
    (figure labels: lobster, Buch, tournee, Klavier,
    livre, Käse, cheese, bench, fromage)

6
ISI's DINO ontology
http://edc.isi.edu:8011/dino
  • Taxonomy, multiple superclass links
  • Approx. 90,000 concepts
  • Top level: Penman Upper Model (ISI)
  • Body: WordNet 1.6 (Princeton), rearranged
  • New information being added by text mining
  • Used at ISI for machine translation, text
    summarization, database access
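
As a rough illustration of a taxonomy with multiple superclass links, and of linking words of different languages through a shared concept, here is a minimal sketch. It is not the DINO or SENSUS implementation: the class `Taxonomy`, its methods, and the concept names (`food`, `dairy-product`, `solid-food`) are hypothetical. Only the words cheese, Käse, and fromage come from the slides.

```python
from collections import defaultdict


class Taxonomy:
    """Toy taxonomy in which a concept may have several superclasses."""

    def __init__(self):
        self.parents = defaultdict(set)   # concept -> direct superclasses
        self.lexicon = defaultdict(dict)  # concept -> {language: word}

    def add_concept(self, concept, superclasses=()):
        self.parents[concept].update(superclasses)
        for parent in superclasses:
            self.parents.setdefault(parent, set())

    def add_word(self, concept, language, word):
        self.lexicon[concept][language] = word

    def ancestors(self, concept):
        """Collect every superclass reachable through any parent link."""
        seen = set()
        stack = list(self.parents[concept])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(self.parents[node])
        return seen

    def translate(self, word, source, target):
        """Translate by going word -> concept -> word in the target language."""
        for words in self.lexicon.values():
            if words.get(source) == word:
                return words.get(target)
        return None


# Hypothetical concept names, chosen only for illustration.
onto = Taxonomy()
onto.add_concept("food")
onto.add_concept("dairy-product", ["food"])
onto.add_concept("solid-food", ["food"])
onto.add_concept("cheese", ["dairy-product", "solid-food"])  # two superclasses
onto.add_word("cheese", "en", "cheese")
onto.add_word("cheese", "de", "Käse")
onto.add_word("cheese", "fr", "fromage")
```

Because the words hang off the concept rather than off each other, adding a language means adding one word per concept, not one mapping per language pair.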

7
Projects Described Today
  • Energy Data Collection (EDC)
  • Access to distributed statistical data

8
1. EDC Project
9
EDC: Access to gasoline data
  • Government partners
      • Energy Information Administration (EIA)
      • Bureau of Labor Statistics (BLS)
      • Census Bureau
      • (also data from California Energy Commission)
  • Central problems attacked
      • Proliferation of terminology
      • Difficulty requesting and interpreting data
      • Need to integrate data from autonomous sources
  • Current databases and models
      • SENSUS ontology: 90,000 nodes (from ISI's NLP
        technology)
      • Domain model: 500 nodes (manual, for database
        access planner)
      • LKB: 6,000 nodes (NL term/info extraction from
        glossaries)
      • Databases: 58,000 series (EIA OGIRS and others)
      • Webpages: 60 (BLS, CEC tables)

10
The idea behind SIMS
  • There are many types of data sources: databases,
    PDF files, text files, HTML files...
  • The user doesn't want to know this!
  • Solution:

1. Wrap each source in software that handles
   access to its data
2. Record the types of info in each source in a
   source model
3. Arrange all source models together in the same
   space: the Domain Model
4. Use a data access planner (SIMS) to transform a
   user's request for data into a set of individual
   access queries that extract the right data from
   the appropriate sources

(Figure labels: Sources, Models)
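
The four steps can be sketched in a few lines of code. This is a toy under stated assumptions, not SIMS itself: the names `Source`, `plan`, and `answer`, the two sources, their field names, and the values they return are all hypothetical placeholders, and the planner simply picks the first source whose source model covers a requested term.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Source:
    """One data source: a wrapper (step 1) plus its source model (step 2)."""
    name: str
    source_model: Dict[str, str]     # domain-model term -> field in this source
    wrapper: Callable[[], List[Row]]  # hides how the data is actually accessed


def plan(request: List[str], sources: List[Source]) -> Dict[str, List[str]]:
    """Step 4: split a request into per-source lists of domain terms."""
    assignment: Dict[str, List[str]] = {}
    for term in request:
        for source in sources:
            if term in source.source_model:
                assignment.setdefault(source.name, []).append(term)
                break
        else:
            raise LookupError(f"no source provides {term!r}")
    return assignment


def answer(request: List[str], sources: List[Source]) -> Dict[str, list]:
    """Run the plan and return values keyed by domain term, not source field."""
    by_name = {source.name: source for source in sources}
    result: Dict[str, list] = {}
    for name, terms in plan(request, sources).items():
        source = by_name[name]
        rows = source.wrapper()
        for term in terms:
            field = source.source_model[term]
            result[term] = [row[field] for row in rows]
    return result


# Step 3: the shared domain-model terms are the keys of the source models.
# Both sources and their values are made-up placeholders, not real data.
price_db = Source(
    name="price_db",
    source_model={"gasoline_price": "PRC"},
    wrapper=lambda: [{"PRC": 1.0}, {"PRC": 2.0}],
)
volume_page = Source(
    name="volume_page",
    source_model={"gasoline_volume": "vol"},
    wrapper=lambda: [{"vol": 10}, {"vol": 20}],
)
SOURCES = [price_db, volume_page]
```

A call such as `answer(["gasoline_price", "gasoline_volume"], SOURCES)` returns data keyed by domain terms, so the caller never sees source names or field names.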
11
A super domain model: the ontology
http://edc.isi.edu:8011/dino
12
Extracting metadata from text
  • Challenge: Extend the ontology to cover domain
    models. Try doing this automatically, by
    extracting useful terms from text associated with
    data
  • Problems
      • Proliferation of terms in domain
      • Agencies define terms differently
      • Many terms refer to the same or a related entity
      • Lengthy term definitions often bury important
        information
  • Example input
      • Motor Gasoline Blending Components: Naphthas
        (e.g., straight-run gasoline, alkylate,
        reformate, benzene, toluene, xylene) used for
        blending or compounding into finished motor
        gasoline. These components include reformulated
        gasoline blendstock for oxygenate blending (RBOB)
        but exclude oxygenates (alcohols, ethers),
        butane, and pentanes plus. Note: Oxygenates are
        reported as individual components and are
        included in the total for other hydrocarbons,
        hydrogens, and oxygenates.

Judith Klavans, Dir. of CRIA, Columbia
Deniz Saros, grad student, Columbia
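
To make the extraction task concrete, here is a small pattern-based sketch that pulls a few pieces of metadata out of the example glossary entry above. It is not the method used in the project. The function `parse_glossary_entry` is hypothetical, it assumes the term is separated from its definition by a colon, and it treats the first word of the definition as a candidate parent term, which is a crude heuristic.

```python
import re


def parse_glossary_entry(entry: str) -> dict:
    """Split a 'Term: definition' entry and pull out simple metadata."""
    term, _, definition = entry.partition(": ")
    # Heuristic: the first word of the definition names a broader category.
    genus = definition.split()[0].strip(",.;")
    # Items listed after "e.g.," inside parentheses are taken as examples.
    examples = []
    match = re.search(r"\(e\.g\.,\s*([^)]*)\)", definition)
    if match:
        examples = [item.strip() for item in match.group(1).split(",")]
    # Parenthesized all-caps tokens are taken as acronyms.
    acronyms = re.findall(r"\(([A-Z]{2,})\)", definition)
    return {
        "term": term,
        "genus": genus,
        "examples": examples,
        "acronyms": acronyms,
    }


ENTRY = (
    "Motor Gasoline Blending Components: Naphthas (e.g., straight-run "
    "gasoline, alkylate, reformate, benzene, toluene, xylene) used for "
    "blending or compounding into finished motor gasoline. These components "
    "include reformulated gasoline blendstock for oxygenate blending (RBOB) "
    "but exclude oxygenates (alcohols, ethers), butane, and pentanes plus."
)

parsed = parse_glossary_entry(ENTRY)
```

Even on this one entry the limits are visible: the exclusions ("but exclude oxygenates ...") are ignored, which is the kind of buried information the slide warns about.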
13
Lexical Knowledge Base (LKB) Tool
  • Combines statistical and linguistic methods
  • identifies topics with high accuracy
  • provides complete coverage
  • useful for any subject area
  • produced over 6,000 concepts in current domain

14
2. A Biology Polyclave
15
Polyclave
  • Challenge: at the NSF Biodiversity Infrastructure
    workshop, field biologists asked for a hand-held
    polyclave
      • need to identify plant species in the field,
        tramping through Colorado
      • workers sometimes not fully expert in tens of
        thousands of varieties
      • existing polyclave built by someone, but
        proprietary and not hand-held
  • Experiment: we used ISI knowledge rep (ontology)
    technology to build a polyclave and populated it
    with information from UC Davis; see
    http://vigor.isi.edu:8888/
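
A polyclave is a multiple-entry identification key: the user supplies whichever characters they can observe, in any order, and the candidate list shrinks accordingly. The sketch below shows that filtering idea only. It is not the ISI system, and the function `identify`, the taxa names, and the character values are all hypothetical placeholders rather than botanical data.

```python
def identify(observations, taxa):
    """Return the taxa consistent with every observed character.

    A taxon with no recorded value for a character is not eliminated,
    since missing data is not evidence against it.
    """
    matches = []
    for name, traits in taxa.items():
        if all(
            character not in traits or state in traits[character]
            for character, state in observations.items()
        ):
            matches.append(name)
    return sorted(matches)


# Placeholder taxa: each character maps to the set of states it may show.
TAXA = {
    "species_a": {"flower_color": {"yellow"}, "leaf_arrangement": {"alternate"}},
    "species_b": {"flower_color": {"yellow", "white"}, "leaf_arrangement": {"opposite"}},
    "species_c": {"flower_color": {"blue"}},
}

candidates = identify({"flower_color": "yellow"}, TAXA)
```

Unlike a dichotomous key, nothing forces a fixed question order, so a field worker can start from whatever feature is visible on the specimen at hand.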

18
Current ontology research directions
  • Applications
      • DGRC: modeling and linking health data from NCHS
        to SENSUS
      • Automated QA: using SENSUS information in
        Webclopedia, ISI's QA system (like AskJeeves, for
        real)
  • Ontology construction research
      • Investigating methods for automated ontology
        construction, using statistical clustering
        methods
      • Investigating methods for automated ontology
        content acquisition, by extracting information
        from online text

19
Thank you! Any questions?