Toward Semantic Web Information Extraction

Transcript and Presenter's Notes

1
Toward Semantic Web Information Extraction
  • B. Popov, A. Kiryakov, D. Manov, A. Kirilov, D.
    Ognyanoff, M. Goranov
  • Presenter: Yihong Ding

2
Toward a Semantic Web
  • Fully automatic methods for semantic annotation
    are needed
  • Related topics
  • Information retrieval (IR)
  • Information extraction (IE)
  • Named-entity recognition (NER)
  • Annotation processes

3
Semantic Annotation Diagram

4
Named Entities
  • Named Entities (NE)
  • people, organizations, locations, and others
    referred to by name.
  • May also include scalars and expressions
  • numbers, amounts of money, dates, etc. (NUMEX,
    TIMEX)
  • Hypothesis
  • Named entities (and the relations between them)
    mentioned in a resource constitute an important
    part of its semantics

5
Semantic Annotation of NEs
  • Semantic Annotation of the NEs in a text
    includes
  • Recognition of the type of the entities in the
    text
  • Identification of the entity individual
  • Comparison
  • the traditional NER approach results in
  • <Person>Yihong Ding</Person>
  • the Semantic Annotation of NEs should result in
    something like the following
  • <BYUPerson ID="http://..byu../YihongDing">Yihong
    Ding</BYUPerson>
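
The contrast on this slide can be sketched in a few lines of Python. This is only an illustration, not KIM's own output format: the function names and the example.org URI are assumptions introduced here. The first function emits type-only markup, as a traditional NER system would; the second also attaches the identifier of the specific individual.

```python
from xml.sax.saxutils import escape, quoteattr


def ner_markup(text: str, entity_type: str) -> str:
    """Traditional NER: wrap the mention in a tag naming its type only."""
    return f"<{entity_type}>{escape(text)}</{entity_type}>"


def semantic_markup(text: str, entity_class: str, entity_uri: str) -> str:
    """Semantic annotation: the tag names the class and carries the ID
    (a URI) of the individual the mention refers to."""
    return (
        f"<{entity_class} ID={quoteattr(entity_uri)}>"
        f"{escape(text)}</{entity_class}>"
    )


# Hypothetical URI, used only for illustration.
plain = ner_markup("Yihong Ding", "Person")
linked = semantic_markup(
    "Yihong Ding", "BYUPerson", "http://example.org/people/YihongDing"
)
```

The only difference between the two results is the ID attribute, which is what lets two mentions of the same person in different documents be recognized as the same individual.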

6
The KIM Platform
  • The Knowledge and Information Management Platform
    provides
  • Automatic Semantic Annotation of NEs (and
    relations between them)
  • Ontology Population with NE individuals and
    relations
  • Indexing and Retrieval w.r.t. NEs
  • Query and Navigation over the Formal Knowledge

7
KIM Constituents
  • KIM Ontology (KIMO)
  • KIM World KB
  • KIM Server with API for remote access and
    integration
  • Front-ends: KIM Web UI, Plug-in for Internet
    Explorer, and KB Explorer

8
KIM Bases
  • KIM is based on the following open-source
    platforms
  • GATE: NLP and IE platform from the University of
    Sheffield
  • Sesame: RDF(S) repository by Aidministrator b.v.
  • Ontology Middleware and Custom Inference by
    Ontotext, as extensions of Sesame
  • Lucene: open-source IR engine from Apache

9
KIM Architecture

10
KIM Ontology (KIMO)
  • Light-weight upper-level ontology
  • 250 NE classes
  • 100 relations and attributes
  • covers mostly NE classes, and ignores general
    concepts
  • includes classes representing lexical resources
  • www.ontotext.com/KIM/kimo.rdfs

11
KIM World KB
  • A projection of the world (domain ontology)
  • Quasi-exhaustive coverage of the most popular
    entities in the world
  • Entities of general importance like the ones
    that appear in the news
  • At present KIM KB consists of about 200,000
    entities
  • 50,000 locations, 130,000 organizations, 6,000
    people, etc.

12
Entity Description
  • NEs are represented in KIM World KB with their
    Semantic Descriptions consisting of
  • Aliases (Florida, FL)
  • Relations with other entities (Person hasPosition
    Position)
  • Attributes (latitude and longitude of geographic
    entities)
  • Proper class of the NE
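
As a rough sketch, the four parts of a semantic description map naturally onto a small record type. The field names, the class name "Province", the relation name "subRegionOf", the URIs, and the coordinate values below are all illustrative assumptions, not taken from the KIM World KB.

```python
from dataclasses import dataclass, field


@dataclass
class EntityDescription:
    """One entity in a knowledge base, following the four parts listed above."""
    uri: str                      # identifier of the individual
    entity_class: str             # proper (most specific) class of the NE
    aliases: set[str] = field(default_factory=set)
    # (relation name, URI of the related entity)
    relations: list[tuple[str, str]] = field(default_factory=list)
    attributes: dict[str, float] = field(default_factory=dict)


# Illustrative values only; the URIs and numbers are placeholders.
florida = EntityDescription(
    uri="http://example.org/kb/Florida",
    entity_class="Province",
    aliases={"Florida", "FL"},
    relations=[("subRegionOf", "http://example.org/kb/UnitedStates")],
    attributes={"latitude": 27.8, "longitude": -81.7},
)
```

Keeping the aliases on the entity itself is what allows both "Florida" and "FL" in a text to resolve to the same individual.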

13
KIM Server
  • APIs for
  • Semantic Annotation
  • Document Persistence
  • Indexing and Retrieval of documents w.r.t. NEs
  • Semantic Repository Access and Exploration
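
Indexing and retrieval with respect to NEs can be pictured as an inverted index keyed by entity identifier rather than by word. The sketch below is a minimal in-memory stand-in under that assumption; it is not the KIM Server API, and the class name, method names, and URIs are hypothetical.

```python
from collections import defaultdict


class EntityIndex:
    """Toy inverted index: entity URI -> set of document IDs mentioning it."""

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = defaultdict(set)

    def add(self, doc_id: str, entity_uris: list[str]) -> None:
        """Record that a document mentions each of the given entities."""
        for uri in entity_uris:
            self._postings[uri].add(doc_id)

    def search(self, *entity_uris: str) -> set[str]:
        """Return the documents that mention all of the given entities."""
        if not entity_uris:
            return set()
        posting_sets = [self._postings.get(uri, set()) for uri in entity_uris]
        return set.intersection(*posting_sets)


index = EntityIndex()
index.add("doc1", ["kb:Florida", "kb:NASA"])
index.add("doc2", ["kb:Florida"])
both = index.search("kb:Florida", "kb:NASA")
```

Because the keys are entity identifiers, a document that says "FL" and one that says "Florida" land in the same posting list once both mentions are resolved to the same individual.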

14
KIM Semantic Information Extraction
  • Based on GATE
  • NLP and IE platform
  • Rules now based on ontology classes instead of a
    flat set of NE types
  • Recognition and Identification of the NEs
  • IE supported by a Semantic Repository
  • Containing lexical and gazetteer resources
  • Annotations referring to Entity Descriptions
  • Ontology Population with the newly recognized
    entities and relations
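
The idea of gazetteer lookup backed by a repository, with annotations that point at entity descriptions, can be sketched as follows. This is a plain dictionary-and-regex stand-in, not GATE's API or KIM's pipeline; the gazetteer contents, class names, and URIs are assumptions made for the example.

```python
import re
from typing import NamedTuple


class Annotation(NamedTuple):
    start: int          # character offset where the mention begins
    end: int            # character offset just past the mention
    entity_class: str   # ontology class instead of a flat NE type
    entity_uri: str     # reference to the entity description


# Hypothetical gazetteer: alias -> (ontology class, entity URI).
GAZETTEER = {
    "Florida": ("Location", "http://example.org/kb/Florida"),
    "FL": ("Location", "http://example.org/kb/Florida"),
    "University of Sheffield": (
        "Organization",
        "http://example.org/kb/UniversityOfSheffield",
    ),
}


def annotate(text: str, gazetteer: dict = GAZETTEER) -> list[Annotation]:
    """Find known aliases in the text and link each to its entity."""
    # Try longer aliases first so a long name beats a shorter one inside it.
    aliases = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(a) for a in aliases) + r")\b"
    )
    return [
        Annotation(m.start(), m.end(), *gazetteer[m.group()])
        for m in pattern.finditer(text)
    ]


sentence = "GATE comes from the University of Sheffield, not from Florida."
annotations = annotate(sentence)
```

A real pipeline would add pattern-matching rules and disambiguation on top of the lookup; the sketch covers only the step in which a mention is tied to an entity identifier.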

15
KIM IE Pipeline

16
KIM Plug-in

17
KIM IE Performance
  • Evaluated over 3 human-annotated corpora of news
    articles
  • International Business News, International
    Political News, and UK Political News (~500
    articles)
  • Precision 86%, Recall 84% w.r.t. the standard NE
    types
  • But these metrics are not representative of
    semantic annotation quality
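
For reference, precision and recall over annotation sets are usually computed as below. This is the standard definition, not the evaluation code used for KIM; representing an annotation as a (start, end, type) tuple is an assumption for the example, and the sample annotations are made up.

```python
def precision_recall(predicted: set, gold: set) -> tuple[float, float]:
    """Precision = correct / predicted; recall = correct / gold.

    An annotation counts as correct only if it matches a gold
    annotation exactly (same span and same type).
    """
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Made-up annotations: (start offset, end offset, NE type).
gold = {(0, 11, "Person"), (20, 27, "Location"),
        (30, 34, "Organization"), (40, 45, "Person")}
predicted = {(0, 11, "Person"), (20, 27, "Location"), (30, 34, "Location")}

precision, recall = precision_recall(predicted, gold)
```

Here two of the three predictions are correct and two of the four gold annotations are found. The exact-match test treats the mistyped organization as a plain error, which is one reason such figures say little about semantic annotation.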

18
Semantic Annotation Metrics
  • There are no established metrics for semantic
    annotation
  • No human-annotated corpora with precise class and
    instance information
  • No metrics for various partial matches
  • When a more specific class is recognized
  • When a more general class is recognized
  • When the class is correctly recognized, but the
    individual entity is not correctly identified.
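
The three partial-match cases above can at least be told apart mechanically once a class hierarchy is available. The sketch below only labels the kind of match; it does not propose a score, since the slide's point is that no agreed metric exists. The hierarchy fragment, the label names, and the entity identifiers are assumptions made for the example.

```python
# Hypothetical fragment of a class hierarchy: child class -> parent class.
PARENT = {
    "City": "Location",
    "Country": "Location",
    "Company": "Organization",
    "Location": "Entity",
    "Organization": "Entity",
}


def ancestors(cls: str) -> list[str]:
    """Return all superclasses of cls, nearest first."""
    result = []
    while cls in PARENT:
        cls = PARENT[cls]
        result.append(cls)
    return result


def match_kind(pred_class, pred_id, gold_class, gold_id) -> str:
    """Label how a predicted annotation relates to the gold one."""
    if pred_class == gold_class:
        # Right class; check whether the individual is also right.
        return "exact" if pred_id == gold_id else "class_only"
    if gold_class in ancestors(pred_class):
        return "more_specific"   # predicted a subclass of the gold class
    if pred_class in ancestors(gold_class):
        return "more_general"    # predicted a superclass of the gold class
    return "mismatch"


kind = match_kind("Location", "kb:London", "City", "kb:London")
```

Any metric built on these labels would still have to decide how much credit each partial case deserves, which is the open question the slide raises.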

19
Conclusion
  • It is possible to adopt traditional IE techniques
    for semantic annotation
  • It is worth using almost-exhaustive entity
    knowledge for IE
  • KIM is still under development
  • Proper evaluation metrics
  • Precise disambiguation
  • More advanced IE techniques
  • KIM ontology and KB development