Named Entity Disambiguation on an Ontology Enriched by Wikipedia

Slides: 21

Provided by: HIEN3

1
Named Entity Disambiguation on an Ontology
Enriched by Wikipedia
International IEEE Conference - RIVF08

Hien Thanh Nguyen (1), Tru Hoang Cao (2)
(1) Ton Duc Thang University, Vietnam
(2) Ho Chi Minh City University of Technology, Vietnam

2
Outline

Introduction
Background
Approach
Evaluation
Conclusion

3
Introduction

Most Web pages present no explicit semantic
information about their data and objects.
The Semantic Web aims to solve this problem by
making semantic metadata available in web page
content
Example: the entity John McCarthy pointing to the
homepage of the inventor of the Lisp programming
language
Entity disambiguation

4
Introduction - Entity disambiguation

Entity disambiguation is the process of
identifying when different references correspond
to the same real world entity (Jorge Cardoso and
Amit Sheth)
Our work aims at detecting named entities in a
text and linking them to a given ontology

5
Introduction - What are Named Entities?

Named entities (NEs) include people,
organizations, locations, dates, times, monetary
amounts, measures, percentages, etc.
Example

Ms. Washington's candidacy is being championed
by several powerful lawmakers including her boss,
Chairman John Dingell (D., Mich.) of the House
Energy and Commerce Committee.

6
Introduction - Basic problem in NE

Many NEs share the same name
Ambiguity of NE types:
John Smith (company vs. person)
May (person vs. month)
Washington (person vs. location)
etc.
Ambiguity of referent (e.g. Paris may be the
capital of France, or a small town in Texas)

7
Introduction - Our contributions are two-fold

Utilizing ontological concepts, and properties of
instances in a specific KB, to automatically
generate a corpus of labeled training data
Exploiting Wikipedia to enrich the training data
with new and informative features.
Exploring a range of features extracted from
texts, a KB, and Wikipedia

8
Background - Ontology

An ontology schema defines a taxonomy of classes
and properties (relations and attributes)
A knowledge base contains semantic descriptions,
including attributes and relations, of named
entities in the real world

9
Background - Wikipedia

Each article defines an entity or a concept
Four sources of information
Title
Redirect titles
Categories
Hyperlinks
Outlinks vs. Inlinks

10
Background - Wikipedia
11
Approach

Exploiting terms (i.e. base noun phrases) and
named entities co-occurring with an ambiguous name
for disambiguation
Casting the problem as a ranking problem
Using TF-IDF to calculate similarity and choose
the candidate with the highest score
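
The ranking idea on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the slides do not say which TF-IDF weighting or similarity measure was used, so the smoothed IDF, the cosine similarity, and the function names (tfidf_vectors, cosine, rank_candidates) are all assumptions. Candidates and context are assumed to be already tokenized.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents each term occurs in.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Assumed weighting: raw term count times a smoothed IDF.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(context, candidates):
    """Rank candidate ids by similarity to the context, best first.

    context:    list of tokens found around the ambiguous name
    candidates: dict mapping a candidate id to its list of tokens
    """
    ids = list(candidates)
    # Assumption: IDF is computed over the context plus all candidates.
    vectors = tfidf_vectors([context] + [candidates[i] for i in ids])
    query, rest = vectors[0], vectors[1:]
    scored = [(cosine(query, vec), cid) for cid, vec in zip(ids, rest)]
    return sorted(scored, reverse=True)
```

The first element of the returned list is the chosen candidate; a candidate sharing no term with the context scores 0.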

12
Approach

Constructing corpus
Utilizing classes and properties to generate a
snippet for each instance in an ontology
Feature generation for enriching representation
of those instances
Analyzing a text for disambiguation and
identification of NEs occurring therein
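
A rough sketch of snippet generation and feature enrichment, under stated assumptions: the record layout (keys "classes" and "properties"), the Wikipedia feature keys, and the function name make_snippet are hypothetical and chosen only for illustration. The slides do not specify how classes, properties, and Wikipedia features are combined, so here they are simply concatenated into a bag of lowercase words.

```python
def make_snippet(instance, wiki_features=None):
    """Flatten an ontology instance into a bag-of-words snippet.

    instance:      {"classes": [...], "properties": {name: [values]}}
    wiki_features: optional dict with any of the keys
                   "title", "redirects", "categories", "links"
    Both layouts are assumptions made for this sketch.
    """
    items = list(instance["classes"])
    # Use the property values (attributes and related entity names).
    for values in instance["properties"].values():
        items.extend(values)
    # Optionally enrich the snippet with Wikipedia-derived features.
    if wiki_features:
        for source in ("title", "redirects", "categories", "links"):
            items.extend(wiki_features.get(source, []))
    # Split multi-word items and normalize case.
    return [word.lower() for item in items for word in item.split()]
```

Each snippet then plays the role of one labeled training document for its instance.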

13
Approach - Construct corpus
14
Approach - Construct corpus
15
Approach - Disambiguation process

For each ambiguous name
Looking up candidates
Extracting base noun phrases in the same sentence
and in the headline
Extracting named entities in the whole text
Using TF-IDF to rank and choose the candidate with
the highest score
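
The lookup and context-extraction steps might look like the sketch below. It assumes the text has already been chunked and NE-tagged by some upstream tool; the document layout ("headline_phrases", "sentences", "phrases", "entities"), the name index, and both function names are hypothetical. The ranking step itself is left out here.

```python
def lookup_candidates(name, name_index):
    """Return the candidate instance ids registered for a surface name.

    name_index: dict mapping a lowercase name to a list of instance ids
                (an assumed structure built from the ontology).
    """
    return name_index.get(name.lower(), [])


def build_context(doc, sentence_idx, mention):
    """Collect the context tokens for one ambiguous mention.

    Uses base noun phrases from the headline and from the mention's own
    sentence, plus named entities from the whole text (the mention itself
    is skipped, since it is the name being resolved).
    """
    phrases = list(doc["headline_phrases"])
    phrases += doc["sentences"][sentence_idx]["phrases"]
    entities = [
        e for s in doc["sentences"] for e in s["entities"] if e != mention
    ]
    return [w.lower() for item in phrases + entities for w in item.split()]
```

The resulting token list is what would be compared against each candidate's snippet.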

16
Approach - An example
17
Evaluation

Using KIM Ontology
140 news articles from several news agencies
Focusing on four names: John McCarthy, John
Williams, Georgia, and Columbia
Accuracy is measured as the number of correct
assignments of NEs (in text) to ontology instances
divided by the total number of assignments
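
The accuracy measure can be written out as a small function. This is a sketch under assumptions: the dict-based inputs and the name accuracy are illustrative, and mentions left unassigned (None) are assumed not to count toward the total, which the slide does not state explicitly.

```python
def accuracy(predicted, gold):
    """Correct assignments divided by the total number of assignments.

    predicted: dict mapping a mention id to the assigned instance id,
               or None when no assignment was made
    gold:      dict mapping a mention id to the correct instance id
    """
    assigned = [m for m, inst in predicted.items() if inst is not None]
    if not assigned:
        return 0.0
    correct = sum(1 for m in assigned if predicted[m] == gold.get(m))
    return correct / len(assigned)
```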

18
Evaluation
19
Conclusion

Our approach is quite natural and similar to the
way humans resolve ambiguity, relying on
co-occurring NEs and terms to resolve other
ambiguous entities in a given context.
Currently Wikipedia editions are available for
approximately 200 languages, so our method can be
used to build NE disambiguation systems for a
large number of languages
The features from Wikipedia and the NEs in the
whole text are meaningful evidence for
disambiguation
Future work: detecting NEs that are not in the
ontology, and investigating other similarity
metrics

20
Thanks for your attention!
