Named Entity Disambiguation on an Ontology Enriched by Wikipedia

1
Named Entity Disambiguation on an Ontology
Enriched by Wikipedia
International IEEE Conference - RIVF08
  • Hien Thanh Nguyen1, Tru Hoang Cao2
  • 1Ton Duc Thang University, Vietnam
  • 2Ho Chi Minh City University of Technology,
    Vietnam

2
Outline
  • Introduction
  • Background
  • Approach
  • Evaluation
  • Conclusion

3
Introduction
  • Most Web pages present no explicit semantic
    information about the data and objects they
    describe
  • The Semantic Web aims at solving this problem by
    making semantic metadata available in web page
    content
  • Example: the entity John McCarthy linked to the
    homepage of the inventor of the Lisp programming
    language
  • Entity disambiguation

4
Introduction - Entity disambiguation
  • Entity disambiguation is the process of
    identifying when different references correspond
    to the same real-world entity (Jorge Cardoso and
    Amit Sheth)
  • Our work aims at detecting named entities in a
    text and linking them to a given ontology

5
Introduction - What are Named Entities?
  • Named entities (NEs) include people,
    organizations, locations, dates, times, money,
    measures, percentages, etc.
  • Example

Ms. Washington's candidacy is being championed
by several powerful lawmakers including her boss,
Chairman John Dingell (D., Mich.) of the House
Energy and Commerce Committee.
6
Introduction - Basic problems with NEs
  • Many NEs share the same name
  • Ambiguity of NE type, e.g. John Smith (company vs.
    person), May (person vs. month), Washington
    (person vs. location), etc.
  • Ambiguity of referent, e.g. Paris may be the
    capital of France or a small town in Texas

7
Introduction - Our contributions
  • Utilizing ontological concepts and properties of
    instances in a specific KB to automatically
    generate a corpus of labeled training data
  • Exploiting Wikipedia to enrich the training data
    with new and informative features
  • Exploring a range of features extracted from
    texts, a KB, and Wikipedia

8
Background - Ontology
  • An ontology schema defines a taxonomy of classes
    and properties (relations and attributes)
  • A knowledge base (KB) contains semantic
    descriptions, including attributes and relations,
    of named entities in the real world (a minimal
    instance sketch follows below)

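A minimal sketch of this distinction in Python, using an illustrative record type for one KB instance; the field names and example values are assumptions for illustration, not the actual KIM schema.

from dataclasses import dataclass, field

@dataclass
class Instance:
    # One named entity described in the knowledge base (illustrative only).
    name: str
    ontology_class: str                             # class from the schema, e.g. "City"
    attributes: dict = field(default_factory=dict)  # e.g. {"population": 2100000}
    relations: dict = field(default_factory=dict)   # e.g. {"capitalOf": "France"}

# Two KB instances sharing the surface name "Paris".
kb = [
    Instance("Paris", "City", {"population": 2100000}, {"capitalOf": "France"}),
    Instance("Paris", "City", {}, {"locatedIn": "Texas"}),
]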
9
Background - Wikipedia
  • Each article defines an entity or a concept
  • Four sources of information
  • Title
  • Redirect titles
  • Categories
  • Hyperlinks
  • Outlinks vs. Inlinks (see the sketch below)

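A sketch of those four sources as a plain record; the fields and example values are illustrative and not tied to any particular Wikipedia API.

from dataclasses import dataclass, field

@dataclass
class WikiArticle:
    # The four sources of information taken from one Wikipedia article.
    title: str
    redirect_titles: list = field(default_factory=list)  # alternative names for the entity
    categories: list = field(default_factory=list)
    outlinks: list = field(default_factory=list)          # articles this article links to
    inlinks: list = field(default_factory=list)           # articles that link to this one

john = WikiArticle(
    title="John McCarthy (computer scientist)",
    redirect_titles=["John McCarthy (programmer)"],
    categories=["Artificial intelligence researchers"],
    outlinks=["Lisp (programming language)", "Stanford University"],
)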
10
Background - Wikipedia
11
Approach
  • Exploiting terms (i.e. base noun phrases) and
    named entities co-occurring with an ambiguous
    name for disambiguation
  • Casting the problem as a ranking problem
  • Using TF-IDF to calculate similarity and choosing
    the candidate with the highest score (a minimal
    ranking sketch follows below)

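A minimal sketch of this ranking step, assuming scikit-learn is available; the context string and candidate snippets below are hypothetical stand-ins, not the authors' actual data.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_candidates(context, candidate_snippets):
    # Build one TF-IDF space over the context and all candidate snippets.
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform([context] + candidate_snippets)
    # Cosine similarity of each candidate snippet (rows 1..n) to the context (row 0).
    scores = cosine_similarity(matrix[0], matrix[1:])[0]
    # Candidates ordered by descending similarity; the first one wins.
    return sorted(zip(candidate_snippets, scores), key=lambda pair: -pair[1])

# Hypothetical usage: disambiguating "Georgia" between two candidate instances.
context = "Tbilisi is the capital of Georgia, which borders Russia."
candidates = [
    "Georgia country Caucasus capital Tbilisi",     # ontology instance 1
    "Georgia state United States capital Atlanta",  # ontology instance 2
]
print(rank_candidates(context, candidates)[0])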
12
Approach
  • Constructing a corpus
  • Utilizing classes and properties to generate a
    snippet for each instance in the ontology
    (sketched below)
  • Feature generation for enriching the
    representation of those instances
  • Analyzing a text to disambiguate and identify the
    NEs occurring therein

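A rough sketch of the snippet-generation step, assuming a KB instance is available as a plain dictionary; the field names and example values are illustrative, not the actual KIM data.

def make_snippet(instance):
    # Turn one KB instance into a bag-of-words snippet for the training corpus.
    parts = [instance["name"], instance["class"]]
    # Attribute values (e.g. aliases) become plain tokens in the snippet.
    parts.extend(str(value) for value in instance.get("attributes", {}).values())
    # Names of related entities add further disambiguating context.
    parts.extend(instance.get("related_entities", []))
    return " ".join(parts)

# Hypothetical instance of the person John McCarthy.
instance = {
    "name": "John McCarthy",
    "class": "Person",
    "attributes": {"alias": "J. McCarthy", "field": "computer science"},
    "related_entities": ["Stanford University", "Lisp"],
}
print(make_snippet(instance))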
13
Approach - Construct corpus
14
Approach - Construct corpus
15
Approach - Disambiguation process
  • For each ambiguous name
  • Looking up candidates in the KB
  • Extracting base noun phrases in the same sentence
    and in the headline
  • Extracting named entities in the whole text
  • Using TF-IDF to rank and choose the candidate with
    the highest score (see the sketch below)

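The steps above, sketched as one function; rank_candidates is the TF-IDF ranking sketched earlier, and the regular expressions are crude stand-ins for the base noun phrase and NE extraction that the approach obtains from NLP preprocessing.

import re

def disambiguate_name(text, name, candidates):
    # candidates: list of (instance_id, snippet) pairs looked up for `name` in the KB.
    # Sentences mentioning the name stand in for base noun phrases from the
    # same sentence and from the headline.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    context = " ".join(s for s in sentences if name in s)
    # Capitalized word sequences stand in for NEs extracted from the whole text.
    context += " " + " ".join(re.findall(r"(?:[A-Z][a-z]+ ?)+", text))
    # Rank candidate snippets against the context and keep the best-scoring one.
    snippets = [snippet for _, snippet in candidates]
    best_snippet, _score = rank_candidates(context, snippets)[0]
    return next(iid for iid, snippet in candidates if snippet == best_snippet)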
16
Approach - An example
17
Evaluation
  • Using the KIM Ontology
  • 140 news articles from several news agencies
  • Focusing on four names: John McCarthy, John
    Williams, Georgia, and Columbia
  • Measuring accuracy as the number of NEs (in text)
    correctly assigned to ontology instances divided
    by the total number of assignments (see the
    sketch below)

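Expressed as a small helper over (predicted, gold) pairs, which is an assumed representation of the evaluation data:

def accuracy(assignments):
    # assignments: one (predicted_instance, gold_instance) pair per NE occurrence
    # that the system assigned to an ontology instance.
    correct = sum(1 for predicted, gold in assignments if predicted == gold)
    return correct / len(assignments)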
18
Evaluation
19
Conclusion
  • Our approach is quite natural and similar to the
    way humans resolve ambiguity, relying on
    co-occurring NEs and terms to disambiguate
    entities in a given context
  • Wikipedia editions are currently available for
    approximately 200 languages, so our method can be
    used to build NE disambiguation systems for a
    large number of languages
  • The features from Wikipedia and the NEs in the
    whole text are meaningful evidence for
    disambiguation
  • Future work includes detecting NEs that are not
    in the ontology and investigating other
    similarity metrics

20
Thanks for your attention!