Learning the Semantic Meaning of a Concept from the Web presentation

1
Learning the Semantic Meaning of a Concept from
the Web

Yang Yu
Master's Thesis Defense
August 03, 2006

2
The Problem

Manually preparing training data for
text-classification-based ontology mapping is
expensive.

3
The Thesis

Automatically collecting training data for the
concept defined in an ontology.
Benefits
Reduce the amount of human work
Fully automated ontology mapping

4
Overview

Background
The Semantic Web and ontology
Ontology Mapping
Proposal
System
Experimental Results
WEAPONS ontology
LIVING_THINGS ontology
Discussions and Conclusion

5
Semantic Web and Ontology

What is it?
an extension of the current web
An Example

6
Ontology Mapping

Interoperability problem
Independently developed ontologies for the same
or overlapped domain
Mapping
r = f(C_i, C_j), where i = 1, ..., n and j = 1, ..., m
r ∈ {equivalent, subClassOf, superClassOf,
complement, overlapped, other}

7
Approaches to Ontology Mapping

Manual mapping
String Matching
Text classification
the semantic meaning of a concept is reflected in
the training data that use the concept
Probabilistic feature model
Classification
Results highly depend on training data

8
Motivation

Preparing exemplars manually is costly
Billions of documents available on the web
Search engines

9
The Proposal

Using the concept defined in an ontology as a
query and processing the search results to obtain
exemplars
Verification
Build a prototype system
Check ontology mapping results

10
System overview Part I
Search Engine
11
The parser (Query expansion)
FOOD FRUIT APPLE
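One plausible reading of the FOOD, FRUIT, APPLE example is that the parser expands a class name with the names of its ancestors before sending it to the search engine. The sketch below illustrates that reading only; the `PARENT` table and the `expand_query` function are hypothetical names, not taken from the thesis system.

```python
# Hypothetical toy ontology: maps each class to its parent class.
PARENT = {"APPLE": "FRUIT", "FRUIT": "FOOD"}


def expand_query(concept, parent=PARENT):
    """Build a search query from the root-to-concept path (assumed behavior)."""
    path = [concept]
    # Walk up the hierarchy until a class with no parent (the root) is reached.
    while path[-1] in parent:
        path.append(parent[path[-1]])
    # Put the root first and the concept last.
    return " ".join(reversed(path))


print(expand_query("APPLE"))
```

Adding the ancestors gives the search engine context, so that a query for APPLE is more likely to return text about the fruit than about unrelated senses of the word.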
12
The retriever
13
The processor
14
Naïve Bayes text classifier

Bow toolkit
McCallum, Andrew Kachites. Bow: A toolkit for
statistical language modeling, text retrieval,
classification and clustering.
http://www.cs.cmu.edu/~mccallum/bow, 1996.
rainbow -d model --index dir/
rainbow -d model --query
Bayes Rule
Naïve Bayes text classifier

15
Bayes Rule

P(A | B) = P(B | A) P(A) / P(B)

16
Naïve Bayes classifier

A text classification problem
What's the most probable classification of the
new instance given the training data?
v_j: category j.
(a_1, a_2, ..., a_n): attributes of a new document
So Naïve
(Mitchell, Tom. Machine Learning. McGraw Hill,
1997.)
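For reference, the standard textbook form of the naïve Bayes decision rule can be written in the slide's notation as below. The symbol V (the set of all categories) is introduced here for illustration. The last line is where the "naïve" assumption enters: the attributes are treated as conditionally independent given the category.

```latex
\begin{aligned}
v_{MAP} &= \arg\max_{v_j \in V} P(v_j \mid a_1, a_2, \ldots, a_n) \\
        &= \arg\max_{v_j \in V} \frac{P(a_1, a_2, \ldots, a_n \mid v_j)\, P(v_j)}{P(a_1, a_2, \ldots, a_n)} \\
        &= \arg\max_{v_j \in V} P(a_1, a_2, \ldots, a_n \mid v_j)\, P(v_j) \\
v_{NB}  &= \arg\max_{v_j \in V} P(v_j) \prod_{i=1}^{n} P(a_i \mid v_j)
\end{aligned}
```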

17
System overview Part II
18
The model builder

Mutually exclusive and exhaustive
Leaf classes
C and C-

19
The calculator

Naïve Bayes text classifier tends to give extreme
values (1/0)
Tasks
Feed exemplars to the classifier one by one
Keep records of classification results
Take averages and generate report

20
An Example of the Calculator
TANK-VEHICLE
APC
AIR-DEFENSE-GUN
Classifier
200
SAUDI-NAVAL-MISSILE-CRAFT
P(TANK-VEHICLE | APC) = 170 / 200 = 0.85
P(AIR-DEFENSE-GUN | APC) = 0.10
P(SAUDI-NAVAL-MISSILE-CRAFT | APC) = 0.05
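The arithmetic in this example can be sketched as follows, assuming the calculator simply counts how many exemplars of the source class the classifier assigns to each target class. The function name is hypothetical. Only the count of 170 is given on the slide; the counts 20 and 10 are inferred from the stated probabilities and the total of 200.

```python
from collections import Counter


def conditional_probabilities(predicted_labels):
    """Estimate P(target class | source class) as the share of the source
    class's exemplars that were classified into each target class."""
    counts = Counter(predicted_labels)
    total = len(predicted_labels)
    return {label: n / total for label, n in counts.items()}


# Assumed classifier output for 200 APC exemplars (20 and 10 are inferred).
predictions = (
    ["TANK-VEHICLE"] * 170
    + ["AIR-DEFENSE-GUN"] * 20
    + ["SAUDI-NAVAL-MISSILE-CRAFT"] * 10
)

report = conditional_probabilities(predictions)
for label, p in sorted(report.items(), key=lambda kv: kv[1], reverse=True):
    print(f"P({label} | APC) = {p:.2f}")
```

Counting hard decisions in this way sidesteps the classifier's tendency to report scores near 1 or 0 for individual documents.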
21
Experiments with WEAPONS ontology

Information Interpretation and Integration
Conference
(http://www.atl.lmco.com/projects/ontology/i3con.html)
WeaponsA.n3 and WeaponsB.n3
Both define over 80 classes
More than 60 classes are leaf classes
Similar structure

22
WeaponsA.n3
Part of WeaponsA.n3 (class hierarchy figure; classes shown)
WEAPON
CONVENTIONAL-WEAPON
ARMORED-COMBAT-VEHICLE
MODERN-NAVAL-SHIP
WARPLANE
SUPER-ETENDARD
PATROL-CRAFT
AIRCRAFT-CARRIER
TANK-VEHICLE
23
WeaponsB.n3
Part of WeaponsB.n3
24
Expected Results
Part of WeaponsB.n3
SUPER-ETENDARD
AIRCRAFT-CARRIER
PATROL-CRAFT
TANK-VEHICLE
FIGHTER-PLANE
LIGHT-AIRCRAFT-CARRIER
PATROL-WARTER-CRAFT
APC
FIGHTER-ATTACK-PLANE
LIGHT-TANK
SUPER-ETENDARD-FIGHTER
PATROL-BOAT-RIVER
PATROL-BOAT
25
A Typical Report
P(APC | C_i), where i = 1, ..., 63
APC
SELF-PROPELLED-ARTILLERY 0.357180681
TANK-VEHICLE 0.277139274
ICBM 0.10423636
MRBM 0.080615147
TOWED-ARTILLERY 0.054724102
SUPPORT-VESSEL 0.023265054
PATROL-CRAFT 0.019570325
MOLOTOV-COCKTAIL 0.015032411
TORPEDO-CRAFT 0.013677696
SUPER-ETENDARD 0.009856519
MORTAR 0.00772997
AIR-DEFENSE-GUN 0.002997109
......
MACHINE-GUN 0.000211772
MOLOTOV-COCKTAIL 0.000187578
TRUCK-BOMB 0.000171675
AS-9-KYLE-ALCM 0.000156403
ARABIL-100-MISSILE 0.000111953
AL-HIJARAH-MISSILE 7.65E-05
OGHAB-MISSILE 7.12E-05
BADAR-2000 4.28E-05
26
classes with highest conditional probability
27
different numbers of exemplars (whole)
28
different numbers of exemplars (sentence)
29
Comparison of mapping accuracy of different
groups of experiments
Higher Conditional Probability
30
Experiment with LIVING_THINGS ontology

P(MAN | HUMAN)
P(WOMAN | HUMAN)
Find a mapping for GIRL

31
Actual Experiment Results L-1
Results of experiment (1)
32
Actual Experiment Results L-2
With clustering on exemplars
Without clustering on exemplars
with additional classes
33
Actual Experiment Results L-3
Comparison between different numbers of exemplars
(sentence)
34
Actual Experiment Results Different Queries
Queries augmented with class properties
35
Actual Experiment Results L-4
Results of experiment (1) with new queries
Results of experiment (2) with new queries
36
Limitation 1: An exemplar is not a sample of a
concept

An exemplar is a combination of strings that
represent some usage of a concept.
An exemplar is not an instance of a concept.
The conditional probability we calculate is only
an estimate.

37
Limitation 2: Popularity does not equal relevancy

Limited by a search engine's algorithm
PageRank
Popularity does not equal relevancy
Weight cannot be specified for words in a search
query

38
Limitation 3: Relevancy does not equal
similarity
Search Results for concept A
Text related to concept A
Text against concept A
Text for concept A i.e. desired exemplars
Text for related concept B
39
Related Research

UMBC OntoMapper
Sushama Prasad, Peng Yun and Finin Tim, A Tool
for Mapping between Two Ontologies Using Explicit
Information, AAMAS 2002 Workshop on Ontologies
and Agent Systems, 2002.
CAIMEN
Lacher S. Martin and Groh Georg, Facilitating the
Exchange of Explicit Knowledge through Ontology
Mappings, Proc of the Fourteenth International
FLAIRS conference, 2001.
GLUE
Doan Anhai, Madhavan Jayant, Dhamankar Robin,
Domingos Pedro, and Halevy Alon, Learning to
Match Ontologies on the Semantic Web, WWW2002,
May, 2002.
Google Conditional Probability
P(HUMAN | MAN) = 1.77 billion / 2.29 billion = 0.77
P(HUMAN | WOMAN) = 0.6 billion / 2.29 billion = 0.26
Wyatt D., Philipose M., and Choudhury T.,
Unsupervised Activity Recognition Using
Automatically Mined Common Sense. Proceedings of
AAAI-05. pp. 21-27.
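The two ratios quoted above can be checked with a few lines of arithmetic. This sketch only reproduces the division of the reported hit counts; the slide does not say which queries produced each count, so the function and variable names are hypothetical and make no claim about that.

```python
def hit_ratio(numerator_hits, denominator_hits):
    """Ratio of two search-engine hit counts, used as a probability estimate."""
    return numerator_hits / denominator_hits


# Hit counts as reported on the slide, in absolute numbers.
p_human_man = hit_ratio(1.77e9, 2.29e9)
p_human_woman = hit_ratio(0.6e9, 2.29e9)

print(f"{p_human_man:.2f} {p_human_woman:.2f}")
```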

40
Conclusion and Future Work

Text retrieved from the web can be used as
exemplars for text-classification-based ontology
mapping
Many parameters affect the quality of the
exemplars
The processed documents contain noise
Future work
Clustering

41
Questions
