Title: Learning the Semantic Meaning of a Concept from the Web
1. Learning the Semantic Meaning of a Concept from the Web
- Yang Yu
- Masters Thesis Defense
- August 03, 2006
2. The Problem
- Manually preparing training data for text-classification-based ontology mapping is expensive.
3. The Thesis
- Automatically collecting training data for the concept defined in an ontology.
- Benefits
- Reduce the amount of human work
- Fully automated ontology mapping
4. Overview
- Background
- The semantic Web and ontology
- Ontology Mapping
- Proposal
- System
- Experimental Results
- WEAPONS ontology
- LIVING_THINGS ontology
- Discussions and Conclusion
5. Semantic Web and Ontology
- What is it?
- An extension of the current web
- An Example
6. Ontology Mapping
- Interoperability problem
- Independently developed ontologies for the same or overlapping domains
- Mapping
- r = f(Ci, Cj), where i = 1, ..., n and j = 1, ..., m
- r ∈ {equivalent, subClassOf, superClassOf, complement, overlapped, other}
7. Approaches to Ontology Mapping
- Manual mapping
- String matching
- Text classification
- The semantic meaning of a concept is reflected in the training data that use the concept
- Probabilistic feature model
- Classification
- Results highly depend on training data
8. Motivation
- Preparing exemplars manually is costly
- Billions of documents available on the web
- Search engines
9. The Proposal
- Using the concept defined in an ontology as a query and processing the search results to obtain exemplars
- Verification
- Build a prototype system
- Check ontology mapping results
10. System overview Part I
Search Engine
11. The parser (Query expansion)
FOODFRUITAPPLE
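One way to read the example above is that the parser builds a search query from a class name together with its ancestors in the ontology. The following is a minimal sketch under that assumption; `expand_query`, the `parents` mapping, and the space-separated query format are illustrative and not taken from the system itself.

```python
def expand_query(cls, parents):
    """Return a query string made of the class and its ancestors, root first."""
    path = [cls]
    # Walk up the (hypothetical) child -> parent mapping until the root is reached.
    while path[-1] in parents:
        path.append(parents[path[-1]])
    return " ".join(reversed(path))


# Toy hierarchy mirroring the FOOD / FRUIT / APPLE example; an assumption for illustration.
parents = {"APPLE": "FRUIT", "FRUIT": "FOOD"}
query = expand_query("APPLE", parents)
```

Including the ancestors is one plausible way to disambiguate a class name (APPLE the fruit rather than the company) before sending it to a search engine.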
12. The retriever
13. The processor
14. Naïve Bayes text classifier
- Bow toolkit
- McCallum, Andrew Kachites, Bow: A toolkit for statistical language modeling, text retrieval, classification and clustering, http://www.cs.cmu.edu/~mccallum/bow, 1996.
- rainbow -d model --index dir/
- rainbow -d model --query
- Bayes Rule
- Naïve Bayes text classifier
15. Bayes Rule
16. Naïve Bayes classifier
- A text classification problem
- What's the most probable classification of the new instance given the training data?
- vj: category j
- (a1, a2, ..., an): attributes of a new document
- So Naïve
- (Mitchell, Tom, Machine Learning, McGraw Hill, 1997)
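For reference, the standard formulation in Mitchell (1997) can be written with the symbols above as follows, where V denotes the set of categories (a notational assumption). The classifier is called naïve because it assumes the attributes are conditionally independent given the category, which turns the joint likelihood into a product.

```latex
\begin{align*}
P(v_j \mid a_1, \dots, a_n) &= \frac{P(a_1, \dots, a_n \mid v_j)\, P(v_j)}{P(a_1, \dots, a_n)} \\
v_{MAP} &= \arg\max_{v_j \in V} P(a_1, \dots, a_n \mid v_j)\, P(v_j) \\
v_{NB} &= \arg\max_{v_j \in V} P(v_j) \prod_{i=1}^{n} P(a_i \mid v_j)
\end{align*}
```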
17. System overview Part II
18. The model builder
- Mutually exclusive and exhaustive
- Leaf classes
- C and C-
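The leaf classes can be found by walking the class hierarchy. The following is a minimal sketch, assuming the ontology has already been parsed into a hypothetical `children` mapping from each class to its direct subclasses; the toy class names are illustrative only.

```python
def leaf_classes(children, root):
    """Return the classes under `root` that have no subclasses, depth first."""
    leaves = []
    stack = [root]
    while stack:
        cls = stack.pop()
        subs = children.get(cls, [])
        if subs:
            # Push in reverse so subclasses are visited in their listed order.
            stack.extend(reversed(subs))
        else:
            leaves.append(cls)
    return leaves


# Hypothetical toy hierarchy, not taken from the ontologies used in the experiments.
children = {"FOOD": ["FRUIT", "MEAT"], "FRUIT": ["APPLE", "PEAR"]}
leaves = leaf_classes(children, "FOOD")
```

In a tree-shaped hierarchy every instance falls under exactly one leaf, which is one reason leaf classes are a natural choice for mutually exclusive categories.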
19. The calculator
- Naïve Bayes text classifier tends to give extreme values (1/0)
- Tasks
- Feed exemplars to the classifier one by one
- Keep records of classification results
- Take averages and generate report
20. An Example of the Calculator
[Diagram: 200 APC exemplars are fed to the classifier, which labels each as TANK-VEHICLE, AIR-DEFENSE-GUN, or SAUDI-NAVAL-MISSILE-CRAFT]
P(TANK-VEHICLE | APC) = 170/200 = 0.85
P(AIR-DEFENSE-GUN | APC) = 0.10
P(SAUDI-NAVAL-MISSILE-CRAFT | APC) = 0.05
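The calculator's bookkeeping can be sketched as follows. This is a simplified illustration that counts the label assigned to each exemplar, matching the 170/200 example above; `classify` is a hypothetical stand-in for the call to the classifier, and the actual system may average the classifier's scores rather than count labels.

```python
from collections import Counter


def conditional_probabilities(exemplars, classify):
    """Estimate P(label | source class) as the fraction of exemplars given each label."""
    counts = Counter(classify(text) for text in exemplars)
    total = len(exemplars)
    return {label: n / total for label, n in counts.items()}


# Stand-in classifier that replays fixed labels; a real run would query the model.
# The counts 20 and 10 are inferred from the 0.10 and 0.05 figures over 200 exemplars.
labels = (
    ["TANK-VEHICLE"] * 170
    + ["AIR-DEFENSE-GUN"] * 20
    + ["SAUDI-NAVAL-MISSILE-CRAFT"] * 10
)
exemplars = [f"exemplar {i}" for i in range(len(labels))]
lookup = dict(zip(exemplars, labels))
report = conditional_probabilities(exemplars, lookup.get)
```

Averaging over many exemplars smooths out the extreme per-document values that the classifier tends to produce.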
21. Experiments with WEAPONS ontology
- Information Interpretation and Integration Conference (http://www.atl.lmco.com/projects/ontology/i3con.html)
- WeaponsA.n3 and WeaponsB.n3
- Both define over 80 classes
- More than 60 classes are leaf classes
- Similar structure
22. WeaponsA.n3
Part of WeaponsA.n3
WEAPON
CONVENTIONAL-WEAPON
ARMORED-COMBAT-VEHICLE
MODERN-NAVAL-SHIP
WARPLANE
SUPER-ETENDARD
PATROL-CRAFT
AIRCRAFT-CARRIER
TANK-VEHICLE
23. WeaponsB.n3
Part of WeaponsB.n3
24. Expected Results
Part of WeaponsB.n3
SUPER-ETENDARD
AIRCRAFT-CARRIER
PATROL-CRAFT
TANK-VEHICLE
FIGHTER-PLANE
LIGHT-AIRCRAFT-CARRIER
PATROL-WARTER-CRAFT
APC
FIGHTER-ATTACK-PLANE
LIGHT-TANK
SUPER-ETENDARD-FIGHTER
PATROL-BOAT-RIVER
PATROL-BOAT
25. A Typical Report
P(APC | Ci), where i = 1, ..., 63
APC
SELF-PROPELLED-ARTILLERY 0.357180681
TANK-VEHICLE 0.277139274
ICBM 0.10423636
MRBM 0.080615147
TOWED-ARTILLERY 0.054724102
SUPPORT-VESSEL 0.023265054
PATROL-CRAFT 0.019570325
MOLOTOV-COCKTAIL 0.015032411
TORPEDO-CRAFT 0.013677696
SUPER-ETENDARD 0.009856519
MORTAR 0.00772997
AIR-DEFENSE-GUN 0.002997109
......
MACHINE-GUN 0.000211772
MOLOTOV-COCKTAIL 0.000187578
TRUCK-BOMB 0.000171675
AS-9-KYLE-ALCM 0.000156403
ARABIL-100-MISSILE 0.000111953
AL-HIJARAH-MISSILE 7.65E-05
OGHAB-MISSILE 7.12E-05
BADAR-2000 4.28E-05
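A report like this can be ranked to surface the candidate mappings. The following is a minimal sketch using the first few rows above, rounded to three decimals; `top_candidates` is a hypothetical helper, not part of the system.

```python
def top_candidates(report, k=3):
    """Return the k classes with the highest probability, best first."""
    ranked = sorted(report.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# First few rows of the APC report (values rounded to three decimals).
report = {
    "TANK-VEHICLE": 0.277,
    "SELF-PROPELLED-ARTILLERY": 0.357,
    "ICBM": 0.104,
    "MRBM": 0.081,
}
best = top_candidates(report, k=2)
```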
26. Classes with highest conditional probability
27. Different numbers of exemplars (whole)
28. Different numbers of exemplars (sentence)
29. Comparison of mapping accuracy of different groups of experiments
Higher Conditional Probability
30. Experiment with LIVING_THINGS ontology
- P(MAN | HUMAN)
- P(WOMAN | HUMAN)
- Find a mapping for GIRL
31. Actual Experiment Results L-1
Results of experiment (1)
32. Actual Experiment Results L-2
With clustering on exemplars
Without clustering on exemplars
With additional classes
33. Actual Experiment Results L-3
Comparison between different numbers of exemplars (sentence)
34. Actual Experiment Results: Different Queries
Queries augmented with class properties
35. Actual Experiment Results L-4
Results of experiment (1) with new queries
Results of experiment (2) with new queries
36. Limitation 1: An exemplar is not a sample of a concept
- An exemplar is a combination of strings that represent some usage of a concept.
- An exemplar is not an instance of a concept.
- The way we calculate conditional probability is an estimation.
37. Limitation 2: Popularity does not equal relevancy
- Limited by a search engine's algorithm
- PageRank
- Popularity does not equal relevancy
- Weight cannot be specified for words in a search query
38. Limitation 3: Relevancy does not equal similarity
Search Results for concept A
Text related to concept A
Text against concept A
Text for concept A i.e. desired exemplars
Text for related concept B
39. Related Research
- UMBC OntoMapper
- Sushama Prasad, Peng Yun and Finin Tim, A Tool for Mapping between Two Ontologies Using Explicit Information, AAMAS 2002 Workshop on Ontologies and Agent Systems, 2002.
- CAIMEN
- Lacher S. Martin and Groh Georg, Facilitating the Exchange of Explicit Knowledge through Ontology Mappings, Proc. of the Fourteenth International FLAIRS Conference, 2001.
- GLUE
- Doan Anhai, Madhavan Jayant, Dhamankar Robin, Domingos Pedro, and Halevy Alon, Learning to Match Ontologies on the Semantic Web, WWW2002, May, 2002.
- Google Conditional Probability
- P(HUMAN | MAN) = 1.77 billion / 2.29 billion = 0.77
- P(HUMAN | WOMAN) = 0.6 billion / 2.29 billion = 0.26
- Wyatt D., Philipose M., and Choudhury T., Unsupervised Activity Recognition Using Automatically Mined Common Sense. Proceedings of AAAI-05, pp. 21-27.
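The two ratios quoted above are plain divisions of search-engine hit counts. A small sketch follows; the function and variable names are illustrative, and the counts are the ones quoted on the slide.

```python
def hit_ratio(numerator_hits, denominator_hits):
    """Ratio of two search-engine hit counts, used as a rough probability estimate."""
    return numerator_hits / denominator_hits


# Hit counts as quoted on the slide (in absolute numbers).
p_man = hit_ratio(1.77e9, 2.29e9)
p_woman = hit_ratio(0.6e9, 2.29e9)
```

Rounded to two decimals, these reproduce the 0.77 and 0.26 figures.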
40. Conclusion and Future Work
- Text retrieved from the web can be used as exemplars for text-classification-based ontology mapping
- Many parameters affect the quality of the exemplars
- The processed documents contain noise
- Future work
- Clustering
41. Questions