
Title: Learning the Semantic Meaning of a Concept from the Web


1
Learning the Semantic Meaning of a Concept from
the Web
  • Yang Yu
  • Master's Thesis Defense
  • August 03, 2006

2
The Problem
  • Manually preparing training data for text
    classification based ontology mapping is
    expensive.

3
The Thesis
  • Automatically collecting training data for the
    concept defined in an ontology.
  • Benefits
  • Reduces the amount of human work
  • Enables fully automated ontology mapping

4
Overview
  • Background
  • The Semantic Web and ontology
  • Ontology Mapping
  • Proposal
  • System
  • Experimental Results
  • WEAPONS ontology
  • LIVING_THINGS ontology
  • Discussion and Conclusion

5
Semantic Web and Ontology
  • What is it?
  • an extension of the current web
  • An Example

6
Ontology Mapping
  • Interoperability problem
  • Independently developed ontologies for the same
    or overlapping domain
  • Mapping
  • r = f(Ci, Cj), where i = 1, …, n and j = 1, …, m
  • r ∈ {equivalent, subClassOf, superClassOf,
    complement, overlapped, other}
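
A minimal sketch of how this mapping relation could be represented in code; the Relation enum and the map_concepts function are illustrative assumptions, not the thesis implementation.

  from enum import Enum

  class Relation(Enum):
      EQUIVALENT = "equivalent"
      SUBCLASS_OF = "subClassOf"
      SUPERCLASS_OF = "superClassOf"
      COMPLEMENT = "complement"
      OVERLAPPED = "overlapped"
      OTHER = "other"

  def map_concepts(ci: str, cj: str) -> Relation:
      # Trivial string-matching baseline (one of the approaches on the
      # next slide); a real mapper would use classification evidence.
      return Relation.EQUIVALENT if ci.lower() == cj.lower() else Relation.OTHER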

7
Approaches to Ontology Mapping
  • Manual mapping
  • String Matching
  • Text classification
  • the semantic meaning of a concept is reflected in
    the training data that use the concept
  • Probabilistic feature model
  • Classification
  • Results highly depend on training data

8
Motivation
  • Preparing exemplars manually is costly
  • Billions of documents available on the web
  • Search engines

9
The Proposal
  • Using the concept defined in an ontology as a
    query and processing the search results to obtain
    exemplars (a sketch follows this list)
  • Verification
  • Build a prototype system
  • Check ontology mapping results
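
A hedged sketch of the proposed pipeline, assuming Python 3.9+; the search helper stands in for an unspecified web search API, and all names here are illustrative rather than the thesis code.

  import re

  def search(query: str, max_results: int = 200) -> list[str]:
      """Placeholder for the search-engine call (not specified in the
      slides); returns raw result pages for the query."""
      raise NotImplementedError

  def clean_html(page: str) -> str:
      """Keep only plain text from a retrieved page."""
      return re.sub(r"<[^>]+>", " ", page)

  def collect_exemplars(concept: str, ancestors: list[str], n: int = 200) -> list[str]:
      query = " ".join(ancestors + [concept])   # e.g. "FOOD FRUIT APPLE"
      return [clean_html(p) for p in search(query, max_results=n)]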

10
System overview Part I
Search Engine
11
The parser (Query expansion)
FOOD → FRUIT → APPLE
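
The expansion above builds the query for APPLE from its ancestor class names. A small sketch of that step, assuming the parsed ontology is available as a child-to-parent dictionary (an assumed representation):

  def expand_query(concept: str, parent_of: dict[str, str]) -> str:
      """Prepend the concept's ancestor class names from the ontology."""
      terms = [concept]
      while concept in parent_of:            # walk up to the root class
          concept = parent_of[concept]
          terms.append(concept)
      return " ".join(reversed(terms))       # root first, leaf concept last

  # expand_query("APPLE", {"APPLE": "FRUIT", "FRUIT": "FOOD"}) == "FOOD FRUIT APPLE"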
12
The retriever
13
The processor
14
Naïve Bayes text classifier
  • Bow toolkit
  • McCallum, Andrew Kachites, Bow: A toolkit for
    statistical language modeling, text retrieval,
    classification and clustering,
    http://www.cs.cmu.edu/~mccallum/bow, 1996.
  • rainbow -d model --index dir/
  • rainbow -d model --query
  • Bayes Rule
  • Naïve Bayes text classifier
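
As a hedged illustration, the two rainbow commands above could be driven from Python roughly as follows; the directory layout and output handling are assumptions, and rainbow must be installed on the system.

  import subprocess

  def build_model(model_dir: str, class_dirs: list[str]) -> None:
      # Index one directory of exemplar files per class.
      subprocess.run(["rainbow", "-d", model_dir, "--index", *class_dirs], check=True)

  def classify(model_dir: str, document: str) -> str:
      # Feed one document on stdin; rainbow prints its per-class scores.
      result = subprocess.run(["rainbow", "-d", model_dir, "--query"],
                              input=document, capture_output=True, text=True, check=True)
      return result.stdout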

15
Bayes Rule
  • P(A | B) = P(B | A) P(A) / P(B)
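
A tiny worked instance of the rule, with made-up numbers purely for illustration (not from the thesis):

  # Bayes rule with illustrative numbers: P(A) = 0.3, P(B|A) = 0.8, P(B) = 0.5
  p_a, p_b_given_a, p_b = 0.3, 0.8, 0.5
  p_a_given_b = p_b_given_a * p_a / p_b   # = 0.48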

16
Naïve Bayes classifier
  • A text classification problem
  • What is the most probable classification of the
    new instance, given the training data?
  • vMAP = argmax_vj P(vj | a1, a2, …, an)
    = argmax_vj P(a1, a2, …, an | vj) P(vj)
  • vj: category j
  • (a1, a2, …, an): attributes of a new document
  • The "naïve" assumption: the attributes are
    independent given the category, so
    vNB = argmax_vj P(vj) ∏ P(ai | vj)
  • (Mitchell, Tom, Machine Learning, McGraw-Hill, 1997)
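
A toy Python sketch of the vNB rule above, with add-one smoothing as an assumed detail; it is not the Bow/rainbow implementation used in the thesis.

  import math
  from collections import Counter

  def train(docs_by_class: dict[str, list[str]]):
      priors, word_counts, totals, vocab = {}, {}, {}, set()
      n_docs = sum(len(docs) for docs in docs_by_class.values())
      for c, docs in docs_by_class.items():
          priors[c] = len(docs) / n_docs
          counts = Counter(w for d in docs for w in d.lower().split())
          word_counts[c], totals[c] = counts, sum(counts.values())
          vocab |= set(counts)
      return priors, word_counts, totals, vocab

  def classify(doc: str, model) -> str:
      priors, word_counts, totals, vocab = model
      def log_posterior(c):
          # log P(vj) + sum_i log P(ai | vj), with add-one smoothing
          return math.log(priors[c]) + sum(
              math.log((word_counts[c][w] + 1) / (totals[c] + len(vocab)))
              for w in doc.lower().split())
      return max(priors, key=log_posterior)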

17
System overview Part II
18
The model builder
  • Mutually exclusive and exhaustive
  • Leaf classes
  • C and C-
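
A hedged sketch of picking out the leaf classes that serve as the mutually exclusive and exhaustive categories; the child-to-parent dictionary is an assumed representation of the parsed ontology.

  def leaf_classes(parent_of: dict[str, str]) -> list[str]:
      """A class is a leaf if it never appears as another class's parent."""
      parents = set(parent_of.values())
      return [c for c in parent_of if c not in parents]

  # leaf_classes({"TANK-VEHICLE": "ARMORED-COMBAT-VEHICLE",
  #               "ARMORED-COMBAT-VEHICLE": "CONVENTIONAL-WEAPON"})
  # -> ["TANK-VEHICLE"]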

19
The calculator
  • The Naïve Bayes text classifier tends to give
    extreme values (close to 1 or 0)
  • Tasks
  • Feed exemplars to the classifier one by one
  • Keep records of the classification results
  • Take averages and generate a report

20
An Example of the Calculator
[Diagram: 200 exemplars of APC fed one by one to the classifier, which
assigns each to TANK-VEHICLE, AIR-DEFENSE-GUN, or SAUDI-NAVAL-MISSILE-CRAFT]
P(TANK-VEHICLE | APC) = 170 / 200 = 0.85
P(AIR-DEFENSE-GUN | APC) = 0.10
P(SAUDI-NAVAL-MISSILE-CRAFT | APC) = 0.05
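
A minimal sketch of the calculator's averaging step, assuming a classify helper that returns per-class probabilities for one exemplar; names and numbers in the comment are illustrative.

  from collections import defaultdict

  def average_scores(exemplars: list[str], classify) -> dict[str, float]:
      """Feed exemplars one by one and average the per-exemplar scores."""
      sums = defaultdict(float)
      for doc in exemplars:
          for cls, p in classify(doc).items():   # near 0/1 scores per exemplar
              sums[cls] += p
      return {cls: s / len(exemplars) for cls, s in sums.items()}

  # With 200 APC exemplars, about 170 of which the classifier assigns to
  # TANK-VEHICLE, the averaged score is roughly 170 / 200 = 0.85.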
21
Experiments with WEAPONS ontology
  • Information Interpretation and Integration
    Conference
    (http://www.atl.lmco.com/projects/ontology/i3con.html)
  • WeaponsA.n3 and WeaponsB.n3
  • Both define over 80 classes
  • More than 60 classes are leaf classes
  • Similar structure

22
WeaponsA.n3
[Class hierarchy diagram, part of WeaponsA.n3: WEAPON and
CONVENTIONAL-WEAPON at the top; ARMORED-COMBAT-VEHICLE,
MODERN-NAVAL-SHIP, and WARPLANE beneath; leaf classes shown include
TANK-VEHICLE, AIRCRAFT-CARRIER, PATROL-CRAFT, and SUPER-ETENDARD]
23
WeaponsB.n3
[Class hierarchy diagram, part of WeaponsB.n3]
24
Expected Results
[Diagram of expected mappings against part of WeaponsB.n3.
WeaponsA.n3 concepts: SUPER-ETENDARD, AIRCRAFT-CARRIER, PATROL-CRAFT,
TANK-VEHICLE, FIGHTER-PLANE.
WeaponsB.n3 concepts: SUPER-ETENDARD-FIGHTER, LIGHT-AIRCRAFT-CARRIER,
PATROL-WARTER-CRAFT, PATROL-BOAT, PATROL-BOAT-RIVER, APC, LIGHT-TANK,
FIGHTER-ATTACK-PLANE]
25
A Typical Report
P(APC | Ci), where i = 1, …, 63
APC
SELF-PROPELLED-ARTILLERY 0.357180681
TANK-VEHICLE 0.277139274
ICBM 0.10423636
MRBM 0.080615147
TOWED-ARTILLERY 0.054724102
SUPPORT-VESSEL 0.023265054
PATROL-CRAFT 0.019570325
MOLOTOV-COCKTAIL 0.015032411
TORPEDO-CRAFT 0.013677696
SUPER-ETENDARD 0.009856519
MORTAR 0.00772997
AIR-DEFENSE-GUN 0.002997109
......
MACHINE-GUN 0.000211772
MOLOTOV-COCKTAIL 0.000187578
TRUCK-BOMB 0.000171675
AS-9-KYLE-ALCM 0.000156403
ARABIL-100-MISSILE 0.000111953
AL-HIJARAH-MISSILE 7.65E-05
OGHAB-MISSILE 7.12E-05
BADAR-2000 4.28E-05
26
classes with highest conditional probability
27
different numbers of exemplars (whole)
28
different numbers of exemplars (sentence)
29
Comparison of mapping accuracy of different
groups of experiments
Higher Conditional Probability
30
Experiment with LIVING_THINGS ontology
  • P(MAN | HUMAN)
  • P(WOMAN | HUMAN)
  • Find a mapping for GIRL (decision rule sketched below)
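
The mapping decision itself can be sketched as an argmax over the calculator's report; the function name and the numbers in the comment are illustrative assumptions, not experimental results.

  def best_mapping(avg_scores: dict[str, float]) -> str:
      """Map the source concept to the target class with the highest
      averaged conditional probability."""
      return max(avg_scores, key=avg_scores.get)

  # best_mapping({"MAN": 0.35, "WOMAN": 0.55}) -> "WOMAN"  (illustrative numbers)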

31
Actual Experiment Results L-1
Results of experiment (1)
32
Actual Experiment Results L-2
With clustering on exemplars
Without clustering on exemplars
with additional classes
33
Actual Experiment Results L-3
Comparison between different numbers of exemplars
(sentence)
34
Actual Experiment Results Different Queries
Queries augmented with class properties
35
Actual Experiment Results L-4
Results of experiment (1) with new queries
Results of experiment (2) with new queries
36
Limitation 1: An exemplar is not a sample of a
concept
  • An exemplar is a combination of strings that
    represent some usage of a concept.
  • An exemplar is not an instance of a concept.
  • The way we calculate the conditional probability
    is therefore an approximation.

37
Limitation 2: Popularity does not equal relevancy
  • Limited by the search engine's ranking algorithm
  • PageRank
  • Popularity does not equal relevancy
  • Weights cannot be specified for individual words
    in a search query

38
Limitation 3: Relevancy does not equal similarity
[Diagram: the search results for concept A contain text for concept A
(the desired exemplars), text related to concept A, text against
concept A, and text for a related concept B]
39
Related Research
  • UMBC OntoMapper
  • Prasad, Sushama, Peng, Yun, and Finin, Tim, A Tool
    for Mapping between Two Ontologies Using Explicit
    Information, AAMAS 2002 Workshop on Ontologies
    and Agent Systems, 2002.
  • CAIMEN
  • Lacher, Martin S. and Groh, Georg, Facilitating the
    Exchange of Explicit Knowledge through Ontology
    Mappings, Proc. of the Fourteenth International
    FLAIRS Conference, 2001.
  • GLUE
  • Doan, AnHai, Madhavan, Jayant, Dhamankar, Robin,
    Domingos, Pedro, and Halevy, Alon, Learning to
    Match Ontologies on the Semantic Web, WWW2002,
    May 2002.
  • Google conditional probability
  • P(HUMAN | MAN) = 1.77 billion / 2.29 billion = 0.77
  • P(HUMAN | WOMAN) = 0.6 billion / 2.29 billion = 0.26
  • Wyatt, D., Philipose, M., and Choudhury, T.,
    Unsupervised Activity Recognition Using
    Automatically Mined Common Sense, Proceedings of
    AAAI-05, pp. 21-27.

40
Conclusion and Future Work
  • Text retrieved from the web can be used as
    exemplars for text classification based ontology
    mapping
  • Many parameters affect the quality of the
    exemplars
  • The processed documents still contain noise
  • Future work
  • Clustering

41
Questions