Characterizing Semantic Relatedness of Search Query Terms

Transcript and Presenter's Notes

1
Characterizing Semantic Relatedness of Search Query Terms
Dominik Benz, Beate Krause, Praveen Kumar, Andreas Hotho, Gerd Stumme
Research Unit Knowledge and Data Engineering (KDE), University of Kassel, Germany
2
Where do Semantics come from?
  • Semantically annotated content is the "fuel" of
    the next-generation Semantic Web, but where is
    the petrol station?

Derive semantics from how users interact with
information!
  • Implicit interaction: search engine clicklogs
  • Explicit interaction: collaborative tagging
3
Agenda
  • Logsonomy Introduction
  • Dataset
  • Query Term Similarity Measures
  • Semantic Grounding
  • Summary and Outlook

4
Explicit annotation: Folksonomies
  • Folksonomies allow users to assign tags to
    resources.

5
Implicit annotation: Search engine clicklogs (Logsonomies)
  • By clicking on results, users assign query
    terms to resources.

6
Structural analogies between explicit and implicit
annotation
  • Explicit (Folksonomy): allow users to assign tags to resources
  • Implicit (Logsonomy): allow users to query terms and click the results
  • Formal model for both: F = (U, T, R, Y), where
  • U, T, and R are finite sets, whose elements are
    called users, tags and resources,
  • Y ⊆ U × T × R is called the set of tag assignments.
  • Can also be seen as a ternary relation or a tripartite
    hypergraph
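
The formal model above can be sketched in a few lines of Python. This is only an illustration: the class name `TagAssignment`, the helper `folksonomy`, and the toy data are assumptions made here and do not come from the slides.

```python
from typing import NamedTuple


class TagAssignment(NamedTuple):
    """One element of Y: a (user, tag, resource) triple."""
    user: str
    tag: str
    resource: str


def folksonomy(assignments):
    """Return (U, T, R, Y) for an iterable of (user, tag, resource) triples."""
    Y = {TagAssignment(*a) for a in assignments}  # a set, so duplicates collapse
    U = {y.user for y in Y}
    T = {y.tag for y in Y}
    R = {y.resource for y in Y}
    return U, T, R, Y


# Hypothetical toy data, for illustration only.
U, T, R, Y = folksonomy([
    ("john", "java", "java.sun.com"),
    ("john", "programming", "java.sun.com"),
    ("mary", "java", "javadev.de"),
])
```

Because Y is a plain set of triples, the same structure serves for a folksonomy (explicit tags) and a logsonomy (query terms and clicked URLs).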

7
Put together the pieces
  • Analysis of similarity measures between
    Folksonomy Tags
  • able to extract synonyms, hypernyms
  • Cattuto 2008

Apply Similarity Measures to Logsonomy Graph!
  • Similar network properties of folksonomies and
    logsonomies
  • small world, clustering coefficient, cumulative
    strength, ...
  • Krause 2008

9
Logsonomy Dataset
  • AOL clicklog (March 2006)
  • Users: search engine user IDs
  • Tags: retrieved by splitting queries (using
    whitespace)
  • Resources: clicked URLs
  • Excerpt: 10,000 most often used query terms
  • |U| = 463,380, |T| = 10,000, |R| = 1,284,724
  • |Y| = 26,227,550
  • Tag rank: position in most-popular list
  • 1 free
  • 2 county
  • 3 pictures
  • 4 school
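
The construction described above (user IDs as users, whitespace-split query terms as tags, clicked URLs as resources) might look roughly like the following sketch. The function name `logsonomy_from_clicks`, the lowercasing step, the choice to count term frequency over tag assignments, and the sample rows are all assumptions made for illustration; the slides do not specify these details.

```python
from collections import Counter


def logsonomy_from_clicks(clicks, top_n=None):
    """Build the set Y of (user, term, url) triples from click log rows.

    clicks: iterable of (user_id, query, clicked_url) rows.
    top_n:  if given, keep only the top_n most frequent terms.
    """
    triples = set()
    for user_id, query, url in clicks:
        # Split the query on whitespace; lowercasing is an assumption.
        for term in query.lower().split():
            triples.add((user_id, term, url))
    if top_n is not None:
        # Frequency is counted over tag assignments (an assumption).
        freq = Counter(term for _, term, _ in triples)
        keep = {term for term, _ in freq.most_common(top_n)}
        triples = {y for y in triples if y[1] in keep}
    return triples


# Hypothetical click log rows, for illustration only.
clicks = [
    ("u1", "free pictures", "example.org/a"),
    ("u2", "free school pictures", "example.org/b"),
    ("u2", "county school", "example.org/c"),
]
Y_all = logsonomy_from_clicks(clicks)
Y_top = logsonomy_from_clicks(clicks, top_n=3)
```

Splitting on whitespace breaks compounds such as multi-word names into separate terms, which is one way the construction can influence the co-occurrence statistics.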

11
Similarity Measures: Co-occurrence and Tag Context
  • Take co-occurrence frequency as similarity
    measure (coocc)
  • Describe each tag as a context vector
  • each dimension of the vector space corresponds to
    another tag (TagContext)
  • compute similar tags by cosine similarity

Example: context vector of the tag JAVA over the tags design, software, blog, web, programming
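
A minimal sketch of the two measures, assuming that co-occurrence is counted per post (all tags one user assigned to one resource) and that a tag's context vector holds its co-occurrence counts with every other tag. The function names and the toy data are illustrative assumptions, not taken from the slides.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def cooccurrence(Y):
    """Count, for each tag pair, the posts (user, resource) containing both."""
    posts = defaultdict(set)
    for user, tag, resource in Y:
        posts[(user, resource)].add(tag)
    cooc = defaultdict(lambda: defaultdict(int))
    for tags in posts.values():
        for a, b in combinations(sorted(tags), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    return cooc


def cosine(v, w):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * w.get(k, 0) for k, x in v.items())
    norm = sqrt(sum(x * x for x in v.values())) * sqrt(sum(x * x for x in w.values()))
    return dot / norm if norm else 0.0


def tag_context_similarity(cooc, a, b):
    """TagContext: compare the co-occurrence profiles of tags a and b."""
    return cosine(cooc[a], cooc[b])


# Hypothetical toy data: java and python never share a post,
# but both co-occur with programming and software.
Y = {
    ("john", "java", "r1"), ("john", "programming", "r1"),
    ("mary", "python", "r2"), ("mary", "programming", "r2"),
    ("joe", "java", "r3"), ("joe", "software", "r3"),
    ("lucy", "python", "r4"), ("lucy", "software", "r4"),
}
cooc = cooccurrence(Y)
sim = tag_context_similarity(cooc, "java", "python")
```

In this toy data the plain co-occurrence count of java and python is zero, while their context vectors are identical. That is the kind of case where a context measure can surface related terms that are rarely used together.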
12
Similarity Measures: User and Resource Context
  • Two further possible context dimensions
  • Users (UserContext)
  • Resources (ResourceContext)
  • (TF-IDF weighting showed no great effect)

Example: the tag JAVA described by users (John, Mary, Joe, Karl, Lucy) and by resources (lwa.de, java.sun.com, javadev.de, google.com, hacking.com)
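
The user and resource contexts can be sketched the same way as the tag context, only with different vector dimensions. The sketch below assumes that each vector entry counts how often the tag occurs with a given user (or resource); the helper names and toy data are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def context_vectors(Y, dimension):
    """Map each tag to a sparse count vector over users or resources.

    dimension: "user" for UserContext, "resource" for ResourceContext.
    """
    index = {"user": 0, "resource": 2}[dimension]
    vectors = defaultdict(lambda: defaultdict(int))
    for y in Y:
        vectors[y[1]][y[index]] += 1  # y = (user, tag, resource)
    return vectors


def cosine(v, w):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * w.get(k, 0) for k, x in v.items())
    norm = sqrt(sum(x * x for x in v.values())) * sqrt(sum(x * x for x in w.values()))
    return dot / norm if norm else 0.0


# Hypothetical toy data, for illustration only.
Y = [
    ("john", "java", "java.sun.com"),
    ("mary", "java", "javadev.de"),
    ("john", "python", "python.org"),
    ("mary", "python", "python.org"),
    ("karl", "guitar", "google.com"),
]
by_user = context_vectors(Y, "user")
by_resource = context_vectors(Y, "resource")
user_sim = cosine(by_user["java"], by_user["python"])
resource_sim = cosine(by_resource["java"], by_resource["python"])
```

Here java and python are used by the same users but attached to different resources, so the two contexts disagree.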
13
Similarity Measures: FolkRank
  • Take co-occurrence frequency as similarity
    measure (freq).
  • Cosine similarity between tag vectors
  • Use FolkRank to find related tags (folkrank).
  • Basic idea: PageRank-like spreading of weights
    through the folksonomy / logsonomy structure, with a
    high weight for a particular tag in the random surfer
    vector

Figure: web graph compared with logsonomy / folksonomy graph
Andreas Hotho, Robert Jäschke, Christoph Schmitz, and Gerd Stumme.
Information Retrieval in Folksonomies: Search and Ranking.
Proceedings of the 3rd European Semantic Web Conference,
(4011):411-426, Springer, Budva, Montenegro, 2006.
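
The basic idea can be illustrated with a simplified sketch. It is not the reference implementation: the damping value `d=0.7`, the size of the preference boost, the uniform-preference baseline, and all function names are assumptions made here. Y is assumed to be a duplicate-free set of (user, tag, resource) triples.

```python
from collections import defaultdict


def build_graph(Y):
    """Undirected weighted graph over users, tags and resources."""
    adj = defaultdict(lambda: defaultdict(float))
    for user, tag, resource in Y:
        u, t, r = ("u", user), ("t", tag), ("r", resource)
        for a, b in ((u, t), (t, r), (u, r)):
            adj[a][b] += 1.0
            adj[b][a] += 1.0
    return adj


def weight_spreading(adj, preference, d=0.7, iterations=100):
    """PageRank-like iteration: w <- d * A^T w + (1 - d) * p."""
    nodes = list(adj)
    total = sum(preference.values())
    p = {n: preference.get(n, 0.0) / total for n in nodes}
    w = {n: 1.0 / len(nodes) for n in nodes}
    degree = {n: sum(adj[n].values()) for n in nodes}
    for _ in range(iterations):
        new = {n: (1.0 - d) * p[n] for n in nodes}
        for n in nodes:
            for m, weight in adj[n].items():
                new[m] += d * w[n] * weight / degree[n]
        w = new
    return w


def folkrank_related_tags(Y, tag, d=0.7):
    """Rank other tags by weight gained when `tag` is preferred."""
    adj = build_graph(Y)
    nodes = list(adj)
    uniform = {n: 1.0 for n in nodes}
    biased = dict(uniform)
    biased[("t", tag)] += len(nodes)  # boost size is an assumption
    w0 = weight_spreading(adj, uniform, d)  # baseline without preference
    w1 = weight_spreading(adj, biased, d)   # run with preference for `tag`
    scores = {n[1]: w1[n] - w0[n] for n in nodes if n[0] == "t" and n[1] != tag}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data, for illustration only.
Y = {
    ("john", "java", "r1"), ("john", "programming", "r1"),
    ("mary", "java", "r2"), ("mary", "programming", "r2"),
    ("karl", "guitar", "r3"), ("karl", "music", "r3"),
}
related = folkrank_related_tags(Y, "java")
```

Subtracting the baseline ranking removes the globally popular nodes, so what remains is the weight that flows specifically from the preferred tag.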
14
Example: Most related terms for "guitar" and "brain"
15
Qualitative Insights: Average rank of related tags
Panels: Folksonomy, Logsonomy
16
First insights
  • Co-occurrence seems to have a similar bias towards
    high-frequency tags, i.e., possibly towards
    hypernyms.
  • Tag Context (and partially Resource Context) also
    seems to yield more synonyms and siblings.
  • FolkRank: noisy
  • User Context: mixed picture
  • → Now: grounding of these observations in
    WordNet.

18
Semantic Grounding in WordNet
  • WordNet is a large lexical database for English.
  • Words with the same meaning are grouped in synsets,
    which are ordered by an is-a hierarchy.
  • Introduction of a single artificial root node
    enables application of graph-based similarity
    metrics between pairs of nouns / pairs of verbs.
  • Inclusion of top n del.icio.us tags in WordNet
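
To see why the artificial root matters, consider the following sketch of a path-length measure on a toy is-a hierarchy. The hierarchy is hypothetical and is not WordNet data. It also assumes a single hypernym per concept, which is a simplification because WordNet allows several.

```python
def add_artificial_root(hypernym_of, root="ROOT"):
    """Attach every top-level concept to one shared root node."""
    concepts = set(hypernym_of) | set(hypernym_of.values())
    tops = concepts - set(hypernym_of)  # concepts without a hypernym
    rooted = dict(hypernym_of)
    for top in tops:
        rooted[top] = root
    return rooted


def path_to_root(rooted, concept):
    """List of concepts from `concept` up to the root."""
    path = [concept]
    while path[-1] in rooted:
        path.append(rooted[path[-1]])
    return path


def shortest_path_length(rooted, a, b):
    """Number of is-a edges between a and b via their lowest common ancestor."""
    up_a = {c: i for i, c in enumerate(path_to_root(rooted, a))}
    for j, c in enumerate(path_to_root(rooted, b)):
        if c in up_a:
            return up_a[c] + j
    raise ValueError("no common ancestor")


# Hypothetical toy hierarchy (child -> hypernym), for illustration only.
hypernym_of = {
    "java": "programming_language",
    "python": "programming_language",
    "programming_language": "language",
    "guitar": "instrument",
}
rooted = add_artificial_root(hypernym_of)
siblings = shortest_path_length(rooted, "java", "python")
distant = shortest_path_length(rooted, "java", "guitar")
```

Without the shared root, java and guitar would sit in separate trees and no path between them would exist. Siblings, such as java and python in this toy hierarchy, are two edges apart.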

19
Example of Semantic Grounding
WordNet Synset Hierarchy
  • Original tag: java
  • Most similar tag
  • cooc, folkrank: programming
  • TagContext: python

Figure labels: computers, programming, map, languages, design_patterns; grounded similarity between java and python
20
Shortest path between original tag and most
closely related one
  • Jiang-Conrath distance: shown to be the semantically
    most adequate measure for similarity within WordNet
    (Budanitsky, Hirst, 2006)
  • Shortest path
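
The slide names the Jiang-Conrath distance without stating it. For reference, its commonly cited form is given below; it is not taken from the slides. Here p(c) is the probability of encountering concept c in a reference corpus, and lcs(c_1, c_2) is the lowest common subsumer of the two concepts in the is-a hierarchy.

```latex
\mathrm{IC}(c) = -\log p(c), \qquad
d_{\mathrm{JC}}(c_1, c_2) = \mathrm{IC}(c_1) + \mathrm{IC}(c_2)
  - 2\,\mathrm{IC}\bigl(\mathrm{lcs}(c_1, c_2)\bigr)
```

Unlike the plain shortest path, this distance weights each step by how much information is lost when moving up to the shared ancestor.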
21
Distribution of the lengths of shortest paths in
WordNet
Panels: Folksonomy, Logsonomy
22
Shortest path composition (length 1 and 2)
Panels: Folksonomy, Logsonomy
Label: siblings
24
Summary
  • Similar network properties of folksonomies /
    logsonomies
  • Application of term similarity measures to
    logsonomies
  • Semantic grounding of measures in WordNet
  • Comparison of measure characteristics with
    folksonomies
Conclusions
25
Summary and Outlook
  • Formalization into logsonomies retains the semantics
    inherent in log data
  • Similarity measures from folksonomy analysis are
    also able to extract synonyms / hypernyms, but
    show partially different behaviour
  • Tag Context: almost identical
  • Resource Context: less precise
  • Co-occurrence: influenced by logsonomy
    construction (restoring compounds, ...)
  • Now: possibly even more precise semantics by
    integrating folksonomies / logsonomies?

26
Similar tags live on www.bibsonomy.org
Thanks for your attention!
Contact: benz_at_cs.uni-kassel.de
27
Appendix: Music Genre Taxonomy learned from last.fm
28
Level displacement in WordNet
level displacement to most related tag
29
Qualitative insights: Overlap of 10 most related
tags
Panels: Logsonomy, Folksonomy