A Categorical Model for Discovering Latent Structure in Social Annotations

1
A Categorical Model for Discovering Latent
Structure in Social Annotations
  • Said Kashoob, James Caverlee, Ying Ding
  • Texas A&M University, Indiana University

ICWSM 2009, May 19, 2009
2
Tags
Users
Content
  • Object: webpage, document, image, video
  • Tag/annotation: keyword attached to object

3
2 million articles
3 billion images
150 million URLs
  • Existing research on social annotations
  • Enhanced information access: tag-based browsing
    (Bao 2007), search (Li 2007, Heymann 2008), and
    clustering (Brooks 2006, Ramage 2009), ...
  • Analysis and modeling: Golder 2005, Halpin 2007,
    Cattuto 2006, Li 2008, Veres 2006
  • Tagging and incentives: Sen 2006, Marlow 2006, ...
  • Plus lots more ...

4
Discovering Latent Structure in Social Annotations
  • Given a collection of web objects, users, and
    tags, can we model the underlying tag generation
    process?
  • Discover implicit communities of interest?
  • And categories of related tags?
  • For a given category, identify the most relevant
    objects
  • For a given category, identify the most important
    tags
  • Potentially useful for
  • Enhancing tag-based search and browsing
  • Automatically deriving tag ontologies
  • Recommending related users
  • ...

5
Objects
Categories
Communities
6
Initial Thoughts: Content-Based Topic Modeling
  • Latent Dirichlet Allocation (Blei 2003)
  • Latent Semantic Analysis (Deerwester 1990,
    Hofmann 1999)
  • Provides an elegant probabilistic interpretation

Topic 1, Topic 2, ..., Topic K
7
Initial Thoughts: Content-Based Topic Modeling
  • Recent work applying LDA-like models to tags
    (e.g., Wu 2006, Zhou 2008)

Topic 1, Topic 2, ..., Topic K
8
Modeling Social Annotations
  • Intuition: the process that generates content is
    fundamentally different from the annotation
    process
  • An object's annotations are the product of
    10s/100s/1000s of distributed authors who may
    not even be aware of each other
  • Annotators may vary widely in
  • Especially important for non-textual objects like
    images, videos, etc. where combining content and
    tags is not suitable

9
Our Solution: Community-Based Categorical
Annotation Model
[Figure: community 1, community 2, ..., community L; Topic 1, Topic 2, ..., Topic K; three rows of Category 1, Category 2, ..., Category K]
10
Community-Based Categorical Annotation Model (CCA)
  • We view communities as groups forming around
    interests, expertise, language, interpretation
  • Each community has a number of categories as its
    world view
  • For each object, a community draws tags from the
    appropriate underlying categories

e.g., a dinosaur image:
  • Scientist: cretaceous, theropod
  • Elementary school student: meat-eater, t-rex
  • French speaker: carnivore, lezard-tyran
The Scientist community may have the categories
Astronomy, Biology, Paleontology, ...
The Scientist community may draw tags from the
categories Biology and Paleontology
11
The Annotation Generation Process
  • For each object to be tagged
  • Each community that decides to tag it
  • Selects an appropriate category
  • Then draws a tag from that category
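
The four steps above can be sketched as a small sampling routine. This is a minimal illustration, not the authors' implementation: the function name `generate_tags`, the dictionary layout, and every probability below are assumptions made up for the example (the tag words are borrowed from the dinosaur example).

```python
import random

def generate_tags(object_communities, community_categories, category_tags,
                  n_tags, seed=0):
    """Sample (community, category, tag) triples for one object."""
    rng = random.Random(seed)

    def draw(distribution):
        # Draw one key from a {outcome: probability} dictionary.
        outcomes = list(distribution)
        weights = [distribution[o] for o in outcomes]
        return rng.choices(outcomes, weights=weights)[0]

    triples = []
    for _ in range(n_tags):
        community = draw(object_communities)              # a community decides to tag
        category = draw(community_categories[community])  # it selects a category
        tag = draw(category_tags[category])               # it draws a tag from that category
        triples.append((community, category, tag))
    return triples

# Toy values, invented for illustration only.
object_communities = {"scientist": 0.6, "student": 0.4}
community_categories = {
    "scientist": {"paleontology": 0.7, "biology": 0.3},
    "student": {"animals": 1.0},
}
category_tags = {
    "paleontology": {"cretaceous": 0.5, "theropod": 0.5},
    "biology": {"carnivore": 1.0},
    "animals": {"meat-eater": 0.5, "t-rex": 0.5},
}

for community, category, tag in generate_tags(
        object_communities, community_categories, category_tags, n_tags=5):
    print(community, category, tag)
```

Each printed line shows which community and category produced a tag; in real data only the tag is observed, and the other two are latent.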

12
The Resulting Annotations
  • Tags: comics, humor, phd, fun, comic, education,
    funny, webcomic, science, research, humour, geek,
    academic, academia

phdcomics.com
  • Object annotations are generated by communities
  • Each community selects tags from its category set
  • Each category is a distribution over tags
COM 1, COM 2
CAT1: art, design, paper, drawing, fun
CAT1: science, math, kids, reference
CAT2: humor, funny, fun, comics, geek
CAT2: howto, productivity, lifehack, tips
CAT3: games, game, fun, flash
CAT3: web20, tools, wiki, collaboration
13
Recovering Communities and Categories
[Figure: objects Obj 1, Obj 2, ..., Obj N, each with tags t1, t2, ..., tn, feed a Gibbs Sampler that estimates: community in object, category per community in object, tag in category]
COM 2, COM 1
CAT1: art, design, paper, drawing, fun
CAT1: science, math, kids, reference
CAT2: humor, funny, fun, comics, geek
CAT2: howto, productivity, lifehack, tips
CAT3: games, game, fun, flash
CAT3: web20, tools, wiki, collaboration
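
To make the recovery step concrete, here is a hedged sketch of a collapsed Gibbs sampler for a simplified version of the model. The slides' actual update equation did not survive in this transcript, so everything here is an assumption: each tag token gets a latent (community, category) pair, categories are shared across communities, all priors are symmetric Dirichlet (`alpha`, `beta`, `gamma` are hypothetical hyperparameters), and the function name `gibbs_cca` is made up.

```python
import random

def gibbs_cca(objects, L, K, V, iters=100, alpha=0.5, beta=0.5, gamma=0.1, seed=0):
    """objects: list of tag-id lists. Returns assignments and count tables."""
    rng = random.Random(seed)
    n_dc = [[0] * L for _ in objects]   # community counts per object
    n_cz = [[0] * K for _ in range(L)]  # category counts per community
    n_zw = [[0] * V for _ in range(K)]  # tag counts per category
    n_c = [0] * L                       # tokens assigned to each community
    n_z = [0] * K                       # tokens assigned to each category

    def bump(d, c, z, w, delta):
        # Add or remove one token's contribution to every count table.
        n_dc[d][c] += delta
        n_cz[c][z] += delta
        n_zw[z][w] += delta
        n_c[c] += delta
        n_z[z] += delta

    # Random initial (community, category) assignment for every tag token.
    assign = []
    for d, tags in enumerate(objects):
        assign.append([(rng.randrange(L), rng.randrange(K)) for _ in tags])
        for (c, z), w in zip(assign[d], tags):
            bump(d, c, z, w, +1)

    pairs = [(c, z) for c in range(L) for z in range(K)]
    for _ in range(iters):
        for d, tags in enumerate(objects):
            for i, w in enumerate(tags):
                c, z = assign[d][i]
                bump(d, c, z, w, -1)  # take the token out of the counts
                # Assumed conditional: community-in-object times
                # category-in-community times tag-in-category.
                weights = [
                    (n_dc[d][c2] + alpha)
                    * (n_cz[c2][z2] + beta) / (n_c[c2] + K * beta)
                    * (n_zw[z2][w] + gamma) / (n_z[z2] + V * gamma)
                    for c2, z2 in pairs
                ]
                c, z = rng.choices(pairs, weights=weights)[0]
                assign[d][i] = (c, z)
                bump(d, c, z, w, +1)  # put it back under the new assignment
    return assign, n_dc, n_cz, n_zw

# Toy corpus of four objects over a five-tag vocabulary (invented data).
toy_objects = [[0, 1, 0, 2], [3, 4, 3], [0, 2, 1], [4, 3, 4]]
assign, n_dc, n_cz, n_zw = gibbs_cca(toy_objects, L=2, K=2, V=5, iters=50)
```

The returned count tables correspond to the three quantities named on the slide: `n_dc` (community in object), `n_cz` (category per community), and `n_zw` (tag in category). In this simplification the category counts are pooled over objects, which may differ from the authors' formulation.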
14
What is the output of the model?
  • Now we have distributions of communities and
    categories
  • Can do inference to find the most likely tags per
    category, per community
  • Can find relationships between objects,
    communities
  • ...
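
As an illustration of the inference step, the most likely tags per category can be read off a table of tag counts per category. This is a sketch under assumptions: the helper name `top_tags`, the smoothing constant `gamma`, and the toy counts are all invented for the example.

```python
def top_tags(category_tag_counts, vocab, n=3, gamma=0.1):
    """Return the n most probable (tag, probability) pairs for each category."""
    V = len(vocab)
    result = []
    for counts in category_tag_counts:
        total = sum(counts) + V * gamma
        # Smoothed estimate of P(tag | category).
        probs = [(count + gamma) / total for count in counts]
        ranked = sorted(range(V), key=lambda w: probs[w], reverse=True)
        result.append([(vocab[w], probs[w]) for w in ranked[:n]])
    return result

# Invented counts: rows are categories, columns follow the vocabulary order.
vocab = ["humor", "funny", "math", "science"]
counts = [[5, 3, 0, 0],
          [0, 1, 4, 6]]

for k, ranked in enumerate(top_tags(counts, vocab, n=2)):
    print("category", k, [tag for tag, _ in ranked])
```

The same ranking applied to a category-per-community table would give the most likely categories per community.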

15
Experimental Setup
  • Datasets
  • Delicious: 27,572 Web documents, 16,216 unique
    tags, 10,677,508 total tags
  • Flickr: 90,000 images, 44,980 unique tags,
    788,435 total tags
  • Goals
  • Discover meaningful communities and categories
  • Explore the relationship between content-based
    topics and tag-based categories
  • Show process generating content is different from
    process generating tags
  • Show objects with similar content do not
    necessarily have similar tags and vice versa
  • Illustrate 2D information discovery in topic and
    category space

16
Discovering Categories
  • Categories of interest in Flickr and Delicious

17
Discovering Communities
Discovered Communities and their categories
18
Content-Based Topics vs. Tag-Based Categories
  • Learn categories from tags
  • Learn topics from content
  • Measure similarity between (category,topic) pairs
    using Jensen-Shannon
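
A minimal sketch of the pairwise comparison follows. It assumes each category and each topic is given as a probability vector over one shared vocabulary (the slides do not say how tag and content vocabularies were aligned) and uses base-2 logarithms so values fall in [0, 1]. The function names and the toy vectors are hypothetical.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p(x) = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL to the midpoint distribution."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def js_matrix(categories, topics):
    """One row per category, one column per topic."""
    return [[js_divergence(c, t) for t in topics] for c in categories]

# Invented distributions over a shared four-word vocabulary.
categories = [[0.7, 0.3, 0.0, 0.0], [0.0, 0.1, 0.4, 0.5]]
topics = [[0.25, 0.25, 0.25, 0.25], [0.0, 0.0, 0.5, 0.5]]

for row in js_matrix(categories, topics):
    print([round(value, 3) for value in row])
```

Whether the slides report this divergence or its square root as the "distance" is not recoverable from the transcript; the code above computes the divergence.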

phdcomics.com tags: comics, humor, phd, fun,
comic, education, funny, webcomic, science,
research, humour, geek, academic, academia
phdcomics.com content: Piled Higher & Deeper. Life
(or the lack thereof) in Academia. A comic strip
by Jorge Cham. Read the latest comic strips,
peruse through the online archive
  • Notice: (1 - similarity measure) < 0.4, no clear
    overlap between categories and topics

19
Exploring Content vs. Annotation
  • For pairs of objects that are similar in the
    category space, how topically similar are they?
  • For pairs of objects that are similar in the
    topic space, how categorically similar are they?
  • Jensen-Shannon distance < 0.1 (similar), > 0.9
    (dissimilar)
  • Objects with similar content do not necessarily
    have similar tags and vice versa

20
Example Application: Information Access via Topic
and Category
[Figure: object pairs arranged by Topic Space (similar / dissimilar) and Category Space (similar / dissimilar)]
21
Example Application: Information Access via Topic
and Category
22
Contributions and Future Work
  • Contributions
  • Introduced the CCA model for discovering
    "community-based categories" of interest that
    group semantically-related tags
  • Found that tag-based categories are not the same
    as content-based topics
  • Validates the need to separately model annotation
    process from the content-generation process
  • Introduced a browsing method utilizing categories
    and topics
  • Future Work
  • Consider finer-grained hierarchical models of the
    social annotation process
  • Extend the integrated browsing model
  • Extend experimental validation to other social
    tagging communities

23
  • Thank You!

24
Similarity Measure
  • Jensen-Shannon distance between two distributions
    p and q over space X
  • Kullback-Leibler divergence
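
The formulas for this slide did not survive in the transcript. For reference, the standard textbook definitions of the two quantities named above are given here; whether the authors used the divergence itself or its square root as the "distance", and which logarithm base, is an assumption.

```latex
\[
D_{\mathrm{KL}}(p \,\|\, q) = \sum_{x \in X} p(x) \log \frac{p(x)}{q(x)}
\]
\[
D_{\mathrm{JS}}(p, q) = \tfrac{1}{2}\, D_{\mathrm{KL}}(p \,\|\, m)
                      + \tfrac{1}{2}\, D_{\mathrm{KL}}(q \,\|\, m),
\qquad m = \tfrac{1}{2}(p + q)
\]
```

With base-2 logarithms, the Jensen-Shannon divergence lies between 0 (identical distributions) and 1 (disjoint support), which is compatible with the 0.1 and 0.9 thresholds used earlier in the deck.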

25
CCA model
D objects, N tags, L communities, K categories
Gibbs Sampler Update Equation