Localized Alternative Cluster Ensembles for Collaborative Structuring

1
Localized Alternative Cluster Ensembles for Collaborative Structuring
Michael Wurst, Katharina Morik, Ingo Mierswa
Artificial Intelligence Unit, University of Dortmund
2
Data Organization in the Web 2.0
  • Organizing large personal data collections
    requires semantic annotation
  • Users annotate items with arbitrary tags

3
Data Organization in the Web 2.0
  • No common ontology is required (folksonomies)
  • Different, parallel views
  • Even one user often has different views on
    the same set of items!

4
Emergent Tag Structures
[Figure: tag structures over the tags pop, jazz, alternative, hip hop, funk, free jazz and home, work, office, plane]
5
Emergent Tag Structures
[Figure: the tag structure pop, jazz, alternative, hip hop, funk, free jazz shown three times, alongside home, work, office, plane]
6
Emergent Tag Structures
[Figure: plot of tag frequency against tag rank, shown with the tag structures pop, jazz, alternative, hip hop, funk, free jazz and home, work, office, plane]
7
Collaborative Structuring
  • Tags tend to be chaotic
  • Users are not supported in creating and
    maintaining large, complex tag structures

8
Collaborative Structuring
  • Tags tend to be chaotic
  • Users are not supported in creating and
    maintaining large, complex tag structures
  • Goals
    • Tag unstructured items automatically
    • Deliver alternative views on a set of items
    • Exploit existing tag structures of other users
  • → Collaborative Structuring

9
Current Clustering Methods not Sufficient
  • Subspace clustering, e.g. Agrawal et al., 1998;
    non-redundant clustering, Gondek & Hofmann, 2004
    • Do not exploit existing structures created by
      others
  • Semi-supervised clustering, e.g. Cohn et al., 2003
    • Does not produce several alternatives
  • Ensemble clustering, e.g. Strehl & Ghosh, 2002
    • Does not work on partial input structures

10
Clustering for Collaborative Structuring
  • Should exploit existing structures created by
    others
  • Should produce several alternatives
  • Should consider locality of input structures

11
Localized Alternative Cluster Ensembles
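Results $\varphi_i$ are composed of existing
clusterings $\varphi_{ij}$ on subsets $S_1, \ldots, S_m$ of the set of
items to tag.
  • LACE Learning Task
    • Given a set of objects $S$,
    • input clusterings $I \subseteq \{\varphi_i : S_i \to G_i\}$,
    • quality function $q(I, O, S)$
    • LACE delivers a set of
    • result clusterings $O \subseteq \{\varphi_i : S_i \to G_i\}$
    • such that $q(I, O, S)$ is maximized
    • each $\varphi_i$ in $O$ covers at least $S$.

[Figure: items a to g under the tags alternative, jazz, pop, hip hop, funk, free jazz; the input clusterings $\varphi_{11}$ and $\varphi_{12}$ feed the results $\varphi_1, \ldots, \varphi_n$]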
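
A minimal sketch of how the objects in the learning task could be represented, assuming each clustering is a plain mapping from items to cluster labels. The type alias `Clustering`, the helper `covers`, and the item-to-tag assignments are illustrative assumptions, not taken from the slides.

```python
from typing import Dict, Set

# Hypothetical representation: a clustering maps every item of its
# domain S_i to a cluster label from G_i.
Clustering = Dict[str, str]


def covers(clustering: Clustering, items: Set[str]) -> bool:
    """Return True if the clustering assigns a cluster to every item in `items`."""
    return items.issubset(clustering)


# Two partial input clusterings, each defined on a different subset of items.
phi_11: Clustering = {"a": "jazz", "b": "funk", "c": "funk"}
phi_12: Clustering = {"d": "pop", "f": "hip hop"}

# A result clustering composed of both inputs.
phi_1: Clustering = {**phi_11, **phi_12}

# The set of items to tag.
S = {"a", "b", "c", "d", "f"}

print(covers(phi_11, S))  # False: phi_11 alone does not cover S
print(covers(phi_1, S))   # True: the composed result covers S
```

The coverage check mirrors the last condition of the task: a result is only admissible if its domain contains all of S.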
12
Localized Alternative Cluster Ensembles
  • Quality for a single result clustering
    • Sum of similarities of items in S to a best
      matching item in an input clustering
    • The number of input clusterings used to cover the
      items in S should be minimized
    • Use each input clustering only once

[Figure: items a to g under the tags alternative, jazz, hip hop, pop, free jazz, funk; the input clusterings $\varphi_{11}$ and $\varphi_{12}$ feed the results $\varphi_1, \ldots, \varphi_n$]
Overall quality
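
The first quality criterion can be sketched as follows. This is an illustration under assumptions: the similarity function `sim` is a hypothetical placeholder supplied by the caller, and the preference for using few input clusterings is not modeled, since the slides do not give a formula for it.

```python
from typing import Callable, Dict, Iterable, List

Clustering = Dict[str, str]


def single_quality(
    items: Iterable[str],
    inputs: List[Clustering],
    sim: Callable[[str, str], float],
) -> float:
    """Sum, over all items, of the similarity to the best matching clustered item."""
    # All items that occur in any of the input clusterings used for this result.
    clustered = [y for phi in inputs for y in phi]
    if not clustered:
        return 0.0
    return sum(max(sim(x, y) for y in clustered) for x in items)


def exact_match(x: str, y: str) -> float:
    """Toy similarity (assumption): 1.0 for identical items, 0.0 otherwise."""
    return 1.0 if x == y else 0.0


inputs = [{"a": "jazz"}, {"b": "funk"}]
print(single_quality(["a", "b", "x"], inputs, exact_match))  # 2.0
```

With a real feature-based similarity, item `x` would still contribute its similarity to the closest clustered item instead of zero.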
13
LACE Algorithm
[Figure: example query items a to g and two input clusterings, $\varphi_{11}$ and $\varphi_{12}$, over the tags alternative, jazz, free jazz, funk, pop, hip hop]
14
LACE Algorithm
Best matching cluster node is selected by
f-measure.
[Figure: the query items and the input clusterings $\varphi_{11}$ and $\varphi_{12}$]
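
A hedged sketch of selecting a best matching cluster by f-measure. It assumes flat clusters given as item sets and treats the query set as the reference; the slides speak of cluster nodes, so a hierarchical version would compare against each node's items instead. The function names and example data are illustrative.

```python
from typing import Dict, Set


def f_measure(query: Set[str], cluster: Set[str]) -> float:
    """Harmonic mean of precision and recall of `cluster` with respect to `query`."""
    overlap = len(query & cluster)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cluster)
    recall = overlap / len(query)
    return 2 * precision * recall / (precision + recall)


def best_matching_cluster(query: Set[str], clusters: Dict[str, Set[str]]) -> str:
    """Return the name of the cluster with the highest f-measure against the query."""
    return max(clusters, key=lambda name: f_measure(query, clusters[name]))


query = {"a", "b", "c", "d"}
clusters = {"funk": {"b", "c"}, "pop": {"d", "f"}}
print(best_matching_cluster(query, clusters))  # funk
```

Here `funk` wins because both of its items are in the query, while `pop` contains one item outside it.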
15
LACE Algorithm
Items that are sufficiently similar to items in
the best matching clustering are deleted from the
query set.
[Figure: running example with the input clusterings $\varphi_{11}$ and $\varphi_{12}$]
16
LACE Algorithm
A new query is posed containing the remaining
items. Only tags not used yet are considered.
[Figure: running example with the input clusterings $\varphi_{11}$ and $\varphi_{12}$]
17
LACE Algorithm
The process continues until all items are
covered, no additional match is possible or a
maximal number of rounds is reached.
[Figure: running example; the result clustering $\varphi_1$ is composed of $\varphi_{11}$ and $\varphi_{12}$]
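
The covering loop with its three stopping conditions can be sketched roughly as below. Several parts are assumptions: `sim`, `threshold`, and `max_rounds` are hypothetical parameters, and for brevity the best match is chosen by the number of query items it covers rather than by f-measure.

```python
from typing import Callable, Dict, List, Set, Tuple

Clustering = Dict[str, str]


def cover_query(
    query: Set[str],
    inputs: Dict[str, Clustering],
    sim: Callable[[str, str], float],
    threshold: float = 0.5,
    max_rounds: int = 10,
) -> Tuple[List[str], Set[str]]:
    """Greedily pick input clusterings; return chosen names and uncovered items."""
    remaining = set(query)
    unused = dict(inputs)  # each input clustering is used at most once
    chosen: List[str] = []

    def covered_by(phi: Clustering) -> Set[str]:
        # Query items that are sufficiently similar to some item in phi.
        return {x for x in remaining if any(sim(x, y) >= threshold for y in phi)}

    for _ in range(max_rounds):       # stop: maximal number of rounds
        if not remaining or not unused:
            break                     # stop: all items covered or nothing left
        best = max(unused, key=lambda name: len(covered_by(unused[name])))
        covered = covered_by(unused[best])
        if not covered:
            break                     # stop: no additional match is possible
        chosen.append(best)
        remaining -= covered          # pose a new query with the remaining items
        del unused[best]
    return chosen, remaining


inputs = {
    "phi_11": {"a": "jazz", "b": "funk", "c": "funk"},
    "phi_12": {"d": "pop", "f": "hip hop"},
}
chosen, left = cover_query(
    {"a", "b", "c", "d", "e", "f"}, inputs, lambda x, y: float(x == y)
)
print(chosen, sorted(left))  # ['phi_11', 'phi_12'] ['e']
```

Item `e` matches nothing in either input, so it is left over for the classification step.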
18
LACE Algorithm
Remaining items are added by classification.
[Figure: running example with the completed result clustering $\varphi_1$]
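
The slides do not name the classifier, so the following is only one possible reading: a nearest-neighbor assignment in which each remaining item receives the cluster of its most similar already-clustered item. The feature values and the similarity function are made up for the example.

```python
from typing import Callable, Dict, Set

Clustering = Dict[str, str]


def classify_remaining(
    remaining: Set[str],
    result: Clustering,
    sim: Callable[[str, str], float],
) -> Clustering:
    """Extend `result` by assigning each remaining item to its nearest neighbor's cluster."""
    extended = dict(result)
    if not result:
        return extended
    for x in sorted(remaining):
        # Only items clustered before this step serve as neighbors.
        nearest = max(result, key=lambda y: sim(x, y))
        extended[x] = result[nearest]
    return extended


# Hypothetical one-dimensional features for four items.
features = {"a": 0.0, "d": 1.0, "e": 0.9, "g": 0.1}


def feature_sim(x: str, y: str) -> float:
    """Toy similarity (assumption): negative distance between feature values."""
    return -abs(features[x] - features[y])


result = {"a": "jazz", "d": "pop"}
print(classify_remaining({"e", "g"}, result, feature_sim))
```

In this toy data `e` lies close to `d` and `g` close to `a`, so they end up under `pop` and `jazz` respectively.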
19
LACE Algorithm
Process starts anew until no more matches are
possible or the maximal number of results is
reached.
[Figure: running example with the input clusterings $\varphi_{11}$ and $\varphi_{12}$ and the result clustering $\varphi_1$]
20
LACE Algorithm
[Figure: the result clusterings $\varphi_1, \varphi_2, \varphi_3, \ldots, \varphi_k$, now also including the tags home, work, office, plane]
21
LACE Algorithm
[Figure: the result clusterings $\varphi_1, \varphi_2, \varphi_3, \ldots, \varphi_k$, with an added "P2P Network" label]
22
Evaluation: Leave One Clustering Out
  • Given an evaluation set of clusterings
  • Iteratively delete each clustering from this set
    and cluster its items using the remaining
    clusterings

[Figure: evaluation set of tag structures over the tags home, work, office, plane and pop, jazz, alternative, independent, hip hop, funk, free jazz]
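
The evaluation protocol amounts to a simple loop. In this sketch, `cluster_items` and `score` are hypothetical callables standing in for the clustering method under test and the comparison measure; the stand-ins below exist only to make the example runnable.

```python
from typing import Callable, Dict, Set

Clustering = Dict[str, str]


def leave_one_clustering_out(
    clusterings: Dict[str, Clustering],
    cluster_items: Callable[[Set[str], Dict[str, Clustering]], Clustering],
    score: Callable[[Clustering, Clustering], float],
) -> Dict[str, float]:
    """Hold out each clustering in turn and re-cluster its items from the rest."""
    scores: Dict[str, float] = {}
    for name, held_out in clusterings.items():
        # The held-out clustering must not be visible to the method.
        rest = {n: c for n, c in clusterings.items() if n != name}
        predicted = cluster_items(set(held_out), rest)
        scores[name] = score(predicted, held_out)
    return scores


def copy_from_rest(items: Set[str], rest: Dict[str, Clustering]) -> Clustering:
    """Stand-in method (assumption): take labels from the first clustering that has the item."""
    return {x: phi[x] for x in items for phi in rest.values() if x in phi}


def coverage(predicted: Clustering, truth: Clustering) -> float:
    """Stand-in measure (assumption): fraction of held-out items that received a label."""
    return len(set(predicted) & set(truth)) / len(truth)


evaluation_set = {
    "user1": {"a": "pop", "b": "jazz"},
    "user2": {"a": "home", "c": "work"},
    "user3": {"b": "funk", "c": "free jazz"},
}
print(leave_one_clustering_out(evaluation_set, copy_from_rest, coverage))
```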
26
Evaluation
  • Nemoz (Networked Media Organizer)
    • 39 user-created tag structures on 1830 audio
      files
  • Comparison of clusterings
    • F-measure (best match)
    • Correlation of tree distances
    • Absolute distance of tree distances
  • Best result of 5 returned clusterings is used
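
The two tree-distance measures could be computed along the following lines. This is a sketch under assumptions the slides do not confirm: each item is described by its root-to-node tag path, the tree distance between two items is the number of edges between their nodes, and the correlation is Pearson's. All names and example paths are illustrative.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence, Tuple

Path = Tuple[str, ...]


def tree_distance(p: Path, q: Path) -> int:
    """Number of edges between two nodes, given their root-to-node paths."""
    common = 0
    for a, b in zip(p, q):
        if a != b:
            break
        common += 1
    return len(p) + len(q) - 2 * common


def pairwise_distances(tree: Dict[str, Path], items: List[str]) -> List[int]:
    """Tree distances for all item pairs, in a fixed pair order."""
    return [tree_distance(tree[x], tree[y]) for x, y in combinations(items, 2)]


def mean_abs_difference(d1: Sequence[float], d2: Sequence[float]) -> float:
    """Average absolute difference between two distance lists."""
    return sum(abs(a - b) for a, b in zip(d1, d2)) / len(d1)


def pearson(d1: Sequence[float], d2: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(d1)
    m1, m2 = sum(d1) / n, sum(d2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(d1, d2))
    var1 = sum((a - m1) ** 2 for a in d1)
    var2 = sum((b - m2) ** 2 for b in d2)
    return cov / sqrt(var1 * var2)


items = ["a", "b", "c", "d"]
user_tree = {"a": ("jazz",), "b": ("jazz", "funk"), "c": ("jazz", "funk"), "d": ("pop",)}
other_tree = {"a": ("jazz",), "b": ("jazz",), "c": ("funk",), "d": ("pop",)}

d_user = pairwise_distances(user_tree, items)
d_other = pairwise_distances(other_tree, items)
print(d_user, d_other)
print(mean_abs_difference(d_user, d_other), pearson(d_user, d_other))
```

A returned clustering that preserves the user's hierarchy yields a high correlation and a low absolute difference, regardless of how the clusters are named.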

27
Evaluation
28
Conclusion
  • Importance of different views
  • Exploit the work of others
  • Collaborative Structuring
  • Localized Alternative Cluster Ensembles
  • Emerging tag structures: accuracy and diversity

Thank you for your attention!