Title: Localized Alternative Cluster Ensembles for Collaborative Structuring
1 Localized Alternative Cluster Ensembles for Collaborative Structuring
Michael Wurst, Katharina Morik, Ingo Mierswa
Artificial Intelligence Unit, University of Dortmund
2 Data Organization in the Web 2.0
- Organizing large personal data collections requires semantic annotation
- Users annotate items with arbitrary tags
3 Data Organization in the Web 2.0
- No common ontology is required (folksonomies)
- Different, parallel views
- Even a single user often has different views on the same set of items!
4 Emergent Tag Structures
[Figure: example tag structures over music genres (pop, jazz, alternative, hip hop, funk, free jazz) and over places (home, work, office, plane)]
5 Emergent Tag Structures
[Figure: several parallel tag structures over the same music genres (pop, jazz, alternative, hip hop, funk, free jazz), next to one over places (home, work, office, plane)]
6 Emergent Tag Structures
[Figure: tag frequency plotted against tag rank, shown alongside the example tag structures]
7 Collaborative Structuring
- Tags tend to be chaotic
- Users are not supported in creating and
maintaining large, complex tag structures
8 Collaborative Structuring
- Goals
  - Tag unstructured items automatically
  - Deliver alternative views on a set of items
  - Exploit existing tag structures of other users
→ Collaborative Structuring
9 Current Clustering Methods not Sufficient
- Subspace clustering, e.g. Agrawal et al., 1998; non-redundant clustering, e.g. Gondek & Hofmann, 2004
  - Do not exploit existing structures created by others
- Semi-supervised clustering, e.g. Cohn et al., 2003
  - Do not produce several alternatives
- Ensemble clustering, e.g. Strehl & Ghosh, 2002
  - Do not work on partial input structures
10 Clustering for Collaborative Structuring
- Should exploit existing structures created by others
- Should produce several alternatives
- Should consider locality of input structures
11 Localized Alternative Cluster Ensembles
Results $\varphi_i$ are composed of existing clusterings $\varphi_{ij}$ on subsets $S_1, \dots, S_m$ of the set of items to tag.
- LACE learning task
  - Given a set of objects $S$, input clusterings $I \subseteq \{\varphi_i : S_i \to G_i\}$ and a quality function $q(I, O, S)$,
  - LACE delivers a set of result clusterings $O \subseteq \{\varphi_i : S_i \to G_i\}$
  - such that $q(I, O, S)$ is maximized and each $\varphi_i$ in $O$ covers at least $S$ (i.e. $S \subseteq S_i$).
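The learning task can be made concrete with a minimal Python sketch. It assumes, for illustration only, that a clustering $\varphi_i : S_i \to G_i$ is stored as a dictionary from item identifiers to nodes of a tag hierarchy $G_i$, each node written as the path of tags from the root; the names `Clustering`, `domain` and `covers` are hypothetical. It checks only the coverage condition, not the quality function.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: phi_i : S_i -> G_i maps every item in its domain S_i
# to a node of the tag hierarchy G_i, written as the path of tags from the root.
Clustering = Dict[str, Tuple[str, ...]]


def domain(phi: Clustering) -> Set[str]:
    """Return S_i, the set of items that the clustering assigns to a node."""
    return set(phi)


def covers(phi: Clustering, S: Set[str]) -> bool:
    """A result clustering must cover at least the query set S (S is a subset of S_i)."""
    return S <= domain(phi)


# Two partial input clusterings, each covering only part of the items to tag.
phi_11: Clustering = {"a": ("jazz",), "b": ("jazz", "free jazz"), "c": ("funk",)}
phi_12: Clustering = {"d": ("pop",), "f": ("pop", "hip hop")}

# A result clustering composed of both inputs (later entries win on conflicts).
phi_1: Clustering = {**phi_11, **phi_12}

S = {"a", "b", "c", "d", "f"}
print(covers(phi_11, S), covers(phi_1, S))  # False True
```

Neither input covers $S$ on its own, but their composition does, which is exactly the situation the localized ensemble is meant to handle.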
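[Figure: a result clustering φ1 (one of φ1 … φn) composed of the input clusterings φ11 and φ12, which cover the items a–g under tags such as alternative, jazz, free jazz, funk, pop and hip hop]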
12 Localized Alternative Cluster Ensembles
- Quality of a single result clustering
  - Sum of the similarities of the items in $S$ to their best-matching items in an input clustering
  - The number of input clusterings used to cover the items in $S$ should be minimized
  - Each input clustering is used only once
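A possible reading of the single-result quality, sketched in Python. It assumes items are described by numeric feature vectors (for audio files, e.g. extracted audio features) and that similarity is cosine similarity; the slide fixes neither. The hypothetical `single_quality` returns the similarity sum together with the number of input clusterings used, leaving it to the caller how to trade the two off; keying the inputs by name means no clustering can be counted twice.

```python
import math
from typing import Dict, Sequence, Set, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def single_quality(
    S: Set[str],
    used_inputs: Dict[str, Set[str]],  # name -> items covered by that input clustering
    features: Dict[str, Vector],
) -> Tuple[float, int]:
    """Sum of similarities of items in S to their best match among the used inputs,
    plus the number of input clusterings used (which should be kept small)."""
    # Dict keys are unique, so each input clustering contributes at most once.
    pool = [y for items in used_inputs.values() for y in items]
    total = 0.0
    for x in S:
        total += max((cosine(features[x], features[y]) for y in pool), default=0.0)
    return total, len(used_inputs)


features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "x": [1.0, 0.0], "y": [0.6, 0.8]}
score, n_used = single_quality({"x", "y"}, {"phi_11": {"a", "b"}}, features)
print(round(score, 3), n_used)  # 1.8 1
```

Item `x` matches `a` perfectly (1.0) and `y` is closest to `b` (0.8), so one input clustering yields a similarity sum of about 1.8.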
Overall quality
13 LACE Algorithm
[Figure: the query items (a–g, x, y) and two input clusterings, φ11 and φ12, with tags such as alternative, jazz, free jazz, funk, pop and hip hop]
14 LACE Algorithm
The best-matching cluster node is selected by F-measure.
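The selection of the best-matching node can be sketched as follows, under a simplifying assumption: a query item counts as contained in a node if the same item identifier appears in that node's subtree (the system itself may match items by similarity instead). A node's extent includes its descendants, and the root `()` stands for the whole input clustering. `node_extents`, `f_measure` and `best_node` are hypothetical names.

```python
from typing import Dict, Optional, Set, Tuple

Path = Tuple[str, ...]
Clustering = Dict[str, Path]  # item -> path of tags in the hierarchy


def node_extents(phi: Clustering) -> Dict[Path, Set[str]]:
    """Map every node (including the root ()) to the items in its subtree."""
    extents: Dict[Path, Set[str]] = {}
    for item, path in phi.items():
        for depth in range(len(path) + 1):
            extents.setdefault(path[:depth], set()).add(item)
    return extents


def f_measure(query: Set[str], extent: Set[str]) -> float:
    """Harmonic mean of precision |Q & E| / |E| and recall |Q & E| / |Q|."""
    hits = len(query & extent)
    if hits == 0:
        return 0.0
    precision, recall = hits / len(extent), hits / len(query)
    return 2 * precision * recall / (precision + recall)


def best_node(query: Set[str], inputs: Dict[str, Clustering]) -> Optional[Tuple[float, str, Path]]:
    """Return (F-measure, input clustering name, node) of the best match, or None."""
    best: Optional[Tuple[float, str, Path]] = None
    for name, phi in inputs.items():
        for node, extent in node_extents(phi).items():
            f = f_measure(query, extent)
            if f > 0 and (best is None or f > best[0]):
                best = (f, name, node)
    return best


inputs = {
    "phi_11": {"a": ("jazz",), "b": ("jazz", "free jazz"), "c": ("funk",)},
    "phi_12": {"d": ("pop",), "f": ("pop", "hip hop"), "x": ("pop",)},
}
print(best_node({"a", "b", "d", "f"}, inputs))
```

Here the `jazz` node of `phi_11` wins with F = 2/3: it contains only query items (precision 1) and half of them (recall 1/2), whereas `pop` in `phi_12` also contains the non-query item `x`.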
15 LACE Algorithm
Items that are sufficiently similar to items in the best-matching clustering are deleted from the query set.
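A minimal sketch of the deletion step, assuming cosine similarity on feature vectors and a fixed similarity threshold; the value 0.9 is an arbitrary illustrative choice, and `remove_covered` is a hypothetical name.

```python
import math
from typing import Dict, Sequence, Set

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def remove_covered(
    query: Set[str],
    matched_items: Set[str],      # items under the best-matching cluster node
    features: Dict[str, Vector],
    threshold: float = 0.9,       # hypothetical "sufficiently similar" cut-off
) -> Set[str]:
    """Return the query items that have NO sufficiently similar item in the match."""
    remaining = set()
    for x in query:
        best = max((cosine(features[x], features[y]) for y in matched_items), default=0.0)
        if best < threshold:
            remaining.add(x)  # not covered yet: stays in the next query
    return remaining


features = {"a": [1.0, 0.0], "q1": [1.0, 0.1], "q2": [0.0, 1.0]}
print(remove_covered({"q1", "q2"}, {"a"}, features))  # {'q2'}
```

`q1` is almost parallel to `a` (similarity about 0.995) and is therefore removed, while `q2` stays in the query set.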
16 LACE Algorithm
A new query is posed containing the remaining items. Only tags not used yet are considered.
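Setting up the next query amounts to two set operations. This sketch reads "tags not used yet" as the input tag structures (clusterings) not yet used for the current result, consistent with using each input clustering only once; `next_query` and its arguments are hypothetical.

```python
from typing import Dict, Set, Tuple

Clustering = Dict[str, Tuple[str, ...]]  # item -> path of tags


def next_query(
    query: Set[str],
    covered: Set[str],
    inputs: Dict[str, Clustering],
    used: Set[str],
) -> Tuple[Set[str], Dict[str, Clustering]]:
    """Build the next query: uncovered items, matched only against unused inputs."""
    remaining = query - covered
    pool = {name: phi for name, phi in inputs.items() if name not in used}
    return remaining, pool


inputs = {
    "phi_11": {"a": ("jazz",), "b": ("jazz", "free jazz")},
    "phi_12": {"d": ("pop",), "f": ("pop", "hip hop")},
}
remaining, pool = next_query({"a", "b", "d", "f"}, {"a", "b"}, inputs, {"phi_11"})
print(sorted(remaining), list(pool))  # ['d', 'f'] ['phi_12']
```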
17 LACE Algorithm
The process continues until all items are covered, no additional match is possible, or a maximal number of rounds is reached.
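Putting the steps together, the construction of one result clustering can be sketched as a greedy loop with the three stopping conditions from the slide. The matcher and the deletion step are passed in as functions; the usage example plugs in a deliberately simple stand-in matcher over flat clusterings (largest overlap of item identifiers), not the F-measure matching of the actual method. All names are hypothetical.

```python
from collections.abc import Hashable
from typing import Callable, Dict, List, Optional, Set, Tuple

# A match is (score, input clustering name, node, items under that node).
Match = Tuple[float, str, Hashable, Set[str]]
Matcher = Callable[[Set[str], Dict[str, dict]], Optional[Match]]


def build_result(
    query: Set[str],
    inputs: Dict[str, dict],
    find_best: Matcher,
    remove_covered: Callable[[Set[str], Set[str]], Set[str]],
    max_rounds: int = 10,
) -> Tuple[List[Tuple[str, Hashable]], Set[str]]:
    """Greedily assemble one result clustering from nodes of the input clusterings."""
    remaining, unused = set(query), dict(inputs)
    used: List[Tuple[str, Hashable]] = []
    for _ in range(max_rounds):                  # stop: maximal number of rounds
        if not remaining or not unused:          # stop: all covered / no inputs left
            break
        match = find_best(remaining, unused)
        if match is None:                        # stop: no additional match possible
            break
        _, name, node, node_items = match
        used.append((name, node))
        del unused[name]                         # each input clustering is used only once
        remaining = remove_covered(remaining, node_items)
    return used, remaining


def overlap_match(query: Set[str], inputs: Dict[str, dict]) -> Optional[Match]:
    """Stand-in matcher for flat clusterings: the tag sharing the most items wins."""
    best: Optional[Match] = None
    for name, phi in inputs.items():
        for tag in sorted(set(phi.values())):
            items = {x for x, t in phi.items() if t == tag}
            score = float(len(query & items))
            if score and (best is None or score > best[0]):
                best = (score, name, tag, items)
    return best


inputs = {"phi_11": {"a": "jazz", "b": "jazz", "c": "funk"},
          "phi_12": {"d": "pop", "f": "pop"}}
used, left = build_result({"a", "b", "d", "f", "g"}, inputs, overlap_match,
                          lambda rem, items: rem - items)
print(used, left)  # [('phi_11', 'jazz'), ('phi_12', 'pop')] {'g'}
```

Item `g` is not covered by any input clustering, so it is left over for the classification step.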
18 LACE Algorithm
Remaining items are added by classification.
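The classification step can be sketched with a 1-nearest-neighbour rule: each leftover item joins the node of the closest item already placed in the result. Both the choice of 1-NN and the Euclidean distance on feature vectors are assumptions for illustration; `classify_remaining` is a hypothetical name. `math.dist` requires Python 3.8 or later.

```python
import math
from typing import Dict, Sequence, Set, Tuple

Path = Tuple[str, ...]


def classify_remaining(
    remaining: Set[str],
    labelled: Dict[str, Path],               # items already placed in the result
    features: Dict[str, Sequence[float]],
) -> Dict[str, Path]:
    """Attach each leftover item to the node of its nearest labelled item (1-NN)."""
    assigned: Dict[str, Path] = {}
    for x in sorted(remaining):
        nearest = min(labelled, key=lambda y: math.dist(features[x], features[y]))
        assigned[x] = labelled[nearest]
    return assigned


labelled = {"a": ("jazz",), "d": ("pop",)}
features = {"a": (0.0, 0.0), "d": (10.0, 10.0), "g": (1.0, 2.0), "e": (9.0, 8.0)}
result = classify_remaining({"g", "e"}, labelled, features)
print(result)  # {'e': ('pop',), 'g': ('jazz',)}
```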
19 LACE Algorithm
The process starts anew until no more matches are possible or the maximal number of results is reached.
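The outer loop that yields several alternatives can be sketched as follows. The slide does not say how successive results are kept different; this sketch assumes that input clusterings used by earlier results are removed from the pool, and it uses a stand-in builder that takes the single input sharing most items. `lace_alternatives`, `max_results` and `best_overlap` are hypothetical names.

```python
from typing import Callable, Dict, List, Set

Builder = Callable[[Set[str], Dict[str, Set[str]]], List[str]]


def lace_alternatives(
    query: Set[str],
    inputs: Dict[str, Set[str]],   # name -> items covered by that input clustering
    build_one: Builder,
    max_results: int = 5,
) -> List[List[str]]:
    """Repeatedly build result clusterings until no match is left or max_results is hit."""
    pool = dict(inputs)
    results: List[List[str]] = []
    while pool and len(results) < max_results:
        used = build_one(query, pool)
        if not used:                  # no more matches possible
            break
        results.append(used)
        for name in used:             # assumption: later results use fresh inputs only
            pool.pop(name, None)
    return results


def best_overlap(query: Set[str], pool: Dict[str, Set[str]]) -> List[str]:
    """Stand-in builder: use the single input clustering that shares most items."""
    name, items = max(pool.items(), key=lambda kv: len(query & kv[1]))
    return [name] if query & items else []


inputs = {"music_1": {"a", "b", "c"}, "places": {"b", "c", "z"},
          "music_2": {"a"}, "other": {"z"}}
print(lace_alternatives({"a", "b", "c"}, inputs, best_overlap))
# [['music_1'], ['places'], ['music_2']]
```

The loop stops once only `other` is left, because it shares no item with the query; with `max_results=2` it would stop after `places`.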
20 LACE Algorithm
[Figure: further alternative result clusterings φ2, φ3, …, φk appear next to φ1, one of them using the tags home, work, office and plane]
21 LACE Algorithm
[Figure: the result clusterings φ1, φ2, φ3, …, φk, with the input clusterings obtained from a P2P network]
22 Evaluation: Leave One Clustering Out
- Given an evaluation set of clusterings
- Iteratively delete each clustering from this set and cluster its items using the remaining clusterings
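[Figure: an evaluation set of tag structures over music (pop, jazz, alternative, independent, hip hop, funk, free jazz) and places (home, work, office, plane); in turn, one structure is held out (marked "?") and rebuilt from the others]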
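The leave-one-clustering-out protocol itself is independent of the clustering method and the comparison measure, so this sketch takes both as parameters. In the usage example, `copy_labels` is a naive stand-in for LACE and the Rand index (pairwise agreement) is a stand-in score; neither is what the evaluation actually used. All names are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, Set

Flat = Dict[str, str]  # item -> cluster label


def leave_one_clustering_out(
    clusterings: Dict[str, Flat],
    cluster_fn: Callable[[Set[str], Dict[str, Flat]], Flat],
    score_fn: Callable[[Flat, Flat], float],
) -> Dict[str, float]:
    """Hold out each clustering, re-cluster its items from the others, and score."""
    scores: Dict[str, float] = {}
    for name, held_out in clusterings.items():
        others = {n: c for n, c in clusterings.items() if n != name}
        predicted = cluster_fn(set(held_out), others)
        scores[name] = score_fn(held_out, predicted)
    return scores


def copy_labels(items: Set[str], others: Dict[str, Flat]) -> Flat:
    """Naive stand-in for LACE: take the label from the first other clustering
    containing the item; items found nowhere share the label "unassigned"."""
    out: Flat = {}
    for x in items:
        out[x] = next((f"{n}:{c[x]}" for n, c in others.items() if x in c), "unassigned")
    return out


def rand_index(truth: Flat, pred: Flat) -> float:
    """Fraction of item pairs on which both clusterings agree (same vs. different cluster)."""
    pairs = list(combinations(sorted(truth), 2))
    agree = sum((truth[x] == truth[y]) == (pred[x] == pred[y]) for x, y in pairs)
    return agree / len(pairs) if pairs else 1.0


clusterings = {
    "u1": {"a": "jazz", "b": "jazz", "c": "pop"},
    "u2": {"a": "J", "b": "J", "c": "P", "d": "P"},
    "u3": {"c": "x", "d": "x"},
}
print(leave_one_clustering_out(clusterings, copy_labels, rand_index))
```

Holding out `u1` gives perfect agreement, `u2` gives 5/6, and `u3` gives 0, because its two items are copied from two different users and end up in different clusters.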
26 Evaluation
- Nemoz (Networked Media Organizer)
- 39 user-created tag structures on 1830 audio files
- Comparison of clusterings
  - F-measure (best match)
  - Correlation of tree distances
  - Absolute distance of tree distances
- The best result of the 5 returned clusterings is used
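Two of the comparison measures can be sketched on item pairs. The sketch assumes that the tree distance of two items is the number of edges between their nodes, that "correlation" means the Pearson correlation over all item pairs, and that the absolute distance is averaged over pairs (a sum would only rescale it); `tree_distance` and `compare_trees` are hypothetical names.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Path = Tuple[str, ...]


def tree_distance(p: Path, q: Path) -> int:
    """Number of edges between two nodes given as root-to-node tag paths."""
    common = 0
    for a, b in zip(p, q):
        if a != b:
            break
        common += 1
    return len(p) + len(q) - 2 * common


def pair_distances(phi: Dict[str, Path], items: List[str]) -> List[int]:
    """Tree distance for every unordered pair of items."""
    return [tree_distance(phi[x], phi[y]) for x, y in combinations(items, 2)]


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation (0.0 if either sequence is constant)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def compare_trees(truth: Dict[str, Path], pred: Dict[str, Path]) -> Tuple[float, float]:
    """Correlation and mean absolute difference of pairwise tree distances."""
    items = sorted(truth)
    dt, dp = pair_distances(truth, items), pair_distances(pred, items)
    if not dt:  # fewer than two items: no pairs to compare
        return 0.0, 0.0
    mad = sum(abs(a - b) for a, b in zip(dt, dp)) / len(dt)
    return pearson(dt, dp), mad


truth = {"a": ("jazz",), "b": ("jazz", "free jazz"), "c": ("pop",)}
pred = {"a": ("J",), "b": ("J",), "c": ("P",)}
corr, mad = compare_trees(truth, pred)
print(round(corr, 3), round(mad, 3))  # 0.866 0.667
```

Following the slide, one would compute these scores for each of the 5 returned clusterings and keep the best one (highest correlation, lowest absolute distance).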
27 Evaluation
28 Conclusion
- Importance of different views
- Exploit the work of others
- Collaborative Structuring
- Localized Alternative Cluster Ensembles
- Emerging tag structures, accuracy, diversity
Thank you for your attention!