Cross-partition Clustering: Revealing Analogous Themes across Related Topics
1
Cross-partition Clustering
Revealing Analogous Themes across Related Topics
Zvika Marx, Ido Dagan, Eli Shamir
2
Outline
  • Motivation: identifying analogies / correspondences across different domains, a fascinating aspect of intelligence!
  • Computational framework: revealing concepts / themes through word clustering
  • Probabilistic clustering: the Information Distortion / Information Bottleneck methods
  • The Cross-Partition Clustering method
  • Algorithm and underlying principles
  • Application to cross-religion comparison

3
What is Analogy?
  • Non-obvious similarity or correspondence
  • Principled, deep, systematic relation (in contrast to mere appearance)
  • Analogies are discovered or revealed
  • Related to problem solving
  • Require insight, creativity, fluid thinking
  • Concept formation, feature selection?

4
Example: Orchestra and Army
[figure: orchestra/army correspondence (surviving label: General)]
5
Cross-partition Clustering Setting
  • Given: data pre-partitioned into distinct subsets w1, ..., wN, N ≥ 2
  • Goal: revealing themes that cut across all subsets
  [figure: subsets w1, w2, ..., wN, each with its own partition; themes cut across the subsets]
  • Soft (probabilistic) assignments
  • (the formalism may even allow soft pre-partitioning)
Previous works: Dagan, Marx & Shamir, CoNLL 2002; Marx, Dagan & Shamir, NIPS 2003
6
Background Probabilistic Clustering
  • Input:
  • for each x ∈ X (clustered element): relative frequency p(x)
  • for each x and each y ∈ Y (feature): conditional occurrence probability p(y|x)
  • Output:
  • for each x and each cluster c: assignment probability p(c|x)
  • (and, for each c, a distribution over the y's, p(y|c): a centroid representation of c in the feature space)
7
Prob. Clustering: Iterative Loop
  • K-means-style iterative loop:
  • (1) Recalculate assignment probabilities p_t(c|x):
        p_t(c|x) ∝ p_{t−1}(c) · exp(−β · KL[p(y|x) ‖ p_{t−1}(y|c)])
    exponentially inverse to the KL divergence between the representative feature distributions of x and c
  • (2) Recalculate cluster centroids p_t(y|c) for each cluster c:
        p_t(y|c) ∝ Σ_x p(x) p_t(c|x) p(y|x)
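The two update steps above can be sketched in NumPy (a minimal illustration under our own variable names, not the authors' implementation):

```python
import numpy as np

def id_ib_loop(p_x, p_y_given_x, n_clusters, beta=1.0, n_iter=50, seed=0):
    """Sketch of the ID/IB-style iterative loop.

    p_x:         (X,) element relative frequencies
    p_y_given_x: (X, Y) conditional feature distributions, rows sum to 1
    Returns assignment probabilities p(c|x), shape (X, C),
    and centroids p(y|c), shape (C, Y).
    """
    eps = 1e-12
    rng = np.random.default_rng(seed)
    n_x = p_y_given_x.shape[0]
    # random soft initialization of p(c|x)
    p_c_given_x = rng.dirichlet(np.ones(n_clusters), size=n_x)
    for _ in range(n_iter):
        # (2) centroids: p_t(y|c) ∝ Σ_x p(x) p_t(c|x) p(y|x)
        joint = p_c_given_x * p_x[:, None]                 # (X, C)
        p_y_given_c = joint.T @ p_y_given_x                # (C, Y)
        p_y_given_c /= p_y_given_c.sum(axis=1, keepdims=True)
        # (1) assignments: p_t(c|x) ∝ p_{t-1}(c) exp(-β KL[p(y|x) ‖ p(y|c)])
        p_c = joint.sum(axis=0)                            # prior p_{t-1}(c)
        cross = -p_y_given_x @ np.log(p_y_given_c.T + eps)       # (X, C)
        ent = -(p_y_given_x * np.log(p_y_given_x + eps)).sum(1)  # (X,)
        kl = cross - ent[:, None]                          # KL = cross-entropy - entropy
        p_c_given_x = p_c[None, :] * np.exp(-beta * kl)
        p_c_given_x /= p_c_given_x.sum(axis=1, keepdims=True)
    return p_c_given_x, p_y_given_c
```

Including the `p_c` factor gives the IB-style update with the prior; dropping it yields the ID variant.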
8
Information Distortion / Bottleneck
  • Data clustering interpreted as a means for conveying the relevant information in the data
  • It maximizes the information the clusters provide about what's relevant (i.e., the feature variable Y)
  • Subject to the above, minimize the information provided by the data about the clusters ⇒ Maximum Entropy
  • Optimization problem:
        argmin  H(C) − H(C|X) + β·H(Y|C)
        over all p(c), p(c|x), p(y|c) distributions
  • The optional H(C) term differentiates IB (with) from ID (without)
  • The K-means-style iterative steps are derived from this term
9
ID/IB Convergence
  • The minimized term H(C) − H(C|X) + β·H(Y|C) is bounded
  • Each iterative loop reduces its value (at time t) by
        KL[p_t(c) ‖ p_{t−1}(c)]
        + Σ_x p(x) · KL[p_{t−1}(c|x) ‖ p_t(c|x)]
        + β · Σ_c p_t(c) · KL[p_t(y|c) ‖ p_{t−1}(y|c)]  > 0
  • ⇒ convergence to a configuration of distributions that (locally) minimizes the term
10
CP Clustering Setting (reminder)
  • Data pre-partitioned into w1, ..., wN, N ≥ 2
  • Goal: revealing themes that cut across all subsets
  [figure: subsets w1, w2, ..., wN, each with its own partition; themes cut across the subsets]
  • Soft (probabilistic) assignments
  • (the formalism may also allow soft pre-partitioning)
11
Cross-partition Clustering
  • Input, additional to p(x), p(y|x):
  • for each x and each w ∈ W (pre-given subset): probability of assignment to the pre-given subset, p(w|x)
  • In principle, p(w|x) can be any probability distribution over W; in our experiments: 0/1 hard partitioning
  • Output:
  • p(c|x) (as in any probabilistic clustering)
  • A novel aspect: re-associating (re-assigning) features to clusters, p(c|y)
  • Two types of centroids: p(y|c,w), p(y|c)

12
The Cross-Partition Algorithm
  • (1) Assignment probabilities p_t(c|x) (as in IB/ID):
        p_t(c|x) ∝ exp(−β · KL[p(y|x) ‖ p_{t−1}(y|c)])
  • (2) Recalculate w-projected local centroids:
        p_t(y|c,w) ∝ Σ_x p(x) p_t(c|x) p(y|x) p(w|x)
  • (3) Re-associate features with clusters:
        p_t(c|y) ∝ Π_w p_t(y|c,w)^p(w)
  • (4) Biased centroids, based on the above:
        p_t(y|c) ∝ p_t(c|y) p(y)
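The four steps can be sketched in NumPy (a minimal illustration under our own variable names, not the authors' implementation; the unpriored variant, with p(w|x) given as input):

```python
import numpy as np

def cp_step(p_x, p_y_given_x, p_w_given_x, p_y_given_c, beta=1.0):
    """One iteration of the four cross-partition update steps (sketch).

    Shapes: p_x (X,), p_y_given_x (X, Y), p_w_given_x (X, W), p_y_given_c (C, Y).
    """
    eps = 1e-12
    # (1) p_t(c|x) ∝ exp(-β KL[p(y|x) ‖ p_{t-1}(y|c)])
    cross = -p_y_given_x @ np.log(p_y_given_c.T + eps)        # (X, C)
    ent = -(p_y_given_x * np.log(p_y_given_x + eps)).sum(1)   # (X,)
    p_c_given_x = np.exp(-beta * (cross - ent[:, None]))
    p_c_given_x /= p_c_given_x.sum(1, keepdims=True)

    # (2) local centroids: p_t(y|c,w) ∝ Σ_x p(x) p_t(c|x) p(y|x) p(w|x)
    p_y_given_cw = np.einsum('x,xc,xy,xw->cwy', p_x, p_c_given_x,
                             p_y_given_x, p_w_given_x)
    p_y_given_cw /= p_y_given_cw.sum(-1, keepdims=True) + eps

    # (3) re-association: p_t(c|y) ∝ Π_w p_t(y|c,w)^p(w), a weighted geometric mean
    p_w = p_x @ p_w_given_x
    p_c_given_y = np.exp(np.einsum('w,cwy->yc', p_w,
                                   np.log(p_y_given_cw + eps)))  # (Y, C)
    p_c_given_y /= p_c_given_y.sum(1, keepdims=True)

    # (4) biased centroids: p_t(y|c) ∝ p_t(c|y) p(y)
    p_y = p_x @ p_y_given_x
    p_y_given_c_new = (p_c_given_y * p_y[:, None]).T             # (C, Y)
    p_y_given_c_new /= p_y_given_c_new.sum(1, keepdims=True)
    return p_c_given_x, p_y_given_cw, p_c_given_y, p_y_given_c_new
```

The geometric mean in step (3) is what pushes the feature-to-cluster associations toward independence of W.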
13
Cross-partition Principles
  • Assignments p(c|x) as in IB/ID (relying on the MaxEnt principle)
  • Look for C that, jointly with W, is informative about the Y distribution
  • ⇒ local centroids p(y|c,w)
  • Re-associate features with clusters, p(c|y), so that the associations are independent of W
  • (again relying on the MaxEnt principle)

14
4 Different Terms to Optimize
(1) F_CP1 = H(C) − H(C|X) + β·H(Y|C)
    H(Y|C) ≡ − Σ_x p(x) Σ_c p(c|x) Σ_y p(y|x) log p(y|c)
(2) F_CP2 = H(Y|C,W)   (assuming I(C;Y;W|X) = 0)
    ≡ − Σ_x p(x) Σ_c p(c|x) Σ_y p(y|x) Σ_w p(w|x) log p(y|c,w)
(3) F_CP3 = H(C) − H(C|Y) + β·H(Y|C,W)
    H(C|Y) ≡ − Σ_y p(y) Σ_c p(c|y) log p(c|y)
    H(Y|C,W) ≡ − Σ_w p(w) Σ_y p(y) Σ_c p(c|y) log p(y|c,w)
(4) F_CP4 = H(Y|C) ≡ − Σ_y p(y) Σ_c p(c|y) log p(y|c)
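As a concrete reading of the H(Y|C) term appearing in F_CP1, a minimal sketch (function and variable names are ours) that evaluates it from the four distributions:

```python
import numpy as np

def h_y_given_c(p_x, p_c_given_x, p_y_given_x, p_y_given_c, eps=1e-12):
    """H(Y|C) = -Σ_x p(x) Σ_c p(c|x) Σ_y p(y|x) log p(y|c)  (sketch).

    Shapes: p_x (X,), p_c_given_x (X, C), p_y_given_x (X, Y), p_y_given_c (C, Y).
    """
    return -np.einsum('x,xc,xy,cy->', p_x, p_c_given_x, p_y_given_x,
                      np.log(p_y_given_c + eps))
```

Sanity check: with a single cluster whose centroid equals the marginal p(y), the term reduces to the plain entropy H(Y).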
15
Dynamics: ID/IB versus CP
[diagram: Information Distortion alternates between −H(C|X) and H(Y|C); Cross Partition cycles through −H(C|X), H(Y|C,W), −H(C|Y), and H(Y|C)]
16
Priored and Unpriored Variants
  • As in the IB method, it is possible to add a prior in the iterative-cycle update steps (depending on the exact terms being optimized):
  • In step (1):
        p_t(c|x) ∝ p_{t−1}(c) · exp(−β · KL[p(y|x) ‖ p_{t−1}(y|c)])
  • In step (3):
        p_t(c|y) ∝ p_t(c) · Π_w p_{t−1}(y|c,w)^p(w)
  • So there are four variants of the CP method (with/without a prior in steps (1)/(3))
  • The previous works mentioned before (CoNLL 2002, NIPS 2004) in fact implement two of these variants
17
Religion Data
  • Given: 5 corpora focused on religions: Buddhism, Christianity, Hinduism, Islam and Judaism
  • Encyclopedic entries, online magazines, introductory web articles
  • Co-occurrence statistics:
  • 200 (automatically extracted) keywords for each religion (the same word appearing in two corpora is taken as two distinct elements)
  • Count feature words (7000) in an undirected ±5 window, truncated by sentence boundaries
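The described co-occurrence counting can be sketched as follows (a simplified illustration with our own names; tokenization and keyword extraction are assumed already done):

```python
from collections import Counter

def cooccurrence_counts(sentences, keywords, feature_words, window=5):
    """Count feature words within an undirected ±`window` of each keyword,
    truncated at sentence boundaries (sketch of the described statistics).

    sentences: list of token lists; keywords, feature_words: sets of strings.
    Returns a Counter mapping (keyword, feature_word) -> count.
    """
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok not in keywords:
                continue
            # window clipped at the sentence boundaries
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i and tokens[j] in feature_words:
                    counts[(tok, tokens[j])] += 1
    return counts
```

Normalizing these counts per keyword yields the p(y|x) input of the clustering methods above.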

18
Clusters Reveal Meaningful Themes
  • Two clusters: spiritual vs. establishment aspects
  • Seven clusters (our titles; highest-p(c|y) features that did not have a dual role as clustered keywords):
  • schools: central, dominant, mainstream, affiliate
  • divinity: omnipotent, almighty, mercy, infinite
  • religious experience: intrinsic, mental, realm, mature
  • writings: commentary, manuscript, dictionary, grammar
  • festivals and rite: annual, funeral, rebuild, feast
  • sin and suffering: vegetable, insect, penalty, quench
  • community and family: parent, nursing, spouse, elderly
  • Interesting correspondence to a classical comparative-religion work, dimensions of the sacred (Smart, 1999):
  • ritual, mythic, experiential/emotional, ethical, social, material

19
Evaluation
  • Measure how well the output clusters capture expert clusters of freely chosen keywords
  • We did not instruct the experts on how to choose the words
  • Three experts participated; each one of them covered a different subset of all possible religion comparisons

20
Example: Good Match to Expert
  • Our "sacred writings" cross-partition cluster
  • terms used by the expert are underlined
  • first 15 words per religion shown (p(c|x) indicated)
  • Corresponding expert cluster (a marked hit scores high in another cluster)

21
Example: Poor Match to Expert
  • Our "suffer, sin and punishment" CP cluster
  • The "mysticism" expert cluster was the most closely relevant

22
Quantitative Evaluation
  • Comparing religion pairs (|W| = 2), hard clustering
  • Evaluation restricted to terms common to two experts
  • Jaccard coefficient: the proportion between
  • the number of pairs of words co-assigned to the same cluster (no matter which) by both expert and algorithm
  • and that number plus the number of pairs co-assigned by one but not by the other
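The pairwise Jaccard score just described can be sketched as (our own helper, assuming hard clusterings given as word-to-label dicts):

```python
from itertools import combinations

def pairwise_jaccard(clustering_a, clustering_b):
    """Jaccard score between two hard clusterings over their shared words.

    clustering_*: dict mapping word -> cluster label.
    Score = |pairs co-assigned by both| /
            (|pairs co-assigned by both| + |pairs co-assigned by exactly one|).
    """
    words = sorted(set(clustering_a) & set(clustering_b))
    both = one = 0
    for u, v in combinations(words, 2):
        a = clustering_a[u] == clustering_a[v]
        b = clustering_b[u] == clustering_b[v]
        if a and b:
            both += 1
        elif a != b:
            one += 1
    return both / (both + one) if (both + one) else 0.0
```

Pairs that neither clustering co-assigns are ignored, so the score is insensitive to how the two clusterings label their clusters.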
23
Conclusion
  • Focus on particular features: ones that are important in the context of identifying commonalities across domains
  • Applicable to real-world data
  • Principled information-theoretic approach
  • For future work:
  • Detect relational structure
  • Problem solving
  • More applications: commercial products, legal

24
Thank You
  • For your time
  • (last session)
  • For your patience
  • (last session!)