Cross-partition Clustering: Revealing Analogous Themes across Related Topics
1
Cross-partition Clustering
Revealing Analogous Themes across Related Topics
Zvika Marx, Ido Dagan, Eli Shamir
2
Outline
  • Motivation: identifying analogies / correspondences across different domains, a fascinating aspect of intelligence!
  • Computational framework: revealing concepts / themes through word clustering
  • Probabilistic clustering: the Information Distortion / Information Bottleneck methods
  • The Cross-Partition Clustering method
  • Algorithm and underlying principles
  • Application to cross-religion comparison

3
What is Analogy?
  • Non-obvious similarity or correspondence
  • Principled, deep, systematic relation (in contrast to mere appearance)
  • Analogies are discovered or revealed
  • Related to problem solving
  • Require insight, creativity, fluid thinking
  • Concept formation, feature selection?

4
Example: Orchestra and Army
[figure: orchestra/army correspondence (surviving label: General)]
5
Cross-partition Clustering Setting
  • Given: data pre-partitioned into distinct subsets w1, ..., wN, N ≥ 2
  • Goal: revealing themes that cut across all subsets
  [figure: subsets w1, w2, ..., wN, each with its own partition; themes cut across the subsets]
  • Soft (probabilistic) assignments
  • (the formalism may even allow soft pre-partitioning)
Previous works: Dagan, Marx & Shamir, CoNLL 2002; Marx, Dagan & Shamir, NIPS 2003
6
Background Probabilistic Clustering
  • Input:
  • for each x ∈ X (clustered element): relative frequency p(x)
  • for each x and each y ∈ Y (feature): conditional occurrence probability p(y|x)
  • Output:
  • for each x and each cluster c: assignment probability p(c|x)
  • (and, for each c, a distribution over the y's, p(y|c): a centroid representation of c in the feature space)
7
Prob. Clustering: Iterative Loop
  • K-means-style iterative loop:
  • (1) Recalculate assignment probabilities p_t(c|x):
        p_t(c|x) ∝ p_{t−1}(c) · exp(−β · KL[p(y|x) ‖ p_{t−1}(y|c)])
    exponentially inverse to the KL divergence between the representative feature distributions of x and c
  • (2) Recalculate cluster centroids p_t(y|c) for each cluster c:
        p_t(y|c) ∝ Σ_x p(x) p_t(c|x) p(y|x)
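The two update steps above can be sketched in NumPy (a minimal illustration under our own variable names, not the authors' implementation):

```python
import numpy as np

def id_ib_loop(p_x, p_y_given_x, n_clusters, beta=1.0, n_iter=50, seed=0):
    """Sketch of the ID/IB-style iterative loop.

    p_x:         (X,) element relative frequencies
    p_y_given_x: (X, Y) conditional feature distributions, rows sum to 1
    Returns assignment probabilities p(c|x), shape (X, C),
    and centroids p(y|c), shape (C, Y).
    """
    eps = 1e-12
    rng = np.random.default_rng(seed)
    n_x = p_y_given_x.shape[0]
    # random soft initialization of p(c|x)
    p_c_given_x = rng.dirichlet(np.ones(n_clusters), size=n_x)
    for _ in range(n_iter):
        # (2) centroids: p_t(y|c) ∝ Σ_x p(x) p_t(c|x) p(y|x)
        joint = p_c_given_x * p_x[:, None]                 # (X, C)
        p_y_given_c = joint.T @ p_y_given_x                # (C, Y)
        p_y_given_c /= p_y_given_c.sum(axis=1, keepdims=True)
        # (1) assignments: p_t(c|x) ∝ p_{t-1}(c) exp(-β KL[p(y|x) ‖ p(y|c)])
        p_c = joint.sum(axis=0)                            # prior p_{t-1}(c)
        cross = -p_y_given_x @ np.log(p_y_given_c.T + eps)       # (X, C)
        ent = -(p_y_given_x * np.log(p_y_given_x + eps)).sum(1)  # (X,)
        kl = cross - ent[:, None]                          # KL = cross-entropy - entropy
        p_c_given_x = p_c[None, :] * np.exp(-beta * kl)
        p_c_given_x /= p_c_given_x.sum(axis=1, keepdims=True)
    return p_c_given_x, p_y_given_c
```

Including the `p_c` factor gives the IB-style update with the prior; dropping it yields the ID variant.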
8
Information Distortion / Bottleneck
  • Data clustering interpreted as a means for conveying the relevant information in the data
  • It maximizes the information the clusters provide about what's relevant (i.e., the feature variable Y)
  • Subject to the above, minimize the information provided by the data about the clusters ⇒ Maximum Entropy
  • Optimization problem:
        argmin  H(C) − H(C|X) + β·H(Y|C)
        over all p(c), p(c|x), p(y|c) distributions
  • The optional H(C) term differentiates IB (with) from ID (without)
  • The K-means-style iterative steps are derived from this term
9
ID/IB Convergence
  • The minimized term H(C) − H(C|X) + β·H(Y|C) is bounded
  • Each iterative loop reduces its value (at time t) by
        KL[p_t(c) ‖ p_{t−1}(c)]
        + Σ_x p(x) · KL[p_{t−1}(c|x) ‖ p_t(c|x)]
        + β · Σ_c p_t(c) · KL[p_t(y|c) ‖ p_{t−1}(y|c)]  > 0
  • ⇒ convergence to a configuration of distributions that (locally) minimizes the term
10
CP Clustering Setting (reminder)
  • Data pre-partitioned into w1, ..., wN, N ≥ 2
  • Goal: revealing themes that cut across all subsets
  [figure: subsets w1, w2, ..., wN, each with its own partition; themes cut across the subsets]
  • Soft (probabilistic) assignments
  • (the formalism may also allow soft pre-partitioning)
11
Cross-partition Clustering
  • Input, additional to p(x), p(y|x):
  • for each x and each w ∈ W (pre-given subset): probability of assignment to the pre-given subset, p(w|x)
  • In principle, p(w|x) can be any probability distribution over W; in our experiments: 0/1 hard partitioning
  • Output:
  • p(c|x) (as in any probabilistic clustering)
  • A novel aspect: re-associating (re-assigning) features to clusters, p(c|y)
  • Two types of centroids: p(y|c,w), p(y|c)

12
The Cross-Partition Algorithm
  • (1) Assignment probabilities p_t(c|x) (as in IB/ID):
        p_t(c|x) ∝ exp(−β · KL[p(y|x) ‖ p_{t−1}(y|c)])
  • (2) Recalculate w-projected local centroids:
        p_t(y|c,w) ∝ Σ_x p(x) p_t(c|x) p(y|x) p(w|x)
  • (3) Re-associate features with clusters:
        p_t(c|y) ∝ Π_w p_t(y|c,w)^p(w)
  • (4) Biased centroids, based on the above:
        p_t(y|c) ∝ p_t(c|y) p(y)
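The four steps can be sketched in NumPy (a minimal illustration under our own variable names, not the authors' implementation; the unpriored variant, with p(w|x) given as input):

```python
import numpy as np

def cp_step(p_x, p_y_given_x, p_w_given_x, p_y_given_c, beta=1.0):
    """One iteration of the four cross-partition update steps (sketch).

    Shapes: p_x (X,), p_y_given_x (X, Y), p_w_given_x (X, W), p_y_given_c (C, Y).
    """
    eps = 1e-12
    # (1) p_t(c|x) ∝ exp(-β KL[p(y|x) ‖ p_{t-1}(y|c)])
    cross = -p_y_given_x @ np.log(p_y_given_c.T + eps)        # (X, C)
    ent = -(p_y_given_x * np.log(p_y_given_x + eps)).sum(1)   # (X,)
    p_c_given_x = np.exp(-beta * (cross - ent[:, None]))
    p_c_given_x /= p_c_given_x.sum(1, keepdims=True)

    # (2) local centroids: p_t(y|c,w) ∝ Σ_x p(x) p_t(c|x) p(y|x) p(w|x)
    p_y_given_cw = np.einsum('x,xc,xy,xw->cwy', p_x, p_c_given_x,
                             p_y_given_x, p_w_given_x)
    p_y_given_cw /= p_y_given_cw.sum(-1, keepdims=True) + eps

    # (3) re-association: p_t(c|y) ∝ Π_w p_t(y|c,w)^p(w), a weighted geometric mean
    p_w = p_x @ p_w_given_x
    p_c_given_y = np.exp(np.einsum('w,cwy->yc', p_w,
                                   np.log(p_y_given_cw + eps)))  # (Y, C)
    p_c_given_y /= p_c_given_y.sum(1, keepdims=True)

    # (4) biased centroids: p_t(y|c) ∝ p_t(c|y) p(y)
    p_y = p_x @ p_y_given_x
    p_y_given_c_new = (p_c_given_y * p_y[:, None]).T             # (C, Y)
    p_y_given_c_new /= p_y_given_c_new.sum(1, keepdims=True)
    return p_c_given_x, p_y_given_cw, p_c_given_y, p_y_given_c_new
```

The geometric mean in step (3) is what pushes the feature-to-cluster associations toward independence of W.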
13
Cross-partition Principles
  • Assignments p(c|x) as in IB/ID (relying on the MaxEnt principle)
  • Look for C that, jointly with W, is informative about the Y distribution
  • ⇒ local centroids p(y|c,w)
  • Re-associate features with clusters, p(c|y), so that the associations are independent of W
  • (again relying on the MaxEnt principle)

14
4 Different Terms to Optimize
(1) F_CP1 = H(C) − H(C|X) + β·H(Y|C)
    H(Y|C) ≡ − Σ_x p(x) Σ_c p(c|x) Σ_y p(y|x) log p(y|c)
(2) F_CP2 = H(Y|C,W)   (assuming I(C;Y;W|X) = 0)
    ≡ − Σ_x p(x) Σ_c p(c|x) Σ_y p(y|x) Σ_w p(w|x) log p(y|c,w)
(3) F_CP3 = H(C) − H(C|Y) + β·H(Y|C,W)
    H(C|Y) ≡ − Σ_y p(y) Σ_c p(c|y) log p(c|y)
    H(Y|C,W) ≡ − Σ_w p(w) Σ_y p(y) Σ_c p(c|y) log p(y|c,w)
(4) F_CP4 = H(Y|C) ≡ − Σ_y p(y) Σ_c p(c|y) log p(y|c)
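As a concrete reading of the H(Y|C) term appearing in F_CP1, a minimal sketch (function and variable names are ours) that evaluates it from the four distributions:

```python
import numpy as np

def h_y_given_c(p_x, p_c_given_x, p_y_given_x, p_y_given_c, eps=1e-12):
    """H(Y|C) = -Σ_x p(x) Σ_c p(c|x) Σ_y p(y|x) log p(y|c)  (sketch).

    Shapes: p_x (X,), p_c_given_x (X, C), p_y_given_x (X, Y), p_y_given_c (C, Y).
    """
    return -np.einsum('x,xc,xy,cy->', p_x, p_c_given_x, p_y_given_x,
                      np.log(p_y_given_c + eps))
```

Sanity check: with a single cluster whose centroid equals the marginal p(y), the term reduces to the plain entropy H(Y).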
15
Dynamics: ID/IB versus CP
[diagram: Information Distortion alternates between −H(C|X) and H(Y|C); Cross Partition cycles through −H(C|X), H(Y|C,W), −H(C|Y), and H(Y|C)]
16
Priored and Unpriored Variants
  • As in the IB method, it is possible to add a prior in the iterative-cycle update steps (depending on the exact terms being optimized):
  • In step (1):
        p_t(c|x) ∝ p_{t−1}(c) · exp(−β · KL[p(y|x) ‖ p_{t−1}(y|c)])
  • In step (3):
        p_t(c|y) ∝ p_t(c) · Π_w p_{t−1}(y|c,w)^p(w)
  • So there are four variants of the CP method (with/without a prior in steps (1)/(3))
  • The previous works mentioned before (CoNLL 2002, NIPS 2004) in fact implement two of these variants
17
Religion Data
  • Given: 5 corpora focused on religions: Buddhism, Christianity, Hinduism, Islam and Judaism
  • Encyclopedic entries, online magazines, introductory web articles
  • Co-occurrence statistics:
  • 200 (automatically extracted) keywords for each religion (the same word appearing in two corpora is taken as two distinct elements)
  • Count feature words (7000) in an undirected ±5 window, truncated by sentence boundaries
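The described co-occurrence counting can be sketched as follows (a simplified illustration with our own names; tokenization and keyword extraction are assumed already done):

```python
from collections import Counter

def cooccurrence_counts(sentences, keywords, feature_words, window=5):
    """Count feature words within an undirected ±`window` of each keyword,
    truncated at sentence boundaries (sketch of the described statistics).

    sentences: list of token lists; keywords, feature_words: sets of strings.
    Returns a Counter mapping (keyword, feature_word) -> count.
    """
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok not in keywords:
                continue
            # window clipped at the sentence boundaries
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i and tokens[j] in feature_words:
                    counts[(tok, tokens[j])] += 1
    return counts
```

Normalizing these counts per keyword yields the p(y|x) input of the clustering methods above.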

18
Clusters Reveal Meaningful Themes
  • Two clusters: spiritual vs. establishment aspects
  • Seven clusters (our titles; highest-p(c|y) features that did not have a dual role as clustered keywords):
  • schools: central, dominant, mainstream, affiliate
  • divinity: omnipotent, almighty, mercy, infinite
  • religious experience: intrinsic, mental, realm, mature
  • writings: commentary, manuscript, dictionary, grammar
  • festivals and rite: annual, funeral, rebuild, feast
  • sin and suffering: vegetable, insect, penalty, quench
  • community and family: parent, nursing, spouse, elderly
  • Interesting correspondence to a classical comparative-religion work, dimensions of the sacred (Smart, 1999):
  • ritual, mythic, experiential/emotional, ethical, social, material

19
Evaluation
  • Measure how well the output clusters capture expert clusters of freely chosen keywords
  • We did not instruct the experts on how to choose the words
  • Three experts participated; each one of them covered a different subset of all possible religion comparisons

20
Example: Good Match to Expert
  • Our "sacred writings" cross-partition cluster
  • terms used by the expert are underlined
  • first 15 words per religion shown (p(c|x) indicated)
  • Corresponding expert cluster (a marked hit scores high in another cluster)

21
Example: Poor Match to Expert
  • Our "suffer, sin and punishment" CP cluster
  • The "mysticism" expert cluster was the most closely relevant

22
Quantitative Evaluation
  • Comparing religion pairs (|W| = 2), hard clustering
  • Evaluation restricted to terms common to two experts
  • Jaccard coefficient: the proportion between
  • the number of pairs of words co-assigned to the same cluster (no matter which) by both expert and algorithm
  • and that number plus the number of pairs co-assigned by one but not by the other
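The pairwise Jaccard score just described can be sketched as (our own helper, assuming hard clusterings given as word-to-label dicts):

```python
from itertools import combinations

def pairwise_jaccard(clustering_a, clustering_b):
    """Jaccard score between two hard clusterings over their shared words.

    clustering_*: dict mapping word -> cluster label.
    Score = |pairs co-assigned by both| /
            (|pairs co-assigned by both| + |pairs co-assigned by exactly one|).
    """
    words = sorted(set(clustering_a) & set(clustering_b))
    both = one = 0
    for u, v in combinations(words, 2):
        a = clustering_a[u] == clustering_a[v]
        b = clustering_b[u] == clustering_b[v]
        if a and b:
            both += 1
        elif a != b:
            one += 1
    return both / (both + one) if (both + one) else 0.0
```

Pairs that neither clustering co-assigns are ignored, so the score is insensitive to how the two clusterings label their clusters.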
23
Conclusion
  • Focus on particular features: ones that are important in the context of identifying commonalities across domains
  • Applicable to real-world data
  • Principled information-theoretic approach
  • For future work:
  • Detect relational structure
  • Problem solving
  • More applications: commercial products, legal

24
Thank You
  • For your time
  • (last session)
  • For your patience
  • (last session!)