Title: Predictively Modeling Social Text
1. Predictively Modeling Social Text

- William W. Cohen
- Machine Learning Dept. and Language Technologies Institute, School of Computer Science
- Carnegie Mellon University
- Joint work with Amr Ahmed, Andrew Arnold, Ramnath Balasubramanyan, Frank Lin, Matt Hurst (MSFT), Ramesh Nallapati, Noah Smith, Eric Xing, Tae Yano
2. Newswire Text vs. Social Media Text

Newswire text:
- Formal
- Primary purpose: inform the typical reader about recent events
- Broad audience
- Explicitly establishes shared context with the reader
- Ambiguity is often avoided

Social media text:
- Informal
- Many purposes: entertain, connect, persuade
- Narrow audience: friends and colleagues
- Shared context already established
- Many statements are ambiguous out of social context
3. Newswire Text vs. Social Media Text

Goals of analysis (newswire):
- Extract information about events from the text
- Understanding the text requires understanding the typical reader: the conventions for communicating with him/her, prior knowledge, background, ...

Goals of analysis (social media):
- Very diverse
- Evaluation is difficult, and requires revisiting often as goals evolve
- Often, understanding social text requires understanding a community
4. Outline

- Tools for analysis of text
- Probabilistic models for text, communities, and time
  - Mixture models and LDA models for text
  - LDA extensions to model hyperlink structure
  - LDA extensions to model time
- Alternative framework based on graph analysis to model time and community
  - Preliminary results and tradeoffs
- Discussion of results and challenges
5. Introduction to Topic Models

[Plate diagram: a Naïve Bayes generative model. A class node C (here "football") generates each word W1, W2, W3, ..., WN of a document, e.g. "The Pittsburgh Steelers won ...". The box (plate) is shorthand for many repetitions of the structure; an outer plate repeats the whole structure for M documents.]
6. Introduction to Topic Models

[Plate diagram: the same Naïve Bayes model, now with class C = "politics" generating the words "The Pittsburgh mayor stated ...".]
7. Introduction to Topic Models

- Naïve Bayes model: compact representation

[Plate diagram: the word nodes W1..WN collapsed into a single node W inside a plate of size N, nested in a document plate of size M, with the class node C and its parameters outside the plates.]
8. Introduction to Topic Models

- For each document d = 1, ..., M:
  - Generate Cd ~ Mult(π)
  - For each position n = 1, ..., Nd:
    - Generate wn ~ Mult(β | Cd)

Example, for document d = 1:
- Generate Cd ~ Mult(π): "football"
- For each position n = 1, ..., Nd = 67:
  - Generate w1 ~ Mult(β | Cd): "the"
  - Generate w2: "Pittsburgh"
  - Generate w3: "Steelers"
  - ...

[Plate diagram as before: C generates W1..WN, repeated over M documents.]
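A minimal runnable sketch of this generative story in Python/numpy; the class names, vocabulary, and parameter values below are illustrative assumptions, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

classes = ["football", "politics"]
vocab = ["the", "pittsburgh", "steelers", "won", "mayor", "stated"]

pi = np.array([0.5, 0.5])            # class prior: C_d ~ Mult(pi)
beta = np.array([                    # per-class word distributions: w_n ~ Mult(beta | C_d)
    [0.3, 0.2, 0.3, 0.2, 0.0, 0.0],  # "football"
    [0.3, 0.2, 0.0, 0.0, 0.3, 0.2],  # "politics"
])

def generate_document(n_words):
    c = rng.choice(len(classes), p=pi)                   # draw the class
    words = rng.choice(vocab, size=n_words, p=beta[c])   # draw each word given the class
    return classes[c], list(words)

for d in range(3):
    print(generate_document(5))
```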
9. Introduction to Topic Models

- In the graphs:
  - shaded circles are known (observed) values
  - the parents of a variable W are the inputs to the function used in generating W
- Goal: given the known values, estimate the rest, usually so as to maximize the probability of the observations

[Plate diagram as before.]
10. Introduction to Topic Models

- Mixture model: an unsupervised naïve Bayes model
- Joint probability of words and classes (written out below)
- But the classes are not visible
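In the notation used above (π for the class prior, β for the class-conditional word distributions), the joint and marginal probabilities are (a standard reconstruction; the original slide shows this as an equation image):

```latex
P(C_d = c, w_1, \dots, w_{N_d}) \;=\; \pi_c \prod_{n=1}^{N_d} \beta_{c,\,w_n},
\qquad
P(w_1, \dots, w_{N_d}) \;=\; \sum_{c} \pi_c \prod_{n=1}^{N_d} \beta_{c,\,w_n}
```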
[Plate diagram: mixture model with a hidden class variable generating each word W, inside plates of size N and M.]
11. Introduction to Topic Models

- Learning for naïve Bayes:
  - Take logs; the resulting function decomposes and is easy to optimize for any parameter (closed form)
- Learning for the mixture model:
  - Many local maxima (at least one for each permutation of the classes)
  - Expectation-Maximization (EM) is the most common method
12. Introduction to Topic Models

- Mixture model: EM solution
  - E-step
  - M-step
13. Introduction to Topic Models

- Mixture model: EM solution (a code sketch follows below)
  - E-step: estimate the expected values of the unknown variables (a soft classification of each document)
  - M-step: maximize the parameter values subject to this guess; usually, this means learning the parameter values given the soft classifications
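A compact sketch of EM for a mixture of multinomials, assuming documents are given as a term-count matrix; the toy data, smoothing constant, and iteration count are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy term-count matrix: D documents x V vocabulary words.
X = rng.poisson(2.0, size=(20, 6)).astype(float)
K = 2  # number of mixture components (hidden classes)

# Random initialisation of pi (class prior) and beta (class word distributions).
pi = np.full(K, 1.0 / K)
beta = rng.dirichlet(np.ones(X.shape[1]), size=K)

for it in range(50):
    # E-step: soft assignments r[d, k] proportional to pi_k * prod_w beta[k, w] ** X[d, w]
    log_r = np.log(pi) + X @ np.log(beta).T
    log_r -= log_r.max(axis=1, keepdims=True)   # numerical stability
    r = np.exp(log_r)
    r /= r.sum(axis=1, keepdims=True)

    # M-step: re-estimate the parameters from the soft assignments.
    pi = r.mean(axis=0) + 1e-12
    pi /= pi.sum()
    beta = r.T @ X + 1e-3                       # small smoothing to avoid zeros
    beta /= beta.sum(axis=1, keepdims=True)

print("estimated class prior:", np.round(pi, 3))
```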
14. Introduction to Topic Models
15. Introduction to Topic Models

- Probabilistic Latent Semantic Analysis (PLSA) model:
  - Select document d ~ Mult(π)
  - For each position n = 1, ..., Nd:
    - Generate zn ~ Mult(θd)
    - Generate wn ~ Mult(φzn)
- Mixture model: each document is generated by a single (unknown) multinomial distribution over words; the corpus is mixed by π
- PLSA model: each word is generated by a single (unknown) multinomial distribution over words; each document is mixed by its own topic distribution θd

[Plate diagram: d → z → w, with per-document topic distribution θd and topic-word distributions φ; N word positions per document, M documents.]
16. Introduction to Topic Models

- Latent Dirichlet Allocation (Blei, Ng & Jordan, JMLR 2003)
17. Introduction to Topic Models

- Latent Dirichlet Allocation (a code sketch follows below):
  - For each document d = 1, ..., M:
    - Generate θd ~ Dir(α)
    - For each position n = 1, ..., Nd:
      - Generate zn ~ Mult(θd)
      - Generate wn ~ Mult(φzn)

[Plate diagram: α → θd → zn → wn, with topic-word distributions φ; N word positions per document, M documents.]
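A minimal sketch of the LDA generative process above in Python/numpy; the vocabulary size, topic count, document lengths, and the sampled φ are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

V, K, M = 8, 3, 5                         # vocabulary size, topics, documents
alpha = np.full(K, 0.1)                   # Dirichlet prior over per-document topic mixtures
phi = rng.dirichlet(np.ones(V), size=K)   # topic-word distributions (sampled here; learned in practice)

docs = []
for d in range(M):
    theta_d = rng.dirichlet(alpha)                 # theta_d ~ Dir(alpha)
    n_d = rng.poisson(10) + 1                      # document length (not part of the model proper)
    z = rng.choice(K, size=n_d, p=theta_d)         # z_n ~ Mult(theta_d)
    w = np.array([rng.choice(V, p=phi[zn]) for zn in z])  # w_n ~ Mult(phi_{z_n})
    docs.append((z, w))

print(docs[0])
```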
18. Introduction to Topic Models

19. Introduction to Topic Models
20. Introduction to Topic Models

- Latent Dirichlet Allocation:
  - Overcomes some technical issues with PLSA
    - PLSA only estimates mixing parameters for the training docs
  - Parameter learning is more complicated:
    - Gibbs sampling: easy to program, often slow (sketched below)
    - Variational EM
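Since the slide mentions Gibbs sampling, here is a hedged sketch of a collapsed Gibbs sampler for LDA on a toy corpus with symmetric priors; it illustrates the technique and is not the exact implementation used in any of the cited work.

```python
import numpy as np

rng = np.random.default_rng(0)

def lda_gibbs(docs, V, K, alpha=0.1, beta=0.01, n_iters=200):
    """Collapsed Gibbs sampling for LDA. docs: list of word-id lists."""
    nd = np.zeros((len(docs), K))   # doc-topic counts
    nw = np.zeros((K, V))           # topic-word counts
    nt = np.zeros(K)                # topic totals
    z = []                          # current topic assignment for every token
    for d, doc in enumerate(docs):
        zd = rng.integers(K, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            nd[d, k] += 1; nw[k, w] += 1; nt[k] += 1

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove this token's current assignment from the counts.
                nd[d, k] -= 1; nw[k, w] -= 1; nt[k] -= 1
                # p(z = k | rest) is proportional to (nd + alpha) * (nw + beta) / (nt + V*beta)
                p = (nd[d] + alpha) * (nw[:, w] + beta) / (nt + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k
                nd[d, k] += 1; nw[k, w] += 1; nt[k] += 1
    return nd, nw

docs = [[0, 1, 2, 1], [2, 3, 3, 4], [0, 1, 4, 4]]   # toy word-id documents
nd, nw = lda_gibbs(docs, V=5, K=2)
print(nw)
```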
21. Introduction to Topic Models

- Perplexity comparison of various models (lower is better)

[Figure: held-out perplexity curves for a unigram model, the mixture model, PLSA, and LDA.]
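For reference, held-out perplexity is typically defined as follows (standard definition; not spelled out on the slide):

```latex
\mathrm{perplexity}(D_{\mathrm{test}}) \;=\; \exp\!\left( - \frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d} \right)
```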
22. Introduction to Topic Models

- Prediction accuracy for classification, using learning with topic-model outputs as features (higher is better)
23. Outline

- Tools for analysis of text
- Probabilistic models for text, communities, and time
  - Mixture models and LDA models for text
  - LDA extensions to model hyperlink structure
  - LDA extensions to model time
- Alternative framework based on graph analysis to model time and community
  - Preliminary results and tradeoffs
- Discussion of results and challenges
24. Hyperlink modeling using LDA
25. Hyperlink modeling using LinkLDA (Erosheva, Fienberg, Lafferty, PNAS 2004)

- For each document d = 1, ..., M:
  - Generate θd ~ Dir(α)
  - For each position n = 1, ..., Nd:
    - Generate zn ~ Mult(θd)
    - Generate wn ~ Mult(φzn)
  - For each citation j = 1, ..., Ld:
    - Generate zj ~ Mult(θd)
    - Generate cj ~ Mult(γzj)
- Learning using variational EM

[Plate diagram: θd generates both word topics (z → w, plate of size N) and citation topics (z → c, plate of size L), repeated over M documents; φ and γ are the topic-word and topic-citation distributions.]
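A rough sketch of the LinkLDA generative story just listed, with toy parameters; γ plays the role of the topic-to-cited-document distributions, and all sizes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

V, K, M = 10, 3, 6              # vocab size, topics, documents (also the citation targets)
alpha = np.full(K, 0.1)
phi = rng.dirichlet(np.ones(V), size=K)    # topic -> word distributions
gamma = rng.dirichlet(np.ones(M), size=K)  # topic -> cited-document distributions

def generate(n_words=12, n_cites=3):
    theta = rng.dirichlet(alpha)                                           # theta_d ~ Dir(alpha)
    words = [rng.choice(V, p=phi[rng.choice(K, p=theta)]) for _ in range(n_words)]
    cites = [rng.choice(M, p=gamma[rng.choice(K, p=theta)]) for _ in range(n_cites)]
    return words, cites

print(generate())
```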
26. Hyperlink modeling using LDA (Erosheva, Fienberg, Lafferty, PNAS 2004)
27. Newswire Text vs. Social Media Text

Goals of analysis (newswire):
- Extract information about events from the text
- Understanding the text requires understanding the typical reader: the conventions for communicating with him/her, prior knowledge, background, ...

Goals of analysis (social media):
- Very diverse
- Evaluation is difficult, and requires revisiting often as goals evolve
- Often, understanding social text requires understanding a community

Science as a testbed for social text: an open community which we understand.
28. Modeling Citation Influences (Dietz, Bickel, Scheffer, ICML 2007)

- "Copycat" model of citation influence:
  - LDA model for cited papers
  - Extended LDA model for citing papers
  - For each word, depending on a coin flip c, you might choose to copy a word from a cited paper instead of generating the word
29. Modeling Citation Influences (Dietz, Bickel, Scheffer, ICML 2007)

- Citation influence graph for the LDA paper
30. Models of hypertext for blogs (ICWSM 2008)

Ramesh Nallapati, me
31. Link-PLSA-LDA

- LinkLDA model for the citing documents
- Variant of the PLSA model for the cited documents
- Topics are shared between citing and cited documents
- Links depend on the topics in the two documents
32. Experiments

- 8.4M blog postings in the Nielsen/Buzzmetrics corpus
  - Collected over three weeks in summer 2005
- Selected all postings with >2 inlinks or >2 outlinks
  - 2248 citing documents (>2 outlinks), 1777 cited documents (>2 inlinks)
  - Only 68 in both sets, which are duplicated
- Fit the model using variational EM
33. Topics in blogs

- The model can answer questions like: which blogs are most likely to be cited when discussing topic z?
34. Topics in blogs

- The model can be evaluated by predicting which links an author will include in an article (lower is better)

[Figure: link-prediction performance of Link-LDA vs. Link-PLSA-LDA.]
35. Another model: Pairwise Link-LDA

- LDA for both cited and citing documents
- Generate an indicator for every pair of docs
  - vs. generating pairs of docs
- A link depends on the mixing components (the θs)
  - a stochastic block model
36. Pairwise Link-LDA supports new inferences, but doesn't perform better on link prediction
37. Outline

- Tools for analysis of text
- Probabilistic models for text, communities, and time
  - Mixture models and LDA models for text
  - LDA extensions to model hyperlink structure
    - Observation: these models can be used for many purposes
  - LDA extensions to model time
- Alternative framework based on graph analysis to model time and community
- Discussion of results and challenges
38. Predicting Response to Political Blog Posts with Topic Models (NAACL 2009)

- Noah Smith
- Tae Yano
39. Political blogs and comments

- Posts are often coupled with comment sections
- Comment style is casual, creative, less carefully edited
40. Political blogs and comments

- Most of the text associated with large "A-list" community blogs is comments
  - 5-20x as many words in comments as in the post text, for the 5 sites considered in Yano et al.
- A large part of socially-created commentary in the blogosphere is comments, not blog → blog hyperlinks
- Comments do not just echo the post
41. Modeling political blogs

- Our political blog model: CommentLDA
- Notation: z, z' are topics; w is a word in the post; w' is a word in the comments; u is a user
- D = # of documents; N = # of words in the post; M = # of words in the comments
42. Modeling political blogs

- Our proposed political blog model
- The LHS (post side) is vanilla LDA
- D = # of documents; N = # of words in the post; M = # of words in the comments
43. Modeling political blogs

- Our proposed political blog model
- The RHS captures the generation of the reaction separately from the post body
- The two chambers share the same topic mixture
- Two separate sets of word distributions
- D = # of documents; N = # of words in the post; M = # of words in the comments
44. Modeling political blogs

- Our proposed political blog model
- The user IDs of the commenters are treated as part of the comment text: they are generated along with the words in the comment section
- D = # of documents; N = # of words in the post; M = # of words in the comments
45. Modeling political blogs

- Another model we tried: take out the words from the comment section!
- This model is structurally equivalent to LinkLDA (Erosheva et al., 2004)
- It is agnostic to the words in the comment section (a rough sketch of the full CommentLDA story follows below)
- D = # of documents; N = # of words in the post; M = # of words in the comments
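A heavily simplified sketch of the two-chamber generative story described on the last few slides: a shared topic mixture, separate post/comment word distributions, and commenter IDs generated alongside comment words. The NAACL-09 model variants (-v, -r, -c) are not reproduced, and all parameter names and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

V, U, K = 12, 5, 3          # vocab size, number of users, topics
alpha = np.full(K, 0.1)
beta_post = rng.dirichlet(np.ones(V), size=K)   # topic -> post-word distributions
beta_cmnt = rng.dirichlet(np.ones(V), size=K)   # topic -> comment-word distributions (separate chamber)
gamma_user = rng.dirichlet(np.ones(U), size=K)  # topic -> commenter distributions

def generate_post(n_post=15, n_cmnt=30):
    theta = rng.dirichlet(alpha)                 # topic mixture shared by both chambers
    post = [rng.choice(V, p=beta_post[rng.choice(K, p=theta)]) for _ in range(n_post)]
    comments = []
    for _ in range(n_cmnt):
        z = rng.choice(K, p=theta)
        comments.append((rng.choice(U, p=gamma_user[z]),   # commenter id
                         rng.choice(V, p=beta_cmnt[z])))   # comment word
    return post, comments

print(generate_post())
```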
46. Topic discovery - Matthew Yglesias (MY) site

47. Topic discovery - Matthew Yglesias (MY) site

48. Topic discovery - Matthew Yglesias (MY) site
49. Comment prediction

- LinkLDA and CommentLDA consistently outperform baseline models
- Neither consistently outperforms the other

[Figure: user prediction, precision at top 10, on the MY, CB, and RS sites; from left to right, Link LDA (-v, -r, -c), Comment LDA (-v, -r, -c), and baselines (Freq, NB).]
50. From Episodes to Sagas: Temporally Clustering News Via Social-Media Commentary (current work)

- Noah Smith, Matthew Hurst, Frank Lin, Ramnath Balasubramanyan
51. Motivation

- The news-related blogosphere is driven by recency
- Some recent news is better understood in the context of a sequence of related stories
  - Some readers have this context; some don't
- To reconstruct the context, reconstruct the sequence of related stories (the "saga")
  - Similar to retrospective event detection
- First efforts:
  - Find related stories
  - Cluster by time
  - Evaluation: agreement with human annotators
52. Clustering results on Democratic-primary-related documents

- k-walks (more later)
- SpeCluster + time: a mixture-of-multinomials model for the general text, with the timestamp drawn from a Gaussian (a sketch follows below)
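A hedged sketch of the kind of text-plus-timestamp mixture just mentioned: each cluster has a multinomial over words and a Gaussian over timestamps. All parameter values are illustrative assumptions, not taken from the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

V, K = 20, 3
pi = np.full(K, 1.0 / K)                        # cluster prior
beta = rng.dirichlet(np.ones(V), size=K)        # cluster -> word distribution
mu, sigma = np.array([5.0, 15.0, 25.0]), 2.0    # cluster timestamp Gaussians (e.g. a day index)

def generate_doc(n_words=30):
    c = rng.choice(K, p=pi)                     # pick a cluster (event)
    words = rng.choice(V, size=n_words, p=beta[c])
    timestamp = rng.normal(mu[c], sigma)        # timestamp drawn from the cluster's Gaussian
    return c, words, timestamp

print(generate_doc())
```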
53. Clustering results on Democratic-primary-related documents

- Also had three human annotators build gold-standard timelines
  - hierarchical
  - annotated with names of events, times, ...
- A machine-produced timeline can be evaluated by its tree-distance to the gold-standard one
54. Clustering results on Democratic-primary-related documents

- Issue: divergence of opinion with human annotators
  - Is modeling community interests the problem?
  - How much of what we want is actually in the data?
  - Should this task be supervised or unsupervised?
55. More sophisticated time models

- Hierarchical LDA Over Time (HOTS) model
  - LDA to generate text
  - Also generates a timestamp for each document from topic-specific Gaussians
  - Non-parametric model: the number of clusters is also generated (not specified by the user)
  - Allows the use of user-provided prototypes
  - Evaluated on liberal/conservative blogs and ML papers from NIPS conferences
- Ramnath Balasubramanyan
56. Results with the HOTS model: unsupervised
57. Results with the HOTS model: human guidance

- Adding human seeds for some key events improves performance on all events
- Allows a user to partially specify a timeline of events and have the system complete it
58. Outline

- Tools for analysis of text
- Probabilistic models for text, communities, and time
  - Mixture models and LDA models for text
  - LDA extensions to model hyperlink structure
  - LDA extensions to model time
- Alternative framework based on graph analysis to model time and community
  - Preliminary results and tradeoffs
- Discussion of results and challenges
59. Spectral Clustering: Graph = Matrix; Vector = Node → Weight

[Figure: a 10-node example graph with nodes A-J, written as its adjacency matrix M (a 1 wherever an edge exists), and a vector v assigning a weight to each node (e.g. A = 3, B = 2, C = 3).]
60. Spectral Clustering: Graph = Matrix; M v1 = v2 "propagates weights from neighbors"

[Figure: the same adjacency matrix M and weight vector v1; in the product v2 = M v1, each node's new value is the sum of its neighbors' v1 weights.]
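A tiny numpy illustration of the "propagates weights from neighbors" idea, on an assumed 4-node toy graph rather than the slide's A-J example:

```python
import numpy as np

# Adjacency matrix of a 4-node toy graph (0/1 edges, no self-loops).
M = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
v1 = np.array([3.0, 2.0, 3.0, 1.0])   # one weight per node

v2 = M @ v1     # each entry becomes the sum of the neighbors' weights
print(v2)       # e.g. node 0 gets v1[1] + v1[2] = 5.0
```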
61. Spectral Clustering: Graph = Matrix; M v1 = v2 "propagates weights from neighbors"

[Figure: the same example after the multiplication, e.g. A = 5, B = 6, C = 5, showing how the weights spread along the edges.]
62. Spectral Clustering: Graph = Matrix; W v1 = v2 "propagates weights from neighbors"

[Figure: nodes plotted in the space of the second and third eigenvectors (e2, e3) of W; points from the same cluster (marked x, y, or z) group together. Shi & Meila, 2002.]
63. Spectral Clustering

- If W is the row-normalized adjacency matrix of a connected graph with k closely-connected subcommunities, then:
  - the top eigenvector is a constant vector
  - the next k eigenvectors are roughly piecewise constant, with the "pieces" corresponding to the subcommunities
- Spectral clustering (sketched in code below):
  - Find the top k+1 eigenvectors v1, ..., v_{k+1}
  - Discard the top one
  - Replace every node a with the k-dimensional vector xa = <v2(a), ..., v_{k+1}(a)>
  - Cluster with k-means
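A short sketch of the spectral clustering recipe above, assuming numpy and a reasonably recent SciPy are available; the toy graph (two small cliques joined by one edge) and the k-means settings are illustrative choices.

```python
import numpy as np
from scipy.cluster.vq import kmeans2   # assumes SciPy is available

def spectral_cluster(A, k):
    """Cluster nodes of a graph with adjacency matrix A into k groups."""
    W = A / A.sum(axis=1, keepdims=True)       # row-normalized adjacency matrix
    vals, vecs = np.linalg.eig(W)
    order = np.argsort(-vals.real)             # sort eigenvectors by eigenvalue
    X = vecs[:, order[1:k + 1]].real           # drop the top (constant) eigenvector, keep the next k
    _, labels = kmeans2(X, k, minit="++")      # k-means on the embedded nodes
    return labels

# Two 3-node cliques joined by a single edge.
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
print(spectral_cluster(A, 2))
```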
64. Spectral Clustering: Pros and Cons

- Elegant, and well-founded mathematically
- Works quite well when relations are approximately transitive (like similarity or social connections)
- Expensive for very large datasets
  - Computing eigenvectors is the bottleneck
- Noisy datasets cause problems
  - Informative eigenvectors need not be in the top few
  - Performance can drop suddenly from good to terrible
65. Experimental results: best-case assignment of class labels to clusters

[Figure: results on the political-blog network of Adamic & Glance, "Divided They Blog", 2004.]
66. Spectral Clustering: Graph = Matrix; M v1 = v2 "propagates weights from neighbors"

[Figure: the same adjacency-matrix and weight-propagation example as before, repeated as a reminder.]
67. Repeated averaging with neighbors as a clustering method

- Pick a vector v0 (maybe even at random)
- Compute v1 = W v0
  - i.e., replace v0[x] with the weighted average of v0[y] over the neighbors y of x
- Plot v1[x] for each x
- Repeat for v2, v3, ...
- What are the dynamics of this process?
68. Repeated averaging with neighbors on a sample problem

[Figure: node values plotted after repeated averaging on a sample problem (groups labeled "larger" and "small").]
69. PIC: Power Iteration Clustering - run power iteration (repeated averaging with neighbors) with early stopping

- Frank Lin
- Formally, we can show this works when spectral techniques work
- Experimentally, runs in linear time
- Easy to implement and efficient
- Very easily parallelized
- Experimentally, often better than traditional spectral methods
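A minimal sketch of the power-iteration-clustering idea, using the same row-normalized W as in the spectral sketch above; the normalization and early-stopping rule here are simplified relative to the published algorithm, so treat it as illustrative only.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def pic(A, k, max_iter=50, tol=1e-3, rng_seed=0):
    """Power Iteration Clustering sketch: repeated averaging with early stopping."""
    rng = np.random.default_rng(rng_seed)
    W = A / A.sum(axis=1, keepdims=True)
    v = rng.random(A.shape[0])
    v /= np.abs(v).sum()
    prev_delta = None
    for _ in range(max_iter):
        v_new = W @ v                          # each entry becomes the average of its neighbors
        v_new /= np.abs(v_new).sum()
        delta = np.abs(v_new - v).max()
        # Stop early once the rate of change stabilizes: the vector is then
        # near-piecewise-constant over the clusters rather than fully converged.
        if prev_delta is not None and abs(prev_delta - delta) < tol:
            break
        v, prev_delta = v_new, delta
    _, labels = kmeans2(v.reshape(-1, 1), k, minit="++")   # cluster the 1-D embedding
    return labels

A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
print(pic(A, 2))
```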
70. Experimental results: best-case assignment of class labels to clusters

71. Experiments: run time and scalability

[Figure: run time in milliseconds.]
72. Clustering results on Democratic-primary-related documents

- k-walks is an early version of PIC
- Cluster a graph with several types of nodes: blog entries, news stories, and dates
- Clusters (communities) of the graph correspond to events in the saga
- Advantage: PIC clusters at interactive speeds
- Evaluated against human annotators
73. Outline

- Tools for analysis of text
- Probabilistic models for text, communities, and time
  - Mixture models and LDA models for text
  - LDA extensions to model hyperlink structure
  - LDA extensions to model time
- Alternative framework based on graph analysis to model time and community
  - Preliminary results and tradeoffs
- Discussion of results and challenges
74. Comments

Probabilistic models:
- Can model many aspects of social text
  - Community (links, comments)
  - Time
- Evaluation:
  - Introspective and qualitative, on communities we understand (e.g., scientific communities)
  - Quantitative, on predictive tasks (link prediction, user prediction, ...)
  - Against a gold-standard visualization (sagas)

Social media text (recap):
- Goals of analysis are very diverse
- Evaluation is difficult, and requires revisiting often as goals evolve
- Often, understanding social text requires understanding a community
75. Thanks to

- NIH/NIGMS
- NSF
- Microsoft LiveLabs
- Microsoft Research
- Johnson & Johnson
- Language Technologies Institute, CMU