Title: Topic modeling
1. Topic modeling
Mark Steyvers
Department of Cognitive Sciences, University of California, Irvine
2. Some topics we can discuss
- Introduction to LDA, the basic topic model
- Preliminary work on therapy transcripts
- Extensions to LDA
  - Conditional topic models (for predicting behavioral codes)
  - Various topic models for word order
  - Topic models incorporating parse trees
  - Topic models for dialogue
  - Topic models incorporating speech information
3. Most basic topic model: LDA (Latent Dirichlet Allocation)
4. Automatic and unsupervised extraction of semantic themes from large text collections
- Pennsylvania Gazette (1728-1800): 80,000 articles
- Enron: 250,000 emails
- NYT: 330,000 articles
- NSF/NIH: 100,000 grants
- AOL queries: 20,000,000 queries, 650,000 users
- Medline: 16 million articles
5. Model Input
- Matrix of counts: the number of times words occur in documents (see the sketch below)
- Note:
  - word order is lost (bag-of-words approach)
  - some function words are deleted (the, a, in)
[Figure: documents x words count matrix]
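As an illustration, here is a minimal sketch of how such a count matrix can be built using scikit-learn's CountVectorizer; the two toy documents are invented for the example.

```python
# Build a word-document count matrix: a bag-of-words representation in which
# word order is discarded and common function words ("the", "a", "in", ...)
# are removed. The toy documents are invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the bank approved the loan for the new account",
    "the river bank was flooded by the stream",
]

vectorizer = CountVectorizer(stop_words="english")  # drops function words
counts = vectorizer.fit_transform(docs)             # documents x words matrix

print(vectorizer.get_feature_names_out())
print(counts.toarray())
```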
6. Basic Assumptions
- Each topic is a distribution over words
- Each document is a mixture of topics
- Each word in a document originates from a single topic
7. Document = mixture of topics
Example topics:
- auto car parts cars used ford honda truck toyota
- party store wedding birthday jewelry ideas cards cake gifts
- hannah montana zac efron disney high school musical miley cyrus hilary duff
- webmd cymbalta xanax gout vicodin effexor prednisone lexapro ambien
[Figure: two example documents shown with their mixing proportions over these topics (the 20, 80, and 100 percent bars in the original figure)]
8. Generative Process
- For each document, choose a mixture of topics: θ ~ Dirichlet(α)
- Sample a topic z in 1..T from the mixture: z ~ Multinomial(θ)
- Sample a word from the topic: w ~ Multinomial(φ(z)), where φ ~ Dirichlet(β)
[Plate diagram: word plate N_d inside document plate D; topic plate T]
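The generative story above can be written out directly; here is a minimal sketch in Python/numpy, with all sizes and hyperparameter values chosen only for illustration.

```python
# A minimal sketch of the LDA generative process described above.
# T topics, a vocabulary of W words; alpha and beta are symmetric Dirichlet
# hyperparameters. All sizes here are illustrative, not from the slides.
import numpy as np

rng = np.random.default_rng(0)
T, W, D, N_d = 3, 20, 5, 10          # topics, vocab size, docs, words per doc
alpha, beta = 0.1, 0.01

phi = rng.dirichlet(np.full(W, beta), size=T)    # phi(z) ~ Dirichlet(beta), one per topic

for d in range(D):
    theta = rng.dirichlet(np.full(T, alpha))     # theta ~ Dirichlet(alpha)
    for i in range(N_d):
        z = rng.choice(T, p=theta)               # z ~ Multinomial(theta)
        w = rng.choice(W, p=phi[z])              # w ~ Multinomial(phi(z))
```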
9. Prior Distributions
- Dirichlet priors encourage sparsity on topic mixtures and topics
θ ~ Dirichlet(α): prior over topic mixtures (simplex with corners Topic 1, Topic 2, Topic 3)
φ ~ Dirichlet(β): prior over topics (simplex with corners Word 1, Word 2, Word 3)
(darker colors indicate lower probability)
10. Toy Example
Two topics (shown in the original figure as word distributions), three documents with topic assignments (the subscript on each token), and the topic weights for each document:
- Doc 1 (weights 1.0 / 0.0): MONEY1 BANK1 BANK1 LOAN1 BANK1 MONEY1 BANK1 MONEY1 BANK1 LOAN1 LOAN1 BANK1 MONEY1 ...
- Doc 2 (weights 0.6 / 0.4): RIVER2 MONEY1 BANK2 STREAM2 BANK2 BANK1 MONEY1 RIVER2 MONEY1 BANK2 LOAN1 MONEY1 ...
- Doc 3 (weights 0.0 / 1.0): RIVER2 BANK2 STREAM2 BANK2 RIVER2 BANK2 ...
11. Statistical Inference
Now the topics, topic assignments, and topic weights are all unknown (?):
- Doc 1 (weights ?): MONEY? BANK? BANK? LOAN? BANK? MONEY? BANK? MONEY? BANK? LOAN? LOAN? BANK? MONEY? ...
- Doc 2 (weights ?): RIVER? MONEY? BANK? STREAM? BANK? BANK? MONEY? RIVER? MONEY? BANK? LOAN? MONEY? ...
- Doc 3 (weights ?): RIVER? BANK? STREAM? BANK? RIVER? BANK? ...
12. Statistical Inference
- Three sets of latent variables:
  - document-topic distributions θ
  - topic-word distributions φ
  - topic assignments z
- Estimate the posterior distribution over topic assignments, P(z | w)
  - we collapse over the topic mixtures and word mixtures
  - we can later infer θ and φ
- Use approximate methods: Markov chain Monte Carlo (MCMC) with Gibbs sampling
13. Toy Example: Artificial Dataset
- Two topics
- 16 documents
[Figure: docs x words count matrix of the artificial dataset]
Can we recover the original topics and topic mixtures from this data?
14. Initialization: assign word tokens randomly to topics
(colors indicate topic 1 vs. topic 2)
15. Gibbs Sampling
Each word token i (word w_i in document d_i) is reassigned to a topic t by sampling from

P(z_i = t \mid z_{-i}, w) \propto \frac{C^{WT}_{w_i t} + \beta}{\sum_{w} C^{WT}_{w t} + W\beta} \cdot \frac{C^{DT}_{d_i t} + \alpha}{\sum_{t'} C^{DT}_{d_i t'} + T\alpha}

where C^{DT}_{d t} is the count of topic t assigned to document d, and C^{WT}_{w t} is the count of word w assigned to topic t.
16. After 1 iteration
- Apply the sampling equation to each word token
(colors indicate topic 1 vs. topic 2)
17. After 4 iterations
(colors indicate topic 1 vs. topic 2)
18. After 8 iterations
(colors indicate topic 1 vs. topic 2)
19. After 32 iterations
(colors indicate topic 1 vs. topic 2)
20. Summary of Algorithm
INPUT: word-document counts (word order is irrelevant)
OUTPUT:
- topic assignments to each word: P(z_i)
- likely words in each topic: P(w | z)
- likely topics in each document (the "gist"): P(z | d)
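A compact sketch of the collapsed Gibbs sampler summarized above, following the sampling equation from slide 15; the function name and default settings are illustrative choices, not from the original slides.

```python
# Collapsed Gibbs sampling for LDA. `docs` is a list of documents, each a list
# of word indices into a vocabulary of size W.
import numpy as np

def gibbs_lda(docs, W, T=2, alpha=0.5, beta=0.1, iters=32, seed=0):
    rng = np.random.default_rng(seed)
    C_dt = np.zeros((len(docs), T))          # count of topic t assigned to doc d
    C_wt = np.zeros((W, T))                  # count of word w assigned to topic t
    z = []                                   # topic assignment of every token
    for d, doc in enumerate(docs):           # random initialization
        z.append([])
        for w in doc:
            t = rng.integers(T)
            z[d].append(t); C_dt[d, t] += 1; C_wt[w, t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]                  # remove this token's current assignment
                C_dt[d, t] -= 1; C_wt[w, t] -= 1
                p = ((C_wt[w] + beta) / (C_wt.sum(axis=0) + W * beta)
                     * (C_dt[d] + alpha))    # doc-side normalizer is constant, omitted
                t = rng.choice(T, p=p / p.sum())
                z[d][i] = t; C_dt[d, t] += 1; C_wt[w, t] += 1
    return z, C_dt, C_wt

# Toy usage: vocabulary {0: MONEY, 1: LOAN, 2: BANK, 3: RIVER, 4: STREAM}
docs = [[0, 2, 2, 1, 2, 0], [3, 2, 4, 2, 3, 2]]
z, C_dt, C_wt = gibbs_lda(docs, W=5)
```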
21. Example topics from TASA, an educational corpus
- 37K docs, 26K word vocabulary
- 300 topics, e.g.:
22. Three documents with the word "play" (numbers and colors = topic assignments)
23. LSA vs. the topic model
LSA: the words x documents matrix C is factorized by SVD as C = U D V^T, with U (words x dims), D (dims x dims), and V^T (dims x documents); the dimensions are unconstrained.
Topic model: the normalized co-occurrence matrix C (words x documents) is factorized as C = F Q, where F (words x topics) holds the mixture components (topics as distributions over words) and Q (topics x documents) holds the mixture weights (documents as mixtures over topics).
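Both factorizations can be tried side by side; a hedged sketch using scikit-learn (TruncatedSVD for the LSA-style factorization, LatentDirichletAllocation for the topic model), with a made-up toy corpus.

```python
# Compare the two factorizations of the same count matrix.
from sklearn.decomposition import LatentDirichletAllocation, TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer

docs = ["money bank loan bank", "river bank stream bank", "money loan cash"]
C = CountVectorizer().fit_transform(docs)      # documents x words counts

lsa = TruncatedSVD(n_components=2).fit(C)      # C ~ U D V^T: unconstrained axes
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(C)

print(lsa.components_)   # LSA dimensions: word loadings of arbitrary sign
print(lda.components_)   # topic-word weights: non-negative, interpretable
```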
24. Documents as Topic Mixtures: a Geometric Interpretation
[Figure: the simplex of word probabilities, P(word1) + P(word2) + P(word3) = 1. Topic 1 and topic 2 are points on the simplex; an observed document lies on the segment between them, since it is a mixture of the two topics.]
25. Some Preliminary Work on Therapy Transcripts
26. Defining documents
- A document can be defined in multiple ways:
  - all words within a therapy session
  - all words from a particular speaker within a session
- Clearly we need to extend the topic model to dialogue.
27. (No transcript: figure-only slide)
28. Positive/Negative Topic Usage by Group
29. Positive/Negative Topic Usage by Changes in Satisfaction
Couples whose satisfaction decreases over the course of therapy use relatively more negative language; those who leave therapy with increased satisfaction exhibit more positive language.
30. Topics used by Satisfied/Unsatisfied Couples
Topic 38: talk divorce problem house along separate separation talking agree example
Dissatisfied couples talk relatively more often about separation and divorce.
31. Affect Dynamics
- Analyze the short-term dynamics of affect usage
- Do unhappy couples follow up negative language with negative language more often than happy couples? In other words, are unhappy couples caught in a negative feedback loop?
- Calculated (see the sketch after this list):
  - P(z2 = + | z1 = +)
  - P(z2 = + | z1 = -)
  - P(z2 = - | z1 = +)
  - P(z2 = - | z1 = -)
- E.g., P(z2 = - | z1 = +) is the probability that after a positive word the next non-neutral word will be a negative word
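These conditional probabilities are simple bigram statistics over the sequence of non-neutral words; here is a minimal sketch, where the input format (a per-word list of '+', '-', and None labels) is an assumption for illustration.

```python
# Estimate P(z2 | z1) over consecutive non-neutral words, where each word is
# labeled '+' (positive topic), '-' (negative topic), or None (neutral).
from collections import Counter

def affect_transitions(labels):
    seq = [s for s in labels if s is not None]   # keep only non-neutral words
    pairs = Counter(zip(seq, seq[1:]))           # count (z1, z2) bigrams
    totals = Counter(z1 for z1, _ in pairs.elements())
    return {(z1, z2): n / totals[z1] for (z1, z2), n in pairs.items()}

print(affect_transitions(['+', None, '-', '-', '+', '-', None, '-']))
```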
32. Markov Chain Illustration
Estimated base rates and transition probabilities between positive (+) and negative (-) affect words, by group:

Group             P(+)  P(-)  P(+|+)  P(-|+)  P(-|-)  P(+|-)
Normal Controls   .51   .49   .73     .27     .72     .28
Positive Change   .45   .55   .67     .33     .73     .27
Little Change     .38   .62   .63     .37     .78     .22
Negative Change   .35   .65   .59     .41     .78     .22
33. Modeling Extensions
34. Extensions
- Multi-label document classification
  - conditional topic models
- Topic models and word order
  - n-grams/collocations
  - hidden Markov models
- Some potential model developments
  - topic models incorporating parse trees
  - topic models for dialogue
  - topic models incorporating speech information
35. Conditional Topic Models
Assume there is a topic associated with each label/behavioral code. The model is only allowed to assign words to labels that are associated with the document (one way to realize this constraint is sketched below). This model can learn the distribution of words associated with each label/behavioral code.
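One plausible way to realize this constraint inside a Gibbs sampling step is to zero out the probability of every topic whose label is not attached to the document; the helper below and its signature are hypothetical, not from the original model.

```python
# Restrict a token's topic distribution to the topics (behavioral codes)
# attached to its document; one illustrative realization of the constraint.
import numpy as np

def constrained_topic_probs(p, allowed_topics, T):
    """p: unnormalized topic probabilities; allowed_topics: the doc's codes."""
    mask = np.zeros(T)
    mask[list(allowed_topics)] = 1.0      # allow only the document's codes
    p = p * mask
    return p / p.sum()

# e.g. a document labeled only with codes 0 and 2:
print(constrained_topic_probs(np.array([0.2, 0.5, 0.3]), {0, 2}, T=3))
```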
36. Vulnerability = yes, Hard Expression = no:
word? word? word? word? word? word? word? word? word? word? word? word? word? ...
Vulnerability = no, Hard Expression = yes:
word? word? word? word? word? word? word? word? word? word? word? word? ...
Vulnerability = yes, Hard Expression = yes:
word? word? word? word? word? word? ...
(Topics associated with behavioral codes; topic weights; documents and topic assignments, all unknown before inference.)
37. Preliminary Results
38. Topic Models for Short-Range Sequential Dependencies
39. Hidden Markov Topics Model
- Syntactic dependencies → short-range dependencies
- Semantic dependencies → long-range dependencies
[Graphical model: the topic mixture θ and semantic state generate words from the topic model (topic assignments z1 ... z4 for words w1 ... w4); syntactic states s1 ... s4 generate words from an HMM.]
(Griffiths, Steyvers, Blei, Tenenbaum, 2004)
40. NIPS Semantics
- IMAGE IMAGES OBJECT OBJECTS FEATURE RECOGNITION VIEWS PIXEL VISUAL
- KERNEL SUPPORT VECTOR SVM KERNELS SPACE FUNCTION MACHINES SET
- NETWORK NEURAL NETWORKS OUTPUT INPUT TRAINING INPUTS WEIGHTS OUTPUTS
- EXPERTS EXPERT GATING HME ARCHITECTURE MIXTURE LEARNING MIXTURES FUNCTION GATE
- MEMBRANE SYNAPTIC CELL CURRENT DENDRITIC POTENTIAL NEURON CONDUCTANCE CHANNELS
- DATA GAUSSIAN MIXTURE LIKELIHOOD POSTERIOR PRIOR DISTRIBUTION EM BAYESIAN PARAMETERS
- STATE POLICY VALUE FUNCTION ACTION REINFORCEMENT LEARNING CLASSES OPTIMAL
NIPS Syntax
- IN WITH FOR ON FROM AT USING INTO OVER WITHIN
- I X T N - C F P
- IS WAS HAS BECOMES DENOTES BEING REMAINS REPRESENTS EXISTS SEEMS
- SEE SHOW NOTE CONSIDER ASSUME PRESENT NEED PROPOSE DESCRIBE SUGGEST
- MODEL ALGORITHM SYSTEM CASE PROBLEM NETWORK METHOD APPROACH PAPER PROCESS
- HOWEVER ALSO THEN THUS THEREFORE FIRST HERE NOW HENCE FINALLY
- USED TRAINED OBTAINED DESCRIBED GIVEN FOUND PRESENTED DEFINED GENERATED SHOWN
41. Random sentence generation
Generated strings (each [S] marks a sentence boundary):
- LANGUAGE
- RESEARCHERS GIVE THE SPEECH
- THE SOUND FEEL NO LISTENERS
- WHICH WAS TO BE MEANING
- HER VOCABULARIES STOPPED WORDS
- HE EXPRESSLY WANTED THAT BETTER VOWEL
42. Collocation Topic Model
Stock Market: WEEK DOW_JONES POINTS 10_YR_TREASURY_YIELD PERCENT CLOSE NASDAQ_COMPOSITE STANDARD_POOR CHANGE FRIDAY DOW_INDUSTRIALS GRAPH_TRACKS EXPECTED BILLION NASDAQ_COMPOSITE_INDEX EST_02 PHOTO_YESTERDAY YEN 10 500_STOCK_INDEX
Wall Street Firms: WALL_STREET ANALYSTS INVESTORS FIRM GOLDMAN_SACHS FIRMS INVESTMENT MERRILL_LYNCH COMPANIES SECURITIES RESEARCH STOCK BUSINESS ANALYST WALL_STREET_FIRMS SALOMON_SMITH_BARNEY CLIENTS INVESTMENT_BANKING INVESTMENT_BANKERS INVESTMENT_BANKS
Terrorism: SEPT_11 WAR SECURITY IRAQ TERRORISM NATION KILLED AFGHANISTAN ATTACKS OSAMA_BIN_LADEN AMERICAN ATTACK NEW_YORK_REGION NEW MILITARY NEW_YORK WORLD NATIONAL QAEDA TERRORIST_ATTACKS
Bankruptcy: BANKRUPTCY CREDITORS BANKRUPTCY_PROTECTION ASSETS COMPANY FILED BANKRUPTCY_FILING ENRON BANKRUPTCY_COURT KMART CHAPTER_11 FILING COOPER BILLIONS COMPANIES BANKRUPTCY_PROCEEDINGS DEBTS RESTRUCTURING CASE GROUP
43. Potential Model Developments
44. Using parse trees / POS taggers?
[Figure: parse trees for "You complete me" and "I complete you", each S expanding to NP and VP]
45. Modeling Dialogue
46. Topic Segmentation Model
- Purver, Körding, Griffiths, and Tenenbaum (2006). Unsupervised topic modelling for multi-party spoken discourse. Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics.
- Automatically segments multi-party discourse into topically coherent segments
- Outperforms standard HMMs
- Model does not incorporate speaker information or speaker turns
  - the goal is simply to segment a long stream of words into segments
47. At each utterance, there is a probability of changing theta, the topic mixture. If no change is indicated, words are drawn from the same mixture of topics. If there is a change, the topic mixture is resampled from the Dirichlet prior, as in the sketch below.
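A hedged sketch of this generative step, with illustrative sizes and a made-up change probability p_change.

```python
# At each utterance, resample the topic mixture theta from its Dirichlet prior
# with probability p_change (a segment boundary); otherwise reuse it.
import numpy as np

rng = np.random.default_rng(0)
T, alpha, p_change = 4, 0.5, 0.2

theta = rng.dirichlet(np.full(T, alpha))
for utterance in range(10):
    if rng.random() < p_change:                    # segment boundary
        theta = rng.dirichlet(np.full(T, alpha))   # resample topic mixture
    topics = rng.choice(T, size=8, p=theta)        # topics for this utterance's words
```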
48. Latent Dialogue Structure model (Ding et al., NIPS workshop, 2009)
- Designed for modeling sequences of messages on discussion forums
- Models the relationship of messages within documents: a message might relate to any previous message within a dialogue
- Does not incorporate speaker-specific variables
49. Some details
50. Learning User Intentions in Spoken Dialogue Systems (Chinaei et al., ICAART, 2009)
- Applies the HTMM model (Gruber et al., 2007) to dialogue
- Assumes that within each talk-turn, words are drawn from the same topic z (not a mixture!). At the start of a new talk-turn, there is some probability (psi below) of sampling a new topic z from the mixture theta
51. Other ideas
- Can we enhance topic models with non-verbal speech information?
- Each topic is a distribution over words as well as voicing information (f0, timing, etc.)
[Plate diagram: topic plate T; word plate N_d inside document plate D, with a non-verbal feature attached to each word]
52. Other Extensions
53. Learning Topic Hierarchies (example: Psych Review abstracts)
Root topic (function words): THE OF AND TO IN A IS
Lower-level topics include:
- A MODEL MEMORY FOR MODELS TASK INFORMATION RESULTS ACCOUNT
- SELF SOCIAL PSYCHOLOGY RESEARCH RISK STRATEGIES INTERPERSONAL PERSONALITY SAMPLING
- MOTION VISUAL SURFACE BINOCULAR RIVALRY CONTOUR DIRECTION CONTOURS SURFACES
- DRUG FOOD BRAIN AROUSAL ACTIVATION AFFECTIVE HUNGER EXTINCTION PAIN
- RESPONSE STIMULUS REINFORCEMENT RECOGNITION STIMULI RECALL CHOICE CONDITIONING
- SPEECH READING WORDS MOVEMENT MOTOR VISUAL WORD SEMANTIC
- ACTION SOCIAL SELF EXPERIENCE EMOTION GOALS EMOTIONAL THINKING
- GROUP IQ INTELLIGENCE SOCIAL RATIONAL INDIVIDUAL GROUPS MEMBERS
- SEX EMOTIONS GENDER EMOTION STRESS WOMEN HEALTH HANDEDNESS
- REASONING ATTITUDE CONSISTENCY SITUATIONAL INFERENCE JUDGMENT PROBABILITIES STATISTICAL
- IMAGE COLOR MONOCULAR LIGHTNESS GIBSON SUBMOVEMENT ORIENTATION HOLOGRAPHIC
- CONDITIONING STRESS EMOTIONAL BEHAVIORAL FEAR STIMULATION TOLERANCE RESPONSES