Title: Learning Within-Sentence Semantic Coherence
1 Learning Within-Sentence Semantic Coherence
- Elena Eneva
- Rose Hoberman
- Lucian Lita
- Carnegie Mellon University
2 Semantic (in)Coherence
- In trigram-generated text, content words are unrelated
- Effect on speech recognition:
- Actual utterance: THE BIRD FLU HAS AFFECTED CHICKENS FOR YEARS BUT ONLY RECENTLY BEGAN MAKING HUMANS SICK
- Top hypothesis: THE BIRD FLU HAS AFFECTED SECONDS FOR YEARS BUT ONLY RECENTLY BEGAN MAKING HUMAN SAID
- Our goal: model semantic coherence
3 A Whole-Sentence Exponential Model (Rosenfeld 1997)

P(s) \triangleq \frac{1}{Z}\, P_0(s)\, \exp\Big(\sum_i \lambda_i f_i(s)\Big)

- P0(s) is an arbitrary initial model (typically an N-gram)
- The fi(s) are arbitrary computable properties of s (aka features)
- Z is a universal normalizing constant (a scoring sketch follows below)
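As a minimal sketch of how a sentence would be scored under this model in log space (the function and argument names are illustrative, not from the paper):

```python
def log_p(sentence, log_p0, features, lambdas, log_z):
    """Whole-sentence exponential model (Rosenfeld 1997):
        log P(s) = log P0(s) + sum_i lambda_i * f_i(s) - log Z
    log_p0:   baseline model, maps a sentence to log P0(s)
    features: list of computable properties f_i(s)
    lambdas:  one learned weight per feature
    log_z:    log of the universal normalizing constant Z
    """
    return (log_p0(sentence)
            + sum(lam * f(sentence) for lam, f in zip(lambdas, features))
            - log_z)
```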
4 A Methodology for Feature Induction
- Given a corpus T of training sentences
- Train the best-possible baseline model, P0(s)
- Use P0(s) to generate a corpus T0 of pseudo-sentences
- Pose a challenge: find (computable) differences that allow discrimination between T and T0
- Encode the differences as features fi(s)
- Train a new model (the loop is sketched below)
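The induction loop as pseudocode; the helper functions are placeholders for whatever baseline trainer, sampler, and feature search one plugs in:

```python
def induce_features(T, train_baseline, sample_sentence, find_discriminators):
    """Feature-induction loop from the slide; all helpers are placeholders."""
    P0 = train_baseline(T)                             # best-possible baseline model
    T0 = [sample_sentence(P0) for _ in range(len(T))]  # pseudo-sentence corpus
    # Challenge: find computable differences between real and pseudo text
    return find_discriminators(T, T0)                  # encode as features f_i(s)
```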
5 Discrimination Task
Are these content words generated from a trigram model or drawn from a natural sentence? (Stop words are replaced by dashes; the answers appear on slide 21.)
- - - - feel - - sacrifice - - sense - - - - - - - - - meant - - - - - - - - trust - - - - truth
- - - kind - free trade agreements - - - living - - ziplock bag - - - - - - university japan's daiwa bank stocks step
6 Building on Prior Work
- Define content words (all but the top 50 most frequent words)
- Goal: model the distribution of content words in a sentence
- Simplification: model pairwise co-occurrences (content word pairs)
- Collect contingency tables and calculate a measure of association from them
7 Q Correlation Measure
- Derived from the co-occurrence contingency table (see below)
- Q values range from -1 to 1
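The formula on the slide appears to have been a graphic. The standard association measure with this range, computed from a 2x2 co-occurrence contingency table, is Yule's Q:

```latex
Q = \frac{n_{11}\,n_{00} - n_{10}\,n_{01}}{n_{11}\,n_{00} + n_{10}\,n_{01}}
```

Here n11 counts sentences containing both words, n00 neither, and n10, n01 exactly one of the two. Q reaches +1 when the words never occur apart and -1 when they never co-occur, matching the stated range.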
8 Density Estimates
- We hypothesized:
- In trigram sentences, word-pair correlation is completely determined by distance
- In natural sentences, word-pair correlation is independent of distance
- Kernel density estimation of the distribution of Q values in each corpus, at varying distances (sketched below)
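A minimal sketch of the estimation step, assuming SciPy's Gaussian KDE as the kernel estimator (the slides do not specify the kernel; names are illustrative):

```python
import numpy as np
from scipy.stats import gaussian_kde

def q_density(q_values):
    """Kernel density estimate over the Q values observed for
    content-word pairs at one distance in one corpus."""
    return gaussian_kde(np.asarray(q_values))

# One fitted density per (corpus, distance) cell, e.g.:
#   dens_bnews[d]   = q_density(q_values_at_distance(bnews, d))
#   dens_trigram[d] = q_density(q_values_at_distance(trigram, d))
```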
9 Q Distributions
[Figure: density of Q values for trigram-generated vs. broadcast news sentences; x-axis: Q value, y-axis: density]
10 Likelihood Ratio Feature
- Example: "she is a country singer searching for fame and fortune in nashville"
- Q(country, nashville) = 0.76, distance = 8
- Pr(Q=0.76 | d=8, BNews) = 0.32
- Pr(Q=0.76 | d=8, Trigram) = 0.11
- Likelihood ratio = 0.32/0.11 ≈ 2.9 (see the sketch below)
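Continuing the sketch above (the fitted gaussian_kde objects are callable and return an array of density values):

```python
def likelihood_ratio(q, d, dens_bnews, dens_trigram):
    """How much more likely this Q value is, at this word-pair
    distance, under natural text than under trigram-generated text.
    dens_* map a distance to a fitted density, e.g. the
    gaussian_kde objects sketched above."""
    p_nat = dens_bnews[d](q)[0]    # ~ Pr(Q = q | d, BNews)
    p_tri = dens_trigram[d](q)[0]  # ~ Pr(Q = q | d, Trigram)
    return p_nat / p_tri

# Slide example: Q(country, nashville) = 0.76 at distance 8
# gives 0.32 / 0.11 ≈ 2.9 in favor of the natural-sentence corpus.
```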
11 Simpler Features
- Q-value based (summary sketch below):
- Mean, median, min, max of Q values for content word pairs in the sentence (Cai et al. 2000)
- Percentage of Q values above a threshold
- High/low correlations across large/small distances
- Other:
- Word and phrase repetition
- Percentage of stop words
- Longest sequence of consecutive stop/content words
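A sketch of the sentence-level Q summaries; the 0.5 threshold is an illustrative placeholder, not the paper's value:

```python
import statistics

def q_summary_features(q_values, threshold=0.5):
    """Summary features over the Q values of a sentence's
    content-word pairs."""
    return {
        "mean":       statistics.mean(q_values),
        "median":     statistics.median(q_values),
        "min":        min(q_values),
        "max":        max(q_values),
        "frac_above": sum(q > threshold for q in q_values) / len(q_values),
    }
```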
12 Datasets
- LM and contingency tables (Q values) derived from 103 million words of Broadcast News (BN)
- From the remainder of the BN corpus and sentences sampled from the trigram LM:
- Q value distributions estimated from 100,000 sentences
- Decision tree trained and tested on 60,000 sentences
- Disregarded sentences such as:
- "Mike Stevens says it's not real"
- "We've been hearing about it"
13 Experiments
- Learners:
- C5.0 decision tree
- Boosting decision stumps with AdaBoost.MH
- Methodology (a scikit-learn analogue is sketched below):
- 5-fold cross-validation on 60,000 sentences
- Boosting for 300 rounds
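A rough analogue of this setup, assuming scikit-learn >= 1.2; the paper used C5.0 and AdaBoost.MH, for which a CART tree and scikit-learn's AdaBoost are only stand-ins, and the data here is a random placeholder for the real per-sentence feature vectors:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X = np.random.rand(1000, 12)       # placeholder: per-sentence feature vectors
y = np.random.randint(0, 2, 1000)  # placeholder: natural vs. trigram labels

tree = DecisionTreeClassifier()
stumps = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # decision stumps
    n_estimators=300,                               # 300 boosting rounds
)
for clf in (tree, stumps):
    print(type(clf).__name__, cross_val_score(clf, X, y, cv=5).mean())
```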
14 Results
15 Shannon-Style Experiment
- 50 sentences
- ½ real and ½ trigram-generated
- Stop words replaced by dashes
- 30 participants:
- Average accuracy of 73.77% (± 6)
- Best individual accuracy: 84%
- Our classifier:
- Accuracy of 78.9% (± 0.42)
16 Summary
- Introduced a set of statistical features which capture aspects of semantic coherence
- Trained a decision tree to classify with an accuracy of about 80%
- Next step: incorporate the features into an exponential LM
17 Future Work
- Combat data sparsity
- Confidence intervals
- Different correlation statistic
- Stemming or clustering vocabulary
- Evaluate derived features
- Incorporate into an exponential language model
- Evaluate the model on a practical application
18 Agreement among Participants
19 Expected Perplexity Reduction
- Semantic coherence feature is true for:
- 78% of broadcast news sentences
- 18% of trigram-generated sentences
- Kullback-Leibler divergence: 0.814
- Average perplexity reduction per word: 0.0419 (2.814/21 per sentence?)
- Features modify the probability of the entire sentence
- The effect of the feature on per-word probability is small
20 Distribution of Likelihood Ratio
[Figure: density of likelihood-ratio values; x-axis: likelihood value, y-axis: density]
21 Discrimination Task
- Natural sentence:
- but it doesn't feel like a sacrifice in a sense that you're really saying this is you know i'm meant to do things the right way and you trust it and tell the truth
- Trigram-generated:
- they just kind of free trade agreements which have been living in a ziplock bag that you say that i see university japan's daiwa bank stocks step though
22 Q Values at Distance 1
[Figure: density of Q values at distance 1, trigram-generated vs. broadcast news; x-axis: Q value, y-axis: density]
23 Q Values at Distance 3
[Figure: density of Q values at distance 3; x-axis: Q value, y-axis: density]
24 Outline
- The problem of semantic (in)coherence
- Incorporating it into the whole-sentence exponential LM
- Finding better features for this model using machine learning
- Semantic coherence features
- Experiments and results