1
Learning Within-Sentence Semantic Coherence
  • Elena Eneva
  • Rose Hoberman
  • Lucian Lita
  • Carnegie Mellon University

2
Semantic (in)Coherence
  • Trigram-generated text: content words unrelated
  • Effect on speech recognition:
  • Actual Utterance: THE BIRD FLU HAS AFFECTED
    CHICKENS FOR YEARS BUT ONLY RECENTLY BEGAN MAKING
    HUMANS SICK
  • Top Hypothesis: THE BIRD FLU HAS AFFECTED
    SECONDS FOR YEARS BUT ONLY RECENTLY BEGAN MAKING
    HUMAN SAID
  • Our goal: model semantic coherence

3
A Whole-Sentence Exponential Model (Rosenfeld 1997)

P(s) =def P0(s) · exp(Σi λi fi(s)) / Z

  • P0(s) is an arbitrary initial model (typically an
    N-gram)
  • The fi(s) are arbitrary computable properties of s
    (a.k.a. features), weighted by the λi
  • Z is a universal normalizing constant
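
Since Z is constant across sentences, hypotheses can be ranked with the
unnormalized score alone. A minimal Python sketch of that scoring, assuming
log_p0, features, and lambdas are supplied by the caller (all names here are
illustrative, not from the paper):

  def whole_sentence_logscore(sentence, log_p0, features, lambdas):
      """Unnormalized log P(s) = log P0(s) + sum_i lambda_i * f_i(s).
      Z is constant across sentences, so it drops out when ranking."""
      score = log_p0(sentence)              # log P0(s), e.g. from an N-gram LM
      for f_i, lam_i in zip(features, lambdas):
          score += lam_i * f_i(sentence)    # f_i(s): any computable property of s
      return score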

4
A Methodology for Feature Induction
  • Given corpus T of training sentences
  • Train best-possible baseline model, P0(s)
  • Use P0(s) to generate corpus T0 of pseudo
    sentences
  • Pose a challenge: find (computable) differences
    that allow discrimination between T and T0
  • Encode the differences as features fi(s)
  • Train a new model
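
One round of this methodology, as a sketch; train_baseline, sample, and
find_discriminators are hypothetical placeholders for the paper's actual
components:

  def induce_features(T, train_baseline, find_discriminators):
      """One round of feature induction: baseline, pseudo corpus, new features."""
      p0 = train_baseline(T)                       # best-possible baseline, e.g. trigram LM
      T0 = [p0.sample() for _ in range(len(T))]    # corpus T0 of pseudo sentences
      return find_discriminators(T, T0)            # differences, encoded as features f_i(s)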

5
Discrimination Task
Are these content words generated by a trigram
model, or taken from a natural sentence?
  • - - - feel - - sacrifice - - sense - - - - - - -
    - -meant - - - - - - - - trust - - - - truth
  • - - kind - free trade agreements - - - living - -
    ziplock bag - - - - - - university japan's daiwa
    bank stocks step

6
Building on Prior Work
  • Define content words (all but the top 50 most
    frequent words)
  • Goal: model the distribution of content words in
    a sentence
  • Simplified model: pairwise co-occurrences
    (content-word pairs)
  • Collect contingency tables and calculate a
    measure of association for them (sketched below)
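
A minimal sketch of building those 2x2 contingency tables, assuming
whole-sentence co-occurrence counts (the paper's exact counting scheme may
differ):

  from collections import Counter

  def contingency_tables(sentences, stopwords):
      """For each content-word pair (u, v): (a, b, c, d) =
      (# sentences with both, u only, v only, neither).
      Pairs that never co-occur are omitted in this sketch."""
      n = len(sentences)
      content = [set(w for w in s if w not in stopwords) for s in sentences]
      occur = Counter(w for ws in content for w in ws)   # sentences containing w
      both = Counter()
      for ws in content:
          ordered = sorted(ws)
          for i, u in enumerate(ordered):
              for v in ordered[i + 1:]:
                  both[(u, v)] += 1                      # sentences with both u and v
      tables = {}
      for (u, v), a in both.items():
          b = occur[u] - a                               # u without v
          c = occur[v] - a                               # v without u
          tables[(u, v)] = (a, b, c, n - a - b - c)
      return tables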

7
Q Correlation Measure
Derived from Co-occurrence Contingency Table
  • Q values range from -1 to 1
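
The formula itself did not survive the transcript. A standard association
measure with exactly this range, computed from a 2x2 co-occurrence table with
cells A (sentences containing both words), B (first word only), C (second
only), and D (neither), is Yule's Q, presumably the measure shown here:

  Q = (AD - BC) / (AD + BC)

Q is +1 at perfect positive association, -1 at perfect negative association,
and 0 under independence (AD = BC).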

8
Density Estimates
  • We hypothesized:
  • Trigram sentences: word-pair correlation
    completely determined by distance
  • Natural sentences: word-pair correlation
    independent of distance
  • Kernel density estimation: distribution of Q
    values in each corpus, at varying distances
    (a sketch follows)
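
A sketch of the per-distance density estimates, using scipy's Gaussian KDE as
a stand-in for whatever kernel the authors actually chose:

  from scipy.stats import gaussian_kde

  def q_densities(q_by_distance):
      """Map each distance d to a density estimate over the Q values
      observed at that distance in one corpus."""
      return {d: gaussian_kde(qs) for d, qs in q_by_distance.items()}

Estimating this separately for the broadcast-news and trigram-generated
corpora gives the two families of curves plotted on the next slide.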

9
Q Distributions
(Figure: density of Q values, Q value on the x-axis,
density on the y-axis; ---- trigram-generated vs.
broadcast news)
10
Likelihood Ratio Feature
she is a country singer searching for fame and
fortune in nashville

Q(country, nashville) = 0.76, distance = 8
Pr(Q = 0.76 | d = 8, BNews) = 0.32
Pr(Q = 0.76 | d = 8, Trigram) = 0.11
Likelihood ratio = 0.32 / 0.11 ≈ 2.9
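
Given the two per-distance density estimates from the earlier sketch, the
feature is just their ratio (bn_densities and tg_densities being the
hypothetical outputs of q_densities above, one per corpus):

  def likelihood_ratio(q, d, bn_densities, tg_densities):
      """Pr(Q=q | d, BNews) / Pr(Q=q | d, Trigram) for one word pair."""
      return float(bn_densities[d](q)) / float(tg_densities[d](q))

  # For the example above: likelihood_ratio(0.76, 8, ...) = 0.32 / 0.11 ≈ 2.9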
11
Simpler Features
  • Q-value based:
  • Mean, median, min, max of Q values for content-
    word pairs in the sentence (Cai et al. 2000)
  • Percentage of Q values above a threshold
  • High/low correlations across large/small
    distances
  • Other:
  • Word and phrase repetition
  • Percentage of stop words
  • Longest sequence of consecutive stop/content
    words (a sketch of these follows)
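
A sketch of several of these features; the 0.5 threshold is an illustrative
choice, not a value from the paper:

  import statistics

  def simple_features(q_values, words, stopwords, threshold=0.5):
      """Sentence-level summary features over Q values and tokens."""
      return {
          "q_mean": statistics.mean(q_values),
          "q_median": statistics.median(q_values),
          "q_min": min(q_values),
          "q_max": max(q_values),
          "pct_q_above": sum(q > threshold for q in q_values) / len(q_values),
          "pct_stopwords": sum(w in stopwords for w in words) / len(words),
          "longest_stop_run": longest_run(words, stopwords, stop=True),
          "longest_content_run": longest_run(words, stopwords, stop=False),
      }

  def longest_run(words, stopwords, stop=True):
      """Longest run of consecutive stop words (stop=True) or content words."""
      best = cur = 0
      for w in words:
          cur = cur + 1 if (w in stopwords) == stop else 0
          best = max(best, cur)
      return best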

12
Datasets
  • LM and contingency tables (Q values) derived from
    103 million words of Broadcast News (BN)
  • Real and pseudo sentences: from the remainder of
    the BN corpus and sampled from the trigram LM
  • Q-value distributions estimated from 100,000
    sentences
  • Decision tree trained and tested on 60,000
    sentences
  • Disregarded sentences with few content words,
    such as:
  • "Mike Stevens says it's not real"
  • "We've been hearing about it"

13
Experiments
  • Learners:
  • C5.0 decision tree
  • Boosted decision stumps (AdaBoost.MH)
  • Methodology:
  • 5-fold cross-validation on 60,000 sentences
  • Boosting run for 300 rounds
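
For reference, a comparable setup in scikit-learn (C5.0 is proprietary, so a
CART tree and AdaBoost over depth-1 stumps serve as stand-ins here, not the
paper's exact learners):

  from sklearn.ensemble import AdaBoostClassifier
  from sklearn.model_selection import cross_val_score
  from sklearn.tree import DecisionTreeClassifier

  tree = DecisionTreeClassifier()
  stumps = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                              n_estimators=300)   # 300 boosting rounds
  # 5-fold cross-validation on feature matrix X and labels y:
  # scores = cross_val_score(stumps, X, y, cv=5)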

14
Results
15
Shannon-Style Experiment
  • 50 sentences
  • ½ real and ½ trigram-generated
  • Stopwords replaced by dashes
  • 30 participants
  • Average accuracy of 73.77% ± 6%
  • Best individual accuracy: 84%
  • Our classifier:
  • Accuracy of 78.9% ± 0.42%

16
Summary
  • Introduced a set of statistical features which
    capture aspects of semantic coherence
  • Trained a decision tree to classify with an
    accuracy of 80%
  • Next step: incorporate the features into the
    exponential LM

17
Future Work
  • Combat data sparsity
  • Confidence intervals
  • Different correlation statistic
  • Stemming or clustering vocabulary
  • Evaluate derived features
  • Incorporate into an exponential language model
  • Evaluate the model on a practical application

18
Agreement among Participants
19
Expected Perplexity Reduction
  • Semantic coherence feature:
  • 78% of broadcast news sentences
  • 18% of trigram-generated sentences
  • Kullback-Leibler divergence: 0.814
  • Average perplexity reduction per word: .0419
    (2^(.814/21)); per sentence?
  • Features modify probability of entire sentence
  • Effect of feature on per-word probability is small

20
Distribution of Likelihood Ratio
(Figure: likelihood value on the x-axis, density on
the y-axis)
21
Discrimination Task
  • Natural Sentence
  • but it doesn't feel like a sacrifice in a sense
    that you're really saying this is you know i'm
    meant to do things the right way and you trust it
    and tell the truth
  • Trigram-Generated
  • they just kind of free trade agreements which
    have been living in a ziplock bag that you say
    that i see university japan's daiwa bank stocks
    step though

22
Q Values at Distance 1
(Figure: Q value vs. density at distance 1;
---- trigram-generated vs. broadcast news)
23
Q Values at Distance 3
(Figure: Q value vs. density at distance 3)
24
Outline
  • The problem of semantic (in)coherence
  • Incorporating this into the whole-sentence
    exponential LM
  • Finding better features for this model using
    machine learning
  • Semantic coherence features
  • Experiments and results