1
Learning Within-Sentence Semantic Coherence
  • Elena Eneva
  • Rose Hoberman
  • Lucian Lita
  • Carnegie Mellon University

2
Semantic (in)Coherence
  • Trigram-generated text: content words unrelated
  • Effect on speech recognition:
  • Actual Utterance: THE BIRD FLU HAS AFFECTED
    CHICKENS FOR YEARS BUT ONLY RECENTLY BEGAN MAKING
    HUMANS SICK
  • Top Hypothesis: THE BIRD FLU HAS AFFECTED
    SECONDS FOR YEARS BUT ONLY RECENTLY BEGAN MAKING
    HUMAN SAID
  • Our goal: model semantic coherence

3
A Whole-Sentence Exponential Model (Rosenfeld 1997)

P(s) =def P0(s) · exp(Σi λi fi(s)) / Z

  • P0(s) is an arbitrary initial model (typically an
    N-gram)
  • The fi(s) are arbitrary computable properties of s
    (a.k.a. features), weighted by the λi
  • Z is a universal normalizing constant
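
Since Z is constant across sentences, hypotheses can be ranked with the
unnormalized score alone. A minimal Python sketch of that scoring, assuming
log_p0, features, and lambdas are supplied by the caller (all names here are
illustrative, not from the paper):

  def whole_sentence_logscore(sentence, log_p0, features, lambdas):
      """Unnormalized log P(s) = log P0(s) + sum_i lambda_i * f_i(s).
      Z is constant across sentences, so it drops out when ranking."""
      score = log_p0(sentence)              # log P0(s), e.g. from an N-gram LM
      for f_i, lam_i in zip(features, lambdas):
          score += lam_i * f_i(sentence)    # f_i(s): any computable property of s
      return score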

4
A Methodology for Feature Induction
  • Given corpus T of training sentences
  • Train best-possible baseline model, P0(s)
  • Use P0(s) to generate corpus T0 of pseudo
    sentences
  • Pose a challenge: find (computable) differences
    that allow discrimination between T and T0
  • Encode the differences as features fi(s)
  • Train a new model
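
One round of this methodology, as a sketch; train_baseline, sample, and
find_discriminators are hypothetical placeholders for the paper's actual
components:

  def induce_features(T, train_baseline, find_discriminators):
      """One round of feature induction: baseline, pseudo corpus, new features."""
      p0 = train_baseline(T)                       # best-possible baseline, e.g. trigram LM
      T0 = [p0.sample() for _ in range(len(T))]    # corpus T0 of pseudo sentences
      return find_discriminators(T, T0)            # differences, encoded as features f_i(s)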

5
Discrimination Task
Are these content words generated by a trigram
model, or taken from a natural sentence?
  • - - - feel - - sacrifice - - sense - - - - - - -
    - -meant - - - - - - - - trust - - - - truth
  • - - kind - free trade agreements - - - living - -
    ziplock bag - - - - - - university japan's daiwa
    bank stocks step

6
Building on Prior Work
  • Define content words (all but the top 50 most
    frequent words)
  • Goal: model the distribution of content words in
    a sentence
  • Simplified model: pairwise co-occurrences
    (content-word pairs)
  • Collect contingency tables and calculate a
    measure of association for them (sketched below)
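
A minimal sketch of building those 2x2 contingency tables, assuming
whole-sentence co-occurrence counts (the paper's exact counting scheme may
differ):

  from collections import Counter

  def contingency_tables(sentences, stopwords):
      """For each content-word pair (u, v): (a, b, c, d) =
      (# sentences with both, u only, v only, neither).
      Pairs that never co-occur are omitted in this sketch."""
      n = len(sentences)
      content = [set(w for w in s if w not in stopwords) for s in sentences]
      occur = Counter(w for ws in content for w in ws)   # sentences containing w
      both = Counter()
      for ws in content:
          ordered = sorted(ws)
          for i, u in enumerate(ordered):
              for v in ordered[i + 1:]:
                  both[(u, v)] += 1                      # sentences with both u and v
      tables = {}
      for (u, v), a in both.items():
          b = occur[u] - a                               # u without v
          c = occur[v] - a                               # v without u
          tables[(u, v)] = (a, b, c, n - a - b - c)
      return tables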

7
Q Correlation Measure
Derived from Co-occurrence Contingency Table
  • Q values range from -1 to 1
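
The formula itself did not survive the transcript. A standard association
measure with exactly this range, computed from a 2x2 co-occurrence table with
cells A (sentences containing both words), B (first word only), C (second
only), and D (neither), is Yule's Q, presumably the measure shown here:

  Q = (AD - BC) / (AD + BC)

Q is +1 at perfect positive association, -1 at perfect negative association,
and 0 under independence (AD = BC).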

8
Density Estimates
  • We hypothesized:
  • Trigram sentences: word-pair correlation
    completely determined by distance
  • Natural sentences: word-pair correlation
    independent of distance
  • Kernel density estimation: distribution of Q
    values in each corpus, at varying distances
    (a sketch follows)
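
A sketch of the per-distance density estimates, using scipy's Gaussian KDE as
a stand-in for whatever kernel the authors actually chose:

  from scipy.stats import gaussian_kde

  def q_densities(q_by_distance):
      """Map each distance d to a density estimate over the Q values
      observed at that distance in one corpus."""
      return {d: gaussian_kde(qs) for d, qs in q_by_distance.items()}

Estimating this separately for the broadcast-news and trigram-generated
corpora gives the two families of curves plotted on the next slide.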

9
Q Distributions
(Figure: density of Q values, Q value on the x-axis,
density on the y-axis; ---- trigram-generated vs.
broadcast news)
10
Likelihood Ratio Feature
she is a country singer searching for fame and
fortune in nashville

Q(country, nashville) = 0.76, distance = 8
Pr(Q = 0.76 | d = 8, BNews) = 0.32
Pr(Q = 0.76 | d = 8, Trigram) = 0.11
Likelihood ratio = 0.32 / 0.11 ≈ 2.9
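
Given the two per-distance density estimates from the earlier sketch, the
feature is just their ratio (bn_densities and tg_densities being the
hypothetical outputs of q_densities above, one per corpus):

  def likelihood_ratio(q, d, bn_densities, tg_densities):
      """Pr(Q=q | d, BNews) / Pr(Q=q | d, Trigram) for one word pair."""
      return float(bn_densities[d](q)) / float(tg_densities[d](q))

  # For the example above: likelihood_ratio(0.76, 8, ...) = 0.32 / 0.11 ≈ 2.9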
11
Simpler Features
  • Q-value based:
  • Mean, median, min, max of Q values for content-
    word pairs in the sentence (Cai et al. 2000)
  • Percentage of Q values above a threshold
  • High/low correlations across large/small
    distances
  • Other:
  • Word and phrase repetition
  • Percentage of stop words
  • Longest sequence of consecutive stop/content
    words (a sketch of these follows)
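
A sketch of several of these features; the 0.5 threshold is an illustrative
choice, not a value from the paper:

  import statistics

  def simple_features(q_values, words, stopwords, threshold=0.5):
      """Sentence-level summary features over Q values and tokens."""
      return {
          "q_mean": statistics.mean(q_values),
          "q_median": statistics.median(q_values),
          "q_min": min(q_values),
          "q_max": max(q_values),
          "pct_q_above": sum(q > threshold for q in q_values) / len(q_values),
          "pct_stopwords": sum(w in stopwords for w in words) / len(words),
          "longest_stop_run": longest_run(words, stopwords, stop=True),
          "longest_content_run": longest_run(words, stopwords, stop=False),
      }

  def longest_run(words, stopwords, stop=True):
      """Longest run of consecutive stop words (stop=True) or content words."""
      best = cur = 0
      for w in words:
          cur = cur + 1 if (w in stopwords) == stop else 0
          best = max(best, cur)
      return best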

12
Datasets
  • LM and contingency tables (Q values) derived from
    103 million words of Broadcast News (BN)
  • Real and pseudo sentences: from the remainder of
    the BN corpus and sampled from the trigram LM
  • Q-value distributions estimated from 100,000
    sentences
  • Decision tree trained and tested on 60,000
    sentences
  • Disregarded sentences with few content words,
    such as:
  • "Mike Stevens says it's not real"
  • "We've been hearing about it"

13
Experiments
  • Learners:
  • C5.0 decision tree
  • Boosted decision stumps (AdaBoost.MH)
  • Methodology:
  • 5-fold cross-validation on 60,000 sentences
  • Boosting run for 300 rounds
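
For reference, a comparable setup in scikit-learn (C5.0 is proprietary, so a
CART tree and AdaBoost over depth-1 stumps serve as stand-ins here, not the
paper's exact learners):

  from sklearn.ensemble import AdaBoostClassifier
  from sklearn.model_selection import cross_val_score
  from sklearn.tree import DecisionTreeClassifier

  tree = DecisionTreeClassifier()
  stumps = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                              n_estimators=300)   # 300 boosting rounds
  # 5-fold cross-validation on feature matrix X and labels y:
  # scores = cross_val_score(stumps, X, y, cv=5)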

14
Results
15
Shannon-Style Experiment
  • 50 sentences
  • ½ real and ½ trigram-generated
  • Stopwords replaced by dashes
  • 30 participants
  • Average accuracy of 73.77% ± 6%
  • Best individual accuracy: 84%
  • Our classifier:
  • Accuracy of 78.9% ± 0.42%

16
Summary
  • Introduced a set of statistical features which
    capture aspects of semantic coherence
  • Trained a decision tree to classify with an
    accuracy of 80%
  • Next step: incorporate the features into the
    exponential LM

17
Future Work
  • Combat data sparsity
  • Confidence intervals
  • Different correlation statistic
  • Stemming or clustering vocabulary
  • Evaluate derived features
  • Incorporate into an exponential language model
  • Evaluate the model on a practical application

18
Agreement among Participants
19
Expected Perplexity Reduction
  • Semantic coherence feature:
  • 78% of broadcast news sentences
  • 18% of trigram-generated sentences
  • Kullback-Leibler divergence: 0.814
  • Average perplexity reduction per word: .0419
    (2^(.814/21)); per sentence?
  • Features modify probability of entire sentence
  • Effect of feature on per-word probability is small

20
Distribution of Likelihood Ratio
(Figure: likelihood value on the x-axis, density on
the y-axis)
21
Discrimination Task
  • Natural Sentence
  • but it doesn't feel like a sacrifice in a sense
    that you're really saying this is you know i'm
    meant to do things the right way and you trust it
    and tell the truth
  • Trigram-Generated
  • they just kind of free trade agreements which
    have been living in a ziplock bag that you say
    that i see university japan's daiwa bank stocks
    step though

22
Q Values at Distance 1
(Figure: Q value vs. density at distance 1;
---- trigram-generated vs. broadcast news)
23
Q Values at Distance 3
(Figure: Q value vs. density at distance 3)
24
Outline
  • The problem of semantic (in)coherence
  • Incorporating this into the whole-sentence
    exponential LM
  • Finding better features for this model using
    machine learning
  • Semantic coherence features
  • Experiments and results