Predicting Sentences using N-Gram Language Models
Steffen Bickel, Peter Haider and Tobias Scheffer
Transcript of the presentation slides
1
Predicting Sentences using N-Gram Language Models
  • Steffen Bickel, Peter Haider and Tobias Scheffer

2
Overview
  • Motivation.
  • Problem Setting.
  • Prediction Algorithm.
  • Performance Measures.
  • Empirical Results.

3
Motivation
  • Text entry tasks dominate human-computer
    interaction.
  • Optimization of user interfaces for text entry is
    an important research problem.
  • Prediction of sentences can reduce input time on
  • regular keyboards,
  • small devices (PDAs, cell phones),
  • special input devices for users with motor
    disabilities.
  • ⇒ We develop and evaluate a prediction algorithm
    for sentences.

4
Application example
Dear Mr. Miller,
Please accept | our apologies for this delay!
(The user has typed the text before the bar; pressing Tab
inserts the suggested completion.)
5
Problem setting
  • Given:
  • a training corpus,
  • an initial text fragment.
  • Predict:
  • the remaining part of the current sentence,
  • but only when the confidence is above a threshold.

6
Prediction with N-gram language model
  • Goal: find the most likely completion of the typed
    sentence fragment.
  • Factorization with the chain rule.
  • N-th order Markov assumption.
  • The remaining conditional term is the N-gram
    probability (formulas reconstructed below).
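
The slide's formulas are not included in this transcript; the following LaTeX is a standard reconstruction consistent with the bullet labels, with w_1, ..., w_t denoting the typed fragment and w_{t+1}, ..., w_T the completion:

    % Goal: the most likely completion of the typed fragment
    \operatorname*{argmax}_{w_{t+1},\dots,w_T} P(w_{t+1},\dots,w_T \mid w_1,\dots,w_t)

    % Chain-rule factorization
    P(w_{t+1},\dots,w_T \mid w_1,\dots,w_t) = \prod_{j=t+1}^{T} P(w_j \mid w_1,\dots,w_{j-1})

    % N-th order Markov assumption; the right-hand side is the N-gram probability
    P(w_j \mid w_1,\dots,w_{j-1}) \approx P(w_j \mid w_{j-N+1},\dots,w_{j-1})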
7
Prediction with N-gram language model
  • N-gram probability: linear interpolation of N-gram
    models up to order 5 (see the sketch after this list).
  • Open problems for prediction:
  • how to find the argmax, i.e. decode the most likely
    prediction?
  • how many words to predict?
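
A minimal Python sketch of such a linearly interpolated model; the class name, the uniform default weights, and the maximum-likelihood count estimates are illustrative assumptions, not the authors' implementation:

    from collections import defaultdict

    class InterpolatedNgramModel:
        # Linear interpolation of N-gram models of orders 1..n; the weights
        # (lambdas) are assumed to be given, e.g. tuned on held-out data,
        # and to sum to 1.
        def __init__(self, n=5, lambdas=None):
            self.n = n
            self.lambdas = lambdas or [1.0 / n] * n
            self.counts = [defaultdict(int) for _ in range(n)]          # (k+1)-gram counts
            self.context_counts = [defaultdict(int) for _ in range(n)]  # k-word context counts
            self.vocabulary = set()

        def train(self, sentences):
            # sentences: iterable of word lists from the training corpus.
            for words in sentences:
                self.vocabulary.update(words)
                for k in range(self.n):
                    for i in range(len(words) - k):
                        gram = tuple(words[i:i + k + 1])
                        self.counts[k][gram] += 1
                        self.context_counts[k][gram[:-1]] += 1

        def prob(self, word, history):
            # Interpolated estimate of P(word | history).
            p = 0.0
            for k in range(self.n):
                context = tuple(history[-k:]) if k > 0 else ()
                denom = self.context_counts[k].get(context, 0)
                if denom > 0:
                    p += self.lambdas[k] * self.counts[k].get(context + (word,), 0) / denom
            return p

With n = 5 this corresponds to the interpolation of N-gram models up to order 5 mentioned on the slide.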

8
How to find the argmax decoding of the most likely
prediction
  • Problem: the size of the search space is
    |vocabulary|^T, where T is the prediction length
    (for example, a 10,000-word vocabulary and T = 5
    already give 10^20 candidate completions).
  • Solution: Viterbi decoding.

9
Viterbi decoding: HMM brush-up
  • Find the state sequence with maximum probability.
  • Expansion of the trellis:
  • in each iteration, keep only the maximum for each
    state.
  • Final sequence:
  • trace back the path of maximum-probability states
    (a brush-up sketch follows after the diagram).

[Trellis diagram: from the start state B, the states S1, S2, S3 are
expanded at each time step; only the best-scoring path into each state
is kept.]
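
For the brush-up, a compact Python sketch of textbook Viterbi decoding for an HMM; the dictionary-based probability tables are illustrative inputs, not data from the paper:

    def viterbi(states, start_prob, trans_prob, emit_prob, observations):
        # delta maps each state to (probability of the best path ending there, that path).
        delta = {s: (start_prob[s] * emit_prob[s][observations[0]], [s]) for s in states}
        for obs in observations[1:]:
            new_delta = {}
            for s in states:
                # Expansion of the trellis: keep only the best predecessor for each state.
                best_r = max(states, key=lambda r: delta[r][0] * trans_prob[r][s])
                p = delta[best_r][0] * trans_prob[best_r][s] * emit_prob[s][obs]
                new_delta[s] = (p, delta[best_r][1] + [s])
            delta = new_delta
        # Final sequence: trace back the maximum-probability path.
        return max(delta.values(), key=lambda v: v[0])[1]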
11
Viterbi decoding for N-grams
  • Differences from the HMM case:
  • a state is described by a sequence of N-1 words,
  • not all transitions are allowed,
  • the initial fragment defines the starting state
    (see the small illustration after the diagram).
  • Very simple example with N = 3 and vocabulary {a, b}:
  • the initial fragment is "ab".

[Trellis diagram over the states aa, ab, ba, bb; the initial fragment
"ab" fixes the starting state.]
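
A small Python illustration of why not all transitions are allowed: a state holds the last N-1 words, so a state (u, v) can only be followed by states of the form (v, w); the function name is hypothetical:

    def successor_states(state, vocabulary):
        # Drop the oldest word and append each possible next word.
        return [state[1:] + (w,) for w in vocabulary]

    # The slide's toy example: N = 3, vocabulary {a, b}, initial fragment "ab".
    print(successor_states(("a", "b"), ["a", "b"]))   # [('b', 'a'), ('b', 'b')]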
12
Viterbi decoding for N-grams

[Expanded trellis over the states aa, ab, ba, bb: backtracking the
maximum-probability path extends the initial fragment ab to aba, abaa,
and finally abaab.]
13
k-best Viterbi beam search
  • Expansion complexity is still O(|vocabulary|^N).
  • Solution: k-best Viterbi beam search.
  • Keep only the k best predictions for the next
    expansion.
  • Expansion complexity is now O(k).
  • We use k = 20 (a sketch of the pruning step follows
    below).
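
The pruning step itself is small; a Python sketch (the helper name is hypothetical):

    import heapq

    def prune(hypotheses, k=20):
        # hypotheses: (probability, word sequence) pairs from the last expansion;
        # keep only the k highest-scoring ones for the next Viterbi expansion.
        return heapq.nlargest(k, hypotheses, key=lambda h: h[0])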

14
Prediction of how many words?
  • Stopping conditions for the Viterbi expansion,
    evaluated on the highest-scoring prediction:
  • the last token is a period,
  • or the probability drops below a given threshold.

15
Algorithm for Prediction
  • Input:
  • linearly interpolated N-gram model,
  • initial sentence fragment, beam width k,
    confidence threshold.
  • Viterbi initialization.
  • Do until the last token is a period or the maximum
    probability drops below the threshold:
  • Viterbi expansion,
  • prune all but the best k elements.
  • Collect words by path backtracking.
  • Return prediction (a sketch of the full loop follows
    below).
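
A Python sketch of the whole loop, assuming the interpolated model sketched earlier (with a prob method and a vocabulary attribute); the parameter names and the max_len safeguard are illustrative, not the authors' code:

    import heapq

    def predict_completion(model, fragment, k=20, threshold=1e-4, max_len=20):
        # Each beam element carries its full word list, which replaces explicit
        # path backtracking in this formulation.
        beam = [(1.0, [])]                        # (probability, predicted words so far)
        for _ in range(max_len):
            candidates = []
            for prob, words in beam:              # Viterbi expansion
                history = fragment + words        # fragment: list of already typed words
                for w in model.vocabulary:
                    p = prob * model.prob(w, history)
                    if p > 0.0:
                        candidates.append((p, words + [w]))
            if not candidates:
                break
            beam = heapq.nlargest(k, candidates, key=lambda c: c[0])   # prune to k best
            best_prob, best_words = beam[0]
            # Stop when the best prediction ends in a period or its probability
            # drops below the confidence threshold.
            if best_words[-1] == "." or best_prob < threshold:
                break
        best_prob, best_words = beam[0]
        return best_words if best_prob >= threshold else None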

16
Performance Measures
  • A prediction counts as correct only when all of its
    words are correct.
  • Problem-specific adaptation of precision and recall
    (formulas sketched below):
  • precision: ratio of characters the user accepts to all
    characters the user has to scan; measures unnecessary
    distraction.
  • recall: fraction of saved keystrokes.
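
The formulas behind these measures are not in the transcript; consistent with the bullet descriptions, a reconstruction in LaTeX would be the following (the exact definitions are given in the paper):

    \text{precision} = \frac{\text{characters of accepted predictions}}
                            {\text{characters of all suggested predictions the user scans}}

    \text{recall} = \frac{\text{characters of accepted predictions}}
                         {\text{characters of the entire text}}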

17
Reference Solution
  • Instance-based prediction (Grabski and Scheffer,
    2003).
  • Select the nearest neighbor to the initial fragment
    within the training collection.
  • The distance to the nearest neighbor gives the
    confidence.
  • Distance measure: cosine similarity in a TF-IDF
    bag-of-words vector space (see the sketch below).
  • An indexing structure allows retrieval in sub-linear
    time.
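
A minimal Python sketch of this baseline using scikit-learn for the TF-IDF vectors and cosine similarity; the original work uses its own indexing structure for sub-linear retrieval, which is not reproduced here, and the threshold value is an arbitrary placeholder:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def build_index(training_sentences):
        vectorizer = TfidfVectorizer()
        matrix = vectorizer.fit_transform(training_sentences)
        return vectorizer, matrix

    def nearest_completion(fragment, vectorizer, matrix, training_sentences, threshold=0.5):
        sims = cosine_similarity(vectorizer.transform([fragment]), matrix)[0]
        best = sims.argmax()
        # The similarity to the nearest neighbor serves as the confidence score.
        return training_sentences[best] if sims[best] >= threshold else None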

18
Empirical Studies
  • Comparison of N-gram and instance-based prediction.
  • Prediction behavior on different document
    collections.
  • Four data collections.

19
Evaluation Protocol
  • Split into training and test data.
  • Random sampling of 1,000 sentences from the test
    collection,
  • random split of each sentence into an initial
    fragment and the remainder to be predicted
    (a sketch of this sampling step follows below).
  • Human evaluators judge whether they accept the
    prediction, at different confidence levels.
  • Computation of precision/recall curves.
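
A Python sketch of the sampling step only (the human judgment of acceptability obviously cannot be automated); the function name is hypothetical:

    import random

    def split_sentence(words):
        # Split a test sentence (at least two words) at a random position into
        # an initial fragment and the remainder to be predicted.
        cut = random.randint(1, len(words) - 1)
        return words[:cut], words[cut:]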

20
Precision/Recall Curves
  • The N-gram method shows a better precision/recall
    profile than the instance-based method.


22
Example Predictions
23
Prediction Time
  • Dependent on prediction length

24
Conclusion
  • Optimization of human-computer interaction for
    natural language text entry tasks.
  • N-gram-based sentence prediction algorithm.
  • New problem-specific definition of
    precision/recall.
  • The benefit depends on the application setting:
  • service center: 60% keystroke savings, 80%
    acceptable suggestions;
  • Enron: 2% keystroke savings, 50% acceptable
    suggestions.