Predicting Sentences using N-Gram Language Models
Steffen Bickel, Peter Haider and Tobias Scheffer
Transcript of the presentation slides
1
Predicting Sentences using N-Gram Language Models
  • Steffen Bickel, Peter Haider and Tobias Scheffer

2
Overview
  • Motivation.
  • Problem Setting.
  • Prediction Algorithm.
  • Performance Measures.
  • Empirical Results.

3
Motivation
  • Text entry tasks dominate human-computer
    interaction.
  • Optimization of user interfaces for text entry is
    an important research problem.
  • Prediction of sentences can reduce input time on
  • regular keyboards,
  • small devices (PDAs, cell phones),
  • special input devices for users with motor
    disabilities.
  • ⇒ We develop and evaluate a prediction algorithm
    for sentences.

4
Application example
Dear Mr. Miller,
Please accept | our apologies for this delay!
(The user has typed the text before the bar; pressing Tab
inserts the suggested completion.)
5
Problem setting
  • Given:
  • a training corpus,
  • an initial text fragment.
  • Predict:
  • the remaining part of the current sentence,
  • but only when the confidence is above a threshold.

6
Prediction with N-gram language model
  • Goal: find the most likely completion of the typed
    sentence fragment.
  • Factorization with the chain rule.
  • N-th order Markov assumption.
  • The remaining conditional term is the N-gram
    probability (formulas reconstructed below).
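
The slide's formulas are not included in this transcript; the following LaTeX is a standard reconstruction consistent with the bullet labels, with w_1, ..., w_t denoting the typed fragment and w_{t+1}, ..., w_T the completion:

    % Goal: the most likely completion of the typed fragment
    \operatorname*{argmax}_{w_{t+1},\dots,w_T} P(w_{t+1},\dots,w_T \mid w_1,\dots,w_t)

    % Chain-rule factorization
    P(w_{t+1},\dots,w_T \mid w_1,\dots,w_t) = \prod_{j=t+1}^{T} P(w_j \mid w_1,\dots,w_{j-1})

    % N-th order Markov assumption; the right-hand side is the N-gram probability
    P(w_j \mid w_1,\dots,w_{j-1}) \approx P(w_j \mid w_{j-N+1},\dots,w_{j-1})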
7
Prediction with N-gram language model
  • N-gram probability: linear interpolation of N-gram
    models up to order 5 (see the sketch after this list).
  • Open problems for prediction:
  • how to find the argmax, i.e. decode the most likely
    prediction?
  • how many words to predict?
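
A minimal Python sketch of such a linearly interpolated model; the class name, the uniform default weights, and the maximum-likelihood count estimates are illustrative assumptions, not the authors' implementation:

    from collections import defaultdict

    class InterpolatedNgramModel:
        # Linear interpolation of N-gram models of orders 1..n; the weights
        # (lambdas) are assumed to be given, e.g. tuned on held-out data,
        # and to sum to 1.
        def __init__(self, n=5, lambdas=None):
            self.n = n
            self.lambdas = lambdas or [1.0 / n] * n
            self.counts = [defaultdict(int) for _ in range(n)]          # (k+1)-gram counts
            self.context_counts = [defaultdict(int) for _ in range(n)]  # k-word context counts
            self.vocabulary = set()

        def train(self, sentences):
            # sentences: iterable of word lists from the training corpus.
            for words in sentences:
                self.vocabulary.update(words)
                for k in range(self.n):
                    for i in range(len(words) - k):
                        gram = tuple(words[i:i + k + 1])
                        self.counts[k][gram] += 1
                        self.context_counts[k][gram[:-1]] += 1

        def prob(self, word, history):
            # Interpolated estimate of P(word | history).
            p = 0.0
            for k in range(self.n):
                context = tuple(history[-k:]) if k > 0 else ()
                denom = self.context_counts[k].get(context, 0)
                if denom > 0:
                    p += self.lambdas[k] * self.counts[k].get(context + (word,), 0) / denom
            return p

With n = 5 this corresponds to the interpolation of N-gram models up to order 5 mentioned on the slide.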

8
How to find the argmax decoding of the most likely
prediction
  • Problem: the size of the search space is
    |vocabulary|^T, where T is the prediction length
    (for example, a 10,000-word vocabulary and T = 5
    already give 10^20 candidate completions).
  • Solution: Viterbi decoding.

9
Viterbi decoding: HMM brush-up
  • Find the state sequence with maximum probability.
  • Expansion of the trellis:
  • in each iteration, keep only the maximum for each
    state.
  • Final sequence:
  • trace back the path of maximum-probability states
    (a brush-up sketch follows after the diagram).

[Trellis diagram: from the start state B, the states S1, S2, S3 are
expanded at each time step; only the best-scoring path into each state
is kept.]
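
For the brush-up, a compact Python sketch of textbook Viterbi decoding for an HMM; the dictionary-based probability tables are illustrative inputs, not data from the paper:

    def viterbi(states, start_prob, trans_prob, emit_prob, observations):
        # delta maps each state to (probability of the best path ending there, that path).
        delta = {s: (start_prob[s] * emit_prob[s][observations[0]], [s]) for s in states}
        for obs in observations[1:]:
            new_delta = {}
            for s in states:
                # Expansion of the trellis: keep only the best predecessor for each state.
                best_r = max(states, key=lambda r: delta[r][0] * trans_prob[r][s])
                p = delta[best_r][0] * trans_prob[best_r][s] * emit_prob[s][obs]
                new_delta[s] = (p, delta[best_r][1] + [s])
            delta = new_delta
        # Final sequence: trace back the maximum-probability path.
        return max(delta.values(), key=lambda v: v[0])[1]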
11
Viterbi decoding for N-grams
  • Differences from the HMM case:
  • a state is described by a sequence of N-1 words,
  • not all transitions are allowed,
  • the initial fragment defines the starting state
    (see the small illustration after the diagram).
  • Very simple example with N = 3 and vocabulary {a, b}:
  • the initial fragment is "ab".

[Trellis diagram over the states aa, ab, ba, bb; the initial fragment
"ab" fixes the starting state.]
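
A small Python illustration of why not all transitions are allowed: a state holds the last N-1 words, so a state (u, v) can only be followed by states of the form (v, w); the function name is hypothetical:

    def successor_states(state, vocabulary):
        # Drop the oldest word and append each possible next word.
        return [state[1:] + (w,) for w in vocabulary]

    # The slide's toy example: N = 3, vocabulary {a, b}, initial fragment "ab".
    print(successor_states(("a", "b"), ["a", "b"]))   # [('b', 'a'), ('b', 'b')]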
12
Viterbi decoding for N-grams

[Expanded trellis over the states aa, ab, ba, bb: backtracking the
maximum-probability path extends the initial fragment ab to aba, abaa,
and finally abaab.]
13
k-best Viterbi beam search
  • Expansion complexity is still O(|vocabulary|^N).
  • Solution: k-best Viterbi beam search.
  • Keep only the k best predictions for the next
    expansion.
  • Expansion complexity is now O(k).
  • We use k = 20 (a sketch of the pruning step follows
    below).
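
The pruning step itself is small; a Python sketch (the helper name is hypothetical):

    import heapq

    def prune(hypotheses, k=20):
        # hypotheses: (probability, word sequence) pairs from the last expansion;
        # keep only the k highest-scoring ones for the next Viterbi expansion.
        return heapq.nlargest(k, hypotheses, key=lambda h: h[0])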

14
Prediction of how many words?
  • Stopping conditions for the Viterbi expansion,
    evaluated on the highest-scoring prediction:
  • the last token is a period,
  • or the probability drops below a given threshold.

15
Algorithm for Prediction
  • Input:
  • linearly interpolated N-gram model,
  • initial sentence fragment, beam width k,
    confidence threshold.
  • Viterbi initialization.
  • Do until the last token is a period or the maximum
    probability drops below the threshold:
  • Viterbi expansion,
  • prune all but the best k elements.
  • Collect words by path backtracking.
  • Return prediction (a sketch of the full loop follows
    below).
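
A Python sketch of the whole loop, assuming the interpolated model sketched earlier (with a prob method and a vocabulary attribute); the parameter names and the max_len safeguard are illustrative, not the authors' code:

    import heapq

    def predict_completion(model, fragment, k=20, threshold=1e-4, max_len=20):
        # Each beam element carries its full word list, which replaces explicit
        # path backtracking in this formulation.
        beam = [(1.0, [])]                        # (probability, predicted words so far)
        for _ in range(max_len):
            candidates = []
            for prob, words in beam:              # Viterbi expansion
                history = fragment + words        # fragment: list of already typed words
                for w in model.vocabulary:
                    p = prob * model.prob(w, history)
                    if p > 0.0:
                        candidates.append((p, words + [w]))
            if not candidates:
                break
            beam = heapq.nlargest(k, candidates, key=lambda c: c[0])   # prune to k best
            best_prob, best_words = beam[0]
            # Stop when the best prediction ends in a period or its probability
            # drops below the confidence threshold.
            if best_words[-1] == "." or best_prob < threshold:
                break
        best_prob, best_words = beam[0]
        return best_words if best_prob >= threshold else None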

16
Performance Measures
  • A prediction counts as correct only when all of its
    words are correct.
  • Problem-specific adaptation of precision and recall
    (formulas sketched below):
  • precision: ratio of characters the user accepts to all
    characters the user has to scan; measures unnecessary
    distraction.
  • recall: fraction of saved keystrokes.
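
The formulas behind these measures are not in the transcript; consistent with the bullet descriptions, a reconstruction in LaTeX would be the following (the exact definitions are given in the paper):

    \text{precision} = \frac{\text{characters of accepted predictions}}
                            {\text{characters of all suggested predictions the user scans}}

    \text{recall} = \frac{\text{characters of accepted predictions}}
                         {\text{characters of the entire text}}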

17
Reference Solution
  • Instance-based prediction (Grabski and Scheffer,
    2003).
  • Select the nearest neighbor to the initial fragment
    within the training collection.
  • The distance to the nearest neighbor gives the
    confidence.
  • Distance measure: cosine similarity in a TF-IDF
    bag-of-words vector space (see the sketch below).
  • An indexing structure allows retrieval in sub-linear
    time.
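
A minimal Python sketch of this baseline using scikit-learn for the TF-IDF vectors and cosine similarity; the original work uses its own indexing structure for sub-linear retrieval, which is not reproduced here, and the threshold value is an arbitrary placeholder:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def build_index(training_sentences):
        vectorizer = TfidfVectorizer()
        matrix = vectorizer.fit_transform(training_sentences)
        return vectorizer, matrix

    def nearest_completion(fragment, vectorizer, matrix, training_sentences, threshold=0.5):
        sims = cosine_similarity(vectorizer.transform([fragment]), matrix)[0]
        best = sims.argmax()
        # The similarity to the nearest neighbor serves as the confidence score.
        return training_sentences[best] if sims[best] >= threshold else None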

18
Empirical Studies
  • Comparison of N-gram and instance-based prediction.
  • Prediction behavior on different document
    collections.
  • Four data collections.

19
Evaluation Protocol
  • Split into training and test data.
  • Random sampling of 1,000 sentences from the test
    collection,
  • random split of each sentence into an initial
    fragment and the remainder to be predicted
    (a sketch of this sampling step follows below).
  • Human evaluators judge whether they accept the
    prediction, at different confidence levels.
  • Computation of precision/recall curves.
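
A Python sketch of the sampling step only (the human judgment of acceptability obviously cannot be automated); the function name is hypothetical:

    import random

    def split_sentence(words):
        # Split a test sentence (at least two words) at a random position into
        # an initial fragment and the remainder to be predicted.
        cut = random.randint(1, len(words) - 1)
        return words[:cut], words[cut:]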

20
Precision/Recall Curves
  • The N-gram method shows a better precision/recall
    profile than the instance-based method.


22
Example Predictions
23
Prediction Time
  • Dependent on prediction length

24
Conclusion
  • Optimization of human-computer interaction for
    natural language text entry tasks.
  • N-gram-based sentence prediction algorithm.
  • New problem-specific definition of
    precision/recall.
  • The benefit depends on the application setting:
  • service center: 60% keystroke savings, 80%
    acceptable suggestions;
  • Enron: 2% keystroke savings, 50% acceptable
    suggestions.