Sentence completion - PowerPoint PPT Presentation


1
Sentence completion
  • Korinna Grabski and Tobias Scheffer. Sentence
    completion. In the 27th Annual International ACM
    SIGIR Conference (SIGIR 2004), 2004.
  • Presenter: Suhan Yu

2
Introduction
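  • [Flowchart] Q: query (the first words of a sentence).
  • A retrieval algorithm or a cluster algorithm searches
    a specific document collection and returns S, the
    retrieved sentence.
  • Similarity > threshold? Yes: propose S. No: don't
    propose.
  • A proposed sentence is either accepted or unaccepted.
  • All queries therefore split into "propose" and
    "don't propose".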
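
The decision step of the flowchart can be sketched in Python. This is only an illustration: `propose_completion`, the toy `prefix_overlap` similarity, and the threshold value 0.5 are hypothetical choices, not taken from the paper.

```python
from typing import Callable, Optional, Sequence


def prefix_overlap(query: str, sentence: str) -> float:
    """Toy similarity (an assumption): fraction of query words that
    match the sentence's words at the same position."""
    q, s = query.lower().split(), sentence.lower().split()
    if not q:
        return 0.0
    matches = sum(1 for a, b in zip(q, s) if a == b)
    return matches / len(q)


def propose_completion(
    query: str,
    sentences: Sequence[str],
    similarity: Callable[[str, str], float],
    threshold: float,
) -> Optional[str]:
    """Return the most similar sentence if it clears the threshold, else None."""
    if not sentences:
        return None
    best = max(sentences, key=lambda s: similarity(query, s))
    # Propose only when the similarity exceeds the threshold.
    return best if similarity(query, best) > threshold else None


collection = [
    "your order will be shipped on monday",
    "please find the invoice attached",
]
print(propose_completion("your order will", collection, prefix_overlap, 0.5))
print(propose_completion("thank you for", collection, prefix_overlap, 0.5))
```

The first call proposes the shipping sentence; the second returns `None` because no sentence is similar enough to the query.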
3
Problem Setting
  • Problem setting
  • Given a domain specific document collection.
  • Given an initial document fragment.
  • The sentence completion problem is to identify
    the remaining part of the sentence that the user
    currently intends to write.
  • This problem setting can be generalized along
    several dimensions:
  • Consider additional attributes of the current
    communication process.
  • It may be more natural to predict fewer words as the
    uncertainty about the remaining words of the sentence
    increases.

4
Evaluation
  • Evaluate the performance of sentence completion
    methods
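
The slide's formulas are not preserved in the transcript. A common way to score such a system, assumed here, is precision as accepted proposals over all proposals, and recall as accepted proposals over all queries. The function and parameter names below are hypothetical.

```python
from typing import Tuple


def precision_recall(n_queries: int, n_proposed: int, n_accepted: int) -> Tuple[float, float]:
    """Assumed definitions (not confirmed by the slides):
    precision = accepted / proposed, recall = accepted / all queries."""
    precision = n_accepted / n_proposed if n_proposed else 0.0
    recall = n_accepted / n_queries if n_queries else 0.0
    return precision, recall


# Illustrative numbers only, not results from the paper.
p, r = precision_recall(n_queries=100, n_proposed=40, n_accepted=30)
print(p, r)
```

Raising the similarity threshold typically trades recall for precision, because fewer but safer completions are proposed.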

5
Retrieval algorithm
  • [Equation not preserved; its annotations were:]
  • length l
  • the number of sentences containing term t_i
  • the jth sentence
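
The annotation "the number of sentences containing term t_i" suggests an inverse-frequency weight. The sketch below assumes the standard TF-IDF form, with weight log(n / n(t_i)) and unit-length normalization. The exact formula on the slide is not recoverable, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def sentence_frequencies(sentences: List[List[str]]) -> Counter:
    """Count, for each term, the number of sentences that contain it."""
    df = Counter()
    for words in sentences:
        df.update(set(words))
    return df


def tfidf_vector(words: List[str], df: Counter, n_sentences: int) -> Dict[str, float]:
    """Assumed weighting: term count times log(n / n(t_i)), scaled to unit length."""
    tf = Counter(words)
    vec = {
        term: count * math.log(n_sentences / df[term])
        for term, count in tf.items()
        if df[term] > 0  # terms unseen in the collection are skipped
    }
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else vec


corpus = [
    ["your", "order", "will", "ship"],
    ["your", "invoice", "is", "attached"],
]
df = sentence_frequencies(corpus)
vec = tfidf_vector(corpus[0], df, len(corpus))
print(vec)
```

A term that occurs in every sentence, such as "your" in this toy corpus, receives weight zero and does not influence the ranking.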
6
Indexing algorithm
  • Inverted index structure over the data
  • Sorted: one entry precedes another if it
  • appears in the document collection more
    frequently than the other, or
  • appears equally frequently and is
    alphabetically smaller than the other.
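
A minimal sketch of such an index follows. The slide's symbols are lost, so it is an assumption here that the ordering applies to the sentences within each term's postings list. `build_inverted_index` is a hypothetical name.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def build_inverted_index(sentences: List[str]) -> Dict[str, List[str]]:
    """Map each term to the distinct sentences containing it."""
    freq = Counter(sentences)  # how often each distinct sentence occurs
    index = defaultdict(set)
    for sentence in freq:
        for term in sentence.split():
            index[term].add(sentence)
    # More frequent sentences first; ties are broken alphabetically.
    return {
        term: sorted(postings, key=lambda s: (-freq[s], s))
        for term, postings in index.items()
    }


corpus = [
    "thank you for your order",
    "your order has shipped",
    "your order has shipped",
    "thank you for your email",
]
index = build_inverted_index(corpus)
print(index["your"])
```

With a fixed ordering, the first entries of a postings list are the most frequent candidates, so a search can inspect those first.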

7
Retrieval algorithm
  • Similarity between f (query) and s (sentence)

  • The similarity lies between 0 and 1.
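
The similarity formula itself is missing from the transcript. Cosine similarity over sparse term-weight vectors is assumed in the sketch below; for non-negative weights it yields values between 0 and 1, which matches the range stated on the slide. The function name is hypothetical.

```python
import math
from typing import Dict


def cosine_similarity(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_u * norm_v)


fragment = {"your": 1.0, "order": 1.0}
sentence = {"your": 1.0, "order": 1.0, "shipped": 1.0}
print(cosine_similarity(fragment, sentence))
```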
8
Retrieval algorithm
9
Data compression by clustering
  • Run the EM algorithm with a mixture of two
    Gaussians recursively.
  • Each data element is assigned to the cluster
    with the higher likelihood.
  • Recursion stops when the size of a cluster falls
    below a threshold, or the variance within the
    cluster falls below a variance threshold.
  • The result of the clustering algorithm is a tree
    of clusters.
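
The recursive procedure above can be sketched as follows. This is a simplified illustration under stated assumptions: it clusters one-dimensional numbers rather than sentence vectors, uses plain EM with fixed iterations, and the names `em_two_gaussians`, `cluster_tree`, `min_size`, and `min_var` are hypothetical.

```python
import math
from statistics import pvariance
from typing import Dict, List


def log_gauss(x: float, mu: float, var: float) -> float:
    """Log-density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def em_two_gaussians(xs: List[float], iters: int = 50):
    """Fit a two-component 1-D Gaussian mixture with plain EM."""
    mu = [min(xs), max(xs)]
    var = [max(pvariance(xs), 1e-6)] * 2
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            logp = [math.log(pi[k]) + log_gauss(x, mu[k], var[k]) for k in range(2)]
            top = max(logp)
            p = [math.exp(lp - top) for lp in logp]
            resp.append([pk / sum(p) for pk in p])
        # M-step: re-estimate weight, mean, and variance per component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-9:
                continue  # leave a vanishing component unchanged
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk, 1e-6)
            pi[k] = nk / len(xs)
    return mu, var, pi


def cluster_tree(xs: List[float], min_size: int = 4, min_var: float = 1e-3) -> Dict:
    """Split recursively until a cluster is too small or too tight."""
    node = {"size": len(xs), "children": []}
    if len(xs) < min_size or pvariance(xs) < min_var:
        return node  # leaf
    mu, var, pi = em_two_gaussians(xs)
    parts: List[List[float]] = [[], []]
    for x in xs:
        # Assign to the component with the higher weighted likelihood.
        scores = [math.log(pi[k]) + log_gauss(x, mu[k], var[k]) for k in range(2)]
        parts[0 if scores[0] >= scores[1] else 1].append(x)
    if parts[0] and parts[1]:  # stop if the split is degenerate
        node["children"] = [cluster_tree(p, min_size, min_var) for p in parts]
    return node


data = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
tree = cluster_tree(data, min_size=4)
print(tree)
```

The nested dictionaries mirror the tree of clusters: each node records its size, and leaves have an empty `children` list.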

[Figure: cluster tree with node sizes. 100 splits into 56 and 44; 56 into 20 and 36; 44 into 39 and 5.]
10
Data compression by clustering
[Figure: cluster tree searched for a fragment. Node sizes shown: 100, 56, 44, 36, 39, 20, 24, 15, 15, 21, 15, 16.]
11
Data compression by clustering
  • Characteristic sentences from the ten largest
    clusters.

12
Empirical studies
  • Use two collections
  • Collection A
  • Provided by an online education provider and
    contains emails that have been sent to students.
  • Collection B
  • Provided by a large online shop and contains
    emails that have been sent in reply to customer
    requests.
  • Both collections have the same size, around 10,000
    sentences.
  • Random split into a training set (75%) and a test
    set (25%).
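
A random 75/25 split can be sketched as below. The slide does not say whether whole emails or individual sentences were split; splitting at the sentence level is an assumption here, and `train_test_split` with its `seed` parameter is a hypothetical helper.

```python
import random
from typing import List, Tuple


def train_test_split(
    items: List[str], train_fraction: float = 0.75, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Shuffle a copy of the items and cut it at the given fraction."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatability
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder data standing in for a collection of about 10,000 sentences.
sentences = [f"sentence {i}" for i in range(10000)]
train, test = train_test_split(sentences)
print(len(train), len(test))
```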

13
Empirical studies
  • Collection A

14
Empirical studies
  • Collection B

15
Empirical studies
16
Conclusion
  • Comparing the retrieval to the clustering
    approach we can conclude that the retrieval
    method, on average, has higher precision and
    recall.
  • This paper investigates methods that may
    predict some succeeding words, but not
    necessarily the complete remainder of the current
    sentence.
  • For example, "Your order" may proceed as "will be
    shipped on", but it may not be possible to predict
    whether the final word is "Monday" or "Tuesday".