Summarizing Email Conversations with Clue Words

1
Summarizing Email Conversations with Clue Words
  • Giuseppe Carenini
  • Raymond T. Ng
  • Xiaodong Zhou
  • Department of Computer Science
  • Univ. of British Columbia

2
Motivations of Email Summarization
  • Email overload
  • 40 to 60 emails per day, or even more
  • Personal information repository
  • Email summarization can be helpful
  • Two examples
  • Meeting
  • Access emails from mobile devices.

3
Outline
  • Characteristics of email
  • Related work
  • Our summarization approach
  • Experimental results
  • Conclusions and future work

4
Characteristics of Emails
  • Conversation structure
  • Context-related replies to previous messages
    (> 60%).
  • Hidden email
  • A hidden email is an email quoted by at least one
    email in a folder but not itself present in
    the same folder.
  • Writing style
  • Short length, informal writing, multiple authors,
    etc.

[Figure: four emails in a folder, with ">" marking quotation depth.
 m1: A, B
 m2: E, > D, >> A, >> B
 m3: > A, > B, C
 m4: > C, F, >> A, >> B, G]
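
The hidden-email definition on this slide can be illustrated with a minimal sketch. It assumes each email has already been split into fragments tagged with a quotation depth (0 for new text, 1 for ">", 2 for ">>"); the `folder` contents follow the slide's four-message example as read from the figure, and the function name `hidden_fragments` is hypothetical.

```python
# Each email is a list of (quotation_depth, fragment) pairs.
# Depth 0 is newly written text; depth 1 is quoted with ">", depth 2 with ">>".
folder = {
    "m1": [(0, "A"), (0, "B")],
    "m2": [(0, "E"), (1, "D"), (2, "A"), (2, "B")],
    "m3": [(1, "A"), (1, "B"), (0, "C")],
    "m4": [(1, "C"), (0, "F"), (2, "A"), (2, "B"), (0, "G")],
}


def hidden_fragments(folder):
    """Return fragments that are quoted somewhere but never appear as new text."""
    written = {frag for email in folder.values() for depth, frag in email if depth == 0}
    quoted = {frag for email in folder.values() for depth, frag in email if depth > 0}
    return quoted - written


print(hidden_fragments(folder))
```

Here D is quoted by m2 but was never written in any message in the folder, so it belongs to a hidden email.
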
5
Requirements for Email Summarization
  • Conversation structure
  • Context information is provided.
  • Information completeness
  • Include hidden emails as well as existing
    messages.
  • Informative summarization
  • Cover the core points of the email discussion.
  • Replacement of the original emails.

6
Outline
  • Characteristics of email
  • Related work
  • Our summarization approach
  • Result
  • Conclusions and future work

7
Related Work
  • Multi-Document Summarization (MDS)
  • Extractive: MEAD, MMR-MD
  • Abstractive/generative: MultiGen, SEA
  • Email summarization
  • Single email summarization (Muresan et al.)
  • Summarizing email threads by sentence selection
    (Rambow et al. and Wan et al.)

8
Related Work
                                          MDS methods                  Email summarization
                                          MEAD  MMR-MD  MultiGen  SEA  Muresan et al.  Rambow et al.  Wan et al.  Our method
Hidden email                              -     -       -         -    -               -              -           x
Conv. structure: thread                   -     -       -         -    -               x              x           x
Conv. structure: quotation analysis       -     -       -         -    -               -              -           x
Informative summary: sentence selection   x     x       -         -    -               x              x           x
Informative summary: lang. gen.           -     -       x         x    -               -              -           -

9
Outline
  • Characteristics of email
  • Related work
  • Our summarization approach
  • Fragment quotation graph
  • ClueWordSummarizer (CWS)
  • Result
  • Conclusions and future work

10
Framework
  • Input: a set of emails
  • Output: email summaries
  • Process:
  • Discover and represent email conversations as
    fragment quotation graphs
  • ClueWordSummarizer generates email summaries.

11
Conversation Structure - Fragment Quotation Graph
  • Complications of email conversation
  • Header information
  • E.g., subject, in-reply-to, and references.
  • Not accurate enough.
  • Quotation
  • A good indicator of email conversation (Yeh et
    al.).
  • Selective quotations reflect the conversation in
    detail.
  • Assumption: quotations reflect the conversation.
  • So we build a fragment quotation graph to
    represent the email conversation.

12
Fragment Quotation Graph
  • Create nodes
  • Compare quotations and new messages
  • a, b, c, d, e, f, g, h, i, j.
  • Create edges
  • Neighbouring quotations
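
The two steps above can be sketched as follows. This is a simplified reading, not the authors' exact procedure: it assumes fragments are already identified and tagged with a quotation depth, groups consecutive fragments of equal depth into blocks, and links neighbouring blocks so that the more deeply quoted block is the parent. The function name and the input format are hypothetical.

```python
from itertools import groupby


def build_fragment_quotation_graph(emails):
    """Return (nodes, edges); each edge is (quoted_fragment, replying_fragment)."""
    nodes, edges = set(), set()
    for email in emails:
        # Group consecutive fragments with the same quotation depth into blocks.
        blocks = [
            (depth, [frag for _, frag in group])
            for depth, group in groupby(email, key=lambda pair: pair[0])
        ]
        for _, frags in blocks:
            nodes.update(frags)
        # Link neighbouring blocks: the more deeply quoted block is treated
        # as the parent of the less deeply quoted one.
        for (d1, f1), (d2, f2) in zip(blocks, blocks[1:]):
            parents, children = (f1, f2) if d1 > d2 else (f2, f1)
            edges.update((p, c) for p in parents for c in children)
    return nodes, edges


# Hypothetical input: (quotation_depth, fragment) pairs per email.
emails = [
    [(0, "A"), (0, "B")],
    [(0, "E"), (1, "D"), (2, "A"), (2, "B")],
    [(1, "A"), (1, "B"), (0, "C")],
    [(1, "C"), (0, "F"), (2, "A"), (2, "B"), (0, "G")],
]
nodes, edges = build_fragment_quotation_graph(emails)
print(sorted(edges))
```

Linking every fragment of one block to every fragment of its neighbour is a deliberate simplification; a finer rule could restrict edges to the fragments that are directly adjacent.
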

13
Outline
  • Characteristics of email
  • Related work
  • Our summarization approach
  • Fragment quotation graph
  • ClueWordSummarizer (CWS)
  • Result
  • Conclusions and future work

14
ClueWordSummarizer
  • Clue words in the fragment quotation graph
  • A clue word in node (fragment) F is a word which
    also appears in a semantically similar form in a
    parent or a child node of F in the fragment
    quotation graph.
  • E.g.,

15
ClueWordSummarizer
  • Three types of clue words
  • Root/stem
  • settle vs. settlement
  • Synonym/antonym
  • war vs. peace
  • Loose semantic meaning
  • Friday vs. deadline
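
As a rough illustration of the first type only, the sketch below matches words that share a stem. The suffix list and the functions `crude_stem` and `same_stem` are hypothetical stand-ins for a real stemmer; synonym/antonym and loose semantic matches would need a lexical resource and are not covered.

```python
# A crude, assumed suffix list; a real system would use a proper stemmer.
SUFFIXES = ("ments", "ment", "ing", "ed", "es", "s", "e")


def crude_stem(word):
    """Repeatedly strip common English suffixes, keeping at least 3 letters."""
    word = word.lower()
    stripped = True
    while stripped:
        stripped = False
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                word = word[: -len(suffix)]
                stripped = True
                break
    return word


def same_stem(a, b):
    """True if both words reduce to the same crude stem."""
    return crude_stem(a) == crude_stem(b)


print(same_stem("settle", "settlement"))
print(same_stem("war", "peace"))
```
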

16
ClueWordSummarizer
  • ClueScore(CW, F)
  • A word CW is in a sentence S of a fragment F.
  • ClueScore(discussed, a) = 1
  • ClueScore(settle, b) = 2
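
The scoring formula itself is not preserved in this transcript. One plausible reading, used in the sketch below as an assumption, is that the score of a word in fragment F counts how often that word occurs in the parents and children of F. The toy graph and fragment texts are invented for illustration and are not the slide's example; matching is by exact lowercase word, without stemming.

```python
def clue_score(word, fragment, parents, children, texts):
    """Count occurrences of `word` in the parent and child fragments of `fragment`."""
    neighbours = parents.get(fragment, []) + children.get(fragment, [])
    target = word.lower()
    return sum(texts[n].lower().split().count(target) for n in neighbours)


# Hypothetical chain of fragments: a is quoted by b, and b is quoted by c.
parents = {"b": ["a"], "c": ["b"]}
children = {"a": ["b"], "b": ["c"]}
texts = {
    "a": "we discussed the budget",
    "b": "as discussed we should settle",
    "c": "settle it and settle soon",
}

print(clue_score("discussed", "a", parents, children, texts))
print(clue_score("settle", "b", parents, children, texts))
```
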

17
ClueWordSummarizer
  • For each conversation, rank all of the sentences
    based on their ClueScores.
  • Select the top-k sentences as the summary.
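
A minimal sketch of the ranking step, assuming each sentence already has a numeric score; the function name `summarize` and the sample scores are hypothetical.

```python
def summarize(sentences, scores, k):
    """Return the k highest-scoring sentences, best first.

    Ties keep their original order because Python's sort is stable.
    """
    ranked = sorted(zip(sentences, scores), key=lambda pair: pair[1], reverse=True)
    return [sentence for sentence, _ in ranked[:k]]


sentences = ["s1", "s2", "s3", "s4"]
scores = [1, 5, 0, 3]
print(summarize(sentences, scores, k=2))
```
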

18
Outline
  • Characteristics of email
  • Related work
  • Our summarization approach
  • Result
  • User study
  • Empirical experiments
  • Conclusions and future work

19
Result 1: User Study
  • Objective
  • Gold standard
  • How humans summarize email conversations
  • Setup
  • Dataset: 20 conversations from the Enron dataset
  • Human reviewers: 25 graduate and undergraduate
    students at UBC
  • Each sentence is evaluated by 5 different human
    reviewers.
  • Select important sentences and mark the
    essentially important ones.
  • Gold standard
  • 4 selections, of which at least 2 are essentially
    important.
  • 88 gold sentences out of the 20 conversations
    (12%).
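
The gold-standard rule can be sketched as a simple filter. Treating the two numbers as minimums is an assumption, as are the label names and the function `is_gold`.

```python
def is_gold(labels, min_selected=4, min_essential=2):
    """Decide whether a sentence is gold from its reviewers' labels.

    `labels` holds one entry per reviewer: "essential", "important",
    or None when the reviewer did not select the sentence.
    """
    selected = [label for label in labels if label is not None]
    essential = [label for label in selected if label == "essential"]
    return len(selected) >= min_selected and len(essential) >= min_essential


print(is_gold(["essential", "essential", "important", "important", None]))
print(is_gold(["essential", "important", "important", "important", None]))
```
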

20
Result 1: User Study
  • Information completeness
  • 18 gold sentences from hidden emails.
  • Hidden emails carry crucial information as well.
  • Significance of clue words
  • Clue words appear more frequently in the 88 gold
    sentences.
  • Average ratio of ClueScore in gold sentences to
    ClueScore in non-gold sentences: about 3.9

21
Result 2: Empirical Experiments
  • RIPPER
  • A machine learning classifier that decides
    whether a sentence is in the summary or not.
  • 14 features (Rambow et al.): linguistic and
    email-specific.
  • Sentence/conversation level training
  • 10-fold cross validation
  • CWS and MEAD
  • The same summary length (2%) as that of
    RIPPER.
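
For readers unfamiliar with k-fold cross validation, here is a minimal sketch of a conversation-level split. The helper `k_fold_splits` is hypothetical, and the round-robin fold assignment is only one possible choice; it says nothing about how the folds were actually drawn in the experiments.

```python
def k_fold_splits(items, k=10):
    """Yield (train, test) pairs; each item lands in exactly one test fold."""
    folds = [items[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        yield train, test


# 20 conversation ids, as in the user-study dataset size.
conversations = list(range(20))
splits = list(k_fold_splits(conversations, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Splitting by conversation rather than by sentence keeps all sentences of one conversation on the same side of the train/test boundary.
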

22
Result 2: Empirical Experiments (CWS vs. MEAD)
  • sumLen = 15%
  • CWS has a higher accuracy.
  • P-value
  • 0.077 (precision)
  • 0.049 (recall)
  • 0.053 (F-measure)
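
The three measures compared here can be computed from a system summary and the gold sentences as in the sketch below. The sentence ids are made up for illustration, and the function name is hypothetical.

```python
def precision_recall_f(selected, gold):
    """Return (precision, recall, F-measure) of selected sentences against gold ones."""
    selected, gold = set(selected), set(gold)
    hits = len(selected & gold)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(gold) if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical sentence ids: 4 selected by the summarizer, 3 in the gold set.
p, r, f = precision_recall_f({1, 2, 3, 4}, {2, 4, 6})
print(p, r, f)
```
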

23
Result 2: Empirical Experiments (CWS vs. MEAD)
  • CWS has a higher accuracy when sumLen < 30%.
  • MEAD is more accurate when sumLen is 40% or
    higher.
  • Clue words are significant in important
    sentences.

24
Result 2: Empirical Experiments (Fragment quotation graph)

25
Outline
  • Characteristics of email
  • Related work
  • Our conversation-based approach
  • Result
  • Conclusions and future work

26
Conclusions and Future Work
  • Conclusions
  • The conversation structure is important and
    deserves more attention.
  • Fragment quotation graph
  • Clue Words and ClueWordSummarizer
  • Empirical evaluation
  • Clue words frequently appear in important
    sentences.
  • CWS is accurate.

27
Future Work
  • Refine the fragment quotation graph
  • User studies on different datasets
  • Try other ML classifiers
  • Integrate CWS and other methods

28
Thank you!
  • Questions?