Title: Summarizing Email Conversations with Clue Words
1Summarizing Email Conversations with Clue Words
- Giuseppe Carenini
- Raymond T. Ng
- Xiaodong Zhou
- Department of Computer Science
- Univ. of British Columbia
2Motivations of Email Summarization
- Email overloading
- 40-60 emails per day or even more
- Personal information repository
- Email summarization can be helpful
- Two examples
- Meeting
- Access emails from mobile devices.
3Outline
- Characteristics of email
- Related work
- Our summarization approach
- Experimental results
- Conclusions and future work
4Characteristics of Emails
- Conversation structure
- Context related reply to the previous messages. (>60)
- Hidden email
- A hidden email is an email quoted by at least one email in a folder but is not present itself in the same folder.
- Writing style
- Short length, informal writing, multiple authors, etc.
[Figure: four example emails, where > marks a quoted line]
m1: A B
m3: > A > B C
m2: E > D > > A > > B
m4: > C F > > A > > B G
5Requirements for Email Summarization
- Conversation structure
- Context information is provided.
- Information completeness
- Include hidden emails as well as existing messages.
- Informative summarization
- Cover the core points of the email discussion.
- Replacement of the original emails.
6Outline
- Characteristics of email
- Related work
- Our summarization approach
- Result
- Conclusions and future work
7Related Work
- Multi-Document Summarization (MDS)
- Extractive: MEAD, MMR-MD.
- Abstractive/Generative: MultiGen, SEA
- Email summarization
- Single email summarization (Muresan et al.)
- Summarizing email threads by sentence selection (Rambow et al. and Wan et al.)
8Related Work
Methods compared:
- MDS methods: MEAD, MMR-MD, MultiGen, SEA
- Email summarization: Muresan et al., Rambow et al., Wan et al.
- Our method
Features (each x marks one method that has the feature):
- Hidden email: x
- Conv. structure, thread: x x x
- Conv. structure, quotation analysis: x
- Informative summary, sentence selection: x x x x x
- Informative summary, lang. gen.: x x
9Outline
- Characteristics of email
- Related work
- Our summarization approach
- Fragment quotation graph
- ClueWordSummarizer (CWS)
- Result
- Conclusions and future work
10Framework
- Input: a set of emails
- Output: email summaries
- Process
- Discover and represent email conversations as fragment quotation graphs
- ClueWordSummarizer generates email summaries.
11Conversation Structure - Fragment Quotation Graph
- Complications of email conversation
- Header information
- E.g., subject, in-reply-to, and references.
- Not accurate enough.
- Quotation
- A good indication for email conversation (Yeh et al.).
- Selective quotations reflect the conversation in detail.
- Assumption: quotations reflect the conversation.
- So build a fragment quotation graph to represent the email conversation.
12Fragment Quotation Graph
- Create nodes
- Compare quotations and new messages
- a, b, c, d, e, f, g, h, i, j.
- Create edges
- Neighbouring quotations
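The two steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: each email is assumed to be already split into fragments with a known quotation depth (0 for new text, 1 for `>`, 2 for `> >`), and an edge is drawn from the newer fragment to the more deeply quoted fragment whenever two blocks are neighbours in an email. The names `quotation_edges` and `build_graph`, and the reading of the four example emails m1 to m4, are hypothetical.

```python
from itertools import groupby

def quotation_edges(email):
    """email: list of (fragment, quote_depth) pairs in reading order."""
    # Group consecutive fragments with the same quotation depth into blocks.
    blocks = [(depth, [frag for frag, _ in group])
              for depth, group in groupby(email, key=lambda item: item[1])]
    edges = set()
    # Neighbouring blocks always differ in depth; the shallower one is newer.
    for (d1, frags1), (d2, frags2) in zip(blocks, blocks[1:]):
        newer, older = (frags1, frags2) if d1 < d2 else (frags2, frags1)
        edges.update((n, o) for n in newer for o in older)
    return edges

def build_graph(emails):
    """Union the nodes and edges found in every email of a conversation."""
    edges = set()
    for email in emails:
        edges |= quotation_edges(email)
    nodes = {frag for email in emails for frag, _ in email}
    return nodes, edges

# Assumed reading of the example emails m1, m3, m2, m4.
emails = [
    [("A", 0), ("B", 0)],
    [("A", 1), ("B", 1), ("C", 0)],
    [("E", 0), ("D", 1), ("A", 2), ("B", 2)],
    [("C", 1), ("F", 0), ("A", 2), ("B", 2), ("G", 0)],
]
nodes, edges = build_graph(emails)

# Fragments that never appear as new text come from hidden emails.
hidden = nodes - {frag for email in emails for frag, depth in email if depth == 0}
```

In this toy run, fragment D only ever appears quoted, so it is the one fragment attributed to a hidden email.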
13Outline
- Characteristics of email
- Related work
- Our summarization approach
- Fragment quotation graph
- ClueWordSummarizer (CWS)
- Result
- Conclusions and future work
14ClueWordSummarizer
- Clue words in the fragment quotation graph
- A clue word in node (fragment) F is a word which also appears in a semantically similar form in a parent or a child node of F in the fragment quotation graph.
- E.g.,
15ClueWordSummarizer
- Three types of clue words
- Root/stem
- settle vs. settlement
- Synonym/antonym
- war vs. peace
- Loose semantic meaning
- Friday vs. deadline
16ClueWordSummarizer
- ClueScore(CW)
- A word CW is in a sentence S of a fragment F
- ClueScore(discussed, a) = 1
- ClueScore(settle, b) = 2
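A minimal sketch of how such a score could be computed, assuming that the ClueScore of a word in fragment F counts how often the word occurs, compared by stem, in the parent and child fragments of F. The slide does not spell out the formula, so this counting rule is an assumption. The crude suffix stripper, the function names, and the toy fragments a, b, c are all illustrative; only root/stem clue words are handled, not synonyms or looser semantic matches.

```python
import re
from collections import Counter

SUFFIXES = ("ment", "ing", "ed", "s")  # crude, illustrative stemmer only

def stem(word):
    """Strip one known suffix, keeping at least four characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word

def stems(text):
    """Count the stems of all alphabetic tokens in a fragment."""
    return Counter(stem(w) for w in re.findall(r"[A-Za-z]+", text))

def clue_score(word, fragment, texts, neighbours):
    """Occurrences of `word` (by stem) in the parents and children of `fragment`."""
    target = stem(word)
    return sum(stems(texts[n])[target] for n in neighbours[fragment])

# Toy fragments (assumed): b is a child of a, c is a child of b.
texts = {
    "a": "We discussed the contract.",
    "b": "As discussed, can we settle this?",
    "c": "They will settle after the settlement talks.",
}
neighbours = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}

score_discussed = clue_score("discussed", "a", texts, neighbours)  # 1
score_settle = clue_score("settle", "b", texts, neighbours)        # 2
```

Here "settle" in b scores 2 because its child c contains both "settle" and "settlement", which share a stem under this simple rule.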
17ClueWordSummarizer
- For each conversation, rank all of the sentences based on their ClueScores.
- Select the top-k sentences as the summary.
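The ranking and selection step can be sketched as below, assuming each sentence already carries a ClueScore. The function name `summarize` is hypothetical, and presenting the chosen sentences in their original order is a design choice made here for readability rather than something the slides state.

```python
def summarize(sentence_scores, k):
    """sentence_scores: list of (sentence, clue_score) in conversation order."""
    # Rank sentence positions by score, highest first (ties keep original order).
    ranked = sorted(range(len(sentence_scores)),
                    key=lambda i: sentence_scores[i][1],
                    reverse=True)
    # Keep the top k, then restore conversation order.
    chosen = sorted(ranked[:k])
    return [sentence_scores[i][0] for i in chosen]

summary = summarize([("s1", 0), ("s2", 3), ("s3", 1), ("s4", 2)], k=2)
```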
18Outline
- Characteristics of email
- Related work
- Our summarization approach
- Result
- User study
- Empirical experiments
- Conclusions and future work
19Result 1: User Study
- Objective
- Gold standard
- How humans summarize email conversations
- Setup
- Dataset: 20 conversations from the Enron dataset
- Human reviewers: 25 grad/undergrad students at UBC
- Each sentence is evaluated by 5 different human reviewers.
- Select important sentences and mark crucially important ones.
- Gold standard
- 4 selections, of which at least 2 mark the sentence as essentially important.
- 88 gold sentences out of the 20 conversations (12).
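One literal reading of the gold-standard rule can be sketched as follows. It assumes "4 selections" means at least 4 of the 5 reviewers selected the sentence and that at least 2 of them marked it as essential; the label strings and the function name `is_gold` are hypothetical.

```python
def is_gold(marks, min_selected=4, min_essential=2):
    """marks: one label per reviewer: "essential", "important", or None."""
    selected = sum(m is not None for m in marks)      # reviewers who picked it
    essential = sum(m == "essential" for m in marks)  # of those, marked essential
    return selected >= min_selected and essential >= min_essential

gold = is_gold(["essential", "essential", "important", "important", None])
not_gold = is_gold(["essential", "important", "important", "important", "important"])
```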
20Result 1: User Study
- Information completeness
- 18 gold sentences from hidden emails.
- Hidden emails carry crucial information as well.
- Significance of clue words
- Clue words appear more frequently in the 88 gold sentences.
- Average ratio of ClueScore in gold sentences to ClueScore in non-gold sentences ≈ 3.9
21Result 2: Empirical Experiments
- RIPPER
- A machine learning classifier: is a sentence in the summary or not?
- 14 features (Rambow et al.): linguistic and email-specific.
- Sentence/conversation level training
- 10-fold cross validation
- CWS and MEAD
- The same summary length (2) as that of RIPPER.
22Result 2: Empirical Experiments (CWS vs. MEAD)
- sumLen = 15
- CWS has a higher accuracy.
- P-value
- 0.077 (precision)
- 0.049 (recall)
- 0.053 (F-measure)
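For reference, the three accuracy measures named above can be computed for a single summary as sketched below, treating the gold sentences as the reference set. The function name and the sentence identifiers are illustrative; the significance test behind the reported P-values is not shown.

```python
def precision_recall_f(selected, gold):
    """Compare selected sentence ids against gold sentence ids."""
    selected, gold = set(selected), set(gold)
    hits = len(selected & gold)  # gold sentences the summarizer picked
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(gold) if gold else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f_measure

p, r, f = precision_recall_f(selected=[1, 2, 3, 4], gold=[2, 4, 5])
```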
23Result 2: Empirical Experiments (CWS vs. MEAD)
- CWS has a higher accuracy when sumLen < 30.
- MEAD is more accurate when sumLen is 40 and higher.
- Clue words are significant in important sentences.
24Result 2: Empirical Experiments (Fragment quotation graph)
25Outline
- Characteristics of email
- Related work
- Our conversation-based approach
- Result
- Conclusions and future work
26Conclusions and Future Work
- Conclusions
- The conversation structure is important and deserves more attention.
- Fragment quotation graph
- Clue Words and ClueWordSummarizer
- Empirical evaluation
- Clue words frequently appear in important sentences.
- CWS is accurate.
27Future Work
- Refine the fragment quotation graph
- User studies on different datasets
- Try other ML classifiers
- Integrate CWS and other methods
28Thank you!