Title: Email Mining: Extracting Collaborative Activities from EMail
1. E-mail Mining: Extracting Collaborative Activities from E-Mail
- Akiko Murakami
- Koichi Takeda
2. Contents
- Overview of our Text mining work
- Text Mining for individual text
- Text Mining for discussion text
- Text Mining for e-mail
- Discussion on E-mail mining
- Pair-mail
- Three levels of e-mail mining targets
- Preliminary study of e-mail mining
3. Text Mining
- Text mining has become one of the most influential areas of natural language processing research.
- Text mining has been extended to various domains
- CRM (Customer Relationship Management)
- Biomedical domain
- Web pages
- Discussion records
- Patents
4. Text Mining for Individual Text
Mining target: individual text
Mining unit:
- texts
- category-labeled items extracted from text using NLP
IBM TAKMI (Nasukawa, Nagano, 1999)
[Figure: TAKMI processing flow from original data to mining]
Original Data
- Structured Data: Call Taker: James; Date: Aug. 30, 2002; Duration: 10 min.; CustomerID: ADC00123
- Unstructured Data: "Q: cust sys has stopped working. A: checked cust bios and it need updated."
Linguistic Analysis
- Tagging
- Dependency Analysis
- Named Entity Extraction
- Intention Analysis
Dictionaries
- Category Dictionary
- Synonym Dictionary
Meta Data (Category: Item)
- Call Taker: James; Date: 2002/08/30; Duration: 10 min.; CustomerID: ADC00123
- Noun: Customer, Software, BIOS
- Subj...Verb: customer system..stop
- SW..Problem: BIOS..need
Mining
- Visualization
- Interactive Mining
5. TAKMI Client GUI
[Screenshot: TAKMI client GUI, with the following labeled panels]
- Mining History
- Distribution Analysis View
- Document List
- Other Mining Views
6. Text Mining for Discussion Records
Mining target: discussion records
Mining unit:
- summarized texts based on thread structure
- mail graph structures
Discussion Mining (Murakami, Nagao, 2001)
[Figure: structure of a discussion thread]
- Mail A
- Mail B: quotation from Mail A, comment on the quotation
- Mail C: quotation from Mail B, comment on the quotation
- Other labels: Linguistic Annotation, Thread Summary
7. Discussion Mining
8. Text Mining for E-mail
- Private E-mail Data
- Various structured data as mail messages
- Sender (From), Receiver (To, cc, bcc), Time Stamp, Mail unique ID, Referential ID, etc.
- Independent and relational documents are mixed in e-mail data.
- F.Y.I., invitation, CFP, etc.
- Mailing list, inquiry, request, etc.
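
The structured fields listed above can be read from a raw message with Python's standard `email` package. The following is only a minimal sketch: the sample message, the function name `structured_fields`, and the mapping of "Mail unique ID" and "Referential ID" to the `Message-ID`, `In-Reply-To`, and `References` headers are assumptions for illustration, not something the slides specify.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# Hypothetical sample message, invented for illustration.
RAW = """\
From: Alice <alice@example.com>
To: Bob <bob@example.com>
Cc: Carol <carol@example.com>
Date: Fri, 30 May 2003 10:15:00 +0900
Message-ID: <m2@example.com>
In-Reply-To: <m1@example.com>
References: <m0@example.com> <m1@example.com>
Subject: Re: draft

Thank you for the comments.
"""

def structured_fields(raw):
    """Return the structured (header) data of one mail message as a dict."""
    msg = message_from_string(raw)

    def addrs(*header_names):
        # Collect the bare addresses from all the named headers.
        values = []
        for name in header_names:
            values.extend(msg.get_all(name, []))
        return [addr for _, addr in getaddresses(values)]

    return {
        "sender": addrs("From"),
        "receivers": addrs("To", "Cc", "Bcc"),
        "timestamp": parsedate_to_datetime(msg["Date"]),
        "message_id": msg["Message-ID"],            # assumed "Mail unique ID"
        "in_reply_to": msg["In-Reply-To"],          # assumed "Referential ID"
        "references": (msg["References"] or "").split(),
    }

fields = structured_fields(RAW)
print(fields["sender"], fields["receivers"], fields["in_reply_to"])
```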
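9. Properties of e-mail messages
[Figure: document types arranged on two axes, Public vs. Private and Relative vs. Independent]
- Mining approaches shown: Discussion Mining, Text Mining, E-mail Mining
- Document types shown: Discussion, BBS, ...; Paper, Report, ...; Spam; Schedule; Mailing List; F.Y.I.; Private Mail with c.c.; Private Mail without c.c.; memo; Discourse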
10. E-mail mining
- Not suitable for annotation
- Need to consider scalability
- Shorter threads than discussion records
AND
- Lack of information such as discourse structure
- Fewer participants than in discussion records
A new concept of the e-mail mining target is required.
11. Pair-mail
- A pair-mail is formed by a reference link (reply-to information).
- Each reply-to link forms a pair-mail.
- It contains reference type information based on the previous/next mail contents
- Question/Answer, Imperative/Action, Action/Regards... etc.
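
As a rough illustration of "each reply-to link forms a pair-mail", the sketch below links every mail to the mail it replies to. The `Mail` record, its field names, and the sample messages are hypothetical; the slides do not say which header carries the reply-to information, so an `In-Reply-To`-style field is assumed here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Mail:
    message_id: str
    in_reply_to: Optional[str]  # assumed carrier of the reply-to link
    sender: str
    body: str

def pair_mails(mails):
    """Return one (previous mail, reply) tuple per resolvable reply-to link."""
    by_id = {m.message_id: m for m in mails}
    pairs = []
    for m in mails:
        parent = by_id.get(m.in_reply_to)
        if parent is not None:  # skip replies whose parent is not in the data
            pairs.append((parent, m))
    return pairs

mails = [
    Mail("<1>", None, "alice", "Could you send me the draft?"),
    Mail("<2>", "<1>", "bob", "Here it is."),
    Mail("<3>", "<2>", "alice", "Thank you."),
    Mail("<4>", "<9>", "carol", "Reply to a mail outside the collection."),
]
pairs = pair_mails(mails)
for previous, reply in pairs:
    print(previous.message_id, "->", reply.message_id)
```

A thread of n linked messages yields n - 1 pair-mails, so the middle mail above appears once as a reply and once as a previous mail.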
12. Mining Target (mining units)
- Three levels of mining target in mail data
- 1st level: e-mail
- an individual e-mail as a single entity
- 2nd level: pair-mail
- a pair of e-mails linked by a reply-to relation
- 3rd level: thread
- a chain of e-mail messages
[Figure label: Scalability, ranging from High to Low across the three levels]
13. Preliminary study
14. Examples of mail mining
- Mail data for one month (May 2003)
- Business-related mails
- discussion with a co-author of my paper
- meeting invitations
- mail magazines and mailing list messages are received in another account
- Including messages I sent
- Volume: 380 mail messages
- (19 mail messages per working day)
15. Thread Properties
- Thread structure is extracted based on the header information (Reference ID).
- Average length of threads
- 1.60 mail messages (238 threads).
- However, most mail messages are of the individual type
- Average length excluding individual mails is 3.09 mail messages (68 threads).
- Most threads are shorter than 3 messages
- Long threads (over 4 messages): only 16
- The average number of participants in long threads (more than 4 messages) is 3.5.
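
The thread statistics above could be computed along the following lines. This is a sketch under stated assumptions: each message is reduced to a single parent link (derived from its reference header), the data contain no reply cycles, and the names `build_threads`, `thread_stats`, and the toy `parent_of` mapping are hypothetical. The toy data are not the May 2003 collection.

```python
from collections import defaultdict

def build_threads(parent_of):
    """Group message ids into threads.

    parent_of maps each message id to the id it replies to, or None.
    Assumes there are no reply cycles in the data.
    """
    def root(mid):
        # Walk up the reply chain until the parent is missing or unknown.
        while parent_of.get(mid) in parent_of:
            mid = parent_of[mid]
        return mid

    threads = defaultdict(list)
    for mid in parent_of:
        threads[root(mid)].append(mid)
    return list(threads.values())

def thread_stats(threads):
    """Average thread length, with and without individual (length-1) mails."""
    lengths = [len(t) for t in threads]
    multi = [n for n in lengths if n > 1]
    return {
        "threads": len(lengths),
        "avg_length": sum(lengths) / len(lengths),
        "multi_threads": len(multi),
        "avg_length_multi": sum(multi) / len(multi) if multi else 0.0,
    }

# Toy data: one 3-message thread, one individual mail, and one reply
# whose parent ("x") is not in the collection.
parent_of = {"a": None, "b": "a", "c": "b", "d": None, "e": "x"}
threads = build_threads(parent_of)
stats = thread_stats(threads)
print(stats)
```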
16. Changes in numbers of thread participants
Considering pair-mail properties (e.g., the shift in the number of participants) helps to extract the relevant information.
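
One way to measure the shift in the number of participants across a pair-mail is to compare the address sets of the two messages. The sketch below assumes that "participants" means the union of sender and To/cc receivers; the dict layout and the function names are hypothetical.

```python
def participants(mail):
    """All addresses involved in one mail: sender plus To and cc receivers."""
    return {mail["from"], *mail["to"], *mail["cc"]}

def participant_shift(first, second):
    """Compare the participants of the previous mail and of its reply."""
    before, after = participants(first), participants(second)
    return {
        "added": after - before,      # people brought into the exchange
        "dropped": before - after,    # people left out of the reply
        "delta": len(after) - len(before),
    }

first = {"from": "alice", "to": ["bob"], "cc": []}
second = {"from": "bob", "to": ["alice"], "cc": ["carol", "dave"]}
shift = participant_shift(first, second)
print(shift["delta"], sorted(shift["added"]))
```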
17. Pair-mail Extraction
- Extract pair-mails whose second mail contains a particular expression
- e.g., a gratitude expression such as "Thank you."
- These pair-mails contain some relation to the expression
- in the example, the gratitude expression is a result of some action in the previous mail
[Figure: a pair-mail in which the previous mail contains an Action and the second mail contains "thank you..."]
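
A minimal sketch of this kind of extraction is a pattern match on the second mail of each pair. The pattern list and the function name `gratitude_pairs` are assumptions for illustration; the slides do not describe how the expressions were actually matched.

```python
import re

# Hypothetical list of gratitude expressions; extend as needed.
GRATITUDE = re.compile(r"\b(thank you|thanks|many thanks)\b", re.IGNORECASE)

def gratitude_pairs(pairs):
    """Keep the (previous body, reply body) pairs whose reply expresses gratitude."""
    return [(first, second) for first, second in pairs if GRATITUDE.search(second)]

pairs = [
    ("I uploaded the revised slides.", "Thank you, I will check them."),
    ("When is the meeting?", "It starts at 3 pm."),
]
hits = gratitude_pairs(pairs)
for first, second in hits:
    # The previous mail is where the action being thanked for should be found.
    print("action candidate:", first)
```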
18. Result of pair-mail extraction
Extracted 106 pair-mails
- Most of the expressions are found in the previous mail included as an attachment
- data cleansing is required
- In the rest of the results, we can find the action described in the previous mail. About 40 are expressions of gratitude for actions described in the mail (8 are for information) and 10 are for real-world actions.
- 5 are platitudinous expressions.
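
The data cleansing mentioned above might look like the following sketch, which drops quoted text and a trailing signature before matching expressions. It rests on assumptions not stated in the slides: that the previous mail appears as lines quoted with ">", and that signatures follow the conventional "-- " delimiter line. The function name `cleanse` is hypothetical.

```python
def cleanse(body):
    """Remove quoted lines and the signature block from a mail body."""
    kept = []
    for line in body.splitlines():
        if line.rstrip() == "--":          # conventional signature delimiter
            break                          # everything after it is signature
        if line.lstrip().startswith(">"):  # assumed quoting of the previous mail
            continue
        kept.append(line)
    return "\n".join(kept).strip()

body = (
    "Thank you for the file.\n"
    "> I attached the file.\n"
    "> Thank you in advance.\n"
    "-- \n"
    "Alice"
)
cleaned = cleanse(body)
print(cleaned)
```

With the quoted lines removed, an expression such as "Thank you in advance." in the previous mail is no longer counted as part of the reply.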
19. Summary
- Text Mining for e-mail
- Text Mining for individual and relational text
- Introduce the new mining unit
- Three levels of e-mail mining targets
- single mail, pair-mail, thread
- Preliminary study of e-mail mining
- Pair-mail information is important in threads.
- Data cleansing is needed.
- Remove signatures, attachments, ...