Email Mining: Extracting Collaborative Activities from EMail

1
E-mail Mining: Extracting Collaborative Activities from E-Mail
  • Akiko Murakami
  • Koichi Takeda

2
Contents
  • Overview of our Text mining work
  • Text Mining for individual text
  • Text Mining for discussion text
  • Text Mining for e-mail
  • Discussion on E-mail mining
  • Pair-mail
  • Three levels of e-mail mining targets
  • Preliminary study of e-mail mining

3
Text Mining
  • Text mining has become one of the most influential areas of
    natural language processing research.
  • Text mining has been extended to various domains:
  • CRM (Customer Relationship Management)
  • Biomedical domain
  • Web pages
  • Discussion records
  • Patents

4
Text Mining for Individual Text
Mining target: individual text
Mining unit:
  > texts
  > category-labeled items extracted from text using NLP
IBM TAKMI (Nasukawa, Nagano, 1999)
[Diagram labels: Original Data, Meta Data, Category, Item, Category
Dictionary, Synonym Dictionary, Linguistic Analysis, Mining,
Visualization, Interactive Mining]
Structured Data (example call record):
  Call Taker: James; Date: Aug. 30, 2002; Duration: 10 min.;
  CustomerID: ADC00123
Unstructured Data (example call text):
  Q: cust sys has stopped working. A: checked cust bios and it need
  updated.
The same record with a normalized date and extracted items:
  Call Taker: James; Date: 2002/08/30; Duration: 10 min.;
  CustomerID: ADC00123
  Noun: Customer, Software, BIOS
  Subj...Verb: customer system..stop
  SW..Problem: BIOS..need
Linguistic Analysis
  • Tagging
  • Dependency Analysis
  • Named Entity Extraction
  • Intention Analysis

5
TAKMI Client GUI
[Screenshot of the TAKMI client GUI. Labeled panels: Mining History,
Distribution Analysis View, Document List, Other Mining Views]
6
Text Mining for Discussion Records
Mining target: discussion records
Mining unit:
  > summarized texts based on thread structure
  > mail graph structures
Discussion Mining (Murakami, Nagao, 2001)
[Diagram labels: Linguistic Annotation; Mail A; Mail B (quotation from
Mail A, comment on the quotation); Mail C (quotation from Mail B,
comment on the quotation); Thread Summary]
7
Discussion Mining
8
Text Mining for E-mail
  • Private e-mail data
  • Mail messages carry various structured data
  • Sender (From), Receiver (To, cc, bcc), Time Stamp,
    Mail unique ID, Referential ID, etc.
  • Independent and relational documents are mixed in
    e-mail data.
  • F.Y.I., invitation, CFP, etc.
  • Mailing list, inquiry, request, etc.
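
As a minimal sketch of how the structured fields listed above could be read from a raw message, the following uses Python's standard `email` package. The sample message, the addresses, and the function name `extract_structured_fields` are illustrative assumptions, not part of the original study. It is also an assumption that "Mail unique ID" and "Referential ID" correspond to the Message-ID and In-Reply-To/References headers.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# Hypothetical sample message; all names and addresses are made up.
RAW = """\
From: Alice <alice@example.com>
To: Bob <bob@example.com>
Cc: Carol <carol@example.com>
Date: Fri, 30 May 2003 10:15:00 +0900
Message-ID: <m2@example.com>
In-Reply-To: <m1@example.com>
References: <m1@example.com>
Subject: Re: draft

Thank you for the draft.
"""

def extract_structured_fields(raw):
    """Return the header-level (structured) data of one mail message."""
    msg = message_from_string(raw)
    # Collect To and Cc addresses; Bcc is normally absent from received mail.
    receivers = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    return {
        "sender": getaddresses([msg.get("From", "")])[0][1],
        "receivers": [addr for _, addr in receivers],
        "timestamp": parsedate_to_datetime(msg["Date"]),
        "message_id": msg.get("Message-ID"),
        "in_reply_to": msg.get("In-Reply-To"),
        "references": (msg.get("References") or "").split(),
    }

fields = extract_structured_fields(RAW)
```

A message with no In-Reply-To header yields `None` for `in_reply_to`, which is one simple way to tell independent documents from relational ones.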

9
Properties of e-mail messages
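[Figure: types of text arranged along two axes, Public vs. Private and
Relative vs. Independent. Labeled regions: Discussion Mining, Text
Mining, E-mail Mining. Labeled items: Discussion, BBS, ...; Spam;
Schedule; Mailing List; F.Y.I.; Private Mail with c.c.; Private Mail
without c.c.; Paper, Report, ...; memo; Discourse]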
10
E-mail mining
  • Not suitable for annotation
  • Scalability must be considered
  • Threads are shorter than in discussion records

AND
  • Information such as discourse structure is lacking
  • There are fewer participants than in discussions

A new concept of the e-mail mining target is required.
11
Pair-mail
  • A pair-mail is formed by a reference link (reply-to
    information).
  • Each reply-to link forms one pair-mail.
  • It carries reference-type information based on the
    contents of the previous and next mail
  • Question/Answer, Imperative/Action,
    Action/Regards, etc.
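
A minimal sketch of forming pair-mails from reply-to links, assuming each message has already been reduced to a message ID and the ID it replies to. The `Mail` record, the sample data, and `build_pair_mails` are hypothetical names introduced for illustration. Classifying the reference type (Question/Answer and so on) is not attempted here.

```python
from typing import NamedTuple, Optional

class Mail(NamedTuple):
    message_id: str
    in_reply_to: Optional[str]  # ID of the mail being replied to, if any
    sender: str
    body: str

def build_pair_mails(mails):
    """Return one (previous mail, reply mail) tuple per reply-to link."""
    by_id = {m.message_id: m for m in mails}
    pairs = []
    for m in mails:
        parent = by_id.get(m.in_reply_to)
        # Skip mails with no reply-to link or whose parent is not in the data.
        if parent is not None:
            pairs.append((parent, m))
    return pairs

# Hypothetical mailbox: a three-message chain plus one independent mail.
mails = [
    Mail("<1>", None, "alice", "Could you send the draft?"),
    Mail("<2>", "<1>", "bob", "Here it is."),
    Mail("<3>", "<2>", "alice", "Thank you."),
    Mail("<4>", None, "carol", "FYI: seminar next week"),
]
pair_ids = [(p.message_id, r.message_id) for p, r in build_pair_mails(mails)]
```

A chain of three messages produces two pair-mails, so the middle message appears once as a reply and once as a previous mail.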

12
Mining Target (mining units)
  • Three levels of mining target in mail data
  • 1st level: e-mail
  • an individual e-mail as a single unit
  • 2nd level: pair-mail
  • a pair of e-mails linked by a reply-to relation
  • 3rd level: thread
  • a chain of e-mail messages

[Side axis: Scalability, from High (1st level) to Low (3rd level)]
13
Preliminary study
14
Examples of mail mining
  • Mail data for one month (May 2003)
  • Business-related mail
  • Discussions with the co-author of my paper
  • Meeting invitations
  • Mail magazines and mailing-list messages are
    received in another account
  • Includes messages I sent
  • Volume: 380 mail messages
  • (19 mail messages per working day)

15
Thread Properties
  • Thread structure is extracted from the header
    information (Reference ID).
  • Average length of threads:
  • 1.60 mail messages (238 threads).
  • However, most mail messages are of the individual type
  • Average length excluding individual mail is 3.09
    mail messages (68 threads).
  • Most threads are shorter than 3 messages
  • There are only 16 long threads (over 4 messages)
  • The average number of participants in long threads
    (more than 4 messages) is 3.5.
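
The thread statistics above can be sketched as follows, assuming each message has been reduced to a mapping from its ID to the ID it replies to. The function names and the toy mailbox are hypothetical. The last lines check that the reported figures are mutually consistent under the assumption that 238 - 68 = 170 threads are single messages, leaving 380 - 170 = 210 messages in the 68 longer threads.

```python
from collections import Counter

def thread_lengths(in_reply_to):
    """Map each thread root to its number of messages.

    in_reply_to: dict of message id -> parent id (None if no parent).
    """
    def root(mid):
        seen = set()  # guards against malformed, cyclic reply links
        # Follow reply links upward until the parent is unknown or absent.
        while in_reply_to.get(mid) in in_reply_to and mid not in seen:
            seen.add(mid)
            mid = in_reply_to[mid]
        return mid
    return Counter(root(mid) for mid in in_reply_to)

def average(values):
    return sum(values) / len(values)

# Toy mailbox (hypothetical ids): a <- b <- c is one thread; d and e stand alone.
toy = {"a": None, "b": "a", "c": "b", "d": None, "e": "missing-parent"}
lengths = sorted(thread_lengths(toy).values())
multi = [n for n in lengths if n > 1]  # threads excluding individual mail

# Consistency check of the reported figures (assumes 170 single-message threads).
overall_avg = round(380 / 238, 2)
multi_avg = round((380 - 170) / 68, 2)
```

Under that assumption both averages reproduce the figures on the slide: 380 / 238 rounds to 1.60 and 210 / 68 rounds to 3.09.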

16
Changes in numbers of thread participants
Considering pair-mail properties (e.g., the shift in the number of
participants) helps to extract the relevant information.
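
One way to measure the shift in participants across a pair-mail is to compare the sender and receiver sets of the two messages. This is only a sketch: the dictionary layout, the sample addresses, and the function names are assumptions made for illustration.

```python
def participants(mail):
    """All addresses involved in one mail: the sender plus every receiver."""
    return {mail["sender"], *mail["receivers"]}

def participant_shift(first, second):
    """Compare the participant sets of the two mails in a pair-mail."""
    before, after = participants(first), participants(second)
    return {
        "joined": after - before,   # addresses added in the reply
        "left": before - after,     # addresses dropped in the reply
        "delta": len(after) - len(before),
    }

# Hypothetical pair-mail: the reply adds carol on cc.
first = {"sender": "alice", "receivers": ["bob"]}
second = {"sender": "bob", "receivers": ["alice", "carol"]}
shift = participant_shift(first, second)
```

A positive `delta` indicates that the reply widened the audience, and a negative one that it narrowed.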
17
Pair-mail Extraction
  • Each extracted pair-mail contains a particular expression in
    the second mail
  • e.g., a gratitude expression such as "Thank you."
  • These pair-mails contain some relation to the
    expression
  • In the example, the gratitude expression is a
    result of some action in the previous mail

[Diagram: a previous mail labeled "Action" linked to a reply containing
"thank you..."]
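
A minimal sketch of this extraction step, assuming each pair-mail is given as a (previous body, reply body) tuple of strings. The gratitude pattern is a small illustrative list, not the expression set used in the study. Dropping lines that start with ">" is an assumed way to avoid matching text quoted from the previous mail.

```python
import re

# Illustrative gratitude expressions only; a real list would be larger.
GRATITUDE = re.compile(r"\b(?:thank you|thanks)\b", re.IGNORECASE)

def strip_quoted(body):
    """Drop lines quoted from an earlier mail (those starting with '>')."""
    return "\n".join(
        line for line in body.splitlines() if not line.lstrip().startswith(">")
    )

def gratitude_pairs(pairs):
    """Keep pair-mails whose second mail itself expresses gratitude."""
    return [
        (first, second)
        for first, second in pairs
        if GRATITUDE.search(strip_quoted(second))
    ]

# Hypothetical pair-mails.
pairs = [
    ("I have uploaded the slides.", "Thank you!"),
    ("When is the meeting?", "> When is the meeting?\nIt is at 3 pm."),
    ("Thanks in advance. Can you book a room?",
     "> Thanks in advance. Can you book a room?\nBooked."),
]
extracted = gratitude_pairs(pairs)
```

The third pair is rejected because its only gratitude expression sits in the quoted copy of the previous mail.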
18
Result of pair-mail extraction
Extracted 106 pair-mails
  • Most of the expressions are found in the previous
    mail included as an attachment, so data cleansing is required
  • In the rest of the results, we can find the action
    described in the previous mail. About 40 express
    gratitude for actions described in the mail (8 are
    for information), and 10 are for real-world
    actions.
  • 5 are platitudinous expressions.

19
Summary
  • Text mining for e-mail
  • Text mining for individual and relational text
  • Introduced a new mining unit
  • Three levels of e-mail mining targets
  • single mail, pair-mail, thread
  • Preliminary study of e-mail mining
  • Pair-mail information is important in threads.
  • Data cleansing is needed.
  • Remove signatures, attachments, ...
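
A minimal sketch of the cleansing step for a plain-text body, assuming signatures follow the conventional "-- " delimiter line and quoted text is marked with ">". Both are assumptions about the mail format, and `cleanse_body` is a hypothetical name. Removing attachments would happen earlier, when the MIME parts are read, and is not shown.

```python
def cleanse_body(body):
    """Remove quoted lines and everything from the signature delimiter on."""
    kept = []
    for line in body.splitlines():
        # "-- " is the usual delimiter; some clients strip the trailing space.
        if line.rstrip() == "--":
            break
        if line.lstrip().startswith(">"):
            continue
        kept.append(line)
    return "\n".join(kept).strip()

# Hypothetical reply with a quoted line and a signature.
raw_body = "Hi,\n> old quoted line\nThe room is booked.\n-- \nAlice\nExample Corp"
clean = cleanse_body(raw_body)
```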