Title: SEMEX
1. SEMEX: Toward On-the-fly Personal Information Integration
- Xin (Luna) Dong, Alon Halevy, Ema Nemes, Stephan Sigurdsson, and Pedro Domingos
- University of Washington
2. What is Personal Information Management (PIM)?
Intranet Internet
3. Question 1: Which Bernstein paper did I cite in my VLDB'04 paper?
4. Question 2: Among the authors of VLDB'04 papers, with whom have I exchanged email?
5. Solution
- Key: Provide a logical view of the data
- Objects and associations
- Step 1. Integrate data from different applications
6. Solution
- Key: Provide a logical view of the data
- Objects and associations
- Step 2. Integrate personal and organizational data
7. Solution
- Key: Provide a logical view of the data
- Objects and associations
- On-the-fly data integration
- Performed by non-technical users
- Small-scale and short-lived
- Should happen as a side-effect of daily jobs
8. Browse by Associations
9. Browse by Associations
- A survey of approaches to automatic schema matching
- Corpus-based schema matching
- Database management for peer-to-peer computing: A vision
- Matching schemas by learning from others
Figure labels: Publication, Bernstein
10. Browse by Associations
Figure labels: Cited by, Publication, Citations, Bernstein
11. Reference Reconciliation
Example references: P. A. Bernstein; P. Bernstein
12. Outline
- Motivation
- Personal Information Management in SEMEX (SEMantic EXplorer)
- Reference Reconciliation
- Related Work and Conclusions
13. Strategy
- Create a database that consists of objects and associations between them
- Database is created according to a domain model
- Starting with a domain model for core concepts
- Personalized by users
- Objects and associations are automatically extracted from personal data
Domain model figure labels: Person, File, Web Page, Event, Paper, Message, Presentation, Conference, Homepage, Cached, Organizer, Participants, AuthorOf, Softcopy, Sender, Recipients, Cites, PublishedIn
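The strategy on this slide, a database of objects and associations built according to a domain model, can be pictured with a small sketch. The class and association names (Person, Paper, AuthorOf, Cites) come from the figure labels; everything else is an assumption made for illustration, including the `Obj` and `AssociationStore` names, the direction of each association, and the toy data. It is not SEMEX's actual implementation. The query at the end mirrors Question 1 ("Which Bernstein paper did I cite in my VLDB04 paper?").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    """An object in the database: a class from the domain model plus a name."""
    cls: str   # e.g. "Person", "Paper", "Conference"
    name: str


class AssociationStore:
    """Hypothetical in-memory store of (source, association, target) triples."""

    def __init__(self):
        self._triples = set()

    def add(self, source, association, target):
        self._triples.add((source, association, target))

    def targets(self, source, association):
        """Objects reached from `source` by following `association`."""
        return {t for s, a, t in self._triples if s == source and a == association}

    def sources(self, association, target):
        """Objects that reach `target` by following `association`."""
        return {s for s, a, t in self._triples if a == association and t == target}


def cited_papers_by(store, citing_paper, author):
    """Papers cited by `citing_paper` that have `author` among their authors."""
    return {
        paper
        for paper in store.targets(citing_paper, "Cites")
        if author in store.sources("AuthorOf", paper)
    }


# Toy data: the objects and links below are made up for illustration.
bernstein = Obj("Person", "Bernstein")
someone = Obj("Person", "Someone Else")
my_paper = Obj("Paper", "my VLDB04 paper")
paper_x = Obj("Paper", "cited paper X")
paper_y = Obj("Paper", "cited paper Y")

store = AssociationStore()
store.add(bernstein, "AuthorOf", paper_x)   # assumed direction: Person -> Paper
store.add(someone, "AuthorOf", paper_y)
store.add(my_paper, "Cites", paper_x)       # assumed direction: citing -> cited
store.add(my_paper, "Cites", paper_y)

answer = cited_papers_by(store, my_paper, bernstein)
print([p.name for p in answer])
```

Storing plain triples keeps the sketch short; a real system would index associations rather than scan them on every lookup.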
14. System Architecture
Domain Model
Data Repository
15. Outline
- Motivation
- Personal Information Management in SEMEX (SEMantic EXplorer)
- Reference Reconciliation
- Related Work and Conclusions
16. Reconcile Person References
17. Previous Work
- Record linkage is an active topic in machine learning, data mining and data management (see survey by Winkler et al.)
- Assumes matching tuples from a single table
- Traditional approach
- Compare tuples pairwise
- Generate transitive closures
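The traditional approach above (compare tuples pairwise, then take the transitive closure) can be sketched as follows. The string-similarity test and its 0.8 threshold are assumptions chosen only to make the example runnable, and the function names are hypothetical; record-linkage systems typically use more elaborate comparison functions.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar(a, b, threshold=0.8):
    """Assumed pairwise test: normalized string similarity above a threshold."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def reconcile(records, match=similar):
    """Group records: pairwise comparison followed by transitive closure."""
    parent = list(range(len(records)))  # union-find forest over record indexes

    def find(i):
        # Follow parent links to the cluster representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Step 1: compare every pair of records.
    for i, j in combinations(range(len(records)), 2):
        if match(records[i], records[j]):
            # Step 2: merging clusters yields the transitive closure.
            parent[find(i)] = find(j)

    clusters = {}
    for i, record in enumerate(records):
        clusters.setdefault(find(i), []).append(record)
    return list(clusters.values())


if __name__ == "__main__":
    print(reconcile(["Alon Halevy", "Alon Y. Halevy", "Pedro Domingos"]))
```

Because of the closure step, two records end up in the same cluster whenever a chain of pairwise matches connects them, even if they do not match each other directly.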
18. Challenges in Reconciling Person References
- Example references to Alon Halevy
- Address book: Alon Halevy, (123) 456-7890
- Email message: alon@cs.washington.edu
- BibTeX: A. Levy
- Challenges
- References contain different sets of attributes
- Each attribute may have multiple values
- Each reference may contain very limited information
- Labeled training data is not available
19. Our Solution
- Gradually enrich references when they match with others
- E.g. (Alon Levy, alon@cs, UW)
- (Alon Halevy, alon@cs.washington.edu)
- (Alon Halevy, Univ. of Washington)
- (Alon Halevy, alon@cs.washington.edu, Univ. of Washington)
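A minimal sketch of the enrichment idea, under stated assumptions: each reference is a mapping from attribute to a set of values, two references match when they share an exact value for a name or an email, and merging takes the union of the value sets. The matching rule and the function names are hypothetical simplifications, not the algorithm used in SEMEX.

```python
def matches(a, b, keys=("name", "email")):
    """Assumed rule: references match if they share a value on any key attribute."""
    return any(a.get(k, set()) & b.get(k, set()) for k in keys)


def merge(a, b):
    """Enrich: the merged reference carries every value seen in either input."""
    return {k: a.get(k, set()) | b.get(k, set()) for k in a.keys() | b.keys()}


def enrich(references):
    """Fold references into clusters, re-checking matches as a cluster grows."""
    clusters = []
    for ref in references:
        ref = {k: set(v) for k, v in ref.items()}  # work on a copy
        absorbed = True
        while absorbed:
            absorbed = False
            for cluster in clusters:
                if matches(ref, cluster):
                    # The enriched reference may now match clusters it missed before.
                    ref = merge(ref, cluster)
                    clusters.remove(cluster)
                    absorbed = True
                    break
        clusters.append(ref)
    return clusters


refs = [
    {"name": {"Alon Halevy"}, "org": {"Univ. of Washington"}},
    {"email": {"alon@cs.washington.edu"}},
    {"name": {"Alon Halevy"}, "email": {"alon@cs.washington.edu"}},
]
people = enrich(refs)
print(people)
```

In this toy input the first two references share no attribute value, so they cannot be matched directly; the third reference links them, and the enriched result carries the name, the email address, and the organization together.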
20. Apply Global Knowledge on Enriched References
- Temporal comparison
- (Alon Levy, alon@research.att.com)
- (Alon Levy, alon@cs.washington.edu)
- Search-engine analysis
- Alon Y. Halevy
- Alon Levy
- http://www.cs.washington.edu/homes/alon/
21. Experimental Dataset
- Personal data of one author over the past 7 years
- 7085 files and 18037 messages
- 5014 persons extracted
- 2590 real-world persons involved
22. Experimental Results
23. Reference Reconciliation in General
- Reconcile references by applying knowledge gleaned from associated objects
- Reconcile references of multiple types of objects at one time
- Reconcile references while being aware of object evolution
Figure labels: Paper1, Paper2; Person1, Person3; Person2, Person4; Journal1, Journal2; Institution1, Institution2
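One way to picture "applying knowledge gleaned from associated objects" is to let the evidence for two person references depend on whether their associated papers have already been reconciled. The scoring formula, the 0.5 weight, and all names below are assumptions made for illustration; the slides do not specify how the knowledge is combined.

```python
def neighbor_overlap(a, b, neighbors, same):
    """Fraction of a's associated objects already reconciled with one of b's."""
    na, nb = neighbors.get(a, set()), neighbors.get(b, set())
    if not na or not nb:
        return 0.0
    hits = sum(1 for x in na if any(same(x, y) for y in nb))
    return hits / len(na)


def combined_score(a, b, attr_sim, neighbors, same, weight=0.5):
    """Assumed combination: weighted mix of attribute and association evidence."""
    return (1 - weight) * attr_sim(a, b) + weight * neighbor_overlap(a, b, neighbors, same)


# Toy setup using the figure labels: Person1 wrote Paper1, Person3 wrote Paper2,
# and Paper1/Paper2 are taken to be reconciled already (an assumption).
neighbors = {"Person1": {"Paper1"}, "Person3": {"Paper2"}}
reconciled = {frozenset({"Paper1", "Paper2"})}


def same(x, y):
    return x == y or frozenset({x, y}) in reconciled


def attr_sim(a, b):
    return 0.5  # placeholder attribute similarity, made up for the example


score = combined_score("Person1", "Person3", attr_sim, neighbors, same)
print(score)
```

The same pattern could apply across object types: a decision about papers informs the decision about persons, and a decision about persons could in turn inform journals or institutions.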
24. Related Work
- Personal Information Management Systems
- LifeStreams (Freeman and Gelernter, 1996)
- Stuff I've Seen (Dumais et al., 2003)
- Placeless Documents (Dourish et al., 2000)
- Haystack (Karger et al., 2003)
- MyLifeBits (Gemmell et al., 2002)
25. Conclusions
- On-the-fly data integration is an important component of PIM
- At the schema level: Integrate data from different applications
- Automatically construct a database of objects and associations
- At the instance level: Reconcile references to persons
- Gradually enrich references when they match with others
26. Future Work
- Improve reference reconciliation algorithm
- Integrate personal data and organizational data
- Personalize the domain model
- Mine previous browsing experiences to personalize browsing and searching
- Mine previous data integration experiences to facilitate data integration
27. SEMEX: Toward On-the-fly Personal Information Integration
- @ VLDB IIWEB 2004
- Xin (Luna) Dong, Alon Halevy, Ema Nemes, Stephan Sigurdsson, and Pedro Domingos
- University of Washington
- www.cs.washington.edu/homes/lunadong
29. Reference Reconciliation in General
- Reconcile references by enriching references
- Reconcile references by applying context knowledge
- Applying knowledge gleaned from other matching decisions
- Applying knowledge gleaned from associated objects
- Reconcile references of multiple types of objects
- Reconcile references while being aware of object evolution
30. Solution
- Key: Provide a logical view of the data
- Objects and associations
- Step 1. Integrate data from different applications
- Step 2. Integrate personal and organizational data
- On-the-fly data integration
- Performed by non-technical users
- Small-scale and short-lived
- Should happen as a side-effect of daily jobs