SEMEX - PowerPoint PPT Presentation

Title:

SEMEX

Description:

Xin (Luna) Dong, Alon Halevy, Ema Nemes, Stephan Sigurdsson, and Pedro Domingos ... machine learning, data mining and data management (see survey by Winkler et al. ... – PowerPoint PPT presentation

Slides: 31
Provided by: Sweet7

Transcript and Presenter's Notes

Title: SEMEX


1
SEMEX: Toward On-the-fly Personal Information
Integration
  • Xin (Luna) Dong, Alon Halevy, Ema Nemes, Stephan
    Sigurdsson, and Pedro Domingos
  • University of Washington

2
What is Personal Information Management (PIM)?
[Figure labels: Intranet, Internet]
3
Question 1: Which Bernstein paper did I cite in
my VLDB'04 paper?
4
Question 2: Among the authors of VLDB'04 papers,
with whom have I exchanged email?
5
Solution
  • Key: Provide a logical view of the data
  • Objects and associations
  • Step 1. Integrate data from different applications

6
Solution
  • Key: Provide a logical view of the data
  • Objects and associations
  • Step 2. Integrate personal and organizational data

7
Solution
  • Key: Provide a logical view of the data
  • Objects and associations
  • On-the-fly data integration
  • Performed by non-technical users
  • Small-scale and short-lived
  • Should happen as a side-effect of daily jobs

8
Browse by Associations
9
Browse by Associations
  • A survey of approaches to automatic schema matching
  • Corpus-based schema matching
  • Database management for peer-to-peer computing: A vision
  • Matching schemas by learning from others
Publication
Bernstein
10
Browse by Associations
Cited by
Publication
Publication
Citations
Bernstein
11
Reference Reconciliation
  • P. A. Bernstein
  • P. Bernstein
12
Outline
  • Motivation
  • Personal Information Management in SEMEX
    (SEMantic EXplorer)
  • Reference Reconciliation
  • Related Work and Conclusions

13
Strategy
  • Create a database that consists of objects and
    associations between them
  • Database is created according to a domain model
  • Starting with a domain model for core concepts
  • Personalized by users
  • Objects and associations are automatically
    extracted from personal data

[Figure: domain model. Classes include Person, Message, Paper, Presentation, Conference, Event, File, and Web Page. Association labels include AuthorOf, Cites, PublishedIn, Sender, Recipients, Organizer, Participants, Softcopy, Homepage, and Cached.]
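
The strategy on this slide (a database of objects and named associations) can be sketched in a few lines of Python. This is only an illustration, not the SEMEX implementation: the `AssociationStore` class, its method names, and the sample object ids are all hypothetical. Only the class and association names (Person, Paper, Conference, AuthorOf, Cites, PublishedIn) come from the slide. The example answers Question 1 by following associations.

```python
from collections import defaultdict


class AssociationStore:
    """Hypothetical in-memory store of typed objects and named associations."""

    def __init__(self):
        self.types = {}                # object id -> class name
        self.edges = defaultdict(set)  # (association, source id) -> target ids

    def add_object(self, obj_id, cls):
        self.types[obj_id] = cls

    def associate(self, name, source, target):
        self.edges[(name, source)].add(target)

    def follow(self, name, source):
        """Targets reachable from `source` via association `name`."""
        return set(self.edges.get((name, source), set()))

    def sources(self, name, target):
        """Objects that point at `target` via association `name`."""
        return {s for (n, s), ts in self.edges.items() if n == name and target in ts}


# Made-up sample data for illustration only.
store = AssociationStore()
for obj_id, cls in [("me", "Person"), ("bernstein", "Person"),
                    ("my_paper", "Paper"), ("survey", "Paper"),
                    ("other", "Paper"), ("vldb04", "Conference")]:
    store.add_object(obj_id, cls)
store.associate("AuthorOf", "me", "my_paper")
store.associate("AuthorOf", "bernstein", "survey")
store.associate("AuthorOf", "bernstein", "other")
store.associate("PublishedIn", "my_paper", "vldb04")
store.associate("Cites", "my_paper", "survey")

# Question 1: which Bernstein paper did I cite in my VLDB'04 paper?
my_vldb_papers = {p for p in store.follow("AuthorOf", "me")
                  if "vldb04" in store.follow("PublishedIn", p)}
cited = set().union(*(store.follow("Cites", p) for p in my_vldb_papers))
answer = cited & store.follow("AuthorOf", "bernstein")
```

Keeping associations as first-class named edges is what makes "browse by associations" a matter of chaining `follow` calls.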
14
System Architecture
Domain Model
Data Repository
15
Outline
  • Motivation
  • Personal Information Management in SEMEX
    (SEMantic EXplorer)
  • Reference Reconciliation
  • Related Work and Conclusions

16
Reconcile Person References
17
Previous Work
  • Record linkage is an active topic in machine
    learning, data mining and data management (see
    survey by Winkler et al.)
  • Assume matching tuples from a single table
  • Traditional approach
  • Compare tuples pairwise
  • Generate transitive closures
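
The traditional approach named above (compare tuples pairwise, then take the transitive closure) can be sketched as follows. The matcher `same_person` and the sample records are assumptions made for illustration; real record-linkage systems use far richer similarity measures. The closure is computed with a union-find structure.

```python
from itertools import combinations


def same_person(a, b):
    """Hypothetical pairwise matcher: equal non-empty email, or equal lowercased name."""
    if a["email"] and a["email"] == b["email"]:
        return True
    return a["name"].lower() == b["name"].lower()


def reconcile(records, match=same_person):
    """Compare all pairs, then group records by the transitive closure of matches."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if match(records[i], records[j]):
            parent[find(i)] = find(j)  # union the two clusters

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Made-up tuples from a single table with a fixed schema.
records = [
    {"name": "Alon Halevy", "email": "alon@cs.washington.edu"},
    {"name": "Alon Levy", "email": "alon@cs.washington.edu"},
    {"name": "alon levy", "email": ""},
    {"name": "Pedro Domingos", "email": ""},
]
clusters = reconcile(records)
```

Records 0 and 2 never match directly, yet they end up in the same cluster because both match record 1. That is the effect of the transitive closure.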

18
Challenges in Reconciling Person References
  • Example references to Alon Halevy
  • Address book: Alon Halevy, (123)456-7890
  • Email message: alon@cs.washington.edu
  • BibTeX: A. Levy
  • Challenges
  • References contain different sets of attributes
  • Each attribute may have multiple values
  • Each reference may contain very limited
    information
  • Labeled training data is not available

19
Our Solution
  • Gradually enrich references when they match with
    others
  • E.g. (Alon Levy, alon@cs, UW)
  • (Alon Halevy, alon@cs.washington.edu)
  • (Alon Halevy, Univ. of Washington)
  • (Alon Halevy, alon@cs.washington.edu,
    Univ. of Washington)
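
A minimal sketch of the enrichment idea above: when two references match, merge their attribute values so the enriched reference can match others that neither original could. The match rule (`shares_value` on name or email), the `KEY_ATTRS` choice, and the sample data are assumptions for illustration, not the algorithm from the talk.

```python
KEY_ATTRS = ("name", "email")  # assumed: attributes allowed to trigger a match


def shares_value(a, b):
    """Hypothetical match rule: some key attribute has a value in common."""
    return any(a.get(k, set()) & b.get(k, set()) for k in KEY_ATTRS)


def merge(a, b):
    """Union the value sets of two references, attribute by attribute."""
    return {k: a.get(k, set()) | b.get(k, set()) for k in a.keys() | b.keys()}


def enrich(references, match=shares_value):
    """Repeatedly merge matching references until no pair matches."""
    pool = [{k: set(v) for k, v in r.items()} for r in references]
    merged = True
    while merged:
        merged = False
        for i in range(len(pool)):
            for j in range(i + 1, len(pool)):
                if match(pool[i], pool[j]):
                    pool[i] = merge(pool[i], pool[j])
                    del pool[j]
                    merged = True
                    break
            if merged:
                break  # restart the scan: the enriched reference may match more
    return pool


# References with different attribute sets and multi-valued attributes.
refs = [
    {"name": {"Alon Halevy"}, "org": {"Univ. of Washington"}},
    {"name": {"Alon Levy"}, "email": {"alon@cs.washington.edu"}},
    {"name": {"Alon Halevy"}, "email": {"alon@cs.washington.edu"}},
    {"name": {"Pedro Domingos"}, "org": {"Univ. of Washington"}},
]
result = enrich(refs)
```

The first two references share no name or email value, so they cannot be matched directly. Once the first is enriched with the email address from the third, it matches the second.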

20
Apply Global Knowledge on Enriched References
  • Temporal comparison
  • (Alon Levy, alon@research.att.com)
  • (Alon Levy, alon@cs.washington.edu)
  • Search-engine analysis
  • Alon Y. Halevy
  • Alon Levy
  • http://www.cs.washington.edu/homes/alon/

21
Experimental Dataset
  • Personal data of one author over the past 7 years
  • 7085 files and 18037 messages
  • 5014 persons extracted
  • 2590 real-world persons involved

22
Experimental Results
23
Reference Reconciliation in General
  • Reconcile references by applying knowledge
    gleaned from associated objects
  • Reconcile references of multiple types of objects
    at one time
  • Reconcile references being aware of object
    evolution

[Figure: pairs of references to reconcile: Paper1/Paper2, Person1/Person3, Person2/Person4, Journal1/Journal2, Institution1/Institution2]
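
One way to read "applying knowledge gleaned from associated objects" is that a match between two paper references raises the confidence that their author references match too. The sketch below illustrates that reading only. The scoring function, the `bonus` value, the 0.7 threshold, and the sample names are all assumptions, not results or methods from the talk.

```python
def name_similarity(a, b):
    """Hypothetical score: 1.0 for equal names, 0.5 if last name and first initial agree."""
    if a == b:
        return 1.0
    first_a, last_a = a.split()[0], a.split()[-1]
    first_b, last_b = b.split()[0], b.split()[-1]
    if last_a == last_b and first_a[0] == first_b[0]:
        return 0.5
    return 0.0


def person_score(p, q, names, papers_of, matched_papers, bonus=0.25):
    """Name similarity, boosted when p and q author papers already reconciled."""
    score = name_similarity(names[p], names[q])
    linked = any(
        (x, y) in matched_papers or (y, x) in matched_papers
        for x in papers_of.get(p, ())
        for y in papers_of.get(q, ())
    )
    return min(1.0, score + bonus) if linked else score


# Made-up references, using the labels from the figure.
names = {"Person1": "A. Levy", "Person3": "Alon Levy"}
papers_of = {"Person1": ["Paper1"], "Person3": ["Paper2"]}

THRESHOLD = 0.7  # assumed decision threshold
without_context = person_score("Person1", "Person3", names, papers_of, set())
with_context = person_score("Person1", "Person3", names, papers_of,
                            {("Paper1", "Paper2")})
```

With these assumed numbers, the name evidence alone stays below the threshold, and the pair is accepted only after Paper1 and Paper2 have been reconciled.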
24
Related Work
  • Personal Information Management Systems
  • LifeStreams (Freeman and Gelernter, 1996)
  • Stuff I've Seen (Dumais et al., 2003)
  • Placeless Documents (Dourish et al., 2000)
  • Haystack (Karger et al., 2003)
  • MyLifeBits (Gemmell et al., 2002)

25
Conclusions
  • On-the-fly data integration is an important
    component of PIM
  • At the schema level: Integrate data from
    different applications
  • Automatically construct a database of objects and
    associations
  • At the instance level: Reconcile references to
    persons
  • Gradually enrich references when they match with
    others

26
Future Work
  • Improve reference reconciliation algorithm
  • Integrate personal data and organizational data
  • Personalize the domain model
  • Mine previous browsing experiences to personalize
    browsing and searching
  • Mine previous data integration experiences to
    facilitate data integration

27
SEMEX: Toward On-the-fly Personal Information
Integration
  • @ VLDB IIWeb 2004
  • Xin (Luna) Dong, Alon Halevy, Ema Nemes, Stephan
    Sigurdsson, and Pedro Domingos
  • University of Washington
  • www.cs.washington.edu/homes/lunadong

28
  • Backup

29
Reference Reconciliation in General
  • Reconcile references by enriching references
  • Reconcile references by applying context
    knowledge
  • Applying knowledge gleaned from other matching
    decisions
  • Applying knowledge gleaned from associated
    objects
  • Reconcile references of multiple types of objects
  • Reconcile references being aware of object
    evolution

30
Solution
  • Key: Provide a logical view of the data
  • Objects and associations
  • Step 1. Integrate data from different
    applications
  • Step 2. Integrate personal and organizational
    data
  • On-the-fly data integration
  • Performed by non-technical users
  • Small-scale and short-lived
  • Should happen as a side-effect of daily jobs