Title: Pedro DeRose
1. The DBLife Prototype System in the Cimple Project on Community Information Management
- Pedro DeRose
- University of Wisconsin-Madison
2. Community Information Management
- Numerous Web communities
  - database researchers, movie fans, legal professionals, bioinformatics, etc.
  - enterprise intranets, tech support groups
- Each community: many data sources, many members
- Members often want to integrate data, query, and discover community information
  - any interesting connection between researchers X and Y?
  - find all citations of this paper in the past one week on the Web
  - what is new in the past 24 hours in the database community?
  - what are current hot topics? who has moved where?
3. Cimple Project @ Wisconsin / Yahoo! Research
- Structured community portal, driven by extraction, integration, and mass collaboration
- Data sources: Researcher Homepages, Conference Pages, Group Pages, DBworld mailing list, DBLP (Web pages, text documents)
- Example from the diagram: Jim Gray give-talk SIGMOD-04
- Services: Keyword search, SQL querying, Question answering, Browse, Mining, Alert/Monitor, News summary
- Users: Personalize system, provide feedback
4. The Research Team
- Core Members
  - Pedro DeRose
  - Warren Shen
  - AnHai Doan
  - Raghu Ramakrishnan
- Supporting Members
  - Fei Chen
  - Yoonkyong Lee
  - Doug Burdick
  - Mayssam Sayyadian
  - Xiaoyong Chai
  - Ting Chen
5. Prototype System: DBLife
- Integrate data of the DB research community
- Live at dblife-labs.cs.wisc.edu
- 1,075 data sources
  - 463 researcher homepages
  - 103 department homepages
  - 54 conference homepages
  - 99 faculty hubs
  - 56 database group pages
  - 203 project homepages
  - 85 colloquia
  - 11 event pages
  - DBWorld
  - DBLP
- Crawled daily: 11,000 pages, 160 MB / day
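The crawl figures above imply a rough per-page size and yearly volume. The sketch below works through that arithmetic; it assumes decimal units (1 MB = 1,000,000 bytes) and a constant daily volume, neither of which the slide states, and the function names are illustrative only.

```python
# Figures taken from the slide.
PAGES_PER_DAY = 11_000
MB_PER_DAY = 160

# Assumption: decimal megabytes (the slide does not say MB vs. MiB).
BYTES_PER_MB = 1_000_000


def average_page_kb(pages, megabytes):
    """Average size of one crawled page, in kilobytes (1 KB = 1,000 bytes)."""
    return megabytes * BYTES_PER_MB / pages / 1_000


def yearly_volume_gb(megabytes_per_day, days=365):
    """Total crawled volume over `days` days, in gigabytes (1 GB = 1,000 MB)."""
    return megabytes_per_day * days / 1_000


if __name__ == "__main__":
    print(f"average page size: {average_page_kb(PAGES_PER_DAY, MB_PER_DAY):.1f} KB")
    print(f"volume per year:   {yearly_volume_gb(MB_PER_DAY):.1f} GB")
```

Under these assumptions a crawled page averages about 14.5 KB, and a year of daily crawls comes to roughly 58 GB before any deduplication or compression.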
6. Information Extraction
7. Data Integration
- Example entity: Raghu Ramakrishnan
  - co-authors: A. Doan, Divesh Srivastava, ...
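One part of data integration is deciding whether two name mentions, such as "A. Doan" on one page and "AnHai Doan" on another, refer to the same person. The sketch below is a deliberately simple heuristic (same last name, compatible first name or initial). It is an assumption made for illustration, not a description of how DBLife matches mentions, and `parse_name` and `may_match` are hypothetical helper names.

```python
def parse_name(mention):
    """Split a non-empty mention into (lowercased first-name tokens, lowercased last name)."""
    tokens = mention.replace(".", " ").split()
    return [t.lower() for t in tokens[:-1]], tokens[-1].lower()


def may_match(mention_a, mention_b):
    """Heuristic: last names are equal and the first names are prefix-compatible.

    An initial such as "A." is compatible with any first name starting
    with the same letter. This over-matches on common last names, so a
    real system would need more evidence (co-authors, affiliation, etc.).
    """
    first_a, last_a = parse_name(mention_a)
    first_b, last_b = parse_name(mention_b)
    if last_a != last_b:
        return False
    if not first_a or not first_b:
        # One mention has only a last name; nothing contradicts a match.
        return True
    fa, fb = first_a[0], first_b[0]
    return fa.startswith(fb) or fb.startswith(fa)


if __name__ == "__main__":
    print(may_match("A. Doan", "AnHai Doan"))         # True
    print(may_match("A. Doan", "Divesh Srivastava"))  # False
```

The heuristic favors recall: it merges any mentions it cannot rule out, which is why it should be read as a starting point rather than a finished matcher.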
8. Resulting ER Graph
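An ER graph of this kind can answer questions such as "any interesting connection between researchers X and Y?" by searching for a path between two entities. The sketch below is a minimal illustration: the three edges are the examples that appear on the slides, while the edge-list representation, the undirected treatment of relationships, and the function names are assumptions rather than DBLife's actual design.

```python
from collections import deque

# Toy edges taken from the examples on the slides: (entity, relationship, entity).
EDGES = [
    ("Jim Gray", "give-talk", "SIGMOD-04"),
    ("Raghu Ramakrishnan", "co-author", "A. Doan"),
    ("Raghu Ramakrishnan", "co-author", "Divesh Srivastava"),
]


def build_graph(edges):
    """Build an adjacency map; relationships are treated as undirected (an assumption)."""
    graph = {}
    for src, label, dst in edges:
        graph.setdefault(src, []).append((label, dst))
        graph.setdefault(dst, []).append((label, src))
    return graph


def find_connection(graph, start, goal):
    """Breadth-first search for a shortest path, returned as alternating entities and labels."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for label, neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [label, neighbor])
    return None


if __name__ == "__main__":
    graph = build_graph(EDGES)
    print(find_connection(graph, "A. Doan", "Divesh Srivastava"))
    print(find_connection(graph, "Jim Gray", "A. Doan"))
```

With this toy data the first query connects the two researchers through their shared co-author, and the second returns `None` because no path exists between the entities.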
9. Provide Services
10. Mass Collaboration: An Example
11. Summary
- Community Information Management
  - increasingly crucial problem
- The Cimple project
  - sample challenges: information extraction, data integration, mass collaboration
  - extends the footprints of DB technologies to Web data
  - develops new DB technologies
- DBLife prototype
  - more at dblife.cs.wisc.edu, latest features (e.g., wiki) at dblife-labs.cs.wisc.edu
  - research/education tool, community service, benchmark, challenge problem