CS411: Summary and Beyond - PowerPoint PPT Presentation

Provided by: ZhenZ7
Transcript and Presenter's Notes

1
CS411: Summary and Beyond
  • Kevin C. Chang

2
About the Final Exam
  • Friday 12/17, 1:30-4:30pm, 1320 DCL
  • Closed book/note, only scratch paper allowed
  • Coverage: cumulative, with emphasis on the 2nd half
  • The TA and I will be here to help with questions
  • Bring your UIUC ID
  • Do not discuss exam questions/solutions on the
    newsgroup

3
Midterm Format
  • One set of true/false questions
  • One set of short answer questions
  • Followed by several more questions
  • The question formats are similar to the questions
    we covered in the lectures and homeworks
  • 100 points
  • I believe you should have enough time

4
Suggested Method for Study
  • Go over the lecture slides
  • Read the textbook
  • Try to work out solutions to problems on
    hw/lectures before looking at the actual solution
  • Work on sample exams before looking at
    solutions
  • Discuss with people in your group

5
Summary: What have you learned?
  • The user perspective
  • How to use RDBMS and build database
    applications?
  • Demo 1a: The ASAP team, Friends Forest
  • Demo 1b: The Initech, Integrate 2XS

6
Summary: What have you learned?
  • The system perspective
  • How does RDBMS work?
  • Demo 2: Preston Brown and Jeffery Votteler,
    PostgreSQL Plan Enumeration Visualizer

7
Beyond: Data management in the information age
  • Information abounds in our civilization and
    inundates our daily life.
  • Beyond database management
  • Data management issues everywhere
  • Demo 3: My research projects

8
Today's Search Engine
  • Only keyword matching: guess what your target
    page will say
  • e.g., to find Kevin Chang's email
  • Only individual pages: search does not go
    beyond one page
  • e.g., to find Kevin Chang's most likely email
    at UIUC
  • e.g., to find all CS profs' emails
  • Only follows links: databases remain untapped
    territory
  • e.g., to find all flights to San Francisco
  • e.g., to find all jobs in Urbana-Champaign

9
Getting Structured Data from the Web: Integration
and Mining
  • Getting structured data from:
  • The deep Web
  • semantic-rich, structured data hidden deeply
    inside databases on the Web
  • need integration to access these databases
  • The surface Web
  • semantic-rich, structured data hidden
    implicitly on the surface Web
  • need mining to find these relations

10
Project 1: MetaQuerier, Knocking on the Door to
the Deep Web
11
The previous Web: things are just on the surface
12
The current Web: getting deeper with
non-trivial access
13
How to enable effective access to the deep Web?
Cars.com
Amazon.com
Biography.com
Apartments.com
411localte.com
401carfinder.com
14
Amy is a new graduate, just moving to her new
career
  • Finding sources
  • Wants to upgrade her car: where can she study
    her options? (cars.com, edmunds.com)
  • Wants to buy a house: where can she look for
    houses in her town? (realtor.com)
  • Wants to write a grant proposal. (NSF Award
    Search)
  • Wants to check for patents. (uspto.gov)
  • Querying sources
  • Then, she needs to learn the grueling details of
    querying

15
MetaQuerier: Exploring and integrating the deep
Web
  • Explorer (FIND sources)
  • source discovery
  • source modeling
  • source indexing
  • Integrator (QUERY sources)
  • source selection
  • schema integration
  • query mediation

[Figure labels: Amazon.com, Cars.com, Apartments.com, 411localte.com; db of dbs; unified query interface]
16
Toward large scale integration
  • We are facing very different large scale
    scenarios!
  • Many sources on the Web, on the order of 10^5
  • Such integration must be dynamic and ad-hoc
  • Dynamic discovery
  • Sources are dynamically changing
  • On-the-fly integration
  • Queries are ad-hoc and need different sources

17
Our survey found (SIGMOD Record, Sep. 2004)
  • Challenge reassured
  • 450,000 online databases
  • 1,258,000 query interfaces
  • 307,000 deep web sites
  • 3-7 times increase in 4 years
  • Insight revealed
  • Web sources are not arbitrarily complex
  • "Amazon effect": convergence and regularity
    naturally emerge

18
Demo.
19
Project 2: WISDM, Uncovering Structured Data on
the Surface Web
20
The WISDM goal
WISDM: Web Indexing and Search for Data Mining
The Web
21
Relation discovery: Weaving entities into
relations

[Figure: The Web, WISDM-ER (Entity-Relation Discovery), relations R1 and R2]

R1
email                     phone          prof
dewitt_at_cs.wisc.edu     608-263-5489   David DeWitt
winslett_at_cs.uiuc.edu   333-3536       Marianne Winslett

R2
univ           research           prof
U. Wisconsin   database systems   David DeWitt
Purdue U.      data mining        Chris Clifton
22
Example applications: Relation is the essence
of many info searches
  • CSContact: By weaving R1
  • What is the phone and email of, say, David
    DeWitt?
  • What are the emails of all profs at Wisconsin?
  • CSResearch: By weaving R2 research
  • What is the research area of DeWitt?
  • Who are database professors at various
    universities?
  • Which area has the most faculty at Wisconsin?
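
The CSContact and CSResearch questions above can be sketched as queries over a single relation each. This is a minimal sketch, assuming R1 and R2 are stored as SQLite tables holding only the sample rows from the relation-discovery slide. The table layout, the in-memory database, and the Python variable names are illustrative assumptions, not part of WISDM.

```python
import sqlite3

# Assumption: R1 and R2 live in an in-memory SQLite database, loaded
# with only the sample rows shown on the relation-discovery slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R1 (email TEXT, phone TEXT, prof TEXT);
    CREATE TABLE R2 (univ TEXT, research TEXT, prof TEXT);
    INSERT INTO R1 VALUES
        ('dewitt_at_cs.wisc.edu', '608-263-5489', 'David DeWitt'),
        ('winslett_at_cs.uiuc.edu', '333-3536', 'Marianne Winslett');
    INSERT INTO R2 VALUES
        ('U. Wisconsin', 'database systems', 'David DeWitt'),
        ('Purdue U.', 'data mining', 'Chris Clifton');
""")

# CSContact: what is the phone and email of David DeWitt? (R1 only)
contact = conn.execute(
    "SELECT phone, email FROM R1 WHERE prof = ?", ("David DeWitt",)
).fetchall()

# CSResearch: who are database professors at various universities? (R2 only)
db_profs = conn.execute(
    "SELECT prof, univ FROM R2 WHERE research = ?", ("database systems",)
).fetchall()

# CSResearch: which area has the most faculty at Wisconsin? (R2 only)
top_area = conn.execute(
    """SELECT research, COUNT(*) AS n
       FROM R2
       WHERE univ = 'U. Wisconsin'
       GROUP BY research
       ORDER BY n DESC
       LIMIT 1"""
).fetchone()

print(contact, db_profs, top_area)
```

Each question touches only one relation, so once the entities are woven into R1 or R2 a plain selection or aggregation is enough.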

23
Example applications: Structured data can be
further processed
  • By joining R1 with R2
  • What are the emails of the database professors at
    Wisconsin?
  • By joining R2 with a university ranking
    database
  • Which top-20 university has the most database
    faculty?
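
The first question above, joining R1 with R2, can be sketched as a single SQL join on the shared prof attribute. This is a minimal sketch under the same kind of assumptions: the SQLite tables, the helper name `build_sample_db`, and the choice to join on the professor's name are illustrative, and the rows are only the sample rows from the relation-discovery slide.

```python
import sqlite3


def build_sample_db() -> sqlite3.Connection:
    """Hypothetical helper: load the slide's sample rows into SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE R1 (email TEXT, phone TEXT, prof TEXT)")
    conn.execute("CREATE TABLE R2 (univ TEXT, research TEXT, prof TEXT)")
    conn.executemany(
        "INSERT INTO R1 VALUES (?, ?, ?)",
        [
            ("dewitt_at_cs.wisc.edu", "608-263-5489", "David DeWitt"),
            ("winslett_at_cs.uiuc.edu", "333-3536", "Marianne Winslett"),
        ],
    )
    conn.executemany(
        "INSERT INTO R2 VALUES (?, ?, ?)",
        [
            ("U. Wisconsin", "database systems", "David DeWitt"),
            ("Purdue U.", "data mining", "Chris Clifton"),
        ],
    )
    return conn


conn = build_sample_db()

# What are the emails of the database professors at Wisconsin?
# R1 supplies the email, R2 supplies univ and research; prof links them.
emails = conn.execute(
    """SELECT R1.email
       FROM R1 JOIN R2 ON R1.prof = R2.prof
       WHERE R2.univ = ? AND R2.research = ?""",
    ("U. Wisconsin", "database systems"),
).fetchall()

print(emails)
```

The university-ranking question would follow the same pattern, joining R2 with a ranking table on univ; no such table appears in the slides, so it is not sketched here.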

24
Current testbed: A small corpus to peek at the
potential
  • Data pages: 6 Midwest CS departments
  • Tagged entities: prof, email, phone, univ,
    research, state

25
Demo.
26
So, that's the end of CS411. And hopefully the
start of your DATA career
27
Thank You!
For more information: http://www-faculty.cs.uiuc.edu/kcchang/
kcchang_at_cs.uiuc.edu