Title: CS411: Summary and Beyond
1. CS411: Summary and Beyond
2. About the Final Exam
- Friday 12/17, 1:30-4:30pm, 1320 DCL
- Closed book/note, only scratch paper allowed
- Coverage: cumulative, with emphasis on the 2nd half
- The TA and I will be here to help with questions
- Bring your UIUC ID
- Do not discuss exam questions/solutions on the newsgroup
3. Midterm Format
- One set of true/false questions
- One set of short answer questions
- Followed by several more questions
- The question formats are similar to the questions we covered in the lectures and homeworks
- 100 points
- I believe you should have enough time
4. Suggested Method for Study
- Go over the lecture slides
- Read the textbook
- Try to work out solutions to problems from hw/lectures before looking at the actual solution
- Work on sample exams
  - before looking at solutions
- Discuss with people in your group
5. Summary: What have you learned?
- The user perspective
  - How to use an RDBMS and build database applications?
  - Demo 1a: The ASAP team, Friends Forest
  - Demo 1b: The Initech, Integrate 2XS
6. Summary: What have you learned?
- The system perspective
  - How does an RDBMS work?
  - Demo 2: Preston Brown and Jeffery Votteler, PostgreSQL Plan Enumeration Visualizer
7. Beyond: Data management in the information age
- Information abounds in our civilization and inundates our daily life.
- Beyond database management
- Data management issues everywhere
- Demo 3: My research projects
8. Today's Search Engine
- Only keyword matching: guess what your target page will say
  - e.g., to find Kevin Chang's email
- Only individual pages: search does not go beyond one page
  - e.g., to find Kevin Chang's most likely email at UIUC
  - e.g., to find all CS profs' emails
- Only follows links: databases remain untapped territory
  - e.g., to find all flights to San Francisco
  - e.g., to find all jobs in Urbana-Champaign
9. Getting Structured Data from the Web: Integration and Mining
- Getting structured data from:
  - The deep Web
    - semantic-rich, structured data hidden deeply inside databases on the Web
    - need integration to access these databases
  - The surface Web
    - semantic-rich, structured data hidden implicitly on the surface Web
    - need mining to find these relations
10. Project 1: MetaQuerier - Knocking the Door to the Deep Web
11. The previous Web: things are just on the surface
12. The current Web: getting deeper with non-trivial access
13. How to enable effective access to the deep Web?
[Figure: example deep Web sources: Cars.com, Amazon.com, Biography.com, Apartments.com, 411localte.com, 401carfinder.com]
14. Amy is a new graduate, just moving to her new career
- Finding sources
  - Wants to upgrade her car: where can she study her options? (cars.com, edmunds.com)
  - Wants to buy a house: where can she look for houses in her town? (realtor.com)
  - Wants to write a grant proposal. (NSF Award Search)
  - Wants to check for patents. (uspto.gov)
- Querying sources
  - Then, she needs to learn the grueling details of querying
15. MetaQuerier: Exploring and integrating the deep Web
- Explorer (FIND sources)
  - source discovery
  - source modeling
  - source indexing
- Integrator (QUERY sources)
  - source selection
  - schema integration
  - query mediation
[Diagram labels: Amazon.com, Cars.com, Apartments.com, 411localte.com; db of dbs; unified query interface]
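As a rough illustration of the "query mediation" step, the sketch below translates one query posed against a unified interface into per-source queries. The source names are taken from the slides; the unified attributes (`make`, `max_price`), every source-side field name, and the `mediate` function itself are hypothetical, invented only for this example, and do not describe how MetaQuerier or these sites actually work.

```python
# Hypothetical mapping from unified attributes to each source's own form fields.
# Source names come from the slides; all field names below are made up.
SOURCE_FIELDS = {
    "Cars.com": {"make": "mk", "max_price": "price_to"},
    "401carfinder.com": {"make": "brand", "max_price": "upper_price"},
}


def mediate(unified_query, sources=SOURCE_FIELDS):
    """Translate one unified query into a query per source.

    Attributes that a source has no mapping for are dropped for that source.
    """
    translated = {}
    for source, field_map in sources.items():
        translated[source] = {
            field_map[attr]: value
            for attr, value in unified_query.items()
            if attr in field_map
        }
    return translated


if __name__ == "__main__":
    # "color" has no mapping in either hypothetical source, so it is dropped.
    for source, query in mediate({"make": "Honda", "max_price": 9000, "color": "red"}).items():
        print(source, query)
```

A real mediator would also have to handle attributes that differ in granularity or value format across sources; this sketch only renames fields.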
16. Toward large scale integration
- We are facing very different large scale scenarios!
  - Many sources on the Web, on the order of 10^5
- Such integration must be dynamic and ad hoc
  - Dynamic discovery
    - Sources are dynamically changing
  - On-the-fly integration
    - Queries are ad hoc and need different sources
17. Our survey found (SIGMOD Record, Sep. 2004)
- Challenge reassured
  - 450,000 online databases
  - 1,258,000 query interfaces
  - 307,000 deep web sites
  - 3-7 times increase in 4 years
- Insight revealed
  - Web sources are not arbitrarily complex
  - "Amazon effect": convergence and regularity naturally emerge
18. Demo
19. Project 2: WISDM - Uncovering Structured Data on the Surface Web
20. The WISDM goal
WISDM: Web Indexing and Search for Data Mining
[Diagram label: The Web]
21. Relation discovery: Weaving entities into relations
[Diagram: The Web -> tagged entities (prof, phone, email, univ, research) -> WISDM-ER (Entity-Relation Discovery) -> relations R1 and R2]

R1 <prof, phone, email>
prof              | phone        | email
David DeWitt      | 608-263-5489 | dewitt_at_cs.wisc.edu
Marianne Winslett | 333-3536     | winslett_at_cs.uiuc.edu

R2 <prof, univ, research>
prof          | univ         | research
David DeWitt  | U. Wisconsin | database systems
Chris Clifton | Purdue U.    | data mining
22. Example applications: Relation is the essence of many info searches
- CSContact: by weaving R1 <prof, phone, email>
  - What are the phone and email of, say, David DeWitt?
  - What are the emails of all profs at Wisconsin?
- CSResearch: by weaving R2 <prof, univ, research>
  - What is the research area of DeWitt?
  - Who are the database professors at various universities?
  - Which area has the most faculty at Wisconsin?
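The CSResearch questions above amount to selections and one aggregation over R2. The sketch below shows one possible way to express them in plain Python, using only the two R2 rows shown on the relation-discovery slide; the function names and the dictionary representation are assumptions made for illustration, not part of WISDM.

```python
from collections import Counter

# R2 <prof, univ, research>, rows copied from the relation-discovery slide.
R2 = [
    {"prof": "David DeWitt", "univ": "U. Wisconsin", "research": "database systems"},
    {"prof": "Chris Clifton", "univ": "Purdue U.", "research": "data mining"},
]


def research_area(relation, prof):
    """Selection: research areas recorded for one professor."""
    return [r["research"] for r in relation if r["prof"] == prof]


def profs_in_area(relation, keyword):
    """Selection: (prof, univ) pairs whose research area mentions the keyword."""
    return [(r["prof"], r["univ"]) for r in relation if keyword in r["research"]]


def top_area(relation, univ):
    """Aggregation: the research area with the most faculty at one university."""
    counts = Counter(r["research"] for r in relation if r["univ"] == univ)
    return counts.most_common(1)[0][0] if counts else None


if __name__ == "__main__":
    print(research_area(R2, "David DeWitt"))
    print(profs_in_area(R2, "database"))
    print(top_area(R2, "U. Wisconsin"))
```

With only two rows the answers are trivial; the point is that once entities are woven into a relation, each question becomes an ordinary filter or group-by.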
23. Example applications: Structured data can be further processed
- By joining R1 with R2
  - What are the emails of the database professors at Wisconsin?
- By joining R2 with a university ranking database
  - Which top-20 university has the most database faculty?
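To make the R1-R2 join concrete, here is a minimal sketch that loads the example rows from the relation-discovery slide into an in-memory SQLite database and answers the first question with a SQL join. The table layout, the helper names `build_db` and `emails_of`, and the choice of SQLite are assumptions for illustration; the slides do not say how WISDM stores or queries its relations.

```python
import sqlite3

# Rows copied from the R1 and R2 examples on the relation-discovery slide.
R1_ROWS = [
    ("David DeWitt", "608-263-5489", "dewitt_at_cs.wisc.edu"),
    ("Marianne Winslett", "333-3536", "winslett_at_cs.uiuc.edu"),
]
R2_ROWS = [
    ("David DeWitt", "U. Wisconsin", "database systems"),
    ("Chris Clifton", "Purdue U.", "data mining"),
]


def build_db():
    """Load R1 and R2 into an in-memory SQLite database (schema is an assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE R1 (prof TEXT, phone TEXT, email TEXT)")
    conn.execute("CREATE TABLE R2 (prof TEXT, univ TEXT, research TEXT)")
    conn.executemany("INSERT INTO R1 VALUES (?, ?, ?)", R1_ROWS)
    conn.executemany("INSERT INTO R2 VALUES (?, ?, ?)", R2_ROWS)
    return conn


def emails_of(conn, univ, research):
    """Join R1 with R2 on prof; return emails for the given university and area."""
    rows = conn.execute(
        "SELECT R1.email FROM R1 JOIN R2 ON R1.prof = R2.prof "
        "WHERE R2.univ = ? AND R2.research = ? ORDER BY R1.email",
        (univ, research),
    ).fetchall()
    return [email for (email,) in rows]


if __name__ == "__main__":
    conn = build_db()
    print(emails_of(conn, "U. Wisconsin", "database systems"))
```

Note that a professor who appears in only one relation (Chris Clifton is in R2 but not R1) drops out of the inner join, which is one reason the quality of relation discovery matters for downstream queries.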
24. Current testbed: A small corpus to peek at the potential
- Data pages: 6 midwest CS departments
- Tagged entities: prof, email, phone, univ, research, state
25. Demo
26. So, that's the end of CS411, and hopefully the start of your DATA career
27. Thank You!
For more information: http://www-faculty.cs.uiuc.edu/kcchang/ kcchang_at_cs.uiuc.edu