Title: Effective Keyword Search in Relational Databases
1Effective Keyword Search in Relational Databases
- Fang Liu (University of Illinois at Chicago)
- Clement Yu (University of Illinois at Chicago)
- Weiyi Meng (Binghamton University)
- Abdur Chowdhury (America Online, Inc.)
2Effective Keyword Search in Relational Databases
- Introduction
- IR ranking in text databases
- Our ranking strategy in RDBs
- Experiments
- Conclusions and future work
SIGMOD 2006 Effective Keyword Search in
Relational Databases
3Introduction
- Why keyword search in relational databases?
- We want to search text data in relational
databases
- SQL with the contains operator is not for
non-expert users
- Keyword search is tremendously successful in text
databases by ranking documents based on
similarity, and it is suitable for non-expert users
4Introduction
- Text data in relational databases
5Introduction
Suppose a user is looking for albums titled
"Off the Wall".
6Introduction
- Keyword search is very successful in text
databases by ranking documents based on
similarity. Google, Yahoo, and MSN Search are
examples.
So, let's do keyword search in relational
databases! (DBXplorer, BANKS, DISCOVER, IR-style
DISCOVER, ObjectRank, Ranking Objects)
7Introduction
- Let's do it, but how?
- What are answers to be ranked?
- How should we rank these answers?
8Introduction -- an answer
An answer for a given query Q is a tuple tree in
which every leaf node contains at least one
keyword in Q.
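The leaf condition above can be checked mechanically. The following is a minimal sketch, not the authors' implementation: the `TupleNode` class, the `link` helper, and whitespace tokenization are all assumptions made for illustration, and only the leaf condition stated on this slide is tested.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class TupleNode:
    """Hypothetical node: one tuple, with its text attributes joined in `text`."""
    text: str
    neighbors: list["TupleNode"] = field(default_factory=list)


def link(a: TupleNode, b: TupleNode) -> None:
    """Connect two tuples, e.g. through a foreign-key relationship."""
    a.neighbors.append(b)
    b.neighbors.append(a)


def is_answer(root: TupleNode, keywords: list[str]) -> bool:
    """Return True if every leaf of the tree contains at least one keyword."""
    wanted = {k.lower() for k in keywords}
    seen = {id(root)}
    stack = [root]
    while stack:
        node = stack.pop()
        # A leaf has at most one neighbor (a single-node tree is its own leaf).
        if len(node.neighbors) <= 1:
            if not (set(node.text.lower().split()) & wanted):
                return False
        for nb in node.neighbors:
            if id(nb) not in seen:
                seen.add(id(nb))
                stack.append(nb)
    return True


artist = TupleNode("Michael Jackson")
album = TupleNode("Off the Wall")
song = TupleNode("Rock with You")
link(artist, album)
link(album, song)

print(is_answer(artist, ["jackson", "rock"]))  # leaves: artist and song
print(is_answer(artist, ["wall"]))             # neither leaf matches
```

In the second call the only matching tuple is the internal album node, so the tree is rejected; a smaller tree containing just the album tuple would qualify instead.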
9Introduction
- Use a slightly modified version of the DISCOVER
algorithm to produce all answers for a given query.
10Introduction Ranking
- Our focus is on the effectiveness problem of
ranking answers: the more relevant an answer is
to the user query, the higher it should be
ranked.
11Introduction Contributions
- We identify four new factors that are critical to
effective ranking and propose a new ranking
strategy
- We design and conduct comprehensive experiments for
the effectiveness problem
- Experimental results show our strategy is
significantly better than existing work in
effectiveness
12Effective Keyword Search in Relational Databases
- Introduction
- IR ranking in text databases
- Our ranking strategy in RDBs
- Experiments
- Conclusions and future work
133.3 IR Ranking
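- Q(k1, k2, ..., kn) is a query, D is a document, and Sim(Q,D) is
the ranking score of D.
- ntf examples: tf = 2 gives ntf = 1.53; tf = 10 gives ntf = 2.2
- idf examples: a keyword in half of the documents gives idf = 0.69; in 1/100, idf = 4.6; in 1/200,000, idf = 12
- ndl examples (s = 0.2): a document of average length gives ndl = 1; half the average, ndl = 0.9; 1/10, ndl = 0.8; 2 times, ndl = 1.2; 10 times, ndl = 2.8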
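The formula itself did not survive on this slide, but the sample values are consistent with the standard pivoted-normalization weighting from IR. The sketch below assumes that form: ntf = 1 + ln(1 + ln(tf)), idf = ln(N / df), ndl = (1 - s) + s * dl / avgdl, with weight = ntf * idf / ndl summed over matching keywords. The function names, the use of the raw query term frequency as the query-side weight, and the unsmoothed idf are assumptions (some formulations use df + 1 in the denominator).

```python
import math


def ntf(tf: int) -> float:
    """Normalized term frequency; tf must be at least 1."""
    return 1.0 + math.log(1.0 + math.log(tf))


def idf(n_docs: int, df: int) -> float:
    """Inverse document frequency for a keyword found in df of n_docs documents."""
    return math.log(n_docs / df)


def ndl(dl: float, avgdl: float, s: float = 0.2) -> float:
    """Pivoted document-length normalization with slope s."""
    return (1.0 - s) + s * dl / avgdl


def weight(tf: int, df: int, dl: float, n_docs: int, avgdl: float, s: float = 0.2) -> float:
    """Weight of one keyword in one document."""
    return ntf(tf) * idf(n_docs, df) / ndl(dl, avgdl, s)


def sim(query_tf: dict, doc_tf: dict, df: dict, dl: float,
        n_docs: int, avgdl: float, s: float = 0.2) -> float:
    """Sim(Q, D): sum of query weight times document weight over shared keywords."""
    score = 0.0
    for k, qtf in query_tf.items():
        tf = doc_tf.get(k, 0)
        if tf > 0:
            score += qtf * weight(tf, df[k], dl, n_docs, avgdl, s)
    return score


print(round(ntf(2), 2), round(ntf(10), 1))            # 1.53 2.2
print(round(idf(2, 1), 2), round(idf(100, 1), 1))     # 0.69 4.6
print(round(ndl(0.5, 1.0), 2), round(ndl(10, 1), 2))  # 0.9 2.8
```

The printed values reproduce the sample figures on the slide, which is the main evidence that this is the intended weighting.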
14Effective Keyword Search in Relational Databases
- Introduction
- IR ranking in text databases
- Our ranking strategy in RDBs
- Experiments
- Conclusions and future work
15Our Ranking Strategy
- T(D1, D2, ..., Dn), so Sim(Q,D) → Sim(Q,T)
16Our Ranking Strategy
- T(D1, D2, ..., Dn), so Sim(Q,D) → Sim(Q,T)
17Our Ranking Strategy
- Tuple Tree Size Normalization
- Number of tuples in a tuple tree T
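The normalization formula is missing from this slide. As a rough sketch only, the code below assumes the tree size is normalized the same way document length is in pivoted normalization, so that larger tuple trees are penalized relative to the average. The names `nsize` and `size_normalized_score`, the slope `s`, and the division of the raw score are all hypothetical.

```python
def nsize(size: int, avg_size: float, s: float = 0.2) -> float:
    """Assumed pivoted normalization of the number of tuples in a tuple tree."""
    return (1.0 - s) + s * size / avg_size


def size_normalized_score(raw_score: float, size: int, avg_size: float,
                          s: float = 0.2) -> float:
    """Penalize a raw score according to how large the tuple tree is."""
    return raw_score / nsize(size, avg_size, s)


# Two answers with the same raw score: the smaller tree ranks higher.
small = size_normalized_score(10.0, size=1, avg_size=2.0)
large = size_normalized_score(10.0, size=4, avg_size=2.0)
print(small > large)
```

With this form, a tree of exactly average size keeps its raw score, which mirrors the behavior of ndl for a document of average length.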
18Our Ranking Strategy
- Document Length Normalization Reconsidered
19Our Ranking Strategy
- Document Frequency Normalization
20Our Ranking Strategy
- T(D1, D2, ..., Dn)
- maxWgt is the maximum of weight(k, Di) over the Di in T
- sumWgt is the sum of weight(k, Di) over the Di in T
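The slide defines maxWgt and sumWgt but the text does not show how they are combined into a single weight of keyword k in T. The sketch below computes both quantities and then shows one plausible combination, maxWgt * (1 + ln(1 + ln(sumWgt / maxWgt))), which dampens the contribution of repeated matches in the same way ntf dampens term frequency. That combination and the function names are assumptions, not confirmed by the slide text.

```python
import math


def max_and_sum_weight(weights: list[float]) -> tuple[float, float]:
    """Return (maxWgt, sumWgt) over the per-document weights of one keyword."""
    positive = [w for w in weights if w > 0]
    if not positive:
        return 0.0, 0.0
    return max(positive), sum(positive)


def combine(weights: list[float]) -> float:
    """Assumed combination: start from maxWgt and add a dampened bonus for the rest."""
    max_wgt, sum_wgt = max_and_sum_weight(weights)
    if max_wgt == 0.0:
        return 0.0
    return max_wgt * (1.0 + math.log(1.0 + math.log(sum_wgt / max_wgt)))


# weight(k, Di) for a keyword k across the documents D1..D3 of a tuple tree
print(max_and_sum_weight([1.0, 3.0, 0.0]))
print(combine([2.0]), combine([2.0, 2.0]))
```

When the keyword occurs in only one document of T, sumWgt equals maxWgt and the combined weight reduces to that single weight; additional matching documents raise the weight, but by less than their plain sum.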
21Our Ranking Strategy
- T(D1, D2, ..., Dn), so Sim(Q,D) → Sim(Q,T)
22Our Ranking Strategy
- Schema Terms in Query
  - lyrics for How come by D12
  - usher the singer's lyrics to burn
- Phrase-based Ranking
  - Using position information to boost phrase matching
- Concept-based Ranking
  - Can improve effectiveness
  - Can assign semantics to answers
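To illustrate what "position information" means for phrase matching, here is a minimal sketch that builds a positional index for one text value and counts exact phrase occurrences. The function name and whitespace tokenization are assumptions; how the resulting count feeds into the ranking score is not specified in the slide text.

```python
def phrase_occurrences(tokens: list[str], phrase: list[str]) -> int:
    """Count positions where the words of `phrase` appear consecutively in `tokens`."""
    if not phrase:
        return 0
    # Positional index: word -> list of positions where it occurs.
    positions: dict[str, list[int]] = {}
    for i, tok in enumerate(tokens):
        positions.setdefault(tok, []).append(i)
    count = 0
    for start in positions.get(phrase[0], []):
        # Every following word must sit exactly `offset` positions after the start.
        if all(start + offset in positions.get(word, ())
               for offset, word in enumerate(phrase)):
            count += 1
    return count


title = "off the wall by michael jackson".split()
print(phrase_occurrences(title, ["off", "the", "wall"]))
print(phrase_occurrences("the wall is off".split(), ["off", "the", "wall"]))
```

The second text contains all three query words but not as a phrase, so a ranking that uses positions can prefer the first text even though a bag-of-words match treats them alike.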
23Effective Keyword Search in Relational Databases
- Introduction
- IR ranking in text databases
- Our ranking strategy in RDBs
- Experiments
- Conclusions and future work
24Experiments data set
- A Lyrics Database
- 50 Queries from an AOL query log
- Relevance Judgment pooling logs
25Experiments some queries
- to me lyrics by lionel richie
- inner smile texas lyrics
- lionel richie lyrics
- lionel richie lyrics you mean more to me
- avril lavigne lyrics for the album under this
skin
- avril lavigne lyrics
26Experiments measure
- Reciprocal rank measures how good the system is
at returning the first relevant answer.
- MAP (mean average precision): a precision value is
computed after each relevant answer is retrieved.
We then average all precision values to get a
single number that measures overall
effectiveness.
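Both measures can be computed from a ranked list of relevance judgments. The sketch below follows the description above; the function names are illustrative, and average precision here averages over the relevant answers that appear in the ranking, which assumes every relevant answer is retrieved.

```python
def reciprocal_rank(relevance: list[bool]) -> float:
    """1 / rank of the first relevant answer, or 0.0 if none is relevant."""
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


def average_precision(relevance: list[bool]) -> float:
    """Average of the precision values taken at each relevant answer."""
    hits = 0
    precisions = []
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(runs: list[list[bool]]) -> float:
    """Mean of the average precision over all queries."""
    return sum(average_precision(r) for r in runs) / len(runs) if runs else 0.0


# One list per query: True marks a relevant answer at that rank.
runs = [[True], [False, True]]
print(reciprocal_rank([False, True, False]))  # 0.5
print(mean_average_precision(runs))           # 0.75
```

For the second query the only relevant answer is at rank 2, so its average precision is 0.5, and the mean over both queries is 0.75.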
27Experiments results
- Our ranking strategy: the four new factors.
28Experiments results
- Comparison with related works
29Effective Keyword Search in Relational Databases
- Introduction
- IR ranking in text databases
- Our ranking strategy in RDBs
- Experiments
- Conclusions and future work
30Conclusions
- Effectiveness is as important as efficiency
- The four new factors are critical to search
effectiveness
- Our strategy is significantly more effective than
related work
31Future Work
- Utilize link analysis
- Combine non-text columns
- Efficiency Problem
- More real world data sets
32Questions?