Title: Keyword Search
1Keyword Search
- Su Zhan
- suzhan_at_fudan.edu.cn
2Introduction
- What is keyword search
- Present works
- Effective keyword search in relational databases.
- BLINKS: ranked keyword searches on graphs.
3Effective keyword search in relational databases.
- Fang Liu
- Clement T. Yu
- Weiyi Meng
- Abdur Chowdhury
4Outline
- Introduction
- Answer generation
- Background in IR ranking
- Novel ranking strategy for relational databases
- Experiment results
- Conclusion
5Introduction
6Introduction
- Suppose a user is looking for albums titled "off the wall" but cannot remember the exact title.
- Select * from Album B
- Where Contains(B.title, 'off wall', 1) > 0
- Order by score(1) desc
- Or, as a keyword query:
- off wall
7Introduction
- Query 1: off wall
- Query 2: lyrics how come by D12
- Query 3: album by D12 and Eminem
- Tuple Tree 1: b2
- Tuple Tree 2:
- a1 ⋈ ab1 ⋈ b1 ⋈ bs1 ⋈ s1
- Tuple Tree 3:
- a1 ⋈ ab1 ⋈ b1 ⋈ ab2 ⋈ a2
8Introduction
- 3 key steps for processing a given keyword query:
- (1) Generate all candidate answers, each of which is a tuple tree formed by joining tuples from multiple tables.
- (2) Compute a single score for each answer. The scores should be defined so that the most relevant answers are ranked as high as possible.
- (3) Return the answers with semantics.
- This paper focuses on search effectiveness, that is, step (2).
9Answer Generation
- Schema Graph
- Tuple Tree
- Keyword Query
- Answer
- Query Tuple Set RQ
- Free Tuple Set RF
- Answer Graph
11Background in IR ranking
- Ranking Model in IR
- 11-point precision and recall
- Mean average precision
- Reciprocal rank
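The three measures listed above can be computed from a ranked list of relevance judgments. The following is a minimal sketch using the standard IR definitions; the function names and the boolean-list input format are assumptions made for illustration and do not come from the paper.

```python
from typing import List, Sequence


def eleven_point_precision(ranked: Sequence[bool], total_relevant: int) -> List[float]:
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0."""
    points = []  # (recall, precision) at each rank holding a relevant answer
    hits = 0
    for rank, is_relevant in enumerate(ranked, start=1):
        if is_relevant:
            hits += 1
            points.append((hits / total_relevant, hits / rank))
    result = []
    for i in range(11):
        level = i / 10
        # Interpolation: best precision at any recall >= this level.
        candidates = [p for r, p in points if r >= level - 1e-12]
        result.append(max(candidates, default=0.0))
    return result


def average_precision(ranked: Sequence[bool], total_relevant: int) -> float:
    """Mean of the precision values at the ranks of relevant answers."""
    if total_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_relevant


def reciprocal_rank(ranked: Sequence[bool]) -> float:
    """1 / rank of the first relevant answer, or 0.0 if there is none."""
    for rank, is_relevant in enumerate(ranked, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0
```

Mean average precision is then the mean of `average_precision` over all test queries.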
12Background in IR ranking
13Novel ranking strategy for relational Database
- Let T be a tuple tree and D1, D2, ..., Dm be all text column values in T. We define each text column value Di as a document and T as a super-document. Then we can compute a similarity value between the query Q and the super-document T, as shown in Formula 3, to rank tuple trees.
- Our focus is on weight(k, T).
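Formula 3 itself is not reproduced in this transcript. As a minimal sketch, assuming it takes the usual inner-product form in which Sim(Q, T) sums weight(k, Q) * weight(k, T) over the terms k shared by Q and T, the scoring step could look like the following. The function name and the dictionary representation are hypothetical.

```python
from typing import Dict


def similarity(query_weights: Dict[str, float], tree_weights: Dict[str, float]) -> float:
    """Inner product over terms present in both the query and the tuple tree.

    query_weights maps term k -> weight(k, Q);
    tree_weights maps term k -> weight(k, T) for the super-document T.
    The inner-product form is an assumption, not taken from the slides.
    """
    return sum(
        w_q * tree_weights[term]
        for term, w_q in query_weights.items()
        if term in tree_weights
    )
```

Terms that occur only in the query or only in the tuple tree contribute nothing to the score, so ranking quality depends on how weight(k, T) is defined.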
14Novel ranking strategy for relational Database
- Four Normalizations
- Tuple Tree Size Normalization
- Document Length Normalization
- Document Frequency Normalization
- Inter-Document Weight Normalization
15Novel ranking strategy for relational Database
- (jojo leave lyrics)
- b3 and s3 score higher than (b3, bs3, s3)!
- Tuple Tree Size Normalization
16Novel ranking strategy for relational Database
- (how come)
- Title and Lyrics score the same!
- Global average?
- Document Length Normalization
17Novel ranking strategy for relational Database
- idf
- Document Frequency Normalization
18Novel ranking strategy for relational Database
- A term tends to appear more frequently in a tuple tree T of larger size.
- Inter-Document Weight Normalization
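The document length and document frequency normalizations have well-known counterparts in classic text retrieval. The sketch below shows the standard pivoted-normalization term weight for a single document. It is not the paper's weight(k, T), which adds the tuple tree size and inter-document factors; the parameter names and the default slope s = 0.2 are assumptions.

```python
import math


def pivoted_weight(tf: int, dl: float, avgdl: float, df: int, n_docs: int,
                   s: float = 0.2) -> float:
    """Classic pivoted-normalization weight of a term in one document.

    tf: term frequency in the document
    dl, avgdl: document length and average document length
    df: number of documents containing the term
    n_docs: total number of documents
    s: slope controlling how strongly long documents are penalized
    """
    if tf <= 0:
        return 0.0
    ntf = 1.0 + math.log(1.0 + math.log(tf))   # dampened term frequency
    ndl = (1.0 - s) + s * dl / avgdl           # document length normalization
    idf = math.log((n_docs + 1) / df)          # document frequency normalization
    return ntf / ndl * idf
```

With everything else equal, a document longer than average receives a lower weight, and a rarer term receives a higher one.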
19Novel ranking strategy for relational Database
- Schema terms in query
- Value terms
- Schema terms
- Schema-based document frequency
- Assign the largest document frequency value among all terms to df
- Assign 1 to tf
- What if k is both value term and schema term?
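The rule for schema terms can be sketched as a lookup that returns the (tf, df) pair used for weighting. The function name and the dictionary inputs are hypothetical. A term that is both a value term and a schema term is treated here as a schema term purely to keep the sketch short; the slide leaves that case as an open question.

```python
from typing import Dict, Set, Tuple


def term_statistics(term: str,
                    value_tf: Dict[str, int],
                    value_df: Dict[str, int],
                    schema_terms: Set[str]) -> Tuple[int, int]:
    """Return (tf, df) for a query term.

    Schema terms (table or column names) get tf = 1 and the largest df
    observed among all terms, so they carry very little weight.
    Value terms use their ordinary statistics.
    """
    if term in schema_terms:
        # Assumption: schema handling wins when a term plays both roles.
        return 1, max(value_df.values())
    return value_tf.get(term, 0), value_df.get(term, 0)
```

Giving schema terms the largest df makes their idf the smallest possible, so a query word such as "album" or "lyrics" affects the ranking far less than the value terms do.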
20Novel ranking strategy for relational Database
- Phrase-based Ranking
- Utilize phrase-based ranking to improve effectiveness.
- If a sub-query of Q, P = (k_i, k_{i+1}, ..., k_j), where i < j, appears in a document D, and k_{i-1} does not appear in an adjacent location to k_i in this occurrence of P in D, and k_{j+1} does not appear in an adjacent location to k_j in this occurrence of P in D, then we define it as an occurrence of the phrase P in D.
21Novel ranking strategy for relational Database
22Novel ranking strategy for relational Database
- Suppose Q = {1, 2, 3, 4} and a document D in T is "... 1, 2, 3 ... 2, 3, 4 ... 2, 3, 4 ... 1, 2 ... 1 ...".
- The phrases "1, 2, 3" and "2, 3, 4" overlap.
- Choose the phrase with the highest weight.
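Under the definition of a phrase occurrence, each occurrence is a maximal run of consecutive query terms in the document. The sketch below finds those runs for the example above. It assumes the query terms are distinct and the document is already tokenized, and it uses "x" as a hypothetical filler token for the elided text. Choosing among overlapping phrases by weight is left out, since the phrase weight is not given in this transcript.

```python
from typing import Hashable, List, Sequence, Tuple


def phrase_occurrences(query: Sequence[Hashable],
                       doc: Sequence[Hashable]) -> List[Tuple[int, tuple]]:
    """Return (start_index, phrase) for every maximal run of two or more
    query terms that appear in the document in query order and adjacent."""
    position = {term: i for i, term in enumerate(query)}  # assumes distinct terms
    occurrences = []
    start = 0
    while start < len(doc):
        if doc[start] not in position:
            start += 1
            continue
        end = start
        # Extend while the next token is the next query term.
        while (end + 1 < len(doc)
               and doc[end + 1] in position
               and position[doc[end + 1]] == position[doc[end]] + 1):
            end += 1
        if end > start:  # a phrase needs at least two terms (i < j)
            occurrences.append((start, tuple(doc[start:end + 1])))
        start = end + 1
    return occurrences


query = [1, 2, 3, 4]
doc = ["x", 1, 2, 3, "x", 2, 3, 4, "x", 2, 3, 4, "x", 1, 2, "x", 1, "x"]
found = phrase_occurrences(query, doc)
```

For this document the sketch reports one occurrence of (1, 2, 3), two of (2, 3, 4), and one of (1, 2); the final lone 1 is not a phrase.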
24Novel ranking strategy for relational Database
- concept set(CQ)
- Phrase model
- a document that contains only some highly weighted terms
- Concept ranking model
- concept similarity value Sim(CQ, T)
- If (1) Sim(CQ, T1) > Sim(CQ, T2),
- or (2) Sim(CQ, T1) = Sim(CQ, T2) and Sim(Q, T1) > Sim(Q, T2), then T1 is ranked higher than T2.
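The two-part rule above is a lexicographic comparison: concept similarity decides first, and term similarity breaks ties. A minimal sketch, assuming both similarity values have already been computed for each tuple tree (the class and field names are hypothetical):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredTree:
    name: str
    concept_sim: float  # Sim(CQ, T)
    term_sim: float     # Sim(Q, T)


def rank_tuple_trees(trees: List[ScoredTree]) -> List[ScoredTree]:
    """Order by Sim(CQ, T) descending, then by Sim(Q, T) descending."""
    return sorted(trees, key=lambda t: (t.concept_sim, t.term_sim), reverse=True)


ranked = rank_tuple_trees([
    ScoredTree("T1", concept_sim=0.5, term_sim=0.9),
    ScoredTree("T2", concept_sim=0.8, term_sim=0.1),
    ScoredTree("T3", concept_sim=0.5, term_sim=1.2),
])
```

Here T2 comes first because of its higher concept similarity, and T3 precedes T1 because the two tie on concept similarity and T3 has the higher term similarity.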
25Experiment results
26Experiment results
27Experiment results
28Experiment results
29Experiment results
30Conclusion
- Four normalizations
- tuple tree size normalization
- document length normalization
- document frequency normalization
- inter-document weight normalization
- The results show that
- all four new normalization factors are critical to search effectiveness
- phrase-based search and concept-based search improve effectiveness significantly
- our strategy is significantly better than related works and significantly outperforms Google.
31BLINKS: Ranked Keyword Searches on Graphs
- Hao He
- Haixun Wang
- Jun Yang
- Philip S. Yu