Title: Boolean Ranking: Querying a Database by K-Constrained Optimization
Slide 1: Boolean Ranking: Querying a Database by K-Constrained Optimization
- Zhen Zhang
- Seung-won Hwang
- Kevin C. Chang
- Min Wang
- Christian A. Lang
- Yuan-chi Chang
- Presented at the ACM SIGMOD Conference (SIGMOD 2006), Chicago, June 2006
Presented by Pavan Kumar M.K. (1000618890) and Aditya Mangipudi (1000649172)
Slide 2: Outline
- Introduction
- Motivation
- A* Search Algorithm
- A*-Driven State Space Construction
- Optimization-Driven Configuration
- OPT Search Algorithm
- Experiments
- Conclusion
Slide 3: Motivation
- The widespread use of databases for managing structured data, compounded by the expanding reach of the Internet, has brought interesting new data retrieval and analysis scenarios to RDBMSs.
- Often, only the top-k results are of interest to the user.
Slide 4: K-Constrained Optimization Query
Query: select the top-5 second-year students in CSE with the highest GPA
- Boolean query (qualifying constraint) B: dept = CSE and year = 2
- Ranking query (quantifying function) O: GPA, top 5 ranked by GPA
- Together they find the top answers
Slide 5: K-Constrained Optimization Query
- Query Q = (G, k)
- G: goal function, G = B · O
- k: retrieval size
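To make Q = (G, k) concrete, here is a minimal brute-force sketch for the student query of slide 4, assuming G = B · O with O = GPA. The `Student` records and the helper names `B`, `O`, `G`, `answer` are hypothetical. It scores every tuple, which is exactly the full scan an index-based method would like to avoid.

```python
import heapq
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Student:
    name: str
    dept: str
    year: int
    gpa: float


def B(t: Student) -> bool:
    # Qualifying constraint: dept = CSE and year = 2
    return t.dept == "CSE" and t.year == 2


def O(t: Student) -> float:
    # Quantifying (objective) function: GPA
    return t.gpa


def G(t: Student) -> float:
    # Goal function G = B . O: the objective when B holds, otherwise 0
    return O(t) if B(t) else 0.0


def answer(students: List[Student], k: int) -> List[Student]:
    # Q = (G, k) by exhaustive scan: keep only tuples satisfying B (so
    # zero-scored tuples never fill the answer) and return the k best by G.
    return heapq.nlargest(k, (t for t in students if B(t)), key=G)


students = [
    Student("ann", "CSE", 2, 3.9),
    Student("bob", "CSE", 2, 3.5),
    Student("cai", "CSE", 3, 4.0),
    Student("dev", "EE", 2, 3.95),
    Student("eve", "CSE", 2, 3.7),
    Student("fay", "CSE", 2, 3.2),
    Student("gus", "CSE", 2, 3.8),
    Student("hal", "CSE", 2, 2.9),
]
top5 = answer(students, 5)
```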
Slide 6: What is the query evaluation mechanism?
- Ranking query + Boolean query: how to answer?
Slide 7: Current techniques lack a global search mechanism
- If evaluated as separate operators (Boolean query B and ranking query R)
- If searched by an overall goal function G used as a ranking function
[Figure: Boolean query B and ranking query R under each of the two strategies]
- Current techniques optimize only condition by condition
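The two "separate operator" plans can be sketched as follows: Boolean-then-ranking filters every tuple by B and then sorts, while ranking-then-Boolean walks tuples in descending R order and filters until k answers are found. Each plan optimizes one condition at a time. The rows, the `accessed` counter and the function names are hypothetical, and Python's `sorted` stands in for sorted access on an index over R.

```python
from typing import Callable, List, Tuple

Row = Tuple[str, str, int, float]  # (name, dept, year, gpa)


def boolean_then_ranking(rows: List[Row], B: Callable[[Row], bool],
                         R: Callable[[Row], float], k: int):
    # Evaluate B on every tuple first, then rank the survivors by R.
    accessed = 0
    survivors = []
    for row in rows:
        accessed += 1
        if B(row):
            survivors.append(row)
    survivors.sort(key=R, reverse=True)
    return survivors[:k], accessed


def ranking_then_boolean(rows: List[Row], B: Callable[[Row], bool],
                         R: Callable[[Row], float], k: int):
    # Visit tuples in descending R order, filter by B, stop after k matches.
    accessed = 0
    result = []
    for row in sorted(rows, key=R, reverse=True):
        accessed += 1
        if B(row):
            result.append(row)
            if len(result) == k:
                break
    return result, accessed


def is_cse_year2(row: Row) -> bool:
    return row[1] == "CSE" and row[2] == 2


def by_gpa(row: Row) -> float:
    return row[3]


rows = [
    ("ann", "CSE", 2, 3.9), ("bob", "CSE", 2, 3.5), ("cai", "CSE", 3, 4.0),
    ("dev", "EE", 2, 3.95), ("eve", "CSE", 2, 3.7), ("fay", "EE", 1, 3.99),
    ("gus", "ME", 2, 3.85), ("hal", "CSE", 2, 2.9),
]
plan1 = boolean_then_ranking(rows, is_cse_year2, by_gpa, 2)
plan2 = ranking_then_boolean(rows, is_cse_year2, by_gpa, 2)
```

Both plans return the same answer here; they differ only in how many tuples they touch (8 versus 6), and neither uses B and R jointly to prune.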
Slide 8: Threshold Algorithm
[Figure: sorted lists over Att 1 and Att 2]
Slide 9: Assumptions
- The Threshold Algorithm relies on a rigid assumption that the goal function G is monotonic.
- Monotonicity requires that G does not increase when all of its parameters decrease.
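The following is a compact sketch of the classic Threshold Algorithm (Fagin et al.), written to show where monotonicity is used: the threshold, the aggregate of the last values seen under sorted access, bounds the score of every unseen object only because `agg` is monotone. The attribute lists and the choice of `sum` as aggregate are assumptions for illustration, and every object is assumed to appear in every list.

```python
import heapq
from typing import Callable, Dict, Iterable, List


def threshold_algorithm(lists: List[Dict[str, float]], k: int,
                        agg: Callable[[Iterable[float]], float] = sum):
    # Each dict maps object id -> score on one attribute.
    sorted_lists = [sorted(l.items(), key=lambda kv: kv[1], reverse=True) for l in lists]
    seen: Dict[str, float] = {}
    for depth in range(len(sorted_lists[0])):
        last_seen = []
        for sl in sorted_lists:
            obj, val = sl[depth]              # sorted access
            last_seen.append(val)
            if obj not in seen:               # random access to all attributes
                seen[obj] = agg(l[obj] for l in lists)
        # Best score an unseen object could reach (valid only for monotone agg)
        threshold = agg(last_seen)
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top, depth + 1
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1]), len(sorted_lists[0])


att1 = {"a": 9, "b": 8, "c": 7, "d": 2}
att2 = {"a": 6, "b": 9, "c": 1, "d": 3}
top, depth = threshold_algorithm([att1, att2], k=1)
```

Here the search stops after two rounds of sorted access: "b" scores 17, and no unseen object can exceed 8 + 6 = 14. With a non-monotonic G, that bound no longer holds.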
Slide 10: Non-Monotonic Functions
- Consider the example query below, which finds houses in certain price ranges (below 200k or above 400k) with a good size-to-price trade-off:
Select h.address from House h
Where h.price < 200k or h.price > 400k
Order by h.size / |h.price - 300k|
- The function G here is non-monotonic.
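A quick numeric check of why this ranking expression breaks the monotonicity assumption: holding size fixed, the score first rises and then falls as price increases. This sketch assumes the expression reads size / |price - 300k| with price in thousands; the sample prices are arbitrary.

```python
def R(size: float, price_k: float) -> float:
    # Ranking expression of the example query (price in thousands): size / |price - 300|
    return size / abs(price_k - 300)


size = 2000
prices = [100, 200, 250, 350, 400, 500]
scores = [R(size, p) for p in prices]

# Neither consistently non-decreasing nor non-increasing in price
non_decreasing = all(a <= b for a, b in zip(scores, scores[1:]))
non_increasing = all(a >= b for a, b in zip(scores, scores[1:]))
```

The scores are 10, 20, 40, 40, 20, 10: they peak around 300k instead of moving in one direction, so a threshold computed from sorted price lists cannot bound unseen tuples.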
Slide 11: New Algorithm
[Figure: the new algorithm's search over Att 1 and Att 2]
Slide 12: Need for encoding as a search problem
- Existing algorithms build on problem-specific assumptions about the goal functions or index traversals.
- For example, the Threshold Algorithm assumes the monotonicity of G and the use of sorted accesses (interleaf navigation), on which its search is implicitly hardwired.
- For a Boolean query such as B: price > 100K, such a search is straightforward, because the constraint expression B explicitly suggests how to carry out a focused search, e.g., visiting only the nodes whose locality can potentially satisfy B.
Slide 13: Need for encoding as a search problem
- In contrast, for a general k-constrained optimization query, potentially involving arbitrary ranking combined with Boolean conditions and joins over multiple relations (e.g., Q maximizing the size/price ratio), it is no longer clear how to focus the search.
- By encoding the query as a generic search with no assumptions on G, the search is generalized to support arbitrary G over potentially multiple indices and a combination of both hierarchical and interleaf traversals.
Slide 14: A* Algorithm
- A* is a well-known search algorithm that finds the shortest path between an initial state and a designated goal state.
- Widely used in the field of Artificial Intelligence.
- Uses best-first search traversal.
- Uses heuristic information to carry out the search in a guided manner.
- Given suitable heuristics, A* is guaranteed to find the correct answer (correctness) while visiting the fewest states (optimality).
- Examples: GPS navigation, Google Maps, many puzzles and games.
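For reference, a textbook A* sketch: a priority queue ordered by f = g + h, where g is the cost so far and h an admissible estimate of the remaining cost. The small graph and heuristic values are hypothetical; the heuristic was chosen so that it never overestimates the true remaining distance.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple

Graph = Dict[str, List[Tuple[str, float]]]


def a_star(graph: Graph, start: str, goal: str,
           h: Callable[[str], float]) -> Optional[Tuple[float, List[str]]]:
    # Best-first search ordered by f = g + h.
    frontier = [(h(start), 0.0, start)]
    best_g = {start: 0.0}
    parent: Dict[str, Optional[str]] = {start: None}
    while frontier:
        _, g, node = heapq.heappop(frontier)
        if node == goal:
            path = []
            cur: Optional[str] = node
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return g, path[::-1]
        if g > best_g[node]:
            continue  # stale entry: a cheaper path to node was found later
        for nxt, w in graph.get(node, []):
            ng = g + w
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                parent[nxt] = node
                heapq.heappush(frontier, (ng + h(nxt), ng, nxt))
    return None


graph: Graph = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 6)], "B": [("G", 2)]}
estimates = {"S": 4, "A": 4, "B": 2, "G": 0}  # never overestimates the remaining cost
result = a_star(graph, "S", "G", estimates.__getitem__)
```

The search returns cost 5 via S, A, B, G without ever expanding the more expensive direct edge A to G.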
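Slide 15: Goal Function
- For a tuple t with m attribute values, the goal function G(t) maps the tuple to a non-negative numeric score:
$$G(t) = B(t) \cdot R(t) = \begin{cases} R(t) & \text{if } B(t) \text{ is true} \\ 0 \ \text{(i.e., the lowest score)} & \text{if } B(t) \text{ is false} \end{cases}$$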
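Slide 16: Query Model
# | Addr | Price | Size | Score
1 | Oak Park, Chicago | 600K | 4500 | 15
2 | Mattis, Champaign | 350K | 2000 | 0
3 | | 150K | 1000 | 6.67
4 | | 250K | 2000 | 0
5 | | 300K | 3500 | 0
6 | | 80K | 500 | 2.27
Select h.address from House h
Where h.price < 200k or h.price > 400k
Order by h.size / |h.price - 300k|
Houses 2, 4 and 5 fail B (their prices lie between 200k and 400k), so their score is 0.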
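The Score column can be reproduced with a few lines. This sketch assumes the Boolean condition is price < 200k or price > 400k and the ranking expression is size / |price - 300k|; under that reading the computed scores match the table exactly. Prices are in thousands and the function names are hypothetical.

```python
houses = [  # (id, price in k, size)
    (1, 600, 4500), (2, 350, 2000), (3, 150, 1000),
    (4, 250, 2000), (5, 300, 3500), (6, 80, 500),
]


def B(price: float) -> bool:
    # Assumed Boolean condition: price < 200k or price > 400k
    return price < 200 or price > 400


def R(price: float, size: float) -> float:
    # Assumed ranking expression: size / |price - 300k|
    return size / abs(price - 300)


def G(price: float, size: float) -> float:
    # G = B . R; B is checked first, so R (which divides by zero at 300k) is skipped when B is false
    return R(price, size) if B(price) else 0.0


scores = {hid: round(G(p, s), 2) for hid, p, s in houses}
ranking = sorted(scores, key=scores.get, reverse=True)
```

Evaluating B before R matters for house 5: its price equals 300k, so R alone would divide by zero, while G correctly yields 0.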
Slide 17: Landscape of Score Function G
[Figure: landscape of G over the six houses listed on slide 16]
Slide 18: OPT Framework
- To realize k-constrained optimization over databases, the paper develops the OPT framework.
- Objective: to optimize G with the help of indices as access methods over the tuples in D.
- Discrete state search: from the view of using indices, we search for the maximizing tuples over the index nodes as discrete states.
- Continuous function optimization: from the view of maximizing goal functions, we optimize G.
Slide 19: Evaluate the query as its nature suggests!
- Optimize G over D = function optimization of G + discrete state search over D
Slide 20: B-Tree Structure
[Figure: indices and the value space they partition]
Slide 21: Some definitions first
- States: states in a search graph represent localities of values at different granularities, from coarse to fine, eventually reaching tuples in the database.
- Region state
- Tuple state
- Transitions: while states give locations on the map, transitions capture the possible paths followed to reach our destination, the query answers.
- Example: for two states u and v, there is a transition (u, v) if v ∈ Next(u).
Slide 22: We view a compound index as a discrete space
[Figure: houses 1 to 6 plotted over price (k) and size]
Slide 23: We view a compound index as a discrete space
[Figure: a price index (nodes b1 to b7) and a size index (nodes a1 to a7) partition the price (k) by size space; each combined region Mij = (ai, bj) pairs size-index node ai with price-index node bj]
Slide 24: We view a compound index as a discrete space
[Figure: conceptually, a combined space whose states are the regions Mij (M11, M22, ..., M77) and the houses inside them]
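A minimal sketch of the combined state space: a region state pairs a size-index node ai with a price-index node bj, Next descends both indices at once (so M11 leads to M22, M23, M32, M33), and a pair of leaf nodes leads to the tuple states inside its region. The two-level indices and their range boundaries are hypothetical and only loosely follow the figure; `IndexNode`, `next_states` and `state_name` are illustrative names.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import List, Tuple


@dataclass
class IndexNode:
    name: str          # e.g. "a2" for the size index, "b3" for the price index
    lo: float          # covered range [lo, hi)
    hi: float
    children: List["IndexNode"] = field(default_factory=list)


House = Tuple[int, float, float]  # (id, size, price in k)


def state_name(a: IndexNode, b: IndexNode) -> str:
    return "M" + a.name[1:] + b.name[1:]   # Mij = (ai, bj)


def next_states(a: IndexNode, b: IndexNode, houses: List[House]):
    # Region state (ai, bj): descend both indices together (a leaf stays put).
    if a.children or b.children:
        return list(product(a.children or [a], b.children or [b]))
    # Both nodes are leaves: transition to the tuple states inside the region.
    return [hid for hid, size, price in houses
            if a.lo <= size < a.hi and b.lo <= price < b.hi]


a2, a3 = IndexNode("a2", 0, 3000), IndexNode("a3", 3000, 6000)
a1 = IndexNode("a1", 0, 6000, [a2, a3])
b2, b3 = IndexNode("b2", 0, 250), IndexNode("b3", 250, 700)
b1 = IndexNode("b1", 0, 700, [b2, b3])
houses = [(1, 4500, 600), (2, 2000, 350), (3, 1000, 150),
          (4, 2000, 250), (5, 3500, 300), (6, 500, 80)]

root_next = [state_name(x, y) for x, y in next_states(a1, b1, houses)]
m33_tuples = next_states(a3, b3, houses)
```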
Slide 25: Challenge 1: What is the search mechanism?
Slide 26: Encoding the problem as a shortest path is challenging
- K-constrained optimization: find a tuple with maximal score
- A* (shortest path): find a path with minimal distance
- A* gives the shortest path to a testable goal, whereas here the goal is to find the optimal tuple states with maximal G-score.
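Slide 27: Transformation needed
- How to encode a tuple as a path?
- Add a virtual target t* reachable only through tuples
- How to encode the maximal tuple as a minimal path?
- The quality of a path depends solely on the tuple it passes through
- For a tuple state t: D(t, t*) = -G(t)
- For two other states r, u: D(r, u) = 0
[Figure: state graph from M11 through region states (M22, M23, M32, M33, M55, M56, M66, M67, M75, M76, M77) down to tuple states 1, 2, 4, 5; region-to-region edges have weight 0, and each tuple state connects to the virtual target t* with weight -G(t), e.g., -G(1) and -G(4)]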
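The transformation can be checked on a toy graph: with zero-weight transitions and an edge of weight -G(t) from each tuple state to the virtual target t*, the shortest path to t* has length -max G and passes through the best tuple. The edge wiring below is hypothetical; the tuple scores are those of houses 1, 2, 4 and 5. Because an index hierarchy yields an acyclic graph, memoized recursion gives exact distances despite the negative final edges.

```python
from functools import lru_cache
from typing import Dict, List, Tuple

TARGET = "t*"  # virtual target reachable only through tuple states
Graph = Dict[str, List[Tuple[str, float]]]


def encode(region_edges: Dict[str, List[str]], tuple_scores: Dict[str, float]) -> Graph:
    # Transitions between states cost 0; each tuple state t gets an edge to t* costing -G(t).
    graph: Graph = {u: [(v, 0.0) for v in vs] for u, vs in region_edges.items()}
    for t, g in tuple_scores.items():
        graph.setdefault(t, []).append((TARGET, -g))
    return graph


def shortest_to_target(graph: Graph, start: str) -> Tuple[float, Tuple[str, ...]]:
    # Exact on acyclic graphs: each state's distance is computed once and reused.
    @lru_cache(maxsize=None)
    def dist(u: str) -> Tuple[float, Tuple[str, ...]]:
        if u == TARGET:
            return 0.0, (TARGET,)
        best: Tuple[float, Tuple[str, ...]] = (float("inf"), ())
        for v, w in graph.get(u, []):
            d, path = dist(v)
            if w + d < best[0]:
                best = (w + d, (u,) + path)
        return best
    return dist(start)


region_edges = {"M11": ["M22", "M33"], "M22": ["4"], "M33": ["M67", "M77"],
                "M67": ["5", "2"], "M77": ["1"]}
tuple_scores = {"1": 15.0, "2": 0.0, "4": 0.0, "5": 0.0}
length, path = shortest_to_target(encode(region_edges, tuple_scores), "M11")
```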
Slide 28: Challenge 2: How to guide the search?
Slide 29: Function Optimization Perspective
- Function optimization measures the quality of states
- Function optimization:
- Defines proper heuristics
- Identifies a set of initial states from which to start the search
Slide 30: Structure of Procedure OPT
- Input: G(x1, ..., xm) and the domain of values dom: xi ∈ [xi1, xi2]
- Output: <O, U> = OPT(G, dom)
- where O gives the local optima
- U gives the upper-bound score
- OPTPOINT gives the O component of OPT
- OPTMAX gives the U component of OPT
- Approaches
- Analytical methods
- Search-based (e.g., hill climbing)
- Template-based
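As an illustration of the search-based approach, here is a minimal steepest-ascent hill climber on an integer grid inside a box. It only yields a local optimum (an OPTPOINT-style answer); a heuristic search like this does not by itself provide a guaranteed upper bound U. The goal function, a nearest-neighbour score toward (3, 4), and all names are hypothetical.

```python
from typing import Callable, Sequence, Tuple


def hill_climb(G: Callable[..., float], start: Sequence[int],
               lo: Sequence[int], hi: Sequence[int],
               step: int = 1) -> Tuple[Tuple[int, ...], float]:
    # Steepest ascent on an integer grid inside the box [lo, hi]; stops at a
    # point none of whose axis neighbours scores higher (a local optimum).
    x = tuple(start)
    best = G(*x)
    while True:
        neighbours = []
        for i in range(len(x)):
            for d in (-step, step):
                y = list(x)
                y[i] += d
                if lo[i] <= y[i] <= hi[i]:
                    neighbours.append(tuple(y))
        cand = max(neighbours, key=lambda y: G(*y), default=None)
        if cand is None or G(*cand) <= best:
            return x, best
        x, best = cand, G(*cand)


def nn_goal(x: int, y: int) -> int:
    # Negated squared distance to (3, 4): maximizing it is a nearest-neighbour query
    return -((x - 3) ** 2 + (y - 4) ** 2)


point, score = hill_climb(nn_goal, start=(0, 0), lo=(0, 0), hi=(10, 10))
```

On this concave function the local optimum is also the global one, (3, 4); on functions with several peaks, the result depends on the start point.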
Slide 31: States and Transitions
[Figure: states shaded by promise: High, Medium, Low]
The figure illustrates that different states have different promise. The search should favor M77 over M67 because M77 is more promising.
Slide 32: 1. Define admissible heuristics: measure the tightest upper bound
- To guarantee completeness:
- A* requires admissible heuristics, i.e., optimistic estimates
- To ensure admissible heuristics:
- Function optimization gives the tightest upper bound
- Analytical approaches
- Numeric analysis packages
H(region) = OPTMAX(G, region), i.e., the maximal value of G in the region
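For the house query, H(region) = OPTMAX(G, region) can be computed analytically. This sketch assumes B: price < 200k or price > 400k and R = size / |price - 300k|, with price in thousands and non-negative sizes. Inside a rectangular region, G is 0 wherever B fails; elsewhere the supremum of R is the largest size divided by the smallest distance of an admissible price from 300k. `optmax_house` is a hypothetical name, not the paper's API.

```python
def optmax_house(price_range, size_range):
    # Upper bound of G = B . R over price in [p1, p2] (in k) and size in [s1, s2].
    p1, p2 = price_range
    _, s2 = size_range
    bound = 0.0  # G is 0 wherever B is false
    parts = []   # closures of the sub-intervals of [p1, p2] on which B can hold
    if p1 < 200:
        parts.append((p1, min(p2, 200)))
    if p2 > 400:
        parts.append((max(p1, 400), p2))
    for lo, hi in parts:
        # 300 lies outside each part, so |price - 300| is smallest at an endpoint (>= 100)
        nearest = min(abs(lo - 300), abs(hi - 300))
        bound = max(bound, s2 / nearest)
    return bound
```

A region whose prices all lie between 250k and 350k gets bound 0, which mirrors the h(M67) = 0 situation discussed next. The bound is admissible: house 1 (600k, 4500, score 15) lies in the region price [400, 700] by size [3000, 6000], whose bound is 60.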
Slide 33: Consider an Example
[Figure: regions M67 and M77 in the price (k) by size space containing houses 1 to 6]
- h(M67) gives U = 0.
- However, if we follow the link from M67 to M77, we can reach tuple 1 with score 15.
Slide 34: 2. Configure a descending space: disconnect uphill links
- To guarantee optimality:
- A* requires descending heuristics
- To ensure descending heuristics:
- Remove uphill links
[Figure: configured state graph (M11, M55, M56, M66, M67, M75, M76, M77; tuples 1, 2, 4, 5) with uphill links removed]
Slide 35: 3. Find the right start point: start from local optima
- To guarantee correctness:
- Every tuple state must be reachable from the start states
- Taking only downhill links requires starting from high points
- To ensure reachability:
- The initial states should contain all local optima
[Figure: the configured state graph (M11, M55, M56, M66, M67, M75, M76, M77; tuples 1, 2, 4, 5) and its start states]
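Slides 34 and 35 can be sketched together: drop every uphill transition (u, v) with H(v) > H(u), then make every state left without an incoming edge a start state, so that every tuple remains reachable. Treating "no remaining incoming edge" as the start-state rule is a simplifying assumption for illustration, not necessarily how OPT locates local optima. The states follow the figure's names, but the edges and H values are hypothetical (M67 has bound 0, as in slide 33).

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def configure(states: List[str], edges: List[Edge], H: Dict[str, float]):
    # Keep only descending transitions: H never increases along a kept edge.
    descending = [(u, v) for u, v in edges if H[v] <= H[u]]
    # A state left with no incoming edge can only be reached by starting there,
    # so it becomes a start state (this includes peaks cut off by removed uphill links).
    has_incoming = {v for _, v in descending}
    starts = [s for s in states if s not in has_incoming]
    return descending, starts


def reachable(starts: List[str], edges: List[Edge]) -> Set[str]:
    adj: Dict[str, List[str]] = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    seen, stack = set(starts), list(starts)
    while stack:
        for v in adj.get(stack.pop(), []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


states = ["M11", "M33", "M67", "M66", "M77", "1", "3"]
edges = [("M11", "M33"), ("M33", "M67"), ("M33", "M77"), ("M67", "M66"),
         ("M67", "M77"), ("M77", "1"), ("M66", "3")]
H = {"M11": 15, "M33": 15, "M67": 0, "M66": 7, "M77": 15, "1": 15, "3": 7}
descending, starts = configure(states, edges, H)
```

Removing the uphill link M67 to M66 leaves M66 unreachable from M11, which is why it must join the start states; tuple 1 stays reachable through M33 and M77.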
Slide 36: Putting it together: executing A* on the configured space
[Figure: top-down traversal from M11 through M22, M23, M32, M33 to M55, M56, M57, M66, M67, M75, M76, M77 and tuples 1, 2, 4, 5]
- The search is implemented as a priority-queue-driven traversal
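A sketch of such a priority-queue-driven traversal for top-k: each region state is keyed by its upper bound H and each tuple state by its score G; the state with the highest key is expanded first, and a tuple popped from the queue is emitted as the next answer. This relies on the bounds being admissible and descending, as configured above. The graph, scores and the name `opt_search` are hypothetical illustrations, not the paper's implementation.

```python
import heapq
from typing import Dict, List, Set, Tuple


def opt_search(starts: List[str], next_states: Dict[str, List[str]],
               score: Dict[str, float], tuples: Set[str], k: int) -> List[Tuple[str, float]]:
    # score[s] is H(s) (an upper bound of G) for a region state and G(s) for a tuple state.
    heap = [(-score[s], s) for s in starts]   # max-priority queue via negated scores
    heapq.heapify(heap)
    seen = set(starts)
    answers: List[Tuple[str, float]] = []
    while heap and len(answers) < k:
        neg, s = heapq.heappop(heap)
        if s in tuples:
            # With admissible, descending bounds, nothing still queued or
            # reachable from the queue can beat this tuple.
            answers.append((s, -neg))
            continue
        for v in next_states.get(s, []):
            if v not in seen:
                seen.add(v)
                heapq.heappush(heap, (-score[v], v))
    return answers


next_states = {"M11": ["M33"], "M33": ["M67", "M77"], "M77": ["1"], "M66": ["3"]}
score = {"M11": 15, "M33": 15, "M67": 0, "M77": 15, "M66": 7, "1": 15, "3": 7}
top2 = opt_search(["M11", "M66"], next_states, score, {"1", "3"}, k=2)
```

Region M67 (bound 0) is queued but never expanded, because both answers are found before its bound could matter.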
Slide 37: Need for States and Transitions
- Example: given a set of states constructed from the set of index graphs I, the search should, in principle, follow those transitions to look for the tuple states maximizing the goal function. The search may follow the path
- M11 → M33 → M77 → 1 (top-down search)
- M57 → M77 → 1 (bottom-up search)
Slide 38: OPT Search Algorithm
[Figure: OPT search over the configured space (M11, M55, M56, M66, M67, M75, M76, M77; tuples 1, 2, 4, 5)]
Slide 39: Optimality of OPT
- OPT may incur different costs if started at different initial states.
- Top-down → more hops; bottom-up → fewer hops
- Preference goes to bottom-up, but what if the goal function is G = 1/((X - Y)^2 + 1)? Any value satisfying X = Y maximizes the function.
Slide 40: Experiments
- Comparison vs.
- Boolean then ranking
- Ranking then Boolean
- Metrics: nodes accessed (Nl, Nt)
- Settings
- Benchmark queries over a real dataset
- Controlled queries over synthetic datasets
Slide 41: Benchmark queries
- Datasets
- 19,706 real estate listings crawled online
- Queries
- Q1: size bedrms / price-450k, with 40k < price < 50k
- Q2: size ebedrms / price-350k, with price < 400k, size > 4000
- Q3: size/price, with bedrms3 ? bedrms4
[Figure: results for Q1, Q2 and Q3]
Slide 42: Controlled queries
- Datasets
- Three randomly generated datasets of 100k points
- Uniform, Gaussian, log-normal
- Queries
- Linear average queries (e.g., 0.4a + 0.6b)
- Nearest neighbor queries (e.g., (x - 3)^2 + (y - 4)^2)
- Join queries (e.g., 0.4R.a + 0.6S.b, with a condition relating R.c and R.d)
Slide 43: Conclusion
- Problem: study k-constrained optimization queries as Boolean ranking
- Abstraction: encode k-constrained optimization as a shortest path problem
- Framework: develop OPT to process k-constrained optimization
Slide 44: References
- Z. Zhang, S. Hwang, K. C.-C. Chang, M. Wang, C. Lang, and Y. Chang. Boolean Ranking: Querying a Database by K-Constrained Optimization. In Proceedings of the 2006 ACM SIGMOD Conference (SIGMOD 2006), pages 359-370, Chicago, June 2006.
- www.wikipedia.org
Slide 45: Thank you!