Title: Optimization of Sequence Queries in Database Systems
1Optimization of Sequence Queries in Database
Systems
- Reza Sadri Carlo Zaniolo
- reza_at_cs.ucla.edu zaniolo_at_cs.ucla.edu
- sadri_at_procom.com
-
- Amir Zarkesh Jafar Adibi
- azarkesh_at_u4cast.com jabibi_at_u4cast.com
2Time series Analysis
- Many Applications
- Querying purchase patterns for marketing
- Stock market analysis
- Studying meteorological data
- Whats needed
- Expressive query language for finding complex
patterns in database sequences - Effcient and scalable implementation Query
Optimization
3State of The Art
- ADT (e.g.. Informix Datablades) Not flexible
enough, no Optimization - SEQ Enhanced ADTs (e. g. sets and sequences)
with their own query language - SRQL Adding sequence algebra operators to
relational model
4SQL-TS
- A query language for finding complex patterns in
sequences - Minimal extension of SQLonly the from clause
affected - A new Query optimization techniqe based on
extensions of the Knuth, Morris Pratt (KMP)
string-search algorithm
5SQL-TS Example
- Having a table quote (name, date, price) of stock
prices, find all the stocks that went up by 15
or more one day and went down by 20 or more the
next day and retrieve the change in the value of
the stock. - SELECT X.name, (Z.price -X.price) AS
price_change, X.date AS date - FROM quote
- CLUSTER BY name
- SEQUENCE BY date
- AS (X, Y, Z)
- WHERE Y.price gt 1.15 X.price
- AND Z.price lt 0.8 Y.price
6SQL-TS Example Data
sorted in each group
grouped but not sorted
7Optimized string searchKMP
Consider text array text and pattern array p i
1 2 3 4 5 6 7 8
9 10 11 texti a b a b a
b c a b c a j 1 2
3 4 5 6 patternj a b a b c
a
- After failing, use the information acquired so
to - - backtrack to shift(j), rather than i1,
and - - only check pattern values after next(j)
- But in SQL-TS we have general predicates star
patterns
8Equality predicates KMP
- Find companies whose closing stock price in
- three consecutive days was 10, 11, and 15.
- SELECT X.name
- FROM quote CLUSTER BY name
- SEQUENCE BY date AS (X, Y, Z)
- WHERE X.price 10 AND Y.price11
- AND Z.price15
9shift and next
- Success for first j-1 elements of pattern.
Failure for jth element (when input is at i) - Any shift less than shift(j) is guaranteed to
lead to failure, - Match elements in the pattern starting at next(j)
10Optimal Pattern Search (OPS)
Search path for naive algorithm vs optimized
algorithm
11Matrices q and j Input tested on pj is now
tested against pk
pj succeeded
pj failed
Combing values of these lower triangular matrices
( j ³ k), We derive the values of next(j) and
shift (j)
12STAR Patterns
- SELECT X.NEXT.date, X.NEXT.price,
- S.previous.date, S.previous.price
- FROM quote
- CLUSTER BY name,
- SEQUENCE BY date
- AS (X, Y, Z, T, U, V, S)
- WHERE
- X.name'IBM AND X.price gt X.previous.price
- AND 30 lt Y.price AND Y.price lt 40
- AND Z.price lt Z.previous.price AND T.price gt
T.previous.price - AND 35 lt U.price AND U.price lt 40
- AND V.price lt V.previous.price AND S.price lt
30
13Handling Star Patterns
- Same input, Transitions on Original Pattern vs.
Transitions on Pattern after the index set back
j-k - ?21
- ?
- ?31 ? ?32
- ? ?
- ?41 ? ?42 ? ?43
- ? ?
- Example Elements j and k are star predicates and
?jk is U - U ? ?j,k1
- ?
- ?j1,k ?j1,k1
14Possible Transitions from ?jk
15Implication Graph
16FSM Simulation used to Optimize Star Patterns
- Relaxed Double Bottom
- Only considering increases and decreases that are
more than 2
17Relaxed Double Bottom Ninty fold improvement
18Conclusions
- Disjunctive queries, partial ordered domains,
aggregates also treated in this approach - Old applicationsmore power flexibility than
Datablades ADTs of commercial DBMSs - Ongoing implementation, by building on the
user-defined aggregates supported in AXL.