Optimization of Sequence Queries in Database Systems - PowerPoint PPT Presentation

1 / 18
About This Presentation
Title:

Optimization of Sequence Queries in Database Systems

Description:

... have general predicates & star patterns. Equality predicates: KMP ... Elements j and k are star predicates and jk is U: U j,k ... Star Patterns ... – PowerPoint PPT presentation

Number of Views:124
Avg rating:3.0/5.0
Slides: 19
Provided by: rezas2
Category:

less

Transcript and Presenter's Notes

Title: Optimization of Sequence Queries in Database Systems


1
Optimization of Sequence Queries in Database
Systems
  • Reza Sadri Carlo Zaniolo
  • reza_at_cs.ucla.edu zaniolo_at_cs.ucla.edu
  • sadri_at_procom.com
  • Amir Zarkesh Jafar Adibi
  • azarkesh_at_u4cast.com jabibi_at_u4cast.com

2
Time series Analysis
  • Many Applications
  • Querying purchase patterns for marketing
  • Stock market analysis
  • Studying meteorological data
  • Whats needed
  • Expressive query language for finding complex
    patterns in database sequences
  • Effcient and scalable implementation Query
    Optimization

3
State of The Art
  • ADT (e.g.. Informix Datablades) Not flexible
    enough, no Optimization
  • SEQ Enhanced ADTs (e. g. sets and sequences)
    with their own query language
  • SRQL Adding sequence algebra operators to
    relational model

4
SQL-TS
  • A query language for finding complex patterns in
    sequences
  • Minimal extension of SQLonly the from clause
    affected
  • A new Query optimization techniqe based on
    extensions of the Knuth, Morris Pratt (KMP)
    string-search algorithm

5
SQL-TS Example
  • Having a table quote (name, date, price) of stock
    prices, find all the stocks that went up by 15
    or more one day and went down by 20 or more the
    next day and retrieve the change in the value of
    the stock.
  • SELECT X.name, (Z.price -X.price) AS
    price_change, X.date AS date
  • FROM quote
  • CLUSTER BY name
  • SEQUENCE BY date
  • AS (X, Y, Z)
  • WHERE Y.price gt 1.15 X.price
  • AND Z.price lt 0.8 Y.price

6
SQL-TS Example Data
sorted in each group
grouped but not sorted
7
Optimized string searchKMP
Consider text array text and pattern array p i
1 2 3 4 5 6 7 8
9 10 11 texti a b a b a
b c a b c a j 1 2
3 4 5 6 patternj a b a b c
a
  • After failing, use the information acquired so
    to
  • - backtrack to shift(j), rather than i1,
    and
  • - only check pattern values after next(j)
  • But in SQL-TS we have general predicates star
    patterns

8
Equality predicates KMP
  • Find companies whose closing stock price in
  • three consecutive days was 10, 11, and 15.
  • SELECT X.name
  • FROM quote CLUSTER BY name
  • SEQUENCE BY date AS (X, Y, Z)
  • WHERE X.price 10 AND Y.price11
  • AND Z.price15

9
shift and next
  • Success for first j-1 elements of pattern.
    Failure for jth element (when input is at i)
  • Any shift less than shift(j) is guaranteed to
    lead to failure,
  • Match elements in the pattern starting at next(j)

10
Optimal Pattern Search (OPS)
Search path for naive algorithm vs optimized
algorithm
11
Matrices q and j Input tested on pj is now
tested against pk
pj succeeded
pj failed
Combing values of these lower triangular matrices
( j ³ k), We derive the values of next(j) and
shift (j)
12
STAR Patterns
  • SELECT X.NEXT.date, X.NEXT.price,
  • S.previous.date, S.previous.price
  • FROM quote
  • CLUSTER BY name,
  • SEQUENCE BY date
  • AS (X, Y, Z, T, U, V, S)
  • WHERE
  • X.name'IBM AND X.price gt X.previous.price
  • AND 30 lt Y.price AND Y.price lt 40
  • AND Z.price lt Z.previous.price AND T.price gt
    T.previous.price
  • AND 35 lt U.price AND U.price lt 40
  • AND V.price lt V.previous.price AND S.price lt
    30

13
Handling Star Patterns
  • Same input, Transitions on Original Pattern vs.
    Transitions on Pattern after the index set back
    j-k
  • ?21
  • ?
  • ?31 ? ?32
  • ? ?
  • ?41 ? ?42 ? ?43
  • ? ?
  • Example Elements j and k are star predicates and
    ?jk is U
  • U ? ?j,k1
  • ?
  • ?j1,k ?j1,k1

14
Possible Transitions from ?jk
15
Implication Graph
16
FSM Simulation used to Optimize Star Patterns
  • Relaxed Double Bottom
  • Only considering increases and decreases that are
    more than 2

17
Relaxed Double Bottom Ninty fold improvement
18
Conclusions
  • Disjunctive queries, partial ordered domains,
    aggregates also treated in this approach
  • Old applicationsmore power flexibility than
    Datablades ADTs of commercial DBMSs
  • Ongoing implementation, by building on the
    user-defined aggregates supported in AXL.
Write a Comment
User Comments (0)
About PowerShow.com