BUILDING A DATABASE SYSTEM FOR ORDER - PowerPoint PPT Presentation

1 / 28
About This Presentation
Title:

BUILDING A DATABASE SYSTEM FOR ORDER

Description:

BUILDING A DATABASE SYSTEM. FOR ORDER. New England Database ... Trueblood and Lovett. Apress, 2001. NEDS April 2002 Lerner and Shasha. Motivation. Top N ... – PowerPoint PPT presentation

Number of Views:40
Avg rating:3.0/5.0
Slides: 29
Provided by: ler83
Category:

less

Transcript and Presenter's Notes

Title: BUILDING A DATABASE SYSTEM FOR ORDER


1
BUILDING A DATABASE SYSTEM FOR ORDER
  • New England Database Seminars April 2002
  • Alberto Lerner ENST ParisDennis Shasha NYU
    lerner,shasha_at_cs.nyu.edu

2
Agenda
  • Motivation
  • SQL Order
  • Transformations
  • Conclusion

3
MotivationThe need for ordered data
  • Some queries rely on order
  • Examples
  • Moving averages
  • Top N
  • Rank
  • SQL can handle it. Can it really?

4
MotivationMoving Averages algorithmically
linear
Sales(month, total) SELECT t1.month1 AS
forecastMonth, (t1.total t2.total
t3.total)/3 AS 3MonthMovingAverageFROM
Sales AS t1, Sales AS t2, Sales AS t3WHERE
t1.month t2.month - 1 AND t1.month
t3.month 2 Can optimizer make a 3-way (in
general, n-way) join linear time?
Ref Data Mining and Statistical Analysis Using
SQL Trueblood and LovettApress, 2001
5
MotivationTop N
Employee(Id, salary) SELECT DISTINCT count(),
t1.salaryFROM Employee AS t1, Employee AS
t2WHERE t1.salary lt t2.salaryGROUP BY
t1.salaryHAVING count() lt N How many elements
of cross-product have salaries at least as large
as t1.salary? Will optimizer see essential
sort-count trick?
Ref SQL for Smarties Joe CelkoMorgan Kauffman,
1995
6
MotivationProblems Extending SQL with Order
  • Queries are hard to read
  • Cost of execution is often non-linear (would not
    pass basic algorithms course)
  • Few operators preserve order, so optimization
    hard.

7
Agenda
  • Motivation
  • SQL Order
  • Transformations
  • Conclusion

8
SQL OrderDesirable Features
  • Express order-dependent predicates and clauses
    in a readable, clear way
  • Make optimization opportunities explicit (by
    getting rid of complex idioms, see above)
  • Execution in linear (or n log n) time when
    possible

9
SQL Orderthree steps in solution
  • Give SQL a vector-oriented semantics Database
    is a set of array-tables arrables variables in
    the queries do not refer to a single tuple at a
    time anymore, but to a whole column vector
  • Provide new vector-to-vector functions
    Supporting order-based manipulations of column
    vectors
  • Streaming new data may need special treatment.

10
SQL OrderMoving Averages
Sales(month, total) SELECT month, avgs(8,
total)FROM Sales ASSUMING
ORDER month
avgs vector-to-vector function, order-dependant
and size-preserving
order to be used on vector-to-vector functions
  • Execution (Sales is an arrable)
  • FROM clause enforces the order in ASSUMING
    clause
  • SELECT clause for each month yields the moving
    average (window size 8) ending at that month. No
    8-way join.

11
SQL OrderTop N
Employee(ID, salary) SELECT first(N, salary)
FROM Employee ASSUMING ORDER
Salary
first vector-to-vector function, order-dependant
and non size-preserving
  • Execution
  • FROM clause orders arrable by Salary
  • SELECT clause applies first() to the salary
    vector, yielding first N values of that vector
    given the order. Could get the top earning IDs by
    saying first(N, ID).

12
SQL OrderRanking
SalesReport(salesPerson, territory,
total) SELECT territory, salesPerson, total,
rank(total)FROM SalesReport WHERE
rank(total) lt N
rank vector-to-vector function, non
order-dependant and size-preserving
  • Execution
  • FROM clause assuming is NOT needed.
  • rank is applied to the total vector and maps
    each position into an integer.

13
SQL OrderVector-to-Vector Functions
size-preserving
non size-preserving
prev, next, , avgs(), prds(), sums(),
deltas(), ratios(), reverse,
drop, first, last
order-dependant
rank, tile
min, max, avg, count
non order-dependant
14
SQL OrderComplex queries Best spread
In a given day, what would be the maximum
difference between a buying and selling point of
each security? Ticks(ID, price, tradeDate,
timestamp, ) SELECT ID, max(price
mins(price))FROM Ticks ASSUMING ORDER
timestampWHERE tradeDate 99/99/99GROUP BY
ID
max
bestspread
running min
min
  • Execution
  • For each security, compute the running minimum
    vector for price and then subtract from the price
    vector itself result is a vector of spreads.
  • Note that max min would overstate spread.

15
SQL OrderComplex queries Crossing averages
part I
When does the 21-day average cross the 5-month
average? Market(ID, closePrice, tradeDate,
)TradedStocks(ID, Exchange,) INSERT INTO temp
FROMSELECT ID, tradeDate, avgs(21 days,
closePrice) AS a21, avgs(5 months,
closePrice) AS a5, prev(avgs(21
days, closePrice)) AS pa21,
prev(avgs(5 months, closePrice)) AS pa5FROM
TradedStocks NATURAL JOIN Market
ASSUMING ORDER tradeDateGROUP BY ID
16
SQL OrderComplex queries Crossing averages
part I
  • Execution
  • FROM clause order-preserving join
  • GROUP BY clause groups are defined based on the
    value of the Id column
  • SELECT clause functions are applied
    non-grouped columns become vector fields so that
    target cardinality is met. Violates first normal
    form??

Vectorfield
groups in ID and non-grouped column
grouped ID and non-grouped column
two columns withthe same cardinality
17
SQL OrderComplex queries Crossing averages
part II
Get the result from the resulting non first
normal form relation temp SELECT ID,
tradeDateFROM flatten(temp)WHERE a21 gt a5
AND pa21 lt pa5
  • Execution
  • FROM clause flatten transforms temp into a
    first normal form relation (for row r, every
    vector field in r MUST have the same
    cardinality). Could have been placed at end of
    previous query.
  • Standard query processing after that.

18
SQL OrderRelated Work Research
  • SEQUIN Seshadri et al.
  • Sequences are first-class objects
  • Difficult to mix tables and sequences.
  • SRQL Ramakrishnan et al.
  • Elegant algebra and language
  • No work on transformations.
  • SQL-TS Sadri et al.
  • Language for finding patterns in sequence
  • But Not everything is a pattern!

19
SQL OrderRelated Works Products
  • RISQL Red Brick
  • Some vector-to-vector, order-dependent,
    size-preserving functions
  • Low-hanging fruit approach to language design.
  • Analysis Functions Oracle 9i
  • Quite complete set of vector-to-vector functions
  • But Can only be used in the select clause poor
    optimization (our preliminary study)
  • KSQL Kx Systems
  • Arrable extension to SQL but syntactically
    incompatible.
  • No cost-based optimization.

20
Agenda
  • Motivation
  • SQL Order
  • Transformations
  • Conclusion

21
TransformationsEarly sorting order preserving
operators
SELECT ts.ID, ts.Exchange, avgs(10,
hq.ClosePrice)FROM TradedStocks AS ts NATURAL
JOIN HistoricQuotes AS hq
ASSUMING ORDER hq.TradeDateGROUP BY Id
avgs
avgs
avgs
g-by
sort
g-by
op
avgs
op
sort
g-by
g-by
op
op
sort
op
(1) Sort then joinpreserving order
(2) Preserve existingorder
(3) Join then sortbefore grouping
(4) Join then sortafter grouping
22
TransformationsEarly sorting order preserving
operators
23
TransformationsUDFs evaluation order
Gene(geneId, seq)SELECT t1.geneId, t2.geneId,
dist(t1.seq, t2.seq)FROM Gene AS t1, Gene AS
tWHERE dist(t1.seq, t2.seq) lt 5 AND
posA(t1.seq, t2.seq) posA asks whether sequences
have Nucleo A in same position. Dist gives edit
distance between two Sequences.
dist
posA
Switch dynamicallybetween (1) and (2) depending
on the execution history
posA
dist
(2)
(1)
(3)
24
TransformationsUDFs Evaluation Order
25
TransformationsOrder preserving joins
select lineitem.orderid, avgs(10, lineitem.qty),
lineitem.lineid from order, lineitem assuming
order lineid where order.date gt 45 and order.date
lt 55 and lineitem.orderid order.orderid
  • Basic strategy 1 restrict based on date. Create
    hash on order. Run through lineitem, performing
    the join and pulling out the qty.
  • Basic strategy 2 Arrange for lineitem.orderid
    to be an index into order. Then restrict order
    based on date giving a bit vector. The bit
    vector, indexed by lineitem.orderid, gives the
    relevant lineitem rows.
  • The relevant order rows are then fetched using
    the surviving
  • lineitem.orderid.
  • Strategy 2 is often 3-10 times faster.

26
Transformations Building Blocks
  • Order optimization
  • Simmens et al. 96 push-down sorts over joins,
    and combining and avoiding sorts
  • Order preserving operators
  • KSQL joins on vector
  • Claussen et al. 00 OP hash-based join
  • Push-down aggregating functions
  • Chaudhuri and Shim 94, Yan and Larson 94
    evaluate aggregation before joins
  • UDF evaluation
  • Hellerstein and Stonebraker 93 evaluate UDF
    according to its ((output/input) 1)/cost per
    tuple
  • Porto et al. 00 take correlation into account

27
Agenda
  • Motivation
  • SQL Order
  • Transformations
  • Conclusion

28
Conclusion
  • Arrable-based approach to ordered databases may
    be scary dependency on order, vector-to-vector
    functions but its expressive and fast.
  • SQL extension that includes order is possible and
    reasonably simple.
  • Optimization possibilities are vast.
Write a Comment
User Comments (0)
About PowerShow.com