Title: AQuery A Database System for Order
1AQuery A Database System for Order
- Dennis Shashajoint work with Alberto Lerner
- lerner,shasha_at_cs.nyu.edu
2Idea
- Whatever can be done on a table can be done on an
ordered table (arrable). Not vice-versa. - AQuery query language on arrables
- Expresses many queries easily
- Easy to optimize
3Query 1
- Find the packets whose length is greater than
twice the average packet length over the last 1
hour
SELECT FROM Packets ASSUMING ORDER
timeWHERE len gt 2avgs(range(3600,time),len))
vectors
ordering vector for this query
Semantics are column-oriented as opposedto
row-oriented
4Vector Expressions
... WHERE len gt 2avgs(range(3600,time),len))
time 3 4 7 10 14 22 23
v1 0 1 2 3 3 1 2
v2 10 7.5 9 14.7 13.5 6 7.3
len 10 5 12 32 5 7 10
WHERE F F F T F F F
3 positionsand thecurrent
v1range(3600,time)windows ranges
v2avgs(v1,len)last hour avg len
len gt 2v2filter out false positions
Using range 10 here for the sake of the example
5Query 2
- Find when more than 20 type A squirrels were at
Jennifers backyard. Suppose a flag 1 signals
squirrel entry, and 1, exit.
time ...
region ...
sID ...
SquirrelSensor
flag ...
type ...
sID ...
SquirrelType
SELECT timeindex(sums(flag)gt20)FROM SquirrelSe
nsor SS, SquirrelType ST ASSUMING ORDER time
WHERE SS.sIDST.sID AND ST.type
A AND SS.region JWyard
6Vector Indexing
SELECT timeindex(sums(flag)gt20) ...
squirrelin
squirrelout
flag 1 1 -1 1 1 ... 1 1 -1 -1
sums 1 2 1 2 3 ... 20 21 20 19
gt20 F F F F F ... F T F F
time
timei
Jennifers backyard
i-th positionis true
7Query 3
- Find when 3 different squirrels within a
pair-wise distance of 5 meters from each other
chirp within 10 seconds of each other
time ...
loc ...
sID ...
SquirrelChirps
SELECT S1.sID, S1.loc, S2.sID, S2.loc, S3.sID,
S3.locFROM SquirrelChirps S1, SquirrelChirps S2,
SquirrelChirps S3 WHERE S1.sIDltgts2.sID AND
S1.sIDltgts3.sID AND s2.sIDltgts3.SID
AND S1.time-S2.time lt 10 AND S1.time-S3.time lt 10
AND S2.time-S3.time lt 10 AND distance(S1.loc,
S2.loc)lt5 AND distance(S1.loc,S3.loc)lt5 AND
distance(S2.loc,S3.loc)lt5
8Vector Fields
r
gbya (r)
Flatten(gbya (r))
b 1 2 3 4 5 6 7
a z y y x z x y
b 1 5 2 3 7 4 6
b 1 5 2 3 7 4 6
a z z y y y x x
a z y x
4
6
Non-grouped columns become vectorfields
respecting order. Flatten brings the arrable back
to 1NF
9Query 4 use of vector fields
- Create a log of flow information. A flow from src
to dest ends after a 2-minutes silence
dest ...
src ...
pID ...
Packets
len ...
time ...
SELECT source, dest, count(),
sum(len)FROM Packets ASSUMING ORDER time
GROUP BY source, dest, sums(deltas(time)) gt
120)
10And Streams?
- AQuery has no special facilities for streaming
data, but it is expressive enough. - Idea for streaming data is to split the tables
into tables that are indexed with old data and a
buffer table with recent data. - Optimizer works over both transparently.