PIRS: Query Verification on Data Streams

Description:

Query Assurance on Data Streams. Ke Yi (AT&T Labs, now at HKUST), Feifei Li (Boston U, now at Florida State), Marios Hadjieleftheriou (AT&T Labs), Divesh Srivastava (AT ...

1
PIRS: Query Verification on Data Streams
  • Ke Yi, Hong Kong University of Science and
    Technology
  • Feifei Li, Florida State University
  • Marios Hadjieleftheriou, AT&T Labs
  • George Kollios, Boston University
  • Divesh Srivastava, AT&T Labs

Work done while the first and second authors were
working at AT&T Labs.
2
Publishing Data and Outsourcing Query Service
[Figure: an IP traffic stream (0 1 1 0 0 1 1 1 0) coming from the network is fed to the Gigascope analysis tool, which returns results (statistics).]
3
Revisiting the Cisco / AT&T Example
[Figure: network, IP traffic stream (0 1 1 0 0 1 1 1 0), Gigascope, statistics.]
Lawyers sign the trust agreement.
Could we (computer scientists) help?
4
Concrete Example
IP Stream: . . . pm, p3, p2, p1, where each packet is a tuple (srcIP, destIP, packet_size)
  • Continuous Query
    SELECT SUM(packet_size) FROM IP_trace
    GROUP BY srcIP, destIP
  • Answer

Time   Group 1   Group 2   Group 3   . . .   Group n
5      10KB      2KB       150KB     . . .   5KB
10     11KB      130KB     1MB       . . .   20KB
13     . . .
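The answer table above can be reproduced with a few lines of Python. This is only a minimal sketch: it assumes tuples arrive as (srcIP, destIP, packet_size), and the function name `group_by_sum` and the sample packets are made up for illustration.

```python
from collections import defaultdict

def group_by_sum(stream):
    """Yield the running SUM(packet_size) per (srcIP, destIP) group after each tuple."""
    totals = defaultdict(int)
    for src_ip, dest_ip, packet_size in stream:
        totals[(src_ip, dest_ip)] += packet_size
        yield dict(totals)  # snapshot of the answer at this point in time

# Hypothetical packets, not taken from the slides.
packets = [
    ("10.0.0.1", "10.0.0.2", 1500),
    ("10.0.0.3", "10.0.0.2", 40),
    ("10.0.0.1", "10.0.0.2", 500),
]
snapshots = list(group_by_sum(packets))
final_answer = snapshots[-1]
```

Each snapshot corresponds to one row of the table: the per-group sums at a given time. The exact answer needs one counter per group, which is the cost the synopsis is meant to avoid on the client side.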
5
Continuous Query Verification (CQV) on Data Streams
  • The client registers a query
  • The server maintains the exact answer (Group 1, Group 2, Group 3, ...)
  • The server reports the answer upon request
  • The client maintains a synopsis X
  • Both client and server monitor the same stream, which comes from the source of streams
  • Query: SELECT SUM(packet_size) FROM IP_Trace GROUP BY
    src_ip, dest_ip
6
The Model for the Stream
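[Figure: the stream S delivers tuples (agg_attribute, group_id) at times T1, T2, T3, ...; each tuple updates the entry of its group in the answer vector V^τ = (V1, V2, V3, ..., Vi, ..., Vn), which is initially all zeros.]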
7
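Continuous Query Verification (CQV)
[Figure: as tuples (agg_attribute, group_id) arrive from the stream S at times T1, T2, T3, ..., the server updates the answer vector V^τ = (V1, V2, V3, ..., Vi, ..., Vn) and the client updates its synopsis X^τ; the synopsis is then used to check a reported vector.]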
8
PIRS: Polynomial Identity Random Synopsis
Choose a prime p.
Choose a random number α.
Raise an alarm if the two synopsis values are not equal;
otherwise, raise no alarm.
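The formulas on this slide did not survive extraction. The sketch below assumes the form that the later slides point to: X(V) is the product over groups i of (α - i)^(V_i), taken mod p, which is a polynomial in α with leading coefficient 1 whose zeroes encode V. That formula, the function names, and the toy parameters are assumptions for illustration, not text from the slide.

```python
import random

def pirs(vector, alpha, p):
    """Assumed PIRS form: product over groups i (1-based) of (alpha - i)^v_i mod p."""
    x = 1
    for i, v_i in enumerate(vector, start=1):
        x = (x * pow(alpha - i, v_i, p)) % p
    return x

def verify(client_synopsis, server_answer, alpha, p):
    """Return True (no alarm) if the server's answer matches the client's synopsis."""
    return pirs(server_answer, alpha, p) == client_synopsis

p = 1_000_003                 # a prime; illustrative only
alpha = random.randrange(p)   # secret random seed kept by the client
V = [10, 0, 7, 9]             # true answer vector (toy values)
X_V = pirs(V, alpha, p)

honest = verify(X_V, [10, 0, 7, 9], alpha, p)   # an honest answer never raises an alarm
```

The client stores only p, α, and X(V); it never needs the vector V itself once the synopsis is maintained incrementally.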
9
Incremental Update to PIRS
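[Figure: tuples from the stream S arrive at times T1, T2, ...; each one triggers an update to the synopsis for its group (an update to v1, an update to vi, another update to v1).]
An update to group i with value u can be done
in O(log u) time (exponentiation by squaring).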
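A minimal sketch of the incremental update, assuming the synopsis has the form X(V) = product of (α - i)^(V_i) mod p (an assumption reconstructed from the other slides). An update of value u to group i then multiplies the synopsis by (α - i)^u mod p. The function name, prime, seed, and stream below are illustrative.

```python
def pirs_update(x, group, u, alpha, p):
    """Fold an update of value u >= 0 to group `group` into synopsis x.

    Three-argument pow() performs fast modular exponentiation, so the cost is
    O(log u) modular multiplications rather than u of them.
    """
    return (x * pow(alpha - group, u, p)) % p

p, alpha = 1_000_003, 123_456        # illustrative prime and seed
stream = [(1, 1), (1, 9), (3, 7)]    # (group_id, value) tuples, toy data

x = 1                                # synopsis of the all-zero vector
for group, u in stream:
    x = pirs_update(x, group, u, alpha, p)

# Same result as computing the synopsis directly from the final vector V = (10, 0, 7).
direct = (pow(alpha - 1, 10, p) * pow(alpha - 3, 7, p)) % p
```

For a count query every update has u = 1, so the update is a single modular multiplication.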
10
It Solves the CQV Problem!
Theorem: Given any W ≠ V, PIRS raises an alarm
with probability at least 1 - δ.
A polynomial with 1 as the leading coefficient is
completely determined by its zeroes, due to the
fundamental theorem of algebra.
Since we have p > m/δ choices for α, the
probability that X(V) = X(W) is at most δ.
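The counting argument can be checked exhaustively on a toy instance. The sketch assumes the synopsis form X(V) = product of (α - i)^(V_i) mod p (reconstructed, not quoted from the slides) and a deliberately tiny prime so that every seed can be tried. Two different vectors give two different polynomials in α, and their difference has at most m roots, where m bounds the degree.

```python
def pirs(vector, alpha, p):
    """Assumed PIRS form: product over groups i (1-based) of (alpha - i)^v_i mod p."""
    x = 1
    for i, v_i in enumerate(vector, start=1):
        x = (x * pow(alpha - i, v_i, p)) % p
    return x

p = 101                    # tiny prime, for exhaustive enumeration only
V = [3, 0, 2, 1]           # true answer (toy values)
W = [2, 1, 2, 1]           # tampered answer with the same total
m = max(sum(V), sum(W))    # degree bound of both polynomials

# Seeds for which the tampered answer would go undetected.
collisions = [a for a in range(p) if pirs(V, a, p) == pirs(W, a, p)]
collision_rate = len(collisions) / p
```

Here X(V) - X(W) factors as (α - 1)^2 (α - 3)^2 (α - 4), so only the seeds 1, 3, and 4 fail to detect the tampering. With p chosen larger than m/δ, the fraction of such seeds stays below δ.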
11
Optimality of PIRS
Theorem: PIRS occupies O(log(m/δ) + log n) bits of
space (at most 3 words, i.e., p, α, X(V)),
spends O(1) time to process a tuple for a count
query, or O(log u) time to process a tuple for a
sum query.
Theorem: Any synopsis for solving the CQV problem
with error probability at most δ has to keep
Ω(log(min{n, m}/δ)) bits.
12
Multiple Queries
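[Figure: queries Q1 and Q2 have answer vectors V1..n1 and V1..n2 with separate synopses X1 and X2; treating them as one vector V1..(n1+n2) allows a single synopsis X. In the example, one tuple from the stream S with value 9 triggers an update to v1 and an update to v8.]
Theorem: Our synopses use constant space for
multiple queries.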
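One way to read the figure is that the answer vectors of the two queries are laid end to end, so a single synopsis covers both. The sketch below follows that reading; the synopsis form (product of (α - i)^(V_i) mod p), the offset scheme, and all names and numbers are assumptions for illustration.

```python
def pirs_update(x, index, u, alpha, p):
    """Multiply the synopsis by (alpha - index)^u mod p (assumed PIRS form)."""
    return (x * pow(alpha - index, u, p)) % p

p, alpha = 1_000_003, 424_242   # illustrative prime and seed
n1 = 5                          # number of groups in Q1 (toy value)

def process(x, value, g1, g2):
    """Apply one tuple that falls in group g1 of Q1 and group g2 of Q2.

    Q2's group g2 is stored at combined index n1 + g2, so both queries
    share one synopsis over a vector of length n1 + n2.
    """
    x = pirs_update(x, g1, value, alpha, p)
    x = pirs_update(x, n1 + g2, value, alpha, p)
    return x

# A tuple with value 9 in group 1 of Q1 and group 3 of Q2 updates v1 and v8.
x = process(1, 9, g1=1, g2=3)
```

The client still stores only p, α, and one synopsis value, regardless of how many queries are combined this way.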
13
Handling Load Shedding
  • Semantic load shedding: drop tuples from certain
    groups
    • A small number of groups have errors
  • Random load shedding
    • All groups have a small amount of error
14
CQV with Semantic Load Shedding
Randomly drop certain tuples according to groups.
[Figure: a stream of tuples in which tuples belonging to certain groups are dropped.]
The server claims that at most γ groups have
errors.
Goal: detect whether more than γ groups have errors.
We have designed synopses that use O(γ log(1/δ) log
n) bits of space and achieve an error
probability of at most δ.
15
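PIRS^γ: An Exact Solution
[Figure: the synopsis consists of log(1/δ) layers, each with k buckets; every bucket is a PIRS synopsis. In each layer a hash function assigns every group to one bucket (for example, b(8) = 2 sends v8 to bucket 2). A layer raises an alarm if at least (threshold lost in extraction) buckets raise alarms; the synopsis raises an alarm if at least one layer raises an alarm.]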
16
PIRS^γ: An Exact Solution
Theorem: PIRS^γ requires O(γ² log(1/δ) log n) bits,
spends O(log(1/δ)) time to process a tuple, and
solves CQV with semantic load shedding.
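The layered construction can be sketched as follows. Several details are assumptions: the per-bucket synopsis form (product of (α - i)^(v_i) mod p), one independent seed per bucket, and the layer threshold "more than γ buckets disagree", chosen here because γ erroneous groups can never touch more than γ buckets. The explicit hash table and the choice to compute synopses from full vectors are for readability only; a real client would use a compact hash function and update buckets incrementally.

```python
import random

P = 1_000_003  # illustrative prime

def make_layers(n, k, num_layers, rng):
    """Per layer: a random group-to-bucket map and one PIRS seed per bucket."""
    return [
        {"bucket": [rng.randrange(k) for _ in range(n)],
         "alpha": [rng.randrange(P) for _ in range(k)]}
        for _ in range(num_layers)
    ]

def synopses(vector, layer, k):
    """One PIRS value per bucket, over the groups hashed into that bucket."""
    xs = [1] * k
    for i, v_i in enumerate(vector):
        b = layer["bucket"][i]
        xs[b] = (xs[b] * pow(layer["alpha"][b] - (i + 1), v_i, P)) % P
    return xs

def alarm(V, W, layers, k, gamma):
    """Alarm if, in at least one layer, more than gamma buckets disagree (assumed threshold)."""
    for layer in layers:
        xv, xw = synopses(V, layer, k), synopses(W, layer, k)
        if sum(a != b for a, b in zip(xv, xw)) > gamma:
            return True
    return False

rng = random.Random(0)
gamma, k, num_layers = 2, 8, 3
V = [5, 0, 3, 7, 1, 0, 2, 4]
layers = make_layers(len(V), k, num_layers, rng)

W_ok = [5, 1, 3, 7, 1, 0, 2, 3]    # errors in 2 groups: tolerated by construction
W_bad = [6, 1, 4, 8, 2, 1, 3, 5]   # errors in all 8 groups: an alarm is likely, not certain
tolerated = alarm(V, W_ok, layers, k, gamma)
suspicious = alarm(V, W_bad, layers, k, gamma)
```

With at most γ erroneous groups, no layer can see more than γ disagreeing buckets, so no alarm is raised. Detection of more than γ errors is probabilistic and depends on how the groups spread across buckets.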
17
Intuition on Approximation
[Figure: probability of raising an alarm versus number of errors. The ideal synopsis is a step at γ; the approximation makes its transition between γ- and γ+.]
18
PIRS^±γ: An Approximate Solution
Theorem: PIRS^±γ requires O(γ log(1/δ) log n) bits and
spends O(γ log(1/δ)) time to process a tuple.
19
CQV with Random Load Shedding
Randomly drop tuples.
All groups have small errors.
Goal: detect whether any group has an error greater than a
claimed threshold.
Theorem: Any synopsis that solves this problem with
error probability at most δ requires at least
Ω(n) bits (by reduction from the problem of estimating
the infinite frequency moment, i.e., the number of
occurrences of the most frequent item).
20
Sliding Window and Other Queries
  • It is easy to extend PIRS to work with the sliding
    window model since it is decomposable, i.e.,
    X(v1 + v2) = X(v1) · X(v2).
  • Other queries can be supported if they can be
    transformed into group-by aggregation queries.
  • Details are in the paper.
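The decomposability property can be checked numerically. The sketch assumes the synopsis form X(v) = product of (α - i)^(v_i) mod p (reconstructed from the slides, not quoted); the vectors, prime, and seed are toy values.

```python
def pirs(vector, alpha, p):
    """Assumed PIRS form: product over groups i (1-based) of (alpha - i)^v_i mod p."""
    x = 1
    for i, v_i in enumerate(vector, start=1):
        x = (x * pow(alpha - i, v_i, p)) % p
    return x

p, alpha = 1_000_003, 98_765   # illustrative prime and seed
v1 = [4, 0, 1]                 # contribution of one part of the stream
v2 = [2, 3, 0]                 # contribution of another part
combined = [a + b for a, b in zip(v1, v2)]

lhs = pirs(combined, alpha, p)                        # X(v1 + v2)
rhs = (pirs(v1, alpha, p) * pirs(v2, alpha, p)) % p   # X(v1) * X(v2)
```

Because exponents add when the factors multiply, the synopsis of a window can be assembled from the synopses of its parts.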

21
Some Experiments
  • We use real streams
    • World Cup Data (WC)
    • IP traces from the AT&T network (IP)
  • We perform the following queries
    • WC: aggregate on response size, grouped by
      client id/object id (50M groups)
    • IP: aggregate on packet size, grouped by source
      IP/destination IP (7M groups)
  • Hardware for the client
    • 2.8GHz Intel Pentium 4 CPU
    • 512 MB memory
    • Linux machine

22
Detection Accuracy
Out of 100,000 random attacks, PIRS identified all
of them.
23
Memory Usage of Exact
Exact's memory usage is linear and expensive.
PIRS uses only a constant 3 words (27 bytes) at
all times.
24
Update Time (per tuple) of Exact
Cache misses and memory swap
  1. Exact is fast when memory usage is small.
  2. It becomes extremely slow due to cache misses and
    memory swap operations.

25
Running Time Analysis
Average Update Time
        WC        IPs
Count   0.98 µs   0.98 µs
Sum     8.01 µs   6.69 µs
IPs exhibits a smaller update cost for the sum query because
its average value of u is smaller than that of WC.
26
Multiple Queries: Exact Memory Usage
Exact's memory usage is linear w.r.t. the number of
queries and increases over time.
PIRS always uses only a constant 3 words (27
bytes).
27
Multiple Queries: Exact Update Time Per Tuple
28
Multiple Queries: PIRS Update Time Per Tuple
29
The Library
Download PIRS and other synopses
at http://www.cs.fsu.edu/lifeifei/pirs/
30
Conclusion
  • A space- and update-efficient synopsis for verifying
    continuous group-by aggregation queries on
    streaming data
  • Can be generalized to handle selection queries
    and sliding-window semantics
  • How about more complicated queries?

31
Thanks!
  • Questions

32
Problem and Goals
  • Assumption
    • Client and DSMS observe the same stream
  • Problem
    • Client needs to verify the results
  • Goals
    • Be memory- and update-efficient
    • Tolerance for a limited number of errors
    • Tolerance for small errors
    • Support multiple queries

33
Related Techniques to PIRS
  • Incremental Cryptography
    • Block operations (insert, delete); cannot support
      arithmetic operations
  • Program Verification
    • Server may pass the program execution but simply
      return random outputs
  • Fingerprinting Technique
    • PIRS is a fingerprinting technique

34
CQV with Semantic Load Shedding
35
PIRS^±γ: An Approximate Solution
Theorem: PIRS^±γ
  1. raises no alarm with probability at least 1 - δ
    on any [condition lost in extraction];
  2. raises an alarm with probability at least 1 - δ
    on any [condition lost in extraction],
for any c > -ln ln 2 ≈ 0.367.
The proof uses the intuition of the coupon collector
problem and the Chernoff bound.
36
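PIRS^±γ: An Approximate Solution
[Figure: the synopsis consists of log(1/δ) layers, each with k buckets; every bucket is a PIRS synopsis. In each layer a hash function assigns every group to one bucket (for example, b(i) = 2 sends vi to bucket 2). A layer raises an alarm if all k buckets raise alarms; the synopsis raises an alarm if a majority of layers raise alarms.]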
37
Information Disclosure on Multiple Attacks
PIRS X(V) on r
R
Insight: the server could potentially rule out a δ
portion of the seeds after each failed attack it is notified of!
Learns nothing about r
38
Information Disclosure on Multiple Attacks
Bob
Theorem: For a total of k attacks made by Bob
against PIRS, the probability that none of them
succeeds is at least 1 - kδ.
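One way to see where a bound of this shape can come from is a union bound. This is a sketch, not necessarily the authors' proof. It assumes that each single attack goes undetected for at most a δ fraction of the seeds, and it uses B_j (a name introduced here for illustration) for the set of seeds r under which the j-th attack would succeed.

```latex
\Pr[\text{some attack succeeds}]
  = \Pr\Bigl[\, r \in \bigcup_{j=1}^{k} B_j \Bigr]
  \le \sum_{j=1}^{k} \Pr[\, r \in B_j \,]
  \le k\delta
\quad\Longrightarrow\quad
\Pr[\text{no attack succeeds}] \ge 1 - k\delta .
```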
39
Proof of the Optimality
40
Proof of the Optimality