Slides: 51
Provided by: yannisk
Transcript and Presenter's Notes

1
Approximate Query Evaluation in Large-Scale Computing
Systems
Yannis Kotidis, AT&T Labs-Research
http://www.research.att.com/info/kotidis
2
Outline
  • Introduction
  • applications of approximate query evaluation
  • problem definition
  • Haar Wavelets
  • definition, examples
  • a simple on-line algorithm
  • Approximate Computation of Wavelets (VLDB2001)
  • JL-embeddings, sketches
  • computing wavelets via sketches
  • Applications with real data
  • Newer algorithms (STOC2002, VLDB2002)
  • Conclusions

3
Data -> Analysis -> Knowledge
  • Collecting and analyzing information gives businesses a
    strategic advantage
  • market-share analysis, analysis of customer purchases
  • correlation with demographic characteristics,
    targeted marketing
  • Manageable volumes, relatively slow data transfer
    rates
  • The broad growth of the telecommunications sector
    has revolutionized the rate at which data is created
    and moved
  • it often exceeds the storage capacity of existing
    systems

4
Telephone network (AT&T)
  • Central monitoring and analysis system
  • 200-300 million calls per day (>60 GB)
  • 200 billion records (50 TB)
  • Query evaluation is time-consuming
  • Communities Of Interest: which are the 10 numbers
    most frequently called from 9733340865?
  • what was the distribution of long-distance calls by
    geographic region over the last six months?
  • what is the average duration of a call in the 10
    largest cities of the country?

5
IP network
Backbone router
Gateway router
Access router
  • More data
  • Faster data transfer rates
  • e.g. CISCO NetFlow: 150 records/day/router
  • Shipping the data is uneconomical or impossible
  • up to 97% of the data is lost in transit

6
Approximate query evaluation
Query
Exact answer
GB/TB
  • Exact answers are not always necessary!
  • For an initial analysis we mainly care about the
    strong trends
  • in aggregation queries, accuracy in the leading
    significant digits is enough
  • What percentage of all phone calls are made in
    Attica?

7
Simplified data model
  • Array $a_i$, $1 \le i \le N$
  • number of calls from phone number i ($N = 10^{10}$)
  • number of packets from IP address i ($N = 2^{32}$)

Data stream of (i, d_i) pairs:
(973) 360-7212, 6
(973) 360-8347, 7
(973) 360-8408, 1
(973) 360-7212, 1
(973) 360-8404, 9
(973) 360-8404, 1
(973) 360-7212, 7
(973) 360-8347, 1
8
The problem
  • Describe the array a in space << N
  • One-pass processing
  • Real-time updates
  • Approximate query evaluation within predetermined
    time limits.

Figure: observation point; data stream -> sketch (KB/MB)
9
Outline
  • Introduction
  • applications of approximate query evaluation
  • problem definition
  • Haar Wavelets
  • definition, examples
  • a simple on-line algorithm
  • Approximate Computation of Wavelets (VLDB2001)
  • JL-embeddings, sketches
  • computing wavelets via sketches
  • Applications with real data
  • Newer algorithms (STOC2002, VLDB2002)
  • Conclusions

10
Introduction to Wavelets
  • Wavelets: mathematical transforms that use a
    predefined basis (e.g. Haar, Daubechies-4,
    Daubechies-6, Coifman, Morlet, Gabor)
  • Haar wavelets: the simplest to implement
  • recursive computation of differences and sums over
    dyadic segments

a = (2, 2, 0, 6, 4, 2, 2, 0)

Resolution   Averages                  Wavelets
3            2, 2, 0, 6, 4, 2, 2, 0    ----
2            2, 3, 3, 1                0, 3, -1, -1
1            2.5, 2                    0.5, -1
0            2.25                      -0.25
  • The definition extends easily to multidimensional
    data
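
The recursive averaging and differencing on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the function name `haar_transform` is hypothetical, and the sign convention (right half minus left half, divided by two) is an assumption read off the numbers shown on the slide.

```python
def haar_transform(a):
    """Unnormalized Haar transform of a list whose length is a power of two.

    Returns [overall average, coarsest detail, ..., finest details].
    Assumed sign convention: detail = (right - left) / 2.
    """
    averages = list(a)
    details = []
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        # Finer levels are produced first, so coarser levels are prepended.
        details = [(right - left) / 2 for left, right in pairs] + details
        averages = [(left + right) / 2 for left, right in pairs]
    return averages + details


print(haar_transform([2, 2, 0, 6, 4, 2, 2, 0]))
# [2.25, -0.25, 0.5, -1.0, 0.0, 3.0, -1.0, -1.0]
```

The printed list matches the coefficient vector shown on the next slide.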

11
Compression via Wavelets
  • We keep B << N values (the largest ones)
  • e.g. B = 2

2.25, -0.25, 0.5, -1, 0, 3, -1, -1
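
As a rough illustration of keeping only the B largest coefficients, the sketch below zeroes everything except the top B entries of the vector above and inverts the transform. The helper names `keep_top_b` and `inverse_haar` are hypothetical, and ranking by unnormalized absolute value is a simplifying assumption; a normalized ranking would weight coarse levels more heavily.

```python
def keep_top_b(coeffs, b):
    """Zero all but the b coefficients of largest absolute value."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(order[:b])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]


def inverse_haar(coeffs):
    """Invert the unnormalized transform: left = avg - d, right = avg + d."""
    values = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(values)]
        pos += len(values)
        expanded = []
        for avg, d in zip(values, details):
            expanded += [avg - d, avg + d]
        values = expanded
    return values


w = [2.25, -0.25, 0.5, -1, 0, 3, -1, -1]
print(inverse_haar(w))                 # the original array
print(inverse_haar(keep_top_b(w, 2)))  # approximation from 2 coefficients
```

With B = 2 the approximation keeps the overall average and the single large jump, which is the "few good terms" behaviour the compression relies on.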
12
On-line algorithm (simple model)
  • We see the array from left to right
  • We keep the B largest wavelets in a heap, plus log N
    of the still-active ones in memory
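
A minimal sketch of this one-pass scheme, under stated assumptions: values arrive in index order, the stream length is a power of two, coefficients are ranked by unnormalized absolute value, and the name `streaming_top_b` is hypothetical. The stack holds at most log N + 1 partial averages and the heap at most B coefficients.

```python
import heapq


def streaming_top_b(stream, b):
    """Return the b largest Haar coefficients (by absolute value) in one pass."""
    stack = []  # (level, average) of finished dyadic blocks awaiting a sibling
    heap = []   # min-heap of (abs value, arrival order, value), size <= b
    seen = 0

    def offer(value):
        nonlocal seen
        seen += 1
        item = (abs(value), seen, value)
        if len(heap) < b:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            heapq.heapreplace(heap, item)

    for x in stream:
        level, avg = 0, float(x)
        # Merge with the waiting sibling block while one exists at this level.
        while stack and stack[-1][0] == level:
            _, left = stack.pop()
            offer((avg - left) / 2)   # detail coefficient of the merged block
            avg = (left + avg) / 2
            level += 1
        stack.append((level, avg))
    if len(stack) == 1:               # power-of-two length: overall average
        offer(stack[0][1])
    return sorted((v for _, _, v in heap), key=abs, reverse=True)


print(streaming_top_b([2, 2, 0, 6, 4, 2, 2, 0], 2))  # [3.0, 2.25]
```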

Figure: heap (top-B) above the stream a1, a2, ..., a8;
N >> available memory
13
Good news and bad news
  • The B largest wavelets can be computed with
    O(B + log N) memory
  • IEEE TKDE: every deterministic algorithm that
    computes the largest wavelet (other than the
    average) in the general model needs
    Ω(N/polylog(N)) memory

14
General direction
  • We will use randomized algorithms
  • they approximate the solution with a high probability
    of success
  • Approximate computation of the wavelets with
  • error (additive) 1±ε (e.g. 10%)
  • success probability 1-δ (e.g. 99%)
  • polylogarithmic memory requirements (and
    complexity)

poly(log N, log(1/δ), ε)
15
Observation 1
  • The wavelet $w_l$ is the inner product of the data a
    with a basis vector $\psi_l$

Figure: orthonormal basis of wavelets, vectors ψ1 ... ψ8
16
Observation 2
  • The inner product of 2 unit vectors can be computed
    from their distance

$\langle a,b\rangle = \cos(a,b) = 1 - \mathrm{dist}^2(a,b)/2$
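
A quick numeric check of this identity, as a sketch: both vectors are scaled to unit length first, and the function name `cosine_from_distance` is hypothetical.

```python
import math


def cosine_from_distance(a, b):
    """Cosine of the angle between a and b, via the distance of their unit vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    unit_a = [x / norm_a for x in a]
    unit_b = [x / norm_b for x in b]
    dist_sq = sum((x - y) ** 2 for x, y in zip(unit_a, unit_b))
    return 1 - dist_sq / 2


a, b = [3.0, 4.0], [4.0, 3.0]
direct = sum(x * y for x, y in zip(a, b)) / 25.0  # both norms equal 5
print(abs(cosine_from_distance(a, b) - direct) < 1e-12)  # True
```

The identity follows from expanding dist² = |a|² + |b|² - 2<a,b> with |a| = |b| = 1.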
17
Another view
  • Map the data and the wavelet basis into $R^N$
    (N+1 points)

18
JL-embeddings
  • [Johnson, Lindenstrauss 84]
  • N points can be mapped into
    $k = O((\log N)/\varepsilon^2)$ dimensions so that the
    distances between them are preserved within error
    $\pm\varepsilon$

19
Sketches
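  • e.g. [Alon96]: inner products of a with
    $O(\log(N/\delta)/\varepsilon^2)$ pseudorandom {-1, +1}
    vectors

Figure: the array a_i is multiplied by pseudorandom ±1
vectors to give the entries of sketch(a)
r1 = (1, -1, -1, 1, -1, 1, 1, 1)
r2 = (-1, 1, 1, -1, 1, 1, -1, -1)
r3 = (1, 1, -1, 1, -1, -1, 1, -1)
Values shown alongside: 2, 8, -2, 0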
20
Properties of Sketches
sketch(a) = (X1, X2, X3, X4, X5, ...)
Each $X_i^2$ is an unbiased estimate of the squared 2-norm
of a
21
Boosting: median of means
  • $X_j = \sum_i a_i r_{ji}$, $E[X_j^2] = \sum_i a_i^2 = A^2$

Figure: sketch entries X1, X2, X3, X4, X5, ... split into
groups of µ (k x µ entries in total)
22
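Boosting: median of means
  • $X_j = \sum_i a_i r_{ji}$, $E[X_j^2] = \sum_i a_i^2 = A^2$

Figure: sketch entries X1, X2, X3, X4, X5, ... split into
groups of µ (k x µ entries in total)
$\Pr\left[\,|Y - A^2| \le \frac{4}{\mu^{1/2}} A^2\,\right] > 1 - 2^{-k/2}$,
with $\varepsilon = 4/\mu^{1/2}$ and $\delta = 2^{-k/2}$
The length of a can be computed with accuracy ε, with
success probability 1-δ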
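
The median-of-means estimate can be sketched in Python as follows. This is an illustration under assumptions: Python's `random` module stands in for the limited-independence pseudorandom ±1 generators the method actually calls for, the sketch length is a multiple of µ, and the names `make_signs`, `sketch` and `estimate_norm_sq` are hypothetical.

```python
import random
import statistics


def make_signs(n, rows, seed=0):
    """rows pseudorandom +/-1 vectors of length n (stand-in generator)."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(rows)]


def sketch(a, signs):
    """X_j = sum_i a_i * r_ji for every sign vector r_j."""
    return [sum(x * r for x, r in zip(a, row)) for row in signs]


def estimate_norm_sq(sk, mu):
    """Median over groups of the mean of X_j^2 within each group of size mu."""
    means = [sum(x * x for x in sk[i:i + mu]) / mu for i in range(0, len(sk), mu)]
    return statistics.median(means)


a = [2, 2, 0, 6, 4, 2, 2, 0]
signs = make_signs(len(a), rows=9 * 64, seed=1)   # k = 9 groups of mu = 64
print(estimate_norm_sq(sketch(a, signs), mu=64), sum(x * x for x in a))
```

Averaging within a group reduces the variance of the estimate; taking the median across groups drives the failure probability down exponentially in the number of groups.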
23
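Sketch of a Wavelet
Figure: basis vector ψ_l (values 1 and -1, with marked
positions A, B, C) against a pseudorandom ±1 vector r_k
r_k = (1, 1, -1, 1, -1, -1, -1, 1, 1, 1, -1, -1, 1, -1, 1, -1)
  • 2nd order Reed-Muller codes for the sums in
    $O(\log^3(N))$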

24
Wavelets from Sketches
array
N
wavelet basis vector
25
General algorithm (VLDB2001)
  • Input: sketch(a), i
  • Output: wavelet $w_i$
  • Compute sketch($\psi_i$) of the basis vector
  • Compute $Y = A^2$ (the squared norm of a) from
    sketch(a)
  • Compute sketch($a'$), with $a' = a/Y^{1/2}$, as
    sketch(a)$/Y^{1/2}$
  • Compute $\cos(a',\psi_i) = 1 - \mathrm{dist}^2(a',\psi_i)/2$
    via sketch($a' - \psi_i$)
  • return $w_i = Y^{1/2}\cos(a',\psi_i)$
  • Memory: $O(B \log^2(N) \log(N/\delta) / \eta / \varepsilon^3)$
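
The steps above can be sketched end to end in Python. This is an illustrative reading of the slide, not the published implementation: Python's `random` module stands in for the pseudorandom generators, the basis vector is assumed to have unit length, the sketch length is assumed to be a multiple of µ, and all function names are hypothetical.

```python
import math
import random
import statistics


def make_signs(n, rows, seed=0):
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(rows)]


def sketch(v, signs):
    return [sum(x * r for x, r in zip(v, row)) for row in signs]


def norm_sq(sk, mu):
    """Median-of-means estimate of the squared norm from a sketch."""
    means = [sum(x * x for x in sk[i:i + mu]) / mu for i in range(0, len(sk), mu)]
    return statistics.median(means)


def estimate_coefficient(sk_a, sk_psi, mu):
    """Estimate w = <a, psi> from sketch(a) and sketch(psi) only."""
    y = norm_sq(sk_a, mu)                  # Y ~ squared norm of a
    if y == 0:
        return 0.0
    root = math.sqrt(y)
    # By linearity, sketch(a/sqrt(Y) - psi) = sketch(a)/sqrt(Y) - sketch(psi).
    diff = [x / root - p for x, p in zip(sk_a, sk_psi)]
    cos = 1 - norm_sq(diff, mu) / 2        # cos = 1 - dist^2 / 2
    return root * cos


psi = [1 / math.sqrt(2), -1 / math.sqrt(2), 0, 0, 0, 0, 0, 0]
a = [3 * p for p in psi]                   # true coefficient is 3
signs = make_signs(8, rows=1800, seed=1)   # 9 groups of mu = 200
print(estimate_coefficient(sketch(a, signs), sketch(psi, signs), mu=200))
```

The original array is never consulted after the sketch is built, which is what allows the sketch to replace the data at query time.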

Figure: pseudorandom variables -> sketch
26
Overall Architecture
data stream
seeds
sketch
wavelets
Queries
27
Outline
  • Introduction
  • applications of approximate query evaluation
  • problem definition
  • Haar Wavelets
  • definition, examples
  • a simple on-line algorithm
  • Approximate Computation of Wavelets (VLDB2001)
  • JL-embeddings, sketches
  • computing wavelets via sketches
  • Applications with real data
  • Newer algorithms (STOC2002, VLDB2002)
  • Conclusions

28
Experiments (telephone network)
  • CDRs from 7 days of Feb. 2001
  • $a_i$: calls from npa-nxx i
  • N = 65,536
  • Sketch size 3,952 words

29
Comparison with the Off-line algorithm
  • Top-7 wavelets contain 90% of the energy
  • The remaining 65,529 wavelets are very small

30
Comparison with static preselection
31
Mapping into $R^N$
data
wavelets
32
Linearity of sketches
  • Observations from different systems can be
    combined
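
Linearity can be illustrated with a small Python sketch: two sites sketch their own arrays with the same shared sign vectors, and adding the sketches gives exactly the sketch of the combined array. The site arrays and function names are made up for illustration, and Python's `random` module stands in for the shared pseudorandom seeds.

```python
import random


def make_signs(n, rows, seed=0):
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(rows)]


def sketch(v, signs):
    return [sum(x * r for x, r in zip(v, row)) for row in signs]


def combine(sk_x, sk_y):
    """Sketch of the sum of two arrays, computed from their sketches alone."""
    return [x + y for x, y in zip(sk_x, sk_y)]


site_a = [2, 2, 0, 6, 4, 2, 2, 0]
site_b = [1, 0, 3, 0, 0, 5, 1, 1]
signs = make_signs(8, rows=6, seed=7)      # both sites must share the seeds
merged = combine(sketch(site_a, signs), sketch(site_b, signs))
total = [x + y for x, y in zip(site_a, site_b)]
print(merged == sketch(total, signs))      # True
```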

33
Combining distributed measurements
Total flow through the backbone network




34
Extensions
  • STOC 2002 paper: hierarchy of sketches
  • Computing histograms and wavelets via sketches in
    sub-linear time and space with minimum relative error
  • applications: Exploratory Data Analysis,
    visualization, databases, etc.

35
Random Subset Sums (VLDB2002)
  • Select each $a_i$ with probability 50%

$a_i \approx 2X_j - \|a\|_1$, if $r_{ji} = 1$
Figure: array a_i
36
Construction of the RSS
  • Extended Hamming Code

Seed: log(N)+1 bits, e.g. s = (1 0 1 1)
Generator matrix G:
1 1 1 1 1 1 1 1
0 0 0 0 1 1 1 1
0 0 1 1 0 0 1 1
0 1 0 1 0 1 0 1
s x G         = 1 2 2 3 1 2 2 3
s x G (mod 2) = 1 0 0 1 1 0 0 1
rss = {a0, a3, a4, a7}
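
The worked example on this slide can be reproduced with a short Python sketch. The function names are hypothetical, and the row order of the generator matrix (all-ones row first, then index bits from most to least significant) is read off the matrix shown on the slide.

```python
def generator_matrix(log_n):
    """Extended Hamming generator: an all-ones row plus one row per index bit."""
    n = 1 << log_n
    rows = [[1] * n]
    for bit in range(log_n - 1, -1, -1):
        rows.append([(j >> bit) & 1 for j in range(n)])
    return rows


def random_subset(seed_bits, log_n):
    """Indices j selected by a seed of log_n + 1 bits: (seed x G)[j] mod 2 == 1."""
    g = generator_matrix(log_n)
    n = 1 << log_n
    return [j for j in range(n)
            if sum(s * row[j] for s, row in zip(seed_bits, g)) % 2 == 1]


print(random_subset([1, 0, 1, 1], 3))  # [0, 3, 4, 7]
```

Only the log(N)+1 seed bits need to be stored; membership of any index in the subset is recomputed on demand.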
37
Newer Algorithms (VLDB2002)
  • Write any interval as a sum of O(log N)
    dyadic intervals
  • Each dyadic interval is approximated within
    $\pm\varepsilon\|a\|_1$ via the RSS
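
The decomposition of an arbitrary range into dyadic intervals can be sketched as below. Half-open, 0-based ranges `[lo, hi)` and the name `dyadic_cover` are assumptions made for this illustration.

```python
def dyadic_cover(lo, hi):
    """Split [lo, hi) into maximal dyadic intervals [k * 2^j, (k + 1) * 2^j)."""
    pieces = []
    while lo < hi:
        # Largest power of two that divides lo (any power of two when lo == 0).
        size = lo & -lo if lo else 1 << hi.bit_length()
        while lo + size > hi:
            size >>= 1
        pieces.append((lo, lo + size))
        lo += size
    return pieces


print(dyadic_cover(1, 7))  # [(1, 2), (2, 4), (4, 6), (6, 7)]
```

A range sum is then the sum of the estimates for the returned pieces, of which there are O(log N).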

38
Deciles of on-going Calls
39
Conclusions
  • Approximate query evaluation is desirable in
    many applications
  • very fast response in analysis applications
  • the only solution when collecting the data is
    impossible
  • Two methods (sketches/RSS) for a concise
    description with
  • small space, one pass, fidelity guarantees
    (ε, δ)
  • fast evaluation
  • Combining distributed measurements in
    wide-scale systems
  • lossless for any linear combination

40
Multidimensional Data Analysis
  • Cubetrees, Dwarf, SIGMOD-97, 98, 02
  • efficient organization structures
  • DynaMat, best paper award SIGMOD-99, TODS-01
  • automatic selection and organization based on the
    available resources (storage space, computation
    time), updates
  • Data mining (VLDB-98, 01)
  • Data exchange via XML, ICDE-03, etc.

View(product, store)
41
Thank you!
42
Exponential fading
  • Exp fading: $b' = \lambda a + (1-\lambda) b$
  • By linearity: $h(b') = \lambda h(a) + (1-\lambda) h(b)$
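
A small Python sketch of the fading update, under assumptions: h is taken to be a ±1 sketch as on the earlier slides, `lam` stands for the fading weight, and the function names and sample arrays are hypothetical.

```python
import random


def make_signs(n, rows, seed=0):
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(rows)]


def sketch(v, signs):
    return [sum(x * r for x, r in zip(v, row)) for row in signs]


def fade(new, old, lam):
    """Blend element-wise: lam * new + (1 - lam) * old."""
    return [lam * x + (1 - lam) * y for x, y in zip(new, old)]


a = [4, 0, 8, 4]          # newest observations
b = [8, 8, 0, 4]          # faded history so far
signs = make_signs(4, rows=5, seed=2)
lam = 0.25

from_data = sketch(fade(a, b, lam), signs)                     # h(b')
from_sketches = fade(sketch(a, signs), sketch(b, signs), lam)  # lam*h(a) + (1-lam)*h(b)
print(all(abs(x - y) < 1e-9 for x, y in zip(from_data, from_sketches)))  # True
```

The faded history can therefore be maintained on the sketch alone, without keeping the old data.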

43
Conventional View (Haar Wavelets)
44
Other applications
  • They can be used in place of the original signals
    in many data analysis algorithms
  • e.g. SVD (Information Retrieval, LSI)

45
Wavelet Transform
  • JPEG-2000
  • Physiology (image perception in mammals)
  • Many applications: Data Compression, Noise
    Reduction, Edge Detection (image processing)
  • Databases: selectivity estimation
    [Matias98-00, Chakrabarti00, Gilbert00], aggregate
    OLAP queries [Vitter99], etc.
  • Fast Transform: O(N) space/time
  • Few good-terms phenomenon
  • Just a few coefficients retain most of the energy

46
IP Example
47
Main Result
  • Parameters
  • ε: seek inner products within (1±ε)
  • δ: failure probability
  • η: guarantees hold only when cosine is greater
    than η
  • if $w_l^2 \ge (\varepsilon\eta/B)\|a\|^2$, then $w_l$ can be
    estimated reliably
  • If there is a top-B wavelet representation with
    pseudo-energy at least $\eta\|a\|^2$ then with probability
    $(1-\delta)$ we can find an approximate B-term
    representation with pseudo-energy at least
    $(1-\varepsilon)\eta\|a\|^2$ with space and per-item time cost
  • $O(B \log^2(N) \log(N/\delta) / \eta / \varepsilon^3)$

48
Other examples
  • Parallel database systems
  • statistics for distributing the data evenly
  • Statistics for query optimization

Figure: SQL Query -> Optimizer, supported by a concise
description of the data
49
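Chebyshev's Inequality
  • $P(|X - E[X]| > k) < \mathrm{VAR}[X]/k^2$
  • ($X \to X^2$)
  • In our case: $E[X^2] = A^2$,
    $\mathrm{VAR}[X^2] = E[(X^2 - E[X^2])^2] < (A^2)^2$
  • Mean: $Y = (X_1^2 + X_2^2 + \dots + X_\mu^2)/\mu$
  • $E[Y] = A^2$
  • $\mathrm{VAR}[Y] < (A^2)^2/\mu$
  • Hence $P(|Y - A^2| > \varepsilon A^2) < ((A^2)^2/\mu)/(\varepsilon^2 (A^2)^2) = 1/(\mu\varepsilon^2)$
  • so $P(|Y - A^2| < \varepsilon A^2) > 1 - 1/(\mu\varepsilon^2)$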

50
1st Chernoff Bound
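  • Let V be a random variable equal to 1 if Y is within
    the bounds,
  • $P(V = 1) = p$, constant for fixed ε, µ: Poisson
    Trials
  • We have k trials V
  • Let X be the number of successes, X = SUM(V)
  • The probability of failure:
  • $P(X < (1-\delta)kp) < e^{-kp\delta^2/2}$
  • For $1-\delta = 1/2$ (the median will be within the bounds)
  • $P(X > \tfrac{1}{2}kp) > 1 - e^{-kp/8}$
  • Let $p = 7/8$, i.e. $1/(\mu\varepsilon^2) = 1/8$; then
    $P(|Y - A^2| < \varepsilon A^2) > 1-\delta$
  • where $\varepsilon = \sqrt{8/\mu}$ and $\delta = e^{-kp/8}$ with
    $p = 7/8$ (the other bounds are similar)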