Statistical Change Detection for Multi-Dimensional Data

1
Statistical Change Detection for Multi-Dimensional Data
Presented by: Xiuyao Song
Authors: Xiuyao Song, Mingxi Wu, Chris Jermaine, Sanjay Ranka
2
Motivation example: antibiotic resistance pattern
  • Culturing a specimen of E. coli bacteria and
    then testing its resistance to multiple
    antibiotic drugs.
  • Typical data could look like this (R = resistant,
    S = susceptible, U = undetermined):

      drug1  drug2  drug3  drug4
      R      S      U      R

  • We have a baseline data set and a recently
    observed data set.
  • Question: Does E. coli show a different resistance
    pattern recently?
  • If a change is detected, it might be caused by
    the presence of new E. coli strains. We will
    raise an alarm for further investigation.

3
Problem definition
Multi-dimensional space
data set S: baseline data
data set S': recently observed data
Question: FS = FS' ?
4
Related work
  • For uni-dimensional data, many tests exist,
    such as the K-S test and the chi-square test.
  • Only two tests detect a generic distributional
    change in multi-dimensional space:
  • kdq-tree (Dasu et al.): relies on a discretization
    scheme and suffers from the curse of dimensionality.
  • Cross-match (Rosenbaum): computationally
    expensive due to the maximum matching algorithm.
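
For reference, the uni-dimensional K-S test mentioned above compares the empirical CDFs of two samples. The sketch below computes only the two-sample K-S statistic (the largest gap between the two empirical CDFs), not a p-value. The function name `ks_two_sample_statistic` is a hypothetical name chosen for this illustration.

```python
from bisect import bisect_right

def ks_two_sample_statistic(a, b):
    """Largest gap between the empirical CDFs of two 1-D samples."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    n, m = len(a_sorted), len(b_sorted)
    gap = 0.0
    # Empirical CDFs only change at observed values, so checking those suffices.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / n
        cdf_b = bisect_right(b_sorted, x) / m
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

same = ks_two_sample_statistic([1, 2, 3, 4], [1, 2, 3, 4])
overlapping = ks_two_sample_statistic([1, 2, 3, 4], [3, 4, 5, 6])
disjoint = ks_two_sample_statistic([1, 2, 3, 4], [11, 12, 13, 14])
print(same, overlapping, disjoint)
```

The statistic relies on an ordering of the values, which is one reason it does not carry over directly to multi-dimensional data.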

5
Hypothesis test framework
data set S
data set S'
null hypothesis H0: FS = FS'
Null distribution?
6
Density test: high-level overview
data set S
data set S'
Step 1: Gaussian kernel density estimate of S1.
Step 2: define and calculate the test statistic.
Step 3: derive the null distribution.
Step 4: calculate the critical value and make a
decision.
(Diagram labels: Kernel Density Estimate KS1, null distribution)
7
Step 1: Kernel Density Estimate (KDE)
Bandwidth selection
  • Plug-in bandwidth: asymptotically efficient, but
    not accurate.
  • Data-driven bandwidth: converges better to the
    true distribution.

Whatever the bandwidth, the correctness of the density
test is guaranteed. The power of the test increases
when the density estimate is accurate.
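
As a concrete reference point for the plug-in approach, the sketch below builds a Gaussian KDE with Scott's rule-of-thumb bandwidth, h_j = sigma_j * n^(-1/(d+4)) per dimension. It assumes a product Gaussian kernel with one bandwidth per dimension. The function names are hypothetical and chosen for this illustration.

```python
import math
import random
from statistics import stdev

def scott_bandwidths(points):
    """Scott's rule of thumb: h_j = sigma_j * n ** (-1 / (d + 4))."""
    n, d = len(points), len(points[0])
    factor = n ** (-1.0 / (d + 4))
    return [stdev([p[j] for p in points]) * factor for j in range(d)]

def kde_log_density(x, centers, bandwidths):
    """Log of a product-Gaussian KDE at x, using log-sum-exp for stability."""
    d = len(x)
    log_norm = -0.5 * d * math.log(2 * math.pi) - sum(math.log(h) for h in bandwidths)
    exponents = [
        -0.5 * sum(((x[j] - c[j]) / bandwidths[j]) ** 2 for j in range(d))
        for c in centers
    ]
    top = max(exponents)
    log_sum = top + math.log(sum(math.exp(e - top) for e in exponents))
    return log_norm + log_sum - math.log(len(centers))

random.seed(0)
sample = [[random.gauss(0, 1), random.gauss(0, 3)] for _ in range(200)]
h = scott_bandwidths(sample)
near = kde_log_density([0.0, 0.0], sample, h)    # inside the bulk of the data
far = kde_log_density([8.0, 20.0], sample, h)    # far outside the data
print(h, near, far)
```

The second dimension has the larger spread, so it receives the larger bandwidth, and the estimated log density drops sharply away from the data.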
8
Choose the bandwidth by MLE/EM (maximum likelihood
estimation / expectation maximization)
(Equation labels: kernel, adding constraint)
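
The formulas on this slide did not survive in the transcript, so the following is not the authors' algorithm. It is a minimal sketch of one standard way to fit a KDE bandwidth by EM, under these assumptions: the kernel centers are fixed at one part of the baseline data, a single shared (isotropic) bandwidth h is used, and the likelihood is evaluated on a held-out part so that h does not collapse to zero. The names `em_step`, `em_bandwidth` and `h0` are hypothetical.

```python
import math
import random

def sq_dist(x, c):
    return sum((a - b) ** 2 for a, b in zip(x, c))

def em_step(h, centers, held_out):
    """One EM update of h; returns (new_h, mean held-out log-likelihood at h)."""
    d = len(centers[0])
    log_norm = -0.5 * d * math.log(2 * math.pi * h * h) - math.log(len(centers))
    weighted_sq, loglik = 0.0, 0.0
    for y in held_out:
        dists = [sq_dist(y, c) for c in centers]
        exps = [-0.5 * s / (h * h) for s in dists]
        top = max(exps)
        weights = [math.exp(e - top) for e in exps]  # unnormalised responsibilities
        total = sum(weights)
        loglik += log_norm + top + math.log(total)
        # E-step: expected squared distance from y to the kernel that generated it.
        weighted_sq += sum(w * s for w, s in zip(weights, dists)) / total
    # M-step: closed-form maximiser for a shared isotropic variance.
    new_h = math.sqrt(weighted_sq / (len(held_out) * d))
    return new_h, loglik / len(held_out)

def em_bandwidth(centers, held_out, h0=1.0, iterations=30):
    h, history = h0, []
    for _ in range(iterations):
        h, loglik = em_step(h, centers, held_out)
        history.append(loglik)
    return h, history

random.seed(0)
data = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(120)]
centers, held_out = data[:60], data[60:]
h, history = em_bandwidth(centers, held_out)
print(h, history[0], history[-1])
```

Each EM update cannot decrease the held-out log-likelihood, which gives a simple sanity check on the implementation.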
9
Effectiveness of EM bandwidth
Samples from the real distribution
Samples from KDE with Scott's plug-in bandwidth
Samples from KDE with our EM bandwidth
10
Step 2: define and calculate the test statistic
data set S
data set S'
Kernel Density Estimate KS1
11
Step 3: derive the null distribution
? is normal, by the Central Limit Theorem
?1 is normal
?2 is normal
Need to be estimated
Let Tk be a random variable with distribution FS
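
The statistic's symbol and formula are missing from the transcript, so the sketch below rests on an assumption: the statistic is the difference between the mean log-likelihood of held-out baseline points and the mean log-likelihood of recent points, both scored under the same fixed KDE. Under H0 the per-point log-likelihoods are i.i.d., so by the Central Limit Theorem the difference of the two means is approximately normal with mean 0 and variance var * (1/n + 1/m), where var is estimated from the baseline scores. The name `estimated_null` is hypothetical.

```python
import math
from statistics import NormalDist, variance

def estimated_null(baseline_loglik, n_recent):
    """Normal approximation for mean(baseline scores) - mean(recent scores) under H0."""
    var_single = variance(baseline_loglik)  # variance of one score, from baseline only
    sigma = math.sqrt(var_single * (1 / len(baseline_loglik) + 1 / n_recent))
    return NormalDist(mu=0.0, sigma=sigma)

null = estimated_null([-2.0, -3.0, -4.0, -3.0], n_recent=4)
print(null.mean, null.stdev)
```

Only the baseline scores are needed to estimate the spread, because under H0 the recent points follow the same distribution.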
12
Estimating
13
Step 4: calculate the critical value and
make a decision
estimated null distribution ?
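
Given an estimated normal null distribution, the critical value is a quantile of that distribution. The sketch below assumes a two-sided rejection region and uses `statistics.NormalDist.inv_cdf` from the Python standard library. The function `decide` and the example numbers are hypothetical, for illustration only.

```python
from statistics import NormalDist

def decide(delta, null, p):
    """Two-sided decision: returns (critical_value, change_detected)."""
    critical = null.inv_cdf(1 - p / 2) - null.mean
    return critical, abs(delta - null.mean) > critical

null = NormalDist(mu=0.0, sigma=0.5)  # stand-in for an estimated null distribution
critical, detected = decide(delta=1.2, null=null, p=0.08)
print(critical, detected)
```

A smaller p widens the acceptance region, trading false positives for false negatives.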
14
Density test: all 4 steps
data set S
data set S'
Step 1: Gaussian kernel density estimate of S1.
Step 2: define and calculate the test statistic.
Step 3: derive the null distribution.
Step 4: calculate the critical value and make a
decision.
(Diagram labels: Kernel Density Estimate KS1, null distribution)
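
The four steps can be put together in a short end-to-end sketch. This is an illustration under stated assumptions, not the authors' implementation: the baseline is split into halves S1 and S2, Scott's rule stands in for the EM bandwidth, the statistic is the difference of mean log-likelihoods under the KDE of S1, and the rejection region is two-sided. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev, variance

def scott_bandwidths(points):
    n, d = len(points), len(points[0])
    return [stdev([p[j] for p in points]) * n ** (-1 / (d + 4)) for j in range(d)]

def log_kde(x, centers, h):
    """Log density of a product-Gaussian KDE at x (log-sum-exp for stability)."""
    exps = [-0.5 * sum(((xj - cj) / hj) ** 2 for xj, cj, hj in zip(x, c, h))
            for c in centers]
    top = max(exps)
    log_norm = -sum(math.log(hj * math.sqrt(2 * math.pi)) for hj in h)
    return log_norm + top + math.log(sum(math.exp(e - top) for e in exps)) - math.log(len(centers))

def density_test(baseline, recent, p=0.08):
    """Return True if a change is flagged at significance level p."""
    half = len(baseline) // 2
    s1, s2 = baseline[:half], baseline[half:]
    h = scott_bandwidths(s1)                              # Step 1: KDE of S1
    ll_base = [log_kde(x, s1, h) for x in s2]
    ll_recent = [log_kde(x, s1, h) for x in recent]
    delta = mean(ll_base) - mean(ll_recent)               # Step 2: statistic
    sigma = math.sqrt(variance(ll_base)
                      * (1 / len(s2) + 1 / len(recent)))  # Step 3: null distribution
    critical = NormalDist(0.0, sigma).inv_cdf(1 - p / 2)  # Step 4: critical value
    return abs(delta) > critical

def sample(center, n):
    return [[random.gauss(center, 1.0), random.gauss(center, 1.0)] for _ in range(n)]

random.seed(1)
baseline = sample(0.0, 200)
shifted = sample(5.0, 100)
print(density_test(baseline, shifted))
```

Steps 1 and 3 depend only on the baseline, so their results can be reused across many recent data sets.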
15
Run the density test in 2 directions
The test is not symmetric; a 2-way test may
increase the power.
(Example figure: distributions FS and FS', data sets S and S')
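
A two-way run can be sketched as a wrapper around any one-way test. The version below assumes the significance level is split evenly between the two directions (a Bonferroni-style split, so the overall false positive rate stays at or below p). The slide does not say how the authors combine the two directions. `toy_range_test` is a deliberately crude, hypothetical stand-in that only serves to show the asymmetry.

```python
def two_way_test(one_way_test, s, s_prime, p):
    """Flag a change if either direction rejects; each direction gets p / 2."""
    forward = one_way_test(s, s_prime, p / 2)   # model S, score S'
    backward = one_way_test(s_prime, s, p / 2)  # model S', score S
    return forward or backward

def toy_range_test(reference, other, p):
    """Hypothetical stand-in, not a real test: flags points outside the reference range (ignores p)."""
    lo, hi = min(reference), max(reference)
    return any(x < lo or x > hi for x in other)

wide = [-3.0, -1.0, 0.0, 1.0, 3.0]
narrow = [-0.5, 0.0, 0.5]
print(toy_range_test(wide, narrow, 0.04))         # one direction misses the change
print(toy_range_test(narrow, wide, 0.04))         # the other direction catches it
print(two_way_test(toy_range_test, wide, narrow, 0.08))
```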
16
False positive
Data consists of a low-D group and a high-D
group. The user-given p value is 8%.
17
False negative on low-D group
(Plot: false negative rate (%) by type of change)
18
False negative on high-D group
(Plot: false negative rate (%) by type of change)
19
Scalability
The density test has an amortizable time cost (one-time
cost: 84)
20
Conclusion
  • Our density test:
      • can correctly bound the type I error
      • is the most powerful on all 5 types of changes
      • can easily scale to large data sets and has an
        amortizable time cost

Poster session ( 15)
21
Thanks for your attention!