Title: Adaptive Cleaning for RFID Data Streams
1Adaptive Cleaning for RFID Data Streams
- Shawn Jeffery (UC Berkeley), Minos Garofalakis (Intel Research Berkeley), Michael Franklin (UC Berkeley)
- Presented by Hamid Haidarian Shahri
2Where Are We? Look at the Signs!
3Looking at Signs Before Jumping In
- S. Chaudhuri, U. Dayal, "An Overview of Data Warehousing and OLAP Technology," SIGMOD Record, 1997.
- 800 citations
- DW and information integration
- Data cleaning term publicized
- Identified its importance in integration
- Extensive research followed
4VLDB 2001
- Session R12: DATA QUALITY CLEANING
- Declarative data cleaning: language, model, and algorithms. Helena Galhardas (INRIA Rocquencourt), Daniela Florescu (Propel), Dennis Shasha (NYU), Eric Simon, and Cristian-Augustin Saita (INRIA Rocquencourt)
- Potter's Wheel: an interactive data cleaning system. Vijayshankar Raman and Joseph M. Hellerstein (University of California at Berkeley)
- Update propagation strategies for improving the quality of data on the Web. Alexandros Labrinidis and Nick Roussopoulos (University of Maryland)
5Data Cleaning Previous Work - 2006
- Hamid Haidarian Shahri, S.H. Shahri, "Eliminating Duplicates in Information Integration: An Adaptive, Extensible Framework," IEEE Intelligent Systems, Vol. 21, No. 5, 2006.
6Putting Things into Context
- Data cleaning required after integration
- No unified standard across sources
- NOW: sensor/hardware errors inevitable, hence a research opportunity
- Data modeling (Amol Deshpande)
- An important use case is cleaning
7VLDB 2006 Three weeks ago
- Research Session 5: Sensor Data (dedicated to cleaning!)
- Title: Adaptive Cleaning for RFID Data Streams
- Authors: Shawn R. Jeffery, Minos Garofalakis, Michael J. Franklin
- Title: A Deferred Cleansing Method for RFID Data Analytics
- Authors: Jun Rao, Sangeeta Doraiswamy, Hetal Thakkar, Latha S. Colby
- Title: Online Outlier Detection in Sensor Data Using Non-Parametric Models
- Authors: Sharmila Subramaniam, Themis Palpanas, Dimitris Papadopoulos, Vana Kalogeraki, Dimitrios Gunopulos
8RFID Radio Frequency IDentification
9RFID data is dirty
- A simple experiment
- 2 RFID-enabled shelves
- 10 static tags
- 5 mobile tags
10RFID Data Cleaning
- RFID data has many dropped readings
- Typically, use a smoothing filter to interpolate
SELECT distinct tag_id
FROM RFID_stream RANGE 5 sec
GROUP BY tag_id
But how to set the size of the window?
[Figure: raw readings and smoothed output plotted over time]
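The fixed-window smoothing filter on this slide can be sketched in Python as follows. This is a minimal illustration, not the presenters' implementation: the function name `smooth_fixed_window`, the `(timestamp, tag_id)` reading format, and the half-open window convention are all assumptions made for the example.

```python
def smooth_fixed_window(readings, now, window_sec=5.0):
    """Report every tag read at least once in the window (now - window_sec, now].

    `readings` is an iterable of (timestamp_seconds, tag_id) pairs.
    This mirrors the slide's query: distinct tag_ids over a 5-second range.
    The data layout and window boundaries are assumptions for illustration.
    """
    return {tag for ts, tag in readings if now - window_sec < ts <= now}


# Hypothetical raw readings: tag "a" drops out between t=0.0 and t=6.5.
raw = [(0.0, "a"), (1.0, "b"), (6.5, "a")]
present_at_7 = smooth_fixed_window(raw, now=7.0)          # 5-second window
present_wide = smooth_fixed_window(raw, now=7.0, window_sec=10.0)
```

A wider window fills in more dropped readings but also keeps reporting tags that have already left, which is the trade-off the slide's question points at.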
11Window Size for RFID Smoothing
[Figure: periods of Fido moving and Fido resting, comparing reality, raw readings, a small window, and a large window]
→ Need to balance completeness vs. capturing tag movement
12Truly Declarative Smoothing
- Problem: window size is non-declarative
- Application wants a clean stream of data
- Window size is how to get it
- Solution: adapt the window size in response to data
13Itinerary
- Introduction RFID data cleaning
- A statistical sampling perspective
- SMURF
- Per-tag cleaning
- Multi-tag cleaning
- Ongoing work
- Conclusions
14A Statistical Sampling Perspective
- Key insight: RFID data can be viewed as a random sample of the present tags
- Map RFID smoothing to a sampling experiment
15RFID's Gory Details
[Figure: antenna and reader; read cycle (epoch); tag list (for Alien readers)]
16RFID Smoothing to Sampling
→ Now use sampling theory to drive adaptation!
17SMURF
- Statistical Smoothing for Unreliable RFID Data
- Adapts window based on statistical properties
- Mechanisms for
- Per-tag and multi-tag cleaning
18Per-Tag Smoothing Model and Background
- Use a binomial sampling model
[Figure: read rate p_i of tag i (from 0 to 1) vs. time (epochs), with S_i and p_i^avg marked; the smoothing window of w_i epochs corresponds to w_i Bernoulli trials]
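As a rough sketch of the binomial sampling model named on this slide: if each of the w_i epochs in the window is treated as an independent Bernoulli trial that reads the tag with probability p_i^avg, the number of epochs in which the tag is read follows a Binomial(w_i, p_i^avg) distribution. The independence assumption and the function names below are illustrative, not taken from the slides.

```python
import math


def read_count_pmf(k, window, p_avg):
    """P[tag is read in exactly k of `window` epochs] under the binomial model."""
    return math.comb(window, k) * p_avg**k * (1.0 - p_avg) ** (window - k)


def read_count_mean_std(window, p_avg):
    """Mean and standard deviation of the number of epochs with a read."""
    mean = window * p_avg
    std = math.sqrt(window * p_avg * (1.0 - p_avg))
    return mean, std


# Hypothetical tag: 10-epoch window, average read rate 0.5.
mean, std = read_count_mean_std(window=10, p_avg=0.5)
p_never_read = read_count_pmf(0, window=10, p_avg=0.5)
```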
19Per-Tag Smoothing Completeness
- If the tag is there, read it with high probability
→ Want a large window
[Figure: read rate p_i (from 0 to 1) vs. time (epochs); a reading with a low p_i calls for expanding the window]
20Per-Tag Smoothing Completeness
[Equation: the desired window size for tag i is built from the expected number of epochs needed to read the tag, scaled so that the tag is read with probability 1 - δ]
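One bound consistent with the annotations on this slide can be derived under the assumption that each epoch reads the tag independently with probability p_i^avg. The tag is missed in all w epochs with probability (1 - p_i^avg)^w, which is at most exp(-w * p_i^avg), and this is at most δ whenever w >= (1 / p_i^avg) * ln(1 / δ). Here 1 / p_i^avg is the expected number of epochs needed for one read. The sketch below computes that window size; the function names are hypothetical.

```python
import math


def completeness_window(p_avg, delta):
    """Smallest integer w with w >= (1 / p_avg) * ln(1 / delta).

    Assumes independent reads with probability p_avg per epoch, so a tag that
    is present is read at least once in w epochs with probability >= 1 - delta.
    """
    if not 0.0 < p_avg <= 1.0:
        raise ValueError("p_avg must be in (0, 1]")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    return math.ceil(math.log(1.0 / delta) / p_avg)


def miss_probability(p_avg, window):
    """Probability that a present tag is not read in any of `window` epochs."""
    return (1.0 - p_avg) ** window


# Hypothetical tags: a poorly read tag needs a much larger window.
w_good = completeness_window(p_avg=0.5, delta=0.05)
w_poor = completeness_window(p_avg=0.1, delta=0.05)
```

Because the window scales with 1 / p_avg, a tag with a low read rate gets a large window while a reliably read tag keeps a small one.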
21Per-Tag Smoothing Transitions
- Detect transitions as statistically significant
changes in the data
[Figure: read rate p_i (from 0 to 1) vs. time over epochs E0 to E9; once the tag has likely left, the readings show a statistically significant difference, so flag a transition and shrink the window]
22Per-Tag Smoothing Transitions
- Statistically significant?
[Equation: compares the number of observed readings with the number of expected readings]
Is the difference statistically significant?
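A minimal sketch of such a significance test, assuming the binomial model from the earlier slides: the expected number of readings in a window of w epochs is w * p_avg, with standard deviation sqrt(w * p_avg * (1 - p_avg)), and a transition is flagged when the observed count deviates from the expectation by more than a chosen number of standard deviations. The two-standard-deviation default and the function name are assumptions for illustration.

```python
import math


def transition_detected(observed_reads, window, p_avg, num_std=2.0):
    """Flag a transition when observed reads deviate significantly from expected.

    expected = window * p_avg (binomial mean)
    std      = sqrt(window * p_avg * (1 - p_avg)) (binomial standard deviation)
    The threshold of `num_std` standard deviations is an assumed default.
    """
    expected = window * p_avg
    std = math.sqrt(window * p_avg * (1.0 - p_avg))
    return abs(observed_reads - expected) > num_std * std


# Hypothetical tag with read rate 0.8 over a 20-epoch window (expect 16 reads).
still_here = transition_detected(observed_reads=16, window=20, p_avg=0.8)
likely_left = transition_detected(observed_reads=8, window=20, p_avg=0.8)
```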
23SMURF in Action
[Figure: SMURF output for the Fido moving and Fido resting periods]
→ Experiments with real and simulated data show similar results
24Multi-tag Cleaning
- Some applications only need aggregates
- E.g., count of items on each shelf
- Don't need to track each tag!
- Use statistical mechanisms for both
- Aggregate computation
- Window adaptation
25Aggregate Computation
- π-estimators (Horvitz-Thompson)
- Count
- P[tag i seen in a window of size w]
→ Use small windows to capture movement
→ Use the estimator to compensate for lost readings
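A Horvitz-Thompson style count estimate can be sketched as follows. Each tag that was actually seen in the window is weighted by the inverse of its probability of being seen, so tags that are easy to miss stand in for the ones that were missed. The sketch assumes independent reads per epoch, so that P[tag i seen in a window of size w] = 1 - (1 - p_i)^w; that formula and the function names are assumptions for illustration.

```python
def seen_probability(p_avg, window):
    """P[tag is read at least once in `window` epochs], assuming independent reads."""
    return 1.0 - (1.0 - p_avg) ** window


def estimate_count(read_rates_of_seen_tags, window):
    """Horvitz-Thompson style count: sum of 1 / P[seen] over the tags that were seen.

    `read_rates_of_seen_tags` holds the average read rate (0 < p <= 1) of each
    tag observed at least once in the window.
    """
    total = 0.0
    for p in read_rates_of_seen_tags:
        if not 0.0 < p <= 1.0:
            raise ValueError("read rates must be in (0, 1]")
        total += 1.0 / seen_probability(p, window)
    return total


# Hypothetical shelf: two tags were seen, each with read rate 0.5.
est_w1 = estimate_count([0.5, 0.5], window=1)  # each seen tag counts as 2
est_w2 = estimate_count([0.5, 0.5], window=2)  # each seen tag counts as 1 / 0.75
```

With a one-epoch window each seen tag is counted twice, which compensates for the roughly half of the tags that a 0.5 read rate would miss.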
26Window Adaptation
- Upper bound window similar to per-tag
- Transition based on variance within subwindows
[Figure: count estimate N_w vs. time (epochs)]
27Multi-tag Scenario
28Ongoing Work: Spatial Smoothing
- With multiple readers, more complicated
[Figure: two rooms, two readers per room; readers labeled A, B, C, D]
Reinforcement: A? B? A ∪ B? A ∩ B?
Arbitration: A? C?
→ All are addressed by statistical framework!
29Beyond RFID
Other sensor data
- π-estimator for other aggregates
- Use SMURF for sensor networks
Other streaming data
- Use SMURF in general streaming systems (e.g., TelegraphCQ)
- Remove RANGE clause from CQL
30Related Work
- Commercial RFID middleware
- Smoothing filters: need to set smoothing window
- RFID-related work
- Rao et al., StreamClean: complementary
- Intel Seattle, HiFi, ESP: static window size
- BBQ, MauveDB
- Heavyweight, model-based
- SMURF is non-parametric, sampling-based
- Statistical filters (digital signal processing DB)
- Non-linear digital filters inspired SMURF design
31Conclusions
- Current smoothing filters not adequate
- Not declarative!
- SMURF: declarative smoothing filter
- Uses statistical sampling to adapt window size
32Thanks!