Title: Distributed Regression: an Efficient Framework for Modeling Sensor Network Data
1 Distributed Regression: an Efficient Framework for Modeling Sensor Network Data
- Carlos Guestrin
- Peter Bodik
- Romain Thibaux
- Mark Paskin
- Samuel Madden
2 Data collection paradigm
Base Station
Query
New Query
SQL-style query
Goal: Push beyond the paradigm of sensors as simple data gathering devices
3 Data is highly correlated
Example: temperature data from 10 nearby sensors
- Slow changes over time
- Measurements correlated
- Build lower dimensional representation
- Compression for data transmission
- Provide nodes with local view of global state
Redundancy Structure
4 The regression problem
- Given, basis functions
- Find coefficients $w = (w_1, \ldots, w_k)$
- Precisely, minimize the residual error
5 Regression solution
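The formula on this slide did not survive extraction. As a minimal sketch, assuming the standard least-squares setup (a matrix `H` of basis function values at the measurement locations and a vector `f` of measurements, both names illustrative), the coefficients that minimize the residual error can be obtained from the normal equations:

```python
import numpy as np

def fit_coefficients(H, f):
    """Solve the normal equations (H^T H) w = H^T f for the coefficients w."""
    A = H.T @ H   # k x k matrix
    b = H.T @ f   # length-k vector
    return np.linalg.solve(A, b)

# Hypothetical data: four measurements that follow f(x) = 1 + 2x exactly.
x = np.array([0.0, 1.0, 2.0, 3.0])
f = np.array([1.0, 3.0, 5.0, 7.0])
H = np.column_stack([np.ones_like(x), x])  # basis functions: 1 and x
w = fit_coefficients(H, f)
```

Here `w` recovers the intercept and slope of the hypothetical data.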
6 Global temperature is complex
Temperature surface is complex ⇒ Need complex basis functions? Lots of communication?
7 What are we missing?
Temperature surface is complex, but there is lots of local structure!
8 Kernel regression
- Distributed algorithm for obtaining coefficients
- Simple communication along a spanning tree
- Robust to lost messages
Need global optimization to find optimal coefficients
9 Kernel regression ⇒ Sparse matrices
Kernel basis functions have local support
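To illustrate why local support leads to sparsity, here is a small sketch. It assumes one-dimensional triangular kernels (`hat_kernel`, with hypothetical centers and width); the kernels used in the talk may differ. Entries of `A = H^T H` are nonzero only for kernels whose supports overlap:

```python
import numpy as np

def hat_kernel(x, center, width):
    """Triangular kernel: positive within `width` of `center`, zero elsewhere."""
    return np.maximum(0.0, 1.0 - np.abs(x - center) / width)

# Hypothetical 1-D layout: sample locations and five kernel centers.
x = np.linspace(0.0, 10.0, 101)
centers = np.array([1.0, 3.0, 5.0, 7.0, 9.0])
H = np.column_stack([hat_kernel(x, c, width=2.0) for c in centers])

A = H.T @ H
nonzero = A > 1e-12  # pattern of numerically nonzero entries
```

With this layout only adjacent kernels overlap, so `nonzero` is a tridiagonal pattern.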
10 Gaussian Elimination
A is sparse ⇒ Efficient Gaussian elimination
After Gaussian elimination, solve linear system by k simple divisions
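The two steps can be sketched as follows. This is a plain, unpivoted elimination written for illustration (it assumes nonzero pivots, as for a positive definite matrix), not the distributed procedure from the talk. Rows that are already zero in the pivot column are skipped, which is where sparsity saves work, and back substitution then performs one division per coefficient:

```python
import numpy as np

def gaussian_eliminate(A, b):
    """Forward elimination without pivoting; returns an upper-triangular system."""
    U = np.array(A, dtype=float)
    c = np.array(b, dtype=float)
    k = len(c)
    for i in range(k):
        for j in range(i + 1, k):
            if U[j, i] == 0.0:
                continue  # sparsity: nothing to eliminate in this row
            factor = U[j, i] / U[i, i]
            U[j, i:] -= factor * U[i, i:]
            c[j] -= factor * c[i]
    return U, c

def back_substitute(U, c):
    """Solve the triangular system with one division per coefficient."""
    k = len(c)
    w = np.zeros(k)
    for i in range(k - 1, -1, -1):
        w[i] = (c[i] - U[i, i + 1:] @ w[i + 1:]) / U[i, i]
    return w

# Hypothetical sparse (tridiagonal) system.
A = np.array([[4.0, 1.0, 0.0, 0.0],
              [1.0, 4.0, 1.0, 0.0],
              [0.0, 1.0, 4.0, 1.0],
              [0.0, 0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0, 4.0])
U, c = gaussian_eliminate(A, b)
w = back_substitute(U, c)
```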
11 Distributed regression
Complete system $A w = b$
Sensor 2 can locally compute $w_2$, $w_3$
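One way to read this slide: once the other coefficients have been eliminated, a sensor holds a small reduced system over its own coefficients and can solve it locally. The sketch below assumes a hypothetical 4-coefficient system and uses a block elimination (Schur complement) helper named `eliminate`; it illustrates the idea and is not the message-passing scheme from the talk.

```python
import numpy as np

def eliminate(A, b, keep):
    """Eliminate every variable not in `keep`; return the reduced system."""
    keep = np.asarray(keep)
    drop = np.setdiff1d(np.arange(A.shape[0]), keep)
    A_kk = A[np.ix_(keep, keep)]
    A_kd = A[np.ix_(keep, drop)]
    A_dk = A[np.ix_(drop, keep)]
    A_dd = A[np.ix_(drop, drop)]
    # Solve A_dd X = [A_dk | b_drop] in a single call.
    X = np.linalg.solve(A_dd, np.column_stack([A_dk, b[drop]]))
    A_red = A_kk - A_kd @ X[:, :-1]
    b_red = b[keep] - A_kd @ X[:, -1]
    return A_red, b_red

# Hypothetical complete system over (w1, w2, w3, w4).
A = np.array([[4.0, 1.0, 0.0, 0.0],
              [1.0, 4.0, 1.0, 0.0],
              [0.0, 1.0, 4.0, 1.0],
              [0.0, 0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0, 4.0])

# Reduced system over w2 and w3 (zero-based indices 1 and 2).
A_local, b_local = eliminate(A, b, keep=[1, 2])
w_local = np.linalg.solve(A_local, b_local)
w_full = np.linalg.solve(A, b)
```

The locally computed pair matches the corresponding entries of the full solution.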
12 Specify regions
Distributed Regression: Solve global kernel regression problem with simple local communication
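A fact that makes local communication sufficient is that the least-squares system is a sum of per-measurement terms, so each sensor can compute its own contribution and only the sums need to travel. The sketch below uses random hypothetical data and a plain loop in place of message passing; `local_contribution` and `sensor_rows` are illustrative names.

```python
import numpy as np

def local_contribution(H_local, f_local):
    """Terms of A = H^T H and b = H^T f from one sensor's measurements."""
    return H_local.T @ H_local, H_local.T @ f_local

rng = np.random.default_rng(0)
H = rng.normal(size=(12, 3))   # 12 measurements, 3 basis functions
f = rng.normal(size=12)
sensor_rows = [range(0, 4), range(4, 8), range(8, 12)]  # 3 hypothetical sensors

A = np.zeros((3, 3))
b = np.zeros(3)
for rows in sensor_rows:
    idx = list(rows)
    A_i, b_i = local_contribution(H[idx], f[idx])
    A += A_i   # in a network, these sums would be accumulated via messages
    b += b_i

w_distributed = np.linalg.solve(A, b)
w_central = np.linalg.lstsq(H, f, rcond=None)[0]
```

The summed system yields the same coefficients as solving the problem centrally.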
13 Communication pattern
Kernels form a tree structure ⇒ Communication along a spanning tree
Communication along spanning tree using junction tree data structure
- High quality links may not align with kernel topology
- Kernels may not form a tree structure
14 Distributed junction trees
- Any spanning tree transformed to a junction tree
- Communication along junction tree guaranteed to obtain optimal parameters
- Different spanning trees lead to different junction trees with different computation and communication complexity
- See Paskin and Guestrin 04 for spanning tree optimization
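A junction tree requires the running intersection property: if two nodes carry the same variable, every node on the path between them must carry it too. The sketch below enforces that property on a given spanning tree in a centralized way. It is an illustration under assumed inputs (`tree_edges`, `cliques`, and the kernel labels are hypothetical), not the distributed algorithm of Paskin and Guestrin.

```python
def enforce_running_intersection(tree_edges, cliques):
    """Add each variable to every node on the paths between nodes that carry it."""
    nodes = set(cliques)
    adj = {n: set() for n in nodes}
    for a, b in tree_edges:
        adj[a].add(b)
        adj[b].add(a)

    result = {n: set(vs) for n, vs in cliques.items()}
    all_vars = set().union(*cliques.values())
    for v in all_vars:
        # Start from the whole tree and prune leaves that do not carry v;
        # what remains is the smallest subtree connecting the carriers of v.
        keep = set(nodes)
        degree = {n: len(adj[n]) for n in nodes}
        stack = [n for n in nodes if degree[n] <= 1 and v not in cliques[n]]
        while stack:
            n = stack.pop()
            if n not in keep:
                continue
            keep.discard(n)
            for m in adj[n]:
                if m in keep:
                    degree[m] -= 1
                    if degree[m] <= 1 and v not in cliques[m]:
                        stack.append(m)
        for n in keep:
            result[n].add(v)
    return result

# Hypothetical chain 1 - 2 - 3 - 4; nodes 1 and 3 share kernel K1.
edges = [(1, 2), (2, 3), (3, 4)]
cliques = {1: {"K1"}, 2: {"K2"}, 3: {"K1"}, 4: {"K3"}}
jt = enforce_running_intersection(edges, cliques)
```

In this example node 2 lies between the two carriers of `K1`, so it gains `K1`; the other nodes are unchanged. Larger resulting sets mean more computation and communication, which is why the choice of spanning tree matters.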
[Figure: nodes 1 to 6, kernels K1, ..., K6]
15 Robustness
- Robustness is key in sensor networks
- Nodes may be added to the network or fail
- Communication is unreliable
- Link qualities change over time
- Distributed regression messages are robust
- Lost messages correspond to lost measurements
- Must make spanning tree and junction tree algorithms robust
- See Paskin and Guestrin 04 for details
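The statement that lost messages correspond to lost measurements can be checked numerically. In this sketch (random hypothetical data, three sensors, the middle sensor's message dropped), solving with the contributions that did arrive gives exactly the regression on the measurements that remain:

```python
import numpy as np

rng = np.random.default_rng(1)
H = rng.normal(size=(9, 2))   # 9 measurements, 2 basis functions
f = rng.normal(size=9)
blocks = [slice(0, 3), slice(3, 6), slice(6, 9)]  # one block per sensor

# Each sensor's message carries its terms of A = H^T H and b = H^T f.
messages = [(H[s].T @ H[s], H[s].T @ f[s]) for s in blocks]
received = [messages[0], messages[2]]  # the middle message is lost

A = sum(A_i for A_i, _ in received)
b = sum(b_i for _, b_i in received)
w_lossy = np.linalg.solve(A, b)

# Reference: ordinary regression on the surviving measurements only.
keep = np.r_[0:3, 6:9]
w_reference = np.linalg.lstsq(H[keep], f[keep], rcond=None)[0]
```

The result is still a valid least-squares fit, only on fewer measurements.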
16 Locally, nodes obtain global view
17 Temperature model for lab data
18 Convergence and robustness
19 Incremental changes
Initializing with noon temperatures
At 6pm, initializing from noon results
Offline solution
Distributed regression, reliable communication
Distributed regression, 50 packets lost
20 Residual error varies over time
Average over regions
Regression with linear spatial components
Constant in time
Linear in time
Quadratic in time
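As a rough illustration of the three model variants, the sketch below builds a space-time feature vector from linear spatial terms and polynomial temporal terms. Combining them as a product is an assumption made here for illustration; the slides do not state how the spatial and temporal components were combined, and `space_time_basis` is a hypothetical helper.

```python
import numpy as np

def space_time_basis(x, y, t, time_degree):
    """Features from spatial terms (1, x, y) times temporal terms (1, t, ..., t^d)."""
    spatial = np.array([1.0, x, y])
    temporal = np.array([t ** d for d in range(time_degree + 1)])
    return np.outer(spatial, temporal).ravel()

constant = space_time_basis(2.0, 3.0, 0.5, time_degree=0)   # constant in time
linear = space_time_basis(2.0, 3.0, 0.5, time_degree=1)     # linear in time
quadratic = space_time_basis(2.0, 3.0, 0.5, time_degree=2)  # quadratic in time
```

Under this assumption the three variants use 3, 6, and 9 coefficients per region.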
21 Effect of time window
22 Communication complexity
23 Extensions and applications
- Adaptive sampling
- Outlier and faulty sensor detection
- Contour finding
- Adaptive data modeling
- Basis function selection
- Model-based bit compression
- Bounds on bit precision for Gaussian elimination applicable
- Hierarchical models
- Unifying with wavelet-based approaches
- Currently applying similar ideas to probabilistic inference, actuator control, ...
- See Paskin and Guestrin 04 for details
24 Conclusions
- General distributed regression algorithm for sensor networks
- Robust to node and message losses
- Kernel regression is an effective model for a wide range of sensor network data
- Provides a basis for new, more complex sensor network applications