Probabilistic Aggregation in Distributed Networks

Provided by: HongZ8

1
Probabilistic Aggregation in Distributed Networks
  • Ling Huang, Ben Zhao,
  • Anthony Joseph and John Kubiatowicz
  • {hling, ravenben, adj, kubitron}@eecs.berkeley.edu
  • June, 2004

2
Outline
  • Background
  • Motivation
  • Statistical properties of real-life data streams
  • Problem of existing approaches
  • Our Approach
  • Reduce communication overhead
  • Recover from loss
  • Evaluation
  • Conclusion and future work

3
Background
  • Aggregate functions
  • MIN, MAX, AVG, COUNT, etc.
  • In-Network hierarchical processing
  • Query propagation
  • Tree construction
  • Aggregates computed epoch by epoch
  • Addressing fault-tolerance
  • Multi-root
  • Multi-tree
  • Reliable transmission
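
As an illustration of the in-network hierarchical processing listed above, here is a minimal sketch of one aggregation epoch over a tree. The `Node` class and the function names are hypothetical and not taken from the presentation; the sketch assumes a lossless epoch in which every node reports.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """A tree node holding one local reading for the current epoch."""
    reading: float
    children: List["Node"] = field(default_factory=list)


def partial_state(node: Node) -> Tuple[float, int, float, float]:
    """Return (sum, count, min, max) for the subtree rooted at node.

    Each node merges its children's partial states with its own reading,
    so only a small fixed-size record travels up each tree edge.
    """
    total, count = node.reading, 1
    lo = hi = node.reading
    for child in node.children:
        c_total, c_count, c_lo, c_hi = partial_state(child)
        total += c_total
        count += c_count
        lo = min(lo, c_lo)
        hi = max(hi, c_hi)
    return total, count, lo, hi


def aggregate_epoch(root: Node) -> Dict[str, float]:
    """Compute the final aggregates at the root for one epoch."""
    total, count, lo, hi = partial_state(root)
    return {"MIN": lo, "MAX": hi, "AVG": total / count, "COUNT": count}


tree = Node(4.0, [Node(2.0, [Node(6.0)]), Node(8.0)])
result = aggregate_epoch(tree)
```

AVG is carried as a (sum, count) pair because averages of subtrees cannot be merged directly without knowing how many readings each one covers.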

4
Motivation
  • Data aggregation is an important function for all
    network infrastructures
  • Sensor networks
  • P2P networks
  • Network monitoring and intrusion detection
    systems
  • Exact result not achievable in face of loss and
    faults
  • High cost when adding fault-tolerance
  • An accurate approximation with low communication
    overhead is crucial
  • But it's difficult to achieve

5
Observation: Comparison of Data Streams
Three real-world data traces and a random trace
6
Statistical Properties of Data Streams
Relative Increment is defined as
There is temporal correlation in real data
streams, which we can leverage to maintain
aggregate data accuracy while reducing
communication overhead and recovering from data
loss.
Density estimation for relative increment
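
The formula for the relative increment did not survive in this transcript. The sketch below assumes the common definition r_t = (x_t - x_{t-1}) / x_{t-1}, which may differ from the one on the original slide, and uses a normalized histogram as a crude density estimate. The function names and the `bin_width` parameter are hypothetical.

```python
import math
from typing import Dict, List


def relative_increments(stream: List[float]) -> List[float]:
    """Assumed definition: r_t = (x_t - x_{t-1}) / x_{t-1}.

    Pairs whose previous value is zero are skipped to avoid division by zero.
    """
    increments = []
    for prev, cur in zip(stream, stream[1:]):
        if prev != 0:
            increments.append((cur - prev) / prev)
    return increments


def histogram_fractions(values: List[float], bin_width: float = 0.05) -> Dict[int, float]:
    """Map bin index -> fraction of samples falling in that bin.

    Bin k covers [k * bin_width, (k + 1) * bin_width).
    """
    counts: Dict[int, int] = {}
    for v in values:
        k = math.floor(v / bin_width)
        counts[k] = counts.get(k, 0) + 1
    n = len(values)
    return {k: c / n for k, c in sorted(counts.items())}


trace = [100.0, 110.0, 99.0, 101.0, 100.0]
incs = relative_increments(trace)
density = histogram_fractions(incs)
```

A trace with strong temporal correlation would concentrate most of its mass in the bins around zero, whereas a random trace would spread it out.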
7
Problems in Existing Approaches
  • Few approaches exploit temporal properties or
    are designed to handle data loss
  • Simple last-value algorithm for data loss
    recovery in TAG
  • Multi-root/multi-tree schemes make things worse
    by consuming more resources
  • Fragile for large process groups
  • Need all relevant nodes for participation
  • Difficult to trade accuracy for communication
    overhead
  • Good applications need this tradeoff
  • Only need approximation
  • But, minimize resource consumption
  • Centralized adaptive-filtering solution
    proposed by Olston et al.

8
Our Approach
  • Probabilistic data aggregation: a scalable and
    robust approach
  • Exploit and leverage statistical properties of
    data stream in temporal domain
  • Apply statistical algorithms to data aggregation
  • Develop protocol that handles loss and failures
    as essential part of normal operations
  • Nodes participate in aggregation and
    communication according to statistical sampling
    algorithm
  • In the absence of data, estimate value using time
    series algorithms
  • Differentiate between voluntary and involuntary
    loss
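
The idea of estimating a value when a report is absent can be sketched from the parent's point of view as follows. This is a minimal illustration under stated assumptions, not the authors' protocol: the function name, the `reports` and `history` dictionaries, and the `predictor` callback are all hypothetical, and the sketch makes no attempt to distinguish voluntary from involuntary loss.

```python
from typing import Callable, Dict, List, Optional


def aggregate_with_estimates(
    reports: Dict[str, Optional[float]],
    history: Dict[str, List[float]],
    predictor: Callable[[List[float]], float],
    op: Callable[[List[float]], float] = max,
) -> Optional[float]:
    """Aggregate one epoch at a parent node.

    reports maps child id -> reported value, or None if nothing arrived
    (the child skipped the epoch or the message was lost).
    Missing values are replaced by predictor(history of that child).
    """
    values = []
    for child, value in reports.items():
        if value is None:
            past = history.get(child)
            if not past:
                continue  # nothing to predict from; leave this child out
            value = predictor(past)
        # Assumption: estimates are also stored so later predictions stay continuous.
        history.setdefault(child, []).append(value)
        values.append(value)
    return op(values) if values else None


history = {"a": [3.0], "b": [7.0]}
epoch_reports = {"a": 4.0, "b": None}
estimate = aggregate_with_estimates(epoch_reports, history, lambda past: past[-1])
```

Here the predictor is simply the last seen value; any time series predictor with the same signature could be substituted.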

9
Reducing Communication Overhead
  • Trade off between accuracy and resource
    consumption
  • Allow selective participation of nodes while
    maintaining aggregate accuracy
  • Node participates in the operation with certain
    probability, which is the design parameter of the
    algorithm
  • Sampling strategies
  • Uniform sampling: all nodes use the identical
    sampling rate
  • Subtree-size-based sampling: the sampling rate of
    a node is proportional to the size of its subtree
  • Variance-based sampling: a sensor reports a new
    value only if it differs from its last reported
    value by more than a threshold percentage
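
The three sampling strategies above can be sketched as small decision functions. All names and parameters below are hypothetical; in particular, the proportionality constant `rate_per_node` and the cap at 1.0 are assumptions, since the slides do not give them.

```python
import random
from typing import Optional


def uniform_rate(base_rate: float) -> float:
    """Uniform sampling: every node uses the same rate."""
    return base_rate


def subtree_size_rate(rate_per_node: float, subtree_size: int) -> float:
    """Subtree-size-based sampling: rate grows with subtree size, capped at 1."""
    return min(1.0, rate_per_node * subtree_size)


def participates(rate: float, rng: random.Random) -> bool:
    """Decide whether a node takes part in this epoch with probability `rate`."""
    return rng.random() < rate


def variance_based_report(
    new_value: float, last_reported: Optional[float], threshold_pct: float
) -> bool:
    """Variance-based sampling: report only on a large enough relative change."""
    if last_reported is None:
        return True  # nothing reported yet
    if last_reported == 0:
        return new_value != 0
    change = abs(new_value - last_reported)
    return change > abs(last_reported) * threshold_pct / 100.0


rng = random.Random(0)
decisions = [participates(uniform_rate(0.4), rng) for _ in range(1000)]
```

The first two strategies are data-independent coin flips, while the third depends on the readings themselves, which is consistent with it tracking the data more closely at the same communication budget.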

10
Performance of Sampling algorithms
  • As fewer nodes participate, overall accuracy
    decreases for all algorithms.
  • Uniform sampling performs worst.
  • Variance-based sampling is most accurate.

11
Observation: Long-Term Pattern in Data
Daily patterns in a weekly data stream
Data source: bandwidth measurements for the CUDI
network interface on an Abilene router, with
5-minute averages.
12
Two Level Representation of Data
Monday Data
The data stream can be decomposed into two
layers: the long-term trend (pattern), which changes
slowly, and the residual, which is high frequency
but low amplitude.
13
Recovering From Loss
  • Traditional Approaches
  • Last seen data as approximation for current epoch
  • Linear Prediction
  • Two-Level data representation and prediction
  • Long-term trend: B-spline estimation
  • High-frequency residual: ARMA modeling
  • ARMA stands for autoregressive moving-average
    model, a standard time series technique for
    modeling chaotic data streams
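
The two traditional approaches listed above can be sketched as one-step predictors. The function names are hypothetical, and "linear prediction" is interpreted here as a straight-line extrapolation through the last two observations, which is an assumption about what the slide meant.

```python
from typing import List


def last_value_prediction(history: List[float]) -> float:
    """Use the last seen value as the estimate for the current epoch."""
    return history[-1]


def linear_prediction(history: List[float]) -> float:
    """Extrapolate the line through the last two observations by one epoch."""
    if len(history) < 2:
        return history[-1]
    return 2 * history[-1] - history[-2]


seen = [1.0, 2.0, 4.0]
baseline_last = last_value_prediction(seen)
baseline_linear = linear_prediction(seen)
```

Both predictors use at most two past samples, so they cannot follow a daily pattern or a fast oscillation, which is the gap the two-level representation targets.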

14
Two-Level Data Prediction
  • B-spline modeling for long term trend
  • Piecewise continuous, low-degree B-spline can
    represent complex shapes
  • Least-squares B-spline regression for two-level
    decomposition
  • B-spline extension for future forecasting
  • ARMA forecasting for transient oscillation
  • System Identification to determine the order of
    the model
  • Parameter estimation by optimization algorithm
  • Low complexity recursive equation for future
    forecasting
  • Statistical properties for the calibration of
    prediction results
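
The structure of the two-level prediction can be sketched with simpler stand-ins. To stay dependency-free, this example swaps in a trailing moving average for the B-spline trend and an AR(1) model (the simplest autoregressive member of the ARMA family, with no moving-average term) for the residual. It shows only the decompose, model, and recombine steps, not the authors' B-spline or ARMA fitting; all names and the `window` parameter are hypothetical.

```python
from typing import List


def moving_average_trend(series: List[float], window: int) -> List[float]:
    """Trailing moving average, used here as a stand-in for the B-spline trend."""
    trend = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        trend.append(sum(chunk) / len(chunk))
    return trend


def fit_ar1(residual: List[float]) -> float:
    """Least-squares estimate of phi in r_t = phi * r_{t-1} + noise."""
    num = sum(cur * prev for cur, prev in zip(residual[1:], residual[:-1]))
    den = sum(prev * prev for prev in residual[:-1])
    return num / den if den else 0.0


def two_level_forecast(series: List[float], window: int = 4) -> float:
    """One-step forecast = extended trend + predicted residual."""
    trend = moving_average_trend(series, window)
    residual = [x - t for x, t in zip(series, trend)]
    phi = fit_ar1(residual)
    # Extend the trend linearly from its last two points (stand-in for spline extension).
    if len(trend) > 1:
        next_trend = trend[-1] + (trend[-1] - trend[-2])
    else:
        next_trend = trend[-1]
    next_residual = phi * residual[-1]  # recursive AR(1) forecast
    return next_trend + next_residual


flat_forecast = two_level_forecast([5.0] * 10)
phi_hat = fit_ar1([1.0, 0.5, 0.25, 0.125])
```

The AR(1) forecast is a one-multiplication recursion per epoch, which matches the low-complexity recursive forecasting the slide calls for; a fuller version would select the model order and include moving-average terms.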

15
Performance of Prediction Algorithms
Performance of Prediction Algorithms For MAX
Operation in Lossless Environment
16
Performance of Prediction Algorithms
Performance of prediction algorithms in lossy
environments. The average loss rate of the network
is 20%. The ratio of loss rates between wide-area
links and local links is 3:1.
17
Summary of Results
  • All prediction algorithms are effective in
    improving the accuracy of aggregation results
  • The two-level prediction approach performs best
    in all situations
  • Achieves more than 90% accuracy even with
    per-node nonparticipation rates of up to 60%
  • Is effective even in a high loss environment

18
Conclusion and Future Work
  • Apply statistical algorithms to data aggregation
    system
  • quantify the statistical properties of real-world
    measurement data
  • propose the concept of probabilistic
    participation of nodes
  • propose multi-level prediction mechanism to
    recover from sampling and data loss
  • Uniqueness: multi-level prediction enables high
    accuracy even under high loss and voluntary
    non-participation
  • Future Work
  • Develop an online algorithm and exploit the
    tradeoff between prediction accuracy and
    computation and storage cost
  • Build a real system for applications such as
    health monitoring, traffic measurement and router
    statistics aggregation
  • Real system implementation and deployment

19
The Danger of Prediction