1
An Information Theoretic Approach to Network
Trace Compression
  • Y. Liu, D. Towsley, J. Weng and D. Goeckel

2
Outline
  • network monitoring/measurement
  • information theory & compression
  • single point trace compression
  • joint network trace compression
  • future work

3
Motivation
  • service providers, service users
  • monitoring
  • anomaly detection
  • debugging
  • traffic engineering
  • pricing, peering, service level agreements
  • architecture design
  • application design

4
Network of Network Sensors
  • network monitoring = sensing a network
  • embedded vs. exogenous
  • single point vs. distributed
  • different granularities
  • full traffic trace: packet headers
  • flow level records: timing, volume
  • summary statistics: byte/packet counts
  • challenges
  • growing scale: high-speed links, large topologies
  • constrained resources: processing, storage,
    transmission
  • 30G of headers/hour at the UMass gateway
  • solutions
  • sampling: temporal/spatial
  • compression: marginal/distributed

5
Entropy & Compression (I)
  • Shannon entropy of discrete r.v.
  • compression of i.i.d. sequence by source coding
  • coding
  • expected code length
  • info. theoretic bound
  • Shannon/Huffman coding
  • assign short codewords to frequent outcomes
  • approach the H(X) bound (within 1 bit per symbol)
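  (The slide's equations are not captured in this transcript; as a sketch,
  the standard quantities these bullets refer to are:)

    H(X) = -\sum_{x} p(x) \log_2 p(x)                     % Shannon entropy of a discrete r.v. X
    \mathbb{E}[l(X)] = \sum_{x} p(x)\, l(x)               % expected code length of a prefix code
    \mathbb{E}[l(X)] \ge H(X)                             % information-theoretic lower bound
    H(X) \le \mathbb{E}[l_{\mathrm{Huf}}(X)] < H(X) + 1   % Huffman coding comes within 1 bit of the bound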

6
Entropy & Compression (II)
  • joint entropy
  • entropy rate of stochastic process
  • exploit auto-correlation
  • lower bound on bits per sample of X
  • compression ratio: H(X)/M, where M is the
    original size (bits) per sample
  • Lempel-Ziv Coding
  • asymptotically achieves the entropy rate of a
    stationary process
  • universal data compression algorithms: LZ77,
    gzip, WinZip
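  (Likewise, the standard forms of the quantities named above are:)

    H(X_1,\dots,X_n) = -\sum p(x_1,\dots,x_n) \log_2 p(x_1,\dots,x_n)   % joint entropy
    H(\mathcal{X}) = \lim_{n\to\infty} \tfrac{1}{n} H(X_1,\dots,X_n)    % entropy rate of a stationary process
    \rho = H(\mathcal{X}) / M                                           % compression ratio; M = raw bits per sample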

7
Entropy & Compression (III)
  • joint entropy rate of a set of stochastic
    processes
  • joint data compression
  • exploit cross-correlation between sources
  • joint compression ratio
  • Slepian-Wolf Coding
  • distributed compression: encode each process
    individually, yet achieve the joint entropy rate
    in the limit
  • requires knowledge of the cross-correlation
    structure
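  For two sources, the Slepian-Wolf theorem the slide invokes states that
  separate encoders (with joint decoding) can operate anywhere in the rate
  region

    R_1 \ge H(X_1 \mid X_2), \quad
    R_2 \ge H(X_2 \mid X_1), \quad
    R_1 + R_2 \ge H(X_1, X_2),

  so the sum rate can approach the joint entropy even though neither encoder
  sees the other's process.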

8
Network Trace Compression
  • naïve way: treat the trace as a byte stream,
    compress it with generic tools
  • gzip compresses the UMass traces by a factor of 2
  • network traces are highly structured data
  • multiple fields per packet
  • diversity in information richness
  • correlation among fields
  • multiple packets per flow
  • packets within a flow share information
  • temporal correlation
  • multiple monitors traversed by a flow
  • most fields unchanged within the network
  • spatial correlation
  • network trace models
  • quantify information content of network traces
  • serve as lower bounds / guidelines for
    compression algorithms
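  A toy Python sketch of the contrast drawn above: it gzips a synthetic
  header stream once packet-by-packet (the naïve byte-stream baseline) and
  once field-by-field, so that repeated flow ids and constant fields line
  up. The field layout, flow count, and rates are hypothetical, chosen only
  for illustration; this is not the authors' compressor.

    import random
    import struct
    import zlib

    random.seed(0)

    # A small set of flows; every packet reuses one flow's (src, dst, sport, dport).
    flows = [(random.getrandbits(32), random.getrandbits(32),
              random.getrandbits(16), random.getrandbits(16)) for _ in range(20)]

    packets = []
    ts = 0.0
    for _ in range(20000):
        ts += random.expovariate(1000.0)                   # packet inter-arrival times
        src, dst, sport, dport = random.choice(flows)
        packets.append((int(ts), int((ts % 1) * 1000000),  # time stamp: sec, sub-sec
                        src, dst, sport, dport,
                        64,                                 # TTL: constant here
                        random.getrandbits(16)))            # checksum: near-uniform

    FIELD_FMT = "IIIIHHBH"  # sec, sub-sec, src, dst, sport, dport, ttl, checksum

    # Naive baseline: serialize packet-by-packet, then gzip the byte stream.
    row_stream = b"".join(struct.pack(">" + FIELD_FMT, *p) for p in packets)
    row_ratio = len(row_stream) / len(zlib.compress(row_stream, 9))

    # Structure-aware layout: one stream per field, then gzip.
    col_stream = b"".join(
        struct.pack(">" + fmt * len(packets), *(p[i] for p in packets))
        for i, fmt in enumerate(FIELD_FMT))
    col_ratio = len(col_stream) / len(zlib.compress(col_stream, 9))

    print("packet-by-packet gzip ratio:", round(row_ratio, 2))
    print("field-by-field gzip ratio:  ", round(col_ratio, 2))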

9
Packet Header Trace
  [Figure: layout of one trace record, 32 bits wide — timing fields (time
  stamp sec. / sub-sec.), IP header fields (version, HLen, ToS, total
  length, IPID, flags, fragment offset, TTL, protocol, header checksum,
  source and destination IP address), and TCP header fields (source and
  destination port, data sequence number, acknowledgment number, HLen,
  TCP flags, window size, checksum, urgent pointer).]
10
Header Field Entropy
  [Figure: the same header layout, annotated with the fields that carry
  the information to be encoded — the time stamps (timing, T) and the
  flow id fields (source/destination IP address, source/destination
  port).]
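  A minimal Python sketch of how the per-field entropies behind this slide
  could be estimated from a trace; the field names and the `trace`
  structure are hypothetical placeholders.

    from collections import Counter
    from math import log2

    def empirical_entropy(values):
        """Plug-in Shannon entropy, in bits per value, of an observed field."""
        counts = Counter(values)
        n = len(values)
        return -sum((c / n) * log2(c / n) for c in counts.values())

    # `trace` is assumed to be a list of per-packet header dicts, e.g.
    # {"src_ip": ..., "dst_ip": ..., "src_port": ..., "ipid": ..., "ttl": ...}
    def per_field_entropy(trace):
        fields = trace[0].keys()
        return {f: empirical_entropy([pkt[f] for pkt in trace]) for f in fields}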
11
Single Point Compression
  [Figure: a packet stream in which packets of flows T0, T1, ..., Tn, Tm
  arrive interleaved; packets of the same flow repeat the same flow id.]
  • temporal correlation introduced by flows
  • packets from same flow closely spaced in time
  • they share header information
  • bits per flow id: H(T)
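  One plausible reading of this bullet (my sketch, not the authors' exact
  formula): a flow-oblivious encoder spends about H(T) bits on the flow id
  for every packet, whereas a flow-aware encoder spends it once per flow,
  so with mean flow length E[K] the per-packet flow-id cost drops roughly
  as

    H(T) \;\longrightarrow\; H(T) / \mathbb{E}[K]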

12
Flow Level Model
  • Poisson flow arrivals with rate λ; flow
    inter-arrivals independent of packet inter-arrivals
  • K: flow length
  • bits per flow: H(F)
  • bits per second: λ · H(F)
  • marginal compression ratio
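  Under this model the pieces fit together roughly as follows (a sketch
  under my reading of the slide; M is the raw size in bits per packet as on
  slide 6, E[K] the mean flow length):

    \text{compressed bits/sec} \approx \lambda \, H(F), \qquad
    \text{raw bits/sec} \approx \lambda \, \mathbb{E}[K] \, M, \qquad
    \rho_{\text{marginal}} \approx \frac{H(F)}{\mathbb{E}[K]\, M}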

13
Empirical Results (single point)
  • 1 hour UMass gateway traces
  • Sept. 22, 2004 to Oct. 23, 2004
  • 1am, 10am, 1pm

14
Distributed Network Monitoring
  • single flow recorded by multiple monitors
  • spatial correlation traces collected at
    distributed monitors are correlated
  • marginal node view: bits/sec to represent flows
    seen by one node; bound on single point
    compression
  • network system view: bits/sec to represent flows
    crossing the network; bound on joint compression
  • joint compression ratio: quantifies the gain of
    joint compression

15
Baseline Joint Entropy Model
  • perfect network
  • fixed routes/constant link delay/no packet loss
  • flow classes based on routes
  • flows arrive with rate λ
  • number of monitors traversed
  • bits per flow record
  • info. rate at node v
  • network view info. rate
  • joint compression ratio
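  The model's formulas are not in the transcript; a hedged sketch of how
  these quantities plausibly relate (my notation: flow classes c with
  arrival rate \lambda_c, per-flow record entropy H(F_c), and n_c monitors
  on class c's route):

    R_v = \sum_{c:\, v \in \mathrm{route}(c)} \lambda_c \, H(F_c)      % info. rate at node v
    R_{\mathrm{net}} = \sum_{c} \lambda_c \, H(F_c)                    % network view: each flow record counted once
    \rho_{\mathrm{joint}} = \frac{R_{\mathrm{net}}}{\sum_v R_v}
                          = \frac{\sum_c \lambda_c H(F_c)}{\sum_c n_c \lambda_c H(F_c)}   % gain of joint over marginal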

16
Joint Trace Compression
  • Results from synthetic networks

17
Open Issues
  • how many more bits for network characteristics?
  • variable delay / loss / route changes
  • distributed compression algorithms
  • lossless vs. lossy
  • joint routing and compression in trace aggregation

18
Future work
  • develop compression algorithms
  • single point compression
  • distributed joint compression
  • different levels of detail
  • full packet traces
  • Netflow data
  • SNMP data
  • entropy-based applications
  • network monitor placement
  • network anomaly detection

19
Questions & Comments
  • ???