1
A Data Stream Management System for Network
Traffic Management
Shivnath Babu, Stanford University
Lakshminarayanan Subramanian, University of California, Berkeley
Jennifer Widom, Stanford University
NRDM, Santa Barbara, CA, May 25, 2001
2
Network Traffic Management
  • Large networks are growing more complex and harder
    to manage
    • Increasing demands, overprovisioning, hardware
      changes, manual configuration
    • Lack of information to configure the network for
      effective usage
  • Network traffic management is becoming an
    important part of the Internet infrastructure
    • Collect data
      • E.g., packet traces, network-flow data, SNMP
        data
    • Process data
      • E.g., compute link utilization, per-hop delays,
        traffic demands
    • Deploy mechanisms to control traffic
      • E.g., change routing parameters
  • Data management forms a core part of traffic
    management

3
Traffic Management Data Collection
  • Many data sources
    • Packet and flow traces
    • Router forwarding tables and configuration data
    • SNMP data
    • Active measurements of packet delay and link
      utilization
  • Data is collected continuously
    • Networks need to run 24/7 for everything
    • Huge and fast-growing databases
  • Many current traffic management systems store
    collected data in file systems or data warehouses

4
Traffic Management Data Processing
  • Sophisticated data processing is required
    • Measuring link utilization
      • Aggregate packet traces
    • Maintaining network topology
      • Join SNMP data from different network elements
    • Deriving traffic demands
      • Join network-flow traces, router forwarding
        tables and configuration data, and SNMP data
    • Anomaly detection, traffic modeling, traffic
      prediction, and many others
  • Most current traffic management systems process
    data using ad-hoc scripts or software toolkits

5
Challenge in Data Management: Online Data
Processing
  • Most current traffic management applications
    process data offline
    • Huge volume of data
    • Complex processing involved
  • Offline processing is indeed appropriate for some
    applications
    • E.g., capacity planning, determining pricing
      plans
  • Many traffic management applications need online
    processing
    • E.g., congestion cause detection, resource
      allocation for guaranteed QoS, detecting
      denial-of-service attacks, detecting
      Service-Level Agreement violations, admission
      control and traffic policing

6
Online Processing
  • What's wrong with using a file system and
    procedural processing?
    • Difficult to maintain and reuse (not a long-term
      solution)
  • What's wrong with using a Database Management
    System (DBMS)?
    • A DBMS expects all data to be managed as
      persistent data sets
    • A DBMS assumes one-time queries against stored,
      finite data

7
A Data Stream Management System (DSMS) for Online
Processing
  • Data streams are the appropriate model for online
    processing
    • Data changes frequently (often exclusively
      through insertions)
    • It is impractical to operate on the same data
      multiple times
  • Continuous queries -- issued once and run
    forever
  • Performance
    • Need continuous-query optimization
    • Need adaptive query optimization
  • A Data Stream Management System for traffic
    management
    • Idea: support online processing with continuous
      queries over data streams

8
A Data Stream Management System for Online
Processing (contd)
[Diagram: Streams enter the Data Management System, which runs Continuous Queries for Applications based on online processing]

9
Continuous Query over a Single Data Stream
[Diagram: a continuous query over a Data Stream produces an answer A]
  • Many options with different ramifications
  • Stream is infinite and append-only (e.g., packet
    traces)
    • Size of A is unbounded for a filter query --
      cannot store A
    • Stream out A -- but a self-join query requires
      unbounded intermediate state to compute A
    • Updates to tuples in A -- e.g., aggregation
      query
  • Stream has updates and deletions (e.g., SNMP data)
    • Often requires more intermediate state to
      compute A
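The difference between streaming out A and updating tuples in A can be sketched in Python. This is a minimal illustration, not part of the STREAM system: packets are assumed to be dicts with hypothetical `proto` and `length` fields, the filter keeps no state, and the aggregation emits `(proto, new_total)` updates that replace values it emitted earlier.

```python
from typing import Dict, Iterable, Iterator, Tuple

Packet = Dict[str, int]  # hypothetical packet record: {"proto": ..., "length": ...}

def filter_query(stream: Iterable[Packet], proto: int) -> Iterator[Packet]:
    """Filter query: every emitted tuple is final, so A can be streamed out."""
    for pkt in stream:
        if pkt["proto"] == proto:
            yield pkt  # no intermediate state is kept

def aggregate_query(stream: Iterable[Packet]) -> Iterator[Tuple[int, int]]:
    """Aggregation query: each output updates a tuple of A emitted earlier."""
    totals: Dict[int, int] = {}  # intermediate state: one entry per protocol
    for pkt in stream:
        proto = pkt["proto"]
        totals[proto] = totals.get(proto, 0) + pkt["length"]
        yield proto, totals[proto]  # supersedes the previous total for proto

packets = [
    {"proto": 6, "length": 1500},
    {"proto": 17, "length": 200},
    {"proto": 6, "length": 40},
]
print(list(filter_query(packets, 6)))  # the two proto-6 packets
print(list(aggregate_query(packets)))  # [(6, 1500), (17, 200), (6, 1540)]
```

A consumer of `aggregate_query` must apply each output as an update rather than append it, and the `totals` dictionary grows with the number of distinct protocols seen.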

10
Operator Architecture in a DSMS
  • Stream
    • Append-only semantics: result tuples that won't
      change later
    • Update semantics: updates to the current result
  • Store: result tuples that could change later
  • Scratch: intermediate state to compute future
    results
  • Throw: unneeded data
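To make the four categories concrete, here is a hypothetical Python duplicate-elimination operator (a minimal sketch, not the design from the talk). The first occurrence of each key is a final result, so it goes to Stream; the set of keys seen so far is Scratch, since it is needed only to classify future tuples; repeats are thrown away; and Store stays empty because no emitted result ever changes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Hashable, List, Set

@dataclass
class DistinctOperator:
    """Hypothetical operator that partitions its data into Stream/Store/Scratch/Throw."""
    stream: List[Hashable] = field(default_factory=list)      # final, append-only results
    store: Dict[Hashable, Any] = field(default_factory=dict)  # changeable results (unused here)
    scratch: Set[Hashable] = field(default_factory=set)       # state for future results
    thrown: int = 0                                           # number of discarded tuples

    def push(self, key: Hashable) -> None:
        if key in self.scratch:
            self.thrown += 1         # duplicate: Throw
        else:
            self.scratch.add(key)    # remember it to recognize later duplicates
            self.stream.append(key)  # a first occurrence is never retracted

op = DistinctOperator()
for src in ["10.0.0.1", "10.0.0.2", "10.0.0.1"]:
    op.push(src)
print(op.stream, op.thrown)  # ['10.0.0.1', '10.0.0.2'] 1
```

Scratch here grows with the number of distinct keys, which is one concrete way intermediate state becomes unbounded on an infinite stream.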

11
Example Queries from Traffic Management
  • Single packet trace input data stream (IP
    headers over a link)
  • Continuous query 1: Link utilization (total
    bytes sent over the link)
    • Store -- sum of packet lengths
    • Stream -- empty
    • Scratch -- empty
  • Continuous query 2: Number of flows per protocol

[Diagram: the Packet Trace Stream feeds a Per-Protocol flows counter]

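Both queries can be sketched in Python under stated assumptions: the `Header` record and its field names are hypothetical, and a flow is taken to be the usual (source, destination, source port, destination port, protocol) 5-tuple, which the slide does not specify. CQ 1 needs only a single number in Store; CQ 2 also needs Scratch to remember which flows it has already counted.

```python
from collections import defaultdict
from typing import Dict, NamedTuple, Set, Tuple

class Header(NamedTuple):
    """Hypothetical IP/transport header fields taken from the packet trace."""
    src: str
    dst: str
    sport: int
    dport: int
    proto: int
    length: int

class LinkUtilization:
    """CQ 1: Store holds one number; Stream and Scratch stay empty."""
    def __init__(self) -> None:
        self.store = 0  # total bytes sent over the link so far

    def push(self, h: Header) -> None:
        self.store += h.length

class FlowsPerProtocol:
    """CQ 2: Store holds per-protocol counts; Scratch remembers counted flows."""
    def __init__(self) -> None:
        self.store: Dict[int, int] = defaultdict(int)
        self.scratch: Set[Tuple[str, str, int, int, int]] = set()

    def push(self, h: Header) -> None:
        flow = (h.src, h.dst, h.sport, h.dport, h.proto)
        if flow not in self.scratch:
            self.scratch.add(flow)
            self.store[h.proto] += 1

trace = [
    Header("10.0.0.1", "10.0.0.9", 1234, 80, 6, 1500),
    Header("10.0.0.1", "10.0.0.9", 1234, 80, 6, 40),
    Header("10.0.0.2", "10.0.0.9", 5353, 53, 17, 120),
]
cq1, cq2 = LinkUtilization(), FlowsPerProtocol()
for h in trace:
    cq1.push(h)
    cq2.push(h)
print(cq1.store, dict(cq2.store))  # 1660 {6: 1, 17: 1}
```

Without a rule for when a flow ends (for example a timeout, which is not modeled here), the Scratch set of CQ 2 grows with the number of flows seen.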
12
Example Queries from Traffic Management (contd)
  • Continuous query 3: Join packet traces collected
    from different points in the network to measure
    packet delays (or identify routes)

[Diagram: a Symmetric Hash-Join of Packet trace 1 and Packet trace 2; hash tables HT 1 and HT 2 hold the Scratch state, and join results form the Stream]

  • Efficient intermediate state management
    • Intermediate state is theoretically unbounded
    • Use of constraints can reduce intermediate
      state
      • Can reclaim memory after each match
    • Approximate answers can further reduce
      intermediate state
      • Can you trade precision for state?
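A symmetric hash join with memory reclamation might look like the Python sketch below. It assumes each packet carries a hypothetical identifier `pkt_id` that is identical at both monitoring points (for instance a digest of invariant header fields) and, as the constraint, that each identifier appears at most once per trace. Under that assumption a hash-table entry can be deleted as soon as it finds its partner. Timestamps are integer microseconds, and the computed delay is only meaningful if the two monitors' clocks are synchronized.

```python
from typing import Dict, List, Tuple

class SymmetricHashJoin:
    """Joins two packet traces on a packet id; emits (pkt_id, delay) pairs.

    Assumed constraint: each pkt_id occurs at most once per trace, so an
    entry can be dropped from its hash table as soon as it is matched.
    """

    def __init__(self) -> None:
        # ht[0] is HT 1 (trace 1), ht[1] is HT 2 (trace 2): the Scratch state.
        self.ht: List[Dict[str, int]] = [{}, {}]

    def push(self, side: int, pkt_id: str, ts_us: int) -> List[Tuple[str, int]]:
        """Insert a tuple from trace `side` (0 or 1); return any new join results."""
        other = self.ht[1 - side]
        if pkt_id in other:
            other_ts = other.pop(pkt_id)  # reclaim memory right after the match
            t1, t2 = (ts_us, other_ts) if side == 0 else (other_ts, ts_us)
            return [(pkt_id, t2 - t1)]    # appended to the output Stream
        self.ht[side][pkt_id] = ts_us     # wait in Scratch for the partner tuple
        return []

    def state_size(self) -> int:
        return len(self.ht[0]) + len(self.ht[1])

join = SymmetricHashJoin()
events = [(0, "p1", 0), (0, "p2", 1000), (1, "p1", 4000), (1, "p2", 6000)]
results: List[Tuple[str, int]] = []
for side, pid, ts in events:
    results.extend(join.push(side, pid, ts))
print(results, join.state_size())  # [('p1', 4000), ('p2', 5000)] 0
```

Without the at-most-once constraint, every entry would have to stay in HT 1 or HT 2 indefinitely, because a later tuple could still match it.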

13
Example Queries from Traffic Management (contd)
  • Continuous query 4: Identify the top 5 (source IP
    address, destination IP address) pairs with
    maximum bandwidth consumption over a link
    • Non-trivial query over a stream
      • Number of distinct pairs can vary
      • Bandwidth consumption of each pair can vary
    • How much intermediate state is needed?

[Diagram: operators Count Distinct Pairs, Bandwidth Consumption of Pairs, and Top 5 Pairs over a Packet trace Stream]

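One straightforward way to answer CQ 4 exactly is to keep a byte counter per (source, destination) pair and rank the counters on demand, as in this Python sketch (hypothetical names, not from the talk; the demo uses k=2 instead of 5 for brevity). It gives a pessimistic answer to the slide's question: the intermediate state is one counter per distinct pair, so it grows with the number of distinct pairs. Bounding that state would require an approximate summary instead, which is not shown.

```python
from collections import Counter
from typing import List, Tuple

Pair = Tuple[str, str]

class TopPairs:
    """Exact sketch of CQ 4: bytes per (src, dst) pair, top k on demand."""

    def __init__(self, k: int = 5) -> None:
        self.k = k
        self.bytes_per_pair: Counter = Counter()  # one entry per distinct pair

    def push(self, src: str, dst: str, length: int) -> None:
        self.bytes_per_pair[(src, dst)] += length

    def top(self) -> List[Tuple[Pair, int]]:
        # The answer can change after any packet, so it is recomputed on demand.
        return self.bytes_per_pair.most_common(self.k)

q = TopPairs(k=2)
for src, dst, length in [("10.0.0.1", "10.0.0.9", 100),
                         ("10.0.0.2", "10.0.0.9", 700),
                         ("10.0.0.1", "10.0.0.9", 500),
                         ("10.0.0.3", "10.0.0.8", 50)]:
    q.push(src, dst, length)
print(q.top())  # [(('10.0.0.2', '10.0.0.9'), 700), (('10.0.0.1', '10.0.0.9'), 600)]
```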
14
Further Challenges in Data Management: Distributed
Stream Processing
  • Data is collected from different points in a
    network
  • Structure of an Internet Service Provider imposes
    restrictions
    • Core routers are sensitive (so are the network
      operators?)
  • Sending collected data to a central processing
    site is harmful
    • Additional load on the network
    • Hinders real-time processing
    • Won't scale with the network and traffic
  • Truly distributed processing is infeasible for
    many queries
  • Goal: minimize communication traffic
    • Trade communication traffic for precision
  • Trade communication traffic for precision

15
Example Queries from Traffic Management (contd)
  • Continuous query 5: Identify the top 5
    destination IP addresses with maximum bandwidth
    consumption (to detect denial-of-service attacks)

[Diagram: three CQ 5 local operators, each over its own Stream, feed one CQ 5 global operator]

  • A hierarchical processing structure could also be
    useful
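The local/global split, together with the trade of communication traffic for precision from the previous slide, can be sketched in Python. Function names and destination labels are hypothetical. Each local site aggregates bytes per destination over its own stream and then either ships its whole table (exact, but the most traffic) or only its local top `m` (less traffic, possibly wrong).

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

def local_cq5(packets: Iterable[Tuple[str, int]], m: Optional[int] = None) -> Dict[str, int]:
    """Local site: bytes per destination; ship only the local top m if m is given."""
    totals: Counter = Counter()
    for dst, length in packets:
        totals[dst] += length
    return dict(totals.most_common(m))  # most_common(None) returns every entry

def global_cq5(reports: Iterable[Dict[str, int]], k: int = 5) -> List[Tuple[str, int]]:
    """Global site: add up the partial sums and report the top k destinations."""
    merged: Counter = Counter()
    for report in reports:
        merged.update(report)  # Counter.update adds counts instead of replacing them
    return merged.most_common(k)

# Destination "w" is moderately heavy at every site but never locally the heaviest.
sites = [
    [("v", 60), ("w", 50)],
    [("u", 60), ("w", 50)],
    [("t", 60), ("w", 50)],
]
exact = global_cq5((local_cq5(s) for s in sites), k=1)
cheap = global_cq5((local_cq5(s, m=1) for s in sites), k=1)
print(exact)  # [('w', 150)]
print(cheap)  # a 60-byte destination; "w" is missed
```

Sending only local top lists cuts communication to `m` entries per site, but a destination whose traffic is spread thinly across many links, as attack traffic can be, may be missed. Getting exact answers with less traffic would need a more careful protocol than this sketch.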

16
Summary of Basic Problems and Techniques
  • Continuous queries over data streams are a unique
    combination of
    • Online processing
    • Storage constraints -- amount of memory
      available is bounded
      • Query result size may be unbounded
      • Intermediate state may be unbounded
  • Relevant techniques
    • Online data structures (not build-and-throw)
    • Summarization: samples, histograms, wavelets,
      fractals
    • Adaptivity
      • Data characteristics
      • Flow rates
      • Amount of memory
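As one example of summarization under a memory bound, reservoir sampling (the classic Algorithm R, named here as an illustration rather than taken from the talk) keeps a uniform random sample of `k` tuples from a stream of unknown length in O(k) memory. A minimal Python sketch, run over a synthetic stream of packet lengths:

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")

def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Return a uniform random sample of up to k items using O(k) memory."""
    rng = rng if rng is not None else random.Random()
    sample: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)    # fill the reservoir first
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                sample[j] = item   # item i survives with probability k/(i+1)
    return sample

packet_lengths = (40 + (n * 37) % 1461 for n in range(100_000))  # synthetic stream
print(reservoir_sample(packet_lengths, k=5, rng=random.Random(1)))
```

Only `k` items are held at any time regardless of stream length, so memory stays bounded; the price is that answers computed from the sample are approximate.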

17
Some Simplifying Assumptions
  • Made in this talk, but not necessarily in the work
  • Traffic management data is clean
    • Data is dirty: incomplete, inconsistent
    • Temporal uncertainties
    • Could be reduced as the importance of traffic
      management is realized
  • Traffic management data is tuple-oriented
    • Often true
    • Implications for the query language

18
Conclusions
  • Traffic management requires efficient data
    management
  • Many traffic management applications benefit from
    online data processing
  • Case for a Data Stream Management System (DSMS)
    • Provides continuous queries over data streams for
      online processing
    • Many interesting research issues
    • Work is in progress
  • Additional references
    • S. Babu and J. Widom. Continuous queries over
      data streams. http://dbpubs.stanford.edu/pub/2001-9
    • STREAM project homepage:
      http://www-db.stanford.edu/stream