Title: A Data Stream Management System for Network Traffic Management
1. A Data Stream Management System for Network Traffic Management
Shivnath Babu, Stanford University
Lakshminarayanan Subramanian, Univ. California, Berkeley
Jennifer Widom, Stanford University
NRDM, Santa Barbara, CA, May 25, 2001
2. Network Traffic Management
- Large networks are growing complex and difficult to manage
  - Increasing demands, overprovisioning, hardware changes, manual configuration
  - Lack of information to configure network for effective usage
- Network traffic management is becoming an important part of the Internet infrastructure
  - Collect data
    - E.g., packet traces, network-flow data, SNMP data
  - Process data
    - E.g., compute link utilization, per-hop delays, traffic demands
  - Deploy mechanisms to control traffic
    - E.g., change routing parameters
- Data management forms a core part of traffic management
3. Traffic Management: Data Collection
- Many data sources
  - Packet and flow traces
  - Router forwarding tables and configuration data
  - SNMP data
  - Active measurements of packet delay, link utilization
- Data is collected continuously
  - Networks need to be 24x7 for everything
  - Huge and fast-growing databases
- Many current traffic management systems store collected data in file systems or data warehouses
4. Traffic Management: Data Processing
- Sophisticated data processing is required
  - Measuring link utilization
    - Aggregate packet traces
  - Maintaining network topology
    - Join SNMP data from different network elements
  - Deriving traffic demands
    - Join network flow traces, router forwarding tables and configuration data, and SNMP data
  - Anomaly detection, traffic modeling, traffic prediction, and many others
- Most current traffic management systems process data using ad-hoc scripts or software toolkits
5. Challenge in Data Management: Online Data Processing
- Most current traffic management applications process data offline
  - Huge volume of data
  - Complex processing involved
- Offline processing is indeed appropriate for some applications
  - E.g., capacity planning, determining pricing plans
- Many traffic management applications need online processing
  - E.g., congestion cause detection, resource allocation for guaranteed QoS, detecting denial-of-service attacks, detecting Service-Level Agreement violations, admission control and traffic policing
6. Online Processing
- What's wrong with using a file system and procedural processing?
  - Difficult to maintain and reuse (not a long-term solution)
- What's wrong with using a Database Management System (DBMS)?
  - DBMS expects all data to be managed as persistent data sets
  - DBMS assumes one-time queries against stored and finite data
7. A Data Stream Management System (DSMS) for Online Processing
- Data streams are the appropriate model for online processing
  - Data is changing frequently (often exclusively through insertions)
  - It is impractical to operate on the same data multiple times
  - Continuous queries -- issued once and run forever
- Performance
  - Need continuous-query optimization
  - Need adaptive query optimization
- A Data Stream Management System for traffic management
  - Idea: Support online processing with continuous queries over data streams
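The notion of a continuous query, issued once and evaluated as tuples arrive, can be illustrated with a small Python sketch. This is not the DSMS discussed in the talk; the function name `continuous_filter` and the packet fields `proto` and `length` are hypothetical, and a short list stands in for an unbounded stream.

```python
from typing import Callable, Dict, Iterable, Iterator

# Hypothetical tuple layout, e.g. {"proto": "tcp", "length": 1500}.
Packet = Dict[str, object]


def continuous_filter(stream: Iterable[Packet],
                      predicate: Callable[[Packet], bool]) -> Iterator[Packet]:
    """Register the query once; emit each matching tuple as it arrives.

    Each tuple is inspected exactly once and never stored, so the query
    keeps running for as long as the input keeps producing tuples.
    """
    for packet in stream:
        if predicate(packet):
            yield packet


# A finite list stands in for an unbounded stream in this example.
packets = [
    {"proto": "tcp", "length": 1500},
    {"proto": "udp", "length": 200},
    {"proto": "tcp", "length": 40},
]
large_tcp = list(continuous_filter(
    packets, lambda p: p["proto"] == "tcp" and p["length"] > 100))
```

Because the generator holds no state between tuples, the same code would work unchanged on an input that never ends, such as a socket or a capture device.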
8. A Data Stream Management System for Online Processing (contd)
[Figure labels: Applications based on online processing; Continuous Queries; Data Management System; Streams]
9. Continuous Query over a Single Data Stream
[Figure label: Data Stream]
- Many options with different ramifications
- Stream is infinite, append-only (e.g., packet traces)
  - Size of A is unbounded for a filter query -- cannot store A
  - Stream out A -- but self-join query requires unbounded intermediate state to compute A
  - Updates to tuples in A -- e.g., aggregation query
- Stream has updates, deletions (e.g., SNMP data)
  - Often require more intermediate state to compute A
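The difference between these cases can be made concrete with a rough Python sketch, assuming A denotes the answer of the continuous query. The function names and the `(key, value)` tuple layout are hypothetical. The self-join can stream out its results but must remember every tuple seen so far, while the aggregation keeps one answer tuple per key and updates it in place.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Iterator, List, Tuple

Tuple2 = Tuple[Hashable, int]  # hypothetical (key, value) stream tuple


def self_join(stream: Iterable[Tuple2]) -> Iterator[Tuple[int, int]]:
    """Pair each arriving tuple with every earlier tuple sharing its key.

    Results are streamed out, but all tuples seen so far are retained,
    so the intermediate state grows with the length of the stream.
    """
    seen: Dict[Hashable, List[int]] = defaultdict(list)
    for key, value in stream:
        for earlier in seen[key]:
            yield (earlier, value)
        seen[key].append(value)


def running_sum(stream: Iterable[Tuple2]) -> Iterator[Tuple[Hashable, int]]:
    """Aggregation: one answer tuple per key, updated as tuples arrive."""
    totals: Dict[Hashable, int] = defaultdict(int)
    for key, value in stream:
        totals[key] += value
        yield key, totals[key]  # an update to an existing answer tuple


tuples = [("a", 1), ("b", 2), ("a", 3), ("a", 4)]
joined = list(self_join(tuples))
updates = list(running_sum(tuples))
```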
10. Operator Architecture in a DSMS
- Stream
  - Append-only semantics: Result tuples that won't change later
  - Update semantics: Updates to current result
- Store: Result tuples that could change later
- Scratch: Intermediate state to compute future results
- Throw: Unneeded data
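One way to picture the four destinations is a small Python sketch in which an operator routes each input tuple to Stream, Store, Scratch, or Throw. The `OperatorState` class, the `distinct_sources` query, and the `src` field are hypothetical names chosen for illustration, not part of the system described in the talk.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Set


@dataclass
class OperatorState:
    """Hypothetical container for the four destinations named above."""
    stream: List[Any] = field(default_factory=list)       # results that won't change
    store: Dict[Any, Any] = field(default_factory=dict)   # results that may change
    scratch: Set[Any] = field(default_factory=set)        # state for future results
    thrown: int = 0                                       # count of discarded tuples


def distinct_sources(packets: Iterable[Dict[str, str]],
                     state: OperatorState) -> OperatorState:
    """Emit each source address the first time it is seen."""
    for packet in packets:
        src = packet["src"]
        if src in state.scratch:
            state.thrown += 1         # Throw: a duplicate adds nothing
        else:
            state.scratch.add(src)    # Scratch: needed to detect later duplicates
            state.stream.append(src)  # Stream: this result never changes
    return state


state = distinct_sources(
    [{"src": "10.0.0.1"}, {"src": "10.0.0.2"}, {"src": "10.0.0.1"}],
    OperatorState(),
)
```

In this query Store stays empty because every emitted result is final; an aggregation query would instead keep its running result in Store.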
11. Example Queries from Traffic Management
- Single packet trace input data stream (IP headers over a link)
- Continuous query 1: Link utilization (total bytes sent over the link)
  - Store -- sum of packet lengths
  - Stream -- empty
  - Scratch -- empty
- Continuous query 2: Number of flows per protocol
[Figure labels: Packet Trace; Per-Protocol flows counter; Stream]
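Continuous queries 1 and 2 can be sketched in Python as follows. The class names and packet fields are hypothetical, and the sketch assumes a flow is identified by the (source, destination, source port, destination port, protocol) tuple, which the slides do not specify.

```python
from collections import Counter
from typing import Dict, Set, Tuple


class LinkUtilization:
    """Query 1: the only state is the running byte total (Store)."""

    def __init__(self) -> None:
        self.total_bytes = 0

    def on_packet(self, packet: Dict[str, object]) -> None:
        self.total_bytes += packet["length"]


class FlowsPerProtocol:
    """Query 2: per-protocol flow counts, given an assumed flow key."""

    def __init__(self) -> None:
        self.seen_flows: Set[Tuple] = set()    # Scratch: flows already counted
        self.flow_counts: Counter = Counter()  # Store: current answer

    def on_packet(self, packet: Dict[str, object]) -> None:
        flow = (packet["src"], packet["dst"],
                packet["sport"], packet["dport"], packet["proto"])
        if flow not in self.seen_flows:
            self.seen_flows.add(flow)
            self.flow_counts[packet["proto"]] += 1


trace = [
    {"src": "10.0.0.1", "dst": "10.0.0.9", "sport": 1234, "dport": 80,
     "proto": "tcp", "length": 1500},
    {"src": "10.0.0.1", "dst": "10.0.0.9", "sport": 1234, "dport": 80,
     "proto": "tcp", "length": 500},
    {"src": "10.0.0.2", "dst": "10.0.0.9", "sport": 5353, "dport": 53,
     "proto": "udp", "length": 100},
]
utilization = LinkUtilization()
flows = FlowsPerProtocol()
for packet in trace:
    utilization.on_packet(packet)
    flows.on_packet(packet)
```

Query 1 needs constant state, whereas the set of seen flows in this version of query 2 grows with the number of distinct flows.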
12. Example Queries from Traffic Management (contd)
- Continuous query 3: Join packet traces collected from different points in the network to measure packet delays (or identify routes)
[Figure: Symmetric Hash-Join over Packet trace 1 and Packet trace 2; labels: HT 1, HT 2, Scratch, Stream]
- Efficient intermediate state management
  - Intermediate state is unbounded theoretically
  - Use of constraints can reduce intermediate state
    - Can reclaim memory after each match
  - Approximate answers can further reduce intermediate state
    - Can you trade precision for state?
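A symmetric hash join with memory reclamation can be sketched in Python as below. The sketch assumes a constraint that each packet appears at most once in each trace, so both hash-table entries can be dropped after a match. The class name, the `packet_id` key, and the timestamps are hypothetical.

```python
from typing import Dict, Optional


class SymmetricHashJoin:
    """Join two packet traces on a packet identifier to compute delays."""

    def __init__(self) -> None:
        # HT 1 and HT 2: unmatched packets from each trace (Scratch).
        self.tables: Dict[int, Dict[str, float]] = {1: {}, 2: {}}

    def on_packet(self, trace: int, packet_id: str,
                  timestamp: float) -> Optional[float]:
        """Probe the other table; on a match, return the delay from
        trace point 1 to trace point 2 and reclaim the stored entry."""
        other = 2 if trace == 1 else 1
        if packet_id in self.tables[other]:
            matched = self.tables[other].pop(packet_id)  # reclaim memory
            if trace == 2:
                return timestamp - matched
            return matched - timestamp
        self.tables[trace][packet_id] = timestamp  # wait for the partner
        return None

    def state_size(self) -> int:
        return len(self.tables[1]) + len(self.tables[2])


join = SymmetricHashJoin()
events = [(1, "p1", 0.0), (1, "p2", 0.5), (2, "p1", 2.0),
          (2, "p3", 2.5), (1, "p3", 1.0)]
delays = {}
for trace, packet_id, timestamp in events:
    delay = join.on_packet(trace, packet_id, timestamp)
    if delay is not None:
        delays[packet_id] = delay
```

Packets that never reach the second point (here `p2`) stay in the table indefinitely, which is one reason the state remains unbounded in theory; a timeout-based eviction would bound it at the cost of possibly missing late matches.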
13. Example Queries from Traffic Management (contd)
- Continuous query 4: Identify top 5 (source IP address, destination IP address) pairs with maximum bandwidth consumption over a link
  - Non-trivial query over a stream
    - Number of distinct pairs can vary
    - Bandwidth consumption of each pair can vary
  - How much intermediate state is needed?
[Figure labels: Packet trace; Bandwidth Consumption Of Pairs; Count Distinct Pairs; Top 5 Pairs; Stream]
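An exact version of continuous query 4 can be sketched in Python to show where the intermediate state goes. The class name `TopPairs` and its parameters are hypothetical, and the sketch reads "top 5" as the five heaviest pairs. It keeps one byte counter per distinct pair, so its state grows with the number of distinct pairs on the link.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (source IP address, destination IP address)


class TopPairs:
    """Exact top-k pairs by bytes, at the cost of one counter per pair."""

    def __init__(self, k: int = 5) -> None:
        self.k = k
        self.bytes_by_pair: Dict[Pair, int] = defaultdict(int)  # Scratch

    def on_packet(self, src: str, dst: str, length: int) -> None:
        self.bytes_by_pair[(src, dst)] += length

    def current_top(self) -> List[Tuple[Pair, int]]:
        """Return the k heaviest pairs seen so far, heaviest first."""
        return heapq.nlargest(self.k, self.bytes_by_pair.items(),
                              key=lambda item: item[1])


top = TopPairs(k=2)
for src, dst, length in [
    ("10.0.0.1", "10.0.0.9", 1500),
    ("10.0.0.2", "10.0.0.9", 400),
    ("10.0.0.1", "10.0.0.9", 1500),
    ("10.0.0.3", "10.0.0.8", 900),
]:
    top.on_packet(src, dst, length)
result = top.current_top()
```

Any pair may later overtake the current leaders, which is why an exact answer cannot discard the counters of pairs outside the current top k.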
14. Further Challenges in Data Management: Distributed Stream Processing
- Data is collected from different points in a network
- Structure of an Internet Service Provider imposes restrictions
  - Core routers are sensitive (so are the network operators)
- Sending collected data to a central processing site is harmful
  - Additional load on the network
  - Hinders real-time processing
  - Won't scale with the network and traffic
- Truly distributed processing is infeasible for many queries
  - Goal: minimize communication traffic
  - Trade communication traffic for precision
15. Example Queries from Traffic Management (contd)
- Continuous query 5: Identify top 5 of destination IP addresses with maximum bandwidth consumption (to detect denial-of-service attacks)
[Figure: three CQ 5 local operators, each sending a Stream to CQ 5 global]
- Hierarchical processing structure could also be useful
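A local/global split of this kind might look like the following Python sketch, which is an assumption about how such a query could be decomposed rather than the design presented in the talk. Each site sends only its `m` heaviest destinations to the global operator, which merges the summaries; the function names and the parameters `m` and `k` are hypothetical.

```python
import heapq
from collections import Counter
from typing import Dict, List, Tuple


def local_summary(byte_counts: Dict[str, int], m: int) -> Dict[str, int]:
    """Local operator: report only the m heaviest destinations at this site."""
    return dict(heapq.nlargest(m, byte_counts.items(),
                               key=lambda item: item[1]))


def global_top(summaries: List[Dict[str, int]],
               k: int = 5) -> List[Tuple[str, int]]:
    """Global operator: merge local summaries and rank destinations."""
    merged: Counter = Counter()
    for summary in summaries:
        merged.update(summary)  # adds byte counts per destination
    return merged.most_common(k)


site_a = {"10.0.0.1": 900, "10.0.0.2": 100, "10.0.0.3": 50}
site_b = {"10.0.0.1": 800, "10.0.0.3": 300, "10.0.0.2": 20}
ranking = global_top(
    [local_summary(site_a, m=2), local_summary(site_b, m=2)], k=2)
```

Here `10.0.0.3` is reported with 300 bytes although its true total is 350, because site A did not include it in its summary. Smaller `m` means less communication and a less precise global answer.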
16. Summary of Basic Problems and Techniques
- Continuous queries over data streams is a unique combination of
  - Online processing
  - Storage constraints -- amount of memory available is bounded
  - Query result size may be unbounded
  - Intermediate state may be unbounded
- Relevant techniques
  - Online data structures (not build-and-throw)
  - Summarization: samples, histograms, wavelets, fractals
  - Adaptivity
    - Data characteristics
    - Flow rates
    - Amount of memory
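As one concrete instance of sample-based summarization under bounded memory, the following Python sketch uses reservoir sampling (Algorithm R). The talk names samples only in general terms, so the choice of this particular algorithm is an assumption made here for illustration.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Keep a uniform random sample of k items from a stream of unknown
    length, using memory proportional to k."""
    rng = rng or random.Random()
    sample: List[T] = []
    for index, item in enumerate(stream):
        if index < k:
            sample.append(item)          # fill the reservoir first
        else:
            j = rng.randint(0, index)    # uniform over 0..index, inclusive
            if j < k:
                sample[j] = item         # replace with probability k/(index+1)
    return sample


packet_lengths = range(10_000)
sample = reservoir_sample(packet_lengths, k=100, rng=random.Random(0))
```

The sample is an online data structure in the sense used above: it is maintained incrementally as tuples arrive and is usable at any point in the stream.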
17. Some Simplifying Assumptions
- In talk, but not necessarily in work
- Traffic management data is clean
  - Data is dirty: incomplete, inconsistent
  - Temporal uncertainties
  - Could be reduced as the importance of traffic management is realized
- Traffic management data is tuple-oriented
  - Often true
  - Implications for query language
18. Conclusions
- Traffic management requires efficient data management
- Many traffic management applications benefit from online data processing
- Case for a Data Stream Management System (DSMS)
  - Provides continuous queries over data streams for online processing
  - Many interesting research issues
  - Work is in progress
- Additional references
  - S. Babu and J. Widom. Continuous queries over data streams
    - http://dbpubs.stanford.edu/pub/2001-9
  - STREAM project homepage
    - http://www-db.stanford.edu/stream