Title: Supporting realtime
1 Supporting real-time and offline network traffic analysis
Chung-Min Chen, Munir Cochinwala, Allen Mcintosh, Marc Pucci
Telcordia Technologies, Applied Research, Morristown, NJ, USA
2 Outline
- OSS Requirements
- Work Proposal
- Stream Data Management Issues
- Traffic Warehouse
- Tribeca: a stream database manager
3 OSS Requirements
- OSS function: data time frame / response time
- Traffic control and monitoring: seconds to minutes
- Service level agreement: 15 min. to hours
- Capacity planning: weeks to months
4 Work proposal (system overview)
[Figure: system overview. Labeled components: LAN, R, WAN, SNMP agent, EMS, BPF, tcpdump, adaptor, Stream Engine, DBMS, Live SQL, Live Monitor, Warehouse, Live Monitor client.]
5 Real-time traffic analysis: state of industry
- Ad hoc or canned programs/scripts
  - Slow deployment
  - No data sharing
  - Hard to maintain and little reuse
- Traditional DBMS
  - Can it keep up with high line speeds (e.g., OC48)?
  - Cumbersome to program (write into DB, then query)
  - Semantic mismatch between stream and relation
6 Stream Data Management
- Stream as a first-class object (like a relation)
- Stream
  - a continuous, unbounded sequence of records with a total ordering
- Issues
  - Stream algebra
  - Data types
  - Query language
  - Implementation
7 Stream Algebra
- Operators
  - Selection: relatively easy
  - Join: can be defined nicely (assuming an unbounded buffer)
  - Demultiplex/multiplex: the result could be multiple streams
- Operands
  - Stream and stream
  - Stream and relation
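The operators above can be sketched as follows. This is only an illustration, assuming a stream is modeled as a Python iterable of dict records ordered by a hypothetical `ts` field; the function names and record fields are not from the slides. Selection is a stateless filter, demultiplexing routes records into per-key sub-streams, and multiplexing merges time-ordered sub-streams back into one ordered stream.

```python
import heapq
from collections import defaultdict


def select(stream, pred):
    """Selection: stateless filter, one record in, at most one record out."""
    return (rec for rec in stream if pred(rec))


def demux(stream, key):
    """Demultiplex: route each record to the sub-stream chosen by key(rec).

    Simplification: sub-streams are materialized as lists, so the input
    must be finite here; a real engine would route records incrementally.
    """
    subs = defaultdict(list)
    for rec in stream:
        subs[key(rec)].append(rec)
    return dict(subs)


def mux(*streams):
    """Multiplex: merge time-ordered streams into one time-ordered stream."""
    return heapq.merge(*streams, key=lambda rec: rec["ts"])


# Hypothetical records: 'ts', 'vci' and 'size' are illustrative field names.
cells = [
    {"ts": 1, "vci": 7, "size": 300},
    {"ts": 2, "vci": 9, "size": 40},
    {"ts": 3, "vci": 7, "size": 120},
    {"ts": 4, "vci": 9, "size": 500},
]
by_vci = demux(cells, key=lambda r: r["vci"])        # one sub-stream per VCI
merged = list(mux(*by_vci.values()))                 # back to a single stream
large = list(select(merged, lambda r: r["size"] > 250))
```

The sketch shows why selection is the easy case: it keeps no state, whereas demux and mux must track several streams at once.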
8 Data Types
- BLOB
  - Leaves the burden to the application developers
- Conventional relational data types
  - Need adaptors to convert from raw types to relational types
- Native support for structured binary objects (SBO)
  - Separate fields at bit level
  - Most flexible and efficient, but requires re-implementation of the database type system
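To make "separate fields at bit level" concrete, here is a minimal sketch of the kind of work an adaptor or a native SBO type would have to do, using the fixed 20-byte IPv4 header as the example. The function name and the output field names are illustrative assumptions, not part of the proposed system.

```python
import struct


def parse_ipv4_header(buf):
    """Split the fixed 20-byte IPv4 header into typed fields.

    Several fields are narrower than a byte boundary (version, header
    length, flags, fragment offset), so they are extracted with shifts
    and masks after the byte-level unpack.
    """
    if len(buf) < 20:
        raise ValueError("need at least 20 bytes for an IPv4 header")
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", buf[:20])
    return {
        "version": ver_ihl >> 4,              # high 4 bits
        "header_len": (ver_ihl & 0x0F) * 4,   # low 4 bits, in 32-bit words
        "total_len": total_len,
        "flags": flags_frag >> 13,            # top 3 bits
        "frag_offset": flags_frag & 0x1FFF,   # low 13 bits
        "ttl": ttl,
        "protocol": proto,
        "src": ".".join(str(b) for b in src),
        "dst": ".".join(str(b) for b in dst),
    }


# Hand-built example header (values chosen for illustration only).
raw = bytes([0x45, 0x00, 0x00, 0x54, 0x1C, 0x46, 0x40, 0x00,
             0x40, 0x06, 0x00, 0x00, 10, 0, 0, 1, 10, 0, 0, 2])
hdr = parse_ipv4_header(raw)
```

With the BLOB approach every application would repeat this decoding; with relational types an adaptor would run it once per record before loading.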
9 Stream Query Language
- How to handle multi-stream output, e.g. group-by?
    select avg(ip_stream.packet_size)
    from ip_stream
    group by ip_stream.source_ip_addr
- How to handle indefinite waiting in a join?
    select * from s1, s2
    where s1.packet_id = s2.packet_id
- Time window clause, temporal attributes/operators,
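One way to read the two questions above is that a time window gives both queries a point at which they can emit results. The sketch below assumes time-ordered streams of dict records with hypothetical `ts`, `src`, `size` and `packet_id` fields; it is an illustration of window semantics, not a proposed language design. The group-by emits one set of averages per fixed window, and the join only waits `window` time units for a partner instead of waiting indefinitely.

```python
import heapq
from collections import deque


def windowed_group_avg(stream, width):
    """Per fixed window of `width` time units, yield (window_start, {src: avg size})."""
    current, sums = None, {}
    for rec in stream:
        start = (rec["ts"] // width) * width
        if current is None:
            current = start
        if start != current:                      # window closed: emit and reset
            yield current, {k: s / n for k, (s, n) in sums.items()}
            current, sums = start, {}
        s, n = sums.get(rec["src"], (0, 0))
        sums[rec["src"]] = (s + rec["size"], n + 1)
    if current is not None:                       # flush the last open window
        yield current, {k: s / n for k, (s, n) in sums.items()}


def window_join(left, right, key, window):
    """Join two time-ordered streams on key(rec), pairing only records whose
    timestamps differ by at most `window`."""
    tagged = heapq.merge(
        ((r["ts"], 0, i, r) for i, r in enumerate(left)),
        ((r["ts"], 1, i, r) for i, r in enumerate(right)),
    )
    buffers = (deque(), deque())
    for ts, side, _, rec in tagged:
        other = buffers[1 - side]
        while other and other[0]["ts"] < ts - window:
            other.popleft()                       # too old to ever match again
        for cand in other:
            if key(cand) == key(rec):
                yield (rec, cand) if side == 0 else (cand, rec)
        buffers[side].append(rec)


pkts = [
    {"ts": 0, "src": "a", "size": 100},
    {"ts": 1, "src": "a", "size": 300},
    {"ts": 2, "src": "b", "size": 50},
    {"ts": 5, "src": "a", "size": 10},
]
averages = list(windowed_group_avg(pkts, 5))

s1 = [{"ts": 0, "packet_id": 1}, {"ts": 5, "packet_id": 2}]
s2 = [{"ts": 1, "packet_id": 1}, {"ts": 20, "packet_id": 2}]
pairs = list(window_join(s1, s2, key=lambda r: r["packet_id"], window=3))
```

In the join example, packet 2 arrives on `s2` fifteen time units after it appeared on `s1`, so the pair is dropped rather than held in a buffer.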
10 Implementation Issues
- Bounded buffer management
- Time-constrained query processing: must beat the buffer refresh rate
- Storage I/O bandwidth requirement (OC48 or higher?)
- Migration of data processing to disk
- Data loss: incomplete query
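The link between bounded buffers, processing deadlines and data loss can be sketched as below. The drop-oldest policy and the class name are assumptions made for illustration; the slides do not say which policy the system would use. The point is that when the consumer is slower than the arrival rate, records are lost, and counting the losses lets a query report that its answer is incomplete.

```python
from collections import deque


class BoundedBuffer:
    """Fixed-capacity FIFO that overwrites the oldest record when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.dropped = 0          # records lost because the consumer fell behind
        self._queue = deque()

    def put(self, rec):
        if len(self._queue) >= self.capacity:
            self._queue.popleft() # assumed policy: discard the oldest record
            self.dropped += 1
        self._queue.append(rec)

    def get(self):
        """Return the oldest buffered record, or None if the buffer is empty."""
        return self._queue.popleft() if self._queue else None


buf = BoundedBuffer(capacity=3)
for seq in range(5):              # producer runs ahead of the consumer
    buf.put(seq)
first_seen = buf.get()
```

Here five records arrive before the consumer reads any, so the two oldest are gone by the time the query sees the buffer.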
11 Traffic Warehouse
- Repository of traffic data for off-line analysis
- Efficient navigation across protocol stack and other business table dimensions
- Storage (cluster, parallelism)
- Distributed warehouse approach
- Chen et al. (SIGMOD 2000)
  - HTTP, FTP, TCP, IP
  - tcpdump, HTTP server logs
- Caceres et al. (IEEE Comm. 2000): AT&T WorldNet data warehouse
12 Tribeca (VLDB96, USENIX98)
- Single stream input (no join)
- Supported operators
  - Selection
  - Projection
  - Aggregates
  - Mux/demux: multi-stream output
  - Time window
- User-defined data types and extraction functions (in C)
- Tested on ATM cell traces
- Achieved 5-7 MB/s (30-40k rec/s) processing rate on a Sun Sparc10
- Former contributors: M. Sullivan, Y. Saraiya, A. Heybey
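As a back-of-the-envelope check on the reported rate, the arithmetic below relates it to the OC48 line speed raised earlier. It assumes 1 MB = 10^6 bytes and uses the nominal OC-48 line rate of 2488.32 Mbit/s (gross, including SONET overhead); both are assumptions, since the slides do not state them. Under those assumptions the reported figures imply roughly 167 to 175 bytes per record, and an OC48 link carries roughly 44 to 62 times more data than the reported processing rate.

```python
OC48_MBIT_PER_S = 2488.32  # nominal OC-48 line rate (gross)


def bytes_per_record(mb_per_s, records_per_s):
    """Average record size implied by a byte rate and a record rate."""
    return mb_per_s * 1e6 / records_per_s


def speedup_needed(line_mbit_per_s, engine_mb_per_s):
    """How many times faster the engine must run to keep up with the line."""
    line_mb_per_s = line_mbit_per_s / 8   # bits to bytes
    return line_mb_per_s / engine_mb_per_s


rec_low = bytes_per_record(5, 30_000)
rec_high = bytes_per_record(7, 40_000)
gap_best = speedup_needed(OC48_MBIT_PER_S, 7)    # against the 7 MB/s figure
gap_worst = speedup_needed(OC48_MBIT_PER_S, 5)   # against the 5 MB/s figure
```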
13 Tribeca example query
- Q1: Count the accumulated number of large IP packets (> 250 bytes) transmitted over the link.
- Q2: Find the number and average length of TCP/IP packets for every successive 5-second time window. Save to a file.
14 Tribeca example query
source_stream s1 is live, atm_link_1476, AtmCellTrace
result_stream r1 is file res1
stream_demux s1.atm.vci p1
[Figure: dataflow. s1 (ATM cells) -> demux on VCI -> p1]
15 Tribeca example query
source_stream s1 is live, atm_link_1476, AtmCellTrace
result_stream r1 is file res1
stream_demux s1.atm.vci p1
stream_proj p1.assemble_ip p2
stream_mux p2 p3
assemble_ip is a user-defined function.
[Figure: dataflow. s1 (ATM cells) -> demux -> assemble/extract -> p2 (IP packets) -> mux -> p3]
16 Tribeca example query
source_stream s1 is live, atm_link_1476, AtmCellTrace
result_stream r1 is file res1
stream_demux s1.atm.vci p1
stream_proj p1.assemble_ip p2
stream_mux p2 p3
stream_qual p3.length.geq 250 p4
stream_agg p4.count
[Figure: dataflow. s1 (ATM cells) -> demux -> assemble/extract -> mux (IP packets) -> length > 250 -> p4 -> count -> display]
17 Tribeca example query
source_stream s1 is live, atm_link_1476, AtmCellTrace
result_stream r1 is file res1
stream_demux s1.atm.vci p1
stream_proj p1.assemble_ip p2
stream_mux p2 p3
stream_qual p3.length.geq 250 p4
stream_agg p4.count
stream_qual p3.type.eq TCP p5
stream_agg p5.count, p5.length.avg on fixed window 5 sec r1
[Figure: dataflow. s1 (ATM cells) -> demux -> assemble/extract -> mux (IP packets); branch 1: length > 250 -> count -> display; branch 2: type TCP -> p5 -> count, avg(length) over a fixed 5 sec window -> r1 (save to file)]
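The two branches of the finished query can be mimicked in a few lines of Python to show what each one computes. This is a sketch of the semantics only, not of Tribeca itself: it starts from already-assembled IP packets (the demux, assemble_ip and mux steps are skipped), and the record fields `ts`, `type` and `length` are hypothetical. The length test uses >= to mirror `p3.length.geq 250`.

```python
def q1_running_count(packets, min_len=250):
    """Q1: accumulated count of large packets, emitted each time it changes."""
    count = 0
    for pkt in packets:
        if pkt["length"] >= min_len:      # mirrors stream_qual p3.length.geq 250
            count += 1
            yield count


def q2_tcp_window_stats(packets, width=5):
    """Q2: (window_start, count, avg length) of TCP packets per fixed window.

    Windows that contain no TCP packet produce no output in this sketch.
    """
    start, n, total = None, 0, 0
    for pkt in packets:
        if pkt["type"] != "TCP":          # mirrors stream_qual p3.type.eq TCP
            continue
        window = (pkt["ts"] // width) * width
        if start is None:
            start = window
        if window != start:               # window closed: emit and reset
            yield (start, n, total / n)
            start, n, total = window, 0, 0
        n += 1
        total += pkt["length"]
    if n:
        yield (start, n, total / n)


# Stand-in for stream p3 (time-ordered, timestamps in seconds).
p3 = [
    {"ts": 0, "type": "TCP", "length": 100},
    {"ts": 1, "type": "UDP", "length": 400},
    {"ts": 3, "type": "TCP", "length": 300},
    {"ts": 6, "type": "TCP", "length": 250},
    {"ts": 12, "type": "TCP", "length": 50},
]
q1_out = list(q1_running_count(p3))
q2_out = list(q2_tcp_window_stats(p3))
```

Both branches read the same stream p3, which is why the query names it once and qualifies it twice.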
18 Tribeca
- Data type inheritance (IP -> TCP, UDP)
- Window: fixed vs. moving; user-defined delimiter
- Record: fixed length, variable length, framing
- Implementation optimization
  - Dual buffers
  - Minimize data copying: passing pointers instead
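The two optimizations can be illustrated together. The sketch below is an assumption about how dual buffering and pointer passing might combine, not a description of Tribeca's C implementation: one buffer is filled while the other is handed to the query operators, and records are passed as `memoryview` slices, which reference the buffer rather than copying it. All names are hypothetical.

```python
class DualBuffer:
    """Two fixed-size buffers: one being filled, one being processed."""

    def __init__(self, size):
        self._bufs = [bytearray(size), bytearray(size)]
        self._active = 0      # index of the buffer currently being filled
        self._used = 0        # bytes written to the active buffer

    def write(self, data):
        """Append data to the active buffer; return False if it does not fit."""
        end = self._used + len(data)
        if end > len(self._bufs[self._active]):
            return False
        self._bufs[self._active][self._used:end] = data
        self._used = end
        return True

    def swap(self):
        """Hand the filled buffer to the consumer as a view and start
        filling the other one."""
        view = memoryview(self._bufs[self._active])[:self._used]
        self._active ^= 1
        self._used = 0
        return view


def records(view, rec_len):
    """Yield fixed-length records as sub-views (no bytes are copied)."""
    for off in range(0, len(view) - rec_len + 1, rec_len):
        yield view[off:off + rec_len]


db = DualBuffer(8)
wrote_first = db.write(b"abcd") and db.write(b"efgh")
overflow = db.write(b"x")          # active buffer is full
full = db.swap()                   # consumer takes the filled buffer
wrote_second = db.write(b"ijkl")   # producer continues in the other buffer
recs = list(records(full, 4))
```

Each element of `recs` points into the first buffer, so the operators see the records without any intermediate copy.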
19 Related Activities
- CAIDA
- SLAC
- NLANR
- XIWT
- AT&T, HP, Sun, Telcordia,
- passive Internet traffic collection at major Internet backbone routers
20 Related Work
- Tangram (Parker90, 92)
  - a model that captures streams, sets and parallelism
  - more a state machine than a query language
- SEQ (Seshadri95, 96)
  - static sequences
- Datacycle (Bowen92)
  - information filtering on broadcast data