Title: Architectural Impact of Stateful Networking Applications
1 Architectural Impact of Stateful Networking Applications
- Javier Verdú, Jorge García
- Mario Nemirovsky, Mateo Valero
- The 1st Symposium on Architectures for Networking and Communications Systems - Princeton, New Jersey, USA
- October 26-28, 2005
ANCS - I
2 Trends of Internet
- Significant growth of Internet traffic
- Consequent increase in traffic aggregation
- Low packet/flow temporal locality
- End-point router appliances execute stateful apps
- Upper-layer packet processing
- Larger workloads per packet
- Facing new security issues
- Improved attack methods
- Need to carry knowledge beyond a single packet
3 Stateful Application Model
- Granularity Levels
- State Lifetime
4 Research Limitations on Stateful Apps
- Pool of Benchmark Suites for Network Processors
- CommBench
- NetBench
- NpBench
- NPForum
- Lack of Stateful Benchmarks
- Most of them are stateless benchmarks
- Creating new benchmarks
- Reliability???
- State size
- State management
5 Talk Outline
- Introduction
- Network Traffic Properties
- Description of Environment
- Architectural Impact Analysis
- Summary
8 Network Traffic Properties
- Traffic Aggregation Level
- Unique flow rate in a given window
- Intra-Flow Temporal Distribution
- How are the packets of a flow exchanged?
- Inter-Flow Temporal Distribution
- Packet rate between packets of the same flow
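The traffic properties listed on this slide can be made concrete with a small sketch. It assumes a trace is just a list of flow identifiers in arrival order and that the window is measured in packets; both are simplifications, and the function names are illustrative rather than taken from the talk.

```python
from collections import defaultdict

def unique_flows_per_window(flow_ids, window):
    """Traffic aggregation level: distinct flows seen in each window of packets."""
    return [len(set(flow_ids[i:i + window]))
            for i in range(0, len(flow_ids), window)]

def inter_flow_gaps(flow_ids):
    """For each flow, count the packets of other flows that arrive
    between two consecutive packets of that flow."""
    last_seen = {}
    gaps = defaultdict(list)
    for pos, fid in enumerate(flow_ids):
        if fid in last_seen:
            gaps[fid].append(pos - last_seen[fid] - 1)
        last_seen[fid] = pos
    return dict(gaps)

# Made-up toy trace: letters stand for flow identifiers.
trace = ["a", "b", "a", "c", "b", "a", "d", "a"]
print(unique_flows_per_window(trace, 4))  # [3, 3]
print(inter_flow_gaps(trace))             # {'a': [1, 2, 1], 'b': [2]}
```

Larger gaps mean that more foreign state is touched between two packets of the same flow, which is what reduces temporal locality.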
9 Benchmark Selection (I)
- Snort is tuned with four different configurations
- Stream4
- Prevents Stick/Snot attacks
- Flow-Portscan
- Detects portscanning attacks
- SfPortscan
- Detects a variety of portscanning attacks
- Merged Engines
- The combination of the above engines
- Argus is a monitoring/billing benchmark
- Currently not included in any benchmark suite
- Open source application
- http://www.qosient.com
- Equivalent to the commercial tool Cisco NetFlow
10 Benchmark Selection (II)
- Stateless applications keep no flow state
- The state size may vary widely between applications
- State management may also differ considerably
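As a rough illustration of the difference between stateless and stateful processing, the sketch below keeps one record per flow, keyed by a 5-tuple. The class names, the fields and the `stateless_check` helper are hypothetical; real applications such as Snort or Argus keep much richer state and manage it differently.

```python
from dataclasses import dataclass

@dataclass
class FlowState:
    packets: int = 0
    byte_count: int = 0

class FlowTable:
    """Per-flow state that lives from the first packet until the flow is closed."""
    def __init__(self):
        self._flows = {}

    def update(self, key, length):
        # Create the state on the first packet of a flow, then reuse it.
        state = self._flows.setdefault(key, FlowState())
        state.packets += 1
        state.byte_count += length
        return state

    def close(self, key):
        # End of the state lifetime, e.g. when the connection terminates.
        return self._flows.pop(key, None)

    def __len__(self):
        return len(self._flows)

def stateless_check(length, mtu=1500):
    """A stateless decision needs only the current packet."""
    return length <= mtu

table = FlowTable()
key = ("10.0.0.1", 1234, "10.0.0.2", 80, "TCP")  # hypothetical 5-tuple
table.update(key, 60)
state = table.update(key, 1500)
print(state.packets, state.byte_count, len(table))  # 2 1560 1
table.close(key)
print(len(table))  # 0
```

The size of `FlowState` and the cost of `update` and `close` are the two knobs that differ between applications: state size and state management.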
11 Evaluation Methodology
- Instrumented binary code: ATOM
- Trace-driven simulation: modified version of the SMTSim simulator
- Simulation length
- Warm-up period
- 10K packets
- Processing period
- 50K packets
- Packet selection for the flow lifetime studies
- Towards analysis of actual application behavior
- The baseline is an ample configuration
- ROB size: 256 entries
- No significant improvements with larger ROBs
- Physical registers: 192 int, 192 FP
- No stress due to lack of registers
- Perceptron branch predictor
- The most powerful configuration
12 Architectural Impact Analysis
- Computational complexity
- Available Parallelism
- Impact of Bottlenecks
- Branch Prediction
- Data Cache Behavior
13 Computational Complexity (I)
- There are no significant differences among benchmarks
- Roughly 35% to 45% of instructions are memory accesses
- Argus is more memory intensive
14 Computational Complexity (II)
- The instruction mix is similar across all the packets
- Some applications generate the heaviest workload in the first packets
- Other applications show an almost constant workload
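One way to tell a concentrated workload from a constant one is to group a per-packet cost by the position of the packet within its flow (first packet, second packet, and so on). The sketch below assumes such a cost is available, for example an instruction count per packet; the helper names and the numbers in the example are invented purely for illustration and are not measurements from the talk.

```python
from collections import defaultdict

def packet_positions(flow_ids):
    """Return, for each packet, its 1-based position within its own flow."""
    counts = defaultdict(int)
    positions = []
    for fid in flow_ids:
        counts[fid] += 1
        positions.append(counts[fid])
    return positions

def mean_cost_by_position(flow_ids, costs):
    """Average the per-packet cost over all packets that share a position."""
    totals = defaultdict(lambda: [0, 0])  # position -> [sum, count]
    for pos, cost in zip(packet_positions(flow_ids), costs):
        totals[pos][0] += cost
        totals[pos][1] += 1
    return {pos: total / n for pos, (total, n) in sorted(totals.items())}

flows = ["a", "b", "a", "b", "a"]
costs = [900, 1100, 200, 300, 250]  # made-up instruction counts
print(packet_positions(flows))              # [1, 1, 2, 2, 3]
print(mean_cost_by_position(flows, costs))  # {1: 1000.0, 2: 250.0, 3: 250.0}
```

A profile that drops sharply after position 1, as in this toy input, corresponds to a workload concentrated in the first packets; a flat profile corresponds to a constant workload.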
15 Available Parallelism
- Processor configuration modified to remove any constraint
- The ILP is independent of the app category
- It is inherent to the application itself
- The evaluated apps show low ILP: 3.7 IPC
16 Impact of Bottlenecks
- Stateful apps show much lower performance
- Roughly 0.6 IPC on average
- The importance of the packet processing
- Constant vs concentrated workload
- Memory Impact
- 3x to 19x speedup
17 Branch Prediction (I)
- High branch prediction accuracy on average
- But we have two branch categories
- Flow independent: similar among packets -> easy to predict
- Flow dependent: flow related -> sensitive to traffic properties
18 Branch Prediction (II)
- A single active connection
- Higher accuracy and no variations among n-th packets
- High traffic aggregation level
- Lower accuracy and variations among n-th packets
- Negative aliasing due to flow dependent branches
- Most of our applications hide this effect due to concentrated workload
[Figure panels: no traffic aggregation level; high traffic aggregation level]
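The aliasing effect of flow dependent branches can be illustrated with a toy model. Note the assumptions: the evaluated baseline uses a perceptron predictor, whereas this sketch swaps in a single 2-bit saturating counter for simplicity, and it assumes that each flow always drives the branch in one fixed direction. It is meant only to show the mechanism, not to reproduce the reported accuracies.

```python
def two_bit_accuracy(outcomes, counter=2):
    """Accuracy of one 2-bit saturating counter (0..3, predict taken if >= 2)."""
    correct = 0
    for taken in outcomes:
        predicted = counter >= 2
        correct += predicted == taken
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return correct / len(outcomes)

def branch_outcomes(flow_ids, direction_of_flow):
    """Outcome of a flow dependent branch for each packet of the trace."""
    return [direction_of_flow[fid] for fid in flow_ids]

direction = {"a": True, "b": False}  # hypothetical per-flow branch direction
single = branch_outcomes(["a"] * 100, direction)     # one active connection
mixed = branch_outcomes(["a", "b"] * 50, direction)  # two interleaved flows
print(two_bit_accuracy(single))  # 1.0
print(two_bit_accuracy(mixed))   # 0.5
```

With a single active connection the shared predictor entry stays trained; once packets of different flows interleave, each flow keeps undoing the training done by the other.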
19 Data Cache Behavior (I)
- Stateful apps need only a small DL1 to reach a steady miss rate
- Taking advantage of flow independent memory references
- Almost 100% of DL2 accesses are misses
- It is unable to keep the state of the active flows
- Larger flow-states emphasize the impact of network properties
- Reaching a higher steady state even with low traffic aggregation
- The intra-flow distribution may be more helpful
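Why a cache stops helping once the number of active flows exceeds what it can hold can be seen in a minimal sketch. It models the cache as a fully associative LRU store with one entry per flow state and visits flows in round-robin order; real DL1/DL2 caches are set-associative and work on cache lines, so the names and the access pattern here are simplifying assumptions.

```python
from collections import OrderedDict

def lru_miss_rate(flow_ids, capacity):
    """Miss rate of an LRU cache that holds at most `capacity` flow states."""
    cache = OrderedDict()
    misses = 0
    for fid in flow_ids:
        if fid in cache:
            cache.move_to_end(fid)         # mark as most recently used
        else:
            misses += 1
            cache[fid] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used
    return misses / len(flow_ids)

def round_robin(n_flows, rounds):
    """Trace in which every active flow sends one packet per round."""
    return [f for _ in range(rounds) for f in range(n_flows)]

print(lru_miss_rate(round_robin(4, 100), capacity=8))   # 0.01
print(lru_miss_rate(round_robin(16, 100), capacity=8))  # 1.0
```

In the second case each flow state is evicted before its next packet arrives, so every access misses, which is the same qualitative behavior as a DL2 that cannot keep the state of the active flows.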
20 Data Cache Behavior (II)
- Negative effects of the memory workload concentrated in the first packets
- Constant workload applications show a similar miss rate for every packet
- Extra miss rates for data structure maintenance
- Merged Engines: from 1.5% to 5% on average
21 Summary (I)
- We present the architectural impact of Stateful Networking Applications
- An important new type of application
- The behavior across the packets of a TCP connection
- Constant workload for the packets of a connection
- Workload concentrated in the first packets of a connection
- Analysis of network traffic properties
- Branch prediction and data cache are sensitive to them
22 Summary (II)
- Reduced IPC on average
- L2 is unable to maintain the required states of active flows
- Branch prediction may also improve once the memory bottleneck is solved
- Other stateful applications may present different valuable results, but
- The critical bottlenecks may be even more stressed
- Our concern is
- To have more sample applications to evaluate
- To analyse the apps in a more realistic environment
- Running a number of applications simultaneously
24 Traffic Traces
- Filtered Traffic Trace
- Bidirectional TCP connections
- Generating Synthetic Traffic Traces
- Mixing different traffic traces
- Sorted by microsecond timestamp
- We assume a set of traces from links with the same bandwidth
- In our case, the MRA link
- Avoiding the aliasing of IP addresses among aggregated traces
- The traces in the set are originally sanitized
- The resulting traffic trace shows roughly 1 Gbps
- 170K active flows
- Obtained from the original OC12 MRA link (622 Mbps)
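The trace-mixing procedure can be sketched as a timestamp-ordered merge with a per-trace address remapping. This is a minimal sketch under stated assumptions: packets are `(timestamp_us, src, dst)` tuples, each input trace is already sorted by timestamp, and sanitized addresses fit in the low 24 bits so that a per-trace offset in the top byte keeps them apart. The function names and the remapping scheme are illustrative, not the procedure actually used for the talk.

```python
import heapq
from ipaddress import IPv4Address

def remap(trace, trace_index):
    """Shift addresses into a per-trace block so that flows from
    different traces never share an IP address."""
    offset = trace_index << 24  # assumption: addresses fit in the low 24 bits
    for ts, src, dst in trace:
        yield (ts,
               IPv4Address(int(IPv4Address(src)) + offset),
               IPv4Address(int(IPv4Address(dst)) + offset))

def mix_traces(traces):
    """Merge several timestamp-sorted traces into one sorted trace."""
    tagged = [remap(t, i + 1) for i, t in enumerate(traces)]
    return list(heapq.merge(*tagged, key=lambda pkt: pkt[0]))

# Made-up sanitized traces; both reuse the same address pair.
trace_a = [(10, "0.0.0.1", "0.0.0.2"), (30, "0.0.0.1", "0.0.0.2")]
trace_b = [(20, "0.0.0.1", "0.0.0.2")]
mixed = mix_traces([trace_a, trace_b])
for ts, src, dst in mixed:
    print(ts, src, dst)
```

After mixing, the packets from `trace_a` appear under `1.0.0.x` and those from `trace_b` under `2.0.0.x`, so connections that used identical sanitized addresses in different traces remain distinct flows.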