Architectural Impact of Stateful Networking Applications

1
Architectural Impact of Stateful Networking Applications
  • Javier Verdú, Jorge García
  • Mario Nemirovsky, Mateo Valero
  • The 1st Symposium on Architectures for Networking
    and Communications Systems
  • Princeton, New Jersey, USA
  • October 26-28, 2005

ANCS - I
2
Trends of Internet
  • Significant growth of Internet traffic
  • Consequent increase in traffic aggregation
  • Low packet/flow temporal locality
  • End-point routers & appliances execute stateful
    apps
  • Upper-layer packet processing
  • Larger workloads per packet
  • Facing new security issues
  • Improvement of attack methods
  • Need to extend knowledge beyond a single packet

3
Stateful Application Model
Granularity Levels
State Lifetime

4
Research Limitations on Stateful Apps
  • Pool of Benchmark Suites for Network Processors
  • CommBench
  • NetBench
  • NpBench
  • NPForum
  • Lack of Stateful Benchmarks
  • Most of them are stateless benchmarks
  • Creating new benchmarks
  • Reliability???
  • State size
  • State management

5
Talk Outline
  • Introduction
  • Network Traffic Properties
  • Description of Environment
  • Architectural Impact Analysis
  • Summary

6
Network Traffic Properties
  • Traffic Aggregation Level
  • Unique Flow rate in a given window

7
Network Traffic Properties
  • Traffic Aggregation Level
  • Unique Flow rate in a given window
  • Intra-Flow Temporal Distribution
  • How are the packets exchanged?

8
Network Traffic Properties
  • Traffic Aggregation Level
  • Unique Flow rate in a given window
  • Intra-Flow Temporal Distribution
  • How are the packets exchanged?
  • Inter-Flow Temporal Distribution
  • Packet rate between packets of the same flow
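As a rough illustration of these properties, the sketch below computes the number of unique flows per window and the distance (in packets) between consecutive packets of the same flow. The function names and the toy flow sequences are hypothetical and only serve the example; the slides do not specify how the metrics were computed.

```python
from collections import defaultdict


def unique_flows_per_window(flow_ids, window):
    """Count distinct flows in each consecutive window of `window` packets."""
    return [len(set(flow_ids[i:i + window]))
            for i in range(0, len(flow_ids), window)]


def same_flow_distances(flow_ids):
    """For each flow, list the gaps (in packets) between its consecutive packets."""
    last_seen = {}
    gaps = defaultdict(list)
    for pos, fid in enumerate(flow_ids):
        if fid in last_seen:
            gaps[fid].append(pos - last_seen[fid])
        last_seen[fid] = pos
    return dict(gaps)


# Hypothetical packet sequences, one flow label per packet.
low_aggregation = ["A", "A", "A", "A", "B", "B", "B", "B"]
high_aggregation = ["A", "B", "C", "D", "A", "B", "C", "D"]

print(unique_flows_per_window(low_aggregation, 4))
print(unique_flows_per_window(high_aggregation, 4))
print(same_flow_distances(high_aggregation))
```

With higher aggregation, each window holds more distinct flows and the packets of any one flow sit further apart, which is the loss of temporal locality the slides refer to.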

9
Benchmark Selection (I)
  • Snort is tuned with four different configurations
  • Stream4
  • Prevents Stick/Snot attacks
  • Flow-Portscan
  • Detects portscanning attacks
  • SfPortscan
  • Detects a variety of portscanning attacks
  • Merged Engines
  • The combination of the above engines
  • Argus is a monitoring/billing benchmark
  • Currently it is included in NO benchmark suite
  • Open source application
  • http://www.qosient.com
  • Equivalent to the commercial tool Cisco NetFlow

10
Benchmark Selection (II)
  • Obviously, stateless applications keep no
    flow state
  • The state size may vary widely between
    applications
  • State management may also differ considerably
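To make the stateless/stateful distinction concrete, here is a minimal sketch. The `FlowTable` class, the `FlowState` fields and the `stateless_check` function are hypothetical; they are not taken from Snort or Argus, whose state layouts differ in both size and management.

```python
from dataclasses import dataclass


@dataclass
class FlowState:
    """Hypothetical per-flow record; real applications keep far more."""
    packets: int = 0
    byte_count: int = 0


class FlowTable:
    """Stateful processing: every packet looks up and updates its flow's record."""

    def __init__(self):
        self.flows = {}

    def update(self, five_tuple, length):
        state = self.flows.setdefault(five_tuple, FlowState())
        state.packets += 1
        state.byte_count += length
        return state


def stateless_check(length, mtu=1500):
    """Stateless processing: the decision depends on this packet alone."""
    return length <= mtu


table = FlowTable()
flow = ("10.0.0.1", 1234, "10.0.0.2", 80, "TCP")
table.update(flow, 60)
state = table.update(flow, 1500)
print(state.packets, state.byte_count)
```

The per-flow lookup is what ties performance to traffic properties: the more flows are active at once, the larger the table that has to stay resident.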

11
Evaluation Methodology
  • Instrumented binary code: ATOM
  • Trace-driven simulation: modified version of the
    SMTSim simulator
  • Simulation length
  • Warming period: 10K packets
  • Processing period: 50K packets
  • Packet selection for the flow lifetime studies
  • Towards analysis of actual application behavior
  • The baseline is an ample configuration
  • ROB size: 256 entries
  • No significant improvements with larger ROBs
  • Physical regs: 192 int, 192 FP
  • No stress due to lack of regs
  • Perceptron branch predictor
  • The most powerful configuration
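The warming and processing periods can be pictured with the sketch below. It assumes that warm-up packets are processed (so tables and caches fill up) but excluded from the statistics; the slides give only the two packet counts, so that interpretation and the `run_trace` helper are assumptions.

```python
def run_trace(packets, process, warmup=10_000, measured=50_000):
    """Process warmup + measured packets; keep results only for the measured ones."""
    stats = []
    for i, pkt in enumerate(packets):
        if i >= warmup + measured:
            break
        result = process(pkt)      # warm-up packets are still processed
        if i >= warmup:
            stats.append(result)   # but only later packets are recorded
    return stats


# Toy usage with small numbers: packets are integers, processing is the identity.
sample = run_trace(range(100), lambda p: p, warmup=10, measured=50)
print(len(sample), sample[0], sample[-1])
```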

12
Architectural Impact Analysis
  • Computational complexity
  • Available Parallelism
  • Impact of Bottlenecks
  • Branch Prediction
  • Data Cache Behavior

13
Computational Complexity (I)
  • There are no significant differences among
    benchmarks
  • Roughly 35%-45% of instructions are memory accesses
  • Argus is more memory intensive

14
Computational Complexity (II)
  • The instruction mix is similar across all the
    packets
  • Some applications generate the hardest workload
    in the first packets
  • Other applications show almost constant workload

15
Available Parallelism
  • Processor configuration modified to avoid
    any constraint
  • The ILP is independent of the app category
  • It is inherent to the application itself
  • The evaluated apps show low ILP: 3.7 IPC

16
Impact of Bottlenecks
  • Stateful apps show much lower performance
  • Roughly 0.6 IPC on average
  • The importance of the packet processing
  • Constant vs concentrated workload
  • Memory impact
  • Speedup of 3x to 19x

17
Branch Prediction (I)
  • High branch prediction accuracy on average
  • But we have two branch categories
  • Flow independent: similar among packets -> easy
    to predict
  • Flow dependent: flow related -> sensitive to
    traffic properties

18
Branch Prediction (II)
  • A single active connection
  • Higher accuracy and no variations among n-th
    packets
  • High traffic aggregation level
  • Lower accuracy and variations among n-th packets
  • Negative aliasing due to flow dependent branches
  • Most of our applications hide this effect due to
    concentrated workload
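The aliasing effect can be illustrated with a toy predictor. The sketch below uses a single 2-bit saturating counter, not the perceptron predictor of the evaluated configuration, and the outcome sequences are hypothetical: one flow-dependent branch is always taken for a single connection, while under aggregation two interleaved flows drive the same branch in opposite directions.

```python
def two_bit_accuracy(outcomes):
    """Prediction accuracy of one 2-bit saturating counter over a branch history."""
    counter = 2  # states 0..3; values >= 2 predict "taken"
    correct = 0
    for taken in outcomes:
        predicted = counter >= 2
        correct += (predicted == taken)
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return correct / len(outcomes)


single_flow = [True] * 16        # one active connection: stable outcome
interleaved = [True, False] * 8  # two flows alternating on the same branch

print(two_bit_accuracy(single_flow))
print(two_bit_accuracy(interleaved))
```

In this toy case accuracy drops from 1.0 to 0.5 once the flows interleave, since both flows train the same counter.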

Figure panels: no traffic aggregation level vs. high traffic aggregation level
19
Data Cache Behavior (I)
  • Stateful apps need only a small DL1 to reach a steady
    miss rate
  • Taking advantage of flow independent memory
    references
  • Almost 100% of DL2 accesses are misses
  • It is unable to keep the state of the active
    flows
  • Larger flow-states emphasize network properties
    impact
  • Reaching a higher steady miss rate even with low traffic
    aggregation
  • The intra-flow distribution may be more helpful
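A toy model shows why a cache that cannot hold the state of all active flows ends up missing on nearly every access. The sketch assumes a fully associative LRU cache with one entry per flow state and a round-robin arrival order; the real DL2 organization and traffic are more complex, and `lru_miss_rate` is a hypothetical helper.

```python
from collections import OrderedDict


def lru_miss_rate(accesses, capacity):
    """Miss rate of a fully associative LRU cache holding `capacity` entries."""
    cache = OrderedDict()
    misses = 0
    for key in accesses:
        if key in cache:
            cache.move_to_end(key)         # mark as most recently used
        else:
            misses += 1
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return misses / len(accesses)


few_flows = [i % 4 for i in range(400)]    # 4 active flows, cache holds 16
many_flows = [i % 64 for i in range(400)]  # 64 active flows, cache holds 16

print(lru_miss_rate(few_flows, 16))
print(lru_miss_rate(many_flows, 16))
```

When the active flows outnumber the entries and each flow's packets are spread out, every state is evicted before it is reused. A burstier intra-flow distribution would restore some hits.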

20
Data Cache Behavior (II)
  • Negative effects of the memory concentrated in
    the first packets
  • Constant workload applications show similar miss
    rate for every packet
  • Extra miss rates for data structures maintenance
  • Merged Engines: from 1.5% to 5% on average

21
Summary (I)
  • We present the architectural impact of Stateful
    Networking Applications
  • An important new type of applications
  • The behavior across the packets of a TCP
    connection
  • Constant workload for the packets of a connection
  • Workload concentrated in the first packets of a
    connection
  • Analysis of network traffic properties
  • Branch prediction and data cache are sensitive to
    them

22
Summary (II)
  • Reduced IPC on average
  • L2 is unable to maintain the required states of
    active flows
  • Branch prediction may also improve once the
    memory bottleneck is solved
  • Other stateful applications may present different
    valuable results, but
  • The critical bottlenecks may be stressed
    even more
  • Our concern is
  • To have more sample applications to evaluate
  • To analyse the apps in a more realistic
    environment
  • Running a number of applications simultaneously

23
  • Questions...

24
Traffic Traces
  • Filtered Traffic Trace
  • Bidirectional TCP connections
  • Generating Synthetic Traffic Traces
  • Mixing different traffic traces
  • Sorted by microsecond timestamp
  • We assume a set of traces from links of the same
    bandwidth
  • In our case MRA link
  • Avoiding the aliasing of IP addresses among
    aggregated traces
  • The traces are originally sanitized
  • The resulting traffic trace shows roughly 1Gbps
  • 170K active flows
  • Achieved from the original OC12 MRA link (622Mbps)
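The mixing step can be sketched as a timestamp-ordered merge in which each source trace gets its own address space, so that identical sanitized addresses from different traces do not collapse into one flow. The record layout `(timestamp, src, dst)` and the helpers `retag` and `mix_traces` are hypothetical; the slides do not describe the actual tooling.

```python
import heapq


def retag(trace, trace_id):
    """Prefix addresses with the trace id so equal IPs from different traces stay distinct."""
    for ts, src, dst in trace:
        yield ts, (trace_id, src), (trace_id, dst)


def mix_traces(traces):
    """Merge traces that are already sorted by timestamp into one sorted trace."""
    tagged = [retag(t, i) for i, t in enumerate(traces)]
    return list(heapq.merge(*tagged, key=lambda pkt: pkt[0]))


# Toy traces: (timestamp in microseconds, source address, destination address).
trace_a = [(1, "10.0.0.1", "10.0.0.2"), (4, "10.0.0.2", "10.0.0.1")]
trace_b = [(2, "10.0.0.1", "10.0.0.9"), (3, "10.0.0.9", "10.0.0.1")]

mixed = mix_traces([trace_a, trace_b])
for pkt in mixed:
    print(pkt)
```

`heapq.merge` relies on each input already being in timestamp order, which holds for captured traces.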