Algorithmic-Architecture Trade-offs in Network Processor Design


1
Algorithmic-Architecture Trade-offs in Network Processor Design
  • Introduction
  • IP packet processing
  • Requirements and existing solutions
  • Algorithm-architecture trade-offs
  • Exploitation of RAM resources
  • Conclusion

2
Network processor: definition
  • An integrated circuit
  • software-programmable
  • optimized for conducting fast processing tasks on packetized data streams at wire speed
  • offloads path-management and control tasks to other components
  • From Grey Bird, NC State Univ, USA

main() { for (i = 1; i < 100; i++) { … } }
3
Network Processing layers
OSI/ISO (*)

  Layer 7  Application   Application protocols, user interface
  Layer 6  Presentation  Application-specific format transfer
  Layer 5  Session       Connection to process, billing
  Layer 4  Transport     Flow control, point to point      <- NP
  Layer 3  Network       Connection, switching of links    <- NP
  Layer 2  Data link     Signaling, block transfer
  Layer 1  Physical      Transmission, coding, modulation

(*) OSI/ISO: Open Systems Interconnection from the International Organization for Standardization
4
Network Processing layers
(figure: position of the network processor within the layer stack)
5
Network Processing: packets

(figure: a bit stream of packets 1-4 enters the NP, which passes, drops, or
stores each packet and records per-packet statistics such as "from" and "to"
addresses)

Network instruction: access control list (ACL) for packet classification
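The ACL-based pass/drop decision sketched in the figure can be illustrated as a first-match walk over a rule list. This is a minimal sketch, not the slide's actual mechanism; the rule fields and actions below are invented for illustration:

```python
# Hypothetical first-match ACL classification sketch.
# Field names ("src", "dst", "proto") and actions are illustrative assumptions.

def classify(packet, acl):
    """Return the action of the first ACL rule whose fields all match the packet."""
    for rule in acl:
        if all(packet.get(field) == value for field, value in rule["match"].items()):
            return rule["action"]
    return "drop"  # default deny when no rule matches

acl = [
    {"match": {"src": "10.0.0.1"}, "action": "pass"},
    {"match": {"dst": "10.0.0.9", "proto": "tcp"}, "action": "store"},
]

print(classify({"src": "10.0.0.1", "dst": "10.0.0.2", "proto": "udp"}, acl))  # pass
```

A real NP would implement this lookup in dedicated hardware (e.g., a TCAM) rather than a sequential scan, since wire-speed operation forbids per-rule iteration.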
6
Network Processing: packets

(figure: packets from Channel 1, OC192 (2.5 Gb/s), Channel A, OC12 (600 Mb/s),
and a channel-B-specific 1 Gb/s link pass through firewall, redundancy-removal,
and encryption stages backed by a network database)

  • Needs fast look-up tables
  • Needs encryption instructions
  • Must adapt to mixed protocols
  • Must be able to remove redundancy
7
Performances

  µP    Microprocessor
  FPGA  Field-Programmable Gate Array
  DSP   Digital Signal Processor
  NP    Network Processor

An NP is 10x faster than a µP.
8
Network Processing: where is the difference?

  Network Processor           Microprocessor
  Focused on packets          General purpose
  Decision pipeline           Cache-based pipeline
  Network instruction set     Wide instruction set
  Fast binary decisions       Various mathematics
  Real-time                   Multi-task
11
Filtering / Classification
  • The classifier determines the flow an incoming packet belongs to by
    examining one or more fields of the packet header.
  • The classification problem can be solved by several search approaches:
  • bitmap intersection
  • fat inverted segment trees
  • heap-on-trie data structures

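Of the search approaches listed above, bitmap intersection is the easiest to sketch: each header field maps a value to a bitmap of the rules it satisfies, and the matching rule set is the AND of the per-field bitmaps. The two-field rule set below is an invented example, not one from the presentation:

```python
# Bitmap-intersection classification sketch (illustrative two-field rules).
# rules[i] = (src, dst); None acts as a wildcard. Lower index = higher priority.

rules = [
    ("A", None),   # rule 0
    (None, "X"),   # rule 1
    ("B", "X"),    # rule 2
]

def field_bitmap(field_index, value):
    """Bitmap of all rules whose predicate on this field matches `value`."""
    bm = 0
    for i, rule in enumerate(rules):
        if rule[field_index] is None or rule[field_index] == value:
            bm |= 1 << i
    return bm

def classify(src, dst):
    bm = field_bitmap(0, src) & field_bitmap(1, dst)
    if bm == 0:
        return None                        # no rule matches
    return (bm & -bm).bit_length() - 1     # lowest-index (highest-priority) match

print(classify("B", "X"))  # rule 1 (wildcard src, dst "X") outranks rule 2 → 1
```

In hardware the per-field bitmaps are precomputed and stored in wide memories, so classification costs one lookup per field plus a bitwise AND, independent of the rule count.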
12
Link Scheduling
  • A link scheduler is an arbiter that decides which of the buffered packets
    will be transferred next over an outgoing link of a networking node.
  • Schedulers may be distinguished by several features:
  • Fairness
  • Efficiency
  • Worst-case behavior
  • Quality-of-service guarantees
  • Utilization

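One scheduler the presentation evaluates later is deficit round-robin, which trades some fairness precision for very low per-packet cost. A minimal sketch, with invented flow contents and quantum:

```python
from collections import deque

# Minimal deficit round-robin (DRR) sketch. The quantum and packet lengths are
# illustrative, not parameters taken from the presentation.

def drr(flows, quantum):
    """flows: list of deques of packet lengths. Yields (flow_id, pkt_len) in service order."""
    deficits = [0] * len(flows)
    while any(flows):
        for i, q in enumerate(flows):
            if not q:
                deficits[i] = 0          # idle flows do not accumulate credit
                continue
            deficits[i] += quantum       # grant one quantum of bytes per round
            while q and q[0] <= deficits[i]:
                pkt = q.popleft()
                deficits[i] -= pkt       # spend credit on the transmitted packet
                yield (i, pkt)

order = list(drr([deque([300, 300]), deque([600])], quantum=300))
print(order)  # [(0, 300), (0, 300), (1, 600)]
```

Each flow receives roughly `quantum` bytes of service per round regardless of its packet sizes, which is why DRR is attractive for an NP: it needs only O(1) work per packet, unlike WFQ-style virtual-time schedulers.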
14
Queuing
  • After a packet has been admitted for possible transmission, it must be
    buffered in the system until it is either chosen by the link scheduler for
    transmission or discarded because the link is congested.
  • To balance the separation of flows against the number of flows that can be
    managed, different approaches are possible:
  • Single queue
  • Separate queues
  • Shared memory

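The shared-memory approach can be sketched as per-flow queues drawing on a common buffer budget, with tail drop once the budget is exhausted. The class name and capacity below are illustrative assumptions:

```python
from collections import deque

# Sketch of a shared-memory queue manager with per-flow queues and tail drop.
# Capacity and flow names are invented for illustration.

class SharedMemoryQM:
    def __init__(self, capacity):
        self.capacity = capacity   # total buffer budget shared by all flows
        self.used = 0
        self.queues = {}           # flow id -> deque of packet lengths

    def enqueue(self, flow, length):
        if self.used + length > self.capacity:
            return False           # tail drop: no shared buffer space left
        self.queues.setdefault(flow, deque()).append(length)
        self.used += length
        return True

    def dequeue(self, flow):
        q = self.queues.get(flow)
        if not q:
            return None
        length = q.popleft()
        self.used -= length
        return length

qm = SharedMemoryQM(capacity=1000)
print(qm.enqueue("f1", 600), qm.enqueue("f2", 600))  # True False
```

Sharing one budget maximizes the number of flows a fixed memory can hold, but a single aggressive flow can starve the others; the per-flow ("separate queues") approach avoids that at the cost of stranded capacity.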
17
Node Architecture
18
Evaluation Models
  • The algorithm and hardware blocks, as well as the traffic traces, used to
    evaluate options for QoS preservation in multi-service access networks are
    discussed.
  • Required models:
  • Algorithm models
  • Architecture models
  • Traffic-generation models

19
Algorithm Models
  • Reproduce the behavior of algorithms for packet-processing tasks.
  • Behavior is not bounded by assuming any properties of the computing
    resources.

20
Architecture Models
  • Imitate the timing behavior of hardware building blocks that can be used to
    implement a network processor for multi-service access networks.
  • The timing, together with the statistics generated by the algorithm models,
    is used to estimate the load on hardware resources.

21
Traffic Generation Models
  • Network traffic must be modeled to stimulate the network processor.
  • The inter-arrival time of packets determines the frequency of
    packet-processing events.
  • The packet length and other packet-header information determine the QoS a
    packet will receive.

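A statistical source model of the kind listed on the later "Stimuli" slide can be sketched with exponential inter-arrival times and a bimodal packet-length mix. The rate and the 64/1500-byte split are assumptions for illustration, not parameters from the presentation:

```python
import random

# Illustrative statistical traffic source: Poisson arrivals and a bimodal
# packet-length distribution. All parameters are invented for the sketch.

def generate_packets(n, rate_pps, seed=0):
    rng = random.Random(seed)
    t = 0.0
    packets = []
    for _ in range(n):
        t += rng.expovariate(rate_pps)   # exponential gap -> Poisson arrivals
        length = rng.choice([64, 1500])  # short control vs. full-size data frames
        packets.append((t, length))
    return packets

trace = generate_packets(5, rate_pps=1000.0)
print(trace)
```

Feeding such a trace into the algorithm models exercises both dimensions the slide names: the timestamps drive the event frequency, and the lengths (plus any attached header fields) drive the QoS treatment.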
22
Performance Models of Algorithms
  • It is assumed that packets have passed a header parser as well as filter,
    forwarding, and classifier stages when they enter the policer.
  • These stages are not modeled for the evaluation since:
  • their outcome (header fields, next-hop address/link, and a QoS class
    identifier) is constant independently of the chosen algorithms;
  • they do not affect the QoS-preservation behavior of the packet processor in
    terms of packet delay and buffer space;
  • they are candidates for special hardware blocks.

23
Performance Models of Algorithms
  • Policer
  • Nested token-bucket policer for green profiles and a single yellow profile
  • Link scheduler
  • WFQ-based scheduling: SCFQ, SPFQ, MD-SCFQ
  • Deficit round-robin packet scheduling
  • Queue manager
  • CYQ: QM with central yellow queue and tail drop
  • CYQ-enh.: enhanced QM with central yellow queue and tail drop
  • CYQ-RED: CYQ-enh. with RED congestion avoidance for yellow traffic
  • YQ-Fair: fair QM with per-flow yellow queues

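The nested token-bucket policer above can be sketched as two buckets checked in order: packets conforming to the inner (green) profile are marked green, packets conforming only to the outer (yellow) profile are marked yellow, and the rest are dropped. The rates and depths below are invented, and the presentation's actual nesting of multiple green profiles is simplified to one:

```python
# Two-level nested token-bucket policer sketch. Rates/depths are illustrative
# assumptions; the presentation nests several green profiles inside one yellow.

class TokenBucket:
    def __init__(self, rate, depth):
        self.rate, self.depth = rate, depth   # tokens/sec, max tokens (bytes)
        self.tokens, self.last = depth, 0.0

    def conform(self, t, length):
        # Refill proportionally to elapsed time, capped at the bucket depth.
        self.tokens = min(self.depth, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if length <= self.tokens:
            self.tokens -= length
            return True
        return False

class NestedPolicer:
    def __init__(self):
        self.green = TokenBucket(rate=1000.0, depth=1500)    # committed profile
        self.yellow = TokenBucket(rate=2000.0, depth=3000)   # excess profile

    def police(self, t, length):
        if self.green.conform(t, length):
            return "green"
        if self.yellow.conform(t, length):
            return "yellow"
        return "drop"

p = NestedPolicer()
print([p.police(0.0, 1000) for _ in range(4)])  # ['green', 'yellow', 'yellow', 'yellow']
```

A back-to-back burst drains the green bucket first, then spills into yellow, and is dropped once both are empty, which is exactly the green/yellow/drop coloring the queue managers above act on.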
24
Performance Models of Algorithms: Statistics
  • Information output by the algorithm models
  • Information about the operations performed:
  • CPU-like instructions: branch, register copy, address-offset calculations
  • Memory accesses
  • Priority-queue operations
  • Dynamic memory allocation

25
Performance Models of Algorithms: Statistics
  • Assess the QoS properties of the system:
  • Queue lengths
  • Dropped packets
  • Bucket levels
  • Virtual-time evolution
  • Delay due to queuing and scheduling

26
Counting methodology
  • Off-line counting of operations: for a given elementary packet-processing
    task (dequeue, enqueue, policing, etc.) specified in a programming language:
  • Detect the basic blocks.
  • Determine the control flow between the basic blocks.
  • Beginning from the entry point of the control flow of the overall task,
    determine the sets of active variables at the entry point and at the branch
    point of every basic block.
  • For every basic block:
  • Extract code dealing with priority queues and dynamic memory management;
    count priority-queue and dynamic-memory-allocation operations.
  • In the remaining code fragment:
  • Detect variables and constants which belong to the context information and
    which are not active.
  • Count the required memory accesses and address-offset calculations to read
    these variables from memory.
  • Count the required CPU operations.
  • Detect the context variables which have been set in the basic block.
  • Count the required write accesses and address-offset calculations to write
    these variables back to memory at the end of the basic block.
  • Assign the determined counter values to the basic block.

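The final tallying step above can be illustrated by representing a basic block as a list of abstract operations and bucketing them into the counter categories the statistics slides use (CPU operations, context-memory accesses, priority-queue and allocator calls). The operation names and the example block are invented, not the presentation's notation:

```python
from collections import Counter

# Illustrative per-basic-block operation counting. Operation and category names
# are assumptions chosen to mirror the statistics categories, not a real IR.

CATEGORIES = {
    "branch": "cpu", "add": "cpu", "copy": "cpu",
    "load_ctx": "mem_read", "store_ctx": "mem_write",
    "pq_insert": "pq", "pq_extract": "pq",
    "malloc": "alloc", "free": "alloc",
}

def count_block(ops):
    """Tally counter values for one basic block's abstract operation list."""
    return dict(Counter(CATEGORIES[op] for op in ops))

# Hypothetical basic block from an enqueue task.
enqueue_block = ["load_ctx", "add", "branch", "pq_insert", "store_ctx"]
print(count_block(enqueue_block))
```

Multiplying such per-block counters by the execution frequency of each block (obtained from the control flow under a traffic trace) yields the resource-load estimates the architecture models consume.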
27
Architecture Models
  • CPU Timing Model
  • RAM timing model
  • Priority Queue model
  • Dynamic Memory Allocation Model

28
Exploitation of RAM Resources
  • Specific characteristics and application areas of different RAM types
  • Impact of the memory controller on overall system performance
  • Influences on current DRAM performance:
  • Throughput of the interface to the processor
  • Delay properties of the underlying memory core

29
Performance Bottlenecks of RAMs
  • Delay of read/write memory access
  • SRAM:
  • Type of access
  • DRAM:
  • State of the RAM
  • Placement of data in RAM
  • Order of accesses

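Why DRAM delay depends on the RAM's state and on data placement can be sketched with a row-buffer model: an access to the currently open row pays only the column latency, while a different row additionally pays precharge and activate. The cycle counts are illustrative, not real device timings:

```python
# Row-buffer timing sketch for one DRAM bank. Cycle counts are invented
# placeholders, not datasheet values.

T_CAS, T_RCD, T_RP = 3, 3, 3   # column access, row activate, precharge (cycles)

class DramBank:
    def __init__(self):
        self.open_row = None

    def access(self, row):
        if row == self.open_row:
            return T_CAS                  # row-buffer hit: column access only
        cost = T_CAS + T_RCD              # row miss: activate the new row
        if self.open_row is not None:
            cost += T_RP                  # plus precharge of the open row
        self.open_row = row
        return cost

bank = DramBank()
print([bank.access(r) for r in (5, 5, 7)])  # [6, 3, 9]
```

This is why the order of accesses matters: a memory controller that reorders requests to group same-row accesses (and interleaves banks) can hide much of the activate/precharge cost, which is exactly the lever the memory-controller slides discuss.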
30
DRAM Timing
32
SRAM: Organization and Operation
  • Same operation modes as SDRAM
  • Optimized for speed
  • Used to implement caches
  • Increased pin count
  • More silicon area
  • Compared to DRAM:
  • higher worst-case power dissipation
  • more expensive

33
Synchronous DRAMs
  • VC SDRAM
  • Enhanced SDRAM
  • Synchronous Graphic RAM
  • DDR SDRAM and SGRAM
  • Direct Rambus DRAM

34
Memory Controller
  • Other design choices:
  • Integrated memory controller
  • Stand-alone controller
  • Synthesizable block

35
Memory Modelling
36
Stimuli
  • Choice of traffic patterns:
  • Public traffic traces from the Internet
  • Real traffic sources
  • Statistical source models

37
System for Evaluation