A Distributed Scalable Multithreaded Network Processor



1
A Distributed Scalable Multi-threaded Network
Processor
  • Behnam Robatmili
  • University of Tehran

2
Outline
  • Introduction
  • Design Principle
  • Architecture
  • Thread Unit
  • Decode Issue
  • Functional Units
  • Switches
  • Software Model
  • Conclusions

3
Introduction
Network processing limits: higher line speeds; more
complicated and flexible protocols.
4
The Best Solution
  • Application-Specific Instruction-set Processors
    (ASIP), or Specific Purpose Processors (SPP),
    called Network Processors.
  • These are special-purpose processors that are
    optimized for specific tasks.
  • The best examples of such processors are DSPs
    and I/O processors.

5
Comparison of different implementation methods
6
Comparison of different implementation
technologies
7
Design Principle
  • Designing a processor means designing its
    instruction set and its architecture, that is,
    the data path and controller.
  • Both the software and hardware properties of
    such a device are very important.
  • One must study the processing services required
    in networking.

8
Instruction Set Issues
  • Pattern Matching: prefix matching or exact
    matching, used frequently in lookup,
    classification and most other parts of routers.
  • Lookup: hash tables and tree- or trie-based
    lookup, needed in forwarding engines and many
    other parts.
  • Computation: calculating CRCs and checksums,
    needed for lower-layer processing that must be
    done efficiently on every input packet.
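
As an illustration of the prefix-matching and trie-based lookup mentioned above, here is a minimal Python sketch of longest-prefix matching over IPv4 addresses. It is not taken from the presentation: the class name `PrefixTrie`, the list-based node layout, and the example routes and port names are all assumptions made for this example.

```python
import ipaddress


class PrefixTrie:
    """Binary trie over IPv4 address bits (illustrative, not from the slides)."""

    def __init__(self):
        # Each node is [child_for_bit_0, child_for_bit_1, next_hop_or_None].
        self.root = [None, None, None]

    def insert(self, prefix, next_hop):
        net = ipaddress.ip_network(prefix)
        bits = int(net.network_address)
        node = self.root
        # Walk one level per prefix bit, most significant bit first.
        for i in range(net.prefixlen):
            bit = (bits >> (31 - i)) & 1
            if node[bit] is None:
                node[bit] = [None, None, None]
            node = node[bit]
        node[2] = next_hop

    def lookup(self, address):
        bits = int(ipaddress.ip_address(address))
        node = self.root
        best = node[2]  # a /0 default route, if one was inserted
        for i in range(32):
            node = node[(bits >> (31 - i)) & 1]
            if node is None:
                break
            if node[2] is not None:
                best = node[2]  # remember the longest match seen so far
        return best


# Hypothetical routing table.
table = PrefixTrie()
table.insert("0.0.0.0/0", "default")
table.insert("10.0.0.0/8", "port-A")
table.insert("10.1.0.0/16", "port-B")
```

With this table, `table.lookup("10.1.2.3")` follows the /8 entry and then the more specific /16 entry, so the longer prefix wins. A hardware lookup unit would typically use a compressed or multi-bit trie instead of one bit per level.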

9
Instruction Set Issues (2)
  • Data Manipulation: bit-wise and byte-wise access
    to data for extracting protocol fields. It is
    used in parsing packets and implementing
    protocols.
  • Queuing: packet queuing and Quality of Service.
    Output or input packets must be queued before
    receiving the services they need.
  • Control Processing: managing tables (route tables
    or lookup tables) and using timers for
    implementing protocols and QoS.
  • Segmentation and Reassembly: often a large
    packet must be broken into smaller pieces of
    data called cells, such as ATM cells, and
    processing is then performed on these cells. In
    many routers this may be needed in the fast
    path, before the switch node.
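
The data manipulation and segmentation items above can be sketched in a few lines of Python. This is an illustrative sketch only: the helper names `extract_bits`, `segment` and `reassemble` are hypothetical, the header bytes are made up (the checksum field is not meant to be valid), and the 48-byte cell size is borrowed from the ATM cell payload size.

```python
def extract_bits(data: bytes, bit_offset: int, width: int) -> int:
    """Return `width` bits starting `bit_offset` bits into `data` (big-endian)."""
    value = int.from_bytes(data, "big")
    shift = len(data) * 8 - bit_offset - width
    return (value >> shift) & ((1 << width) - 1)


def segment(payload: bytes, cell_size: int = 48) -> list:
    """Split a packet into fixed-size cells; the last cell may be shorter."""
    return [payload[i:i + cell_size] for i in range(0, len(payload), cell_size)]


def reassemble(cells: list) -> bytes:
    """Concatenate cells back into the original packet."""
    return b"".join(cells)


# A made-up 20-byte IPv4 header used only to exercise the field extraction.
header = bytes.fromhex("4500003c1c4640004006b1e6c0a80001c0a800c7")

version = extract_bits(header, 0, 4)        # IPv4 version field
ihl = extract_bits(header, 4, 4)            # header length in 32-bit words
total_length = extract_bits(header, 16, 16)
ttl = extract_bits(header, 64, 8)
protocol = extract_bits(header, 72, 8)

cells = segment(bytes(100))                 # 100-byte dummy packet
```

A dedicated bit-field instruction would do the shift and mask of `extract_bits` in a single operation, which is the motivation for including such instructions in a network processor's instruction set.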

10
Intel IXP Additional Instructions
  • Simple RISC cores augmented with:
  • FIND_BSET: finds the first bit set in any 16-bit
    register field.
  • HASHx_64: performs a 64-bit hash.
  • CTX_ARB: swaps out the current context and wakes
    it on events.
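
To make the FIND_BSET description concrete, the following Python sketch emulates a find-first-bit-set operation on a 16-bit field. The slide does not say which end the scan starts from or what happens when no bit is set, so scanning from the least-significant bit and returning -1 for an empty field are assumptions of this example, not documented IXP behaviour.

```python
def find_first_bit_set(field: int) -> int:
    """Index of the lowest set bit in a 16-bit field, or -1 if none is set.

    Assumption: bit 0 is the least-significant bit and the scan starts there.
    """
    field &= 0xFFFF  # keep only the 16-bit field
    if field == 0:
        return -1
    # field & -field isolates the lowest set bit; bit_length gives its index + 1.
    return (field & -field).bit_length() - 1
```

In software this takes several operations per call; a single-cycle instruction is useful when, for example, picking the next ready queue from a bitmap.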

11
Miscellaneous Example Instructions
  • Port-oriented bit-wise move instructions, such as
    move port1_at_bit1 port2_at_bit2
  • Block transfer instructions to solve the problems
    associated with DMA
  • Compute-and-jump instructions, such as
    subb r1 1h L1
  • FSM support and thread-switch support

12
Architecture Issues
  • GOLDEN RULE
  • In a packet processing system, most of the
    processing performed on a packet is independent
    of other packets.
  • NP architectures can exploit parallelism at the
    instruction level (ILP) and at the packet level
    (PLP).
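
The golden rule above is what makes packet-level parallelism straightforward: if each packet is processed independently, the work can be spread over several workers without coordination. The Python sketch below illustrates this with a thread pool. The per-packet task (a 16-bit ones' complement checksum in the style of the Internet checksum) and the sample packets are assumptions chosen for the example, and Python threads demonstrate the independence of the work, not a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def process_packet(packet: bytes) -> int:
    """Stand-in for per-packet work: 16-bit ones' complement checksum."""
    total = 0
    for i in range(0, len(packet), 2):
        word = packet[i:i + 2].ljust(2, b"\x00")  # pad an odd trailing byte
        total += int.from_bytes(word, "big")
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Hypothetical packets: four 8-byte payloads with different contents.
packets = [bytes([n]) * 8 for n in range(1, 5)]

# Packet-level parallelism: each packet is handled by whichever worker is free.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel_results = list(pool.map(process_packet, packets))

# The same work done one packet at a time gives identical results.
sequential_results = [process_packet(p) for p in packets]
```

Because `process_packet` reads no shared state, the parallel and sequential results match regardless of how the packets are scheduled.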

13
Architecture Example (Agere)
14
Architecture Example (TOP Core)
15
Architecture Example (Intel IXP)
16
Design Architecture
A Simultaneous Multithreading Processor
17
Architecture
  • In this processor:
  • The Thread Unit holds a separate register file,
    program counter, stack and status register for
    each thread.
  • The instruction cache and functional units are
    shared between hardware threads.

18
Thread Unit
Exploits instruction-level parallelism within each
thread. Holds separate resources for each thread.
19
Decode Issue
20
Functional Units
21
Dispatcher / Collector
  • Simple switch fabric that provides TLP among the
    different threads in the system.
  • Accesses and controls the functional units.
  • Unique input frames.
  • Dispatcher: Instruction Frame
  • Collector: Result Frame
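
One way to picture the dispatcher and collector is as a pair of routing steps around the shared functional units: instruction frames go in tagged with their thread, result frames come back carrying the same tag. The Python sketch below is a rough model under that reading. The slides only name the two frame types, so the frame fields, the three functional units and the queue-per-unit layout are all assumptions.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class InstructionFrame:  # hypothetical layout; the slides only name the frame
    thread_id: int
    unit: str
    a: int
    b: int


@dataclass
class ResultFrame:  # hypothetical layout
    thread_id: int
    value: int


# Hypothetical functional units shared by all hardware threads.
FUNCTIONAL_UNITS = {
    "add": lambda a, b: (a + b) & 0xFFFFFFFF,
    "and": lambda a, b: a & b,
    "xor": lambda a, b: a ^ b,
}


def dispatch(frames):
    """Dispatcher: route each instruction frame to its unit's input queue."""
    queues = {name: deque() for name in FUNCTIONAL_UNITS}
    for frame in frames:
        queues[frame.unit].append(frame)
    return queues


def collect(queues):
    """Collector: execute queued frames and group result values by thread."""
    results = {}
    for name, pending in queues.items():
        op = FUNCTIONAL_UNITS[name]
        while pending:
            frame = pending.popleft()
            result = ResultFrame(frame.thread_id, op(frame.a, frame.b))
            results.setdefault(result.thread_id, []).append(result.value)
    return results


frames = [
    InstructionFrame(thread_id=0, unit="add", a=2, b=3),
    InstructionFrame(thread_id=1, unit="xor", a=0b1100, b=0b1010),
    InstructionFrame(thread_id=0, unit="and", a=0xFF, b=0x0F),
]
results = collect(dispatch(frames))
```

The thread tag on every frame is what lets instructions from different threads share the same functional units while their results still return to the right thread.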

22
Processor Stages
23
Software Model
Different packet processing operations run in
different static threads. Processes communicate
with each other using Queue Units.
24
Sample Scenario
  • T1 reads packets from the NIC, stores them in
    the data memory, and writes to Q1.
  • T2 dequeues Q1, performs the IP lookup operation
    on the packet, and puts it in Q2.
  • T3 dequeues Q2 and sends the packet to the switch
    fabric interface (IOU).
  • T4 reads packets from the fabric, stores them
    in the packet storage memory, and writes to Q3.
  • T5 dequeues Q3, applies QoS and, if the packet
    should be sent, sends it to the NIC.
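
The scenario above can be mimicked in software with one thread per stage and a FIFO queue between neighbouring stages. The Python sketch below is only a model of that structure. The route table, the QoS rule (drop priority-0 packets), the packet dictionaries, and the use of plain queues to stand in for the NIC and the switch fabric are all assumptions.

```python
import queue
import threading

STOP = object()  # sentinel that shuts the pipeline down stage by stage
ROUTES = {"10": "port-A", "192": "port-B"}  # hypothetical lookup table


def t1_receive(pkt):      # T1: copy the packet into "data memory"
    return dict(pkt)


def t2_lookup(pkt):       # T2: IP lookup, keyed on the first octet only
    pkt["next_hop"] = ROUTES.get(pkt["dst"].split(".")[0], "default")
    return pkt


def t3_to_fabric(pkt):    # T3: hand the packet to the switch fabric
    return pkt


def t4_from_fabric(pkt):  # T4: store the packet arriving from the fabric
    return pkt


def t5_qos(pkt):          # T5: hypothetical QoS rule, drop priority 0
    return pkt if pkt["prio"] > 0 else None


def stage(worker, inbox, outbox):
    """One static thread: dequeue, process, enqueue, until STOP arrives."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)
            break
        result = worker(item)
        if result is not None:
            outbox.put(result)


def run_pipeline(packets):
    workers = [t1_receive, t2_lookup, t3_to_fabric, t4_from_fabric, t5_qos]
    # queues: NIC rx, Q1, Q2, fabric, Q3, NIC tx
    queues = [queue.Queue() for _ in range(len(workers) + 1)]
    threads = [
        threading.Thread(target=stage, args=(w, queues[i], queues[i + 1]))
        for i, w in enumerate(workers)
    ]
    for t in threads:
        t.start()
    for pkt in packets:
        queues[0].put(pkt)
    queues[0].put(STOP)
    for t in threads:
        t.join()
    sent = []
    while True:
        item = queues[-1].get()
        if item is STOP:
            return sent
        sent.append(item)


sent = run_pipeline([
    {"dst": "10.1.2.3", "prio": 1},
    {"dst": "8.8.8.8", "prio": 0},
    {"dst": "192.168.0.1", "prio": 2},
])
```

Each stage only touches its own input and output queue, so the stages never need to lock shared packet state, which mirrors the role the Queue Units play between the static threads.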

25
Conclusions
  • Advantages: flexible, distributed, performance
  • Disadvantages: additional information sent and
    received