Network Processors and Web Servers

Provided by: Zhiy7
Learn more at: http://www.cs.ucr.edu

Transcript and Presenter's Notes


1
Network Processors and Web Servers
  • CS 213
  • LECTURE 17
  • From IBM Technical Report

2
Intel IXP2XXX Network Processor Architecture and Programming
Prof. Laxmi Bhuyan, Computer Science, UC Riverside
3
IXP2400
(Figure: IXP2400 block diagram. Recoverable labels: eight MEv2 microengines (MEv2 1 to MEv2 8); Intel XScale Core with 32K IC and 32K DC; DDR DRAM; QDR SRAM 1 and QDR SRAM 2, each with an E/D Q; Rbuf and Tbuf, each 64 @ 128B; 32b SPI-3 or CSIX media interface; PCI (64b) 66 MHz; Hash 64/48/128; Scratch 16KB; Gasket; CSRs: Fast_wr, UART, Timers, GPIO, BootROM/Slow Port; numeric labels 72, 64b, 32b and 18.)
  • Shared memory architecture
  • SRAM is not a cache, but stores frequently accessed data
  • The packet header goes to an ME and the payload goes to DRAM
  • Header and payload are combined and sent out after processing
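
The header/payload split described on this slide can be sketched in a few lines of Python. This is only an illustrative model, not IXP microcode: the fixed 20-byte header length, the `dram` dictionary, the buffer handle, and the TTL decrement used as the "processing" step are all assumptions made for the example (a real forwarder would also update the checksum).

```python
# Illustrative model of the data flow: header to a microengine,
# payload parked in DRAM, both recombined on transmit.
# HEADER_LEN, dram, and all function names are hypothetical.

HEADER_LEN = 20  # assumed fixed header size (a minimal IPv4 header)

dram = {}  # stands in for the DRAM packet buffer, keyed by a buffer handle


def receive(packet: bytes, handle: int) -> bytes:
    """Store the payload in DRAM and return only the header for processing."""
    header, payload = packet[:HEADER_LEN], packet[HEADER_LEN:]
    dram[handle] = payload
    return header


def process_header(header: bytes) -> bytes:
    """Stand-in for microengine work: decrement the IPv4 TTL byte (offset 8)."""
    ttl = header[8]
    return header[:8] + bytes([max(ttl - 1, 0)]) + header[9:]


def transmit(header: bytes, handle: int) -> bytes:
    """Recombine the processed header with the stored payload and free the buffer."""
    return header + dram.pop(handle)


packet = bytes(range(20)) + b"payload-bytes"
out = transmit(process_header(receive(packet, handle=1)), handle=1)
```

Only the 20 header bytes pass through the processing step; the payload is written once and read once, which is the point of keeping it out of the microengine.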
4
IXP2400 Full-Duplex OC-48 System Implementation
5
IXP2400 Chaining
  • Limited control memory per ME, so pipelining is necessary
  • Research: parallel/pipeline scheduling of application task graphs
(Figure: three IXP2400 processors chained together. Recoverable labels: 2.5 Gb/s SPI3 on the input side; 2.5 Gb/s CSIX-L1 links between and after the processors; a Control Plane Processor attached over PCI 64/66; each IXP2400 has DDR DRAM packet memory and QDR SRAM for queues and tables.)
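
To make the control-memory constraint concrete, here is a minimal sketch of splitting a linear chain of tasks into pipeline stages so that each stage fits in one ME's instruction store. It is a simple greedy heuristic written for illustration, not the scheduling research the slide refers to; the task names, their instruction counts, and the 4096-instruction budget are hypothetical.

```python
def partition_pipeline(tasks, budget):
    """Greedily pack a linear chain of (name, instruction_count) tasks into
    consecutive stages, each fitting within `budget` instructions."""
    stages, current, used = [], [], 0
    for name, size in tasks:
        if size > budget:
            raise ValueError(f"task {name!r} alone exceeds the control store")
        if used + size > budget:
            # The current stage is full: close it and start a new one.
            stages.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        stages.append(current)
    return stages


# Hypothetical task chain with made-up instruction counts.
tasks = [("rx", 1500), ("classify", 3000), ("meter", 1200),
         ("queue", 2500), ("tx", 1400)]
stages = partition_pipeline(tasks, budget=4096)
```

The heuristic only respects the code-size limit; it does not balance execution time across stages, which is what a real pipeline scheduler would also have to do.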
6
IXP2800
(Figure: IXP2800 block diagram. Recoverable labels: sixteen MEv2 microengines (MEv2 1 to MEv2 16); Intel XScale Core with 32K IC and 32K DC; RDRAM 1, RDRAM 2 and RDRAM 3, labeled "Stripe"; QDR SRAM 1 to QDR SRAM 4, each with an E/D Q; Rbuf and Tbuf, each 64 @ 128B; 16b SPI-4 or CSIX media interface; PCI (64b) 66 MHz; Hash 48/64/128; Scratch 16KB; Gasket; CSRs: Fast_wr, UART, Timers, GPIO, BootROM/Slow Port; numeric labels 64b, 16b and 18.)
7
IXP2800 and IXP2400 Comparison
Feature | IXP2400 | IXP2800
Frequency | 600/400 MHz | 1.4/1.0 GHz/650 MHz
DRAM Memory | 1 channel DDR DRAM, 150 MHz, up to 2 GB | 3 channels RDRAM, 800/1066 MHz, up to 2 GB
SRAM Memory | 2 channels QDR (or co-processor) | 4 channels QDR (or co-processor)
Media Interface | Separate 32-bit Tx and Rx, configurable to SPI-3, UTOPIA 3 or CSIX_L1 | Separate 16-bit Tx and Rx, configurable to SPI-4 P2 or CSIX_L1
Number of MicroEngines | 8 (MEv2) | 16 (MEv2)
Performance | Dual chip full duplex OC48 | Dual chip full duplex OC192
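
One way to read the frequency and performance rows together is as a per-packet cycle budget. The sketch below is back-of-the-envelope arithmetic under stated assumptions: a 40-byte minimum packet, line rates rounded to 2.5 Gb/s (OC-48) and 10 Gb/s (OC-192), and the highest listed clock taken as the microengine clock. None of these assumptions come from the slides.

```python
def cycles_per_packet(line_rate_bps, packet_bytes, clock_hz):
    """Clock cycles one engine can spend per packet if packets arrive
    back to back at the given line rate."""
    packets_per_second = line_rate_bps / (packet_bytes * 8)
    return clock_hz / packets_per_second


MIN_PACKET = 40  # bytes; assumed worst-case packet size

# Assumed round line rates and microengine clocks.
ixp2400 = cycles_per_packet(2.5e9, MIN_PACKET, 600e6)
ixp2800 = cycles_per_packet(10e9, MIN_PACKET, 1.4e9)

# With N microengines working in parallel, the aggregate budget scales by N.
ixp2400_total = ixp2400 * 8
ixp2800_total = ixp2800 * 16
```

Under these assumptions a single engine has well under a hundred cycles per minimum-size packet, which is why the work is spread over many microengines and hardware threads.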
8
MicroEngine v2
(Figure: MicroEngine v2 block diagram. Recoverable labels: Control Store, 4K/8K instructions; Local Memory, 640 words, addressed through LM Addr 0 and LM Addr 1 (2 per CTX); two banks of 128 GPRs; 128 Next Neighbor registers fed from the previous neighbor; 128 S Xfer In and 128 D Xfer In registers fed by the S-Push and D-Push buses; 128 S Xfer Out and 128 D Xfer Out registers driving the S-Pull and D-Pull buses; A_Operand and B_Operand paths into a 32-bit execution data path (add, shift, logical; multiply; find first bit; CRC unit with CRC remainder; pseudo-random number); CAM with TAGs 0-15, Lock 0-15, and Status and LRU Logic (6-bit), returning Status and Entry; ALU_Out to the next neighbor; other local CSRs, Timers and Timestamp.)
9
Microengine v2 Features Part 1
  • Clock Rates
      • IXP2400: 600/400 MHz
      • IXP2800: 1.4/1.0 GHz/650 MHz
  • Control Store
      • IXP2400: 4K instruction store
      • IXP2800: 8K instruction store
  • Configurable to 4 or 8 threads
      • Each thread has its own program counter,
        registers, signal and wakeup events
      • Generalized Thread Signaling (15 signals per
        thread)
  • Local Storage Options
      • 256 GPRs
      • 256 Transfer Registers
      • 128 Next Neighbor Registers
      • 640 32-bit words of local memory

10
Microengine v2 Features Part 2
  • CAM (Content Addressable Memory)
      • Performs parallel lookup on 16 32-bit entries
      • Reports a 9-bit lookup result
          • 4 state bits (software controlled, no impact on
            hardware)
          • Hit: entry number that hit; Miss: LRU entry
          • 4-bit index of CAM entry (hit) or LRU entry (miss)
      • Improves usage of multiple threads on the same data
  • CRC hardware
      • IXP2400: provides CRC_16, CRC_32
      • IXP2800: provides CRC_16, CRC_32, iSCSI, CRC_10
        and CRC_5
      • Accelerates CRC computation for ATM AAL/SAR, ATM
        OAM and storage applications
  • Multiply hardware
      • Supports 8x24, 16x16 and 32x32
      • Accelerates metering in QoS algorithms
          • DiffServ, MPLS
  • Pseudo-random number generation
      • Accelerates RED, WRED algorithms
  • 64-bit time-stamp and 16-bit profile count
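
The CAM behavior listed above (16 entries, a hit returning the matching entry, a miss returning the LRU entry, and 4 software-controlled state bits) can be modeled in software roughly as follows. This is a sketch for intuition only. The class and method names are hypothetical, the bit order used in `pack_result` is an assumption rather than the documented register layout, and treating a hit as a "use" that refreshes the LRU order is a modeling choice.

```python
class Cam:
    """Software model of a 16-entry CAM with LRU replacement."""

    def __init__(self, entries=16):
        self.tags = [None] * entries     # one 32-bit tag per entry
        self.state = [0] * entries       # 4 software-defined state bits
        self.lru = list(range(entries))  # least recently used entry first

    def lookup(self, tag):
        """Return (hit, index, state): the matching entry on a hit,
        the least recently used entry on a miss."""
        if tag in self.tags:
            index = self.tags.index(tag)
            self._touch(index)
            return True, index, self.state[index]
        return False, self.lru[0], 0

    def write(self, index, tag, state=0):
        """Load a tag into an entry and mark it most recently used."""
        self.tags[index] = tag
        self.state[index] = state & 0xF
        self._touch(index)

    def _touch(self, index):
        self.lru.remove(index)
        self.lru.append(index)


def pack_result(hit, index, state):
    """Pack into 9 bits as state(4) | hit(1) | index(4); bit order assumed."""
    return (state & 0xF) << 5 | int(hit) << 4 | (index & 0xF)


cam = Cam()
miss = cam.lookup(0xC0A80001)          # nothing loaded yet
cam.write(miss[1], 0xC0A80001, state=3)
hit = cam.lookup(0xC0A80001)           # now found in the entry just written
packed = pack_result(*hit)
```

A typical use is as a small software-managed cache: on a miss, the thread loads the data and writes the tag into the returned LRU entry, so other threads looking up the same tag find it instead of fetching it again.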

11
Intel XScale Core Overview
  • High-performance, Low-power, 32-bit Embedded RISC
    processor
  • Clock rate
      • IXP2400: 600 MHz
      • IXP2800: 700/500/325 MHz
  • 32 Kbyte instruction cache
  • 32 Kbyte data cache
  • 2 Kbyte mini-data cache
  • Write buffer
  • Memory management unit

12
Web Server Architecture
13
Dispatching Algorithms
  • Strategies for selecting the target server in a web
    cluster
  • Static: the fastest way to prevent a web server
    bottleneck, but does not consider the current state
    of the servers
  • Dynamic: outperforms static algorithms by making
    informed decisions, but collecting and analyzing
    state information incurs expensive overhead
  • Requirements: (1) low computational complexity,
    (2) full compatibility with web standards, (3)
    state information must be readily available
    without much overhead
    without much overhead

17
Cluster based Architecture
Needs a Web Switch
18
Distributed Architecture
19
Two Approaches
  • The approach depends on the OSI protocol layer at
    which the web switch routes inbound packets
  • Layer-4 switch: determines the target server
    when the TCP SYN packet is received. Also called
    content-blind routing because the server
    selection policy is not based on HTTP content at
    the application level
  • Layer-7 switch (web switch): the switch first
    establishes a complete TCP connection with the
    client, examines the HTTP request at the application
    level and then selects a server. It can support
    sophisticated dispatching policies, but incurs large
    latency for moving up to the application level. Also
    called a content-aware switch, or a Layer 5 switch
    in the TCP/IP protocol stack.
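
The difference between the two approaches comes down to what information is available when the server is chosen. The sketch below contrasts them under simplifying assumptions: the layer-4 selector hashes the client address seen in the SYN, and the layer-7 selector matches the URL path against prefix rules. The server names, the rule table, and the use of CRC-32 as the hash are all hypothetical choices for illustration.

```python
import zlib

SERVERS = ["server-a", "server-b", "server-c"]  # hypothetical back ends


def layer4_pick(client_ip: str, client_port: int) -> str:
    """Content-blind: choose a server from the SYN's source address alone."""
    key = f"{client_ip}:{client_port}".encode()
    return SERVERS[zlib.crc32(key) % len(SERVERS)]


# Hypothetical content rules: URL path prefix -> specialized server.
RULES = [("/cgi-bin/", "app-server"), ("/images/", "image-server")]
DEFAULT = "html-server"


def layer7_pick(http_request: str) -> str:
    """Content-aware: parse the request line, then match the URL path."""
    request_line = http_request.split("\r\n", 1)[0]
    _method, path, _version = request_line.split(" ")
    for prefix, server in RULES:
        if path.startswith(prefix):
            return server
    return DEFAULT


l4_choice = layer4_pick("10.0.0.7", 51234)
l7_choice = layer7_pick("GET /cgi-bin/form HTTP/1.1\r\nHost: www.yahoo.com\r\n\r\n")
```

`layer4_pick` can run as soon as the first packet arrives, whereas `layer7_pick` needs the request text, which only exists after the switch has completed the TCP handshake with the client; that is the source of the extra latency.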

21
Web Switch or Layer 5/7 Switch or Content Aware Switch
(Figure: a client request for www.yahoo.com arrives from the Internet at the switch as a packet with IP, TCP and APP. DATA layers; the request reads "GET /cgi-bin/form HTTP/1.1, Host: www.yahoo.com"; behind the switch are an Image Server, an Application Server and an HTML Server.)
  • Layer 4 switch
      • Content blind
      • Storage overhead
      • Difficult to administer
  • Content-aware (Layer 5/7) switch
      • Partitions the server database over different
        nodes
      • Increases performance through an improved hit rate
      • Servers can be specialized for certain types of
        requests

22
Latency
23
Throughput