Title: Network Processors and Web Servers
1 Network Processors and Web Servers
- CS 213
- LECTURE 17
- From IBM Technical Report
2 Intel IXP2XXX Network Processor Architecture and Programming
Prof. Laxmi Bhuyan, Computer Science, UC Riverside
3 IXP2400
[Block diagram: eight MEv2 microengines (MEv2 1-8); Intel XScale core (32K IC, 32K DC); DDR DRAM; two QDR SRAM channels (QDR SRAM 1-2); hash unit (64/48/128); 16KB scratch; Rbuf and Tbuf (64 @ 128B each); SPI-3 or CSIX media interface (32b); PCI (64b, 66 MHz); CSRs (Fast_wr, UART, Timers, GPIO, BootROM/Slow Port).]
- Shared-memory architecture
- SRAM is not a cache, but it stores frequently accessed data
- The packet header goes to an ME and the payload goes to DRAM; the two are combined and sent out after processing
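The header/payload split noted on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not the IXP2400 data path: the fixed 20-byte header, the dictionary standing in for DRAM, and the names `receive`, `process_header`, and `transmit` are all hypothetical.

```python
HEADER_LEN = 20  # assumed fixed header size in bytes (illustration only)

dram = {}  # stands in for DRAM packet buffers: handle -> payload bytes


def receive(packet: bytes, handle: int) -> bytes:
    """Park the payload in 'DRAM' and return only the header for processing."""
    header, payload = packet[:HEADER_LEN], packet[HEADER_LEN:]
    dram[handle] = payload
    return header


def process_header(header: bytes) -> bytes:
    """Hypothetical header-only work: decrement the hop-count byte.

    Byte 8 is the TTL field in an IPv4 header; the checksum update
    a real forwarder would need is omitted here.
    """
    out = bytearray(header)
    out[8] = (out[8] - 1) % 256
    return bytes(out)


def transmit(header: bytes, handle: int) -> bytes:
    """Recombine the processed header with the stored payload."""
    return header + dram.pop(handle)
```

The point of the split is that only the header is touched per packet; the payload is written once and read once.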
4 IXP2400 Full-Duplex OC-48 System Implementation
[System diagram; the only label preserved in the transcript is SDRAM.]
5 IXP2400 Chaining
- Control memory per ME is limited, so pipelining is necessary
- Research: parallel/pipeline scheduling of application task graphs
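To see why a limited control store forces pipelining, consider a minimal sketch that packs an ordered chain of tasks into pipeline stages, each of which must fit in one ME's instruction store. The 4K limit is the IXP2400 figure from these slides; the task names, instruction counts, and the greedy packing rule are assumptions for illustration. Scheduling a general task graph (the research problem named above) is harder than this linear case.

```python
CONTROL_STORE = 4096  # instructions per ME on the IXP2400 (from the slides)


def partition(tasks, capacity=CONTROL_STORE):
    """Greedily pack an ordered chain of (name, instruction_count) tasks
    into consecutive pipeline stages that each fit in one control store."""
    stages, current, used = [], [], 0
    for name, size in tasks:
        if size > capacity:
            raise ValueError(f"task {name} does not fit in one control store")
        if used + size > capacity:
            # Current stage is full: close it and start a new one.
            stages.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        stages.append(current)
    return stages


# Hypothetical application: instruction counts are made up.
tasks = [("rx", 1500), ("classify", 3000), ("meter", 1000),
         ("queue", 2500), ("tx", 1200)]
stages = partition(tasks)
```

With these made-up sizes the chain needs three stages, so at least three MEs would run as a pipeline.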
[Diagram labels: Control Plane Processor; PCI 64/66; three IXP2400 processors; 2.5 Gb/s CSIX-L1 links and a 2.5 Gb/s SPI-3 link; per-processor DDR packet memory (DRAM) and QDR SRAM for queues and tables.]
6 IXP2800
[Block diagram: sixteen MEv2 microengines (MEv2 1-16); Intel XScale core (32K IC, 32K DC); three RDRAM channels (RDRAM 1-3, striped); four QDR SRAM channels (QDR SRAM 1-4); hash unit (48/64/128); 16KB scratch; Rbuf and Tbuf (64 @ 128B each); SPI-4 or CSIX media interface (16b); PCI (64b, 66 MHz); CSRs (Fast_wr, UART, Timers, GPIO, BootROM/Slow Port).]
7 IXP2800 and IXP2400 Comparison
Feature | IXP2400 | IXP2800
Frequency | 600/400 MHz | 1.4/1.0 GHz/650 MHz
DRAM Memory | 1 channel DDR DRAM, 150 MHz, up to 2GB | 3 channels RDRAM, 800/1066 MHz, up to 2GB
SRAM Memory | 2 channels QDR (or co-processor) | 4 channels QDR (or co-processor)
Media Interface | Separate 32-bit Tx and Rx, configurable to SPI-3, UTOPIA 3 or CSIX_L1 | Separate 16-bit Tx and Rx, configurable to SPI-4 P2 or CSIX_L1
Number of MicroEngines | 8 (MEv2) | 16 (MEv2)
Performance | Dual chip full duplex OC48 | Dual chip full duplex OC192
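As a rough worked example of what the performance row implies, the sketch below estimates the aggregate microengine cycle budget per minimum-size packet at line rate. Several things here are assumptions rather than facts from the slides: the 40-byte minimum packet, the use of the highest listed ME clock (600 MHz and 1.4 GHz), all MEs of one chip serving a single direction, and no allowance for framing or memory stalls. The line rates are the standard SONET figures for OC-48 and OC-192.

```python
OC48_BPS = 2_488_320_000    # OC-48 line rate in bits per second
OC192_BPS = 9_953_280_000   # OC-192 line rate in bits per second
MIN_PACKET_BYTES = 40       # assumed minimum packet size


def cycles_per_packet(me_clock_hz, num_mes, line_rate_bps,
                      packet_bytes=MIN_PACKET_BYTES):
    """Aggregate ME cycles available per packet when packets arrive
    back to back at line rate (idealized; ignores all overheads)."""
    packets_per_sec = line_rate_bps / (packet_bytes * 8)
    return me_clock_hz * num_mes / packets_per_sec


ixp2400_budget = cycles_per_packet(600e6, 8, OC48_BPS)
ixp2800_budget = cycles_per_packet(1.4e9, 16, OC192_BPS)
```

Under these assumptions the budgets come out at roughly 617 cycles (IXP2400) and 720 cycles (IXP2800) per packet, summed over all MEs, which is why every instruction and memory reference on the fast path matters.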
8 MicroEngine v2
[Block diagram: control store (4K/8K instructions); local memory (640 words; LM Addr 0 and LM Addr 1, 2 per CTX); two banks of 128 GPRs; 128 next-neighbor registers; 128 S and 128 D transfer-in registers; 128 S and 128 D transfer-out registers; 32-bit execution data path (add, shift, logical; multiply; find first bit; CRC unit with CRC remainder; pseudo-random); CAM (TAGs 0-15, Lock 0-15, 6-bit status and LRU logic, status and entry outputs); other local CSRs, timers, timestamp; D-Push and S-Push buses in, D-Pull and S-Pull buses out; inputs from and outputs to the next neighbor.]
9 Microengine v2 Features Part 1
- Clock Rates
  - IXP2400: 600/400 MHz
  - IXP2800: 1.4/1.0 GHz/650 MHz
- Control Store
  - IXP2400: 4K instruction store
  - IXP2800: 8K instruction store
- Configurable to 4 or 8 threads
  - Each thread has its own program counter, registers, signal and wakeup events
  - Generalized Thread Signaling (15 signals per thread)
- Local Storage Options
  - 256 GPRs
  - 256 Transfer Registers
  - 128 Next Neighbor Registers
  - 640 32-bit words of local memory
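The per-thread program counter and wakeup events listed above can be pictured with a toy model in which each thread is a Python generator that gives up the engine whenever it waits on a signal. This is a sketch under simplifying assumptions, not the ME's actual arbiter: the round-robin order, the `worker` and `run` names, and the idea that a signal has always arrived by the time a thread is rescheduled are all hypothetical.

```python
from collections import deque


def worker(name, log, refs=2):
    """One thread: issue a memory reference, then swap out until signaled.
    The generator's suspended state plays the role of the thread's own
    program counter and registers."""
    for i in range(refs):
        log.append(f"{name}: issue ref {i}")
        yield "wait_signal"  # context swap point
    log.append(f"{name}: done")


def run(num_threads=4):
    """Round-robin over ready threads until all have finished."""
    log = []
    ready = deque(worker(f"ctx{n}", log) for n in range(num_threads))
    while ready:
        thread = ready.popleft()
        try:
            next(thread)          # run until the next wait
            ready.append(thread)  # assume its signal arrives before its next turn
        except StopIteration:
            pass                  # thread finished
    return log


trace = run(4)
```

While one context waits on its reference, the others issue theirs, so the engine stays busy instead of stalling on memory.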
10 Microengine v2 Features Part 2
- CAM (Content Addressable Memory)
  - Performs parallel lookup on 16 32-bit entries
  - Reports a 9-bit lookup result
    - 4 state bits (software controlled, no impact on hardware)
    - Hit: entry number that hit; Miss: LRU entry
    - 4-bit index of the CAM entry (hit) or LRU entry (miss)
  - Improves usage of multiple threads on the same data
- CRC hardware
  - IXP2400: provides CRC_16, CRC_32
  - IXP2800: provides CRC_16, CRC_32, iSCSI, CRC_10 and CRC_5
  - Accelerates CRC computation for ATM AAL/SAR, ATM OAM and storage applications
- Multiply hardware
  - Supports 8x24, 16x16 and 32x32
  - Accelerates metering in QoS algorithms
    - DiffServ, MPLS
- Pseudo Random Number generation
  - Accelerates RED, WRED algorithms
- 64-bit Time-stamp and 16-bit Profile count
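The CAM behavior described above (16 entries, a 9-bit result, LRU entry reported on a miss) can be modeled in software as follows. Treat this as a sketch: the bit layout of the result (state in bits 8-5, hit flag in bit 4, entry index in bits 3-0) is an assumption chosen to make the nine bits add up, the `Cam` class and its method names are hypothetical, and the hardware compares all entries in parallel rather than scanning a list.

```python
class Cam:
    """Software model of a 16-entry CAM with LRU tracking."""
    ENTRIES = 16

    def __init__(self):
        self.tags = [None] * self.ENTRIES
        self.state = [0] * self.ENTRIES
        self.lru = list(range(self.ENTRIES))  # front = least recently used

    def lookup(self, tag):
        """Return a 9-bit result (assumed layout): state<<5 | hit<<4 | entry."""
        if tag in self.tags:
            entry = self.tags.index(tag)
            self._touch(entry)
            return (self.state[entry] << 5) | (1 << 4) | entry
        # Miss: hit flag and state are 0, index is the LRU entry to replace.
        return self.lru[0]

    def write(self, entry, tag, state=0):
        """Load a tag and its 4 software-defined state bits into an entry."""
        self.tags[entry] = tag
        self.state[entry] = state & 0xF
        self._touch(entry)

    def _touch(self, entry):
        """Mark an entry as most recently used."""
        self.lru.remove(entry)
        self.lru.append(entry)


cam = Cam()
miss = cam.lookup(0xABCD)           # nothing loaded yet
cam.write(miss & 0xF, 0xABCD, state=3)
hit = cam.lookup(0xABCD)
```

A thread that misses can load the suggested LRU entry; later threads looking up the same tag then hit and can reuse the data already held locally, which is how the CAM helps multiple threads work on the same data.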
11 Intel XScale Core Overview
- High-performance, low-power, 32-bit embedded RISC processor
- Clock rate
  - IXP2400: 600 MHz
  - IXP2800: 700/500/325 MHz
- 32 Kbyte instruction cache
- 32 Kbyte data cache
- 2 Kbyte mini-data cache
- Write buffer
- Memory management unit
12 Web Server Architecture
13 Dispatching Algorithms
- Strategies to select the target server in the web cluster
- Static: the fastest way to keep the dispatcher from becoming a bottleneck, but does not consider the current state of the servers
- Dynamic: outperforms static algorithms by making informed decisions, but collecting and analyzing state information incurs expensive overhead
- Requirements: (1) low computational complexity; (2) full compatibility with web standards; (3) state information must be readily available without much overhead
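The static/dynamic distinction can be made concrete with two small dispatchers. Round robin and least connections are used here only as familiar examples of each class; the slides do not name specific policies, and the class and method names are hypothetical.

```python
import itertools


class RoundRobin:
    """Static policy: picks the next server in turn, ignoring server state."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Dynamic policy: picks the server with the fewest active connections,
    which requires the dispatcher to track per-server state."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        """Call when a connection to the server closes."""
        self.active[server] -= 1


rr = RoundRobin(["a", "b", "c"])
rr_picks = [rr.pick() for _ in range(4)]

lc = LeastConnections(["a", "b", "c"])
lc_picks = [lc.pick() for _ in range(3)]
lc.release("b")
lc_next = lc.pick()
```

The static dispatcher does constant work per request and keeps no state; the dynamic one reacts to the connection that closed on "b", at the cost of bookkeeping on every open and close.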
17 Cluster-based Architecture
- Needs a web switch
18 Distributed Architecture
19 Two Approaches
- The approach depends on the OSI protocol layer at which the web switch routes inbound packets
- Layer-4 switch: determines the target server when the TCP SYN packet is received. Also called content-blind routing, because the server selection policy is not based on HTTP content at the application level
- Layer-7 switch (web switch): the switch first establishes a complete TCP connection with the client, examines the HTTP request at the application level, and then selects a server. It can support sophisticated dispatching policies, but incurs large latency from moving up to the application level. Also called a content-aware switch, or a layer-5 switch in the TCP/IP protocol stack
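A layer-4 switch has to commit to a server at SYN time, before any HTTP bytes exist, so it can only use connection identifiers. The sketch below illustrates that constraint; the back-end addresses, the CRC-based hash, and the `on_packet` function are assumptions for the example, not a description of a particular product.

```python
import zlib

SERVERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical back ends
connections = {}  # (client_ip, client_port) -> chosen server


def on_packet(client_ip, client_port, syn=False):
    """Content-blind selection: bind the connection to a server on SYN,
    then send every later packet of that connection to the same server."""
    key = (client_ip, client_port)
    if syn:
        # Only addressing information is available at this point.
        digest = zlib.crc32(f"{client_ip}:{client_port}".encode())
        connections[key] = SERVERS[digest % len(SERVERS)]
    return connections.get(key)


first = on_packet("192.0.2.7", 5000, syn=True)
later = on_packet("192.0.2.7", 5000)
unknown = on_packet("198.51.100.9", 6000)
```

Because the requested URL plays no part in the choice, any server may receive any request, which is what "content-blind" means in practice.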
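21 Web Switch or Layer 5/7 Switch or Content Aware Switch
[Diagram: a request for www.yahoo.com arrives from the Internet at the switch as a packet with IP, TCP and application-data layers. The application data is "GET /cgi-bin/form HTTP/1.1, Host: www.yahoo.com". Behind the switch are an image server, an application server and an HTML server.]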
- Layer 4 switch
  - Content blind
  - Storage overhead
  - Difficult to administer
- Content-aware (Layer 5/7) switch
  - Partitions the servers' database over different nodes
  - Increases performance due to improved hit rate
  - Servers can be specialized for certain types of request
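Content-aware selection can be sketched as a lookup on the request path once the HTTP request line has been read. The request string is the one shown on this slide; the routing table, the server names, and the prefix-matching rule are assumptions for illustration.

```python
def parse_request_line(request: str):
    """Return (method, path) from the first line of an HTTP request."""
    first_line = request.split("\r\n", 1)[0]
    method, path, _version = first_line.split(" ", 2)
    return method, path


# Hypothetical partition of content across specialized servers.
ROUTES = [
    ("/cgi-bin/", "application-server"),
    ("/images/", "image-server"),
]
DEFAULT_SERVER = "html-server"


def select_server(request: str) -> str:
    """Pick a server from the URL path, which a layer-4 switch never sees."""
    _method, path = parse_request_line(request)
    for prefix, server in ROUTES:
        if path.startswith(prefix):
            return server
    return DEFAULT_SERVER


choice = select_server(
    "GET /cgi-bin/form HTTP/1.1\r\nHost: www.yahoo.com\r\n\r\n"
)
```

Since each class of request always lands on the same server, that server's cache holds only its own share of the content, which is the source of the improved hit rate.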
22 Latency
23 Throughput