Title: Gigabit IP Routing on Raw
1Gigabit IP Routing on Raw
- Gleb Chuvpilo, David Wentzlaff, and Saman Amarasinghe
- Laboratory for Computer Science
- Massachusetts Institute of Technology
2Talk at a Glance
- Project goal
- Raw processor overview
- Router design
- Rotating crossbar algorithm
- Implementation and results
- Future work
3We are on...
- Project goal
- Raw processor overview
- Router design
- Rotating crossbar algorithm
- Implementation and results
- Future work
4Project Goal
- Build a fast IP router on a general-purpose architecture
- Why?
- It's flexible → configure for changing protocols and services!
- It's cheap → economies of scale!
5We are on...
- Project goal
- Raw processor overview
- Router design
- Rotating crossbar algorithm
- Implementation and results
- Future work
6Technology
- Good news: miniaturization
- Smaller transistors
- Shorter clock cycles
- Bad news: long wires
- High data propagation delays
- Low energy efficiency
7Also
- Need sufficient I/O bandwidth for data streams
9The Raw Processor
10Raw Fast Facts
- 16 MIPS-like tiles on a single die
- 2 MB SRAM on-chip
- 1,080 signal I/O pins
- 201 Gbps of external chip bandwidth
11Plus
- Raw is easily scalable up to 1024 tiles!
12Raw Layout
13Communication Between Tiles
- Two identical static networks
- Two identical dynamic networks
14Static Networks
- Destinations known at compile time
- Message size known at compile time
- Cycle-by-cycle switch schedule
- Three-cycle nearest-neighbor send-to-use latency
- No processing overhead
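A cycle-by-cycle switch schedule can be pictured as a table fixed before run time. The following is a minimal sketch, not Raw's actual switch instruction set: the port names (N, E, S, W, P), the schedule contents, and the function `run_schedule` are all assumptions made for illustration.

```python
# Hypothetical model of one static-network switch: every cycle applies a
# routing entry that was fixed ahead of time (the "compile-time" schedule).
PORTS = ("N", "E", "S", "W", "P")  # assumed naming; P = the tile's processor

# One dict per cycle: output port -> input port that feeds it on that cycle.
SCHEDULE = [
    {"E": "W"},            # cycle 0: forward one word from west to east
    {"E": "W", "P": "N"},  # cycle 1: forward again, deliver a word from north
    {"S": "P"},            # cycle 2: send a word from the processor south
]

def run_schedule(schedule, inputs):
    """Apply the schedule; `inputs` maps an input port to its list of words."""
    queues = {port: list(words) for port, words in inputs.items()}
    outputs = {port: [] for port in PORTS}
    for routes in schedule:
        for out_port, in_port in routes.items():
            # No header is inspected: the route was decided before run time.
            outputs[out_port].append(queues[in_port].pop(0))
    return outputs

result = run_schedule(SCHEDULE, {"W": ["a", "b"], "N": ["c"], "P": ["d"]})
```

Because destinations and message sizes are known in advance, the words carry no routing header, which is where the "no processing overhead" property comes from in this model.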
15Static Network Illustrated
16Dynamic Networks
- Unpredictable events
- external asynchronous interrupts
- cache misses
- 15 to 30 cycle nearest-neighbor send-to-use latency (message header processing overhead)
17We are on...
- Project goal
- Raw processor overview
- Router design
- Rotating crossbar algorithm
- Implementation and results
- Future work
19Given Four Networks...
20...and Sixteen Tiles
21Need
- Map semi-dynamic communication to a programmatically static interconnect
23Take Four Tiles...
24Create Inputs...
25Take Four Tiles More...
26Create Outputs...
27Take Another Four Tiles...
28Create a Rotating Crossbar...
29Connect Inputs to Outputs...
30Take the Last Four Tiles...
31Create Route Lookup...
32Router!
[Figure: router layout with four input ports (In1 to In4) and four output ports (Out1 to Out4)]
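The partition built up over the preceding slides (four tiles each for inputs, outputs, the rotating crossbar, and route lookup) can be written down as a small table. This is a sketch under assumptions: the role names, the tile ids, and the idea of giving each port one tile per role are illustrative, and the ids say nothing about physical placement on the die.

```python
from collections import Counter

NUM_PORTS = 4
# Role names are assumed labels for the four groups of tiles.
ROLES = ("ingress", "lookup", "crossbar", "egress")

def build_tile_map(num_ports=NUM_PORTS):
    """Assign one tile per role to each port; tile ids are arbitrary labels."""
    tile_map = {}
    tile_id = 0
    for port in range(1, num_ports + 1):
        for role in ROLES:
            tile_map[tile_id] = (port, role)
            tile_id += 1
    return tile_map

tile_map = build_tile_map()
# Count how many tiles ended up in each role.
role_counts = Counter(role for _, role in tile_map.values())
```

With four ports and four roles, all sixteen tiles are used and every role gets exactly four tiles.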
33We are on...
- Project goal
- Raw processor overview
- Router design
- Rotating crossbar algorithm
- Implementation and results
- Future work
34Rotating Crossbar Algorithm
- Uses the static network
- create a set of possible switch configurations at compile time
- choose the necessary configuration at run-time, depending on incoming packets
- Similar to the Token Ring
- Token: the right of a crossbar tile to connect its input to any output
- Master tile: the tile with the token
- Slave tile: other three tiles
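The token idea can be illustrated with a small arbitration loop. This is a sketch, not the authors' implementation: it computes grants directly instead of selecting among precompiled switch configurations, and both the slave ordering (ring order after the master) and the one-step token rotation per round are assumptions. The names `arbitrate` and `run_rounds` are hypothetical.

```python
NUM_TILES = 4

def arbitrate(requests, master):
    """Grant outputs for one round.

    requests[i] is the output port wanted by crossbar tile i, or None.
    The master (token holder) is served first, so it always gets its output.
    Serving the slaves in ring order after the master is an assumption.
    """
    grants = {}
    taken = set()
    for offset in range(NUM_TILES):
        tile = (master + offset) % NUM_TILES
        wanted = requests[tile]
        if wanted is not None and wanted not in taken:
            grants[tile] = wanted
            taken.add(wanted)
    return grants

def run_rounds(request_rounds):
    """Pass the token to the next tile after every round (assumed policy)."""
    master = 0
    history = []
    for requests in request_rounds:
        history.append((master, arbitrate(requests, master)))
        master = (master + 1) % NUM_TILES
    return history

history = run_rounds([
    [2, 2, 0, None],  # tiles 0 and 1 both want output 2; tile 0 holds the token
    [2, 2, 0, None],  # same contention, but the token has moved to tile 1
])
```

In the first round tile 0 wins output 2 and tile 1 waits; in the second round the token has rotated, so tile 1 wins instead. Rotation is what keeps any one input from being starved in this model.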
36Rotating Crossbar Illustrated
37Rotating Crossbar Illustrated
38Rotating Crossbar Illustrated
39Rotating Crossbar Illustrated
40Rotating Crossbar Illustrated
41Steps
- Main processor
- read headers
- choose configuration
- program the switch
- Switch processor
- read headers
- execute configuration
42Steps
[Diagram: message sequence between the Main and Switch processors: headers_request, headers, choose_config, config, route_body, confirm]
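The six step names on this slide suggest a request/response exchange between the main processor and the switch processor. The sketch below traces one such exchange. The direction of each message, the packet representation, and the helper names `route_one_round` and `identity_lookup` are assumptions, since the slide gives only the step names.

```python
def route_one_round(packets, choose_config):
    """Trace one main/switch exchange; returns (trace, routed bodies)."""
    trace = []

    trace.append(("main->switch", "headers_request"))
    headers = [p["header"] for p in packets]   # switch side reads the headers
    trace.append(("switch->main", "headers"))

    config = choose_config(headers)            # main picks a configuration
    trace.append(("main", "choose_config"))
    trace.append(("main->switch", "config"))   # main programs the switch

    # Switch executes the configuration: input index -> output port.
    routed = {out: packets[inp]["body"] for inp, out in config.items()}
    trace.append(("switch", "route_body"))
    trace.append(("switch->main", "confirm"))
    return trace, routed

def identity_lookup(headers):
    """Hypothetical policy: the header value is the wanted output port."""
    config = {}
    for inp, out in enumerate(headers):
        if out not in config.values():  # skip inputs whose output is taken
            config[inp] = out
    return config

packets = [{"header": 1, "body": "A"}, {"header": 0, "body": "B"}]
trace, routed = route_one_round(packets, identity_lookup)
```

Here input 0 asks for output 1 and input 1 asks for output 0, so both bodies are routed in the same round.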
43We are on...
- Project goal
- Raw processor overview
- Router design
- Rotating crossbar algorithm
- Implementation and results
- Future work
44Implementation
- Tested in the cycle-accurate simulator of the Raw processor
- Five configurations with differing internal packet sizes
- Route lookup is a switch/case statement
- Buffering is assumed to be on the input outside of the chip
- Raw prototype clock speed assumed to be 225 MHz
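A route lookup written as a switch/case statement amounts to a fixed chain of branches, one per route. The sketch below shows the idea with an if/elif chain standing in for the `case` labels. The prefixes, the port numbers, and the function name `lookup_output_port` are hypothetical and are not the routes used in the talk.

```python
import ipaddress

def lookup_output_port(dst):
    """Return the output port (1 to 4) for a dotted-quad destination address."""
    addr = ipaddress.ip_address(dst)
    # Each branch plays the role of one `case` label in a switch statement.
    if addr in ipaddress.ip_network("10.0.0.0/8"):
        return 1
    elif addr in ipaddress.ip_network("172.16.0.0/12"):
        return 2
    elif addr in ipaddress.ip_network("192.168.0.0/16"):
        return 3
    else:
        return 4  # default route (assumed)

ports = [lookup_output_port(a) for a in ("10.1.2.3", "172.20.0.1", "192.168.5.5", "8.8.8.8")]
```

A hard-coded chain like this is fast for a handful of routes but does not scale to a large routing table.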
45Performance
[Chart: router performance; the only surviving label is the comparison point "Click (0.23)"]
46Analysis
- Performance is expected to grow 4 to 8 times with all static networks used
- Internal static network saturation is practically unreachable
- No hardware limitations have been encountered yet
47We are on...
- Project goal
- Raw processor overview
- Router design
- Rotating crossbar algorithm
- Implementation and results
- Future work
48Future Work
- Bigger multi-chip router layouts
- Faster Rotating Crossbar algorithms
- Efficient route look-up on larger routing tables
- Intermixing switch fabric with computation
- Virtual Private Networks (payload encryption)
- Intrusion Detection Systems (traffic sniffing)
- Per-flow compression (save bandwidth)
- Quality of Service
49And, More Importantly,...
50Conclusions
- Implemented a gigabit IP router on Raw
- Mapped dynamic communication to static network
- Can intermix switch fabric with computation
- High-bandwidth I/O allows the performance of custom ASIC processors
51Questions?
53Current Design Points
- Edge router or switch fabric of a core router, NOT a core router
- Why?
- External buffering of bandwidth-delay product → no Fair Queueing, etc.
- Route lookup within a single tile's cache (8k words, 32-bit word) → too small!
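Both limits above come down to simple arithmetic. The sketch below works them out under stated assumptions: "8k" is read as 8 × 1024, each route is assumed to occupy two words, and the link rate and round-trip time fed to `bandwidth_delay_bytes` are hypothetical example values, not figures from the talk.

```python
CACHE_WORDS = 8 * 1024   # "8k words" from the slide, read as 8 * 1024 (assumed)
WORD_BITS = 32           # 32-bit word, from the slide

cache_bytes = CACHE_WORDS * WORD_BITS // 8

# Assumption: each route needs two words (prefix plus next-hop information).
WORDS_PER_ROUTE = 2
max_routes = CACHE_WORDS // WORDS_PER_ROUTE

def bandwidth_delay_bytes(link_bps, rtt_ms):
    """Buffer needed to hold one round trip of traffic, in bytes."""
    # bits per second * milliseconds / 1000 gives bits; / 8 gives bytes.
    return link_bps * rtt_ms // 8000

# Hypothetical example: a 1 Gbps link with a 100 ms round-trip time.
buffer_bytes = bandwidth_delay_bytes(1_000_000_000, 100)
```

Under these assumptions the cache holds 32 KiB, or about four thousand routes, while a single port's bandwidth-delay buffer runs to megabytes, which is why both are pushed outside the tile in the current design.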
54Static vs. Dynamic
- Static networks stream high-bandwidth IP traffic
- Dynamic network delivers low-bandwidth control messages
- Decoupling of data and control communication