Title: Architectural Exploration: 802.11a Transmitter
1 Architectural Exploration: 802.11a Transmitter
- Arvind, Nirav Dave, Steve Gerding, Mike Pellauer
- Computer Science and Artificial Intelligence Laboratory
- Massachusetts Institute of Technology
- MIT-Nokia Architecture Group
- Helsinki, June 5, 2006
2 Why architectural exploration
- Architects are clever people and can think of a variety of designs
- But they often cannot determine which design is best for a given metric (e.g., power)
- There is too little time and manpower to take several designs far enough for proper evaluation
→ Guesswork instead of architectural exploration
New design tools can change all that
3 This talk
- Architectural exploration of an 802.11a transmitter
- The goal is to show that it is easy and economical to do so in Bluespec
- You don't have to know 802.11a or Bluespec to understand the talk
4 802.11a Transmitter Overview
[Figure: transmitter block diagram; recoverable labels: headers, data, 24 uncoded bits]
Must produce one OFDM symbol every 4 µsec
5 Combinational IFFT
All numbers are complex and represented as two sixteen-bit quantities. Fixed-point arithmetic is used to reduce area, power, ...
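As a rough illustration of the representation described above, the following Python sketch models a complex number as a pair of signed 16-bit fixed-point values and multiplies two of them. The Q1.15 format (15 fractional bits), the wrap-on-overflow behaviour, and all helper names are assumptions made for the example; the slide states only that each number is two sixteen-bit quantities.

```python
FRAC_BITS = 15  # assumed Q1.15 format; the slides do not name the format


def to_fixed(x):
    """Convert a float in [-1, 1) to a 16-bit fixed-point integer."""
    return wrap16(int(round(x * (1 << FRAC_BITS))))


def wrap16(v):
    """Wrap an integer into the signed 16-bit range (assumed overflow behaviour)."""
    return ((v + 0x8000) & 0xFFFF) - 0x8000


def fx_mul(a, b):
    """Multiply two fixed-point values and rescale back to 16 bits."""
    return wrap16((a * b) >> FRAC_BITS)


def cmul(a, b):
    """Multiply complex numbers stored as (real, imag) fixed-point pairs."""
    ar, ai = a
    br, bi = b
    real = wrap16(fx_mul(ar, br) - fx_mul(ai, bi))
    imag = wrap16(fx_mul(ar, bi) + fx_mul(ai, br))
    return (real, imag)


half = to_fixed(0.5)
product = cmul((half, half), (half, -half))  # (0.5 + 0.5i) * (0.5 - 0.5i)
```

Each complex multiply costs four 16-bit multiplies and two additions in this model, which is why sharing Radix-4 nodes has such a direct effect on area.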
6 Design Tradeoffs - 1
- We can decrease the area by multiplexing some circuits
It may be a win if the throughput requirements can be met without increasing the frequency
7 Design Tradeoffs - 2
- Power can be lowered by lowering the frequency
A lower frequency in turn permits a lower supply voltage, and $\text{power} \propto (\text{voltage})^2$
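The tradeoff above can be put into numbers with the standard CMOS dynamic-power model, in which power is proportional to switched capacitance × voltage² × frequency. The slide states only the voltage-squared relation; the frequency factor and the function name below are assumptions taken from that textbook model, and the figures are relative scale factors rather than measurements.

```python
def relative_dynamic_power(freq_scale, voltage_scale):
    """Dynamic power relative to a baseline design.

    Assumes P is proportional to V^2 * f with the switched capacitance
    unchanged; both arguments are ratios to the baseline (1.0 = same).
    """
    return freq_scale * voltage_scale ** 2


# Halving the clock alone halves the power; if the slower clock also lets
# the supply voltage drop to half, the power falls to one eighth.
clock_only = relative_dynamic_power(0.5, 1.0)
clock_and_voltage = relative_dynamic_power(0.5, 0.5)
```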
8 Combinational IFFT: Opportunity for reuse
Reuse the same circuit three times
9 Circular pipeline: Reusing the Pipeline Stage
[Figure labels: 64 4-way muxes; stage counter]
The 16 Radix-4 nodes can be shared, but not the three permutations; hence the need for muxes.
10 Superfolded circular pipeline: Just one Radix-4 node!
Designs with 2, 4, and 8 Radix-4 modules make sense too!
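To see what the folded variants cost in clock rate, the sketch below counts IFFT cycles per symbol for each number of shared Radix-4 nodes. It assumes each node completes one Radix-4 operation per cycle and considers the IFFT in isolation, so the results are lower bounds for the IFFT block only; the rest of the transmitter may impose a higher minimum clock. The function names are illustrative.

```python
RADIX4_OPS_PER_IFFT = 3 * 16  # three stages of 16 Radix-4 nodes
SYMBOL_TIME_US = 4.0          # one OFDM symbol every 4 microseconds


def ifft_cycles(num_nodes):
    """Cycles per IFFT when num_nodes Radix-4 nodes are shared (assumed 1 op/cycle)."""
    if RADIX4_OPS_PER_IFFT % num_nodes != 0:
        raise ValueError("node count must divide the 48 Radix-4 operations")
    return RADIX4_OPS_PER_IFFT // num_nodes


def min_clock_mhz(num_nodes):
    """Lowest IFFT clock (MHz) that still delivers one symbol every 4 us."""
    return ifft_cycles(num_nodes) / SYMBOL_TIME_US  # cycles per us equals MHz


def summary():
    """Return (nodes, cycles, min clock in MHz) for the designs on the slides."""
    return [(n, ifft_cycles(n), min_clock_mhz(n)) for n in (16, 8, 4, 2, 1)]
```

Under these assumptions the single-node design needs 48 cycles per symbol and therefore a 12 MHz clock, while the 16-node circular pipeline needs only 3 cycles.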
11 Which design consumes the least energy to transmit a symbol?
- Can we quickly code up all the alternatives?
- From a single source with parameters?
Not practical in traditional hardware description languages like Verilog/VHDL
12 Expressing the designs in Bluespec
13 Bluespec code: Radix-4 Node
    function Vector#(4,Complex) radix4(Vector#(4,Complex) t,
                                       Vector#(4,Complex) k);
       Vector#(4,Complex) m = newVector(),
                          y = newVector(),
                          z = newVector();
       m[0] = k[0] * t[0]; m[1] = k[1] * t[1];
       m[2] = k[2] * t[2]; m[3] = k[3] * t[3];
       y[0] = m[0] + m[2]; y[1] = m[0] - m[2];
       y[2] = m[1] + m[3]; y[3] = i * (m[1] - m[3]);
       z[0] = y[0] + y[2]; z[1] = y[1] + y[3];
       z[2] = y[0] - y[2]; z[3] = y[1] - y[3];
       return(z);
    endfunction
Polymorphic code: works on any type of numbers for which *, + and - have been defined
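For readers who want to experiment with the node, here is a minimal Python model of the same computation. The operators are not legible in the extracted slide text, so the signs below assume the usual radix-4 butterfly; with that assumption, and with unit twiddles, the node computes an unnormalized 4-point inverse DFT. Python's built-in `complex` stands in for the `Complex` type, and the `i` parameter is an addition made for the example.

```python
def radix4(t, k, i=1j):
    """Model of the Radix-4 node: multiply element-wise, then two butterfly layers.

    t and k are length-4 sequences; any element type supporting *, + and -
    works, mirroring the polymorphism noted on the slide.
    """
    m = [k[n] * t[n] for n in range(4)]
    y = [m[0] + m[2],
         m[0] - m[2],
         m[1] + m[3],
         i * (m[1] - m[3])]
    z = [y[0] + y[2],
         y[1] + y[3],
         y[0] - y[2],
         y[1] - y[3]]
    return z


# With all twiddles equal to 1 the node is a 4-point inverse DFT (no 1/4 scaling).
result = radix4([1, 2, 3, 4], [1, 1, 1, 1])
```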
14 Combinational IFFT: Can be used as a reference
[Figure labels: stage_f function; repeat it three times]
15 Bluespec Code for Combinational IFFT
    function SVector#(64, Complex) ifft (SVector#(64, Complex) in_data);
       //Declare vectors
       SVector#(4, SVector#(64, Complex)) stage_data = replicate(newSVector);
       stage_data[0] = in_data;
       for (Integer stage = 0; stage < 3; stage = stage + 1)
          stage_data[stage+1] = stage_f(fromInteger(stage), stage_data[stage]);
       return(stage_data[3]);
    endfunction
The code is unfolded to generate a combinational circuit
Stage function:
    function SVector#(64, Complex) stage_f(Bit#(2) stage,
                                           SVector#(64, Complex) stage_in);
       begin
          for (Integer i = 0; i < 16; i = i + 1)
          begin
             Integer idx = i * 4;
             let twid = getTwiddle(stage, fromInteger(i));
             let y = radix4(twid, stage_in[idx:idx+3]);
             stage_temp[idx]   = y[0]; stage_temp[idx+1] = y[1];
             stage_temp[idx+2] = y[2]; stage_temp[idx+3] = y[3];
          end
          //Permutation
          for (Integer i = 0; i < 64; i = i + 1)
             stage_out[i] = stage_temp[permute[i]];
       end
       return(stage_out);
    endfunction
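The unfolding of the stage loop can be modelled in software as follows. This is a structural sketch only: the slides do not give the twiddle table or the permutation, so both are passed in as parameters, and the placeholder values used at the end (unit twiddles, identity permutation) are assumptions that do not yield a numerically correct 64-point IFFT. The butterfly signs in `radix4` likewise assume the usual radix-4 form.

```python
def radix4(t, k, i=1j):
    """Radix-4 node: element-wise multiply followed by two butterfly layers."""
    m = [k[n] * t[n] for n in range(4)]
    y = [m[0] + m[2], m[0] - m[2], m[1] + m[3], i * (m[1] - m[3])]
    return [y[0] + y[2], y[1] + y[3], y[0] - y[2], y[1] - y[3]]


def stage_f(stage, stage_in, get_twiddle, permute):
    """One IFFT stage: 16 Radix-4 nodes, then a 64-element permutation."""
    stage_temp = [0] * 64
    for i in range(16):
        idx = i * 4
        twid = get_twiddle(stage, i)
        stage_temp[idx:idx + 4] = radix4(twid, stage_in[idx:idx + 4])
    return [stage_temp[permute[i]] for i in range(64)]


def ifft_structure(in_data, get_twiddle, permute):
    """Apply stage_f three times, as the unfolded loop does."""
    stage_data = list(in_data)
    for stage in range(3):
        stage_data = stage_f(stage, stage_data, get_twiddle, permute)
    return stage_data


# Placeholders (hypothetical): every twiddle is 1 and nothing is permuted.
unit_twiddle = lambda stage, i: [1, 1, 1, 1]
identity = list(range(64))
out = ifft_structure([1] * 64, unit_twiddle, identity)
```

Swapping in a real twiddle table and permutation would leave `ifft_structure` unchanged, which is the sense in which the combinational version can serve as a reference for the folded designs.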
16 Synchronous pipeline
    rule sync_pipeline (True);
       inQ.deq();
       sReg1 <= f1(inQ.first());
       sReg2 <= f2(sReg1);
       outQ.enq(f3(sReg2));
    endrule
This is real IFFT code; just replace f1, f2 and f3 with stage_f code
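The behaviour of this rule can be mimicked with a small cycle-by-cycle simulation. The sketch assumes the rule fires once per cycle whenever inQ holds an element and that all register writes take effect together at the end of the cycle. It also uses `None` to mark a register that holds no data yet, a validity marker the slide code does not have; the function name is illustrative.

```python
from collections import deque


def run_sync_pipeline(inputs, f1, f2, f3):
    """Fire the rule once per input; return (outputs, final register state)."""
    in_q = deque(inputs)
    out_q = []
    s_reg1 = None  # None marks a register with no data yet (assumption)
    s_reg2 = None
    while in_q:  # the rule can fire only while inQ has an element
        x = in_q.popleft()                      # inQ.first(); inQ.deq()
        if s_reg2 is not None:
            out_q.append(f3(s_reg2))            # outQ.enq(f3(sReg2))
        next_reg2 = f2(s_reg1) if s_reg1 is not None else None
        next_reg1 = f1(x)
        s_reg1, s_reg2 = next_reg1, next_reg2   # registers update together
    return out_q, (s_reg1, s_reg2)


outputs, regs = run_sync_pipeline(
    [1, 2, 3, 4],
    lambda v: v + 1,   # stand-in for f1
    lambda v: v * 2,   # stand-in for f2
    lambda v: v - 3,   # stand-in for f3
)
```

In this model the last two inputs remain in the registers when inQ runs dry, because the single rule advances the pipeline only when a new element arrives.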
17 Folded pipeline
[Figure labels: x, inQ, outQ, stage, sReg]
    function f (stage, sx);
       case (stage)
          1: return f1(sx);
          2: return f2(sx);
          3: return f3(sx);
       endcase
    endfunction

    rule folded_pipeline (True);
       if (stage == 1)
          begin inQ.deq(); sxIn = inQ.first(); end
       else sxIn = sReg;
       sxOut = f(stage, sxIn);
       if (stage == 3) outQ.enq(sxOut);
       else sReg <= sxOut;
       stage <= (stage == 3) ? 1 : stage + 1;
    endrule
This is real IFFT code too ...
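A software model of the folded rule makes the time-for-area trade visible: one shared function unit is reused for all three stages, so each element takes three cycles. The sketch assumes one rule firing per cycle; the function name and the cycle counter are additions made for the example.

```python
from collections import deque


def run_folded_pipeline(inputs, f1, f2, f3):
    """Simulate the folded pipeline; return (outputs, cycles used)."""
    def f(stage, sx):
        # The case statement: pick the stage function for this cycle.
        return {1: f1, 2: f2, 3: f3}[stage](sx)

    in_q = deque(inputs)
    out_q = []
    stage = 1
    s_reg = None
    cycles = 0
    while in_q or stage != 1:  # keep going until the element in flight drains
        if stage == 1:
            sx_in = in_q.popleft()   # inQ.first(); inQ.deq()
        else:
            sx_in = s_reg
        sx_out = f(stage, sx_in)
        if stage == 3:
            out_q.append(sx_out)     # outQ.enq(sxOut)
        else:
            s_reg = sx_out
        stage = 1 if stage == 3 else stage + 1
        cycles += 1
    return out_q, cycles


f1 = lambda v: v + 1
f2 = lambda v: v * 2
f3 = lambda v: v - 3
outputs, cycles = run_folded_pipeline([1, 2], f1, f2, f3)
```

The outputs match the composition f3(f2(f1(x))) for each input, at a cost of three cycles per element instead of one.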
18 Expressing these designs in Bluespec is easy
- All these designs were done in less than one day!
- Area and power estimates?
How long will it take to write these designs in
Verilog? VHDL? SystemC?
19 Bluespec Tool flow
[Figure: tool-flow diagram. Recoverable labels: Bluespec SystemVerilog source; Bluespec Compiler; Verilog 95 RTL; C; Cycle Accurate; Verilog sim; RTL synthesis; Bluespec C sim; VCD output; gates; Debussy Visualization; FPGA; Sequence Design PowerTheater]
20 802.11a Transmitter: Synthesis results for various IFFT designs
TSMC 0.18 micron; numbers reported are before place and route. Some areas will be larger after layout.
21 Algorithmic Improvements
1. All three permutations can be made identical → more saving in area
2. One multiplication can be removed from Radix-4
22 802.11a Transmitter: Synthesis results, old vs. new IFFT designs
[Table not recovered; surviving annotations: "???" and "expected"]
TSMC 0.18 micron; numbers reported are before place and route.
23 802.11a Transmitter: Synthesis results with new IFFT designs
TSMC 0.18 micron; numbers reported are before place and route.
24 802.11a Transmitter with new IFFT designs: Power Estimates
Work in progress
Column definitions:
- c3 = raw data collected by Sequence Design PowerTheater
- c4 = min clock × scaling factor
- c5 = c4 × c3 / 100 MHz / voltage scaling (10)
- c6 = c5 × 4 µsec
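The column definitions above chain together as shown in the sketch below. The units are an interpretation rather than something the slide states: c3 is read as power in mW measured at 100 MHz, the clocks as MHz, and c6 as energy per 4 µs symbol in nJ (mW × µs). The function name and the sample inputs are hypothetical and are not results from the talk.

```python
def power_estimate_columns(c3_raw_power_mw, min_clock_mhz,
                           scaling_factor=1.0, voltage_scaling=10.0,
                           symbol_time_us=4.0):
    """Compute columns c4, c5 and c6 from the slide's definitions.

    c4 = min clock x scaling factor
    c5 = c4 x c3 / 100 MHz / voltage scaling
    c6 = c5 x symbol time
    """
    c4 = min_clock_mhz * scaling_factor
    c5 = c4 * c3_raw_power_mw / 100.0 / voltage_scaling
    c6 = c5 * symbol_time_us
    return c4, c5, c6


# Hypothetical inputs chosen only to exercise the arithmetic.
c4, c5, c6 = power_estimate_columns(c3_raw_power_mw=100.0, min_clock_mhz=1.0)
```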
25 Summary
- It is essential to do architectural exploration for better (area, power, performance, ...) designs.
- It is possible to do so with new design tools and methodologies.
- Better and faster tools for estimating area, timing and power would dramatically increase our capability to do architectural exploration.
Thanks