Title: DSPs for future wireless systems
1DSPs for future wireless systems
2Motivation
[Figure: block diagram of a wireless mobile device built as an add-on PCMCIA network interface card: RF unit, A/D and D/A converters, programmable baseband communications processor, higher layers]
- Mobile: switch between standards and between parameters
- Base-station: varying number of users with different parameters
3The problem
4An approach for the solution
- Algorithms well understood at VLSI level
- Can design real-time systems.
- Pushing it higher in the chain
- Current DSPs not powerful enough for our application
- Using the IMAGINE simulator to see what kind of architecture features would be useful in a future DSP for such applications.
5History of my work
Multiuser channel estimation, Multiuser detection
Distant Past
Algorithms
VLSI
Task-partitioning, Parallelism, Pipelining
FPGA
Recent Past
Conventional arithmetic, On-line arithmetic
DSP
Instruction set extensions, Co-processor support, Functional unit design and usage
Recent and Near Future
IMAGINE
6Contents
- Programmable architecture design using the IMAGINE simulator
- Multiuser estimation and detection implementation
- Performance comparisons and results
- Other extensions for possible integration
- Conclusions
7The IMAGINE architecture and simulator
- IMAGINE is a media signal processor
8Why the IMAGINE simulator?
- Great for media processing algorithms
- Has a VLIW-based cluster -- DSP comparisons
- A good base architecture 1024-pt FFT
- RSIM and SimpleScalar are more general-purpose architecture simulators
9What does the simulator give us?
- Execution time for the different parts of the code
- Functional unit utilization
- Insights into the bottlenecks
- Flexibility to add and remove functional units already present or design your own
- Graphical view of the schedule on the functional units
10Down-side
- 2-level C programming
- StreamC: transfers streams of data between main memory and stream register file (SRF)
- KernelC: transfers streams from the SRF to the ALU clusters
- Code optimized to the number of ALU clusters and the size of the data
- Compiler may fail register allocation if too many variables or functional units modified
11Contents
- Programmable architecture design using the IMAGINE simulator
- Multiuser estimation and detection implementation
- Performance comparisons and results
- Other extensions for possible integration
- Conclusions
12Typical workload representation (Base-station)
Wireless LAN
- Equalization
- FFT
- Viterbi decoding
W-CDMA
- Channel estimation
- Multiuser detection
- Viterbi/Turbo decoding
If you felt that life was too easy
- Multiple antennas
- Long spreading codes
- Space-Time codes
13Estimation/Detection (64,32 sizes)
- Multiuser Estimation: Kernels 1, 2, 3
- Massaging matrices for detection: Kernels 4, 5
- Multiuser Detection: Kernels 6, 7
14Kernels
- 1. Update: update Rbb, Rbr
- 2. Mmult: multiply Rbb and A
- 3. Iterate: gradient descent
- 4. MmultL: calculate L
- 5. MmultC: calculate C
- 6. Mf: Matched Filter
- 7. Pic: one Parallel Interference Cancellation stage
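As a rough illustration of what the last two kernels compute, here is a minimal sketch of a matched filter followed by one parallel interference cancellation stage. It is not the implementation from the talk: it assumes synchronous users, real-valued BPSK bits, no noise, and amplitudes already known from channel estimation. The function names (matched_filter, cross_correlation, pic_stage) and the toy spreading codes are made up for this example, and the L and C matrices of kernels 4 and 5 are not reproduced.

```python
def sign(x):
    """Hard decision: map a soft value to a +1 / -1 bit."""
    return 1 if x >= 0 else -1


def matched_filter(codes, r):
    """Correlate the received chips r with each user's spreading code."""
    return [sum(c * x for c, x in zip(code, r)) for code in codes]


def cross_correlation(codes):
    """R[k][j] = inner product of the codes of users k and j."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in codes] for ci in codes]


def pic_stage(y, R, amps, d_prev):
    """One PIC stage: subtract every other user's estimated contribution
    from each matched-filter output, then take a new hard decision."""
    K = len(y)
    d_next = []
    for k in range(K):
        interference = sum(R[k][j] * amps[j] * d_prev[j]
                           for j in range(K) if j != k)
        d_next.append(sign(y[k] - interference))
    return d_next


# Toy near-far scenario (hypothetical values): user 1 is three times stronger.
codes = [[1, 1, 1, 1], [1, 1, 1, -1]]
amps = [1, 3]
bits = [-1, 1]

# Noise-free received signal: sum over users of amplitude * bit * code.
r = [sum(amps[k] * bits[k] * codes[k][n] for k in range(2)) for n in range(4)]

y = matched_filter(codes, r)          # soft outputs
d0 = [sign(v) for v in y]             # matched-filter decisions
R = cross_correlation(codes)
d1 = pic_stage(y, R, amps, d0)        # decisions after one PIC stage
```

In this toy case the strong user pushes the weak user's matched-filter output to the wrong sign, and the single cancellation stage restores the transmitted bits. Each stage is a matrix-vector product plus a subtraction, which is why this kind of workload maps naturally onto multiply and add units.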
15Kernel 2 (mmult) for 3, 2
- Divider not being utilized
- Adders have limited FU utilization
- O(N^3), O(N^3)
- Multipliers 100 in loop
- Replace / with
16Kernel 2 (mmult) for 3, 3
- Better adder utilization
- Needs sufficient registers for scaling
- Register allocation may fail
- Code may also need slight tuning of variables for optimization
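The cubic operation counts for the matrix multiply kernel can be checked with a small sketch. This is a plain triple-loop multiply written for illustration only, not the KernelC code from the talk; the function name mmult_counted and the counters are invented for this example.

```python
def mmult_counted(A, B):
    """Multiply A (n x m) by B (m x p) with the textbook triple loop,
    counting the multiplies and adds issued in the inner loop."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    mults = 0
    adds = 0
    for i in range(n):
        for j in range(p):
            acc = 0
            for k in range(m):
                acc += A[i][k] * B[k][j]   # one multiply and one add
                mults += 1
                adds += 1
            C[i][j] = acc
    return C, mults, adds


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C, mults, adds = mmult_counted(A, B)
# For square N x N inputs both counters equal N**3 (here 2**3 = 8).
```

Since the inner loop issues exactly one multiply per add, the balance between the number of adders and multipliers in a cluster is likely to determine which unit type is kept busy. That is a general observation about the loop, not a result taken from the simulator.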
17Contents
- Programmable architecture design using the IMAGINE simulator
- Multiuser estimation and detection implementation
- Performance comparisons and results
- Other extensions for possible integration
- Conclusions
18FU utilization on each cluster
Time for detection at 128 Kbps for each of 32 users at 500 MHz: 4000 cycles
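One way to read these numbers is as a per-bit cycle budget: the clock rate divided by the per-user bit rate. The sketch below works through that arithmetic. It assumes 128 Kbps means 128,000 bits per second and that one bit per user is detected per bit period; the helper names are hypothetical.

```python
def cycles_per_bit(clock_hz, bit_rate_bps):
    """Number of clock cycles that elapse during one bit period."""
    return clock_hz / bit_rate_bps


def seconds_for_cycles(cycles, clock_hz):
    """Wall-clock time taken by a given number of cycles."""
    return cycles / clock_hz


clock_hz = 500e6       # 500 MHz clock
bit_rate = 128e3       # 128 Kbps per user (assumed to be 128,000 bits/s)

budget = cycles_per_bit(clock_hz, bit_rate)        # about 3906 cycles
bit_period = 1.0 / bit_rate                        # about 7.8 microseconds
time_4000 = seconds_for_cycles(4000, clock_hz)     # about 8 microseconds
```

Under these assumptions the budget is roughly 3,900 cycles per bit, which is the same order as the 4000 cycles quoted on the slide.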
19Comparisons with DSPs
[Figure: execution time (in seconds, log scale from 10^-6 to 10^-2) versus number of users (0 to 35). Curves: single DSP implementation, 2 DSP implementation, our architecture based on Imagine. Target data rate: 128 Kbps/user.]
20Current work
- Evaluating performance of wireless communication algorithms such as estimation, detection and decoding on this architecture
- Studying bottlenecks, functional unit design needed to attain real-time
- The insights gained from the design can also be applied to other processors such as DSPs.