DSPs in emerging wireless systems

Description:

Data re-ordering for Viterbi: X(0) X(2) X(4) X(6) X(8) X(10) X(12) X(14) X(1) X(3) X(5) ... All data re-arrangement problems share a common communication pattern ...

Provided by: Srid

1
DSPs in emerging wireless systems
2
Motivation

Software solutions are becoming important in the physical layer:
Multi-standard systems
Algorithms tailored to the environment, SNR, etc.
Flexible parameters for spreading and coding
On current-generation DSPs, the computations exceed real-time requirements by more than 2 orders of magnitude

3
Current approaches

HW/SW co-design:
Maximize programmability in the DSP
Move complex tasks onto co-processors
Example: TI C6416, with Viterbi and Turbo decoding co-processors
How will this scale to 4G?
Keep adding co-processors?

4
Our approach

As computational demands increase, the DSP's role is reduced to controlling co-processors
The final system is as inflexible as a traditional ASIC design
Investigating Scalable Wireless Application-specific Processors (SWAPs):
Identifying bottlenecks in the architectures and the gap relative to ASICs
Investigating solutions to bridge the gap

5
Scalable Wireless Application-specific Processors

Multi-cluster, stream-based architecture based on the Imagine media processor from Stanford
Why a streaming processor?
General-purpose processor (GPP) architectures are not well suited to media and wireless processing
Streaming processors have been shown to perform well on media applications such as FFT and FIR filtering
Media and communication algorithms are similar
Media architectures are popular: could they also serve as wireless architectures?

6
Scalable architectures
7
Programming model

Kernels: computation
KERNEL example1(istream<int> a,
                istream<int> b,
                ostream<int> c)
{
  loop_stream(a) {
    int ai, bi, ci;
    a >> ai;
    b >> bi;
    ci = ai * 2 + bi * 3;
    c << ci;
  }
}

Streams: communication
void main()
{
  Stream<int> a(256);
  Stream<int> b(256);
  Stream<int> c(256);
  Stream<int> d(1024);
  ...
  example1(a, b, c);
  example2(c, d);
  ...
}

8
Architecture evaluation

Benchmark kernels currently used: matrix-vector multiplication, FFT, Viterbi
Evaluating kernels in isolation was adequate for ASIC solutions
Programmable architectures must also investigate the interaction between kernels
Data may need to be re-ordered between kernels

9
Rice Benchmark for wireless systems

Investigates the chain of multiuser estimation, multiuser detection and Viterbi decoding algorithms

10
Bottlenecks in multi-cluster architectures

Packed data (subword parallelism)
Packing data is not always beneficial
Matrix transposes (interleaving)
Doing them in the ALUs may be cheaper and lower power
Cannot be avoided with packed matrices
Viterbi: shuffling of path metrics and survivor states using register exchange
Register exchange is needed for parallel computation

11
DSP comparisons
12
Packing in multi-cluster architectures
Kernel (in, out)
{
  half2 a;  // packed a
  int p, q;
  in >> a;
  p = mul_low(a, a);
  q = mul_high(a, a);
  out << p << q;
}
13
Matrix Transpose in Memory
14
Matrix Transpose in kernel
15
Data re-ordering for Viterbi
16
Performance loss due to re-ordering data for parallelism
Speedup per added cluster: 0.5, due to parallelizing the Viterbi trellis
17
Communication pattern

All data re-arrangement problems share a common communication pattern: an odd-even permutation of the data
Investigating solutions to this problem to bridge the gap between multi-cluster and single-cluster systems
