A System Solution for High-Performance, Low-Power SDR


1
A System Solution for High-Performance, Low-Power SDR
  • Yuan Lin¹, Hyunseok Lee¹, Yoav Harel¹, Mark Woh¹,
    Scott Mahlke¹, Trevor Mudge¹ and Krisztian Flautner²
  • ¹Advanced Computer Architecture Laboratory,
    University of Michigan
  • ²ARM, Ltd.

2
SDR Design Challenges
  • Hardware design challenges
      • High computational throughput (40 Gops)
      • Low power consumption (200 mW)
      • Meet real-time requirements
  • DSP programming support
      • System-level development: inter-algorithm communication
      • Algorithm-level development: efficient DSP representations

3
SDR Benchmark Design Analysis
4
W-CDMA Protocol (2 Mbps)
5
W-CDMA Characteristics
  • Plenty of vector parallelism
  • 8-bit and 16-bit DSP algorithms
  • Multiplication is not dominant
  • No floating-point operation
  • Small instruction/data memory
  • Has periodic real-time tasks

6
802.11a Protocol (24 Mbps)
7
802.11a Characteristics
  • Similar to W-CDMA
  • Plenty of vector parallelism
  • No floating-point operation
  • Small instruction/data memory
  • Different from W-CDMA
  • Mostly 16-bit DSP algorithms
  • Multiplication is more dominant
  • No periodic real-time tasks

8
SDR Processor Architecture Design
9
System Architecture Design Tradeoffs
10
System Architecture Design Tradeoffs
Number of processing elements × SIMD width,
for W-CDMA 2 Mbps (51.2 GOP/s), 90 nm, 1 V @ 400 MHz
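The tradeoff plot itself is not in the transcript, but the operating point in its caption already bounds the design space. The following is a rough sizing sketch, assuming that every SIMD lane completes one operation per clock cycle and that only power-of-two PE counts are considered. Both assumptions, and all function names, are illustrative and not taken from the slides.

```python
# Back-of-the-envelope sizing for the PE count x SIMD width tradeoff.
# Assumption (not from the slides): every SIMD lane completes one
# operation per clock cycle, so lanes needed = (ops/s) / (cycles/s).

THROUGHPUT_OPS_PER_S = 51_200_000_000  # 51.2 GOP/s for W-CDMA 2 Mbps
CLOCK_HZ = 400_000_000                 # 400 MHz


def lanes_required(throughput_ops_per_s: int, clock_hz: int) -> int:
    """Total SIMD lanes needed across all PEs, rounded up."""
    return -(-throughput_ops_per_s // clock_hz)  # ceiling division


def candidate_configs(total_lanes: int, max_pes: int = 16):
    """Yield (num_pes, simd_width) pairs that supply at least total_lanes."""
    num_pes = 1
    while num_pes <= max_pes:
        yield num_pes, -(-total_lanes // num_pes)
        num_pes *= 2


if __name__ == "__main__":
    lanes = lanes_required(THROUGHPUT_OPS_PER_S, CLOCK_HZ)
    for num_pes, width in candidate_configs(lanes):
        print(f"{num_pes:2d} PEs x {width:3d}-wide SIMD")
```

Under these assumptions the target works out to 128 lanes in total, which four PEs would cover with a 32-wide SIMD unit each. Which point on that curve is cheapest in power and area is the question the slide's plot addresses, and this sketch does not model it.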
11
Our SDR System Architecture Design
  • Scalable system design
      • Standardized SoC interface
      • System interface supports multiple (potentially)
        heterogeneous PEs and memories
  • For W-CDMA and 802.11a
      • 4 homogeneous processing elements (PEs)
      • Dual pipelines: scalar pipeline and SIMD pipeline
      • Local scratchpad memory (no data cache)
      • Global scratchpad memory (64 KB)
      • Controller: ARM general-purpose processor

12
System Architecture Design
13
PE Design (Area < 1 mm², Power < 50 mW)
14
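Mapping DSP Algorithms: Filters
[Figure: filter block diagrams with input In, output Out and unit delays z^-1; a single-tap stage with coefficient b, and a four-tap filter with coefficients b0, b1, b2, b3]
spread Vin, Sin
shift z, z, up
mac z, Vin, Sin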
15
Mapping DSP Algorithms: Filters
spread Vin, Sin
shift z, z, up
mac z, Vin, Sin
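One plausible reading of the spread/shift/mac listing is a transposed-form FIR filter: the input sample is broadcast to all lanes, the vector of partial sums moves up one lane, and a lane-wise multiply-accumulate adds the new contributions. The sketch below models that reading in plain Python. The primitive semantics, the coefficient vector passed to `mac`, and the lane ordering are assumptions made for illustration; the slides do not define them.

```python
# Transposed-form FIR filter built from three vector primitives.
# The primitive names follow the slide listing; their exact semantics
# are assumed here and are not specified in the slides.


def spread(scalar, width):
    """Broadcast one scalar input sample to every SIMD lane."""
    return [scalar] * width


def shift_up(vec):
    """Move every lane up by one position; lane 0 is filled with zero."""
    return [0] + vec[:-1]


def mac(acc, a, b):
    """Lane-wise multiply-accumulate: acc[i] + a[i] * b[i]."""
    return [z + x * y for z, x, y in zip(acc, a, b)]


def fir(samples, taps):
    """Filter samples with coefficients taps = [b0, b1, ...]."""
    width = len(taps)
    coeffs = taps[::-1]   # lane i holds b[width - 1 - i]
    z = [0] * width       # partial sums, one per lane
    out = []
    for s in samples:
        vin = spread(s, width)
        z = shift_up(z)
        z = mac(z, vin, coeffs)
        out.append(z[-1])  # top lane holds the completed output sample
    return out
```

Feeding a unit impulse through `fir` returns the taps in order, which is a quick check that the lane ordering matches y[n] = b0·x[n] + b1·x[n-1] + ... for this model.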
16
Efficient Design
  • Wide SIMD width
  • Small register file with minimum ports
  • Small memories
  • Narrow system bus
  • Data path optimized for 8 bits
  • Vector shuffle reduces memory ports

17
Processing Element (PE) Design
  • Scalar pipeline
      • 16-bit data path
  • SIMD pipeline
      • 8-bit data path
      • 32x8 SIMD ALU
  • Software-controlled local scratchpad memory
      • 4 KB scalar memory
      • 4 KB SIMD memory
  • Inter-PE communication through DMA

20
802.11a PE Mapping
21
Power Results
  • Configuration
      • 4 PEs, 1 ARM (Cortex-M3) controller
      • Global scratchpad memory (64 KB)
      • 90 nm (1 V @ 400 MHz)
      • Synthesized conservatively

22
Area Results
23
SDR Programming Language Support
24
Software Development Flow
25
SPEX (Signal Processing EXtension)
  • Implemented as a library extension to C
  • System-level development
      • Supports concurrent DSP kernel function definitions
      • Channel variables for inter-kernel communication
  • Algorithm-level development
      • Native vector and matrix variables
      • Explicit DSP variable attribute definitions
      • Native vector and matrix operations

26
SPEX Overview
27
SPEX Example Code: Viterbi ACS
Concurrent DSP kernel definitions
void acs(void) {
    /* variable declaration */
    saturated char<64> metrics1, metrics2;
    saturated char<64> states;
    saturated char<64> t1, t2;
    while (!viterbi.stop()) {
        /* receiving data from BMC */
        metrics1 = bmc_to_acs.receive();
        metrics2 = bmc_to_acs.receive();
        /* add */
        metrics1 += states;
        metrics2 += states;
        /* compare and select */
        t1 = (metrics1(0,2,62), metrics2(0,2,62));
        t2 = (metrics1(1,2,63), metrics2(1,2,63));
        states(t1<t2) = t1;
        states(t1>t2) = t2;
        /* sending data to TB */
        acs_to_tb.send(states);
    }
}
28
SPEX Example Code: Viterbi ACS
(acs() code as on slide 27)
Native SIMD variable definition with explicit attributes
SPEX variables support:
  1. Saturated/overflow arithmetic
  2. Various variable bit-widths
  3. Vectors and matrices
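To illustrate why the saturated/overflow attribute matters for a declaration such as `saturated char<64>`, the sketch below compares the two behaviours on signed fixed-width lanes. It is a plain-Python model written for this transcript, not SPEX code, and the helper names are hypothetical.

```python
# Saturating versus wrapping addition on signed fixed-width lanes.
# Plain-Python model for illustration; helper names are hypothetical.


def lane_range(bits):
    """Smallest and largest value of a signed two's-complement lane."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def add_saturated(a, b, bits=8):
    """Lane-wise add that clamps results to the representable range."""
    lo, hi = lane_range(bits)
    return [max(lo, min(hi, x + y)) for x, y in zip(a, b)]


def add_wrapped(a, b, bits=8):
    """Lane-wise add that wraps around on overflow (two's complement)."""
    lo, _ = lane_range(bits)
    span = 1 << bits
    return [((x + y - lo) % span) + lo for x, y in zip(a, b)]


if __name__ == "__main__":
    a = [100, -100, 5]
    print(add_saturated(a, a))  # large sums clamp at the range limits
    print(add_wrapped(a, a))    # large sums wrap to the opposite sign
```

With 8-bit lanes, adding 100 to 100 clamps to 127 when saturating but wraps to -56 otherwise. A sign flip of that kind would corrupt an accumulated path metric, which is presumably why the ACS kernel declares its vectors as saturated.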
29
SPEX Example Code: Viterbi ACS
(acs() code as on slide 27)
Inter-kernel communication through channel operations
Channel types:
  1. FIFO queue
  2. Broadcast queue
  3. Sync/control channel
  4. Random-read FIFO queue
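The FIFO queue channel can be pictured as a bounded queue between two concurrently running kernels, with `send` and `receive` as the only points of contact. The sketch below models that idea with Python threads. The `FifoChannel` class, the stop sentinel, and the stand-in kernel bodies are assumptions for illustration and do not reflect how SPEX implements channels.

```python
# Two concurrent kernels connected by a bounded FIFO channel.
# Illustrative model only: FifoChannel, _STOP and the kernel bodies
# are hypothetical and not part of SPEX.
import queue
import threading

_STOP = object()  # sentinel that tells the consumer to finish


class FifoChannel:
    """Blocking FIFO with send/receive, loosely mirroring the slide's calls."""

    def __init__(self, depth=4):
        self._q = queue.Queue(maxsize=depth)

    def send(self, item):
        self._q.put(item)      # blocks while the channel is full

    def receive(self):
        return self._q.get()   # blocks while the channel is empty


def bmc_kernel(out_ch, num_blocks):
    """Producer: emits one small vector of stand-in branch metrics per block."""
    for i in range(num_blocks):
        out_ch.send([i] * 4)
    out_ch.send(_STOP)


def acs_kernel(in_ch, results):
    """Consumer: reduces each received vector to a single number."""
    while True:
        item = in_ch.receive()
        if item is _STOP:
            break
        results.append(sum(item))


def run_pipeline(num_blocks=5):
    ch = FifoChannel()
    results = []
    threads = [
        threading.Thread(target=bmc_kernel, args=(ch, num_blocks)),
        threading.Thread(target=acs_kernel, args=(ch, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because the queue is bounded, a fast producer stalls until the consumer catches up, so neither kernel needs to know the other's timing. The other three channel types on the slide would need different semantics and are not modelled here.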
30
SPEX Example Code: Viterbi ACS
(acs() code as on slide 27)
SPEX vector operations (Matlab-like C code)
Supports:
  1. SIMD arithmetic operations
  2. SIMD permutation
  3. SIMD predication
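The three operation classes map onto the three stages of the acs() kernel: the add step is SIMD arithmetic, the strided selections and concatenation are a permutation, and the masked assignments to `states` are predication. The sketch below walks one loop iteration in plain Python. It assumes that `v(start, step, stop)` selects a strided range with an inclusive stop and that lanes where `t1` equals `t2` keep their previous value; neither detail is stated on the slides.

```python
# One add-compare-select iteration modelled with plain Python lists.
# Assumptions (not from the slides): v(start, step, stop) is a strided
# selection with an inclusive stop, and ties leave a lane unchanged.


def saturate(value, lo=-128, hi=127):
    """Clamp a sum to the signed 8-bit range."""
    return max(lo, min(hi, value))


def strided(vec, start, step, stop):
    """Matlab-like strided selection with an inclusive stop index."""
    return vec[start:stop + 1:step]


def acs_step(states, branch1, branch2):
    n = len(states)

    # 1. SIMD arithmetic: add the state metrics to both branch metrics.
    m1 = [saturate(b + s) for b, s in zip(branch1, states)]
    m2 = [saturate(b + s) for b, s in zip(branch2, states)]

    # 2. SIMD permutation: gather even lanes and odd lanes, then join.
    t1 = strided(m1, 0, 2, n - 2) + strided(m2, 0, 2, n - 2)
    t2 = strided(m1, 1, 2, n - 1) + strided(m2, 1, 2, n - 1)

    # 3. SIMD predication: per-lane select under the masks t1<t2, t1>t2.
    new_states = list(states)
    for i, (a, b) in enumerate(zip(t1, t2)):
        if a < b:
            new_states[i] = a
        elif a > b:
            new_states[i] = b
    return new_states
```

The function works for any even vector length, so it can be checked on four lanes before scaling to the 64 lanes used on the slide.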

31
Summary
  • Hardware and software solutions for SDR
  • Hardware
      • 4 dual-issue asymmetric SIMD processing elements
      • Consumes 200 to 300 mW at 90 nm
      • Meets the performance requirements for W-CDMA and 802.11a
  • Software
      • SPEX provides efficient DSP algorithm and system
        implementation

32
Questions?