Analog Devices TigerSHARC - PowerPoint PPT Presentation

1
Analog Devices TigerSHARC DSP Family

Presented by Mike Lee and
Mike Demcoe
Date: April 8th, 2002

2
TigerSHARC Architectural Overview

High performance, 128-bit successor to the
ADSP-2106x SHARC family
ADSP-TS101S, the newest TigerSHARC DSP, operates
at 250MHz!
Multiple computational units
Two compute blocks, each containing a register
file, ALU, multiplier, and shifter.
Two additional integer ALUs
Two hardware loop counter registers
Can execute up to four independent 32-bit
instructions at a time
Or, eight 16-bit instructions
Very wide word widths for high precision
arithmetic
Designed to be used in a multiple processor
environment

3
TigerSHARC Architecture Overview (cont)

BTB (Branch Target Buffer) as a means of
alleviating issues with the deep pipeline
32-instruction, 4-way set-associative cache
User controlled Branch Prediction
Three 128-bit-wide blocks of memory, which provide
access to a program and two data operands without
causing instruction/data conflicts.
Load-store, Harvard architecture, like SHARC.
Native support for complex number instructions

4
The TS101S Architecture
5
Details of Multiple Compute Blocks

Two computational units, each containing:
Register file: multi-ported to allow multiple
accesses to registers in a single clock cycle
General purpose registers!
Contains 32 words, each word being 32 bits in
length.
ALU: fixed-point and floating-point
Multiplier: fixed-point and floating-point
Also features MAC (multiply-and-accumulate)
capabilities
Shifter: standard logical and arithmetic shifts
as well as bit manipulation

6
The TS101S Pipeline
7
Pipelines and Instruction Related Information

ADSP-21061
Three stage pipeline
20ns instruction cycle
SISD but can put instructions in parallel
ADSP-TS101S
Eight stage pipeline with IAB
4ns instruction cycle
MIMD and can also put instructions in parallel

8
Loops, Branching and Timers

ADSP-21061
Zero-overhead hardware loop support
Delayed Branching
One timer
ADSP-TS101S
Limited zero-overhead hardware loop support (two
hardware loop counters)
32-entry 4-way associative BTB cache with Branch
prediction
Two timers

9
Memory and Buses

ADSP-21061
1 Mbit dual ported SRAM
Shared by three buses (PM, DM, I/O)
PM and DM share a port while the I/O receives
its own
ADSP-TS101S
6 Mbit of SRAM (Quad Ported??)
User defined partitions
Each block is accessed by one 128-bit bus

10
Multiplication and other Nifty Tricks

ADSP-21061
MAC instructions (MRF and MRB)
Various precision output (32, 40, or 80 bit)
ADSP-TS101S
Each compute block has its own set of MAC
registers
Eight 16-bit MACs with 40-bit accumulation or two
32-bit MACs with 80-bit accumulation
Complex number MAC instructions
128-bit accelerator
Trellis decoding (8 Trellis butterflies per cycle)
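
As a rough illustration of the MAC bullets above, the sketch below models one 16-bit multiply-accumulate into a 40-bit accumulator and one complex multiply-accumulate. It is plain Python, not TigerSHARC code. The function names are made up for this example, and clamping (saturating) on overflow is an assumption, since the slide only gives the word widths.

```python
def saturate(value, bits):
    """Clamp a signed integer to the range of a `bits`-wide two's-complement word."""
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    return max(lowest, min(highest, value))


def mac16(acc, a, b, acc_bits=40):
    """One 16-bit x 16-bit multiply-accumulate into an accumulator of acc_bits.

    Saturation on overflow is assumed here; it is not stated on the slide.
    """
    return saturate(acc + a * b, acc_bits)


def complex_mac(acc, x, y):
    """acc += x * y for complex integers stored as (real, imag) tuples.

    One complex MAC costs four real multiplies and four real adds.
    """
    acc_re, acc_im = acc
    x_re, x_im = x
    y_re, y_im = y
    return (acc_re + x_re * y_re - x_im * y_im,
            acc_im + x_re * y_im + x_im * y_re)
```

The extra accumulator bits (40 for 16-bit operands) are what let a long sum of products run without overflowing on every step.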

11
Data Address Generation

ADSP-21061
2 data address generation units (DAGs)
8 circular buffers per DAG
ADSP-TS101S
2 data address generation units (IALUs)
4 circular buffers per IALU
Both support modulo arithmetic, bit reversal
addressing, and post and pre-modify instructions
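
The circular buffers and modulo arithmetic mentioned above can be pictured with a small software model. This is only a sketch of the idea in Python: the class and method names are invented for illustration, and on the real parts the wrap-around is done by the address generation hardware rather than by a `%` operation in code.

```python
class CircularBuffer:
    """Software model of a circular buffer with post-modify addressing."""

    def __init__(self, length):
        self.data = [0] * length
        self.index = 0

    def post_modify(self, step):
        """Return the current index, then advance it by `step` modulo the length."""
        current = self.index
        self.index = (self.index + step) % len(self.data)
        return current

    def write(self, value):
        """Store a value at the current index and step forward by one."""
        self.data[self.post_modify(1)] = value
```

Writing six values into a four-entry buffer overwrites the two oldest entries, which is the behaviour an FIR delay line relies on.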

12
Ease of Use

ADSP-21061
Easy to use
Algebraic instruction set
Visual DSP environment
ADSP-TS101S
Similar to 21061, but now you have to consider 2
compute blocks
ADI suggests leaving parallelization to their
optimizing compiler
Visual DSP environment

13
Specific DSP Algorithms and the TigerSHARC

In ENEL515 (and/or related articles) we've
studied the FIR, IIR, and FFT algorithms
TigerSHARC has a massively parallel architecture
that is tailored to performing these algorithms.

14
FIR Filter Characteristics

Think back (or forward, depending on how much
you've procrastinated) to Lab 3.
FIR characteristics:
Simple, long loop
Repetitive calculations (multiply, then add!)
Access to an array of coefficients, and an array
of delay-line values
Few data dependency issues during the calculation
of a single output
For a filter of length N, require N
multiplications and N adds to obtain a single
output value.
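
The characteristics listed above can be seen in a minimal direct-form FIR sketch. It is written in plain Python for clarity rather than speed, and the function names are illustrative, not taken from the lab code.

```python
def fir_output(coeffs, delay_line):
    """One output sample: N multiplies and N adds for a filter of length N."""
    acc = 0
    for c, x in zip(coeffs, delay_line):
        acc += c * x  # the multiply-then-add step that a MAC unit performs
    return acc


def fir_filter(coeffs, samples):
    """Run the filter over a list of samples, newest sample first in the delay line."""
    delay_line = [0] * len(coeffs)
    outputs = []
    for s in samples:
        delay_line = [s] + delay_line[:-1]  # shift in the new sample
        outputs.append(fir_output(coeffs, delay_line))
    return outputs
```

Feeding in a unit impulse returns the coefficients themselves, which is a quick sanity check for any FIR implementation.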

15
TigerSHARC and the FIR Filter

The general idea is: divide and conquer!
Take a filter of size N and split it into two
groups of N/2
Utilize the TigerSHARC's multiple computational
units and MAC instructions to perform the
algorithm in half the time (plus some overhead)
Two hardware loop counters to simultaneously
control the two new N/2 size FIR loops with no
overhead!
Can do all of the following SIMULTANEOUSLY!
Fetch two operands (one coefficient, one delay
line value) from two separate memory banks
Fetch the next instruction
Perform arithmetic operations on the PREVIOUS
operands!
Unlike SHARC, instruction/data clashes are
nonexistent due to the numerous bus paths
linking computational units to memory space
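
The divide-and-conquer idea can be sketched as two independent partial sums that are combined at the end. The Python below runs the halves one after the other, so it shows only the data partitioning, not the speed-up. The names `acc_x` and `acc_y` are stand-ins for the two compute blocks and are assumptions made for this example.

```python
def partial_mac(coeffs, delay_line):
    """Sum of products over one half of the filter."""
    acc = 0
    for c, x in zip(coeffs, delay_line):
        acc += c * x
    return acc


def fir_output_split(coeffs, delay_line):
    """One FIR output computed as two N/2 partial sums plus a final add."""
    half = len(coeffs) // 2
    acc_x = partial_mac(coeffs[:half], delay_line[:half])  # work for one compute block
    acc_y = partial_mac(coeffs[half:], delay_line[half:])  # work for the other
    return acc_x + acc_y  # the "some overhead" step: combine the halves
```

Because neither half reads anything the other half writes, the two loops have no data dependency on each other, which is what makes the split safe.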

16
TigerSHARC and the FIR Filter (continued.)

8-cycle-deep pipeline
Stalls are expensive.
Branch Target Buffer reduces performance loss
that results from branching in a deeply pipelined
processor
The long loop characteristic of the FIR filter
algorithm allows us to keep the 8-cycle-deep
pipeline full
Full pipeline means fast algorithm
FIR Filter algorithms rely heavily on data sets
that are aligned in memory
Post-increment is your friend
TigerSHARC quad data accesses: supply four
aligned words to one compute block or two aligned
words to each compute block.

17
Example Instructions

X/Y Conditional Compute
if xALE; do, R0 = R1 + R2
Condition codes:
AEQ, ALT, ALE, ALU, MEQ, MLT, MLE, SEQ, SLT, SF0,
SF1.
A: adder, M: multiplier, S: shifter
Memory Addressing
Indirect post-modify with update, register
offset:
YR20 = [J1 += J2]
Indirect post-modify with update, 8-bit immediate
offset:
Q[K1 += 0xF8] = XYR3:0
Indirect pre-modify no update, register offset:
J3:2 = L[K1 + K2]
Indirect pre-modify no update, immediate offset:
YR3:2 = L[K1 + 0x0003333]
Complex Quad 16-bit Fixed-Point Multiplication
Instructions
{X|Y|XY}MRa += Rm ** Rn {({U}{I}{C}{CR}{J})}
{X|Y|XY}{Rs|Rsd} = MRa, MRa += Rm ** Rn
{({U}{I}{C}{J})}

18
FIR Code Example
19
TigerSHARC and the IIR Filter

Short, simple loop characteristic
Means loop overhead is more of a concern
Means keeping the pipeline full is tougher!
Time to unroll the loop, although ADI says to let
VisualDSP do it for you.
Again, split up the calculations on an N-tap IIR
filter into two N/2 sets operating simultaneously
Idea: one computational block does the feedforward
calculations, the other does the feedback!
Complex numbers commonly required
Hardware support for complex MAC in TigerSHARC
Again, Quad Data Access comes in handy for
aligned data
Post-increment is still your friend
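
The feedforward/feedback split can be sketched with a direct form I filter in which the two sums are computed separately. This is a Python illustration under stated assumptions: `b` holds the feedforward coefficients, `a` holds the feedback coefficients with `a[0]` assumed to be 1, and the function name is made up for this example.

```python
def iir_filter(b, a, samples):
    """Direct form I IIR filter with separate feedforward and feedback sums.

    Assumes a[0] == 1, so no output scaling is needed.
    """
    x_hist = [0.0] * len(b)        # past inputs, newest first
    y_hist = [0.0] * (len(a) - 1)  # past outputs, newest first
    outputs = []
    for s in samples:
        x_hist = ([s] + x_hist)[:len(b)]
        # The two sums below are independent of each other for a given output,
        # so each could be handed to its own compute block.
        feedforward = sum(bk * xk for bk, xk in zip(b, x_hist))
        feedback = sum(ak * yk for ak, yk in zip(a[1:], y_hist))
        y = feedforward - feedback
        y_hist = ([y] + y_hist)[:len(a) - 1]
        outputs.append(y)
    return outputs
```

The feedback sum uses only previous outputs, never the one being computed, so the dependency is between successive outputs rather than between the two halves of one output.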

20
TigerSHARC and the FFT

Does not use the same MAC modes that IIR and FIR
filters do.
Requires more complicated addressing modes
Example: bit-reversed addressing
Found on both SHARC and TigerSHARC
Difficult to split onto separate computational
units and even more difficult to split amongst
distributed processors
Requires large arrays of complex variables and
fixed coefficients
Hardware complex number MAC comes in handy again!
Large arrays of aligned data: quad data access
again!
Requires HIGH-PRECISION arithmetic
Luckily we have 64-bit fixed point arithmetic and
40-bit extended floating point arithmetic.
80-bit MAC precision
FFT Requires many intermediate values
32 GP registers in a single computational block
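
Bit-reversed addressing, mentioned above as the example of a more complicated addressing mode, reorders an array by reversing the bits of each index. The Python sketch below does in software what the address generators do in hardware. The function names are illustrative, and the power-of-two length check is an assumption that matches a radix-2 FFT.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversed_order(values):
    """Return the values reordered by bit-reversed index (length must be a power of two)."""
    n = len(values)
    bits = n.bit_length() - 1
    if n == 0 or (1 << bits) != n:
        raise ValueError("length must be a power of two")
    return [values[bit_reverse(i, bits)] for i in range(n)]
```

For an 8-point FFT the indices 0 to 7 come out as 0, 4, 2, 6, 1, 5, 3, 7, the usual input or output ordering of a radix-2 decimation scheme.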

21
http://www.analog.com/technology/dsp/Sharc/benchmarks.html
22
http://www.analog.com/technology/dsp/TigerSHARC/benchmarks.html
4ns Instruction Cycle
23
Conclusion

TigerSHARC has a very SHARC-like architecture,
except it's MUCH more complex.
Highly optimized for parallelism
Major features: complex number support, multiple
computational units, high instruction throughput,
wider buses.
Performs DSP algorithms including FIR, IIR, FFT
significantly faster than SHARC!

24
References

1. http://www.analog.com/productSelection/pdf/ADSP-21061_L_b.pdf
2. http://products.analog.com/products/info.asp?product=ADSP-TS101-S
3. http://www.analog.com/technology/dsp/TigerSHARC/backgrounder.html
4. http://www.analog.com/library/dspManuals/Tigersharc_hardware.html
5. http://www.analog.com/library/dspManuals/Tigersharc_instruction.html
6. http://www.btid.com/procsum/tsfloat.htm
7. http://www.analog.com/library/applicationNotes/dsp/tigerSharc/EE-147.pdf
8. http://www.analog.com/technology/dsp/TigerSHARC/architecture.html
9. http://www.analog.com/library/dspManuals/pdf/TSDSP_instruction/tsintr.pdf
(pp. 2-182 to 2-188)
10. ADSP-2106x SHARC User's Manual, Second
Edition
11. http://www.analog.com/library/dspManuals/pdf/TSDSP_instruction/tsin_flw.pdf
(pp. 3-9 to 3-16)

25
Note from Dr. Smith

Information on Burg algorithm outside ICT536. It
is essentially an FIR filter used for prediction
(i.e. what FIR coefficients are needed so that
the filtered signal is "white noise").
