Design of a High-Speed Asynchronous Turbo Decoder

1
Design of a High-Speed Asynchronous Turbo Decoder
Pankaj Golani, George Dimou, Mallika Prakash and Peter A. Beerel
Asynchronous CAD/VLSI Group
Ming Hsieh Electrical Engineering Department
University of Southern California
ASYNC 2007, Berkeley, California
March 12th, 2007
2
Motivation and Goal
  • Mainstream acceptance of asynchronous design
    • Leverage the ASIC standard-cell library-based design flow
    • Achieve significant benefits to overcome synchronous momentum
  • Our research goal for async designs
    • High-speed standard-cell flow
    • Applications where designs yield significant improvement in
      • throughput and throughput per area
      • energy efficiency

3
Single Track Full Buffer (Ferretti02)
[Figure: STFB cell with channels L and R]
  • Follows a 2-phase protocol
  • High-performance standard-cell circuit family
  • Comparison to synchronous standard-cell design
    • 4.5x better latency
    • 1 GHz in 0.18 µm
    • 2.4x faster than synchronous
    • 2.8x more area

4
Block Processing: Pipelining and Parallelism
M people pipelines
Steinhart Aquarium
Latency l
First M cases arrive at t = l
Let c be the person cycle time
Subsequent M cases arrive every c time units
  • Consider two scenarios
    • Baseline
      • cycle time C1, latency L1
    • Improved
      • cycle time C2 = C1/2.4, latency L2 = L1/4.5
  • Questions
    • How does cycle time affect throughput?
    • How does latency affect throughput?

5
Block Processing: Combined Cycle Time and Latency Effect
Large K: throughput ratio → cycle time ratio
Small K: throughput ratio → latency ratio
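These two limits can be reproduced with a simple timing model. The sketch below is an illustration and is not taken from the slides: it assumes a block of K items is split across M pipelines, the first M results appear after latency l, and each later group of M results appears every c time units. The function names and the sample values for l and c are hypothetical.

```python
import math


def block_time(k_items, m_pipes, latency, cycle):
    """Time to finish a block: first group after `latency`, then one group per `cycle`."""
    groups = math.ceil(k_items / m_pipes)
    return latency + (groups - 1) * cycle


def throughput_ratio(k_items, m_pipes, l1, c1, latency_gain=4.5, cycle_gain=2.4):
    """Throughput of the improved design divided by throughput of the baseline.

    The improved design uses C2 = C1 / cycle_gain and L2 = L1 / latency_gain.
    Both designs process the same block, so the ratio reduces to t1 / t2.
    """
    t1 = block_time(k_items, m_pipes, l1, c1)
    t2 = block_time(k_items, m_pipes, l1 / latency_gain, c1 / cycle_gain)
    return t1 / t2


if __name__ == "__main__":
    # Assumed baseline values in arbitrary time units.
    L1, C1, M = 20.0, 1.0, 8
    for k in (8, 64, 512, 4096, 10**7):
        print(k, round(throughput_ratio(k, M, L1, C1), 3))
```

With K no larger than M the block finishes after a single latency, so the ratio equals the latency ratio; as K grows the cycle-time term dominates and the ratio approaches the cycle-time ratio.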
6
Talk Outline
  • Turbo coding and decoding: an introduction
  • Tree soft-input soft-output (SISO) decoder
  • Synchronous turbo decoder
  • Asynchronous turbo decoder
  • Comparisons and conclusions

7
Turbo Coding Introduction
  • Error correcting codes
    • Add redundancy
    • The input data is K bits
    • The output code word is N bits (N > K)
    • The code rate is r = K/N
  • Types of codes
    • Linear code
    • Convolutional code (CC)
    • Turbo code

8
Turbo Encoding - Introduction
[Figure: Turbo Encoder built from an Outer CC, an Interleaver and an Inner CC]
  • Turbo Encoding
    • Berrou, Glavieux and Thitimajshima (1993)
    • Performance close to Shannon channel capacity
    • Typically uses two convolutional codes and an interleaver
  • Interleaver used to improve error correction
    • increases minimum distance of code
    • creates a large block code

9
Turbo Decoding
  • Turbo decoder components
    • Two soft-in soft-out (SISO) decoders
      • one for inner CC and one for outer CC
      • soft input: a priori estimates of input data
      • soft output: a posteriori estimates of input data
      • SISO often based on Min-Sum formulation
    • Interleaver / De-interleaver
      • maps SISO outputs to SISO inputs
      • same permutation as used in encoder
  • Iterative nature of algorithm leads to block processing
    • One SISO must finish before next SISO starts

[Figure: decoder loop with Received Data memory, Inner SISO, De-interleaver, Outer SISO and Interleaver]
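The interleaver and de-interleaver are inverse permutations of each other. As a minimal sketch (the slides do not specify the permutation, so a seeded pseudo-random one is assumed here, and all function names are hypothetical), the round trip can be written as:

```python
import random


def make_permutation(n, seed=0):
    """Assumed interleaver pattern: a seeded pseudo-random permutation of 0..n-1."""
    rng = random.Random(seed)
    perm = list(range(n))
    rng.shuffle(perm)
    return perm


def interleave(values, perm):
    """Output position i takes the value from input position perm[i]."""
    return [values[p] for p in perm]


def deinterleave(values, perm):
    """Inverse mapping: put the value at position i back to position perm[i]."""
    out = [None] * len(perm)
    for i, p in enumerate(perm):
        out[p] = values[i]
    return out


if __name__ == "__main__":
    perm = make_permutation(8, seed=1)
    soft_values = [0.5, -1.2, 3.0, 0.1, -0.7, 2.2, -3.1, 0.9]
    shuffled = interleave(soft_values, perm)
    assert deinterleave(shuffled, perm) == soft_values
```

In a decoder loop the same `perm` would be shared with the encoder, so that soft outputs of one SISO line up with the inputs of the other.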
10
The Decoding Problem
  • Requires finding paths in a graph called a trellis
    • Node: state j of encoder at time index k
    • Edge: represents receiving a 0 or 1 in the node for state j at time k
    • Path: represents a possible decoded sequence
    • the algorithm finds multiple paths
  • Example trellis
    • For a 2-state encoder, encoding K bits

[Figure: example trellis from t = 0 through t = k to t = K; edge styles distinguish sent bit 1 from sent bit 0; decoded sequences shown: 0 1 0 0 0 and 1 0 1 0 0]
11
Min-Sum SISO Problem Formulation
  • Branch and path metrics
    • Branch metric (BM)
      • indicates difference between expected and received values
    • Path metric
      • sum of associated branch metrics
  • Min-Sum formulation: for each time index k, find
    • Minimum path metric over all paths for which bit k = 1
    • Minimum path metric over all paths for which bit k = 0

[Figure: trellis from t = 0 through t = k to t = K; edge styles distinguish sent bit 1 from sent bit 0]
Minimum path metric when bit k = 1 is 13
Minimum path metric when bit k = 0 is 16
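The Min-Sum formulation can be checked by brute force on a tiny trellis. The sketch below is illustrative only: it assumes a toy 2-state trellis whose next state is `state XOR bit`, a start state of 0, and made-up integer branch metrics. None of these values come from the slides.

```python
from itertools import product


def next_state(state, bit):
    """Assumed toy 2-state trellis: the next state is state XOR input bit."""
    return state ^ bit


def path_metric(bits, bm, start=0):
    """Sum of branch metrics along the path selected by `bits`.

    bm[k][s][b] is the branch metric of the edge leaving state s at time k
    when the input bit is b.
    """
    state, total = start, 0
    for k, b in enumerate(bits):
        total += bm[k][state][b]
        state = next_state(state, b)
    return total


def min_sum_bruteforce(bm):
    """For each time index k, return [min metric with bit k = 0, min metric with bit k = 1]."""
    n = len(bm)
    best = [[float("inf"), float("inf")] for _ in range(n)]
    for bits in product((0, 1), repeat=n):
        metric = path_metric(bits, bm)
        for k, b in enumerate(bits):
            best[k][b] = min(best[k][b], metric)
    return best


if __name__ == "__main__":
    # Made-up branch metrics for K = 3 trellis sections.
    bm = [
        [[1, 4], [2, 3]],
        [[5, 1], [2, 2]],
        [[3, 0], [1, 6]],
    ]
    print(min_sum_bruteforce(bm))
```

Enumerating all 2^K paths is only practical for very small K; it serves here as a reference definition of the two minima per bit.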
12
Talk Outline
  • Turbo coding and decoding: an introduction
  • Tree SISO low-latency turbo decoder architecture
  • Synchronous turbo decoder
  • Asynchronous turbo decoder
  • Comparisons and conclusions

13
Conventional SISO - O(K) latency
  • Calculation of the minimum path can be divided into two phases
    • Forward state metric for time k and state j
    • Backward state metric for time k and state j
  • Data dependency loop prevents pipelining
    • Cycle time limited to latency of 2-way add-compare-select (ACS)
    • Latency is O(K)

[Figure: trellis with time indices t = 0, t = k-1, t = k, t = k+1 and t = K; edge styles distinguish received bit 1 from received bit 0]
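A minimal sketch of the two-phase computation follows. It assumes a toy 2-state trellis with next state `state XOR bit`, start state 0, an unconstrained end state, and made-up branch metrics; the function names are hypothetical. Each forward or backward step depends on the previous one, which is the data dependency loop that keeps latency at O(K).

```python
INF = float("inf")


def next_state(state, bit):
    """Assumed toy 2-state trellis: the next state is state XOR input bit."""
    return state ^ bit


def forward_backward_min_sum(bm, n_states=2, start=0):
    """Min-sum soft outputs via forward and backward state metrics.

    bm[k][s][b] is the branch metric of the edge leaving state s at time k
    with input bit b. Returns, per time index k, the minimum path metric
    with bit k = 0 and with bit k = 1.
    """
    n = len(bm)

    # Forward recursion: one add-compare-select step per time index.
    fwd = [[INF] * n_states for _ in range(n + 1)]
    fwd[0][start] = 0
    for k in range(n):
        for s in range(n_states):
            for b in (0, 1):
                t = next_state(s, b)
                fwd[k + 1][t] = min(fwd[k + 1][t], fwd[k][s] + bm[k][s][b])

    # Backward recursion: any end state is allowed, so all start at 0.
    bwd = [[INF] * n_states for _ in range(n + 1)]
    bwd[n] = [0] * n_states
    for k in range(n - 1, -1, -1):
        for s in range(n_states):
            bwd[k][s] = min(
                bm[k][s][b] + bwd[k + 1][next_state(s, b)] for b in (0, 1)
            )

    # Combine: best path through each edge labelled with bit b at time k.
    return [
        [
            min(
                fwd[k][s] + bm[k][s][b] + bwd[k + 1][next_state(s, b)]
                for s in range(n_states)
            )
            for b in (0, 1)
        ]
        for k in range(n)
    ]


if __name__ == "__main__":
    bm = [
        [[1, 4], [2, 3]],
        [[5, 1], [2, 2]],
        [[3, 0], [1, 6]],
    ]
    print(forward_backward_min_sum(bm))
```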
14
Tree SISO: low-latency architecture
  • Tree SISO (Beerel/Chugg JSAC01)
    • Calculate BMs for larger and larger segments of the trellis
    • Analogous to creating group-wise PG logic for tree adders
  • Tree SISO can process the entire trellis in parallel
    • No data dependency loops, so finer pipelining is possible
    • Latency is O(log K)
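The idea of merging metrics for larger and larger trellis segments can be sketched with a min-plus matrix product. This is an illustration under assumptions, not the architecture from the cited paper: each trellis section is represented by a made-up matrix of state-to-state minimum metrics, and only the segment-combining step is shown. Because the min-plus product is associative, neighbouring segments can be merged pairwise in a tree of depth about log2(K) instead of a chain of length K.

```python
def min_plus(a, b):
    """Min-plus product: best metric from state s to state t through any middle state u."""
    n = len(a)
    return [
        [min(a[s][u] + b[u][t] for u in range(n)) for t in range(n)]
        for s in range(n)
    ]


def sequential_combine(sections):
    """Chain of K - 1 dependent steps, as in a conventional recursion."""
    acc = sections[0]
    for m in sections[1:]:
        acc = min_plus(acc, m)
    return acc


def tree_combine(sections):
    """Merge neighbouring segments level by level; returns (result, tree depth)."""
    level, depth = list(sections), 0
    while len(level) > 1:
        merged = [
            min_plus(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:  # an odd segment is carried to the next level unchanged
            merged.append(level[-1])
        level, depth = merged, depth + 1
    return level[0], depth


if __name__ == "__main__":
    # Made-up 2-state section matrices for K = 8 trellis sections.
    sections = [[[k % 3, 4 - k % 4], [2 * (k % 2), 1 + k % 5]] for k in range(8)]
    result, depth = tree_combine(sections)
    assert result == sequential_combine(sections)
    print(depth, result)
```

All merges within one tree level are independent of each other, which is what allows them to run in parallel in hardware.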

15
Remainder of Talk Outline
  • Turbo Coding: an introduction
  • Turbo Decoding
  • Tree SISO low-latency turbo decoder architecture
  • Synchronous turbo decoder
  • Asynchronous turbo decoder
  • Comparisons and conclusions

16
Synchronous Base-Line Turbo Decoder
  • Synchronous turbo decoder baseline
    • IBM 0.18 µm Artisan standard-cell library
    • SCCC (serially concatenated convolutional code) with a rate of ½ was used
    • Number of iterations performed is 6
    • Gate-level pipelined to achieve high throughput
    • Performed timing-driven place and route (P&R)
    • Peak frequency of 475 MHz
    • SISO area of 2.46 mm²
    • To achieve high throughput, multiple blocks instantiated

17
Asynchronous Turbo Decoder
  • Static Single Track Full Buffer Standard-Cell Library (Golani06)
    • Total of (only) 14 cells in IBM 0.18 µm process
    • Extensive SPICE simulations were performed
      • optimized trade-off between performance and robustness
  • Chip design
    • Standard ASIC place-and-route flow (congestion-based)
    • ECO optimization flow
  • Chip-level simulation
    • Performed on critical sub-block (55K transistors)
    • Verified timing constraints
    • Measured latency and throughput using Synopsys Nanosim

18
Static Single Track Full Buffer (Ferretti01)
[Figure: Sender and Receiver connected by an SST channel carrying 1-of-N data under the 1-of-N static single-track protocol; labels on the two sides: "Holds low, Drives high" and "Holds high, Drives low"]
Statically driving the line → improves noise margin
19
Asynchronous Implementation Challenges - I
  • Degradation in throughput
    • Unbalanced fork and join structure
    • The token on the short branch is stalled due to the imbalance
    • This slows down the fork-join structure overall
  • Slack matching
    • Improves the throughput by adding a pipeline buffer
    • Identify fork/join bottlenecks and resolve by adding buffers
  • After P&R, long wires can also create such a problem
    • This can be solved by adding buffers on long wires using the ECO flow

20
Asynchronous Implementation Challenges - II
[Figure: Full Adder followed by a Fork cell, compared with a Full Adder with Integrated Fork]
  • SSTFB implements only point-to-point communication
  • Use dedicated Fork cells
    • Creates another pipeline stage
    • To slack match, buffers are needed on the other paths
  • Integrate Fork within Full Adder

45% less area than full adder and fork
Decreases the number of slack matching buffers required
21
Asynchronous Implementation Challenges - III
  • 60% of the design is slack matching buffers
  • Most of the time these buffers occur in linear chains

[Figure: a Slack2 cell and a chain of two Buffer cells]
  • To save area and power, two new cells were created
    • SLACK2
    • SLACK4

17% area and 10% power improvement for SLACK2
30% area and 19% power improvement for SLACK4
22
Remainder of Talk Outline
  • Turbo Coding: an introduction
  • Turbo Decoding
  • Tree SISO low-latency turbo decoder architecture
  • Synchronous turbo decoder
  • Asynchronous turbo decoder
  • Comparisons and conclusions

23
Comparisons
  • Synchronous
    • Peak frequency of 475 MHz
    • Logic area of 2.46 mm²
  • Asynchronous
    • Peak frequency of 1.15 GHz
    • Logic area of 6.92 mm²
  • Design time comparison
    • Synchronous: 4 graduate-student months
    • Asynchronous: 12 graduate-student months
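As a quick arithmetic check on the figures above, the ratios can be computed directly. The variable names are illustrative; only the six input numbers come from the slide.

```python
# Figures quoted on the Comparisons slide.
SYNC_FREQ_MHZ = 475.0
ASYNC_FREQ_MHZ = 1150.0      # 1.15 GHz
SYNC_AREA_MM2 = 2.46
ASYNC_AREA_MM2 = 6.92
SYNC_DESIGN_MONTHS = 4.0
ASYNC_DESIGN_MONTHS = 12.0

freq_ratio = ASYNC_FREQ_MHZ / SYNC_FREQ_MHZ
area_ratio = ASYNC_AREA_MM2 / SYNC_AREA_MM2
design_time_ratio = ASYNC_DESIGN_MONTHS / SYNC_DESIGN_MONTHS

if __name__ == "__main__":
    print(f"peak frequency ratio: {freq_ratio:.2f}x")
    print(f"logic area ratio:     {area_ratio:.2f}x")
    print(f"design time ratio:    {design_time_ratio:.1f}x")
```

The frequency ratio rounds to about 2.4x and the area ratio to about 2.8x, consistent with the 2.4x speed and 2.8x area figures quoted earlier for the circuit family.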

24
Synch vs Async
[Figure: Received Memory feeding M pipelined 8-bit Tree SISOs and an Interleaver/De-interleaver; block of K bits]
Latency l
First M bits arrive at t = l
Let c be the sync clock cycle time (475 MHz)
Subsequent M bits arrive every c time units
  • Two implementations
    • Synch: cycle time C1 and latency L1
    • Async: cycle time C2 = C1/2.4 and latency L2 = L1/4.5
  • Desired comparisons
    • Throughput comparison vs block size
    • Energy comparison vs block size

25
Comparisons: Throughput / Area
[Plot annotations: 1.28 (M = 3), 2.13 (M = 8), 3.91 (M = 11)]
  • For small block sizes, asynchronous provides better throughput/area
  • As block size increases, the two implementations become comparable
  • For block sizes of 512 bits, synchronous cannot achieve async throughput

26
Comparisons: Energy/Block
  • For equivalent throughputs and small block sizes, asynchronous is more energy efficient than synchronous
  • Async advantages grow with larger async library (e.g., w/ BUF1of4)

27
Conclusions
  • Asynchronous turbo decoder vs. synchronous baseline
    • static STFB offers significant improvements for small block sizes
      • more than 2x throughput/area
      • higher peak throughput (500 Mbps)
      • more energy efficient
    • well-suited for low-latency applications (e.g., voice)
  • High-performance async advantageous for applications which require
    • high performance (e.g., pipelining)
    • low latency
    • block processing for which parallelism has diminishing returns
      • synchronous design requires extensive parallelism to achieve equivalent throughput

28
Future Work
  • Library Design
  • Larger library with more than 1 size per cell
  • 1-of-4 encoding
  • Async CAD
  • Automated slack matching
  • Static timing analysis

29
Questions?