Architectural Approaches: Hardware Accelerations for MPEG4 on Mobile Devices



1
Architectural Approaches: Hardware Accelerations
for MPEG-4 on Mobile Devices
  • 2004. 02. 13
  • C. G. Kim

2
Contents
  • Introduction
  • Motion Estimation Hardware Acceleration
  • Datapath
  • Memory Optimization
  • DCT Hardware Acceleration
  • Baseline implementation
  • Recent research
  • Shape Encoding: Arithmetic Coding
  • Architectural Trends
  • Conclusion

3
Introduction (1/2)
  • Background
  • Multimedia application platforms
  • Set-top boxes and desktop PCs
  • PDAs and smart phones
  • MPEG-4 with compression efficiency and
    content-based advantages
  • Data-intensive tasks
  • Motion estimation (ME)
  • DCT/IDCT
  • Context-based arithmetic encoding (CAE)
  • Mobile devices
  • HW limitations
  • Low computational power
  • Low memory capacity
  • Short battery life
  • Miniaturization requirements
  • Real-time multimedia applications

4
Introduction (2/2)
  • Several Approaches
  • HW
  • ASIC (Application Specific Integrated Circuit)
    solutions
  • DSP (Digital Signal Processor) architectures
  • Field Programmable Gate Array (FPGA)
    implementations
  • Control paradigm
  • Single Instruction Multiple Data (SIMD)
  • Multiple Instruction Multiple Data (MIMD)
  • SA (Systolic Arrays)
  • Hybrid hardware acceleration design paradigm
  • More flexible dedicated HW for computationally
    intensive tasks
  • SW RISC processor-based solutions for adaptive
    control
  • Other state-of-the-art techniques (dynamic power
    management)

5
ME Hardware Acceleration (1/4)
  • BMA: block-matching algorithm
  • Requires over 50% of the computational power
  • Base operations
  • Square root, multiplication, and division
  • HW feasibility from the performance/complexity
    ratio point of view
  • Sum of absolute differences (SAD)
  • Delivering the best accuracy/complexity ratio
  • Different pel count (DPC)
  • Others
  • Pixel Difference Classification (PDC)
  • Binary Level Matching Criterion (BPROP)
  • Bit-Plane Matching Criterion (BPM)
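
The two matching criteria named on this slide can be illustrated with a minimal sketch. SAD is the standard sum of absolute pel differences. The DPC function assumes that the "different pel count" counts pels whose absolute difference exceeds a threshold; the `threshold` parameter and both function names are hypothetical and do not come from the presentation.

```python
def sad(block_a, block_b):
    """Sum of absolute differences over two equally sized pel blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def dpc(block_a, block_b, threshold=0):
    """Count pels whose absolute difference exceeds `threshold`.

    Assumed definition of the different pel count; `threshold` is hypothetical.
    """
    return sum(1
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b)
               if abs(a - b) > threshold)


current = [[1, 2], [3, 4]]
reference = [[1, 4], [0, 4]]
print(sad(current, reference))  # 0 + 2 + 3 + 0 = 5
print(dpc(current, reference))  # two pels differ, so 2
```

Both criteria need only subtraction, comparison, and accumulation, which is consistent with the slide's point about hardware feasibility compared with criteria built on multiplication, division, or square roots.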

6
ME Hardware Acceleration (2/4)
  • Datapath
  • Systolic array
  • Full search
  • Regularity in a full search strategy
  • No significant control circuitry overhead
  • 1-D or 2-D implementations with global or local
    accumulation
  • Parameters: clock rate, picture size, search
    range, and block size
  • Fast heuristic search
  • Data address generation and flow-control signaling
    increase considerably, along with power
    inefficiency
  • Tree-architecture
  • Suitable for irregular search strategies, but
  • Requires infeasibly high memory bandwidth
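
The regularity of full search is easy to see in software form. The following is a minimal sketch, not the datapath of any particular design: frames are assumed to be lists of rows indexed as `frame[y][x]`, and the names `block_sad`, `full_search`, `n`, and `search_range` are hypothetical.

```python
def block_sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block at (bx, by) in cur and its displaced copy in ref."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n=4, search_range=2):
    """Test every displacement in the search range; return (best_sad, (dx, dy))."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that fall outside the reference frame.
            if by + dy < 0 or by + dy + n > height:
                continue
            if bx + dx < 0 or bx + dx + n > width:
                continue
            cost = block_sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best


# Synthetic 8 x 8 frames: cur is ref displaced by (dx, dy) = (1, -1).
ref = [[x + 10 * y for x in range(8)] for y in range(8)]
cur = [[(x + 1) + 10 * (y - 1) for x in range(8)] for y in range(8)]
print(full_search(cur, ref, bx=2, by=2))  # (0, (1, -1))
```

The loop bounds depend only on the block size and search range, never on the data, which is why this strategy maps onto a systolic array with little control circuitry. A heuristic search would replace the two outer loops with data-dependent candidate selection.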

7
ME Hardware Acceleration (3/4)
  • Reduced pel information approaches
  • Edge extraction, frame processing or pel
    subsampling
  • Apply a search strategy on reduced-bit frame
    representations
  • Binary search algorithms (fast exhaustive)
  • Employ conservative SAD estimations and SAD
    cancellation mechanisms
  • Reduce computation by skipping irrelevant
    candidate blocks
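
One way to read "conservative SAD estimations and SAD cancellation" is sketched below under stated assumptions. The conservative estimate used here is the difference of block sums, which can never exceed the true SAD by the triangle inequality. The cancellation step aborts accumulation once the partial SAD can no longer beat the best candidate so far. Whether the surveyed designs use exactly these two mechanisms is not stated in the slides, and the function names are hypothetical.

```python
def sad_lower_bound(block_a, block_b):
    """|sum(a) - sum(b)| <= SAD(a, b), so this never overestimates the SAD."""
    sum_a = sum(sum(row) for row in block_a)
    sum_b = sum(sum(row) for row in block_b)
    return abs(sum_a - sum_b)


def sad_with_cancellation(block_a, block_b, best_so_far):
    """Return the SAD, or None if the candidate cannot beat best_so_far."""
    # Conservative estimate: reject the candidate without touching any pel pair.
    if sad_lower_bound(block_a, block_b) >= best_so_far:
        return None
    partial = 0
    for row_a, row_b in zip(block_a, block_b):
        partial += sum(abs(a - b) for a, b in zip(row_a, row_b))
        # Cancellation: stop as soon as the partial sum is already too large.
        if partial >= best_so_far:
            return None
    return partial


a = [[10, 10], [10, 10]]
b = [[10, 12], [9, 10]]
print(sad_with_cancellation(a, b, best_so_far=100))  # 3
print(sad_with_cancellation(a, b, best_so_far=3))    # None
```

In a hardware setting the block sums would typically be computed once and reused across candidates rather than recomputed per call as in this sketch.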

8
ME Hardware Acceleration (4/4)
  • Memory optimization
  • Targeting memory data flow rather than
    traditional memory banking optimization
  • Re-arrange and remap the content of the on-chip
    memory in order to achieve the highest memory
    access efficiency.
  • High degree of on-chip memory content re-use
  • Parallel pel information access
  • Memory access interleaving
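
The benefit of on-chip content re-use can be estimated with simple arithmetic. The sketch below assumes a square block of side `block`, a search range of `search_range` pels in every direction, and blocks processed left to right so that consecutive search windows overlap. These assumptions and the function names are illustrative and not taken from the presentation.

```python
def window_side(block, search_range):
    """Side length of the search window for one block."""
    return block + 2 * search_range


def pels_loaded_per_block(block, search_range, reuse_previous_window):
    """Pels fetched from off-chip memory for one block's search window."""
    side = window_side(block, search_range)
    if reuse_previous_window:
        # Only the new strip of `block` columns is fetched; the rest is kept on chip.
        return block * side
    return side * side


print(pels_loaded_per_block(16, 16, reuse_previous_window=False))  # 48 * 48 = 2304
print(pels_loaded_per_block(16, 16, reuse_previous_window=True))   # 16 * 48 = 768
```

Under these assumptions, re-use cuts off-chip traffic per block to one third, at the cost of keeping the overlapping part of the window in on-chip memory.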

9
DCT/IDCT Hardware Acceleration (1/2)
  • Base line implementation
  • Fast Algorithm for the Discrete Cosine Transform
    (by W. Chen et al.)
  • Using FFT (Fast Fourier Transform)
  • Can be derived in the form of matrices
  • Readily translated to hardware or software
    implementation
  • Difficult to implement a unified DCT/IDCT block
  • Complex architectures
  • Very complex signal flow
  • Numerous I/O pins
  • Irregular routing
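
The matrix form mentioned on this slide can be sketched directly. The code below builds the orthonormal N-point DCT-II matrix and applies it by plain matrix-vector multiplication. It is the direct O(N²) form, not the fast factorization by Chen et al., and the helper names are hypothetical.

```python
import math


def dct_matrix(n=8):
    """Orthonormal n-point DCT-II matrix; row k holds the k-th basis function."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows


def apply(matrix, vector):
    """Matrix-vector product."""
    return [sum(c * v for c, v in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


C = dct_matrix(8)
samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
coeffs = apply(C, samples)                # forward DCT
restored = apply(transpose(C), coeffs)    # inverse DCT is the transpose
```

Because the matrix is orthonormal, the inverse transform is its transpose. Fast algorithms factor the matrix into sparse stages to cut the multiplication count, and the slide attributes the complex signal flow and irregular routing to such implementations.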

10
DCT/IDCT Hardware Acceleration (2/2)
  • Recent research
  • More regular DCT architectures
  • Systolic arrays, recursive structures, and
    distributed arithmetic (DA)
  • Lower level architecture
  • Adders and memory architectures
  • Mathematical property
  • DCT pruning algorithms
  • Skipping all-zero IDCT input and truncated
    multiplication
  • General low-level techniques
  • Clock gating, low-transition data paths and
    voltage scaling
  • SA-DCT algorithm
  • Trade off scalability, modularity and regularity
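
The idea of skipping all-zero IDCT input can be shown with a minimal 1-D sketch. It assumes an orthonormal inverse DCT and simply omits the work for zero coefficients; the function name and the returned count of processed coefficients are hypothetical and serve only to make the saving visible.

```python
import math


def idct_1d_pruned(coeffs):
    """Inverse DCT that skips zero coefficients.

    Returns (samples, number_of_coefficients_processed).
    """
    n = len(coeffs)
    out = [0.0] * n
    if not any(coeffs):
        return out, 0  # all-zero input: no arithmetic at all
    used = 0
    for k, c in enumerate(coeffs):
        if c == 0:
            continue  # pruned: this basis function contributes nothing
        used += 1
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        for i in range(n):
            out[i] += scale * c * math.cos((2 * i + 1) * k * math.pi / (2 * n))
    return out, used


zeros, used_zero = idct_1d_pruned([0] * 8)
dc_only, used_dc = idct_1d_pruned([math.sqrt(8.0), 0, 0, 0, 0, 0, 0, 0])
```

After quantization, many blocks carry only a few low-frequency coefficients, so the skipped iterations correspond to multiplications and memory accesses that a pruned hardware IDCT would not perform.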

11
Shape Encoding Hardware Acceleration
  • Context-based arithmetic encoding (CAE)
  • The second most computationally expensive process
    in an MPEG-4 encoder
  • Shape decoding is the most complex task in an
    MPEG-4 decoder
  • Bit-level data parallelism, bit addressing
    scheme, efficient reuse of windowed pel data
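
The context computation at the heart of CAE can be sketched as follows. The code packs ten previously coded neighbors of a binary shape pel into a 10-bit context number. The template layout and bit order are an assumption modeled on a 10-pel causal template and should be checked against the MPEG-4 standard; treating out-of-frame pels as 0 is a simplification, and all names are hypothetical.

```python
# (dx, dy) offsets relative to the current pel, listed from bit 0 to bit 9.
# Assumed layout: two pels to the left, five in the row above, three two rows above.
TEMPLATE = [(-1, 0), (-2, 0),
            (2, -1), (1, -1), (0, -1), (-1, -1), (-2, -1),
            (1, -2), (0, -2), (-1, -2)]


def pel(mask, x, y):
    """Binary shape pel at (x, y); pels outside the mask are read as 0 (simplification)."""
    if 0 <= y < len(mask) and 0 <= x < len(mask[0]):
        return mask[y][x]
    return 0


def context(mask, x, y):
    """Pack the template pels around (x, y) into a context number in [0, 1023]."""
    ctx = 0
    for bit, (dx, dy) in enumerate(TEMPLATE):
        ctx |= pel(mask, x + dx, y + dy) << bit
    return ctx


blank = [[0] * 5 for _ in range(5)]
full = [[1] * 5 for _ in range(5)]
print(context(blank, 2, 2))  # 0
print(context(full, 2, 2))   # 1023
```

The context selects a probability for the arithmetic coder. Since neighboring pels share most of their template, a hardware design can slide a small window of bits instead of re-reading ten pels per symbol, which is one reading of the windowed pel data reuse named on this slide.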

12
Architectural Trends
  • Multicore SoC
  • Many different modes, options, and switches that
    are provided within the MPEG-4 standard
  • Enough flexibility in the dedicated HW blocks
  • Heterogeneous SoC
  • Multiple cores for different classes of
    algorithms
  • Low power for MPEG-based mobile devices
  • High level approaches
  • Consider specific behavioral aspects of MPEG to
    reduce computations
  • Architectural-level memory optimization for the
    best area-speed-power trade-off
  • Reducing the number of memory accesses by
    splitting memories into multiple sub-banks
  • Selective line activation
  • Bit-line segmentation
  • On-chip memory
  • Application-specific data-flow transformations
    (memory interleaving)
  • Efficient memory bandwidth balance between
    on-chip and off-chip memory
  • Local memory architectures to avoid system bus
    conflicts and congestion

13
Conclusion
  • Focus on low-power enhancements of the HW
    solutions for MPEG-4's computationally intensive
    video compression tools
  • Technology shift from desktop platforms to mobile
    platforms
  • Area/performance design space
  • Performance/power design space
  • Hybrid power-efficient hardware acceleration
    architectures