FastVDO Unified 16-Bit Framework

1
FastVDO Unified 16-Bit Framework
  • Pankaj Topiwala
  • FastVDO LLC
  • Columbia, MD 21046 USA
  • pnt_at_fastvdo.com
  • JVT-B103

2
In the Beginning (April '01)
  • In April '01, FastVDO showed how H.26L can be made
    fully 16-bit with no loss of performance (M16).
  • At SB, 4 other proposals also supported 16 bits
  • Key lessons learned:
  • Negligible performance or complexity difference
  • Quantization is very flexible! TI/Sharp showed
    that it can be manipulated to fit even the TML
    transform into 16 bits
  • Quant. memory requirements can be minimized by
    periodicity
  • Proper focus: the transform, which can limit
    applications
  • Quantization can be safely decoupled
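
One way to read "memory minimized by periodicity" is that the quantizer step doubles after a fixed number of QP increments, so only one period of values needs to be stored and the rest follow by shifting. The sketch below assumes that reading; the period of 6 and the `BASE_STEPS` values are illustrative placeholders, not figures from the presentation.

```python
# Hypothetical one-period table: only these entries are stored.
BASE_STEPS = [10, 11, 13, 14, 16, 18]  # illustrative values, not from the slides
PERIOD = len(BASE_STEPS)


def quant_step(qp):
    """Step size for a non-negative QP, rebuilt from one stored period.

    The step doubles every PERIOD increments of qp, so the full table
    is recovered with a lookup and a left shift.
    """
    return BASE_STEPS[qp % PERIOD] << (qp // PERIOD)


if __name__ == "__main__":
    for qp in (0, 5, 6, 12, 30):
        print(qp, quant_step(qp))
```

With this layout the table size is independent of the QP range; extending the range costs no extra memory.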

3
Rapid Growth of Applications
  • But the application space of H.26L is growing:
  • low-rate wireless conversational (28 kb/s - 256
    kb/s),
  • mid-rate streaming, VOD (64 kb/s - 1 Mb/s)
  • high-rate TV broadcasting (1-4 Mb/s)
  • high-rate storage for future DVD (5-30 Mb/s)
  • Multirate digital cinema (mid-rate, visually
    lossless distribution, and lossless archive: 30
    Mb/s, 200 Mb/s, and low Gb/s)
  • ultra-high rate HDTV (30 Mb/s - low Gb/s)
  • lossless medical (similar)
  • Entertainment applications poised to dominate.

4
Desire
  • One framework that fits all applications
  • Reduce fragmentation of standard
  • Limit proliferation of inconsistent technologies
  • Improve interoperability between profiles
  • Significantly improve content reuse
  • FastVDO introduced such a framework
  • Still inadequately understood
  • Will now explain concretely

5
Motivation
  • Similar coding performance as the DCT
  • Supporting a 16-bit (or less) architecture
  • Low complexity, multiplierless implementation
    (adds, right shifts)
  • Invertible integer-to-integer mapping
  • In-place computation

6
DCT
  • Coding gain: 7.57 dB. Complexity: 8 adds, 6
    mults in floating point
  • Integer approximation

7
Lifting Structure
8
Generic 4-Pt Lifting Transform
[Figure: 4-point lifting flow graph; surviving labels: e, b, f, c, d]
Note: If the parameters a ... u are dyadic rationals, then this
  • is exactly invertible!
  • has a mult-free definition, in multiple ways, and
  • is very stingy in bit expansion!
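
The invertibility claim can be illustrated with a single lifting pair. This is a minimal sketch, not the 4-point structure from the slide (whose flow graph did not survive in this transcript): it assumes one predict step and one update step, with dyadic coefficients given as (numerator, shift) pairs. The default values 7/16 and 3/8 are borrowed from the X5 parameters purely as examples; the function names are hypothetical.

```python
def dyadic_scale(x, num, shift):
    """floor(num * x / 2**shift) using only adds and shifts (num >= 0)."""
    acc = 0
    bit = 0
    while num >> bit:
        if (num >> bit) & 1:
            acc += x << bit      # add a shifted copy for each set bit of num
        bit += 1
    return acc >> shift          # arithmetic right shift floors the result


def lift_forward(x0, x1, p=(7, 4), u=(3, 3)):
    """Integer-to-integer lifting pair: predict, then update."""
    d = x1 - dyadic_scale(x0, *p)    # predict step, p = 7/16
    s = x0 + dyadic_scale(d, *u)     # update step, u = 3/8
    return s, d


def lift_inverse(s, d, p=(7, 4), u=(3, 3)):
    """Undo the steps in reverse order with the signs flipped."""
    x0 = s - dyadic_scale(d, *u)
    x1 = d + dyadic_scale(x0, *p)
    return x0, x1


if __name__ == "__main__":
    print(lift_inverse(*lift_forward(100, -37)))
```

Each inverse step recomputes exactly the quantity that the forward step added or subtracted, so the rounding inside `dyadic_scale` cancels and the round trip is lossless for all integer inputs.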
9
Generic 4-Pt Inverse Transform
10
Example 1: FastVDO X1 - Hadamard
a = u = 1/2; b = c = d = e = f = p = 1.
Three equivalent implementations:
  • matrix multiply
  • mult-free direct (8 adds, 0 scalings)
  • lifting (also mult-free: 8a, 2s)
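
The "8 adds, 0 scalings" count for the direct form matches the usual two-stage butterfly for a 4-point Hadamard transform. The sketch below assumes an unnormalized, sequency-ordered matrix; the row order and normalization used on the slide are not recoverable from this transcript, so treat both as assumptions.

```python
# Unnormalized 4-point Hadamard matrix in sequency order (assumed ordering).
H4 = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, -1, 1, -1],
]


def hadamard4_matrix(x):
    """Reference implementation: plain matrix multiply."""
    return [sum(h * v for h, v in zip(row, x)) for row in H4]


def hadamard4_direct(x):
    """Mult-free direct form: two butterfly stages, 8 additions in total."""
    x0, x1, x2, x3 = x
    s0 = x0 + x3    # stage 1: 4 adds/subtracts
    s1 = x1 + x2
    d0 = x0 - x3
    d1 = x1 - x2
    # stage 2: 4 adds/subtracts
    return [s0 + s1, d0 + d1, s0 - s1, d0 - d1]


if __name__ == "__main__":
    sample = [3, -1, 4, 1]
    print(hadamard4_matrix(sample), hadamard4_direct(sample))
```

Because this matrix is symmetric and its rows are orthogonal with squared norm 4, applying the transform twice returns the input scaled by 4.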
11
Example 2: FastVDO X2
a = 1/2; b = c = d = e = f = 1; u = p = -1.
Three equivalent implementations:
  • matrix multiply
  • mult-free direct (8 adds, 1 scaling)
  • lifting (also mult-free: 8a, 1s)
12
Example 3: FastVDO X3
a = 1/2; b = c = d = e = f = 1; p = -2, u = 2/5.
W.K. Cham, 1989. X3 proposed by MS, Nokia.
Non-dyadic numbers mean a non-invertible transform.
Three equivalent implementations:
  • matrix multiply
  • mult-free direct (8 adds, 2 scales)
  • lifting (but with mults!! or approx. u)
13
Example 4: FastVDO X4
a = p = u = 1/2; b = c = d = e = f = 1.
Note: High coding gain. CG(X4) = 7.55 dB; CG(DCT) = 7.57 dB
Three equivalent implementations:
  • matrix multiply
  • mult-free direct (9 adds, 2 scalings)
  • lifting (also mult-free: 8a, 3s)
14
Example 5: FastVDO X5
a = 1/2; b = c = d = e = f = 1; p = 7/16, u = 3/8.
Note: High coding gain. CG(X5) = 7.57 dB; CG(TML) = 7.57 dB;
CG(DCT) = 7.57 dB
Three equivalent implementations:
  • matrix multiply
  • mult-free direct (9 adds, 7 scales)
  • lifting (also mult-free: 10a, 5s)
15
Detailed Implementation of X5
16
Performance-Complexity
17
Dynamic Range
18
8 x 8 BinDCT
Coding gain: 8.77 to 8.82 dB for an AR(1) process
with ρ = 0.95
19
16 x 16 BinDCT
Coding gain: 9.4499 dB for an AR(1) process with
ρ = 0.95
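
For reference, the coding-gain figures quoted for the floating-point DCT can be recomputed from the standard definition: the ratio of the arithmetic to the geometric mean of the transform-coefficient variances, for a unit-variance AR(1) source with correlation ρ. The sketch below assumes an orthonormal DCT-II and ρ = 0.95; it does not model BinDCT or the X1 to X5 transforms, whose matrices are not given in this transcript. The function names are illustrative.

```python
import math


def dct_matrix(n):
    """Orthonormal n-point DCT-II matrix as a list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append(
            [scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n)) for j in range(n)]
        )
    return rows


def coding_gain_db(transform, rho):
    """Coding gain in dB of an orthonormal transform on a unit-variance AR(1) source."""
    n = len(transform)
    variances = []
    for row in transform:
        # Coefficient variance: row * R * row^T, with R[i][j] = rho ** |i - j|.
        v = sum(
            row[i] * row[j] * rho ** abs(i - j) for i in range(n) for j in range(n)
        )
        variances.append(v)
    arithmetic = sum(variances) / n
    geometric = math.exp(sum(math.log(v) for v in variances) / n)
    return 10.0 * math.log10(arithmetic / geometric)


if __name__ == "__main__":
    for size in (4, 8, 16):
        print(size, round(coding_gain_db(dct_matrix(size), 0.95), 2))
```

The 4-point result should land on the 7.57 dB quoted earlier for the DCT, which gives a baseline against which the integer transforms' figures can be read.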
20
Lessons Learned
  • All transforms considered fall under our rubric
    (other than TML)
  • No new transforms introduced in 9 months
  • Growing app. list needs transform innovation
  • Quantization is very flexible
  • Innovations have in fact been made in
    quantization
  • Sharp/TI showed that even TML can be made 16-bit
  • Quantization can be adapted to transform

21
Lessons Learned (2) - TML
  • TML transform
  • OK for low-complexity, wireless app. using fixed
    hard-wired architectures that need matrix
    multiply
  • But unfriendly integers not good for ASIC
  • Not optimized for bit preservation, high-rate
    apps.
  • Not invertible
  • Not generalizable to larger transforms
  • Satisfies one transform method only: direct
    matrix multiply

22
Lessons Learned (3) - Cham
  • OK for matrix and mult-free applications
  • Notionally adds 6 bits in forward transform
  • Needs truncation for higher-bit data
  • Likely penalty for high-rate, high-bit sources
  • Testing on high-bit data critical
  • Is not invertible (lifting not dyadic rational)
  • Does not generalize to higher transforms
  • Satisfies 2 (of 3) transform methods

23
Relative Merits
  • General Comparisons
  • All contenders match current 32-bit performance
  • All offer reduced, nearly identical complexity
  • Unique Advantages
  • General framework to address broad range of needs
  • Very tight bit control
  • Demonstrated 16-bits output for 12-bit input, no
    truncation
  • Suitable for higher-bit data, and high rates
  • Related designs for higher sizes (8-pt, 16-pt)
  • Advantages of lifting improve further with size

24
Currently No Consensus
  • First address the low-complexity, low-rate
    problem
  • Consider high-rate problem later, probably with a
    different transform
  • If lossless is needed, probably a 3rd transform
  • Energy misdirected to date
  • Some proponents backed single transforms, assumed
    original
  • Tests for performance, complexity metrics
    inconclusive
  • Missing the bigger picture: support a wide
    variety of apps
  • Our vision: use a single framework if possible
  • Going forward: focus on our individual strengths

25
Recommendations
  • Transform and Quantization can be decoupled
  • Adopt the framework
  • Prefer downloadable filters
  • Innovate in the transforms, goal of 3 transform
    methods
  • Finalize in the reflector, adopt in May
  • Tailor transform to wide variety of applications
  • Transform Activity can work directly in
    conjunction with other groups (e.g., Trans. Size,
    ABT, Interlace, Quantization, ...)
  • Quant can focus on transform adaptation, finer
    quantization, periodicity, etc.

26
Recommendations (2)
  • Focus transforms on high-quality, high-rate
  • Low-complexity case well understood
  • High-rate apps just emerging in JVT
  • Streaming, VOD
  • Broadcast (interlaced)
  • Film
  • Storage (DVD)
  • Higher block sizes
  • Review ABT options
  • Digital Cinema -- we have data
  • Look for synergies with low-complexity case