Title: FastVDO Unified 16-Bit Framework
- Pankaj Topiwala
- FastVDO LLC
- Columbia, MD 21046 USA
- pnt_at_fastvdo.com
- JVT-B103
In the Beginning (April 2001)
- April 2001: FastVDO showed how H.26L can be made fully 16-bit with no loss of performance (M16).
- At SB, 4 other proposals also supported 16 bits.
- Key lessons learned:
- Negligible performance or complexity difference
- Quantization is very flexible! TI/Sharp showed that it can be manipulated to make even the TML transform 16-bit.
- Quantization memory requirements can be minimized by periodicity.
- The proper focus is the transform, which can limit applications.
- Quantization can be safely decoupled.
Rapid Growth of Applications
- But the application space of H.26L is growing:
- low-rate wireless conversational (28 to 256 kb/s)
- mid-rate streaming, VOD (64 kb/s to 1 Mb/s)
- high-rate TV broadcasting (1 to 4 Mb/s)
- high-rate storage for future DVD (5 to 30 Mb/s)
- multirate digital cinema (mid-rate, visually lossless distribution, and lossless archive at 30 Mb/s, 200 Mb/s, and low Gb/s, respectively)
- ultra-high-rate HDTV (30 Mb/s to low Gb/s)
- lossless medical (similar rates)
- Entertainment applications are poised to dominate.
Desire
- One framework that fits all applications
- Reduce fragmentation of standard
- Limit proliferation of inconsistent technologies
- Improve interoperability between profiles
- Significantly improve content reuse
- FastVDO introduced such a framework
- Still inadequately understood
- Will now explain concretely
Motivation
- Coding performance similar to the DCT
- Support for a 16-bit (or smaller) architecture
- Low-complexity, multiplierless implementation (adds, right shifts)
- Invertible integer-to-integer mapping
- In-place computation
DCT
- Coding gain: 7.57 dB
- Complexity: 8 adds, 6 mults in floating point
- Integer approximation
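The 7.57 dB figure is the usual coding gain of the 4-point DCT: the ratio of the arithmetic to the geometric mean of the transform-coefficient variances, in dB. The slide does not name the source model; assuming the AR(1) model with ρ = 0.95 that the BinDCT slides cite, the short sketch below reproduces the number. The function names are illustrative only.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis vectors as rows: t[k][i]."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def ar1_covariance(n, rho):
    """Covariance of a unit-variance AR(1) source: R[i][j] = rho**|i-j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]

def coding_gain_db(t, rho):
    """Coding gain of an orthonormal transform t for an AR(1) source.

    Coefficient variances are the diagonal of T R T^T; the gain is the
    arithmetic mean over the geometric mean of those variances, in dB."""
    n = len(t)
    r = ar1_covariance(n, rho)
    variances = []
    for row in t:
        rv = [sum(r[i][j] * row[j] for j in range(n)) for i in range(n)]
        variances.append(sum(row[i] * rv[i] for i in range(n)))
    arith = sum(variances) / n
    geo = math.exp(sum(math.log(v) for v in variances) / n)
    return 10.0 * math.log10(arith / geo)

print(f"4-pt DCT coding gain: {coding_gain_db(dct_matrix(4), 0.95):.2f} dB")
```

The same measure applies to any orthonormal 4 x 4 matrix. Integer transforms such as X1 to X5 need their rows normalized first.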
Lifting Structure
[Figure: lifting structure]
Generic 4-Pt Lifting Transform
[Figure: generic 4-point lifting flow graph with parameters a, b, c, d, e, f, p, u]
Note: if all parameters (a through u) are dyadic rationals, then this transform
- is exactly invertible!
- has a mult-free definition, in multiple ways, and
- is very stingy in bit expansion!
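Exact invertibility follows from how a lifting step handles rounding. Each step adds a rounded function of the *other* signal to one signal, and the inverse subtracts the identical rounded quantity. With a dyadic coefficient k/2^m, that quantity is an integer multiply by a small constant followed by a right shift. The sketch below shows this on a pair of integers. The coefficients 3/8 and -1/2 and the floor rounding are hypothetical choices, not the topology of the figure above.

```python
def lift(x, z, num, shift):
    """Forward lifting step: x + floor(num * z / 2**shift).

    Python's >> on ints is an arithmetic (floor) shift, so this works for
    negative values too; num is a small constant (shift-and-add in HW)."""
    return x + ((num * z) >> shift)

def unlift(y, z, num, shift):
    """Inverse step: subtract exactly the integer that lift() added."""
    return y - ((num * z) >> shift)

# Hypothetical two-step ladder with dyadic coefficients 3/8 and -1/2.
def forward_pair(x0, x1):
    x1 = lift(x1, x0, 3, 3)     # x1 += floor(3 * x0 / 8)
    x0 = lift(x0, x1, -1, 1)    # x0 += floor(-x1 / 2)
    return x0, x1

def inverse_pair(y0, y1):
    y0 = unlift(y0, y1, -1, 1)  # undo the steps in reverse order
    y1 = unlift(y1, y0, 3, 3)
    return y0, y1

print(forward_pair(100, -37), inverse_pair(*forward_pair(100, -37)))
```

Because both directions compute the rounded term from the same, already-known input, rounding never breaks the round trip. A non-dyadic coefficient would force either a true multiplier or an approximation.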
Generic 4-Pt Inverse Transform
[Figure: generic 4-point inverse lifting flow graph]
Example 1: FastVDO X1 (Hadamard)
a = u = 1/2, b = c = d = e = f = p = 1.
Three equivalent implementations:
- matrix multiply
- mult-free direct (8 adds, 0 scalings)
- lifting (also mult-free: 8 adds, 2 scalings)
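The "mult-free direct" count of 8 adds and 0 scalings corresponds to the familiar two-stage butterfly. The sketch below checks that butterfly against a plain matrix multiply. It assumes sequency-ordered Hadamard rows, which the slide does not state, and it does not reconstruct the 8-add, 2-scaling lifting variant.

```python
# Assumed row ordering (sequency order) for the 4-point Hadamard matrix.
H4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, -1, 1],
      [1, -1, 1, -1]]

def matrix_multiply(m, v):
    """Reference implementation: plain matrix-vector product."""
    return tuple(sum(a * b for a, b in zip(row, v)) for row in m)

def hadamard4_direct(x0, x1, x2, x3):
    """Mult-free direct form: two butterfly stages, 8 additions, no scalings."""
    s0, d0 = x0 + x3, x0 - x3    # stage 1: 4 adds
    s1, d1 = x1 + x2, x1 - x2
    return (s0 + s1, d0 + d1,    # stage 2: 4 adds
            s0 - s1, d0 - d1)

def hadamard4_inverse(y):
    """H4 * H4^T = 4*I, so x = H4^T y / 4, and the division is exact."""
    h4_t = [list(col) for col in zip(*H4)]
    return tuple(v // 4 for v in matrix_multiply(h4_t, y))

x = (10, -3, 7, 255)
y = hadamard4_direct(*x)
print(y, matrix_multiply(H4, x), hadamard4_inverse(y))
```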
Example 2: FastVDO X2
a = 1/2, b = c = d = e = f = 1, u = p = -1.
Three equivalent implementations:
- matrix multiply
- mult-free direct (8 adds, 1 scaling)
- lifting (also mult-free: 8 adds, 1 scaling)
Example 3: FastVDO X3
a = 1/2, b = c = d = e = f = 1, p = -2, u = 2/5.
W. K. Cham, 1989. X3 proposed by MS, Nokia. A non-dyadic parameter (u = 2/5) means a non-invertible transform.
Three equivalent implementations:
- matrix multiply
- mult-free direct (8 adds, 2 scalings)
- lifting (but with mults!!, or with an approximated u)
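The "8 adds, 2 scalings" direct form can be illustrated under one assumption. We take X3 to be Cham's 4-point integer cosine transform with coefficients (2, 1), the matrix usually associated with the MS/Nokia proposal; the slide itself does not print the matrix. In that case the two scalings are multiplications by 2, that is, left shifts. The rows come out orthogonal but with unequal squared norms (4 and 10), and the quantizer has to absorb that difference.

```python
# Assumed X3 matrix: Cham's integer cosine transform with coefficients (2, 1).
C4 = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]

def matrix_multiply(m, v):
    """Reference implementation: plain matrix-vector product."""
    return tuple(sum(a * b for a, b in zip(row, v)) for row in m)

def cham4_direct(x0, x1, x2, x3):
    """Mult-free direct form: 8 additions and 2 left shifts (x2 scalings)."""
    e0, e3 = x0 + x3, x0 - x3    # 4 adds
    e1, e2 = x1 + x2, x1 - x2
    return (e0 + e1,             # 4 adds, 2 shifts
            (e3 << 1) + e2,
            e0 - e1,
            e3 - (e2 << 1))

def row_norms_sq(m):
    """Squared row norms; unequal values must be compensated in quantization."""
    return [sum(c * c for c in row) for row in m]

x = (10, -3, 7, 255)
print(cham4_direct(*x), matrix_multiply(C4, x), row_norms_sq(C4))
```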
Example 4: FastVDO X4
a = p = u = 1/2, b = c = d = e = f = 1.
Note: high coding gain: CG(X4) = 7.55 dB vs. CG(DCT) = 7.57 dB.
Three equivalent implementations:
- matrix multiply
- mult-free direct (9 adds, 2 scalings)
- lifting (also mult-free: 8 adds, 3 scalings)
Example 5: FastVDO X5
a = 1/2, b = c = d = e = f = 1, p = 7/16, u = 3/8.
Note: high coding gain: CG(X5) = 7.57 dB, CG(TML) = 7.57 dB, CG(DCT) = 7.57 dB.
Three equivalent implementations:
- matrix multiply
- mult-free direct (9 adds, 7 scalings)
- lifting (also mult-free: 10 adds, 5 scalings)
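X5's parameters p = 7/16 and u = 3/8 are dyadic rationals, so each scaling reduces to shifts and adds. The sketch below gives one possible decomposition of each. It does not reproduce the full X5 flow graph that produces the slide's operation counts. Floor rounding through an arithmetic right shift is an assumption; the proposal may round differently.

```python
def mul_7_16(x):
    """floor(7 * x / 16) using shifts and one subtraction: 7x = 8x - x."""
    return ((x << 3) - x) >> 4

def mul_3_8(x):
    """floor(3 * x / 8) using a shift and one addition: 3x = 2x + x."""
    return ((x << 1) + x) >> 3

for x in (-100, -1, 0, 1, 100, 255):
    print(x, mul_7_16(x), mul_3_8(x))
```

Each call costs one add or subtract plus two shifts and needs no general multiplier, which is the property that lets dyadic lifting coefficients run in a 16-bit, multiplierless datapath.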
Detailed Implementation of X5
Performance-Complexity
Dynamic Range
8 x 8 BinDCT
Coding gain: 8.77 to 8.82 dB for an AR(1) process with ρ = 0.95
16 x 16 BinDCT
Coding gain: 9.4499 dB for an AR(1) process with ρ = 0.95
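The BinDCT figures are easier to judge next to the floating-point DCT of the same size under the same AR(1) model. The sketch below computes that reference: the ratio of the arithmetic to the geometric mean of the coefficient variances, for N = 8 and N = 16 with ρ = 0.95. The helper names are illustrative only.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis vectors as rows."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)] for k in range(n)]

def coding_gain_db(t, rho):
    """10*log10(arithmetic / geometric mean of coefficient variances)
    for a unit-variance AR(1) source with correlation coefficient rho."""
    n = len(t)
    r = [[rho ** abs(i - j) for j in range(n)] for i in range(n)]
    var = [sum(row[i] * r[i][j] * row[j] for i in range(n) for j in range(n))
           for row in t]
    geo = math.exp(sum(math.log(v) for v in var) / n)
    return 10.0 * math.log10((sum(var) / n) / geo)

for n in (8, 16):
    print(f"{n}-pt DCT coding gain (rho = 0.95): "
          f"{coding_gain_db(dct_matrix(n), 0.95):.2f} dB")
```

The 8-point DCT gives roughly 8.83 dB. The 8 x 8 BinDCT range of 8.77 to 8.82 dB therefore sits within about 0.06 dB of it.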
Lessons Learned
- All transforms considered fall under our rubric (other than TML).
- No new transforms introduced in 9 months.
- Growing application list needs transform innovation.
- Quantization is very flexible:
- Innovations have in fact been made in quantization.
- Sharp/TI showed that even TML can be made 16-bit.
- Quantization can be adapted to the transform.
Lessons Learned (2): TML
- TML transform
- OK for low-complexity wireless applications using fixed, hard-wired architectures that need matrix multiply
- But its unfriendly integers are not good for ASICs
- Not optimized for bit preservation or high-rate applications
- Not invertible
- Not generalizable to larger transforms
- Satisfies only one transform method: direct matrix multiply
Lessons Learned (3): Cham
- OK for matrix and mult-free applications
- Notionally adds 6 bits in forward transform
- Needs truncation for higher-bit data
- Likely penalty for high-rate, high-bit sources
- Testing on high-bit data critical
- Not invertible (lifting coefficients are not dyadic rationals)
- Does not generalize to higher transforms
- Satisfies 2 (of 3) transform methods
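A worst-case bound shows where the "notionally adds 6 bits" figure comes from. For a separable 2-D transform Y = M X M^T, each output is at most (largest row L1 norm)² times the largest input magnitude. The sketch applies this bound to two matrices. Both are assumptions, since the slides do not print them: the Cham (2, 1) matrix for X3, and the 13/17/7 integer matrix for the H.26L TML transform. The bound ignores quantization and any intermediate rounding.

```python
import math

# Assumed TML matrix: the 13/17/7 integer transform of the H.26L test model.
TML = [[13, 13, 13, 13],
       [17, 7, -7, -17],
       [13, -13, -13, 13],
       [7, -17, 17, -7]]

# Assumed Cham (X3) matrix: integer cosine transform with coefficients (2, 1).
CHAM = [[1, 1, 1, 1],
        [2, 1, -1, -2],
        [1, -1, -1, 1],
        [1, -2, 2, -1]]

def gram(m):
    """Row inner products; a diagonal result means orthogonal rows."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in m] for r1 in m]

def worst_case_growth_bits(m):
    """Extra magnitude bits for Y = M X M^T:
    |Y[k][l]| <= L1(row k) * L1(row l) * max|X|, hence
    ceil(log2(max_row_L1 ** 2))."""
    max_l1 = max(sum(abs(c) for c in row) for row in m)
    return math.ceil(math.log2(max_l1 ** 2))

for name, m in (("TML", TML), ("Cham", CHAM)):
    norms = [gram(m)[i][i] for i in range(len(m))]
    print(name, "squared row norms:", norms,
          "worst-case growth:", worst_case_growth_bits(m), "bits")
```

Under these assumptions Cham's largest row L1 norm is 6, so 36 leads to 6 bits, which matches the slide. The TML integers give 52² = 2704, which leads to 12 bits before quantization.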
Relative Merits
- General Comparisons
- All contenders match current 32-bit performance
- All offer reduced, nearly identical complexity
- Unique Advantages
- General framework to address broad range of needs
- Very tight bit control
- Demonstrated 16-bit output for 12-bit input, with no truncation
- Suitable for higher-bit data and high rates
- Related designs for higher sizes (8-pt, 16-pt)
- Advantages of lifting improve further with size
Currently No Consensus
- First address the low-complexity, low-rate problem
- Consider the high-rate problem later, probably with a different transform
- If lossless is needed, probably a 3rd transform
- Energy misdirected to date
- Some proponents backed single transforms, assumed to be original
- Tests of performance and complexity metrics inconclusive
- Missing the bigger picture: support a wide variety of apps
- Our vision: use a single framework if possible
- Going forward: focus on our individual strengths
Recommendations
- Transform and quantization can be decoupled
- Adopt the framework
- Prefer downloadable filters
- Innovate in the transforms, with a goal of 3 transform methods
- Finalize on the reflector, adopt in May
- Tailor the transform to a wide variety of applications
- Transform activity can work directly in conjunction with other groups (e.g., Transform Size, ABT, Interlace, Quantization)
- Quantization can focus on transform adaptation, finer quantization, periodicity, etc.
Recommendations (2)
- Focus transforms on high-quality, high-rate
- Low-complexity case well understood
- High-rate apps just emerging in JVT
- Streaming, VOD
- Broadcast (interlaced)
- Film
- Storage (DVD)
- Higher block sizes
- Review ABT options
- Digital Cinema: we have data
- Look for synergies with low-complexity case