1
Chapter 3. Source Coding
  • Recall the Purpose of Source Coding
  • Efficiently (i.e., using as few bits as possible) represent the information source output digitally
  • Can be viewed as data compression
  • Start by developing a mathematical model for
    information
  • NOTE The results of chapter 2 are used since
    the information sequence can be viewed as a
    stochastic process. This is a subtle but
    important point. If we knew what we wanted to
    transmit a priori, then there would be no need to
    have a communication system. The receiver would
    know what to expect.

2
3.1 Mathematical Models for Information Sources
  • Discrete Source: Letters are selected from an alphabet with a finite (say L) number of elements, {x_1, x_2, ..., x_L}.
  • Binary Source: Two letters in the alphabet. WLOG, the alphabet is the set {0, 1}.
  • Each letter x_k in the alphabet has a probability of occurring at any given time of p_k. That is, p_k = P(X = x_k), 1 <= k <= L
  • At each point in time, one letter of the alphabet is chosen, implying p_1 + p_2 + ... + p_L = 1
  • We will only consider two types of discrete sources
  • Memoryless: Assumes each letter is chosen independently of every other letter (past and present). Gives a discrete memoryless source (DMS)
  • Stationary: The joint probabilities of any two sequences of arbitrary length formed by shifting one sequence by any amount are equal.

3
3.1 Mathematical Models for Information Sources
  • Analog Source: An analog source is represented by a waveform x(t) that is a sample function of a stochastic process X(t)
  • Unless otherwise noted, we will assume X(t) is stationary, thus having an autocorrelation function and a power spectral density
  • When X(t) is band-limited, the signal can be represented via the sampling theorem. The sequence of samples that comes from the sampling theorem can be viewed as a discrete-time source.
  • Note that while the stochastic process generated by the sampling theorem is discrete in time, it is generally continuous in amplitude at any instant of time. Thus there is the need to quantize the values of the sequence, producing quantization error.

4
3.2 A Logarithmic Measure of Information
  • Let's develop the concept of information as a measure of how much knowing the outcome of one RV, Y, tells us about the outcome of another RV, X.
  • Let's start with two RVs, X and Y, each with a finite set of outcomes
  • We observe some outcome Y = y_j and wish to quantitatively determine the amount of information this occurrence provides about each possible outcome x_i of the RV X
  • Note the two extremes
  • If X and Y are independent, then knowledge of one provides no knowledge of the other. (Which we would like to have a measure of zero information.)
  • If X and Y are fully dependent, then knowledge of one provides absolute knowledge of the other. (Thus the measure of information should relate to just the probability of x_i.)

5
3.2 A Logarithmic Measure of Information: Mutual Information
  • A suitable measure that captures this is I(x_i; y_j) = log [ P(x_i | y_j) / P(x_i) ]
  • This is called the mutual information between x_i and y_j
  • The units of I(x_i; y_j) are determined by the base of the logarithm, which is usually either 2 or e. Base-2 units are called bits (binary units) and base-e units are called nats (natural units).
  • Note that this satisfies our intuition on a measure for information
  • Independent events: P(x_i | y_j) = P(x_i), so I(x_i; y_j) = log 1 = 0
  • Fully dependent events: P(x_i | y_j) = 1, so I(x_i; y_j) = -log P(x_i)

6
3.2 A Logarithmic Measure of Information: Self-Information
  • But note that the equation for fully dependent events is just the information of the event x_i. Thus the quantity I(x_i) = -log P(x_i) is called the self-information of the event x_i.
  • Note that a high-probability event conveys less information than a low-probability event. This may seem counter-intuitive at first, but it is exactly what we want in a measure of self-information.
  • Consider the following thought experiment. Which statement conveys more information?
  • The forecast for Phoenix, AZ for July 1st is sunny and 95º F.
  • The forecast for Phoenix, AZ for July 1st is 1 inch of snow and -5º F.
  • Note that the more shocking the statement, the less likely it is to occur, and thus the more information it conveys.
  • In fact, if the outcome is deterministic, then no information is conveyed. Hence there is no need to transmit the data.

7
3.2 A Logarithmic Measure of Information: Conditional Self-Information
  • Let's define conditional self-information as I(x_i | y_j) = -log P(x_i | y_j)
  • The reason for this is I(x_i; y_j) = log [ P(x_i | y_j) / P(x_i) ] = -log P(x_i) - ( -log P(x_i | y_j) ) = I(x_i) - I(x_i | y_j)
  • which provides a useful relationship: mutual information is self-information minus conditional self-information.
  • Note that since both I(x_i) >= 0 and I(x_i | y_j) >= 0, this implies mutual information can be positive, negative, or equal to zero.

8
3.2.1 Average Mutual Information and Entropy: Average Mutual Information
  • Mutual information was defined for a pair of events (x_i, y_j). Now we would like to look at the average value of the mutual information across all possible pairs of events. This is the definition of the expectation: I(X; Y) = sum over i and j of P(x_i, y_j) log [ P(x_i, y_j) / ( P(x_i) P(y_j) ) ] (a numeric sketch follows below)
  • Note: While the mutual information of an event pair can be negative, the average mutual information is greater than or equal to zero, and equality to zero occurs only when X and Y are statistically independent.
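A minimal MATLAB sketch of the average mutual information computation; the joint probability matrix Pxy below is illustrative and not from the slides.

Pxy = [0.30 0.10; 0.10 0.50];              % illustrative joint pmf, rows index x, columns index y
Px = sum(Pxy, 2);                          % marginal pmf of X (column vector)
Py = sum(Pxy, 1);                          % marginal pmf of Y (row vector)
terms = Pxy .* log2(Pxy ./ (Px * Py));     % P(x,y) log2[ P(x,y) / (P(x)P(y)) ]
I = sum(terms(:))                          % average mutual information in bits
% (all entries here are nonzero; zero-probability pairs contribute 0 and should be skipped)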

9
3.2.1 Average Mutual Information and Entropy: Entropy
  • Similarly, we define the average self-information as H(X) = E[ I(x_i) ] = - sum over i of P(x_i) log P(x_i)
  • Note that the average self-information is denoted by H(X), and this term is called the entropy of the source (a short numeric sketch follows below).
  • ASIDES
  • The definition of entropy (as well as all of the other definitions in this chapter) is not a function of the values that the RV takes on but rather a function of the probability distribution of the RV. This is called a functional of the distribution.
  • The reason for the use of the term entropy for the average measure of self-information is that there is a relation between this measure and the measure of entropy in thermodynamics.
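A minimal MATLAB sketch of the entropy functional; the pmf below is illustrative (the same probabilities reappear in Example 3.3-3), and any pmf could be substituted since H depends only on the probabilities.

p = [0.45 0.35 0.20];          % illustrative pmf; entropy depends only on these probabilities
H = -sum(p .* log2(p))         % entropy in bits per source letter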

10
3.2.1 Average Mutual Information and Entropy: Axiomatic Approach to Entropy
  • We have defined information from an intuitive approach. This may facilitate learning but is not rigorous. For example, are there other possible measures of information? Our approach does not allow us to explore an answer to that question. However, it is possible (and is the approach that Shannon took) to define entropy (and thus all the other information measures) axiomatically, by defining the properties that the entropy of a set of RV outcomes must satisfy. The axioms needed for a functional measure of information are based upon a symmetric function H(p_1, ..., p_L) of the probabilities
  • Normalization: H(1/2, 1/2) = 1
  • Continuity: H(p, 1 - p) is a continuous function of p
  • Grouping: H(p_1, p_2, p_3, ..., p_L) = H(p_1 + p_2, p_3, ..., p_L) + (p_1 + p_2) H( p_1/(p_1 + p_2), p_2/(p_1 + p_2) )
  • Under these axioms, the functional of entropy must be of the form H = - sum over i of p_i log p_i

11
3.2.1 Average Mutual Information and Entropy
  • Note that when the RV is distributed uniformly (P(x_i) = 1/L for all i), then H(X) = - sum over i of (1/L) log2(1/L) = log2 L
  • Also, this is the maximum value that the entropy will take on. That is, the entropy of a discrete source is a maximum when the output letters are equally probable (see the numeric check below).
  • Note that we will use the convention 0 log 0 = 0, since p log p tends to 0 as p tends to 0
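A quick MATLAB check of the claim above, using an illustrative 4-letter alphabet: the uniform pmf attains the maximum entropy log2(L).

L = 4;
p_uniform = ones(1, L) / L;               % equally probable letters
p_skewed  = [0.70 0.15 0.10 0.05];        % an arbitrary non-uniform pmf
H = @(p) -sum(p .* log2(p));              % entropy in bits
[H(p_uniform)  H(p_skewed)  log2(L)]      % the uniform case equals log2(L) = 2 bits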

12
Figure 3.2-1 Binary entropy function.
MATLAB Code: q = 0.001:0.001:0.999;
H = -q.*log2(q) - (1-q).*log2(1-q); plot(q,H)
axis square; title('Entropy of an Independent Binary Source')
xlabel('Probability q'); ylabel('Entropy H(q)')
13
3.2.1 Average Mutual Information and Entropy: Average Conditional Entropy
  • Average conditional entropy is defined as H(X | Y) = - sum over i and j of P(x_i, y_j) log P(x_i | y_j)
  • and is interpreted as the information (or uncertainty) in X after Y is observed.
  • We can easily derive a useful relationship for mutual information: I(X; Y) = H(X) - H(X | Y) (a numeric check follows below)
  • Since I(X; Y) >= 0, this implies that H(X) >= H(X | Y), with equality iff X and Y are statistically independent.
  • This can be interpreted as saying that knowledge of one RV never decreases, on average, the certainty about another (and has no effect if they are statistically independent). That is, conditioning never increases entropy.
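A quick MATLAB check, reusing the illustrative joint pmf from the earlier sketch, that the average mutual information equals H(X) - H(X|Y).

Pxy = [0.30 0.10; 0.10 0.50];                 % same illustrative joint pmf as before
Px  = sum(Pxy, 2);                            % marginal of X
HX  = -sum(Px .* log2(Px));                   % H(X)
PxGivenY = Pxy ./ sum(Pxy, 1);                % P(x|y): each column normalized by P(y)
HXgivenY = -sum(sum(Pxy .* log2(PxGivenY)));  % H(X|Y)
I = HX - HXgivenY                             % matches the direct I(X;Y) computation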

14
Figure 3.2-2 Conditional entropy for binary-input, binary-output symmetric channel.
15
Figure 3.2-3 Average mutual information for
binary-input, binary-output symmetric channel.
16
3.2.1 Average Mutual Information and Entropy: Multiple RVs
  • Generalization of entropy to multiple RVs (chain rule): H(X_1 X_2 ... X_n) = sum over i of H(X_i | X_1 ... X_(i-1))
  • Note the following visual relationship between all the quantities of average information mentioned

17
3.2.2 Information Measures for Continuous Random
Variables
  • Since information measures are functionals of the pdf, there is a straightforward extension of the information measures to continuous RVs. It is just the replacement of summations by integrations in the expectation
  • For example, the continuous-RV version of mutual information is I(X; Y) = double integral of p(x, y) log [ p(x, y) / ( p(x) p(y) ) ] dx dy

18
3.2.2 Information Measures for Continuous Random
Variables
  • Recall that the interpretation of self-information is the number of bits needed to represent an information source. For a continuous RV, the probability of any particular value occurring is zero, so the entropy becomes infinite. However, we can still define the useful quantity H(X) = - integral of p(x) log p(x) dx
  • but note this is called the differential entropy and cannot be interpreted the same way that the entropy of a discrete RV is interpreted.

19
3.2.2 Information Measures for Continuous Random
Variables
  • But the concept of differential entropy does allow us to develop a useful equation.
  • First define the average conditional (differential) entropy for a continuous RV as H(X | Y) = - double integral of p(x, y) log p(x | y) dx dy
  • Then I(X; Y) = H(X) - H(X | Y)
  • which is the same form as for discrete RVs.

20
3.3 Coding For Discrete Sources
  • We can now (finally) use the framework developed to date (i.e., stochastic processes and information measures) to develop source coding for a communication system.
  • We will measure the efficiency of the source encoder by comparing the average number of bits per letter of the code to the entropy of the source.
  • Note: The problem of coding is easy to solve if you can assume a DMS (i.e., statistically independent letters). But a DMS is rarely an accurate model of an information source.

21
3.3.1 Coding for Discrete Memoryless Sources
  • Given a DMS producing a symbol every T seconds.
  • The alphabet of the source is {x_1, x_2, ..., x_L}
  • The probability of each symbol at any given point in time is p_k = P(X = x_k), 1 <= k <= L
  • The entropy for the source comes directly from the definition: H(X) = - sum over k of p_k log2 p_k
  • And the entropy is largest when each symbol is equally probable: H(X) <= log2 L
  • Two approaches to DMS source coding
  • Fixed-length code words
  • Variable-length code words

22
3.3.1 Coding for Discrete Memoryless Sources: Fixed-Length Code Words
  • Consider a block encoding scheme which assigns a unique set of bits to each symbol. Recall we are given that there are L symbols, implying there are a minimum of

R = log2 L bits per symbol required if L is a power of 2, or
R = floor(log2 L) + 1 bits per symbol required if L is not a power of 2.

  • Example: 26 letters in the alphabet implies a fixed-length code requires at least floor(log2 26) + 1 = 5 bits per symbol.
  • The code rate is now R bits per symbol.
  • Since H(X) <= log2 L <= R, the rate is at least the entropy of the source.

23
3.3.1 Coding for Discrete Memoryless Sources: Fixed-Length Code Words
  • Efficiency: The efficiency of a coding scheme is measured as the ratio H(X) / R
  • Note: When the number of symbols is equal to a power of 2 and each symbol is equally likely to occur, the efficiency is 100%.
  • Note: When the number of symbols is not equal to a power of 2, even when the symbols are equally likely to occur, the efficiency will always be less than 100%, since R = floor(log2 L) + 1 > log2 L = H(X)
  • Thus if the number of bits needed to encode the alphabet is large (i.e., L is large, so log2 L is large relative to the at-most-one wasted bit), the efficiency of the coding is high.
  • What can we do to increase the efficiency of the source encoding if L is not large?

24
3.3.1 Coding for Discrete Memoryless Sources: Fixed-Length Code Words
  • One way to increase the coding efficiency for fixed-length codes of a DMS is to artificially increase the number of symbols in the alphabet by encoding multiple (J) symbols at a time. For this case there are L^J unique blocks to encode.
  • N bits accommodate 2^N code words
  • To ensure each of the L^J blocks is covered we must ensure 2^N >= L^J, i.e., N >= J log2 L
  • This can be done by setting N = floor(J log2 L) + 1
  • The efficiency increases since the rate per source symbol is N/J <= log2 L + 1/J, so the overhead is at most 1/J bit per symbol
  • Thus we can increase efficiency as much as possible by arbitrarily increasing J (see the numeric illustration below).
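An illustrative MATLAB check of this block-length effect, assuming L = 26 equally likely letters (so the entropy is log2 26 bits per letter); the numbers are easy to verify by hand.

L = 26;
J = 1:8;                               % number of source letters encoded per block
N = floor(J * log2(L)) + 1;            % fixed-length code bits per block of J letters
efficiency = (J * log2(L)) ./ N        % approaches 1 as the block length J grows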

25
3.3.1 Coding for Discrete Memoryless Sources: Fixed-Length Code Words
26
3.3.1 Coding for Discrete Memoryless Sources: Fixed-Length Code Words
  • If there is at least one unique code word per source symbol (or block of source symbols), then the coding is called noiseless.
  • There are times when you may not want to have one code word per symbol. Can anyone think of why this may be?
  • When there are fewer code words than source symbols (or blocks of source symbols), then rate-distortion approaches are used.
  • Consider for now the following
  • We want to reduce the code rate
  • Only the 2^N - 1 most likely of the L^J possible symbol blocks will be uniquely encoded.
  • The remaining L^J - (2^N - 1) blocks are represented by the one remaining code word
  • Thus there will be a decoding error, P_e, each time one of these blocks appears. Such an error is called a distortion.

27
3.3.1 Coding for Discrete Memoryless Sources: Fixed-Length Code Words
  • Based upon this block encoding procedure, Shannon proved the following
  • Source Coding Theorem I: Let X be the ensemble of letters from a DMS with finite entropy H(X). Blocks of J symbols from the source are encoded into code words of length N from a binary alphabet. For any epsilon > 0, the probability P_e of a block decoding failure can be made arbitrarily small if the rate R = N/J satisfies R >= H(X) + epsilon
  • and J is sufficiently large. Conversely, if R <= H(X) - epsilon
  • then P_e becomes arbitrarily close to 1 as J is made sufficiently large.
  • Proof omitted.

28
3.3.1 Coding for Discrete Memoryless Sources: Variable-Length Code Words
  • Another way to increase the source encoding efficiency when symbols are not equally likely is to use variable-length code words.
  • The approach is to minimize the number of bits used to represent highly likely symbols (or blocks of symbols) and use more bits for those symbols (or blocks of symbols) that occur infrequently.
  • This type of encoding is also called entropy coding, since the average code length is made to approach the entropy of the information source.
  • There are other constraints to consider as well
  • Code must be uniquely decodable
  • Instantaneously decodable

29
3.3.1 Coding for Discrete Memoryless Sources: Classes of Codes
Instantaneous Codes
Uniquely Decodable Codes
Non-Singular Codes
All Codes
30
3.3.1 Coding for Discrete Memoryless Sources: Variable-Length Code Example
  • Example: Consider the DMS with four symbols and associated probabilities
  • Three possible codes are given below.
  • Try to decode the sequence 001001..

31
3.3.1 Coding for Discrete Memoryless Sources: Prefix Condition and Code Trees
  • A sufficient condition for a code to be instantaneously decodable is that no code word of length l is identical to the first l bits of another code word whose length is greater than l.
  • This is known as the prefix condition.
  • Note that such codes can be visualized by code trees, where branches represent the bit values used and terminal nodes represent code words.

32
3.3.1 Coding for Discrete Memoryless Sources: Average Bits per Source Letter and Kraft Inequality
  • Define the average number of bits per source letter as R-bar = sum over k of n_k P(x_k)
  • where n_k is the length of the code word associated with source letter x_k
  • This is the quantity we would like to minimize.
  • The condition for the existence of a code that satisfies the prefix condition is given by the Kraft inequality.
  • A necessary and sufficient condition for the existence of a binary code with code words having lengths n_1 <= n_2 <= ... <= n_L that satisfy the prefix condition is

    sum over k = 1, ..., L of 2^(-n_k) <= 1

  • The effect of this inequality is that the quantities 2^(-n_k) assigned to an instantaneously decodable code must look like a probability mass function (nonnegative and summing to at most 1); a quick numeric check follows below.
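A quick MATLAB check of the Kraft inequality for an illustrative set of code word lengths (the lengths of the prefix code {0, 10, 110, 111}).

n = [1 2 3 3];                 % code word lengths of an illustrative prefix code
kraft = sum(2.^(-n))           % equals 1 here; it must be <= 1 for a prefix code to exist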

33
3.3.1 Conceptualization of the Kraft Inequality
34
3.3.1 Coding for Discrete Memoryless Sources: Source Coding Theorem II
  • Theorem: Let X be the ensemble of letters from a DMS with finite entropy H(X) and output letters x_k, 1 <= k <= L, with corresponding probabilities of occurrence p_k. It is possible to construct a code that satisfies the prefix condition and has an average length R-bar that satisfies the inequalities H(X) <= R-bar < H(X) + 1
  • Unfortunately, as is the case with many proofs associated with information theory, the proof of Source Coding Theorem II is not constructive. That is, it only proves the existence of a code that satisfies the inequalities. It does not give any insight into how to construct such a code.

35
3.3.1 Coding for Discrete Memoryless Sources: Huffman Coding Algorithm
  • Huffman (1952) developed an approach for constructing variable-length codes that is optimum in the sense that the average number of bits needed to represent the source is a minimum, subject to the constraint that the code words satisfy the prefix condition.
  • Procedure (a MATLAB sketch follows this list)
  • Order the source symbols in decreasing order of probability
  • Encode the two least probable symbols by assigning a value of 0 and 1 to them arbitrarily (or systematically).
  • Tie these two symbols together, adding their probabilities to obtain a new symbol.
  • Are all symbols accounted for?
  • No: return to step 2
  • Yes: continue
  • The symbol code is obtained by reading the tree structure developed by the above procedure.
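A minimal MATLAB sketch of the procedure above; the function name and the 0/1 assignment order are illustrative choices, not from the slides. At each merge, a 0 and a 1 are prepended to the code words of every symbol contained in the two least probable groups.

function codes = huffman_sketch(p)
% p: vector of symbol probabilities; codes: cell array of bit strings
n = numel(p);
codes = repmat({''}, 1, n);               % code word built up for each symbol
groups = num2cell(1:n);                   % each group lists the symbols merged so far
probs = p(:)';
while numel(probs) > 1
    [probs, order] = sort(probs, 'descend');
    groups = groups(order);
    for s = groups{end-1}, codes{s} = ['0' codes{s}]; end   % two least probable
    for s = groups{end},   codes{s} = ['1' codes{s}]; end   % groups get 0 and 1
    groups{end-1} = [groups{end-1} groups{end}];             % tie them together,
    probs(end-1) = probs(end-1) + probs(end);                % adding probabilities
    groups(end) = [];  probs(end) = [];
end
end
% Example: huffman_sketch([0.45 0.35 0.20]) gives code words of lengths 1, 2, 2,
% for an average length of 0.45*1 + 0.35*2 + 0.20*2 = 1.55 bits per letter.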

36
Figure 3.3-4 An example of variable-length source encoding for a DMS.
37
Figure 3.3-5 An alternative code for the DMS in Example 3.3-1.
38
Example 3.3-1 An example of variable-length
source encoding for a DMS.
39
Figure 3.3-6 Huffman code for Example 3.3-2
40
3.3.1 Coding for Discrete Memoryless Sources: Extension of Source Coding Theorem II to Blocks of Length J
  • Extending Source Coding Theorem II to blocks of J symbols gives the inequalities J H(X) <= R-bar_J < J H(X) + 1, i.e., H(X) <= R-bar_J / J < H(X) + 1/J
  • Thus, the average number of bits per source symbol can be made arbitrarily close to the source entropy by selecting a sufficiently large block length.

41
Example 3.3-3 An example of variable-length
source encoding for a DMS Using Blocks.
Single-letter probabilities for the Huffman tree: P(x1) = 0.45, P(x2) = 0.35, P(x3) = 0.20 (tree diagram in the slides).
42
Example 3.3-3 An example of variable-length
source encoding for a DMS Using Blocks.
Probabilities for blocks of two letters used in the Huffman tree: P(x1,x1) = 0.2025, P(x1,x2) = P(x2,x1) = 0.1575, P(x2,x2) = 0.1225, P(x1,x3) = P(x3,x1) = 0.09, P(x2,x3) = P(x3,x2) = 0.07, P(x3,x3) = 0.04 (the remaining values shown in the slide are merged-node probabilities from the tree diagram).
43
Example 3.3-3 An example of variable-length
source encoding for a DMS Using Blocks.
44
3.3.2 Discrete Stationary Sources
  • Remove the condition of independence from our source but keep the condition of stationarity.
  • Consider the entropy of a block of k symbols from the source, H(X_1 X_2 ... X_k)
  • Recall that joint probabilities can be factored: P(x_1, x_2, ..., x_k) = P(x_1) P(x_2 | x_1) ... P(x_k | x_1, ..., x_(k-1))
  • This leads to the entropy of a block being factored as H(X_1 X_2 ... X_k) = sum over i of H(X_i | X_1 ... X_(i-1))
  • which can be viewed as the entropy of a block of k letters

45
3.3.2 Discrete Stationary Sources
  • To get the entropy per letter for this block of k letters, divide by k, which gives H_k(X) = (1/k) H(X_1 X_2 ... X_k)
  • Since we can often assume this source will emit an indefinitely long sequence of symbols, we would like to consider the limit H_infinity(X) = lim as k goes to infinity of (1/k) H(X_1 X_2 ... X_k)
  • We can also define the entropy per letter in terms of the conditional entropy. It can be shown that this gives H_infinity(X) = lim as k goes to infinity of H(X_k | X_1 ... X_(k-1))

46
3.3.2 Discrete Stationary Sources
  • Writing Source Coding Theorem II to accommodate the joint distribution of a block of J letters gives H(X_1 ... X_J) <= R-bar_J < H(X_1 ... X_J) + 1, i.e., H_J(X) <= R-bar_J / J < H_J(X) + 1/J
  • Now, in the limit as J grows, R-bar_J / J can be made arbitrarily close to H_infinity(X)
  • Thus we can get arbitrarily close to encoding at 100% efficiency by letting the block size grow.
  • NOTE: Huffman coding is still applicable in this case.
  • NOTE: You must know the joint probabilities for the J-symbol blocks. (Which becomes more difficult as J increases.)

47
3.3.3 The Lempel-Ziv Algorithm
  • The joint probabilities needed for a block Huffman code are quite often unobtainable.
  • This provided the motivation for the development of the Lempel-Ziv algorithm. This technique is independent of the source statistics.
  • Techniques that are independent of the source statistics are called universal source codes.
  • Lempel-Ziv parses a discrete source into phrases, where a phrase is defined as a sequence of symbols not yet seen by the algorithm.
  • These phrases are then put into a dictionary which will be used to reference each phrase.

48
3.3.3 The Lempel-Ziv Algorithm: Example
  • The sequence 10101101001001110101000011001110101100011011
  • becomes the phrases 1, 0, 10, 11, 01, 00, 100, 111, 010, 1000, 011, 001, 110, 101, 10001, 1011
  • Now form a dictionary which is used to reference each phrase (a parsing sketch in MATLAB follows below)
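A minimal MATLAB sketch of the phrase-parsing step described above; only the parsing into previously unseen phrases is shown, not the dictionary-index encoding of each phrase.

s = '10101101001001110101000011001110101100011011';
dict = {};                              % phrases seen so far
phrases = {};                           % parsed phrases, in order
phrase = '';
for k = 1:length(s)
    phrase = [phrase s(k)];             % extend the current phrase by one symbol
    if ~any(strcmp(dict, phrase))       % phrase not yet in the dictionary
        dict{end+1} = phrase;           % add it
        phrases{end+1} = phrase;        % emit it as a parsed phrase
        phrase = '';                    % start a new phrase
    end
end
% phrases: 1, 0, 10, 11, 01, 00, 100, 111, 010, 1000, 011, 001, 110, 101, 10001, 1011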
49
3.3.3 The Lempel-Ziv Algorithm: Example
  • Note that in this example there are 44 bits.
  • The parse produces 16 phrases; encoding each phrase with a dictionary pointer plus one new bit uses 5 x 16 = 80 bits
  • No compression occurred here.
  • This is due to the shortness of the sequence being encoded.
  • The longer the sequence, the better the compression rate, hence the better the efficiency.
  • Lempel-Ziv encoding is the basis for .zip based data compression codes.

50
3.4 Coding For Analog Sources: Optimum Quantization
  • Now consider only information sources that are analog in nature
  • The output of the information source can be modeled as a sample function of a stochastic process.

(Block diagram in the slides: the analog information source, modeled as a stochastic process, is sampled and passed to the source encoder, producing the information sequence.)
51
3.4 Coding For Analog Sources: Optimum Quantization
  • The basic approach is to
  • Sample x(t) evenly through time to produce a sequence of samples
  • Note that if x(t) is band-limited and stationary, then sampling at or above the Nyquist rate induces no loss of information.
  • Note that each sample can still take on an infinite number of amplitudes
  • Quantize the amplitudes to limit the number of possible values. This provides a discrete source.
  • The number of bins used to quantize is based upon the number of bits per sample to be used to encode
  • The size of each bin used in quantizing is a design issue
  • If the probability of being in each bin is known, then entropy coding techniques can be used to design the coding scheme
  • Quantization induces distortion to the waveform. We need to be able to understand and measure this distortion.

52
3.4 Coding For Analog Sources: Optimum Quantization
53
3.4.1 Rate-Distortion Function
  • As mentioned earlier, quantization induces distortion (i.e., a loss of information content) in the original signal.
  • We must define a measure for distortion.
  • Many exist, most based on a difference between a sample and its quantized value
  • We will only consider the squared-error case, d(x, x~) = (x - x~)^2
  • Given a sequence of n samples, we would like to know the average distortion per letter, d_n = (1/n) times the sum over k of d(x_k, x~_k)
  • Now, since the average distortion is a function of random variables, it is itself a random variable. We define its mean as the distortion: D = E[d_n] = E[d(x_k, x~_k)] (using the stationarity assumption)
54
3.4.1 Rate-Distortion Function
  • We want to minimize the rate R (in bits) needed to encode the information source with an average distortion no greater than D. The distortion D is set based upon a level acceptable to our application.
  • This is done through the use of mutual information. (Recall that an interpretation of mutual information is how much knowledge of one random variable tells you about another, and it is measured in bits.) Thus we want the rate-distortion function R(D): the minimum of I(X; X~) over all conditional pdfs p(x~ | x) whose average distortion is at most D
  • Note that this is a function of the distortion, and it is the minimum across all conditional pdfs.
  • In general, and intuitively, the rate decreases as the acceptable distortion increases, and vice versa.

55
3.4.1 Rate-Distortion Function: Memoryless Gaussian Source
  • Restrict our interest to a continuous-amplitude, memoryless Gaussian source. Shannon proved the following for this case
  • The minimum information rate necessary to represent the output of a discrete-time, continuous-amplitude, memoryless Gaussian source based on a mean-square-error distortion measure per symbol is R_g(D) = (1/2) log2( sigma_x^2 / D ) for 0 <= D <= sigma_x^2, and R_g(D) = 0 for D > sigma_x^2
  • where sigma_x^2 is the variance of the Gaussian source output.
  • Note that this implies that no information needs to be transmitted when the acceptable distortion is greater than or equal to the variance.

56
3.4.1 Rate-Distortion Function: Memoryless Gaussian Source
D = 0.01:0.01:1; R = 0.5*log2(1./D); plot(D,R)
axis square; xlabel('D/\sigma^2')
ylabel('R_g(D) in bits/symbol')
57
3.4.1 Rate-Distortion Function: Theorem on Source Coding with a Distortion Measure
  • Theorem: There exists an encoding scheme that maps the source output into code words such that, for any given distortion D, the minimum rate R(D) in bits per symbol is sufficient to reconstruct the source output with an average distortion that is arbitrarily close to D.
  • Proof: Omitted. See Shannon (1959) or Cover and Thomas.
  • Thus the rate-distortion function provides a lower bound on the source rate for a given level of acceptable distortion.

58
3.4.1 Rate-Distortion Function: Distortion-Rate Function
  • It is also possible to write the distortion as a function of the rate. This yields a distortion-rate function.
  • Take for example the rate-distortion function for a memoryless Gaussian source. Rewriting it with the distortion as a function of the rate gives D_g(R) = 2^(-2R) sigma_x^2. (This allows you to design a system when the rate is fixed, instead of the accepted level of distortion.)
  • Expressing the distortion in decibels we have 10 log10 D_g(R) = -6.02 R + 10 log10 sigma_x^2
  • implying that each bit reduces the distortion by about 6 dB (see the quick check below)
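A quick MATLAB check of the 6 dB per bit rule, assuming a unit-variance Gaussian source.

R = 0:5;                            % rate in bits per symbol
D_dB = 10*log10(2.^(-2*R));         % distortion-rate function in dB, with sigma_x^2 = 1
diff(D_dB)                          % each additional bit lowers the distortion by about 6.02 dB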

59
3.4.1 Rate-Distortion Function: Upper and Lower Bounds
  • Development of rate-distortion functions for various pdfs is beyond the scope of this course.
  • It is useful though to bound the rate-distortion function of any discrete-time, continuous-amplitude, memoryless source. Without proof, the following inequalities are given: R*(D) <= R(D) <= R_g(D)
  • where R*(D) = h(X) - (1/2) log2(2 pi e D) is the Shannon lower bound and h(X) is the differential entropy of the source.
  • Likewise, this bound can be solved with respect to the distortion as a function of the rate. This gives D*(R) <= D(R) <= D_g(R)
  • where D*(R) = ( 2^(2 h(X)) / (2 pi e) ) 2^(-2R)
  • Note that an implication of the upper bound is that the Gaussian pdf has the largest differential entropy for a given variance.

60
3.4.2 Scalar Quantization
  • If the pdf of the signal amplitudes into the quantizer is known, then the encoding can be optimized. This is done by appropriately selecting the quantization levels such that the distortion is minimized. That is, we want to minimize the mean-square distortion between the input sample and its quantized value
  • (I don't know why he changed notation.)
  • over all possible sets of quantization bins/levels. This is also called Lloyd-Max quantization.
  • Note: if we want to use R bits, then the number of levels is L = 2^R
  • Two approaches of interest are
  • Uniform levels
  • Non-uniform levels

61
3.4.2 Scalar Quantization: Uniform Quantization Illustration
62
3.4.2 Scalar Quantization: Non-Uniform Quantization for 8-Bit Gaussian with Unit Variance
63
3.4.2 Scalar Quantization: Non-Uniform Quantization
  • For the non-uniform quantization case, we can minimize the distortion through the following analysis
  • First, write out the distortion function you want to minimize: D = sum over k of the integral from x_(k-1) to x_k of (x - xhat_k)^2 p(x) dx, where the x_k are the transition levels and the xhat_k are the reconstruction levels
  • Next, recall that a necessary condition to minimize any function is that its first derivative be equal to zero. Thus to minimize the distortion we must have dD/dx_k = 0 and dD/dxhat_k = 0

64
  • Now, recall Leibniz's rule for differentiating an integral whose limits depend on the variable: d/dtheta of the integral from a(theta) to b(theta) of f(x, theta) dx equals f(b(theta), theta) b'(theta) - f(a(theta), theta) a'(theta) plus the integral of the partial derivative of f with respect to theta
  • Thus, differentiating D with respect to the transition level x_k, only the two boundary terms survive: (x_k - xhat_k)^2 p(x_k) - (x_k - xhat_(k+1))^2 p(x_k) = 0, i.e., x_k = ( xhat_k + xhat_(k+1) ) / 2
65
3.4.2 Scalar Quantization: Non-Uniform Quantization
  • A similar analysis for the reconstruction levels, dD/dxhat_k = 0, yields xhat_k = (integral of x p(x) dx over the interval (x_(k-1), x_k)) divided by (integral of p(x) dx over the same interval)
  • Interpretation of these two conditions: each transition level is the midpoint of the two adjacent reconstruction levels, and each reconstruction level is the center of mass (centroid) of the pdf between the adjacent transition levels
66
3.4.2 Scalar Quantization: Non-Uniform Quantization
  • The big picture
  • The optimum transition levels lie halfway between the optimum reconstruction levels. In turn, the optimum reconstruction levels lie at the center of mass of the probability density in between the transition levels.
  • The two equations giving these conditions are nonlinear and must be solved simultaneously. In practice, they can be solved by an iterative scheme such as Newton's method (a simple fixed-point iteration sketch is given below).
  • Properties of the optimum mean square quantizer (proofs omitted)
  • The quantizer output is an unbiased estimate of the input
  • The quantization error is orthogonal to the quantizer output
  • It is sufficient to design mean square quantizers for zero-mean, unit-variance distributions.
  • Study Tables 3.4-2 through 3.4-6
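A minimal MATLAB sketch of the Lloyd-Max iteration for a zero-mean, unit-variance Gaussian source, alternating between the midpoint and centroid conditions above; the grid, level count, and iteration count are illustrative choices, and the integrals are approximated on the grid.

L = 8;                                       % number of reconstruction levels (3 bits)
x = linspace(-6, 6, 100000);                 % fine grid covering the support
p = exp(-x.^2/2) / sqrt(2*pi);               % unit-variance Gaussian pdf on the grid
y = linspace(-2, 2, L);                      % initial reconstruction levels
for iter = 1:200
    t = [-Inf, (y(1:end-1) + y(2:end))/2, Inf];   % transition levels: midpoints
    for k = 1:L
        in = (x > t(k)) & (x <= t(k+1));          % grid points falling in bin k
        y(k) = sum(x(in).*p(in)) / sum(p(in));    % reconstruction level: centroid
    end
end
y                                            % compare with the tabulated optimum levels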

67
Figure 3.4-2 Distortion versus rate curves for discrete-time memoryless Gaussian source.
68
3.4.3 Vector Quantization
  • Consider now quantization of a block of signal samples. This is called block or vector quantization.
  • Reasons for developing this approach include
  • Better performance (i.e., less distortion) can be obtained through quantization of blocks.
  • Can take advantage of structure between dependent samples to further reduce the average bit rate.
  • The mathematical formulation of vector quantization is as follows
  • Given
  • an n-dimensional, real-valued, continuous-amplitude vector X with components x_1, ..., x_n
  • the joint pdf associated with this vector
  • Find
  • another n-dimensional vector X~ = Q(X), modeled through a mathematical transformation Q, that minimizes the average distortion

69
Figure 3.4-3 An example of quantization in two-dimensional space.
70
3.4.3 Vector Quantization
  • The average distortion for vector quantization becomes D = E[ d(X, X~) ]
  • where the distortion is often measured as the squared Euclidean distance d(X, X~) = || X - X~ ||^2
  • or, if the data is not distributed with an identity covariance matrix, as the weighted form d(X, X~) = (X - X~)' W (X - X~)
  • where the weighting matrix W is often the inverse of the covariance matrix of the data distribution.

71
3.4.3 Vector Quantization
  • Vector quantization can be viewed as the generalization of scalar quantization to multiple dimensions. In this light, there should be little surprise in learning that there are two conditions for optimally selecting a vector quantizer. These are
  • Each input vector is assigned to the quantization cell whose output vector is closest to it
  • The vector representing a quantization cell is the centroid of that cell: it is the vector that minimizes the average distortion over the vectors falling in the cell
  • If the joint pdf is known, these two conditions can be satisfied through iterative approaches.

72
3.4.3 Vector Quantization: K-Means Algorithm
  • If the pdf of the joint distribution is not known, an estimate of the optimum quantization vectors can be formed from a set of training vectors. One approach to this is called the K-Means algorithm (a MATLAB sketch follows this list).
  • K-Means Algorithm
  • Initialize by setting the iteration number n = 0. Choose an initial set of K output vectors.
  • Classify the training vectors into K clusters by applying the nearest-neighbor rule.
  • Increment n and recompute the output vector of every cluster by computing the centroid of the training vectors that fall in each cluster. Also compute the resulting average distortion at the nth iteration.
  • Terminate the test if the change in the average distortion between iterations is relatively small. Otherwise go to step 2.
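A minimal MATLAB sketch of the K-Means steps above, assuming a squared-error distortion; X is an N-by-n matrix of training vectors, K is the desired code book size, and the initialization and stopping threshold are illustrative choices.

function [C, Davg] = kmeans_vq_sketch(X, K)
N = size(X, 1);
C = X(randperm(N, K), :);                     % step 1: initial output vectors from the training set
Dprev = Inf;
while true
    d2 = zeros(N, K);                         % step 2: nearest-neighbor classification
    for k = 1:K
        d2(:, k) = sum((X - C(k, :)).^2, 2);  % squared distance to output vector k
    end
    [dmin, idx] = min(d2, [], 2);
    for k = 1:K                               % step 3: centroid of each cluster
        if any(idx == k)
            C(k, :) = mean(X(idx == k, :), 1);
        end
    end
    Davg = mean(dmin);                        % average distortion at this iteration
    if (Dprev - Davg) < 1e-3 * Davg, break; end   % step 4: stop when the change is small
    Dprev = Davg;
end
end
% Example usage: [C, D] = kmeans_vq_sketch(randn(1000, 2), 4);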

73
3.4.3 Vector Quantization: K-Means Algorithm
  • The K-Means algorithm will converge to a local minimum.
  • The computational burden of K-Means grows exponentially as a function of the input vector dimensionality.
  • Repeating the process with different initial output vectors may provide insight into the global minimum, but at the expense of additional computational burden.
  • Sub-optimal algorithms exist which greatly mitigate the computational burden. But note that there is usually a separate memory requirement for these approaches. (Not many free lunches.)
  • The output vectors of a vector quantizer are called the code book.

74
3.5 Coding Techniques for Analog Sources
  • The previous section described techniques for optimally discretizing (quantizing) an analog information source.
  • This section investigates several techniques used in practice to encode an analog information source. These can roughly be broken into three categories
  • Temporal Waveform Coding (time domain)
  • PCM
  • DPCM
  • Adaptive PCM/DPCM
  • DM (delta modulation)
  • Spectral Waveform Coding (frequency domain)
  • SBC (subband coding)
  • ATC (adaptive transform coding)
  • Model-Based Coding (a model is assumed on the structure of the data)

75
Figure 3.5-1 Input-output characteristic for a uniform quantizer.
76
Figure 3.5-2 Input-output magnitude characteristic for a logarithmic compressor.
77
Figure 3.5-3 (a) Block diagram of a DPCM encoder. (b) DPCM decoder at the receiver.
78
Figure 3.5-4 DPCM modified by the addition of a linearly filtered error sequence.
79
Figure 3.5-5 Example of a quantizer with an adaptive step size. (Jayant, 1974.)
80
Figure 3.5-6 (a) Block diagram of a delta modulation system. (b) An equivalent realization of a delta modulation system.
81
Figure 3.5-7 An example of slope-overload distortion and granular noise in a delta modulation encoder.
82
Figure 3.5-8 An example of variable-step-size delta modulation encoding.
83
Figure 3.5-9 An example of a delta modulation system with adaptive step size.
84
Figure 3.5-10 Block diagram of a waveform
synthesizer (source decoder) for an LPC system.
85
Figure 3.5-11 Block diagram model of the
generation of a speech signal.
86
Figure 3.5-12 All-pole lattice filter for
synthesizing the speech signal.