Title: Informed Audio Watermarking using Digital Chaotic Signals
1 Informed Audio Watermarking using Digital Chaotic Signals
- G.C. Silvestre, N.J. Hurley, G.S. Hanau and W.J. Dowling
- University College Dublin, Ireland
- University of Dublin, Trinity College, Ireland
2 Overview
- Introduction
- Digital Chaotic Maps
- Psychoacoustic Model
- Model Formulation
- Simulation Results
- Discussion
3 Introduction
- We present a data embedding scheme for digital audio which shows strong robustness to MPEG-1 encoding.
- Two important aspects of the model are:
  - The use of a psychoacoustic model to determine the perceptually significant components in which to embed the data and, moreover, to adaptively determine the maximum strength at which the watermark can be embedded in different parts of the signal while remaining imperceptible. The watermark strength is derived from an informed parameter which must be reliably extracted from the signal by the detector.
  - The use of chaotic maps to spread each watermark symbol over a large number of samples. This ensures robustness of detection (at the cost of a reduced embedding bit rate). Data symbols are represented so that blind detection is enabled at the decoder, and self-synchronisation is possible by exploiting the correlation properties of the chaotic maps.
4 Digital Chaotic Maps
- Digital chaotic maps are a class of deterministic dynamical systems admitting non-periodic signals characterised by a continuous, noise-like broad spectrum.
- Chaotic maps have some advantages over conventional spread spectrum:
  - Spread spectrum techniques are limited by the periodic nature of the pseudo-random sequences exploited. Chaotic maps, on the other hand, are truly aperiodic.
  - Using differential chaos shift keying (DCSK) and the autocorrelation properties of chaotic maps, a self-synchronising blind watermarking scheme can be developed.
- In this work, we represent the embedded symbols using chaotic basis functions.
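5 Digital Chaotic Maps (contd)
- Given a chaotic map c(n), each watermark symbol bit is represented by a map of length T using a binary DCSK technique: the first T/2 samples carry a reference segment of c(n), and the last T/2 samples repeat that segment
  - with inverted sign for a bit value 0, and
  - with unchanged sign for a bit value 1.
- For example, c(n) might be a Bernoulli map, such as the Bernoulli shift c(n+1) = 2 c(n) mod 1.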
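A minimal sketch of binary DCSK symbol generation may help here. It assumes the construction implied by the correlation description in these slides: a chaotic reference fills the first half of the symbol, and the second half repeats it with a sign set by the bit. The function names, the seed `x0` and the slope 1.99 are illustrative assumptions, not taken from the slides. The slope is kept slightly below 2 because exact doubling shifts bits out of a binary floating-point state and collapses to zero within a few dozen iterations.

```python
import numpy as np

def chaotic_reference(length, x0=0.37, slope=1.99):
    """Zero-mean samples of the Bernoulli-type map x -> (slope * x) mod 1.

    x0 and slope are hypothetical choices made for this sketch.
    """
    x = np.empty(length)
    state = x0
    for n in range(length):
        state = (slope * state) % 1.0
        x[n] = state
    return x - x.mean()

def dcsk_symbol(bit, T, x0=0.37):
    """Binary DCSK symbol: reference half, then the same half signed by the bit."""
    half = T // 2
    ref = chaotic_reference(half, x0)
    sign = 1.0 if bit == 1 else -1.0
    return np.concatenate([ref, sign * ref])

T = 600
one = dcsk_symbol(1, T)
zero = dcsk_symbol(0, T)

# Correlating the two halves recovers the sign, and hence the bit.
r_one = float(np.dot(one[:T // 2], one[T // 2:]))
r_zero = float(np.dot(zero[:T // 2], zero[T // 2:]))
```

Here `r_one` is the energy of the reference half and `r_zero` is its negative, which is the property the detector relies on.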
6 Detection
- Since chaotic maps are aperiodic, there is no symbol duration T over which the chaotic map will have constant energy. Instead, the energy may be considered as a stochastic variable, centred at some mean value.
- It can be shown that the standard deviation of the energy scales approximately as 1/(BW T), where BW is the statistical bandwidth of the signal. Hence, by choosing T large enough, we can ensure that the energy is almost constant over the symbol period.
- Once the chaotic map is extracted from the watermarked signal, the data symbols can be retrieved by performing the autocorrelation R between the first and second halves of T consecutive samples.
- If a data symbol has been embedded, then this correlation peaks at offsets that are integer multiples of T.
- Hence content is considered watermarked if, for some threshold Th > 0, |R| > Th.
- If R is negative, then a 0 symbol is retrieved; if it is positive, then a 1 is retrieved.
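The decision rule can be sketched as follows, assuming the detector correlates the first half of a T-sample segment with its second half and compares the magnitude with a threshold. The function names and the toy reference vector are hypothetical stand-ins, and the threshold value is chosen only for illustration.

```python
import numpy as np

def half_correlation(segment):
    """Correlate the first half of a segment with its second half."""
    half = len(segment) // 2
    return float(np.dot(segment[:half], segment[half:2 * half]))

def detect_symbol(segment, threshold):
    """Return 1 or 0 when |R| exceeds the threshold, otherwise None."""
    r = half_correlation(segment)
    if abs(r) <= threshold:
        return None  # treated as not watermarked
    return 1 if r > 0 else 0

# Toy zero-sum reference standing in for an extracted chaotic segment.
ref = np.array([1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0, -1.0])
symbol_one = np.concatenate([ref, ref])           # R = +8
symbol_zero = np.concatenate([ref, -ref])         # R = -8
unmarked = np.concatenate([ref, np.ones(8)])      # R = 0

decisions = [detect_symbol(s, threshold=4.0)
             for s in (symbol_one, symbol_zero, unmarked)]
```

In practice the threshold would trade false alarms against missed detections; the value 4.0 here is simply half the reference energy of the toy example.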
7 Correlation Property of Digital Map
- Correlation of Digital Map Retrieved at the Decoder, with T = 600, in the case of no noise and in the case of WNR = -6 dB.
- T samples are chosen from offset k and the first half is correlated with the second half. The correlation peaks at values of k = mT, where m is an integer. A positive correlation implies that a symbol 1 is embedded, while a negative correlation implies that a 0 is embedded.
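The sliding correlation described above can be sketched as below, assuming each symbol uses a fresh segment of the chaotic sequence so that misaligned windows correlate weakly. The map (slope 1.99, seed 0.37), the function names and the bit pattern are illustrative assumptions, and the example is noise-free.

```python
import numpy as np

def chaotic_stream(length, x0=0.37, slope=1.99):
    """Samples of the Bernoulli-type map x -> (slope * x) mod 1 (hypothetical parameters)."""
    x = np.empty(length)
    state = x0
    for n in range(length):
        state = (slope * state) % 1.0
        x[n] = state
    return x

def dcsk_stream(bits, T):
    """Concatenate DCSK symbols, each built from its own zero-mean chaotic segment."""
    half = T // 2
    chaos = chaotic_stream(half * len(bits))
    parts = []
    for i, bit in enumerate(bits):
        ref = chaos[i * half:(i + 1) * half]
        ref = ref - ref.mean()
        parts += [ref, ref if bit == 1 else -ref]
    return np.concatenate(parts)

def sliding_correlation(signal, T):
    """R[k]: first half of the T samples starting at offset k, dotted with the second half."""
    half = T // 2
    return np.array([np.dot(signal[k:k + half], signal[k + half:k + T])
                     for k in range(len(signal) - T + 1)])

T = 600
bits = [1, 0, 0, 1]
R = sliding_correlation(dcsk_stream(bits, T), T)

# Read the sign of the correlation at the aligned offsets k = m * T.
decoded = [1 if R[m * T] > 0 else 0 for m in range(len(bits))]
```

Plotting `R` against the offset should show peaks of alternating sign at multiples of T, which a decoder could use to find symbol boundaries without a separate synchronisation signal. Adding noise to the stream before correlating is a simple way to explore how the peaks degrade.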
8 Psychoacoustic Principles
- Inaudible embedding is achieved through the use of a psychoacoustic model which characterises the perceptual properties of the Human Auditory System (HAS).
- The HAS is modelled by a set of 25 band-pass filters whose associated critical bands represent a non-linear mapping of the frequency range, such that perception is largely similar within each band.
- For each critical band, the psychoacoustic model seeks to determine all frequency maskers and, hence, a global masking threshold corresponding to the sound pressure level below which a frequency component becomes inaudible. From this, a Signal-to-Mask Ratio (SMR) is determined and the amount of noise which can be imperceptibly introduced to the critical band is derived.
- In our watermarking scheme, this calculation is performed for each critical band of the frames in which data is embedded and is used to determine the maximum strength of the watermark in that band.
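As an illustration of the non-linear frequency mapping, the sketch below assigns each FFT bin of a frame to a critical band. It assumes the widely used Zwicker-style Bark approximation; the slides do not say which band edges the authors used, so the formula and the function names are assumptions.

```python
import numpy as np

def hz_to_bark(freq_hz):
    """Zwicker-style approximation of the critical-band rate in Bark."""
    f = np.asarray(freq_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def critical_band_index(frame_size, sample_rate):
    """0-based critical-band index for each non-negative FFT bin of a frame."""
    freqs = np.fft.rfftfreq(frame_size, d=1.0 / sample_rate)
    return np.floor(hz_to_bark(freqs)).astype(int)

bands = critical_band_index(1024, 44100)
bins_per_band = np.bincount(bands)
```

With a 1024-sample frame at 44.1 kHz this approximation yields band indices 0 to 24, that is, 25 bands. `bins_per_band` shows how unevenly the FFT bins are distributed: low bands contain only a few bins while high bands contain many.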
9 Psychoacoustic Principles
[Figure: masking threshold across Critical Band 1 and Critical Band 2, with masker1, masker2 and a maskee, and the spread of masking of masker1. Axes: SPL (dB) against Frequency (Hz).]
10 Model Formulation
- To embed a data symbol, a T-dimensional vector is extracted from a discrete-time audio signal.
- In practice, the extraction process is applied to a number N of audio frames in the Fourier domain and 25 components are extracted from each frame, one corresponding to each critical band of the frame. So T = 25 × N.
- A modulation function is applied to the extracted vector to embed T samples of a chaotic signal, resulting in a watermarked vector.
- The watermarked signal is finally given by embedding the watermarked vector back into the audio signal through an embedding process.
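The relation between the symbol length and the embedding bit rate can be worked through with a small calculation. This sketch assumes non-overlapping frames and one bit per T-sample symbol; neither is stated in the slides, and the choice of N below is purely hypothetical.

```python
def symbol_length(num_frames, bands_per_frame=25):
    """T = 25 x N: one component per critical band from each of N frames."""
    return bands_per_frame * num_frames

def embedding_bit_rate(sample_rate, frame_size, num_frames):
    """Bits per second when each symbol occupies num_frames non-overlapping frames."""
    frames_per_second = sample_rate / frame_size
    return frames_per_second / num_frames

# Hypothetical example: N = 24 frames of 512 samples at 44.1 kHz.
T = symbol_length(24)
rate = embedding_bit_rate(44100, 512, 24)
```

Under these assumptions the example gives T = 600 and a rate of roughly 3.6 bits/s, the same order as the figures quoted elsewhere in the slides. Doubling N doubles the spreading length and halves the bit rate.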
11 Model Formulation (contd)
- The modulation function is chosen such that the noise it introduces is perceptually insignificant.
- A parameter derived from the psychoacoustic model is used to adaptively determine the strength at which the watermark can be inserted in each critical band without becoming perceptible.
- For example, the modulation function may be a uniform quantiser designed so that the quantisation noise is bounded by a maximum noise determined from the parameter.
- The watermark data is modulated by dithering the quantisation process.
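The noise bound of a uniform quantiser can be illustrated with a short sketch. It assumes the step size in each band is set to twice the maximum permitted noise, so that the rounding error never exceeds that bound. The per-band values and noise limits below are made up for illustration and do not come from a psychoacoustic model.

```python
import numpy as np

def quantise_with_bound(values, max_noise):
    """Uniform quantiser per component with step 2 * max_noise, so |error| <= max_noise."""
    values = np.asarray(values, dtype=float)
    step = 2.0 * np.asarray(max_noise, dtype=float)
    return step * np.round(values / step)

# Hypothetical extracted components and per-band noise bounds.
band_values = np.array([0.83, -2.10, 5.47, 0.09])
band_max_noise = np.array([0.05, 0.20, 0.50, 0.01])

quantised = quantise_with_bound(band_values, band_max_noise)
errors = np.abs(quantised - band_values)
```

Bands that tolerate more noise receive a coarser step, which is what lets an adaptive scheme embed more strongly where masking is greater.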
12 Simulation Results
- Computer simulations are carried out using a 30 s mono audio signal sampled at 44.1 kHz.
- Simulations compare:
  - A non-adaptive scheme in which the watermark strength is constant throughout all critical bands, and
  - An adaptive scheme which varies the watermark strength according to an informed parameter derived from the psychoacoustic model.
- Results are shown for frame sizes of 512 and 1024 samples.
- The WSR for the simulations was set to 23 dB, a level at which the watermark could not be perceived in subjective listening tests.
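Setting a watermark to a target level can be sketched as below. This assumes WSR denotes the watermark-to-signal power ratio in dB (the slides do not expand the abbreviation) and that the target lies 23 dB below the signal, written here as -23 dB as in the figure captions. The host tone and the noise stand-in for the watermark are placeholders.

```python
import numpy as np

def scale_to_wsr(signal, watermark, target_wsr_db):
    """Scale the watermark so that 10*log10(P_watermark / P_signal) equals the target."""
    signal_power = np.mean(np.square(signal))
    watermark_power = np.mean(np.square(watermark))
    gain = np.sqrt(signal_power / watermark_power * 10.0 ** (target_wsr_db / 10.0))
    return gain * watermark

def measure_wsr_db(signal, watermark):
    """Watermark-to-signal power ratio in dB."""
    return 10.0 * np.log10(np.mean(np.square(watermark)) / np.mean(np.square(signal)))

rng = np.random.default_rng(0)
t = np.arange(44100) / 44100.0
host = np.sin(2.0 * np.pi * 440.0 * t)        # placeholder host signal
raw_watermark = rng.standard_normal(t.size)   # placeholder watermark
scaled = scale_to_wsr(host, raw_watermark, -23.0)
achieved = measure_wsr_db(host, scaled)
```

This global scaling corresponds to the non-adaptive case; an adaptive scheme would distribute the same overall energy unevenly across critical bands.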
13 Simulation Results
- Robustness of Adaptive Data Embedding Scheme as a function of perceptual coding attacks using an MPEG-1 algorithm for different values of the watermark bit-rate
- Frame Size 1024 Samples, WSR = -23 dB
14 Simulation Results
- Robustness of Non-Adaptive Data Embedding Scheme as a function of perceptual coding attacks using an MPEG-1 algorithm for different values of the watermark bit-rate
- Frame Size 1024 Samples, WSR = -23 dB
15 Simulation Results
- Robustness of Adaptive Data Embedding Scheme as a function of perceptual coding attacks using an MPEG-1 algorithm for different values of the watermark bit-rate
- Frame Size 512 Samples, WSR = -23 dB
16 Simulation Results
- Robustness of Non-Adaptive Data Embedding Scheme as a function of perceptual coding attacks using an MPEG-1 algorithm for different values of the watermark bit-rate
- Frame Size 512 Samples, WSR = -23 dB
17 Simulation Results
- Robustness of Data Embedding Scheme to Additive White Gaussian Noise
- Frame Size 1024 Samples, WSR = -23 dB
18 Simulation Results
- Robustness of Informed Parameter derived from Psychoacoustic Model
19 Discussion
- Good robustness to MPEG-1 encoding and AWGN attacks is observed for the scheme. It was found possible to embed a watermark signal of 3.6 bits/s which survived perceptual filtering down to 96 kbits/s without detection error.
- The advantage of the adaptive embedder is that the watermark energy is spread non-uniformly over the perceptually significant values. Hence it should be possible to sustain a larger WSR without perception than with a non-adaptive scheme. On the other hand, for good performance, the decoder must be able to reliably extract the informed parameter from the watermarked signal.
- The results indicate that the adaptive scheme outperforms the non-adaptive scheme on a frame size of 1024 samples, but on a frame size of 512 samples, the non-adaptive scheme is better. This can be attributed to a loss of accuracy in the psychoacoustic model at this frame size.
- In future work it should be possible to increase the embedding bit rate by embedding more than one bit in each critical band.
20 Dither Quantisation
- The chaotic map is digitised to m levels and centred at mean 0. Hence, it takes integer values.
- It can be embedded in the extracted vector using an m-ary dither quantisation.
[Figure: dither quantiser block diagram. Dither source: scaled chaotic map; quantiser step size D; marked intervals D/m and D.]
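A minimal sketch of m-ary dither quantisation follows, assuming the digitised chaotic level selects one of m quantiser lattices offset from each other by D/m. The function names, the host values and the use of level indices 0 to m-1 (rather than values centred at 0) are assumptions made for illustration.

```python
import numpy as np

def dither_embed(values, levels, step, m):
    """Quantise each value onto the lattice shifted by (level mod m) * step / m."""
    values = np.asarray(values, dtype=float)
    dither = np.mod(levels, m) * (step / m)
    return step * np.round((values - dither) / step) + dither

def dither_extract(values, step, m):
    """Recover the level index from the position of each value within a step."""
    residue = np.mod(np.asarray(values, dtype=float), step)
    return np.rint(residue / (step / m)).astype(int) % m

m, step = 4, 1.0
host = np.array([0.83, -2.10, 5.47, 0.09, 3.33])   # hypothetical extracted components
levels = np.array([0, 1, 2, 3, 1])                  # digitised chaotic samples (indices)

marked = dither_embed(host, levels, step, m)
recovered = dither_extract(marked, step, m)
max_error = np.max(np.abs(marked - host))
```

The embedding error stays within half a step, while the decoder can tolerate perturbations of up to D/(2m) before a level is misread, so larger m trades robustness for a finer representation of the chaotic signal.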