Feature-Based Steganalysis for JPEG images and its applications for future design of steganographic schemes. - Jessica Fridrich

1
Feature-Based Steganalysis for JPEG images and
its applications for future design of
steganographic schemes.
- Jessica Fridrich

Submitted by
Praveena Gummadi
Vandana Vasudeva

2
Contents

Abstract
Previous work
Proposed Research
Experimental results
Conclusion
Comments
Acknowledgements

3
Abstract

The goal of forensic steganalysis is to detect
the presence of embedded data and, eventually, to
extract the secret message. The paper introduces
a new feature-based steganalytic method for JPEG
images. The detector is a linear classifier
trained on feature vectors computed from cover
and stego images. Each feature is the L1 norm of
the difference between a specific macroscopic
functional calculated from the stego image and
the same functional calculated from a
decompressed, cropped, and recompressed version
of the stego image. The functionals are built
from marginal and joint statistics of DCT
coefficients. Because the features are calculated
directly from DCT coefficients, conclusions can
be drawn about the impact of embedding
modifications on detectability.
Three different steganographic schemes are
tested and compared. The experimental results
reveal new facts about current steganographic
methods for JPEGs and suggest new design
principles for more secure JPEG steganography.

4
Previous work on Steganalytic methods

Chi-square attack by Westfeld: the original
version could detect sequentially embedded
messages and was based on first-order statistics.
Attacks based on a distinguishing statistic: the
steganalyst first inspects the embedding
algorithm and then identifies a quantity (the
distinguishing statistic, DS) that changes
predictably with the length of the embedded
message. For JPEG images, calibration is done by
decompressing the stego image, cropping it by a
few pixels in each direction, and recompressing
it using the same quantisation table. The DS
calculated from this calibrated image is used as
an estimate of the same quantity for the cover
image. Using this calibration, a highly accurate
and reliable estimate of the embedded message
length can be constructed for many schemes.
Blind classifiers by Memon and Farid: a blind
detector learns what a typical unmodified image
looks like in a multidimensional feature space,
and a classifier is then trained to learn the
difference between cover and stego image
features.
The introduction of blind detectors prompted
further research in steganography. Tzschoppe
constructed a JPEG steganographic scheme, HPDM
(histogram-preserving data mapping), which was
undetectable using Farid's scheme but is easily
detected using a single scalar feature, the
calibrated spatial blockiness. This suggests
calculating calibrated features directly in the
DCT domain rather than from a wavelet
decomposition.

5
Proposed Research

The paper combines the concept of calibration
with feature-based classification to devise a
blind detector specific to JPEG images.
Calibrated Features
Two types of features are used in the analysis:
first-order features and second-order features.
All features are constructed in the following
manner.
A vector functional F is applied to the stego
JPEG image J1. This functional could be the
global DCT coefficient histogram, a co-occurrence
matrix, spatial blockiness, and so on. The stego
image J1 is decompressed to the spatial domain,
cropped by 4 pixels in each direction, and then
recompressed with the same quantisation table as
J1 to obtain J2. The same vector functional F is
then applied to J2. The final feature f is
obtained as the L1 norm of the difference
$$ f = \| F(J_1) - F(J_2) \|_{L_1} $$
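
As a minimal sketch of the final step only, the snippet below evaluates the L1 feature for one choice of F, a global histogram. The names global_histogram and calibrated_feature and the toy coefficient lists are illustrative assumptions, not taken from the paper, and the decompress/crop/recompress step that would produce J2 is not modelled here.

```python
from collections import Counter

def global_histogram(blocks, lo, hi):
    """Vector functional F: how often each value in [lo, hi] occurs."""
    counts = Counter(c for block in blocks for c in block)
    return [counts[v] for v in range(lo, hi + 1)]

def calibrated_feature(functional, j1, j2):
    """f = ||F(J1) - F(J2)||_L1 for a vector-valued functional F."""
    return sum(abs(a - b) for a, b in zip(functional(j1), functional(j2)))

# Toy stand-ins (assumed data): each image is a list of blocks of
# quantized DCT coefficients.
j1 = [[0, 1, -1, 0], [2, 0, 0, 1]]   # plays the role of the stego image J1
j2 = [[0, 0, -1, 0], [1, 0, 0, 1]]   # plays the role of the calibrated image J2

f = calibrated_feature(lambda img: global_histogram(img, -2, 2), j1, j2)
print(f)
```

Passing the functional as an argument keeps the L1 step identical for every functional, which mirrors the statement that all features are constructed in the same manner.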

7

The basic logic behind this choice of features is
the following.
Cropping and recompression produce a calibrated
image with most macroscopic features similar to
those of the original cover image, because the
cropped stego image is perceptually similar to
the cover image and thus its DCT coefficients
have approximately the same statistical
properties as those of the cover image.
Cropping by 4 pixels is important because the
8 x 8 grid of the recompression then does not see
the previous JPEG compression, and thus the
obtained DCT coefficients are not influenced by
the previous quantisation in the DCT domain.
We can think of the cropped/recompressed image as
an approximation to the cover image or as side
information.
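
The effect of the 4-pixel crop on the block grid can be checked with a small sketch. It assumes that cropping removes 4 pixels on every side (one reading of "in each direction"), and the helper names crop and new_block_origins are hypothetical. The point it illustrates is that every 8 x 8 block of the recompressed image starts in the middle of an original block.

```python
def crop(pixels, margin=4):
    """Remove `margin` rows and columns from every side of a 2-D pixel list."""
    return [row[margin:-margin] for row in pixels[margin:-margin]]

def new_block_origins(height, width, margin=4, block=8):
    """Top-left corners of the new 8x8 blocks, in original-image coordinates."""
    return [(margin + r, margin + c)
            for r in range(0, height - 2 * margin - block + 1, block)
            for c in range(0, width - 2 * margin - block + 1, block)]

image = [[0] * 24 for _ in range(24)]      # assumed 24x24 toy image
cropped = crop(image)

# Position of each new block origin inside the old 8x8 grid.
offsets = {(r % 8, c % 8) for r, c in new_block_origins(24, 24)}
print(len(cropped), len(cropped[0]), offsets)
```

An offset of (4, 4) is the largest possible misalignment with the old grid, which is why 4 pixels is the natural choice.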

First order Features
The simplest first-order statistic of DCT
coefficients is their histogram. Suppose the
stego JPEG file is represented with a DCT
coefficient array $d_k(i,j)$ and a quantisation
matrix $Q(i,j)$, where $d_k(i,j)$ is the (i,j)-th
quantized DCT coefficient in the k-th of B
blocks. The global histogram of all 64B DCT
coefficients is denoted $H_r$, where
$$ r = L, \dots, R, \quad L = \min_{k,i,j} d_k(i,j), \quad R = \max_{k,i,j} d_k(i,j) $$
For a fixed DCT mode (i,j), let $h_r^{ij}$ denote
the individual histogram of the values
$d_k(i,j)$. To provide additional first-order
macroscopic statistics to the set of functionals,
the paper uses the dual histogram, given as
$$ g^{d}_{ij} = \sum_{k=1}^{B} \delta\bigl(d, d_k(i,j)\bigr) $$
where $\delta(u,v) = 1$ if $u = v$ and 0
otherwise.
The value $g^{d}_{ij}$ is the number of times the
value d occurs as the (i,j)-th DCT coefficient
over all B blocks in the JPEG image.
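
The three first-order functionals can be sketched as follows. The code is an illustration under assumptions: blocks are nested lists, indices are 0-based (the text counts modes from 1), and the toy blocks are 2 x 2 rather than 8 x 8 to keep the numbers readable. The function names are hypothetical.

```python
from collections import Counter

def global_histogram(blocks):
    """H_r for r = L..R over every coefficient of every block."""
    values = [v for block in blocks for row in block for v in row]
    counts = Counter(values)
    return {r: counts[r] for r in range(min(values), max(values) + 1)}

def mode_histogram(blocks, i, j):
    """h_r^{ij}: histogram of the (i, j)-th coefficient across blocks."""
    return Counter(block[i][j] for block in blocks)

def dual_histogram(blocks, d):
    """g^d_{ij}: number of blocks whose (i, j)-th coefficient equals d."""
    size = len(blocks[0])
    return [[sum(1 for block in blocks if block[i][j] == d)
             for j in range(size)] for i in range(size)]

# Three toy "blocks" of quantized coefficients (assumed data).
blocks = [[[3, 0], [1, 0]], [[2, 1], [0, 0]], [[3, 1], [1, -1]]]

H = global_histogram(blocks)
h00 = mode_histogram(blocks, 0, 0)
g1 = dual_histogram(blocks, 1)
print(H, dict(h00), g1)
```

The individual histogram fixes a mode and varies the value, while the dual histogram fixes a value and varies the mode, so the two are different slices of the same table of counts.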

9
Second order Features

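If the corresponding DCT coefficients from
different blocks were independent, then any
embedding scheme that preserves the first-order
statistics (the histogram) would be undetectable
by Cachin's definition of steganographic
security. Thus the paper uses features that
capture inter-block dependencies, as these are
likely to be violated by most steganographic
algorithms.
Let $I_r$ and $I_c$ denote the vectors of block
indices obtained while scanning the image by rows
and by columns, respectively. The first
functional capturing inter-block dependencies is
the variation V, defined as
$$ V = \frac{\sum_{i,j=1}^{8} \sum_{k=1}^{|I_r|-1} \bigl| d_{I_r(k)}(i,j) - d_{I_r(k+1)}(i,j) \bigr| + \sum_{i,j=1}^{8} \sum_{k=1}^{|I_c|-1} \bigl| d_{I_c(k)}(i,j) - d_{I_c(k+1)}(i,j) \bigr|}{|I_r| + |I_c|} $$
Embedding changes also increase the
discontinuities along the 8 x 8 block boundaries,
thus two blockiness measures $B_\alpha$,
$\alpha = 1, 2$, are included in the set of
functionals. The blockiness is calculated from
the decompressed JPEG image and represents an
integral measure of inter-block dependency over
all DCT modes over the whole image.
$$ B_\alpha = \frac{\sum_{i=1}^{\lfloor (M-1)/8 \rfloor} \sum_{j=1}^{N} \bigl| x_{8i,j} - x_{8i+1,j} \bigr|^{\alpha} + \sum_{j=1}^{\lfloor (N-1)/8 \rfloor} \sum_{i=1}^{M} \bigl| x_{i,8j} - x_{i,8j+1} \bigr|^{\alpha}}{N \lfloor (M-1)/8 \rfloor + M \lfloor (N-1)/8 \rfloor} $$
In the expression above, M and N are the image
dimensions and $x_{ij}$ are the grayscale values
of the decompressed JPEG image.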
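
A rough sketch of both functionals follows. It assumes the variation sums absolute coefficient differences between consecutive blocks in the row scan and the column scan, normalized by the two scan lengths, and that the blockiness sums pixel differences across every eighth row and column boundary, raised to the power alpha. The data layout (a grid of blocks, a 2-D pixel list) and the function names are illustrative choices, and indices are 0-based.

```python
def variation(grid):
    """Variation V over a grid of coefficient blocks, grid[r][c] = block."""
    rows, cols = len(grid), len(grid[0])
    by_rows = [grid[r][c] for r in range(rows) for c in range(cols)]
    by_cols = [grid[r][c] for c in range(cols) for r in range(rows)]
    total = 0
    for scan in (by_rows, by_cols):
        for a, b in zip(scan, scan[1:]):          # consecutive blocks
            total += sum(abs(u - v)
                         for ra, rb in zip(a, b) for u, v in zip(ra, rb))
    return total / (len(by_rows) + len(by_cols))

def blockiness(x, alpha):
    """Blockiness B_alpha of a decompressed image x (list of pixel rows)."""
    M, N = len(x), len(x[0])
    nr, nc = (M - 1) // 8, (N - 1) // 8
    horiz = sum(abs(x[8 * i - 1][j] - x[8 * i][j]) ** alpha
                for i in range(1, nr + 1) for j in range(N))
    vert = sum(abs(x[i][8 * j - 1] - x[i][8 * j]) ** alpha
               for j in range(1, nc + 1) for i in range(M))
    return (horiz + vert) / (N * nr + M * nc)

# Toy inputs (assumed): two 2x2 coefficient blocks side by side, and a
# 16x16 image whose lower half is brighter by 10 grey levels.
grid = [[[[1, 0], [0, 0]], [[3, 0], [0, 1]]]]
x = [[10 * (r // 8)] * 16 for r in range(16)]

V = variation(grid)
b1, b2 = blockiness(x, 1), blockiness(x, 2)
print(V, b1, b2)
```

In the toy image the only jump lies exactly on the horizontal block boundary, so all of the blockiness comes from that line and none from the vertical boundary.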
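The final three functionals are calculated from
the co-occurrence matrix C of neighboring DCT
coefficients. C is a D x D matrix,
$D = R - L + 1$, that describes the probability
distribution of pairs of neighboring DCT
coefficients and is defined as
$$ C_{st} = \frac{\sum_{k=1}^{|I_r|-1} \sum_{i,j=1}^{8} \delta\bigl(s, d_{I_r(k)}(i,j)\bigr)\, \delta\bigl(t, d_{I_r(k+1)}(i,j)\bigr) + \sum_{k=1}^{|I_c|-1} \sum_{i,j=1}^{8} \delta\bigl(s, d_{I_c(k)}(i,j)\bigr)\, \delta\bigl(t, d_{I_c(k+1)}(i,j)\bigr)}{|I_r| + |I_c|} $$
Let C(J1) and C(J2) be the co-occurrence
matrices for the JPEG images J1 and J2,
respectively. Due to the approximate symmetry of
$C_{st}$ around $(s,t) = (0,0)$, the differences
$C_{st}(J_1) - C_{st}(J_2)$ for (s,t) in
{(0,1), (1,0), (-1,0), (0,-1)} are strongly
positively correlated, and the same is true for
the group (s,t) in
{(1,1), (-1,1), (1,-1), (-1,-1)}.
The co-occurrence matrix of the embedded image
can be obtained as a convolution $C * P(q)$,
where P is the probability distribution of the
embedding distortion, which depends on the
relative message length q. The values of
$C * P(q)$ are therefore more spread out, and the
following three quantities were taken as
features to measure this spreading
$$ N_{00} = C_{0,0}(J_1) - C_{0,0}(J_2) $$
$$ N_{01} = \sum_{(s,t) \in \{(0,1),(1,0),(-1,0),(0,-1)\}} \bigl( C_{st}(J_1) - C_{st}(J_2) \bigr) $$
$$ N_{11} = \sum_{(s,t) \in \{(1,1),(-1,1),(1,-1),(-1,-1)\}} \bigl( C_{st}(J_1) - C_{st}(J_2) \bigr) $$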
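
A small sketch of the co-occurrence functional and the three spreading features is given below. It assumes C counts pairs (s, t) of same-mode coefficients in consecutive blocks of the row and column scans, normalized by the two scan lengths, and stores only the non-zero entries in a dictionary. The names cooccurrence and spread_features and the toy blocks are illustrative assumptions.

```python
from collections import Counter

def cooccurrence(grid):
    """Sparse co-occurrence matrix {(s, t): C_st} for a grid of blocks."""
    rows, cols = len(grid), len(grid[0])
    by_rows = [grid[r][c] for r in range(rows) for c in range(cols)]
    by_cols = [grid[r][c] for c in range(cols) for r in range(rows)]
    counts = Counter()
    for scan in (by_rows, by_cols):
        for a, b in zip(scan, scan[1:]):          # neighboring blocks
            for ra, rb in zip(a, b):
                for s, t in zip(ra, rb):          # same DCT mode
                    counts[(s, t)] += 1
    norm = len(by_rows) + len(by_cols)
    return {st: n / norm for st, n in counts.items()}

def spread_features(c1, c2):
    """N00, N01, N11 from the co-occurrence matrices of J1 and J2."""
    def diff(s, t):
        return c1.get((s, t), 0.0) - c2.get((s, t), 0.0)
    n00 = diff(0, 0)
    n01 = sum(diff(s, t) for s, t in [(0, 1), (1, 0), (-1, 0), (0, -1)])
    n11 = sum(diff(s, t) for s, t in [(1, 1), (-1, 1), (1, -1), (-1, -1)])
    return n00, n01, n11

# Toy data (assumed): J1 has a few +1 perturbations, J2 is all zeros.
j1 = [[[[0, 1], [0, 0]], [[0, 0], [1, 0]]]]
j2 = [[[[0, 0], [0, 0]], [[0, 0], [0, 0]]]]

n00, n01, n11 = spread_features(cooccurrence(j1), cooccurrence(j2))
print(n00, n01, n11)
```

In this toy case mass moves away from (0, 0) into its four direct neighbours, so N00 is negative and N01 is positive, which is the spreading the features are meant to capture.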

The final set of 23 functionals used in the
paper is summarized in Table 1.

13
Experiment

The paper used the Greenspun image database,
consisting of 1814 images of size 780 x 540. All
images were converted to grayscale, the black
border frame was cropped away, and the images
were compressed using 80% quality JPEG. The
paper selected three different steganographic
algorithms for JPEG images: the F5 algorithm,
OutGuess 0.2, and Model-Based Steganography
without and with deblocking (MB1 and MB2).
Each technique was analyzed separately. For a
fixed relative message length, expressed in bits
per non-zero DCT coefficient of the cover image,
a training database of embedded images was
created. The Fisher Linear Discriminant
classifier was trained on 1314 cover and stego
images. The generalized eigenvector obtained
from this training was then used to calculate the
ROC curve for the remaining 500 cover and stego
images. The detection performance was evaluated
using the detection reliability ρ, defined as
$$ \rho = 2A - 1 $$
where A is the area under the ROC (Receiver
Operating Characteristic) curve, also called the
accuracy. The accuracy is scaled so that ρ = 1
for perfect detection and ρ = 0 when the ROC
curve coincides with the diagonal line (where
the reliability of detection is 0). The
detection reliability of all three methods is
shown in Table 2.
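
The classifier and the reliability measure can be illustrated with a toy sketch. It makes several simplifying assumptions that differ from the experiment: only two features instead of 23, a handful of made-up feature vectors, no separate training and test split, and the two-class closed form w = Sw^{-1}(m_stego - m_cover) for the Fisher direction. The area under the ROC curve is computed as the fraction of (stego, cover) score pairs that are ranked correctly, with ties counted as one half.

```python
def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def scatter(points):
    """Entries (sxx, sxy, syy) of the 2x2 scatter matrix about the mean."""
    mx, my = mean(points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    syy = sum((y - my) ** 2 for _, y in points)
    return sxx, sxy, syy

def fisher_direction(cover, stego):
    """w = Sw^{-1} (m_stego - m_cover), Sw = within-class scatter."""
    a, b, d = (c + s for c, s in zip(scatter(cover), scatter(stego)))
    (cx, cy), (sx, sy) = mean(cover), mean(stego)
    dx, dy = sx - cx, sy - cy
    det = a * d - b * b
    return ((d * dx - b * dy) / det, (a * dy - b * dx) / det)

def detection_reliability(cover_scores, stego_scores):
    """rho = 2A - 1, with A the area under the ROC curve."""
    wins = 0.0
    for s in stego_scores:
        for c in cover_scores:
            wins += 1.0 if s > c else 0.5 if s == c else 0.0
    area = wins / (len(cover_scores) * len(stego_scores))
    return 2 * area - 1

# Made-up two-feature vectors for cover and stego images.
cover = [(0, 0), (1, 0), (0, 1), (1, 1)]
stego = [(2, 2), (3, 2), (2, 3), (3, 3)]

w = fisher_direction(cover, stego)
project = lambda p: w[0] * p[0] + w[1] * p[1]
rho = detection_reliability([project(p) for p in cover],
                            [project(p) for p in stego])
print(w, rho)
```

With perfectly separated classes the projected scores never overlap and ρ reaches 1, while two identical score sets give ρ = 0.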

Table 2.
Detection reliability ρ for the F5 algorithm
with matrix embedding (1, k, 2^k - 1), F5 with
matrix embedding turned off, OutGuess 0.2, and
Model-Based Steganography without and with
deblocking (MB1 and MB2) for different embedding
rates.

15

Figure 1. Capacity of the tested techniques,
expressed in bits per non-zero DCT coefficient.

16

Figure 2. ROC curves for embedding
capacities and methods from table 2.
17

Table 3. Detection reliability of individual
features for all three embedding algorithms for
fully embedded images.
18
Conclusion

From Table 2 we can see that the OutGuess
algorithm is the most detectable and also
provides the smallest capacity. Its detection
reliability is relatively high even for
embedding rates as small as 0.05 bits per
non-zero DCT coefficient (bpc), and the method
becomes highly detectable for messages above
0.1 bpc.
The F5 algorithm performs better than OutGuess.
Turning off matrix embedding makes short
messages more detectable, because matrix
embedding improves the embedding efficiency and
thereby decreases detectability.
From Table 3 it can be seen that the MB1 and MB2
methods clearly have the best performance of all
three tested algorithms. MB1 preserves not only
the global histogram but all marginal statistics
(histograms) for each individual DCT mode. MB2
has the same embedding mechanism as MB1 but
reserves one half of the capacity for
modifications that bring the blockiness of the
stego image back to its original value.

19
Comments

Looking at the results in table 1 and table 2,
there is no doubt that Model-Based Steganography
(MB1 and MB2) is by far the most secure of the
three tested paradigms. MB1 and MB2 preserve not
only the global histogram but also all
histograms of individual DCT coefficients, and
hence all dual histograms are preserved as well.
MB2 also preserves one second-order functional,
the L1 blockiness. Thus, we conclude that the
more statistical measures an embedding method
preserves, the more difficult it is to detect.
One surprising fact revealed is that preserving
a specific functional does not mean that the
calibrated feature will be preserved. Preserving
the blockiness along the original 8 x 8 grid does
not mean that the blockiness along the shifted
grid will also be preserved. This is because the
embedding and deblocking changes are likely to
introduce distortion into the middle of the
blocks and thus disturb the blockiness feature,
which is the difference between the blockiness
along the solid and dashed lines in Figure 3.

20
Figure 3
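
The remark about the shifted grid can be illustrated with a simplified sketch. It measures the mean absolute pixel jump across grid lines at a chosen offset within the 8 x 8 grid on a single toy image, which is an assumption made for illustration: the calibrated feature in the paper compares the stego image with its cropped and recompressed version, not two grids of the same image. The function name grid_blockiness is hypothetical.

```python
def grid_blockiness(x, offset=0, block=8):
    """Mean absolute jump across grid lines at rows/columns r with
    r % block == offset (offset 0 = original grid, 4 = shifted grid)."""
    M, N = len(x), len(x[0])
    rows = [r for r in range(1, M) if r % block == offset]
    cols = [c for c in range(1, N) if c % block == offset]
    jumps = [abs(x[r - 1][j] - x[r][j]) for r in rows for j in range(N)]
    jumps += [abs(x[i][c - 1] - x[i][c]) for c in cols for i in range(M)]
    return sum(jumps) / len(jumps)

# Flat 16x16 image with a distortion confined to the upper half of the
# first row of blocks, so it ends in the middle of those blocks.
x = [[108 if r < 4 else 100] * 16 for r in range(16)]

on_grid = grid_blockiness(x, offset=0)   # original 8x8 boundaries
shifted = grid_blockiness(x, offset=4)   # boundaries shifted by 4 pixels
print(on_grid, shifted)
```

The distortion leaves the original block boundaries untouched but creates a jump on the shifted grid, so a scheme that restores the first quantity can still change the second.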
21

It is further pointed out that the features
derived from the co-occurrence matrix are very
influential for all three schemes, especially
for the Model-Based steganographic methods.
MB2 is currently the only JPEG steganographic
method that takes into account inter-block
dependencies between DCT coefficients, that is,
the probability distribution of coefficient
pairs from neighboring blocks.

22
Acknowledgements

Information Hiding: 6th International Workshop,
IH 2004, Toronto, Canada, May 23-25, 2004,
Revised Selected Papers.
By Jessica Fridrich.
Published by Springer, 2004.
Determining the stego algorithm for JPEG images.
Pevny, T., Fridrich, J., Dept. of Computer
Science, State Univ. of New York, Binghamton, NY.