Title: Least-Significant Bit Steganography and Steganalysis
1Least-Significant Bit Steganography and
Steganalysis
2What is Steganography?
- Steganography is the science of embedding
communications into other unassuming cover data.
- A subfield of data hiding.
- Cryptography is used to prevent people from
understanding secret communications.
- Steganography is used to prevent people from
knowing the secret communication even exists!
3And Steganalysis?
- Steganalysis is the countermeasure against
steganography.
- Attempts to analyze a data stream to determine
whether or not it contains hidden messages.
- More ambitiously, can attempt to actually recover
the hidden message.
- Frequently, just detecting the presence of a
hidden message is sufficient.
4Some historical examples
5Tattoo messages in Ancient Greece
- Herodotus reports that messages were tattooed
onto the shaved heads of slaves. Once the hair
grew back, the slaves were sent to the recipient,
with the message hidden in plain sight.
6DeCSS
- When the DVD copy-protection circumvention
program DeCSS was declared illegal, hackers used
clever (and frequently ironic) steganographic
techniques to continue spreading the program.
Scan of the preliminary injunction issued against
DeCSS, with the program embedded in the image's
color palette.
Image of MPAA president, Jack Valenti. The bane
of his existence is embedded in his face.
7Quick overview of Information Theory in
Steganography
8Terminology
- Cover object: the innocuous data (text, image,
audio, ...) that will carry the hidden message.
- Payload: the secret message to be embedded.
- Stego-object: the cover object after the payload
has been embedded in it.
9Known interference
- As with other forms of data hiding, the cover
object can be viewed as channel interference
known to the sender (but usually not the receiver).
- With the interference known to the encoder, some
of the available resources can be used to cancel
the interference.
- But this wastes resources and reduces the maximum
communication rate.
10Dirty Paper
- In his 1983 paper, Max Costa likened the
situation to writing on dirty paper.
- Instead of wasting resources trying to avoid
(cancel) the dirty spots, they can be
incorporated into the communications.
- Costa showed that the capacity is the same as if
the interference were not there.
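Costa's result can be written compactly. The statement below is a sketch under the standard Gaussian setting; the symbols (transmit power limit P_T, interference variance \sigma_S^2, noise variance \sigma_Z^2) are introduced here for illustration and do not appear on the slides.

```latex
% Channel: the encoder knows the interference s in advance, the decoder does not.
y = x + s + z, \qquad
\mathbb{E}[x^2] \le P_T, \quad
s \sim \mathcal{N}(0, \sigma_S^2), \quad
z \sim \mathcal{N}(0, \sigma_Z^2)

% Capacity in bits per channel use: \sigma_S^2 does not appear.
C = \frac{1}{2} \log_2\!\left(1 + \frac{P_T}{\sigma_Z^2}\right)
```

In steganographic terms, s plays the role of the cover object: it is large compared with the payload, but because the sender knows it, it need not cost any capacity.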
11Wet Paper
- Jessica Fridrich et al. expanded this metaphor to
writing on wet paper in their 2005 paper.
- Their method accommodates the issue of
perceptibility of the hidden message in the
stego-object.
- The wet spots are locations in the cover object
that cannot be changed.
- In an image, for instance, changes made to
certain pixels may cause greater visual
distortion than changes made to other pixels.
12Some Steganography Techniques
13Basic premise for Steganography
- Any time we have a choice, we have an opportunity
to encode data.
14Palette Images
- Palette images define a list (or palette) of
all the colors used in the image.
- Pixels are encoded as indices into the palette,
instead of the actual RGB values.
- We have complete freedom to arrange the palette
however we want, without changing the actual
image.
- For a 256-color palette, there are
$256! > 8 \times 10^{506}$ possible orderings.
- Equivalent to about 1,684 bits (210 bytes) of
information embedded without any visual change to
the image.
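The palette trick can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular tool: it assumes the palette colors are distinct, treats the payload as one integer, and maps it to a palette ordering with the factorial number system. The function names are hypothetical.

```python
import math

def palette_capacity_bits(n_colors):
    """Bits that can be encoded by the ordering of n distinct colors."""
    return math.log2(math.factorial(n_colors))

def embed_in_palette(palette, pixels, message_int):
    """Reorder `palette` so its order encodes `message_int`; remap `pixels`."""
    n = len(palette)
    if len(set(palette)) != n:
        raise ValueError("palette colors must be distinct")
    if not 0 <= message_int < math.factorial(n):
        raise ValueError("message does not fit in this palette")
    remaining = sorted(palette)  # canonical order both sides can rebuild
    new_palette = []
    for i in range(n, 0, -1):
        # Next factorial-base digit selects one of the remaining colors.
        digit, message_int = divmod(message_int, math.factorial(i - 1))
        new_palette.append(remaining.pop(digit))
    # Remap pixel indices so every pixel still shows the same color.
    new_index = {color: k for k, color in enumerate(new_palette)}
    new_pixels = [new_index[palette[p]] for p in pixels]
    return new_palette, new_pixels

def extract_from_palette(palette):
    """Recover the integer encoded by the palette ordering."""
    n = len(palette)
    remaining = sorted(palette)
    value = 0
    for i, color in enumerate(palette):
        digit = remaining.index(color)
        remaining.pop(digit)
        value += digit * math.factorial(n - 1 - i)
    return value
```

The receiver only needs the palette itself: sorting it reproduces the canonical order, and the position of each color within the remaining colors yields one digit of the message.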
15LSB Overwriting
- For many types of data (like non-palette images),
values can be altered slightly without much
perceptible change.
- Overwriting the least significant bit (LSB) of
all or some of the bytes in this type of data is
an effective way to embed a message.
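A minimal sketch of LSB overwriting follows. It assumes the cover is a flat sequence of bytes (for example raw 8-bit samples) and writes the message bits sequentially from the start; the helper names are illustrative only.

```python
def bytes_to_bits(data):
    """Expand bytes into a list of bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]

def bits_to_bytes(bits):
    """Pack a list of bits (MSB first) back into bytes."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)

def lsb_embed(cover, bits):
    """Overwrite the LSB of the first len(bits) cover bytes."""
    if len(bits) > len(cover):
        raise ValueError("message does not fit in cover")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | (bit & 1)  # clear LSB, then set it
    return bytes(stego)

def lsb_extract(stego, n_bits):
    """Read back the LSBs of the first n_bits bytes."""
    return [b & 1 for b in stego[:n_bits]]
```

Each cover byte changes by at most 1, which is why the change is hard to perceive. Writing the bits sequentially is the simplest choice, not the stealthiest one.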
16LSB Parity Encoding
- A variation of LSB overwriting is to convey
message bits in the LSB parity of a group of
bytes.
- Freedom to alter any byte in the group in order
to set the parity.
- Allows the steganographer to be more
discriminating about what data is changed to
minimize the perceptibility (like wet-paper
coding).
- Also causes the disturbance to the cover image to
be more randomly distributed. Generally harder to
detect than periodic disturbances.
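The parity variant can be sketched as follows. This is an illustrative implementation, assuming fixed-size consecutive groups and one message bit per group; the `pick` callback is a hypothetical hook for choosing which byte to alter, and defaults to a random choice.

```python
import random

def parity_embed(cover, bits, group_size, pick=None):
    """Encode one bit per group in the parity of the group's LSBs."""
    if len(bits) * group_size > len(cover):
        raise ValueError("message does not fit in cover")
    if pick is None:
        # Default: alter a random byte of the group.
        pick = lambda indices, data: random.choice(indices)
    stego = bytearray(cover)
    for g, bit in enumerate(bits):
        indices = list(range(g * group_size, (g + 1) * group_size))
        parity = sum(stego[i] & 1 for i in indices) % 2
        if parity != bit:
            # Flipping the LSB of any single byte fixes the parity.
            stego[pick(indices, stego)] ^= 1
    return bytes(stego)

def parity_extract(stego, n_bits, group_size):
    """Read one bit per group from the parity of its LSBs."""
    return [
        sum(b & 1 for b in stego[g * group_size:(g + 1) * group_size]) % 2
        for g in range(n_bits)
    ]
```

At most one byte per group changes, and the sender decides which one. A perceptual cost function could be passed as `pick` to steer changes toward busy regions of the image.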
17How to detect hidden messages
18The stupid way
- One way to know whether or not a data set
contains hidden information is to learn every
steganographic algorithm available, and check the
data against each one.
- Even if you had all the time in the world to try
to pull this off, messages are generally
compressed and/or encrypted before being
embedded.
- When you use a given algorithm to extract a
hidden message, it will look like random bits.
How do you know if it's a message or just garbage?
19The smarter way
- If a class of objects can be shown to share a
particular set of characteristics, and these
characteristics change after a message is
embedded into the object, then these
characteristics form the basis for a
stego-analytical investigation of objects in this
class.
20Stupid example
- A particular class of images is composed of all
those images that are a single solid color.
- After a message is embedded into such an image,
it will no longer be a solid color.
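As a toy illustration of this idea, the check below tests the one characteristic shared by the class (all pixels identical). It is a sketch only; the function name is made up, and the image is assumed to be a flat list of pixel values.

```python
def is_solid_color(pixels):
    """True if every pixel has the same value (the class characteristic)."""
    return len(set(pixels)) <= 1

cover = [128] * 16
# Overwrite the LSBs of the first four pixels with the bits 1, 0, 1, 1.
stego = [(p & 0xFE) | b for p, b in zip(cover, [1, 0, 1, 1])] + cover[4:]
```

Here `is_solid_color(cover)` holds and `is_solid_color(stego)` does not. Note the detector can still miss a message whose bits all happen to equal the cover's LSB.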
21LSB Steganalysis in natural images
22Why natural images?
- Natural images are images of things that exist in
the real world: landscapes, people, food, and so on.
- Digital photos
- Natural choice for hiding messages because the
high level of non-uniform detail makes subtle
changes difficult to perceive.
23Dumitrescu et al.
- Technique to estimate the length of messages
hidden in natural images.
- Divides the image into pairs of adjacent pixels.
- Puts each pair into one of four mutually
exclusive primary sets.
- The authors propose some assumptions about
natural images that allow them to establish some
properties regarding the size of the sets:
- Natural images are presumed to be isotropic: the
gradient in any direction is positive or negative
with equal probability.
- The sign of the gradient (of a pixel pair) is
independent of whether the second pixel in the
pair is odd or even.
24Dumitrescu et al. (cont.): Initial Sets
- P is the set of all pixel pairs (u, v).
- A pair (u, v) is even or odd based on whether v is
even or odd (respectively).
- The gradient of the pair is just u - v.
- X: all even pairs that have a negative gradient,
and all odd pairs with a positive gradient.
- Y: the opposite of X (even pairs with a positive
gradient, and odd pairs with a negative gradient).
- Z: all pairs with 0 gradient.
- $P = X \cup Y \cup Z$
25Dumitrescu et al. (cont.): Primary Sets
- Y is subdivided into V and W.
- W is the set of all the pairs from Y which have a
gradient of +/- 1.
- V is everything else from Y.
- The four primary sets are X, Z, V, and W.
- $P = X \cup Z \cup V \cup W$
- All primary sets are mutually exclusive.
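The set definitions translate directly into code. The sketch below assumes integer pixel values and pairs formed from disjoint neighbours (pixels 0 and 1, 2 and 3, ...); the pairing rule and function names are assumptions made for this example.

```python
def classify_pair(u, v):
    """Return the primary set ('X', 'V', 'W' or 'Z') of the pair (u, v)."""
    if u == v:
        return "Z"                      # zero gradient
    v_even = (v % 2 == 0)
    # X: even pair with negative gradient, or odd pair with positive gradient.
    if (v_even and u < v) or (not v_even and u > v):
        return "X"
    # Everything else is in Y; split it by gradient magnitude.
    return "W" if abs(u - v) == 1 else "V"

def count_primary_sets(pixels):
    """Count primary sets over disjoint adjacent pairs of a pixel row."""
    counts = {"X": 0, "V": 0, "W": 0, "Z": 0}
    for i in range(0, len(pixels) - 1, 2):
        counts[classify_pair(pixels[i], pixels[i + 1])] += 1
    return counts
```

For example, the pair (5, 6) is even with a negative gradient, so it falls in X, while (5, 4) is even with gradient +1 and falls in W.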
26Dumitrescu et al. (cont.): Primary Sets
- The primary sets can be expressed as the
following patterns of bit strings:
- X: (Q0, (Q+N)0), (Q1, (Q+N)0), ((Q+N)0, Q1),
((Q+N)1, Q1)
- V: ((Q+N)0, Q0), ((Q+N)1, Q0), (Q0, (Q+N)1),
(Q1, (Q+N)1)
- W: (Q1, Q0), (Q0, Q1)
- Z: (Q0, Q0), (Q1, Q1)
- Q is any string of (n-1) bits (for n-bit pixels),
consistent within each pixel pair.
- N is any integer value such that (Q+N) > Q.
- The trailing 0 and 1 are the least significant bits.
27Dumitrescu et al. (cont.): Primary Set Migration
- Under LSB manipulation, each pair will undergo
one of 4 possible mutations:
- Both pixels changed
- Neither pixel changed
- One or the other changed (two possible cases)
- Represent these as a pair of bits (00, 01, 10,
or 11), where a 1 means the corresponding pixel
changed.
28Dumitrescu et al. (cont.): Primary Set Migration
- Using the bit pattern definitions for each of the
4 primary sets, it is easy (but tedious) to see
where pixel pairs in each set will end up under
every possible mutation pattern.
29Dumitrescu et al. (cont.): Primary Set Migration
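The tedious part can be left to a short script. The sketch below enumerates every pair of small pixel values, applies each mutation pattern by flipping the LSB of the marked pixel, and records which set the pair lands in. It assumes the first bit of a pattern refers to u and the second to v; the helper names are illustrative.

```python
from itertools import product

def classify_pair(u, v):
    """Primary set of (u, v), following the set definitions."""
    if u == v:
        return "Z"
    v_even = (v % 2 == 0)
    if (v_even and u < v) or (not v_even and u > v):
        return "X"
    return "W" if abs(u - v) == 1 else "V"

def migration_table(max_value=15):
    """Map (source set, mutation pattern) to the set of destination sets."""
    table = {}
    for u, v in product(range(max_value + 1), repeat=2):
        source = classify_pair(u, v)
        for pattern in ("00", "01", "10", "11"):
            # A '1' in the pattern flips the LSB of the matching pixel.
            u2 = u ^ int(pattern[0])
            v2 = v ^ int(pattern[1])
            table.setdefault((source, pattern), set()).add(classify_pair(u2, v2))
    return table
```

Running it shows a single destination for every (set, pattern) combination: X and V swap under 01 and 11 and stay put under 00 and 10, while W and Z swap under 01 and 10 and stay put under 00 and 11.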
30Dumitrescu et al. (cont.): Effects of embedding on
cardinality
- Using these migration patterns, we can generate
expressions for the size of each set after
embedding, in terms of the sizes before embedding
and the probability of each kind of mutation.
31Dumitrescu et al. (cont.): Effects of embedding on
cardinality
- For example, the set X after embedding (written
X') will be composed of pairs originally in X that
underwent a 00 or 10 mutation, and all the pairs
originally in V that underwent a 11 or 01
mutation.
- So the size of X after embedding is given by
- $|X'| = |X|\,(P(00) + P(10)) + |V|\,(P(11) + P(01))$
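The same bookkeeping can be written for all four sets. In the sketch below only the X row comes from the slide; the V, W and Z rows are worked out here from the bit patterns (X and V exchange pairs under 01 and 11, W and Z under 01 and 10) and should be read as an illustration.

```python
def sizes_after_embedding(x, v, w, z, prob):
    """Expected set sizes after embedding.

    x, v, w, z: set sizes before embedding.
    prob: dict mapping '00', '01', '10', '11' to mutation probabilities.
    """
    stay_xv = prob["00"] + prob["10"]   # pair stays in X (or in V)
    swap_xv = prob["01"] + prob["11"]   # pair moves between X and V
    stay_wz = prob["00"] + prob["11"]   # pair stays in W (or in Z)
    swap_wz = prob["01"] + prob["10"]   # pair moves between W and Z
    return {
        "X": x * stay_xv + v * swap_xv,
        "V": v * stay_xv + x * swap_xv,
        "W": w * stay_wz + z * swap_wz,
        "Z": z * stay_wz + w * swap_wz,
    }
```

The total number of pairs is preserved, since every pair only moves between sets.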
32Dumitrescu et al. (cont.): Determining the length
of message
- Using the two assumptions about natural images,
it is easy to show that $|X| = |Y|$ for unaltered
images (no embedded message).
- One further assumption is that the altered pixels
are randomly distributed through the image.
- This allows us to express the probability of each
kind of mutation in terms of p, the ratio of
message bits to image pixels.
- Some simple arithmetic can now be used to find a
quadratic expression for p, based on the
equations for the sizes of each primary set after
embedding (primes denote sets measured after
embedding):
- $0.5\,(|W'| + |Z'|)\,p^2 + (2|X'| - |P|)\,p + |Y'| - |X'| = 0$
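The "simple arithmetic" can be spelled out. The derivation below is a sketch that assumes each pixel carries a message bit with probability p, independently of its neighbour, and that a carried bit flips the LSB half the time, so each pixel changes with probability p/2. The shorthand \gamma is introduced here for convenience.

```latex
% Mutation probabilities for a pair of independently altered pixels
P(00) = \left(1 - \tfrac{p}{2}\right)^2, \quad
P(01) = P(10) = \tfrac{p}{2}\left(1 - \tfrac{p}{2}\right), \quad
P(11) = \left(\tfrac{p}{2}\right)^2

% X and V exchange pairs under 01 and 11
|X'| = |X|\left(1 - \tfrac{p}{2}\right) + |V|\,\tfrac{p}{2}, \qquad
|V'| = |V|\left(1 - \tfrac{p}{2}\right) + |X|\,\tfrac{p}{2}

% Subtract, then use |X| = |Y| = |V| + |W|
|X'| - |V'| = (|X| - |V|)(1 - p) = |W|\,(1 - p)

% W and Z exchange pairs under 01 and 10; their union is unchanged
\gamma = |W| + |Z| = |W'| + |Z'|, \qquad
|W'| = |W|\,(1 - p)^2 + \gamma\, p \left(1 - \tfrac{p}{2}\right)

% Eliminate |W|, then use |P| = |X'| + |V'| + \gamma and |Y'| = |V'| + |W'|
\tfrac{1}{2}\,\gamma\, p^2 + (2|X'| - |P|)\,p + |Y'| - |X'| = 0
```

Every quantity in the last line can be counted directly on the image under test, which is what makes the estimate possible without access to the cover.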
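33Dumitrescu et al. (cont.): Determining the length
of message
- $0.5\,(|W'| + |Z'|)\,p^2 + (2|X'| - |P|)\,p + |Y'| - |X'| = 0$,
with all set sizes counted on the image under test.
- In most cases, this should yield two values for
p.
- The estimated length of the embedded message
(as a fraction of the number of pixels) will be
given by the smaller of the two values.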
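Solving the quadratic is a few lines of Python. The sketch below takes the set sizes counted on the image under test and returns the smaller real root; the function names and the handling of degenerate cases are choices made for this example, not part of the original method description.

```python
import math

def estimate_p(x, y, w, z, total):
    """Smaller real root of 0.5*(w+z)*p^2 + (2x - total)*p + (y - x) = 0.

    x, y, w, z: sizes of X', Y', W', Z' measured on the image under test.
    total: number of pixel pairs, |P|.
    Returns None when no real solution exists.
    """
    a = 0.5 * (w + z)
    b = 2 * x - total
    c = y - x
    if a == 0:
        # No W or Z pairs: the equation degenerates to a linear one.
        return -c / b if b != 0 else None
    disc = b * b - 4 * a * c
    if disc < 0:
        return None
    root = math.sqrt(disc)
    return min((-b - root) / (2 * a), (-b + root) / (2 * a))

def message_length_bits(p, n_pixels):
    """Convert the estimated ratio p into a message length in bits."""
    return p * n_pixels
```

With |X'| = 385, |Y'| = 440.5, |W'| = 125.5, |Z'| = 174.5 and |P| = 1000 (the expected counts for p = 0.3 starting from |X| = 400, |V| = 300, |W| = 100, |Z| = 200), the roots are 0.3 and about 1.23, and the smaller one recovers p.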
34Dumitrescu et al. (cont.): Back in reality
- This is a really clever idea using simple
measurements and calculations.
- The assumptions about natural images are not
perfectly accurate.
- A test batch of assorted natural images yielded
believable false positives.
- p values as high as 5%, which gives a message
length of 30 kB. In fact, there was no message in
the image.
35Dumitrescu et al. (cont.): Back in reality
- Additionally, the assumption about the
probability of each kind of mutation is way off
for a lot of common embedding schemes.
- For instance, if we embed a bit into the LSB of
every k-th pixel (for k > 1), then the probability
of both pixels in the pair being altered (i.e.,
P(11)) is 0.
- Same for parity encoding where the group size is
an even number (so no pixel pair spans two
groups).
- The probability of each kind of alteration is
much more complex than given in the paper, which
changes the expressions for cardinality after
embedding.
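The every-k-th-pixel case is easy to check by counting. The sketch below assumes pairs are formed from disjoint neighbours (pixels 0 and 1, 2 and 3, ...) and that carrier pixels sit at indices 0, k, 2k, ...; it counts which pixels of each pair could be altered at all. Actual flips are a subset of these, since a carrier only changes when its LSB differs from the message bit.

```python
from collections import Counter

def carrier_pattern_counts(n_pixels, k):
    """Count, per pixel pair, which of its two pixels carry a message bit."""
    carriers = set(range(0, n_pixels, k))
    counts = Counter()
    for i in range(0, n_pixels - 1, 2):
        # '1' marks a pixel that may be altered by the embedding.
        pattern = f"{int(i in carriers)}{int(i + 1 in carriers)}"
        counts[pattern] += 1
    return counts
```

For any k > 1 the pattern 11 never occurs, because two neighbouring indices cannot both be multiples of k. A model with independently chosen pixels would instead predict 11 for a fraction of roughly 1/k^2 of the pairs.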
36Conclusions
- Steganography is really cool.
- It's fun to play with, and basic embedding schemes
are easy to implement but fairly effective.
- Obviously has a lot of good and bad applications,
as with any technology.
- Steganalysis is still playing catch-up.
- Much like with cryptanalysis, early approaches
were brute-force and clumsy.
- New approaches involving statistical
classification are much more promising, but still
have a ways to go.
37The end
This image of the Mona Lisa has been embedded
into the Stego-saurus background with LSB
parity encoding, with a group size of 10 pixels.
Dinosaur image thanks to stegosaurus.org