Title: The Unseen Challenge Data Sets
1 The Unseen Challenge Data Sets
- Anderson Rocha, Walter Scheirer
- Siome Goldenstein, Terrance Boult
2 The Data Sets
- Two data sets are provided
  - PNG: lossless compression
  - JPEG: lossy compression
- Prevalence of images on the Internet
- Sources: Google Images, Yahoo Images, and Flickr
3 Message Sizes
- For each tool, we provide four different embedding sizes
  - Tiny: < 5% of the channel capacity
  - Small: > 5% and < 15% of the channel capacity
  - Medium: > 15% and < 40% of the channel capacity
  - Large: > 40% of the channel capacity
- For the PNG set, the message size is explicitly stated
- For the JPEG set, the message size is NOT stated
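The size classes above can be sketched as a small lookup. This is an illustrative helper, not part of the data set tooling: the function name and the byte-based arguments are hypothetical, and because the slide leaves the exact boundaries (5%, 15%, 40%) unassigned, the sketch assumes a boundary value falls into the larger class.

```python
def embedding_size_class(message_bytes: int, capacity_bytes: int) -> str:
    """Map a message size to the slide's size classes (hypothetical helper).

    Assumption: a message sitting exactly on a boundary (5%, 15%, 40%)
    is placed in the larger class; the slide does not say which way it goes.
    """
    if capacity_bytes <= 0:
        raise ValueError("capacity must be positive")
    if message_bytes < 0:
        raise ValueError("message size cannot be negative")
    # Integer arithmetic avoids floating-point surprises at the boundaries.
    scaled = message_bytes * 100
    if scaled < 5 * capacity_bytes:
        return "tiny"
    if scaled < 15 * capacity_bytes:
        return "small"
    if scaled < 40 * capacity_bytes:
        return "medium"
    return "large"
```

For the JPEG set, where the message size is not stated, a helper like this only applies after the size has been estimated by some other means.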
4 Message Content
- Random bit sequences
- Snippets of mp3 songs
- Plain text
- Other images
5 Categories
- Each set consists of clean and stego images
- Clean set
  - Modified: cropping, overlay, object appending
  - Non-modified: original
- Stego set
  - 4 categories for JPEG, 3 categories for PNG, one for each tool
6 Categories
- JPEG subcategories
- Stego
- Animals
- Business
- Maps
- Natural
- Tourist
- Vacation
- Clean
- Misc
7 Clean Manipulated Images
Object Appending
Image Cropping
Overlay
8 PNG Tools
- Camaleão (http://www.ic.unicamp.br/rocha/sci/stego)
  - Simple LSB insertion/modification software
  - Uses cyclic permutations and block ciphering to hide messages in LSBs
- SecurEngine (http://www.sharewareplaza.com/SecurEngine-download_4268.html)
  - Incorporates 5 crypto algorithms: Blowfish, Gost, Vernam, Cast256, and Mars
  - LSB encoding
9 PNG Tools
- Stash-It (http://www.smalleranimals.com/stash.htm)
  - Windows-based stego tool
  - Simple LSB insertion/modification software
  - No encryption feature
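All three PNG tools are described as LSB insertion tools. The sketch below shows plain sequential LSB insertion on a flat sequence of 8-bit samples, which is roughly the simplest form of the technique. It is not the code of any of the tools: the function names are hypothetical, decoding the PNG into samples is left out, and the permutation and ciphering steps mentioned for Camaleão and SecurEngine are not modeled.

```python
def embed_lsb(samples: bytes, message: bytes) -> bytes:
    """Write the message bits, MSB first, into the LSBs of the first samples."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(samples):
        raise ValueError("message does not fit in the cover")
    stego = bytearray(samples)
    for i, bit in enumerate(bits):
        # Clear the least significant bit, then set it to the message bit.
        stego[i] = (stego[i] & 0xFE) | bit
    return bytes(stego)


def extract_lsb(samples: bytes, n_bytes: int) -> bytes:
    """Read n_bytes back from the LSBs of the first 8 * n_bytes samples."""
    if 8 * n_bytes > len(samples):
        raise ValueError("not enough samples")
    out = bytearray()
    for i in range(n_bytes):
        value = 0
        for sample in samples[8 * i: 8 * i + 8]:
            value = (value << 1) | (sample & 1)
        out.append(value)
    return bytes(out)
```

Each sample changes by at most one intensity level, which is why this kind of embedding is visually imperceptible yet still leaves statistical traces.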
10 JPEG Tools
- F5 (http://www.inf.tu-dresden.de/aw4)
  - Resilient to the χ² statistical attack
  - Instead of replacing LSBs directly, F5 decreases the absolute value of the DCT coefficients
  - Chooses DCT coefficients randomly
  - Matrix embedding
- JPHide (http://linux01.gwdg.de/alatham)
  - Uses Blowfish to generate a stream of pseudo-random control bits to define bit encodings
  - Large embeddings trivial to detect
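The matrix embedding listed for F5 can be illustrated with the standard (1, 2^k - 1, k) scheme: k message bits are carried by n = 2^k - 1 cover bits while changing at most one of them. The sketch below works on a plain list of bits and flips the selected bit; it is not the F5 implementation, which applies the change by decrementing a coefficient's absolute value and has to handle coefficients that become zero. The function names are hypothetical.

```python
def _syndrome(bits):
    """XOR together the 1-based positions of all bits that are set."""
    value = 0
    for position, bit in enumerate(bits, start=1):
        if bit:
            value ^= position
    return value


def matrix_embed(cover_bits, message, k):
    """Embed a k-bit integer `message` into 2**k - 1 cover bits.

    At most one cover bit is flipped.
    """
    n = 2 ** k - 1
    if len(cover_bits) != n:
        raise ValueError("cover must hold exactly 2**k - 1 bits")
    if not 0 <= message <= n:
        raise ValueError("message must fit in k bits")
    stego = list(cover_bits)
    # The position to flip is the XOR of the current syndrome and the message;
    # zero means the cover already encodes the message.
    flip = _syndrome(stego) ^ message
    if flip:
        stego[flip - 1] ^= 1
    return stego


def matrix_extract(stego_bits):
    """Recover the embedded k-bit integer from the stego bits."""
    return _syndrome(stego_bits)
```

With k = 3, three message bits cost at most one change in seven cover bits, compared with an expected 1.5 changes for plain LSB replacement of three bits.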
11 JPEG Tools
- JSteg (http://zooid.org/paul/crypto/jsteg)
  - 40-bit RC4 encryption
  - Channel capacity determination
  - LSB encoding in quantized DCT coefficients
- Outguess (http://www.outguess.org/detection.php)
  - Preserves statistics based on frequency counts
  - Seed-based iterator available to choose embedding locations
  - Change minimization calculation for each seed
  - Remains one of the most difficult tools to detect
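The JSteg bullets on capacity determination and LSB encoding in quantized DCT coefficients can be sketched as follows. The sketch assumes the commonly described JSteg convention of skipping coefficients equal to 0 or 1, so that the set of usable coefficients is the same before and after embedding. The function names are hypothetical, the coefficients are taken as an already decoded list of integers, and the RC4 step is not modeled.

```python
def jsteg_capacity_bits(coeffs):
    """Number of message bits the coefficient list can carry.

    Assumption: coefficients equal to 0 or 1 are skipped.
    """
    return sum(1 for c in coeffs if c not in (0, 1))


def jsteg_embed(coeffs, bits):
    """Overwrite the LSB of each usable coefficient with the next message bit."""
    if len(bits) > jsteg_capacity_bits(coeffs):
        raise ValueError("message exceeds channel capacity")
    stego = list(coeffs)
    used = 0
    for i, c in enumerate(stego):
        if used == len(bits):
            break
        if c in (0, 1):
            continue  # skipped, so the extractor sees the same usable positions
        # Python integers behave like two's complement under & and |,
        # so this also works for negative coefficients.
        stego[i] = (c & ~1) | bits[used]
        used += 1
    return stego


def jsteg_extract(coeffs, n_bits):
    """Read n_bits back from the LSBs of the usable coefficients."""
    return [c & 1 for c in coeffs if c not in (0, 1)][:n_bits]
```

Because the pair (0, 1) is excluded and every other value stays inside its own pair, such as (2, 3) or (-2, -1), embedding never creates or destroys a usable position.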
12 PNG Data Set - Breakdown
4,000 total images in the PNG clean category
4,731 total images in the PNG stego category
13 PNG Data Set - Breakdown
2,993 total images in the PNG stego category
14 JPEG Data Set - Breakdown
29,185 total images in the JPEG stego category
15 JPEG Data Set - Breakdown
29,185 total images in the JPEG stego category
16 JPEG Data Set - Breakdown
4,596 total images in the JPEG stego category
17 Sample Usage: stegdetect
Detected, C: correct algorithm detected
Detected, I: incorrect algorithm detected
Overall false detect rate for the clean image set is 8.6%
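The legend and the false detect rate above can be reproduced from per-image outcomes with a tally like the one below. This is a sketch under stated assumptions: the function names and the `(truth, detected)` pair format are hypothetical, with `None` standing for "clean" or "nothing detected", and parsing real stegdetect output into such pairs is left out.

```python
def tally_detections(results):
    """Count outcomes from (truth, detected) pairs.

    truth:    tool name used for embedding, or None for a clean image
    detected: tool name reported by the detector, or None for no detection
    """
    counts = {
        "detected_correct": 0,    # "Detected, C"
        "detected_incorrect": 0,  # "Detected, I"
        "missed": 0,              # stego image, nothing reported
        "false_detect": 0,        # clean image flagged as stego
        "clean_total": 0,
    }
    for truth, detected in results:
        if truth is None:
            counts["clean_total"] += 1
            if detected is not None:
                counts["false_detect"] += 1
        elif detected is None:
            counts["missed"] += 1
        elif detected == truth:
            counts["detected_correct"] += 1
        else:
            counts["detected_incorrect"] += 1
    return counts


def false_detect_rate(counts):
    """Fraction of clean images flagged as stego (0.0 if there are none)."""
    if counts["clean_total"] == 0:
        return 0.0
    return counts["false_detect"] / counts["clean_total"]
```

Keeping "Detected, I" separate from "Detected, C" matters for this data set, since a detector may notice an embedding yet attribute it to the wrong tool.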
18 Sample Usage: stegdetect
Overall false detect rate for the clean image set is 8.0%
19 Sample Usage: stegdetect
- Detailed results for JPHide Test Set
20 Sample Usage: stegdetect
- Conclusions
  - Significant differences between the results of training and testing
  - Weaker performance overall for testing
  - Designed difficulty of testing set
  - Stegdetect performs poorly for large embeddings (non-intuitive), as well as small and tiny embeddings (expected)
21 The Unseen Challenge Data Sets
- Lossy (JPEG) and lossless (PNG) imagery
- 3 tools for PNG set, 4 tools for JPEG set
- 4 distinct embedding sizes for PNG, varying sizes for JPEG
- Clean imagery across all sets
22 The Unseen Challenge Data Sets
- Valid approaches for use
- Detection
- Detection and recovery (size or content)
- Detection and destruction
- Fusion
No standard data set exists for steg evaluation!
This set is a step in that direction!
23 Download!
- http://www.liv.ic.unicamp.br/wvu/datasets.php