Title: Data Compression and Security
1. Data Compression and Security
- Chapter 20, Exploring the Digital Domain
2. Digital Compression Concepts
- Compression techniques replace a file with another that is smaller
- Decompression techniques expand the compressed file to recover the original data -- exactly or in facsimile
- A pair of compression/decompression techniques that work together is called a codec (short for compressor/decompressor)
3. Types of Codecs
- Codecs that always reproduce the original file exactly upon decompression are called lossless codecs
- Codecs that reproduce only an approximation of the original file upon decompression are called lossy codecs
- Codecs that take approximately the same amount of time to compress and decompress a file are referred to as symmetric codecs
- By contrast, codecs that feature simple, fast decompression but significantly slower compression are called asymmetric codecs
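
The lossless/lossy distinction can be sketched in a few lines of Python. The lossless half uses the standard-library `zlib` module. The lossy half is a toy quantizer written only for illustration; the function names and the `step` parameter are assumptions, not part of any real codec.

```python
import zlib

def lossless_roundtrip(data: bytes) -> bytes:
    """Compress with zlib (a lossless codec) and decompress again."""
    return zlib.decompress(zlib.compress(data))

def lossy_roundtrip(samples, step=16):
    """Toy lossy codec: keep only the quantization level of each sample.

    'step' is a hypothetical quality knob; larger steps discard more detail.
    """
    levels = [s // step for s in samples]              # "compress": coarse levels
    return [lv * step + step // 2 for lv in levels]    # "decompress": level midpoints

text = b"the quick brown fox " * 50
restored = lossless_roundtrip(text)      # identical to the original bytes

samples = [0, 3, 18, 100, 255]
approx = lossy_roundtrip(samples)        # close to, but not equal to, the samples
```

The lossless round trip returns the original bytes exactly, while the lossy round trip returns values that differ from the originals by at most half a step.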
4. Compression Encoding
- Compression is an encoding process that filters the original file in several successive stages
5. Codec Methods
- Syntactic encoding (also called entropy encoding) methods attempt to reduce the redundancy of symbolic patterns in a file without any regard to the type of information represented
- Semantic methods consider special properties of the type of information represented to reduce nonessential information in a file
- Hybrid methods combine both syntactic and semantic methods
6. Compressing Text and Numerical Data: Lossless Syntactic Methods
- Run-Length Encoding (RLE)
  - looks for runs of repeated symbols
  - widely used for fax (facsimile) transmissions
- Huffman Codes
  - exploit the frequency distribution of symbols in a source
  - adaptive Huffman coding builds its own frequency tables rather than using predefined statistics
- Lempel-Ziv-Welch (LZW) compression
  - based on recognizing patterns of strings in the original file
  - fast and yields good results (50% reduction is typical)
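
The three syntactic methods above can be sketched as follows. These are minimal teaching versions, not production codecs: the Huffman function builds a static (not adaptive) code table, the LZW function assumes 8-bit characters and returns dictionary indices rather than packed bits, and all function names are illustrative.

```python
import heapq
from collections import Counter
from itertools import groupby

def rle_encode(text):
    """Run-length encoding: collapse each run into a (symbol, count) pair."""
    return [(sym, len(list(run))) for sym, run in groupby(text)]

def rle_decode(pairs):
    """Expand (symbol, count) pairs back into the original string."""
    return "".join(sym * count for sym, count in pairs)

def huffman_code(text):
    """Build a static Huffman table {symbol: bitstring} from symbol counts."""
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:                       # degenerate single-symbol source
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code built so far})
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)      # the two least frequent subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

def lzw_compress(text):
    """LZW: emit dictionary indices, adding each new pattern to the dictionary."""
    table = {chr(i): i for i in range(256)}    # assumes 8-bit characters
    current, out = "", []
    for ch in text:
        if current + ch in table:
            current += ch                      # keep extending a known pattern
        else:
            out.append(table[current])
            table[current + ch] = len(table)   # remember the new pattern
            current = ch
    if current:
        out.append(table[current])
    return out
```

For example, `rle_encode("WWWWBBBWW")` yields three pairs for nine symbols, `huffman_code("aaaabbc")` gives the most frequent symbol the shortest code, and `lzw_compress("ABABABA")` emits four indices for seven characters.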
7. Compressing Images
- GIF (Graphics Interchange Format) Codec
  - employs the LZW method for lossless compression
- TIFF (Tagged Image File Format) Codec
  - lossless syntactic method
- JPEG (Joint Photographic Experts Group) Codec
  - umbrella term covering several lossy and lossless methods
  - the baseline method is the most commonly used one -- a lossy, hybrid method
8. Compressing Video
- Video compression employs both spatial and temporal compression
  - spatial techniques compress individual frames
  - temporal methods compress data in frames over time
- QuickTime and AVI (Audio Video Interleave) are two popular (and mutually incompatible) formats
9. Compressing Video: Some Additional Methods
- DVI (Digital Video Interactive)
- Motion-JPEG
- MPEG (Moving Picture Experts Group)
- The p×64 Standard (H.261)
10. Temporal Compression in Video
- Lossy strategies for eliminating redundant information between frames employ temporal compression -- also referred to as interframe compression
- Sequences of frames are considered together
  - key frames
  - difference frames
- Used in QuickTime and DVI
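
The key frame / difference frame idea can be sketched as below. Frames are modeled as flat lists of pixel values, and `key_interval` is a hypothetical parameter. This sketch is itself lossless; real interframe codecs become lossy by quantizing or discarding small differences, which is not shown here.

```python
def encode_frames(frames, key_interval=4):
    """Store every key_interval-th frame whole; store the rest as differences."""
    encoded, previous = [], None
    for i, frame in enumerate(frames):
        if i % key_interval == 0:
            encoded.append(("key", list(frame)))          # complete picture
        else:
            diff = [cur - prev for cur, prev in zip(frame, previous)]
            encoded.append(("diff", diff))                # change since last frame
        previous = frame
    return encoded

def decode_frames(encoded):
    """Rebuild each frame from a key frame plus accumulated differences."""
    frames, previous = [], None
    for kind, data in encoded:
        if kind == "key":
            frame = list(data)
        else:
            frame = [p + d for p, d in zip(previous, data)]
        frames.append(frame)
        previous = frame
    return frames

frames = [[10, 10, 10, 10], [10, 10, 11, 10], [10, 10, 12, 10]]
encoded = encode_frames(frames)
```

When little changes between frames, the difference frames are mostly zeros, which a syntactic method such as run-length encoding can then shrink further.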
11. Temporal Video Compression (cont'd)
- MPEG and related codecs employ a more complex frame-referencing method
  - intrapictures (I pictures)
  - predicted pictures (P pictures)
  - bidirectional pictures (B pictures)
12. Compressing Audio
- A widely used method is ADPCM (Adaptive Differential Pulse Code Modulation)
- ADPCM
  - lossy method
  - employs a differencing technique related to those used in video compression
  - used in DVI
- MP3 employs psychoacoustic methods
  - filters out parts of the signal most people do not hear
  - estimates how much quantization noise the signal itself will mask, and quantizes accordingly
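
The differencing idea behind ADPCM can be illustrated with plain DPCM, a simplified relative. The sketch below uses a fixed quantization step, whereas real ADPCM adapts the step size to the signal; the function names and the `step` parameter are assumptions made for illustration.

```python
def dpcm_encode(samples, step=4):
    """Encode each sample as a quantized difference from the running prediction."""
    codes, predicted = [], 0
    for s in samples:
        code = round((s - predicted) / step)   # quantized difference (the lossy part)
        codes.append(code)
        predicted += code * step               # track what the decoder will rebuild
    return codes

def dpcm_decode(codes, step=4):
    """Rebuild samples by accumulating the quantized differences."""
    out, value = [], 0
    for code in codes:
        value += code * step
        out.append(value)
    return out

samples = [0, 5, 9, 13, 12]
codes = dpcm_encode(samples)
decoded = dpcm_decode(codes)
```

Because the encoder predicts from the decoder's reconstructed value rather than the true previous sample, the quantization error does not accumulate: each decoded sample stays within half a step of the original.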
13. Encryption and Data Security
- Cryptography is the art and science of keeping messages secret
- Encryption techniques convert data into a secret code for transmission
- The process of retrieving the original message at the receiver is called decryption
14. Encryption Keys
- Keys are essential information -- usually one or more numerical parameters -- needed by encryption and/or decryption algorithms
- Encryption keys are used to encode plaintext as ciphertext
- Decryption keys are used to decode ciphertext and recover the original plaintext
- Decryption keys are sometimes discovered by brute force methods that employ computers to search large potential key spaces
15. Symmetric or Secret Key Ciphers
- Secret key ciphers use a single secret key (or set of keys) for both encryption and decryption
- The secret key must be transferred securely for secret key methods to be secure
- Data Encryption Standard (DES) is a US government sponsored secret key cipher. DES uses a 56-bit key.
- International Data Encryption Algorithm (IDEA) has been proposed to replace DES. It uses a 128-bit key.
- Longer keys make brute force discovery of the secret key more difficult
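
Why longer keys resist brute force can be seen with a little arithmetic: each extra bit doubles the key space. The sketch below estimates the average search time for several key lengths. The rate of one billion keys per second is an assumed, illustrative figure, not a measurement of any real machine.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def average_search_years(key_bits, keys_per_second=1e9):
    """Expected years to find a key by brute force (half the key space on average).

    keys_per_second is an assumed, illustrative rate.
    """
    return (2 ** key_bits / 2) / keys_per_second / SECONDS_PER_YEAR

for bits in (40, 56, 128):
    print(f"{bits:3d}-bit key: about {average_search_years(bits):.3g} years")
```

At the assumed rate, a 40-bit key falls in minutes and a 56-bit DES key in a little over a year, while a 128-bit IDEA key would take far longer than any practical time span.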
16. Asymmetric or Public Key Ciphers
- The first practical public key algorithm was published by Rivest, Shamir, and Adleman in 1977 and is known as RSA (for their last names)
- Public key ciphers employ an algorithm with two keys -- a public key and a private key
- A sender looks up the recipient's public key and uses it to encode a message
- The recipient then decodes the message with his or her private key (this private key is necessary to decode the message)
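
The public/private key exchange can be sketched with textbook RSA on deliberately tiny numbers. The primes, exponent, and message below are illustrative choices only; real RSA uses very large primes plus padding, neither of which is shown. The modular inverse via `pow(e, -1, phi)` requires Python 3.8 or later.

```python
# Toy key generation with tiny primes (insecure, for illustration only).
p, q = 61, 53
n = p * q                       # modulus shared by both keys
phi = (p - 1) * (q - 1)
e = 17                          # public exponent, chosen coprime with phi
d = pow(e, -1, phi)             # private exponent: inverse of e modulo phi

public_key, private_key = (e, n), (d, n)

def rsa_apply(message, key):
    """Raise an integer message (0 <= message < n) to the key's exponent mod n."""
    exponent, modulus = key
    return pow(message, exponent, modulus)

message = 65
ciphertext = rsa_apply(message, public_key)       # sender uses the public key
recovered = rsa_apply(ciphertext, private_key)    # recipient uses the private key
```

Anyone may know `public_key`, but only the holder of `private_key` can turn `ciphertext` back into `message`.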
17. Asymmetric or Public Key Ciphers Illustrated
18. More on Public Key Methods
- No attempt is made to keep the actual encryption and decryption algorithms secret for public key methods -- security depends only on the recipient alone knowing his or her private key
- Public key ciphers are more secure than secret key ciphers, but are not as efficient since they require longer keys and more computing in the encryption and decryption processes
- For the sake of efficiency, the message is sometimes encrypted with a secret key, and the secret key itself is communicated using public key methods -- the combination of the secret key encoded message and the public key encoded value of the secret key is called a digital envelope
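
A digital envelope can be sketched by combining a toy secret key cipher with toy RSA. Everything here is an illustrative assumption: the tiny RSA numbers, the SHA-256-based XOR keystream standing in for a real secret key cipher such as DES or IDEA, and the function names. It shows only the structure of the envelope and offers no real security.

```python
import hashlib
import secrets

# Recipient's toy RSA key pair (tiny, insecure numbers chosen for illustration).
N, E, D = 3233, 17, 2753        # (E, N) is public; D is private

def keystream_xor(data: bytes, session_key: int) -> bytes:
    """Toy secret key cipher: XOR data with a SHA-256-derived keystream.

    Applying it twice with the same key restores the data.
    """
    stream, counter = b"", 0
    while len(stream) < len(data):
        stream += hashlib.sha256(f"{session_key}:{counter}".encode()).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def seal(message: bytes):
    """Return the envelope: (public key encoded secret key, secret key ciphertext)."""
    session_key = secrets.randbelow(N - 2) + 2     # random secret key, 2 <= key < N
    wrapped_key = pow(session_key, E, N)           # only the private key unwraps it
    return wrapped_key, keystream_xor(message, session_key)

def open_envelope(wrapped_key: int, ciphertext: bytes) -> bytes:
    """Recover the secret key with the private key, then decode the message."""
    session_key = pow(wrapped_key, D, N)
    return keystream_xor(ciphertext, session_key)

wrapped, sealed = seal(b"meet at noon")
opened = open_envelope(wrapped, sealed)
```

The slow public key operation touches only the short secret key, while the fast secret key cipher handles the full message.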
19. Authentication
- The process used to verify the identity of a respondent is called authentication
- Authentication is very important for electronic commerce and other network transactions
- Authentication exploits the symmetry of public and private keys
- To authenticate that a person is who they say they are
  - send that person a nonsense message and ask them to encode it with their private key and return it to you
  - when the message is returned, if the person is who they claim to be, you should be able to recover your nonsense message using their public key (which presumably you know)
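
The challenge-and-response steps above can be sketched with the same kind of toy RSA numbers. The key values and function names are illustrative assumptions; real systems use large keys and sign a hash of the challenge with padding, which this sketch omits.

```python
import secrets

# Claimed identity's toy RSA keys (tiny and insecure; illustration only).
N, E, D = 3233, 17, 2753        # (E, N) is public; D is known only to the owner

def make_challenge():
    """Verifier: pick a random 'nonsense message' as an integer below N."""
    return secrets.randbelow(N - 2) + 2

def respond(challenge, private_exponent):
    """Respondent: encode the challenge with a private key."""
    return pow(challenge, private_exponent, N)

def verify(challenge, response):
    """Verifier: decode with the public key and compare with what was sent."""
    return pow(response, E, N) == challenge

challenge = make_challenge()
genuine = verify(challenge, respond(challenge, D))    # holder of the real private key
```

A respondent who does not hold the matching private key cannot, in general, produce a response that decodes back to the challenge. With numbers this small, a wrong key can occasionally pass by chance, which is one reason real keys are so large.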
20. Encryption and National Security
- An escrowed secret key cipher is a secret key cipher in which a trusted third party holds a copy of the secret key
- The US government's Clipper chip proposal (the Skipjack cipher, with an 80-bit key) is an example of such a cipher; DES, by contrast, has no escrowed key
- The International Traffic in Arms Regulations (ITAR) prohibit the export of secret key cipher systems with secret keys longer than 40 bits
21. Encryption and National Security (cont'd)
- Major governments can break ciphers with 40-bit or shorter keys by brute force
- Limiting secret key ciphers with longer keys is an attempt to retain the ability to break codes when this is deemed necessary for national security
- The ITAR rules have been debated for a number of years
- Public key ciphers have complicated the debate further -- and it continues
- The basic issue is privacy versus national security
22. Summary
- Compressing data means reducing the effective size of a data file for storage or transmission
- Particular paired compression/decompression methods are called codecs
- Codecs that cannot reproduce the original file exactly are called lossy methods; those that reproduce the original exactly are called lossless methods
- Text and numbers usually require lossless methods
- Image, video, and sound codecs are usually lossy
23. Summary (cont'd)
- Syntactic methods attempt to reduce the redundancy of symbolic patterns in a file without any regard to the type of information represented
- Semantic methods exploit characteristics inherent in the type of information being represented
- The use of codecs is not an exact science -- the effectiveness and suitability of any method will depend on the exact nature of the original file and the intended use for the compressed file
24. Summary (cont'd)
- With the increasing access to and ease of transmitting sensitive and confidential information come significant security risks
- Encryption techniques are used to encode messages for secure transmission
- The two primary encryption/decryption methods are
  - secret key (symmetric key) ciphers
  - public key (asymmetric key) ciphers
25. Summary (cont'd)
- Public key ciphers are more secure, but secret key ciphers are more efficient
- Public key encryption is used for authentication over computer networks
- An active national (and international) debate continues over government control and regulation of encryption/decryption methods