Title: Bishop, Chapter 9 Basic Cryptography
1Bishop, Chapter 9 (Basic Cryptography)
- First some terminology: a cryptosystem has
- an encypher function that converts plaintext into cyphertext
- a decypher function that converts cyphertext into plaintext
- of course, plaintext and cyphertext are in the eye of the beholder
- Alternative terminology is encrypts and decrypts
- some cultures find this offensive, or at least creepy
- most authors use these terms anyway, despite their best intentions
- The book also introduces the concept of a key
- the enciphering function takes plaintext and a key as input, and produces cyphertext
- the decyphering function takes cyphertext and a key as input, and produces plaintext
2Keys and Algorithms
- The notion of a key isn't really necessary to describe a cryptosystem
- an encryption function simply translates plaintext to ciphertext (think of it as a black box)
- a decryption function simply translates ciphertext to plaintext
- There is a spectrum of transparency in cryptosystems
- a black-box system, where the encryption and decryption algorithms are totally secret
- a totally transparent system, where the algorithm is completely public (and you need some additional information, like a key, to prevent unwanted readers from decrypting your information)
- the real question is what is secret and what is public
- The form of a key isn't specified either: just think of it as data of some sort (a string, some bits, ...); each algorithm will specify what the key is
3Ways of Attacking a Cryptosystem
- Attack means trying to
- learn the plaintext of a single message, given some ciphertext
- learn the general deciphering algorithm (or more specifically the key, if you know what sort of algorithm is being used)
- Ciphertext-only attack
- Attacker has the ciphertext of several messages, and knows they have been encrypted using the same algorithm. Goal is to recover the plaintext of as many messages as possible, or better still deduce the keys
- Known-plaintext attack
- Attacker has both the ciphertext and plaintext of several messages
- Chosen-plaintext attack
- Attacker can get the ciphertext for messages of his choosing
- Adaptive chosen-plaintext attack
- Attacker can make repeated requests for encryption, based on what he learns about the system from previous analysis
- (Rubber-hose cryptanalysis)
4What We Will Look At
- Classical systems (pre-computer)
- transposition
- substitution
- Symmetric-key block ciphers
- DES
- AES
- Public-key ciphers
- Diffie-Hellman
- RSA
5Classical Cryptosystems
- Same key is used for both encryption and decryption
- meaning that the key must be kept secret, because anybody who has the key has full knowledge about the system
- Two main types: transposition ciphers and substitution ciphers
- transposition: the cyphertext is a permutation of the plaintext (same letters, different order)
- example: cyphertext is plaintext in reverse order
- example: pairs of letters are reversed
- substitution: letter or letters in the cyphertext are substitutions for letter or letters in the plaintext
- example: cyphertext replaces A with B, B with C, ... Z with A
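The three toy ciphers named on this slide can be sketched in a few lines of Python. The function names are made up for illustration, and the sketch assumes uppercase A-Z input.

```python
import string

def reverse_cipher(plaintext: str) -> str:
    """Transposition: the cyphertext is the plaintext in reverse order."""
    return plaintext[::-1]

def swap_pairs(plaintext: str) -> str:
    """Transposition: reverse each adjacent pair of letters."""
    chars = list(plaintext)
    for i in range(0, len(chars) - 1, 2):
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def next_letter(plaintext: str) -> str:
    """Substitution: replace A with B, B with C, ..., Z with A."""
    upper = string.ascii_uppercase
    return plaintext.translate(str.maketrans(upper, upper[1:] + upper[0]))

print(reverse_cipher("SECRET"))  # TERCES
print(swap_pairs("SECRET"))      # ESRCTE
print(next_letter("SECRET"))     # TFDSFU
```

The two transpositions keep the same letters and only move them; the substitution keeps the positions and only changes the letters.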
6Transposition Ciphers
- The trick is to describe the permutation rule concisely
- Often this is done via a graphical aid, using a rectangular grid
- Plaintext: I AM VERY SECRET
- Ciphertext: IA EYSCE MVR ERT
[Figure: the plaintext letters written onto a rectangular grid]
7Transposition Ciphers (cont.)
- Decrypting the ciphertext (knowing the grid size) is simple: just fit the ciphertext onto the transposed grid, and read down the columns
- To decrypt, all you need to know is the period, 8
- Ciphertext: IA EYSCE MVR ERT
[Figure: the ciphertext letters written onto the transposed grid]
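A minimal sketch of the grid transposition, under one assumption: the plaintext is written in rows of 2 characters and read down the columns. That layout is inferred from the ciphertext on the slide; the figure itself is not reproduced here. The helper name `transpose` is hypothetical, and the sketch only handles text that fills the grid exactly.

```python
def transpose(text: str, width: int) -> str:
    """Write `text` into rows of `width` characters, then read down the columns."""
    if len(text) % width != 0:
        raise ValueError("this sketch assumes the text fills the grid exactly")
    return "".join(text[col::width] for col in range(width))

plaintext = "I AM VERY SECRET"
ciphertext = transpose(plaintext, 2)    # encrypt: rows of 2, read down the columns
print(repr(ciphertext))                 # 'IA EYSCE MVR ERT'
print(repr(transpose(ciphertext, 8)))   # decrypt with period 8: 'I AM VERY SECRET'
```

Decryption is the same operation on the transposed grid, which is why knowing the period (8) is enough.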
8Breaking a Transposition Cypher
- Plaintext attacks aren't even worth mentioning: if you have a plaintext/ciphertext pair, you have the key
- Ciphertext-only attacks are also easy, given naive implementations of the technique
- Brute force will often do it: just try many different matrices; you only have to look at the first few characters of the resulting plaintext
- There are other serious regularities left in the ciphertext that an attacker can exploit
- the first letter in the ciphertext is also the first letter in the plaintext
- based on probabilities of two-character sequences in the English language, you can guess the second letter: I followed by a space is common, so the period is probably either 2, 8, or 12 (the offsets, counting from 0, at which a space appears in the ciphertext)
- To strengthen the cipher, run it through the transposition twice (with two different matrix sizes)
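Both ideas on this slide can be sketched as follows: guessing the period from where the suspected second plaintext character appears, and brute-forcing the grid sizes. The function names are hypothetical, and the sketch assumes a completely filled grid, so it only tries periods that divide the message length.

```python
def transpose(text: str, width: int) -> str:
    """Write `text` into rows of `width` characters, then read down the columns."""
    return "".join(text[col::width] for col in range(width))

def candidate_periods(ciphertext: str, second_char: str = " ") -> list[int]:
    """Offsets p > 0 where the ciphertext holds the guessed second plaintext character."""
    return [p for p, ch in enumerate(ciphertext) if p > 0 and ch == second_char]

def brute_force(ciphertext: str) -> dict[int, str]:
    """Try every period that divides the length; return period -> candidate plaintext."""
    n = len(ciphertext)
    return {p: transpose(ciphertext, p) for p in range(2, n) if n % p == 0}

ciphertext = "IA EYSCE MVR ERT"
print(candidate_periods(ciphertext))          # [2, 8, 12]
for period, guess in brute_force(ciphertext).items():
    print(period, repr(guess[:6]))            # only the first few characters are needed
```

Of the candidate periods, 8 yields readable text, which matches the period given on the previous slide.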
9Substitution Ciphers
- The key in a substitution cipher is a mapping from characters in the plaintext to characters in the ciphertext
- (A, Q), (B, A), (C, X), (D, R), ... (this should be a 1-1 mapping)
- this is equivalent to (1, 17), (2, 1), (3, 24), (4, 18), ...
- an even more concise key is a number N, which stands for the set (1, N+1), (2, N+2), ..., (i, (N+i) mod 26)
- several famous examples
- the Caesar cipher (N=3): (A, D), (B, E), ... (W, Z), (X, A), (Y, B), (Z, C)
- ROT13 (N=13)
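A short sketch of the index-based substitution, where the whole key is the number N. The name `shift_cipher` is made up for illustration; anything other than the letters A-Z is passed through unchanged.

```python
import string

ALPHABET = string.ascii_uppercase

def shift_cipher(text: str, n: int) -> str:
    """Replace each letter A-Z with the letter n places later, wrapping around at Z."""
    n %= 26
    shifted = ALPHABET[n:] + ALPHABET[:n]
    return text.upper().translate(str.maketrans(ALPHABET, shifted))

message = "ATTACK AT DAWN"
print(shift_cipher(message, 3))                        # Caesar: DWWDFN DW GDZQ
print(shift_cipher(message, 13))                       # ROT13:  NGGNPX NG QNJA
print(shift_cipher(shift_cipher(message, 3), 26 - 3))  # shifting by 26-N undoes N
```

ROT13 is its own inverse because 13 + 13 = 26.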
10Simple Substitution Cyphers
- In these simple index-based schemes, the key is simply the number N, and to decipher you use the same enciphering method, only with the key (26-N)
- to encipher a message with the Caesar cipher, apply the substitution with N=3; to decipher, use N=23
- The main weakness of ciphers based on single-letter substitutions is the powerful regularity of letter frequencies in the English language
- In regular English text, about 13 out of 100 letters are an E, but only about 2 out of 1000 letters are a Q or a Z. Therefore if you see a letter being used frequently, you can make a strong guess as to its identity
- In computer languages the regularities are even stronger, since keywords make up a significant fraction of the text, and people tend to use English words as variables.
- Of course this does not crack the code by itself, but it allows you to cut down the number of guesses, and it's remarkably accurate
11How easy is it to break a simple substitution?
- Compute the letter frequency for each letter in the cyphertext
- For each offset (1..25), compute the deviation between the actual frequency for each letter after applying the offset and the predicted frequency for that letter. Choose the offset with the lowest deviation.
- Even quicker: choose a letter in the ciphertext and compute its frequency. Find the English letter whose frequency is closest. That gives you a guess for the offset; apply the inverse to the ciphertext and check.
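The first procedure above might look like the sketch below. Two things are assumptions rather than anything stated on the slide: the `ENGLISH` table holds rounded, approximate letter frequencies, and "deviation" is taken to be the sum of squared differences, which is only one reasonable choice. The function names are hypothetical.

```python
from collections import Counter
import string

ALPHABET = string.ascii_uppercase

# Approximate relative frequencies of A..Z in English text (assumed, rounded values).
ENGLISH = [0.082, 0.015, 0.028, 0.043, 0.127, 0.022, 0.020, 0.061, 0.070,
           0.002, 0.008, 0.040, 0.024, 0.067, 0.075, 0.019, 0.001, 0.060,
           0.063, 0.091, 0.028, 0.010, 0.024, 0.002, 0.020, 0.001]

def shift(text: str, n: int) -> str:
    """Shift every letter A-Z by n places; leave other characters alone."""
    return "".join(ALPHABET[(ALPHABET.index(c) + n) % 26] if c in ALPHABET else c
                   for c in text)

def deviation(text: str) -> float:
    """Sum of squared differences between observed and expected letter frequencies."""
    letters = [c for c in text if c in ALPHABET]
    counts = Counter(letters)
    return sum((counts[c] / len(letters) - ENGLISH[i]) ** 2
               for i, c in enumerate(ALPHABET))

def best_offset(ciphertext: str) -> int:
    """The offset that, applied to the ciphertext, looks most like English."""
    return min(range(26), key=lambda n: deviation(shift(ciphertext, n)))

plaintext = ("IT WAS THE BEST OF TIMES IT WAS THE WORST OF TIMES "
             "IT WAS THE AGE OF WISDOM IT WAS THE AGE OF FOOLISHNESS")
ciphertext = shift(plaintext, 3)        # Caesar cipher, N=3
offset = best_offset(ciphertext)
print(offset, shift(ciphertext, offset))
```

The recovered offset is the deciphering shift (26-N), not N itself. Short ciphertexts give much less reliable results than this sample.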
12Polyalphabetic Substitutions
- There is an obvious relationship between the strength of a cipher and the key length: simple substitution is so easy to break because the key is a single number between 1 and 25, so we can use brute force
- The Vigenere Cipher uses a sequence of keys, each between 0 and 25
- if the key is (8,12,0)
- shift the first plaintext character by 8
- shift the second plaintext character by 12
- shift the third plaintext character by 0 (not at all)
- shift the fourth plaintext character by 8
- etc
- the book uses an equivalent alphabetic key (IMA for our example, with A=0, B=1, ...), but notice in Figure 9-3 that this is simply describing 26 shift indexes
13A Quick Example
- This is the result of a quick Perl implementation
- Plaintext
- THISISASHORTBUTSWEETPIECEOFSECRETPLAINTEXT
- Key
- LUCY
- Cyphertext
- EBKQTMCQSITRMOVQHYGRACGAPIHQPWTCEJNYTHVCIN
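The example above can be reproduced with a few lines of Python. This is a sketch, not the Perl implementation the slide refers to. It assumes uppercase A-Z input and the convention A=0, ..., Z=25 for key letters.

```python
from itertools import cycle

def vigenere_encrypt(plaintext: str, key: str) -> str:
    """Shift each plaintext letter by the matching key letter (A=0, ..., Z=25)."""
    out = []
    for p, k in zip(plaintext, cycle(key)):
        shift = ord(k) - ord("A")
        out.append(chr((ord(p) - ord("A") + shift) % 26 + ord("A")))
    return "".join(out)

print(vigenere_encrypt("THISISASHORTBUTSWEETPIECEOFSECRETPLAINTEXT", "LUCY"))
# EBKQTMCQSITRMOVQHYGRACGAPIHQPWTCEJNYTHVCIN
```

`cycle(key)` repeats the key for as long as the plaintext lasts, so the key LUCY applies the shifts 11, 20, 2, 24 over and over.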
14Breaking the Cypher
- The key insight is that if you know the length of the key (four in our case), you can split the full text into four separate single-character substitution cyphers, and use the letter-frequency trick to infer each letter
- LUCY
- EBKQ
- TMCQ
- SITR
- MOVQ
- HYGR
- ACGA
- PIHQ
- PWTC
- EJNY
- THVC
- IN
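Reading the rows above column by column gives one single-character cypher per key letter. In Python this is a stride slice; the helper name `split_columns` is hypothetical.

```python
def split_columns(ciphertext: str, key_length: int) -> list[str]:
    """Column i holds every key_length-th character, starting at position i."""
    return [ciphertext[i::key_length] for i in range(key_length)]

ciphertext = "EBKQTMCQSITRMOVQHYGRACGAPIHQPWTCEJNYTHVCIN"
for number, column in enumerate(split_columns(ciphertext, 4), start=1):
    print(number, column)
# 1 ETSMHAPPETI
# 2 BMIOYCIWJHN
# 3 KCTVGGHTNV
# 4 QQRQRAQCYC
```

Every letter in a given column was shifted by the same key letter, so each column can be attacked like a simple shift cipher.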
15Finding the Key Length: The Index of Coincidence
- The relative frequencies of English-language letters have the following interesting effect
- Suppose you took a piece of English-language text, offset a copy of the text by a certain amount, and counted the number of places where the same letter co-occurred
- THISISASHORTBUTSWEETPIECEOFSECRETPLAINTEXT
- RETPLAINTEXTTHISISASHORTBUTSWEETPIECEOFSEC
-            X   X           X
- There are three duplicates per 42 letters, or 7.1%
- This is slightly above the statistical average of roughly 6.6%
- Now suppose instead we took randomly generated letters and offset them by the same amount
- LQDLUNKUJWRSNRBHJWZMYLWGFERHPGFFBCXGAWYQRH
- FBCXGAWYQRRLQDLUNKUJWRSNRBHJWZMYLWGFERHPGF
-           X
- There is one duplicate per 42 letters, or 2.4%
- This is slightly below the statistical average of 3.8%
16Index of Coincidence (cont.)
- The point is
- duplicates are much more likely in a plaintext string than in a random text string, due to letter-frequency regularities
- and the same applies to single-character substitutions, because the letter frequencies are the same as in plaintext
- in other words, the index of coincidence of a piece of text encrypted using a single-character substitution would be exactly the same as that of the plaintext (same number of coincidences, but different letters would match)
- What's the significance to the Vigenere cypher?
17IC and the Vigenere Cypher
- If we line up the text at the correct period (the key length), we are essentially counting four single-character substitutions, and the IC will be high
- Alternatively, if we line up the text at an incorrect period, we are essentially counting a piece of random text
- So IC values at the key length, or at multiples of the key length, will tend to be large; other values will tend to be small
18IC Values for the Example
Offset 1 (no matches)
EBKQTMCQSITRMOVQHYGRACGAPIHQPWTCEJNYTHVCIN
NEBKQTMCQSITRMOVQHYGRACGAPIHQPWTCEJNYTHVCI
Offset 2 (no matches)
EBKQTMCQSITRMOVQHYGRACGAPIHQPWTCEJNYTHVCIN
INEBKQTMCQSITRMOVQHYGRACGAPIHQPWTCEJNYTHVC
Offset 4 (three matches)
EBKQTMCQSITRMOVQHYGRACGAPIHQPWTCEJNYTHVCIN
VCINEBKQTMCQSITRMOVQHYGRACGAPIHQPWTCEJNYTH
       x              x     x
Offset 7 (two matches)
EBKQTMCQSITRMOVQHYGRACGAPIHQPWTCEJNYTHVCIN
YTHVCINEBKQTMCQSITRMOVQHYGRACGAPIHQPWTCEJN
            x                            x
Offset 8 (three matches)
EBKQTMCQSITRMOVQHYGRACGAPIHQPWTCEJNYTHVCIN
NYTHVCINEBKQTMCQSITRMOVQHYGRACGAPIHQPWTCEJ
               x   x                   x
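The coincidence counts on the last few slides can be sketched as below. The text is rotated cyclically by the offset, as the slides appear to do, and compared position by position. The function name `coincidences` is made up for illustration.

```python
def coincidences(text: str, offset: int) -> int:
    """Count positions where the text matches a copy of itself rotated right by `offset`."""
    rotated = text[-offset:] + text[:-offset]
    return sum(a == b for a, b in zip(text, rotated))

plaintext = "THISISASHORTBUTSWEETPIECEOFSECRETPLAINTEXT"
ciphertext = "EBKQTMCQSITRMOVQHYGRACGAPIHQPWTCEJNYTHVCIN"

print(coincidences(plaintext, 12))       # 3, i.e. 3/42 = 7.1%
for offset in range(1, 9):
    print(offset, coincidences(ciphertext, offset))
# offsets 1 and 2 give 0; offset 4 gives 3; offset 7 gives 2; offset 8 gives 3
```

With only 42 letters the signal is noisy (offset 7 also scores 2), but the multiples of the key length, 4 and 8, score highest among the offsets shown on the slide.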
19Using the (Suspected) Key Length
- Now we can work on four different single-character cyphers.
- If we were correct about the key length, each of these came from a single-character substitution. So what do we do next?
1: ETSMHAPPETI
2: BMIOYCIWJHN
3: KCTVGGHTNV
4: QQRQRAQCYC
20Using the (Suspected) Key Length
- Here is output for the minimum-variance calculation applied to the first column
- This isn't quite right: what's the right answer? Are we close? What went wrong?
ETSMHAPPETI
Variance for 0 is 0.00251999141767324
Variance for 11 is 0.00311439701207883
Variance for 15 is 0.00314936204704387
21Cracking the Cypher (cont.)
Cyphertext is BMIOYCIWJHN
Variance for 6 is 0.00166811602034329
Variance for 5 is 0.00250727685950413
Variance for 21 is 0.00285692720915448
Cyphertext is KCTVGGHTNV
Variance for 24 is 0.00276665384615385
Variance for 11 is 0.00422819230769231
Variance for 20 is 0.00438203846153846
Cyphertext is QQRQRAQCYC
Variance for 2 is 0.00645896153846154
Variance for 14 is 0.00735126923076923
Variance for 17 is 0.00765126923076923
22And finally
- Of course a little human nature doesn't hurt.... substituting our best guesses yields the following plaintext
- EHIS TS A SSORT MUT SHEETAIECP OF SPCREE PLATNTEIT
- Or alternatively, our current candidate for a key is AUCY
- Put these two pieces of information together, you see that the first character is probably wrong, and if you know your victim .....
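The step from the best offsets (0, 6, 24, 2) to the candidate key AUCY can be sketched as follows. The sketch assumes each reported offset is the shift that deciphers its column, so the key letter is 26 minus the offset. That reading is inferred from the numbers on the slides, and the function names are hypothetical.

```python
from itertools import cycle

def key_from_offsets(offsets: list[int]) -> str:
    """Turn deciphering offsets into key letters: key shift = (26 - offset) mod 26."""
    return "".join(chr((26 - o) % 26 + ord("A")) for o in offsets)

def vigenere_decrypt(ciphertext: str, key: str) -> str:
    """Undo a Vigenere encryption by subtracting each key letter (A=0, ..., Z=25)."""
    return "".join(chr((ord(c) - ord(k)) % 26 + ord("A"))
                   for c, k in zip(ciphertext, cycle(key)))

ciphertext = "EBKQTMCQSITRMOVQHYGRACGAPIHQPWTCEJNYTHVCIN"
key = key_from_offsets([0, 6, 24, 2])
print(key)                                    # AUCY
print(vigenere_decrypt(ciphertext, key))      # EHISTSASSORTMUTSHEETAIECPOFSPCREEPLATNTEIT
print(vigenere_decrypt(ciphertext, "LUCY"))   # THISISASHORTBUTSWEETPIECEOFSECRETPLAINTEXT
```

Only every fourth letter of the first output is wrong, which points at the first key letter. Offset 15, third on the list for the first column, corresponds to L.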
23Final Notes on Polyalphabetic Ciphers
- Three factors are crucial to making the cipher difficult to break
- length of the key (longer is better, 1 is very easy)
- lack of predictability in the key
- infrequent use (so not vulnerable to a known-plaintext or chosen-plaintext attack)
- Carrying these factors to the extreme we get the "one-time pad"
- key is at least as long as the plaintext, so there are no cycles
- key is chosen truly randomly
- key is used only once
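A minimal one-time-pad sketch, with two departures from the slides that should be read as assumptions: it works on bytes and combines them with XOR instead of shifting letters mod 26, and it draws the key from Python's `secrets` module, which is a cryptographically strong generator and not a source of true randomness. The function names are hypothetical.

```python
import secrets

def otp_encrypt(plaintext: bytes) -> tuple[bytes, bytes]:
    """Generate a fresh key as long as the message and XOR it with the message."""
    key = secrets.token_bytes(len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, key))
    return key, ciphertext

def otp_decrypt(ciphertext: bytes, key: bytes) -> bytes:
    """XOR with the same key recovers the plaintext."""
    return bytes(c ^ k for c, k in zip(ciphertext, key))

key, ciphertext = otp_encrypt(b"THIS IS A SHORT BUT SWEET PIECE OF SECRET PLAINTEXT")
print(ciphertext.hex())
print(otp_decrypt(ciphertext, key))
```

Because `otp_encrypt` creates a new key on every call, the key is never reused.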
24Symmetric-Key Block Ciphers
- Symmetric key: the same key is used both for encryption and decryption
- Block cipher: encrypts fixed-sized blocks (typically 128 bits or 16 bytes)
- efficiency and hardware implementations are the main reasons for this
- In effect the block cipher just permutes its input bits (with the key describing the permutation). The two goals are confusion and diffusion
- displace original positions of the input bits
- distribute the input bits onto the output bits uniformly
- this is called an oblique cipher
- Most block ciphers consist of multiple rounds of a weak cipher
25Basic Structure of DES
- Initial shuffle of the plaintext (not significant cryptographically, but in the standard)
- Sixteen rounds (64-bit to 64-bit transformation)
- Each round (i)
- Select 48 bits from the 56-bit key (Ki)
- Split the input into two halves (L, R)
- Transform R using Ki (more on this later)
- XOR the transformed R with L
- feed this, together with the untransformed R, into the next round
- Final inverse shuffle of the ciphertext
26DES Single-Round Diagram
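[Figure: a single DES round. Labels: inputs L i-1 and R i-1 (32 bits each), round key K i (48 bits), round function F i made up of Expand, the S-boxes (S i), and Shuffle, and outputs L i and R i]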
27Interesting Properties of the Algorithm
- Implementation aspects
- code and S tables implemented in hardware, even in the 70s (need for fixed block and key length)
- same basic algorithm for decryption (run the same algorithm, but reverse the round keys)
- Functions of the various pieces
- splitting L and R (Feistel structure) diffuses the input bits
- XOR ensures that the key is mixed with the data
- S-boxes provide nonlinearity (no mathematical regularity to the transformation)
- S-box, expand, and shuffle mean that small changes in the input bits can still cause a big difference in the output
- Main complaint at this point: key length is too small
- triple DES: run the algorithm three times (is that enough??)
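The claim that decryption is the same algorithm with the round keys reversed holds for any Feistel structure, whatever the round function. The sketch below is not DES: `toy_f` is a hypothetical stand-in for the real expand / S-box / shuffle function, the round keys are arbitrary made-up 48-bit values, and the initial and final shuffles are omitted.

```python
import hashlib

MASK32 = 0xFFFFFFFF

def toy_f(right: int, round_key: int) -> int:
    """Hypothetical round function (NOT the DES F): hash the half-block with the key."""
    data = right.to_bytes(4, "big") + round_key.to_bytes(6, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")

def feistel(block: int, round_keys: list[int]) -> int:
    """Run a 64-bit block through one Feistel round per round key."""
    left, right = (block >> 32) & MASK32, block & MASK32
    for k in round_keys:
        # New L is the old R; new R is the old L XORed with the transformed R.
        left, right = right, left ^ toy_f(right, k)
    return (right << 32) | left   # swap the halves back after the last round

round_keys = [(0x0123456789AB * (i + 1)) & 0xFFFFFFFFFFFF for i in range(16)]
block = 0x0123456789ABCDEF
ciphertext = feistel(block, round_keys)
recovered = feistel(ciphertext, list(reversed(round_keys)))
print(hex(ciphertext), hex(recovered))
```

Note that `toy_f` never has to be inverted: each round only XORs its output into the other half, and XORing the same value a second time cancels it.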
28AES (nee Rijndael)
- Adopted by NIST as the new encryption standard in 2001
- The development / selection process was completely different from that for DES
- NIST asked for proposals from the community
- Of the 15 proposals, 5 were selected as finalists
- Competition based on criteria including
- formal adequacy
- speed of encryption / decryption
- unsuccessful attacks
29Single AES Round
[Figure: a single AES round. The surviving labels are eight key boxes (K) and eight S-boxes (S)]
30Similarities and differences
- Variable key size (128 bits to 256 bits)
- Number of rounds (10 to 14) can depend on the key size
- Not as easily inverted
- No splitting of the input into L and R halves
- Same basic idea of mixing, swapping, XOR with
key, and S boxes