Cache-Collision Timing Attacks Against AES

1
Cache-Collision Timing Attacks Against AES

Joseph Bonneau
Stanford University
jbonneau_at_stanford.edu

Ilya Mironov
Microsoft Research
mironov_at_microsoft.com
2
Side Channel Cryptanalysis

Definition: Any attack on a cryptosystem using
information leaked as a byproduct of
the physical implementation of the cryptosystem,
rather than a theoretical weakness.
Exploitable side-channels:
Power usage
Cache accesses
Noise
Heat
Time

3
Brief History of Timing Attacks

Timing attacks exploit variability in the time
taken to perform an encryption that is caused by
secret data.
Paul Kocher demonstrated timing attacks against
Diffie-Hellman, RSA, DSS, etc. at CRYPTO 96
Dan Boneh and David Brumley demonstrated the first
remote timing attack against RSA in 2003
Public-key systems are vulnerable due to their
use of lengthy mathematical operations whose
running time depends on secret data
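To make the public-key point concrete, here is a minimal Python sketch of textbook left-to-right square-and-multiply exponentiation, assuming (as a simplification) that running time is dominated by the number of modular multiplications. The function name `modexp_with_count` is hypothetical and this is not Kocher's code; it only shows that the amount of work depends on the bits of the secret exponent.

```python
def modexp_with_count(base, exponent, modulus):
    """Left-to-right square-and-multiply.

    Returns (base**exponent mod modulus, number of extra multiplications).
    The extra multiplications happen only for 1 bits of the (secret)
    exponent, so the work done, and hence the time, depends on secret data.
    """
    result = 1 % modulus
    extra_multiplies = 0
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus      # square for every bit
        if bit == "1":
            result = (result * base) % modulus    # multiply only for 1 bits
            extra_multiplies += 1
    return result, extra_multiplies


# Two 8-bit exponents of equal length but different Hamming weight.
print(modexp_with_count(7, 0b10000001, 1009))  # second value: 2 extra multiplies
print(modexp_with_count(7, 0b11111111, 1009))  # second value: 8 extra multiplies
```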

4
Brief History of Timing Attacks

During the AES competition, timing attacks were
believed to be possible only against branch
statements or data-dependent rotations.
Rijndael has a mathematical formulation in the
field $\mathrm{GF}(2^8)$
Optimized software implementations of Rijndael use
only table lookup, shift, and exclusive-or
operations
NIST declared Rijndael not vulnerable to timing
attacks in its final evaluation in 2000, and
Rijndael won the competition.

5
Brief History of Timing Attacks

Daniel Bernstein announces successful timing
attacks against AES in April 2005, exploiting
timing characteristics of table lookups
Osvik, Shamir, Tromer, follow up in November 2005
with very powerful attacks, requiring direct
observation of cache before and after encryption

6
Implementation details of AES, part I

The textbook description of an AES round as a
function $(X_i, K_i) \mapsto X_{i+1}$

7
Implementation details of AES, part I

The actual round computation in software, as
proposed with Rijndael and now widely used:
all three operations (SubBytes, ShiftRows,
MixColumns) are combined into precomputed
tables. A round of encryption requires just 16
table lookups, 16 xors, and 12 shifts.
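The table-based round can be sketched in Python. This is an illustrative reconstruction, assuming the common layout in which `T0[x]` packs the MixColumns column (2·S[x], S[x], S[x], 3·S[x]) big-endian and T1 to T3 are byte rotations of T0; the names `build_sbox`, `build_tables` and `round_column` are hypothetical. Each output column costs 4 lookups, 4 xors (including the round key) and 3 shifts, which is where the slide's count of 16 lookups, 16 xors and 12 shifts per round comes from.

```python
def xtime(a):
    """Multiply by 2 in GF(2^8) modulo the AES polynomial 0x11B."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a


def gf_mul(a, b):
    """Shift-and-add multiplication in GF(2^8)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def build_sbox():
    """AES S-box: multiplicative inverse in GF(2^8), then the affine map."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        s = inv
        for r in range(1, 5):
            s ^= ((inv << r) | (inv >> (8 - r))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


def rotr8(word):
    return ((word >> 8) | (word << 24)) & 0xFFFFFFFF


def build_tables(sbox):
    """T0..T3 fold SubBytes and MixColumns into one 32-bit lookup each."""
    t0 = [(gf_mul(s, 2) << 24) | (s << 16) | (s << 8) | gf_mul(s, 3) for s in sbox]
    t1 = [rotr8(w) for w in t0]
    t2 = [rotr8(w) for w in t1]
    t3 = [rotr8(w) for w in t2]
    return t0, t1, t2, t3


def round_column(state, tables, round_key_word, c):
    """Output column c of one full round; ShiftRows is folded into the indices.

    state holds four 32-bit column words with row 0 in the top byte.
    """
    t0, t1, t2, t3 = tables
    return (t0[state[c] >> 24]
            ^ t1[(state[(c + 1) % 4] >> 16) & 0xFF]
            ^ t2[(state[(c + 2) % 4] >> 8) & 0xFF]
            ^ t3[state[(c + 3) % 4] & 0xFF]
            ^ round_key_word)


SBOX = build_sbox()
TABLES = build_tables(SBOX)
print(hex(TABLES[0][0]))  # 0xc66363a5
```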

8
Bernstein's timing attack
Notice that in the first round, each table lookup
index depends on only one key byte and one
plaintext byte: $x_i = p_i \oplus k_i$. Remarkably,
the entire encryption time will be affected by
just the value of $p_i \oplus k_i$.
9
Bernstein's timing attack
To prepare for the attack, collect a large body
of reference timing data, on a machine with a
known key, for each value of $p_i \oplus k_i$.
10
Bernstein's timing attack
Next, collect a large body of timing data from a
target machine for each value of the plaintext
byte $p_i$.
11
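Bernstein's timing attack
The target machine's timing data should be
shifted from the reference data by exactly $k_i$
(in the XOR sense: the target's average time for
$p_i = p$ matches the reference time for $p \oplus k_i$).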
12
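Bernstein's timing attack
The target machine's timing data should be
shifted from the reference data by exactly $k_i$
(in the XOR sense: the target's average time for
$p_i = p$ matches the reference time for $p \oplus k_i$).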
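The matching step can be sketched as follows. This is a minimal illustration on synthetic data, not Bernstein's code: it assumes the reference profile is indexed by $p_i \oplus k_i$ (the reference key being known) and the target profile by $p_i$, and it picks the $k_i$ whose XOR-shift maximizes a simple correlation score. The function name and the Gaussian timing model are hypothetical; Bernstein's own statistics differ in detail.

```python
import random


def recover_key_byte(reference, target):
    """Guess k_i by lining up the two timing profiles.

    reference[v]: mean time on the reference machine when p_i XOR k_i == v
    target[p]:    mean time on the target machine when p_i == p
    If target[p] ~ reference[p ^ k_i], the correct k_i maximises the
    correlation between target[p] and reference[p ^ k].
    """
    ref_mean = sum(reference) / len(reference)
    tgt_mean = sum(target) / len(target)

    def score(k):
        return sum((target[p] - tgt_mean) * (reference[p ^ k] - ref_mean)
                   for p in range(256))

    return max(range(256), key=score)


# Synthetic demo: a hidden per-index timing profile plus measurement noise.
rng = random.Random(2005)
profile = [rng.gauss(0.0, 1.0) for _ in range(256)]
secret_k = 0x3C
reference = [profile[v] + rng.gauss(0.0, 0.1) for v in range(256)]
target = [profile[p ^ secret_k] + rng.gauss(0.0, 0.1) for p in range(256)]
print(hex(recover_key_byte(reference, target)))
```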
13
Bernstein's timing attack

Problems:
The reference machine must be identical to the
target
Requires known plaintext as well as timing data
Plaintexts must be sufficiently random
High number of samples required: the best case
reported by Bernstein is around $2^{27.5}$

14
Bernstein's timing attack

Overall, a very general statistical method for
constructing a timing attack.
Getting code to run in constant time on a machine
with a cache is very difficult, meaning most
cryptosystems are theoretically vulnerable.
Bernstein's attack doesn't exploit any specific
features of Rijndael, yet the attack does not
seem to work against other AES finalists
(Serpent, Twofish)

15
Cache-collision timing attacks
What is Rijndael's weakness?
16
Cache-collision timing attacks

What is Rijndael's weakness?
Heavy use of table lookups, which dominate the
running time
Table lookup indices are easily related to single
plaintext and key bytes

17
Cache collisions

Rijndael is just a sequence of table lookups.

$T[x],\ T[x],\ T[x_i],\ T[x],\ T[x],\ T[x_j],\ T[x]$

18
Cache collisions

Rijndael is just a sequence of table lookups.
What happens when $x_i = x_j$?

$T[x],\ T[x],\ T[x_i],\ T[x],\ T[x],\ T[x_j],\ T[x]$

19
Cache collisions

Rijndael is just a sequence of table lookups.
What happens when $x_i = x_j$?
The access to $x_j$ will hit in cache.

$T[x],\ T[x],\ T[x_i],\ T[x],\ T[x],\ T[x_j],\ T[x]$

20
Cache collisions

Rijndael is just a sequence of table lookups.
What happens when $x_i = x_j$?
The access to $x_j$ will hit in cache.
What happens when $x_i \neq x_j$?

$T[x],\ T[x],\ T[x_i],\ T[x],\ T[x],\ T[x_j],\ T[x]$

21
Cache collisions

Rijndael is just a sequence of table lookups.
What happens when $x_i = x_j$?
The access to $x_j$ will hit in cache.
What happens when $x_i \neq x_j$?
The access to $x_j$ may or may not hit in cache,
depending on the rest of the sequence and the
prior cache contents.

$T[x],\ T[x],\ T[x_i],\ T[x],\ T[x],\ T[x_j],\ T[x]$

22
Cache collisions
A cache collision occurs when we know that
$x_i = x_j$. For a large number of samples, the
average encryption time will be lower when
$x_i = x_j$ than when $x_i \neq x_j$. This is all
we need to build an attack.
$T[x],\ T[x],\ T[x_i],\ T[x],\ T[x],\ T[x_j],\ T[x]$
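A toy model of this argument, assuming a cold cache at the start, no evictions during the sequence, and no prefetching (all simplifications): count one miss for each distinct cache line touched. The function name `count_misses` is hypothetical.

```python
def count_misses(lookups, entries_per_line=1):
    """Count cache misses for a sequence of table indices.

    Simplified model: the table starts out of cache, nothing is evicted
    during the sequence, and there is no prefetching.
    """
    cached_lines = set()
    misses = 0
    for index in lookups:
        line = index // entries_per_line
        if line not in cached_lines:
            misses += 1          # first touch of this line: miss
            cached_lines.add(line)
    return misses


print(count_misses([17, 200, 42, 91, 5, 42, 130]))  # x_i == x_j == 42 -> 6
print(count_misses([17, 200, 42, 91, 5, 43, 130]))  # x_i != x_j -> 7
# With 16 entries per line, 42 and 43 share a line, so the second lookup hits.
print(count_misses([17, 200, 42, 91, 5, 43, 130], entries_per_line=16))  # 6
```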

23
Cache collisions
Actual Results, Pentium III
24
First Round Attack
Pick two lookups in the first round of
encryption: $x_i = p_i \oplus k_i$ and $x_j = p_j \oplus k_j$
25
First Round Attack
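Pick two lookups in the first round of
encryption: $x_i = p_i \oplus k_i$ and $x_j = p_j \oplus k_j$
Solve for the collision constraint:
$x_i = x_j \iff p_i \oplus p_j = k_i \oplus k_j$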
26
First Round Attack
Result: A working attack! There is an easily
identifiable low average encryption time whenever
$p_i \oplus p_j = k_i \oplus k_j$.
27
First Round Attack
Result: A working attack! There is an easily
identifiable low average encryption time
whenever $p_i \oplus p_j = k_i \oplus k_j$. However,
there are some complications.
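The first-round attack can be simulated in a few lines. This sketch uses a made-up timing model (a collision saves 10 time units, plus uniform noise) and enumerates every plaintext pair so the result is deterministic. The attacker function only sees $p_i$, $p_j$ and the time, and reads off $k_i \oplus k_j$ as the value of $p_i \oplus p_j$ with the lowest average. It ignores cache lines and table families; all names are hypothetical.

```python
import random


def simulate_first_round(k_i, k_j, seed=0):
    """Hypothetical victim: first-round indices are p ^ k; a collision saves time."""
    rng = random.Random(seed)
    samples = []
    for p_i in range(256):
        for p_j in range(256):
            collision = (p_i ^ k_i) == (p_j ^ k_j)
            t = 100.0 - (10.0 if collision else 0.0) + rng.random()
            samples.append((p_i, p_j, t))
    return samples


def recover_key_xor(samples):
    """Attacker: average time per p_i ^ p_j; the minimum reveals k_i ^ k_j."""
    totals = [0.0] * 256
    counts = [0] * 256
    for p_i, p_j, t in samples:
        totals[p_i ^ p_j] += t
        counts[p_i ^ p_j] += 1
    return min(range(256), key=lambda d: totals[d] / counts[d])


samples = simulate_first_round(k_i=0x2B, k_j=0x28)
print(hex(recover_key_xor(samples)))  # 0x3
```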
28
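Complication 1: Table families
Notice that four separate tables are used.
Each family of four bytes is isolated: lookups
into different tables cannot collide, so we only
learn relations between key bytes in the same family.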
29
Complication 2: Cache lines
Modern memory is cached in lines.
30
Complication 2: Cache lines
Modern memory is cached in lines.
[Figure: table lookup]
31
Complication 2: Cache lines
Modern memory is cached in lines.
[Figure: table lookup and cache]
32
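Complication 2: Cache lines
So, we can only tell if two lookups hit the same
line in memory, not if they are identical. We
denote the line-selecting high-order bits of an
index $x$ by $\langle x \rangle$, so we only learn whether
$\langle x_i \rangle = \langle x_j \rangle$. Most CPUs use 32- or 64-byte cache
lines. With 4-byte table entries, a line holds 8 or
16 entries, so we are forced to ignore the 3 or 4
low-order bits.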
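With cache lines, the same kind of simulation only recovers $\langle k_i \oplus k_j \rangle$. The sketch below assumes power-of-two line and entry sizes, so that $\langle a \rangle = \langle b \rangle$ exactly when $\langle a \oplus b \rangle = 0$, and the collision condition becomes $\langle p_i \oplus p_j \rangle = \langle k_i \oplus k_j \rangle$. The timing model (a 10-unit saving plus uniform noise) and all names are hypothetical.

```python
import random


def line_of(index, line_bytes=64, entry_bytes=4):
    """<x>: the high-order index bits that select a cache line (power-of-two sizes)."""
    return index // (line_bytes // entry_bytes)


def simulate_first_round_lines(k_i, k_j, line_bytes=64, seed=0):
    """Hypothetical victim: lookups collide whenever they share a cache line."""
    rng = random.Random(seed)
    samples = []
    for p_i in range(256):
        for p_j in range(256):
            collision = line_of(p_i ^ k_i, line_bytes) == line_of(p_j ^ k_j, line_bytes)
            t = 100.0 - (10.0 if collision else 0.0) + rng.random()
            samples.append((p_i, p_j, t))
    return samples


def recover_line_xor(samples, line_bytes=64):
    """Attacker: group by <p_i ^ p_j>; the fastest group is <k_i ^ k_j>."""
    groups = {}
    for p_i, p_j, t in samples:
        groups.setdefault(line_of(p_i ^ p_j, line_bytes), []).append(t)
    return min(groups, key=lambda g: sum(groups[g]) / len(groups[g]))


k_i, k_j = 0x2B, 0x7E
for line_bytes in (32, 64):
    guess = recover_line_xor(simulate_first_round_lines(k_i, k_j, line_bytes), line_bytes)
    # guess matches <k_i ^ k_j>: 10 for 32-byte lines, 5 for 64-byte lines
    print(line_bytes, guess, (k_i ^ k_j) // (line_bytes // 4))
```

Only 5 or 4 of the 8 bits of $k_i \oplus k_j$ are learned; the low bits inside a line stay unknown.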
33
First Round Attack: The bad news
We gain a set of equations in each family, such
as $\langle k_i \oplus k_j \rangle = \langle p_i \oplus p_j \rangle$ (equality of the
cache-line bits) for bytes $i, j$ in the same family.
This leaves 68 or 80 bits of key to search. This
limitation was also problematic for Osvik et al.
Their solution: examine the second round as well.
This can fix some of the problems but is difficult
for timing attacks (see paper).
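Where 68 and 80 come from can be checked by counting, under the assumption that each of the four table families yields three independent relations (for example, each byte relative to the first byte of its family; in the common OpenSSL-style layout, table $T_t$ is indexed in the first round by the bytes $i \equiv t \pmod 4$, so one family is $k_0, k_4, k_8, k_{12}$). Each relation fixes the $8 - \delta$ high bits of a key-byte difference, with $\delta = 3$ for 32-byte lines and $\delta = 4$ for 64-byte lines:

$$
\begin{aligned}
\text{bits learned} &= \underbrace{4}_{\text{families}} \cdot \underbrace{3}_{\text{relations per family}} \cdot (8 - \delta) \\
\delta = 3 \text{ (32-byte lines)}: &\quad 128 - 4 \cdot 3 \cdot 5 = 128 - 60 = 68 \text{ bits left} \\
\delta = 4 \text{ (64-byte lines)}: &\quad 128 - 4 \cdot 3 \cdot 4 = 128 - 48 = 80 \text{ bits left}
\end{aligned}
$$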
34
First Round Attack: The good news

Cache collisions are a strong method.
The timing signal is much stronger than the
random effects previously used.
The attack requires $2^{15}$ samples, compared to
$2^{27.5}$.
Can we recover the full key with this efficiency?

35
Implementation details of AES, part II
The final round of encryption is special
[Figure: the sequence of rounds 1, 2, ..., 8, 9, 10, with a "special!" callout]
36
Implementation details of AES, part II

The final round of encryption is special
No MixColumns operation is performed: as a
fixed linear map, it would add no security
In software, this requires a new table to be used
only for the final round. This table is just the
S-box.

37
Implementation details of AES, part II

The final round also uses expanded key bytes
However, the AES key schedule is invertible.
Finding the final 16 bytes is equivalent to
finding the raw key. This design was intentional.
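Because each AES-128 key-schedule word satisfies $w_i = w_{i-4} \oplus f(w_{i-1})$ (the FIPS-197 key expansion, where $f$ is RotWord, SubWord and a round constant when $i \equiv 0 \pmod 4$, and the identity otherwise), the recurrence can be run backwards as $w_{i-4} = w_i \oplus f(w_{i-1})$. A minimal Python sketch of that inversion, assuming AES-128 (four key words, 10 rounds); the function names are hypothetical and the S-box is recomputed so the block stands alone.

```python
def xtime(a):
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a


def gf_mul(a, b):
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def build_sbox():
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        s = inv
        for r in range(1, 5):
            s ^= ((inv << r) | (inv >> (8 - r))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()
RCON = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]


def f(word, i):
    """Transform applied to w[i-1]; non-trivial only when i % 4 == 0 (AES-128)."""
    if i % 4:
        return word
    out = [SBOX[b] for b in word[1:] + word[:1]]  # RotWord, then SubWord
    out[0] ^= RCON[i // 4 - 1]                    # round constant
    return out


def expand_key(key):
    """Forward key expansion: 44 four-byte words for AES-128."""
    w = [list(key[4 * n:4 * n + 4]) for n in range(4)]
    for i in range(4, 44):
        w.append([a ^ b for a, b in zip(w[i - 4], f(w[i - 1], i))])
    return w


def invert_key_schedule(last_round_key):
    """Recover the 16-byte cipher key from the final (round-10) round key."""
    w = [None] * 40 + [list(last_round_key[4 * n:4 * n + 4]) for n in range(4)]
    for i in range(43, 3, -1):
        w[i - 4] = [a ^ b for a, b in zip(w[i], f(w[i - 1], i))]
    return bytes(b for word in w[:4] for b in word)


key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")  # FIPS-197 example key
last = bytes(b for word in expand_key(key)[40:] for b in word)
print(last.hex())                        # d014f9a8c9ee2589e13f0cc8b6630ca6
print(invert_key_schedule(last) == key)  # True
```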

38
Final Round Attack

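Again, we consider a cache collision for two
bytes of the final round, $c_i = S[x_i] \oplus k_i$ and
$c_j = S[x_j] \oplus k_j$, where $c$ is the ciphertext and
$k$ the final round key.
When do these bytes collide in the table?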

39
Final Round Attack

We want to solve for $\langle x_i \rangle = \langle x_j \rangle$, where $x_i = S^{-1}[c_i \oplus k_i]$.
40
Final Round Attack

We want to solve for $\langle x_i \rangle = \langle x_j \rangle$. We
assume that $x_i = x_j$
41
Final Round Attack

We want to solve for $\langle x_i \rangle = \langle x_j \rangle$. We
assume that $x_i = x_j$,
leaving $c_i \oplus c_j = k_i \oplus k_j$
42
Final Round Attack

We want to solve for $\langle x_i \rangle = \langle x_j \rangle$. We
assume that $x_i = x_j$,
leaving $c_i \oplus c_j = k_i \oplus k_j$
43
Final Round Attack

So, $c_i \oplus c_j = k_i \oplus k_j$ guarantees a
collision. What happens if
$c_i \oplus c_j \neq k_i \oplus k_j$?
44
Final Round Attack

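So, $c_i \oplus c_j = k_i \oplus k_j$ guarantees a
collision. What happens if
$c_i \oplus c_j \neq k_i \oplus k_j$? We get a fixed offset
$\alpha \oplus \beta = \Delta$, where $\alpha = S[x_i]$, $\beta = S[x_j]$ and
$\Delta = c_i \oplus c_j \oplus k_i \oplus k_j \neq 0$.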
45
Final Round Attack

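So, $c_i \oplus c_j = k_i \oplus k_j$ guarantees a
collision. What happens if
$c_i \oplus c_j \neq k_i \oplus k_j$? We get a fixed offset
$\alpha \oplus \beta = \Delta$, where $\alpha = S[x_i]$, $\beta = S[x_j]$ and
$\Delta = c_i \oplus c_j \oplus k_i \oplus k_j \neq 0$. Surprise: the
non-linearity of the S-box enables the attack to
succeed.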
46
Final Round Attack

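Why does this happen? Because $\alpha$ and $\beta$ are the
results of S-box lookups, a fixed offset between
them says nothing about the indices $x_i, x_j$ used
to look them up. Even a small offset such as
$\Delta = 1$ does not imply a collision on the same
cache line. Thus, the cache-line issue is gone:
only $c_i \oplus c_j = k_i \oplus k_j$ gives a consistently low
time, so all 8 bits of $k_i \oplus k_j$ are recovered.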
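The final-round attack can be simulated the same way. In this sketch (a hypothetical model, not the authors' code) each ciphertext byte is $c = S[x] \oplus k$, a collision means $x_i$ and $x_j$ fall on the same 16-entry line, and the attacker averages time over $c_i \oplus c_j$ only. Despite the line granularity, the minimum lands exactly on $k_i \oplus k_j$. The S-box is recomputed so the block stands alone; all names are hypothetical.

```python
def xtime(a):
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a


def gf_mul(a, b):
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def build_sbox():
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        s = inv
        for r in range(1, 5):
            s ^= ((inv << r) | (inv >> (8 - r))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def simulate_final_round(k_i, k_j, entries_per_line=16):
    """Hypothetical victim: c = S[x] ^ k; lookups on the same line collide."""
    samples = []
    for x_i in range(256):
        for x_j in range(256):
            collision = x_i // entries_per_line == x_j // entries_per_line
            t = 100.0 - (10.0 if collision else 0.0)
            samples.append((SBOX[x_i] ^ k_i, SBOX[x_j] ^ k_j, t))
    return samples


def recover_final_xor(samples):
    """Attacker sees only ciphertext bytes and time: average per c_i ^ c_j."""
    totals = [0.0] * 256
    counts = [0] * 256
    for c_i, c_j, t in samples:
        totals[c_i ^ c_j] += t
        counts[c_i ^ c_j] += 1
    return min(range(256), key=lambda d: totals[d] / counts[d])


k_i, k_j = 0xD0, 0x14
print(hex(recover_final_xor(simulate_final_round(k_i, k_j))), hex(k_i ^ k_j))  # 0xc4 0xc4
```

No other value ties in this model: for a fixed $\Delta \neq 0$, at most 60 of the 256 index pairs share a line, since each of the 15 nonzero in-line index differences admits at most 4 solutions (the AES S-box is differentially 4-uniform).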
47
Final Round Attack

Collect timing data and compute the average time
for each value of $c_i \oplus c_j$, for all $i, j$. Low
times will occur at the values $c_i \oplus c_j = k_i \oplus k_j$.
The attack data produces a likelihood estimate for
the different values of $k_i \oplus k_j$ for each pair $i, j$.
Need to find $k_0, \ldots, k_{15}$ minimizing the global cost
function $\sum_{i,j} C_{ij}(k_i, k_j)$
Use standard AI algorithms (Local Optimization,
Belief Propagation).
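The slide names local optimization and belief propagation. As a much simpler stand-in (not the authors' algorithm), the sketch below assumes, as in the basic attack, that each $C_{ij}$ depends only on $k_i \oplus k_j$, fixes $k_0 = 0$, picks each $k_j$ from its pair with byte 0, and then runs coordinate descent on the total cost. Because every term depends only on XOR differences, the minimizer is determined only up to XOR-ing all bytes with the same constant. Names and the synthetic cost model are hypothetical.

```python
import random


def pair_cost(cost, i, j, d):
    """C_ij evaluated at the hypothesis k_i ^ k_j == d (symmetric in i and j)."""
    return cost[(i, j)][d] if i < j else cost[(j, i)][d]


def combine_pairwise(cost, n=16):
    """Minimise sum_{i<j} C_ij(k_i ^ k_j) with k_0 fixed to 0.

    Start from the best difference to byte 0 for every k_j, then run
    coordinate descent: re-pick one byte at a time while the total cost drops.
    """
    key = [0] + [min(range(256), key=lambda d: pair_cost(cost, 0, j, d))
                 for j in range(1, n)]
    improved = True
    while improved:
        improved = False
        for j in range(1, n):
            def local(v):
                return sum(pair_cost(cost, i, j, key[i] ^ v)
                           for i in range(n) if i != j)
            best = min(range(256), key=local)
            if local(best) < local(key[j]):  # strict decrease guarantees termination
                key[j] = best
                improved = True
    return key


# Synthetic costs: for every pair the true difference is the cheapest hypothesis.
rng = random.Random(7)
true_key = [rng.randrange(256) for _ in range(16)]
cost = {(i, j): [rng.random() + (0.0 if d == true_key[i] ^ true_key[j] else 1.0)
                 for d in range(256)]
        for i in range(16) for j in range(i + 1, 16)}
key = combine_pairwise(cost)
# Every byte is off by the same constant (true_key[0]), the one unresolved byte.
print(len({a ^ b for a, b in zip(key, true_key)}))  # 1
```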

48
Final Round Attack Results
Timing samples required:
CPU               L2 cache eviction    L1 cache eviction
Pentium III       $2^{15}$             $2^{16}$
Pentium IV        $2^{16}$             $2^{19.9}$
UltraSPARC-III    $2^{15}$             $2^{18.7}$

Huge improvement over the original $2^{27.5}$.
Offline complexity is low: the attack takes
seconds. Offline effort can be increased to further
lower the number of samples required.

49
Expanded Final Round Attack
Timing samples required:
CPU               L2 cache eviction    L1 cache eviction
Pentium III       $2^{13}$             $2^{14}$
Pentium IV        $2^{13.6}$           $2^{18.6}$
UltraSPARC-III    $2^{14.3}$           $2^{17.3}$

Produce cost estimates for specific values of key
bytes, instead of simply their difference
Requires more time and memory from the attacker,
but the attack still finishes in 10 minutes

50
Final Round Attack Results

Bonuses from attacking the final round
Attack requires only ciphertext and timing.
Related plaintexts produce an essentially random
cipher state by the 9th round, so the plaintexts
need not be random.
Attack is oblivious to the target platform (no
identical reference machine is needed)
Attack works well against decryption

51
Final Round Attack Results

The attack should be widely applicable
Most CPUs use similar cache structure
Most standard crypto libraries use the original
Rijndael implementation of AES. The attacks were
implemented against OpenSSL.

52
Final Round Attack Complications

The attacks assume the AES tables are out of
cache before encryption. This means a target
machine must be made to do some unrelated work in
between encryptions.
Recent CPUs (e.g., Pentium IV) are more complicated
than the model: hardware prefetching, out-of-order
execution, etc.
Larger cache line sizes are also a problem.

53
Countermeasures

Solutions requiring special hardware support are
probably not practical
We cannot guarantee that encryption will take
constant time without crippling performance.
It is possible to greatly increase the resistance of
the common AES implementation to final-round
attacks with no performance penalty by
eliminating the special lookup table
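One way to drop the special final-round table, sketched here under an assumed round-table layout `T0[x] = (2·S[x], S[x], S[x], 3·S[x])` (big-endian): the S-box value can be read out of a T0 entry with a shift and a mask, so the final round reads the same table as the earlier rounds instead of a separate one. This illustrates the idea only; whether it matches the authors' exact countermeasure is an assumption, and the names are hypothetical.

```python
def xtime(a):
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a


def gf_mul(a, b):
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def build_sbox():
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        s = inv
        for r in range(1, 5):
            s ^= ((inv << r) | (inv >> (8 - r))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()
# Assumed round-table layout: T0[x] = (2*S[x], S[x], S[x], 3*S[x]), big-endian.
T0 = [(gf_mul(s, 2) << 24) | (s << 16) | (s << 8) | gf_mul(s, 3) for s in SBOX]


def final_round_sbox(x):
    """S[x] taken from byte 1 of T0[x] with a shift and mask, so the final
    round needs no separate S-box table."""
    return (T0[x] >> 16) & 0xFF


print(all(final_round_sbox(x) == SBOX[x] for x in range(256)))  # True
```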

54
Conclusions

AES is vulnerable to timing attacks due to its
use of table lookups. Better attacks are still
possible.
Real-world use of timing attacks is questionable,
as they require cycle-count-level data, but these
attacks tolerate much more noise than earlier ones
Applications?
Process-to-process attacks
Virtual machines
Against a secure CPU on a multiprocessor
machine
Against a remote server: the holy grail

55
Conclusions

Table lookups into cached memory are dangerous
for cryptographic software.
Information leaks through many side channels:
Time
Cache contents
Power usage
AES selection largely ignored this problem.
Runner-up cipher Serpent avoids lookup tables,
but this was not seen as an advantage.

56
Thank you