1
Cache-Collision Timing Attacks Against AES
  • Joseph Bonneau
  • Stanford University
  • jbonneau_at_stanford.edu

  • Ilya Mironov
  • Microsoft Research
  • mironov_at_microsoft.com
2
Side Channel Cryptanalysis
  • Definition: any attack on a cryptosystem that uses
    information leaked as a byproduct of the physical
    implementation of the cryptosystem, rather than a
    theoretical weakness.
  • Exploitable side channels:
  • Power usage
  • Cache accesses
  • Noise
  • Heat
  • Time

3
Brief History of Timing Attacks
  • Timing attacks consider variability in the time
    taken to perform an encryption due to secret
    data.
  • Paul Kocher demonstrated timing attacks against
    Diffie-Hellman, RSA, DSS, etc. at CRYPTO '96.
  • Dan Boneh and David Brumley demonstrated the first
    remote timing attack against RSA in 2003.
  • Public-key systems are vulnerable due to their
    use of lengthy mathematical operations.

4
Brief History of Timing Attacks
  • During the AES competition, timing attacks were
    believed to be possible only against branch
    statements or data-dependent rotations.
  • Rijndael has a mathematical formulation in the
    field GF(2^8).
  • Optimized software implementations of Rijndael use
    only table lookup, shift, and exclusive-or
    operations.
  • NIST declared Rijndael not vulnerable to timing
    attacks in its final evaluation in 2000, and
    Rijndael won the competition.

5
Brief History of Timing Attacks
  • Daniel Bernstein announced successful timing
    attacks against AES in April 2005, exploiting the
    timing characteristics of table lookups.
  • Osvik, Shamir, and Tromer followed up in November
    2005 with very powerful attacks that require direct
    observation of the cache before and after encryption.

6
Implementation details of AES, part I
  • The textbook description of an AES round as a
    function (X_i, K_i) → X_{i+1}

7
Implementation details of AES, part I
  • The actual round computation in software, as
    proposed with Rijndael and now widely used
  • All three operations (SubBytes, ShiftRows,
    MixColumns) are combined into pre-computed tables.
    A round of encryption requires just 16 table
    lookups, 16 XORs, and 12 shifts.
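
The table-based round can be checked against the textbook round with a short sketch. This is not the Rijndael reference code: the function names are illustrative, the state is assumed to be a column-major list of 16 bytes, and the tables hold 4-byte tuples instead of packed 32-bit words (so no shifts appear).

```python
import random

def xtime(a):
    """Multiply by 2 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return (a ^ 0x11B) if a & 0x100 else a

def gmul(a, b):
    """Multiply two field elements with shift-and-add."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = xtime(a)
        b >>= 1
    return r

def make_sbox():
    """Build the AES S-box: field inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gmul(x, y) == 1), 0)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = make_sbox()

# Each table entry is one S-box output already multiplied by a MixColumns column.
T0 = [(gmul(s, 2), s, s, gmul(s, 3)) for s in SBOX]
T1 = [(gmul(s, 3), gmul(s, 2), s, s) for s in SBOX]
T2 = [(s, gmul(s, 3), gmul(s, 2), s) for s in SBOX]
T3 = [(s, s, gmul(s, 3), gmul(s, 2)) for s in SBOX]

def textbook_round(state, key):
    """SubBytes, ShiftRows, MixColumns, then XOR with the round key."""
    sub = [SBOX[b] for b in state]
    shifted = [sub[4 * ((c + r) % 4) + r] for c in range(4) for r in range(4)]
    out = []
    for c in range(4):
        a = shifted[4 * c: 4 * c + 4]
        out += [
            gmul(a[0], 2) ^ gmul(a[1], 3) ^ a[2] ^ a[3],
            a[0] ^ gmul(a[1], 2) ^ gmul(a[2], 3) ^ a[3],
            a[0] ^ a[1] ^ gmul(a[2], 2) ^ gmul(a[3], 3),
            gmul(a[0], 3) ^ a[1] ^ a[2] ^ gmul(a[3], 2),
        ]
    return [o ^ k for o, k in zip(out, key)]

def table_round(state, key):
    """Same round using 16 table lookups and XORs only."""
    out = []
    for c in range(4):
        cols = (
            T0[state[4 * c]],
            T1[state[4 * ((c + 1) % 4) + 1]],
            T2[state[4 * ((c + 2) % 4) + 2]],
            T3[state[4 * ((c + 3) % 4) + 3]],
        )
        for r in range(4):
            out.append(cols[0][r] ^ cols[1][r] ^ cols[2][r] ^ cols[3][r] ^ key[4 * c + r])
    return out

rng = random.Random(0)
state = [rng.randrange(256) for _ in range(16)]
key = [rng.randrange(256) for _ in range(16)]
assert textbook_round(state, key) == table_round(state, key)
```

Every lookup index in table_round is a raw state byte, which is why the memory access pattern depends directly on secret data.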

8
Bernstein's timing attack
Notice that in the first round, each table lookup
index depends on only one key byte and one
plaintext byte: x_i = p_i ⊕ k_i. Remarkably, the
entire encryption time is affected by just the
value of x_i.
9
Bernstein's timing attack
To prepare for the attack, collect a large body
of reference timing data for each value of
p_i ⊕ k_i, using a known key.
10
Bernstein's timing attack
Next, collect a large body of timing data from a
target machine for each value of the plaintext
byte p_i.
11
Bernstein's timing attack
The target machine's timing data should be
shifted from the reference data by exactly the
key byte k_i (an XOR shift of the index).
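
The matching step can be sketched as follows. This is a toy model rather than Bernstein's code: the timing profiles are synthetic, the correlation score is one simple choice among many, and the names recover_key_byte, reference_profile and target_profile are illustrative.

```python
import random

def recover_key_byte(reference_profile, target_profile):
    """Return the key-byte guess whose XOR shift best aligns the two profiles.

    reference_profile[x]: mean timing deviation on the reference machine for
                          table index x = p ^ k (key known there)
    target_profile[p]:    mean timing deviation on the target machine for
                          plaintext byte p (key unknown)
    """
    def score(k):
        return sum(target_profile[p] * reference_profile[p ^ k] for p in range(256))
    return max(range(256), key=score)

rng = random.Random(1)
# Hypothetical per-index timing deviations (mean-centred, arbitrary units).
reference_profile = [rng.gauss(0.0, 1.0) for _ in range(256)]
secret = 0xA7
# The target shows the same profile, XOR-shifted by its key byte, plus noise.
target_profile = [reference_profile[p ^ secret] + rng.gauss(0.0, 0.2) for p in range(256)]

recovered = recover_key_byte(reference_profile, target_profile)
assert recovered == secret
```

The sketch assumes both machines share one timing profile, which is exactly the "identical reference machine" requirement.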
13
Bernstein's timing attack
  • Problems:
  • The reference machine must be identical to the
    target.
  • Requires known plaintext as well as timing data.
  • Plaintexts must be sufficiently random.
  • A high number of samples is required; the best
    case reported by Bernstein is around 2^27.5.

14
Bernstein's timing attack
  • Overall, a very general statistical method for
    constructing a timing attack.
  • Getting code to run in constant time on a machine
    with a cache is very difficult, meaning most
    cryptosystems are theoretically vulnerable.
  • Bernstein's attack doesn't exploit any specific
    features of Rijndael, yet the attack does not
    seem to work against other AES finalists
    (Serpent, Twofish).

16
Cache-collision timing attacks
  • What is Rijndael's weakness?
  • Heavy use of table lookups which dominate the
    running time
  • Table lookup indices are easily related to single
    plaintext and key bytes

21
Cache collisions
  • Rijndael is just a sequence of table lookups.
  • What happens when x_i = x_j?
  • The access to x_j will hit in cache.
  • What happens when x_i ≠ x_j?
  • The access to x_j may or may not hit in cache,
    depending on the rest of the sequence and the
    prior cache contents.

T[x] T[x] T[x_i] T[x] T[x] T[x_j] T[x]

22
Cache collisions
A cache collision occurs when we know that x_i =
x_j. Over a large number of samples, the average
encryption time will be lower when x_i = x_j than
when x_i ≠ x_j. This is all we need to build an
attack.
T[x] T[x] T[x_i] T[x] T[x] T[x_j] T[x]
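
A toy simulation shows the averaging effect. It uses an assumed cache model, not measurements: the cache starts empty, it never evicts, and encryption time is taken to be proportional to the number of misses. Collisions are tested at cache-line granularity, and every constant is an illustrative assumption.

```python
import random

ENTRIES_PER_LINE = 16   # assumption: 64-byte lines, 4-byte table entries
LOOKUPS = 16            # assumption: 16 lookups into one 256-entry table
I, J = 2, 5             # positions of the two lookups being compared

def misses(indices):
    """Count cache misses for a lookup sequence, starting from an empty cache."""
    cached_lines = set()
    count = 0
    for x in indices:
        line = x // ENTRIES_PER_LINE
        if line not in cached_lines:
            count += 1
            cached_lines.add(line)
    return count

def average_misses(trials, seed=0):
    """Average miss count, split by whether lookups I and J share a cache line."""
    rng = random.Random(seed)
    totals = {True: [0, 0], False: [0, 0]}   # collided -> [miss sum, samples]
    for _ in range(trials):
        seq = [rng.randrange(256) for _ in range(LOOKUPS)]
        collided = seq[I] // ENTRIES_PER_LINE == seq[J] // ENTRIES_PER_LINE
        totals[collided][0] += misses(seq)
        totals[collided][1] += 1
    return {c: s / n for c, (s, n) in totals.items()}

averages = average_misses(20000)
# Sequences in which the two lookups collide need fewer misses on average.
assert averages[True] < averages[False]
```

The gap per encryption is well under one miss in this model, so the signal only emerges after averaging many samples.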

23
Cache collisions
Actual Results, Pentium III
25
First Round Attack
Pick two lookups in the first round of
encryption: T[p_i ⊕ k_i] and T[p_j ⊕ k_j].
Solve for the collision constraint:
p_i ⊕ k_i = p_j ⊕ k_j, so p_i ⊕ p_j = k_i ⊕ k_j.
27
First Round Attack
Result: a working attack! There is an easily
identifiable low average encryption time
whenever p_i ⊕ p_j = k_i ⊕ k_j.
However, there are some complications.
28
Complication 1: Table families
Notice that four separate tables are used.
Each family of four bytes is isolated: collisions
occur only between lookups into the same table.
31
Complication 2: Cache Lines
Modern memory is cached in lines.
[Diagram: table lookup, cache]
32
Complication 2: Cache Lines
So, we can only tell whether two lookups hit the
same line in memory, not whether they are identical.
We denote the cache line of an index x by ⟨x⟩.
Most CPUs use 32- or 64-byte cache lines. With
4-byte table entries, each line holds 8 or 16
entries, so we are forced to ignore the 3 or 4
low-order bits.
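
The arithmetic can be written out as a small sketch. It assumes the table starts on a cache-line boundary, and the helper names ignored_bits and cache_line are illustrative.

```python
ENTRY_BYTES = 4  # size of one table entry, as stated above

def ignored_bits(line_bytes):
    """Number of low-order index bits that do not affect which line is hit."""
    entries_per_line = line_bytes // ENTRY_BYTES
    return entries_per_line.bit_length() - 1   # log2 for powers of two

def cache_line(index, line_bytes):
    """Line number of a table index, assuming the table is line-aligned."""
    return index >> ignored_bits(line_bytes)

assert ignored_bits(32) == 3 and ignored_bits(64) == 4
# Indices 0x50 and 0x57 differ only in the low 3 bits: same 32-byte line.
assert cache_line(0x50, 32) == cache_line(0x57, 32)
# 0x57 and 0x58 are adjacent entries but fall on different 32-byte lines...
assert cache_line(0x57, 32) != cache_line(0x58, 32)
# ...and share a line once lines are 64 bytes.
assert cache_line(0x57, 64) == cache_line(0x58, 64)
```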
33
First Round Attack: The bad news
We gain a set of equations in each family, each
giving only the high-order bits of k_i ⊕ k_j.
This leaves 68 or 80 bits of key to
search. This limitation was also problematic
for Osvik et al. Their solution: examine the
second round as well. This can fix some of the
problems but is difficult for timing attacks (see
paper).
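
The figures of 68 and 80 bits can be reproduced with a short count. This is a reconstruction of the arithmetic, not taken from the slides: it assumes each family of four key bytes yields three independent XOR relations, each revealing only the bits that survive the cache-line truncation.

```python
KEY_BITS = 128
FAMILIES = 4            # one family per lookup table
BYTES_PER_FAMILY = 4

def bits_left(ignored_low_bits):
    """Key bits still unknown after the first-round attack."""
    known_per_equation = 8 - ignored_low_bits
    # Four bytes in a family give three independent XOR relations.
    independent_equations = FAMILIES * (BYTES_PER_FAMILY - 1)
    return KEY_BITS - independent_equations * known_per_equation

assert bits_left(3) == 68   # 32-byte lines: 128 - 12 * 5
assert bits_left(4) == 80   # 64-byte lines: 128 - 12 * 4
```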
34
First Round Attack: The good news
  • Cache collisions are a strong method.
  • The timing variability is much better than the
    random effects previously used.
  • The attack requires 2^15 samples, compared to
    2^27.5.
  • Can we recover the full key with this efficiency?

35
Implementation details of AES, part II
The final round of encryption is special
[Diagram: round 1, round 2, ..., round 8, round 9, round 10 (special!)]
36
Implementation details of AES, part II
  • The final round of encryption is special
  • No MixColumns operation is performed, as it would
    add no additional security
  • In software, this requires a new table to be used
    only for the final round. This table is just the
    S-box

37
Implementation details of AES, part II
  • The final round also uses expanded key bytes
  • However, the AES key schedule is invertible.
    Finding the final 16 bytes is equivalent to
    finding the raw key. This design was intentional.
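
Invertibility can be illustrated for AES-128 with a short sketch. The code builds the S-box from its definition so that it is self-contained, and the function names are illustrative rather than taken from any library.

```python
def xtime(a):
    """Multiply by 2 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return (a ^ 0x11B) if a & 0x100 else a

def gmul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = xtime(a)
        b >>= 1
    return r

def make_sbox():
    """AES S-box: field inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gmul(x, y) == 1), 0)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = make_sbox()
RCON = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]

def g(word, round_index):
    """RotWord, SubWord, then XOR the round constant into the first byte."""
    out = [SBOX[b] for b in word[1:] + word[:1]]
    out[0] ^= RCON[round_index]
    return out

def xor_words(a, b):
    return [x ^ y for x, y in zip(a, b)]

def expand_key(key):
    """Return the 44 four-byte words of the AES-128 key schedule."""
    w = [list(key[4 * i: 4 * i + 4]) for i in range(4)]
    for i in range(4, 44):
        temp = g(w[i - 1], i // 4 - 1) if i % 4 == 0 else w[i - 1]
        w.append(xor_words(w[i - 4], temp))
    return w

def invert_from_last_round_key(last):
    """Recover the raw key from the 16 bytes of the final round key."""
    w = [None] * 40 + [list(last[4 * i: 4 * i + 4]) for i in range(4)]
    for i in range(43, 3, -1):
        # w[i] = w[i-4] ^ f(w[i-1]), so w[i-4] = w[i] ^ f(w[i-1]).
        temp = g(w[i - 1], i // 4 - 1) if i % 4 == 0 else w[i - 1]
        w[i - 4] = xor_words(w[i], temp)
    return [b for word in w[:4] for b in word]

key = list(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
schedule = expand_key(key)
last_round_key = [b for word in schedule[40:44] for b in word]
assert invert_from_last_round_key(last_round_key) == key
```

Each schedule word is the XOR of two earlier words (one possibly passed through g), so any four consecutive words determine all the others in both directions.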

38
Final Round Attack
  • Again, we consider a cache-collision for two
    bytes
  • When do these bytes collide in the table?

42
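Final Round Attack

Each ciphertext byte is c_i = S[x_i] ⊕ k_i, where
x_i is the final-round table index and k_i is the
final-round key byte.
We want to solve for k_i ⊕ k_j. We
assume that x_i = x_j,
leaving c_i ⊕ c_j = k_i ⊕ k_j.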
45
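Final Round Attack

So, c_i ⊕ c_j = k_i ⊕ k_j guarantees a
collision. What happens if
c_i ⊕ c_j ≠ k_i ⊕ k_j? We get a fixed offset
between the two S-box outputs α = S[x_i] and
β = S[x_j]: α ⊕ β = c_i ⊕ c_j ⊕ k_i ⊕ k_j.
Surprise: the non-linearity of the S-box enables
the attack to succeed.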
46
Final Round Attack

Why does this happen? Because α and β are the
results of S-box lookups, a fixed offset between
them says nothing about the indices used to look
them up. A small offset, such as 1, does not imply
a collision on the same cache line. Thus, the
cache-line issue is gone.
47
Final Round Attack
  • Collect timing data and compute the average time
    for each value of c_i ⊕ c_j, for all i, j. Low
    times will occur at the values c_i ⊕ c_j = k_i ⊕ k_j.
  • The attack data produces a likelihood estimate for
    different values of each pair k_i, k_j.
  • Need to find k_0, ..., k_15 minimizing the global
    cost function Σ_{i,j} C_ij(k_i, k_j).
  • Use standard AI algorithms (local optimization,
    belief propagation).
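
The averaging step for one pair (i, j) can be sketched on simulated data. This is a toy model, not the authors' implementation: the timing model, the noise level, the sample count and the random permutation standing in for the AES S-box are all assumptions. The global cost minimization over all 16 bytes is not shown.

```python
import random

def estimate_key_xor(samples):
    """Return the value of c_i ^ c_j with the lowest average time."""
    total = [0.0] * 256
    count = [0] * 256
    for c_i, c_j, t in samples:
        d = c_i ^ c_j
        total[d] += t
        count[d] += 1
    averages = [total[d] / count[d] if count[d] else float("inf") for d in range(256)]
    return min(range(256), key=averages.__getitem__)

def simulate(k_i, k_j, n, seed=0):
    """Generate (c_i, c_j, time) samples under a simple cache-line timing model."""
    rng = random.Random(seed)
    sbox = list(range(256))
    rng.shuffle(sbox)   # stand-in for the AES S-box: a random byte permutation
    samples = []
    for _ in range(n):
        x_i, x_j = rng.randrange(256), rng.randrange(256)
        same_line = (x_i >> 4) == (x_j >> 4)        # 16 entries per cache line
        t = 1000.0 + rng.gauss(0.0, 5.0) - (10.0 if same_line else 0.0)
        samples.append((sbox[x_i] ^ k_i, sbox[x_j] ^ k_j, t))
    return samples

k_i, k_j = 0x3C, 0xE1
guess = estimate_key_xor(simulate(k_i, k_j, 100_000))
assert guess == k_i ^ k_j
```

Only the bucket c_i ⊕ c_j = k_i ⊕ k_j consists entirely of collisions; every other bucket mixes hits and misses, so its average stays close to the baseline.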


48
Final Round Attack Results
(timing samples required)
CPU              L2 cache eviction   L1 cache eviction
Pentium III      2^15                2^16
Pentium IV       2^16                2^19.9
UltraSPARC-III   2^15                2^18.7
  • Huge improvement over the original 2^27.5.
  • Offline complexity is low; the attack takes
    seconds. Offline work can be increased to further
    lower the number of samples required.

49
Expanded Final Round Attack
(timing samples required)
CPU              L2 cache eviction   L1 cache eviction
Pentium III      2^13                2^14
Pentium IV       2^13.6              2^18.6
UltraSPARC-III   2^14.3              2^17.3
  • Produce a cost estimate for specific values of key
    bytes, instead of simply their difference.
  • Requires more time and memory from the attacker,
    but the attack still finishes in 10 minutes.

50
Final Round Attack Results
  • Bonuses from attacking the final round
  • Attack requires only ciphertext and timing.
  • Related plaintexts produce essentially random
    cipher state by the 9th round.
  • Attack is oblivious to the target platform
  • Attack works well against decryption

51
Final Round Attack Results
  • The attack should be widely applicable
  • Most CPUs use similar cache structure
  • Most standard crypto libraries use the original
    Rijndael implementation of AES. Attacks are
    implemented against OpenSSL.

52
Final Round Attack Complications
  • The attacks assume the AES tables are out of
    cache before encryption. This means a target
    machine must be made to do some unrelated work in
    between encryptions.
  • Recent CPUs (e.g., Pentium IV) are more complicated
    than the model: hardware prefetch, out-of-order
    execution, etc.
  • Larger cache line sizes are also a problem.

53
Countermeasures
  • Solutions requiring special hardware support are
    probably not practical
  • Cannot guarantee the encryption will take
    constant time without crippling performance.
  • It is possible to greatly increase resistance of
    the common AES implementation to final round
    attacks with no performance penalty by
    eliminating the special lookup table

54
Conclusions
  • AES is vulnerable to timing attacks due to its
    use of table lookups. Better attacks are still
    possible.
  • Real-world use of timing attacks is questionable,
    as they require cycle-count level data, but these
    attacks tolerate much more noise than before
  • Applications?
  • Process-to-Process attacks
  • Virtual Machines
  • Against a secure CPU on a multiprocessor
    machine
  • Against a remote server: the holy grail

55
Conclusions
  • Table lookups into cached memory are dangerous
    for cryptographic software.
  • Information is leaked through many side channels:
  • Time
  • Cache contents
  • Power usage
  • The AES selection process largely ignored this
    problem. Runner-up cipher Serpent avoids lookup
    tables, but this was not seen as an advantage.

56
Thank you
  • Questions?
  • Joseph Bonneau
  • jbonneau_at_stanford.edu
  • Current version of paper available at
  • www.stanford.edu/jbonneau/AES_timing.pdf