Title: Cache-Collision Timing Attacks Against AES
1Cache-Collision Timing Attacks Against AES
- Joseph Bonneau
- Stanford University
- jbonneau_at_stanford.edu
- Ilya Mironov
- Microsoft Research
- mironov_at_microsoft.com
2Side Channel Cryptanalysis
- Definition: any attack on a cryptosystem that uses
information leaked as a byproduct of
the physical implementation of the cryptosystem,
rather than a theoretical weakness.
- Exploitable side channels:
- Power usage
- Cache accesses
- Noise
- Heat
- Time
3Brief History of Timing Attacks
- Timing attacks exploit variability in the time taken to perform an encryption due to secret data.
- Paul Kocher demonstrated timing attacks against Diffie-Hellman, RSA, DSS, etc. at CRYPTO '96.
- Dan Boneh and David Brumley demonstrated the first remote timing attack against RSA in 2003.
- Public-key systems are vulnerable due to their use of lengthy mathematical operations.
4Brief History of Timing Attacks
- During the AES competition, timing attacks were believed to be possible only against branch statements or data-dependent rotations.
- Rijndael has a mathematical formulation in the field $GF(2^8)$.
- Optimized Rijndael implementations in software use only table lookup, shift, and exclusive-or operations.
- NIST declared Rijndael not vulnerable to timing attacks in its final evaluation in 2000; Rijndael won the competition.
5Brief History of Timing Attacks
- Daniel Bernstein announced successful timing attacks against AES in April 2005, exploiting timing characteristics of table lookups.
- Osvik, Shamir, and Tromer followed up in November 2005 with very powerful attacks, requiring direct observation of the cache before and after encryption.
6Implementation details of AES, part I
- The textbook description of an AES round as a
function from $(X_i, K_i) \to X_{i+1}$
7Implementation details of AES, part I
- The actual round computation in software, as proposed with Rijndael and now widely used.
- All three operations (SubBytes, ShiftRows, MixColumns) are combined into pre-computed tables. A round of encryption requires just 16 table lookups, 16 xors, and 12 shifts.
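The table-based round can be sketched as follows. This is a minimal illustration, not the code of any particular library: the names `TE0` to `TE3` and `table_round` are chosen here for illustration, the S-box is derived from its field definition rather than hard-coded, and the state is assumed to be held as four 32-bit column words.

```python
def xtime(a):
    """Multiply by 2 in GF(2^8), reducing by the AES polynomial 0x11B."""
    a <<= 1
    return (a ^ 0x11B) if a & 0x100 else a

def gf_mul(a, b):
    """Shift-and-add multiplication in GF(2^8)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = xtime(a)
        b >>= 1
    return r

def make_sbox():
    """S-box = affine transform of the multiplicative inverse (0 maps to 0)."""
    sbox = []
    for a in range(256):
        inv = 1
        for _ in range(254):          # a^254 equals a^-1 for nonzero a
            inv = gf_mul(inv, a)
        s = inv
        for shift in range(1, 5):     # XOR with the four left rotations
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = make_sbox()

def ror8(w):
    """Rotate a 32-bit word right by one byte."""
    return ((w >> 8) | (w << 24)) & 0xFFFFFFFF

# Each entry packs one MixColumns column: (2*S[x], S[x], S[x], 3*S[x]).
TE0 = [(xtime(s) << 24) | (s << 16) | (s << 8) | (xtime(s) ^ s) for s in SBOX]
TE1 = [ror8(w) for w in TE0]
TE2 = [ror8(w) for w in TE1]
TE3 = [ror8(w) for w in TE2]

def table_round(state, round_key):
    """One full round: per column, 4 lookups, 4 xors, and 3 shifts."""
    out = []
    for c in range(4):
        out.append(TE0[state[c] >> 24]
                   ^ TE1[(state[(c + 1) % 4] >> 16) & 0xFF]
                   ^ TE2[(state[(c + 2) % 4] >> 8) & 0xFF]
                   ^ TE3[state[(c + 3) % 4] & 0xFF]
                   ^ round_key[c])
    return out
```

Every lookup index is a byte of the secret-dependent state, which is why the memory access pattern, and therefore the cache behaviour, depends on the key.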
8Bernstein's timing attack
Notice that for the first round, the table lookup indices are each related to only one key byte and one plaintext byte: $x_i = p_i \oplus k_i$. Remarkably, the entire encryption time will be affected by just the value of $x_i$.
9Bernstein's timing attack
To prepare for the attack, collect a large body of reference timing data for each value of $x_i$, using a machine with a known key.
10Bernstein's timing attack
Next, collect a large body of timing data from a target machine for the plaintext byte $p_i$.
11Bernstein's timing attack
The target machine's timing data should be shifted from the reference data by exactly $k_i$.
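The profile-matching step can be sketched as a search for the shift that best aligns the two timing profiles. This is a toy model under stated assumptions: `simulate_profile` and `best_shift` are hypothetical names, and the timing data is synthetic (a random per-index mean plus Gaussian noise), standing in for real measurements.

```python
import random

def simulate_profile(key_byte, base_profile, samples_per_value, noise, rng):
    """Toy timing model: mean time for plaintext byte p depends on p ^ key_byte."""
    profile = []
    for p in range(256):
        times = [base_profile[p ^ key_byte] + rng.gauss(0, noise)
                 for _ in range(samples_per_value)]
        profile.append(sum(times) / len(times))
    return profile

def best_shift(reference, target):
    """Return the k that maximizes the correlation of target[p] with reference[p ^ k]."""
    ref_mean = sum(reference) / 256
    tgt_mean = sum(target) / 256

    def score(k):
        return sum((target[p] - tgt_mean) * (reference[p ^ k] - ref_mean)
                   for p in range(256))

    return max(range(256), key=score)
```

With a reference profile collected under a zero key byte (so that it is indexed directly by the lookup index), `best_shift(reference, target)` returns the candidate for the target's key byte. In practice the scores for all 256 candidates would be kept and combined across byte positions rather than taking a single maximum.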
13Bernstein's timing attack
- Problems:
- The reference machine must be identical to the target.
- Requires known plaintext as well as timing data.
- Plaintexts must be sufficiently random.
- High number of samples required; the best case reported by Bernstein is around $2^{27.5}$.
14Bernstein's timing attack
- Overall, a very general statistical method for constructing a timing attack.
- Getting code to run in constant time on a machine with cache is very difficult, meaning most cryptosystems are theoretically vulnerable.
- Bernstein's attack doesn't exploit any specific features of Rijndael, yet the attack does not seem to work against other AES finalists (Serpent, Twofish).
16Cache-collision timing attacks
- What is Rijndael's weakness?
- Heavy use of table lookups, which dominate the running time.
- Table lookup indices are easily related to single plaintext and key bytes.
21Cache collisions
- Rijndael is just a sequence of table lookups.
- What happens when $x_i = x_j$?
- The access to $x_j$ will hit in cache.
- What happens when $x_i \neq x_j$?
- The access to $x_j$ may or may not hit in cache, depending on the rest of the sequence and the prior cache contents.
Lookup sequence: $T[x], T[x], T[x_i], T[x], T[x], T[x_j], T[x]$
22Cache collisions
A cache collision occurs when we know that $x_i = x_j$. For a large number of samples, the average encryption time will be lower when $x_i = x_j$ than when $x_i \neq x_j$. This is all we need to build an attack.
Lookup sequence: $T[x], T[x], T[x_i], T[x], T[x], T[x_j], T[x]$
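The effect can be illustrated with a toy cache model. The sketch below assumes an initially empty cache, 16 table entries per cache line, and sequences of 16 random lookups. The function names and the choice of positions 2 and 5 for $x_i$ and $x_j$ are arbitrary, and the number of misses stands in for encryption time.

```python
import random

def count_misses(indices, entries_per_line=16):
    """Count cache misses for a lookup sequence into an initially uncached table."""
    cached_lines = set()
    misses = 0
    for x in indices:
        line = x // entries_per_line
        if line not in cached_lines:
            misses += 1
            cached_lines.add(line)
    return misses

def average_misses(force_collision, trials=5000, lookups=16, seed=1):
    """Average miss count with x_j forced equal to x_i, or forced different."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        xs = [rng.randrange(256) for _ in range(lookups)]
        if force_collision:
            xs[5] = xs[2]                  # x_j = x_i: the second access must hit
        else:
            while xs[5] == xs[2]:          # x_j != x_i: it may hit or miss
                xs[5] = rng.randrange(256)
        total += count_misses(xs)
    return total / trials
```

Comparing `average_misses(True)` with `average_misses(False)` shows a lower average for the colliding case. The gap is a fraction of one miss per encryption, which is why a large number of samples is needed to see it through noise.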
23Cache collisions
Actual Results, Pentium III
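25First Round Attack
Pick two lookups in the first round of encryption: $x_i = p_i \oplus k_i$ and $x_j = p_j \oplus k_j$.
Solve for the collision constraint: $x_i = x_j$ exactly when $p_i \oplus p_j = k_i \oplus k_j$.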
27First Round Attack
Result: a working attack! There is an easily identifiable low average encryption time whenever $p_i \oplus p_j = k_i \oplus k_j$. However, there are some complications.
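A minimal sketch of the estimator: group timings by the value of $p_i \oplus p_j$ and pick the group with the lowest average. The helper `toy_samples` is a hypothetical stand-in for real measurements. It assumes a fixed time saving whenever the two first-round indices are exactly equal, with Gaussian noise on top.

```python
import random
from collections import defaultdict

def estimate_key_xor(samples, i, j):
    """Return the value of p_i ^ p_j with the lowest mean time (a guess for k_i ^ k_j)."""
    total = defaultdict(float)
    count = defaultdict(int)
    for plaintext, t in samples:
        d = plaintext[i] ^ plaintext[j]
        total[d] += t
        count[d] += 1
    return min(count, key=lambda d: total[d] / count[d])

def toy_samples(key, i, j, n, rng, saving=5.0, noise=1.0):
    """Synthetic (plaintext, time) pairs; time drops when lookups i and j collide."""
    out = []
    for _ in range(n):
        plaintext = bytes(rng.randrange(256) for _ in range(16))
        t = 100.0 + rng.gauss(0, noise)
        if plaintext[i] ^ key[i] == plaintext[j] ^ key[j]:
            t -= saving
        out.append((plaintext, t))
    return out
```

Under this toy model, `estimate_key_xor(toy_samples(key, 0, 4, 40000, rng), 0, 4)` recovers `key[0] ^ key[4]`. The saving and noise levels are arbitrary, so real data would need far more careful statistics.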
28Complication 1: Table families
Notice that four separate tables are used.
Each family of four bytes is isolated: a collision can only occur between lookups into the same table.
31Complication 2: Cache Lines
Modern memory is cached in lines.
[Figure: a table lookup and the cache]
32Complication 2: Cache Lines
So, we can only tell if two lookups hit the same line in memory, not if they are identical. We denote by $\langle x \rangle$ the high-order bits of $x$, which select the cache line. Most CPUs use 32- or 64-byte cache lines. With 4-byte table entries, this means we are forced to ignore the 3 or 4 low-order bits.
33First Round Attack: The bad news
We gain a set of equations in each family, such as the high-order bits of $k_i \oplus k_j$. This leaves 68 or 80 bits of key to search. This limitation was also problematic for Osvik et al. Their solution: examine the second round as well. This can fix some of the problems but is difficult for timing attacks (see paper).
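One way to arrive at the 68 and 80 figures is sketched below. The accounting is an assumption rather than something stated on the slides: each family of four key bytes is taken to yield three independent XOR relations, each revealing only the high-order bits that select a cache line.

```python
def ignored_low_bits(line_bytes, entry_bytes=4):
    """Low-order index bits that cannot be observed (log2 of entries per line)."""
    entries_per_line = line_bytes // entry_bytes
    return entries_per_line.bit_length() - 1   # exact for powers of two

def remaining_key_bits(line_bytes, key_bytes=16, families=4):
    """Key bits left to search after the first-round relations are learned."""
    known_bits_per_relation = 8 - ignored_low_bits(line_bytes)
    relations = families * (key_bytes // families - 1)   # 3 per family of 4 bytes
    return 8 * key_bytes - relations * known_bits_per_relation
```

For 32-byte lines this gives 128 - 12 * 5 = 68 bits, and for 64-byte lines 128 - 12 * 4 = 80 bits, matching the numbers above.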
34First Round Attack: The good news
- Cache collisions are a strong method.
- The timing variability is much better than the random effects previously used.
- The attack requires $2^{15}$ samples, compared to $2^{27.5}$.
- Can we recover the full key with this efficiency?
35Implementation details of AES, part II
The final round of encryption is special.
[Figure: rounds 1 through 10, with round 10 marked "special!"]
36Implementation details of AES, part II
- The final round of encryption is special.
- No MixColumns operation is performed, as it would add no additional security.
- In software, this requires a new table to be used only for the final round. This table is just the S-box.
37Implementation details of AES, part II
- The final round also uses expanded key bytes
- However, the AES key schedule is invertible.
Finding the final 16 bytes is equivalent to
finding the raw key. This design was intentional.
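The inversion can be sketched for AES-128 as follows. This is a minimal illustration with function names chosen here: the schedule is run forward in `expand_key` and backward in `invert_last_round_key`, and the S-box is derived from its field definition rather than hard-coded.

```python
def gf_mul(a, b):
    """Shift-and-add multiplication in GF(2^8) modulo the AES polynomial."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def make_sbox():
    sbox = []
    for a in range(256):
        inv = 1
        for _ in range(254):              # a^254 is the inverse; 0 maps to 0
            inv = gf_mul(inv, a)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = make_sbox()
RCON = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]

def g(word, rnd):
    """RotWord, SubWord, then XOR the round constant (rnd counts from 1)."""
    out = [SBOX[b] for b in word[1:] + word[:1]]
    out[0] ^= RCON[rnd - 1]
    return out

def xor_words(a, b):
    return [x ^ y for x, y in zip(a, b)]

def expand_key(key):
    """Forward schedule: 44 four-byte words from a 16-byte key."""
    w = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    for i in range(4, 44):
        t = g(w[i - 1], i // 4) if i % 4 == 0 else w[i - 1]
        w.append(xor_words(w[i - 4], t))
    return w

def invert_last_round_key(last):
    """Walk the schedule backwards from the final round key to the raw key."""
    w = [None] * 40 + [list(last[4 * i:4 * i + 4]) for i in range(4)]
    for i in range(43, 3, -1):
        t = g(w[i - 1], i // 4) if i % 4 == 0 else w[i - 1]
        w[i - 4] = xor_words(w[i], t)     # w[i] = w[i-4] ^ t, solved for w[i-4]
    return bytes(b for word in w[:4] for b in word)
```

Because each word depends only on two other words, processing indices in descending order means every value needed has already been recovered.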
38Final Round Attack
- Again, we consider a cache collision for two bytes.
- When do these bytes collide in the table?
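42Final Round Attack
We want to solve for $c_i \oplus c_j = (S[x_i] \oplus k_i) \oplus (S[x_j] \oplus k_j)$, where $c_i$ is a ciphertext byte, $k_i$ a final-round key byte, and $x_i$ the final-round lookup index. We assume that $x_i = x_j$, leaving $c_i \oplus c_j = k_i \oplus k_j$.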
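45Final Round Attack
So, $c_i \oplus c_j = k_i \oplus k_j$ guarantees a collision. What happens if $c_i \oplus c_j \neq k_i \oplus k_j$? We get a fixed offset $\delta = c_i \oplus c_j \oplus k_i \oplus k_j$ between the two S-box outputs $\alpha = S[x_i]$ and $\beta = S[x_j]$. Surprise: the non-linearity of the S-box enables the attack to succeed.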
46Final Round Attack
Why does this happen? Because $\alpha$ and $\beta$ are the results of S-box lookups, a fixed offset does not mean anything about the indices used to look them up. A small offset such as $\alpha \oplus \beta = 1$ does not mean a collision on the same cache line. Thus, the cache-line issue is gone.
47Final Round Attack
- Collect timing data and compute the average time for each value of $c_i \oplus c_j$, for all $i, j$. Low times will occur at the values $c_i \oplus c_j = k_i \oplus k_j$.
- Attack data produces a likelihood estimate for different values of each $k_i, k_j$.
- Need to find $k_0, \ldots, k_{15}$ minimizing the global cost function $\sum_{i,j} C_{ij}(k_i, k_j)$.
- Use standard AI algorithms (Local Optimization, Belief Propagation).
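The steps above can be sketched with a simple local optimization. This is an illustrative stand-in, not the authors' algorithm: the cost $C_{ij}$ is taken to be the mean time for each value of $c_i \oplus c_j$ (so it depends only on $k_i \oplus k_j$), the search is a greedy start followed by coordinate descent, and `toy_samples` generates synthetic timings under an assumed fixed saving per collision.

```python
import random
from itertools import combinations

def pairwise_costs(samples, n_bytes):
    """cost[(i, j)][d] = mean time over samples with c_i ^ c_j == d (i < j)."""
    pairs = list(combinations(range(n_bytes), 2))
    total = {p: [0.0] * 256 for p in pairs}
    count = {p: [0] * 256 for p in pairs}
    overall = sum(t for _, t in samples) / len(samples)
    for ct, t in samples:
        for i, j in pairs:
            d = ct[i] ^ ct[j]
            total[(i, j)][d] += t
            count[(i, j)][d] += 1
    # Empty buckets fall back to the overall mean so they are never favoured.
    return {p: [total[p][d] / count[p][d] if count[p][d] else overall
                for d in range(256)] for p in pairs}

def recover_differences(cost, n_bytes, max_passes=10):
    """Estimate d[i] = k_0 ^ k_i by minimizing the sum of pairwise costs."""
    def pair_cost(i, vi, j, vj):
        return cost[(min(i, j), max(i, j))][vi ^ vj]

    # Greedy start: use only the pairs that involve byte 0.
    d = [0] + [min(range(256), key=lambda v, i=i: cost[(0, i)][v])
               for i in range(1, n_bytes)]
    for _ in range(max_passes):
        changed = False
        for i in range(1, n_bytes):
            best = min(range(256),
                       key=lambda v: sum(pair_cost(i, v, j, d[j])
                                         for j in range(n_bytes) if j != i))
            if best != d[i]:
                d[i], changed = best, True
        if not changed:
            break
    return d

def toy_samples(key, n, rng, saving=1.0, noise=0.5):
    """Synthetic (ciphertext, time) pairs; each colliding pair lowers the time."""
    out = []
    for _ in range(n):
        ct = bytes(rng.randrange(256) for _ in range(len(key)))
        t = rng.gauss(0.0, noise)
        for i, j in combinations(range(len(key)), 2):
            if ct[i] ^ ct[j] == key[i] ^ key[j]:
                t -= saving
        out.append((ct, t))
    return out
```

The result fixes every key byte relative to $k_0$, leaving 256 candidates for the final round key, each of which can be checked against a known ciphertext after inverting the key schedule.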
48Final Round Attack Results
Samples required:
CPU              L2 cache eviction   L1 cache eviction
Pentium III      $2^{15}$            $2^{16}$
Pentium IV       $2^{16}$            $2^{19.9}$
UltraSPARC-III   $2^{15}$            $2^{18.7}$
- Huge improvement over the original $2^{27.5}$.
- Offline complexity is low; the attack takes seconds. Offline work can be increased to further lower the number of samples required.
49Expanded Final Round Attack
Samples required:
CPU              L2 cache eviction   L1 cache eviction
Pentium III      $2^{13}$            $2^{14}$
Pentium IV       $2^{13.6}$          $2^{18.6}$
UltraSPARC-III   $2^{14.3}$          $2^{17.3}$
- Produce a cost estimate for specific values of key bytes, instead of simply their difference.
- Requires more time and memory from the attacker, but the attack still finishes in 10 minutes.
50Final Round Attack Results
- Bonuses from attacking the final round:
- Attack requires only ciphertext and timing.
- Related plaintexts produce essentially random cipher state by the 9th round.
- Attack is oblivious to the target platform.
- Attack works well against decryption.
51Final Round Attack Results
- The attack should be widely applicable
- Most CPUs use similar cache structure
- Most standard crypto libraries use the original
Rijndael implementation of AES. Attacks are
implemented against OpenSSL.
52Final Round Attack Complications
- The attacks assume the AES tables are out of cache before encryption. This means a target machine must be made to do some unrelated work in between encryptions.
- Recent CPUs (e.g. Pentium IV) are more complicated than the model: hardware prefetch, out-of-order execution, etc.
- Larger cache line sizes are also a problem.
53Countermeasures
- Solutions requiring special hardware support are probably not practical.
- Cannot guarantee the encryption will take constant time without crippling performance.
- It is possible to greatly increase resistance of the common AES implementation to final round attacks with no performance penalty by eliminating the special lookup table.
54Conclusions
- AES is vulnerable to timing attacks due to its use of table lookups. Better attacks are still possible.
- Real-world use of timing attacks is questionable, as they require cycle-count level data, but these attacks tolerate much more noise than before.
- Applications?
- Process-to-process attacks
- Virtual machines
- Against a secure CPU on a multiprocessor machine
- Against a remote server: the holy grail
55Conclusions
- Table lookups into cached memory are dangerous for cryptographic software.
- Information is leaked through many side channels:
- Time
- Cache contents
- Power usage
- AES selection largely ignored this problem. Runner-up cipher Serpent avoids lookup tables, but this was not seen as an advantage.
56Thank you
- Questions?
- Joseph Bonneau
- jbonneau_at_stanford.edu
- Current version of paper available at
- www.stanford.edu/jbonneau/AES_timing.pdf