Title: Vulnerabilities on high-end processors
1 Vulnerabilities on high-end processors
André Seznec IRISA/INRIA CAPS project-team
2 A paradox
- Microarchitectures are more and more complex
- Timing side-channel attacks have been presented on implementations of AES (Bernstein) and RSA (Acıiçmez et al.)
3 Many hardware features exist only to improve performance
- Caches
- Pipeline
- Superscalar execution
- Branch prediction
- Thread parallelism
4 Execution time of a short instruction sequence is a complex function!
5 Execution time of a short instruction sequence is a complex function (2)
- Depends on the precise state of every microarchitecture component
- More than 100 speculative instructions in flight at the same time on a Pentium 4
- Instructions are executed out of order
- Strange correlations, almost unpredictable at compile time (even in the compiler back end)
6 Understanding the AES cache timing attack on high-end microprocessors (follows Bernstein 2005)
- AES with lookup tables is a 10-round algorithm with the following vulnerabilities
- The number, the types and the order of the instructions are independent of the key K and the message M to be encrypted
- The exact locations of the data words read and written by the first round depend only on K xor M
- The execution time of the first round depends on K xor M (at least statistically)
- CAN BE EXPLOITED
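The dependence of first-round memory locations on K xor M can be sketched as follows. This is a minimal illustration, assuming 4-byte lookup-table entries and 64-byte cache lines (both values are assumptions, not taken from the slides); the function names are hypothetical.

```python
LINE_BYTES = 64    # assumed cache line size
ENTRY_BYTES = 4    # assumed size of one lookup-table entry


def first_round_indices(key, message):
    """Table index used for each byte in the first round: K[i] xor M[i]."""
    return [k ^ m for k, m in zip(key, message)]


def cache_lines(indices):
    """Cache line (within one table) touched by each lookup index."""
    entries_per_line = LINE_BYTES // ENTRY_BYTES
    return [index // entries_per_line for index in indices]
```

Two (key, message) pairs with the same K xor M touch exactly the same table entries, which is why the timing can only reveal K xor M and why a known message is needed to get K.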
7 Bernstein 2005 (empty cache)
- Plaintext attack
- Unrealistic hypothesis
- Access to cycle-accurate encryption timing
- Cache is flushed between two encryptions
- Not explicit in the paper (but see Lauradoux et al.)
- Byte-by-byte determination of the key, based on statistically determining the maximum encryption time for each byte of K xor M
- Works only on Pentium 3, not on Pentium 4?
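The byte-by-byte statistical step can be illustrated on a simulated timing source. The sketch below is not Bernstein's code: the timing model (a fixed 10-cycle penalty when K xor M equals a known `slow_index`, plus Gaussian noise) and all names are assumptions made for illustration. In a real attack the slow value would come from profiling a reference machine with a known key.

```python
import random


def simulated_encrypt_time(m_byte, k_byte, rng, slow_index):
    """Hypothetical timing model: slower when (M xor K) hits slow_index."""
    penalty = 10.0 if (m_byte ^ k_byte) == slow_index else 0.0
    return 1000.0 + penalty + rng.gauss(0.0, 5.0)


def recover_key_byte(secret_k, slow_index=0x5A, samples_per_value=200, seed=1):
    """Average the timing per message byte and return the key byte guess.

    secret_k is only used inside the simulated victim; the attacker side
    sees nothing but the timings.
    """
    rng = random.Random(seed)
    mean_time = []
    for m in range(256):
        total = sum(
            simulated_encrypt_time(m, secret_k, rng, slow_index)
            for _ in range(samples_per_value)
        )
        mean_time.append(total / samples_per_value)
    slowest_m = max(range(256), key=mean_time.__getitem__)
    # The slowest message byte satisfies slowest_m xor K == slow_index.
    return slowest_m ^ slow_index
```

A single timing is dominated by noise in this model; only the average over many encryptions exposes the slow value.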
8 A loaded cache attack (proof-of-concept codes available)
- Plaintext attack
- Timing of a large number of encryptions
- An unrealistic hypothesis
- Access to cycle-accurate encryption timings
- On a byte basis of K xor M, determine the bit substrings that statistically lead to the highest encryption time (with a threshold to get confidence)
- Depends on the microarchitecture
- 0 to 80 bits of the key recovered by this method, depending on the model and stepping of the Pentium 4
- Suspected cause: the attack exercises banking in the cache
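The confidence threshold mentioned above could look like the following sketch. It assumes the mean encryption time has already been computed per group of K xor M bits (for example, per high nibble), and it uses a simple z-score test; both the grouping and the test are assumptions, since the slides do not give the actual criterion.

```python
from statistics import mean, pstdev


def confident_peak(mean_time_by_group, threshold=3.0):
    """Return the slowest group if it clearly stands out, else None.

    mean_time_by_group maps a bit-substring value to its mean timing and
    must contain at least two groups.
    """
    best = max(mean_time_by_group, key=mean_time_by_group.get)
    others = [t for g, t in mean_time_by_group.items() if g != best]
    spread = pstdev(others)
    if spread == 0:
        return best if mean_time_by_group[best] > others[0] else None
    score = (mean_time_by_group[best] - mean(others)) / spread
    return best if score >= threshold else None
```

Returning `None` for groups without a clear peak matches the observation that only some of the key bits are recovered on a given processor.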
9 First vulnerability
- For a given instruction sequence,
- Timings are erratic
- Unlikely to get exactly the same timing twice
- But statistically correlated
- Cache banking and operation chaining appear in the average
10 A possible countermeasure for AES
- Periodically and randomly change the mapping of the lookup tables
- 9000 cycles for this change (XOR-based permutation)
- See Lauradoux et al.
- HAVEGE can provide the random numbers.
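One way to read the XOR-based remapping is sketched below: entry i of the table is stored at position i xor mask, and the mask is changed from time to time. This is an illustrative guess at the scheme, not the implementation from Lauradoux et al.; the class name is hypothetical, and Python's `secrets` module stands in for HAVEGE as the random source.

```python
import secrets


class RemappedTable:
    """256-entry lookup table stored under an XOR permutation of its indices."""

    def __init__(self, table):
        if len(table) != 256:
            raise ValueError("expected a 256-entry table")
        self._mask = 0
        self._stored = list(table)  # invariant: _stored[i ^ _mask] == table[i]

    def remap(self, new_mask=None):
        """Move every entry to its position under a fresh random mask."""
        if new_mask is None:
            new_mask = secrets.randbelow(256)  # stand-in for HAVEGE output
        delta = self._mask ^ new_mask
        self._stored = [self._stored[j ^ delta] for j in range(256)]
        self._mask = new_mask

    def lookup(self, index):
        """Return the original table[index], whatever the current mapping."""
        return self._stored[index ^ self._mask]
```

Calling `remap()` periodically changes which memory location a given K xor M value touches, so timings collected before and after a remap no longer add up to the same statistics.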
11 Indirect timing measures?
- Hypothesis
- The attacker has access to user mode on the system (legal or illegal)
- The attacker has no access to your data
- He/she can run his/her process concurrently with the encryption
- On conventional systems, no access to the microscopic timing of your application
- Time slices last millions of cycles
12 Simultaneous Multithreading (SMT): parallel processing on a single processor
- Functional units are underused on superscalar processors
- SMT
- Sharing the functional units of a superscalar processor between several processes
- Advantages
- A single process can use all the resource units
- Dynamic sharing of all structures on parallel/multiprocess workloads
Second Vulnerability
13 Superscalar
Issue slots
14 Indirect timing measures on an SMT processor (principles)
- SPY wants to get information on CRYPT
- SPY and CRYPT run in parallel
- SPY tracks a specific event on CRYPT
- For instance, the execution of a branch?
- SPY saturates the hardware resources that CRYPT needs to execute this event fast
- SPY records its own execution time (reading the hardware clock counter)
- An irregularity in its own execution time signals the event
- CRYPT has tried to grab the hardware resource
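The last step, spotting an irregularity in SPY's own timings, can be sketched as a comparison against a baseline. The median baseline and the 1.5x slack factor are assumptions chosen for illustration; the timings would come from the hardware clock counter in a real spy.

```python
from statistics import median


def irregular_iterations(timings, slack=1.5):
    """Indices of SPY loop iterations that ran much slower than usual."""
    baseline = median(timings)
    return [i for i, t in enumerate(timings) if t > slack * baseline]
```

For example, `irregular_iterations([100, 102, 99, 180, 101, 100, 175, 98])` flags iterations 3 and 6, the two during which CRYPT would have grabbed the shared resource.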
15 Indirect timing measures on an SMT: proof of concept (derived from SBPA)
The skeleton of a naive RSA core
For I = 1 to N
    Sequence X            // 1,000s of cycles
    If Key[I] = 1         // spy this branch B
        Sequence Y        // 1,000s of cycles
Endfor
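A concrete instance of this skeleton is left-to-right square-and-multiply. The sketch assumes that Sequence X is the squaring and Sequence Y the multiplication, which the slides do not state; the `trace` argument is a hypothetical addition that records what a spy on branch B would learn.

```python
def naive_modexp(base, key_bits, modulus, trace=None):
    """Compute base**key mod modulus; key_bits is most significant bit first."""
    result = 1
    for bit in key_bits:                        # For I = 1 to N
        result = (result * result) % modulus    # Sequence X: always executed
        if bit == 1:                            # branch B: depends on the key
            result = (result * base) % modulus  # Sequence Y: only for 1 bits
        if trace is not None:
            trace.append(bit)                   # outcome of branch B
    return result
```

The sequence of outcomes of branch B is exactly the key, so observing whether each iteration is X-type or XY-type reveals every bit.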
16 Indirect timing measures on an SMT: proof of concept (2)
- Branch instructions are buffered in a BTB (branch target buffer)
- On Pentium 4, when a branch misses in the BTB, the penalty is more than 20 cycles
- SPY: a nearly infinite loop iterating over a set of branches that occupy the possible BTB entries for B
- Track irregularities in the timing of the loop
- When B is executed, a branch of the SPY is ejected from the BTB, thus creating a timing irregularity
- Iteration is X-type or XY-type
Able to reproduce this attack on a toy example
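The eviction mechanism can be reproduced in a small simulation. Everything here is a toy model under stated assumptions: a single 4-way BTB set with LRU replacement (the real Pentium 4 BTB organization is undocumented), a 20-cycle miss penalty, one SPY pass per CRYPT iteration, and branch B entering the BTB only when it is taken. This is not the author's proof-of-concept code.

```python
from collections import OrderedDict

MISS_PENALTY = 20  # cycles; the slides quote "more than 20" on Pentium 4
WAYS = 4           # assumed associativity of the BTB set holding B


class BTBSet:
    """One BTB set with LRU replacement (toy model)."""

    def __init__(self, ways=WAYS):
        self.ways = ways
        self.entries = OrderedDict()

    def access(self, tag):
        """Return True on a hit; on a miss, insert tag and evict the LRU entry."""
        if tag in self.entries:
            self.entries.move_to_end(tag)
            return True
        if len(self.entries) >= self.ways:
            self.entries.popitem(last=False)
        self.entries[tag] = True
        return False


def spy_pass(btb):
    """One SPY loop iteration over its branches; returns simulated cycles."""
    cycles = 0
    for i in range(btb.ways):
        cycles += 1 if btb.access(("spy", i)) else 1 + MISS_PENALTY
    return cycles


def recover_bits(key_bits):
    """Guess each key bit from the timing of the SPY pass that follows it."""
    btb = BTBSet()
    spy_pass(btb)                    # warm-up: fill the set with SPY branches
    recovered = []
    for bit in key_bits:
        if bit == 1:
            btb.access("B")          # CRYPT takes branch B (XY-type iteration)
        cycles = spy_pass(btb)
        recovered.append(1 if cycles > btb.ways else 0)
    return recovered
```

In this model an undisturbed SPY pass costs `WAYS` cycles, and any pass that follows an execution of B pays at least one miss penalty, so the slow passes mark the XY-type iterations.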
17 Indirect timing measures on an SMT
- Feasible
- On a branch on Pentium 4 HT, information is leaking
- I recovered all the bits of a 32-bit key in a single run (on a toy example)
- The same kind of attack may apply to cache accesses: the memory access sequence could be discovered
18 Feasible, but difficult
- Technically, very difficult
- Lack of documentation on the BTB
- Strange indexing, unknown associativity, BTB hierarchy
- Requires relatively infrequent events (one every 1,000s of cycles): the measurement resolution is in the 100s of cycles
19 So what?
- On Pentium 4 HT
- If key bits control branches (or addresses of loads)
- They might be recovered by a spy thread
20 Countermeasures
- Just deactivate Hyperthreading.
- At present, that is a global OS mode (boot time)
- Rework the implementation
- Introduce randomness in the control path at execution?
- Makes the attack much more complex
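One possible reading of "randomness in the control path" is to execute the multiplication as a decoy, at random, on iterations where the key bit is 0. The sketch below is an assumption about what such a rework could look like, not a method given in the slides; it reuses a square-and-multiply loop, and Python only illustrates the control flow, not real timing behaviour.

```python
import secrets


def randomized_modexp(base, key_bits, modulus, trace=None):
    """Square-and-multiply with random decoy multiplications.

    key_bits holds 0/1 integers, most significant bit first.
    """
    result = 1
    for bit in key_bits:
        result = (result * result) % modulus
        take = bit | secrets.randbits(1)   # always 1 for a 1 bit, random for a 0 bit
        if take:
            product = (result * base) % modulus
            # Keep the product only when the key bit is really 1.
            result = bit * product + (1 - bit) * result
        if trace is not None:
            trace.append(take)             # what a spy on the branch would see
    return result
```

A single observed run no longer gives the key directly, since a taken branch may be a decoy. The leak is only reduced, however: averaged over many runs the branch is still taken more often for 1 bits, which is consistent with "much more complex" rather than impossible.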