


1
Fast Implementation of Symmetric Ciphers
  • Matt Henricksen
  • Information Security Institute, Queensland
    University of Technology

2
Introduction
  • Dragon: A Fast Word Based Stream Cipher.
  • Kevin Chen, Matt Henricksen, Bill Millan, Joanne
    Fuller, Leonie Simpson, Ed Dawson, H. Lee, S.
    Moon.
  • Why we need fast symmetric ciphers
  • Ten steps to designing fast symmetric ciphers
  • Some conclusions

3
Why do we care about speed?
  • Benchmark cipher: AES
  • New ciphers must be
    • ultra-efficient (multi-gigabit per second), or
    • efficient in constrained devices, or
    • demonstrably more secure than AES

Steve Babbage, "Stream ciphers: what does industry want?"
http://www.ecrypt.eu.org/stvl/sasc/slides21.pdf
4
ECRYPT NoE: eSTREAM
  • 2005 call for stream cipher primitives

(Chart legend: slow, broken, IP)
http://www.ecrypt.eu.org/stream/perf/results
5
eSTREAM Candidate Statements about Software Implementation
6
Low Risk, High Reward Guidelines
Cipher efficiency depends not only on the number
and type of operations but also on how well the
design matches the target architecture.
7
Timing Symmetric Ciphers
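    ; Measure the overhead of the timing instructions themselves.
    ; CPUID serializes execution; RDTSC loads the low 32 bits of
    ; the time-stamp counter into EAX. Repeated to warm up.
    cpuid
    rdtsc
    mov subtime, eax
    cpuid
    rdtsc
    sub eax, subtime
    mov subtime, eax        ; subtime = timing overhead
    cpuid
    rdtsc
    mov subtime, eax
    cpuid
    rdtsc
    sub eax, subtime
    mov subtime, eax
    ; Time the operation.
    cpuid
    rdtsc
    mov time, eax
    ; ... do operation ...
    cpuid
    rdtsc
    sub eax, time
    mov time, eax           ; time - subtime = cost of the operation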
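The listing above can be sketched in C for GCC or Clang on x86. This is a minimal sketch under stated assumptions: it uses the compiler-provided `__cpuid` macro (from `<cpuid.h>`) and `__rdtsc` intrinsic (from `<x86intrin.h>`) instead of inline assembly, keeps the full 64-bit counter rather than only EAX, and keeps the smallest of several overhead measurements where the listing keeps the last one. The helper names `timestamp`, `timer_overhead` and `time_op` are hypothetical. On non-x86 targets it falls back to `clock_gettime`, which counts nanoseconds rather than cycles.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime in the fallback path */
#include <assert.h>
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#include <x86intrin.h>
/* CPUID serializes the pipeline so earlier instructions cannot drift
   past the RDTSC that follows it. */
static inline uint64_t timestamp(void) {
    unsigned a, b, c, d;
    __cpuid(0, a, b, c, d);
    (void)a; (void)b; (void)c; (void)d;
    return __rdtsc();
}
#else
#include <time.h>
/* Fallback for other targets: nanoseconds, not cycles. */
static inline uint64_t timestamp(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
#endif

/* Cost of two back-to-back timestamp() calls. Measured several times
   because the first executions tend to be slower; the minimum is kept. */
static uint64_t timer_overhead(void) {
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < 3; i++) {
        uint64_t t0 = timestamp();
        uint64_t t1 = timestamp();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}

/* Time one call of op(arg) with the timer overhead subtracted. */
static uint64_t time_op(void (*op)(void *), void *arg) {
    uint64_t overhead = timer_overhead();
    uint64_t t0 = timestamp();
    op(arg);
    uint64_t t1 = timestamp();
    assert(t1 >= t0);
    uint64_t elapsed = t1 - t0;
    return elapsed > overhead ? elapsed - overhead : 0;
}
```

For a cycles-per-byte figure, one would call `time_op` on a (hypothetical) `encrypt_buffer(void *ctx)` many times and divide by the number of bytes processed.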
8
Timing Symmetric Ciphers
9
53.3
10
Intel Pentium 4 Architecture
(Diagram: front-end, L2 cache, execution engine with execution core and ports (fast/normal integer, memory load, memory store), retirement unit)
11
Registers
  • Register pressure
    • seven registers, more variables
    • invisible overhead
      • POP register
      • MOV register, eighth variable
      • MOV eighth variable, register
      • PUSH register

12
Phelix
Doug Whiting, Bruce Schneier, Stefan Lucks and
Frédéric Muller. Phelix: Fast Encryption and
Authentication in a Single Cryptographic
Primitive
13
Phelix
14
Mir-1 (Loop State Update)
15
Large States and Small Updates
Dragon Stream Cipher
16
(No Transcript)
17
(Diagram: front-end, L2 cache, execution engine with execution core, retirement unit)
18
FSRs
(Diagram: stages S0 and S4 of a feedback shift register; with a circular buffer indexed by x, S4 is found at (x-4) mod l)
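To make the circular-buffer idea concrete, here is a sketch with a hypothetical 8-stage register of 32-bit words and a made-up feedback function of stages 0, 3 and 7 (not taken from any cipher in this talk). The naive version moves every word on each clock; the circular version leaves the words in place and moves the index `x` instead, so stage i is found at (x - i) mod l, matching the (x-4) mod l addressing on the slide.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FSR_LEN 8  /* hypothetical register length l */

/* Toy feedback function (hypothetical): uses stages 0, 3 and 7. */
static uint32_t feedback(uint32_t s0, uint32_t s3, uint32_t s7) {
    return s0 ^ (s3 + s7);
}

/* Naive FSR: physically shift every stage one place per clock. */
static uint32_t fsr_shift_step(uint32_t s[FSR_LEN]) {
    uint32_t nw = feedback(s[0], s[3], s[7]);
    memmove(s, s + 1, (FSR_LEN - 1) * sizeof s[0]);
    s[FSR_LEN - 1] = nw;
    return nw;
}

/* Circular-buffer FSR: words stay put, index x marks stage 0,
   and stage i lives at buf[(x - i) mod l]. */
typedef struct {
    uint32_t buf[FSR_LEN];
    unsigned x;
} fsr_circ;

static void fsr_circ_init(fsr_circ *r, const uint32_t s[FSR_LEN]) {
    r->x = 0;
    for (unsigned i = 0; i < FSR_LEN; i++)
        r->buf[(FSR_LEN - i) % FSR_LEN] = s[i];
}

static uint32_t fsr_circ_step(fsr_circ *r) {
    uint32_t s0 = r->buf[r->x];
    uint32_t s3 = r->buf[(r->x + FSR_LEN - 3) % FSR_LEN];
    uint32_t s7 = r->buf[(r->x + FSR_LEN - 7) % FSR_LEN];
    uint32_t nw = feedback(s0, s3, s7);
    r->buf[r->x] = nw;                       /* new word replaces old stage 0 */
    r->x = (r->x + FSR_LEN - 1) % FSR_LEN;   /* x - 1 mod l */
    return nw;
}
```

Both produce the same output sequence; the circular version trades the per-clock copy for index arithmetic and a mod on every access.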
19
Unrolling FSRs
  • Advantages
    • no need for index
    • loop unrolling benefit
    • no bound checking
  • Disadvantages
    • increased code footprint
    • reduced flexibility
    • possible overhead
    • reduced applicability
    • cache penalties
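Unrolling can be sketched with the same hypothetical toy register (8 stages, made-up feedback on stages 0, 3 and 7; the block repeats those definitions so it stands alone). Because the circular index returns to its starting value after l clocks, l consecutive clocks can be written out with every position a compile-time constant, so no index variable, mod or bound check remains. `fsr_block_unrolled` and the `IDX`/`STEP` macros are illustrative names, not code from any cipher.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FSR_LEN 8  /* hypothetical register length l */

/* Toy feedback (hypothetical, not from any real cipher). */
static uint32_t feedback(uint32_t s0, uint32_t s3, uint32_t s7) {
    return s0 ^ (s3 + s7);
}

/* Reference: shift every stage on each clock. */
static uint32_t fsr_shift_step(uint32_t s[FSR_LEN]) {
    uint32_t nw = feedback(s[0], s[3], s[7]);
    memmove(s, s + 1, (FSR_LEN - 1) * sizeof s[0]);
    s[FSR_LEN - 1] = nw;
    return nw;
}

/* Circular layout with x = 0: stage i sits at b[(0 - i) mod l]. */
static void fsr_to_circular(uint32_t b[FSR_LEN], const uint32_t s[FSR_LEN]) {
    for (unsigned i = 0; i < FSR_LEN; i++)
        b[(FSR_LEN - i) % FSR_LEN] = s[i];
}

/* Position of stage i during clock k of a block: (x - i) mod l with x = -k.
   k and i are literals below, so this folds to a constant. */
#define IDX(k, i) ((2 * FSR_LEN - (k) - (i)) % FSR_LEN)

#define STEP(k)                                                           \
    do {                                                                  \
        uint32_t nw = feedback(b[IDX(k, 0)], b[IDX(k, 3)], b[IDX(k, 7)]); \
        b[IDX(k, 0)] = nw; /* overwrite the stage that drops out */       \
        out[k] = nw;                                                      \
    } while (0)

/* l clocks fully unrolled; afterwards x is back at 0. */
static void fsr_block_unrolled(uint32_t b[FSR_LEN], uint32_t out[FSR_LEN]) {
    STEP(0); STEP(1); STEP(2); STEP(3);
    STEP(4); STEP(5); STEP(6); STEP(7);
}
```

The cost shows up in the disadvantages listed above: the body grows with l, and the code is tied to one register length.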

20
Py
P: 256-byte array
Pi: 1-word pointer into P
s: 1 word of memory
Y: 260-word array (1040 bytes)
Yi: 1-word pointer into Y
Eli Biham and Jennifer Seberry. Py: A Fast and
Secure Stream Cipher using Rolling Arrays
21
Py Strategy
Default strategy: allocate 4000 stages (32 kilobytes)
  • Advantages
    • 25% increase in speed
    • loop unrolling benefit
    • no bound checking
  • Disadvantages
    • increased code footprint (1500)
    • only partially unrolled
    • extensive copy retained
    • reduced flexibility
    • reduced applicability
    • cache penalties
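The general rolling-array idea can be sketched as follows. This is not Py's code: it reuses the hypothetical toy register (8 stages, made-up feedback) and a 64-word buffer in place of 4000 stages. The register is a window sliding along an oversized array; each clock appends the new word after the window and advances the start, so nothing is shifted and no mod is needed. When the window reaches the end of the buffer, the live words are copied back to the start, which is the kind of copy overhead the slide lists.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FSR_LEN 8   /* hypothetical register length l */
#define FSR_BUF 64  /* hypothetical buffer size in words (>= 2 * FSR_LEN) */

static uint32_t feedback(uint32_t s0, uint32_t s3, uint32_t s7) {
    return s0 ^ (s3 + s7);  /* toy feedback, hypothetical */
}

/* Reference: shift every stage on each clock. */
static uint32_t fsr_shift_step(uint32_t s[FSR_LEN]) {
    uint32_t nw = feedback(s[0], s[3], s[7]);
    memmove(s, s + 1, (FSR_LEN - 1) * sizeof s[0]);
    s[FSR_LEN - 1] = nw;
    return nw;
}

/* Rolling array: stage i is buf[pos + i]. */
typedef struct {
    uint32_t buf[FSR_BUF];
    size_t pos;
} fsr_roll;

static void fsr_roll_init(fsr_roll *r, const uint32_t s[FSR_LEN]) {
    memcpy(r->buf, s, FSR_LEN * sizeof s[0]);
    r->pos = 0;
}

/* Produce n output words. */
static void fsr_roll_run(fsr_roll *r, uint32_t *out, size_t n) {
    while (n > 0) {
        size_t room = FSR_BUF - FSR_LEN - r->pos;  /* clocks left before the end */
        if (room == 0) {
            /* Out of space: copy the live window back to the start. */
            memmove(r->buf, r->buf + r->pos, FSR_LEN * sizeof r->buf[0]);
            r->pos = 0;
            room = FSR_BUF - FSR_LEN;
        }
        size_t chunk = n < room ? n : room;
        uint32_t *w = r->buf + r->pos;
        /* Constant offsets from w: no mod, no per-clock bound check. */
        for (size_t k = 0; k < chunk; k++, w++) {
            uint32_t nw = feedback(w[0], w[3], w[7]);
            w[FSR_LEN] = nw;  /* append after the window */
            out[k] = nw;
        }
        out += chunk;
        r->pos += chunk;
        n -= chunk;
    }
}
```

A larger buffer makes the copy rarer at the price of memory and cache footprint, which is the trade-off in the lists above.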

22
Merkle-Damgård Construction
(128 bits)
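A generic Merkle-Damgård iteration can be sketched like this: pad the message with a 1 bit (the byte 0x80), zeros and the 64-bit message length, split it into blocks, and chain a compression function starting from a fixed IV. Assumptions: a 128-bit chaining value (four 32-bit words, following the 128-bit label on the slide), 64-byte blocks, a big-endian length field, an arbitrary IV, and a made-up compression function `toy_compress` that has no security at all. Real designs differ in all of these details.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MD_BLOCK 64  /* bytes per message block (assumption) */

/* Length after padding: 0x80, zeros, 64-bit length, whole blocks. */
static size_t md_padded_len(size_t n) {
    return (n + 1 + 8 + MD_BLOCK - 1) / MD_BLOCK * MD_BLOCK;
}

static uint32_t rotl32(uint32_t v, unsigned r) {
    assert(r > 0 && r < 32);
    return (v << r) | (v >> (32 - r));
}

/* Toy compression function f(h, m) -> h' (hypothetical, NOT secure). */
static void toy_compress(uint32_t h[4], const uint8_t blk[MD_BLOCK]) {
    uint32_t a = h[0], b = h[1], c = h[2], d = h[3];
    for (int i = 0; i < MD_BLOCK / 4; i++) {
        uint32_t m = (uint32_t)blk[4 * i] | (uint32_t)blk[4 * i + 1] << 8 |
                     (uint32_t)blk[4 * i + 2] << 16 | (uint32_t)blk[4 * i + 3] << 24;
        uint32_t t = rotl32(a + (b ^ c ^ d) + m, 7);
        a = d; d = c; c = b; b = t;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d;  /* feed-forward */
}

/* Merkle-Damgard: h = IV; for each padded block m_i, h = f(h, m_i). */
static void md_hash(const uint8_t *msg, size_t n, uint32_t h[4]) {
    static const uint32_t iv[4] = {0x01234567u, 0x89abcdefu, 0xfedcba98u, 0x76543210u};
    memcpy(h, iv, sizeof iv);
    size_t full = n / MD_BLOCK;
    for (size_t i = 0; i < full; i++)
        toy_compress(h, msg + i * MD_BLOCK);

    /* Final one or two blocks: leftover bytes, 0x80, zeros, bit length. */
    uint8_t tail[2 * MD_BLOCK] = {0};
    size_t rem = n - full * MD_BLOCK;
    size_t tail_len = md_padded_len(n) - full * MD_BLOCK;
    memcpy(tail, msg + full * MD_BLOCK, rem);
    tail[rem] = 0x80;
    uint64_t bits = (uint64_t)n * 8;
    for (int i = 0; i < 8; i++)  /* big-endian length in the last 8 bytes */
        tail[tail_len - 1 - i] = (uint8_t)(bits >> (8 * i));
    for (size_t off = 0; off < tail_len; off += MD_BLOCK)
        toy_compress(h, tail + off);
}
```

For example, `md_hash((const uint8_t *)"abc", 3, h)` compresses a single padded block, while a 56-byte message needs two because the length field no longer fits.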
23
Iterated Halving
Praveen S.S. Gauravaram, Lauren May and William
L. Millan. CRUSH: A New Cryptographic Hash
Function using Iterative Halving Technique
24
Execution Engine
  • Port 0: ALU x2 (+, -, XOR, AND, OR, NOT, if-then, store); Move x1
  • Port 1: ALU x2 (+, -, XOR); Int x1 (×, /, <<, <<<); MMX x½
  • Port 2: Memory Load
  • Port 3: Memory Store
25
Schedule of Instructions
From IA-32 Intel Architecture Optimization
Reference Manual, April 2006
26
Latency and Throughput
27
(No Transcript)
28
Time (cycles): 0, 1, 2, 3, 4, 5, 6, 6.5, 7, 7.5, 8, 9, 10, 10.5, 11, 12, 13, 14
ALU0 (Port 0): ADD eax, edx; ADD ebx, esi; XOR ecx, eax; XOR edx, ebx; ADD esi, ecx; ADD edx, edi; XOR eax, edx
ALU1 (Port 1): ROL edx, 15; ROL esi, 25; ROL eax, 9; ROL ebx, 10; ROL ecx, 17
LOAD (Port 2): MOV edi; MOV eax; MOV ebx; MOV ecx; MOV edx; MOV esi; MOV edi; POP edx
STORE (Port 3): PUSH edx
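The scheduled instructions are add, XOR and rotate (ARX) operations. In C, a 32-bit rotate is usually written as a shift pair, which GCC and Clang typically compile to a single ROL on x86. The mixing step below is a made-up example, not the round function of any cipher in this talk; the rotation amounts 15, 25, 9 and 10 are borrowed from the schedule above purely as example values. It keeps two independent chains so that, on an out-of-order core, operations from one chain can be issued while the other waits for a result.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left by r, 0 < r < 32. */
static inline uint32_t rotl32(uint32_t v, unsigned r) {
    assert(r > 0 && r < 32);
    return (v << r) | (v >> (32 - r));
}

/* Hypothetical ARX mixing step with two independent chains,
   (a, b, c) and (d, e, f), which only meet on the last line. */
static void arx_mix(uint32_t s[6]) {
    uint32_t a = s[0], b = s[1], c = s[2], d = s[3], e = s[4], f = s[5];
    a += b;  d += e;                          /* ADDs, one per chain */
    c ^= a;  f ^= d;                          /* XORs */
    c = rotl32(c, 15);  f = rotl32(f, 25);    /* rotates */
    b += c;  e += f;
    a ^= b;  d ^= e;
    a = rotl32(a, 9);   d = rotl32(d, 10);
    a ^= d;                                   /* chains meet */
    s[0] = a; s[1] = b; s[2] = c; s[3] = d; s[4] = e; s[5] = f;
}
```

Whether the two chains really execute in parallel depends on the processor and on how many registers the compiler has left, which is the register-pressure issue from earlier.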
29
S-boxes
  • High source of non-linearity
  • 8x8 s-box: single lookup
    • does not use the fast execution units
    • at least three cycles per byte
(Diagram: 8 x 8 s-box lookup y = S(x), scheduled as ALU0, MOV, MOV)
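Applying an 8x8 s-box to every byte of a 32-bit word means, per byte, a shift and mask to extract the index, a table load, and a shift and OR to put the result back, which is consistent with the per-byte cost on the slide. A sketch with a made-up s-box, S(x) = 5x + 1 mod 256, chosen only because it is a permutation that is easy to check; it has no cryptographic value, and `sbox8_init`/`sbox8_word` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t sbox8[256];

/* Toy 8x8 s-box: x -> 5x + 1 mod 256 (a permutation, since 5 is odd). */
static void sbox8_init(void) {
    for (unsigned x = 0; x < 256; x++)
        sbox8[x] = (uint8_t)(5u * x + 1u);
}

/* y = S(x) applied to each byte of w: four extract/lookup/insert steps. */
static uint32_t sbox8_word(uint32_t w) {
    uint32_t y = 0;
    for (unsigned i = 0; i < 4; i++) {
        uint8_t x = (uint8_t)(w >> (8 * i));  /* extract byte i */
        y |= (uint32_t)sbox8[x] << (8 * i);   /* look up and reinsert */
    }
    return y;
}
```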
30
Large s-boxes
(Diagram: input word x, bits 0 to 31, split into bytes x0 to x3, which index four 8 x 32 s-boxes S0 to S3 to produce the output word y)
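The large s-box arrangement can be sketched as four table lookups, one per input byte x0 to x3, each returning a full 32-bit word. Combining the four outputs with XOR is an assumption of this sketch (designs may also use addition), and the tables are supplied by the caller; `sbox32_tables` and `sbox32_word` are hypothetical names. Each lookup now yields 32 output bits instead of 8, at the cost of 4 KB of tables.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t s[4][256];  /* s[i] is the 8 x 32 s-box Si (1 KB each) */
} sbox32_tables;

/* y = S0[x0] ^ S1[x1] ^ S2[x2] ^ S3[x3], where x0 is the low byte of x. */
static uint32_t sbox32_word(const sbox32_tables *t, uint32_t x) {
    return t->s[0][x & 0xffu]
         ^ t->s[1][(x >> 8) & 0xffu]
         ^ t->s[2][(x >> 16) & 0xffu]
         ^ t->s[3][x >> 24];
}
```

As a quick check, filling Si[b] with b shifted left by 8i turns the construction into the identity map.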
31
(No Transcript)
32
Hermes
114 cycles/byte
33
MAG
30.7 cycles/byte
34
  • Productivity
    • Don't use small word sizes (HERMES)
    • Don't throw away word as filter (MAG)

pCIPHER
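The productivity point can be illustrated with call counts. If a generator produces a 32-bit word per step, keeping only one byte of each word costs four generator steps per four bytes, while using the whole word costs one. The generator `next_word` below is a hypothetical toy (a linear congruential step, not HERMES or MAG), and the sketch says nothing about how those ciphers actually use their state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical keystream generator: toy LCG step, not a real cipher. */
typedef struct {
    uint32_t state;
    unsigned long calls;
} gen;

static uint32_t next_word(gen *g) {
    g->calls++;
    g->state = g->state * 1664525u + 1013904223u;
    return g->state;
}

/* Wasteful: keep one byte of each generated word. */
static void xor_bytewise(gen *g, uint8_t *buf, size_t n) {
    for (size_t i = 0; i < n; i++)
        buf[i] ^= (uint8_t)next_word(g);
}

/* Productive: use all four bytes of each generated word. */
static void xor_wordwise(gen *g, uint8_t *buf, size_t n) {
    for (size_t i = 0; i < n; i += 4) {
        uint32_t k = next_word(g);
        for (size_t j = 0; j < 4 && i + j < n; j++)
            buf[i + j] ^= (uint8_t)(k >> (8 * j));
    }
}
```

For 16 bytes the bytewise version calls the generator 16 times and the wordwise version 4 times.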
35
Branching
(Diagram: a conditional splits execution into branch 1 and branch 2)
36
Branching in MAG
throughput 0.5
2^27 iterations: μ = 1.21 seconds, σ = 0.046975 seconds
37
Improving the Branching
throughput 1, latency 10
2^27 iterations: μ = 0.66 seconds, σ = 0.0037839 seconds
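One standard way to remove a data-dependent branch is to turn the condition into an all-ones or all-zeros mask and select with AND/OR, so both candidates are computed and there is nothing to mispredict. The slides do not show exactly how the branch in MAG was changed; this is a generic sketch, and `select_branch`/`select_u32` are hypothetical names. Compilers may already emit a conditional move for a simple ternary, so the branchy form is worth checking in the generated assembly first.

```c
#include <assert.h>
#include <stdint.h>

/* Branchy version: the CPU must predict which way cond goes. */
static uint32_t select_branch(uint32_t cond, uint32_t a, uint32_t b) {
    if (cond)
        return a;
    return b;
}

/* Branch-free version: mask is 0xFFFFFFFF when cond != 0, else 0. */
static uint32_t select_u32(uint32_t cond, uint32_t a, uint32_t b) {
    uint32_t mask = -(uint32_t)(cond != 0);
    return (a & mask) | (b & ~mask);
}
```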
38
MMX/SSE

39
SSE2
  • Limited instruction set
    • cannot work with immediates
    • cannot indirectly address memory (s-boxes)
  • Operations take at least 2 cycles each
    • (half-speed execution unit)
  • Not interoperable with other register sets
    • penalty for transferring from general registers
  • Alignment penalties
    • misalignment within a cache line: -40
    • misalignment across a cache line: -500
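The alignment point in practice: allocate buffers on 16-byte boundaries so that aligned 128-bit loads and stores (MOVDQA, or `_mm_load_si128`/`_mm_store_si128` in C) can be used. This sketch XORs two buffers 16 bytes at a time with SSE2 intrinsics when the compiler targets SSE2 and falls back to a byte loop otherwise. `alloc16` and `xor_blocks` are hypothetical helper names, and C11 `aligned_alloc` is assumed to be available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Allocate n bytes (n a multiple of 16) on a 16-byte boundary. */
static uint8_t *alloc16(size_t n) {
    assert(n % 16 == 0);
    return aligned_alloc(16, n);
}

/* dst ^= src over n bytes; both 16-byte aligned, n a multiple of 16. */
static void xor_blocks(uint8_t *dst, const uint8_t *src, size_t n) {
    assert((uintptr_t)dst % 16 == 0 && (uintptr_t)src % 16 == 0);
    assert(n % 16 == 0);
    for (size_t i = 0; i < n; i += 16) {
#if defined(__SSE2__)
        __m128i a = _mm_load_si128((const __m128i *)(dst + i));  /* aligned load */
        __m128i b = _mm_load_si128((const __m128i *)(src + i));
        _mm_store_si128((__m128i *)(dst + i), _mm_xor_si128(a, b));
#else
        for (size_t j = 0; j < 16; j++)
            dst[i + j] ^= src[i + j];
#endif
    }
}
```

The aligned forms fault on a misaligned address rather than silently running slower, so the asserts document the precondition.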

40
Dragon
(Diagram labels: 15 cycles, 1.5 cycles, 42 cycles, 1.5 cycles, 23 cycles)
Time (cycles): 0, 0.5, 1.0, 1.5
ALU0: XOR EBX; XOR EDX; XOR EAX
ALU1: XOR ESI; ADD ECX; ADD EDI
41
Time  Instruction
0     MOV EDX, DWORD [EBP+0x8]
1     MOVDQA XMM0, [EDX]
2     MOVDQA XMM1, [EDX+16]
10    XORPD XMM1, XMM0
12    PSRLDQ XMM1, 4
14    PADDQ XMM1, 4
16    MOVDQA [EDX], XMM0
18    MOVDQA [EDX+16], XMM1
26    CALL sboxes
(Diagram: 128-bit XMM register, bit positions 0, 63 and 127)
42
Summary
  • Main points of this talk
    • don't make arbitrary assumptions
    • understand the architecture in general terms
    • implement your cipher during the design phase
  • Secondary points
    • all of the fast eSTREAM ciphers follow most of
      these guidelines, but not all of them
      (Py uses lots of memory, Dragon uses large
      s-boxes, etc.)
    • these guidelines are orthogonal to cipher
      security
    • they are guidelines, not constraints!

43
Questions?