1
Master's Thesis: Fast Flexible Architectures for Secure Communication
  • Lisa Wu
  • University of Michigan
  • Advanced Computer Architecture Laboratory
  • Advisor: Professor Todd Austin

2
Project Overview
  • Cipher Kernel Analyses
      • Throughput analysis, bottleneck analysis,
        relative run time cost, kernel characterization
  • Architectural Extensions
  • CryptoManiac Architecture
      • Instruction architecture, system architecture,
        processing element architecture, physical design
        characteristics
  • Super Optimizer
      • Validation and parameter studies
  • Performance Analysis
      • Encryption rate studies

3
My Research Contribution
  • Design and implementation of the CryptoManiac
    co-processor
      • Hardware models of CryptoManiac: 8WC, 4WC, 3WC, 2WC, and 4WNC
      • ISA and scheduling of kernels
      • Timing, area, power, and performance analyses of
        the CryptoManiac co-processor
  • Design and implementation of the super optimizer
      • Instruction combination study
      • Automatic generation of varied width schedules
  • Publication - ISCA 2001

4
Cryptography
  • Definitions
      • encryption vs. decryption
      • public-key cipher vs. secret-key cipher
  • Public-secret key ciphers are the most commonly
    used

[Figure: two encryption flows, each shown as plaintext, ciphertext, plaintext. The first is labeled Public Key and Private Key; the second is labeled Private Key and Private Key.]
5
SSL Session Breakdown (Focus: Secret-Key Ciphers)
[Figure: SSL session between client and server. Labels: authenticate, public, private key, https get, https recv, . . ., private, close.]
6
Benchmark Suite
  Cipher     Key Size (bits)  Blk Size (bits)  Rnds/Blk  Author        Application
  3DES       112              64               48        CryptSoft     SSL, SSH
  Blowfish   128              64               16        CryptSoft     Norton Utilities
  IDEA       128              64               8         Ascom         PGP, SSH
  Mars       128              128              16        IBM           AES Candidate
  RC4        128              8                1         CryptSoft     SSL
  RC6        128              128              18        RSA Security  AES Candidate
  Rijndael   128              128              10        Rijmen        AES Standard
  Twofish    128              128              16        Counterpane   AES Candidate

7
Cipher Throughput Analysis
  • Alpha 21264 vs. 4W
      • All except Mars and Twofish were within 10% of
        the actual machine tests
      • Mars 11%, Twofish 15%
  • Alpha 21264 vs. DF
      • Blowfish, IDEA, and RC6 are running within 20% of
        DF performance
      • Mars 29%, Twofish 76%
      • RC4 and Rijndael are outliers

8
Cipher Bottleneck Analysis
  • Alias - impact of stalling loads in the pipeline
    until all earlier store addresses have been
    resolved
  • Branch - effects of mispredictions
  • Issue - impact of reducing issue width
  • Mem - impact of introducing a realistic memory
    system
  • Res - impact of limited functional unit resources
  • Window - impact of a limited-size instruction
    window

9
Cipher Relative Run Time Cost (Focus: Kernel Loop)
  • 3DES and IDEA are small even for 16 byte sessions
  • Mars, RC4, RC6, Rijndael, and Twofish drop well
    below 10% for 4k byte sessions
  • Blowfish is an outlier; it drops below 10% only for
    64k byte sessions

10
Cipher Kernel Characterization
  • SBOX - substitutions
  • XBOX - permutations
  • IDEA, Mars, RC4, and RC6 rely on arithmetic
    computations; they benefit from more resources
    (multiplies) and from faster operations (rotates)
  • Blowfish, 3DES, Rijndael, and Twofish rely on
    substitutions; they benefit from increased memory
    bandwidth and accesses
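
To make the two kernel styles above concrete, here is a minimal Python sketch contrasting a substitution-heavy step (the Blowfish round function, which does four table lookups per 32-bit input) with an arithmetic-heavy step (the RC6 multiply-and-rotate mix). The function names and the placeholder tables are assumptions for illustration; the tables are not the real Blowfish S-boxes, and this is not code from the thesis.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


def blowfish_f(x, sboxes):
    """Substitution-heavy: one lookup in each of four 256-entry tables."""
    a = (x >> 24) & 0xFF
    b = (x >> 16) & 0xFF
    c = (x >> 8) & 0xFF
    d = x & 0xFF
    t = (sboxes[0][a] + sboxes[1][b]) & MASK32
    t ^= sboxes[2][c]
    return (t + sboxes[3][d]) & MASK32


def rc6_mix(b):
    """Arithmetic-heavy: (b * (2b + 1)) mod 2^32, rotated left by 5."""
    return rotl32((b * (2 * b + 1)) & MASK32, 5)


# Placeholder tables for illustration only; NOT the real Blowfish S-boxes.
PLACEHOLDER_SBOXES = [[256 * i + j for j in range(256)] for i in range(4)]
```

The first function is dominated by memory accesses, the second by a multiply and a rotate, which is the split the slide draws between the two cipher groups.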

11
Architectural Extensions
  • All instructions are limited to two register
    input operands and one register output
  • ROL and ROR (rotates) for 64-bit and 32-bit data
    types
  • ROLX and RORX support a constant rotate of a
    register input, followed by an XOR with another
    register input
  • MULMOD computes the modular multiplication of two
    register values modulo the value 0x10001
  • SBOX speeds the accessing of substitution tables
    with 256-entry tables and 32-bit contents
  • SBOXSYNC synchronizes the SBOX table with memory
  • XBOX implements a portion of a full 64-bit
    permutation
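
The rotate-XOR and modular-multiply extensions listed above can be modeled in software roughly as follows. This is a minimal sketch, not the hardware definition from the thesis: the function names, the choice of the 32-bit variant, and the IDEA convention that a 16-bit zero operand stands for 2^16 in MULMOD are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF


def rol32(x, n):
    """ROL: rotate a 32-bit value left by n bits."""
    n %= 32
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


def ror32(x, n):
    """ROR: rotate a 32-bit value right by n bits."""
    n %= 32
    x &= MASK32
    return ((x >> n) | (x << (32 - n))) & MASK32


def rolx32(a, const, b):
    """ROLX: rotate a left by a constant, then XOR with b."""
    return rol32(a, const) ^ (b & MASK32)


def rorx32(a, const, b):
    """RORX: rotate a right by a constant, then XOR with b."""
    return ror32(a, const) ^ (b & MASK32)


def mulmod(a, b):
    """MULMOD: multiply two 16-bit values modulo 0x10001.

    Assumption (IDEA convention): an operand of 0 represents 0x10000,
    and a result of 0x10000 is returned as 0.
    """
    a = (a & 0xFFFF) or 0x10000
    b = (b & 0xFFFF) or 0x10000
    return ((a * b) % 0x10001) & 0xFFFF
```

Each function takes at most two register values plus, for ROLX and RORX, a constant, which matches the two-input, one-output limit stated in the first bullet.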

12
SBOX Instruction Semantics
  • SBOX instruction eliminates address generation
      • All SBOX tables are aligned to a 1k byte boundary
      • Address generation becomes zero-latency bit
        concatenation
  • Stores to SBOX storage are not visible to later
    SBOXs until:
      • An SBOXSYNC is executed
      • An alias bit is set
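
A short Python sketch of why 1k byte alignment turns address generation into bit concatenation: a 256-entry table of 32-bit words occupies exactly 1024 bytes, so the low 10 bits of an aligned base are zero and the scaled index can be ORed in without an adder. The second part models only the SBOXSYNC visibility rule; the alias-bit path is not modeled. All names here are hypothetical and this is an illustrative model, not the thesis's implementation.

```python
def sbox_address_add(table_base, index_byte):
    """Conventional address generation: base + index * 4 (needs an adder)."""
    return table_base + ((index_byte & 0xFF) << 2)


def sbox_address_concat(table_base, index_byte):
    """Aligned table: the address is the base bits joined with the index bits."""
    if table_base & 0x3FF:
        raise ValueError("SBOX table must be aligned to a 1k byte boundary")
    return table_base | ((index_byte & 0xFF) << 2)


class SboxTable:
    """Toy model: SBOX lookups read a private copy until SBOXSYNC runs."""

    def __init__(self, entries):
        entries = list(entries)
        if len(entries) != 256:
            raise ValueError("an SBOX table has 256 entries")
        self.memory = entries            # updated by ordinary stores
        self.sbox_copy = list(entries)   # what SBOX lookups see

    def store(self, index, value):
        self.memory[index & 0xFF] = value & 0xFFFFFFFF

    def sbox(self, index):
        return self.sbox_copy[index & 0xFF]

    def sboxsync(self):
        self.sbox_copy = list(self.memory)
```

For any aligned base, both address functions return the same value, which is the point of the alignment requirement.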

13
Performance of ISA Extensions
14
The CryptoManiac Processor
  • A 4-wide 32-bit VLIW machine with no cache and a
    simple branch predictor
  • Supports a triadic (three input operands) ISA
    that permits combining of most cryptographic
    operation pairs for better clock cycle
    utilization
  • Can be combined into chip multiprocessor
    configurations for improved performance on
    workloads with inter-session and inter-packet
    parallelism

15
CryptoManiac ISA
  • bundle := <inst><inst><inst><inst>
  • inst := <operation pair><dest><operand 1><operand 2><operand 3>
  • operation pair := <short><tiny> | <tiny><short> | <tiny><tiny> | <long><nop>
  • tiny := <xor> | <and> | <inc> | <signext> | <nop>
  • short := <add> | <sub> | <rot> | <sbox> | <nop>
  • long := <mul> | <mulmod>
  • Examples
      Instruction               Expression
      Add-Xor R4, R1, R2, R3    R4 <- (R1 + R2) ⊕ R3
      And-Rot R4, R1, R2, R3    R4 <- (R1 & R2) <<< R3
      And-Xor R4, R1, R2, R3    R4 <- (R1 & R2) ⊕ R3
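
The combined instructions in the examples above (Add-Xor, And-Rot, And-Xor) can be modeled as two chained operations on three register inputs. The sketch below assumes the first operation is applied to R1 and R2 and the second to that result and R3, that Add is 32-bit wrapping addition, and that Rot is a rotate left. The `OPS` table and `execute_pair` helper are hypothetical names introduced here, and only a subset of the operations is covered.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


# Two-input operations, each returning a 32-bit result.
OPS = {
    "add": lambda a, b: (a + b) & MASK32,
    "sub": lambda a, b: (a - b) & MASK32,
    "rot": rotl32,
    "xor": lambda a, b: (a ^ b) & MASK32,
    "and": lambda a, b: (a & b) & MASK32,
}


def execute_pair(first, second, r1, r2, r3):
    """Model one combined instruction: dest <- second(first(r1, r2), r3)."""
    intermediate = OPS[first](r1 & MASK32, r2 & MASK32)
    return OPS[second](intermediate, r3 & MASK32)
```

For example, `execute_pair("add", "xor", r1, r2, r3)` corresponds to the Add-Xor row: two dependent operations expressed as a single instruction with three source operands.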

16
Scheduling Example Blowfish
17
High-Level Schematic of a Single Functional Unit
18
CryptoManiac Architecture
19
CryptoManiac System Architecture
20
Timing and Area Results
21
Encryption Performance
22
Special Case Studies3DES and Rijndael
23
The Super Optimizer
  • Validate hand-scheduled kernel results
  • Automate generation of optimized kernels for the
    various CryptoManiac architectures studied
  • Instruction combination studies give insight into
    which hardware may be unnecessary and could be eliminated

24
Instruction Combination Study
25
Instruction Combining Characteristics
26
Conclusion
  • Two hardware/software-design techniques to
    improve the performance of secret-key cipher
    algorithms
  • Add instruction support for fast substitutions,
    general permutations, rotates, and modular
    arithmetic
      • SBOX eliminates address generation
      • Overall speedup of 59% over baseline machine w/
        rotates
  • Design an efficient 4-wide VLIW cryptographic
    co-processor called the CryptoManiac
      • Instruction combining - efficient utilization of
        clock cycle
      • Rijndael runs 2.25 times faster with 1/100th area
        and power of a 600MHz Alpha processor

27
Future Work
  • Assess the cost of programmability in the
    CryptoManiac by comparing the design and performance
    of
      • A dedicated hardware Rijndael implementation (no
        programmability)
      • An FPGA Rijndael implementation (hardware
        programmability)
      • CryptoManiac (software programmability)
  • Other application-specific processors, such as
    audio processing, speech recognition, and
    soft-radio

28
Acknowledgement
  • Credit for much of the work described in this
    thesis belongs to my advisor, Professor Todd
    Austin, for his insight, guidance, and patience.
    He provided an excellent research
    environment, left me enough freedom to do things
    the way I thought they should be done, and was
    always available to discuss ideas and problems.
  • I would also like to thank my committee members
    Professor Steve Reinhardt and Professor Gary
    Tyson for reviewing this document and serving on
    the defense committee.
  • Other people who have contributed to the
    CryptoManiac project include Chris Weaver, for
    hardware design and synthesis support, and Jerome
    Burke and John McDonald, for earlier versions of
    the ISA extensions code modifications.