Unified Architectures for Efficient and Compact CryptoProcessing

Provided by: ipam

1
Unified Architectures for Efficient and Compact
Crypto-Processing
  • Erkay Savas
  • Sabanci University

2
Outline
  • Research Motivation
  • Public Key Cryptography
  • Unified Arithmetic
  • High-Radix Multiplication
  • Dual-Radix Multiplication
  • Support for GF(3^n) Arithmetic
  • Implementation Results
  • Future Research

3
Motivation
  • Compatibility
  • support for fast arithmetic in different finite
    fields and groups
  • Saving in Area
  • Improve the time × area metric
  • Algorithm Agility
  • NTRU ↔ ECC

4
Public Key Cryptography (PKC)
  • Each user has a pair of keys
  • Private Key - known only to the owner
  • Public Key - known to everyone in the system,
    with assurance
  • Encryption
  • Encryption with the Public Key of the receiver
  • Decryption
  • Only the receiver can decrypt the message, using
    her/his Private Key

5
Public Key Cryptography in Use
  • RSA, Rabin's scheme
  • Integer factorization, square roots modulo a
    composite number
  • Discrete Logarithm Based Algorithms
  • Diffie-Hellman Key Exchange, ElGamal
  • Elliptic curve DH Key Exchange, ECDSA
  • Discrete logarithm over elliptic curves
  • IBE
  • pairings over elliptic curve points

6
RSA
  • Most popular PKC
  • Invented by Rivest/Shamir/Adleman in 1977 at MIT.
  • Its patent expired in 2000.
  • Based on Integer Factorization problem
  • Each user has public and private key pair.

7
RSA Encryption and Decryption
  • Encryption done by using public key
  • $y \equiv x^e \bmod n$, where $x, y < n$
  • Decryption done by using private key
  • $x \equiv y^d \bmod n$
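
The two formulas above can be tried with toy numbers. This is only a minimal sketch: the primes, the exponent and the helper names `rsa_encrypt` and `rsa_decrypt` are illustrative assumptions, not values from the presentation, and real moduli are 1024 bits or more.

```python
# Toy RSA parameters (assumed for illustration; far too small for real use).
p, q = 61, 53
n = p * q                      # public modulus
phi = (p - 1) * (q - 1)        # Euler's totient of n
e = 17                         # public exponent, coprime to phi
d = pow(e, -1, phi)            # private exponent (needs Python 3.8+)


def rsa_encrypt(x, e, n):
    """y = x^e mod n, with x < n."""
    return pow(x, e, n)


def rsa_decrypt(y, d, n):
    """x = y^d mod n."""
    return pow(y, d, n)


x = 65
y = rsa_encrypt(x, e, n)
recovered = rsa_decrypt(y, d, n)
print(y, recovered)
```

Both directions are a single modular exponentiation, which is why modular multiplication dominates the cost.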

8
DL Based Cryptosystems
  • Fundamental operation
  • $g^x \bmod p$, where $x, g < p$ and $g$ is primitive
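
As a rough illustration of this fundamental operation, the sketch below computes the exponentiation with square-and-multiply and uses it for a toy Diffie-Hellman exchange. The function name `mod_exp`, the prime 23, the generator 5 and the secret exponents are assumptions chosen for readability, not values from the slides.

```python
def mod_exp(g, x, p):
    """Left-to-right square-and-multiply: returns g**x mod p."""
    result = 1
    for bit in bin(x)[2:]:             # scan exponent bits, most significant first
        result = (result * result) % p  # square on every bit
        if bit == "1":
            result = (result * g) % p   # multiply when the bit is set
    return result


# Toy Diffie-Hellman exchange (5 is a primitive element modulo 23).
p, g = 23, 5
alice_secret, bob_secret = 6, 15
alice_public = mod_exp(g, alice_secret, p)
bob_public = mod_exp(g, bob_secret, p)
shared_a = mod_exp(bob_public, alice_secret, p)
shared_b = mod_exp(alice_public, bob_secret, p)
print(shared_a == shared_b)
```

Every step of the loop is a modular multiplication, so the speed of the multiplier sets the speed of the whole scheme.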

9
Elliptic Curve Cryptography 1/2
  • Emerging public key cryptography standard for
    constrained devices.
  • A 160-bit ECC key is equivalent in cryptographic
    strength to a 1024-bit RSA key.
  • A 313-bit ECC key is equivalent to a 4096-bit RSA key.
  • As algebraic/geometric entities, elliptic curves
    have been studied extensively for the past 150 years.
  • Rich and deep theory suitable to cryptography
  • First proposed for cryptographic usage in 1985
    independently by Neal Koblitz and Victor Miller

10
Elliptic Curve Cryptography 2/2
  • Dominant fundamental operations
  • Multiplication in GF(q), where q = p^k and p is
    prime
  • Alternatives
  • GF(p): k = 1
  • GF(2^k): p = 2
  • GF(p^k)
  • GF(3^k): p = 3

11
Identity Based Encryption (IBE)
  • Public key can be any string
  • e-mail address, name, etc.
  • No need for certificates
  • Anonymity achieved
  • users can choose any public key without revealing
    their ID
  • They can easily change it

12
IBE Bilinear Mapping
  • $e(xP, yQ) = e(P, Q)^{xy} = e(yP, xQ) = g$
  • g is in an (extension of) the underlying field.
  • Bilinear mapping over elliptic curves
  • Weil pairing
  • Tate pairing
  • Resource consuming
  • Most efficient bilinear mappings
  • defined on curves over GF(3^k)

13
An Introduction to Unified Arithmetic
  • Three types of finite fields are heavily used
  • Prime fields, GF(p)
  • Binary extension fields, GF(2^k)
  • Ternary extension fields, GF(3^k) (recently, due to
    IBE schemes)
  • These finite fields feature dissimilar properties
  • Different implementations on specialized hardware

14
Unified Arithmetic
  • Unified hardware design methodology requires
  • A single (unified) datapath
  • A single (unified) control
  • Insignificant overhead in the area
  • Insignificant overhead in the time complexity
    (e.g. critical path delay)
  • Good time × area metric

15
Unified Arithmetic (GF(p) and GF(2^k))
  • A unified hardware design methodology for both
    fields is possible since
  • the elements of either field are represented
    using almost the same data structures in digital
    systems
  • the algorithms for basic arithmetic operations in
    both fields have structural similarities (i.e.
    the steps of the algorithms are almost identical)
  • Hence, eventually unified arithmetic is possible

16
Finite Field Operations in ECC
  • Addition in GF(p) and GF(2^k)
  • Relatively inexpensive in area and time
    complexity
  • Multiplicative inversion in GF(p) and GF(2^k)
  • Prohibitively expensive in terms of time
  • Possible to avoid some of them
  • Multiplication in GF(p) and GF(2^k)
  • Expensive in terms of time and area
  • Usually most important operation
  • Our focus

17
Montgomery Multiplication
  • Very efficient way of doing multiplication in
    GF(p) and GF(2^k) (now also in GF(3^k))
  • Faster (replaces division by shifts)
  • Suitable for unified design
  • Suitable for scalable design
  • Highly parallel
  • Suitable for pipelining

18
Montgomery Multiplication
  • Definition
  • Given $a, b \in GF(p)$, $\mathrm{MonMul}(a, b) = a b R^{-1} \bmod p$,
    where $R = 2^k \bmod p$ and $k = \lceil \log_2 p \rceil$.
  • Algorithm
  • $c = 0$
  • for $i = 0$ to $k-1$
  • $c = c + a_i b$
  • $c = (c + c_0 p)/2$
  • if $c \ge p$ then $c = c - p$ (final subtraction)
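
The algorithm on this slide can be sketched in a few lines. This is a software model under the assumptions that p is odd and a, b < p < 2^k; the function name `mon_mul` is hypothetical and the code is not the presenter's hardware design.

```python
def mon_mul(a, b, p, k):
    """Bit-serial Montgomery product a * b * 2^(-k) mod p.

    Assumes p is odd and a, b < p < 2^k.
    """
    c = 0
    for i in range(k):
        a_i = (a >> i) & 1        # scan one bit of the multiplier a
        c = c + a_i * b           # conditional addition of b
        c = (c + (c & 1) * p) >> 1  # add p if c is odd, then halve exactly
    if c >= p:                    # final subtraction; c < 2p here
        c -= p
    return c


# Usage: compare against the definition a * b * R^(-1) mod p with R = 2^k.
p, k = 97, 7
print(mon_mul(45, 76, p, k), (45 * 76 * pow(2, -k, p)) % p)
```

The division by 2 is always exact because p is added whenever the low bit c_0 is set, which is what lets a shift replace a trial division.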

19
Algorithm for GF(2^k)
  • Input: $a(x), b(x) \in GF(2^k)$, $p(x)$ and $k$
  • Output: $c(x) = a(x) b(x) x^{-k} \in GF(2^k)$
  • $c(x) = 0$
  • for $i = 0$ to $k-1$
  • $c(x) = c(x) \oplus a_i b(x)$
  • $c(x) = (c(x) \oplus c_0 p(x))/x$
  • No final subtraction
  • Note that
  • $c/2$ and $c(x)/x$ are implemented in an identical
    way in SW and HW
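
A possible software model of the GF(2^k) variant is sketched below. It assumes polynomials are stored as Python integers (bit i holds the coefficient of x^i) and that p(x) has degree k with constant term 1; the name `mon_mul_gf2` and the example polynomial 0x11B are illustrative choices, not taken from the slides.

```python
def mon_mul_gf2(a, b, p, k):
    """Montgomery product a(x) * b(x) * x^(-k) mod p(x) over GF(2).

    Polynomials are integers whose bit i is the coefficient of x^i.
    Assumes deg p = k, p has constant term 1, and deg a, deg b < k.
    """
    c = 0
    for i in range(k):
        if (a >> i) & 1:   # a_i * b(x): addition in GF(2) is XOR
            c ^= b
        if c & 1:          # c_0 * p(x) clears the constant term
            c ^= p
        c >>= 1            # exact division by x is a right shift
    return c               # degree stays below k, so no final subtraction


# Usage with p(x) = x^8 + x^4 + x^3 + x + 1 (written 0x11B) and k = 8.
print(hex(mon_mul_gf2(0x57, 0x83, 0x11B, 8)))
```

Compared with the integer version, only the additions change (XOR instead of carry-propagating addition); the right shift is the same in both, which is the structural similarity the slide points to.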

20
Representation
  • Addition
  • Atomic operation: multiplication is performed as
    repeated addition
  • Unified addition
  • most efficient when carry-save representation is
    used for elements of GF(p)
  • Carry-save representation
  • an integer is represented as the sum of two other
    integers
  • $x = x_s + x_c$ (sum and carry parts, resp.)
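
To make the carry-save idea concrete, here is a minimal sketch of one carry-save addition step on Python integers. The function name `carry_save_add` is hypothetical; the point is only that a third operand can be absorbed without any carry rippling across bit positions.

```python
def carry_save_add(xs, xc, y):
    """Add y to the carry-save pair (xs, xc) without propagating carries.

    Each bit position acts as an independent full adder, so the new pair
    satisfies new_s + new_c == xs + xc + y.
    """
    new_s = xs ^ xc ^ y                                 # bitwise sum
    new_c = ((xs & xc) | (xs & y) | (xc & y)) << 1      # carries, moved up one position
    return new_s, new_c


xs, xc = 0b1011, 0b0110
s, c = carry_save_add(xs, xc, 0b1101)
print(s + c == xs + xc + 0b1101)
```

Only when a conventional (non-redundant) result is needed does the pair have to be collapsed with one ordinary addition.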

21
Scalability
  • Original Montgomery multiplication algorithm
    performs full-precision integer additions
  • Not scalable
  • Instead,
  • long integers are divided into words
  • Additions of words are handled separately on word
    adders.
  • Choice of word length depends on the precision,
    area and speed requirements
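
The word-splitting idea can be sketched as follows. This is a plain software model with hypothetical helper names (`to_words`, `add_words`, `from_words`), showing only how a long addition decomposes into w-bit word additions linked by a carry; it is not the multiplier datapath itself.

```python
def to_words(x, w, e):
    """Split x into e words of w bits each, least significant word first."""
    mask = (1 << w) - 1
    return [(x >> (w * j)) & mask for j in range(e)]


def from_words(words, w):
    """Reassemble an integer from its w-bit words."""
    return sum(v << (w * j) for j, v in enumerate(words))


def add_words(a_words, b_words, w):
    """Add two equal-length word vectors on a w-bit adder, one word per step."""
    mask = (1 << w) - 1
    carry, out = 0, []
    for a_j, b_j in zip(a_words, b_words):
        t = a_j + b_j + carry     # one pass through the w-bit word adder
        out.append(t & mask)
        carry = t >> w            # carry handed to the next word
    return out, carry


k, w = 64, 16
e = -(-k // w)                    # e = ceil(k / w) words per operand
a, b = 0xFFFF_0000_1234_FFFF, 0x0001_FFFF_EDCC_0001
words, carry = add_words(to_words(a, w, e), to_words(b, w, e), w)
print(from_words(words, w) + (carry << (w * e)) == a + b)
```

The same word adder serves any precision k; only the number of words e changes, which is what makes the design scalable.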

22
Word-Based Multiplication
[Figure: processing unit PU_i, with labels a_i, c^(j)_0 ... c^(j)_{w-1} and c^(j+1)_0 ... c^(j+1)_{w-1}]

23
Dependency Graph
24
Processing Unit (PU) with w = 2
[Figure: PU datapath, with labels C_1^(j) and C_0^(j)]

25
Dual-Field Adder (DFA) 1/2
  • Almost identical to a full-adder (FA)
  • Difference
  • it has an additional (control) input (FSEL)
    which suppresses the carry output of the adder when
    it is set to logic 0
  • Namely, when FSEL = 0 the adder operates in
    GF(2^k) mode; otherwise it becomes a regular FA
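
A bit-level model of the dual-field adder might look like the sketch below. The function names `dfa` and `ripple_add` are assumptions, and the ripple arrangement is used only to make the effect of FSEL visible; the multiplier described in the presentation uses carry-save adders instead.

```python
def dfa(a, b, c, fsel):
    """Dual-field adder cell: a full adder whose carry output is gated by FSEL."""
    s = a ^ b ^ c
    cout = fsel & ((a & b) | (a & c) | (b & c))   # FSEL = 0 suppresses the carry
    return s, cout


def ripple_add(x, y, width, fsel):
    """Chain `width` DFA cells (illustrative arrangement only)."""
    carry, result = 0, 0
    for i in range(width):
        s, carry = dfa((x >> i) & 1, (y >> i) & 1, carry, fsel)
        result |= s << i
    return result


x, y = 0b1011, 0b0110
print(bin(ripple_add(x, y, 8, fsel=1)))   # integer addition, GF(p) mode
print(bin(ripple_add(x, y, 8, fsel=0)))   # bitwise XOR, GF(2^k) mode
```

With the carry suppressed, each cell reduces to an XOR, which is exactly coefficient-wise addition of polynomials over GF(2).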

26
DFA 2/2
[Figure: DFA cell with signals A, B, C, FSEL, S and Cout]

27
Pipeline Organization with two PUs
s = the number of PUs

28
Total Computation Time (in clock cycles)
w = word size, k = precision, $e = \lceil k/w \rceil$, s = the
number of PUs

29
Example Execution Times
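  • Example: $k = 1024$, $w = 32$
  • $s = 17 \Rightarrow T = 2105$
  • $s = 15 \Rightarrow T = 2305$
  • $s = 10 \Rightarrow T = 3415$
  • $s = 1 \Rightarrow T = 33792$
  • Example: $k = 2048$, $w = 32$
  • $s = 33 \Rightarrow T = 4221$
  • $s = 30 \Rightarrow T = 4543$
  • $s = 10 \Rightarrow T = 13343$
  • $s = 1 \Rightarrow T = 133120$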
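
The closed-form expression for T did not survive in this transcript. The sketch below uses an assumed two-case formula, reconstructed only to fit the listed numbers: with e = ⌈k/w⌉, it takes T = ⌈k/s⌉·2s + e − 1 when e + 1 ≤ 2s (the pipeline never stalls) and T = ⌈k/s⌉·(e + 1) + 2(s − 1) otherwise. It reproduces seven of the eight cycle counts above; for k = 1024, s = 10 it gives 3417 rather than the listed 3415. The function name `total_cycles` is hypothetical.

```python
def total_cycles(k, w, s):
    """Assumed cycle-count model for a pipeline of s PUs (not from the slides)."""
    e = -(-k // w)             # number of words, ceil(k / w)
    passes = -(-k // s)        # pipeline passes over the operand, ceil(k / s)
    if e + 1 <= 2 * s:         # enough PUs: no stalls between passes
        return passes * 2 * s + e - 1
    return passes * (e + 1) + 2 * (s - 1)


for k, s in [(1024, 17), (1024, 15), (1024, 10), (1024, 1),
             (2048, 33), (2048, 30), (2048, 10), (2048, 1)]:
    print(k, s, total_cycles(k, 32, s))
```

Under this model, adding PUs beyond roughly e/2 brings little further gain, which matches the small difference between s = 15 and s = 17 in the first example.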

30
Comparison to the single-field (GF(p)) design
w = word size; 1.2 μm CMOS technology

31
Design Alternatives
  • Higher Radix
  • Original design is radix 2
  • Namely, multiplier bits are scanned one bit in
    each clock cycle
  • Possible to scan two or more bits of the
    multiplier a
  • Radix-4 two bits
  • Radix-8 three bits
  • More complex design: lower clock frequency,
    higher area
  • Lower clock cycle count ⇒ faster execution of
    multiplication
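
The effect of scanning several multiplier bits per step can be sketched in software. The model below assumes p is odd and a, b < p < 2^k, and uses a hypothetical function name `mon_mul_radix`; with r = 1 it is the radix-2 case, r = 2 is radix-4 and r = 3 is radix-8. It illustrates the general high-radix technique, not the specific circuit in the presentation.

```python
def mon_mul_radix(a, b, p, k, r):
    """Montgomery product a * b * 2^(-r*n) mod p, scanning r bits of a per step.

    n = ceil(k / r) is the number of steps. Assumes p odd and a, b < p < 2^k.
    """
    base = 1 << r
    n = -(-k // r)
    neg_p_inv = (-pow(p, -1, base)) % base   # -p^(-1) mod 2^r (Python 3.8+)
    c = 0
    for i in range(n):
        a_i = (a >> (r * i)) & (base - 1)    # next r-bit digit of the multiplier
        c += a_i * b
        q = (c * neg_p_inv) % base           # digit that makes c + q*p divisible by 2^r
        c = (c + q * p) >> r                 # exact division by 2^r
    if c >= p:                               # c < 2p, so one subtraction suffices
        c -= p
    return c


p, k = 97, 7
for r in (1, 2, 3):
    n = -(-k // r)
    print(r, n, mon_mul_radix(45, 76, p, k, r))
```

The step count drops from k to ⌈k/r⌉, while each step needs a digit-by-operand product and a more involved quotient digit, which is where the extra area and longer critical path come from.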

32
Comparison
  • Higher radix vs. single radix
  • Metric
  • area × time
  • For small total area (i.e. < 10000 equivalent NAND
    gates) the performances of radix-2 and radix-8
    are comparable
  • The radix-8 multiplier outperforms the radix-2
    multiplier by more than a factor of 3 when the
    total area is around 25000 NAND gates

33
Dual-Radix Multiplier
  • Radix-2 for GF(p) and radix-4 for GF(2^k)

34
Dual-Radix Multiplier
  • Three multipliers
  • A1: GF(p)-only multiplier
  • A2: single-radix unified multiplier (with
    precomp.)
  • A3: dual-radix multiplier
  • Performance (area × time)
  • A3 performs slightly worse than A1 and A2
    (by 7% to 19%) in GF(p) mode
  • A3 outperforms A2 by 38% to 46% in GF(2^k) mode

35
Unified Arithmetic?
  • Unified multiplier
  • carry-save adders used in multiplier
  • It is not easy to perform other arithmetic
    operations with carry-save representation such as
    subtraction and comparison (essential in
    inversion)

36
New Redundant Representation
  • Recall
  • Carry-save representation
  • $X = x_s + x_c$.
  • New redundant representation
  • Redundant signed digit (RSD) representation
  • $X = x_p - x_n$.
  • Subtraction is equivalent to addition
  • $X - Y = (x_p - x_n) - (y_p - y_n) = (x_p - x_n) + (y_n - y_p)$
  • Comparison is relatively easy

37
RSD
  • All previous multipliers require a reverse
    transformation to non-redundant form after each
    multiplication
  • There are thousands of multiplications in ECC
  • With RSD, all the computation can be done in RSD
    form without any reverse transformation
  • a single transformation is necessary if the
    result is needed in non-redundant form.

38
Support for GF(3^n) Arithmetic
  • RSD lends itself to a unified arithmetic
    architecture that efficiently supports GF(3^n)
    arithmetic

39
Analysis
  • A1: GF(p)-only architecture
  • A2: GF(2^k)-only architecture
  • A3: GF(3^n)-only architecture
  • A4: Unified architecture (GF(p) and GF(2^k))
  • A5: Unified architecture (GF(p), GF(2^k) and
    GF(3^n))
  • A1 + A2: Hypothetical architecture that has
    separate datapaths for GF(p) and GF(2^k)

40
Analysis
  • Metric: area × time
  • A4 over A1 + A2: 7.94%
  • A5 over A1 + A2 + A3: 33.54%
  • A5 over A4 + A3: 28.36%

41
Implementation Results
  • 2.38 GHz, 0.13 μm CMOS
  • 4 PUs → 11,000 NAND gates, 8 PUs → 15,000 NAND gates

42
Research Directions
  • Embed the unified architectures into common
    general-purpose processors
  • Unified inversion using RSD
  • Unified architectures for other PKC

43
Ending
  • Questions
  • Contact
  • Erkay Savas
  • erkays_at_sabanciuniv.edu
  • http://people.sabanciuniv.edu/erkays