Exploring and Exploiting Wire-Level Pipelining in Emerging Technologies

Provided by: liuda
Learn more at: http://www.cs.ucf.edu
1
Exploring and Exploiting Wire-Level Pipelining in
Emerging Technologies
  • Danzhou Liu

2
Outline
  • Introduction
  • QCA Device and Circuits
  • QCA Clock
  • Simple 12 Dataflow
  • Simple 12 One-Hot State Machine
  • LIFO Controller
  • Primitive Architecture
  • Future work

3
Introduction
  • Pipelining is considered fundamental by computer
    architects, and has been with us for 40 years.
  • Various techniques are currently used for
    pipelining and multi-threading, but all are
    gate-level
  • CMOS limit: 0.05 microns (50 nm)
  • New technology: Quantum Cellular Automata (QCA)

4
QCA Device and Circuits
  • Why should we explore QCA?
  • Inherent self-latching in the circuit
  • Processing-in-wire
  • Free multi-threading
  • Brings the level of pipelining down below the
    gate level to the cell level
  • Research is being done to build QCA using chemical
    molecules (currently built using metal dots)
  • Allows for a two-orders-of-magnitude density
    increase

5
QCA Cell
  • Uses electrons in cells to store and transmit
    data
  • Electrons move between different positions via
    electron tunneling
  • Logic functions performed by Coulombic
    interactions

6
QCA Cell-Cell Response Function
Note: The two cells are separated by 60 nm. This
shows the polarization P1 induced in cell 1 by
the fixed polarization P2 of its neighbor.
7
QCA Cells
a) The standard cells
Binary 0
Binary 1
b) The rotated cells
Rotated cells are used for crossing wires and
inverting (0 ↔ 1)
8
Some Ground States (Minimum Coulombic Repulsion)
9
QCA Logic Gates
  • NOT gate

[Figure: NOT gate with example cell values]
10
QCA Logic Gates (continued)
  • The majority voting logic

[Figure: majority gate with example cell values]
11
QCA Logic Gates (continued)
[Figure (c): example cell values]
12
QCA Logic Gates (continued)
  • The programmable AND/OR gate
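
The programmable AND/OR gate follows from the majority function: holding one input at 0 gives AND of the other two, and holding it at 1 gives OR. A minimal Python sketch of that logic follows; the function names are illustrative and do not come from the presentation.

```python
def majority(a: int, b: int, c: int) -> int:
    """Return the value held by at least two of the three binary inputs."""
    return 1 if a + b + c >= 2 else 0


def qca_and(a: int, b: int) -> int:
    # Programming input fixed to 0: output is 1 only if both a and b are 1.
    return majority(a, b, 0)


def qca_or(a: int, b: int) -> int:
    # Programming input fixed to 1: output is 1 if either a or b is 1.
    return majority(a, b, 1)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", qca_and(a, b), "OR:", qca_or(a, b))
```

The same physical gate serves both roles; only the value on the third (programming) input changes.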

13
Crossing of Two QCA Wires
rotated cells
Note: The state of a standard cell has no
switching effect on a rotated cell directly in
line with it. Similarly, the state of a rotated
cell has no effect on the state of a normal cell
in line with it.
14
Crossing of Two QCA Wires (continued)
[Figure: two crossing wires carrying signals A and B, with example cell values]
15
The Exclusive OR Gate
[Figure: exclusive OR gate, panels a.) and b.), with signals labeled A, B, and C]
16
The CNOT Gate
[Figure: CNOT gate, panels a.) and b.), with example cell values]
17
Single-bit Full Adder
[Figure: single-bit full adder with signals labeled A, B, Ci-1, S, and Ci]
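
A full adder can be expressed with majority gates and inverters alone. The Python sketch below uses one well-known three-majority-gate formulation; it is an assumption for illustration and may differ from the cell layout in the slide's figure. The names `a`, `b`, and `carry_in` correspond to A, B, and Ci-1, and the results to S and Ci.

```python
from itertools import product


def majority(a: int, b: int, c: int) -> int:
    """Return the value held by at least two of the three binary inputs."""
    return 1 if a + b + c >= 2 else 0


def invert(a: int) -> int:
    return 1 - a


def full_adder(a: int, b: int, carry_in: int):
    """Return (sum, carry_out) using only majority gates and inverters."""
    # The carry out is simply the majority of the three inputs.
    carry_out = majority(a, b, carry_in)
    # The sum is recovered from the inverted carry and a second majority.
    inner = majority(a, b, invert(carry_in))
    total = majority(invert(carry_out), carry_in, inner)
    return total, carry_out


# Exhaustive check against ordinary binary addition.
for a, b, c in product((0, 1), repeat=3):
    s, carry = full_adder(a, b, c)
    assert 2 * carry + s == a + b + c
```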
18
QCA Clock
  • Clocking is not performed for individual cells;
    rather, clocking is provided for zones, which
    consist of multiple QCA cells
  • Four-phase clock vs. two-phase clock for CMOS
  • Switch Phase (Phase 1)
      • Cells begin unpolarized, interdot potential
        barriers low
      • Cells become polarized according to state of
        driver
      • Actual computation occurs in this phase
      • Barriers raised to lock cell states
  • Hold Phase (Phase 2)
      • Barriers held high
      • Outputs of subarray used as inputs to next stage

19
QCA Clock (continued)
  • Release Phase (Phase 3)
      • Barriers lowered
      • Cells return to unpolarized state
  • Relax Phase (Phase 4)
      • Barriers remain lowered, cells remain unpolarized
  • This scheme provides for inherent self-latching
      • Cells in one zone perform certain calculation
      • State frozen by raising of barriers
      • Successor zone does not influence calculation
      • Physically neighboring zones receive temporally
        neighboring clocking phases
      • No need for explicit latches to hold values
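
The phase relationship between neighboring zones can be sketched in a few lines of Python. The indexing is an assumption made for illustration: zone 0 enters the switch phase at tick 0, one tick is a quarter of a clock cycle, and each zone lags its predecessor by one phase.

```python
PHASES = ("switch", "hold", "release", "relax")


def zone_phase(zone: int, tick: int) -> str:
    """Phase of a clocking zone at a given quarter-cycle tick.

    Assumption: zone 0 switches at tick 0 and every successive zone
    lags the one before it by exactly one phase.
    """
    return PHASES[(tick - zone) % 4]


# Print the phases of four adjacent zones over two full clock cycles.
for tick in range(8):
    print(tick, [zone_phase(zone, tick) for zone in range(4)])
```

Under this schedule, whenever a zone is switching, its predecessor is in the hold phase (a stable input) and its successor is in the relax phase (unpolarized, so it cannot influence the calculation).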

20
QCA Clock (continued)
  • Figure (a) shows a physical 5-cell wire
  • Figure (b) shows a value propagating down the wire
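
A value moving down a clocked wire can be simulated with a simplified model. The Python sketch below assumes one cell per clocking zone and the phase numbering 0 = switch, 1 = hold, 2 = release, 3 = relax; both are assumptions for illustration, not details taken from the figure.

```python
def step(wire, tick, driver):
    """Advance a clocked wire by one quarter-cycle tick.

    wire[i] is the value held by zone i, or None if it is unpolarized.
    Assumption: zone i is in phase (tick - i) % 4.
    """
    new_wire = list(wire)
    for i in range(len(wire)):
        phase = (tick - i) % 4
        if phase == 0:
            # Switch: take the value of the predecessor, which is in hold.
            new_wire[i] = driver if i == 0 else wire[i - 1]
        elif phase >= 2:
            # Release and relax: the zone loses its polarization.
            new_wire[i] = None
        # Hold (phase 1): the latched value is kept unchanged.
    return new_wire


def propagate(num_zones, num_ticks, driver):
    """Return the wire contents after each tick, starting unpolarized."""
    wire = [None] * num_zones
    history = []
    for tick in range(num_ticks):
        wire = step(wire, tick, driver)
        history.append(wire)
    return history


for tick, wire in enumerate(propagate(5, 5, driver=1)):
    print(tick, wire)
```

In this model the value advances one zone per tick, and at any moment only the switching zone and the holding zone behind it carry the value; no explicit latch appears anywhere.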

21
Simple 12 Dataflow
Processing-in-wire
22
Simple 12 Dataflow (continued)
  • Processing-in-wire
      • "Zero A" logic is placed directly after the
        output of the ALU
      • The A input to the ALU is zeroed on the way back
        to the ALU input
      • Useful computation is performed for free
  • How is it free?
      • Feedback wire is spread out over the clocking
        zones back to the input
      • Even if there were no logic in the feedback path,
        the signal would traverse the same clocking zones

23
Simple 12 Dataflow (continued)
  • Problems with Simple 12 Dataflow
      • Longest wire is spread out over 16 clocking zones
      • Even a simple calculation will take 4 clock
        cycles to finish
      • Probability that a cell switches successfully
        decreases in proportion to the distance of that
        cell from a frozen input at the beginning of
        the wire
  • Solution
      • Reduce wire length
      • Spread long wires over several clocking zones
  • Advantage of several clocking zones in long wires:
    free multithreading
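
The 4-cycle figure follows from the four-phase clock: a signal crosses one clocking zone per phase, so four zones per clock cycle. A small Python sketch of that arithmetic, with a hypothetical helper name:

```python
ZONES_PER_CYCLE = 4  # one full clock cycle covers the four clock phases


def cycles_to_traverse(num_zones: int) -> float:
    """Clock cycles a signal needs to cross num_zones consecutive zones."""
    return num_zones / ZONES_PER_CYCLE


# The longest Simple 12 wire spans 16 clocking zones.
print(cycles_to_traverse(16))
```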

24
Simple 12 Dataflow (continued)
  • Free Multithreading
      • Inherent self-latching allows multiple
        computations to execute simultaneously
      • The ?s in the figure above represent potentially
        different threads in different clocking zones
      • Possible to execute 4 simultaneous instructions
  • Size
      • QCA design offers at least an order of magnitude
        area density increase over an equivalent CMOS
        design when scaled to 0.05 microns
      • With molecular dots, area density increases three
        orders of magnitude
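
One way to picture free multithreading: a loop that takes 4 clock cycles to traverse can hold 4 independent values in flight, so the ALU sees a different thread every cycle. The Python sketch below models this with a queue; the accumulate operation and the thread numbering are stand-ins chosen for illustration, not the Simple 12 instruction set.

```python
from collections import deque

LOOP_CYCLES = 4  # 16 clocking zones / 4 zones per clock cycle


def run_interleaved(initial_values, increments, total_cycles):
    """Circulate one value per thread around the feedback loop.

    Each clock cycle the value arriving at the ALU is updated and sent
    around again; it comes back LOOP_CYCLES cycles later. Adding an
    increment is a placeholder for whatever the datapath computes.
    """
    assert len(initial_values) == LOOP_CYCLES
    loop = deque(zip(range(LOOP_CYCLES), initial_values))
    for _ in range(total_cycles):
        thread, value = loop.popleft()  # value reaching the ALU now
        loop.append((thread, value + increments[thread]))
    return dict(loop)


print(run_interleaved([0, 0, 0, 0], [1, 10, 100, 1000], total_cycles=8))
```

Each thread advances once every 4 cycles, yet the ALU does useful work on every cycle, which is the sense in which the multithreading is free.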

25
Simple 12 One-Hot State Machines
  • Two execute feedback paths, because both the stopped
    and ifetch states depend on the execute state
  • Problem
      • Execute wire is broken up over 4 clocking zones
      • Information from the previous state will not be
        available in time for the next set of inputs to
        use to determine the state transition

26
Simple 12 One-Hot State Machines (continued)
  • Solution: Feedback path consists of one clocking
    zone as shown below
  • Problem: Do we still have relative correctness?
  • Relative correctness: output of state machine is
    correct relative to time of execution

27
Simple 12 One-Hot State Machines (continued)
  • Solution: Keep feedback wire over 4 clocking
    zones, but add wires to inputs.
      • Next input will take an extra cycle to reach
        state machine combinational logic
      • Will take one extra clock cycle to process
        information, but multithreading will ensure
        sustained performance
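
The idea of delaying inputs so they meet the fed-back state, while other threads fill the idle cycles, can be sketched as follows. The state names stopped, ifetch, and execute come from the slides; the transition rules, the `start` and `halt` inputs, and the two-cycle loop length are hypothetical choices made only for illustration.

```python
from collections import deque

# One-hot encodings: exactly one bit is set per state.
STOPPED, IFETCH, EXECUTE = (1, 0, 0), (0, 1, 0), (0, 0, 1)


def next_state(state, start, halt):
    """Hypothetical transition rules, chosen only for illustration."""
    stopped, ifetch, execute = state
    return (
        int((stopped and not start) or (execute and halt)),  # stopped
        int((stopped and start) or (execute and not halt)),  # ifetch
        int(ifetch),                                         # execute
    )


def run(inputs_per_thread, loop_cycles=2):
    """Interleave loop_cycles independent state machines on one circuit.

    Assumption: the fed-back state needs loop_cycles clock cycles to
    return, so each thread's next input is delayed to arrive with it.
    """
    assert len(inputs_per_thread) == loop_cycles
    states = deque([STOPPED] * loop_cycles)
    for step in range(len(inputs_per_thread[0])):
        for thread in range(loop_cycles):
            start, halt = inputs_per_thread[thread][step]
            state = states.popleft()  # state arriving back at the logic
            states.append(next_state(state, start, halt))
    return list(states)


thread0 = [(1, 0), (0, 0), (0, 1)]  # start, run one instruction, halt
thread1 = [(0, 0), (1, 0), (0, 0)]  # idle one step, then start
print(run([thread0, thread1]))
```

Each thread sees its own state and input together, so its transitions are correct relative to its own timeline, while the combinational logic is busy on every clock cycle.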

28
A Primitive Architecture
  • Control logic delay: B
  • Zero and negative flag feedback path delay: D
  • All delays will affect degree of multithreading
    possible
  • Control signals should be pipelined with dataflow
    bits so that they arrive at appropriate logic at
    appropriate time
  • B and D should be equalized as much as possible
  • Datapath can be folded onto itself to reduce
    delays

29
Future Work
  • Exploring new ways to exploit processing-in-wire
    and multithreading
  • How QCA can be used in memory
  • Build QCA devices using chemical molecules rather
    than metal dots for increased area density

30
References
  • Michael T. Niemier and Peter M. Kogge. Exploring
    and Exploiting Wire-Level Pipelining in Emerging
    Technologies. International Symposium on Computer
    Architecture, Sweden, July 2001, pp. 166-177.
  • P. Tougaw and C. Lent. Logical devices
    implemented using quantum cellular automata.
    Journal of Applied Physics, 75:1818, 1994.
  • C. S. Lent and P. D. Tougaw. A device
    architecture for computing with quantum dots.
    Proceedings of the IEEE, 85:541, 1997.
  • M. Niemier and P. Kogge. Logic in wire: Using
    quantum dots to implement a microprocessor. In
    Proceedings of the 6th International Conference on
    Electronics, Circuits and Systems, 1999.
  • M. Niemier, M. Kontz, and P. Kogge. A design of
    and design tools for a novel quantum dot based
    microprocessor. In Proceedings of the 37th Design
    Automation Conference, pages 227-232, 2000.

31
Q&A
  • Thanks