Title: Exploring and Exploiting Wire-Level Pipelining in Emerging Technologies
1Exploring and Exploiting Wire-Level Pipelining in
Emerging Technologies
2Outline
- Introduction
- QCA Device and Circuits
- QCA Clock
- Simple 12 Dataflow
- Simple 12 One-Hot State Machine
- LIFO Controller
- Primitive Architecture
- Future work
3Introduction
- Pipelining is considered fundamental by computer
architects, and has been with us for 40 years.
- Various techniques are currently used for
pipelining and multi-threading, but all are
gate-level
- CMOS limit: 0.05 microns (50 nm)
- New technology: Quantum Cellular Automata (QCA)
4QCA Device and Circuits
- Why should we explore QCA?
- Inherent self-latching in the circuit
- Processing-in-wire
- Free multi-threading
- Brings the level of pipelining down below the
gate level to the cell level
- Research being done to build QCA using chemical
molecules (currently built using metal dots)
- Allows for a two-orders-of-magnitude density
increase
5QCA Cell
- Uses electrons in cells to store and transmit
data
- Electrons move between different positions via
electron tunneling
- Logic functions performed by Coulombic
interactions
6QCA Cell-Cell Response Function
Note: The two cells are separated by 60 nm. This
shows the polarization P1 induced in cell 1 by
the fixed polarization of its neighbor P2.
7QCA Cells
a) The standard cells
Binary 0
Binary 1
b) The rotated cells
Rotated cells are used for crossing wires and
inverting (0 ↔ 1)
8Some Ground States (Minimum Coulombic Repulsion)
9QCA Logic Gates
10QCA Logic Gates (continued)
- The majority voting logic
11QCA Logic Gates (continued)
12QCA Logic Gates (continued)
- The programmable AND/OR gate
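The programmable AND/OR behavior can be sketched in a few lines of Python. This is a logical model only, not a physical simulation of QCA cells; the function names `majority`, `and_gate`, and `or_gate` are illustrative assumptions. It shows that fixing one majority-gate input to 0 yields AND, and fixing it to 1 yields OR.

```python
def majority(a: int, b: int, c: int) -> int:
    """Return the value held by at least two of the three binary inputs."""
    return 1 if (a + b + c) >= 2 else 0


def and_gate(a: int, b: int) -> int:
    """Majority gate with the third (program) input fixed to 0."""
    return majority(a, b, 0)


def or_gate(a: int, b: int) -> int:
    """Majority gate with the third (program) input fixed to 1."""
    return majority(a, b, 1)


if __name__ == "__main__":
    # Print the truth table for both programmed behaviors.
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, and_gate(a, b), or_gate(a, b))
```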
13Crossing of Two QCA Wires
rotated cells
Note: The state of a standard cell has no
switching effect on a rotated cell directly in
line with it. Similarly, the state of a rotated
cell has no effect on the state of a normal cell
in line with it.
14Crossing of Two QCA Wires(continued)
[Figure: crossing of wires A and B]
15The Exclusive OR Gate
[Figure: panels a.) and b.); signals A, B, and C]
16The CNOT Gate
[Figure: panels a.) and b.)]
17Single-bit Full Adder
[Figure: single-bit full adder; inputs A, B, Ci-1; outputs S, Ci]
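A full adder can be expressed using only majority gates and inverters. The sketch below uses one well-known formulation with three majority gates and two inverters; it is an assumption that this matches the layout in the slide's figure, which may use a different arrangement. The function names are illustrative.

```python
def majority(a: int, b: int, c: int) -> int:
    """Return the value held by at least two of the three binary inputs."""
    return 1 if (a + b + c) >= 2 else 0


def invert(a: int) -> int:
    """Logical inversion of a single bit."""
    return 1 - a


def full_adder(a: int, b: int, c_in: int) -> tuple[int, int]:
    """Return (sum, carry_out) using three majority gates and two inverters."""
    c_out = majority(a, b, c_in)
    s = majority(invert(c_out), c_in, majority(a, b, invert(c_in)))
    return s, c_out


if __name__ == "__main__":
    # Compare against ordinary integer addition for all eight input cases.
    for a in (0, 1):
        for b in (0, 1):
            for c_in in (0, 1):
                s, c_out = full_adder(a, b, c_in)
                print(a, b, c_in, "->", s, c_out)
```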
18QCA Clock
- Clocking not performed for individual cells;
rather, clocking provided for zones, which
consist of multiple QCA cells
- Four-phase clock vs. two-phase clock for CMOS
- Switch Phase (Phase 1)
- Cells begin unpolarized, interdot potential
barriers low
- Cells become polarized according to state of
driver
- Actual computation occurs in this phase
- Barriers raised to lock cell states
- Hold Phase (Phase 2)
- Barriers held high
- Outputs of subarray used as inputs to next stage
19QCA Clock (continued)
- Release Phase (Phase 3)
- Barriers lowered
- Cells return to unpolarized state
- Relax Phase (Phase 4)
- Barriers remain lowered, cells remain unpolarized
- This scheme provides for inherent self-latching
- Cells in one zone perform certain calculation
- State frozen by raising of barriers
- Successor zone does not influence calculation
- Physically neighboring zones receive temporally
neighboring clocking phases
- No need for explicit latches to hold values
20QCA Clock (continued)
- Figure (a) shows a physical 5 cell wire
- Figure (b) shows a value propagating down the wire
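The four-phase scheme can be illustrated with a small behavioral model. This is a sketch under stated assumptions: each position on the wire is treated as its own clocking zone, zone `z` is one phase behind zone `z - 1`, and a new driver bit is presented once per four-phase cycle. The names `zone_phase`, `step`, and `propagate` are hypothetical, and `None` stands for an unpolarized zone.

```python
PHASES = ("switch", "hold", "release", "relax")


def zone_phase(zone: int, t: int) -> str:
    """Phase of a zone at time step t; each zone lags its predecessor by one phase."""
    return PHASES[(t - zone) % 4]


def step(state, t, driver_bit):
    """Advance the wire by one clock phase and return the new zone states."""
    new_state = []
    for zone in range(len(state)):
        phase = zone_phase(zone, t)
        if phase == "switch":
            # A switching zone polarizes according to its driver: the external
            # input for zone 0, otherwise the predecessor zone (now in hold).
            new_state.append(driver_bit if zone == 0 else state[zone - 1])
        elif phase == "hold":
            # Barriers stay high, so the latched value is kept.
            new_state.append(state[zone])
        else:
            # Release and relax: the zone is unpolarized.
            new_state.append(None)
    return new_state


def propagate(bits, zones=5, steps=6):
    """Feed one bit per four-phase cycle into zone 0 and record every step."""
    state = [None] * zones
    history = []
    for t in range(steps):
        cycle = t // 4
        driver_bit = bits[cycle] if cycle < len(bits) else None
        state = step(state, t, driver_bit)
        history.append(state)
    return history


if __name__ == "__main__":
    for t, snapshot in enumerate(propagate([1, 0])):
        print(t, snapshot)
```

In this model the zone after a switching zone is always in the relax phase, which mirrors the point that a successor zone does not influence the calculation.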
21Simple 12 Dataflow
Processing-in-wire
22Simple 12 Dataflow (continued)
- Processing-in-wire
- Zero A Logic is placed directly after output of
ALU
- A input to ALU is zeroed on the way back to ALU
input
- Useful computation being performed for free
- How is it free?
- Feedback wire is spread out over the clocking
zones back to the input
- Even if there was no logic in feedback path,
signal would traverse the same clocking zones
23Simple 12 Dataflow (continued)
- Problems with Simple 12 Dataflow
- Longest wire is spread out over 16 clocking zones
- Even a simple calculation will take 4 clock
cycles to finish
- Probability that cell switches successfully
decreases proportionally to the distance a
particular cell is from a frozen input at
beginning of wire
- Solution
- Reduce wire length
- Spread long wires over several clocking zones
- Advantage of several clocking zones in long wires:
free multithreading
24Simple 12 Dataflow (continued)
- Free Multithreading
- Inherent self-latching allows multiple
computations to execute simultaneously
- The ?s in the above figure represent potentially
different threads in different clocking zones
- Possible to execute 4 simultaneous instructions
- Size
- QCA design offers at least an order of magnitude
area density increase over equivalent CMOS design
when scaled to 0.05 micron
- With molecular dots, area density increases three
orders of magnitude
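The link between 16 clocking zones, a 4-cycle latency, and 4 simultaneous instructions can be checked with simple arithmetic. The sketch below assumes that a value advances one zone per clock phase, that there are four phases per cycle, and that a new input enters zone 0 at the start of every cycle. The names `latency_cycles` and `wavefronts` are hypothetical.

```python
PHASES_PER_CYCLE = 4


def latency_cycles(zones: int) -> int:
    """Clock cycles needed for one value to traverse a wire of `zones` zones."""
    return (zones + PHASES_PER_CYCLE - 1) // PHASES_PER_CYCLE


def wavefronts(t: int, zones: int, num_inputs: int) -> dict[int, int]:
    """Map each zone that is switching at step t to the input index it latches.

    Input number c enters zone 0 at step 4 * c and reaches zone z at step
    4 * c + z, so a zone switches at step t only when (t - z) is a multiple
    of four.
    """
    active = {}
    for zone in range(zones):
        if (t - zone) % PHASES_PER_CYCLE == 0:
            index = (t - zone) // PHASES_PER_CYCLE
            if 0 <= index < num_inputs:
                active[zone] = index
    return active


if __name__ == "__main__":
    print(latency_cycles(16))     # cycles for one value to cross 16 zones
    print(wavefronts(15, 16, 8))  # independent values in flight at step 15
```

Under these assumptions, the step at which input 0 reaches the last of 16 zones is also a step at which inputs 1, 2, and 3 are switching further back on the wire, so four independent computations share it.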
25Simple 12 One-Hot State Machines
- Two execute feedback paths because both stopped
and ifetch states depend on this state
- Problem
- Execute wire is broken up over 4 clocking zones
- Information from previous state will not be
available in time for next set of inputs to use
to determine state transition
26Simple 12 One-Hot State Machines (continued)
- Solution: Feedback path consists of one clocking
zone as shown below
- Problem: Do we still have relative correctness?
- Relative correctness: output of state machine
correct relative to time of execution
27Simple 12 One-Hot State Machines (continued)
- Solution: Keep feedback wire over 4 clocking
zones, but add wires to inputs.
- Next input will take an extra cycle to reach
state machine combinational logic
- Will take one extra clock cycle to process
information, but multithreading will ensure
sustained performance
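A one-hot state machine keeps one bit per state, with exactly one bit set at a time. The sketch below uses the three state names mentioned in the slides (stopped, ifetch, execute). The transition rules and the `start` and `halt` inputs are hypothetical, chosen only so that both stopped and ifetch depend on execute; they are not taken from the Simple 12 design.

```python
from typing import NamedTuple


class State(NamedTuple):
    """One-hot encoding: exactly one field is 1 at any time."""
    stopped: int
    ifetch: int
    execute: int


def next_state(s: State, start: int, halt: int) -> State:
    """Hypothetical next-state logic; `start` and `halt` are assumed inputs."""
    return State(
        # Enter stopped from execute on halt, or stay stopped until start.
        stopped=(s.execute & halt) | (s.stopped & (1 - start)),
        # Enter ifetch from execute when not halting, or from stopped on start.
        ifetch=(s.execute & (1 - halt)) | (s.stopped & start),
        # Execute always follows ifetch.
        execute=s.ifetch,
    )


if __name__ == "__main__":
    s = State(stopped=1, ifetch=0, execute=0)
    for start, halt in [(1, 0), (0, 0), (0, 0), (0, 0), (0, 1)]:
        s = next_state(s, start, halt)
        print(s)
```

Both the stopped and ifetch bits read the execute bit, which corresponds to the two execute feedback paths described above.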
28A Primitive Architecture
- Control logic delay: B
- Zero and negative flag feedback path delay: D
- All delays will affect degree of multithreading
possible
- Control signals should be pipelined with dataflow
bits so that they arrive at appropriate logic at
appropriate time
- B and D should be equalized as much as possible
- Datapath can be folded onto itself to reduce
delays
29Future Work
- Exploring new ways to exploit processing-in-wire
and multithreading
- How QCA can be used in memory
- Build QCA devices using chemical molecules rather
than metal dots for increased area density
30References
- Michael T. Niemier and Peter M. Kogge. Exploring
and Exploiting Wire-Level Pipelining in Emerging
Technologies. International Symposium on Computer
Architecture, Sweden, July 2001, pp. 166-177.
- P. Tougaw and C. Lent. Logical devices
implemented using quantum cellular automata.
Journal of Applied Physics, 75:1818, 1994.
- C. S. Lent and P. D. Tougaw. A device
architecture for computing with quantum dots.
Proceedings of the IEEE, 85:541, 1997.
- M. Niemier and P. Kogge. Logic in wire: Using
quantum dots to implement a microprocessor. In
Proceedings of the 6th International Conference on
Electronics, Circuits and Systems, 1999.
- M. Niemier, M. Kontz, and P. Kogge. A design of
and design tools for a novel quantum dot based
microprocessor. In Proceedings of the 37th Design
Automation Conference, pages 227-232, 2000.
31Q&A