Title: Physical Computing Theory, Ultimate Models, and the Tight Church's Thesis
1 Physical Computing Theory, Ultimate Models, and the Tight Church's Thesis: A More Accurate Complexity Theory for Future Nanocomputing
Dr. Mike Frank, University of Florida, CISE Department, mpf_at_cise.ufl.edu
- Presented to the Algorithms Theory Club, Wed., May 1, 2002
2 Source: ITRS '99
3 ½CV² energy, based on ITRS '99 figures for Vdd and minimum transistor gate capacitance. T = 300 K.
4 Physical Computing Theory
- The study of theoretical models of computation that are based on (or closely tied to) physics.
- Make no nonphysical assumptions!
- Includes the study of:
- Fundamental physical limits of computing
- Physically-based models of computing
- Includes reversible and/or quantum models
- Ultimate (asymptotically optimal) models
- An asymptotically tight Church's thesis
- Model-independent basis for complexity theory
- Basis for design of future nanocomputer architectures
- Asymptotic scaling of architectures & algorithms
- Physically optimal algorithms
5 Ultimate Models of Computing
- We would like models of computing that match the real computational power of physics.
- Not too weak, not too strong.
- Most traditional models of computing only match physics to within polynomial factors.
- Misleading asymptotic performance of algorithms.
- Not good enough to form the basis for a real systems-engineering optimization of architectures.
- Develop models of computing that are:
- As powerful as physically possible on all problems
- Realistic within asymptotic constant factors
6 Scalability & Maximal Scalability
- A multiprocessor architecture & accompanying performance model is scalable if:
- it can be scaled up to arbitrarily large problem sizes, and/or arbitrarily large numbers of processors, without the predictions of the performance model breaking down.
- An architecture (& model) is maximally scalable for a given problem if:
- it is scalable, and no other scalable architecture can claim asymptotically superior performance on that problem.
- It is universally maximally scalable (UMS) if it is maximally scalable on all problems!
- I will briefly mention some characteristics of architectures that are universally maximally scalable.
7 Universal Maximum Scalability
- Existence proof for universally maximally scalable (UMS) architectures:
- Physics itself is a universally maximally scalable "architecture," because any real computer is merely a special case of a physical system.
- Obviously, no real computer can beat the performance of physical systems in general.
- Unfortunately, physics doesn't give us a very simple or convenient programming model.
- Comprehensive expertise at "programming physics" means mastery of all physical engineering disciplines: chemical, electrical, mechanical, optical, etc.
- We'd like an easier programming model than this!
8 Physics Constrains the Ultimate Model
9 Simple UMS Architectures
- (I propose) any practical UMS architecture will have the following features:
- Processing elements characterized by constant parameters (independent of # of processors)
- Mesh-type message-passing interconnection network, arbitrarily scalable in 2 dimensions,
- with limited scalability in the 3rd dimension.
- Processing elements that can be operated in a highly reversible mode, at least up to some limit.
- Enables improved 3-D scalability, in a limited regime.
- (In the long term) Capability for quantum-coherent operation, for extra performance on some problems.
10 Ideally Scalable Architectures
Conjecture: A 2- or 3-D mesh multiprocessor with a fixed-size memory hierarchy per node is an optimal scalable computer systems design (for any application).
[Figure: grid of processing nodes, each with a local memory hierarchy of optimal fixed size, joined by a mesh interconnection network.]
11 Tight Church's Thesis
- Conjecture: 3-D meshes of some variety of fixed-size reversible/quantum processing elements give the same or lesser asymptotic complexity (within a constant factor), for all problem classes, as any special-purpose physical mechanism for solving the given problem class.
12 Reversibility of Physics
- The universe is (apparently) a closed system.
- Closed systems evolve via unitary transforms.
- Apparent wavefunction collapse doesn't contradict this (confirmed by work of Everett, Zurek, etc.)
- Time-evolution of the concrete state of the universe (or of closed subsystems) is reversible:
- Invertible (bijective)
- Deterministic looking backwards in time
- Total information (log of # of possible states) doesn't decrease.
- It can increase, though, if volume is increasing.
- Information cannot be destroyed!
13 Illustrating Landauer's Principle
[Figure: before bit erasure there are 2N states: machine states s0…sN−1 paired with bit value 0, and the same machine states paired with bit value 1. Since unitary (1-to-1) evolution must map 2N states to 2N distinct states, after erasure the bit is always 0 but the machine states must span s0…s2N−1; the bit's information cannot simply vanish.]
14 Benefits of Reversible Computing
- Reduces energy/cooling costs of computing.
- Improves performance per unit power consumed.
- Given heat-flux limits in the cooling system,
- improves performance per unit of convex-hull area:
- a faster machine in a given size box.
- For communication-intensive parallel algorithms,
- improves performance, period!
- All these benefits are by small polynomial factors in the integration scale & the device properties.
15 Reversible/Adiabatic CMOS
- Chips designed at MIT, 1996-1999
16 (No transcript: image slide)
17 Minimum Losses w. Leakage
E_tot = E_adia + E_leak
E_leak = P_leak · t_r
E_adia = c_E / t_r
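This tradeoff has a standard closed-form optimum: setting d(E_tot)/dt_r = 0 gives t_r* = √(c_E/P_leak) and E_min = 2·√(c_E·P_leak). A minimal Python sketch; the parameter values are illustrative made-up numbers, not data from the slides:

```python
import math

def total_loss(c_E, P_leak, t_r):
    """Total dissipation per operation: adiabatic term + leakage term."""
    return c_E / t_r + P_leak * t_r

def optimal_ramp_time(c_E, P_leak):
    """Setting d(E_tot)/dt_r = 0 gives t_r* = sqrt(c_E / P_leak)."""
    return math.sqrt(c_E / P_leak)

# Illustrative (made-up) values: c_E in J*s, P_leak in W.
c_E, P_leak = 1e-27, 1e-9
t_star = optimal_ramp_time(c_E, P_leak)
E_min = total_loss(c_E, P_leak, t_star)   # equals 2*sqrt(c_E*P_leak)
```

Ramping slower than t_r* loses to leakage; ramping faster loses to the adiabatic c_E/t_r term.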
18 Systems Engineering (my def'n)
- The interdisciplinary study of the design of complex systems in which multiple areas of engineering may interact in nontrivial ways.
- Optimizing a complete system will in general require considering the effects of concerns in different engineering disciplines on each other.
- E.g., simultaneous consideration of:
- Mechanical (structural & dynamic) engineering
- Thermal/power engineering
- Electrical/electronic/photonic engineering
- Algorithmic/software engineering
- Economic/social/financial engineering?
19 Cost-Efficiency
- The primary concern of systems engineering.
- Cost ¢ most generally can be any appropriate measure of resources consumed.
- The cost-efficiency of achieving a task is the fraction ¢_min/¢ of the cost that actually had to be spent.
- Goal: when designing a system to accomplish a given task, choose the design that minimizes the cost ¢ (and thus maximizes cost-efficiency).
- Include design cost, amortized over the expected number of reuses of the design.
20 Two-pass system optimization
- A general methodology for the integrated optimization of the design of complex systems.
- Performance characteristics of the system are expressed as a function of design parameters and subsystem characteristics.
- Then, optimize design parameters from the top down to maximize overall system cost-efficiency.
[Figure: design hierarchy: top-level systems design; high-level subsystems; mid-level components; lowest-level design elements. Characterize cost-efficiency from the bottom upwards, then optimize design parameters from the top downwards.]
21 Computer Systems Engineering (CSE)
- General systems engineering principles applied to the design of computer systems.
- E.g.: electronic, algorithmic, thermal, and communications concerns interact when optimizing massively parallel computers for some problems.
- When looking ahead to the cross-disciplinary interactions that become more important for bit-devices at the nanometer scale,
- call the subject nanocomputer systems engineering (NCSE).
- This is what I do.
22 Cost-Efficiency
- Cost-efficiency of anything is ¢_min/¢:
- the fraction of the actual cost ¢ that really needed to be spent to get the thing, using the best possible method.
- Measures the relative number of instances of the thing that can be accomplished per unit cost,
- compared to the maximum number possible.
- Inversely proportional to cost ¢.
- Maximizing it means minimizing ¢,
- regardless of what ¢_min actually is.
- In computing, the "thing" is a computational task that we wish to be carried out.
23 Components of Cost
- The cost of a computation may be a sum of terms for many different components:
- Time cost:
- Cost to the user of having to wait for results.
- E.g., missing deadlines, incurring penalties.
- May increase nonlinearly with time, for long times.
- Spacetime-related costs:
- Cost of raw physical spacetime occupied by the computation.
- Cost to rent the space.
- Cost of hardware (amortized over its lifetime):
- Cost of raw mass-energy, particles, atoms.
- Cost of materials, parts.
- Cost of assembly.
- Cost of parts/labor for operation & maintenance.
24 More cost components
- Continued...
- Area-time costs:
- Cost to rent a portion of an enclosing convex hull for getting things in & out of the system:
- energy, heat, information, people, materials, entropy.
- Some examples:
- Chip area, power level, cooling capacity, I/O bandwidth, desktop footprint, floor space, real estate, planetary surface.
- Area-time costs scale with the maximum number of items that can be sent/received.
- Energy-expenditure costs:
- Cost of raw free-energy expenditure (entropy generation).
- Cost of the energy-delivery system (amortized).
- Cost of the cooling system (amortized).
25 General Cost Measures
- The most comprehensive cost measure includes terms for all of these potential kinds of costs:
- ¢_comprehensive = ¢_Time + ¢_SpaceTime + ¢_AreaTime + ¢_FreeEnergy
- ¢_Time is a non-decreasing function f(Δt_start→end).
- Simple model: ¢_Time ∝ Δt_start→end
- ¢_FreeEnergy is most generally …
- Simple model: ¢_FreeEnergy ∝ ΔS_generated
- ¢_SpaceTime and ¢_AreaTime are most generally proportional to the max # of ops that could be done, and the max # of items that could be I/O'd, respectively.
- Simple models:
- ¢_SpaceTime ∝ Space × Time
- ¢_AreaTime ∝ Area × Time
26 Generalized Amdahl's Law
- Given any cost that is a sum of components, ¢_tot = ¢_1 + … + ¢_n:
- There are diminishing proportional returns to be gained from reducing any single cost component (or subset of components) to much less than the sum of the remaining components.
- Optimization effort should focus on the cost components that are most significant in the application of interest.
- At a design equilibrium, all cost components will be roughly equal (unless externally driven).
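The diminishing-returns claim is easy to see numerically. A minimal sketch (the three-way equal cost split is a hypothetical example, chosen to match the "design equilibrium" remark above):

```python
def overall_gain(costs, i, factor):
    """Ratio of total cost before/after dividing cost component i by `factor`."""
    before = sum(costs)
    after = before - costs[i] + costs[i] / factor
    return before / after

# Hypothetical cost breakdown: three equal components (a design equilibrium).
costs = [1.0, 1.0, 1.0]
g10 = overall_gain(costs, 0, 10)    # ~1.43x overall from a 10x component cut
ginf = overall_gain(costs, 0, 1e12) # ~1.5x: ceiling set by the other components
```

Cutting one component by 10x already captures most of the benefit; even an infinite cut cannot beat the 1.5x ceiling imposed by the remaining components.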
27 Reversible vs. Irreversible
- Want to compare their cost-efficiency under various cost measures:
- Time
- Entropy
- Area-time
- Spacetime
- Note that space (volume, mass, etc.) by itself as a cost measure is only significant if either:
- (a) the computer isn't reusable, so the cost to build it dominates operating costs; or
- (b) I/O latency ∝ V^(1/3) affects other costs.
- Or, for some applications, one quantity might be minimized while another one (space, time, area) is constrained by some hard limit.
28 Time Cost Comparison
- For computations with unlimited power/cooling and no communication requirements:
- Reversible is worse than irreversible by a factor of s > 1 (the adiabatic slowdown factor), times maybe a small constant depending on the logic style used:
- ¢_r,Time ≈ ¢_i,Time · s
29 Time Cost Comparison, cont.
- For parallelizable, power-limited applications:
- With nonzero leakage: ¢_r,Time ≈ ¢_i,Time / R_on/off^g
- Worst-case computations: g ≈ 0.4
- Best-case computations: g = 0.5
- For parallelizable, area-limited, entropy-flux-limited, best-case applications:
- with leakage → 0: ¢_r,Time ≈ ¢_i,Time / d^(1/2),
- where d is the system's physical diameter.
30 Time cost comparison, cont.
- For entropy-flux-limited, parallel, heavily communication-limited, best-case applications:
- with leakage approaching 0: ¢_r,Time ≈ ¢_i,Time^(3/4),
- where ¢_i,Time scales up with the space requirement V as ¢_i,Time ∝ V^(2/9),
- so the reversible speedup scales with the 1/18 power of system size.
31 Reversible Emulation - Ben89
k = 2, n = 3
k = 3, n = 2
32 Bennett '89 alg. is not optimal
k = 2, n = 3
k = 3, n = 2
Just look at all the spacetime it wastes!!!
33 Parallel Frank02 algorithm
- We can simply squish the triangles closer together to eliminate the wasted spacetime!
- The resulting algorithm is linear-time for all n and k, and dominates Ben89 for time, spacetime, & energy!
[Figure: triangle diagrams for k = 3, n = 2 and k = 2, n = 3; emulated time vs. real time; k = 4, n = 1.]
34 Setup for Analysis
- For the energy-dominated limit,
- let cost ¢ equal energy.
- c_E = energy coefficient; P_leak = P_min = leakage power
- E_i = energy dissipation per irreversible state-change
- Let the on/off ratio be R_on/off = P_max/P_min.
- Note that c_E ≈ E_i·t_min = E_i·(E_i / P_max), so P_max ≈ E_i²/c_E.
- So R_on/off ≈ E_i² / (c_E·P_min) = E_i² / (c_E·P_leak).
35 Time Taken
- There are n levels of recursion.
- Each multiplies the width of the base of the triangle by k.
- Lowest-level triangles take time ∝ t_op.
- Total time is thus ∝ t_op·k^n.
[Figure: k = 4, n = 1; width = 4 sub-units.]
36 Number of Adiabatic Ops
- Each triangle contains k + (k − 1) = 2k − 1 immediate sub-triangles.
- There are n levels of recursion.
- Thus the number of adiabatic ops is ∝ (2k − 1)^n.
[Figure: k = 3, n = 2; 5² = 25 little triangles (adiabatic operations).]
37 Spacetime Usage
- Each triangle includes the spacetime usage of all 2k − 1 of its subtriangles,
- plus additional spacetime units (states of the irreversible machine being stored, each occupying 1 storage unit for times up to t_op·k^(n−1)).
- Resulting recurrence relation:
ST(k,0) = 1 (or c)
ST(k,n) = (2k−1)·ST(k,n−1) + ((k²−3k+2)/2)·k^(n−1)
[Figure: k = 5, n = 1; stored states held for 1, 2, and 3 time units, totaling 1+2+3 = 6 units.]
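The op count from slide 36 and the ST recurrence above are easy to evaluate directly. A minimal sketch (function names are mine; the constant c in ST(k,0) is taken as 1):

```python
def n_adiabatic_ops(k, n):
    """Each recursion level multiplies the triangle count by 2k - 1."""
    return (2 * k - 1) ** n

def spacetime(k, n):
    """ST(k,0) = 1; ST(k,n) = (2k-1)*ST(k,n-1) + ((k^2-3k+2)/2)*k^(n-1)."""
    if n == 0:
        return 1.0
    return (2 * k - 1) * spacetime(k, n - 1) + (k * k - 3 * k + 2) * k ** (n - 1) / 2

# Slide 36's example: k = 3, n = 2 gives 5**2 = 25 little triangles.
assert n_adiabatic_ops(3, 2) == 25
```

Note that for k = 2 the extra term (k²−3k+2)/2 vanishes, so ST(2,n) = 3^n exactly.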
38 Reversible Cost
- Adiabatic cost plus spacetime cost:
¢_r = ¢_adia + ¢_ST = (2k−1)^n·c_E/t + ST(k,n)·P_leak·t
- Minimizing over t gives:
¢_r = 2·[(2k−1)^n·ST(k,n)·c_E·P_leak]^(1/2)
- But, in the energy-dominated limit, c_E·P_leak ≈ E_i² / R_on/off,
- so ¢_r = 2·E_i·[(2k−1)^n·ST(k,n) / R_on/off]^(1/2).
39 Tot. Cost, Orig. Cost, Advantage
- The total cost (one irreversible operation of energy E_i performed at the end of the algorithm, plus the reversible cost) is:
¢_tot = E_i·(1 + 2·[(2k−1)^n·ST(k,n) / R_on/off]^(1/2))
- The original irreversible machine performing k^n ops would use cost ¢_orig = E_i·k^n, so
- the advantage is the ratio R_(i/r) = ¢_orig/¢_tot between irreversible & reversible cost.
40 Optimization Algorithm
- For any given value of R_on/off:
- Scan the possible values of n (up to some limit);
- for each of those, scan the possible values of k,
- until the maximum R_(i/r) for that n is found
- (the function only has a single local maximum);
- and return the max R_(i/r) over all n tried.
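The scan described above can be sketched directly, reusing the slide-37 recurrence and the slide-39 advantage ratio. This is a simplified cost model (function names and the scan limits are mine), so the resulting numbers need not match the slide-42 figures exactly:

```python
import math

def spacetime(k, n):
    """Iterative form of slide 37's recurrence, with ST(k,0) = 1."""
    st = 1.0
    for level in range(1, n + 1):
        st = (2 * k - 1) * st + (k * k - 3 * k + 2) * k ** (level - 1) / 2
    return st

def advantage(k, n, R_onoff):
    """R_(i/r) = k**n / (1 + 2*sqrt((2k-1)**n * ST(k,n) / R_onoff))."""
    return k ** n / (1 + 2 * math.sqrt((2 * k - 1) ** n * spacetime(k, n) / R_onoff))

def optimize(R_onoff, n_max=30, k_max=2000):
    """Scan n; for each n, scan k until R starts to fall (single local maximum)."""
    best_R, best_k, best_n = 0.0, 0, 0
    for n in range(1, n_max + 1):
        prev = 0.0
        for k in range(2, k_max + 1):
            r = advantage(k, n, R_onoff)
            if r < prev:
                break
            prev = r
        if prev > best_R:
            best_R, best_k, best_n = prev, k - 1, n
    return best_R, best_k, best_n
```

Under this sketch, the best advantage grows with R_on/off, as slide 42 asserts.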
41 [Figure: energy saved and spacetime blowup, plotted as functions of k and n.]
42 Asymptotic Scaling
- The potential energy-savings factor scales as R_(i/r) ∝ R_on/off^0.4,
- while the spacetime overhead grows only as R_(i/r)^0.45, or R_on/off^0.18.
- E.g., with an R_on/off of 10⁹, you can do worst-case computation in an adiabatic circuit with:
- an energy savings of up to a factor of 1,200!
- But this point is 700,000× less hardware-efficient!
43 Various Cost Measures
- Entropy: advantage as per the previous analysis.
- Area × time: scales w. entropy generated.
- Performance, given an area constraint:
- In the leakage-free limit, the advantage is proportional to d^(1/2).
- With leakage, what's the max advantage? (See hw.)
- NOW:
- Are there any performance/cost advantages from adiabatics even when there is no cost or constraint on entropy or area?
- YES, for flux-limited computations that require communications. Let's see why…
44 Perf. scaling w. # of devices
- If the algorithm is not limited by communications needs:
- Use irreversible processors spread in a 2-D layer.
- Remove entropy along the perpendicular dimension.
- No entropy-rate limits,
- so no speed advantage from reversibility.
- If the algorithm requires only local communication (latency ∝ cycle time) in an N_D×N_D×N_D mesh:
- Leak-free reversible machine performance scales better!
- Irreversible: t_cyc = Θ(N_D^(1/3))
- Reversible: t_cyc = Θ(N_D^(1/4)), i.e., Θ(N_D^(1/12)) faster!
- To boost the reversibility speedup by 10×, one must consider 10^36-CPU machines (1.7 trillion moles of CPUs!)
- 1.7 trillion moles of H atoms weighs 1.7 million metric tons!
- A 100-m tall hill of H-atom-sized CPUs!
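The back-of-envelope numbers in the last two bullets can be checked directly (using the Avogadro constant and hydrogen's molar mass of about 1.008 g/mol):

```python
AVOGADRO = 6.022e23   # particles per mole

n_cpus = 1e36
moles = n_cpus / AVOGADRO              # ~1.7e12: "1.7 trillion moles"
mass_tonnes = moles * 1.008 / 1e6      # grams at 1.008 g/mol, then g -> metric tons
```

The result is about 1.7e12 moles and 1.7 million metric tons, matching the slide.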
45 Lower bound on irrev. time
- Simulate N_proc = N_D³ cells for N_steps = N_D steps.
- Consider a sequence of N_D update steps.
- The final cell value depends on N_D⁴ ops in time T.
- All ops must occur within radius r = cT of the cell.
- Surface area A ∝ T², so a rate R_op ∝ T² is sustainable.
- N_ops ≤ R_op·T ∝ T³ needs to be at least N_D⁴.
- ⇒ T must be Ω(N_D^(4/3)) to do all N_D steps.
- Average time per step must be Ω(N_D^(1/3)).
- Any irreversible machine (of any technology or architecture) must obey this bound!
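The counting argument above can be restated numerically. With the constant factors set to 1 (a simplifying assumption of this sketch), the sustainable op count ∝ T³ must cover N_D⁴ ops, which forces the average time per step up as N_D^(1/3):

```python
def min_total_time(N_D):
    """N_ops <= T**3 must be at least N_D**4, so T >= N_D**(4/3) (constants = 1)."""
    return N_D ** (4.0 / 3.0)

def min_time_per_step(N_D):
    """Average time per step: T / N_D = N_D**(1/3)."""
    return min_total_time(N_D) / N_D

# Doubling the mesh edge N_D raises the per-step floor by 2**(1/3).
ratio = min_time_per_step(2000) / min_time_per_step(1000)
```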
46 Irreversible 3-D Mesh
47 Reversible 3-D Mesh
48 Non-local Communication
- Best computational task for reversibility:
- Each processor must exchange messages with another that is N_D^(1/2) nodes away on each cycle.
- Unsure what real-world problem demands this pattern!
- In this case, the reversible speedup scales with the number of CPUs to only the 1/18th power.
- To boost the reversibility speedup by 10×, we only need 10^18 (or 1.7 micromoles) of CPUs.
- If each were a 1-nm cluster of 100 C atoms, this is only ~2 mg mass, volume ~1 mm³.
- Current VLSI: need a cost level of ~$25B before you see a speedup.
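Again the mass and volume figures can be checked arithmetically (carbon at 12 g/mol, Avogadro's number, 1 nm³ = 1e-27 m³, 1 mm³ = 1e-9 m³):

```python
AVOGADRO = 6.022e23

n_cpus = 1e18
micromoles = n_cpus / AVOGADRO * 1e6        # ~1.7 micromoles of CPUs
atoms = n_cpus * 100                        # 100 carbon atoms per 1-nm cluster
mass_mg = atoms * 12 / AVOGADRO * 1000      # grams at 12 g/mol, then g -> mg
volume_mm3 = n_cpus * 1e-27 / 1e-9          # 1 nm^3 per CPU, in mm^3
```

This reproduces the slide's ~1.7 micromoles, ~2 mg, and ~1 mm³ figures.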
49 Ballistic Machines
- In the limit as c_S → 0, the asymptotic benefit for 3-D meshes goes as N_D^(1/3), or N_proc^(1/9).
- Only need a billion devices to multiply the reversible speedup by 10×.
- With 1-nm³ devices, a cube 1 µm on a side (bacteria-sized) would do it!
- Does rod logic have a low enough c_S and small enough size to attain this prediction?
- (Need to check.)
50 Minimizing volume via folding
- Allows prev. solutions to be packed in min. volume.
- Volume scales proportionally to mass.
- No change in speed or entropy flux.
51 Cooling Technologies
52 Irreversible Max Perf. Per Area
53 Reversible Entropy Coeffs.
54 Rev. vs. Irrev. Comparisons
55 Sizes of Winning Rev. Machines
56 Some Analytical Challenges
- Combine the Frank '02 emulation algorithm
- and the analysis of its energy and space efficiency as a function of n and k,
- and plug them into the analysis for the 3-D meshes, to see:
- What are the optimal speedups for arbitrary mesh computations on rev. machines, as a function of
- R_on/off, device volume, entropy-flux limit, and machine size?
- And does perf./hw improve, and if so, how much?
57 Open issues for reversible comp.
- Integrate realistic fundamental models of the clocking system into the engineering analysis.
- There is an open issue about the scalability of clock distribution systems.
- Quantum bounds exist on the reusability of timing signals.
- It is not yet clear whether reversible clocking is scalable.
- Fortunately, self-timed reversible computing also appears to be a possibility.
- Not yet clear if this approach works above 1-D models.
- Simulation experiments are planned to investigate this.
- Develop efficient physical realizations of nano-scale bit-devices & timing systems.
58 Quantum Computing pros/cons
- Pros:
- Removes an unnecessary restriction on the types of quantum states & ops usable for computation.
- Opens up exponentially shorter paths to solving some types of problems (e.g., factoring, simulation).
- Cons:
- Sensitive; requires overhead for error correction.
- Also still remains subject to fundamental physical bounds on info density & rate of state change!
- Myth: "A quantum memory can store an exponentially large amount of data."
- Myth: "A quantum computer can perform operations at an exponentially faster rate than a classical one."
59 Some goals of my QC work
- Develop a UMS model of computation that incorporates quantum computing.
- Design & simulate quantum computer architectures, programming languages, etc.
- Describe how to do the systems-engineering optimization of quantum computers for various problems of interest.
60 Conclusion
- As we near the physical limits of computing,
- further improvements will require an increasingly sophisticated interdisciplinary integration of concerns across many levels of engineering.
- I am developing a principled nanocomputer systems engineering methodology,
- and applying it to the problem of determining the real cost-efficiency of new models of computing:
- Reversible computing
- Quantum computing
- Building the foundations of a new discipline that will be critical in coming decades.