Title: Physical Computing Theory, Ultimate Models, and the Tight Church's Thesis
1 Physical Computing Theory, Ultimate Models, and the Tight Church's Thesis: A More Accurate Complexity Theory for Future Nanocomputing
Dr. Mike Frank, University of Florida, CISE Department, mpf_at_cise.ufl.edu
- Presented to the Algorithms Theory Club, Wed., May 1, 2002
2 Source: ITRS '99
3 ½CV² energy, based on ITRS '99 figures for Vdd and minimum transistor gate capacitance. T = 300 K.
4 Physical Computing Theory
- The study of theoretical models of computation that are based on (or closely tied to) physics.
- Make no nonphysical assumptions!
- Includes the study of:
- Fundamental physical limits of computing
- Physically-based models of computing
- Includes reversible and/or quantum models
- Ultimate (asymptotically optimal) models
- An asymptotically tight Church's thesis
- Model-independent basis for complexity theory
- Basis for design of future nanocomputer architectures
- Asymptotic scaling of architectures & algorithms
- Physically optimal algorithms
5 Ultimate Models of Computing
- We would like models of computing that match the real computational power of physics.
- Not too weak, not too strong.
- Most traditional models of computing only match physics to within polynomial factors.
- Misleading asymptotic performance of algorithms.
- Not good enough to form the basis for a real systems-engineering optimization of architectures.
- Develop models of computing that are:
- As powerful as physically possible on all problems
- Realistic within asymptotic constant factors
6 Scalability & Maximal Scalability
- A multiprocessor architecture & accompanying performance model is scalable if:
- it can be scaled up to arbitrarily large problem sizes, and/or arbitrarily large numbers of processors, without the predictions of the performance model breaking down.
- An architecture (& model) is maximally scalable for a given problem if:
- it is scalable, and no other scalable architecture can claim asymptotically superior performance on that problem.
- It is universally maximally scalable (UMS) if it is maximally scalable on all problems!
- I will briefly mention some characteristics of architectures that are universally maximally scalable.
7 Universal Maximum Scalability
- Existence proof for universally maximally scalable (UMS) architectures:
- Physics itself is a universally maximally scalable "architecture," because any real computer is merely a special case of a physical system.
- Obviously, no real computer can beat the performance of physical systems in general.
- Unfortunately, physics doesn't give us a very simple or convenient programming model.
- Comprehensive expertise at "programming physics" means mastery of all physical engineering disciplines: chemical, electrical, mechanical, optical, etc.
- We'd like an easier programming model than this!
8 Physics Constrains the Ultimate Model
9 Simple UMS Architectures
- (I propose) any practical UMS architecture will have the following features:
- Processing elements characterized by constant parameters (independent of # of processors)
- Mesh-type message-passing interconnection network, arbitrarily scalable in 2 dimensions,
- with limited scalability in the 3rd dimension.
- Processing elements that can be operated in a highly reversible mode, at least up to some limit.
- Enables improved 3-D scalability, in a limited regime.
- (In the long term) Capability for quantum-coherent operation, for extra performance on some problems.
10 Ideally Scalable Architectures
Conjecture: A 2- or 3-D mesh multiprocessor with a fixed-size memory hierarchy per node is an optimal scalable computer systems design (for any application).
[Figure: grid of processing nodes, each with a local memory hierarchy of optimal fixed size, joined by a mesh interconnection network.]
11 Tight Church's Thesis
- Conjecture: 3-D meshes of some variety of fixed-size reversible/quantum processing elements give the same or lesser asymptotic complexity (within a constant factor), for all problem classes, as any special-purpose physical mechanism for solving the given problem class.
12 Reversibility of Physics
- The universe is (apparently) a closed system.
- Closed systems evolve via unitary transforms.
- Apparent wavefunction collapse doesn't contradict this (confirmed by work of Everett, Zurek, etc.)
- Time-evolution of the concrete state of the universe (or of closed subsystems) is reversible:
- Invertible (bijective)
- Deterministic looking backwards in time
- Total information (log of # of possible states) doesn't decrease.
- It can increase, though, if volume is increasing.
- Information cannot be destroyed!
13 Illustrating Landauer's Principle
[Figure: before bit erasure there are 2N states: machine states s0…sN−1 paired with bit value 0, and the same machine states paired with bit value 1. Since unitary (1-to-1) evolution must map 2N states to 2N distinct states, after erasure the bit is always 0 but the machine states must span s0…s2N−1; the bit's information cannot simply vanish.]
14 Benefits of Reversible Computing
- Reduces energy/cooling costs of computing.
- Improves performance per unit power consumed.
- Given heat-flux limits in the cooling system,
- improves performance per unit of convex-hull area:
- a faster machine in a given size box.
- For communication-intensive parallel algorithms,
- improves performance, period!
- All these benefits are by small polynomial factors in the integration scale & the device properties.
15 Reversible/Adiabatic CMOS
- Chips designed at MIT, 1996-1999
16 (No transcript: image slide)
17 Minimum Losses w. Leakage
E_tot = E_adia + E_leak
E_leak = P_leak · t_r
E_adia = c_E / t_r
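This tradeoff has a standard closed-form optimum: setting d(E_tot)/dt_r = 0 gives t_r* = √(c_E/P_leak) and E_min = 2·√(c_E·P_leak). A minimal Python sketch; the parameter values are illustrative made-up numbers, not data from the slides:

```python
import math

def total_loss(c_E, P_leak, t_r):
    """Total dissipation per operation: adiabatic term + leakage term."""
    return c_E / t_r + P_leak * t_r

def optimal_ramp_time(c_E, P_leak):
    """Setting d(E_tot)/dt_r = 0 gives t_r* = sqrt(c_E / P_leak)."""
    return math.sqrt(c_E / P_leak)

# Illustrative (made-up) values: c_E in J*s, P_leak in W.
c_E, P_leak = 1e-27, 1e-9
t_star = optimal_ramp_time(c_E, P_leak)
E_min = total_loss(c_E, P_leak, t_star)   # equals 2*sqrt(c_E*P_leak)
```

Ramping slower than t_r* loses to leakage; ramping faster loses to the adiabatic c_E/t_r term.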
18 Systems Engineering (my def'n)
- The interdisciplinary study of the design of complex systems in which multiple areas of engineering may interact in nontrivial ways.
- Optimizing a complete system will in general require considering the effects of concerns in different engineering disciplines on each other.
- E.g., simultaneous consideration of:
- Mechanical (structural & dynamic) engineering
- Thermal/power engineering
- Electrical/electronic/photonic engineering
- Algorithmic/software engineering
- Economic/social/financial engineering?
19 Cost-Efficiency
- The primary concern of systems engineering.
- Cost ¢ most generally can be any appropriate measure of resources consumed.
- The cost-efficiency of achieving a task is the fraction ¢_min/¢ of the cost that actually had to be spent.
- Goal: when designing a system to accomplish a given task, choose the design that minimizes the cost ¢ (and thus maximizes cost-efficiency).
- Include design cost, amortized over the expected number of reuses of the design.
20 Two-pass system optimization
- A general methodology for the integrated optimization of the design of complex systems.
- Performance characteristics of the system are expressed as a function of design parameters and subsystem characteristics.
- Then, optimize design parameters from the top down to maximize overall system cost-efficiency.
[Figure: design hierarchy: top-level systems design; high-level subsystems; mid-level components; lowest-level design elements. Characterize cost-efficiency from the bottom upwards, then optimize design parameters from the top downwards.]
21 Computer Systems Engineering (CSE)
- General systems engineering principles applied to the design of computer systems.
- E.g.: electronic, algorithmic, thermal, and communications concerns interact when optimizing massively parallel computers for some problems.
- When looking ahead to the cross-disciplinary interactions that become more important for bit-devices at the nanometer scale,
- call the subject nanocomputer systems engineering (NCSE).
- This is what I do.
22 Cost-Efficiency
- Cost-efficiency of anything is ¢_min/¢:
- the fraction of the actual cost ¢ that really needed to be spent to get the thing, using the best possible method.
- Measures the relative number of instances of the thing that can be accomplished per unit cost,
- compared to the maximum number possible.
- Inversely proportional to cost ¢.
- Maximizing it means minimizing ¢,
- regardless of what ¢_min actually is.
- In computing, the "thing" is a computational task that we wish to be carried out.
23 Components of Cost
- The cost of a computation may be a sum of terms for many different components:
- Time cost:
- Cost to the user of having to wait for results.
- E.g., missing deadlines, incurring penalties.
- May increase nonlinearly with time, for long times.
- Spacetime-related costs:
- Cost of raw physical spacetime occupied by the computation.
- Cost to rent the space.
- Cost of hardware (amortized over its lifetime):
- Cost of raw mass-energy, particles, atoms.
- Cost of materials, parts.
- Cost of assembly.
- Cost of parts/labor for operation & maintenance.
24 More cost components
- Continued...
- Area-time costs:
- Cost to rent a portion of an enclosing convex hull for getting things in & out of the system:
- energy, heat, information, people, materials, entropy.
- Some examples:
- Chip area, power level, cooling capacity, I/O bandwidth, desktop footprint, floor space, real estate, planetary surface.
- Area-time costs scale with the maximum number of items that can be sent/received.
- Energy-expenditure costs:
- Cost of raw free-energy expenditure (entropy generation).
- Cost of the energy-delivery system (amortized).
- Cost of the cooling system (amortized).
25 General Cost Measures
- The most comprehensive cost measure includes terms for all of these potential kinds of costs:
- ¢_comprehensive = ¢_Time + ¢_SpaceTime + ¢_AreaTime + ¢_FreeEnergy
- ¢_Time is a non-decreasing function f(Δt_start→end).
- Simple model: ¢_Time ∝ Δt_start→end
- ¢_FreeEnergy is most generally …
- Simple model: ¢_FreeEnergy ∝ ΔS_generated
- ¢_SpaceTime and ¢_AreaTime are most generally proportional to the max # of ops that could be done, and the max # of items that could be I/O'd, respectively.
- Simple models:
- ¢_SpaceTime ∝ Space × Time
- ¢_AreaTime ∝ Area × Time
26 Generalized Amdahl's Law
- Given any cost that is a sum of components, ¢_tot = ¢_1 + … + ¢_n:
- There are diminishing proportional returns to be gained from reducing any single cost component (or subset of components) to much less than the sum of the remaining components.
- Optimization effort should focus on the cost components that are most significant in the application of interest.
- At a design equilibrium, all cost components will be roughly equal (unless externally driven).
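The diminishing-returns claim is easy to see numerically. A minimal sketch (the three-way equal cost split is a hypothetical example, chosen to match the "design equilibrium" remark above):

```python
def overall_gain(costs, i, factor):
    """Ratio of total cost before/after dividing cost component i by `factor`."""
    before = sum(costs)
    after = before - costs[i] + costs[i] / factor
    return before / after

# Hypothetical cost breakdown: three equal components (a design equilibrium).
costs = [1.0, 1.0, 1.0]
g10 = overall_gain(costs, 0, 10)    # ~1.43x overall from a 10x component cut
ginf = overall_gain(costs, 0, 1e12) # ~1.5x: ceiling set by the other components
```

Cutting one component by 10x already captures most of the benefit; even an infinite cut cannot beat the 1.5x ceiling imposed by the remaining components.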
27 Reversible vs. Irreversible
- Want to compare their cost-efficiency under various cost measures:
- Time
- Entropy
- Area-time
- Spacetime
- Note that space (volume, mass, etc.) by itself as a cost measure is only significant if either:
- (a) the computer isn't reusable, so the cost to build it dominates operating costs; or
- (b) I/O latency ∝ V^(1/3) affects other costs.
- Or, for some applications, one quantity might be minimized while another one (space, time, area) is constrained by some hard limit.
28 Time Cost Comparison
- For computations with unlimited power/cooling and no communication requirements:
- Reversible is worse than irreversible by a factor of s > 1 (the adiabatic slowdown factor), times maybe a small constant depending on the logic style used:
- ¢_r,Time ≈ ¢_i,Time · s
29 Time Cost Comparison, cont.
- For parallelizable, power-limited applications:
- With nonzero leakage: ¢_r,Time ≈ ¢_i,Time / R_on/off^g
- Worst-case computations: g ≈ 0.4
- Best-case computations: g = 0.5
- For parallelizable, area-limited, entropy-flux-limited, best-case applications:
- with leakage → 0: ¢_r,Time ≈ ¢_i,Time / d^(1/2),
- where d is the system's physical diameter.
30 Time cost comparison, cont.
- For entropy-flux-limited, parallel, heavily communication-limited, best-case applications:
- with leakage approaching 0: ¢_r,Time ≈ ¢_i,Time^(3/4),
- where ¢_i,Time scales up with the space requirement V as ¢_i,Time ∝ V^(2/9),
- so the reversible speedup scales with the 1/18 power of system size.
31 Reversible Emulation - Ben89
k = 2, n = 3
k = 3, n = 2
32 Bennett '89 alg. is not optimal
k = 2, n = 3
k = 3, n = 2
Just look at all the spacetime it wastes!!!
33 Parallel Frank02 algorithm
- We can simply squish the triangles closer together to eliminate the wasted spacetime!
- The resulting algorithm is linear-time for all n and k, and dominates Ben89 for time, spacetime, & energy!
[Figure: triangle diagrams for k = 3, n = 2 and k = 2, n = 3; emulated time vs. real time; k = 4, n = 1.]
34 Setup for Analysis
- For the energy-dominated limit,
- let cost ¢ equal energy.
- c_E = energy coefficient; P_leak = P_min = leakage power
- E_i = energy dissipation per irreversible state-change
- Let the on/off ratio be R_on/off = P_max/P_min.
- Note that c_E ≈ E_i·t_min = E_i·(E_i / P_max), so P_max ≈ E_i²/c_E.
- So R_on/off ≈ E_i² / (c_E·P_min) = E_i² / (c_E·P_leak).
35 Time Taken
- There are n levels of recursion.
- Each multiplies the width of the base of the triangle by k.
- Lowest-level triangles take time ∝ t_op.
- Total time is thus ∝ t_op·k^n.
[Figure: k = 4, n = 1; width = 4 sub-units.]
36 Number of Adiabatic Ops
- Each triangle contains k + (k − 1) = 2k − 1 immediate sub-triangles.
- There are n levels of recursion.
- Thus the number of adiabatic ops is ∝ (2k − 1)^n.
[Figure: k = 3, n = 2; 5² = 25 little triangles (adiabatic operations).]
37 Spacetime Usage
- Each triangle includes the spacetime usage of all 2k − 1 of its subtriangles,
- plus additional spacetime units (states of the irreversible machine being stored, each occupying 1 storage unit for times up to t_op·k^(n−1)).
- Resulting recurrence relation:
ST(k,0) = 1 (or c)
ST(k,n) = (2k−1)·ST(k,n−1) + ((k²−3k+2)/2)·k^(n−1)
[Figure: k = 5, n = 1; stored states held for 1, 2, and 3 time units, totaling 1+2+3 = 6 units.]
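The op count from slide 36 and the ST recurrence above are easy to evaluate directly. A minimal sketch (function names are mine; the constant c in ST(k,0) is taken as 1):

```python
def n_adiabatic_ops(k, n):
    """Each recursion level multiplies the triangle count by 2k - 1."""
    return (2 * k - 1) ** n

def spacetime(k, n):
    """ST(k,0) = 1; ST(k,n) = (2k-1)*ST(k,n-1) + ((k^2-3k+2)/2)*k^(n-1)."""
    if n == 0:
        return 1.0
    return (2 * k - 1) * spacetime(k, n - 1) + (k * k - 3 * k + 2) * k ** (n - 1) / 2

# Slide 36's example: k = 3, n = 2 gives 5**2 = 25 little triangles.
assert n_adiabatic_ops(3, 2) == 25
```

Note that for k = 2 the extra term (k²−3k+2)/2 vanishes, so ST(2,n) = 3^n exactly.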
38 Reversible Cost
- Adiabatic cost plus spacetime cost:
¢_r = ¢_adia + ¢_ST = (2k−1)^n·c_E/t + ST(k,n)·P_leak·t
- Minimizing over t gives:
¢_r = 2·[(2k−1)^n·ST(k,n)·c_E·P_leak]^(1/2)
- But, in the energy-dominated limit, c_E·P_leak ≈ E_i² / R_on/off,
- so ¢_r = 2·E_i·[(2k−1)^n·ST(k,n) / R_on/off]^(1/2).
39 Tot. Cost, Orig. Cost, Advantage
- The total cost (one irreversible operation of energy E_i performed at the end of the algorithm, plus the reversible cost) is:
¢_tot = E_i·(1 + 2·[(2k−1)^n·ST(k,n) / R_on/off]^(1/2))
- The original irreversible machine performing k^n ops would use cost ¢_orig = E_i·k^n, so
- the advantage is the ratio R_(i/r) = ¢_orig/¢_tot between irreversible & reversible cost.
40 Optimization Algorithm
- For any given value of R_on/off:
- Scan the possible values of n (up to some limit);
- for each of those, scan the possible values of k,
- until the maximum R_(i/r) for that n is found
- (the function only has a single local maximum);
- and return the max R_(i/r) over all n tried.
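The scan described above can be sketched directly, reusing the slide-37 recurrence and the slide-39 advantage ratio. This is a simplified cost model (function names and the scan limits are mine), so the resulting numbers need not match the slide-42 figures exactly:

```python
import math

def spacetime(k, n):
    """Iterative form of slide 37's recurrence, with ST(k,0) = 1."""
    st = 1.0
    for level in range(1, n + 1):
        st = (2 * k - 1) * st + (k * k - 3 * k + 2) * k ** (level - 1) / 2
    return st

def advantage(k, n, R_onoff):
    """R_(i/r) = k**n / (1 + 2*sqrt((2k-1)**n * ST(k,n) / R_onoff))."""
    return k ** n / (1 + 2 * math.sqrt((2 * k - 1) ** n * spacetime(k, n) / R_onoff))

def optimize(R_onoff, n_max=30, k_max=2000):
    """Scan n; for each n, scan k until R starts to fall (single local maximum)."""
    best_R, best_k, best_n = 0.0, 0, 0
    for n in range(1, n_max + 1):
        prev = 0.0
        for k in range(2, k_max + 1):
            r = advantage(k, n, R_onoff)
            if r < prev:
                break
            prev = r
        if prev > best_R:
            best_R, best_k, best_n = prev, k - 1, n
    return best_R, best_k, best_n
```

Under this sketch, the best advantage grows with R_on/off, as slide 42 asserts.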
41 [Figure: energy saved and spacetime blowup, plotted as functions of k and n.]
42 Asymptotic Scaling
- The potential energy-savings factor scales as R_(i/r) ∝ R_on/off^0.4,
- while the spacetime overhead grows only as R_(i/r)^0.45, or R_on/off^0.18.
- E.g., with an R_on/off of 10⁹, you can do worst-case computation in an adiabatic circuit with:
- an energy savings of up to a factor of 1,200!
- But this point is 700,000× less hardware-efficient!
43 Various Cost Measures
- Entropy: advantage as per the previous analysis.
- Area × time: scales w. entropy generated.
- Performance, given an area constraint:
- In the leakage-free limit, the advantage is proportional to d^(1/2).
- With leakage, what's the max advantage? (See hw.)
- NOW:
- Are there any performance/cost advantages from adiabatics even when there is no cost or constraint on entropy or area?
- YES, for flux-limited computations that require communications. Let's see why…
44 Perf. scaling w. # of devices
- If the algorithm is not limited by communications needs:
- Use irreversible processors spread in a 2-D layer.
- Remove entropy along the perpendicular dimension.
- No entropy-rate limits,
- so no speed advantage from reversibility.
- If the algorithm requires only local communication (latency ∝ cycle time) in an N_D×N_D×N_D mesh:
- Leak-free reversible machine performance scales better!
- Irreversible: t_cyc = Θ(N_D^(1/3))
- Reversible: t_cyc = Θ(N_D^(1/4)), i.e., Θ(N_D^(1/12)) faster!
- To boost the reversibility speedup by 10×, one must consider 10^36-CPU machines (1.7 trillion moles of CPUs!)
- 1.7 trillion moles of H atoms weighs 1.7 million metric tons!
- A 100-m tall hill of H-atom-sized CPUs!
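The back-of-envelope numbers in the last two bullets can be checked directly (using the Avogadro constant and hydrogen's molar mass of about 1.008 g/mol):

```python
AVOGADRO = 6.022e23   # particles per mole

n_cpus = 1e36
moles = n_cpus / AVOGADRO              # ~1.7e12: "1.7 trillion moles"
mass_tonnes = moles * 1.008 / 1e6      # grams at 1.008 g/mol, then g -> metric tons
```

The result is about 1.7e12 moles and 1.7 million metric tons, matching the slide.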
45 Lower bound on irrev. time
- Simulate N_proc = N_D³ cells for N_steps = N_D steps.
- Consider a sequence of N_D update steps.
- The final cell value depends on N_D⁴ ops in time T.
- All ops must occur within radius r = cT of the cell.
- Surface area A ∝ T², so a rate R_op ∝ T² is sustainable.
- N_ops ≤ R_op·T ∝ T³ needs to be at least N_D⁴.
- ⇒ T must be Ω(N_D^(4/3)) to do all N_D steps.
- Average time per step must be Ω(N_D^(1/3)).
- Any irreversible machine (of any technology or architecture) must obey this bound!
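The counting argument above can be restated numerically. With the constant factors set to 1 (a simplifying assumption of this sketch), the sustainable op count ∝ T³ must cover N_D⁴ ops, which forces the average time per step up as N_D^(1/3):

```python
def min_total_time(N_D):
    """N_ops <= T**3 must be at least N_D**4, so T >= N_D**(4/3) (constants = 1)."""
    return N_D ** (4.0 / 3.0)

def min_time_per_step(N_D):
    """Average time per step: T / N_D = N_D**(1/3)."""
    return min_total_time(N_D) / N_D

# Doubling the mesh edge N_D raises the per-step floor by 2**(1/3).
ratio = min_time_per_step(2000) / min_time_per_step(1000)
```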
46 Irreversible 3-D Mesh
47 Reversible 3-D Mesh
48 Non-local Communication
- Best computational task for reversibility:
- Each processor must exchange messages with another that is N_D^(1/2) nodes away on each cycle.
- Unsure what real-world problem demands this pattern!
- In this case, the reversible speedup scales with the number of CPUs to only the 1/18th power.
- To boost the reversibility speedup by 10×, we only need 10^18 (or 1.7 micromoles) of CPUs.
- If each were a 1-nm cluster of 100 C atoms, this is only ~2 mg mass, volume ~1 mm³.
- Current VLSI: need a cost level of ~$25B before you see a speedup.
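Again the mass and volume figures can be checked arithmetically (carbon at 12 g/mol, Avogadro's number, 1 nm³ = 1e-27 m³, 1 mm³ = 1e-9 m³):

```python
AVOGADRO = 6.022e23

n_cpus = 1e18
micromoles = n_cpus / AVOGADRO * 1e6        # ~1.7 micromoles of CPUs
atoms = n_cpus * 100                        # 100 carbon atoms per 1-nm cluster
mass_mg = atoms * 12 / AVOGADRO * 1000      # grams at 12 g/mol, then g -> mg
volume_mm3 = n_cpus * 1e-27 / 1e-9          # 1 nm^3 per CPU, in mm^3
```

This reproduces the slide's ~1.7 micromoles, ~2 mg, and ~1 mm³ figures.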
49 Ballistic Machines
- In the limit as c_S → 0, the asymptotic benefit for 3-D meshes goes as N_D^(1/3), or N_proc^(1/9).
- Only need a billion devices to multiply the reversible speedup by 10×.
- With 1-nm³ devices, a cube 1 µm on a side (bacteria-sized) would do it!
- Does rod logic have a low enough c_S and small enough size to attain this prediction?
- (Need to check.)
50 Minimizing volume via folding
- Allows prev. solutions to be packed in min. volume.
- Volume scales proportionally to mass.
- No change in speed or entropy flux.
51 Cooling Technologies
52 Irreversible Max Perf. Per Area
53 Reversible Entropy Coeffs.
54 Rev. vs. Irrev. Comparisons
55 Sizes of Winning Rev. Machines
56 Some Analytical Challenges
- Combine the Frank '02 emulation algorithm
- and the analysis of its energy and space efficiency as a function of n and k,
- and plug them into the analysis for the 3-D meshes, to see:
- What are the optimal speedups for arbitrary mesh computations on rev. machines, as a function of
- R_on/off, device volume, entropy-flux limit, and machine size?
- And does perf./hw improve, and if so, how much?
57 Open issues for reversible comp.
- Integrate realistic fundamental models of the clocking system into the engineering analysis.
- There is an open issue about the scalability of clock distribution systems.
- Quantum bounds exist on the reusability of timing signals.
- It is not yet clear whether reversible clocking is scalable.
- Fortunately, self-timed reversible computing also appears to be a possibility.
- Not yet clear if this approach works above 1-D models.
- Simulation experiments are planned to investigate this.
- Develop efficient physical realizations of nano-scale bit-devices & timing systems.
58 Quantum Computing pros/cons
- Pros:
- Removes an unnecessary restriction on the types of quantum states & ops usable for computation.
- Opens up exponentially shorter paths to solving some types of problems (e.g., factoring, simulation).
- Cons:
- Sensitive; requires overhead for error correction.
- Also still remains subject to fundamental physical bounds on info density & rate of state change!
- Myth: "A quantum memory can store an exponentially large amount of data."
- Myth: "A quantum computer can perform operations at an exponentially faster rate than a classical one."
59 Some goals of my QC work
- Develop a UMS model of computation that incorporates quantum computing.
- Design & simulate quantum computer architectures, programming languages, etc.
- Describe how to do the systems-engineering optimization of quantum computers for various problems of interest.
60 Conclusion
- As we near the physical limits of computing,
- further improvements will require an increasingly sophisticated interdisciplinary integration of concerns across many levels of engineering.
- I am developing a principled nanocomputer systems engineering methodology,
- and applying it to the problem of determining the real cost-efficiency of new models of computing:
- Reversible computing
- Quantum computing
- Building the foundations of a new discipline that will be critical in coming decades.