Physical Computing Theory, Ultimate Models, and the Tight Church's Thesis

Transcript and Presenter's Notes


1
Physical Computing Theory, Ultimate Models, and
the Tight Church's Thesis: A More Accurate
Complexity Theory for Future Nanocomputing
Dr. Mike Frank, University of Florida, CISE Department
mpf@cise.ufl.edu
  • Presented to the Algorithms & Theory Club, Wed., May 1, 2002

2
Source: ITRS '99
3
½CV² based on ITRS '99 figures for V_dd and
minimum transistor gate capacitance. T = 300 K
4
Physical Computing Theory
  • The study of theoretical models of computation
    that are based on (or closely tied to) physics
  • Make no nonphysical assumptions!
  • Includes the study of
  • Fundamental physical limits of computing
  • Physically-based models of computing
  • Includes reversible and/or quantum models
  • Ultimate (asymptotically optimal) models
  • An asymptotically tight Church's thesis
  • Model-independent basis for complexity theory
  • Basis for design of future nanocomputer
    architectures
  • Asymptotic scaling of architectures & algorithms
  • Physically optimal algorithms

5
Ultimate Models of Computing
  • We would like models of computing that match the
    real computational power of physics.
  • Not too weak, not too strong.
  • Most traditional models of computing only match
    physics to within polynomial factors.
  • Misleading asymptotic performance of algorithms.
  • Not good enough to form the basis for a real
    systems engineering optimization of
    architectures.
  • Develop models of computing that are
  • As powerful as physically possible on all
    problems
  • Realistic within asymptotic constant factors

6
Scalability & Maximal Scalability
  • A multiprocessor architecture & accompanying
    performance model is scalable if
  • it can be scaled up to arbitrarily large
    problem sizes, and/or arbitrarily large numbers
    of processors, without the predictions of the
    performance model breaking down.
  • An architecture (& model) is maximally scalable
    for a given problem if
  • it is scalable and if no other scalable
    architecture can claim asymptotically superior
    performance on that problem
  • It is universally maximally scalable (UMS) if it
    is maximally scalable on all problems!
  • I will briefly mention some characteristics of
    architectures that are universally maximally
    scalable

7
Universal Maximum Scalability
  • Existence proof for universally maximally
    scalable (UMS) architectures
  • Physics itself is a universally maximally scalable
    architecture, because any real computer is
    merely a special case of a physical system.
  • Obviously, no real computer can beat the
    performance of physical systems in general.
  • Unfortunately, physics doesn't give us a very
    simple or convenient programming model.
  • Comprehensive expertise at programming physics
    means mastery of all physical engineering
    disciplines: chemical, electrical, mechanical,
    optical, etc.
  • We'd like an easier programming model than this!

8
Physics Constrains the Ultimate Model
9
Simple UMS Architectures
  • (I propose) any practical UMS architecture will
    have the following features
  • Processing elements characterized by constant
    parameters (independent of # of processors)
  • Mesh-type message-passing interconnection
    network, arbitrarily scalable in 2 dimensions
  • w. limited scalability in 3rd dimension.
  • Processing elements that can be operated in a
    highly reversible mode, at least up to some
    limit.
  • Enables improved 3-d scalability, in a limited
    regime
  • (In long term) Have capability for
    quantum-coherent operation, for extra perf. on
    some probs.

10
Ideally Scalable Architectures
Conjecture: A 2- or 3-D mesh multiprocessor with
a fixed-size memory hierarchy per node is an
optimal scalable computer systems design (for any
application).
[Diagram: processing nodes, each with a local
memory hierarchy of optimal fixed size, connected
by a mesh interconnection network.]
11
Tight Church's Thesis
  • Conjecture: 3-D meshes of some variety of
    fixed-size reversible/quantum processing element
    give the same or lesser asymptotic complexity
    (within a constant factor) for all problem
    classes as any special-purpose physical mechanism
    for solving the given problem class.

12
Reversibility of Physics
  • The universe is (apparently) a closed system
  • Closed systems evolve via unitary transforms
  • Apparent wavefunction collapse doesn't contradict
    this (confirmed by work of Everett, Zurek, etc.)
  • Time-evolution of concrete state of universe (or
    closed subsystems) is reversible
  • Invertible (bijective)
  • Deterministic looking backwards in time
  • Total info. (log of poss. states) doesn't
    decrease
  • Can increase, though, if volume is increasing
  • Information cannot be destroyed!

13
Illustrating Landauers principle
[Diagram: Before bit erasure, the bit may be 0 or 1,
and the rest of the system may be in any of N states
s_0 … s_(N−1), for 2N states total. Unitary (1-1)
evolution must map these to 2N distinct states, so
after erasure, when the bit always reads 0, the rest
of the system must span 2N states s_0 … s_(2N−1):
the erased bit's information is pushed into the
environment rather than destroyed.]
14
Benefits of Reversible Computing
  • Reduces energy/cooling costs of computing
  • Improves performance per unit power consumed
  • Given heat flux limits in the cooling system,
  • Improves performance per unit convex hull area
  • A faster machine in a given size box.
  • For communication-intensive parallel algorithms,
  • Improves performance, period!
  • All these benefits are by small polynomial
    factors in the integration scale & the device
    properties.

15
Reversible/Adiabatic CMOS
  • Chips designed at MIT, 1996-1999

16
(No Transcript)
17
Minimum Losses w. Leakage
E_tot = E_adia + E_leak
E_leak = P_leak · t_r
E_adia = c_E / t_r
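Minimizing E_tot = c_E/t_r + P_leak·t_r over the switching time t_r gives t_r* = √(c_E/P_leak) and E_min = 2√(c_E·P_leak). A minimal numeric sketch of this tradeoff (the parameter values below are hypothetical, not from the slides):

    from math import sqrt

    def total_energy(t_r, c_E, P_leak):
        # E_tot = E_adia + E_leak for switching time t_r
        return c_E / t_r + P_leak * t_r

    def optimal_switching_time(c_E, P_leak):
        # Minimizer of E_tot: t_r* = sqrt(c_E/P_leak), E_min = 2*sqrt(c_E*P_leak)
        t_star = sqrt(c_E / P_leak)
        return t_star, total_energy(t_star, c_E, P_leak)

    # Hypothetical example values: c_E in J*s, P_leak in W
    t_star, E_min = optimal_switching_time(c_E=1e-27, P_leak=1e-12)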
18
Systems Engineering (my def'n)
  • The interdisciplinary study of the design of
    complex systems in which multiple areas of
    engineering may interact in nontrivial ways.
  • Optimizing a complete system will in general
    require considering the effect of concerns in
    different engineering disciplines on each other.
  • E.g., simultaneous consideration of
  • Mechanical (structural & dynamic) engineering
  • Thermal/power engineering
  • Electrical/electronic/photonic engineering
  • Algorithmic/software engineering
  • Economic/social/financial engineering?

19
Cost-Efficiency
  • The primary concern of systems engineering.
  • Cost most generally can be any appropriate
    measure of resources consumed.
  • The cost-efficiency of achieving a task is the
    fraction ¢_min/¢ of the cost ¢ that had to be spent.
  • Goal: When designing a system to accomplish a
    given task, choose the design that minimizes the
    cost ¢ (thus maximizing cost-efficiency).
  • Include design cost, amortized over expected
    number of reuses of the design.

20
Two-pass system optimization
  • A general methodology for the integrated
    optimization of the design of complex systems.
  • Performance characteristics of the system are
    expressed as a function of design parameters &
    subsystem characteristics.
  • Then, optimize design parameters from the top down
    to maximize overall system cost-efficiency.

[Diagram: design hierarchy from top-level systems
design, through high-level subsystems and mid-level
components, down to lowest-level design elements.
Characterize cost-efficiency from the bottom upwards;
optimize design parameters from the top downwards.]
21
Computer Systems Engineering (CSE)
  • General systems engineering principles applied to
    the design of computer systems.
  • E.g. Electronic, algorithmic, thermal, and
    communications concerns interact when optimizing
    massively parallel computers for some problems
  • When looking ahead to the cross-disciplinary
    interactions that become more important for
    bit-devices at the nanometer scale,
  • Call the subject nanocomputer systems
    engineering (NCSE)
  • This is what I do.

22
Cost-Efficiency
  • The cost-efficiency % of anything is ¢_min/¢,
  • the fraction of the actual cost ¢ that really needed
    to be spent to get the thing, using the best
    poss. method.
  • Measures the relative number of instances of the
    thing that can be accomplished per unit cost,
  • compared to the maximum number possible.
  • Inversely proportional to cost ¢.
  • Maximizing % means minimizing ¢.
  • Regardless of what ¢_min actually is.
  • In computing, the thing is a computational task
    that we wish to be carried out.

23
Components of Cost
  • The cost of a computation may be a sum of terms
    for many different components
  • Time cost
  • Cost to user of having to wait for results
  • E.g., missing deadlines, incurring penalties.
  • May increase nonlinearly with time for long
    times.
  • Spacetime-related costs
  • Cost of raw physical spacetime occupied by
    computation.
  • Cost to rent the space.
  • Cost of hardware (amortized over its lifetime)
  • Cost of raw mass-energy, particles, atoms.
  • Cost of materials, parts.
  • Cost of assembly.
  • Cost of parts/labor for operation & maintenance.
  • Cost of SW licenses

24
More cost components
  • Continued...
  • Area-time costs
  • Cost to rent a portion of an enclosing convex hull
    for getting things in & out of the system
  • Energy, heat, information, people, materials,
    entropy.
  • Some examples
  • Chip area, power level, cooling capacity, I/O
    bandwidth, desktop footprint, floor space, real
    estate, planetary surface
  • Area-time costs scale with the maximum number of
    items that can be sent/received.
  • Energy expenditure costs
  • Cost of raw free energy expenditure (entropy
    generation).
  • Cost of energy-delivery system. (Amortized.)
  • Cost of cooling system. (Amortized.)

25
General Cost Measures
  • The most comprehensive cost measure includes
    terms for all of these potential kinds of costs:
  • ¢_comprehensive = ¢_Time + ¢_SpaceTime + ¢_AreaTime + ¢_FreeEnergy
  • ¢_Time is a non-decreasing function
    f(Δt_start→end)
  • Simple model: ¢_Time ∝ Δt_start→end
  • ¢_FreeEnergy is, most generally, a function of the
    free energy expended
  • Simple model: ¢_FreeEnergy ∝ ΔS_generated
  • ¢_SpaceTime and ¢_AreaTime are, most generally,
    proportional to the max ops that could be done and
    the max items that could be I/O'd, respectively
  • Simple model:
  • ¢_SpaceTime ∝ Space × Time
  • ¢_AreaTime ∝ Area × Time
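A minimal sketch of the simple model in code (the function and parameter names are mine, for illustration; the proportionality constants are arbitrary placeholders):

    def comprehensive_cost(dt, space, area, dS_generated,
                           p_t=1.0, p_st=1.0, p_at=1.0, p_s=1.0):
        # Simple model: each term proportional to its resource usage
        c_time       = p_t  * dt                # waiting time
        c_spacetime  = p_st * space * dt        # physical spacetime occupied
        c_areatime   = p_at * area * dt         # rented hull area over time
        c_freeenergy = p_s  * dS_generated      # entropy generated
        return c_time + c_spacetime + c_areatime + c_freeenergy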
26
Generalized Amdahl's Law
  • Given any cost that is a sum of components,
    ¢_tot = ¢_1 + … + ¢_n:
  • There are diminishing proportional returns to be
    gained from reducing any single cost component
    (or subset of components) to much less than the
    sum of the remaining components.
  • Optimization effort should focus on the cost
    components that are most significant in the
    application of interest.
  • At a design equilibrium, all cost components
    will be roughly equal (unless externally driven)
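A quick numeric illustration of these diminishing returns (hypothetical numbers): with ¢_tot = ¢_1 + ¢_2 and ¢_2 fixed at 50, shrinking ¢_1 can never better than halve the total.

    # Diminishing returns from reducing a single cost component
    c2 = 50.0
    for c1 in (50.0, 25.0, 5.0, 0.5, 0.0):
        gain = 100.0 / (c1 + c2)   # improvement vs. the c1 = 50 baseline
        print(f"c1 = {c1:5.1f} -> total = {c1 + c2:6.1f}, gain = {gain:.2f}x")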

27
Reversible vs. Irreversible
  • Want to compare their cost-efficiency under
    various cost measures
  • Time
  • Entropy
  • Area-time
  • Spacetime
  • Note that space (volume, mass, etc.) by itself as
    a cost measure is only significant if either
  • (a) The computer isn't reusable, so the cost to
    build it dominates operating costs.
  • (b) I/O latency ∝ V^(1/3) affects other costs.

Or, for some applications, one quantity might be
minimized while another one (space, time,
area) is constrained by some hard limit.
28
Time Cost Comparison
  • For computations with unlimited power/cooling and
    no communication requirements
  • Reversible is worse than irreversible by a factor
    of s > 1 (the adiabatic slowdown factor), times
    perhaps a small constant depending on the logic
    style used: ¢_r,Time ≈ ¢_i,Time · s

29
Time Cost Comparison, cont.
  • For parallelizable power-limited applications
  • With nonzero leakage: ¢_r,Time ≈ ¢_i,Time /
    R_on/off^g
  • Worst-case computations: g ≈ 0.4
  • Best-case computations: g = 0.5.
  • For parallelizable, area-limited,
    entropy-flux-limited, best-case applications
  • with leakage → 0: ¢_r,Time ≈ ¢_i,Time / d^(1/2),
  • where d is the system's physical diameter.
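A minimal sketch of the leakage-limited case (the functional form is from the slide; the numbers are illustrative only):

    def reversible_time_cost(c_i_time, R_on_off, g=0.4):
        # Power-limited case: ¢_r,Time ≈ ¢_i,Time / R_on/off**g
        return c_i_time / R_on_off**g

    # e.g. with R_on/off = 1e9 and worst-case g = 0.4,
    # the time cost drops by a factor of 1e9**0.4 ≈ 4000:
    ratio = reversible_time_cost(1.0, 1e9)   # ≈ 2.5e-4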

30
Time cost comparison, cont.
  • For entropy-flux-limited, parallel, heavily
    communication-limited, best-case applications
  • with leakage approaching 0: ¢_r,Time ≈ ¢_i,Time^(3/4),
  • where ¢_i,Time scales up with the space
    requirement V as ¢_i,Time ∝ V^(2/9),
  • so the reversible speedup, ¢_i,Time^(1/4) ∝ V^(1/18),
    scales with the 1/18 power of system size.

31
Reversible Emulation - Ben89
[Diagrams: Bennett '89 emulation triangles for
k = 2, n = 3 and for k = 3, n = 2.]
32
Bennett '89 alg. is not optimal
[Same diagrams, k = 2, n = 3 and k = 3, n = 2,
highlighting the unused spacetime between triangles.]
Just look at all the spacetime it wastes!!!
33
Parallel Frank '02 algorithm
  • We can simply squish the triangles closer
    together to eliminate the wasted spacetime!
  • The resulting algorithm is linear-time for all n
    and k, and dominates Ben89 in time, spacetime, &
    energy!

[Diagram: squished triangle schedules for k = 3,
n = 2; k = 2, n = 3; and k = 4, n = 1, plotted as
emulated time vs. real time.]
34
Setup for Analysis
  • For the energy-dominated limit,
  • let cost equal energy.
  • c_E = energy coefficient; P_leak = P(min) =
    leakage power
  • E_i = energy dissipation per irreversible
    state-change
  • Let the on/off ratio R_on/off = P(max)/P(min) =
    P_max/P_min.
  • Note that c_E ≈ E_i·t_min = E_i·(E_i / P(max)),
    so P(max) ≈ E_i^2/c_E,
  • so R_on/off ≈ E_i^2 / (c_E·P(min)) = E_i^2 / (c_E·P_leak)

35
Time Taken
  • There are n levels of recursion.
  • Each multiplies the width of the base of the
    triangle by k.
  • Lowest-level triangles take time ∝ t_op.
  • Total time is thus ∝ t_op·k^n.

[Diagram: k = 4, n = 1; width = 4 sub-units.]
36
Number of Adiabatic Ops
  • Each triangle contains k + (k − 1) = 2k − 1
    immediate sub-triangles.
  • There are n levels of recursion.
  • Thus the number of adiabatic ops is ∝ (2k − 1)^n.

[Diagram: k = 3, n = 2; (2k − 1)^n = 5² = 25 little
triangles (adiabatic operations).]
37
Spacetime Usage
  • Each triangle includes the spacetime usage of all
    2k − 1 of its subtriangles,
  • plus additional spacetime units, each consisting
    of 1 storage unit (one state of the irrev. machine
    being stored), held for up to time t_op·k^(n−1).
  • Resulting recurrence relation:
    ST(k, 0) = 1 (or ∝ c)
    ST(k, n) = (2k − 1)·ST(k, n − 1) + ((k^2 − 3k + 2)/2)·k^(n−1)
[Diagram: k = 5, n = 1; the stored states add
1 + 2 + 3 = (k − 1)(k − 2)/2 = 6 extra spacetime
units.]
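A direct transcription of this recurrence (a sketch; units are one storage unit times one t_op, per the slide):

    def ST(k: int, n: int) -> float:
        # Spacetime recurrence: (2k-1) subtriangles, plus (k^2-3k+2)/2
        # stored states held for k**(n-1) time units
        if n == 0:
            return 1.0
        return (2*k - 1) * ST(k, n - 1) + (k*k - 3*k + 2) * k**(n - 1) / 2

    assert ST(5, 1) == 9 * 1.0 + 6   # 9 subtriangle units + the 6 storage units above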
38
Reversible Cost
  • Adiabatic cost plus spacetime cost:
    ¢_r = ¢_a + ¢_s = (2k − 1)^n·c_E/t + ST(k, n)·P_leak·t
  • Minimizing over t (a sum A/t + B·t is minimized
    at t = √(A/B), where it equals 2√(A·B)) gives
    ¢_r = 2·[(2k − 1)^n·ST(k, n)·c_E·P_leak]^(1/2)
  • But, in the energy-dominated limit, c_E·P_leak ≈
    E_i^2 / R_on/off,
  • so ¢_r = 2·E_i·[(2k − 1)^n·ST(k, n) / R_on/off]^(1/2)

39
Tot. Cost, Orig. Cost, & Advantage
  • The total cost, E_i for the irreversible operation
    performed at the end of the algorithm plus the
    reversible cost, is
    ¢_tot = E_i·(1 + 2·[(2k − 1)^n·ST(k, n) / R_on/off]^(1/2))
  • The original irreversible machine performing k^n
    ops would use cost ¢_orig = E_i·k^n, so
  • the advantage is the ratio between irreversible &
    reversible cost: R_(i/r) = ¢_orig / ¢_tot.

40
Optimization Algorithm
  • For any given value of R_on/off,
  • scan the possible values of n (up to some limit);
  • for each of those, scan the possible values of k
  • until the maximum R_(i/r) for that n is found
  • (the function has only a single local maximum);
  • and return the max R_(i/r) over all n tried.
    (A code sketch follows below.)
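A minimal sketch of this scan in Python (the ST recurrence and advantage formula are from the preceding slides; the scan limits and variable names are my own placeholders):

    from math import sqrt

    def ST(k, n):
        # Spacetime recurrence from slide 37
        return 1.0 if n == 0 else \
            (2*k - 1) * ST(k, n - 1) + (k*k - 3*k + 2) * k**(n - 1) / 2

    def advantage(k, n, R_on_off):
        # R_(i/r) = ¢_orig / ¢_tot from slide 39
        return k**n / (1 + 2 * sqrt((2*k - 1)**n * ST(k, n) / R_on_off))

    def best_advantage(R_on_off, n_max=25, k_max=500):
        best = (0.0, None, None)                 # (R_(i/r), k, n)
        for n in range(1, n_max + 1):
            prev = 0.0
            for k in range(2, k_max + 1):
                r = advantage(k, n, R_on_off)
                if r < prev:                     # single local max in k: stop
                    break
                prev = r
                if r > best[0]:
                    best = (r, k, n)
        return best

    # e.g. best_advantage(1e9) returns the optimal (savings factor, k, n)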

41
[Plot: energy saved vs. spacetime blowup as
functions of k and n.]
42
Asymptotic Scaling
  • The potential energy savings factor scales as
    R_(i/r) ∝ R_on/off^0.4,
  • while the spacetime overhead grows as
    R_(i/r)^0.45, or R_on/off^0.18.
  • E.g., with an R_on/off of 10^9, you can do
    worst-case computation in an adiabatic circuit
    with
  • an energy savings of up to a factor of 1,200!
  • But this point is 700,000× less
    hardware-efficient!

43
Various Cost Measures
  • Entropy: advantage as per the previous analysis
  • Area × time: scales w. entropy generated
  • Performance, given an area constraint:
  • In the leakage-free limit, advantage proportional
    to d^(1/2)
  • With leakage, what's the max advantage? (See hw.)
  • NOW:
  • Are there any performance/cost advantages from
    adiabatics even when there is no cost of, or
    constraint on, entropy or area?
  • YES, for flux-limited computations that require
    communications. Let's see why.

44
Perf. scaling w. # of devices
  • If the alg. is not limited by communications needs:
  • Use irreversible processors spread in a 2-D
    layer.
  • Remove entropy along the perpendicular dimension.
  • No entropy rate limits,
  • so no speed advantage from reversibility.
  • If the alg. requires only local communication
    (latency ∝ cycle time) in an N_D × N_D × N_D mesh:
  • The leak-free reversible machine's perf. scales
    better!
  • Irreversible: t_cyc = Θ(N_D^(1/3))
  • Reversible: t_cyc = Θ(N_D^(1/4)), a factor
    Θ(N_D^(1/12)) faster!
  • To boost the reversibility speedup by 10×, one
    must consider 10^36-CPU machines (1.7 trillion
    moles of CPUs!); see the arithmetic check below.
  • 1.7 trillion moles of H atoms weighs 1.7 million
    metric tons!
  • A 100-m tall hill of H-atom-sized CPUs!
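A quick check of that arithmetic (only Avogadro's number and the 1 g/mol mass of hydrogen are assumed; the N_D^(1/12) scaling is from the slide):

    N_D = 10.0**12                 # mesh edge size with N_D**(1/12) = 10
    N_cpu = N_D**3                 # 1e36 CPUs in the 3-D mesh
    moles = N_cpu / 6.022e23       # ≈ 1.7e12 moles (1.7 trillion)
    tonnes = moles * 1.0 / 1e6     # at 1 g/mol: ≈ 1.7e6 metric tons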

45
Lower bound on irrev. time
  • Simulate N_proc = N_D^3 cells for N_steps = N_D steps.
  • Consider a sequence of N_D update steps.
  • The final cell value depends on N_D^4 ops done in
    time T.
  • All ops must occur within radius r ≤ cT of the cell.
  • Surface area A ∝ T^2, so only a rate R_op ∝ T^2 is
    sustainable.
  • N_ops ≤ R_op·T ∝ T^3, which needs to be at least N_D^4.
  • ∴ T must be Ω(N_D^(4/3)) to do all N_D steps.
  • Average time per step must be Ω(N_D^(1/3)).
  • Any irreversible machine (of any technology or
    architecture) must obey this bound!
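A compact numeric restatement of the bound (illustrative; technology-dependent constants suppressed):

    # Ops reachable in time T scale as T**3 (surface area T**2, duration T),
    # and must cover N_D**4 ops, forcing T = Omega(N_D**(4/3)).
    def min_total_time(N_D):
        return N_D**(4.0/3.0)      # up to a constant factor

    for N_D in (1e3, 1e6, 1e9):
        T = min_total_time(N_D)
        print(f"N_D = {N_D:.0e}: T >= ~{T:.1e}, per step >= ~{T / N_D:.1e}")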

46
Irreversible 3-D Mesh
47
Reversible 3-D Mesh
48
Non-local Communication
  • The best computational task for reversibility:
  • Each processor must exchange messages with
    another that is N_D^(1/2) nodes away on each cycle
  • (unsure what real-world problem demands this
    pattern!)
  • In this case, the reversible speedup scales with
    the number of CPUs to only the 1/18th power.
  • To boost the reversibility speedup by 10×, we only
    need 10^18 (or 1.7 micromoles) of CPUs
  • If each were a 1-nm cluster of 100 C atoms, this
    is only 2 mg of mass, with volume 1 mm³.
  • With current VLSI: need a cost level of ~$25B
    before you see a speedup.

49
Ballistic Machines
  • In the limit as c_S → 0, the asymptotic benefit
    for 3-D meshes goes as N_D^(1/3), or N_proc^(1/9).
  • Only need a billion devices to multiply the
    reversible speedup by 10×.
  • With 1-nm³ devices, a cube 1 μm on a side
    (bacteria-sized) would do it!
  • Does rod logic have a low enough c_S and small
    enough size to attain this prediction?
  • (Need to check.)

50
Minimizing volume via folding
  • Allows prev. solutions to be packed in
    min. volume.
  • Volume scales proportionally to mass.
  • No change in speed or entropy flux.

51
Cooling Technologies
52
Irreversible Max Perf. Per Area
53
Reversible Entropy Coeffs.
54
Rev. vs. Irrev. Comparisons
55
Sizes of Winning Rev. Machines
56
Some Analytical Challenges
  • Combine the Frank '02 emulation algorithm,
  • the analysis of its energy and space efficiency as
    a function of n and k,
  • and plug them into the analysis for the 3-D meshes,
    to see:
  • What are the optimal speedups for arbitrary mesh
    computations on rev. machines, as a function of
  • R_on/off, device volume, entropy flux limit, &
    machine size?
  • And does perf./hw improve, and if so, by how much?

57
Open issues for reversible comp.
  • Integrate realistic fundamental models of the
    clocking system into the engineering analysis.
  • There is an open issue about the scalability of
    clock distribution systems.
  • Quantum bounds exist on the reusability of timing
    signals.
  • Not yet clear if reversible clocking is scalable.
  • Fortunately, self-timed reversible computing also
    appears to be a possibility.
  • Not yet clear if this approach works above 1-D
    models.
  • Simulation experiments planned to investigate
    this.
  • Develop efficient physical realizations of
    nano-scale bit-devices & timing systems.

58
Quantum Computing pros/cons
  • Pros
  • Removes an unnecessary restriction on the types
    of quantum states & ops usable for computation.
  • Opens up exponentially shorter paths to solving
    some types of problems (e.g., factoring,
    simulation)
  • Cons
  • Sensitive, requires overhead for error
    correction.
  • Also, it still remains subject to fundamental
    physical bounds on info. density & rate of state
    change!
  • Myth A quantum memory can store an
    exponentially large amount of data.
  • Myth A quantum computer can perform operations
    at an exponentially faster rate than a classical
    one.

59
Some goals of my QC work
  • Develop a UMS model of computation that
    incorporates quantum computing.
  • Design & simulate quantum computer architectures,
    programming languages, etc.
  • Describe how to do the systems-engineering
    optimization of quantum computers for various
    problems of interest.

60
Conclusion
  • As we near the physical limits of computing,
  • Further improvements will require an increasingly
    sophisticated interdisciplinary integration of
    concerns across many levels of engineering.
  • I am developing a principled nanocomputer systems
    engineering methodology
  • And applying it to the problem of determining the
    real cost-efficiency of new models of computing
  • Reversible computing
  • Quantum computing
  • Building the foundations of a new discipline that
    will be critical in coming decades.