NanoFabrics: Spatial Computing Using Molecular Electronics - PowerPoint PPT Presentation

Provided by: aaronb4

Transcript and Presenter's Notes



1
NanoFabrics: Spatial Computing Using Molecular Electronics
  • Seth Copen Goldstein and Mihai Budiu
  • Carnegie Mellon University

2
What We Know
  • Processing power per dollar doubles every year,
    but the end is near.
  • New technologies are necessary to continue the
    rapid progress.
  • Solution: Chemically Assembled Electronic
    Nanotechnology (CAEN).

3
Solution
  • Strategy: substitute compile time (cheap) for
    manufacturing precision (expensive).
  • Reconfigurable computing
  • Defect tolerance
  • Architectural abstractions
  • Compiler technology.

4
Advantages of CAEN
  • CAEN devices are much smaller than CMOS devices
    (single RAM cell: 100 nm² vs. 100,000 nm²).
  • RESULTS
  • Lower Fabrication Costs
  • High Density
  • Low-Power Usage
  • Configuration state is stored in the switches
    themselves rather than in an extra RAM cell.
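The RAM-cell comparison translates directly into density. A minimal arithmetic sketch, assuming an idealized 1 cm² die tiled only with RAM cells (no wiring, periphery, or defects, none of which the slide addresses):

```python
# Worked arithmetic for the RAM-cell comparison above.
# Cell areas come from the slide; the fully tiled 1 cm^2 die is a hypothetical idealization.
NM2_PER_CM2 = (10**7) ** 2  # 1 cm = 10^7 nm, so 1 cm^2 = 10^14 nm^2

CAEN_CELL_NM2 = 100       # single CAEN RAM cell
CMOS_CELL_NM2 = 100_000   # single CMOS RAM cell

def cells_per_cm2(cell_area_nm2: int) -> int:
    """Upper bound on cells per cm^2, ignoring wiring and periphery."""
    return NM2_PER_CM2 // cell_area_nm2

caen = cells_per_cm2(CAEN_CELL_NM2)
cmos = cells_per_cm2(CMOS_CELL_NM2)
print(f"CAEN: {caen:.1e} cells/cm^2")     # 1.0e+12
print(f"CMOS: {cmos:.1e} cells/cm^2")     # 1.0e+09
print(f"density ratio: {caen // cmos}x")  # 1000x
```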

5
Reconfigurable Computing
  • Changes the functions of programmable logic
    elements, including their connections to storage.
  • Highly parallel processing kernels.
  • Changes for each application.
  • The function is specified by a set of
    instructions called a configuration.

6
Research Limitations
  • The authors constrained themselves to molecular
    devices with I-V characteristics similar to those
    of CMOS technology.
  • This way, the system can be modeled with SPICE and
    is close enough to conventional circuits that past
    experience can serve as a guide.

7
Steps of Fabrication
  • Wires are constructed through chemical
    self-assembly (as we heard in Dwyer).
  • Wires are aligned like gridlines on graph paper,
    crossing at 90-degree angles with a molecular
    switch at each junction.
  • A silicon-based die is created using lithography.
    The die has holes into which the nanoBlocks are
    placed, and it contains connections for power,
    clock lines, I/O, and other operations.

8
Issues with Production
  • High defect densities resulting from
    self-assembly.
  • Only diode-resistor logic is possible, so there
    can be no inverters. Logic functions therefore
    produce both the answer and its complement.
  • Signal restoration is necessary, through the use
    of a molecular latch (we have talked about this).
  • End-to-end connections are not possible, but they
    are also not necessary with the given architecture.
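Because diode-resistor logic cannot invert, carrying every signal together with its complement (dual-rail encoding) turns inversion into a matter of swapping wires. The sketch below is a behavioral model under that assumption; the helper names (`encode`, `not_`, `and_`, and so on) are hypothetical, and it is not the authors' circuit.

```python
from typing import Tuple

# (true rail, false rail); valid codes are (1, 0) for 1 and (0, 1) for 0.
DualRail = Tuple[int, int]

def encode(bit: int) -> DualRail:
    return (bit, 1 - bit)

def decode(sig: DualRail) -> int:
    if sig[0] == sig[1]:
        raise ValueError(f"invalid dual-rail code: {sig}")
    return sig[0]

def not_(a: DualRail) -> DualRail:
    # No inverter: swapping the two wires yields the complement.
    return (a[1], a[0])

def and_(a: DualRail, b: DualRail) -> DualRail:
    # True rail: a AND b.  False rail (De Morgan): (NOT a) OR (NOT b).
    return (a[0] & b[0], a[1] | b[1])

def or_(a: DualRail, b: DualRail) -> DualRail:
    # True rail: a OR b.  False rail: (NOT a) AND (NOT b).
    return (a[0] | b[0], a[1] & b[1])

def xor_(a: DualRail, b: DualRail) -> DualRail:
    # Built only from monotone AND/OR plus wire swaps.
    return or_(and_(a, not_(b)), and_(not_(a), b))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, decode(xor_(encode(a), encode(b))))
```

Every rail is computed with AND and OR only, which is exactly the class of functions diode-resistor logic can build.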

9
All The Parts
  • The nanoBlock implements a Boolean function with
    three input bits and three output bits.
  • NanoBlocks are arranged into clusters and
    connected to their neighbors.
  • Long wires route signals between clusters.
  • The switchBlock is the area where the three wires
    from neighboring nanoBlocks overlap.
  • ALL COMMUNICATION OCCURS ON THE NANOSCALE!!!
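As a behavioral sketch only, a 3-input, 3-output block can be modeled as a configurable truth table. The `NanoBlockModel` class and the full-adder configuration are hypothetical illustrations; a real nanoBlock is configured by setting switches in its MLA, not by loading a table.

```python
from typing import Callable, List, Tuple

Bits3 = Tuple[int, int, int]

class NanoBlockModel:
    """Hypothetical behavioral model of a 3-input, 3-output Boolean block.

    The "configuration" is a truth table holding one 3-bit output word for
    each of the 8 input combinations.
    """

    def __init__(self, table: List[Bits3]):
        if len(table) != 8:
            raise ValueError("need one output triple per input combination")
        self.table = table

    @classmethod
    def from_function(cls, f: Callable[[int, int, int], Bits3]) -> "NanoBlockModel":
        # Entry i holds f(a, b, c) where i = a*4 + b*2 + c.
        return cls([f((i >> 2) & 1, (i >> 1) & 1, i & 1) for i in range(8)])

    def __call__(self, a: int, b: int, c: int) -> Bits3:
        return self.table[(a << 2) | (b << 1) | c]

def full_adder(a: int, b: int, c: int) -> Bits3:
    # Outputs: sum, carry, and the carry's complement (echoing the need
    # to produce both a value and its complement).
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return (s, carry, 1 - carry)

adder = NanoBlockModel.from_function(full_adder)
print(adder(1, 1, 0))  # (0, 1, 0)
```

Reconfiguring the block means building a new table from a different function; the surrounding code does not change.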

10
NanoFabric
11
The NanoBlock
  • Composed of three sections
  • The Molecular Logic Array (MLA).
  • The latches.
  • The I/O area.
  • All blocks face either SE or NW, which is
    essential for applying circuit netlists.

12
Example NanoBlocks
  • The MLA is a set of orthogonal wires.
  • The molecular switches act as diodes when on
    (explain).
  • The MLA operates using diode-resistor logic.
  • To the right are two examples: an AND gate and
    an XOR gate.
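The AND and XOR examples can be approximated with a behavioral model of a diode crossbar: with a pull-up resistor, a column is high only if every connected row is high (wired-AND); with a pull-down resistor, it is high if any connected row is high (wired-OR). This sketch assumes a PLA-style AND plane followed by an OR plane and dual-rail inputs; the function names and row layout are hypothetical, and no electrical effects (voltage drops, signal degradation) are modeled.

```python
from typing import List, Set

def wired_and(rows: List[int], on: Set[int]) -> int:
    # Pull-up column: high only if every row with an "on" diode is high.
    return int(all(rows[r] for r in on))

def wired_or(rows: List[int], on: Set[int]) -> int:
    # Pull-down column: high if any row with an "on" diode is high.
    return int(any(rows[r] for r in on))

def two_level(inputs: List[int], and_cols: List[Set[int]], or_col: Set[int]) -> int:
    """AND plane followed by an OR plane; each set lists the "on" junctions."""
    products = [wired_and(inputs, on) for on in and_cols]
    return wired_or(products, or_col)

def dual_rail(*bits: int) -> List[int]:
    # The crossbar cannot invert, so each input arrives with its complement.
    rails: List[int] = []
    for b in bits:
        rails += [b, 1 - b]
    return rails

# Row indices: 0 = a, 1 = NOT a, 2 = b, 3 = NOT b
AND_CONFIG = ([{0, 2}], {0})              # a AND b
XOR_CONFIG = ([{0, 3}, {1, 2}], {0, 1})   # (a AND NOT b) OR (NOT a AND b)

for a in (0, 1):
    for b in (0, 1):
        x = dual_rail(a, b)
        print(a, b, two_level(x, *AND_CONFIG), two_level(x, *XOR_CONFIG))
```

Changing the function means changing only which junctions are switched on; the wires themselves stay the same.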

13
Defect Tolerance
  • NEED DEFECT TOLERANCE!
  • FOUR KEY REASONS:
  • Regularity: the ability to choose where in the
    MLA a function is implemented.
  • Configurability: the ability to pick and choose
    the parts of the nanoBlock that will be involved
    in any given circuit.
  • Fine-grained-ness: like those beds where the wine
    does not spill. In other words, a fault in one
    location does not affect the operation of the
    rest of the block.
  • Rich interconnect: the implementation path can be
    chosen.
  • Conclusion: with all defects known, it is possible
    to create a working circuit, so defect testing is
    necessary.
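With a defect map in hand, regularity means a function can simply be moved to a defect-free spot in the MLA. A minimal sketch, assuming defects are known stuck-open switch coordinates and that only translations are tried; `place_pattern` and the XOR switch pattern are hypothetical.

```python
from typing import Optional, Set, Tuple

Coord = Tuple[int, int]

def place_pattern(pattern: Set[Coord], size: int, defects: Set[Coord]) -> Optional[Coord]:
    """Return the first (row, col) offset at which every switch the pattern
    needs is defect-free, or None if no offset in a size x size array works.

    Hypothetical sketch: it only tries translations; a real mapper could also
    permute rows and columns or route around defects through the interconnect.
    """
    height = max(r for r, _ in pattern) + 1
    width = max(c for _, c in pattern) + 1
    for dr in range(size - height + 1):
        for dc in range(size - width + 1):
            if all((r + dr, c + dc) not in defects for r, c in pattern):
                return (dr, dc)
    return None

# Switches used by a 2-input function, in relative coordinates.
xor_pattern = {(0, 0), (3, 0), (1, 1), (2, 1)}
defects = {(0, 0), (1, 2)}  # as reported by defect discovery
print(place_pattern(xor_pattern, size=6, defects=defects))  # (0, 2)
```

Offset (0, 0) hits the defect at (0, 0), and offset (0, 1) needs switch (1, 2), so the function lands at (0, 2).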

14
Defect Discovery
  • A nanoFabric can be configured to test itself
    using linear-feedback shift registers (LFSRs).
  • Recursive testing (maybe imagine branching like
    fractals?): at a seeded starting point, CMOS tests
    the first few components to find a fault-free
    region. Fault-free regions of the nanoFabric can
    then be configured to self-test other parts of the
    nanoFabric.
  • Defect discovery is, at worst, linear.
  • Only a small percentage of switches is used at any
    time, so faulty ones can be bypassed.
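An LFSR is a cheap on-fabric pattern generator: a shift register whose input bit is the XOR of selected taps. The sketch below uses a 4-bit register with the maximal-length polynomial x^4 + x^3 + 1 (an illustrative choice; the slides give no width or polynomial) and compares a block under test against a known-good reference, in the spirit of fault-free regions testing their neighbors. The `reference` and `stuck_at_zero` blocks are hypothetical.

```python
from typing import Callable, Iterator

def lfsr4(seed: int = 0b0001) -> Iterator[int]:
    """Yield states of a 4-bit Fibonacci LFSR with polynomial x^4 + x^3 + 1.

    The polynomial is maximal-length, so any nonzero seed visits all
    15 nonzero 4-bit states before repeating.
    """
    if not 0 < seed < 16:
        raise ValueError("seed must be a nonzero 4-bit value")
    state = seed
    while True:
        yield state
        bit = (state ^ (state >> 1)) & 1   # XOR of the two tap bits
        state = (state >> 1) | (bit << 3)  # shift right, feed back into the MSB

def self_test(block: Callable[[int], int], reference: Callable[[int], int],
              n_patterns: int = 15) -> bool:
    """Drive the block under test and a known-good reference with the same
    LFSR patterns; any mismatch marks the block as defective."""
    patterns = lfsr4()
    for _ in range(n_patterns):
        p = next(patterns)
        if block(p) != reference(p):
            return False
    return True

def reference(p: int) -> int:
    """Known-good behavior (hypothetical): AND of input bits 0 and 1."""
    return (p & 1) & ((p >> 1) & 1)

def stuck_at_zero(p: int) -> int:
    """A defective copy whose output is stuck at 0."""
    return 0

print(self_test(reference, reference), self_test(stuck_at_zero, reference))  # True False
```

Since an LFSR never emits the all-zero state, a fault that shows up only on input 0000 would escape this particular sketch.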

15
Configuration
  • Runtime configuration is used both for testing and
    for the normal function of the nanoFabric, so it
    must be quick (scaled to the size of the fabric).
  • Two factors:
  • 1) Time taken to download a configuration.
  • 2) Time taken to distribute the configuration to
    the proper bits in the nanoFabric.
  • Programmable in parallel by CMOS!
  • Configurations differ for each nanoFabric because
    the location of faults is different?
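The two factors can be combined into a toy model: total time = download time + distribution time, with distribution split across parallel CMOS channels. All parameter names and values here are hypothetical, chosen only to show why parallel programming matters.

```python
def config_time_s(config_bits: int, download_bps: float,
                  parallel_channels: int, per_channel_bps: float) -> float:
    """Toy model of the two factors: download time + distribution time.

    Hypothetical assumptions: the configuration arrives serially at download_bps,
    then CMOS distributes it over parallel_channels channels, each writing
    per_channel_bps bits per second, with no overlap between the two phases.
    """
    download = config_bits / download_bps
    distribute = config_bits / (parallel_channels * per_channel_bps)
    return download + distribute

# Illustrative numbers only (not from the talk): 10^9 configuration bits.
parallel = config_time_s(10**9, 1e9, parallel_channels=1000, per_channel_bps=1e6)
serial = config_time_s(10**9, 1e9, parallel_channels=1, per_channel_bps=1e6)
print(parallel, serial)  # 2.0 1001.0
```

In this model, distribution dominates unless it is parallelized, which is the point of programming the fabric in parallel.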

16
NanoFabric at Work
  • Factory-programmable devices (like today's CMOS
    machines).
  • Reconfigurable computing devices

17
Factory Programmed Devices
  • Finished product: a nanoFabric chip and a ROM that
    stores the programming.
  • Ignores Potential!!!

18
Reconfigurable Fabrics
  • Configurations created at compile time.
  • Benefits:
  • Exploit the application's parallelism (MIMD, SIMD,
    pipeline, help).
  • Create customized function units for each
    program.
  • Eliminate control circuitry.
  • Reduce memory bandwidth requirements.
  • Size function units to the application's natural
    word size.
  • More adaptability is obviously better.

19
More Reconfigurability
  • The added ability to adapt comes at the price of
    longer compile times.
  • Place and route will not scale to devices with
    billions of wires and devices.
  • Solution: a split-phase abstract machine (SAM).

20
SAMs
  • The program is broken up into autonomous units in
    order to take advantage of parallelism.
  • The process:
  • Partition the application into threads, each
    ending with a split-phase operation.
  • Split-phase operation: one that takes an
    unspecified amount of time.
  • Threads can then act in parallel or in series,
    which reduces the number of timing constraints.
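The split-phase idea can be sketched with Python's `asyncio`: each thread ends by awaiting an operation of unspecified latency, and a thread that needs results simply waits for them. Random delays stand in for the unknown latency; the thread bodies and the `split_phase_load` helper are hypothetical, and this models the programming abstraction rather than the hardware.

```python
import asyncio
import random

async def split_phase_load(memory: dict, addr: int) -> int:
    """Issue a load and resume when the value arrives.

    The random delay stands in for an unspecified latency.
    """
    await asyncio.sleep(random.uniform(0, 0.01))
    return memory[addr]

async def thread_a(memory: dict) -> int:
    addr = 2 * 3                                  # thread body: compute an address...
    return await split_phase_load(memory, addr)   # ...then end with a split-phase op

async def thread_b(memory: dict) -> int:
    addr = 1 + 1
    return await split_phase_load(memory, addr)

async def thread_c(x: int, y: int) -> int:
    # Runs only once both inputs exist; no timing assumption is needed.
    return x + y

async def program(memory: dict) -> int:
    # Independent threads run in parallel; thread_c runs in series after them.
    x, y = await asyncio.gather(thread_a(memory), thread_b(memory))
    return await thread_c(x, y)

memory = {2: 10, 6: 32}
print(asyncio.run(program(memory)))  # 42
```

Because `thread_c` waits on data rather than on a fixed schedule, the result is the same no matter which load finishes first.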

21
SAM Simulations
  • Goal: determine how aggressive compiler technology
    needs to be for the nanoFabric to compete with a
    CMOS processor.

22
Area requirements
  • One unit of area = one memory word.
  • An integer operation also occupies 1 unit of area.
  • 1 unit of area = 1 cluster (1 64).
  • Total area required is between 2,000 and 250,000
    units.

23
Methodology
  • Compared:
  • Execution time on a CPU (Alpha)
  • Running time on a nanoFabric
  • Ignored time spent in the operating system.
  • 2 parts:
  • Trace collection and analysis
  • Trace-based simulation.

24
Trace Collection
  • My summary:
  • Nodes that use the same resources are clustered by
    comparing those with the same weight, which
    minimizes the distance between communicating nodes.
  • These nodes are then placed together in larger and
    larger rectangles, with the heavier edges kept
    together.
  • I would welcome a more technical explanation.
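For a more concrete picture, one common greedy heuristic merges nodes along the heaviest communication edges first, so heavily communicating nodes end up close together. This is an assumption about the general approach, not the authors' exact algorithm; the rectangle-growing step is omitted, and the node names, weights, and size cap are made up.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, int]  # (node_a, node_b, weight = messages exchanged)

def greedy_cluster(nodes: List[str], edges: List[Edge], max_size: int) -> List[List[str]]:
    """Merge clusters along the heaviest edges first, capping cluster size."""
    parent: Dict[str, str] = {n: n for n in nodes}
    size: Dict[str, int] = {n: 1 for n in nodes}

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b, _w in sorted(edges, key=lambda e: e[2], reverse=True):
        ra, rb = find(a), find(b)
        if ra != rb and size[ra] + size[rb] <= max_size:
            if size[ra] < size[rb]:
                ra, rb = rb, ra          # attach the smaller cluster to the larger
            parent[rb] = ra
            size[ra] += size[rb]

    groups: Dict[str, List[str]] = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())

nodes = ["load", "add", "mul", "store", "branch"]
edges = [("load", "add", 90), ("add", "mul", 80), ("mul", "store", 5),
         ("store", "branch", 60), ("load", "branch", 2)]
print(greedy_cluster(nodes, edges, max_size=3))
# [['load', 'add', 'mul'], ['store', 'branch']]
```

The light edges (weights 5 and 2) end up crossing between clusters, which is where a longer wire costs the least.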

25
Trace-based simulation
  • The running time of the circuit created in the
    previous step consists of:
  • Time to pass control between the source and the
    destination.
  • Time to pass data from source node to
    destination.
  • Time to execute the instructions.
  • Explanation given.

26
Simulation continued
  • Execution of an instruction:
  • Elementary instruction: 1 cycle.
  • Read: proportional to the Manhattan distance to
    the address.
  • Write: one cycle, performed asynchronously.
  • Floating-point operations: double the latency of
    the comparison processor (Alpha).
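The running-time rules above can be folded into a small cost model for one trace. The constants `CYCLES_PER_HOP` and `ALPHA_FP_LATENCY` are placeholders (the slides give rules, not numbers), and the sketch assumes control and data each cost one cycle per hop with no overlap between operations.

```python
from dataclasses import dataclass
from typing import List, Tuple

Pos = Tuple[int, int]

# Hypothetical constants: the slides state the rules but not these values.
CYCLES_PER_HOP = 1        # read cost per unit of Manhattan distance
ALPHA_FP_LATENCY = 4      # placeholder FP latency for the comparison CPU

def manhattan(p: Pos, q: Pos) -> int:
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

@dataclass
class Op:
    kind: str             # "int", "fp", "read", or "write"
    node: Pos             # grid position of the instruction
    addr: Pos = (0, 0)    # grid position of the memory word (reads/writes)

def exec_cycles(op: Op) -> int:
    if op.kind == "int":
        return 1                                 # elementary instruction
    if op.kind == "fp":
        return 2 * ALPHA_FP_LATENCY              # double the Alpha latency
    if op.kind == "read":
        return CYCLES_PER_HOP * manhattan(op.node, op.addr)
    if op.kind == "write":
        return 1                                 # issued asynchronously
    raise ValueError(f"unknown op kind: {op.kind}")

def trace_time(trace: List[Op]) -> int:
    """Control transfer + data transfer + execution, summed along a trace."""
    total = 0
    for i, op in enumerate(trace):
        if i > 0:
            hops = manhattan(trace[i - 1].node, op.node)
            total += 2 * hops                    # control transfer + data transfer
        total += exec_cycles(op)
    return total

trace = [Op("read", (0, 0), addr=(3, 4)), Op("int", (1, 0)), Op("write", (1, 1), addr=(5, 5))]
print(trace_time(trace))  # 13
```

The read alone costs 7 of the 13 cycles here, which mirrors the observation that memory reads dominate the runtime.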

27
All programs will fit within a 1 cm² nanoFabric.
28
Signal Propagation versus Performance
29
Runtime
30
Performance
The majority of the time is spent reading memory,
a result of having no caches.
  • Media applications outperformed SPECint apps.

31
Placed Graph
  • White squares: code
  • Edges show the communication pattern.
  • Edge width is the log of the number of messages
    sent.
  • Edge color: type of message
  • dark: memory reads
  • light: control transfers
  • Shows big "stars": code regions that touch most of
    the memory.

32
Steps taken to reduce Stars
  • Inlined functions such as memcpy.
  • Used for only a subset of memory.
  • Inlining uses less than 1% more area and saves a
    significant amount of running time (though a
    comparison is not given).
  • Did not help calls made from only one place.
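A star forms when one shared code node (such as `memcpy`) communicates with many call sites. The sketch below models inlining as giving each caller a private copy and measures fan-in. It is a simplified, hypothetical model of the call graph rather than the placed graph itself, but it shows why inlining a function with a single caller changes nothing.

```python
from collections import Counter
from typing import Dict, List

def fan_in(calls: Dict[str, List[str]]) -> Counter:
    """Count how many call sites reach each callee."""
    return Counter(callee for callees in calls.values() for callee in callees)

def inline(calls: Dict[str, List[str]], callee: str) -> Dict[str, List[str]]:
    """Give every call site its own private copy of `callee`.

    Hypothetical model: a copy is just a new node name such as "memcpy@parse".
    """
    return {caller: [f"{c}@{caller}" if c == callee else c for c in callees]
            for caller, callees in calls.items()}

calls = {"parse": ["memcpy"], "render": ["memcpy"], "encode": ["memcpy", "helper"]}
before = fan_in(calls)
after = fan_in(inline(calls, "memcpy"))
print(before["memcpy"], max(after.values()))              # 3 1
print(fan_in(inline(calls, "helper"))["helper@encode"])   # 1, same as before
```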

33
Assumptions
  • The placed graph depends on the input data; it is
    impossible to know in advance which locations and
    nodes will be used.
  • The inputs to a node do not necessarily come from
    its immediate predecessor.
  • Control and data transfers can be done directly
    between the two nodes involved.
  • Each procedure has a statically allocated stack
    frame.
  • Routability issues between nodes are completely
    ignored; enough bandwidth is assumed.
  • Propagation delay from molecular latches is
    ignored.

34
Assumptions
  • Read operations are issued only when the
    instruction is executed; they could happen sooner.
  • No effort to customize the circuit for the
    application (100x speedup possible when compiled
    properly).
  • No speculative or parallel execution.
  • Nodes have to wait for their inputs before
    executing.
  • By not disambiguating uses of freed and
    re-allocated memory words, they constrain the
    placement of the graph.
  • Fewer registers than possible, meaning costly
    memory operations for replacement.
  • A large amount of area is assumed for a memory
    cell.
  • What does that mean for their results?

35
Where to go
  • Data caching is important for reducing the largest
    runtime cost (memory access).
  • More code restructuring to better utilize memory.
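Since memory reads dominate the runtime, a simple data cache is the natural first step. The sketch below counts hits in a direct-mapped cache and turns them into read cycles; the cache geometry and the cycle costs in the example are hypothetical, not numbers from the study.

```python
from typing import List, Optional

def direct_mapped_hits(addresses: List[int], n_lines: int, line_words: int) -> int:
    """Count hits in a direct-mapped cache of n_lines lines, line_words words each."""
    tags: List[Optional[int]] = [None] * n_lines
    hits = 0
    for addr in addresses:
        block = addr // line_words   # which memory line holds this word
        index = block % n_lines      # the only cache slot that line may occupy
        tag = block // n_lines       # identifies the line within that slot
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag        # miss: fetch the line from distant memory
    return hits

def read_cycles(addresses: List[int], n_lines: int, line_words: int,
                hit_cycles: int, miss_cycles: int) -> int:
    hits = direct_mapped_hits(addresses, n_lines, line_words)
    misses = len(addresses) - hits
    return hits * hit_cycles + misses * miss_cycles

# Hypothetical costs: 1 cycle per hit, 20 cycles per trip to distant memory.
sweep = list(range(16))
print(read_cycles(sweep, n_lines=4, line_words=4, hit_cycles=1, miss_cycles=20))  # 92
print(len(sweep) * 20)  # 320 with no cache
```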