1
Optical Microprocessor Chips
  • Mayank Bhatnagar

2
Understanding an Optical Processor
  • 1. Optical surfaces can be connected with strong bonds using specific
    types of optical adhesives. The rigidity is 100 times higher than that
    of super glue.
  • 2. Once the surfaces are glued together, the surface is treated with
    water via a flow pump in order to verify the connections.
  • 3. The substrate is then placed on the opaline hetero photonic
    crystal, and the layers are transferred into the layer transfer bin.

3
Why Optical connections?
  • Optical transmission in and between microchips has been considered
    superior to traditional electrical transmission technology.
  • The reason is that electrical transmission can be slower and has a
    power cost that rises with the distance traveled, whereas optical
    transmission can be faster and has a fixed power cost to connect to
    any point within the system.

4
Optical Microprocessors
  • Studies are being done on how optical signals can be used inside the
    processor to send information around at much higher speed.
  • Sun Microsystems, which won an 8.1-million-dollar, five-year contract
    from the military, believes it has the know-how necessary to make
    these types of chips, which are very intricately assembled from
    optical components.

5
Optical Microprocessors continued
  • The idea is to use optical chip-to-chip input/output technology to
    construct arrays of low-cost chips into a single virtual macrochip.
  • This assemblage would perform as one very large chip and would
    eliminate the conventional soldered-chip interconnections.
  • Long connections across the macrochip would leverage the low latency,
    high bandwidth, and low power of silicon photonics.

6
Advantages of Optical Transmission over
Electrical Transmission
  • Lower material cost
  • Lower cost of transmitters and receivers
  • Capability to carry electrical power as well as
    signals (in specially-designed cables)

7
Disadvantages of Optical Transmission over
Electrical Transmission
  • Optical fibers are more difficult and expensive to splice.
  • At higher optical powers, optical fibers and other optics are
    susceptible to fiber fuse, wherein slightly too much light meeting an
    imperfection can destroy several meters of fiber per second.
  • A fiber-fuse detection circuit installed at the transmitter can break
    the circuit and halt the failure to minimize damage.

8
Final Notes
  • IBM has built an optical switch for multi-core chips.
  • The new nanotech switch, which is 100 times smaller than a human
    hair, is designed to enable researchers to build future chips that
    have greater performance but use less energy.
  • IBM is also working on a project to shrink supercomputers down to the
    size of a laptop by replacing the electrical wiring that now connects
    multiple cores inside a microprocessor with pulses of light.
  • Calling it a "breakthrough" in chip design, the company said the
    laptop supercomputers should be ready by 2020.

9
Final Notes
  • They also said optical communication between the cores would
    dramatically cut a processor's energy needs and the heat it emits.
  • The new chips would require only the energy needed to power a light
    bulb, while today's supercomputers need enough energy to power
    hundreds of homes, the company noted.
  • http://www.linuxworld.com.au/index.php/id1981940410
  • http://www.photonics.com/Content/ReadArticle.aspx?ArticleID=33505

10
Multiprocessors
11
Multiprocessor motivation
  • Many scientific applications take too long to run on a single
    processor.
  • These are parallel applications consisting of loops that operate on
    independent data; they need a multiprocessor machine with each loop
    iteration running on a different processor, operating on independent
    data.
  • Many multi-user environments require more compute power than is
    available from a single-processor machine (airline reservation
    systems, inventory systems, file servers). These consist of largely
    parallel transactions which operate on independent data.
  • Multiprocessor machines should not be confused with multi-core
    processors, although some functionality is similar.

12
Multiprocessor performance
  • They provide high throughput for independent tasks.
  • Alternatively, a single program can run on several processors
    (parallel processing), but programming is more difficult and the code
    is not portable.
  • The processors can be on a single bus or can be connected over a LAN
    (up to 256 processors).
  • Which is better?

13
Multiprocessor examples
  • Multiprocessors can be found in low-end PCs such
    as dual-processor Xeons or Macs

14
Multiprocessor history
15
Multiprocessor history
Sun Fire x4150 1U server
16
Multiprocessor history
Sun Fire x4150 1U server
4 cores each
16 × 4 GB = 64 GB DRAM
17
I/O System Design Example
  • Given a Sun Fire x4150 system with
  • Workload 64KB disk reads
  • Each I/O op requires 200,000 user-code
    instructions and 100,000 OS instructions
  • Each CPU 109 instructions/sec
  • FSB 10.6 GB/sec peak
  • DRAM DDR2 667MHz 5.336 GB/sec
  • PCI-E 8 bus 8 250MB/sec 2GB/sec
  • Disks 15,000 rpm, 2.9ms avg. seek time,
    112MB/sec transfer rate
  • What I/O rate can be sustained?
  • For random reads, and for sequential reads

18
Design Example (cont)
  • I/O rate for CPUs
  • Per core 109/(100,000 200,000) 3,333
  • 8 cores 26,667 ops/sec
  • Random reads, I/O rate for disks
  • Assume actual seek time is average/4
  • Time/op seek latency transfer 2.9ms/4
    4ms/2 64KB/(112MB/s) 3.3ms
  • 303 ops/sec per disk, 2424 ops/sec for 8 disks
  • Sequential reads
  • 112MB/s / 64KB 1750 ops/sec per disk
  • 14,000 ops/sec for 8 disks

19
Design Example (cont)
  • PCI-E I/O rate
  • 2GB/sec / 64KB 31,250 ops/sec
  • DRAM I/O rate
  • 5.336 GB/sec / 64KB 83,375 ops/sec
  • FSB I/O rate
  • Assume we can sustain half the peak rate
  • 5.3 GB/sec / 64KB 81,540 ops/sec per FSB
  • 163,080 ops/sec for 2 FSBs
  • Weakest link disks
  • 2424 ops/sec random, 14,000 ops/sec sequential
  • Other components have ample headroom to
    accommodate these rates
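
A minimal C sketch of the bottleneck comparison, under the slide's own
assumptions (decimal units, 1 GB = 10^9 bytes and 64 KB = 64,000 bytes,
and the 3.3 ms random-read time derived above); the totals differ
slightly from the slide's rounded figures:

    #include <stdio.h>

    int main(void) {
        const double op_bytes = 64e3;  /* 64 KB per read, decimal units */
        struct { const char *name; double ops; } parts[] = {
            { "CPUs (8 cores)",      8 * (1e9 / (100e3 + 200e3)) },
            { "disks (8, random)",   8 * (1.0 / 3.3e-3) },
            { "PCI-E",               2e9 / op_bytes },
            { "DRAM",                5.336e9 / op_bytes },
            { "FSBs (2, half peak)", 2 * (5.3e9 / op_bytes) },
        };
        int min_i = 0;
        for (int i = 0; i < 5; i++) {
            printf("%-22s %9.0f ops/sec\n", parts[i].name, parts[i].ops);
            if (parts[i].ops < parts[min_i].ops) min_i = i;
        }
        /* the smallest sustained rate is the system's I/O bottleneck */
        printf("bottleneck: %s\n", parts[min_i].name);
        return 0;
    }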

20
Questions
  • How do parallel processors share data?
  • single address space - communication through lw
    and sw which needs synchronization
  • uniform memory access - all memory takes the same
    time to access (UMA) vs.
  • Non-uniform memory access (NUMA) which scales to
    larger sizes up to 256 processors
  • private memory - communication through message
    passing up to 256 processors
  • How do parallel processors coordinate?
    synchronization (locks, semaphores),built into
    send/receive primitives, operating system
    protocols
  • How are they implemented? connected by a single
    bus, or connected by a network
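
As a minimal sketch of lock-based coordination in a single address
space, the C program below (using POSIX threads) has two threads
increment a shared counter under a mutex; without the lock, the
read-modify-write sequences could interleave and lose updates:

    #include <stdio.h>
    #include <pthread.h>

    static long counter = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < 1000000; i++) {
            pthread_mutex_lock(&lock);
            counter++;                      /* protected shared write */
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld\n", counter); /* 2000000 with the lock held */
        return 0;
    }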

21
Multiprocessors on a single bus
  • Up to 32 processors can share a single bus.
  • Each processor has its own cache, but all share the same memory
    space.
  • Each cache stores copies of the same data, which reduces latency and
    reduces bus traffic.
  • They communicate through shared memory and have UMA.
  • They use a single copy of the OS.
  • But it is difficult to scale to a large number of processors.

22
Shared memory multi-processors
  • Major design issues is cache coherency ensuring
    that stores to cached data are seen by other
    processors
  • Coherent reading If a cache misses, another
    cache can supply the data,
  • Coherent writing when one processor writes data
    into a shared block, all other copies of that
    block located in other caches need to either be
    invalidated or updated (depending on protocol).
  • Synchronization the coordination among
    processors accessing shared data
  • Memory consistency definition of when a
    processor must observe a write from another
    processor

23
Cache coherency problem
  • Two write-back caches becoming incoherent (modeled in the sketch
    after the steps below)

(1) CPU 0 reads block A
(2) CPU 1 reads block A
(3) CPU 0 writes block A
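
These steps can be modeled with a toy C simulation. The sketch is
illustrative only (the cache structure and names are invented for it,
and there is no coherence protocol): each CPU has a private write-back
cache over one memory block A, so CPU 1 keeps returning its stale copy
after CPU 0's write:

    #include <stdio.h>

    typedef struct { int valid; int value; } CacheLine;

    static int memory_A = 5;                /* block A in main memory */
    static CacheLine cache[2];              /* one private cache per CPU */

    static int cpu_read(int cpu) {
        if (!cache[cpu].valid)              /* miss: fetch from memory */
            cache[cpu] = (CacheLine){1, memory_A};
        return cache[cpu].value;
    }

    static void cpu_write(int cpu, int v) { /* write-back: own cache only */
        cache[cpu] = (CacheLine){1, v};
    }

    int main(void) {
        cpu_read(0);                        /* (1) CPU 0 reads block A  */
        cpu_read(1);                        /* (2) CPU 1 reads block A  */
        cpu_write(0, 7);                    /* (3) CPU 0 writes block A */
        printf("CPU 0 sees %d, CPU 1 sees %d (stale)\n",
               cpu_read(0), cpu_read(1));
        return 0;
    }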
24
Snooping
  • Cache controllers need to monitor, or snoop on, the bus to see if
    their cache has a copy of the block being written to by another CPU
    (using a snoop tag).
  • Each cache has a duplicate of the address tags and a second read port
    for snooping on the bus.

25
Two Snooping Protocols
  • Write-invalidate protocol
  • Processor has exclusive data access before the
    write operation to a shared block.
  • Before the write the CPU sends an invalidation
    signal to all other caches gt they will miss on
    next read
  • Most common protocol, with reduced bus traffic
    which allows more processors on the bus.
  • Write-update (Write-broadcast)
  • Processor continuously sends updated copies of
    writes to all other caches
  • Has the advantage of reduced latency
  • Very infrequent - high bandwidth requirements due
    to large bus traffic.

26
Write-Invalidate Protocol
  • Simultaneous writes - the bus arbiter decides
    which processor is allowed
  • First CPU to obtain the bus will invalidate the
    line in the cache of the other one
  • Then the second CPU does the same to the first.
  • Write doesnt complete until the bus access is
    obtained
  • How do we locate the data on cache miss?
  • In write-through caches memory
  • In write-back more tricky, so we will deal with
    this in more detail (MESI protocol)
  • Write with no interleaved activity by other CPUs
    very efficient (no bus activity)
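
The same toy model as before, now with write-invalidate. It remains a
sketch under invented names: no MESI states, no real bus arbitration,
and the write goes through to memory for simplicity, where a real
write-back cache would defer it:

    #include <stdio.h>
    #define NCPUS 2

    typedef struct { int valid; int value; } CacheLine;

    static int memory_A = 5;
    static CacheLine cache[NCPUS];

    static int cpu_read(int cpu) {
        if (!cache[cpu].valid)                /* miss: refetch */
            cache[cpu] = (CacheLine){1, memory_A};
        return cache[cpu].value;
    }

    static void cpu_write(int cpu, int v) {
        for (int o = 0; o < NCPUS; o++)       /* invalidate other copies */
            if (o != cpu) cache[o].valid = 0;
        cache[cpu] = (CacheLine){1, v};
        memory_A = v;                         /* written through here */
    }

    int main(void) {
        cpu_read(0); cpu_read(1);             /* both caches hold A = 5 */
        cpu_write(0, 7);                      /* invalidates CPU 1's copy */
        printf("CPU 1 now reads %d (refetched, not stale)\n", cpu_read(1));
        return 0;
    }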

27
Cache coherence problem revisited
(1) CPU 0 reads block A
(2) CPU 1 reads block A
(3) CPU 0 invalidates
(4) CPU 0 writes block A
28
Multiprocessors on a network
  • A single-bus multiprocessor architecture has limits on the number of
    processors due to limited bus and memory bandwidth.
  • The solution is to have more than one bus: a network.
  • The network can connect to memory, which is physically distributed.


29
Multiprocessors on a network
  • The network can connect above the memory (Sun E10000).
  • Shared memory machines connected together over a network operate as a
    distributed shared memory (DSM) machine.


30
Distributed memory
  • Distributed memory is the opposite of centralized memory.
  • It can have a single address space (called shared memory), or each
    processor can have its own memory address space (called private
    memory).
  • In the case of shared memory, communication is done through loads and
    stores.
  • In the case of private memory, communication is done through message
    passing (send and receive), used to access another processor's
    memory.

31
Shared Memory
  • Non-uniform memory access (NUMA) shared memory multiprocessors.
  • All memory can be addressed by all processors, but access to a
    processor's own local memory is faster than access to another
    processor's remote memory.
  • Looks like a distributed machine, but the interconnection network is
    usually custom-designed switches and/or buses.
  • Commodity hardware of a distributed memory multiprocessor, but all
    processors have the illusion of shared memory.
  • The operating system handles accesses to remote memory transparently
    on behalf of the application.
  • This relieves the application developer of the burden of memory
    management across the network.

32
Characteristics of multiprocessor computers

Name             Processors   Memory size   Communication BW/link   Topology
Cray T3E         2048         524 GB        1200 MB/sec             3-D torus
HP/Convex        64           65 GB         980 MB/sec              8-way crossbar
SGI Origin       128          131 GB        800 MB/sec              ring
SUN Enterprise   64           65 GB         1600 MB/sec             16-way crossbar
33
Cache coherency for single address space
  • Since there are multiple buses, snooping will not work; we need an
    alternative: directories.
  • The directory keeps the state of every block in memory, including the
    sharing status of that block.
  • The directory sends explicit messages over the network to every
    processor whose cache has that data (see the sketch below).
  • There are two levels of coherence. At the cache level, the original
    data is in memory and replicated in the caches that need it.
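
A minimal sketch of such a directory entry and its invalidation fan-out,
in C. The field and function names are invented for illustration,
send_invalidate stands in for a real network message, and a full
protocol would also track ownership transfers and acknowledgements:

    #include <stdio.h>
    #include <stdint.h>

    enum BlockState { UNCACHED, SHARED, MODIFIED };

    typedef struct {
        enum BlockState state;   /* sharing status of the block */
        uint64_t sharers;        /* bit p set => processor p has a copy */
    } DirEntry;

    static void send_invalidate(int p) {   /* stand-in network message */
        printf("invalidate -> processor %d\n", p);
    }

    /* on a write, message every sharer, then record the writer as owner */
    static void dir_handle_write(DirEntry *e, int writer) {
        for (int p = 0; p < 64; p++)
            if (((e->sharers >> p) & 1) && p != writer)
                send_invalidate(p);
        e->sharers = 1ull << writer;
        e->state = MODIFIED;
    }

    int main(void) {
        DirEntry e = { SHARED, (1ull << 0) | (1ull << 3) | (1ull << 7) };
        dir_handle_write(&e, 0);   /* invalidates processors 3 and 7 */
        return 0;
    }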


34
Cache coherency for single address space
  • The second level of coherence is at the memory level.
  • It requires more hardware, and the OS takes care of moving data at
    the page level.

  • Miss penalties are very large, since data needs to be brought over
    the network.
  • However, by moving pages, the miss rate is reduced (due to co-located
    data).

35
Message Passing
  • For machines with private memories (each CPU has its own memory and
    cache).
  • Message passing over a network is used in clusters (discussed next).
  • Good for parallel programming techniques.
  • Uses MPI (Message Passing Interface).
  • Visible to the programmer.
  • Example: sum 100,000 numbers on a network-connected multiprocessor
    with 100 processors using multiple private memories.
  • Steps (sketched in code after this list):
  • Distribute 100 subsets for partial sums.
  • Do partial sums on each processor.
  • Split the CPUs in half; one side sends, the other side receives and
    adds.
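
The steps map directly onto a short MPI program in C. This sketch
assumes a power-of-two number of ranks (the slide's 100 processors would
need an extra adjustment at the odd halving steps) and dummy data; in
practice the whole reduction step is a single MPI_Reduce call:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* steps 1-2: partial sum over this rank's private subset */
        long sum = 0;
        for (int i = rank * 1000; i < (rank + 1) * 1000; i++)
            sum += i;

        /* step 3: recursive halving -- upper half sends, lower half adds */
        for (int half = size / 2; half >= 1; half /= 2) {
            if (rank >= half && rank < 2 * half) {
                MPI_Send(&sum, 1, MPI_LONG, rank - half, 0, MPI_COMM_WORLD);
            } else if (rank < half) {
                long other;
                MPI_Recv(&other, 1, MPI_LONG, rank + half, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                sum += other;
            }
        }
        if (rank == 0) printf("total = %ld\n", sum);
        MPI_Finalize();
        return 0;
    }

Compiled with an MPI wrapper (e.g. mpicc) and launched with mpirun, rank
0 prints the total.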

36
Clusters
  • Connect several (or several hundred) off-the-shelf computers over a
    network.
  • Strengths: cheaper, available all the time, expandable.
  • Can achieve very good performance.
  • Most of the time good enough.
  • Since each machine has its own copy of the OS, it is much easier to
    isolate in case of failure.
  • Weaknesses compared to bus multiprocessors:
  • System administration costs are higher, since there are n independent
    machines.
  • The bus is slower (an I/O bus is slower than a backplane bus).
  • Smaller memory.
  • Applications where cost/performance is important use hybrid clusters
    of multiprocessors.

37
Characteristics of clusters vs. multiprocessors

Multiprocessor   Processors   Memory size   Communication BW/link   Topology
Cray T3E         2048         524 GB        1200 MB/sec             3-D torus
HP/Convex        64           65 GB         980 MB/sec              8-way crossbar
SGI Origin       128          131 GB        800 MB/sec              ring
SUN Enterprise   64           65 GB         1600 MB/sec             16-way crossbar

Cluster                  Processors   Memory size   Communication BW/link   Node type and number
Tandem NonStop           4096         1,048 GB      40 MB/sec               16-way SMP, 256
IBM RS6000 SP2           512          1,048 GB      150 MB/sec              16-way node, 32
IBM RS6000 R40           16           4 GB          12 MB/sec               8-way SMP, 2
SUN Enterprise Cluster   60           61 GB         100 MB/sec              30-way SMP, 2