Buses: connecting cores to CPU and Memory


1
Buses: connecting cores to CPU and Memory
  • Brendan Boulter
  • November 12, 2002

2
Connecting I/O devices to CPU and Memory
  • In a system-on-chip architecture, the various
    cores may need to communicate with each other.
  • CPU to memory
  • CPU to I/O devices
  • Device to device
  • This communication is performed by a bus.
  • The bus is a shared communication link between
    cores.

3
Bus advantages
  • Two major advantages of a bus organization are
  • Low cost: a single set of wires is shared between
    multiple cores
  • Versatility: new cores can be added easily, and
    cores can be reused in a different design if the
    bus interface is the same

4
Bus disadvantages
  • The major disadvantage of a bus is that it
    creates a communications bottleneck.
  • A bus limits the maximum I/O throughput.
  • For high performance systems, designing a bus
    system capable of meeting the demands of the
    processor is a major challenge.

5
Bus classification (1)
  • Buses are traditionally classified as
  • CPU-memory buses or
  • I/O buses
  • I/O buses
  • may have many types of devices connected to them
  • have a wide range in the data bandwidth of the
    connected devices
  • normally follow a bus standard

6
Bus classification (2)
  • CPU-memory buses
  • Are generally high speed
  • Matched to the memory system to maximize memory
    to CPU bandwidth
  • The designer of a CPU-memory bus knows all the
    types of devices that must connect together (at
    the design stage).
  • The I/O bus designer must allow for devices of
    varying latency and bandwidth characteristics.

7
Bus transaction
  • A bus transaction involves two parts
  • Sending the address
  • Sending/receiving the data
  • Bus transactions are normally defined by what
    they do to memory
  • A read transaction transfers data from memory to
    either the CPU or an I/O device
  • A write transaction writes data to memory

8
Typical bus read transaction (1)
Figure 1: a typical bus read transaction (clock, address, data, read and wait signal lines)
9
Typical bus read transaction (2)
  • In a read transaction, the address is first sent
    down the bus to memory, together with the
    appropriate control signals indicating a read.
  • In figure 1, a read is indicated by deasserting
    the read signal
  • The memory responds by returning the data on the
    bus, together with the appropriate control
    signals.
  • In this case, this is indicated by deasserting
    the wait signal.
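The sequencing above can be sketched in a toy simulation: the master spends one cycle sending the address, then counts cycles while wait is asserted until memory returns the data. All class and parameter names here are illustrative, not from any real bus specification.

```python
# Toy model of the read transaction in figure 1.

class Memory:
    """Toy memory that needs a fixed number of cycles per read."""
    def __init__(self, contents, latency_cycles=3):
        self.contents = contents
        self.latency = latency_cycles

    def read(self, address):
        # Returns (cycles_waited, data): the master sees wait asserted
        # for `latency` cycles, then the data appears on the bus.
        return self.latency, self.contents[address]

def bus_read(memory, address):
    cycles = 1                      # cycle spent sending the address
    waited, data = memory.read(address)
    cycles += waited                # cycles spent with wait asserted
    return cycles, data

mem = Memory({0x100: 42}, latency_cycles=3)
cycles, data = bus_read(mem, 0x100)
print(cycles, data)  # 4 42: one address cycle plus three wait cycles
```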

10
Bus design decisions
  • The design of the bus presents many options to
    the designer.
  • Decisions depend on the design goal
  • Cost
  • Performance
  • Typical options and their impact on cost and
    performance are outlined in figure 2

11
Bus design decisions (figure 2)
12
Bus design decisions
  • The first three options in figure 2 are clear and
    obvious
  • Separate address and data lines, wider data lines
    and multiple word transfers all give higher
    performance at greater cost.
  • The next item concerns the number of bus masters.
  • A bus master is a device that can initiate a read
    or a write transaction

13
Bus masters
  • A bus has multiple masters when there are
    multiple CPUs in the system or when I/O devices
    can initiate a bus transaction.
  • If there are multiple masters, an arbitration
    scheme is required to decide who gets
    access/control of the bus next.
  • Arbitration is often implemented as
  • a fixed priority scheme, or
  • an approximately fair scheme (such as round-robin,
    or randomly choosing which master gets the bus)
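The two styles of arbitration can be sketched as follows; the names and master numbering are invented for illustration. Fixed priority always favors the lowest-numbered requester (and can starve the others), while round-robin rotates the starting point of the search so every master eventually wins.

```python
# Two toy arbiters: fixed priority vs. round-robin fairness.

def fixed_priority(requests):
    """requests: set of master ids. The lowest id always wins."""
    return min(requests) if requests else None

class RoundRobin:
    def __init__(self, n_masters):
        self.n = n_masters
        self.last = -1          # id of the most recently granted master

    def grant(self, requests):
        # Search starting just after the last winner, wrapping around.
        for i in range(1, self.n + 1):
            candidate = (self.last + i) % self.n
            if candidate in requests:
                self.last = candidate
                return candidate
        return None

rr = RoundRobin(3)
grants = [rr.grant({0, 2}) for _ in range(4)]
print(grants)                   # [0, 2, 0, 2]: the bus alternates
print(fixed_priority({0, 2}))   # 0: master 2 waits as long as 0 requests
```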

14
Split transaction bus (1)
  • With multiple masters, a bus can offer higher
    bandwidth by going to packets, as opposed to
    holding the bus for a full transaction.
  • This technique is called split transactions.
  • The read transaction is now split into
  • A read request transaction that contains the
    address
  • A memory reply transaction that contains the data.

15
Split transaction bus (2)
  • On a split transaction bus, each transaction must
    be tagged so that the processor and memory can
    tell what it is.
  • Split transactions make the bus available for
    other masters while the memory reads the words
    from the requested address.
  • The CPU must arbitrate for the bus in order to
    send the address and the memory must arbitrate in
    order to return the data.
  • The split transaction bus has higher bandwidth
    but at the cost of higher latency than a
    non-split bus.
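The role of the tag can be sketched as a small bookkeeping exercise (class and requester names are illustrative): each request records who asked under a fresh tag, the bus is free between request and reply, and the reply is matched back to its requester by tag.

```python
# Minimal sketch of split-transaction tagging.

class SplitBus:
    def __init__(self):
        self.next_tag = 0
        self.pending = {}                  # tag -> requester id

    def read_request(self, requester, address):
        tag = self.next_tag
        self.next_tag += 1
        self.pending[tag] = requester
        return tag                         # bus is released here

    def memory_reply(self, tag, data):
        requester = self.pending.pop(tag)  # the tag says who asked
        return requester, data

bus = SplitBus()
t1 = bus.read_request("cpu0", 0x1000)
t2 = bus.read_request("dma", 0x2000)       # issued before t1 completes
print(bus.memory_reply(t1, 111))  # ('cpu0', 111)
print(bus.memory_reply(t2, 222))  # ('dma', 222)
```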

16
Clocking (1)
  • Clocking concerns whether a bus is synchronous or
    asynchronous.
  • A synchronous bus includes a clock in the control
    lines and a fixed protocol for address and data
    relative to the clock.
  • Since little or no logic is required to decide
    what to do next, a synchronous bus is both fast
    and inexpensive.
  • Two major disadvantages
  • Everything on the bus must run at the same clock
    rate
  • Due to clock-skew, the bus cannot be long.
  • CPU-memory buses are typically synchronous.

17
Clocking (2)
  • Asynchrony makes it easier to accommodate a wide
    variety of devices and to lengthen the bus
    without worrying about clock-skew or
    synchronization problems.
  • With an asynchronous bus, there is an overhead
    associated with synchronizing the bus with each
    transaction.
  • Asynchronous buses scale better with
    technological changes.
  • I/O buses are typically asynchronous.

18
Clocking (3)
Figure: choosing a clocking scheme. Asynchronous is better when the bus is long (clock skew grows with bus length) or the mixture of I/O device speeds is varied; synchronous is better when the bus is short and the device speeds are similar.
19
Bus standards: I/O buses
20
Bus standards: CPU-memory buses
21
The memory system
  • The amount of time it takes to read or write a
    memory location is called the memory access time.
  • A related quantity is the memory cycle time.
  • The access time measures how quickly you can read
    a memory location.
  • The cycle time measures how quickly you can repeat
    memory references.
  • For example, you can ask for data from a DRAM
    chip and receive it within 50 ns, but it may be
    100 ns before you can ask for more data.
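The DRAM example above can be worked through: back-to-back reads are limited by the cycle time, not the access time, so the first read completes after 50 ns but each further read can start only 100 ns after the previous one.

```python
# Access time vs. cycle time, using the figures from the example.

ACCESS_NS = 50    # time from request to data
CYCLE_NS = 100    # minimum time between successive requests

def time_for_reads(n):
    # The first read finishes after ACCESS_NS; each of the remaining
    # n - 1 reads adds a full CYCLE_NS before its data arrives.
    return (n - 1) * CYCLE_NS + ACCESS_NS

print(time_for_reads(1))   # 50 ns
print(time_for_reads(10))  # 950 ns: about 100 ns per read in steady state
```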

22
The memory system vs the CPU
  • In the early 1980s, the access time of commodity
    DRAMs (200 ns) was shorter than the processor
    clock cycle (4.77 MHz, about 210 ns).
  • This meant that DRAM could be connected to the
    system without worrying about overrunning the
    memory system.
  • However, CPUs have become faster, a lot faster!
  • Wait states were added to make the memory system
    speed appear to match the processor speed.

23
The memory system vs the CPU
Figure: CPU speed vs DRAM speed, 1975-2010 (relative speed, log scale from 1 to 1000). CPU speed has grown far faster than DRAM speed.
24
Memory hierarchies
  • The clock time for commodity processors has gone
    from 210 ns to less than 1 ns for 1 GHz
    processors.
  • However, the access time for commodity DRAMs has
    decreased disproportionately less, from around
    200 ns to around 50 ns.
  • We could use fast SRAM, but this would be very
    expensive.
  • The solution is to use a hierarchy of memories
  • Registers
  • 1-3 levels of SRAM cache
  • DRAM main memory

25
Cache memory
  • Caches are small amounts of SRAM that store a
    subset of the contents of the memory.
  • The actual cache architecture has had to change
    as the cycle time of the processors has improved.
  • Processors are now so fast that off-chip SRAM
    chips are not fast enough.
  • This has led to a multilevel cache architecture.

26
The DEC 21164 memory system
Memory access speed on the DEC 21164 Alpha
27
Cache effectiveness
  • When every memory reference can be found in the
    cache, we have a 100% hit rate.
  • The hit rate of an application depends on a lot
    of factors
  • The algorithm and its locality of reference
  • The compiler, linker and other software tools
  • The availability of tuned libraries
  • The cache implementation method
  • When the hit rate is high, the system operates
    near the speed of the top of the hierarchy.
  • When the hit rate is low, the system operates
    near the speed of the bottom of the hierarchy.
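The effect of the hit rate can be quantified with the usual average access time formula, t_avg = hit_rate * t_cache + (1 - hit_rate) * t_memory. The latencies below are illustrative, not taken from any particular machine.

```python
# Average memory access time as a function of hit rate.

def avg_access_time(hit_rate, t_cache_ns=2, t_memory_ns=50):
    # Hits are served at cache speed; misses fall through to memory.
    return hit_rate * t_cache_ns + (1 - hit_rate) * t_memory_ns

print(avg_access_time(0.99))  # about 2.48 ns: near the top of the hierarchy
print(avg_access_time(0.50))  # 26.0 ns: dominated by main memory
```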

28
Cache organization direct mapped
  • The process of pairing memory locations with
    cache lines is called mapping.
  • The simplest method is direct mapping.

Figure: a 4K cache direct-mapped onto main memory. Memory locations 0, 4K, 8K, ... all map to the same cache location.
29
Direct mapped cache
  • Memory locations 0, 4K, 8K, etc. map into the same
    location in cache.
  • This can cause problems when alternating runtime
    memory references point to the same cache line.

      real*4 a(1024), b(1024)
      common /arrays/ a, b
      do i = 1, 1024
        a(i) = b(i) * c(i)
      end do
  • Each reference causes a cache miss, leading to
    thrashing
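Why the loop above thrashes can be checked arithmetically: with a 4K direct-mapped cache, addresses exactly 4K apart map to the same cache line, and since a and b sit side by side in the common block, a(i) and b(i) are always 4096 bytes apart. The base address and line size below are illustrative.

```python
# Direct-mapped line index: addresses a multiple of the cache size
# apart always collide.

CACHE_BYTES = 4096
LINE_BYTES = 32
N_LINES = CACHE_BYTES // LINE_BYTES

def cache_line(address):
    return (address // LINE_BYTES) % N_LINES

a_base = 0x10000            # illustrative load address of the common block
b_base = a_base + 4 * 1024  # b starts right after a (1024 4-byte reals)

i = 7
addr_a = a_base + 4 * i
addr_b = b_base + 4 * i
print(cache_line(addr_a) == cache_line(addr_b))  # True: every a(i)/b(i) pair collides
```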

30
Fully associative cache
  • At the other extreme, fully associative cache
    allows any memory location to be mapped into any
    cache line.
  • It is difficult to find real-world examples of
    programs that will cause cache thrashing in fully
    associative cache.
  • However fully associative cache is expensive in
    terms of size, price and speed.

31
Set associative cache
  • Set associative cache is composed of a number of
    sets of direct mapped cache.
  • Common choices are 2- and 4-way set associativity.

Figure: a memory reference indexing into the sets of a set-associative cache
  • In the previous example, references to the a
    array might be stored in set 1. Subsequent
    references to b would be stored in set 2.
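A toy 2-way set-associative cache with LRU replacement shows the fix: addresses 4K apart now land in the same set but in different ways, so the a/b access pattern from the direct-mapped example no longer thrashes. The sizes and addresses are illustrative (2 ways x 64 sets x 32-byte lines = the same 4K capacity).

```python
# Sketch of a 2-way set-associative cache with LRU replacement.

LINE_BYTES = 32
SETS = 64                     # 2 ways * 64 sets * 32 B = 4K cache

class TwoWayCache:
    def __init__(self):
        self.sets = [[] for _ in range(SETS)]  # each set: up to 2 tags, LRU first

    def access(self, address):
        index = (address // LINE_BYTES) % SETS
        tag = address // (LINE_BYTES * SETS)
        ways = self.sets[index]
        if tag in ways:
            ways.remove(tag); ways.append(tag)  # mark most recently used
            return True                         # hit
        if len(ways) == 2:
            ways.pop(0)                         # evict the LRU way
        ways.append(tag)
        return False                            # miss

cache = TwoWayCache()
hits = 0
for i in range(1024):
    for base in (0x10000, 0x11000):   # a(i) and b(i), 4K apart
        hits += cache.access(base + 4 * i)
print(hits)  # 1792 of 2048: only the 256 cold misses miss
```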

32
Interfacing to the processor
Figure: a typical system organization. The CPU and its cache sit on the CPU-memory bus together with main memory; a bus adapter connects this to an I/O bus carrying I/O controllers for the network, disk and graphics output.
33
Interfacing to the processor
  • Two methods of interfacing to an I/O device
  • Memory mapped
  • I/O mapped
  • For a memory mapped device, portions of the
    address space are assigned to I/O space.
  • Reads/writes to these addresses cause data to be
    transferred
  • Some part of the address space may also be
    reserved for control signals
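Memory mapping can be illustrated with a toy address decoder: loads and stores whose address falls in the reserved I/O region are routed to a device instead of RAM. The addresses and register layout below are invented for the example.

```python
# Toy memory-mapped I/O: an address range dispatches to a fake UART.

RAM_SIZE = 0x1000
IO_BASE = 0xF000             # start of the reserved I/O region
UART_DATA = IO_BASE + 0x0    # hypothetical UART data register
UART_STATUS = IO_BASE + 0x4  # hypothetical UART status register

ram = bytearray(RAM_SIZE)
uart_output = []

def store(address, value):
    if address >= IO_BASE:                  # address falls in I/O space
        if address == UART_DATA:
            uart_output.append(chr(value))  # the write goes to the device
    else:
        ram[address] = value                # ordinary memory write

def load(address):
    if address == UART_STATUS:
        return 1                            # device is always ready here
    return ram[address]

for ch in "ok":
    while not load(UART_STATUS):            # programmed-I/O style check
        pass
    store(UART_DATA, ord(ch))
print("".join(uart_output))  # ok
```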

34
Interfacing to the processor
  • The alternative is to use dedicated I/O opcodes.
  • Such devices are known as I/O mapped devices.
  • The Intel 80x86 processors use I/O mapped devices
    and special opcodes (in and out) to communicate
    with these devices.
  • I/O mapping is less popular than memory mapping.

35
Controlling an I/O device
  • I/O devices typically have control and status
    registers.
  • The processor can control the device using two
    methods
  • Programmed I/O: the processor periodically checks
    the status registers for completion of the
    transaction. This method puts a burden on the
    processor.
  • Interrupt-driven I/O allows the processor to work
    on some other process while waiting for I/O to
    complete. The I/O device raises an interrupt on
    completion of the transaction.

36
Direct memory access
  • Interrupt-driven I/O relieves the processor from
    waiting for every I/O event but
  • There may be a significant amount of CPU cycles
    required to move data.
  • Transferring a disk block of 2048 words might
    require 2048 reads and 2048 stores as well as the
    overhead for the interrupt.
  • Direct memory access (DMA) can help to relieve
    the processor from the burden of bulk data
    movement.

37
Direct memory access
  • A DMA engine is a specialized processor that
    transfers data between memory and an I/O device
    allowing the processor to work on other tasks.
  • A DMA engine is external to the CPU and must act
    as a bus master.
  • The processor writes the start address and number
    of words to the DMA control registers.
  • The DMA interrupts the processor when the
    transfer is complete.
  • There may be multiple DMA devices in a system.
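The flow described above can be sketched as follows (class and register names are invented): the CPU programs a start address and word count, the engine moves the block as bus master, and completion is signalled via an interrupt flag.

```python
# Toy DMA engine: programmed once, transfers a block, raises an interrupt.

class DMAEngine:
    def __init__(self, memory):
        self.memory = memory
        self.interrupt_pending = False

    def program(self, start, n_words, device_words):
        # The CPU writes the control registers and goes off to do
        # other work; here the transfer runs to completion immediately.
        for i in range(n_words):
            self.memory[start + i] = device_words[i]
        self.interrupt_pending = True       # signal completion

memory = [0] * 4096
dma = DMAEngine(memory)
block = list(range(2048))                   # a 2048-word disk block
dma.program(start=1024, n_words=2048, device_words=block)

print(dma.interrupt_pending)        # True: transfer finished
print(memory[1024:1024 + 3])        # [0, 1, 2]
```

The CPU issues one short programming sequence instead of 2048 loads and 2048 stores, which is the saving the slide describes.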

38
Advanced Microcontroller Bus Architecture
  • A system on chip (SoC) design consists of a
    collection of cores and an interconnection
    scheme.
  • Using an ad-hoc scheme each time wastes design
    cycles.
  • ARM's AMBA can be used to standardize the on-chip
    connections of different cores.
  • Use of a standard bus facilitates design reuse.

39
AMBA
  • Three buses are defined within the AMBA
    specification
  • The Advanced High-Performance Bus (AHB)
  • The Advanced System Bus (ASB)
  • The Advanced Peripheral Bus (APB)
  • A typical system will incorporate either an AHB
    or ASB together with an APB.
  • The ASB is the older form of the system bus.
  • The AHB was introduced to provide improved
    support for high performance, synthesis and
    timing verification.

40
AMBA system architecture
Figure: a typical AMBA system. The ARM processor core, on-chip RAM, DMA controller and external bus interface sit on the AHB or ASB; a bridge connects to the APB, which carries the UART, timer, parallel interface and test interface peripherals.
41
Arbitration
  • A bus transaction is initiated by a bus master
    which requests access from a central arbiter.
  • The arbiter decides priorities when there are
    conflicting requests.
  • The design of the arbiter is a system-specific
    issue.
  • The ASB specifies only the protocol
  • The master issues a request to the arbiter
  • When the bus is available, the arbiter issues a
    grant to the master.
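The request/grant protocol can be sketched as below; since the ASB leaves the priority policy system specific, it is passed in as a pluggable function (the class and method names are illustrative).

```python
# Toy central arbiter implementing the request/grant protocol.

class Arbiter:
    def __init__(self, policy=min):     # priority policy is system specific
        self.policy = policy
        self.requests = set()
        self.owner = None               # master currently holding the bus

    def request(self, master):
        self.requests.add(master)

    def release(self, master):
        if self.owner == master:
            self.owner = None

    def step(self):
        # Issue a grant only when the bus is available.
        if self.owner is None and self.requests:
            self.owner = self.policy(self.requests)
            self.requests.discard(self.owner)
        return self.owner

arb = Arbiter()
arb.request(2); arb.request(1)
print(arb.step())   # 1: granted by the (here, lowest-id) policy
print(arb.step())   # 1: the bus is held until the master releases it
arb.release(1)
print(arb.step())   # 2: the pending request is granted next
```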