Davies Muche - PowerPoint PPT Presentation

1
Computer Architecture
  • Davies Muche
  • Mike Li Luo
  • CS521 Spring 2003

2
What is a digital computer?
  • A digital computer is a machine composed of the
    following three basic components:
  • - Input/Output
  • - Central Processing Unit (CPU)
  • - Memory

3
Early Computers
  • As early as the 1600s, calculating machines that
    could do arithmetic operations had been made,
    but none had the three basic components of a
    digital computer
  • In 1823, Charles Babbage undertook the design of
    the Difference Engine
  • The machine was to solve 6th-degree polynomials
    to 20-digit accuracy

4
  • It combined the concepts of mechanical control
    and mechanical calculation in a machine that has
    the basic parts of a digital computer
  • He was given 17,000 pounds to construct the
    machine, but the project was abandoned,
    uncompleted, in 1842
  • In 1834, Babbage conceived the idea of the
    Analytical Engine (after his death, his son Henry
    tried to build it but never succeeded)
  • In 1854, Georg Scheutz built a working
    Difference Engine based on Babbage's design.
    (This machine printed mathematical, astronomical,
    and actuarial tables with unprecedented accuracy
    and was used by the British and American
    governments.)

5
Between 1847 and 1849, Babbage designed the
Difference Engine No. 2. He did not build it.
Difference Engine No. 1
6
  • However, in 1834, Charles Babbage developed a
    hypothetical program to solve simultaneous
    equations on the Analytical Engine

7
  • The von Neumann architecture (1940s) consists of
    five major components: input unit, output unit,
    arithmetic logic unit (ALU), control unit, and
    memory

8
  • A refinement of the von Neumann model, the system
    bus model has a CPU (ALU and control), memory,
    and an input/output unit

10
The CPU
  • CPU (central processing unit) is an older term
    for processor and microprocessor: the central
    unit in a computer, containing the logic circuitry
    that performs the instructions of a computer's
    programs.
  • NOTABLE TYPES
  • - RISC: Reduced Instruction Set Computer
  •   - Introduced in the mid-1980s
  •   - Requires few transistors
  •   - Capable of executing only a very limited set
    of instructions
  • - CISC: Complex Instruction Set Computer
  •   - Complex CPUs that had ever-larger sets
    of instructions

11
RISC or CISC: The Great Controversy
  • RISC proponents argue that RISC machines are both
    cheaper and faster, and are therefore the
    machines of the future.
  • Skeptics note that by making the hardware
    simpler, RISC architectures put a greater burden
    on the software. They argue that this is not
    worth the trouble because conventional
    microprocessors are becoming increasingly fast
    and cheap anyway.
  • The TRUTH!
  • CISC and RISC implementations are becoming more
    and more alike. Many of today's RISC chips
    support as many instructions as yesterday's CISC
    chips. And today's CISC chips use many techniques
    formerly associated with RISC chips.

12
Under the hood of a typical CPU
13
What you need to know about a CPU
  • Processing speed
  • - The clock frequency is one measure of how fast
    a computer is. However, the length of time to
    carry out an operation depends not only on how
    fast the processor cycles, but also on how many
    cycles are required to perform a given operation.
  • Voltage requirement
  • - Transistors (electronic switches) in the CPU
    require some voltage to trigger them.
  • - In the pre-486DX66 days, everything was 5
    volts
  • - As chips got faster and power became a
    concern, designers dropped the chip voltage to
    3.3 volts (external voltage) and 2.9 V or 2.5 V
    (core voltage)
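
The processing-speed point above can be put into numbers. The sketch below is illustrative only: the two processors and their cycle counts are made-up values, and the function name `execution_time_ns` is not from the slides. It shows that a faster clock does not guarantee a faster operation if more cycles are needed.

```python
def execution_time_ns(cycles: int, clock_mhz: float) -> float:
    """Time for one operation = cycles needed x length of one clock cycle."""
    cycle_time_ns = 1000.0 / clock_mhz  # a 1 MHz clock has a 1000 ns cycle
    return cycles * cycle_time_ns


# Hypothetical processors: B has the faster clock but needs more cycles.
time_a = execution_time_ns(cycles=2, clock_mhz=100.0)
time_b = execution_time_ns(cycles=5, clock_mhz=200.0)
print(f"A: {time_a:.1f} ns, B: {time_b:.1f} ns")
```

Here processor A finishes the operation sooner even though its clock is half as fast, which is why clock frequency alone is only one measure of speed.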

14
More on Voltage Requirements
  • Power consumption equates largely with heat
    generation, which is a primary enemy in achieving
    increased performance. Newer processors are
    larger and faster, and keeping them cool can be a
    major concern.
  • Reducing power usage is a primary objective for
    the designers of notebook computers, since they
    run on batteries with a limited life. (They also
    are more sensitive to heat problems since their
    components are crammed into such a small space).
  • Designers compensate by using lower-power
    semiconductor processes and by shrinking the
    circuit size and die size. Newer processors
    reduce voltage levels even more by using what is
    called a dual-voltage, or split-rail, design

15
More on Dual Voltage Design
  • A split rail processor uses two different
    voltages.
  • The external or I/O voltage is higher, typically
    3.3V for compatibility with the other chips on
    the motherboard.
  • The internal or core voltage is lower, usually
    2.5 to 2.9 volts. This design allows these
    lower-voltage CPUs to be used without requiring
    wholesale changes to motherboards, chipsets, etc.
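
To see why lowering the core voltage matters so much, a rough estimate helps. The sketch below assumes the usual CMOS approximation that switching power scales as C x V^2 x f; that relation is not stated in the slides, and the function name is hypothetical. Capacitance and clock frequency are held constant so only the voltage changes.

```python
def relative_dynamic_power(v_new: float, v_old: float) -> float:
    """Ratio of switching power at v_new versus v_old.

    Assumes P ~ C * V^2 * f with the same capacitance and clock frequency
    (a standard approximation, not taken from the slides).
    """
    return (v_new / v_old) ** 2


# Voltage steps mentioned in the slides: 5 V -> 3.3 V, then 3.3 V -> core voltage.
for v_old, v_new in [(5.0, 3.3), (3.3, 2.9), (3.3, 2.5)]:
    ratio = relative_dynamic_power(v_new, v_old)
    print(f"{v_old} V -> {v_new} V: about {ratio:.0%} of the original switching power")
```

Under this assumption, even a modest voltage drop gives a disproportionate power saving, because the voltage term is squared.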

16
Power consumption versus speed of some processors
17
MEMORY
  • Computers have hierarchies of memories that may
    be classified according to function, capacity, and
    response times.
  • - Function
  • "Reads" transfer information from the memory;
    "writes" transfer information to the memory.
  • - Random Access Memory (RAM) performs both
    reads and writes.
  • - Read-Only Memory (ROM) contains information
    stored at the time of manufacture that can only
    be read.
  • - Programmable Read-Only Memory (PROM) is ROM
    that can be written once at some point after
    manufacture.
  • - Capacity
  • bit: smallest unit of memory (value of 0 or 1)
  • byte: 8 bits
  • In modern computers, the total memory may range
    from, say, 16 MB in a small personal computer to
    several GB (gigabytes) in large supercomputers.

18
More on memory
  • Memory Response
  • Memory response is characterized by two
    different measures:
  • - Access Time (also termed response time or
    latency) defines how quickly the memory can
    respond to a read or write request.
  • - Memory Cycle Time refers to the minimum period
    between two successive requests of the memory.
  • - Access times vary from about 80 ns (ns =
    nanosecond = 10^-9 seconds) for chips in small
    personal computers to about 10 ns or less for the
    fastest chips in caches and buffers. For various
    reasons, the memory cycle time is longer than the
    access time of the memory chips (i.e., the length
    of time between successive requests is more than
    the 80 ns speed of the chips in a small personal
    computer).

20
The I/O Bus
  • A computer transfers data from disk to CPU, from
    CPU to memory, or from memory to the display
    adapter, etc.
  • To avoid having separate circuits between
    every pair of devices, the bus is used.
  • Definition
  • The bus is simply a common set of wires that
    connects all the computer devices and chips
    together

21
Different functions for Different wires of the bus
  • Some of these wires are used to transmit data.
  • Some send housekeeping signals, like the clock
    pulse. Some transmit a number (the "address")
    that identifies a particular device or memory
    location
  • Use of the address
  • The computer chips and devices watch the
    address wires and respond only when their
    identifying number (address) is transmitted; only
    then can they transfer data
  • Problem!
  • Starting with machines that used the 386 CPU,
    CPUs and memory ran faster than other I/O devices
  • Solution
  • - Separate the CPU and memory from all the I/O.
    Today, memory is only added by plugging it into
    special sockets on the main computer board.
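
The way devices watch the address wires can be sketched in a few lines. This is a toy model, not a description of any real bus: the `Device` and `Bus` classes, the device names, and the address ranges are all hypothetical and chosen only for illustration.

```python
class Device:
    """A hypothetical bus device that owns a contiguous range of addresses."""

    def __init__(self, name: str, base: int, size: int):
        self.name = name
        self.base = base
        self.size = size
        self.cells = [0] * size  # the device's own storage

    def responds_to(self, address: int) -> bool:
        # Each device watches the address wires and compares against its range.
        return self.base <= address < self.base + self.size


class Bus:
    """Shared wires: every device sees every address, at most one responds."""

    def __init__(self, devices):
        self.devices = list(devices)

    def _select(self, address: int) -> Device:
        for device in self.devices:
            if device.responds_to(address):
                return device
        raise LookupError(f"no device at address {address:#x}")

    def write(self, address: int, value: int) -> str:
        device = self._select(address)
        device.cells[address - device.base] = value
        return device.name  # report which device accepted the data

    def read(self, address: int) -> int:
        device = self._select(address)
        return device.cells[address - device.base]


# Address ranges below are made up for illustration.
bus = Bus([Device("memory", 0x0000, 0x1000), Device("display", 0x8000, 0x0100)])
responder = bus.write(0x8004, 0x41)
print(responder, hex(bus.read(0x8004)))
```

Only the device whose range contains the address takes part in the transfer; every other device ignores it, which is what lets one set of wires serve all of them.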

22
Bus Speeds
  • One option is multiple buses with different
    speeds; another is a single bus that supports
    different speeds
  • In a modern PC, there may be a half dozen
    different bus areas.
  • There is certainly a "CPU area" that still
    contains the CPU, memory, and basic control
    logic.
  • There is a "High Speed I/O Device" area that is
    either a VESA Local Bus (VLB) or a PCI bus

23
Some Bus Standards
  • ISA (Industry Standard Architecture) bus
  • In 1987, IBM introduced a new Micro Channel
    Architecture (MCA) bus
  • Other vendors developed EISA, an extension of the
    older ISA interface
  • VESA Local Bus (VLB), which became popular at the
    start of 1993

24
More Bus Standards
  • The PCI bus was developed by Intel
  • PCI is a 64 bit interface in a 32 bit package
  • The PCI bus runs at 33 MHz and can transfer 32
    bits of data (four bytes) every clock tick.
  • That sounds like a 32-bit bus! However, a clock
    tick at 33 MHz is 30 nanoseconds, and memory only
    has a speed of 70 nanoseconds. When the CPU
    fetches data from RAM, it has to wait at least
    three clock ticks for the data. By transferring
    data every clock tick, the PCI bus can deliver
    the same throughput on a 32 bit interface that
    other parts of the machine deliver through a 64
    bit path.
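
The arithmetic behind the PCI claim can be checked directly. The sketch below uses only the figures from the slide (33 MHz, 32 bits per tick, 70 ns memory); rounding the memory wait up to whole clock ticks and treating the 64-bit path as delivering one word per wait are simplifying assumptions.

```python
import math

CLOCK_MHZ = 33.0          # PCI clock from the slide
BUS_WIDTH_BYTES = 4       # 32 bits transferred per clock tick
MEMORY_ACCESS_NS = 70.0   # memory speed from the slide

tick_ns = 1000.0 / CLOCK_MHZ                         # length of one clock tick
pci_peak_mb_s = CLOCK_MHZ * BUS_WIDTH_BYTES          # MB/s, with 1 MB = 10**6 bytes
wait_ticks = math.ceil(MEMORY_ACCESS_NS / tick_ns)   # whole ticks spent waiting on RAM

# Assumed model: a 64-bit (8-byte) path that delivers data once every wait_ticks ticks.
memory_path_mb_s = CLOCK_MHZ * 8 / wait_ticks

print(f"clock tick: {tick_ns:.1f} ns, memory wait: {wait_ticks} ticks")
print(f"PCI peak: {pci_peak_mb_s:.0f} MB/s, 64-bit memory path: {memory_path_mb_s:.0f} MB/s")
```

Under these assumptions, a 32-bit bus that moves data on every tick keeps up with a wider path that has to wait several ticks per access.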

25
Things to know about I/O Bus
  • Buses transfer information between parts of a
    computer. Smaller computers have a single bus;
    more advanced computers have complex
    interconnection strategies.
  • Things to know about the bus
  • Transaction: unit of communication on the bus.
  • Bus Master: the module controlling the bus at a
    particular time.
  • Arbitration Protocol: set of signals exchanged
    to decide which of two competing modules will
    control a bus at a particular time.
  • Communication Protocol: algorithm used to
    transfer data on the bus.
  • Asynchronous Protocol: communication algorithm
    that can begin at any time; requires overhead to
    notify receivers that a transfer is about to
    begin.

26
Things to know about the bus continued
  • Synchronous Protocol: communication algorithm
    that can begin only at well-known times defined
    by a global clock.
  • Transfer Time: time for data to be transferred
    over the bus in a single transaction.
  • Bandwidth: data transfer capacity of the bus,
    usually expressed in bits per second (bps).
    Sometimes termed throughput.
  • Bandwidth and Transfer Time measure related
    things, but bandwidth takes into account required
    overheads and is usually a more useful measure of
    the speed of the bus.
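
The difference between transfer time and bandwidth shows up once overhead is added. In the sketch below the transaction size, transfer time, and overhead are hypothetical numbers, and `effective_bandwidth_bps` is an illustrative helper rather than a standard formula from the slides.

```python
def effective_bandwidth_bps(payload_bits: int, transfer_time_s: float,
                            overhead_s: float) -> float:
    """Bits delivered per second once per-transaction overhead is included."""
    return payload_bits / (transfer_time_s + overhead_s)


# Hypothetical transaction: 32 bits moved in 30 ns, plus 60 ns of
# arbitration and protocol overhead before the transfer can start.
raw = effective_bandwidth_bps(32, 30e-9, 0.0)
effective = effective_bandwidth_bps(32, 30e-9, 60e-9)
print(f"ignoring overhead: {raw / 1e6:.0f} Mbps, with overhead: {effective / 1e6:.0f} Mbps")
```

With these made-up numbers the bus delivers only a third of what the raw transfer time suggests, which is why bandwidth is usually the more useful figure.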

27
Supercomputer Architecture
  • Background
  • Architecture
  • Approaches
  • Trends
  • Challenges

28
What is parallel computing
  • Use of multiple computers or processors working
    together to do a common task.
  • Each processor works on its section of the
    problem
  • Processors are allowed to exchange information
    with other processors

29
Why parallel computing
  • Limits of a single computer
  • - Available memory
  • - Performance
  • Parallel computing allows us to
  • - Solve problems that don't fit on a single
    computer
  • - Solve problems that can't be solved in a
    reasonable time

30
First Supercomputer
  • 1976: the first supercomputer, the Cray-1
  • It had a speed of tens of megaflops (one megaflop
    equals a million floating-point operations per
    second) and a memory capacity of 4 megabytes.
  • Contributions from Los Alamos Lab and Seymour
    Cray
  • Slower than the average PC today

31
Growing Speed
  • The performance of the fastest computers has
    grown exponentially from 1945 to the present,
    averaging a factor of 10 every five years
  • The first computers performed tens of
    floating-point operations per second; the
    parallel computers of the mid-1990s achieve tens
    of billions of operations per second

32
Pipeline
  • Pipelining: start performing an operation on one
    piece of data while finishing the same operation
    on another piece of data
  • An operation consists of multiple stages.
  • After a set of operands completes a particular
    stage, it moves into the next stage.
  • Then, another set of operands can move into the
    stage that was just vacated.
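
The stage-by-stage movement described above can be counted out. The sketch below assumes an idealized pipeline in which every stage takes exactly one cycle and nothing ever stalls; the function names are illustrative.

```python
def sequential_cycles(num_operations: int, num_stages: int) -> int:
    """Each operation runs all of its stages before the next one starts."""
    return num_operations * num_stages


def pipelined_cycles(num_operations: int, num_stages: int) -> int:
    """Fill the pipeline once, then one result completes per cycle."""
    if num_operations == 0:
        return 0
    return num_stages + (num_operations - 1)


def occupancy(num_operations: int, num_stages: int):
    """Per cycle, which operand set occupies each stage (None = empty)."""
    table = []
    for cycle in range(pipelined_cycles(num_operations, num_stages)):
        row = []
        for stage in range(num_stages):
            op = cycle - stage  # operand set that entered `stage` cycles ago
            row.append(op if 0 <= op < num_operations else None)
        table.append(row)
    return table


print(sequential_cycles(100, 4), pipelined_cycles(100, 4))
for cycle, row in enumerate(occupancy(3, 2)):
    print(f"cycle {cycle}: {row}")
```

Each row of the occupancy table shows a new operand set entering the first stage as soon as the previous one has moved on, so for long runs the cost approaches one cycle per operation.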

33
SuperPipeline
  • Superpipelining: perform multiple pipelined
    operations at the same time
  • So, a superpipeline is a collection of multiple
    pipelines that can operate simultaneously.
  • In other words, several different operations can
    execute simultaneously, and each of these
    operations can be broken into stages, each of
    which is filled all the time.
  • So you can get multiple operations per CPU cycle.
  • For example, an IBM Power4 can have over 200
    different operations in flight at the same time.

34
Sample of superpipeline design
35
Drawbacks of pipeline architecture: Pipeline
Hazards
  • Structural hazards: attempt to use the same
    resource in two different ways at the same time
  • - e.g., multiple memory accesses, multiple
    register writes
  • - solutions: multiple memories, stretching the
    pipeline
  • Control hazards: attempt to make a decision
    before the condition is evaluated
  • - e.g., any conditional branch
  • - solutions: prediction, delayed branch
  • Data hazards: attempt to use an item before it is
    ready
  • - solutions: forwarding/bypassing
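
A data hazard is easy to spot mechanically. The sketch below scans a short, made-up instruction sequence for read-after-write dependences; the `Instr` record, the register names, and the `window` parameter (how many following instructions would read too early without forwarding) are assumptions for illustration, not part of any real instruction set.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    text: str
    dest: str                 # register written
    sources: Tuple[str, ...]  # registers read


def find_raw_hazards(program: List[Instr], window: int = 2):
    """Return (producer_index, consumer_index, register) triples.

    `window` is an assumed number of following instructions that would read
    a register before the producer has written it back (no forwarding).
    """
    hazards = []
    for i, producer in enumerate(program):
        for j in range(i + 1, min(i + 1 + window, len(program))):
            if producer.dest in program[j].sources:
                hazards.append((i, j, producer.dest))
            if program[j].dest == producer.dest:
                break  # register overwritten; later reads depend on instruction j
    return hazards


program = [
    Instr("add r1, r2, r3", "r1", ("r2", "r3")),
    Instr("sub r4, r1, r5", "r4", ("r1", "r5")),  # reads r1 right after it is produced
    Instr("or  r6, r7, r8", "r6", ("r7", "r8")),
    Instr("and r9, r4, r1", "r9", ("r4", "r1")),
]
hazards = find_raw_hazards(program)
for producer, consumer, register in hazards:
    print(f"{program[consumer].text!r} needs {register} from {program[producer].text!r}")
```

Forwarding (bypassing) addresses exactly these pairs by routing the result to the consumer before it has been written back to the register file.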

36
Memory
  • In a shared memory system, there is one large
    virtual memory, and all processors have equal
    access to data and instructions in this memory.

37
Memory (cont.)
  • In a distributed memory system, each processor
    has a local memory that is not accessible from
    any other processor.

38
Difference between the two kinds of memory
  • A software issue, not a hardware one
  • The difference determines how different parts of
    a parallel program will communicate:
  • shared memory with semaphores, etc., or
    distributed memory with message passing.
  • All problems run efficiently on distributed
    memory, BUT software is easier to develop for
    shared memory
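
The two communication styles can be contrasted in a small sketch. It uses Python threads, which in reality all share one address space, so the message-passing version only imitates the style of a distributed memory program; the function names and the data split are illustrative.

```python
import queue
import threading


def shared_memory_sum(chunks):
    """Workers add into one shared variable, guarded by a semaphore."""
    total = {"value": 0}
    guard = threading.Semaphore(1)  # binary semaphore used as a mutex

    def worker(chunk):
        partial = sum(chunk)
        with guard:                 # only one worker updates at a time
            total["value"] += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total["value"]


def message_passing_sum(chunks):
    """Workers keep private data and send partial sums as messages."""
    mailbox = queue.Queue()

    def worker(chunk):
        mailbox.put(sum(chunk))     # the only communication is the message

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(mailbox.get() for _ in chunks)


chunks = [range(0, 250), range(250, 500), range(500, 750), range(750, 1000)]
print(shared_memory_sum(chunks), message_passing_sum(chunks))
```

In the first version the programmer must protect the shared variable; in the second there is nothing shared to protect, but every exchange of data has to be written out explicitly.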

39
Cache Coherency
40
Styles of parallel computing (Hardware
Architecture)
  • SISD: single instruction stream, single data
    stream
  • SIMD: single instruction stream, multiple data
    streams
  • MISD: multiple instruction streams, single data
    stream
  • MIMD: multiple instruction streams, multiple data
    streams

41
SISD
  • Single Instruction, Single Data

42
SIMD
  • Single Instruction, Multiple Data

43
MISD
  • Multiple Instruction, Single Data

44
MIMD
  • Multiple Instruction, Multiple Data (simplest:
    program-controlled message passing)

45
Two parallel processing approaches
  • SMP: symmetric multiprocessing
  • SMP is the processing of programs by multiple
    processors that share a common operating system
    and memory
  • MPP: massively parallel processing
  • MPP is the coordinated processing of a program by
    multiple processors that work on different parts
    of the program, with each processor using its own
    operating system and memory

46
Current Trend
  • OpenMP is an open standard for providing
    parallelization mechanisms on shared-memory
    multiprocessors.
  • It supports C/C++ and Fortran, several of the
    most commonly used languages for writing parallel
    programs.
  • It is based on a thread paradigm
47
OpenMP execution model
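
OpenMP programs are usually described with a fork-join execution model: a master thread runs serially, forks a team of threads for a parallel region, and continues alone after they join. OpenMP itself targets C/C++ and Fortran, so the sketch below is only a Python analogy built on the standard `concurrent.futures` module; `parallel_region` and `square` are hypothetical names, and nothing here is OpenMP API.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_region(work_items, body, num_threads=4):
    """Fork a team of threads, run `body` on each item, then join."""
    with ThreadPoolExecutor(max_workers=num_threads) as team:  # fork
        results = list(team.map(body, work_items))             # work is shared out
    return results                                             # join: the pool has shut down


def square(x):
    return x * x


# The master thread runs serially, forks for the parallel region, then continues.
data = list(range(8))
squares = parallel_region(data, square)
total = sum(squares)  # serial again after the join
print(squares, total)
```

The results come back in input order, so the serial code after the join can use them as if the loop had run sequentially.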
48
New Trend
  • Clustering
  • The Widest Definition
  • Any number of computers communicating at any
    distance
  • The Common Definition
  • A relatively small number of computers (<1000)
    communicating at a relatively small distance
    (within the same room) and used as a single,
    shared computing resource

49
Comparison
  • Programming
  • A Program written for Cluster Parallelism can run
    on an SMP right away
  • A Program written for an SMP can NOT run on a
    Cluster right away
  • Scalability
  • Clusters are Scalable
  • SMPs are NOT Scalable above a Small Number of
    Processors

50
Comparison cont..
  • One big advantage of SMPs is the Single System
    Image
  • Easier Administration and Support
  • But, Single Point of Failure
  • Cluster computing can be used for load balancing
    as well as for high availability

51
General highlights from Top 500
  • The Earth Simulator built by NEC remains the
    unchallenged No. 1.
  • 100 systems have peak performance above 1
    TFlop/s, up from 70 systems 6 months ago
  • PC clusters are now present at all levels of
    performance
  • IBM is still leading the list with respect to the
    installed performance ahead of HP and NEC
  • Hewlett-Packard stays slightly ahead of IBM with
    respect to the number of systems installed (HP
    137 and IBM 131)

52
NEC Earth-Simulator/ 5120 from Japan
53
Basic Idea/Component
  • Environment Research
  • The Earth Simulator consists of 640
    supercomputers that are connected by a high-speed
    network (data transfer speed 12.3 GBytes/s). Each
    supercomputer (1 node) contains eight vector
    processors, each with a peak performance of 8
    GFlops, and a high-speed memory of 16 GBytes. The
    total number of processors is 5120 (8 x 640),
    which translates to a total of approximately 40
    TFlops peak performance and a total main memory
    of 10 TeraBytes.
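
The totals quoted for the Earth Simulator follow from the per-node figures. The short check below uses only numbers from the slide; converting memory with 1 TB = 1024 GB is an assumption about the units.

```python
NODES = 640
PROCESSORS_PER_NODE = 8
PEAK_GFLOPS_PER_PROCESSOR = 8
MEMORY_GB_PER_NODE = 16

total_processors = NODES * PROCESSORS_PER_NODE
peak_tflops = total_processors * PEAK_GFLOPS_PER_PROCESSOR / 1000  # 1 TFlops = 1000 GFlops
total_memory_tb = NODES * MEMORY_GB_PER_NODE / 1024                # assumes 1 TB = 1024 GB

print(f"{total_processors} processors, {peak_tflops:.2f} TFlops peak, {total_memory_tb:.0f} TB memory")
```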

54
Hewlett-Packard SuperDome supercomputer
55
Terms you need to know
  • flops: acronym for floating-point operations per
    second. For example, 15 Mflops equals 15
    million floating-point arithmetic operations per
    second. It is a unit of measurement of the
    performance of a computer
  • LINPACK: a collection of Fortran subroutines
    that analyze and solve linear equations and
    linear least-squares problems.
  • Rmax: 431.70 (maximal LINPACK performance
    achieved)
  • Rpeak: 672.00 (theoretical peak performance)
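
One common way to read the Rmax and Rpeak figures together is as a ratio: the fraction of theoretical peak that the LINPACK run actually achieved. The slide gives no units for the two numbers, so the sketch below treats them only as a ratio; `linpack_efficiency` is an illustrative name.

```python
def linpack_efficiency(rmax: float, rpeak: float) -> float:
    """Fraction of theoretical peak performance achieved on LINPACK."""
    return rmax / rpeak


efficiency = linpack_efficiency(431.70, 672.00)
print(f"achieved {efficiency:.1%} of theoretical peak")
```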

56
Challenges
  • Faster algorithms
  • Good data locality
  • Low communication requirement
  • Efficient software
  • High level problem solving environment
  • Changes of architecture

58
  • Thank You!