Title: Computer Architecture
1Computer Architecture
- Davies Muche
- Mike Li Luo
- CS521 Spring 2003
2What is a digital computer?
- A digital computer is a machine composed of the following three basic components:
- - Input/Output
- - Central Processing Unit (CPU)
- - Memory
3Early Computers
- As early as the 1600s, calculating machines that could do arithmetic operations had been built, but none had the three basic components of a digital computer
- In 1823, Charles Babbage undertook the design of the Difference Engine
- The machine was to solve 6th-degree polynomials to 20-digit accuracy
4- The concepts of mechanical control and mechanical calculation were put together into a machine that had the basic parts of a digital computer
- He was given 17,000 pounds to construct the machine, but the project was abandoned in 1842 (uncompleted)
- In 1856, Babbage conceived the idea of the Analytical Engine (after his death, his son Henry tried to build it but never succeeded)
- In 1854, George Scheutz built a working difference machine based on Babbage's design. (This machine printed mathematical, astronomical and actuarial tables with unprecedented accuracy, and was used by the British and American governments)
5Between 1847 and 1849, Babbage designed the Difference Engine No. 2. He did not build it.
(Image: Difference Engine No. 1)
6- However, in 1834, Charles Babbage developed a hypothetical program to solve simultaneous equations on the Analytical Engine
7- The John von Neumann architecture (1940s) consists of five major components
8- A refinement of the von Neumann model, the system
bus model has a CPU (ALU and control), memory,
and an input/output unit
9(No Transcript)
10The CPU
- CPU (central processing unit) is an older term for processor and microprocessor, the central unit in a computer containing the logic circuitry that performs the instructions of a computer's programs.
- NOTABLE TYPES
- - RISC (Reduced Instruction Set Computer)
- - Introduced in the mid-1980s
- - Requires few transistors
- - Capable of executing only a very limited set of instructions
- - CISC (Complex Instruction Set Computer)
- - Complex CPUs that had ever-larger sets of instructions
11RISC or CISC? The Great Controversy
- RISC proponents argue that RISC machines are both cheaper and faster, and are therefore the machines of the future.
- Skeptics note that by making the hardware simpler, RISC architectures put a greater burden on the software. They argue that this is not worth the trouble because conventional microprocessors are becoming increasingly fast and cheap anyway.
- The TRUTH!
- CISC and RISC implementations are becoming more and more alike. Many of today's RISC chips support as many instructions as yesterday's CISC chips. And today's CISC chips use many techniques formerly associated with RISC chips.
12Under the hood of a typical CPU
13What you need to know about a CPU
- Processing speed
- - The clock frequency is one measure of how fast a computer is (however, the length of time to carry out an operation depends not only on how fast the processor cycles, but on how many cycles are required to perform a given operation).
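As a rough worked example (the numbers here are illustrative, not from the slides): execution time is the number of cycles an operation needs divided by the clock frequency,

$$ t = \frac{\text{cycles per operation}}{f_{\text{clock}}}, \qquad \text{e.g. } \frac{3\ \text{cycles}}{33\ \text{MHz}} \approx 90\ \text{ns}. $$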
- Voltage requirement
- - Transistors (electronic switches) in the CPU require some voltage to trigger them.
- - In the pre-486DX66 days, everything was 5 volts
- - As chips got faster and power became a concern, designers dropped the chip voltage down to 3.3 volts (external voltage) and 2.9 V or 2.5 V core voltage
14More on Voltage Requirements
- Power consumption equates largely with heat generation, which is a primary enemy in achieving increased performance. Newer processors are larger and faster, and keeping them cool can be a major concern.
- Reducing power usage is a primary objective for the designers of notebook computers, since they run on batteries with a limited life. (They are also more sensitive to heat problems, since their components are crammed into such a small space.)
- Designers compensate by using lower-power semiconductor processes and by shrinking the circuit size and die size. Newer processors reduce voltage levels even more by using what is called a dual voltage, or split rail, design
15More on Dual Voltage Design
- A split rail processor uses two different voltages.
- The external or I/O voltage is higher, typically 3.3 V, for compatibility with the other chips on the motherboard.
- The internal or core voltage is lower, usually 2.5 to 2.9 volts. This design allows these lower-voltage CPUs to be used without requiring wholesale changes to motherboards, chipsets, etc.
16Power consumption versus speed of some processors
17MEMORY
- Computers have hierarchies of memories that may be classified according to function, capacity and response time.
- Function
- - "Reads" transfer information from the memory; "writes" transfer information to the memory
- - Random Access Memory (RAM) performs both reads and writes.
- - Read-Only Memory (ROM) contains information stored at the time of manufacture that can only be read.
- - Programmable Read-Only Memory (PROM) is ROM that can be written once at some point after manufacture.
- Capacity
- - bit: smallest unit of memory (value of 0 or 1)
- - byte: 8 bits
- - In modern computers, the total memory may range from, say, 16 MB in a small personal computer to several GB (gigabytes) in large supercomputers.
18More on memory
- Memory Response
- - Memory response is characterized by two different measures
- - Access Time (also termed response time or latency) defines how quickly the memory can respond to a read or write request.
- - Memory Cycle Time refers to the minimum period between two successive requests of the memory.
- - Access times vary from about 80 ns (ns = nanosecond = 10^-9 seconds) for chips in small personal computers to about 10 ns or less for the fastest chips in caches and buffers. For various reasons, the memory cycle time is more than the speed of the memory chips (i.e., the length of time between successive requests is more than the 80 ns speed of the chips in a small personal computer).
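A quick back-of-the-envelope illustration (assuming, for the sake of the example, a cycle time equal to the 80 ns figure above): the maximum request rate would be

$$ \frac{1}{80\ \text{ns}} = \frac{1}{80 \times 10^{-9}\ \text{s}} = 12.5 \times 10^{6}\ \text{requests per second}, $$

and since the real cycle time is longer than 80 ns, the achievable rate is lower still.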
19(No Transcript)
20The I/O BUS
- A computer transfers data from disk to CPU, from CPU to memory, or from memory to the display adapter, etc.
- To avoid having separate circuits between every pair of devices, the BUS is used.
- Definition
- - The bus is simply a common set of wires that connects all the computer devices and chips together
21Different functions for different wires of the bus
- Some of these wires are used to transmit data.
- Some send housekeeping signals, like the clock pulse. Some transmit a number (the "address") that identifies a particular device or memory location.
- Use of the address
- - The computer chips and devices watch the address wires and respond only when their identifying number (address) is transmitted; only then do they transfer data.
- Problem!
- - Starting with machines that used the 386 CPU, CPUs and memory ran faster than other I/O devices.
- Solution
- - Separate the CPU and memory from all the I/O. Today, memory is only added by plugging it into special sockets on the main computer board.
22Bus Speeds
- Multiple buses with different speeds are an option, or a single bus supporting different speeds is used.
- In a modern PC, there may be a half dozen different bus areas.
- There is certainly a "CPU area" that still contains the CPU, memory, and basic control logic.
- There is a "High Speed I/O Device" area that is either a VESA Local Bus (VLB) or a PCI bus.
23Some Bus Standards
- ISA (Industry Standard Architecture) bus
- In 1987, IBM introduced a new Micro Channel (MCA) bus
- The other vendors developed an extension of the older ISA interface called EISA
- VESA Local Bus (VLB), which became popular at the start of 1993
24More Bus Standards
- The PCI bus was developed by Intel
- PCI is a 64-bit interface in a 32-bit package
- The PCI bus runs at 33 MHz and can transfer 32 bits of data (four bytes) every clock tick.
- That sounds like a 32-bit bus! However, a clock tick at 33 MHz is 30 nanoseconds, and memory only has a speed of 70 nanoseconds. When the CPU fetches data from RAM, it has to wait at least three clock ticks for the data. By transferring data every clock tick, the PCI bus can deliver the same throughput on a 32-bit interface that other parts of the machine deliver through a 64-bit path.
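The arithmetic behind those figures (a quick check using the numbers above):

$$ \frac{1}{33\ \text{MHz}} \approx 30\ \text{ns per tick}, \qquad 33 \times 10^{6}\ \text{ticks/s} \times 4\ \text{bytes/tick} = 132\ \text{MB/s}. $$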
25Things to know about the I/O Bus
- Buses transfer information between parts of a computer. Smaller computers have a single bus; more advanced computers have complex interconnection strategies.
- Things to know about the bus
- - Transaction: unit of communication on the bus.
- - Bus Master: the module controlling the bus at a particular time.
- - Arbitration Protocol: set of signals exchanged to decide which of two competing modules will control the bus at a particular time.
- - Communication Protocol: algorithm used to transfer data on the bus.
- - Asynchronous Protocol: communication algorithm that can begin at any time; it requires overhead to notify receivers that a transfer is about to begin.
26Things to know about the bus, continued
- Synchronous Protocol: communication algorithm that can begin only at well-known times defined by a global clock.
- Transfer Time: time for data to be transferred over the bus in a single transaction.
- Bandwidth: data transfer capacity of a bus, usually expressed in bits per second (bps). Sometimes termed throughput.
- Bandwidth and Transfer Time measure related things, but bandwidth takes into account required overheads and is usually a more useful measure of the speed of the bus.
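One way to make that last point concrete (an illustrative formula, not from the slides): effective bandwidth divides the payload by the whole transaction time, overhead included,

$$ \text{effective bandwidth} = \frac{\text{bits transferred per transaction}}{t_{\text{overhead}} + t_{\text{transfer}}}, $$

so a bus with a fast transfer time but a slow arbitration protocol can still have low bandwidth.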
27Supercomputer Architecture
- Background
- Architecture
- Approaches
- Trends
- Challenges
28What is parallel computing?
- Use of multiple computers or processors working together on a common task.
- Each processor works on its section of the problem.
- Processors are allowed to exchange information with other processors.
29Why parallel computing?
- Limits of a single computer
- - Available memory
- - Performance
- Parallel computing allows us to
- - Solve problems that don't fit on a single computer
- - Solve problems that can't be solved in a reasonable time
30First Supercomputer
- 1976: the first supercomputer, the Cray-1
- It had a speed of tens of megaflops (one megaflop equals a million floating-point operations per second) and a memory capacity of 4 megabytes.
- Contributions from Los Alamos Lab and Seymour Cray
- Less than the average speed of a PC today
31Growing Speed
- The performance of the fastest computers has grown exponentially from 1945 to the present, averaging a factor of 10 every five years.
- From tens of floating-point operations per second at the start, the parallel computers of the mid-1990s achieved tens of billions of operations per second.
32Pipeline
- Pipelining: start performing an operation on one piece of data while finishing the same operation on another piece of data.
- An operation consists of multiple stages.
- After a set of operands completes a particular stage, it moves into the next stage.
- Then, another set of operands can move into the stage that was just vacated. (A small software sketch of this overlap follows.)
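A minimal sketch of the idea in C (a toy software model of a hypothetical 3-stage pipeline written for this summary, not real hardware): instruction i occupies stage t - i during cycle t, so the stages overlap.

```c
#include <stdio.h>

/* Toy model of a 3-stage pipeline (hypothetical stages F/D/E =
 * fetch, decode, execute) processing 5 instructions. */
int main(void) {
    const char *stage[] = {"F", "D", "E"};
    const int n = 5, depth = 3;
    /* In cycle t, instruction i is in stage (t - i) when 0 <= t - i < depth. */
    for (int t = 0; t < n + depth - 1; t++) {
        printf("cycle %d:", t + 1);
        for (int i = 0; i < n; i++) {
            int s = t - i;
            if (s >= 0 && s < depth)
                printf("  I%d:%s", i + 1, stage[s]);
        }
        printf("\n");
    }
    /* With overlap, 5 instructions finish in n + depth - 1 = 7 cycles
     * instead of the n * depth = 15 cycles needed one at a time. */
    return 0;
}
```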
33SuperPipeline
- Superpipeline: perform multiple pipelined operations at the same time.
- So, a superpipeline is a collection of multiple pipelines that can operate simultaneously.
- In other words, several different operations can execute simultaneously, and each of these operations can be broken into stages, each of which is filled all the time.
- So you can get multiple operations per CPU cycle.
- For example, an IBM Power4 can have over 200 different operations in flight at the same time.
34Sample of superpipeline design
35Drawbacks of pipeline architecture: Pipeline Hazards
- Structural hazards: attempt to use the same resource in two different ways at the same time
- - e.g., multiple memory accesses, multiple register writes
- - solutions: multiple memories, stretch the pipeline
- Control hazards: attempt to make a decision before the condition is evaluated
- - e.g., any conditional branch
- - solutions: prediction, delayed branch
- Data hazards: attempt to use an item before it is ready (see the sketch below)
- - solutions: forwarding/bypassing
36Memory
- In a shared memory system, there is one large virtual memory, and all processors have equal access to data and instructions in this memory.
37Memory, cont.
- In a distributed memory system, each processor has a local memory that is not accessible from any other processor.
38Difference between the two kinds of memory
- A software issue, not hardware
- The difference determines how different parts of a parallel program will communicate:
- - shared memory with semaphores, etc., or distributed memory with message passing (a minimal message-passing sketch follows)
- All problems can run efficiently on distributed memory, BUT software is easier to develop for shared memory
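A minimal message-passing sketch in C using MPI (MPI is one common message-passing library; this toy program is written for this summary, not taken from the slides, and assumes at least two processes, e.g. `mpirun -np 2 ./a.out`):

```c
#include <mpi.h>
#include <stdio.h>

/* Rank 0 sends one integer to rank 1; each rank has its own private
 * memory, so data moves between processors only via messages. */
int main(int argc, char **argv) {
    int rank, value = 42;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}
```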
39Cache Coherency
40Styles of parallel computing (Hardware Architecture)
- SISD: single instruction stream, single data stream
- SIMD: single instruction stream, multiple data streams
- MISD: multiple instruction streams, single data stream
- MIMD: multiple instruction streams, multiple data streams
41SISD
- Single Instruction, Single Data
42SIMD
- Single Instruction, Multiple Data
43MISD
- Multiple Instruction, Single Data
44MIMD
- Multiple Instruction, Multiple Data (simplest: program-controlled message passing)
45Two parallel processing approaches
- SMP: symmetric multiprocessing
- - SMP is the processing of programs by multiple processors that share a common operating system and memory
- MPP: massively parallel processing
- - MPP is the coordinated processing of a program by multiple processors that work on different parts of the program, with each processor using its own operating system and memory
46Current Trend
- OpenMP is an open standard for providing parallelization mechanisms on shared-memory multiprocessors.
- It targets C/C++ and FORTRAN, several of the most commonly used languages for writing parallel programs.
- Based on a thread paradigm (see the sketch below)
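A minimal OpenMP sketch in C (written for this summary, not taken from the slides; assumes a compiler with OpenMP support, e.g. `gcc -fopenmp`):

```c
#include <omp.h>
#include <stdio.h>

/* Threads split the loop iterations among themselves; the
 * reduction clause safely combines each thread's partial sum. */
int main(void) {
    double a[1000], sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < 1000; i++) {
        a[i] = i * 0.5;
        sum += a[i];
    }
    printf("sum = %f, max threads = %d\n", sum, omp_get_max_threads());
    return 0;
}
```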
47OpenMP execution model
48New Trend
- Clustering
- The widest definition
- - Any number of computers communicating at any distance
- The common definition
- - A relatively small number of computers (<1000) communicating over a relatively small distance (within the same room) and used as a single, shared computing resource
49Comparison
- Programming
- - A program written for cluster parallelism can run on an SMP right away
- - A program written for an SMP can NOT run on a cluster right away
- Scalability
- - Clusters are scalable
- - SMPs are NOT scalable above a small number of processors
50Comparison, cont.
- One big advantage of SMPs is the single system image
- - Easier administration and support
- - But, a single point of failure
- Cluster computing can be used for load balancing as well as for high availability
51General highlights from the Top 500
- The Earth Simulator, built by NEC, remains the unchallenged No. 1.
- 100 systems have peak performance above 1 TFlop/s, up from 70 systems 6 months ago.
- PC clusters are now present at all levels of performance.
- IBM is still leading the list with respect to installed performance, ahead of HP and NEC.
- Hewlett-Packard stays slightly ahead of IBM with respect to the number of systems installed (HP 137 and IBM 131).
52NEC Earth-Simulator/ 5120 from Japan
53Basic Idea/Component
- Environment research
- The Earth Simulator consists of 640 supercomputers that are connected by a high-speed network (data transfer speed 12.3 GBytes/s). Each supercomputer (1 node) contains eight vector processors with a peak performance of 8 GFlops each and a high-speed memory of 16 GBytes. The total number of processors is 5120 (8 x 640), which translates to a total of approximately 40 TFlops peak performance and a total main memory of 10 TeraBytes.
54Hewlett-Packard SuperDome supercomputer
55Terms to know
- flops: acronym for floating-point operations per second. For example, 15 Mflops equals 15 million floating-point arithmetic operations per second. It is a unit of measurement of the performance of a computer.
- LINPACK is a collection of Fortran subroutines that analyze and solve linear equations and linear least-squares problems.
- Rmax = 431.70 (maximal LINPACK performance achieved); Rpeak = 672.00 (theoretical peak)
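A derived figure worth noting (computed here for illustration from the two numbers above) is the efficiency, the ratio of achieved to theoretical performance:

$$ \frac{R_{\max}}{R_{\text{peak}}} = \frac{431.70}{672.00} \approx 0.64, $$

i.e., the LINPACK run achieves roughly 64% of the machine's theoretical peak.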
56Challenges
- Faster algorithms
- Good data locality
- Low communication requirement
- Efficient software
- High level problem solving environment
- Changes of architecture
57Reference
- power consumption of processors - http://www.macinfo.de/hardware/strom.html
- under the hood - http://www.kids-online.net/learn/clicknov/details/cpu.html
- Difference Machine and Charles Babbage - http://www.cbi.umn.edu/exhibits/cb.html
- John von Neumann - http://ei.cs.vt.edu/history/VonNeumann.html
- I/O - http://sophia.dtp.fmph.uniba.sk/pchardware/bus.html
- CPU memory - http://csep1.phy.ornl.gov/guidry/phys594/lectures/lectures.html
- memory - http://www.howstuffworks.com/computer-memory.htm
- general idea -
  http://www.ccs.uky.edu/douglas/Classes/cs521-s02/index.html
  http://www.ccs.uky.edu/douglas/Classes/cs521-s01/index.html
  http://www.ccs.uky.edu/douglas/Classes/cs521-s00/index.html
- voltage - http://www.hardwarecentral.com/hardwarecentral/tutorials/19/1/
- CSEP - http://www.ccs.uky.edu/csep/csep.html
- Top 500 - http://www.top500.org
- Cray Co. - http://www.cray.com/company/h_systems.html
- definition of terms - http://www.whatis.com