Title: Computer Architecture
1Computer Architecture
- Davies Muche
- Mike Li Luo
- CS521 Spring 2003
2What is a digital computer?
- A digital computer is a machine composed of the following three basic components:
- - Input/Output
- - Central Processing Unit (CPU)
- - Memory
3Early Computers
- As early as the 1600s, calculating machines that could do arithmetic operations had been built, but none had the three basic components of a digital computer
- In 1823, Charles Babbage undertook the design of the Difference Engine
- The machine was to solve 6th-degree polynomials to 20-digit accuracy
4- The concepts of mechanical control and mechanical calculation were put together into a machine that had the basic parts of a digital computer
- He was given 17,000 pounds to construct the machine, but the project was abandoned in 1842 (uncompleted)
- In 1856, Babbage conceived the idea of the Analytical Engine (after his death, his son Henry tried to build it but never succeeded)
- In 1854, George Scheutz built a working difference machine based on Babbage's design. (This machine printed mathematical, astronomical and actuarial tables with unprecedented accuracy, and was used by the British and American governments)
5Between 1847 and 1849, Babbage designed the Difference Engine No. 2. He did not build it.
(Image: Difference Engine No. 1)
6- However, in 1834, Charles Babbage developed a hypothetical program to solve simultaneous equations on the Analytical Engine
7- The John von Neumann architecture (1940s) consists of five major components
8- A refinement of the von Neumann model, the system
bus model has a CPU (ALU and control), memory,
and an input/output unit
9(No Transcript)
10The CPU
- CPU (central processing unit) is an older term for processor and microprocessor, the central unit in a computer containing the logic circuitry that performs the instructions of a computer's programs.
- NOTABLE TYPES
- - RISC (Reduced Instruction Set Computer)
- - Introduced in the mid-1980s
- - Requires few transistors
- - Capable of executing only a very limited set of instructions
- - CISC (Complex Instruction Set Computer)
- - Complex CPUs that had ever-larger sets of instructions
11RISC or CISC? The Great Controversy
- RISC proponents argue that RISC machines are both cheaper and faster, and are therefore the machines of the future.
- Skeptics note that by making the hardware simpler, RISC architectures put a greater burden on the software. They argue that this is not worth the trouble because conventional microprocessors are becoming increasingly fast and cheap anyway.
- The TRUTH!
- CISC and RISC implementations are becoming more and more alike. Many of today's RISC chips support as many instructions as yesterday's CISC chips. And today's CISC chips use many techniques formerly associated with RISC chips.
12Under the hood of a typical CPU
13What you need to know about a CPU
- Processing speed
- - The clock frequency is one measure of how fast a computer is (however, the length of time to carry out an operation depends not only on how fast the processor cycles, but on how many cycles are required to perform a given operation).
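As a rough worked example (the numbers here are illustrative, not from the slides): execution time is the number of cycles an operation needs divided by the clock frequency,

$$ t = \frac{\text{cycles per operation}}{f_{\text{clock}}}, \qquad \text{e.g. } \frac{3\ \text{cycles}}{33\ \text{MHz}} \approx 90\ \text{ns}. $$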
- Voltage requirement
- - Transistors (electronic switches) in the CPU require some voltage to trigger them.
- - In the pre-486DX66 days, everything was 5 volts
- - As chips got faster and power became a concern, designers dropped the chip voltage down to 3.3 volts (external voltage) and 2.9 V or 2.5 V core voltage
14More on Voltage Requirements
- Power consumption equates largely with heat generation, which is a primary enemy in achieving increased performance. Newer processors are larger and faster, and keeping them cool can be a major concern.
- Reducing power usage is a primary objective for the designers of notebook computers, since they run on batteries with a limited life. (They are also more sensitive to heat problems, since their components are crammed into such a small space.)
- Designers compensate by using lower-power semiconductor processes and by shrinking the circuit size and die size. Newer processors reduce voltage levels even more by using what is called a dual voltage, or split rail, design
15More on Dual Voltage Design
- A split rail processor uses two different voltages.
- The external or I/O voltage is higher, typically 3.3 V, for compatibility with the other chips on the motherboard.
- The internal or core voltage is lower, usually 2.5 to 2.9 volts. This design allows these lower-voltage CPUs to be used without requiring wholesale changes to motherboards, chipsets, etc.
16Power consumption versus speed of some processors
17MEMORY
- Computers have hierarchies of memories that may be classified according to function, capacity and response time.
- Function
- - "Reads" transfer information from the memory; "writes" transfer information to the memory
- - Random Access Memory (RAM) performs both reads and writes.
- - Read-Only Memory (ROM) contains information stored at the time of manufacture that can only be read.
- - Programmable Read-Only Memory (PROM) is ROM that can be written once at some point after manufacture.
- Capacity
- - bit: smallest unit of memory (value of 0 or 1)
- - byte: 8 bits
- - In modern computers, the total memory may range from, say, 16 MB in a small personal computer to several GB (gigabytes) in large supercomputers.
18More on memory
- Memory Response
- - Memory response is characterized by two different measures
- - Access Time (also termed response time or latency) defines how quickly the memory can respond to a read or write request.
- - Memory Cycle Time refers to the minimum period between two successive requests of the memory.
- - Access times vary from about 80 ns (ns = nanosecond = 10^-9 seconds) for chips in small personal computers to about 10 ns or less for the fastest chips in caches and buffers. For various reasons, the memory cycle time is more than the speed of the memory chips (i.e., the length of time between successive requests is more than the 80 ns speed of the chips in a small personal computer).
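A quick back-of-the-envelope illustration (assuming, for the sake of the example, a cycle time equal to the 80 ns figure above): the maximum request rate would be

$$ \frac{1}{80\ \text{ns}} = \frac{1}{80 \times 10^{-9}\ \text{s}} = 12.5 \times 10^{6}\ \text{requests per second}, $$

and since the real cycle time is longer than 80 ns, the achievable rate is lower still.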
19(No Transcript)
20The I/O BUS
- A computer transfers data from disk to CPU, from CPU to memory, or from memory to the display adapter, etc.
- To avoid having separate circuits between every pair of devices, the BUS is used.
- Definition
- - The bus is simply a common set of wires that connects all the computer devices and chips together
21Different functions for different wires of the bus
- Some of these wires are used to transmit data.
- Some send housekeeping signals, like the clock pulse. Some transmit a number (the "address") that identifies a particular device or memory location.
- Use of the address
- - The computer chips and devices watch the address wires and respond only when their identifying number (address) is transmitted; only then do they transfer data.
- Problem!
- - Starting with machines that used the 386 CPU, CPUs and memory ran faster than other I/O devices.
- Solution
- - Separate the CPU and memory from all the I/O. Today, memory is only added by plugging it into special sockets on the main computer board.
22Bus Speeds
- Multiple buses with different speeds are an option, or a single bus supporting different speeds is used.
- In a modern PC, there may be a half dozen different bus areas.
- There is certainly a "CPU area" that still contains the CPU, memory, and basic control logic.
- There is a "High Speed I/O Device" area that is either a VESA Local Bus (VLB) or a PCI bus.
23Some Bus Standards
- ISA (Industry Standard Architecture) bus
- In 1987, IBM introduced a new Micro Channel (MCA) bus
- The other vendors developed an extension of the older ISA interface called EISA
- VESA Local Bus (VLB), which became popular at the start of 1993
24More Bus Standards
- The PCI bus was developed by Intel
- PCI is a 64-bit interface in a 32-bit package
- The PCI bus runs at 33 MHz and can transfer 32 bits of data (four bytes) every clock tick.
- That sounds like a 32-bit bus! However, a clock tick at 33 MHz is 30 nanoseconds, and memory only has a speed of 70 nanoseconds. When the CPU fetches data from RAM, it has to wait at least three clock ticks for the data. By transferring data every clock tick, the PCI bus can deliver the same throughput on a 32-bit interface that other parts of the machine deliver through a 64-bit path.
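The arithmetic behind those figures (a quick check using the numbers above):

$$ \frac{1}{33\ \text{MHz}} \approx 30\ \text{ns per tick}, \qquad 33 \times 10^{6}\ \text{ticks/s} \times 4\ \text{bytes/tick} = 132\ \text{MB/s}. $$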
25Things to know about the I/O Bus
- Buses transfer information between parts of a computer. Smaller computers have a single bus; more advanced computers have complex interconnection strategies.
- Things to know about the bus
- - Transaction: unit of communication on the bus.
- - Bus Master: the module controlling the bus at a particular time.
- - Arbitration Protocol: set of signals exchanged to decide which of two competing modules will control the bus at a particular time.
- - Communication Protocol: algorithm used to transfer data on the bus.
- - Asynchronous Protocol: communication algorithm that can begin at any time; it requires overhead to notify receivers that a transfer is about to begin.
26Things to know about the bus, continued
- Synchronous Protocol: communication algorithm that can begin only at well-known times defined by a global clock.
- Transfer Time: time for data to be transferred over the bus in a single transaction.
- Bandwidth: data transfer capacity of a bus, usually expressed in bits per second (bps). Sometimes termed throughput.
- Bandwidth and Transfer Time measure related things, but bandwidth takes into account required overheads and is usually a more useful measure of the speed of the bus.
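One way to make that last point concrete (an illustrative formula, not from the slides): effective bandwidth divides the payload by the whole transaction time, overhead included,

$$ \text{effective bandwidth} = \frac{\text{bits transferred per transaction}}{t_{\text{overhead}} + t_{\text{transfer}}}, $$

so a bus with a fast transfer time but a slow arbitration protocol can still have low bandwidth.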
27Supercomputer Architecture
- Background
- Architecture
- Approaches
- Trends
- Challenges
28What is parallel computing?
- Use of multiple computers or processors working together on a common task.
- Each processor works on its section of the problem.
- Processors are allowed to exchange information with other processors.
29Why parallel computing?
- Limits of a single computer
- - Available memory
- - Performance
- Parallel computing allows us to
- - Solve problems that don't fit on a single computer
- - Solve problems that can't be solved in a reasonable time
30First Supercomputer
- 1976: the first supercomputer, the Cray-1
- It had a speed of tens of megaflops (one megaflop equals a million floating-point operations per second) and a memory capacity of 4 megabytes.
- Contributions from Los Alamos Lab and Seymour Cray
- Less than the average speed of a PC today
31Growing Speed
- The performance of the fastest computers has grown exponentially from 1945 to the present, averaging a factor of 10 every five years.
- From tens of floating-point operations per second at the start, the parallel computers of the mid-1990s achieved tens of billions of operations per second.
32Pipeline
- Pipelining: start performing an operation on one piece of data while finishing the same operation on another piece of data.
- An operation consists of multiple stages.
- After a set of operands completes a particular stage, it moves into the next stage.
- Then, another set of operands can move into the stage that was just vacated. (A small software sketch of this overlap follows.)
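A minimal sketch of the idea in C (a toy software model of a hypothetical 3-stage pipeline written for this summary, not real hardware): instruction i occupies stage t - i during cycle t, so the stages overlap.

```c
#include <stdio.h>

/* Toy model of a 3-stage pipeline (hypothetical stages F/D/E =
 * fetch, decode, execute) processing 5 instructions. */
int main(void) {
    const char *stage[] = {"F", "D", "E"};
    const int n = 5, depth = 3;
    /* In cycle t, instruction i is in stage (t - i) when 0 <= t - i < depth. */
    for (int t = 0; t < n + depth - 1; t++) {
        printf("cycle %d:", t + 1);
        for (int i = 0; i < n; i++) {
            int s = t - i;
            if (s >= 0 && s < depth)
                printf("  I%d:%s", i + 1, stage[s]);
        }
        printf("\n");
    }
    /* With overlap, 5 instructions finish in n + depth - 1 = 7 cycles
     * instead of the n * depth = 15 cycles needed one at a time. */
    return 0;
}
```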
33SuperPipeline
- Superpipeline: perform multiple pipelined operations at the same time.
- So, a superpipeline is a collection of multiple pipelines that can operate simultaneously.
- In other words, several different operations can execute simultaneously, and each of these operations can be broken into stages, each of which is filled all the time.
- So you can get multiple operations per CPU cycle.
- For example, an IBM Power4 can have over 200 different operations in flight at the same time.
34Sample of superpipeline design
35Drawbacks of pipeline architecture: Pipeline Hazards
- Structural hazards: attempt to use the same resource in two different ways at the same time
- - e.g., multiple memory accesses, multiple register writes
- - solutions: multiple memories, stretch the pipeline
- Control hazards: attempt to make a decision before the condition is evaluated
- - e.g., any conditional branch
- - solutions: prediction, delayed branch
- Data hazards: attempt to use an item before it is ready (see the sketch below)
- - solutions: forwarding/bypassing
36Memory
- In a shared memory system, there is one large virtual memory, and all processors have equal access to data and instructions in this memory.
37Memory, cont.
- In a distributed memory system, each processor has a local memory that is not accessible from any other processor.
38Difference between the two kinds of memory
- A software issue, not hardware
- The difference determines how different parts of a parallel program will communicate:
- - shared memory with semaphores, etc., or distributed memory with message passing (a minimal message-passing sketch follows)
- All problems can run efficiently on distributed memory, BUT software is easier to develop for shared memory
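A minimal message-passing sketch in C using MPI (MPI is one common message-passing library; this toy program is written for this summary, not taken from the slides, and assumes at least two processes, e.g. `mpirun -np 2 ./a.out`):

```c
#include <mpi.h>
#include <stdio.h>

/* Rank 0 sends one integer to rank 1; each rank has its own private
 * memory, so data moves between processors only via messages. */
int main(int argc, char **argv) {
    int rank, value = 42;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}
```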
39Cache Coherency
40Styles of parallel computing (Hardware Architecture)
- SISD: single instruction stream, single data stream
- SIMD: single instruction stream, multiple data streams
- MISD: multiple instruction streams, single data stream
- MIMD: multiple instruction streams, multiple data streams
41SISD
- Single Instruction, Single Data
42SIMD
- Single Instruction, Multiple Data
43MISD
- Multiple Instruction, Single Data
44MIMD
- Multiple Instruction, Multiple Data (simplest: program-controlled message passing)
45Two parallel processing approaches
- SMP: symmetric multiprocessing
- - SMP is the processing of programs by multiple processors that share a common operating system and memory
- MPP: massively parallel processing
- - MPP is the coordinated processing of a program by multiple processors that work on different parts of the program, with each processor using its own operating system and memory
46Current Trend
- OpenMP is an open standard for providing parallelization mechanisms on shared-memory multiprocessors.
- It targets C/C++ and FORTRAN, several of the most commonly used languages for writing parallel programs.
- Based on a thread paradigm (see the sketch below)
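A minimal OpenMP sketch in C (written for this summary, not taken from the slides; assumes a compiler with OpenMP support, e.g. `gcc -fopenmp`):

```c
#include <omp.h>
#include <stdio.h>

/* Threads split the loop iterations among themselves; the
 * reduction clause safely combines each thread's partial sum. */
int main(void) {
    double a[1000], sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < 1000; i++) {
        a[i] = i * 0.5;
        sum += a[i];
    }
    printf("sum = %f, max threads = %d\n", sum, omp_get_max_threads());
    return 0;
}
```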
47OpenMP execution model
48New Trend
- Clustering
- The widest definition
- - Any number of computers communicating at any distance
- The common definition
- - A relatively small number of computers (<1000) communicating over a relatively small distance (within the same room) and used as a single, shared computing resource
49Comparison
- Programming
- - A program written for cluster parallelism can run on an SMP right away
- - A program written for an SMP can NOT run on a cluster right away
- Scalability
- - Clusters are scalable
- - SMPs are NOT scalable above a small number of processors
50Comparison, cont.
- One big advantage of SMPs is the single system image
- - Easier administration and support
- - But, a single point of failure
- Cluster computing can be used for load balancing as well as for high availability
51General highlights from the Top 500
- The Earth Simulator, built by NEC, remains the unchallenged No. 1.
- 100 systems have peak performance above 1 TFlop/s, up from 70 systems 6 months ago.
- PC clusters are now present at all levels of performance.
- IBM is still leading the list with respect to installed performance, ahead of HP and NEC.
- Hewlett-Packard stays slightly ahead of IBM with respect to the number of systems installed (HP 137 and IBM 131).
52NEC Earth-Simulator/ 5120 from Japan
53Basic Idea/Component
- Environment research
- The Earth Simulator consists of 640 supercomputers that are connected by a high-speed network (data transfer speed 12.3 GBytes/s). Each supercomputer (1 node) contains eight vector processors with a peak performance of 8 GFlops each and a high-speed memory of 16 GBytes. The total number of processors is 5120 (8 x 640), which translates to a total of approximately 40 TFlops peak performance and a total main memory of 10 TeraBytes.
54Hewlett-Packard SuperDome supercomputer
55Terms to know
- flops: acronym for floating-point operations per second. For example, 15 Mflops equals 15 million floating-point arithmetic operations per second. It is a unit of measurement of the performance of a computer.
- LINPACK is a collection of Fortran subroutines that analyze and solve linear equations and linear least-squares problems.
- Rmax = 431.70 (maximal LINPACK performance achieved); Rpeak = 672.00 (theoretical peak)
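A derived figure worth noting (computed here for illustration from the two numbers above) is the efficiency, the ratio of achieved to theoretical performance:

$$ \frac{R_{\max}}{R_{\text{peak}}} = \frac{431.70}{672.00} \approx 0.64, $$

i.e., the LINPACK run achieves roughly 64% of the machine's theoretical peak.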
56Challenges
- Faster algorithms
- Good data locality
- Low communication requirement
- Efficient software
- High level problem solving environment
- Changes of architecture
57Reference
- power consumption of processors - http://www.macinfo.de/hardware/strom.html
- under the hood - http://www.kids-online.net/learn/clicknov/details/cpu.html
- Difference Machine and Charles Babbage - http://www.cbi.umn.edu/exhibits/cb.html
- John von Neumann - http://ei.cs.vt.edu/history/VonNeumann.html
- I/O - http://sophia.dtp.fmph.uniba.sk/pchardware/bus.html
- CPU memory - http://csep1.phy.ornl.gov/guidry/phys594/lectures/lectures.html
- memory - http://www.howstuffworks.com/computer-memory.htm
- general idea -
  http://www.ccs.uky.edu/douglas/Classes/cs521-s02/index.html
  http://www.ccs.uky.edu/douglas/Classes/cs521-s01/index.html
  http://www.ccs.uky.edu/douglas/Classes/cs521-s00/index.html
- voltage - http://www.hardwarecentral.com/hardwarecentral/tutorials/19/1/
- CSEP - http://www.ccs.uky.edu/csep/csep.html
- Top 500 - http://www.top500.org
- Cray Co. - http://www.cray.com/company/h_systems.html
- definition of terms - http://www.whatis.com