Introduction to Hardware/Architecture - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Introduction to Hardware/Architecture

1
Introduction to Hardware/Architecture
  • David A. Patterson

http://cs.berkeley.edu/~patterson/talks
{patterson,kkeeton}@cs.berkeley.edu
EECS, University of California
Berkeley, CA 94720-1776
2
What is a Computer System?
(Layer diagram, top to bottom:)

  Application (Netscape)
  Operating System (Windows 98)          |
  Compiler / Assembler                   | software
  Instruction Set Architecture
  Processor / Memory / I/O system        |
  Datapath & Control                     | hardware
  Digital Design                         |
  Circuit Design                         |
  Transistors
  • Coordination of many levels of abstraction

3
Levels of Representation
High Level Language Program (e.g., C):

  temp = v[k];
  v[k] = v[k+1];
  v[k+1] = temp;

  -- Compiler -->

Assembly Language Program (e.g., MIPS):

  lw  $t0, 0($2)
  lw  $t1, 4($2)
  sw  $t1, 0($2)
  sw  $t0, 4($2)

  -- Assembler -->

Machine Language Program (MIPS):

  0000 1001 1100 0110 1010 1111 0101 1000
  1010 1111 0101 1000 0000 1001 1100 0110
  1100 0110 1010 1111 0101 1000 0000 1001
  0101 1000 0000 1001 1100 0110 1010 1111

  -- Machine Interpretation -->

Control Signal Specification

4
The Instruction Set: a Critical Interface
software
instruction set
hardware
5
Instruction Set Architecture (subset of Computer
Arch.)
  • "... the attributes of a computing system as
    seen by the programmer, i.e., the conceptual
    structure and functional behavior, as distinct
    from the organization of the data flows and
    controls, the logic design, and the physical
    implementation." -- Amdahl, Blaauw, and
    Brooks, 1964

  -- Organization of Programmable Storage
  -- Data Types & Data Structures: Encodings & Representations
  -- Instruction Set
  -- Instruction Formats
  -- Modes of Addressing and Accessing Data Items and Instructions
  -- Exceptional Conditions
6
Anatomy: 5 components of any Computer

(Diagram: Personal Computer)
  Input:  Keyboard, Mouse
  Output: Display, Printer
  Computer:
    Processor (active): Control (brain), Datapath (brawn)
    Memory (passive): where programs and data live when running
    Devices: Disk (where programs and data live when not running)

Processor is often called (IBMese) the CPU, for "Central Processor Unit"
7
Technology Trends: Microprocessor Capacity

  Alpha 21264: 15 million transistors
  Pentium Pro: 5.5 million
  PowerPC 620: 6.9 million
  Alpha 21164: 9.3 million
  Sparc Ultra: 5.2 million

Moore's Law: 2X transistors/chip every 1.5 years
8
Technology Trends: Processor Performance

  1.54X/yr

The processor performance increase per year is mistakenly
referred to as Moore's Law (which is about transistors/chip)
9
Computer Technology => Dramatic Change
  • Processor
  • 2X in speed every 1.5 years; 1000X performance in
    last 15 years
  • Memory
  • DRAM capacity: 2X / 1.5 years; 1000X size in last
    15 years
  • Cost per bit: improves about 25% per year
  • Disk
  • capacity: > 2X in size every 1.5 years
  • Cost per bit: improves about 60% per year
  • 120X size in last decade
  • State-of-the-art PC "when you graduate"
    (1997-2001)
  • Processor clock speed: 1500 MegaHertz (1.5
    GigaHertz)
  • Memory capacity: 500 MegaBytes (0.5 GigaBytes)
  • Disk capacity: 100 GigaBytes (0.1 TeraBytes)
  • New units! Mega => Giga, Giga => Tera

10
Integrated Circuit Costs

  Die cost = Wafer cost / (Dies per wafer x Die yield)

  Die yield = Wafer yield x (1 + Defect density x Die area / alpha)^(-alpha)

Die cost grows roughly with the cube of the die area: fewer dies
fit per wafer, and yield worsens with die area. A sketch of this
cost model appears below.
11
Die Yield (1993 data)

Raw Dies Per Wafer:
  wafer diameter \ die area (mm2):  100  144  196  256  324  400
  6"/15cm                           139   90   62   44   32   23
  8"/20cm                           265  177  124   90   68   52
  10"/25cm                          431  290  206  153  116   90
  die yield                         23%  19%  16%  12%  11%  10%

(typical CMOS process: alpha = 2, wafer yield = 90%,
 defect density = 2/cm2, 4 test sites/wafer)

Good Dies Per Wafer (Before Testing!):
  6"/15cm                            31   16    9    5    3    2
  8"/20cm                            59   32   19   11    7    5
  10"/25cm                           96   53   32   20   13    9

(typical cost of an 8", 4-metal-layer, 0.5um CMOS wafer: $2000)
12
1993 Real World Examples
  Chip         Metal   Line   Wafer  Defect  Area   Dies/  Yield  Die
               layers  width  cost   /cm2    (mm2)  wafer         cost
  386DX        2       0.90   $900   1.0      43    360    71%    $4
  486DX2       3       0.80   $1200  1.0      81    181    54%    $12
  PowerPC 601  4       0.80   $1700  1.3     121    115    28%    $53
  HP PA 7100   3       0.80   $1300  1.0     196     66    27%    $73
  DEC Alpha    3       0.70   $1500  1.2     234     53    19%    $149
  SuperSPARC   3       0.70   $1700  1.6     256     48    13%    $272
  Pentium      3       0.80   $1500  1.5     296     40     9%    $417

  From "Estimating IC Manufacturing Costs," by Linley Gwennap,
  Microprocessor Report, August 2, 1993, p. 15

13
Other Costs

  IC cost = (Die cost + Testing cost + Packaging cost) / Final test yield

  • Packaging cost depends on pins, heat dissipation

  Chip         Die cost  Pins  Package  Package  Test &    Total
                               type     cost     Assembly
  386DX        $4        132   QFP      $1       $4        $9
  486DX2       $12       168   PGA      $11      $12       $35
  PowerPC 601  $53       304   QFP      $3       $21       $77
  HP PA 7100   $73       504   PGA      $35      $16       $124
  DEC Alpha    $149      431   PGA      $30      $23       $202
  SuperSPARC   $272      293   PGA      $20      $34       $326
  Pentium      $417      273   PGA      $19      $37       $473

A worked example of the formula follows the table.
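A worked instance of the formula using the PowerPC 601 row; the slide
does not give final test yield, so 100% is assumed here (which
reproduces the $77 total):

  #include <stdio.h>

  int main(void) {
      double die_cost = 53, packaging_cost = 3, test_assembly_cost = 21;
      double final_test_yield = 1.0;   /* assumption, not from the slide */
      double ic_cost = (die_cost + test_assembly_cost + packaging_cost)
                       / final_test_yield;
      printf("IC cost = $%.0f\n", ic_cost);  /* $77, matching the table */
      return 0;
  }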
14
System Cost: 1995-96 Workstation
  • System       Subsystem              % of total cost
  • Cabinet:     Sheet metal, plastic   1%
                 Power supply, fans     2%
                 Cables, nuts, bolts    1%
                 (Subtotal)             (4%)
  • Motherboard: Processor              6%
                 DRAM (64MB)            36%
                 Video system           14%
                 I/O system             3%
                 Printed Circuit board  1%
                 (Subtotal)             (60%)
  • I/O Devices: Keyboard, mouse        1%
                 Monitor                22%
                 Hard disk (1 GB)       7%
                 Tape drive (DAT)       6%
                 (Subtotal)             (36%)

15
COST v. PRICE
(Figure: price buildup for a workstation/PC, as fractions of list price)

  Component cost   (25-31%)  -- input: chips, displays, ...
  Direct costs     (8-10%)   -- making it: labor, scrap, returns, ...
  Gross margin     (33-14%)  -- overhead: R&D, rent, marketing, profits, ...
  = avg. selling price
  Average discount (33-45%)  -- commission: channel profit, volume discounts, ...
  = list price

Q: What % of company income goes to Research and Development (R&D)?
16
Outline
  • Review of Five Technologies: Processor, Memory,
    Disk, Network, Systems
  • Description / History / Performance Model
  • State of the Art / Trends / Limits / Innovation
  • Common Themes across Technologies
  • Performance: per access (latency), per byte
    (bandwidth)
  • Fast: Capacity, BW, Cost; Slow: Latency,
    Interfaces
  • Moore's Law affecting all chips in system

17
Processor Trends / History
  • Microprocessor: main CPU of all computers
  • < 1986: 35%/yr. performance increase
    (2X/2.3yr)
  • > 1987 (RISC): 60%/yr. performance increase
    (2X/1.5yr)
  • Cost fixed at $500/chip, power whatever can cool
  • History of innovations to 2X / 1.5 yr:
  • Pipelining (helps seconds / clock, or clock rate)
  • Out-of-Order Execution (helps clocks /
    instruction)
  • Superscalar (helps clocks / instruction)
  • Multilevel Caches (helps clocks / instruction)

18
Pipelining is Natural!
  • Laundry Example
  • Ann, Brian, Cathy, Dave each have one load of
    clothes to wash, dry, fold, and put away
  • Washer takes 30 minutes
  • Dryer takes 30 minutes
  • Folder takes 30 minutes
  • Stasher takes 30 minutes to put clothes into
    drawers

(Figure: loads A, B, C, D)
19
Sequential Laundry
(Figure: time axis from 6 PM to 2 AM in 30-minute steps; loads A-D
 each pass through washer, dryer, folder, stasher strictly in sequence)
  • Sequential laundry takes 8 hours for 4 loads

20
Pipelined Laundry Start work ASAP
(Figure: same time axis; loads A-D overlap in the wash/dry/fold/stash
 stages, with a new load starting every 30 minutes)
  • Pipelined laundry takes 3.5 hours for 4 loads!

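To make the timing arithmetic concrete, here is a minimal sketch (the
formula and variable names are mine, not the slide's) of why 4 loads
take 8 hours sequentially but 3.5 hours pipelined:

  /* With S equal-length stages and N tasks:
     sequential time = N * S * stage_time
     pipelined time  = (S + N - 1) * stage_time  */
  #include <stdio.h>

  int main(void) {
      int stages = 4, loads = 4, stage_minutes = 30;
      int sequential = loads * stages * stage_minutes;       /* 480 min = 8 hours   */
      int pipelined  = (stages + loads - 1) * stage_minutes; /* 210 min = 3.5 hours */
      printf("sequential: %d min, pipelined: %d min\n", sequential, pipelined);
      return 0;
  }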
21
Pipeline Hazard: Stall

(Figure: same time axis; tasks A, B, C, E, F; a bubble opens in the
 pipeline while the folder is tied up)
  • A depends on D; stall since folder tied up

22
Out-of-Order Laundry: Don't Wait

(Figure: same time axis; tasks A-F, with later independent loads
 flowing around the stalled one)
  • A depends on D; rest continue; need more
    resources to allow out-of-order execution

23
Superscalar Laundry: Parallel per stage

(Figure: same time axis; duplicated washers/dryers/folders run
 several loads in parallel at each stage)
  • More resources, HW to match mix of parallel tasks?

24
Superscalar Laundry: Mismatch Mix

(Figure: same time axis; a light-clothing, dark-clothing,
 light-clothing task mix leaves some of the parallel hardware idle)
  • Task mix underutilizes extra resources

25
State of the Art: Alpha 21264
  • 15M transistors
  • 2 64KB caches on chip; 16MB L2 cache off chip
  • Clock < 1.7 nsec, or > 600 MHz (fastest Cray
    supercomputer: T90, 2.2 nsec)
  • 90 watts
  • Superscalar: fetches up to 6 instructions/clock
    cycle, retires up to 4 instructions/clock cycle
  • Execution out-of-order

26
Today's Situation: Microprocessor

  MIPS MPUs               R5000     R10000        R10k/R5k
  Clock Rate              200 MHz   195 MHz       1.0x
  On-Chip Caches          32K/32K   32K/32K       1.0x
  Instructions/Cycle      1 (+FP)   4             4.0x
  Pipe stages             5         5-7           1.2x
  Model                   In-order  Out-of-order  ---
  Die Size (mm2)          84        298           3.5x
    without cache, TLB    32        205           6.3x
  Development (man-yrs.)  60        300           5.0x
  SPECint_base95          5.7       8.8           1.6x

27
Memory History/Trends/State of Art
  • DRAM: main memory of all computers
  • Commodity chip industry: no company > 20% share
  • Packaged in SIMM or DIMM (e.g., 16 DRAMs/SIMM)
  • State of the Art: $152, 128 MB DIMM (16 64-Mbit
    DRAMs), 10 ns x 64b (800 MB/sec)
  • Capacity: 4X/3 yrs (60%/yr.)
  • Moore's Law
  • MB/$: +25%/yr.
  • Latency: -7%/year; Bandwidth: +20%/yr. (so far)

source: www.pricewatch.com, 5/21/98
28
Memory Summary
  • DRAM: rapid improvements in capacity, MB/$,
    bandwidth; slow improvement in latency
  • Processor-memory interface (cache + memory bus) is
    bottleneck to delivered bandwidth
  • Like network, memory protocol is major overhead

29
Processor Innovations/Limits
  • Low cost, low power: embedded processors
  • Lots of competition, innovation
  • Integer perf. of embedded proc. ~1/2 of desktop
    processor
  • StrongARM 110: 233 MHz, 268 MIPS, 0.36W typ.,
    $49
  • Very Long Instruction Word (Intel/HP
    IA-64/Merced)
  • multiple ops/instruction, compiler controls
    parallelism
  • Consolidation of desktop industry? Innovation?

(Figure: desktop ISAs consolidating: x86, SPARC, Alpha, PowerPC,
 MIPS, PA-RISC => IA-64?)
30
Processor Summary
  • SPEC performance doubling / 18 months
  • Growing CPU-DRAM performance gap & tax
  • Running out of ideas, competition? Back to 2X /
    2.3 yrs?
  • Processor tricks not as useful for transactions?
  • Clock rate increase compensated by CPI increase?
  • When > 100 MIPS on TPC-C?
  • Cost fixed at $500/chip, power whatever can cool
  • Embedded processors promising
  • 1/10 cost, 1/100 power, 1/2 integer performance?

31
Processor Limit: DRAM Gap
  • Alpha 21264 full cache miss, measured in instructions
    executed: 180 ns / 1.7 ns = 108 clks x 4, or 432
    instructions
  • Caches in Pentium Pro: 64% of area, 88% of transistors

32
The Goal: Illusion of large, fast, cheap memory
  • Fact: large memories are slow; fast memories are
    small
  • How do we create a memory that is large, cheap,
    and fast (most of the time)?
  • Hierarchy of Levels
  • Similar to Principle of Abstraction: hide
    details of multiple levels

33
Hierarchy Analogy: Term Paper in Library
  • Working on paper in library at a desk
  • Option 1: Every time you need a book:
  • Leave desk to go to shelves (or stacks)
  • Find the book
  • Bring one book back to desk
  • Read section interested in
  • When done with section, leave desk and go to
    shelves carrying book
  • Put the book back on shelf
  • Return to desk to work
  • Next time need a book, go to first step

34
Memory Hierarchy Analogy: Library
  • Option 2: Every time you need a book:
  • Leave some books on desk after fetching them
  • Only go to shelves when need a new book
  • When go to shelves, bring back related books in
    case you need them; sometimes you'll need to
    return books not used recently to make space for
    new books on desk
  • Return to desk to work
  • When done, replace books on shelves, carrying as
    many as you can per trip
  • Illusion: whole library on your desktop
  • Buzzword: "cache", from French for hidden treasure

35
Why Hierarchy Works: Natural Locality
  • The Principle of Locality:
  • Programs access a relatively small portion of the
    address space at any instant of time.
  • What programming constructs lead to the Principle of
    Locality?

36
Memory Hierarchy: How Does it Work?
  • Temporal Locality (Locality in Time)
  • => Keep most recently accessed data items closer
    to the processor
  • Library Analogy: recently read books are kept on
    desk
  • Block is unit of transfer (like book)
  • Spatial Locality (Locality in Space)
  • => Move blocks consisting of contiguous words to the
    upper levels
  • Library Analogy: bring back nearby books on
    shelves when fetching a book; hope that you might
    need them later for your paper
  • Both kinds of locality show up in ordinary loops, as
    in the sketch below

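A hedged C illustration (mine, not from the slides) of both kinds of
locality in an ordinary loop:

  #include <stdio.h>

  int main(void) {
      int v[1024];
      for (int k = 0; k < 1024; k++) v[k] = k;

      int sum = 0;
      for (int k = 0; k < 1024; k++) {
          /* Spatial locality: v[k] and v[k+1] sit in contiguous words,
             so a cached block fetched for v[k] also serves neighbors. */
          sum += v[k];
          /* Temporal locality: sum and k are reused every iteration,
             so they stay in the fastest levels (registers/L1). */
      }
      printf("%d\n", sum);
      return 0;
  }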
37
Memory Hierarchy Pyramid
  • Levels in memory hierarchy, from Level 1 (closest to
    the processor) down to Level n (largest, furthest away)

(Pyramid figure: data cannot be in level i unless it is also in level i+1)
38
Big Idea of Memory Hierarchy
  • Temporal locality: keep recently accessed data
    items closer to processor
  • Spatial locality: move contiguous words in
    memory to upper levels of hierarchy
  • Uses smaller and faster memory technologies close
    to the processor
  • Fast hit time in highest level of hierarchy
  • Cheap, slow memory furthest from processor
  • If hit rate is high enough, hierarchy has access
    time close to the highest (and fastest) level and
    size equal to the lowest (and largest) level, as the
    average-access-time sketch below suggests

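The standard way to quantify "close to the highest level" is the
average-memory-access-time formula from the architecture literature;
the slide states the idea only in words, and the numbers below are
assumed for illustration:

  /* AMAT = hit_time + miss_rate * miss_penalty */
  #include <stdio.h>

  int main(void) {
      double hit_time     = 1.0;    /* ns, top of hierarchy (assumed) */
      double miss_penalty = 100.0;  /* ns, next level down (assumed)  */
      double miss_rate    = 0.02;   /* i.e., 98% hit rate (assumed)   */
      double amat = hit_time + miss_rate * miss_penalty;
      printf("AMAT = %.1f ns\n", amat);  /* 3.0 ns: close to hit time */
      return 0;
  }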
39
Recall: 5 components of any Computer

(Diagram)
  Input:  Keyboard, Mouse
  Output: Display, Printer
  Computer:
    Processor (active): Control (brain), Datapath (brawn)
    Memory (passive): where programs and data live when running
    Devices: Disk, Network
40
Disk Description / History

(Diagram: platter, arm, head, track, sector, cylinder, track buffer,
 embedded processor for ECC and SCSI)

  1973: 1.7 Mbit/sq. in., 140 MBytes
  1979: 7.7 Mbit/sq. in., 2,300 MBytes

source: New York Times, 2/23/98, page C3,
"Makers of disk drives crowd even more data into
even smaller spaces"
41
Disk History

  1989: 63 Mbit/sq. in., 60,000 MBytes
  1997: 1450 Mbit/sq. in., 2,300 MBytes (2.5" diameter)
  1997: 3090 Mbit/sq. in., 8,100 MBytes (3.5" diameter)
  2000: 10,100 Mbit/sq. in., 25,000 MBytes
  2000: 11,000 Mbit/sq. in., 73,400 MBytes

source: N.Y. Times, 2/23/98, page C3
42
State of the Art: Ultrastar 72ZX
  • 73.4 GB, 3.5-inch disk
  • $2/MB
  • 16 MB track buffer
  • 11 platters, 22 surfaces
  • 15,110 cylinders
  • 7 Gbit/sq. in. areal density
  • 17 watts (idle)
  • 0.1 ms controller time
  • 5.3 ms avg. seek (seek 1 track => 0.6 ms)
  • 3 ms = 1/2 rotation
  • 37 to 22 MB/s to media
  • (A sketch of the resulting access time appears below)

(Diagram: platter, arm, head, track, sector, cylinder,
 track buffer, embedded processor)

source: www.ibm.com; www.pricewatch.com; 2/14/00
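As a hedged back-of-envelope, the usual disk-access model
(controller + seek + rotational latency + transfer) applied to the
parameters above; the 16KB request size is my assumption:

  #include <stdio.h>

  int main(void) {
      double controller_ms = 0.1;
      double avg_seek_ms   = 5.3;
      double half_rot_ms   = 3.0;   /* average rotational latency       */
      double transfer_mb_s = 22.0;  /* worst-case media rate from slide */
      double request_kb    = 16.0;  /* assumed request size             */

      double transfer_ms = request_kb / 1024.0 / transfer_mb_s * 1000.0;
      double total = controller_ms + avg_seek_ms + half_rot_ms + transfer_ms;
      printf("16KB read: about %.1f ms\n", total);  /* ~9.1 ms */
      return 0;
  }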
43
Disk Limit
  • Continued advance in capacity (60%/yr) and
    bandwidth (40%/yr.)
  • Slow improvement in seek, rotation (8%/yr)
  • Time to read whole disk:

    Year    Sequentially    Randomly
    1990    4 minutes       6 hours
    2000    12 minutes      1 week

  • Dynamically change data layout to reduce seek,
    rotation delay? Leverage space vs. spindles?

44
A glimpse into the future?
  • IBM microdrive for digital cameras
  • 340 MBytes
  • Disk target in 5-7 years?
  • building block: 2006 MicroDrive
  • 9 GB disk, 50 MB/sec from disk
  • 10,000 nodes fit into one rack!

45
Disk Summary
  • Continued advance in capacity, cost/bit, BW; slow
    improvement in seek, rotation
  • External I/O bus bottleneck to transfer rate,
    cost? => move to fast serial lines (FC-AL)?
  • What to do with increasing speed of embedded
    processor inside disk?

46
Connecting to Networks (and Other I/O)
  • Bus - shared medium of communication that can
    connect to many devices
  • Hierarchy of Buses in a PC

47
Buses in a PC
  • Data rates (a quick check of these peak figures follows)
  • Memory: 100 MHz, 8 bytes wide => 800 MB/s (peak)
  • PCI: 33 MHz, 4 bytes wide => 132 MB/s (peak)
  • SCSI: "Ultra2" (40 MHz), "Wide" (2 bytes) => 80
    MB/s (peak)

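These peak figures are just clock rate times bus width; a one-line
check in C (assuming one transfer per clock, as the slide's
arithmetic implies):

  #include <stdio.h>

  int main(void) {
      printf("memory: %.0f MB/s\n", 100e6 * 8 / 1e6);  /* 800 MB/s */
      printf("PCI:    %.0f MB/s\n", 33e6 * 4 / 1e6);   /* 132 MB/s */
      printf("SCSI:   %.0f MB/s\n", 40e6 * 2 / 1e6);   /*  80 MB/s */
      return 0;
  }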
48
Why Networks?
  • Originally: sharing I/O devices between computers
    (e.g., printers)
  • Then: communicating between computers (e.g., file
    transfer protocol)
  • Then: communicating between people (e.g., email)
  • Then: communicating between networks of computers
    => Internet, WWW

49
Types of Networks
  • Local Area Network (Ethernet)
  • Inside a building: up to 1 km
  • (peak) Data Rate: 10 Mbits/sec, 100
    Mbits/sec, 1000 Mbits/sec
  • Run, installed by network administrators
  • Wide Area Network
  • Across a continent (10 km to 10,000 km)
  • (peak) Data Rate: 1.5 Mbits/sec to 2500
    Mbits/sec
  • Run, installed by telephone companies

50
ABCs of Networks: 2 Computers
  • Starting Point: send bits between 2 computers
  • Queue (First In First Out) on each end
  • Can send both ways ("Full Duplex")
  • Information sent called a "message"
  • Note: messages also called packets

51
A Simple Example: 2 Computers
  • What is the Message Format?
  • (Similar in idea to Instruction Format)
  • Fixed size? Number of bits?

  0: Please send data from address in your memory
  1: Packet contains data corresponding to request

  • Header (Trailer): information to deliver message
  • Payload: data in message (1 word above)

52
Questions About Simple Example
  • What if more than 2 computers want to
    communicate?
  • Need computer "address field" in packet to know
    which computer should receive it (destination),
    and which computer it came from for reply
    (source)

53
Questions About Simple Example
  • What if message is garbled in transit?
  • Add redundant information that is checked when
    message arrives to be sure it is OK
  • 8-bit sum of other bytes, called a "checksum";
    upon arrival, compare checksum to sum of rest of
    information in message (a sketch follows)

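A minimal sketch of the 8-bit checksum described above (illustrative,
not a real protocol implementation):

  #include <stdint.h>
  #include <stdio.h>

  uint8_t checksum8(const uint8_t *bytes, size_t n) {
      uint8_t sum = 0;
      for (size_t i = 0; i < n; i++)
          sum = (uint8_t)(sum + bytes[i]);  /* wraps mod 256 */
      return sum;
  }

  int main(void) {
      uint8_t msg[] = "hello";
      uint8_t sent = checksum8(msg, 5);
      /* receiver recomputes and compares; mismatch => discard, no ACK */
      printf("match: %d\n", sent == checksum8(msg, 5));
      return 0;
  }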
54
Questions About Simple Example
  • What if message never arrives?
  • If sender is told it has arrived (and receiver is told
    the reply has arrived), can resend upon failure
  • Don't discard message until ACK
    (acknowledgment) arrives (also, if checksum fails,
    don't send ACK)

55
Observations About Simple Example
  • Simple questions such as those above lead to more
    complex procedures to send/receive messages and
    more complex message formats
  • Protocol: algorithm for properly sending and
    receiving messages (packets)

56
Ethernet (popular LAN) Packet Format

  Preamble  Dest Addr  Src Addr  Length   Data     Pad    Check
  8 Bytes   6 Bytes    6 Bytes   of Data  0-1500B  0-46B  4B
                                 2 Bytes

  • Preamble to recognize beginning of packet
  • Unique Address per Ethernet Network Interface
    Card so can just plug in & use (privacy issue?)
  • Pad ensures minimum packet is 64 bytes
  • Easier to find packet on the wire
  • Header + Trailer: 24B + Pad
  • (A struct sketch of this layout follows)

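A C struct sketch of this layout (field names are mine; real driver
code must also handle byte order, struct packing, and computing the
CRC itself):

  #include <stdint.h>

  struct ethernet_packet {
      uint8_t preamble[8];   /* to recognize beginning of packet       */
      uint8_t dest_addr[6];  /* unique address per network interface   */
      uint8_t src_addr[6];
      uint8_t length[2];     /* length of data, 2 bytes                */
      uint8_t data[1500];    /* 0-1500B payload; pad (0-46B) ensures a */
                             /* minimum 64-byte packet                 */
      uint8_t check[4];      /* checksum (CRC) trailer                 */
  };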
57
Software Protocol to Send and Receive
  • SW Send steps
  • 1: Application copies data to OS buffer
  • 2: OS calculates checksum, starts timer
  • 3: OS sends data to network interface HW and says
    start
  • SW Receive steps
  • 3: OS copies data from network interface HW to OS
    buffer
  • 2: OS calculates checksum; if OK, send ACK; if
    not, delete message (sender resends when timer
    expires)
  • 1: If OK, OS copies data to user address space,
    signals application to continue

58
Protocol for Networks of Networks (WAN)?
  • Internetworking: allows computers on independent
    and incompatible networks to communicate reliably
    and efficiently
  • Enabling technologies: SW standards that allow
    reliable communications without reliable networks
  • Hierarchy of SW layers, giving each layer
    responsibility for portion of overall
    communications task, called protocol families or
    protocol suites
  • Abstraction to cope with complexity of
    communication vs. abstraction for complexity of
    computation

59
Protocol for Network of Networks
  • Transmission Control Protocol/Internet Protocol
    (TCP/IP)
  • This protocol family is the basis of the
    Internet, a WAN protocol
  • IP makes best effort to deliver
  • TCP guarantees delivery
  • TCP/IP is so popular it is used even when
    communicating locally, even across a homogeneous LAN

60
FTP From Stanford to Berkeley

(Figure: file travels from "Hennessy" at Stanford over Ethernet and
 FDDI, across a T3 line through BARRNet, then FDDI and Ethernet to
 "Patterson" at Berkeley)
  • BARRNet is WAN for Bay Area
  • T3 is 45 Mbit/s leased line (WAN); FDDI is 100
    Mbit/s LAN
  • TCP sets up the connection and sends the file; IP
    delivers the packets

61
Protocol Family Concept
(Figure: messages exchanged logically between peer layers,
 carried by the layers below)
62
Protocol Family Concept
  • Key to protocol families is that communication
    occurs logically at the same level of the
    protocol, called peer-to-peer, but is implemented
    via services at the lower level
  • Danger is that each lower level costs performance if
    the family is implemented as a strict hierarchy (e.g.,
    multiple checksums)

63
TCP/IP packet, Ethernet packet, protocols
  • Application sends message
  • TCP breaks it into 64KB segments, adds 20B header
  • IP adds 20B header, sends to network
  • If Ethernet, broken into 1500B packets with
    headers, trailers (24B)
  • All headers, trailers have length field,
    destination, ... (overhead sketch below)

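A hedged back-of-envelope using the slide's numbers: the fraction of
each maximum-size Ethernet packet left for application data after the
TCP and IP headers:

  #include <stdio.h>

  int main(void) {
      int ethernet_payload = 1500;  /* bytes per Ethernet packet       */
      int tcp_hdr = 20, ip_hdr = 20, eth_overhead = 24;
      int app_data = ethernet_payload - tcp_hdr - ip_hdr;  /* 1460B    */
      double efficiency = (double)app_data / (ethernet_payload + eth_overhead);
      printf("app data: %dB, wire efficiency: %.1f%%\n",
             app_data, 100 * efficiency);  /* ~95.8% */
      return 0;
  }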
64
Shared vs. Switched Based Networks
  • Shared Media vs. Switched: switched pairs communicate
    at the same time over point-to-point connections
  • Aggregate BW in switched network is many times
    that of shared
  • point-to-point faster since no arbitration,
    simpler interface

65
Heart of Today's Data Switch

(Diagram)
  Convert serial bit stream into, say, 128-bit words;
  convert 128-bit words back into a serial bit stream;
  memory sits in the middle.
  Unpack header to find destination and place message
  into memory of proper outgoing port. OK as long as
  memory is much faster than the switch rate.
66
Network Media (if time)
67
I/O Pitfall: Relying on Peak Data Rates
  • Using the peak transfer rate of a portion of the
    I/O system to make performance projections or
    performance comparisons
  • Peak bandwidth measurements often based on
    unrealistic assumptions about system, or
    unattainable because of other system limitations
  • In example, peak bandwidth of FDDI vs. 10 Mbit
    Ethernet is 10:1, but delivered BW ratio (due to
    software overhead) is 1.01:1
  • Peak PCI BW is 132 MByte/sec, but combined with
    memory, often < 80 MB/s

68
Network Description/Innovations
  • Shared Media vs. Switched: switched pairs communicate
    at the same time
  • Aggregate BW in switched network is many times
    that of shared
  • point-to-point faster; only single destination,
    simpler interface
  • Serial line: 1-5 Gbit/sec
  • Moore's Law for switches, too
  • 1 chip: 32 x 32 switch, 1.5 Gbit/sec links =>
    48 Gbit/sec aggregate bandwidth (AMCC S2025)

69
Network History/Limits
  • TCP/UDP/IP protocols for WAN/LAN in 1980s
  • Lightweight protocols for LAN in 1990s
  • Limit is standards and efficient SW protocols
  • 10 Mbit Ethernet in 1978 (shared)
  • 100 Mbit Ethernet in 1995 (shared, switched)
  • 1000 Mbit Ethernet in 1998 (switched)
  • FDDI, ATM Forum for scalable LAN (still meeting)
  • Internal I/O bus limits delivered BW
  • 32-bit, 33 MHz PCI bus = 1 Gbit/sec
  • future: 64-bit, 66 MHz PCI bus = 4 Gbit/sec

70
Network Summary
  • Fast serial lines, switches offer high bandwidth,
    low latency over reasonable distances
  • Protocol software development and standards
    committee bandwidth limit innovation rate
  • Ethernet forever?
  • Internal I/O bus interface to network is
    bottleneck to delivered bandwidth, latency

71
Network Summary
  • Protocol suites allow heterogeneous networking
  • Another use of principle of abstraction
  • Protocols => operation in presence of failures
  • Standardization key for LAN, WAN
  • Integrated circuit revolutionizing network
    switches as well as processors
  • Switch just a specialized computer
  • High bandwidth networks with slow SW overheads
    don't deliver their promise

72
Systems: History, Trends, Innovations
  • Cost/Performance leaders from PC industry
  • Transaction processing, file service based on
    Symmetric Multiprocessor (SMP) servers
  • 4-64 processors
  • Shared memory addressing
  • Decision support based on SMP and Cluster (Shared
    Nothing)
  • Clusters of low-cost, small SMPs getting popular

73
1997 State of the Art System: PC
  • $1140 OEM
  • 1 266 MHz Pentium II
  • 64 MB DRAM
  • 2 UltraDMA EIDE disks, 3.1 GB each
  • 100 Mbit Ethernet Interface
  • (PennySort winner)

source: www.research.microsoft.com/research/barc/SortBenchmark/PennySort.ps
74
1997 State of the Art SMP: Sun E10000
  • TPC-D, Oracle 8, 3/98
  • SMP: 64 336 MHz CPUs, 64 GB DRAM, 668 disks (5.5TB)
  • Disks, shelf   $2,128k
  • Boards, encl.  $1,187k
  • CPUs           $912k
  • DRAM           $768k
  • Power          $96k
  • Cables, I/O    $69k
  • HW total       $5,161k

(Diagram: 4 address buses plus a data crossbar switch (Xbar)
 connecting 16 boards of CPUs and memory to banks of SCSI disks)

source: www.tpc.org
75
State of the Art Cluster Tandem/Compaq SMP
  • ServerNet switched network
  • Rack-mounted equipment
  • SMP: 4 PPro, 3 GB DRAM, 3 disks (6/rack)
  • 10 disk shelves/rack @ 7 disks/shelf
  • Total: 6 SMPs (24 CPUs, 18 GB DRAM), 402 disks
    (2.7 TB)
  • TPC-C, Oracle 8, 4/98
  • CPUs          $191k
  • DRAM          $122k
  • Disks+cntlr   $425k
  • Disk shelves  $94k
  • Networking    $76k
  • Racks         $15k
  • HW total      $926k

76
1997 Berkeley Cluster: Zoom Project
  • 3 TB storage system
  • 370 8 GB disks, 20 200 MHz PPro PCs, 100 Mbit
    Switched Ethernet
  • System cost small delta (30%) over raw disk cost
  • Application: San Francisco Fine Arts Museum
    Server
  • 70,000 art images online
  • Zoom in 32X; try it yourself!
  • www.Thinker.org (statue)

77
User Decision Support Demand vs. Processor Speed

  Database demand: 2X / 9-12 months ("Greg's Law")
  CPU speed: 2X / 18 months (Moore's Law)
  => growing Database-Processor Performance Gap
78
Berkeley Perspective on Post-PC Era
  • PostPC Era will be driven by 2 technologies:
  • 1) Gadgets: Tiny Embedded or Mobile Devices
  • ubiquitous: in everything
  • e.g., successor to PDA, cell phone, wearable
    computers
  • 2) Infrastructure to Support such Devices
  • e.g., successor to Big Fat Web Servers, Database
    Servers
79
Intelligent RAM: IRAM
  • Microprocessor & DRAM on a single chip:
  • 10X capacity vs. SRAM
  • on-chip memory latency 5-10X, bandwidth 50-100X
  • improved energy efficiency 2X-4X (no off-chip
    bus)
  • serial I/O 5-10X v. buses
  • smaller board area/volume
  • IRAM advantages extend to:
  • a single chip system
  • a building block for larger systems

80
Other examples: IBM Blue Gene
  • 1 PetaFLOPS in 2005 for $100M?
  • Application: Protein Folding
  • Blue Gene Chip
  • 32 Multithreaded RISC processors + ??MB Embedded
    DRAM + high speed Network Interface on a single 20
    x 20 mm chip
  • 1 GFLOPS / processor
  • Board = 64 chips (2K CPUs)
  • Rack = 8 Boards (512 chips, 16K CPUs)
  • System = 64 Racks (512 boards, 32K chips, 1M CPUs)
  • Total: 1 million processors in just 2000 sq. ft.

81
Other examples: Sony Playstation 2
  • Emotion Engine: 6.2 GFLOPS, 75 million polygons
    per second (Microprocessor Report, 13:5)
  • Superscalar MIPS core + vector coprocessor +
    graphics/DRAM
  • Claim: "Toy Story" realism brought to games

82
The problem space: big data
  • Big demand for enormous amounts of data
  • today: high-end enterprise and Internet
    applications
  • enterprise decision-support, data mining
    databases
  • online applications: e-commerce, mail, web,
    archives
  • future: infrastructure services, richer data
  • computational & storage back-ends for mobile
    devices
  • more multimedia content
  • more use of historical data to provide better
    services
  • Today's SMP server designs can't easily scale
  • Bigger scaling problems than performance!

83
The real scalability problems AME
  • Availability
  • systems should continue to meet quality of
    service goals despite hardware and software
    failures
  • Maintainability
  • systems should require only minimal ongoing human
    administration, regardless of scale or complexity
  • Evolutionary Growth
  • systems should evolve gracefully in terms of
    performance, maintainability, and availability as
    they are grown/upgraded/expanded
  • These are problems at today's scales, and will
    only get worse as systems grow

84
ISTORE-1 hardware platform
  • 80-node x86-based cluster, 1.4 TB storage
  • cluster nodes are plug-and-play, intelligent,
    network-attached storage "bricks"
  • a single field-replaceable unit to simplify
    maintenance
  • each node is a full x86 PC w/ 256MB DRAM, 18GB
    disk
  • more CPU than NAS; fewer disks/node than cluster

  Intelligent Disk "Brick": portable PC CPU (Pentium II/266) +
  DRAM, redundant NICs (4 x 100 Mb/s links), Diagnostic Processor

  • ISTORE Chassis
  • 80 nodes, 8 per tray
  • 2 levels of switches:
  • 20 x 100 Mbit/s
  • 2 x 1 Gbit/s
  • Environment Monitoring:
  • UPS, redundant PS,
  • fans, heat and vibration sensors...
85
Conclusion
  • IRAM attractive for two Post-PC applications
    because of low power, small size, high memory
    bandwidth
  • Gadgets Embedded/Mobile devices
  • Infrastructure Intelligent Storage and Networks
  • PostPC infrastructure requires
  • New Goals Availability, Maintainability,
    Evolution
  • New Principles Introspection, Performance
    Robustness
  • New Techniques Isolation/fault insertion,
    Software scrubbing
  • New Benchmarks measure, compare AME metrics

86
Questions?
  • Contact us if you're interested:
    email: patterson@cs.berkeley.edu
    http://iram.cs.berkeley.edu/