What Happens When Processing, Storage, Bandwidth are Free and Infinite?

Transcript and Presenter's Notes
1
What Happens When Processing, Storage, Bandwidth are Free and Infinite?
  • Jim Gray
  • Microsoft Research

2
Outline
  • Hardware CyberBricks
  • all nodes are very intelligent
  • Software CyberBricks
  • standard way to interconnect intelligent nodes
  • What next?
  • Processing migrates to where the power is
  • Disk, network, display controllers have
    full-blown OS
  • Send RPCs to them (SQL, Java, HTTP, DCOM, CORBA)
  • Computer is a federated distributed system.

3
A Hypothetical Question: Taking things to the limit
  • Moore's law: 100x per decade
  • Exa-instructions per second in 30 years
  • Exa-bit memory chips
  • Exa-byte disks
  • Gilder's Law of the Telecosm: 3x/year more
    bandwidth, 60,000x per decade!
  • 40 Gbps per fiber today
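A quick check of the per-decade figure, from the yearly rate alone:

  3^10 = 59,049 ≈ 60,000x per decade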

4
Grove's Law
  • Link bandwidth doubles every 100 years!
  • Not much has happened to telephones lately
  • Still twisted pair

5
Gilder's Telecosm Law: 3x bandwidth/year for 25
more years
  • Today:
  • 10 Gbps per channel
  • 4 channels per fiber: 40 Gbps
  • 32 fibers/bundle: 1.2 Tbps/bundle
  • In lab: 3 Tbps/fiber (400x WDM)
  • In theory: 25 Tbps per fiber
  • 1 Tbps: USA 1996 WAN bisection bandwidth

6
Thesis: Many little beat few big

(Figure: processors shrink from mainframe to mini, micro, nano, and pico;
disks shrink from 14" and 9" to 5.25", 3.5", 2.5", and 1.8" while
capacities grow from 100 MB to 10 GB, 1 TB, 100 TB. The pico processor:
1 M SPECmarks, 1 TFLOP, 10 pico-second RAM, 1 MB on-chip, 10^6 clocks to
bulk RAM, event-horizon on chip, VM reincarnated, multi-program cache,
on-chip SMP.)
  • Smoking, hairy golf ball
  • How to connect the many little parts?
  • How to program the many little parts?
  • Fault tolerance?

7
Year 2000 4B Machine
  • The Year 2000 commodity PC
  • Billion Instructions/Sec
  • 0.1 Billion Bytes RAM
  • Billion Bits/s Net
  • 10 Billion Bytes Disk
  • Billion Pixel display
  • 3000 x 3000 x 24
  • $1,000

8
4 B PCs: The Bricks of Cyberspace
  • Cost $1,000
  • Come with
  • OS (NT, POSIX,..)
  • DBMS
  • High speed Net
  • System management
  • GUI / OOUI
  • Tools
  • Compatible with everyone else
  • CyberBricks

9
Super Server: 4T Machine
  • Array of 1,000 4B machines
  • 1 billion-ips processors
  • 1 billion-byte DRAMs
  • 10-billion-byte disks
  • 1 Bbps comm lines
  • 1 TB tape robot
  • A few megabucks
  • Challenge
  • Manageability
  • Programmability
  • Security
  • Availability
  • Scaleability
  • Affordability
  • As easy as a single system

(Figure: the CyberBrick is a 4B machine.)
Future servers are CLUSTERS of processors and
discs. Distributed database techniques make
clusters work.
10
Functionally Specialized Cards
  • a P-mips processor, M MB of DRAM, and an ASIC, specialized for:
  • Storage
  • Network
  • Display

Today: P = 50 mips, M = 2 MB
In a few years: P = 200 mips, M = 64 MB
11
It's Already True of Printers: the Peripheral
CyberBrick
  • You buy a printer
  • You get:
  • several network interfaces
  • A PostScript engine
  • cpu,
  • memory,
  • software,
  • a spooler (soon)
  • and a print engine.

12
System On A Chip
  • Integrate Processing with memory on one chip
  • chip is 75% memory now
  • 1 MB cache >> 1960 supercomputers
  • 256 Mb memory chip is 32 MB!
  • IRAM, CRAM, PIM, projects abound
  • Integrate Networking with processing on one chip
  • system bus is a kind of network
  • ATM, FiberChannel, Ethernet,.. Logic on chip.
  • Direct IO (no intermediate bus)
  • Functionally specialized cards shrink to a chip.

13
All Device Controllers will be Cray 1s
  • TODAY
  • Disk controller is a 10-mips RISC engine with
    2 MB DRAM
  • NIC is similar power
  • SOON
  • Will become 100 mips systems with 100 MB DRAM.
  • They are nodes in a federation (can run Oracle
    on NT in disk controller).
  • Advantages
  • Uniform programming model
  • Great tools
  • Security
  • economics (cyberbricks)
  • Move computation to data (minimize traffic)

(Figure: central processor and memory on a Tera Byte backplane;
devices are peers on the interconnect.)
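To make the slide's last point ("move computation to data") concrete,
a minimal sketch, with all names invented and no claim to Gray's design:
shipping the predicate to a smart disk beats shipping the data to the
host, because only the answer set crosses the interconnect.

  import java.util.List;
  import java.util.function.Predicate;
  import java.util.stream.Collectors;

  class SmartDisk {
      private final List<String> rows;
      SmartDisk(List<String> rows) { this.rows = rows; }

      // Conventional path: every row crosses the interconnect.
      List<String> readAll() { return rows; }

      // CyberBrick path: the controller evaluates the predicate locally,
      // so only matching rows cross the interconnect.
      List<String> select(Predicate<String> p) {
          return rows.stream().filter(p).collect(Collectors.toList());
      }
  }

  public class MoveComputationToData {
      public static void main(String[] args) {
          SmartDisk disk = new SmartDisk(List.of("red", "green", "blue"));
          Predicate<String> p = r -> r.startsWith("b");
          int pulled = disk.readAll().size();   // 3 rows shipped, host filters
          int pushed = disk.select(p).size();   // 1 row shipped, same answer
          System.out.println(pulled + " rows moved vs " + pushed);
      }
  }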
14
With Tera Byte Interconnect and Super Computer
Adapters
  • Processing is incidental to
  • Networking
  • Storage
  • UI
  • Disk Controller/NIC is
  • faster than device
  • close to device
  • Can borrow device package power
  • So use idle capacity for computation.
  • Run app in device.

15
Implications
Conventional:
  • Offload device handling to NIC/HBA
  • higher-level protocols: I2O, NASD, VIA
  • SMP and Cluster parallelism is important.
Radical:
  • Move app to NIC/device controller
  • higher-higher level protocols: CORBA / DCOM.
  • Cluster parallelism is VERY important.

16
How Do They Talk to Each Other?
  • Each node has an OS
  • Each node has local resources: a federation.
  • Each node does not completely trust the others.
  • Nodes use RPC to talk to each other
  • CORBA? DCOM? IIOP? RMI?
  • One or all of the above.
  • Huge leverage in high-level interfaces.
  • Same old distributed system story.

(Figure: two application stacks, one per node, talking over the wire(s)
via datagrams, streams, and RPC on VIAL/VIPL.)
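One concrete reading of "huge leverage in high-level interfaces",
sketched with Java RMI (one of the RPC options the slide lists; the
interface itself is invented):

  import java.rmi.Remote;
  import java.rmi.RemoteException;

  // Each intelligent node exports a typed, high-level service interface
  // instead of a low-level block or packet protocol.
  interface DiskNode extends Remote {
      // Ship the whole question to the disk; get back only the answer.
      String query(String sql) throws RemoteException;
  }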
17
Outline
  • Hardware CyberBricks
  • all nodes are very intelligent
  • Software CyberBricks
  • standard way to interconnect intelligent nodes
  • What next?
  • Processing migrates to where the power is
  • Disk, network, display controllers have
    full-blown OS
  • Send RPCs to them (SQL, Java, HTTP, DCOM, CORBA)
  • Computer is a federated distributed system.

18
Objects!
  • It's a zoo
  • ORBs, COM, CORBA,..
  • Object Relational Databases
  • Objects and 3-tier computing

19
History and Alphabet Soup

(Figure: a 1985-1995 timeline of the alphabet soup: X/Open, the Open
Group, COM.)
20
The Promise
  • Both camps share key goals:
  • Encapsulation: hide implementation
  • Polymorphism: generic ops, key to GUI and reuse
  • Uniform Naming
  • Discovery: finding a service
  • Fault handling: transactions
  • Versioning: allow upgrades
  • Transparency: local/remote
  • Security: who has authority
  • Shrink-wrap: minimal inheritance
  • Automation: easy
  • Objects are Software CyberBricks
  • productivity breakthrough (plug ins)
  • manageability breakthrough (modules)
  • Microsoft promises Cairo: distributed objects,
    secure, transparent, fast invocation
  • IBM/Sun/Oracle/Netscape promise CORBA + OpenDoc
    + Java Beans
  • All will deliver
  • Customers can pick the best one

21
The OLE-COM Experience
  • Macintosh had Publish Subscribe
  • PowerPoint needed graphs
  • plugged MS Graph in as a component.
  • Office adopted OLE
  • one graph program for all of office
  • Internet arrived
  • URLs are object references,
  • Office is Web Enabled right away!
  • Office97 smaller than Office95 because of shared
    components
  • It works!!

22
Linking And Embedding: Objects are data modules;
transactions are execution modules
  • Link: pointer to an object somewhere else
  • Think URL in the Internet
  • Embed: bytes are here
  • Objects may be active: can call back to subscribers

23
Objects Meet Databases: the basis for universal
data servers, access, and integration
  • Object-oriented (COM-oriented) interface to data
  • Breaks the DBMS into components
  • Anything can be a data source (see the sketch below)
  • Optimization/navigation on top of other data
    sources
  • Makes an RDBMS an O-R DBMS, assuming the
    optimizer understands objects

(Figure: a component DBMS engine over pluggable data sources.)
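A minimal sketch of the "anything can be a data source" idea (interface
names invented): the engine's optimizer programs against one uniform
interface, which a file, mail store, or spreadsheet can implement as
readily as a B-tree can.

  import java.util.Iterator;

  interface DataSource {
      Iterator<Object[]> scan();            // every source can scan
      boolean supportsSeek();               // optimizer asks about abilities
      Iterator<Object[]> seek(Object key);  // optional fast path, if supported
  }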
24
The BIG Picture: Components and transactions
  • Software modules are objects
  • Object Request Broker (a.k.a. Transaction
    Processing Monitor) connects objects (clients
    to servers)
  • Standard interfaces allow software plug-ins
  • Transaction ties execution of a job into an
    atomic unit: all-or-nothing, durable,
    isolated (see the sketch below)
  • ActiveX Components are a $250M/year business.

(Figure: an Object Request Broker connecting the components.)
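A hedged JDBC sketch of the atomic-unit point (the connection URL and
table are invented): both updates commit together or neither does.

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.SQLException;
  import java.sql.Statement;

  public class AtomicUnit {
      public static void main(String[] args) throws SQLException {
          try (Connection con =
                   DriverManager.getConnection("jdbc:example:orders")) {
              con.setAutoCommit(false);       // start an atomic unit
              try (Statement s = con.createStatement()) {
                  s.executeUpdate("UPDATE account SET bal = bal - 100 WHERE id = 1");
                  s.executeUpdate("UPDATE account SET bal = bal + 100 WHERE id = 2");
                  con.commit();               // all
              } catch (SQLException e) {
                  con.rollback();             // or nothing
                  throw e;
              }
          }
      }
  }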
25
Object Request Broker (ORB) Orchestrates RPC
  • Registers Servers
  • Manages pools of servers
  • Connects clients to servers
  • Does Naming, request-level authorization,
  • Provides transaction coordination
  • Direct and queued invocation
  • Old names
  • Transaction Processing Monitor,
  • Web server,
  • NetWare

26
The OO Points So Far
  • Objects are software Cyber Bricks
  • Object interconnect standards are emerging
  • Cyber Bricks become Federated Systems.
  • Next points
  • put processing close to data
  • do parallel processing.

27
Three Tier Computing
  • Clients do presentation, gather input
  • Clients do some workflow (Xscript)
  • Clients send high-level requests to ORB
  • ORB dispatches workflows and business objects
    -- proxies for the client that orchestrate
    flows and queues
  • Server-side workflow scripts call on distributed
    business objects to execute the task (sketched
    below)

(Figure: the tiers: Presentation, workflow, Business Objects, Database.)
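A minimal sketch (all names invented) of that server-side split: one
high-level client request triggers a workflow that calls business
objects near the data, instead of the client making many fine-grained
calls across the network.

  interface OrderObject { void reserve(String item); void bill(String acct); }

  class PlaceOrderWorkflow {
      private final OrderObject order;   // dispatched to us by the ORB
      PlaceOrderWorkflow(OrderObject order) { this.order = order; }

      // One round trip from the client triggers the whole flow server-side.
      void run(String item, String acct) {
          order.reserve(item);
          order.bill(acct);
      }
  }

  class Demo {
      public static void main(String[] args) {
          OrderObject o = new OrderObject() {
              public void reserve(String item) { System.out.println("reserve " + item); }
              public void bill(String acct)    { System.out.println("bill " + acct); }
          };
          new PlaceOrderWorkflow(o).run("disk", "acct-7");
      }
  }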
28
The Three Tiers
(Figure: the three tiers, ending in the object data server.)
29
Transaction Processing Evolution to Three Tier:
Intelligence migrated to clients
  • Mainframe batch processing (centralized, cards)
  • Dumb terminals, Remote Job Entry
  • Intelligent terminals, database backends
  • Workflow Systems, Object Request Brokers,
    Application Generators

(Figure: mainframe and cards, then TP Monitor, then ORB.)
30
Web Evolution to Three Tier:
Intelligence migrated to clients (like TP)
  • Character-mode clients, smart servers (green
    screen, archie, gopher, WAIS)
  • GUI Browsers - Web file servers
  • GUI Plugins - Web dispatchers - CGI
  • Smart clients - Web dispatcher (ORB), pools of
    app servers (ISAPI, Viper), workflow scripts at
    client and server
31
PC Evolution to Three Tier:
Intelligence migrated to server
  • Stand-alone PC (centralized)
  • PC + file & print server: message per I/O
  • PC + database server: message per SQL statement
  • PC + app server: message per transaction
  • ActiveX client, ORB, ActiveX server, Xscript
32
Why Did Everyone Go To Three-Tier?
  • Manageability
  • Business rules must be with the data
  • Middleware operations tools
  • Performance (scaleability)
  • Server resources are precious
  • ORB dispatches requests to server pools
  • Technology & Physics
  • Put UI processing near the user
  • Put shared-data processing near the shared data
  • Minimizes data moves
  • Encapsulate / modularity

(Figure: Presentation, workflow, Business Objects, Database.)
33
Why Put Business Objects at Server?
34
The OO Points So Far
  • Objects are software Cyber Bricks
  • Object interconnect standards are emerging
  • Cyber Bricks become Federated Systems.
  • Put processing close to data
  • Next point
  • do parallel processing.

35
Parallelism: the OTHER half of Super-Servers
  • Clusters of machines allow two kinds of
    parallelism
  • Many little jobs: online transaction processing
  • TPC-A, B, C, ...
  • A few big jobs: data search and analysis
  • TPC-D, DSS, OLAP
  • Both give automatic parallelism

36
Why Parallel Access To Data?
At 10 MB/s, a 1 TB scan takes 1.2 days.
1,000-way parallel: a 100-second SCAN.
BANDWIDTH is the point.
Parallelism: divide a big problem into many
smaller ones to be solved in parallel.
(The arithmetic is spelled out below.)
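The arithmetic behind those two numbers:

  1 TB / 10 MB/s = 10^12 B / 10^7 B/s = 10^5 s ≈ 1.2 days
  over 1,000 disks: 10^5 s / 1,000 = 100 s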
37
Kinds of Parallel Execution
  • Pipeline: any sequential program feeds the next;
    stages run concurrently on the stream.
  • Partition: outputs split N ways, inputs merge
    M ways; many copies of the same sequential
    program run, one per partition.
38
Why are Relational Operators Successful for
Parallelism?
The relational data model gives uniform operators on uniform data
streams, closed under composition:
  • Each operator consumes 1 or 2 input streams
  • Each stream is a uniform collection of data
  • Sequential data in and out: pure dataflow
  • Partitioning some operators (e.g. aggregates,
    non-equi-join, sort, ..) requires innovation
  • The result: AUTOMATIC PARALLELISM (illustrated
    in the sketch below)
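A loose analogy in Java streams (not Gray's formulation; data and
predicates invented): uniform operators over uniform streams compose,
so a planner is free to re-order or partition them.

  import java.util.List;

  public class Dataflow {
      public static void main(String[] args) {
          List<Integer> sales = List.of(3, 141, 59, 265, 35);
          // select -> project -> aggregate, each consuming one stream;
          // swap in sales.parallelStream() and the same composition
          // runs partitioned, with no change to the operators.
          int total = sales.stream()
                           .filter(s -> s > 50)   // selection
                           .mapToInt(s -> s)      // projection
                           .sum();                // aggregate
          System.out.println(total);              // 465
      }
  }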
39
Database Systems Hide Parallelism
  • Automate system management via tools
  • data placement
  • data organization (indexing)
  • periodic tasks (dump / recover / reorganize)
  • Automatic fault tolerance
  • duplex failover
  • transactions
  • Automatic parallelism
  • among transactions (locking)
  • within a transaction (parallel execution)

40
SQL: a Non-Procedural Programming Language
  • SQL is a functional programming language: it
    describes the answer set (see the sketch below).
  • The optimizer picks the best execution plan
  • Picks the data flow web (pipeline),
  • degree of parallelism (partitioning)
  • other execution parameters (process placement,
    memory, ...)

(Figure: GUI and schema feed the optimizer, which emits a plan of
operators and rivers for the execution monitor to run.)
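A hedged JDBC sketch of the non-procedural point (URL and schema
invented): the program states WHAT it wants; the dataflow web, degree
of parallelism, and placement are the optimizer's business.

  import java.sql.*;

  public class DeclarativeQuery {
      public static void main(String[] args) throws SQLException {
          String q = "SELECT region, SUM(amount) FROM sales GROUP BY region";
          try (Connection c = DriverManager.getConnection("jdbc:example:warehouse");
               Statement s = c.createStatement();
               ResultSet r = s.executeQuery(q)) {   // plan chosen by optimizer
              while (r.next())
                  System.out.println(r.getString(1) + " " + r.getLong(2));
          }
      }
  }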
41
Automatic Data Partitioning
Split a SQL table across a set of nodes and disks.
Partition within the set by:
  • Range: good for equijoins, range queries, group-by
  • Hash: good for equijoins
  • Round Robin: good to spread load
Shared disk and memory are less sensitive to
partitioning; shared nothing benefits from
"good" partitioning.
42
N x M way Parallelism
N inputs, M outputs, no bottlenecks.
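The three partitioning schemes from the previous slide, in a minimal
sketch (node count and keys invented): each maps a key to one of N
nodes.

  public class Partitioning {
      static final int N = 4;                  // nodes
      static int nextRR = 0;                   // round-robin cursor

      static int range(int key) {              // contiguous key ranges
          int width = 250;                     // keys 0..999 over 4 nodes
          return Math.min(key / width, N - 1);
      }
      static int hash(int key) {               // spread by hash
          return Math.floorMod(Integer.hashCode(key), N);
      }
      static int roundRobin() {                // ignore the key entirely
          return nextRR++ % N;
      }

      public static void main(String[] args) {
          int[] keys = {7, 512, 998, 42};
          for (int k : keys)
              System.out.printf("key %d -> range:%d hash:%d rr:%d%n",
                                k, range(k), hash(k), roundRobin());
      }
  }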
43
Parallel Objects?
  • How does all this DB parallelism connect to
    hardware/software Cyber Bricks?
  • To scale to large client sets
  • need lots of independent parallel execution.
  • Comes for free from the ORB.
  • To scale to large data sets
  • need intra-program parallelism (like parallel
    DBs)
  • Requires some invention.

44
Outline
  • Hardware CyberBricks
  • all nodes are very intelligent
  • Software CyberBricks
  • standard way to interconnect intelligent nodes
  • What next?
  • Processing migrates to where the power is
  • Disk, network, display controllers have
    full-blown OS
  • Send RPCs to them (SQL, Java, HTTP, DCOM, CORBA)
  • Computer is a federated distributed system.
  • Parallel execution is important

45
MORE SLIDES, but there is only so much time.
  • Too bad

46
The Disk Farm On a Card
  • The 100 GB disc card
  • An array of discs
  • Can be used as
  • 100 discs
  • 1 striped disc
  • 10 fault-tolerant discs
  • ....etc
  • LOTS of accesses/second
  • bandwidth

(Figure: a 14" card.)
Life is cheap, it's the accessories that cost ya.
Processors are cheap, it's the peripherals that
cost ya (a $10K disc card).
47
Parallelism: Performance is the Goal
Goal is to get 'good' performance. Trade time
for money.
Law 1: a parallel system should be faster than
the serial system.
Law 2: a parallel system should give near-linear
scaleup or near-linear speedup or both.
Parallel DBMSs obey these laws. (The two
measures are defined below.)
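The standard definitions of the two measures (the formulas are not on
the slide):

  Speedup(N) = T(1 processor, job) / T(N processors, same job)
               linear when Speedup(N) = N
  Scaleup(N) = T(1 processor, job) / T(N processors, N-times-bigger job)
               linear when Scaleup(N) = 1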
48
Success Stories
  • Online Transaction Processing
  • many little jobs
  • SQL systems support
  • 50 Ktpm-C (44 cpus, 600 disks, 2 nodes)
  • Batch (decision support and utility)
  • few big jobs, parallelism inside
  • Scan data at 100 MB/s
  • Linear scaleup to 1,000 processors

(Figures: transactions/sec vs hardware; records/sec vs hardware.)
49
The New Law of Computing
Grosch's Law vs. the Parallel Law: parallelism
needs linear speedup and linear scaleup.
Not always possible.
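For contrast, the usual quantitative statements (not spelled out on
the slide):

  Grosch's Law:     performance ∝ price^2   (2x the money bought 4x the
                    power, favoring one big machine)
  The parallel law: performance ∝ price     (2x the money buys 2x the
                    power, and only with near-linear speedup and scaleup)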
50
Clusters being built
  • Teradata: 1,000 nodes ($30K/slice)
  • Tandem, VMScluster: 150 nodes ($100K/slice)
  • Intel: 9,000 nodes @ $55M ($6K/slice)
  • Teradata, Tandem, DEC moving to NT and a low
    slice price
  • IBM: 512-node ASCI @ $100M ($200K/slice)
  • PC clusters (bare-handed) at dozens of nodes:
    web servers (MSN, PointCast, ...), DB servers
  • KEY TECHNOLOGY HERE IS THE APPS.
  • Apps distribute data
  • Apps distribute execution

51
Great Debate: Shared What? SMP or Cluster?
  • Shared Memory (SMP): easy to program, difficult
    to build, difficult to scaleup (Sequent, SGI, Sun)
  • Shared Disk (VMScluster, Sysplex)
  • Shared Nothing (network): hard to program, easy
    to build, easy to scaleup (Tandem, Teradata, SP2)
The winner will be a synthesis of these ideas.
Distributed shared memory (DASH, Encore) blurs the
distinction between network and bus (locality is
still important), but gives shared memory at
message cost.
52
BOTH SMP and Cluster?
Grow Up with SMP: 4xP6 is now standard.
Grow Out with Cluster: the cluster has
inexpensive parts.
Cluster of PCs.
53
Clusters Have Advantages
  • Clients and Servers made from the same stuff.
  • Inexpensive
  • Built with commodity components
  • Fault tolerance
  • Spare modules mask failures
  • Modular growth
  • grow by adding small modules

54
Meta-Message: Technology Ratios Are Important
  • If everything gets faster and cheaper at the
    same rate, THEN nothing really changes.
  • Things getting MUCH BETTER:
  • communication speed & cost: 1,000x
  • processor speed & cost: 100x
  • storage size & cost: 100x
  • Things staying about the same:
  • speed of light (more or less constant)
  • people (10x more expensive)
  • storage speed (only 10x better)

55
Storage Ratios Changed
  • 10x better access time
  • 10x more bandwidth
  • 4,000x lower media price
  • DRAM/DISK media price ratio: 100:1 to 10:1 to 50:1

56
Today's Storage Hierarchy: Speed & Capacity vs
Cost Tradeoffs

(Figure: two charts spanning access times of 10^-9 to 10^3 seconds:
typical system size (bytes) vs speed, and price ($/MB) vs speed, for
cache, main memory, secondary (disc), online tape, nearline tape, and
offline tape.)
57
Network Speeds
  • Speed of light did not change
  • Link bandwidth grew 60% / year
  • WAN speeds limited by politics
  • if voice is $X/minute, how much is video?
  • Gbps to the desktop today!
  • 10 Gbps channel is coming.
  • 3 Tbps fibers in the laboratory through
    parallelism (WDM).
  • Paradox:
  • WAN link has 40 Gbps
  • Processor bus is 2..40 Gbps

(Figure: comm speedups, 1960-2000: processors (i/s) vs LANs & WANs (b/s).)
58
MicroProcessor Speeds Went Up
  • Clock rates went from 10 KHz to 400 MHz
  • Processors are now 6-way issue
  • SPECInt fits in cache,
  • so it tracks cpu speed
  • Peak Advertised Performance (PAP) is 1.2 BIPS
  • Real Application Performance (RAP) is 100
    MIPS
  • Similar curves for
  • DEC VAX Alpha
  • HP/PA
  • IBM R6000/ PowerPC
  • MIPS SGI
  • SUN

59
Performance = Storage Accesses, not Instructions
Executed
  • In the old days we counted instructions and I/Os
  • Now we count memory references
  • Processors wait most of the time

(Figure: where the time goes: clock ticks used by AlphaSort
components. A "70 MIPS" processor running real apps, which have worse
I-cache misses, delivers 60 MIPS if well tuned, 20 MIPS if not.)
60
Storage Latency: How Far Away is the Data?
61
Tape Farms for Tertiary Storage,
Not Mainframe Silos

(Figure: one $10K tape robot: 14 tapes, 500 GB, 5 MB/s, $20/GB, 30
Maps, a 27-hour scan. A farm of 100 such robots: $1M, 50 TB, $50/GB,
3K Maps; still a 27-hour scan, but many independent tape robots work
like a disc farm.)
62
The Metrics: Disk and Tape Farms Win
Data Motel: data checks in, but it never checks out.

(Figure: log-scale bars from 0.01 to 1,000,000 comparing GB/K$, Kaps,
Maps, and SCANS/Day for a 1,000-disc farm, a 100-drive DLT tape farm,
and an STC tape robot with 6,000 tapes and 8 readers.)
63
Tape & Optical: Beware of the Media Myth
Optical is cheap: $200/platter,
2 GB/platter => $100/GB (2x cheaper than disc).
Tape is cheap: $30/tape,
20 GB/tape => $1.50/GB (100x cheaper than disc).
64
Tape & Optical Reality: Media is 10% of System
Cost
Tape needs a robot ($10K ... $3M) holding 10 ...
1,000 tapes (at 20 GB each) => $20/GB ... $200/GB
(1x-10x cheaper than disc).
Optical needs a robot ($100K) holding 100
platters = 200 GB (TODAY)
=> $400/GB (more expensive than mag disc).
Robots have poor access times. Not good for the
Library of Congress (25 TB). Data motel: data
checks in but it never checks out!
65
The Access Time Myth
  • The Myth: seek or pick time dominates
  • The reality: (1) Queuing dominates (see the
    note after this list)
  • (2) Transfer dominates BLOBs
  • (3) Disk seeks are often short
  • Implication: many cheap servers are better than
    one fast expensive server
  • shorter queues
  • parallel transfer
  • lower cost/access and cost/byte
  • This is now obvious for disk arrays
  • This will be obvious for tape arrays
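One textbook way to see why queuing dominates (an M/M/1 approximation;
the formula is not on the slide):

  T_response ≈ T_service / (1 - U)    where U is server utilization

At U = 0.9, a 10 ms access really costs about 100 ms. Many cheap
servers keep each queue short, so every U, and thus every response
time, stays low.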

66
Billions Of Clients
  • Every device will be intelligent
  • Doors, rooms, cars
  • Computing will be ubiquitous

67
Billions Of Clients Need Millions Of Servers
  • All clients are networked to servers
  • May be nomadic or on-demand
  • Fast clients want faster servers
  • Servers provide
  • Shared Data
  • Control
  • Coordination
  • Communication

(Figure: mobile and fixed clients connect to servers and a super
server.)
68
1987: 256 tps Benchmark
  • $14M computer (Tandem)
  • A dozen people
  • False floor, 2 rooms of machines

(Figure: a 32-node processor array and a 40 GB disk array (80 drives),
simulating 25,600 clients, tended by a manager, an auditor, and admin,
hardware, network, performance, OS, and DB experts.)
69
1988: DB2 + CICS Mainframe, 65 tps
  • IBM 4391
  • Simulated network of 800 clients
  • $2M computer
  • Staff of 6 to do the benchmark

(Figure: 2 x 3725 network controllers, a refrigerator-sized CPU, and
a 16 GB disk farm: 4 x 8 x 0.5 GB.)
70
1997, 10 years later: 1 person and 1 box, 1,250 tps
  • 1 breadbox: 5x the 1987 machine room
  • 23 GB is hand-held
  • One person does all the work
  • Cost/tps is 1,000x less: 25 micro-dollars per
    transaction

(Figure: 4 x 200 MHz cpus, 1/2 GB DRAM, 12 x 4 GB disks plus 3 x 7 x
4 GB disk arrays; one person is the hardware, OS, net, DB, and app
expert.)
71
What Happened?
  • Moore's law: things get 4x better every 3
    years (applies to computers, storage, and
    networks)
  • New Economics: Commodity
      class          price/mips   software $K/year/mips
      mainframe      $10,000      $100
      minicomputer   $100         $10
      microcomputer  $10          $1
  • GUI: the human/computer tradeoff:
    optimize for people, not computers

72
What Happens Next
  • Last 10 years: 1,000x improvement
  • Next 10 years: ????
  • Today: text and image servers are free
    (25 micro-dollars/hit => advertising pays for them)
  • Future: video and audio servers are free.
    You ain't seen nothing yet!
73
Smart Cards
Then (1979)
Courtesy of Dennis Roberson, NCR.
74
Smart Card Memory Capacity