What Happens When Processing, Storage, Bandwidth are Free and Infinite?

Transcript and Presenter's Notes
1
What Happens When Processing, Storage, Bandwidth are Free and Infinite?
  • Jim Gray
  • Microsoft Research

2
Outline
  • Hardware CyberBricks
  • all nodes are very intelligent
  • Software CyberBricks
  • standard way to interconnect intelligent nodes
  • What next?
  • Processing migrates to where the power is
  • Disk, network, display controllers have
    full-blown OS
  • Send RPCs to them (SQL, Java, HTTP, DCOM, CORBA)
  • Computer is a federated distributed system.

3
A Hypothetical Question: Taking things to the limit
  • Moore's law: 100x per decade
  • Exa-instructions per second in 30 years
  • Exa-bit memory chips
  • Exa-byte disks
  • Gilder's Law of the Telecosm: 3x/year more
    bandwidth, 60,000x per decade!
  • 40 Gbps per fiber today
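A quick check of the per-decade figure, from the yearly rate alone:

  3^10 = 59,049 ≈ 60,000x per decade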

4
Grove's Law
  • Link bandwidth doubles every 100 years!
  • Not much has happened to telephones lately
  • Still twisted pair

5
Gilder's Telecosm Law: 3x bandwidth/year for 25
more years
  • Today:
  • 10 Gbps per channel
  • 4 channels per fiber: 40 Gbps
  • 32 fibers/bundle: 1.2 Tbps/bundle
  • In lab: 3 Tbps/fiber (400x WDM)
  • In theory: 25 Tbps per fiber
  • 1 Tbps: USA 1996 WAN bisection bandwidth

6
Thesis: Many little beat few big

(Figure: processors shrink from mainframe to mini, micro, nano, and pico;
disks shrink from 14" and 9" to 5.25", 3.5", 2.5", and 1.8" while
capacities grow from 100 MB to 10 GB, 1 TB, 100 TB. The pico processor:
1 M SPECmarks, 1 TFLOP, 10 pico-second RAM, 1 MB on-chip, 10^6 clocks to
bulk RAM, event-horizon on chip, VM reincarnated, multi-program cache,
on-chip SMP.)
  • Smoking, hairy golf ball
  • How to connect the many little parts?
  • How to program the many little parts?
  • Fault tolerance?

7
Year 2000 4B Machine
  • The Year 2000 commodity PC
  • Billion Instructions/Sec
  • 0.1 Billion Bytes RAM
  • Billion Bits/s Net
  • 10 Billion Bytes Disk
  • Billion Pixel display
  • 3000 x 3000 x 24
  • $1,000

8
4 B PCs: The Bricks of Cyberspace
  • Cost $1,000
  • Come with
  • OS (NT, POSIX,..)
  • DBMS
  • High speed Net
  • System management
  • GUI / OOUI
  • Tools
  • Compatible with everyone else
  • CyberBricks

9
Super Server: 4T Machine
  • Array of 1,000 4B machines
  • 1 billion-ips processors
  • 1 billion-byte DRAMs
  • 10-billion-byte disks
  • 1 Bbps comm lines
  • 1 TB tape robot
  • A few megabucks
  • Challenge
  • Manageability
  • Programmability
  • Security
  • Availability
  • Scaleability
  • Affordability
  • As easy as a single system

(Figure: the CyberBrick is a 4B machine.)
Future servers are CLUSTERS of processors and
discs. Distributed database techniques make
clusters work.
10
Functionally Specialized Cards
  • a P-mips processor, M MB of DRAM, and an ASIC, specialized for:
  • Storage
  • Network
  • Display

Today: P = 50 mips, M = 2 MB
In a few years: P = 200 mips, M = 64 MB
11
It's Already True of Printers: the Peripheral
CyberBrick
  • You buy a printer
  • You get:
  • several network interfaces
  • A PostScript engine
  • cpu,
  • memory,
  • software,
  • a spooler (soon)
  • and a print engine.

12
System On A Chip
  • Integrate Processing with memory on one chip
  • chip is 75% memory now
  • 1 MB cache >> 1960 supercomputers
  • 256 Mb memory chip is 32 MB!
  • IRAM, CRAM, PIM, projects abound
  • Integrate Networking with processing on one chip
  • system bus is a kind of network
  • ATM, FiberChannel, Ethernet,.. Logic on chip.
  • Direct IO (no intermediate bus)
  • Functionally specialized cards shrink to a chip.

13
All Device Controllers will be Cray 1s
  • TODAY
  • Disk controller is a 10-mips RISC engine with
    2 MB DRAM
  • NIC is similar power
  • SOON
  • Will become 100 mips systems with 100 MB DRAM.
  • They are nodes in a federation (can run Oracle
    on NT in disk controller).
  • Advantages
  • Uniform programming model
  • Great tools
  • Security
  • economics (cyberbricks)
  • Move computation to data (minimize traffic)

(Figure: central processor and memory on a Tera Byte backplane;
devices are peers on the interconnect.)
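To make the slide's last point ("move computation to data") concrete,
a minimal sketch, with all names invented and no claim to Gray's design:
shipping the predicate to a smart disk beats shipping the data to the
host, because only the answer set crosses the interconnect.

  import java.util.List;
  import java.util.function.Predicate;
  import java.util.stream.Collectors;

  class SmartDisk {
      private final List<String> rows;
      SmartDisk(List<String> rows) { this.rows = rows; }

      // Conventional path: every row crosses the interconnect.
      List<String> readAll() { return rows; }

      // CyberBrick path: the controller evaluates the predicate locally,
      // so only matching rows cross the interconnect.
      List<String> select(Predicate<String> p) {
          return rows.stream().filter(p).collect(Collectors.toList());
      }
  }

  public class MoveComputationToData {
      public static void main(String[] args) {
          SmartDisk disk = new SmartDisk(List.of("red", "green", "blue"));
          Predicate<String> p = r -> r.startsWith("b");
          int pulled = disk.readAll().size();   // 3 rows shipped, host filters
          int pushed = disk.select(p).size();   // 1 row shipped, same answer
          System.out.println(pulled + " rows moved vs " + pushed);
      }
  }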
14
With Tera Byte Interconnect and Super Computer
Adapters
  • Processing is incidental to
  • Networking
  • Storage
  • UI
  • Disk Controller/NIC is
  • faster than device
  • close to device
  • Can borrow device package power
  • So use idle capacity for computation.
  • Run app in device.

15
Implications
Conventional:
  • Offload device handling to NIC/HBA
  • higher-level protocols: I2O, NASD, VIA
  • SMP and Cluster parallelism is important.
Radical:
  • Move app to NIC/device controller
  • higher-higher level protocols: CORBA / DCOM.
  • Cluster parallelism is VERY important.

16
How Do They Talk to Each Other?
  • Each node has an OS
  • Each node has local resources: a federation.
  • Each node does not completely trust the others.
  • Nodes use RPC to talk to each other
  • CORBA? DCOM? IIOP? RMI?
  • One or all of the above.
  • Huge leverage in high-level interfaces.
  • Same old distributed system story.

(Figure: two application stacks, one per node, talking over the wire(s)
via datagrams, streams, and RPC on VIAL/VIPL.)
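One concrete reading of "huge leverage in high-level interfaces",
sketched with Java RMI (one of the RPC options the slide lists; the
interface itself is invented):

  import java.rmi.Remote;
  import java.rmi.RemoteException;

  // Each intelligent node exports a typed, high-level service interface
  // instead of a low-level block or packet protocol.
  interface DiskNode extends Remote {
      // Ship the whole question to the disk; get back only the answer.
      String query(String sql) throws RemoteException;
  }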
17
Outline
  • Hardware CyberBricks
  • all nodes are very intelligent
  • Software CyberBricks
  • standard way to interconnect intelligent nodes
  • What next?
  • Processing migrates to where the power is
  • Disk, network, display controllers have
    full-blown OS
  • Send RPCs to them (SQL, Java, HTTP, DCOM, CORBA)
  • Computer is a federated distributed system.

18
Objects!
  • It's a zoo
  • ORBs, COM, CORBA,..
  • Object Relational Databases
  • Objects and 3-tier computing

19
History and Alphabet Soup

(Figure: a 1985-1995 timeline of the alphabet soup: X/Open, the Open
Group, COM.)
20
The Promise
  • Both camps share key goals:
  • Encapsulation: hide implementation
  • Polymorphism: generic ops, key to GUI and reuse
  • Uniform Naming
  • Discovery: finding a service
  • Fault handling: transactions
  • Versioning: allow upgrades
  • Transparency: local/remote
  • Security: who has authority
  • Shrink-wrap: minimal inheritance
  • Automation: easy
  • Objects are Software CyberBricks
  • productivity breakthrough (plug ins)
  • manageability breakthrough (modules)
  • Microsoft promises Cairo: distributed objects,
    secure, transparent, fast invocation
  • IBM/Sun/Oracle/Netscape promise CORBA + OpenDoc
    + Java Beans
  • All will deliver
  • Customers can pick the best one

21
The OLE-COM Experience
  • Macintosh had Publish Subscribe
  • PowerPoint needed graphs
  • plugged MS Graph in as a component.
  • Office adopted OLE
  • one graph program for all of office
  • Internet arrived
  • URLs are object references,
  • Office is Web Enabled right away!
  • Office97 smaller than Office95 because of shared
    components
  • It works!!

22
Linking And Embedding: Objects are data modules;
transactions are execution modules
  • Link: pointer to an object somewhere else
  • Think URL in the Internet
  • Embed: bytes are here
  • Objects may be active: can call back to subscribers

23
Objects Meet Databases: the basis for universal
data servers, access, and integration
  • Object-oriented (COM-oriented) interface to data
  • Breaks the DBMS into components
  • Anything can be a data source (see the sketch below)
  • Optimization/navigation on top of other data
    sources
  • Makes an RDBMS an O-R DBMS, assuming the
    optimizer understands objects

(Figure: a component DBMS engine over pluggable data sources.)
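A minimal sketch of the "anything can be a data source" idea (interface
names invented): the engine's optimizer programs against one uniform
interface, which a file, mail store, or spreadsheet can implement as
readily as a B-tree can.

  import java.util.Iterator;

  interface DataSource {
      Iterator<Object[]> scan();            // every source can scan
      boolean supportsSeek();               // optimizer asks about abilities
      Iterator<Object[]> seek(Object key);  // optional fast path, if supported
  }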
24
The BIG Picture: Components and transactions
  • Software modules are objects
  • Object Request Broker (a.k.a. Transaction
    Processing Monitor) connects objects (clients
    to servers)
  • Standard interfaces allow software plug-ins
  • Transaction ties execution of a job into an
    atomic unit: all-or-nothing, durable,
    isolated (see the sketch below)
  • ActiveX Components are a $250M/year business.

(Figure: an Object Request Broker connecting the components.)
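A hedged JDBC sketch of the atomic-unit point (the connection URL and
table are invented): both updates commit together or neither does.

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.SQLException;
  import java.sql.Statement;

  public class AtomicUnit {
      public static void main(String[] args) throws SQLException {
          try (Connection con =
                   DriverManager.getConnection("jdbc:example:orders")) {
              con.setAutoCommit(false);       // start an atomic unit
              try (Statement s = con.createStatement()) {
                  s.executeUpdate("UPDATE account SET bal = bal - 100 WHERE id = 1");
                  s.executeUpdate("UPDATE account SET bal = bal + 100 WHERE id = 2");
                  con.commit();               // all
              } catch (SQLException e) {
                  con.rollback();             // or nothing
                  throw e;
              }
          }
      }
  }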
25
Object Request Broker (ORB) Orchestrates RPC
  • Registers Servers
  • Manages pools of servers
  • Connects clients to servers
  • Does Naming, request-level authorization,
  • Provides transaction coordination
  • Direct and queued invocation
  • Old names
  • Transaction Processing Monitor,
  • Web server,
  • NetWare

26
The OO Points So Far
  • Objects are software Cyber Bricks
  • Object interconnect standards are emerging
  • Cyber Bricks become Federated Systems.
  • Next points
  • put processing close to data
  • do parallel processing.

27
Three Tier Computing
  • Clients do presentation, gather input
  • Clients do some workflow (Xscript)
  • Clients send high-level requests to ORB
  • ORB dispatches workflows and business objects
    -- proxies for the client that orchestrate
    flows and queues
  • Server-side workflow scripts call on distributed
    business objects to execute the task (sketched
    below)

(Figure: the tiers: Presentation, workflow, Business Objects, Database.)
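A minimal sketch (all names invented) of that server-side split: one
high-level client request triggers a workflow that calls business
objects near the data, instead of the client making many fine-grained
calls across the network.

  interface OrderObject { void reserve(String item); void bill(String acct); }

  class PlaceOrderWorkflow {
      private final OrderObject order;   // dispatched to us by the ORB
      PlaceOrderWorkflow(OrderObject order) { this.order = order; }

      // One round trip from the client triggers the whole flow server-side.
      void run(String item, String acct) {
          order.reserve(item);
          order.bill(acct);
      }
  }

  class Demo {
      public static void main(String[] args) {
          OrderObject o = new OrderObject() {
              public void reserve(String item) { System.out.println("reserve " + item); }
              public void bill(String acct)    { System.out.println("bill " + acct); }
          };
          new PlaceOrderWorkflow(o).run("disk", "acct-7");
      }
  }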
28
The Three Tiers
(Figure: the three tiers, ending in the object data server.)
29
Transaction Processing Evolution to Three Tier:
Intelligence migrated to clients
  • Mainframe batch processing (centralized, cards)
  • Dumb terminals, Remote Job Entry
  • Intelligent terminals, database backends
  • Workflow Systems, Object Request Brokers,
    Application Generators

(Figure: mainframe and cards, then TP Monitor, then ORB.)
30
Web Evolution to Three Tier:
Intelligence migrated to clients (like TP)
  • Character-mode clients, smart servers (green
    screen, archie, gopher, WAIS)
  • GUI Browsers - Web file servers
  • GUI Plugins - Web dispatchers - CGI
  • Smart clients - Web dispatcher (ORB), pools of
    app servers (ISAPI, Viper), workflow scripts at
    client and server
31
PC Evolution to Three Tier:
Intelligence migrated to server
  • Stand-alone PC (centralized)
  • PC + file & print server: message per I/O
  • PC + database server: message per SQL statement
  • PC + app server: message per transaction
  • ActiveX client, ORB, ActiveX server, Xscript
32
Why Did Everyone Go To Three-Tier?
  • Manageability
  • Business rules must be with the data
  • Middleware operations tools
  • Performance (scaleability)
  • Server resources are precious
  • ORB dispatches requests to server pools
  • Technology & Physics
  • Put UI processing near the user
  • Put shared-data processing near the shared data
  • Minimizes data moves
  • Encapsulate / modularity

(Figure: Presentation, workflow, Business Objects, Database.)
33
Why Put Business Objects at Server?
34
The OO Points So Far
  • Objects are software Cyber Bricks
  • Object interconnect standards are emerging
  • Cyber Bricks become Federated Systems.
  • Put processing close to data
  • Next point
  • do parallel processing.

35
Parallelism: the OTHER half of Super-Servers
  • Clusters of machines allow two kinds of
    parallelism
  • Many little jobs: online transaction processing
  • TPC-A, B, C, ...
  • A few big jobs: data search and analysis
  • TPC-D, DSS, OLAP
  • Both give automatic parallelism

36
Why Parallel Access To Data?
At 10 MB/s, a 1 TB scan takes 1.2 days.
1,000-way parallel: a 100-second SCAN.
BANDWIDTH is the point.
Parallelism: divide a big problem into many
smaller ones to be solved in parallel.
(The arithmetic is spelled out below.)
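The arithmetic behind those two numbers:

  1 TB / 10 MB/s = 10^12 B / 10^7 B/s = 10^5 s ≈ 1.2 days
  over 1,000 disks: 10^5 s / 1,000 = 100 s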
37
Kinds of Parallel Execution
  • Pipeline: any sequential program feeds the next;
    stages run concurrently on the stream.
  • Partition: outputs split N ways, inputs merge
    M ways; many copies of the same sequential
    program run, one per partition.
38
Why are Relational Operators Successful for
Parallelism?
The relational data model gives uniform operators on uniform data
streams, closed under composition:
  • Each operator consumes 1 or 2 input streams
  • Each stream is a uniform collection of data
  • Sequential data in and out: pure dataflow
  • Partitioning some operators (e.g. aggregates,
    non-equi-join, sort, ..) requires innovation
  • The result: AUTOMATIC PARALLELISM (illustrated
    in the sketch below)
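A loose analogy in Java streams (not Gray's formulation; data and
predicates invented): uniform operators over uniform streams compose,
so a planner is free to re-order or partition them.

  import java.util.List;

  public class Dataflow {
      public static void main(String[] args) {
          List<Integer> sales = List.of(3, 141, 59, 265, 35);
          // select -> project -> aggregate, each consuming one stream;
          // swap in sales.parallelStream() and the same composition
          // runs partitioned, with no change to the operators.
          int total = sales.stream()
                           .filter(s -> s > 50)   // selection
                           .mapToInt(s -> s)      // projection
                           .sum();                // aggregate
          System.out.println(total);              // 465
      }
  }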
39
Database Systems Hide Parallelism
  • Automate system management via tools
  • data placement
  • data organization (indexing)
  • periodic tasks (dump / recover / reorganize)
  • Automatic fault tolerance
  • duplex failover
  • transactions
  • Automatic parallelism
  • among transactions (locking)
  • within a transaction (parallel execution)

40
SQL: a Non-Procedural Programming Language
  • SQL is a functional programming language: it
    describes the answer set (see the sketch below).
  • The optimizer picks the best execution plan
  • Picks the data flow web (pipeline),
  • degree of parallelism (partitioning)
  • other execution parameters (process placement,
    memory, ...)

(Figure: GUI and schema feed the optimizer, which emits a plan of
operators and rivers for the execution monitor to run.)
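A hedged JDBC sketch of the non-procedural point (URL and schema
invented): the program states WHAT it wants; the dataflow web, degree
of parallelism, and placement are the optimizer's business.

  import java.sql.*;

  public class DeclarativeQuery {
      public static void main(String[] args) throws SQLException {
          String q = "SELECT region, SUM(amount) FROM sales GROUP BY region";
          try (Connection c = DriverManager.getConnection("jdbc:example:warehouse");
               Statement s = c.createStatement();
               ResultSet r = s.executeQuery(q)) {   // plan chosen by optimizer
              while (r.next())
                  System.out.println(r.getString(1) + " " + r.getLong(2));
          }
      }
  }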
41
Automatic Data Partitioning
Split a SQL table across a set of nodes and disks.
Partition within the set by:
  • Range: good for equijoins, range queries, group-by
  • Hash: good for equijoins
  • Round Robin: good to spread load
Shared disk and memory are less sensitive to
partitioning; shared nothing benefits from
"good" partitioning.
42
N x M way Parallelism
N inputs, M outputs, no bottlenecks.
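The three partitioning schemes from the previous slide, in a minimal
sketch (node count and keys invented): each maps a key to one of N
nodes.

  public class Partitioning {
      static final int N = 4;                  // nodes
      static int nextRR = 0;                   // round-robin cursor

      static int range(int key) {              // contiguous key ranges
          int width = 250;                     // keys 0..999 over 4 nodes
          return Math.min(key / width, N - 1);
      }
      static int hash(int key) {               // spread by hash
          return Math.floorMod(Integer.hashCode(key), N);
      }
      static int roundRobin() {                // ignore the key entirely
          return nextRR++ % N;
      }

      public static void main(String[] args) {
          int[] keys = {7, 512, 998, 42};
          for (int k : keys)
              System.out.printf("key %d -> range:%d hash:%d rr:%d%n",
                                k, range(k), hash(k), roundRobin());
      }
  }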
43
Parallel Objects?
  • How does all this DB parallelism connect to
    hardware/software Cyber Bricks?
  • To scale to large client sets
  • need lots of independent parallel execution.
  • Comes for free from the ORB.
  • To scale to large data sets
  • need intra-program parallelism (like parallel
    DBs)
  • Requires some invention.

44
Outline
  • Hardware CyberBricks
  • all nodes are very intelligent
  • Software CyberBricks
  • standard way to interconnect intelligent nodes
  • What next?
  • Processing migrates to where the power is
  • Disk, network, display controllers have
    full-blown OS
  • Send RPCs to them (SQL, Java, HTTP, DCOM, CORBA)
  • Computer is a federated distributed system.
  • Parallel execution is important

45
MORE SLIDES, but there is only so much time.
  • Too bad

46
The Disk Farm On a Card
  • The 100 GB disc card
  • An array of discs
  • Can be used as
  • 100 discs
  • 1 striped disc
  • 10 fault-tolerant discs
  • ....etc
  • LOTS of accesses/second
  • bandwidth

(Figure: a 14" card.)
Life is cheap, it's the accessories that cost ya.
Processors are cheap, it's the peripherals that
cost ya (a $10K disc card).
47
Parallelism: Performance is the Goal
Goal is to get 'good' performance. Trade time
for money.
Law 1: a parallel system should be faster than
the serial system.
Law 2: a parallel system should give near-linear
scaleup or near-linear speedup or both.
Parallel DBMSs obey these laws. (The two
measures are defined below.)
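The standard definitions of the two measures (the formulas are not on
the slide):

  Speedup(N) = T(1 processor, job) / T(N processors, same job)
               linear when Speedup(N) = N
  Scaleup(N) = T(1 processor, job) / T(N processors, N-times-bigger job)
               linear when Scaleup(N) = 1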
48
Success Stories
  • Online Transaction Processing
  • many little jobs
  • SQL systems support
  • 50 Ktpm-C (44 cpus, 600 disks, 2 nodes)
  • Batch (decision support and utility)
  • few big jobs, parallelism inside
  • Scan data at 100 MB/s
  • Linear scaleup to 1,000 processors

(Figures: transactions/sec vs hardware; records/sec vs hardware.)
49
The New Law of Computing
Grosch's Law vs. the Parallel Law: parallelism
needs linear speedup and linear scaleup.
Not always possible.
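For contrast, the usual quantitative statements (not spelled out on
the slide):

  Grosch's Law:     performance ∝ price^2   (2x the money bought 4x the
                    power, favoring one big machine)
  The parallel law: performance ∝ price     (2x the money buys 2x the
                    power, and only with near-linear speedup and scaleup)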
50
Clusters being built
  • Teradata: 1,000 nodes ($30K/slice)
  • Tandem, VMScluster: 150 nodes ($100K/slice)
  • Intel: 9,000 nodes @ $55M ($6K/slice)
  • Teradata, Tandem, DEC moving to NT and a low
    slice price
  • IBM: 512-node ASCI @ $100M ($200K/slice)
  • PC clusters (bare-handed) at dozens of nodes:
    web servers (MSN, PointCast, ...), DB servers
  • KEY TECHNOLOGY HERE IS THE APPS.
  • Apps distribute data
  • Apps distribute execution

51
Great Debate: Shared What? SMP or Cluster?
  • Shared Memory (SMP): easy to program, difficult
    to build, difficult to scaleup (Sequent, SGI, Sun)
  • Shared Disk (VMScluster, Sysplex)
  • Shared Nothing (network): hard to program, easy
    to build, easy to scaleup (Tandem, Teradata, SP2)
The winner will be a synthesis of these ideas.
Distributed shared memory (DASH, Encore) blurs the
distinction between network and bus (locality is
still important), but gives shared memory at
message cost.
52
BOTH SMP and Cluster?
Grow Up with SMP: 4xP6 is now standard.
Grow Out with Cluster: the cluster has
inexpensive parts.
Cluster of PCs.
53
Clusters Have Advantages
  • Clients and Servers made from the same stuff.
  • Inexpensive
  • Built with commodity components
  • Fault tolerance
  • Spare modules mask failures
  • Modular growth
  • grow by adding small modules

54
Meta-Message: Technology Ratios Are Important
  • If everything gets faster and cheaper at the
    same rate, THEN nothing really changes.
  • Things getting MUCH BETTER:
  • communication speed & cost: 1,000x
  • processor speed & cost: 100x
  • storage size & cost: 100x
  • Things staying about the same:
  • speed of light (more or less constant)
  • people (10x more expensive)
  • storage speed (only 10x better)

55
Storage Ratios Changed
  • 10x better access time
  • 10x more bandwidth
  • 4,000x lower media price
  • DRAM/DISK media price ratio: 100:1 to 10:1 to 50:1

56
Today's Storage Hierarchy: Speed & Capacity vs
Cost Tradeoffs

(Figure: two charts spanning access times of 10^-9 to 10^3 seconds:
typical system size (bytes) vs speed, and price ($/MB) vs speed, for
cache, main memory, secondary (disc), online tape, nearline tape, and
offline tape.)
57
Network Speeds
  • Speed of light did not change
  • Link bandwidth grew 60% / year
  • WAN speeds limited by politics
  • if voice is $X/minute, how much is video?
  • Gbps to the desktop today!
  • 10 Gbps channel is coming.
  • 3 Tbps fibers in the laboratory through
    parallelism (WDM).
  • Paradox:
  • WAN link has 40 Gbps
  • Processor bus is 2..40 Gbps

(Figure: comm speedups, 1960-2000: processors (i/s) vs LANs & WANs (b/s).)
58
MicroProcessor Speeds Went Up
  • Clock rates went from 10 KHz to 400 MHz
  • Processors are now 6-way issue
  • SPECInt fits in cache,
  • so it tracks cpu speed
  • Peak Advertised Performance (PAP) is 1.2 BIPS
  • Real Application Performance (RAP) is 100
    MIPS
  • Similar curves for
  • DEC VAX Alpha
  • HP/PA
  • IBM R6000/ PowerPC
  • MIPS SGI
  • SUN

59
Performance = Storage Accesses, not Instructions
Executed
  • In the old days we counted instructions and I/Os
  • Now we count memory references
  • Processors wait most of the time

(Figure: where the time goes: clock ticks used by AlphaSort
components. A "70 MIPS" processor running real apps, which have worse
I-cache misses, delivers 60 MIPS if well tuned, 20 MIPS if not.)
60
Storage Latency: How Far Away is the Data?
61
Tape Farms for Tertiary Storage,
Not Mainframe Silos

(Figure: one $10K tape robot: 14 tapes, 500 GB, 5 MB/s, $20/GB, 30
Maps, a 27-hour scan. A farm of 100 such robots: $1M, 50 TB, $50/GB,
3K Maps; still a 27-hour scan, but many independent tape robots work
like a disc farm.)
62
The Metrics: Disk and Tape Farms Win
Data Motel: data checks in, but it never checks out.

(Figure: log-scale bars from 0.01 to 1,000,000 comparing GB/K$, Kaps,
Maps, and SCANS/Day for a 1,000-disc farm, a 100-drive DLT tape farm,
and an STC tape robot with 6,000 tapes and 8 readers.)
63
Tape & Optical: Beware of the Media Myth
Optical is cheap: $200/platter,
2 GB/platter => $100/GB (2x cheaper than disc).
Tape is cheap: $30/tape,
20 GB/tape => $1.50/GB (100x cheaper than disc).
64
Tape & Optical Reality: Media is 10% of System
Cost
Tape needs a robot ($10K ... $3M) holding 10 ...
1,000 tapes (at 20 GB each) => $20/GB ... $200/GB
(1x-10x cheaper than disc).
Optical needs a robot ($100K) holding 100
platters = 200 GB (TODAY)
=> $400/GB (more expensive than mag disc).
Robots have poor access times. Not good for the
Library of Congress (25 TB). Data motel: data
checks in but it never checks out!
65
The Access Time Myth
  • The Myth: seek or pick time dominates
  • The reality: (1) Queuing dominates (see the
    note after this list)
  • (2) Transfer dominates BLOBs
  • (3) Disk seeks are often short
  • Implication: many cheap servers are better than
    one fast expensive server
  • shorter queues
  • parallel transfer
  • lower cost/access and cost/byte
  • This is now obvious for disk arrays
  • This will be obvious for tape arrays
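One textbook way to see why queuing dominates (an M/M/1 approximation;
the formula is not on the slide):

  T_response ≈ T_service / (1 - U)    where U is server utilization

At U = 0.9, a 10 ms access really costs about 100 ms. Many cheap
servers keep each queue short, so every U, and thus every response
time, stays low.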

66
Billions Of Clients
  • Every device will be intelligent
  • Doors, rooms, cars
  • Computing will be ubiquitous

67
Billions Of Clients Need Millions Of Servers
  • All clients are networked to servers
  • May be nomadic or on-demand
  • Fast clients want faster servers
  • Servers provide
  • Shared Data
  • Control
  • Coordination
  • Communication

(Figure: mobile and fixed clients connect to servers and a super
server.)
68
1987: 256 tps Benchmark
  • $14M computer (Tandem)
  • A dozen people
  • False floor, 2 rooms of machines

(Figure: a 32-node processor array and a 40 GB disk array (80 drives),
simulating 25,600 clients, tended by a manager, an auditor, and admin,
hardware, network, performance, OS, and DB experts.)
69
1988: DB2 + CICS Mainframe, 65 tps
  • IBM 4391
  • Simulated network of 800 clients
  • $2M computer
  • Staff of 6 to do the benchmark

(Figure: 2 x 3725 network controllers, a refrigerator-sized CPU, and
a 16 GB disk farm: 4 x 8 x 0.5 GB.)
70
1997, 10 years later: 1 person and 1 box, 1,250 tps
  • 1 breadbox: 5x the 1987 machine room
  • 23 GB is hand-held
  • One person does all the work
  • Cost/tps is 1,000x less: 25 micro-dollars per
    transaction

(Figure: 4 x 200 MHz cpus, 1/2 GB DRAM, 12 x 4 GB disks plus 3 x 7 x
4 GB disk arrays; one person is the hardware, OS, net, DB, and app
expert.)
71
What Happened?
  • Moore's law: things get 4x better every 3
    years (applies to computers, storage, and
    networks)
  • New Economics: Commodity
      class          price/mips   software $K/year/mips
      mainframe      $10,000      $100
      minicomputer   $100         $10
      microcomputer  $10          $1
  • GUI: the human/computer tradeoff:
    optimize for people, not computers

72
What Happens Next
  • Last 10 years: 1,000x improvement
  • Next 10 years: ????
  • Today: text and image servers are free
    (25 micro-dollars/hit => advertising pays for them)
  • Future: video and audio servers are free.
    You ain't seen nothing yet!
73
Smart Cards
Then (1979)
Courtesy of Dennis Roberson, NCR.
74
Smart Card Memory Capacity