Title: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft.com
1 Scaleable Computing: Jim Gray, Microsoft Corporation, Gray@Microsoft.com
2 Thesis: Scaleable Servers
- Scaleable Servers
- Commodity hardware allows new applications
- New applications need huge servers
- Clients and servers are built of the same stuff
- Commodity software and
- Commodity hardware
- Servers should be able to
- Scale up (grow node by adding CPUs, disks, networks)
- Scale out (grow by adding nodes)
- Scale down (can start small)
- Key software technologies
- Objects, Transactions, Clusters, Parallelism
3 1987: 256 tps Benchmark
- $14 M computer (Tandem)
- A dozen people
- False floor, 2 rooms of machines
Figure labels: a 32-node processor array; a 40 GB disk array (80 drives); simulate 25,600 clients. Staff: manager, admin expert, hardware experts, network expert, performance expert, OS expert, DB expert, auditor.
4 1988: DB2 + CICS Mainframe, 65 tps
- IBM 4391
- Simulated network of 800 clients
- $2 M computer
- Staff of 6 to do benchmark
Figure labels: refrigerator-sized CPU; 2 x 3725 network controllers; 16 GB disk farm (4 x 8 x 0.5 GB).
5 1997: 10 Years Later, 1 Person and 1 Box = 1250 tps
- 1 breadbox ~ 5x the 1987 machine room
- 23 GB is hand-held
- One person does all the work
- Cost/tps is 1,000x less: 25 micro-dollars per transaction
Figure labels: 4 x 200 MHz CPU, 1/2 GB DRAM, 12 x 4 GB disk; 3 x 7 x 4 GB disk arrays. One person is hardware expert, OS expert, net expert, DB expert, and app expert.
6 What Happened?
- Moore's law: things get 4x better every 3 years (applies to computers, storage, and networks)
- New economics: commodity

  class           price/mips ($)   software (k$/year)
  mainframe       10,000           100
  minicomputer    100              10
  microcomputer   10               1

- GUI: human-computer tradeoff; optimize for people, not computers
7 What Happens Next
- Last 10 years: 1000x improvement
- Next 10 years: ????
- Today: text and image servers are free; 25 m$/hit => advertising pays for them
- Future: video, audio, ... servers are free. You ain't seen nothing yet!
8 Kinds Of Information Processing

                            Point-to-point         Broadcast
  Immediate (network)       Conversation, Money    Lecture, Concert
  Time-shifted (database)   Mail                   Book, Newspaper

It's ALL going electronic. Immediate is being stored for analysis (so ALL database). Analysis and automatic processing are being added.
9 Why Put Everything In Cyberspace?
- Low rent: min $/byte
- Shrinks time: now or later
- Shrinks space: here or there
- Automate processing: knowbots
Figure labels: point-to-point OR broadcast; immediate OR time-delayed; network; database; locate, process, analyze, summarize.
10 Magnetic Storage Cheaper Than Paper
- File cabinet: cabinet (four drawer) $250, paper (24,000 sheets) $250, space (2x3 @ $10/ft2) $180, total $700, or 3 cents/sheet
- Disk: disk (4 GB) $800; ASCII: 2 million pages, 0.04 cents/sheet (80x cheaper)
- Image: 200,000 pages, 0.4 cents/sheet (8x cheaper)
- Store everything on disk
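The per-sheet figures on this slide can be rechecked with a few lines of arithmetic. The sketch below uses only the prices and page counts listed above; the variable and function names are illustrative, and the slide's "80x" and "8x" come from rounding the paper cost up to 3 cents.

```python
# Prices (US dollars) and capacities as listed on the slide.
CABINET_TOTAL = 700        # slide's rounded total: cabinet + paper + floor space
CABINET_SHEETS = 24_000
DISK_PRICE = 800           # one 4 GB disk
ASCII_PAGES = 2_000_000    # 4 GB / 2 million pages is about 2 KB per page
IMAGE_PAGES = 200_000      # 4 GB / 200,000 pages is about 20 KB per page

def cents_per_sheet(dollars, sheets):
    """Storage cost of one sheet (or page), in cents."""
    return 100 * dollars / sheets

paper = cents_per_sheet(CABINET_TOTAL, CABINET_SHEETS)
ascii_cost = cents_per_sheet(DISK_PRICE, ASCII_PAGES)
image_cost = cents_per_sheet(DISK_PRICE, IMAGE_PAGES)

print(f"paper {paper:.2f}, ASCII {ascii_cost:.2f}, image {image_cost:.2f} cents/sheet")
print(f"disk advantage: {paper / ascii_cost:.0f}x (ASCII), {paper / image_cost:.1f}x (image)")
```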
11 Databases: Information at Your Fingertips, Information Network, Knowledge Navigator
- All information will be in an online database (somewhere)
- You might record everything you
- Read: 10 MB/day, 400 GB/lifetime (eight tapes today)
- Hear: 400 MB/day, 16 TB/lifetime (three tapes/year today)
- See: 1 MB/s, 40 GB/day, 1.6 PB/lifetime (maybe someday)
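As a rough check, all three lifetime totals follow from the daily rates if a lifetime is taken to be about 40,000 days. That figure is an assumption inferred from the slide's own ratios (400 GB / 10 MB per day), not a number stated in the talk.

```python
MB, GB, TB, PB = 10**6, 10**9, 10**12, 10**15

# Assumption: lifetime length implied by the slide's ratios (about 110 years).
LIFETIME_DAYS = 40_000

daily_bytes = {"read": 10 * MB, "hear": 400 * MB, "see": 40 * GB}
lifetime_bytes = {kind: rate * LIFETIME_DAYS for kind, rate in daily_bytes.items()}

# At 1 MB/s, 40 GB/day corresponds to this many hours of viewing per day.
viewing_hours_per_day = daily_bytes["see"] / (1 * MB) / 3600

for kind, total in lifetime_bytes.items():
    print(f"{kind}: {total / TB:g} TB per lifetime")
print(f"viewing: {viewing_hours_per_day:.1f} hours/day")
```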
12 Database: Store ALL Data Types
- The new world
- Billions of objects
- Big objects (1 MB)
- Objects have behavior (methods)
- The old world
- Millions of objects
- 100-byte objects
- Paperless office
- Library of Congress online
- All information online
- Entertainment
- Publishing
- Business
- WWW and Internet
Figure: a People table with columns Name, Address, Papers, Picture, Voice; sample names David, Mike, Won; sample addresses NY, Berk, Austin.
13 Billions Of Clients
- Every device will be intelligent
- Doors, rooms, cars
- Computing will be ubiquitous
14 Billions Of Clients Need Millions Of Servers
- All clients networked to servers
- May be nomadic or on-demand
- Fast clients want faster servers
- Servers provide
- Shared Data
- Control
- Coordination
- Communication
Figure labels: clients (mobile clients, fixed clients); servers (server, super server).
15 Thesis: Many Little Beat Few Big
Figure labels: processor classes (mainframe, mini, micro, nano, pico processor); prices (1 million, 100 K, 10 K); memory and storage (10 pico-second RAM, 1 MB, 100 MB, 10 GB, 1 TB, 100 TB); disk form factors (14", 9", 5.25", 3.5", 2.5", 1.8").
Pico processor: 1 M SPECmarks, 1 TFLOP; 10^6 clocks to bulk RAM; event-horizon on chip; VM reincarnated; multi-program cache; on-chip SMP.
- Smoking, hairy golf ball
- How to connect the many little parts?
- How to program the many little parts?
- Fault tolerance?
16 Future Super Server: 4T Machine
- Array of 1,000 4B machines
- 1 Bips processors
- 1 BB DRAM
- 10 BB disks
- 1 Bbps comm lines
- 1 TB tape robot
- A few megabucks
- Challenge
- Manageability
- Programmability
- Security
- Availability
- Scaleability
- Affordability
- As easy as a single system
Cyber Brick: a 4B machine
Future servers are CLUSTERS of processors and discs. Distributed database techniques make clusters work.
17 The Hardware Is In Place... And Then a Miracle Occurs?
- SNAP: scaleable network and platforms
- Commodity distributed OS built on
- Commodity platforms
- Commodity network interconnect
- Enables parallel applications
18 Thesis: Scaleable Servers
- Scaleable Servers
- Commodity hardware allows new applications
- New applications need huge servers
- Clients and servers are built of the same stuff
- Commodity software and
- Commodity hardware
- Servers should be able to
- Scale up (grow node by adding CPUs, disks, networks)
- Scale out (grow by adding nodes)
- Scale down (can start small)
- Key software technologies
- Objects, Transactions, Clusters, Parallelism
19 Scaleable Servers: BOTH SMP And Cluster
Grow up with SMP: 4xP6 is now standard. Grow out with cluster: cluster has inexpensive parts.
Figure labels: SMP super server, departmental server, personal system, cluster of PCs.
20 SMPs Have Advantages
- Single system image: easier to manage, easier to program (threads in shared memory, disk, net)
- 4x SMP is commodity
- Software capable of 16x
- Problems
- >4 not commodity
- Scale-down problem (starter systems expensive)
- There is a BIGGEST one
Figure labels: SMP super server, departmental server, personal system.
21 Building the Largest Node
- There is a biggest node (size grows over time)
- Today, with NT, it is probably 1TB
- We are building it (with help from DEC and SPIN2)
- 1 TB GeoSpatial SQL Server database
- (1.4 TB of disks 320 drives).
- 30K BTU, 8 KVA, 1.5 metric tons.
- Will put it on the Web as a demo app.
- 10 meter image of the ENTIRE PLANET.
- 2 meter image of interesting parts (2% of land)
- One pixel per meter = 500 TB uncompressed.
- Better resolution in US (courtesy of USGS).
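The "one pixel per meter = 500 TB" figure can be sanity-checked from the Earth's surface area. This is a sketch under two assumptions that are not on the slide: a mean Earth radius of 6,371 km and one byte per pixel.

```python
import math

EARTH_RADIUS_M = 6_371_000   # assumption: mean Earth radius in meters
BYTES_PER_PIXEL = 1          # assumption: 8-bit pixels, uncompressed

surface_m2 = 4 * math.pi * EARTH_RADIUS_M ** 2   # whole planet, land and sea

def image_terabytes(area_m2, meters_per_pixel):
    """Uncompressed image size in TB for a given ground resolution."""
    pixels = area_m2 / meters_per_pixel ** 2
    return pixels * BYTES_PER_PIXEL / 1e12

one_meter_tb = image_terabytes(surface_m2, 1)
ten_meter_tb = image_terabytes(surface_m2, 10)
print(f"1 m/pixel: {one_meter_tb:.0f} TB, 10 m/pixel: {ten_meter_tb:.1f} TB")
```

Under these assumptions the whole planet at one meter per pixel lands close to the slide's 500 TB, and each 10x loss in resolution cuts the size by 100x.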
22 What's a TeraByte?
- 1 Terabyte
- 1,000,000,000 business letters: 150 miles of book shelf
- 100,000,000 book pages: 15 miles of book shelf
- 50,000,000 FAX images: 7 miles of book shelf
- 10,000,000 TV pictures (MPEG): 10 days of video
- 4,000 LandSat images: 16 earth images (100m)
- 100,000,000 web pages: 10 copies of the web HTML
- Library of Congress (in ASCII) is 25 TB
- 1980: $200 million of disc (10,000 discs)
- $5 million of tape silo (10,000 tapes)
- 1997: $200 k of magnetic disc (48 discs)
- $30 k nearline tape (20 tapes)
- Terror Byte!
23 TB DB User Interface
24 TPC-C Web-Based Benchmarks
- Client is a Web browser (7,500 of them!)
- Submits
- Order
- Invoice
- Query to server via Web page interface
- Web server translates to DB
- SQL does DB work
- Net:
- easy to implement
- performance is GREAT!
Figure labels: HTTP, IIS (Web), ODBC, SQL.
25 Grow UP and OUT
1 Terabyte DB
- Cluster
- a collection of nodes
- as easy to program and manage as a single node
1 billion transactions per day
26 Clusters Have Advantages
- Clients and servers made from the same stuff
- Inexpensive
- Built with commodity components
- Fault tolerance
- Spare modules mask failures
- Modular growth
- Grow by adding small modules
- Unlimited growth: no biggest one
27 Windows NT Clusters
- Microsoft & 60 vendors defining NT clusters
- Almost all big hardware and software vendors involved
- No special hardware needed, but it may help
- Fault-tolerant first, scaleable second
- Microsoft, Oracle, SAP giving demos today
- Enables
- Commodity fault-tolerance
- Commodity parallelism (data mining, virtual reality)
- Also great for workgroups!
28 Billion Transactions per Day Project
- Building a 20-node Windows NT Cluster (with help from Intel), > 800 disks
- All commodity parts
- Using SQL Server & DTC distributed transactions
- Each node has 1/20th of the DB
- Each node does 1/20th of the work
- 15% of the transactions are distributed
29 How Much Is 1 Billion Transactions Per Day?
- 1 Btpd = 11,574 tps (transactions per second) ~ 700,000 tpm (transactions/minute)
- AT&T
- 185 million calls (peak day worldwide)
- Visa ~20 M tpd
- 400 M customers
- 250,000 ATMs worldwide
- 7 billion transactions / year (card + cheque) in 1994
Figure: bar chart of millions of transactions per day (Mtpd, log scale from 0.1 to 1,000) for AT&T, Visa, BofA, NYSE, and 1 Btpd.
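The conversion from one billion transactions per day to per-second and per-minute rates is simple arithmetic, sketched here. The 20-node split uses the cluster size from the Billion Transactions per Day project slide and assumes the load spreads evenly across nodes.

```python
SECONDS_PER_DAY = 24 * 60 * 60
TRANSACTIONS_PER_DAY = 1_000_000_000
NODES = 20   # cluster size from the project slide; even load is an assumption

tps = TRANSACTIONS_PER_DAY / SECONDS_PER_DAY   # transactions per second
tpm = tps * 60                                 # transactions per minute
tps_per_node = tps / NODES

print(f"{tps:,.0f} tps, {tpm:,.0f} tpm, {tps_per_node:,.0f} tps per node")
```

The exact per-minute rate is a little under the slide's rounded 700,000 tpm.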
30 Parallelism: The OTHER Aspect of Clusters
- Clusters of machines allow two kinds of parallelism
- Many little jobs: online transaction processing
- TPC-A, B, C
- A few big jobs: data search and analysis
- TPC-D, DSS, OLAP
- Both give automatic parallelism
31 Kinds of Parallel Execution
Figure: pipeline parallelism (any sequential program feeding any sequential program) and partition parallelism (outputs split N ways, inputs merge M ways, with any sequential program on each partition).
Jim Gray & Gordon Bell, VLDB 95 Parallel Database Systems Survey
32 Data Rivers: Split + Merge Streams
Figure: N producers feed N x M data streams through the river to M consumers.
Producers add records to the river; consumers consume records from the river. Purely sequential programming. The river does flow control and buffering, and does partition and merge of data records. River = Split/Merge in Gamma = Exchange operator in Volcano.
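A minimal, single-threaded sketch of the river idea follows: producers and consumers stay purely sequential, and the river alone merges the N inputs and splits records M ways. The names `river`, `produce`, and `consume` are hypothetical, and the flow control and buffering a real river provides are omitted.

```python
def river(producers, num_consumers, key=hash):
    """Merge records from all producers, then split them into one
    partition per consumer by key(record) modulo the consumer count."""
    partitions = [[] for _ in range(num_consumers)]
    for producer in producers:          # merge: drain every input stream
        for record in producer:
            partitions[key(record) % num_consumers].append(record)  # split
    return partitions

def produce(start, stop):
    """A sequential producer: yields records with no knowledge of partitioning."""
    yield from range(start, stop)

def consume(records):
    """A sequential consumer: sums whatever its partition contains."""
    return sum(records)

producers = [produce(0, 5), produce(5, 10), produce(10, 15)]   # N = 3
partitions = river(producers, num_consumers=2, key=lambda r: r)  # M = 2
totals = [consume(p) for p in partitions]
print(partitions, totals)
```

Changing N or M requires no change to `produce` or `consume`, which is the point of keeping partitioning inside the river.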
33 Partitioned Execution
Spreads computation and IO among processors
Partitioned data gives NATURAL parallelism
34 N x M Way Parallelism
N inputs, M outputs, no bottlenecks. Partitioned data. Partitioned and pipelined data flows.
35 The Parallel Law Of Computing
Grosch's Law: 2x $ is 4x performance
Parallel Law: 2x $ is 2x performance (1 MIPS for $1; 1,000 MIPS for $1,000)
Needs linear speedup and linear scale-up. Not always possible.
36 Thesis: Scaleable Servers
- Scaleable Servers
- Commodity hardware allows new applications
- New applications need huge servers
- Clients and servers are built of the same stuff
- Commodity software and
- Commodity hardware
- Servers should be able to
- Scale up (grow node by adding CPUs, disks, networks)
- Scale out (grow by adding nodes)
- Scale down (can start small)
- Key software technologies
- Objects, Transactions, Clusters, Parallelism
37 The BIG Picture: Components and Transactions
- Software modules are objects
- Object Request Broker (a.k.a. Transaction Processing Monitor) connects objects (clients to servers)
- Standard interfaces allow software plug-ins
- Transaction ties execution of a job into an atomic unit: all-or-nothing, durable, isolated
Figure label: Object Request Broker
38 ActiveX and COM
- COM is the Microsoft model, the engine inside OLE; ALL Microsoft software is based on COM (ActiveX)
- CORBA + OpenDoc is equivalent
- Heated debate over which is best
- Both share same key goals
- Encapsulation: hide implementation
- Polymorphism: generic operations, key to GUI and reuse
- Versioning: allow upgrades
- Transparency: local/remote
- Security: invocation can be remote
- Shrink-wrap: minimal inheritance
- Automation: easy
- COM now managed by the Open Group
39 Linking And Embedding: Objects are data modules; transactions are execution modules
- Link: pointer to object somewhere else
- Think URL in Internet
- Embed: bytes are here
- Objects may be active; can callback to subscribers
40 Commodity Software Components: Inexpensive OS, DBMS, and plug-ins
- Recent TPC-C prices
- Oracle on DEC UNIX: 30.4 k tpmC @ $305/tpmC
- Informix on DEC UNIX: 13.6 k tpmC @ $277/tpmC
- DB2 on Solaris: 6.4 k tpmC @ $200/tpmC
- SQL Server on Compaq, Windows NT: 6.7 k tpmC @ $90/tpmC (using Web, no TP monitor!)
- Oracle on Windows NT: 3.1 k tpmC @ $198/tpmC
- Net: open solutions can do even biggest jobs; thousands of online users per node of cluster
- ActiveX, VBX, and Java plug-ins
- Spreadsheets, GeoQuery, FAX, voice, image libraries: commodity component market
41 Objects Meet Databases: The basis for universal data servers, access, integration
- Object-oriented (COM-oriented) programming interface to data
- Breaks DBMS into components
- Anything can be a data source
- Optimization/navigation on top of other data sources
- A way to componentize a DBMS
- Makes an RDBMS an O-RDBMS (assumes optimizer understands objects)
Figure label: DBMS engine
42 The Pattern: Three Tier Computing
- Clients do presentation, gather input
- Clients do some workflow (Xscript)
- Clients send high-level requests to ORB (Object Request Broker)
- ORB dispatches workflows and business objects: proxies for client, orchestrate flows & queues
- Server-side workflow scripts call on distributed business objects to execute task
Figure labels: presentation, workflow, business objects, database.
43 The Three Tiers
Object Data server.
44 Why Did Everyone Go To Three-Tier?
- Manageability
- Business rules must be with data
- Middleware operations tools
- Performance (scaleability)
- Server resources are precious
- ORB dispatches requests to server pools
- Technology & Physics
- Put UI processing near user
- Put shared data processing near shared data
Figure labels: presentation, workflow, business objects, database.
45 Why Put Business Objects at Server?
46 What Middleware Does: ORB, TP Monitor, Workflow Mgr, Web Server
- Registers transaction programs: workflow and business objects (DLLs)
- Pre-allocates server pools
- Provides server execution environment
- Dynamically checks authority (request-level security)
- Does parameter binding
- Dispatches requests to servers
- parameter binding
- load balancing
- Provides Queues
- Operator interface
47 Server Side Objects: Easy Server-Side Execution
A Server
- Give simple execution environment
- Object gets
- start
- invoke
- shutdown
- Everything else is automatic
- Drag & Drop Business Objects
Figure labels (services the server supplies around the service logic): network, receiver, queue, connections, context, security, configuration, management, thread pool, synchronization, shared data.
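The "object gets start, invoke, shutdown; everything else is automatic" idea might look like the sketch below. `Greeter` and `Host` are hypothetical names, and the host here supplies only a thread pool; the queues, security, and configuration shown in the figure are left out.

```python
from concurrent.futures import ThreadPoolExecutor

class Greeter:
    """Hypothetical business object: it implements only the three
    lifecycle calls and knows nothing about threads or networking."""
    def start(self):
        self.prefix = "hello"
    def invoke(self, request):
        return f"{self.prefix} {request}"
    def shutdown(self):
        self.prefix = None

class Host:
    """Hypothetical execution environment: owns the thread pool and
    drives the object's lifecycle."""
    def __init__(self, obj, workers=4):
        self.obj = obj
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.obj.start()
    def dispatch(self, request):
        # Each request runs invoke() on a pooled thread.
        return self.pool.submit(self.obj.invoke, request)
    def close(self):
        self.pool.shutdown(wait=True)   # drain in-flight requests first
        self.obj.shutdown()

host = Host(Greeter())
futures = [host.dispatch(name) for name in ("alice", "bob")]
results = [f.result() for f in futures]
host.close()
print(results)
```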
48 A New Programming Paradigm
- Develop object on the desktop
- Better yet: download them from the Net
- Script work flows as method invocations
- All on desktop
- Then, move work flows and objects to server(s)
- Gives
- desktop development
- three-tier deployment
- Software Cyberbricks
49 Transactions Coordinate Components (ACID)
- Transaction properties
- Atomic: all or nothing
- Consistent: old and new values
- Isolated: automatic locking or versioning
- Durable: once committed, effects survive
- Transactions are built into modern OSs
- MVS/TM, Tandem TMF, VMS DEC-DTM, NT-DTC
50 Transactions & Objects
- Application requests transaction identifier (XID)
- XID flows with method invocations
- Object Managers join (enlist) in transaction
- Distributed Transaction Manager coordinates
commit/abort
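The flow on this slide (request an XID, enlist object managers, let the transaction manager decide commit or abort) can be sketched as below. The slide does not name the commit protocol; this sketch assumes a two-phase commit, and the class and method names are hypothetical rather than any product's API.

```python
import itertools

class ResourceManager:
    """Hypothetical object manager that votes in phase 1 and obeys in phase 2."""
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.outcome = {}            # xid -> "committed" or "aborted"
    def prepare(self, xid):
        return self.can_commit       # vote yes or no
    def commit(self, xid):
        self.outcome[xid] = "committed"
    def abort(self, xid):
        self.outcome[xid] = "aborted"

class TransactionManager:
    """Hypothetical coordinator: hands out XIDs and runs two-phase commit."""
    def __init__(self):
        self._ids = itertools.count(1)
        self._enlisted = {}
    def begin(self):
        xid = next(self._ids)
        self._enlisted[xid] = []
        return xid
    def enlist(self, xid, rm):
        self._enlisted[xid].append(rm)
    def commit(self, xid):
        rms = self._enlisted.pop(xid)
        votes = [rm.prepare(xid) for rm in rms]      # phase 1: collect votes
        for rm in rms:                               # phase 2: same outcome everywhere
            rm.commit(xid) if all(votes) else rm.abort(xid)
        return all(votes)

tm = TransactionManager()
a, b, c = ResourceManager("a"), ResourceManager("b"), ResourceManager("c", can_commit=False)

xid1 = tm.begin(); tm.enlist(xid1, a); tm.enlist(xid1, b)
ok1 = tm.commit(xid1)      # every manager votes yes

xid2 = tm.begin(); tm.enlist(xid2, a); tm.enlist(xid2, c)
ok2 = tm.commit(xid2)      # one "no" vote aborts the transaction for all
```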
51 Transactions Coordinate Components (ACID)
- Programmer's view: bracket a collection of actions
- A simple failure model
- Only two outcomes
Begin() action action action action Commit(): Success!
Begin() action action action Rollback(): Failure!
Begin() action action action Fail! Rollback(): Failure!
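The bracket model can be illustrated with Python's standard `sqlite3` module, where using the connection as a context manager commits if the block completes and rolls back if it raises. The `account` table and the `transfer` helper are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Bracket two updates; return True on commit, False on rollback."""
    try:
        with conn:   # commit on normal exit, rollback on exception
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                         (amount, src))
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
            (balance,) = conn.execute(
                "SELECT balance FROM account WHERE id = ?", (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")   # the failure case
        return True
    except ValueError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))

first = transfer(conn, "a", "b", 30)     # succeeds and commits
after_first = balances(conn)
second = transfer(conn, "a", "b", 500)   # fails; both updates are undone
after_second = balances(conn)
```

The caller sees only the two outcomes from the slide: either both updates happened or neither did.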
52 Distributed Transactions Enable Huge Throughput
- Each node capable of 7 KtpmC (7,000 active users!)
- Can add nodes to cluster (to support 100,000 users)
- Transactions coordinate nodes
- ORB / TP monitor spreads work among nodes
53 Distributed Transactions Enable Huge DBs
- Distributed database technology spreads data among nodes
- Transaction processing technology manages nodes
54 Thesis: Scaleable Servers
- Scaleable Servers Built from Cyberbricks
- Allow new applications
- Servers should be able to
- Scale up, out, down
- Key software technologies
- Clusters (ties the hardware together)
- Parallelism (uses the independent CPUs, stores, wires)
- Objects (software CyberBricks)
- Transactions (mask errors)
55 Computer Industry Laws (Rules of Thumb)
- Metcalfe's law
- Moore's first law
- Bell's computer classes (7 price tiers)
- Bell's platform evolution
- Bell's platform economics
- Bill's law
- Software economics
- Grove's law
- Moore's second law
- Is info-demand infinite?
- The death of Grosch's law
56 Metcalfe's Law: Network Utility = Users^2
- How many connections can it make?
- 1 user: no utility
- 100,000 users: a few contacts
- 1 million users: many on Net
- 1 billion users: everyone on Net
- That is why the Internet is so hot
- Exponential benefit
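A quick way to see the "how many connections can it make?" argument is to count distinct pairs of users. The helper names below are illustrative; note that the Users^2 form grows quadratically, so each 10x in users gives roughly 100x in possible connections.

```python
def pairwise_connections(users):
    """Number of distinct user-to-user links in a network of `users`."""
    return users * (users - 1) // 2

def metcalfe_utility(users):
    """Utility under the slide's rule of thumb: users squared."""
    return users ** 2

for users in (1, 100_000, 1_000_000, 1_000_000_000):
    print(f"{users:>13,} users: {pairwise_connections(users):,} possible connections")
```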
57 Moore's First Law
- XXX doubles every 18 months: 60% increase per year
- Microprocessor speeds
- Chip density
- Magnetic disk density
- Communications bandwidth: WAN bandwidth approaching LANs
- Exponential growth
- The past does not matter
- 10x here, 10x there, soon you're talking REAL change
- PC costs decline faster than any other platform
- Volume and learning curves
- PCs will be the building bricks of all future systems
58 Bumps In The Moore's Law Road
- DRAM
- 1988: United States anti-dumping rules
- 1993-1995: ? price flat
- Magnetic disk
- 1965-1989: 10x/decade
- 1989-1996: 4x/3 years! 100x/decade
59 Gordon Bell's 1975 VAX Planning Model... He Didn't Believe It!
System price = 5 x 3 x .04 x memory size / 1.26^(t-1972) K$
- 5x: memory is 20% of cost; 3x: DEC markup; .04x: $ per byte
- He didn't believe the projection: $500 machine
- He couldn't comprehend the implications
60 Gordon Bell's Processing, Memories, And Comm: 100 Years
Figure labels: processing, primary memory, secondary memory, POTS (bps), backbone.
61 Gordon Bell's Seven Price Tiers
- $10: wrist watch computers
- $100: pocket/palm computers
- $1,000: portable computers
- $10,000: personal computers (desktop)
- $100,000: departmental computers (closet)
- $1,000,000: site computers (glass house)
- $10,000,000: regional computers (glass castle)
Super server: costs more than $100,000. Mainframe: costs more than $1 million. Must be an array of processors, disks, tapes, comm ports.
62 Bell's Evolution Of Computer Classes
Technology enables two evolutionary paths:
1. constant performance, decreasing cost
2. constant price, increasing performance
1.26 = 2x/3 yrs -- 10x/decade; 1/1.26 = .8
1.6 = 4x/3 yrs -- 100x/decade; 1/1.6 = .62
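The annual factors above can be checked by compounding them. This sketch simply raises each factor to the number of years; the function name is illustrative.

```python
def compound(annual_factor, years):
    """Total improvement after `years` at a fixed annual factor."""
    return annual_factor ** years

for annual in (1.26, 1.6):
    print(f"{annual}/year: {compound(annual, 3):.2f}x per 3 years, "
          f"{compound(annual, 10):.0f}x per decade, "
          f"cost for constant performance falls to {1 / annual:.2f} each year")
```

The 1.6 factor compounds to slightly more than the rounded "4x per 3 years" and "100x per decade" on the slide.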
63 Gordon Bell's Platform Economics
- Traditional computers: custom or semi-custom, high-tech and high-touch
- New computers: high-tech and no-touch
Figure: price (K$), volume (K), and application price on a log scale (0.01 to 100,000) by computer type: mainframe, WS, browser.
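64 Software Economics
- An engineer costs about $150,000/year
- R&D gets 5%-15% of budget
- Need $3 million to $1 million revenue per engineer
Figure: revenue breakdown pie charts for Microsoft ($9 billion), Intel ($16 billion), IBM ($72 billion), and Oracle ($3 billion).
Microsoft: profit 24%, R&D 16%, SG&A 34%, tax 13%, product and service 13%.
Remaining chart labels (company attribution lost in extraction): Profit 15, Profit 6, RD 9, RD 8, Profit 22, Tax 7, SGA 11, Tax, SGA 12, PS 59, 43, PS 47, PS 26.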
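The revenue-per-engineer range follows from dividing the engineer's cost by the R&D share. The sketch assumes "budget" means revenue and that the R&D budget is spent entirely on engineers; both are simplifications for illustration.

```python
ENGINEER_COST = 150_000   # dollars per year, from the slide

def revenue_per_engineer(rd_percent):
    """Revenue needed to fund one engineer when R&D gets rd_percent of revenue."""
    return ENGINEER_COST * 100 // rd_percent

at_5_percent = revenue_per_engineer(5)
at_15_percent = revenue_per_engineer(15)
print(f"R&D at 5%: ${at_5_percent:,}; R&D at 15%: ${at_15_percent:,}")
```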
65 Software Economics: Bill's Law
Price = Fixed_Cost / Units + Marginal_Cost
- Bill Joy's law (Sun): don't write software for less than 100,000 platforms @ $10 million engineering expense, $1,000 price
- Bill Gates's law: don't write software for less than 1,000,000 platforms @ 10 engineering expense, $100 price
- Examples
- UNIX versus Windows NT: $3,500 versus $500
- Oracle versus SQL-Server: $100,000 versus $6,000
- No spreadsheet or presentation pack on UNIX/VMS/...
- Commoditization of base software and hardware
66 Grove's Law: The New Computer Industry
- Horizontal integration is new structure
- Each layer picks best from lower layer
- Desktop (C/S) market
- 1991: 50%
- 1995: 75%

  Function           Example
  Operation          AT&T
  Integration        EDS
  Applications       SAP
  Middleware         Oracle
  Baseware           Microsoft
  Systems            Compaq
  Silicon & Oxide    Intel & Seagate
67 Moore's Second Law
- The cost of fab lines doubles every generation (three years)
- Money limit: hard to imagine
- $10-billion line
- $20-billion line
- $40-billion line
- Physical limit
- Quantum effects at 0.25 micron now; 0.05 micron seems hard: 12 years, three generations
- Lithography: need X-ray below 0.13 micron
68 Constant Dollars Versus Constant Work
- Constant work
- One SuperServer can do all the world's computations
- Constant dollars
- The world spends 10% on information processing
- Computers are moving from 5% penetration to 50%
- $300 billion to $3 trillion
- We have the patent on the byte and algorithm
69 Crossing The Chasm

                Old technology                        New technology
  New market    Hard: product finds customers         Very hard: no product, no customers
  Old market    Boring: competitive, slow growth      Hard: customers find product