Title: Scaleable Servers
1 Scaleable Servers
- Jim Gray
- Microsoft
- Gray@Microsoft.com
- http://www.research.Microsoft.com/Gray
2 Thesis: Scaleable Servers
- Scaleable Servers
- Commodity hardware allows new applications
- New applications need huge servers
- Clients and servers are built of the same stuff
- Commodity software and
- Commodity hardware
- Servers should be able to
- Scale up (grow node by adding CPUs, disks, networks)
- Scale out (grow by adding nodes)
- Scale down (can start small)
- Key software technologies
- Objects, Transactions, Clusters, Parallelism
3 1987: 256 tps Benchmark
- $14 M computer (Tandem)
- A dozen people
- False floor, 2 rooms of machines
Admin expert
Hardware experts
A 32 node processor array
Auditor
Network expert
Simulate 25,600 clients
Manager
Performance expert
OS expert
DB expert
A 40 GB disk array (80 drives)
4 1988: DB2 + CICS Mainframe, 65 tps
- IBM 4391
- Simulated network of 800 clients
- $2 M computer
- Staff of 6 to do benchmark
2 x 3725 network controllers
Refrigerator-sized CPU
16 GB disk farm 4 x 8 x .5GB
5 1997: 10 years later, 1 Person and 1 box, 1250 tps
- 1 Breadbox ~ 5x 1987 machine room
- 23 GB is hand-held
- One person does all the work
- Cost/tps is 1,000x less: 25 micro dollars per transaction
4 x 200 MHz CPU, 1/2 GB DRAM, 12 x 4 GB disk
Hardware expert, OS expert, Net expert, DB expert, App expert
3 x 7 x 4 GB disk arrays
6 What Happened?
- Moore's law: Things get 4x better every 3 years (applies to computers, storage, and networks)
- New Economics: Commodity
  class          price ($/mips)   software (k$/year)
  mainframe      10,000           100
  minicomputer   100              10
  microcomputer  10               1
- GUI: Human - computer tradeoff, optimize for people, not computers
7 Billions Of Clients Need Millions Of Servers
- All clients networked to servers
- May be nomadic or on-demand
- Fast clients want faster servers
- Servers provide
- Shared Data
- Control
- Coordination
- Communication
Clients
Mobile clients
Fixed clients
Servers
Server
Super server
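8 Thesis: Many little beat few big
[Figure: processor classes (mainframe, mini, micro, nano, pico processor) against price ticks of 1 million, 100 K, and 10 K; a memory hierarchy from 10 pico-second RAM and 1 MB through 100 MB, 10 GB, and 1 TB to 100 TB; disk form factors of 14", 9", 5.25", 3.5", 2.5", and 1.8". Figure notes: 1 M SPECmarks, 1 TFLOP; 10^6 clocks to bulk RAM; event horizon on chip; VM reincarnated; multi-program cache; on-chip SMP.]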
- Smoking, hairy golf ball
- How to connect the many little parts?
- How to program the many little parts?
- Fault tolerance?
9 Future Super Server: 4T Machine
- Array of 1,000 4B machines
- 1 Bips processors
- 1 BB DRAM
- 10 BB disks
- 1 Bbps comm lines
- 1 TB tape robot
- A few megabucks
- Challenge
- Manageability
- Programmability
- Security
- Availability
- Scaleability
- Affordability
- As easy as a single system
Cyber Brick: a 4B machine
Future servers are CLUSTERS of processors, discs.
Distributed database techniques make clusters work.
10 The Hardware Is In Place... And then a miracle occurs?
- SNAP: scaleable network and platforms
- Commodity-distributed OS built on
- Commodity platforms
- Commodity network interconnect
- Enables parallel applications
11 Thesis: Scaleable Servers
- Scaleable Servers
- Commodity hardware allows new applications
- New applications need huge servers
- Clients and servers are built of the same stuff
- Commodity software and
- Commodity hardware
- Servers should be able to
- Scale up (grow node by adding CPUs, disks, networks)
- Scale out (grow by adding nodes)
- Scale down (can start small)
- Key software technologies
- Objects, Transactions, Clusters, Parallelism
12 Scaleable Servers: BOTH SMP And Cluster
Grow up with SMP: 4xP6 is now standard
Grow out with cluster: cluster has inexpensive parts
SMP super server, Departmental server, Personal system
Cluster of PCs
13 SMPs Have Advantages
- Single system image: easier to manage, easier to program (threads in shared memory, disk, Net)
- 4x SMP is commodity
- Software capable of 16x
- Problems
- >4 not commodity
- Scale-down problem (starter systems expensive)
- There is a BIGGEST one
SMP super server, Departmental server, Personal system
14 TPC-C Web-Based Benchmarks
- Client is a Web browser (9,200 of them!)
- Submits
- Order
- Invoice
- Query to server via Web page interface
- Web server translates to DB
- SQL does DB work
- Net
- easy to implement
- performance is GREAT!
HTTP
IIS Web
ODBC
SQL
15 TPC-C Shows How Far SMPs Have Come
- Performance is amazing
- 2,000 users is the min!
- 30,000 users on a 4x12 alpha cluster (Oracle)
- Peak Performance: 30,390 tpmC @ $305/tpmC (Oracle/DEC)
- Best Price/Perf: 7,693 tpmC @ $43/tpmC (MS SQL/Dell)
- Graphs show UNIX high price and diseconomy of scaleup
16 TPC-C SMP Performance
- SMPs do offer speedup
- but 4x P6 is better than some 18x MIPSco
17 The TPC-C Revolution Shows How Far NT and SQL Server Have Come
- Economy of scale on Windows NT
- Recent Microsoft SQL Server benchmarks are Web-based
[Figure: tpmC and $/tpmC. Price ($/tpmC, 0 to 250, lower is better) against performance (tpmC, 0 to 8,000) for DB2, Informix, Microsoft, Oracle, and Sybase. Annotation: MS SQL Server, economy of scale and low price.]
18 What Happens To Prices?
- No expensive UNIX front end ($20/tpmC)
- No expensive TP monitor software ($10/tpmC)
- => $65/tpmC
19 Building the Largest NT Node
- Build a 1 TB SQL Server database
- Show off NT and SQL Server Scaleability
- Stress test the product
- Demo it on the Internet
- WWW accessible by anyone
- So data must be
- 1 TB
- Unencumbered
- Interesting to everyone everywhere
- AND not offensive to anyone anywhere
20 What's a TeraByte?
- 1 Terabyte
  1,000,000,000 business letters    150 miles of book shelf
  100,000,000 book pages            15 miles of book shelf
  50,000,000 FAX images             7 miles of book shelf
  10,000,000 TV pictures (mpeg)     10 days of video
  4,000 LandSat images              16 earth images (100m)
  100,000,000 web pages             10 copies of the web HTML
- Library of Congress (in ASCII) is 25 TB
- 1980: $200 million of disc (10,000 discs)
- $5 million of tape silo (10,000 tapes)
- 1997: $200 k of magnetic disc (48 discs)
- $30 k nearline tape (20 tapes)
- Terror Byte!
21 The Plan
- DEC Alpha
- 324 StorageWorks Drives (1.4 TB)
- 30K BTU, 8 KW, 1.5 metric tons.
- SQL 7.0
- USGS data (1 meter)
- Russian Space data (2 meter)
DEC 4100: 4 x 400 MHz Alpha Processors, 4 GB DRAM
22 Image Data Sources
23 DOQ coverage of the US
- 1 Meter images of many places
- Problems
- most of data not yet published
- interesting places missing (LA, Portland, SD, Anchorage, ...)
- Loaded published 130 GB.
- CRDA for unpublished 3 TB
24 SPIN-2 Coverage
- The rest of the world
- The US Government can't help, but....
- The Russian Space Agency is eager to cooperate.
- 2 Meter Geo Rectified imagery of anywhere
- More data coming, Earth has 500 Tera m^2
- => 30 Tera Bytes of Land at 2x2 Meter
- => we need 3% of the land (Urban World, the red stuff)
25 Demo Interface
26 Grow UP and OUT
1 Terabyte DB
- Cluster
- a collection of nodes
- as easy to program and manage as a single node
1 billion transactions per day
27 Clusters Have Advantages
- Clients and servers made from the same stuff
- Inexpensive
- Built with commodity components
- Fault tolerance
- Spare modules mask failures
- Modular growth
- Grow by adding small modules
- Unlimited growth: no biggest one
28 Billion Transactions per Day Project
- Built a 45-node Windows NT Cluster (with help from Intel and Compaq), > 900 disks
- All off-the-shelf parts
- Using SQL Server and DTC distributed transactions
- DebitCredit Transaction
- Each node has 1/20th of the DB
- Each node does 1/20th of the work
- 15% of the transactions are distributed
29 How Much Is 1 Billion Transactions Per Day?
- 1 Btpd = 11,574 tps (transactions per second), about 700,000 tpm (transactions/minute)
- AT&T
- 185 million calls (peak day worldwide)
- Visa ~20 M tpd
- 400 M customers
- 250,000 ATMs worldwide
- 7 billion transactions / year (card + cheque) in 1994
[Figure: millions of transactions per day (Mtpd), log scale from 0.1 to 1,000, for AT&T, Visa, BofA, NYSE, and 1 Btpd]
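The per-second and per-minute figures on this slide follow from dividing by the length of a day. A minimal sketch of that arithmetic; the function names are illustrative and not from the talk:

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400
MINUTES_PER_DAY = 24 * 60        # 1,440


def per_second(transactions_per_day: float) -> float:
    """Average rate in transactions per second over a full day."""
    return transactions_per_day / SECONDS_PER_DAY


def per_minute(transactions_per_day: float) -> float:
    """Average rate in transactions per minute over a full day."""
    return transactions_per_day / MINUTES_PER_DAY


btpd = 1_000_000_000
print(f"{per_second(btpd):,.0f} tps")   # 11,574 tps
print(f"{per_minute(btpd):,.0f} tpm")   # 694,444 tpm, which the slide rounds to 700,000
```

These are averages over 24 hours; a real workload with a daily peak would need more headroom than the average suggests.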
30 Billion Transactions Per Day Hardware
- 45 nodes (Compaq Proliant)
- Clustered with 100 Mbps Switched Ethernet
- 140 cpu, 13 GB, 3 TB.
31 1.2 B tpd
- 1 B tpd ran for 24 hrs.
- Sized for 30 days
- Linear growth
- 5 micro-dollars per transaction
- Out-of-the-box software
- Off-the-shelf hardware
- AMAZING!
32 Other Stunts
- 100 M Web Hits/day on one server
- (1,300 hits/sec, Web Mark HTML server)
- Email server (exchange)
- 50 GB database (up from 16GB, limit now 16TB)
- 50 k POP3 users (1.5 M msg/day)
- 64-bit addressing SQL Server
- SAP Failover
- Theme
- conventional stuff is easy
33 Thesis: Scaleable Servers
- Scaleable Servers
- Commodity hardware allows new applications
- New applications need huge servers
- Clients and servers are built of the same stuff
- Commodity software and
- Commodity hardware
- Servers should be able to
- Scale up (grow node by adding CPUs, disks, networks)
- Scale out (grow by adding nodes)
- Scale down (can start small)
- Key software technologies
- Objects, Transactions, Clusters, Parallelism
34 Parallelism: The OTHER aspect of clusters
- Clusters of machines allow two kinds of parallelism
- Many little jobs: online transaction processing
- TPC-A, B, C
- A few big jobs: data search and analysis
- TPC-D, DSS, OLAP
- Both give automatic parallelism
35 Kinds of Parallel Execution
[Figure: two forms of parallel execution built from any sequential program. Pipeline: one sequential program feeds the next. Partition: outputs split N ways, inputs merge M ways.]
36 Data Rivers: Split + Merge Streams
[Figure: N producers feed M consumers through a river of N x M data streams]
Producers add records to the river; consumers consume records from the river.
Purely sequential programming.
The river does flow control and buffering, and does the partition and merge of data records.
River = Split/Merge in Gamma = Exchange operator in Volcano.
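The river idea can be sketched with threads and bounded queues. This is only an illustration under stated assumptions, not the Gamma or Volcano implementation: the `River` class, the modulo partitioning rule, the `DONE` end-of-stream marker, and the producer and consumer counts are all hypothetical. The producers and consumers are plain sequential loops; the river does the split, the merge, and the buffering.

```python
import queue
import threading

N_PRODUCERS = 3   # assumed for illustration
M_CONSUMERS = 2   # assumed for illustration
DONE = object()   # end-of-stream marker, sent once per producer to every consumer


class River:
    """Hypothetical river: one bounded queue per consumer."""

    def __init__(self, n_producers, m_consumers, capacity=16):
        self.n_producers = n_producers
        self.queues = [queue.Queue(maxsize=capacity) for _ in range(m_consumers)]

    def put(self, key, record):
        # Split: route the record to one consumer by key. A full queue blocks
        # the producer, which is the flow control.
        self.queues[key % len(self.queues)].put(record)

    def close(self):
        # Each producer calls this once when it has no more records.
        for q in self.queues:
            q.put(DONE)

    def records(self, consumer_id):
        # Merge: yield records from all producers until every producer closed.
        open_producers = self.n_producers
        q = self.queues[consumer_id]
        while open_producers:
            item = q.get()
            if item is DONE:
                open_producers -= 1
            else:
                yield item


def producer(river, start, stop):
    for value in range(start, stop):      # ordinary sequential code
        river.put(value, value)
    river.close()


def consumer(river, consumer_id, totals):
    totals[consumer_id] = sum(river.records(consumer_id))   # ordinary sequential code


def run():
    river = River(N_PRODUCERS, M_CONSUMERS)
    totals = [0] * M_CONSUMERS
    threads = [
        threading.Thread(target=producer, args=(river, i * 100, (i + 1) * 100))
        for i in range(N_PRODUCERS)
    ]
    threads += [
        threading.Thread(target=consumer, args=(river, j, totals))
        for j in range(M_CONSUMERS)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals


totals = run()
print(totals, sum(totals))
```

With three producers emitting 0 to 299 and two consumers, consumer 0 sums the even values and consumer 1 the odd ones; neither program contains any parallel logic of its own.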
37 Partitioned Execution
Spreads computation and IO among processors
Partitioned data gives NATURAL parallelism
38 N x M way Parallelism
N inputs, M outputs, no bottlenecks.
Partitioned Data
Partitioned and Pipelined Data Flows
39 Clusters (Plumbing)
- Single system image
- naming
- protection/security
- management/load balance
- Fault Tolerance
- Wolfpack
- Hot Pluggable hardware and Software
40 Windows NT clusters
- Key goals
- Easy to install, manage, program
- Reliable: better than a single node
- Scaleable: added parts add power
- Microsoft and 60 vendors defining NT clusters
- Almost all big hardware and software vendors involved
- No special hardware needed, but it may help
- Enables
- Commodity fault-tolerance
- Commodity parallelism (data mining, virtual reality)
- Also great for workgroups!
- Initial: two-node failover
- Beta testing since December 96
- SAP, Microsoft, Oracle giving demos.
- File, print, Internet, mail, DB, other services
- Easy to manage
- Each node can be 4x (or more) SMP
- Next (NT5): Wolfpack is modest size cluster
- About 16 nodes (so 64 to 128 CPUs)
- No hard limit, algorithms designed to go further
41 SQL Failover Using NT Clusters
- Each server owns half the database
- When one fails
- The other server takes over the shared disks
- Recovers the database and serves it
42 So, What's New?
- When slices cost $50k, you buy 10 or 20.
- When slices cost $5k, you buy 100 or 200.
- Manageability, programmability, usability become key issues (total cost of ownership).
- PCs are MUCH easier to use and program
MPP Vicious Cycle: No Customers!
CP/Commodity Virtuous Cycle (standard platform, apps, customers): Standards allow progress and investment protection
43 Thesis: Scaleable Servers
- Scaleable Servers
- Commodity hardware allows new applications
- New applications need huge servers
- Clients and servers are built of the same stuff
- Commodity software and
- Commodity hardware
- Servers should be able to
- Scale up (grow node by adding CPUs, disks, networks)
- Scale out (grow by adding nodes)
- Scale down (can start small)
- Key software technologies
- Objects, Transactions, Clusters, Parallelism
44 The BIG Picture: Components and transactions
- Software modules are objects
- Object Request Broker (a.k.a., Transaction Processing Monitor) connects objects (clients to servers)
- Standard interfaces allow software plug-ins
- Transaction ties execution of a job into an atomic unit: all-or-nothing, durable, isolated
Object Request Broker
45 ActiveX and COM
- COM is the Microsoft model, the engine inside OLE. ALL Microsoft software is based on COM (ActiveX)
- CORBA + OpenDoc is equivalent
- Heated debate over which is best
- Both share same key goals
- Encapsulation: hide implementation
- Polymorphism: generic operations, key to GUI and reuse
- Versioning: allow upgrades
- Transparency: local/remote
- Security: invocation can be remote
- Shrink-wrap: minimal inheritance
- Automation: easy
- COM now managed by the Open Group
46 Linking And Embedding: Objects are data modules, transactions are execution modules
- Link: pointer to object somewhere else
- Think URL in Internet
- Embed: bytes are here
- Objects may be active: can callback to subscribers
47 Commodity Software Components: Inexpensive OS, DBMS, and plug-ins
- Recent TPC-C prices
- Oracle on DEC UNIX: 30.4 k tpmC @ $305/tpmC
- Informix on DEC UNIX: 13.6 k tpmC @ $277/tpmC
- DB2 on Solaris: 6.4 k tpmC @ $200/tpmC
- SQL Server on Compaq, Windows NT: 7.3 k tpmC @ $65/tpmC (using Web, no TP monitor!)
- Oracle on Windows NT: 3.1 k tpmC @ $198/tpmC
- Net: Open solutions can do even biggest jobs, thousands of online users per node of cluster
- ActiveX, VBX, and Java plug-ins
- Spreadsheets, GeoQuery, FAX, voice, image libraries, commodity component market
48 Objects Meet Databases: The basis for universal data servers, access, integration
- object-oriented (COM oriented) programming interface to data
- Breaks DBMS into components
- Anything can be a data source
- Optimization/navigation on top of other data sources
- A way to componentize a DBMS
- Makes an RDBMS an O-RDBMS (assumes optimizer understands objects)
DBMS engine
49 The Pattern: Three Tier Computing
- Clients do presentation, gather input
- Clients do some workflow (Xscript)
- Clients send high-level requests to ORB (Object Request Broker)
- ORB dispatches workflows and business objects: proxies for client, orchestrate flows and queues
- Server-side workflow scripts call on distributed business objects to execute task
[Figure tiers: Presentation, workflow, Business Objects, Database]
50 The Three Tiers
Object Data server.
51 Why Did Everyone Go To Three-Tier?
- Manageability
- Business rules must be with data
- Middleware operations tools
- Performance (scaleability)
- Server resources are precious
- ORB dispatches requests to server pools
- Technology and Physics
- Put UI processing near user
- Put shared data processing near shared data
Presentation
workflow
Business Objects
Database
52 Why Put Business Objects at Server?
53 What Middleware Does: ORB, TP Monitor, Workflow Mgr, Web Server
- Registers transaction programs: workflow and business objects (DLLs)
- Pre-allocates server pools
- Provides server execution environment
- Dynamically checks authority (request-level security)
- Does parameter binding
- Dispatches requests to servers
- parameter binding
- load balancing
- Provides Queues
- Operator interface
54 Server Side Objects: Easy Server-Side Execution
A Server
- Give simple execution environment
- Object gets
- start
- invoke
- shutdown
- Everything else is automatic
- Drag and Drop Business Objects
Network
Receiver
Queue
Management
Connections
Context
Security
Configuration
Thread Pool
Service logic
Synchronization
Shared Data
55 A new programming paradigm
- Develop objects on the desktop
- Better yet: download them from the Net
- Script work flows as method invocations
- All on desktop
- Then, move work flows and objects to server(s)
- Gives
- desktop development
- three-tier deployment
- Software Cyberbricks
56 Transactions Coordinate Components (ACID)
- Transaction properties
- Atomic: all or nothing
- Consistent: old and new values
- Isolated: automatic locking or versioning
- Durable: once committed, effects survive
- Transactions are built into modern OSs
- MVS/TM, Tandem TMF, VMS DEC-DTM, NT-DTC
57 Transactions and Objects
- Application requests transaction identifier (XID)
- XID flows with method invocations
- Object Managers join (enlist) in transaction
- Distributed Transaction Manager coordinates commit/abort
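The enlist-then-coordinate flow can be sketched as a toy two-phase commit. The slide does not name the protocol, so two-phase commit is an assumption here, and the `TransactionManager` and `ResourceManager` classes are hypothetical stand-ins, not the DTC API. A real coordinator also logs its decisions so that it can recover after a crash, which this sketch omits.

```python
import itertools


class ResourceManager:
    """Hypothetical object manager that holds tentative updates per XID."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit   # vote this manager casts at prepare time
        self.data = {}                 # committed state
        self.pending = {}              # xid -> tentative updates

    def invoke(self, tm, xid, key, value):
        # The XID arrives with the method invocation; join on first use.
        tm.enlist(xid, self)
        self.pending.setdefault(xid, {})[key] = value

    def prepare(self, xid):
        return self.can_commit

    def commit(self, xid):
        self.data.update(self.pending.pop(xid, {}))

    def abort(self, xid):
        self.pending.pop(xid, None)


class TransactionManager:
    """Hypothetical coordinator: hands out XIDs and decides commit or abort."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.enlisted = {}             # xid -> managers that joined

    def begin(self):
        xid = next(self._ids)
        self.enlisted[xid] = []
        return xid

    def enlist(self, xid, manager):
        if manager not in self.enlisted[xid]:
            self.enlisted[xid].append(manager)

    def commit(self, xid):
        members = self.enlisted.pop(xid)
        # Phase 1: collect votes. Phase 2: tell every member the outcome.
        if all(m.prepare(xid) for m in members):
            for m in members:
                m.commit(xid)
            return True
        for m in members:
            m.abort(xid)
        return False


tm = TransactionManager()
accounts, ledger = ResourceManager("accounts"), ResourceManager("ledger")
xid = tm.begin()
accounts.invoke(tm, xid, "alice", 90)
ledger.invoke(tm, xid, "entry-1", "alice -10")
committed = tm.commit(xid)
print(committed, accounts.data, ledger.data)
```

If any enlisted manager votes no at prepare time, every manager discards its tentative updates, so the work is all-or-nothing across objects.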
58 Transactions Coordinate Components (ACID)
- Programmer's view: bracket a collection of actions
- A simple failure model
- Only two outcomes
Begin() action action action action Commit()
Begin() action action action Rollback()
Begin() action action action Rollback()
Fail !
Success!
Failure!
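The Begin/Commit/Rollback bracket can be tried with Python's standard `sqlite3` module. This is a minimal sketch: the `account` table, the `transfer` helper, and the overdraft rule are invented for the example and do not come from the talk.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst inside one transaction bracket."""
    try:
        conn.execute("BEGIN")
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        (balance,) = conn.execute(
            "SELECT balance FROM account WHERE id = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute("COMMIT")      # success: both updates become visible together
        return True
    except Exception:
        conn.execute("ROLLBACK")    # failure: neither update survives
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


# isolation_level=None puts sqlite3 in autocommit mode, so the explicit
# BEGIN/COMMIT/ROLLBACK statements above define the bracket.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])

print(transfer(conn, "a", "b", 30), balances(conn))    # True {'a': 70, 'b': 30}
print(transfer(conn, "a", "b", 500), balances(conn))   # False {'a': 70, 'b': 30}
```

The caller sees only the two outcomes the slide describes: either every action in the bracket took effect, or none did.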
59 Distributed Transactions Enable Huge Throughput
- Each node capable of 7 KtpmC (7,000 active users!)
- Can add nodes to cluster (to support 100,000 users)
- Transactions coordinate nodes
- ORB / TP monitor spreads work among nodes
60 Distributed Transactions Enable Huge DBs
- Distributed database technology spreads data among nodes
- Transaction processing technology manages nodes
61 Thesis: Scaleable Servers
- Scaleable Servers Built from Cyberbricks
- Allow new applications
- Servers should be able to
- Scale up, out, down
- Key software technologies
- Clusters (ties the hardware together)
- Parallelism (uses the independent CPUs, stores, wires)
- Objects (software CyberBricks)
- Transactions: masks errors.
62 Computer Industry Laws (Rules of thumb)
- Metcalfe's law
- Moore's first law
- Bell's computer classes (7 price tiers)
- Bell's platform evolution
- Bell's platform economics
- Bill's law
- Software economics
- Grove's law
- Moore's second law
- Is info-demand infinite?
- The death of Grosch's law
63 Metcalfe's Law: Network Utility = Users^2
- How many connections can it make?
- 1 user: no utility
- 100,000 users: a few contacts
- 1 million users: many on Net
- 1 billion users: everyone on Net
- That is why the Internet is so hot
- Exponential benefit
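The Users^2 rule can be checked with a few lines of arithmetic. A small sketch with illustrative function names: the count of distinct user-to-user links, n(n-1)/2, is the usual justification for the rule and grows with the square of the user count.

```python
def pairwise_connections(users: int) -> int:
    """Distinct user-to-user links among `users` members: n(n-1)/2."""
    return users * (users - 1) // 2


def metcalfe_utility(users: int) -> int:
    """Utility under the slide's Users^2 rule, in arbitrary units."""
    return users ** 2


for n in (1, 100_000, 1_000_000, 1_000_000_000):
    print(f"{n:>13,} users: {pairwise_connections(n):,} possible links")
```

A single user has no one to connect to, and each tenfold increase in users multiplies the utility by one hundred.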
64 Moore's First Law
- XXX doubles every 18 months: 60% increase per year
- Micro processor speeds
- Chip density
- Magnetic disk density
- Communications bandwidth: WAN bandwidth approaching LANs
- Exponential growth
- The past does not matter
- 10x here, 10x there, soon you're talking REAL change
- PC costs decline faster than any other platform
- Volume and learning curves
- PCs will be the building bricks of all future systems
65 Bumps In The Moore's Law Road
- DRAM
- 1988: United States anti-dumping rules
- 1993-1995: ? price flat
- Magnetic disk
- 1965-1989: 10x/decade
- 1989-1996: 4x/3 years! 100x/decade
66 Gordon Bell's 1975 VAX Planning Model... He Didn't Believe It!
System Price = 5 x 3 x .04 x memory size / 1.26^(t-1972) K$
- 5x: Memory is 20% of cost
- 3x: DEC markup
- .04x: $ per byte
- He didn't believe the projection: $500 machine
- He couldn't comprehend the implications
67 Gordon Bell's Processing, Memories, And Comm: 100 Years
[Figure: curves for processing, primary memory (Pri. Mem.), secondary memory (Sec. Mem.), backbone, and POTS (bps)]
68 Gordon Bell's Seven Price Tiers
- $10: wrist watch computers
- $100: pocket/palm computers
- $1,000: portable computers
- $10,000: personal computers (desktop)
- $100,000: departmental computers (closet)
- $1,000,000: site computers (glass house)
- $10,000,000: regional computers (glass castle)
Super server: costs more than $100,000
Mainframe: costs more than $1 million
Must be an array of processors, disks, tapes, comm ports
69 Bell's Evolution Of Computer Classes
Technology enables two evolutionary paths:
1. constant performance, decreasing cost
2. constant price, increasing performance
1.26 = 2x/3 yrs -- 10x/decade; 1/1.26 = .8
1.6 = 4x/3 yrs -- 100x/decade; 1/1.6 = .62
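The annual factors on this slide can be checked by compounding. A short sketch with illustrative function names; the slide's figures are rounded, so the results agree only approximately.

```python
def growth_over(factor_per_year: float, years: float) -> float:
    """Total improvement after compounding an annual factor for `years`."""
    return factor_per_year ** years


def annual_factor(multiple: float, years: float) -> float:
    """Annual factor that yields `multiple` after `years` of compounding."""
    return multiple ** (1.0 / years)


for rate in (1.26, 1.6):
    print(
        f"{rate}/yr: {growth_over(rate, 3):.2f}x in 3 years, "
        f"{growth_over(rate, 10):.1f}x in 10 years, "
        f"cost factor {1 / rate:.3f} per year"
    )

# Doubling every 18 months and 4x every 3 years are the same annual rate,
# roughly a 60% increase per year.
print(f"{annual_factor(2, 1.5):.3f} {annual_factor(4, 3):.3f}")
```

The reciprocal of the annual factor is the yearly cost multiplier on the constant-performance path: about .8 at 1.26 per year and about .62 at 1.6 per year.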
70 Gordon Bell's Platform Economics
- Traditional computers: custom or semi-custom, high-tech and high-touch
- New computers: high-tech and no-touch
[Figure: price (K$), volume (K), and application price on a log scale from 0.01 to 100,000, by computer type: Mainframe, WS, Browser]
71 Software Economics
- An engineer costs about $150,000/year
- R&D gets 5% to 15% of budget
- Need $3 million to $1 million revenue per engineer
Microsoft: $9 billion
Profit 24%, R&D 16%, SG&A 34%, Tax 13%, Product and Service 13%
[Figure: pie charts of Profit, R&D, SG&A, Tax, and Product and Service (P&S) shares for Intel ($16 billion), IBM ($72 billion), and Oracle ($3 billion)]
72 Software Economics: Bill's Law
Price = Fixed_Cost / Units + Marginal_Cost
- Bill Joy's law (Sun): don't write software for less than 100,000 platforms @ $10 million engineering expense, $1,000 price
- Bill Gates's law: don't write software for less than 1,000,000 platforms @ $10M engineering expense, $100 price
- Examples
- UNIX versus Windows NT: $3,500 versus $500
- Oracle versus SQL-Server: $100,000 versus $6,000
- No spreadsheet or presentation pack on UNIX/VMS/...
- Commoditization of base software and hardware
73 Grove's Law: The New Computer Industry
- Horizontal integration is new structure
- Each layer picks best from lower layer
- Desktop (C/S) market
- 1991: 50%
- 1995: 75%
Function          Example
Operation         AT&T
Integration       EDS
Applications      SAP
Middleware        Oracle
Baseware          Microsoft
Systems           Compaq
Silicon & Oxide   Intel & Seagate
74 Moore's Second Law
- The cost of fab lines doubles every generation (three years)
- Money limit: hard to imagine
- $10-billion line
- $20-billion line
- $40-billion line
- Physical limit
- Quantum effects at 0.25 micron now; 0.05 micron seems hard: 12 years, three generations
- Lithography: need X-ray below 0.13 micron
75 Constant Dollars Versus Constant Work
- Constant work
- One SuperServer can do all the world's computations
- Constant dollars
- The world spends 10% on information processing
- Computers are moving from 5% penetration to 50%
- $300 billion to $3 trillion
- We have the patent on the byte and algorithm
76 Crossing The Chasm
[Figure: 2 x 2 grid of old and new market against old and new technology. Cell labels: no product, no customers; product finds customers; customers find product; boring, competitive, slow growth. Transition labels: hard, very hard, hard.]