Title: Scaleable WindowsNT? (Jim Gray, Microsoft Research)
Slide 1: Scaleable WindowsNT?
- Jim Gray, Microsoft Research
- Gray@Microsoft.com
- http://research.Microsoft.com/Gray
Slide 2: Outline
- What is Scalability?
- Why does Microsoft care about ScaleUp?
- Current ScaleUp status
- NT5, SQL7, Exchange
Slide 3: Scale Up and Scale Out
- Grow up with SMP: 4xP6 is now standard
- Grow out with a cluster: the cluster has inexpensive parts
- Cluster of PCs
Slide 4: Billions Of Clients
- Every device will be intelligent
- Doors, rooms, cars...
- Computing will be ubiquitous
Slide 5: Billions Of Clients Need Millions Of Servers
- All clients networked to servers
- May be nomadic or on-demand
- Fast clients want faster servers
- Servers provide
  - Shared data
  - Control
  - Coordination
  - Communication
[Diagram: mobile and fixed clients connected to servers and a super server]
Slide 6: Thesis: Many Little Beat Few Big
[Diagram: the computing spectrum from pico processor (1 MB, 10 pico-second RAM) through micro, nano, and mini to mainframe (100 MB to 100 TB), with disk form factors from 1.8" to 14" and price points from 10 K to 1 million]
- 1 M SPECmarks, 1 TFLOP
- 10^6 clocks to bulk RAM
- Event-horizon on chip, VM reincarnated
- Multiprogram cache, on-chip SMP
- "Smoking, hairy golf ball"
- How to connect the many little parts?
- How to program the many little parts?
- Fault tolerance?
Slide 7: Outline
- What is Scalability?
- Why does Microsoft care about ScaleUp?
- Current ScaleUp status
- NT5, SQL7, Exchange
Slide 8: Scalability
- Scale up to large SMP nodes
- Scale out to clusters of SMP nodes
[Diagram callouts: 100 million web hits, 1 billion transactions, 1.8 million mail messages, 4 terabytes of data]
Slide 9: Commercial NT Clusters
- 16-node Tandem Cluster
  - 64 cpus
  - 2 TB of disk
  - Decision support
- 45-node Compaq Cluster
  - 140 cpus
  - 14 GB DRAM
  - 4 TB RAID disk
  - OLTP (Debit-Credit)
  - 1 B tpd (14 k tps)
Slide 10: Tandem Oracle/NT
- 27,383 tpmC
- $71.50/tpmC
- 4 x 6 cpus
- 384 disks, 2.7 TB

Slide 11: 24 cpus, 384 disks (2.7 TB)
Slide 12: Billion Transactions per Day Project
- Built a 45-node Windows NT Cluster (with help from Intel and Compaq), > 900 disks
- All off-the-shelf parts
- Using SQL Server and DTC distributed transactions
- Debit-Credit transaction
- Each node has 1/20th of the DB
- Each node does 1/20th of the work
- 15% of the transactions are distributed
Slide 13: Billion Transactions Per Day Hardware
- 45 nodes (Compaq Proliant)
- Clustered with 100 Mbps switched Ethernet
- 140 cpus, 13 GB DRAM, 3 TB disk
Slide 14: How Much Is 1 Billion Tpd?
- 1 billion tpd = 11,574 tps, about 700,000 tpm (transactions/minute)
- AT&T
  - 185 million calls per peak day (worldwide)
- Visa: 20 million tpd
  - 400 million customers
  - 250 K ATMs worldwide
  - 7 billion transactions (card + cheque) in 1994
- New York Stock Exchange
  - 600,000 tpd
- Bank of America
  - 20 million tpd checks cleared (more than any other bank)
  - 1.4 million tpd ATM transactions
- Worldwide airlines reservations: 250 M tpd
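The tpd-to-tps conversion above is plain calendar arithmetic; a quick sketch confirming the 11,574 tps figure (numbers from the slide, nothing else assumed):

```python
# Sanity-check the slide's rate conversions for 1 billion transactions/day.
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400
MINUTES_PER_DAY = 24 * 60        # 1,440

tpd = 1_000_000_000
tps = tpd / SECONDS_PER_DAY      # transactions per second
tpm = tpd / MINUTES_PER_DAY      # transactions per minute

print(round(tps))  # 11574
print(round(tpm))  # 694444 (the slide rounds this to ~700,000)
```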
Slide 15: Infinite, Ubiquitous Scaling: Redefining the Rules

            Per Sec    Per Min        Per Day
  10K TPC       166     10,000     14,400,000
  1 BTPD     11,574    694,444  1,000,000,000
  1.4 BTPD   16,204    972,222  1,400,000,000

[Diagram: IIS, MTS, COM/ActiveX: all shipping products!]
Slide 16: Microsoft.com: 150x4 nodes
Slide 17: NCSA Super Cluster
http://access.ncsa.uiuc.edu/CoverStories/SuperCluster/super.html
- National Center for Supercomputing Applications, University of Illinois at Urbana
- 512 Pentium II cpus, 2,096 disks, SAN
- Compaq + HP + Myricom + WindowsNT
- A super computer for $3 M
- Classic Fortran/MPI programming
- DCOM programming model
Slide 18: TPC-C Improved Fast (250%/year!)
- 40% hardware, 100% software, 100% PC technology
Slide 19: Windows NT Versus UNIX
Slide 20: Economy Of Scale
Slide 21: Microsoft TerraServer: Scaleup to Big Databases
- Build a 1 TB SQL Server database
- Data must be
  - 1 TB
  - Unencumbered
  - Interesting to everyone everywhere
  - And not offensive to anyone anywhere
- Loaded
  - 1.5 M place names from Encarta World Atlas
  - 3 M sq km from USGS (1-meter resolution)
  - 1 M sq km from Russian Space Agency (2 m)
- On the web (world's largest atlas)
- Sell images with commerce server
Slide 22: Microsoft TerraServer Background
- Earth is 500 tera-square-meters
  - USA is 10 Tm²
  - 100 Tm² of land between 70°N and 70°S
- We have pictures of 6% of it
  - 3 Tm² from USGS
  - 2 Tm² from Russian Space Agency
- Compress 5:1 (JPEG) to 1.5 TB
- Slice into 10 KB chunks
- Store chunks in DB
- Navigate with
  - Encarta Atlas
    - globe
    - gazetteer
  - StreetsPlus in the USA
- Someday
  - multi-spectral image
  - of everywhere
  - once a day / hour
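The slicing step implies a large but tractable row count; a back-of-envelope sketch using the slide's figures (decimal TB and KB assumed):

```python
# 1.5 TB of JPEG-compressed imagery sliced into 10 KB chunks,
# each chunk stored as one database row.
TB = 10**12
KB = 10**3

compressed_bytes = 1.5 * TB
chunk_bytes = 10 * KB

chunks = compressed_bytes / chunk_bytes
print(f"{chunks:,.0f}")  # 150,000,000 rows of ~10 KB each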
Slide 23: Demo
- Navigate by coverage map to the White House
- Download image
- Buy imagery from USGS
- Navigate by name to Venice
- Buy SPIN-2 image / Kodak photo
- Pop out to Expedia street map of Venice
- Mention that the DB will double in the next 18 months (2x USGS, 2x SPIN-2)
Slide 24: The Microsoft TerraServer Hardware
- Compaq AlphaServer 8400
- 8 x 400 MHz Alpha cpus
- 10 GB DRAM
- 324 x 9.2 GB StorageWorks disks
  - 3 TB raw, 2.4 TB of RAID5
- STK 9710 tape robot (4 TB)
- WindowsNT 4 EE, SQL Server 7.0
Slide 25: Software
[Architecture diagram: web clients (browser with HTML or a Java viewer) reach Internet Information Server 4.0 Active Server Pages over the Internet; MTS invokes TerraServer stored procedures against SQL Server 7 and the TerraServer DB; Microsoft Site Server EE and the Microsoft Automap ActiveX Server (Automap server) support commerce and maps; an Image Delivery Application at the image provider site(s) feeds the database]
Slide 26: Image Delivery and Load
- Incremental load of 4 more TB in the next 18 months
[Load-pipeline diagram: DLT tapes (tar) arrive at \DropN on the cutting machines, which run ImgCutter; LoadMgr and its LoadMgrDB schedule jobs (DoJob, Wait 4 Load) through steps 10 ImgCutter, 20 Partition, 30 ThumbImg, 40 BrowseImg, 45 JumpImg, 50 TileImg, 55 Meta Data, 60 Tile Meta, 70 Img Meta, 80 Update Place; data flows over a 100 Mbit Ethernet switch (\DropN, \Images) to the TerraServer AlphaServer 8400 with an Enterprise Storage Array (3 shelves of 108 x 9.1 GB drives) and an STK DLT tape library; NTBackup writes to DLT tape]
Slide 27: TerraServer: A Real World Example
- Largest DB on the web
- 1.3 TB
- 99.95% uptime since July 1
- No downtime, period, in August
- 70% of downtime was for SQL software upgrades
Slide 28: NT Clusters (Wolfpack)
- Scale DOWN to PDA: WindowsCE
- Scale UP an SMP: TerraServer
- Scale OUT with a cluster of machines
- Single-system image
  - Naming
  - Protection/security
  - Management/load balance
- Fault tolerance
  - Wolfpack
- Hot-pluggable hardware and software
Slide 29: Symmetric Virtual Server Failover Example
[Diagram: Server 1 and Server 2 each host a web site and a database; the web site files and database files sit on shared storage, so on failure either server can take over the other's virtual servers]
Slide 30: Windows NT 5 (Scalability Features)
- Better SMP support
- Clusters
  - 16x packs (fault-tolerant clusters)
  - 100x mobs (arrays for manageability)
- SAN/VIA support
- 64-bit addressing for data
  - Apps like SQL and Oracle will use it for data
  - 64-bit API to NT comes later (in lab now)
- Remote management (scripting and DCOM)
- Active Directory
- Veritas volume manager
- Many 3rd-party HSMs
- Batch support
Slide 31: Microsoft SQL Server 7.0
- Fixes the famous performance bugs
  - dynamic record locking
  - online backup, quick recovery
- 64-bit addressing for the buffer pool
- SMP parallelism and better SMP support
- Built-in OLAP (cubes and MOLAP)
- Scales down to Win9x
- Improved management interfaces
- Data Transformation Services (for warehouses)
Slide 32: Outline
- What is Scalability?
- Why does Microsoft care about ScaleUp?
- Current ScaleUp status
- NT5, SQL7
Slide 33: End
- Other slides would be interesting, but...
Slide 34: Interesting Other Slides (No Time for Them, But...)
- How much information is there?
- IO bandwidth in the Intel world
- Intelligent disks
- SAN/VIA
- NT Cluster Sort
Slide 35: Some Tera-Byte Databases
[Scale ruler: Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta]
- The Web: 1 TB of HTML
- TerraServer: 1 TB of images
- Several other 1 TB (file) servers
- Hotmail: 7 TB of email
- Sloan Digital Sky Survey: 40 TB raw, 2 TB cooked
- EOS/DIS (picture of the planet each week)
  - 15 PB by 2007
- Federal clearing house: images of checks
  - 15 PB by 2006 (7-year history)
- Nuclear Stockpile Stewardship Program
  - 10 exabytes (???!!)
Slide 36: Info Capture
- You can record everything you see or hear or read.
- What would you do with it?
- How would you organize and analyze it?
- Video: 8 PB per lifetime (10 GB/h)
- Audio: 30 TB (10 KB/s)
- Read or write: 8 GB (words)
- See http://www.lesk.com/mlesk/ksg97/ksg.html
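The lifetime-capture figures follow from the per-hour and per-second rates; a rough check, assuming an illustrative 100-year lifetime recorded around the clock (the lifetime length is my assumption, not the slide's):

```python
# Check the slide's lifetime-capture arithmetic.
# Assumption: ~100-year lifetime, recorded 24 hours a day.
HOURS = 100 * 365 * 24          # ~876,000 hours
SECONDS = HOURS * 3600

video_pb = HOURS * 10e9 / 1e15    # 10 GB/h, expressed in petabytes
audio_tb = SECONDS * 10e3 / 1e12  # 10 KB/s, expressed in terabytes

print(round(video_pb, 1))  # 8.8  (the slide says ~8 PB)
print(round(audio_tb, 1))  # 31.5 (the slide says ~30 TB)
```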
Slide 37: Michael Lesk's Points
www.lesk.com/mlesk/ksg97/ksg.html
- Soon everything can be recorded and kept
- Most data will never be seen by humans
- Precious resource: human attention
- Auto-summarization and auto-search will be a key enabling technology
Slide 38: PAP (Peak Advertised Performance) vs RAP (Real Application Performance)
- Goal: RAP = PAP / 2 (the half-power point)
[Diagram: the application sees only 7.2 MB/s of data through the file system buffers, even though the system bus is rated at 422 MBps, PCI at 133 MBps, SCSI at 40 MBps, and the disk delivers 10-15 MBps]
Slide 39: PAP vs RAP
- Reads are easy, writes are hard
- Async write can match WCE (write cache enable)
[Diagram: rated link speeds (system bus 422 MBps, SCSI 142 MBps and 40 MBps, PCI 133 MBps, disks 10-15 MBps) versus achieved rates of 72, 31, and 9 MBps through the file system]
Slide 40: Bottleneck Analysis
- NTFS read/write: 12 disks, 4 SCSI, 2 PCI (not measured; we had only one PCI bus available, the 2nd one was internal)
  - 120 MBps unbuffered read
  - 80 MBps unbuffered write
  - 40 MBps buffered read
  - 35 MBps buffered write
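The half-power rule reads naturally as a min-over-stages computation: end-to-end throughput cannot exceed the slowest stage, and real throughput should reach at least half of that. A sketch using the rated figures from the preceding slides (stage names and the choice of 15 MBps for the disk are illustrative):

```python
# Pipeline throughput is bounded by the slowest stage.
# Rated (PAP) figures taken from the slides; illustrative only.
stages = {
    "system bus": 422.0,  # MBps
    "PCI": 133.0,
    "SCSI": 40.0,
    "disk": 15.0,         # upper end of the 10-15 MBps range
}

pap = min(stages.values())  # peak the pipeline could advertise
rap_goal = pap / 2          # the half-power point

print(pap)       # 15.0
print(rap_goal)  # 7.5 (close to the 7.2 MB/s the application actually sees)
```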
Slide 41: Year 2002 Disks
- Big disk (10 $/GB)
  - 3.5"
  - 100 GB
  - 150 kaps (k accesses per second)
  - 20 MBps sequential
- Small disk (20 $/GB)
  - 3"
  - 4 GB
  - 100 kaps
  - 10 MBps sequential
- Both running Windows NT 7.0? (see below for why)
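One consequence of these predictions, supporting the "many little beat few big" thesis: capacity grows faster than sequential bandwidth, so a full scan of the big disk takes far longer than a scan of the small one. Simple arithmetic on the slide's numbers (decimal GB/MB assumed):

```python
# Sequential scan time = capacity / sequential rate.
GB = 10**9
MB = 10**6

big_scan_s = 100 * GB / (20 * MB)   # 100 GB disk at 20 MBps
small_scan_s = 4 * GB / (10 * MB)   # 4 GB disk at 10 MBps

print(big_scan_s / 3600)  # ~1.4 hours to read the big disk once
print(small_scan_s)       # 400.0 seconds for the small disk
```

Many small disks also offer more aggregate bandwidth and more independent arms per byte stored, which is exactly the cluster argument of slide 6.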
Slide 42: How Do They Talk to Each Other?
- Each node has an OS
- Each node has local resources: a federation
- Each node does not completely trust the others
- Nodes use RPC to talk to each other
  - CORBA? DCOM? IIOP? RMI?
  - One or all of the above
- Huge leverage in high-level interfaces
- Same old distributed-system story
[Diagram: two application stacks exchanging datagrams, streams, and RPC over VIAL/VIPL and the wire(s)]
Slide 43: SAN: Standard Interconnect
- Gbps Ethernet: 110 MBps
- PCI-32: 70 MBps
- UW SCSI: 40 MBps
- FW SCSI: 20 MBps
- SCSI: 5 MBps
- LAN faster than memory bus?
- 1 GBps links in lab
- $300 port cost soon
- Port is computer
Slide 44: PennySort
- Hardware
  - 266 MHz Intel PPro
  - 64 MB SDRAM (10 ns)
  - Dual Fujitsu DMA 3.2 GB EIDE
- Software
  - NT Workstation 4.3
  - NT 5 sort
- Performance
  - sort 15 M 100-byte records (1.5 GB)
  - disk to disk
  - elapsed time: 820 sec
  - cpu time: 404 sec
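The performance figures above imply a modest sustained rate; a quick derivation from the slide's own numbers:

```python
# Implied throughput of the PennySort run.
records = 15_000_000
record_bytes = 100
elapsed_s = 820

gb_sorted = records * record_bytes / 1e9                  # total data
throughput_mbps = records * record_bytes / 1e6 / elapsed_s

print(gb_sorted)                  # 1.5
print(round(throughput_mbps, 1))  # 1.8 MB/s disk-to-disk
```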
Slide 45: Cluster Sort Conceptual Model
- Multiple data sources
- Multiple data destinations
- Multiple nodes
- Disks -> Sockets -> Disk -> Disk
[Diagram: three nodes each read mixed records (AAA BBB CCC) and scatter them over sockets so each destination disk gathers a single key range]
Slide 46: Cluster Install and Execute
- If this is to be used by others, it must be
  - easy to install
  - easy to execute
- Installations of distributed systems take time and can be tedious (AM2, GluGuard)
- Parallel remote execution is non-trivial (GLUnix, LSF)
- How do we keep this simple and built in to NTClusterSort?
Slide 47: Remote Install
- Add a registry entry to each remote node:
  - RegConnectRegistry()
  - RegCreateKeyEx()
Slide 48: Cluster Execution
- Setup
  - MULTI_QI struct
  - COSERVERINFO struct
- Retrieve remote object handle from the MULTI_QI struct