Title: CyberBricks: The Future of Database and Storage Engines, Jim Gray, http://research.Microsoft.com/Gray
1 CyberBricks: The Future of Database and Storage Engines
Jim Gray
http://research.Microsoft.com/Gray
2 Outline
- What storage things are coming from Microsoft?
- TerraServer: a 1 TB DB on the Web
- Storage metrics: Kaps, Maps, Gaps, Scans
- The future of storage: ActiveDisks
3 New Storage Software from Microsoft
- SQL Server 7.0
- Simplicity: auto-most-things
- Scalability: Win95 to Enterprise
- Data warehousing built in: OLAP, VLDB
- NT 5
- Better volume management (from Veritas)
- HSM architecture
- IntelliMirror
- Active Directory for transparency
4 Thin Client Support: TSO Comes to NT
- Lower per-client cost
- Huge centralized data stores
Hydra Server
5 Windows NT 5.0 IntelliMirror
- Files and settings mirrored on client and server
- Great for mobile users
- Facilitates roaming
- Easy to replace PCs
- Optimizes network performance
- Means HUGE data stores
6 Outline
- What storage things are coming from Microsoft?
- TerraServer: a 1 TB DB on the Web
- Storage metrics: Kaps, Maps, Gaps, Scans
- The future of storage: ActiveDisks
7 Microsoft TerraServer: Scaleup to Big Databases
- Build a 1 TB SQL Server database
- Data must be
- 1 TB
- Unencumbered
- Interesting to everyone everywhere
- And not offensive to anyone anywhere
- Loaded
- 1.5 M place names from Encarta World Atlas
- 3 M sq km from USGS (1 meter resolution)
- 1 M sq km from Russian Space Agency (2 m)
- On the web (world's largest atlas)
- Sell images with commerce server.
8 Microsoft TerraServer: Background
- Earth is 500 Tm² (tera square meters)
- USA is 10 Tm²
- 100 Tm² of land between 70°N and 70°S
- We have pictures of 6% of it
- 3 Tm² from USGS
- 2 Tm² from Russian Space Agency
- Compress 5:1 (JPEG) to 1.5 TB
- Slice into 10 KB chunks
- Store chunks in the DB
- Navigate with
- Encarta Atlas
- globe
- gazetteer
- StreetsPlus in the USA
- Someday
- multi-spectral imagery
- of everywhere
- once a day / hour
9 USGS Digital Ortho Quads (DOQ)
- US Geological Survey
- 4 terabytes
- Most data not yet published
- Based on a CRADA
- Microsoft TerraServer makes the data available.
10 Russian Space Agency (SovInformSputnik) SPIN-2
(Aerial Images is worldwide distributor)
- 1.5 meter geo-rectified imagery of (almost) anywhere
- Almost equal-area projection
- De-classified satellite photos (from 200 km)
- More data coming (1 m)
- Selling imagery on the Internet
- Putting 2 Tm² onto Microsoft TerraServer
SPIN-2
11 Demo
http://www.TerraServer.Microsoft.com/
12 Demo
- Navigate by coverage map to the White House
- Download image
- Buy imagery from USGS
- Navigate by name to Venice
- Buy SPIN-2 image / Kodak photo
- Pop out to Expedia street map of Venice
- Mention that the DB will double in the next 18 months (2x USGS, 2x SPIN-2)
13 Hardware
[Diagram: Internet servers, web servers, and a map site server on a 100 Mbps Ethernet switch, connected to the Alpha database server with its Enterprise Storage Array and an STK 9710 DLT tape library]
1 TB database server: AlphaServer 8400, 8 x 440 MHz Alpha CPUs, 10 GB DRAM, 324 StorageWorks disks, 10-drive tape library (STK TimberWolf DLT7000)
14 The Microsoft TerraServer Hardware
- Compaq AlphaServer 8400
- 8 x 400 MHz Alpha CPUs
- 10 GB DRAM
- 324 x 9.2 GB StorageWorks disks
- 3 TB raw, 2.4 TB of RAID5
- STK 9710 tape robot (14 TB)
- Windows NT 4 EE, SQL Server 7.0
15 Software
[Diagram: web clients (HTML browser, Java viewer) reach the site over the Internet; Internet Information Server 4.0 with Active Server Pages (the image server) calls TerraServer stored procedures in SQL Server 7 (the TerraServer DB) under MTS; Microsoft Site Server EE and the Microsoft Automap ActiveX server provide commerce and maps; an image delivery application with its own SQL Server 7 loads data from image provider site(s)]
16 System Management & Maintenance
- Backup and recovery
- STK 9710 tape robot
- Legato NetWorker
- SQL Server 7 Backup/Restore
- Clocked at 80 MBps peak (200 GB/hr)
- SQL Server Enterprise Mgr
- DBA maintenance
- SQL Performance Monitor
17 Microsoft TerraServer File Group Layout
- Convert 324 disks to 28 RAID5 sets plus 28 spare drives
- Make 4 WinNT volumes (RAID 50), 595 GB per volume
- Build 30 x 20 GB files on each volume
- DB is a file group of 120 files
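A quick sketch checks that the layout arithmetic on this slide is self-consistent (the grouping of RAID5 sets into volumes is stated on the slide; everything else here is derived):

```python
# TerraServer file-group layout, numbers taken from the slide
disks, spares, raid5_sets = 324, 28, 28
sets_per_volume = raid5_sets // 4     # 4 WinNT volumes (RAID 50) -> 7 sets each
files_per_volume, file_GB = 30, 20

data_drives = disks - spares          # 296 drives left for the 28 RAID5 sets
db_files = 4 * files_per_volume       # the file group of 120 files
db_GB = db_files * file_GB            # 2,400 GB, matching the 2.4 TB of RAID5 on slide 14
```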
18 Image Delivery and Load: Incremental Load of 4 More TB in the Next 18 Months
[Diagram: DLT tapes (tar, NTBackup) feed cutting machines running ImgCutter, which drop images into \DropN and \Images staging shares; a LoadMgr database (DoJob, Wait4Load) drives loading over a 100 Mbit Ethernet switch into the TerraServer AlphaServer 8400, its Enterprise Storage Array (3 shelves of 108 x 9.1 GB drives), and the STK DLT tape library]
Load pipeline steps: 10 ImgCutter, 20 Partition, 30 ThumbImg, 40 BrowseImg, 45 JumpImg, 50 TileImg, 55 MetaData, 60 TileMeta, 70 ImgMeta, 80 UpdatePlace
19 Technical Challenge: Key Idea
- Problem: geo-spatial search without geo-spatial access methods (just standard SQL Server)
- Solution: a geo-spatial search key
- Divide the earth into rectangles of 1/48th degree longitude (X) by 1/96th degree latitude (Y)
- Z-transform X, Y into a single Z value; build a B-tree on Z
- Adjacent images stored next to each other
- Search method
- Latitude and longitude -> X, Y, then Z
- Select on matching Z value
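The slide doesn't spell out the Z-transform itself; a minimal sketch of the usual bit-interleaving (Morton-order) version, using the 1/48-degree by 1/96-degree grid from the slide, might look like:

```python
def z_value(lon_deg, lat_deg, bits=16):
    """Map a point to the slide's grid (1/48-degree longitude cells,
    1/96-degree latitude cells), then interleave the bits of the two
    cell numbers so nearby cells get nearby Z keys."""
    x = int((lon_deg + 180) * 48)    # longitude cell number
    y = int((lat_deg + 90) * 96)     # latitude cell number
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # x bits -> even positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # y bits -> odd positions
    return z
```

A B-tree on Z then clusters neighboring tiles on disk; a range scan over Z plus a filter on the exact X, Y gives the search method the slide describes.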
20 Some Terabyte Databases
Kilo Mega Giga Tera Peta Exa Zetta Yotta
- The Web: 1 TB of HTML
- TerraServer: 1 TB of images
- Several other 1 TB (file) servers
- Hotmail: 7 TB of email
- Sloan Digital Sky Survey: 40 TB raw, 2 TB cooked
- EOS/DIS (picture of the planet each week)
- 15 PB by 2007
- Federal Clearing House: images of checks
- 15 PB by 2006 (7-year history)
- Nuclear Stockpile Stewardship Program
- 10 exabytes (???!!)
21 Info Capture
- You can record everything you see or hear or read.
- What would you do with it?
- How would you organize and analyze it?
Video: 8 PB per lifetime (10 GB/h). Audio: 30 TB (10 KB/s). Read or write: 8 GB (words). See http://www.lesk.com/mlesk/ksg97/ksg.html
22 Kilo Mega Giga Tera Peta Exa Zetta Yotta
A letter
A novel
A Movie
Library of Congress (text)
LoC (image)
LoC (sound & cinema)
All Photos
All Disks
All Tapes
All Information!
23 Michael Lesk's Points (www.lesk.com/mlesk/ksg97/ksg.html)
- Soon everything can be recorded and kept
- Most data will never be seen by humans
- Precious resource: human attention
Auto-summarization and auto-search will be key enabling technologies.
24 Outline
- What storage things are coming from Microsoft?
- TerraServer: a 1 TB DB on the Web
- Storage metrics: Kaps, Maps, Gaps, Scans
- The future of storage: ActiveDisks
25 Storage Latency: How Far Away is the Data?
Access time, in clock ticks:
- Registers: 1
- On-chip cache: 2
- On-board cache: 10
- Memory: 100
- Disk: 10^6
- Tape / optical robot: 10^9
26 DataFlow Programming: Prefetch & Postwrite Hide Latency
Can't wait for the data to arrive (2,000 years!). Need a memory that gets the data in advance (100 MB/s). Solution: pipeline data to/from the processor; pipe data from the source (tape, disc, RAM...) to the CPU cache.
27 Meta-Message: Technology Ratios Are Important
- If everything gets faster & cheaper at the same rate, THEN nothing really changes.
- Things getting MUCH BETTER
- communication speed & cost: 1,000x
- processor speed & cost: 100x
- storage size & cost: 100x
- Things staying about the same
- speed of light (more or less constant)
- people (10x more expensive)
- storage speed (only 10x better)
28 Today's Storage Hierarchy: Speed & Capacity vs Cost Tradeoffs
29 Storage Ratios Changed in the Last 20 Years
- Media price: 4000x, bandwidth: 10x, accesses/s: 10x
- DRAM:DISK $/MB: 100:1 -> 25:1
- TAPE:DISK $/GB: 100:1 -> 5:1
30 Storage Ratios Changed
- DRAM/disk media price ratio
- 1970-1990: 100:1
- 1990-1995: 10:1
- 1995-1997: 50:1
- today: $0.15/MB disk, $5/MB DRAM
- 4,000x lower media price
- Capacity: 100x, bandwidth: 10x, accesses/s: 10x
- DRAM:DISK $/MB: 100:1 -> 25:1
- TAPE:DISK $/GB: 100:1 -> 5:1
31 Disk Access Time
- Access time = SeekTime (6 ms, improving 5%/y) + RotateTime (3 ms, 5%/y) + ReadTime (1 ms, 25%/y)
- Other useful facts
- Power rises more than size^3 (so small is indeed beautiful)
- Small devices are more rugged
- Small devices can use plastics (forces are much smaller), e.g. bugs fall without breaking anything
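The access-time formula above can be projected forward; a small sketch, assuming the stated improvement rates compound yearly (the compounding is my assumption):

```python
def access_time_ms(years_out):
    """Project disk access time: seek (6 ms) and rotate (3 ms) improve
    ~5%/year, while read/transfer time (1 ms) improves ~25%/year."""
    return (6.0 * 0.95 ** years_out +
            3.0 * 0.95 ** years_out +
            1.0 * 0.75 ** years_out)
```

Ten years out, seek plus rotate is still about 5.4 ms while the transfer term has nearly vanished, which is the slide's point: access time stays dominated by mechanical delays.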
32 Standard Storage Metrics
- Capacity
- RAM: MB and $/MB; today at 100 MB and $1/MB
- Disk: GB and $/GB; today at 10 GB and $50/GB
- Tape: TB and $/TB; today at 0.1 TB and $10/GB (nearline)
- Access time (latency)
- RAM: 100 ns
- Disk: 10 ms
- Tape: 30 second pick, 30 second position
- Transfer rate
- RAM: 1 GB/s
- Disk: 5 MB/s (arrays can go to 1 GB/s)
- Tape: 3 MB/s (not clear that striping works)
33 New Storage Metrics: Kaps, Maps, Gaps, SCANs
- Kaps: how many kilobyte objects served per second
- the file server, transaction processing metric
- Maps: how many megabyte objects served per second
- the Mosaic metric
- Gaps: how many gigabyte objects served per hour
- the video & EOSDIS metric
- SCANs: how many scans of all the data per day
- the data mining and utility metric
- And $/Kaps, $/Maps, $/Gaps, $/SCAN
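Using the hypothetical device from the previous slide (10 ms access, 5 MB/s transfer, 10 GB capacity), each metric can be estimated as one access plus one transfer per object:

```python
def objects_per_sec(obj_MB, access_s, xfer_MBps):
    """Service rate = 1 / (access time + transfer time) for one object."""
    return 1.0 / (access_s + obj_MB / xfer_MBps)

access_s, xfer_MBps, cap_MB = 0.010, 5.0, 10_000   # slide-32 disk
kaps  = objects_per_sec(0.001, access_s, xfer_MBps)          # ~98 KB objects/s
maps  = objects_per_sec(1.0, access_s, xfer_MBps)            # ~4.8 MB objects/s
gaps  = objects_per_sec(1000.0, access_s, xfer_MBps) * 3600  # ~18 GB objects/hour
scans = 86400 / (cap_MB / xfer_MBps)                         # ~43 full scans/day
```

Note how the access time dominates Kaps while the transfer rate dominates Maps, Gaps, and SCANs, which is why the larger-object metrics reward bandwidth and parallelism.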
34 How To Get Lots of Maps, Gaps, SCANs
- Parallelism: use many little devices in parallel
At 10 MB/s it takes 1.2 days to scan a terabyte; 1,000x parallel takes 100 seconds per scan.
Parallelism: divide a big problem into many smaller ones to be solved in parallel.
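The scan arithmetic above, as a one-liner (the 1 TB figure is assumed, matching the deck's running example):

```python
def scan_time_s(capacity_MB, device_MBps, n_devices):
    """Full-scan time when the data is spread over n devices read in parallel."""
    return capacity_MB / (device_MBps * n_devices)

one_drive = scan_time_s(1_000_000, 10, 1)     # 100,000 s, about 1.2 days
farm      = scan_time_s(1_000_000, 10, 1000)  # 100 s with 1,000-way parallelism
```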
35 Tape & Optical: Beware of the Media Myth
Optical is cheap: $200/platter, 2 GB/platter => $100/GB (5x cheaper than disc).
Tape is cheap: $100/tape, 40 GB/tape => $2.5/GB (100x cheaper than disc).
36 Tape & Optical Reality: Media is 10% of System Cost
Tape needs a robot ($10k ... $3M) and 10 ... 1,000 tapes (at 40 GB each) => $20/GB ... $200/GB (1x-10x cheaper than disc).
Optical needs a robot ($50k) and 100 platters = 200 GB (TODAY) => $250/GB (more expensive than disc).
Robots have poor access times. Not good for the Library of Congress (25 TB). Data motel: data checks in but it never checks out!
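The slide's point, that the robot dwarfs the media cost, can be sketched by amortizing the robot over the media it holds (the two robot/tape pairings below are my illustration; the slide quotes a $20-$200/GB range):

```python
def system_dollars_per_GB(robot_price, n_tapes, GB_per_tape=40, tape_price=100):
    """(robot + media cost) per GB stored; bare media alone is $2.5/GB."""
    return (robot_price + n_tapes * tape_price) / (n_tapes * GB_per_tape)

small = system_dollars_per_GB(10_000, 10)       # $27.5/GB on a $10k, 10-tape robot
large = system_dollars_per_GB(3_000_000, 1000)  # $77.5/GB on a $3M, 1,000-tape silo
```

Either way the system cost lands an order of magnitude or two above the $2.5/GB media price, which is the "media myth" of the previous slide.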
37 The Access Time Myth
- The myth: seek or pick time dominates
- The reality: (1) queuing dominates
- (2) transfer dominates for BLOBs
- (3) disk seeks are often short
- Implication: many cheap servers are better than one fast expensive server
- shorter queues
- parallel transfer
- lower cost/access and cost/byte
- This is obvious for disk & tape arrays
38 My Solution to Tertiary Storage: Tape Farms, Not Mainframe Silos
[Diagram: many independent tape robots, like a disc farm. One $10k robot holds 10 tapes (400 GB, 6 MB/s, $25/GB) and delivers 30 Maps, 15 Gaps, 2 Scans; 100 robots ($1M) hold 40 TB at $25/GB and deliver 3K Maps, 1.5K Gaps, 2 Scans, scanning everything in 12 hours]
39 The Metrics: Disk and Tape Farms Win
Data motel: data checks in, but it never checks out.
[Chart: log scale from 0.01 to 1,000,000 comparing GB/K$, Kaps, Maps, and SCANs/day for a 1000x disc farm, a 100x DLT tape farm, and an STK tape robot (6,000 tapes, 8 readers)]
40 Cost Per Access (3-Year)
[Chart: log scale from 0.1 to 100,000+ comparing Kaps/$, Maps/$, Gaps/$, and SCANs/k$ for a 1000x disc farm, an STK tape robot, and a 100x DLT tape farm (6,000 tapes, 16 readers)]
41 Storage Ratios: Impact on Software
- Gone from 512 B pages to 8192 B pages (will go to 64 KB pages in 2006)
- Treat disks as tape
- Increased use of sequential access
- Use disks for backup copies
- Use tape for
- VERY COLD data, or
- offsite archive
- data interchange
42 Summary
- Storage accesses are the bottleneck
- Accesses are getting larger (Maps, Gaps, SCANs)
- Capacity and cost are improving, BUT
- Latencies and bandwidth are not improving much, SO
- Use parallel access (disk and tape farms)
- Use sequential access (scans)
43 The Memory Hierarchy
- Measuring & modeling sequential IO
- Where is the bottleneck?
- How does it scale with SMP, RAID, new interconnects?
Goals: balanced bottlenecks, low overhead, scale to many processors (10s), scale to many disks (100s).
[Diagram: the IO path from the app address space and file cache in memory, over the memory bus and PCI, through the adapter and controller to SCSI disks]
44 Sequential IO: Your Mileage Will Vary
- Measuring hardware & software
- Looking for software fixes
- Aiming for the out-of-the-box 1/2-power point: 50% of peak power out of the box
- 40 MB/s: advertised UW SCSI
- 35r-23w MB/s: actual disk transfer
- 29r-17w MB/s: 64 KB requests (NTFS)
- 9 MB/s: single disk media
- 3 MB/s: 2 KB requests (SQL Server)
45 PAP (Peak Advertised Performance) vs RAP (Real Application Performance)
- Goal: RAP = PAP / 2 (the half-power point)
[Diagram: the application sees 7.2 MB/s of data end to end through the file system and SCSI buffers, though the system bus is rated 422 MBps, PCI 133 MBps, SCSI 40 MBps, and the disk 10-15 MBps]
46 The Best Case: Temp File, No IO
- Temp file read/write hits the file system cache
- Program uses a small (in-CPU-cache) buffer
- So write/read time is bus move time (3x better than copy)
- Paradox: the fastest way to move data is to write it, then read it
- This hardware is limited to 150 MBps per processor
47 Bottleneck Analysis
Theoretical bus bandwidth: 422 MBps (66 MHz x 64 bits)
Memory read/write: 150 MBps
MemCopy: 50 MBps
Disk R/W: 9 MBps
48 3 Stripes and You're Out!
- CPU time goes down with request size
- Ftdisk (striping is cheap)
- 3 disks can saturate the adapter
- Similar story with UltraWide
49 Parallel SCSI Busses Help
- A second SCSI bus nearly doubles (~2x) read and WCE (write-cache-enabled) throughput
- Write needs deeper buffers
- Experiment is unbuffered (3-deep WCE)
50 File System Buffering & Stripes (UltraWide Drives)
- FS buffering helps small reads
- FS buffered writes peak at 12MBps
- 3-deep async helps
- Write peaks at 20 MBps
- Read peaks at 30 MBps
51 PAP vs RAP
- Reads are easy, writes are hard
- Async write can match WCE
[Diagram: peak vs real rates along the IO path: system bus 422 MBps vs 142 MBps to the application, PCI 133 MBps vs 72 MBps, SCSI 40 MBps vs 31 MBps, disks 10-15 MBps vs 9 MBps through the file system]
52 Bottleneck Analysis
- NTFS read/write with 9 disks, 2 SCSI busses, 1 PCI:
- 65 MBps unbuffered read
- 43 MBps unbuffered write
- 40 MBps buffered read
- 35 MBps buffered write
[Diagram: adapters at 30 MBps each, PCI at 70 MBps, memory read/write at 150 MBps]
53 Peak Throughput on Intel/NT
- NTFS read/write with 24 disks, 4 SCSI busses, 2 PCI (64-bit):
- 190 MBps unbuffered read
- 95 MBps unbuffered write
- So 0.8 TB/hr read, 0.4 TB/hr write
- on a $25k server
54 Penny Sort Ground Rules (http://research.microsoft.com/barc/SortBenchmark)
- How much can you sort for a penny?
- Hardware and software cost
- Depreciated over 3 years
- A $1M system gets about 1 second
- A $1K system gets about 1,000 seconds
- Time (seconds) = 946,080 / SystemPrice ($)
- Input and output are disk resident
- Input is
- 100-byte records (random data)
- key is the first 10 bytes
- Must create the output file and fill it with a sorted version of the input file
- Daytona (product) and Indy (special) categories
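The ground-rule formula follows from depreciating the system price linearly over 3 years and asking how many seconds one penny buys:

```python
SECONDS_IN_3_YEARS = 3 * 365 * 24 * 3600   # 94,608,000 s

def penny_budget_s(system_price_dollars):
    """Seconds of machine time $0.01 buys on a system
    whose price is spread evenly over 3 years."""
    return 0.01 * SECONDS_IN_3_YEARS / system_price_dollars

# the slide's two examples:
big   = penny_budget_s(1_000_000)   # ~0.95 s on a $1M system
small = penny_budget_s(1_000)       # ~946 s on a $1K system
```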
55 PennySort
- Hardware
- 266 MHz Intel PPro
- 64 MB SDRAM (10 ns)
- Dual Fujitsu DMA 3.2 GB EIDE disks
- Software
- NT Workstation 4.3
- NT 5 sort
- Performance
- sort 15 M 100-byte records (1.5 GB)
- disk to disk
- elapsed time: 820 sec
- CPU time: 404 sec
56 Cluster Sort: Conceptual Model
- Multiple data sources
- Multiple data destinations
- Multiple nodes
- Disks -> sockets -> disk -> disk
[Diagram: records from source nodes A, B, C are shuffled over sockets so each destination node ends up with one sorted range (AAA, BBB, CCC)]
57 Cluster Install & Execute
- If this is to be used by others, it must be
- easy to install
- easy to execute
- Installations of distributed systems take time and can be tedious (AM2, GluGuard)
- Parallel remote execution is non-trivial (GLUnix, LSF)
- How do we keep this simple and built in to NTClusterSort?
58 Remote Install
- Add a registry entry to each remote node:
- RegConnectRegistry()
- RegCreateKeyEx()
59 Cluster Execution
- Setup
- MULTI_QI struct
- COSERVERINFO struct
- Retrieve the remote object handle from the MULTI_QI struct
60 Outline
- What storage things are coming from Microsoft?
- TerraServer: a 1 TB DB on the Web
- Storage metrics: Kaps, Maps, Gaps, Scans
- The future of storage: ActiveDisks
61 Crazy Disk Ideas
- Disk farm on a card: surface-mount disks
- Disk (magnetic store) on a chip (micro-machines in silicon)
- NT and BackOffice in the disk controller (a processor with 100 MB DRAM)
ASIC
62 Remember Your Roots
63 Year 2002 Disks
- Big disk ($10/GB)
- 3.5"
- 100 GB
- 150 Kaps (KB accesses per second)
- 20 MB/s sequential
- Small disk ($20/GB)
- 3.5"
- 4 GB
- 100 Kaps
- 10 MB/s sequential
- Both running Windows NT 7.0? (see below for why)
64 The Disk Farm on a Card
- The 1 TB disc card (14")
- An array of discs
- Can be used as
- 100 discs
- 1 striped disc
- 10 fault-tolerant discs
- ...etc
- LOTS of accesses/second
- and bandwidth
Life is cheap, it's the accessories that cost ya. Processors are cheap, it's the peripherals that cost ya (a $10k disc card).
65 Put Everything in Future (Disk) Controllers (it's not if, it's when?)
Acknowledgements: Dave Patterson explained this to me a year ago.
Kim Keeton, Erik Riedel, and Catharine Van Ingen helped me sharpen these arguments.
66 Technology Drivers: Disks
Kilo Mega Giga Tera Peta Exa Zetta Yotta
- Disks on track
- 100x in 10 years: 2 TB 3.5" drive
- Shrink to 1" is 200 GB
- Disk replaces tape?
- Disk is a super computer!
67 Data Gravity: Processing Moves to Transducers (moves to data sources & sinks)
- Move processing to data sources
- Move to where the power (and sheet metal) is
- Processor in
- Modem
- Display
- Microphones (speech recognition) & cameras (vision)
- Storage: data storage and analysis
68 It's Already True of Printers: Peripheral = CyberBrick
- You buy a printer
- You get
- several network interfaces
- a PostScript engine
- CPU,
- memory,
- software,
- a spooler (soon),
- and a print engine.
69 Functionally Specialized Cards
- P = mips of processor; M = MB of DRAM
- Today: P = 50 mips, M = 2 MB
- In a few years: P = 200 mips, M = 64 MB
ASIC
70 All Device Controllers Will Be Cray-1s
- TODAY
- Disk controller is a 10-mips RISC engine with 2 MB DRAM
- NIC is similar power
- SOON
- Will become 100-mips systems with 100 MB DRAM
- They are nodes in a federation (can run Oracle on NT in the disk controller!)
- Advantages
- Uniform programming model
- Great tools
- Security
- Economics (CyberBricks)
- Move computation to data (minimize traffic)
[Diagram: central processor & memory on a terabyte backplane of device controllers]
71 Basic Argument for x-Disks
- The future disk controller is a super-computer
- 1 bips processor
- 128 MB DRAM
- 100 GB disk plus one arm
- Connects to the SAN via high-level protocols
- RPC, HTTP, DCOM, Kerberos, directory services, ...
- Commands are RPCs
- management, security, ...
- Services file/web/db/... requests
- Managed by a general-purpose OS with a good dev environment
- Apps in the disk save data movement
- need a programming environment in the controller
72 The Slippery Slope
- If you add function to the server
- Then you add more function to the server
- Function gravitates to data.
Nothing = sector server. Something = fixed-app server. Everything = app server.
73 Why Not a Sector Server? (let's get physical!)
- Good idea, that's what we have today.
- But
- cache added for performance
- sector remap added for fault tolerance
- error reporting and diagnostics added
- SCSI commands (reserve, ...) are growing
- sharing is problematic (space mgmt, security, ...)
- Slipping down the slope to a 2-D block server
74 Why Not a 1-D Block Server? Put A LITTLE on the Disk Server
- Tried and true design
- HSC (VAX cluster)
- EMC
- IBM Sysplex (3980?)
- But look inside
- Has a cache
- Has space management
- Has error reporting & management
- Has RAID 0, 1, 2, 3, 4, 5, 10, 50, ...
- Has locking
- Has remote replication
- Has an OS
- Security is problematic
- Low-level interface moves too many bytes
75 Why Not a 2-D Block Server? Put A LITTLE on the Disk Server
- Tried and true design
- Cedar -> NFS
- file server, cache, space, ...
- Open file is many fewer msgs
- Grows to have
- Directories & naming
- Authentication & access control
- RAID 0, 1, 2, 3, 4, 5, 10, 50, ...
- Locking
- Backup/restore/admin
- Cooperative caching with the client
- File servers are a BIG hit: NetWare
- SNAP! is my favorite today
76 Why Not a File Server? Put a Little on the Disk Server
- Tried and true design
- Auspex, NetApp, ...
- NetWare
- Yes, but look at NetWare
- The file interface gives you an app invocation interface
- Became an app server
- Mail, DB, Web, ...
- NetWare had a primitive OS
- Hard to program, so it optimized the wrong thing
77 Why Not Everything? Allow Everything on the Disk Server (thin clients)
- Tried and true design
- Mainframes, minis, ...
- Web servers, ...
- Encapsulates data
- Minimizes data moves
- Scaleable
- It is where everyone ends up.
- All the arguments against are short-term.
78 The Slippery Slope
- If you add function to the server
- Then you add more function to the server
- Function gravitates to data.
Nothing = sector server. Something = fixed-app server. Everything = app server.
79 Disk Node
- has magnetic storage (100 GB?)
- has processor & DRAM
- has SAN attachment
- has an execution environment
[Software stack: Applications; Services; DBMS; File System; RPC, ...; SAN driver; Disk driver; OS Kernel]
80 Technology Drivers: System on a Chip
- Integrate processing with memory on chip
- chip is 75% memory now
- 1 MB cache >> 1960s supercomputers
- 256 Mb memory chip is 32 MB!
- IRAM, CRAM, PIM projects abound
- Integrate networking with processing on chip
- the system bus is a kind of network
- ATM, FiberChannel, Ethernet, ... logic on chip
- Direct IO (no intermediate bus)
- Functionally specialized cards shrink to a chip.
81 How Do They Talk to Each Other?
- Each node has an OS
- Each node has local resources: a federation.
- Each node does not completely trust the others.
- Nodes use RPC to talk to each other
- CORBA? DCOM? IIOP? RMI?
- One or all of the above.
- Huge leverage in high-level interfaces.
- Same old distributed-system story.
[Diagram: applications on two nodes talk via datagrams, streams, and RPC over VIAL/VIPL and the wire(s)]
82 Technology Drivers: What if Networking Were as Cheap as Disk IO?
- Disk
- Unix/NT: 8% of a CPU @ 40 MBps
- TCP/IP
- Unix/NT: 100% of a CPU @ 40 MBps
83 Technology Drivers: The Promise of SAN/VIA, 10x in 2 Years (http://www.ViArch.org/)
- Today
- wires are 10 MBps (100 Mbps Ethernet)
- 20 MBps of TCP/IP saturates 2 CPUs
- round-trip latency is 300 µs
- In the lab
- wires are 10x faster: Myrinet, Gbps Ethernet, ServerNet, ...
- fast user-level communication
- TCP/IP: 100 MBps at 10% of each processor
- round-trip latency is 15 µs
84 SAN: Standard Interconnect
- Gbps Ethernet: 110 MBps
- PCI: 70 MBps
- UW SCSI: 40 MBps
- FW SCSI: 20 MBps
- SCSI: 5 MBps
- LAN faster than the memory bus?
- 1 GBps links in the lab
- $100 port cost soon
- Port is a computer
85 Technology Drivers: GBps Ethernet Replaces SCSI
- Why I love SCSI
- It's fast (30 MBps (Ultra) to 100 MBps (Ultra3))
- The protocol uses little processor power
- Why I hate SCSI
- Wires must be short
- Cables are pricey
- Pins bend
86 Technology Drivers: Plug & Play Software
- RPC is standardizing (DCOM, IIOP, HTTP)
- Gives huge TOOL LEVERAGE
- Solves the hard problems for you:
- naming,
- security,
- directory service,
- operations, ...
- Commoditized programming environments
- FreeBSD, Linux, Solaris + tools
- NetWare + tools
- WinCE, WinNT + tools
- JavaOS + tools
- Apps gravitate to data.
- A general-purpose OS on the controller runs apps.
87 Basic Argument for x-Disks
- The future disk controller is a super-computer
- 1 bips processor
- 128 MB DRAM
- 100 GB disk plus one arm
- Connects to the SAN via high-level protocols
- RPC, HTTP, DCOM, Kerberos, directory services, ...
- Commands are RPCs
- management, security, ...
- Services file/web/db/... requests
- Managed by a general-purpose OS with a good dev environment
- Move apps to the disk to save data movement
- need a programming environment in the controller
88 Outline
- What storage things are coming from Microsoft?
- TerraServer: a 1 TB DB on the Web
- Storage metrics: Kaps, Maps, Gaps, Scans
- The future of storage: ActiveDisks
- Papers and talks at http://research.Microsoft.com/Gray