Long Term Storage Trends and You

1
Long Term Storage Trends and You
  • Jim Gray, Microsoft Research
  • 28 Sept 2006

Storage bricks: 200x
Illiac Disk: 1968
Minoan Phaistos Disk: 1700 BC. About 1KB. No one can read it.
2
The Abstract
  • We are headed for a world of 10TB disk drives,
    64GB flash cards, and massive main memories.
    This talk begins with an exploration of these
    storage trends and how they impact storage heat:
  • everything has to get colder,
  • utilities have to be redesigned to deal with scan
    times measured in days, and
  • massive replication is needed to mask failures.
  • I assume we all agree that "tape is dead", so I
    am robbed of that lunatic idea, but I am still
    left with two crazy ideas:
  • smart disks and
  • the death of SAN.
  • These contrarian ideas are related, of course.
  • The second half of the talk discusses the tape
    postmortem and these two crazy ideas.

3
The Reality
  • This is an update of a 6-year-old talk:
  • Rules of Thumb in Data Engineering,
    MSR-TR-99-100, 1999. Proc ICDE 2000.
  • Updated in light of 6 years of change and progress.
  • Plus a brief note on some recent studies.

4
What's New / Surprising
  • Not a big surprise, just amazing!
  • exponential growth in capacity
  • latency lags bandwidth
  • 5 minute rule is now the 30 minute rule
  • FLASH is coming
  • low end storage (GBs now, 100 GBs soon)
  • low latency storage (fraction of a ms)
  • high $/byte but good $/access
  • Smart Disks still seem far off, but...

5
To Blob or Not To Blob (1/2)
  • Folklore:
  • DB is good for billions of small things
  • Files are good for thousands of big things
  • Put another way:
  • DB is bad at big objects
  • File systems have trouble with billions of
    files.
  • This is a fact, not a law of nature
  • DB and FS could learn each other's tricks.
  • But what is big and what is small? Put another
    way: what is the break-even size?

6
To Blob or Not To Blob (2/2)
  • Folklore: BLOBs win for things less than 1MB.
  • Refinement: with fragmentation, BLOBs win below
    250KB.
  • Humor: most files are less than 250KB (but
    most bytes are in big files).
  • "To BLOB or Not To BLOB: Large Object Storage in
    a Database or a Filesystem?" Russell Sears,
    Catharine van Ingen, Jim Gray, MSR-TR-2006-45,
    April 2006

7
How Reliable are Cheap Disks? (1/5)
  • Prices, specs, and gurus suggest: SCSI good, SATA
    bad.
  • 3x cheaper, but
  • 10x shorter MTTF
  • 10x shorter warranty
  • 100x higher Uncorrectable Error on Read (UER)
  • Spec sheet says 1 UER every 10 Terabytes!
  • So we measured, and here is what we saw.

8
How Reliable are Cheap Disks? (2/5)
DISK DRIVE FAILURES
  • Things fail much more often than predicted
  • Vendors say 0.5%/year
  • Customers see 10x that rate
  • Vendors say:
  • 60% are no trouble found
  • 30% are mis-handling (dropped/cooked/bent pins)
  • 10% are real failures.
  • Will UERs be worse than the specs? We need to
    worry about controller, PCI, RAM, software, ...

9
How Reliable are Cheap Disks? (3/5)
  • For the record

Observed failure rates

System              Type           Part-Years   Fails   Fails/Year (%)
TerraServer SAN     SCSI 10krpm           858      24              2.8
TerraServer SAN     controllers            72       2              2.8
TerraServer SAN     SAN switch              9       1             11.1
TerraServer Brick   SATA 7krpm            138      10              7.2
Web Property 1      SCSI 10krpm        15,805     972              6.0
Web Property 1      controllers           900     139             15.4
Web Property 2      PATA 7krpm         22,400     740              3.3
Web Property 2      motherboard         3,769      66              1.7

"Empirical Measurements of Disk Failure Rates and
Error Rates", Jim Gray, Catharine van Ingen,
MSR-TR-2005-166, December 2005
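
The Fails/Year column can be recomputed from the other two columns. The sketch below assumes the column is simply failures divided by part-years, expressed as a percentage; the function name is illustrative and not from the report. Recomputed values can differ from the slide in the last digit because of rounding.

```python
# Rows copied from the table: (system, part type, part-years, fails).
OBSERVED = [
    ("TerraServer SAN", "SCSI 10krpm", 858, 24),
    ("TerraServer SAN", "controllers", 72, 2),
    ("TerraServer SAN", "SAN switch", 9, 1),
    ("TerraServer Brick", "SATA 7krpm", 138, 10),
    ("Web Property 1", "SCSI 10krpm", 15805, 972),
    ("Web Property 1", "controllers", 900, 139),
    ("Web Property 2", "PATA 7krpm", 22400, 740),
    ("Web Property 2", "motherboard", 3769, 66),
]


def annual_failure_rate_percent(part_years, fails):
    """Failures per part-year, as a percentage (assumed definition)."""
    return 100.0 * fails / part_years


for system, part, years, fails in OBSERVED:
    rate = annual_failure_rate_percent(years, fails)
    print(f"{system:18s} {part:12s} {rate:5.1f}%")
```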
10
How Reliable are Cheap Disks? (4/5)
  • The experiment:
  • Do 180,000 times (1.8PB, about 1E16 bits):
  • Create and write a 10GB disk file
  • Read it to check the checksum
  • On various office systems for 4 months (8
    drive years)
  • Expected 114 UER events; observed 3 or 4
    UER events
  • Two events corrected by OS on retry, 1 real
    one
  • no disk failures
  • a file-system corruption (due to controller, we
    guess)
  • Many reboots due to security patches
  • 4 system hangs (bad controllers / drivers).
  • UER better than advertised (checked end-to-end)
  • "Empirical Measurements of Disk Failure Rates and
    Error Rates", MSR-TR-2005-166

11
Moral: Design For Failure (5/5)
  • Things break
  • disks break
  • controllers break
  • systems break
  • software breaks
  • data centers break
  • networks break
  • Design for independent failure modes
  • guard against operations errors
  • guard against sympathetic failures
  • guard against viruses
  • Simple recovery is testable
  • The cost of reliability is simplicity. Few are
    willing to pay that price. (T. Hoare)

12
It's Hard to Archive a Petabyte: It takes a LONG
time to restore it.
  • At 1GBps it takes 12 days!
  • Store it in two (or more) places online: a
    geo-plex
  • Scrub it continuously (look for errors)
  • On failure,
  • use the other copy until the failure is repaired,
  • refresh the lost copy from the safe copy.
  • Can organize the two copies differently
    (e.g. one by time, one by space)
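
The 12-day figure is a straight division of size by bandwidth. A minimal sketch, assuming decimal units (1 PB = 10^15 bytes, 1 GBps = 10^9 bytes per second) and a perfectly sustained transfer; the function name is illustrative.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(num_bytes, bytes_per_second):
    """Days needed to move num_bytes at a sustained bytes_per_second."""
    return num_bytes / bytes_per_second / SECONDS_PER_DAY


PETABYTE = 10**15   # decimal petabyte (assumption)
ONE_GBPS = 10**9    # 1 GBps, decimal gigabytes per second (assumption)

days = transfer_days(PETABYTE, ONE_GBPS)
print(f"Restoring 1 PB at 1 GBps takes about {days:.1f} days")
```

Any real restore runs below peak bandwidth, so this is a lower bound on restore time.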

13
Why 4 copies?
  • Duplex storage masks MOST failures
  • But... when one is broken you are worried
  • So, triplex it (a la GFS, Cosmos, Blue)
  • And you need geo-plex anyway
  • So, why not 2+2 rather than 3+3?
  • Symmetric and simple is good.

14
Outline
  • Moore's Law and consequences
  • Storage rules of thumb
  • Balanced systems rules revisited
  • Networking rules of thumb
  • Caching rules of thumb

15
Meta-Message: Technology Ratios Matter
  • Price and Performance change.
  • If everything changes in the same way, then
    nothing really changes.
  • If some things get much cheaper/faster than
    others, then that is real change.
  • Some things are not changing much
  • Cost of people
  • Speed of light
  • And some things are changing a LOT

16
The Perfect Memory (ratio problems)
  • Store name-value pairs
  • Read value given name (or predicate?) instantly!
  • Capacity has grown 2x/year (or 2x/2y)
  • But ratios are changing
  • Latency lags bandwidth (Patterson,
    http://portal.acm.org/citation.cfm?id=1022596)
  • Bandwidth lags capacity
  • Pipelining (prefetch) can hide latency
  • No way to fake bandwidth: you have to pay for
    it!

17
Find Useful Ways To Waste Space
  • 1 TB disks now
  • 100TB disks in 10 years? (or.)
  • Cost $1/GB now, $10/TB in future
  • Smart disks eventually (or now if you count Xbox,
    iPod, ...)
  • Petabyte: 1,400 disks now, 140 disks in
    2012
  • Simple math:
  • 30M seconds/year,
  • 1GBps = 30 PB/y
  • Find creative ways to waste 99% of capacity
    but not use any bandwidth (ice cold data)

18
Technology Trends
  • 1 TB disks now
  • 100TB disks in 10 years? (or.)
  • Cost $1/GB now, $10/TB in future
  • Smart disks eventually (or now if you count Xbox,
    iPod, ...)
  • Petabyte: 1,400 disks now, 300 disks in 2010
  • Simple math:
  • 30M seconds/year,
  • 1GBps = 30 PB/y

19
Technology Trend Implication
  • Find creative ways to waste 99% of capacity
    but not use any bandwidth (ice cold data):
  • replication
  • snapshots
  • archive
  • Pipeline-prefetch rewards:
  • sequential access patterns
  • very large transfers
  • large = 1MB now,
  • large = 100MB in future
  • Dataflow programming: stream data to programs.

20
Technology Trend Implication
  • Q: For an infinite disk, how long does it
    take to
  • check disk (scrub)
  • defragment
  • reorganize
  • backup
  • A: A LONG time
  • Doing all four takes 4x longer
  • Nightly/weekly << 4 x Infinity
  • Short-term fix:
  • combine utility scans
  • one-pass algorithms.
  • Van Ingen, "Where have all the IOPS gone?",
    MSR-TR-2005-181

21
Bandwidth: links and parallel links
  • Today:
  • 40 Gbps per channel (?)
  • 12 channels per fiber (WDM): 500 Gbps
  • 32 fibers/bundle: 16 Tbps/bundle
  • In lab: 20 Tbps/fiber (400 x WDM)
  • 1 Tbps: USA 1996 WAN bisection bandwidth
  • Serial links are fast and can be used in parallel

1 fiber: 25 Tbps
22
Free Storage like free puppies
  • Storage is cheap ($1k/TB)
  • Storage management is not: $100K/TB/year (or
    less); opX > 100 capX
  • Goal: opX << capX

23
Trends: Moore's Law
  • Performance/Price doubles every 18 months
  • 100x per decade
  • Progress in next 18 months = ALL previous
    progress
  • New storage = sum of all old storage (ever)
  • New processing = sum of all old processing.
  • E. coli doubles every 20 minutes!
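
Both claims on this slide follow from steady doubling. The sketch below assumes an idealized doubling every 18 months and models per-period progress as 1, 2, 4, ...; the function names are illustrative.

```python
def growth_factor(years, doubling_months=18):
    """Growth multiple after `years` if capability doubles every `doubling_months`."""
    return 2 ** (12 * years / doubling_months)


def new_minus_all_previous(periods):
    """Progress in the next doubling period minus the sum of all earlier periods."""
    previous = sum(2 ** k for k in range(periods))  # 1 + 2 + ... + 2^(periods-1)
    upcoming = 2 ** periods
    return upcoming - previous


print(f"growth per decade: about {growth_factor(10):.0f}x")
print("next period minus all previous periods:", new_minus_all_previous(10))
```

The second function always returns 1: the next period's progress equals everything that came before it, to within one unit, however many periods have elapsed.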

15 years ago
24
Trends: ops/s/$ Had Three Growth Phases
  • 1890-1945
  • Mechanical
  • Relay
  • 7-year doubling
  • 1945-1985
  • Tube, transistor,..
  • 2.3 year doubling
  • 1985-2010
  • Microprocessor
  • 1.0 year doubling

25
So a problem
  • Suppose you have a ten-year compute job on the
    world's fastest supercomputer. What should you
    do?
  • Commit $250M now?
  • Or program for 9 years: software speedup 2^6 =
    64x, Moore's law speedup 2^6 = 64x, so
    4,000x speedup; spend $1M (not $250M) on
    hardware; runs in 2 weeks, not 10 years.
  • Homework problem: What is the optimum strategy?

26
Storage Capacity Beating Moore's Law
  • $500/TB today (raw disk)
  • $50/TB by 2010
  • 2005: shipped 350M drives (28% increase over
    2004), 0.1 Zettabyte (!)

27
Trends: Magnetic Storage Densities
  • Amazing progress
  • Ratios have changed
  • Improvements: Capacity 60%/y, Bandwidth 40%/y,
    Access time 16%/y

28
Trends: Density Limits
Bit Density
Density vs Time, b/µm2 and Gb/in2
  • The end is near!
  • In 2000: products at 23 Gbpsi, lab at 50
    Gbpsi, limit 60 Gbpsi
  • But the limit keeps rising, and there are alternatives
  • Today: products at 245 Gbpsi, limit at 5 Tbpsi

[Figure: bit density vs. time, 1990-2008, on a scale from
1 to 3,000 b/µm2 (0.6 to 2,000 Gb/in2). It marks CD, DVD,
ODD, the wavelength limit, the superparamagnetic limit, and
possible successors: NEMS, fluorescent, holographic, DNA.]
Figure adapted from Franco Vitaliano, "The NEW
new media: the growing attraction of nonmagnetic
storage", Data Storage, Feb 2000, pp 21-32
29
Consequence of Moore's law: Need an address bit
every 18 months.
  • Moore's law gives you 2x more in 18 months.
  • RAM
  • Today we have 1 GB to 1 TB machines (30-40 bits
    of addressing)
  • In 9 years we will need 6 more bits: 36-46 bit
    addressing (64GB - 64TB RAM).
  • Disks
  • Today we have 10 GB to 10 TB files and DBs (33-43
    bit file addresses)
  • In 9 years, we will need 6 more bits: 40-50 bit
    file addresses (1 PB files (! (?)))
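
The address-bit arithmetic for the RAM case can be checked in a few lines. The sketch assumes byte addressing and binary sizes (1 GB = 2^30 bytes); the helper names are illustrative.

```python
def address_bits(num_bytes):
    """Bits needed to give every byte in num_bytes a distinct address."""
    return (num_bytes - 1).bit_length()


def extra_bits(years, doubling_months=18):
    """Extra address bits needed after `years` of doubling every `doubling_months`."""
    return 12 * years / doubling_months


GB = 2**30  # binary gigabyte (assumption)
TB = 2**40  # binary terabyte (assumption)

print("1 GB needs", address_bits(GB), "bits")
print("1 TB needs", address_bits(TB), "bits")
print("extra bits after 9 years:", extra_bits(9))
print("64 TB needs", address_bits(64 * TB), "bits")
```

One doubling adds one bit, so nine years at one doubling per 18 months adds six bits, which takes a 40-bit machine to 46 bits.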

30
Architecture could change this
  • 1-level store
  • System/38 and AS/400 have a 1-level store.
  • Never re-uses an address.
  • Needs 96-bit addressing today.
  • NUMAs and Clusters
  • Willing to buy a $100M computer?
  • Then add 6 more address bits.
  • Only 1-level store pushes us beyond 64-bits
  • Still, these are logical addresses, 64-bit
    physical will last many years

31
Outline
  • Moore's Law and consequences
  • Storage rules of thumb
  • Balanced systems rules revisited
  • Networking rules of thumb
  • Caching rules of thumb

32
How much storage do we need?
  • Soon everything can be recorded and indexed
  • Most bytes will never be seen by humans.
  • Data summarization, trend detection, and anomaly
    detection are key technologies
  • See Mike Lesk, "How much information is there?"
    http://www.lesk.com/mlesk/ksg97/ksg.html
  • See Lyman & Varian, "How much information?"
    http://www.sims.berkeley.edu/research/projects/how-much-info/

[Figure: a scale of storage sizes from Kilo through Mega,
Giga, Tera, Peta, Exa, and Zetta to Yotta, marking A Book,
A Photo, A Movie, All LoC books (words), All Books
MultiMedia, and Everything Recorded.]
Small prefixes: 10^-24 yocto, 10^-21 zepto, 10^-18 atto,
10^-15 femto, 10^-12 pico, 10^-9 nano, 10^-6 micro, 10^-3 milli
33
Storage Latency: How Far Away is the Data?

Storage level          Relative latency   Analogy       Travel time
Registers              1                  My Head       1 min
On Chip Cache          2                  This Room
On Board Cache         10                 This Campus   10 min
Memory                 100                Olympia       1.5 hr
Disk                   10^6               Pluto         2 Years
Tape / Optical Robot   10^9               Andromeda     2,000 Years
34
Storage Hierarchy: Speed and Capacity vs Cost
Tradeoffs
[Figure: two log-scale plots against Access Time (seconds),
from 10^-9 to 10^3. "Price vs Speed" plots $/GB and "Size vs
Speed" plots Typical System (bytes) for Cache, Main,
Secondary (Disc), Online Tape, Nearline Tape, and Offline
Tape.]
35
Disks Today
  • Disk is 30GB to 1 TB, 10-80 MBps, 5k-15k rpm
    (6ms-2ms rotational latency), 10ms-3ms seek;
    $/TB is $.5K for ATA, $1.2K for SCSI
  • For shared disks, most time is spent waiting in queue
    for access to the arm/controller

[Figure: disk access time split into wait, seek, rotate,
and transfer.]
36
The Street Price of a Raw Disk TB: about $1K/TB
37
Standard Storage Metrics
  • Capacity
  • RAM: MB and $/MB; today at 4GB and
    $100/GB
  • Disk: GB and $/GB; today at 700GB and
    $500/TB
  • Tape: TB and $/TB; today at 400GB and
    $300/TB (nearline)
  • Access time (latency)
  • RAM: 1-100 ns
  • Disk: 5-15 ms
  • Tape: 30 second pick, 30 second position
  • Transfer rate
  • RAM: 1-10 GB/s
  • Disk: 50 MB/s; arrays can go to
    1GB/s
  • Tape: 50 MB/s; arrays can go to
    1GB/s

38
New Storage Metrics: Kaps, Maps, SCAN
  • Kaps: How many kilobyte objects served per second
  • The file server, transaction processing metric
  • This is the OLD metric.
  • Maps: How many megabyte objects served per second
  • The multimedia metric
  • SCAN: How long to scan all the data
  • The data mining and utility metric
  • And
  • Kaps/$, Maps/$, TBscan/$

39
For the Record (good 2002 devices packaged in
system:
http://www.tpc.org/results/individual_results/Compaq/compaq.5500.99050701.es.pdf)
X 100
Tape slice is 8Tb with 1 LTO reader at 50MBps per
100 tapes.
40
For the Record (good 2002 devices packaged in
system:
http://www.tpc.org/results/individual_results/Compaq/compaq.5500.99050701.es.pdf)
Tape is 1Tb with 4 DLT readers at 5MBps each.
41
Disk Changes
  • Disks got cheaper: $20k -> $200
  • $/Kaps etc. improved 100x (Moore's law!) (or even
    500x)
  • One-time event (went from mainframe prices to PC
    prices)
  • Disks got cooler (50x per decade)
  • 1990: 1 Kaps per 20 MB (1GB disk)
  • 2006: 1 Kaps per 10,000 MB (.75TB disk)
  • Disk scans take longer (10x per decade)
  • 1990: disk 1GB and 50 Kaps and 5 minute scan
  • 2006: disk 750GB and 150 Kaps and 5 hour scan
  • So... backup/restore takes a long time (too long)

42
Storage Ratios Changed
  • 10x better access time
  • 10x more bandwidth
  • 100x more capacity
  • Data 25x cooler (1Kaps/20MB vs 1Kaps/GB)
  • 4,000x lower media price
  • 20x to 100x lower disk price
  • Scan takes 10x longer (3 min vs 1hr)
  • RAM/disk media price ratio changed:
  • 1970-1990: 100:1
  • 1990-1995: 10:1
  • 1995-1997: 50:1
  • 2006: $0.5/GB disk, $100/GB RAM: 200:1

43
More Kaps and Kaps/$
  • Disk accesses got much less expensive: better
    disks, cheaper disks!
  • But disk arms are expensive: the scarce resource
  • 5 hour scan vs 5 minutes in 1990

Assumptions: 15krpm, Dell TPC-C pricing for
SCSI disks, cabinets, and controllers, depreciated
over 3 years.
44
Data on Disk Can Move to RAM in 10 years
100:1
10 years
45
The Absurd Disk Has Arrived
  • 2.5 hr scan time (poor sequential access)
  • 1 kaps / 10 GB (VERY cold data)
  • It's a tape!

1 TB
100 MB/s
100 Kaps
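
The scan time and "coolness" of this disk follow from the three figures above. A minimal sketch, assuming decimal units and a purely sequential scan at the full 100 MB/s; the function names are illustrative.

```python
def scan_hours(capacity_bytes, bytes_per_second):
    """Hours to read the whole disk sequentially at the given rate."""
    return capacity_bytes / bytes_per_second / 3600


def gb_per_kaps(capacity_bytes, kaps):
    """Gigabytes of stored data behind each kilobyte access per second."""
    return capacity_bytes / 1e9 / kaps


CAPACITY = 1e12     # 1 TB, decimal (assumption)
BANDWIDTH = 100e6   # 100 MB/s
KAPS = 100          # kilobyte accesses per second

print(f"scan time: about {scan_hours(CAPACITY, BANDWIDTH):.1f} hours")
print(f"coolness: 1 Kaps per {gb_per_kaps(CAPACITY, KAPS):.0f} GB")
```

The result is a little under 3 hours, in line with the slide's rounded 2.5 hr, and exactly 1 Kaps per 10 GB.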
46
FLASH: The Gap Filler?
  • Flash chips are 4GB today; cards are 64GB.
  • $20/GB
  • 1/5 RAM price
  • but 20x disk price, but 20x better Kaps
  • Predicted to double each year to Tbit
  • doubled each year since 1997
  • Will eat disk market from below
  • cameras, iPods, then laptops, then ...
  • similar to cost/page or cost/first-page in
    printers
  • Block-oriented read-write (2KB)
  • 20MB/s per chip
  • read 16 chips in parallel (64KB page, 320MB/s)
  • 125 µs latency on read (25 fixed, 100 transfer)
  • Write has 2ms latency (clear the page)
  • Pages can only be written 1M times
    (approximately).

Year   Chip Gbit   Package GB
2006          16            4
2007          32            8
2008          64           16
2009         128           32
2010         256           64
2011         512          128
2012        1024          256
$80 package
47
Flash CERTAINLY Represents an Opportunity To
Rethink
  • A non-volatile disk buffer (inside drive?)
  • Low latency (100us) cache near CPU
  • WAL cache for databases
  • Quick restart
  • FLASH is a block-oriented device. It likes
    sequential reads/writes. It likes big (64KB)
    reads/writes.

"A Design for High-Performance Flash
Disks", Andrew Birrell, Michael Isard, Chuck
Thacker, Ted Wobber, December 2005,
MSR-TR-2005-176
48
Disk vs Tape
  • Tape
  • 400 GB ($80/cartridge)
  • 40 MBps
  • 10 sec pick time
  • 30-120 second seek time
  • $200/TB for media, $800/TB for drive+library
  • 1 week scan
  • Disk
  • 750 GB
  • 50 MBps
  • 4 ms seek time
  • 2 ms rotate latency
  • $0.5/GB for drive, $0.5/GB for ctlrs/cabinet
  • 3.6 PB/rack
  • 5 hour scan

Guestimates Cern 200 TB 3480 tapes 2 col
50GB Rack 1 TB 1.25 drives
The price advantage of tape is gone, and the
performance advantage of disk is growing. At
$1K/TB, disk is competitive with nearline tape.
49
Auto Manage Storage
  • 1980 rule of thumb:
  • A DataAdmin per 10GB, SysAdmin per MIPS
  • 2006 rule of thumb:
  • A DataAdmin per 50TB (WITH GOOD TOOLS)
  • A DataAdmin per ½ TB with crappy tools!
  • SysAdmin per 100 clones (varies with app).
  • Problem:
  • 5TB is >$5k today, $500 in a few years.
  • Admin cost >> storage cost !!!!
  • Challenge:
  • Automate ALL storage admin tasks

50
How to cool disk data
  • Cache data in main memory
  • See 30 minute rule later in presentation
  • Fewer-larger transfers
  • Larger pages (512B -> 8KB -> 256KB)
  • Sequential rather than random access
  • Random 8KB IO is 1 MBps
  • Sequential IO is 60 MBps (60:1 ratio is growing)
  • Raid1 (mirroring) rather than Raid5 (parity).

51
Stripes, Mirrors, Parity (RAID 0, 1, 5)
  • RAID 0: Stripes
  • bandwidth
  • RAID 1: Mirrors, Shadows, ...
  • Fault tolerance
  • Reads faster, writes 2x slower
  • RAID 5: Parity
  • Fault tolerance
  • Reads faster
  • Writes 4x or 6x slower.

Block layout per disk:
RAID 0:  0,3,6,..  | 1,4,7,..  | 2,5,8,..
RAID 1:  0,1,2,..  | 0,1,2,..
RAID 5:  0,2,P2,.. | 1,P1,4,.. | P0,3,5,..
52
RAID 10 (stripes of mirrors) Wins: wastes space,
saves arms
  • RAID 5 (6 disks, 1 vol)
  • Performance
  • 675 reads/sec
  • 210 writes/sec
  • Write
  • 4 logical IO,
  • 2 seek + 1.7 rotate
  • SAVES SPACE
  • Performance degrades on failure
  • RAID 1 (6 disks, 3 pairs)
  • Performance
  • 750 reads/sec
  • 300 writes/sec
  • Write
  • 2 logical IO
  • 2 seek + 0.7 rotate
  • SAVES ARMS
  • Performance improves on failure

53
Best Index Page Size: >64KB
A small page has few entries, so little benefit; big
pages waste RAM and bandwidth.
Best near 100KB
54
Summarizing storage rules of thumb (1)
  • Moore's law: 4x every 3 years, 100x more per
    decade
  • Ratios change!!!
  • Implies 2 bits of addressing every 3 years.
  • Storage capacities increase 100x/decade
  • Storage costs drop 100x per decade
  • Storage throughput increases 10x/decade
  • Data cools 10x/decade
  • Disk page sizes increase 5x per decade.

55
Summarizing storage rules of thumb (2)
  • RAM:Disk and Disk:Tape cost ratios are 100:1
    and 1:1
  • Prices decline 100x per decade, so, in 10 years,
    disk data can move to RAM.
  • A person should be able to administer a million
    dollars of storage: that is 1PB today
  • Disks are replacing tapes as backup devices. You
    can't backup/restore a Petabyte quickly, so
    geoplex it.
  • Mirroring rather than parity to save disk arms

56
Outline
  • Moore's Law and consequences
  • Storage rules of thumb
  • Balanced systems rules revisited
  • Networking rules of thumb
  • Caching rules of thumb

57
Standard Architecture (today)
58
Amdahl's Balance Laws
  • Parallelism law: If a computation has a serial
    part S and a parallel component P, then the
    maximum speedup is (S+P)/S.
  • Balanced system law: A system needs a bit of IO
    per second per instruction per second: about 8
    MIPS per MBps.
  • Memory law: α = 1: the MB/MIPS ratio (called alpha
    (α)) in a balanced system is 1.
  • IO law: Programs do one IO per 50,000
    instructions.
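
The parallelism law can be written out directly. The sketch assumes S and P are measured in the same time units and that the parallel part divides evenly across n processors; the function names are illustrative.

```python
def speedup(serial, parallel, processors):
    """Speedup on `processors` CPUs when only the parallel part is divided up."""
    return (serial + parallel) / (serial + parallel / processors)


def max_speedup(serial, parallel):
    """Limit of speedup() as processors grows: (S + P) / S."""
    return (serial + parallel) / serial


# A job that is 10% serial and 90% parallel.
S, P = 1.0, 9.0
print("9 processors:", speedup(S, P, 9))
print("limit:", max_speedup(S, P))
```

With one part serial to nine parts parallel, no number of processors gets past a 10x speedup.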

59
Amdahl's Laws Valid 40 Years Later?
  • Parallelism law is algebra, so SURE!
  • Balanced system laws?
  • Look at TPC results (tpcC, tpcH) at
    http://www.tpc.org/
  • Some imagination needed:
  • What's an instruction (CPI varies from 1-3)?
  • RISC, CISC, VLIW, clocks per instruction, ...
  • What's an I/O?

60
TPC systems Disk/CPU and I/B
  • Normalize for CPI (clocks per instruction)
  • TPC-C has about 14 ins/byte of IO
  • TPC-H has 1 ins/byte of IO

61
TPC systems: What's alpha (MB/MIPS)?
  • Hard to say:
  • Intel 32 bit addressing (4GB limit). Known CPI.
  • IBM, HP, Sun have 64 GB limit. Unknown CPI.
  • Look at both, guess CPI for IBM, HP, Sun
  • Alpha is between 4 and 16

                      Mips     Memory   Alpha   Disks/cpu
Amdahl                1        1        1       1
tpcC Intel 4x3Ghz     6Gips    24GB     4       25..100
tpcH Intel 4x2.4Ghz   10Gips   64GB     16      10..40
62
Instructions per IO?
  • We know 8 mips per MBps of IO
  • So, 8KB page is 64 K instructions
  • And 64KB page is 512 K instructions.
  • But, sequential has fewer instructions/byte. (3
    vs 7 in tpcH vs tpcC).
  • So, 64KB page is 200 K instructions.
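
The page-size arithmetic reads 8 MIPS per MBps as 8 instructions per byte of IO. A minimal sketch under that reading, with binary kilobytes; the function name and the 3 instructions/byte default for sequential work are taken from the bullets above as assumptions.

```python
KB = 1024  # binary kilobyte (assumption)


def instructions_per_io(page_bytes, instructions_per_byte=8):
    """Instructions executed per IO of page_bytes at a given instructions/byte."""
    return page_bytes * instructions_per_byte


print("8KB page, random:     ", instructions_per_io(8 * KB))
print("64KB page, random:    ", instructions_per_io(64 * KB))
print("64KB page, sequential:", instructions_per_io(64 * KB, instructions_per_byte=3))
```

The three results are 64K, 512K, and roughly 200K instructions, matching the bullets.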

63
Amdahl's Balance Laws Revised
  • Laws right, just need interpretation
    (imagination?)
  • Balanced System Law: A system needs 8
    MIPS/MBps of IO, but instruction rate must be
    measured on the workload.
  • Sequential workloads have low CPI (clocks per
    instruction),
  • random workloads tend to have higher CPI.
  • Alpha (the MB/MIPS ratio) is rising from 1 to 16.
    This trend will likely continue.
  • One random IO per 50k instructions.
  • Sequential IOs are larger: one sequential IO per
    200k instructions

64
PAP vs RAP (a 2006 perspective)
  • Peak Advertised Performance vs Real Application
    Performance

65
Outline
  • Moore's Law and consequences
  • Storage rules of thumb
  • Balanced systems rules revisited
  • Networking rules of thumb
  • Caching rules of thumb

66
Standard IO (InfiniBand) next Year?
  • Probably
  • Replace PCI with something better; will still
    need a mezzanine bus standard
  • Multiple serial links directly from processor
  • Fast (10 GBps/link) for a few meters
  • System Area Networks (SANs) ubiquitous (VIA
    morphs to InfiniBand?)

i.e. 2001
In 2006: InfiniBand got marginalized by 10Gbps
Ethernet. It has low latency, but that is a
niche. PCI-Express came along.
67
Ubiquitous 10 GBps SANs in 5 years
  • 1Gbps Ethernet is a reality now.
  • Also FiberChannel, MyriNet, GigaNet, ServerNet,
    ATM, ...
  • 10 Gbps x4 WDM deployed now (OC192)
  • 3 Tbps WDM working in lab
  • In 5 years, expect 10x, wow!!

1 GBps
120 MBps (1Gbps)
80 MBps
5 MBps
40 MBps
20 MBps
68
Networking
  • WANs are getting faster than LANs; G8 OC192
    9Gbps is standard
  • Link bandwidth improves 4x per 3 years
  • Speed of light (60 ms round trip in US)
  • Software stacks have always been the problem.

Time = SenderCPU + ReceiverCPU + bytes/bandwidth
This has been the problem for small (10KB or
less) messages
69
The Promise of SAN/VIA: 10x in 2 years
http://www.ViArch.org/
  • Yesterday:
  • 10 MBps (100 Mbps Ethernet)
  • 20 MBps tcp/ip saturates 2 cpus
  • round-trip latency 250 µs
  • Now:
  • Wires are 10x faster: Myrinet, Gbps Ethernet,
    ServerNet, ...
  • Fast user-level communication
  • tcp/ip 100 MBps at 10% cpu
  • round-trip latency is 15 µs
  • 1.6 Gbps demoed on a WAN

70
The Network Revolution
  • Networking folks are finally streamlining the LAN
    case (SAN).
  • Offloading protocol to NIC
  • ½ power point is 8KB
  • Min round trip latency is 50 µs.
  • 3k ins + .1 ins/byte
  • "High-Performance Distributed Objects over a
    System Area Network", Li, L.; Forin, A.; Hunt, G.;
    Wang, Y., MSR-TR-98-68

71
How much does wire-time cost? $/Mbyte?

                      Cost      Time
Gbps Ethernet         .2µ$      10 ms
100 Mbps Ethernet     .3µ$      100 ms
OC12 (650 Mbps)       $.003     20 ms
DSL                   $.0006    25 sec
POTS                  $.002     200 sec
Wireless              $.80      500 sec

72
Data delivery costs $1/GB today
  • Rent for big customers: $30/megabit per
    second per month
  • Improved 3x in last 6 years (!).
  • That translates to $0.1/GB at each end.
  • Overhead (routers, people, ...) makes it $1/GB at
    each end.
  • You can mail a 750 GB disk for $20.
  • That's 30x .. 3x cheaper
  • If overnight, it's 7 MBps.
  • 7 disks: 50 MBps (1/4 Gbps)
  • TeraScale SneakerNet

7 x 750 GB = 5 TB
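
The per-gigabyte costs can be derived from the rental and postage figures. The sketch assumes a 30-day month, a fully utilized link, and decimal units; the function names are illustrative.

```python
SECONDS_PER_MONTH = 30 * 86_400  # 30-day month (assumption)


def network_dollars_per_gb(dollars_per_mbps_month):
    """Cost per GB moved over a fully used link rented per Mbps per month."""
    bytes_per_second = 1e6 / 8                         # 1 megabit per second
    gb_per_month = bytes_per_second * SECONDS_PER_MONTH / 1e9
    return dollars_per_mbps_month / gb_per_month


def mail_dollars_per_gb(postage_dollars, disk_gb):
    """Cost per GB of mailing one disk."""
    return postage_dollars / disk_gb


wire = network_dollars_per_gb(30)
mail = mail_dollars_per_gb(20, 750)
print(f"wire: ${wire:.3f}/GB, mail: ${mail:.3f}/GB")
print(f"mail is {wire / mail:.1f}x cheaper than raw wire cost")
print(f"mail is {1.0 / mail:.0f}x cheaper than the $1/GB loaded cost")
```

A 1 Mbps link moves about 324 GB in a month, so $30 buys roughly $0.09/GB, while mailing costs under $0.03/GB. That gives a ratio of about 3x against the raw wire cost and well over 30x against the loaded $1/GB figure.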
73
Outline
  • Moore's Law and consequences
  • Storage rules of thumb
  • Balanced systems rules revisited
  • Networking rules of thumb
  • Caching rules of thumb

74
The Five Minute Rule
  • Trade DRAM for Disk Accesses
  • Cost of an access (Drive_Cost /
    Access_per_second)
  • Cost of a DRAM page ($/MB / pages_per_MB)
  • Break even has two terms
  • Technology term and an Economic term
  • Grew page size to compensate for changing ratios.
  • Now at 5 minutes for random, 10 seconds sequential

75
The 5 Minute Rule Derived

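Cost of the disk accesses for a page referenced once every T seconds:
\[ \text{DiskAccessCost}/T = \frac{\text{DiskPrice}}{\text{AccessesPerSecond}} \cdot \frac{1}{T} \]

Cost of a RAM page:
\[ \text{RAMPageCost} = \frac{\text{RAM\_\$\_Per\_MB}}{\text{PagesPerMB}} \]

T = TimeBetweenReferences to Page
  • Breakeven:
\[ \frac{\text{RAM\_\$\_Per\_MB}}{\text{PagesPerMB}} = \frac{\text{DiskPrice}}{T \times \text{AccessesPerSecond}} \]
\[ T = \frac{\text{DiskPrice} \times \text{PagesPerMB}}{\text{RAM\_\$\_Per\_MB} \times \text{AccessesPerSecond}} \]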

76
Plugging in the Numbers
             PPM/aps          disk$/RAM$            Break Even
Random       128/120 ~ 1      $200/$0.1 ~ 2,000     28 minutes
Sequential   1/60 ~ .01       ~ 2,000               30 seconds
  • Trend is longer times because disk $ is not
    changing much, RAM $ is declining 100x/decade

30 Minute and 30 Second Rule
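
The break-even time can be computed straight from the derived formula. This sketch plugs in the table's inputs (128 pages per MB at 120 accesses per second for random IO, 1 page per MB at 60 accesses per second for sequential IO, a $200 disk, $0.1 per MB of RAM); the function and parameter names are illustrative.

```python
def break_even_seconds(pages_per_mb, accesses_per_second,
                       disk_price, ram_price_per_mb):
    """T = (PagesPerMB / AccessesPerSecond) * (DiskPrice / RAM_$_Per_MB)."""
    technology_term = pages_per_mb / accesses_per_second
    economic_term = disk_price / ram_price_per_mb
    return technology_term * economic_term


random_t = break_even_seconds(128, 120, disk_price=200, ram_price_per_mb=0.1)
sequential_t = break_even_seconds(1, 60, disk_price=200, ram_price_per_mb=0.1)

print(f"random IO:     keep a page in RAM if reused within {random_t / 60:.0f} minutes")
print(f"sequential IO: keep a page in RAM if reused within {sequential_t:.0f} seconds")
```

Unrounded inputs give somewhat more than the slide's 28 minutes for random IO. The slide rounds the intermediate ratios, and both results are on the order of 30 minutes and 30 seconds.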
77
When to Cache Web Pages.
  • Caching saves user time
  • Caching saves wire time
  • Caching costs storage
  • Caching only works sometimes
  • New pages are a miss
  • Stale pages are a miss

78
Web Page Caching Saves People Time
  • Assume people cost $20/hour (or $.2/hr ???)
  • Assume 20% hit in browser, 40% in proxy
  • Assume 3 second server time
  • Caching saves people time: $28/year to $150/year
    of people time, or .28 cents to $1.5/year.

79
Web Page Caching Saves Resources
  • Wire cost is a penny (wireless) to 100µ$ (LAN)
  • Storage is 8 µ$/mo
  • Breakeven: wire cost = storage rent at 18 months
    to 300 years
  • Add people cost: breakeven is >15 years; with cheap
    people ($.2/hr), >3 years.

80
Caching
  • Disk caching
  • 30 minute rule for random IO
  • 30 second rule for sequential IO
  • Web page caching
  • If page will be re-referenced in 18 months
    (with free users) or 15 years (with valuable
    users), then cache the page in the client/proxy.
  • Challenge: guessing which pages will be
    re-referenced; detecting stale pages (page
    velocity)

81
Meta-Message: Technology Ratios Matter
  • Price and Performance change.
  • If everything changes in the same way, then
    nothing really changes.
  • If some things get much cheaper/faster than
    others, then that is real change.
  • Some things are not changing much
  • Cost of people
  • Speed of light
  • And some things are changing a LOT

82
Outline
  • Moore's Law and consequences
  • Storage rules of thumb
  • Balanced systems rules revisited
  • Networking rules of thumb
  • Caching rules of thumb

83
What's New / Surprising
  • Not a big surprise, just amazing!
  • exponential growth in capacity
  • latency lags bandwidth lags capacity
  • 5 minute rule is now the 30 minute rule
  • FLASH is coming
  • low end storage (GBs now, 100 GBs soon)
  • low latency storage (fraction of a ms)
  • high $/byte but good $/access
  • Smart Disks still seem far off, but...