Designing for 20TB Disk Drives and 'enterprise storage'

1
Designing for 20TB Disk Drives
and 'enterprise storage'
  • Jim Gray, Microsoft Research

2
Disk Evolution
[Scale sidebar: Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta]
  • Capacity: 100x in 10 years; a 1 TB 3.5" drive in 2005, 20 TB? in 2012?!
  • System on a chip
  • High-speed SAN
  • Disk replacing tape
  • Disk is a supercomputer!

3
Disks are becoming computers
  • Smart drives
  • Camera with micro-drive
  • Replay / TiVo / Ultimate TV
  • Phone with micro-drive
  • MP3 players
  • Tablet
  • Xbox
  • Many more

[Diagram: the drive's software stack - Applications (Web, DBMS, Files), OS,
Disk Controller (1 GHz CPU, 1 GB RAM), Comm (Infiniband, Ethernet, radio)]
4
Intermediate Step: Shared Logic
  • Brick with 8-12 disk drives
  • 200 MIPS/arm (or more)
  • 2x Gbps Ethernet
  • General-purpose OS
  • $10k/TB to $100k/TB
  • Shared:
    • Sheet metal
    • Power
    • Support/Config
    • Security
    • Network ports
  • These bricks could run applications (e.g. SQL or Mail or ...)

[Product photos: Snap 1TB (12x80GB) NAS, NetApp .5TB (8x70GB) NAS,
Maxtor 2TB (12x160GB) NAS, IBM TotalStorage 360GB (10x36GB) NAS]
5
Hardware
  • Homogeneous machines lead to quick response through reallocation
  • HP desktop machines: 320MB RAM, 3U high, 4 x 100GB IDE drives
  • $4k/TB (street), 2.5 processors/TB, 1GB RAM/TB
  • 3 weeks from ordering to operational

Slide courtesy of Brewster Kahle @ Archive.org
6
Disk as Tape
  • Tape is unreliable, specialized, slow, low density, not improving fast, and expensive
  • Using removable hard drives to replace tape's function has been successful
  • When a "tape" is needed, the drive is put in a machine and it is online; no need
    to copy from tape before it is used.
  • Portable, durable, fast, dense; media cost comparable to raw tapes.
    Longevity unknown, suspected good.

Slide courtesy of Brewster Kahle @ Archive.org
7
Disk as Tape: What format?
  • Today I send NTFS/SQL disks.
  • But that is not a good format for Linux.
  • Solution: ship NFS/CIFS/ODBC servers (not disks).
  • Plug the disk into the LAN.
  • DHCP, then file or DB server via a standard interface.
  • Web Service in the long term.

8
State is Expensive
  • Stateless clones are easy to manage:
    • App servers are the middle tier
    • Cost goes to zero with Moore's law
    • One admin per 1,000 clones
    • Good story about scale-out
  • Stateful servers are expensive to manage:
    • 1TB to 100TB per admin
    • Storage cost is going to zero ($2k to $200k)
    • Cost of storage is management cost

9
Databases (SQL)
  • VLDB survey (Winter Corp)
  • 10 TB to 100 TB DBs
  • Size doubling yearly
  • Riding disk Moore's law
  • 10,000 disks at 18GB is 100TB cooked
  • Mostly DSS and data warehouses
  • Some media managers

10
Interesting facts
  • No DBMSs beyond 100TB.
  • Most bytes are in files.
  • The web is file centric
  • eMail is file centric.
  • Science (and batch) is file centric.
  • But...
  • SQL performance is better than CIFS/NFS...
  • CISC vs RISC

11
BaBar: the biggest DB
  • 500 TB
  • Uses Objectivity
  • SLAC events
  • Linux cluster scans the DB looking for patterns

12
300 TB (cooked): Hotmail / Yahoo!
  • Clone front ends: 10,000 @ Hotmail
  • Application servers: 100 @ Hotmail
    • Get mailbox
    • Get/put mail
    • Disk bound
  • 30,000 disks
  • 20 admins

13
AOL (MSN) (1PB?)
  • 10 B transactions per day (10% of that)
  • Huge storage
  • Huge traffic
  • Lots of eye candy
  • DB used for security/accounting
  • GUESS: AOL is a petabyte
  • (40M users x 10MB each = 400 x 10^12 bytes)

14
Google: 1.5PB as of last spring
  • 8,000 no-name PCs
  • Each 1/3U, 2 x 80 GB disks, 2 CPUs, 256MB RAM
  • 1.4 PB online
  • 2 TB RAM online
  • 8 TeraOps
  • Slice price is $1K, so $8M
  • 15 admins (!) (about 1 per 100TB)
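A quick back-of-the-envelope check of these figures (a sketch; the per-PC numbers are from the slide, and the arithmetic uses decimal units):

```python
# Rough sanity check of the Google cluster figures quoted above (decimal units).
pcs = 8_000
disk_per_pc_gb = 2 * 80          # 2 x 80 GB disks
ram_per_pc_mb = 256
price_per_pc_usd = 1_000         # "slice price is $1K"

total_disk_pb = pcs * disk_per_pc_gb / 1_000_000   # GB -> PB
total_ram_tb = pcs * ram_per_pc_mb / 1_000_000     # MB -> TB
total_price_musd = pcs * price_per_pc_usd / 1_000_000

print(f"disk: {total_disk_pb:.2f} PB")    # ~1.28 PB, consistent with "1.4 PB online"
print(f"RAM:  {total_ram_tb:.1f} TB")     # ~2 TB of RAM online
print(f"cost: ${total_price_musd:.0f}M")  # ~$8M
```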

15
Astronomy
  • I've been trying to apply databases to astronomy
  • Today they are at 10TB per data set
  • Heading for petabytes
  • Using Objectivity
  • Trying SQL (talk to me offline)

16
Scale Out: Buy Computing by the Slice
709,202 tpmC! 1 billion transactions/day
  • Slice: 8 CPUs, 8GB RAM, 100 disks (1.8TB); 20k tpmC per slice, $300k/slice
  • Clients and 4 DTC nodes not shown
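A rough check of the headline numbers, using only figures from this slide:

```python
# Back-of-the-envelope: how many slices does 709,202 tpmC imply,
# and does that rate really reach a billion transactions per day?
total_tpmc = 709_202
tpmc_per_slice = 20_000
price_per_slice_usd = 300_000

slices = total_tpmc / tpmc_per_slice                   # ~35.5 slices
system_cost_musd = slices * price_per_slice_usd / 1e6  # ~$10.6M for the slices alone
tx_per_day = total_tpmc * 60 * 24                      # tpmC is transactions per minute

print(f"slices: ~{slices:.0f}  cost: ~${system_cost_musd:.0f}M")
print(f"transactions/day: {tx_per_day:,}")             # ~1.02 billion
```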

17
Scale Up: A Very Big System!
  • UNISYS Windows 2000 Datacenter Limited Edition
  • 32 CPUs
  • 32 GB of RAM
  • 1,061 disks (15.5 TB) on 24 fiber channel connections
  • Will be helped by 64-bit addressing

18
Hardware
[Architecture diagram labels: 8 Compaq DL360 "Photon" web servers; fiber SAN
switches; 4 Compaq ProLiant 8500 DB servers (SQL\Inst1, SQL\Inst2, SQL\Inst3,
spare). One SQL database per rack; each rack contains 4.5 TB; 261 total drives /
13.7 TB total. Metadata stored on 101 GB fast, small disks (18 x 18.2 GB);
imagery data stored on 4 339 GB slow, big disks (15 x 73.8 GB). 90 x 72.8 GB
disks to be added in Feb 2001 to create an 18 TB SAN.]
19
Amdahl's Balance Laws
  • Parallelism law: if a computation has a serial part S and a parallel
    component P, then the maximum speedup is (S+P)/S.
  • Balanced system law: a system needs a bit of IO per second per instruction
    per second, about 8 MIPS per MBps.
  • Memory law: the MB/MIPS ratio (called alpha, α) in a balanced system is 1.
  • IO law: programs do one IO per 50,000 instructions.
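A tiny numeric sketch of these rules of thumb (the example workload numbers are assumed, purely to show the arithmetic):

```python
# Amdahl's parallelism law: max speedup = (S + P) / S.
def max_speedup(serial: float, parallel: float) -> float:
    return (serial + parallel) / serial

print(max_speedup(serial=1, parallel=9))    # 10x at best if 10% of the work is serial

# Balanced-system and IO laws: ~8 MIPS per MBps of IO,
# and roughly one IO per 50,000 instructions.
mips = 1_000                        # hypothetical 1,000-MIPS processor (assumed)
io_mbps_needed = mips / 8           # ~125 MBps of IO to keep it busy
ios_per_sec = mips * 1e6 / 50_000   # ~20,000 IOs/s at one IO per 50k instructions
print(io_mbps_needed, ios_per_sec)
```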

20
Amdahl's Laws Valid 35 Years Later?
  • Parallelism law is algebra, so SURE!
  • Balanced system laws?
  • Look at TPC results (TPC-C, TPC-H) at http://www.tpc.org/
  • Some imagination needed:
    • What's an instruction (CPI varies from 1-3)?
      RISC, CISC, VLIW, clocks per instruction, ...
    • What's an I/O?

21
TPC systems
  • Normalize for CPI (clocks per instruction)
  • TPC-C has about 7 instructions/byte of IO
  • TPC-H has about 3 instructions/byte of IO
  • TPC-H needs ½ as many disks (sequential vs. random access)
  • Both use 9GB 10 krpm disks (need arms, not bytes)

22
TPC systems: What's alpha (MB/MIPS)?
  • Hard to say:
  • Intel: 32-bit addressing (4GB limit). Known CPI.
  • IBM, HP, Sun have a 64 GB limit. Unknown CPI.
  • Look at both; guess CPI for IBM, HP, Sun.
  • Alpha is between 1 and 6

23
Performance (on current SDSS data)
  • Run times on a $15k Compaq server (2 CPUs, 1 GB RAM, 8 disks)
  • Some queries take 10 minutes
  • Some take 1 minute
  • Median: 22 sec
  • GHz processors are fast!
  • (10 mips/IO, 200 ins/byte)
  • 2.5 M records/s/cpu

1,000 IOs/cpu-second; 64 MB of IO/cpu-second
24
How much storage do we need?
[Scale sidebar: Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta - "Everything! Recorded"]
  • Soon everything can be recorded and indexed
  • Most bytes will never be seen by humans
  • Data summarization, trend detection, and anomaly detection are key technologies
  • See Mike Lesk, "How much information is there?"
    http://www.lesk.com/mlesk/ksg97/ksg.html
  • See Lyman & Varian, "How much information?"
    http://www.sims.berkeley.edu/research/projects/how-much-info/

[Diagram labels: all books (multimedia), all LoC books (words), a movie, a photo,
a book. Prefix scale: yocto (10^-24), zepto (10^-21), atto (10^-18),
femto (10^-15), pico (10^-12), nano (10^-9), micro (10^-6), milli (10^-3)]
25
Standard Storage Metrics
  • Capacity:
    • RAM: MB and $/MB; today at 512MB and $200/GB
    • Disk: GB and $/GB; today at 80GB and $70k/TB
    • Tape: TB and $/TB; today at 40GB and $10k/TB (nearline)
  • Access time (latency):
    • RAM: 100 ns
    • Disk: 15 ms
    • Tape: 30 second pick, 30 second position
  • Transfer rate:
    • RAM: 1-10 GB/s
    • Disk: 10-50 MB/s (arrays can go to 10GB/s)
    • Tape: 5-15 MB/s (arrays can go to 1GB/s)

26
New Storage Metrics: Kaps, Maps, SCAN
  • Kaps: how many kilobyte objects served per second
    • The file server / transaction processing metric
    • This is the OLD metric
  • Maps: how many megabyte objects served per second
    • The multimedia metric
  • SCAN: how long to scan all the data
    • The data mining and utility metric
  • And: Kaps/$, Maps/$, TBscan/$
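To make the metrics concrete, here is a minimal sketch that computes Kaps, Maps, and SCAN for a single disk; the disk parameters are assumed round numbers close to the "today" figures on the previous slide, not values from this one:

```python
# Kaps, Maps and SCAN for a single disk.
#   access_s  : average positioning time (seek + rotation), in seconds
#   rate_mbps : sustained transfer rate, in MB/s
def kaps(access_s, rate_mbps):
    """Kilobyte objects served per second (1 KB transfer per access)."""
    return 1.0 / (access_s + 0.001 / rate_mbps)

def maps(access_s, rate_mbps):
    """Megabyte objects served per second (1 MB transfer per access)."""
    return 1.0 / (access_s + 1.0 / rate_mbps)

def scan_hours(capacity_gb, rate_mbps):
    """Hours to read the whole disk sequentially."""
    return capacity_gb * 1000 / rate_mbps / 3600

# Assumed round numbers for a ~2001 disk: 15 ms access, 40 MB/s, 80 GB.
print(f"{kaps(0.015, 40):.0f} Kaps")       # ~67
print(f"{maps(0.015, 40):.0f} Maps")       # ~25
print(f"{scan_hours(80, 40):.2f} h SCAN")  # ~0.56 hours
```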

27
More Kaps and Kaps/$, but...
  • Disk accesses got much less expensive: better disks, cheaper disks!
  • But disk arms are expensive: the scarce resource
  • 1-hour scan vs. 5 minutes in 1990

28
Data on Disk Can Move to RAM in 10 years
[Chart: disk vs. RAM over time - roughly a 100:1 ratio, with RAM about 10 years behind]
29
The Absurd 10x (4-year) Disk
  • 1 TB capacity, 100 MB/s transfer, 200 Kaps
  • 2.5 hr scan time (poor sequential access)
  • 1 aps / 5 GB (VERY cold data)
  • It's a tape!
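The tape-like character of such a drive falls straight out of the spec numbers above; a quick check:

```python
# The "absurd" disk by the numbers: 1 TB, 100 MB/s, ~200 accesses per second.
capacity_gb = 1_000
rate_mbps = 100
accesses_per_sec = 200

scan_hours = capacity_gb * 1000 / rate_mbps / 3600   # ~2.8 h (the slide rounds to 2.5 h)
gb_per_access = capacity_gb / accesses_per_sec       # ~5 GB behind each access/second

print(f"scan: {scan_hours:.1f} h, {gb_per_access:.0f} GB per access per second")
```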
30
It's Hard to Archive a Petabyte
It takes a LONG time to restore it.
  • At 1 GBps it takes 12 days!
  • Store it in two (or more) places online (on disk?): a geo-plex
  • Scrub it continuously (look for errors)
  • On failure:
    • use the other copy until the failure is repaired,
    • refresh the lost copy from the safe copy.
  • Can organize the two copies differently (e.g. one by time, one by space)
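The 12-day figure is just bandwidth arithmetic; a quick check:

```python
# How long does moving a petabyte take at 1 GB/s?
petabyte_gb = 1_000_000
rate_gb_per_s = 1
days = petabyte_gb / rate_gb_per_s / 86_400
print(f"{days:.1f} days")   # ~11.6 days, i.e. roughly the 12 days quoted
```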

31
Auto Manage Storage
  • 1980 rule of thumb:
    • A DataAdmin per 10GB, a SysAdmin per MIPS
  • 2000 rule of thumb:
    • A DataAdmin per 5TB
    • A SysAdmin per 100 clones (varies with app)
  • Problem:
    • 5TB is $50k today, $5k in a few years.
    • Admin cost >> storage cost!!!!
  • Challenge:
    • Automate ALL storage admin tasks
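A hedged illustration of why admin cost dominates: the per-admin figure below is an assumption for illustration, not a number from the slide.

```python
# Why "admin cost >> storage cost": one DataAdmin per 5 TB vs. the price of 5 TB.
admin_cost_per_year = 100_000   # ASSUMED fully-burdened cost per admin, for illustration
storage_cost_today = 50_000     # "5TB is $50k today" (from the slide)
storage_cost_soon = 5_000       # "$5k in a few years" (from the slide)

print(admin_cost_per_year / storage_cost_today)  # 2.0  -> admin already ~2x the hardware
print(admin_cost_per_year / storage_cost_soon)   # 20.0 -> soon ~20x the hardware
```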

32
How to cool disk data
  • Cache data in main memory
    • See the 5-minute rule, later in the presentation (a sketch of its break-even
      arithmetic follows this list)
  • Fewer, larger transfers
  • Larger pages (512B -> 8KB -> 256KB)
  • Sequential rather than random access
    • Random 8KB IO is 1.5 MBps
    • Sequential IO is 30 MBps (the 20:1 ratio is growing)
  • RAID1 (mirroring) rather than RAID5 (parity)
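The 5-minute rule trades disk-arm cost against RAM cost. A minimal sketch of the break-even interval follows; the price and access-rate inputs are assumed ~2001-era round numbers, not figures from this deck:

```python
# Five-minute rule: keep a page cached in RAM if it is re-referenced more often
# than the break-even interval:
#   break_even_s = (pages_per_MB_of_RAM / accesses_per_sec_per_disk)
#                * (price_per_disk / price_per_MB_of_RAM)
def break_even_seconds(page_kb, disk_aps, disk_price, ram_price_per_mb):
    pages_per_mb_ram = 1024 / page_kb
    return (pages_per_mb_ram / disk_aps) * (disk_price / ram_price_per_mb)

# Assumed round numbers: 8 KB pages, ~120 accesses/s/disk, a $400 disk,
# RAM at ~$0.20/MB ($200/GB).
minutes = break_even_seconds(page_kb=8, disk_aps=120,
                             disk_price=400, ram_price_per_mb=0.20) / 60
print(f"{minutes:.0f} minutes")  # ~36 min: well past 5 minutes, so cache aggressively
```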

33
Data delivery costs $1/GB today
  • Rent for big customers: $300 per megabit per second per month
  • Improved 3x in the last 6 years (!)
  • That translates to about $1/GB at each end.
  • You can mail a 160 GB disk for $20.
  • That's 16x cheaper.
  • If it arrives overnight, that's about 4 MBps.

[Photo: 3 x 160 GB disks, about ½ TB]
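A quick check of these unit conversions (a sketch; the "overnight is about 12 hours" figure is my assumption):

```python
# Network rent vs. mailing a disk, using the numbers on this slide.
seconds_per_month = 30 * 24 * 3600
gb_per_mbps_month = 1e6 * seconds_per_month / 8 / 1e9  # ~324 GB moved by 1 Mbps in a month
rent_per_gb = 300 / gb_per_mbps_month                   # ~$0.93/GB, i.e. ~$1/GB per end

disk_gb, postage = 160, 20
mail_per_gb = postage / disk_gb                         # ~$0.13/GB
print(rent_per_gb, mail_per_gb, 2 * rent_per_gb / mail_per_gb)  # ~15x, roughly the 16x quoted

overnight_hours = 12                                    # ASSUMPTION: "overnight" ~ 12 h
mb_per_s = disk_gb * 1000 / (overnight_hours * 3600)    # ~3.7 MB/s, i.e. roughly 4 MBps
print(mb_per_s)
```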