Data Centric Computing - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Data Centric Computing


1
Data Centric Computing
Yotta Zetta Exa Peta Tera Giga Mega Kilo
  • Jim Gray
  • Microsoft Research
  • Research.Microsoft.com/Gray/talks
  • FAST 2002
  • Monterey, CA, 14 Oct 1999

2
Put Everything in Future (Disk) Controllers
(it's not if, it's when?)
Jim Gray, Microsoft Research
http://Research.Microsoft.com/Gray/talks
FAST 2002, Monterey, CA, 14 Oct 1999
Acknowledgements: Dave Patterson explained this to me long ago.
Leonard Chung, Kim Keeton, Erik Riedel, Catharine Van Ingen
helped me sharpen these arguments.

3
First Disk 1956
  • IBM 305 RAMAC
  • 4 MB
  • 50 x 24" disks
  • 1200 rpm
  • 100 ms access
  • $35k/year rent
  • Included computer and accounting software
    (tubes, not transistors)

4
10 years later
1.6 meters
5
Disk Evolution
Kilo Mega Giga Tera Peta Exa Zetta Yotta
  • Capacity: 100x in 10 years; 1 TB 3.5" drive in
    2005; 20 GB as 1" micro-drive
  • System on a chip
  • High-speed SAN
  • Disk replacing tape
  • Disk is super computer!

6
Disks are becoming computers
  • Smart drives
  • Camera with micro-drive
  • Replay / Tivo / Ultimate TV
  • Phone with micro-drive
  • MP3 players
  • Tablet
  • Xbox
  • Many more

(Stack diagram: Applications (Web, DBMS, Files) / OS / Disk Ctlr (1 GHz cpu,
1 GB RAM) / Comm (Infiniband, Ethernet, radio))
7
Data Gravity: Processing Moves to Transducers
(smart displays, microphones, printers, NICs, disks)
Processing decentralizes: moving to data sources,
moving to power sources, moving to sheet metal.
? The end of computers ?
  • Storage
  • Network
  • Display

8
It's Already True of Printers: Peripheral =
CyberBrick
  • You buy a printer
  • You get
  • several network interfaces
  • A Postscript engine
  • cpu,
  • memory,
  • software,
  • a spooler (soon)
  • and a print engine.

9
The (absurd?) consequences of Moore's Law
  • 256-way NUMA?
  • Huge main memories: now 500MB - 64GB memories,
    then 10GB - 1TB memories
  • Huge disks: now 20-200 GB 3.5" disks, then .1 -
    1 TB disks
  • Petabyte storage farms
  • (that you can't back up or restore).
  • Disks >> tapes
  • Small disks: one platter, one inch, 10GB
  • SAN convergence: 1 GBps point-to-point is easy
  • 1 GB RAM chips
  • MAD at 200 Gbpsi (areal density)
  • Drives shrink one quantum
  • 10 GBps SANs are ubiquitous
  • 1 bips cpus for $10
  • 10 bips cpus at high end

10
The Absurd Design?
  • Further segregate processing from storage
  • Poor locality
  • Much useless data movement
  • Amdahl's laws: bus = 10 B/ips, IO = 1 b/ips

(Diagram: Processors (1 Tips) separated from Disks (100 TB), with a
100 GBps link where Amdahl's law would call for 10 TBps.)
11
What's a Balanced System? (40 disk arms / cpu)
12
Amdahl's Balance Laws Revised
  • Laws right, just need interpretation
    (imagination?)
  • Balanced System Law: A system needs 8
    MIPS/MBps of IO, but instruction rate must be
    measured on the workload.
  • Sequential workloads have low CPI (clocks per
    instruction),
  • random workloads tend to have higher CPI.
  • Alpha (the MB/MIPS ratio) is rising from 1 to 6.
    This trend will likely continue.
  • One random IO per 50k instructions.
  • Sequential IOs are larger: one sequential IO per
    200k instructions (see the sketch below).

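A back-of-envelope sketch of the revised balance law, using the slide's
instructions-per-IO rules of thumb. The per-arm IO rates below are my own
illustrative assumptions, not figures from the talk.

```python
# Instruction rate implied by an IO mix (slide's rules of thumb).
def mips_needed(random_iops, seq_iops,
                ins_per_random_io=50_000, ins_per_seq_io=200_000):
    """Return the MIPS needed to generate the given IO mix."""
    ips = random_iops * ins_per_random_io + seq_iops * ins_per_seq_io
    return ips / 1e6

# One arm doing ~120 random IO/s, vs. one arm streaming ~400 large sequential IO/s:
print(mips_needed(random_iops=120, seq_iops=0))   # ~6 MIPS to keep a random arm busy
print(mips_needed(random_iops=0, seq_iops=400))   # ~80 MIPS to keep a sequential arm busy
```
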
13
Observations re TPC C, H systems
  • More than ½ the hardware cost is in disks
  • Most of the mips are in the disk controllers
  • 20 mips/arm is enough for TPC-C
  • 50 mips/arm is enough for TPC-H
  • Need 128MB to 256MB/arm
  • Ref:
  • Gray and Shenoy, Rules of Thumb
  • Keeton, Riedel, Uysal PhD theses.
  • ? The end of computers ?

14
TPC systems
  • Normalize for CPI (clocks per instruction)
  • TPC-C has about 7 ins/byte of IO
  • TPC-H has 3 ins/byte of IO
  • TPC-H needs ½ as many disks, sequential vs random
  • Both use 9GB 10 krpm disks (need arms, not bytes)

15
TPC systems: What's alpha (MB/MIPS)?
  • Hard to say
  • Intel 32 bit addressing ( 4GB limit). Known CPI.
  • IBM, HP, Sun have 64 GB limit. Unknown CPI.
  • Look at both, guess CPI for IBM, HP, Sun
  • Alpha is between 1 and 6

                      Mips                  Memory   Alpha
Amdahl                1                     1        1
TPC-C Intel           8 x 262 = ~2 Gips     4 GB     2
TPC-H Intel           8 x 458 = ~4 Gips     4 GB     1
TPC-C IBM, 24 cpus    ~12 Gips (?)          64 GB    6
TPC-H HP, 32 cpus     ~16 Gips (?)          32 GB    2
16
When each disk has 1 bips, no need for a cpu
17
Implications
Conventional
  • Offload device handling to NIC/HBA
  • higher level protocols: I2O, NASD, VIA, IP, TCP
  • SMP and Cluster parallelism is important.
Radical
  • Move app to NIC/device controller
  • higher-higher level protocols: CORBA / COM.
  • Cluster parallelism is VERY important.

18
Interim Step: Shared Logic
  • Brick with 8-12 disk drives
  • 200 mips/arm (or more)
  • 2 x Gbps Ethernet
  • General purpose OS (except NetApp)
  • $10k/TB to $50k/TB
  • Shared
  • Sheet metal
  • Power
  • Support/Config
  • Security
  • Network ports

Snap: 1 TB, 12 x 80 GB NAS
NetApp: .5 TB, 8 x 70 GB NAS
Maxtor: 2 TB, 12 x 160 GB NAS
19
Next step in the Evolution
  • Disks become supercomputers
  • Controller will have 1 bips, 1 GB RAM, 1 GBps net
  • And a disk arm.
  • Disks will run full-blown app/web/db/os stack
  • Distributed computing
  • Processors migrate to transducers.

20
Gordon Bell's Seven Price Tiers
  • $10: wrist watch computers
  • $100: pocket/palm computers
  • $1,000: portable computers
  • $10,000: personal computers
    (desktop)
  • $100,000: departmental computers
    (closet)
  • $1,000,000: site computers
    (glass house)
  • $10,000,000: regional computers (glass
    castle)


Super-Server: costs more than $100,000.
Mainframe: costs more than $1M. Must be an
array of processors, disks, tapes, comm
ports.
21
Bell's Evolution of Computer Classes
Technology enables two evolutionary paths:
1. constant performance, decreasing cost
2. constant price, increasing performance
Growth factors: 1.26x/year = 2x/3 yrs = 10x/decade (1/1.26 = .8);
1.6x/year = 4x/3 yrs = 100x/decade (1/1.6 = .62). (Arithmetic check below.)
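
A quick check of the slide's growth-rate arithmetic:

```python
# Annual growth factors and what they compound to.
print(1.26 ** 3, 1.26 ** 10)   # ~2x per 3 years, ~10x per decade
print(1.60 ** 3, 1.60 ** 10)   # ~4x per 3 years, ~100x per decade
print(1 / 1.26, 1 / 1.60)      # yearly price-shrink factors, ~0.8 and ~0.62
```
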
22
NAS vs SAN
High level Interfaces are better
  • Network Attached Storage
  • File servers
  • Database servers
  • Application servers
  • (it's a slippery slope, as Novell showed)
  • Storage Area Network
  • A lower life form
  • Block server: get block / put block
  • Wrong abstraction level (too low level)
  • Security is VERY hard to understand.
  • (who can read that disk block?)

SCSI and iSCSI are popular.
23
How Do They Talk to Each Other?
  • Each node has an OS
  • Each node has local resources: a federation.
  • Each node does not completely trust the others.
  • Nodes use RPC to talk to each other
  • WebServices/SOAP? CORBA? COM? RMI?
  • One or all of the above.
  • Huge leverage in high-level interfaces.
  • Same old distributed system story.

(Diagram: two nodes, each running Applications over RPC, streams, datagrams,
and SIO, talking to each other across the SAN.)
24
Basic Argument for x-Disks
  • Future disk controller is a super-computer.
  • 1 bips processor
  • 256 MB dram
  • 1 TB disk plus one arm
  • Connects to SAN via high-level protocols
  • RPC, HTTP, SOAP, COM, Kerberos, Directory
    Services, ...
  • Commands are RPCs
  • management, security, ...
  • Services file/web/db/... requests
  • Managed by general-purpose OS with good dev
    environment
  • Move apps to disk to save data movement
  • need programming environment in controller

25
The Slippery Slope
(Diagram: a slope from Nothing to Everything:
Sector Server -> Fixed App Server -> App Server.)
  • If you add function to server
  • Then you add more function to server
  • Function gravitates to data.

26
Why Not a Sector Server? (let's get physical!)
  • Good idea, that's what we have today.
  • But
  • cache added for performance
  • Sector remap added for fault tolerance
  • error reporting and diagnostics added
  • SCSI commands (reserve, ...) are growing
  • Sharing problematic (space mgmt, security, ...)
  • Slipping down the slope to a 2-D block server

27
Why Not a 1-D Block Server? Put A LITTLE on the
Disk Server
  • Tried and true design
  • HSC - VAX cluster
  • EMC
  • IBM Sysplex (3980?)
  • But look inside
  • Has a cache
  • Has space management
  • Has error reporting management
  • Has RAID 0, 1, 2, 3, 4, 5, 10, 50,
  • Has locking
  • Has remote replication
  • Has an OS
  • Security is problematic
  • Low-level interface moves too many bytes

28
Why Not a 2-D Block Server? Put A LITTLE on the
Disk Server
  • Tried and true design
  • Cedar -> NFS
  • file server, cache, space, ...
  • Open file is many fewer msgs
  • Grows to have
  • Directories and Naming
  • Authentication and access control
  • RAID 0, 1, 2, 3, 4, 5, 10, 50, ...
  • Locking
  • Backup/restore/admin
  • Cooperative caching with client

29
Why Not a File Server? Put a Little on the 2-D
Block Server
  • Tried and true design
  • NetWare, Windows, Linux, NetApp, Cobalt,
    SNAP, ... WebDAV
  • Yes, but look at NetWare
  • File interface grew
  • Became an app server
  • Mail, DB, Web, ...
  • NetWare had a primitive OS
  • Hard to program, so optimized wrong thing

30
Why Not Everything? Allow Everything on Disk
Server (thin clients)
  • Tried and true design
  • Mainframes, Minis, ...
  • Web servers, ...
  • Encapsulates data
  • Minimizes data moves
  • Scalable
  • It is where everyone ends up.
  • All the arguments against are short-term.

31
The Slippery Slope
(Diagram: a slope from Nothing to Everything:
Sector Server -> Fixed App Server -> App Server.)
  • If you add function to server
  • Then you add more function to server
  • Function gravitates to data.

32
Disk Node
  • has magnetic storage (1TB?)
  • has processor and DRAM
  • has SAN attachment
  • has execution environment

Applications
Services
DBMS
File System
RPC, ...
SAN driver
Disk driver
OS Kernel
33
Hardware
  • Homogeneous machines lead to quick response
    through reallocation
  • HP desktop machines, 320MB RAM, 3u high, 4 x 100GB
    IDE Drives
  • $4k/TB (street), 2.5 processors/TB, 1GB RAM/TB
  • 3 weeks from ordering to operational

Slide courtesy of Brewster Kahle @ Archive.org
34
Disk as Tape
  • Tape is unreliable, specialized, slow, low
    density, not improving fast, and expensive
  • Using removable hard drives to replace tape's
    function has been successful
  • When a tape is needed, the drive is put in a
    machine and it is online. No need to copy from
    tape before it is used.
  • Portable, durable, fast, media cost ~ raw tapes,
    dense. Unknown longevity, suspected good.

Slide courtesy of Brewster Kahle @ Archive.org
35
Disk As Tape: What format?
  • Today I send NTFS/SQL disks.
  • But that is not a good format for Linux.
  • Solution: Ship NFS/CIFS/ODBC servers (not disks)
  • Plug disk into LAN.
  • DHCP then file or DB server via standard
    interface.
  • Web Service in long term

36
Some Questions
  • Will the disk folks deliver?
  • What is the product?
  • How do I manage 1,000 nodes (disks)?
  • How do I program 1,000 nodes (disks)?
  • How does RAID work?
  • How do I backup a PB?
  • How do I restore a PB?

37
Will the disk folks deliver? Maybe! (Hard Drive
Unit Shipments)
Source: DiskTrend/IDC
Not a pretty picture (lately)
38
Most Disks are Personal
  • 85% of disks are desktop/mobile (not SCSI)
  • Personal media is AT LEAST 50% of the problem.
  • How to manage your shoebox of
  • Documents
  • Voicemail
  • Photos
  • Music
  • Videos

39
What is the Product? (see next section on media
management)
  • Concept: Plug it in and it works!
  • Music/Video/Photo appliance (home)
  • Game appliance
  • PC
  • File server appliance
  • Data archive/interchange appliance
  • Web appliance
  • Email appliance
  • Application appliance
  • Router appliance

(Diagram: the appliance plugs into just network and power.)
40
Auto Manage Storage
  • 1980 rule of thumb:
  • A DataAdmin per 10GB, a SysAdmin per mips
  • 2000 rule of thumb:
  • A DataAdmin per 5TB
  • A SysAdmin per 100 clones (varies with app).
  • Problem:
  • 5TB is $50k today, $5k in a few years.
  • Admin cost >> storage cost !!!!
  • Challenge:
  • Automate ALL storage admin tasks

41
How do I manage 1,000 nodes?
  • You can't manage 1,000 x (for any x).
  • They manage themselves.
  • You manage exceptional exceptions.
  • Auto Manage
  • Plug and Play hardware
  • Auto-load balance placement of storage and
    processing
  • Simple parallel programming model
  • Fault masking
  • Some positive signs
  • Few admins at Google (10k nodes, 2 PB),
    Yahoo! (? nodes, 0.3 PB), Hotmail (10k
    nodes, 0.3 PB)

42
How do I program 1,000 nodes?
  • You can't program 1,000 x (for any x).
  • They program themselves.
  • You write embarrassingly parallel programs
    (see the sketch below)
  • Examples: SQL, Web, Google, Inktomi, HotMail, ...
  • PVM and MPI prove it must be automatic (unless
    you have a PhD)!
  • Auto Parallelism is ESSENTIAL

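A minimal sketch of the "embarrassingly parallel" style the slide argues for.
The partitioning and the per-partition work are illustrative placeholders, not
anything from the talk; the point is that each node runs the same small
function on its own data and a framework handles the fan-out.

```python
from multiprocessing import Pool

def scan_partition(partition_id):
    # Stand-in for per-disk work: scan, filter, and summarize local data.
    return partition_id, sum(range(partition_id * 1_000, (partition_id + 1) * 1_000))

if __name__ == "__main__":
    with Pool() as pool:
        partials = pool.map(scan_partition, range(1_000))  # one task per "node"
    print(len(partials), "partial results to merge")
```
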
43
Plug and Play Software
  • RPC is standardizing (SOAP/HTTP, COM,
    RMI/IIOP)
  • Gives huge TOOL LEVERAGE
  • Solves the hard problems
  • naming,
  • security,
  • directory service,
  • operations,...
  • Commoditized programming environments
  • FreeBSD, Linux, Solaris, tools
  • NetWare tools
  • WinCE, WinNT, tools
  • JavaOS tools
  • Apps gravitate to data.
  • General purpose OS on dedicated ctlr can run apps.

44
It's Hard to Archive a Petabyte. It takes a LONG
time to restore it.
  • At 1GBps it takes 12 days! (see the arithmetic
    below)
  • Store it in two (or more) places online (on
    disk?). A geo-plex
  • Scrub it continuously (look for errors)
  • On failure,
  • use other copy until failure repaired,
  • refresh lost copy from safe copy.
  • Can organize the two copies differently
    (e.g. one by time, one by space)

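The slide's restore-time arithmetic, spelled out:

```python
petabyte = 1e15        # bytes
rate = 1e9             # bytes/second (1 GBps)
seconds = petabyte / rate
print(seconds / 86_400)   # ~11.6 days, i.e. roughly the slide's "12 days"
```
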
45
Disk vs Tape
  • Disk
  • 160 GB
  • 25 MBps
  • 5 ms seek time
  • 3 ms rotate latency
  • $2/GB for drive, $1/GB for ctlrs/cabinet
  • 4 TB/rack
  • Tape
  • 100 GB
  • 10 MBps
  • 30 sec pick time
  • Many minute seek time
  • $5/GB for media, $10/GB for drive and library
  • 10 TB/rack

Guesstimates: Cern 200 TB 3480 tapes, 2 col =
50GB; Rack = 1 TB, 20 drives
The price advantage of tape is narrowing, and
the performance advantage of disk is growing
46
I'm a disk bigot
  • I hate tape, tape hates me.
  • Unreliable hardware
  • Unreliable software
  • Poor human factors
  • Terrible latency, bandwidth
  • Disk
  • Much easier to use
  • Much faster
  • Cheaper!
  • But needs new concepts

47
Disk as Tape Challenges
  • Offline disk (safe from virus)
  • Trivialize Backup/Restore software
  • Things never change
  • Just object versions
  • Snapshot for continuous change (databases)
  • RAID in a SAN
  • (cross-disk journaling)
  • Massive replication (a la Farsite)

48
Summary
  • Disks will become supercomputers
  • Compete in Linux appliance space
  • Build best NAS software (compete with NetApp, ..)
  • Auto-manage huge storage farms: FarSite, SQL
    autoAdmin, ...
  • Build world's best disk-based backup system,
    including Geoplex (compete with Veritas, ...)
  • Push faster on 64-bit

49
Storage capacity beating Moore's law
  • $2k/TB today (raw disk)
  • $1k/TB by end of 2002

50
Trends: Magnetic Storage Densities
  • Amazing progress
  • Ratios have changed
  • Capacity grows 60%/year
  • Access speed grows 10x more slowly

51
Trends: Density Limits
  • The end is near!
  • Products: 23 Gbpsi; Lab: 50 Gbpsi; limit:
    60 Gbpsi
  • But the limit keeps rising, and there are alternatives

(Chart: bit density (b/µm² and Gb/in²) vs time, 1990-2008, rising from CD
(~0.6 Gb/in²) through DVD and ODD, past the wavelength limit toward the
superparamagnetic limit, with NEMS, fluorescent, holographic, and DNA
storage as possible alternatives beyond it.)
Figure adapted from Franco Vitaliano, "The NEW
new media: the growing attraction of nonmagnetic
storage", Data Storage, Feb 2000, pp 21-32,
www.datastorage.com
52
CyberBricks
  • Disks are becoming supercomputers.
  • Each disk will be a file server, then a SOAP server
  • Multi-disk bricks are transitional
  • Long-term brick will have OS per disk.
  • Systems will be built from bricks.
  • There will also be
  • Network Bricks
  • Display Bricks
  • Camera Bricks
  • .

53
Data Centric Computing
Yotta Zetta Exa Peta Tera Giga Mega Kilo
  • Jim Gray
  • Microsoft Research
  • Research.Microsoft.com/Gray/talks
  • FAST 2002
  • Monterey, CA, 14 Oct 1999

54
Communications Excitement!!
(Quadrant diagram: Point-to-Point vs Broadcast on one axis, Immediate vs
Time-Shifted on the other. Immediate point-to-point: conversation, money;
immediate broadcast: lecture, concert; time-shifted point-to-point: mail;
time-shifted broadcast: book, newspaper; network and database at the center.)
It's ALL going electronic. Information is being
stored for analysis (so it's ALL database). Analysis
and automatic processing are being added.
Slide borrowed from Craig Mundie
55
Information Excitement!
  • But comm just carries information
  • Real value added is
  • information capture and render: speech, vision,
    graphics, animation, ...
  • Information storage and retrieval,
  • Information analysis

56
Information At Your Fingertips
  • All information will be in an online database
    (somewhere)
  • You might record everything you
  • read: 10MB/day, 400 GB/lifetime (5 disks today)
  • hear: 400MB/day, 16 TB/lifetime (2 disks/year
    today)
  • see: 1MB/s, 40GB/day, 1.6 PB/lifetime (150
    disks/year, maybe someday) (arithmetic below)
  • Data storage, organization, and analysis is the
    challenge.
  • text, speech, sound, vision, graphics, spatial,
    time
  • Information at Your Fingertips
  • Make it easy to capture
  • Make it easy to store, organize, analyze
  • Make it easy to present and access

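A sketch of the slide's lifetime-capture arithmetic. The ~100-year lifetime
and ~11 waking hours/day of video are my assumptions, not figures from the
talk.

```python
days = 100 * 365
print(10e6 * days / 1e9)               # read at 10 MB/day: ~365 GB  (slide: ~400 GB)
print(400e6 * days / 1e12)             # hear at 400 MB/day: ~14.6 TB (slide: ~16 TB)
print(1e6 * 11 * 3600 * days / 1e15)   # see at 1 MB/s: ~1.4 PB      (slide: ~1.6 PB)
```
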
57
How much information is there?
Yotta Zetta Exa Peta Tera Giga Mega Kilo
  • Soon everything can be recorded and indexed
  • Most bytes will never be seen by humans.
  • Data summarization, trend detection, and anomaly
    detection are key technologies
  • See Mike Lesk, "How much information is there?"
    http://www.lesk.com/mlesk/ksg97/ksg.html
  • See Lyman and Varian,
  • "How much information?"
  • http://www.sims.berkeley.edu/research/projects/how-much-info/

(Scale diagram: from a book, a photo, and a movie, up through all LoC books
(words) and all books plus multimedia, to everything ever recorded. Footer
lists SI prefix exponents, yocto (10^-24) through milli (10^-3).)
58
Why Put Everything in Cyberspace?
Low rent: min $/byte. Shrinks time: now or
later. Shrinks space: here or there. Automate
processing: knowbots.
Point-to-Point OR Broadcast;
Immediate OR Time Delayed;
Locate, Process, Analyze, Summarize.
59
Disk Storage Cheaper than Paper
  • File Cabinet: cabinet (4 drawer) $250; paper
    (24,000 sheets) $250; space (2x3 @
    $10/ft2) $180; total $700 = 3 ¢/sheet
  • Disk: disk (160 GB) $300; as ASCII that is
    100 m pages = 0.0001 ¢/sheet (10,000x
    cheaper)
  • Image: 1 m photos = 0.03
    ¢/sheet (100x cheaper)
  • Store everything on disk (arithmetic below)

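The paper-vs-disk arithmetic, using the slide's own figures:

```python
paper_per_sheet = 700 / 24_000   # ~$0.03/sheet in a $700 four-drawer file cabinet
ascii_per_sheet = 300 / 100e6    # $300 160 GB disk holding ~100 million ASCII pages
image_per_sheet = 300 / 1e6      # the same disk holding ~1 million image pages
print(paper_per_sheet / ascii_per_sheet)   # ~10,000x cheaper as ASCII
print(paper_per_sheet / image_per_sheet)   # ~100x cheaper as images
```
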
60
Gordon Bell's MainBrain: Digitize Everything. A
BIG shoebox?
  • Scans: 20 k pages, TIFF @ 300 dpi, 1 GB
  • Music: 2 k tracks, 7 GB
  • Photos: 13 k images, 2 GB
  • Video: 10 hrs, 3 GB
  • Docs: 3 k (ppt, word, ...), 2 GB
  • Mail: 50 k messages, 1 GB
  • Total: ~16 GB

61
Gary Starkweather
  • Scan EVERYTHING
  • 400 dpi TIFF
  • 70k pages = 14GB
  • OCR all scans (98% OCR recognition accuracy)
  • All indexed (5 second access to anything)
  • All on his laptop.

62
  • Q: What happens when the personal terabyte
    arrives?
  • A: Things will run SLOWLY ... unless we add good
    software

63
Summary
  • Disks will morph to appliances
  • Main barriers to this happening
  • Lack of Cool Apps
  • Cost of Information management

64
The Absurd Disk
  • 2.5 hr scan time (poor sequential access)
  • 1 aps / 5 GB (VERY cold data)
  • It's a tape!

(Hypothetical drive: 1 TB capacity, 100 MB/s, 200 Kaps; arithmetic below.)
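
The "big disk becomes a tape" arithmetic for the hypothetical 1 TB drive:

```python
capacity = 1e12           # bytes (1 TB)
bandwidth = 100e6         # bytes/second sequential
accesses_per_sec = 200    # random accesses the single arm can deliver
print(capacity / bandwidth / 3600)        # ~2.8 hours to scan the whole drive
print(capacity / 1e9 / accesses_per_sec)  # ~5 GB per access/second: very cold data
```
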
65
Crazy Disk Ideas
  • Disk Farm on a card: surface-mount disks
  • Disk (magnetic store) on a chip (micro machines
    in Silicon)
  • Full Apps (e.g. SAP, Exchange/Notes,..) in the
    disk controller (a processor with 128 MB dram)

ASIC
The Innovator's Dilemma: When New Technologies
Cause Great Firms to Fail, Clayton M.
Christensen. ISBN 0875845851
66
The Disk Farm On a Card
  • The 500GB disc card
  • An array of discs
  • Can be used as
  • 100 discs
  • 1 striped disc
  • 50 Fault Tolerant discs
  • ....etc
  • LOTS of accesses/second
  • bandwidth

14"
67
Trends and promises: NEMS (Nano Electro Mechanical
Systems) (http://www.nanochip.com/); also
Cornell, IBM, CMU, ...
  • 250 Gbpsi by using tunneling electronic
    microscope
  • Disk replacement
  • Capacity: 180 GB now, 1.4 TB in 2 years
  • Transfer rate: 100 MB/sec R/W
  • Latency: 0.5 msec
  • Power: 23W active, .05W standby
  • $10k/TB now, $2k/TB in 2004

68
Trends: Gilder's Law: 3x bandwidth/year for 25
more years
  • Today
  • 40 Gbps per channel (?)
  • 12 channels per fiber (WDM) = ~500 Gbps
  • 32 fibers/bundle = ~16 Tbps/bundle (arithmetic below)
  • In lab: 3 Tbps/fiber (400 x WDM)
  • In theory: 25 Tbps per fiber
  • 1 Tbps = USA 1996 WAN bisection bandwidth
  • Aggregate bandwidth doubles every 8 months!

1 fiber 25 Tbps
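
The per-fiber and per-bundle arithmetic from the slide:

```python
per_channel = 40e9                 # bits/second per wavelength
per_fiber = per_channel * 12       # 12 WDM channels: ~0.5 Tbps per fiber
per_bundle = per_fiber * 32        # 32 fibers: ~15-16 Tbps per bundle
print(per_fiber / 1e12, per_bundle / 1e12)
```
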
69
Technology Drivers: What if Networking Was as
Cheap As Disk IO?
  • TCP/IP
  • Unix/NT: 100% cpu @ 40MBps
  • Disk
  • Unix/NT: 8% cpu @ 40MBps

70
SAN: Standard Interconnect
  • LAN faster than memory bus?
  • 1 GBps links in lab.
  • $100 port cost soon
  • Port is computer

(Chart of interconnect speeds: Gbps Ethernet 110 MBps,
PCI 70 MBps, UW SCSI 40 MBps, FW SCSI 20 MBps, SCSI 5 MBps.)
71
Building a Petabyte Store
  • EMC: $500k/TB = $500M/PB; plus FC switches:
    $800M/PB
  • TPC-C SANs (Dell 18GB/): $62M/PB
  • Dell local SCSI, 3ware: $20M/PB
  • Do it yourself: $5M/PB (arithmetic below)

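Scaling the slide's $/TB figures to a petabyte (1 PB = 1,000 TB):

```python
options = {"EMC": 500_000, "TPC-C SAN": 62_000,
           "Dell local SCSI + 3ware": 20_000, "Do it yourself": 5_000}
for name, dollars_per_tb in options.items():
    print(name, dollars_per_tb * 1_000 / 1e6, "M$/PB")
```
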
72
The Cost of Storage (heading for $1K/TB soon)
73
Cheap Storage or Balanced System
  • Low cost storage (2 x $1.5k servers): $6K/TB =
    2 x ($1K system + 8 x 80GB disks + 100Mb Ethernet)
  • Balanced server ($7k / .5 TB)
  • 2 x 800MHz ($2k)
  • 256 MB ($400)
  • 8 x 80 GB drives ($2K)
  • Gbps Ethernet switch ($1k)
  • $11k/TB, $22K/RAIDed TB

74
320 GB, $2k (now)
  • 4 x 80 GB IDE (2 hot pluggable)
  • ($1,000)
  • SCSI-IDE bridge
  • ~$200
  • Box
  • 500 Mhz cpu
  • 256 MB SRAM
  • Fan, power, Enet
  • $700
  • Or 8 disks/box = 640 GB for $3K (or 300 GB RAID)

75
(No Transcript)
76
Hot Swap Drives for Archive or Data Interchange
  • 25 MBps write (so can write N x 160 GB in 3
    hours)
  • 160 GB / overnite
  • N x 4 MB/second
  • @ $19.95/nite

77
Data delivery costs $1/GB today
  • Rent for big customers: $300 per megabit per
    second per month
  • Improved 3x in last 6 years (!).
  • That translates to $1/GB at each end.
  • You can mail a 160 GB disk for $20.
  • That's 16x cheaper (arithmetic below)
  • If overnight, it's ~3 MBps.

3 x 160 GB = ~½ TB
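
The network-vs-mail arithmetic; the ~15-hour overnight window is my
assumption.

```python
seconds_per_month = 30 * 86_400
gb_per_mbps_month = 1e6 * seconds_per_month / 8 / 1e9   # ~324 GB moved by 1 Mbps in a month
net_cost_per_gb = 300 / gb_per_mbps_month                # ~$0.93/GB at each end
mail_cost_per_gb = 20 / 160                              # ~$0.13/GB for a mailed 160 GB disk
print(2 * net_cost_per_gb / mail_cost_per_gb)            # ~15-16x cheaper to mail the disk
print(160e9 / (15 * 3600) / 1e6)                         # ~3 MB/s effective overnight bandwidth
```
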
78
Data on Disk Can Move to RAM in 8 years
(Chart: disk vs RAM price ratio, about 30:1, with the trend lines roughly
6 years apart.)
79
Storage Latency How Far Away is the Data?
(The storage hierarchy, with access time in clock ticks and a "how far away
is the data" analogy:)

Level                  Clocks   Analogy        At human scale
Registers              1        My Head        1 min
On Chip Cache          2        This Room
On Board Cache         10       This Campus    10 min
Memory                 100      Springfield    1.5 hr
Disk                   10^6     Pluto          2 Years
Tape / Optical Robot   10^9     Andromeda      2,000 Years
80
More Kaps and Kaps/$, but ...
  • Disk accesses got much less expensive: better
    disks, cheaper disks!
  • But disk arms are expensive: the scarce resource
  • 1 hour scan vs. 5 minutes in 1990

81
Backup: 3 scenarios
  • Disaster Recovery: preservation through
    replication
  • Hardware Faults: different solutions for
    different situations
  • Clusters,
  • load balancing,
  • replication,
  • tolerate machine/disk outages
  • (Avoided RAID and expensive, low volume
    solutions)
  • Programmer Error: versioned duplicates (no
    deletes)

82
Online Data
  • Can build 1PB of NAS disk for $5M today
  • Can SCAN (read or write) entire PB in 3 hours
    (arithmetic below).
  • Operate it as a data pump: continuous sequential
    scan
  • Can deliver 1PB for $1M over the Internet
  • Access charge is $300/Mbps bulk rate
  • Need to Geoplex data (store it in two places).
  • Need to filter/process data near the source,
  • to minimize network costs.
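
A sanity check of the petabyte-farm claims; the per-drive size and streaming
rate are taken from the earlier disk-vs-tape slide.

```python
pb = 1e15
drives = pb / 160e9                  # ~6,250 x 160 GB drives in the farm
scan_rate = drives * 25e6            # each drive streaming ~25 MB/s
print(pb / scan_rate / 3600)         # ~1.8 hours: the "3 hours" claim has headroom
print(pb / 1e9 * 1.0 / 1e6)          # ~$1M to deliver a PB at ~$1/GB network cost
```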