Title: Store Everything Online In A Database
1. Store Everything Online In A Database
- Jim Gray
- Microsoft Research
- Gray_at_Microsoft.com
- http://research.microsoft.com/gray/talks
2. Outline
- Store Everything
- Online (Disk not Tape)
- In a Database
3. How Much is Everything?
[Figure: a byte scale from Kilo through Mega, Giga, Tera, Peta, Exa and Zetta to Yotta, marking a book, a photo, a movie, all Library of Congress books (words), all books (multimedia), and "Everything! Recorded"]
- Soon everything can be recorded and indexed.
- Most bytes will never be seen by humans.
- Data summarization, trend detection, and anomaly detection are key technologies.
- See Mike Lesk, "How much information is there": http://www.lesk.com/mlesk/ksg97/ksg.html
- See Lyman and Varian, "How much information": http://www.sims.berkeley.edu/research/projects/how-much-info/
Small prefixes: 10^-24 yocto, 10^-21 zepto, 10^-18 atto, 10^-15 femto, 10^-12 pico, 10^-9 nano, 10^-6 micro, 10^-3 milli
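The scale on this slide climbs in steps of 1000 (kilo, mega, giga, ...). As a small illustration, here is a Python sketch that places a byte count on that scale. The function name `human_bytes` and the use of decimal steps rather than binary (1024) steps are choices made for this sketch. The slide does not specify either.

```python
# Decimal (SI) byte prefixes; each step up the slide's scale is a factor of 1000.
PREFIXES = ["", "k", "M", "G", "T", "P", "E", "Z", "Y"]  # kilo ... yotta


def human_bytes(n: float) -> str:
    """Format n bytes with the largest decimal prefix that keeps the number >= 1."""
    i = 0
    while n >= 1000 and i < len(PREFIXES) - 1:
        n /= 1000
        i += 1
    return f"{n:g} {PREFIXES[i]}B"


if __name__ == "__main__":
    print(human_bytes(10**15))      # 1 PB
    print(human_bytes(3 * 10**21))  # 3 ZB
```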
4. Storage capacity beating Moore's law
- $3k/TB today (raw disk)
- $1k/TB by end of 2002
5. Outline
- Store Everything
- Online (Disk not Tape)
- In a Database
6. Online Data
- Can build 1PB of NAS disk for $5M today
- Can SCAN (read or write) entire PB in 3 hours.
- Operate it as a data pump: continuous sequential scan
- Can deliver 1PB for $1M over Internet
- Access charge is $300/Mbps bulk rate
- Need to Geoplex data (store it in two places).
- Need to filter/process data near the source, to minimize network costs.
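The throughput claims on this slide are back-of-envelope arithmetic. Below is a Python sketch of that arithmetic. It assumes decimal units (1 PB = 10^15 bytes). It also assumes, and this is a loudly labeled assumption, that the $300/Mbps access charge is a monthly price for sustained bandwidth. The slide does not say what period the charge covers.

```python
PB = 10**15  # bytes, decimal petabyte


def scan_rate_bytes_per_s(total_bytes: float, hours: float) -> float:
    """Aggregate bandwidth needed to read (or write) total_bytes in the given time."""
    return total_bytes / (hours * 3600)


def bytes_per_month(budget_dollars: float, dollars_per_mbps: float,
                    days: float = 30) -> float:
    """Bytes moved by the bandwidth a budget buys.
    ASSUMPTION: the access charge is a monthly price per Mbps of sustained capacity."""
    mbps = budget_dollars / dollars_per_mbps
    bytes_per_s = mbps * 1e6 / 8
    return bytes_per_s * days * 86400


if __name__ == "__main__":
    rate = scan_rate_bytes_per_s(PB, 3)
    print(f"Scan 1 PB in 3 h: {rate / 1e9:.1f} GB/s aggregate")
    moved = bytes_per_month(1_000_000, 300)
    print(f"$1M at $300/Mbps for 30 days: {moved / PB:.2f} PB")
```

Under the monthly-rate assumption, $1M buys about 3.3 Gbps for 30 days. That moves roughly 1.08 PB, in line with the slide's $1M per PB. Scanning a petabyte in 3 hours needs about 93 GB/s of aggregate bandwidth. That is why the data has to be spread across thousands of disks scanned in parallel.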
7. The Absurd Disk
- 2.5 hr scan time (poor sequential access)
- 1 access per second / 5 GB (VERY cold data)
- It's a tape!
[Figure: a 1 TB disk with 100 MB/s bandwidth and 200 Kaps]
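A quick check of the absurd disk's numbers, as a Python sketch. It reads "200 Kaps" as 200 random (kilobyte-sized) accesses per second. That is an interpretation, not something the slide spells out. The sketch uses decimal units.

```python
TB = 10**12
MB = 10**6


def scan_hours(capacity_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time to read the whole device sequentially, in hours."""
    return capacity_bytes / bandwidth_bytes_per_s / 3600


def bytes_per_access(capacity_bytes: float, accesses_per_s: float) -> float:
    """How much stored data shares each random access per second."""
    return capacity_bytes / accesses_per_s


if __name__ == "__main__":
    print(f"scan: {scan_hours(TB, 100 * MB):.2f} h")
    print(f"{bytes_per_access(TB, 200) / 1e9:.0f} GB per access/s")
```

At a steady 100 MB/s, a full pass takes about 2.8 hours, close to the slide's 2.5 hr estimate. Spreading 200 accesses per second over 1 TB gives 5 GB of data per access/second, which matches the slide's "VERY cold data" figure.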
8. Disk vs Tape
- Tape
- 40 GB
- 10 MBps
- 10 sec pick time
- 30-120 second seek time
- $2/GB for media, $8/GB for drive + library
- 10 TB/rack
- 1 week scan
- Disk
- 80 GB
- 35 MBps
- 5 ms seek time
- 3 ms rotate latency
- $3/GB for drive, $2/GB for controllers/cabinet
- 15 TB/rack
- 1 hour scan
Guesstimates: CERN 200 TB on 3480 tapes; 2 col = 50 GB; Rack = 1 TB, 12 drives
The price advantage of disk is growing; the performance advantage of disk is huge! At $10K/TB, disk is competitive with nearline tape.
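The per-unit numbers above can be turned into scan times and prices per terabyte. Below is a Python sketch that uses only the slide's figures, in decimal units. The `Device` class is a hypothetical helper.

```python
from dataclasses import dataclass

GB = 10**9
MB = 10**6


@dataclass
class Device:
    name: str
    capacity_gb: float
    mb_per_s: float
    dollars_per_gb: float  # media/drive plus library/controller share, from the slide

    def scan_minutes(self) -> float:
        """Minutes to read one unit end to end at its streaming rate."""
        return self.capacity_gb * GB / (self.mb_per_s * MB) / 60

    def dollars_per_tb(self) -> float:
        return self.dollars_per_gb * 1000


tape = Device("tape", 40, 10, 2 + 8)  # $2/GB media + $8/GB drive + library
disk = Device("disk", 80, 35, 3 + 2)  # $3/GB drive + $2/GB controllers/cabinet

if __name__ == "__main__":
    for d in (tape, disk):
        print(f"{d.name}: {d.scan_minutes():.0f} min per unit, ${d.dollars_per_tb():,.0f}/TB")
```

This gives about 67 minutes to read one tape versus about 38 minutes for one disk. It also gives $10k/TB for tape versus $5k/TB for disk.

The rack-level scan times come down to parallelism:
- Reading a 10 TB tape rack in a week implies only about 16.5 MB/s aggregate.
- Reading a 15 TB disk rack in an hour implies about 4.2 GB/s, because every disk arm can stream at once.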
9. Building a Petabyte Disk Store
- Cadillac: $500k/TB = $500M/PB; plus FC switches plus ... = $800M/PB
- TPC-C SANs (Brand PC 18GB/): $60M/PB
- Brand PC, local SCSI: $20M/PB
- Do it yourself ATA: $5M/PB
10. Cheap Storage and/or Balanced System
- Low cost storage (2 x $3k servers): $5K/TB
- 2 x (800 MHz, 256 MB, 8 x 80 GB disks, 100 MbE)
- RAID5 costs $6K/TB
- Balanced server ($5k / 0.64 TB)
- 2 x 800 MHz ($2k)
- 512 MB
- 8 x 80 GB drives ($2K)
- Gbps Ethernet switch ($300/port)
- $9k/TB, $18K/mirrored TB
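The protected-capacity prices follow from dividing the raw cost per TB by the fraction of capacity left after redundancy. Below is a Python sketch. It assumes each 8-disk server forms a single RAID5 group with one disk's worth of parity. The slide does not give the group size.

```python
def usable_fraction_raid5(disks_per_group: int) -> float:
    """RAID5 spends one disk's worth of capacity per group on parity."""
    return (disks_per_group - 1) / disks_per_group


def cost_per_usable_tb(raw_dollars_per_tb: float, usable_fraction: float) -> float:
    """Raw price spread over only the capacity that is left for data."""
    return raw_dollars_per_tb / usable_fraction


if __name__ == "__main__":
    # Low-cost servers: ~$5K per raw TB, 8 disks per box, ASSUMED one RAID5 group each.
    print(round(cost_per_usable_tb(5_000, usable_fraction_raid5(8))))  # 5714
    # Balanced server: $9K per TB, mirrored (half the raw capacity is usable).
    print(round(cost_per_usable_tb(9_000, 0.5)))  # 18000
```

With those assumptions, the low-cost servers come to about $5.7K per usable TB, close to the slide's $6K. Mirroring doubles the balanced server's $9k/TB to exactly the slide's $18K.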
11. Next step in the Evolution
- Disks become supercomputers
- Controller will have 1 bips, 1 GB RAM, 1 GBps net
- And a disk arm.
- Disks will run full-blown app/web/db/os stack
- Distributed computing
- Processors migrate to transducers.
12. It's Hard to Archive a Petabyte: It takes a LONG time to restore it.
- At 1 GBps it takes 12 days!
- Store it in two (or more) places online (on disk?): a geo-plex
- Scrub it continuously (look for errors)
- On failure,
- use other copy until failure repaired,
- refresh lost copy from safe copy.
- Can organize the two copies differently (e.g. one by time, one by space)
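The 12-day figure is simple arithmetic, and the geo-plex recovery rules map onto a scrub loop. The Python sketch below models each site as an in-memory dict of blocks and detects damage with SHA-256 checksums. Both are stand-ins chosen for illustration, since the slide does not say how copies are stored or how errors are found.

```python
import hashlib

PB = 10**15
GB = 10**9


def restore_days(total_bytes: float, bytes_per_s: float) -> float:
    """Days needed to copy total_bytes back at a sustained rate."""
    return total_bytes / bytes_per_s / 86400


def checksum(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


def scrub(primary: dict, replica: dict, expected: dict) -> list:
    """Check every block of both copies against its recorded checksum and
    refresh a damaged or missing block from the copy that is still good.
    Returns (copy_name, block_id) for each block that was repaired."""
    repaired = []
    for block_id, digest in expected.items():
        p_ok = checksum(primary.get(block_id, b"")) == digest
        r_ok = checksum(replica.get(block_id, b"")) == digest
        if p_ok and not r_ok:
            replica[block_id] = primary[block_id]
            repaired.append(("replica", block_id))
        elif r_ok and not p_ok:
            primary[block_id] = replica[block_id]
            repaired.append(("primary", block_id))
        elif not (p_ok or r_ok):
            raise RuntimeError(f"block {block_id} is bad in both copies")
    return repaired


if __name__ == "__main__":
    print(f"restore 1 PB at 1 GB/s: {restore_days(PB, GB):.1f} days")  # 11.6 days
    data = {i: f"block-{i}".encode() for i in range(4)}
    expected = {i: checksum(b) for i, b in data.items()}
    primary, replica = dict(data), dict(data)
    replica[2] = b"bit rot"  # simulate corruption at the second site
    del primary[3]           # simulate a lost block at the first site
    print(scrub(primary, replica, expected))  # [('replica', 2), ('primary', 3)]
```

The scrub keeps serving from whichever copy still verifies and refreshes the other one. That is the "use other copy, refresh lost copy" rule. It only gives up when both copies of a block are bad.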
13. Outline
- Store Everything
- Online (Disk not Tape)
- In a Database
14. Why Not file + object + GREP?
- It works if you have thousands of objects (and you know them all)
- But it is hard to search millions/billions/trillions with GREP
- Hard to put all attributes in file name.
- Minimal metadata
- Hard to do chunking right.
- Hard to pivot on space/time/version/attributes.
15. The Reality: It's Build vs. Buy
- If you use a file system, you will eventually build a database system:
- metadata,
- query,
- parallel ops,
- security,
- reorganize,
- recovery,
- distributed,
- replication.
16. OK, so I'll put lots of objects in a file: Do-It-Yourself Database
- Good news
- Your implementation will be 10x faster than the general-purpose one, and easier to understand and use than the general-purpose one.
- Bad news
- It will cost 10x more to build and maintain
- Someday you will get bored maintaining/evolving it
- It will lack some killer features:
- Parallel search
- Self-describing via metadata
- SQL, XML, ...
- Replication
- Online update and reorganization
- Chunking is problematic (what granularity, how to aggregate?)
17. Top 10 reasons to put Everything in a DB
- Someone else writes the million lines of code
- Captures data and metadata
- Standard interfaces give tools and quick learning
- Allows schema evolution without breaking old apps
- Index and pivot on multiple attributes: space-time-attribute-version
- Parallel terabyte searches in seconds or minutes
- Moves processing and search close to the disk arm (moves fewer bytes: questons return datons)
- Chunking is easier (can aggregate chunks at server)
- Automatic geo-replication
- Online update and reorganization
- Security
- If you pick the right vendor, ten years from now there will be software that can read the data.
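Several of these reasons (metadata, indexing on several attributes, schema evolution) can be seen in a few lines of SQL. The sketch below uses Python's built-in `sqlite3` purely as a convenient stand-in for "a database". The `obs` table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

# The "old app": an insert that names its columns, and a query for the latest version.
OLD_INSERT = ("INSERT INTO obs (x, y, t, attr, version, value) "
              "VALUES (?, ?, ?, ?, ?, ?)")
LATEST_TEMP = """
    SELECT x, y, t, version, value FROM obs AS o
    WHERE attr = 'temp'
      AND version = (SELECT MAX(version) FROM obs
                     WHERE x = o.x AND y = o.y AND t = o.t AND attr = o.attr)"""


def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE obs (
        x REAL, y REAL,      -- space
        t TEXT,              -- time, ISO-8601
        attr TEXT,           -- attribute measured
        version INTEGER,
        value REAL)""")
    # Separate indexes let queries pivot on space or on time/attribute/version.
    conn.execute("CREATE INDEX obs_space ON obs (x, y)")
    conn.execute("CREATE INDEX obs_time_attr ON obs (t, attr, version)")
    conn.executemany(OLD_INSERT, [
        (1.0, 2.0, "2001-06-01", "temp", 1, 21.5),
        (1.0, 2.0, "2001-06-01", "temp", 2, 21.7),
        (5.0, 6.0, "2001-06-02", "humidity", 1, 0.40),
    ])
    return conn


if __name__ == "__main__":
    conn = build()
    print(conn.execute(LATEST_TEMP).fetchall())
    # Schema evolution: a new column does not break the old insert or query.
    conn.execute("ALTER TABLE obs ADD COLUMN instrument TEXT")
    conn.execute(OLD_INSERT, (1.0, 2.0, "2001-06-01", "temp", 3, 21.9))
    print(conn.execute(LATEST_TEMP).fetchall())
    # The schema itself is queryable metadata.
    print([row[1] for row in conn.execute("PRAGMA table_info(obs)")])
```

The old insert names its columns, so adding `instrument` does not break it. `PRAGMA table_info` shows that the schema itself is queryable metadata rather than something encoded in file names.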
18. DB Centric Examples
- TerraServer
- All images and all data in the database (chunked as small tiles). www.TerraServer.Microsoft.com/
- http://research.microsoft.com/gray/Papers/MSR_TR_99_29_TerraServer.doc
- SkyServer / Virtual Sky
- Both image and semantic data in a relational store.
- Parallel search and non-procedural access are important.
- http://research.microsoft.com/gray/Papers/MS_TR_99_30_Sloan_Digital_Sky_Survey.doc
- http://dart.pha.jhu.edu/sdss/getMosaic.asp?Z1A1T4H1S10M30
- http://virtualsky.org/servlet/Page?F3RA16h10m1.0sDE2B0d42m45sT4P12S10X5096Y4121W4Z-1tile.2.1.x55tile.2.1.y20
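TerraServer is described here as storing imagery "chunked as small tiles" in the database. The Python sketch below shows how a viewport request becomes a handful of primary-key lookups. The 200-pixel tile size, the `tile` table, and the (level, tile_x, tile_y) key are assumptions for illustration. They are not the documented TerraServer schema.

```python
import sqlite3
from typing import Iterator, Tuple

TILE = 200  # ASSUMED tile edge in pixels, for illustration only


def tile_keys(x0: int, y0: int, x1: int, y1: int, level: int = 0,
              tile: int = TILE) -> Iterator[Tuple[int, int, int]]:
    """Yield (level, tile_x, tile_y) for every tile that overlaps the
    half-open pixel rectangle [x0, x1) x [y0, y1)."""
    for ty in range(y0 // tile, (y1 - 1) // tile + 1):
        for tx in range(x0 // tile, (x1 - 1) // tile + 1):
            yield (level, tx, ty)


def fetch_viewport(conn: sqlite3.Connection, x0: int, y0: int, x1: int, y1: int,
                   level: int = 0) -> list:
    """Fetch only the stored tiles a viewport needs, each by primary key."""
    q = "SELECT tx, ty, img FROM tile WHERE level = ? AND tx = ? AND ty = ?"
    return [row for key in tile_keys(x0, y0, x1, y1, level)
            for row in conn.execute(q, key)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tile (level INTEGER, tx INTEGER, ty INTEGER, "
                 "img BLOB, PRIMARY KEY (level, tx, ty))")
    conn.executemany("INSERT INTO tile VALUES (0, ?, 0, ?)",
                     [(tx, b"jpeg-bytes") for tx in range(5)])
    print(list(tile_keys(150, 0, 450, 100)))  # [(0, 0, 0), (0, 1, 0), (0, 2, 0)]
    print(len(fetch_viewport(conn, 150, 0, 450, 100)))  # 3
```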
19. OK, why don't they use our stuff?
- Wrong metaphor: HDF with hyper-slab is a better match.
- Impedance mismatch: getting stuff in/out of DB is too hard
- We sold them OODBs and they did not work (unreliable, poor performance, no tools).
20. So, why will the future be different?
- They have MUCH more data (10^8 files?)
- Java / C eases impedance mismatch: rowsets, ragged arrays.
- Tools are better
- Optimizers are better
- CPU and disk parallelism actually works now
- Statistical packages are better.
21. Outline
- Store Everything
- Online (Disk not Tape)
- In a Database
22. But the title of the talk was "The Future of Distributed Database Systems"
- Nobody wants to share his database.
- Blocks, files, and tables are the wrong abstraction for networks (too low level).
- Objects are the right abstraction.
- So, UDDI / WSDL / SOAP is the solution (not SQL).
- XML is the wire format; XLANG is the workflow protocol.
- Query will be in there somewhere.
23. DDB technology: GREAT in a Cluster
- Uniform architecture
- Trust among nodes
- High bandwidth-low latency communication
- Programs have single system image
- Queries run in parallel
- Global optimizer does query decomposition
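In a cluster, a parallel query is typically decomposed into a local step run on every node and a merge step on the coordinator. The Python sketch below shows that pattern for AVG. Threads stand in for cluster nodes, and the in-memory partitioning is hypothetical. A real global optimizer would derive the decomposition itself rather than have it written by hand.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

# Hypothetical cluster: each inner list is one node's horizontal partition.
partitions: List[List[float]] = [
    [10.0, 20.0, 30.0],
    [40.0],
    [50.0, 60.0],
]


def local_step(rows: List[float]) -> Tuple[float, int]:
    """Runs on each node: AVG is decomposed into a partial SUM and COUNT."""
    return sum(rows), len(rows)


def global_avg(parts: List[List[float]]) -> float:
    """Coordinator: run local_step on all partitions in parallel, then merge."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(local_step, parts))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


if __name__ == "__main__":
    print(global_avg(partitions))  # 35.0
    naive = sum(sum(p) / len(p) for p in partitions) / len(partitions)
    print(f"{naive:.2f}")  # 38.33: averaging per-node averages is wrong here
```

The naive average of per-node averages gives about 38.33 here instead of 35.0. That is why the decomposition ships partial SUM and COUNT values rather than partial averages.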
24. But in a Distributed System
- Heterogeneous architecture makes query planning much harder
- No trust
- Communication is slow and expensive (minimize it).
- Higher-level abstraction to minimize round trips
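A simple latency-plus-bandwidth cost model shows why round trips dominate in a wide-area setting. Below is a Python sketch. The 50 ms round-trip time, the 10 Mbps link, and the 10,000 rows of 100 bytes are assumed values, not figures from the talk.

```python
def transfer_seconds(round_trips: int, rtt_s: float, payload_bytes: float,
                     bandwidth_bytes_per_s: float) -> float:
    """Simple cost model: every round trip pays full latency, plus payload time."""
    return round_trips * rtt_s + payload_bytes / bandwidth_bytes_per_s


ROWS = 10_000
ROW_BYTES = 100
RTT = 0.05      # ASSUMED 50 ms wide-area round trip
BW = 1_250_000  # ASSUMED 10 Mbps link, in bytes per second

if __name__ == "__main__":
    chatty = transfer_seconds(ROWS, RTT, ROWS * ROW_BYTES, BW)  # one call per row
    batched = transfer_seconds(1, RTT, ROWS * ROW_BYTES, BW)    # one set-oriented call
    print(f"row at a time: {chatty:.1f} s, one request: {batched:.2f} s")
```

With these numbers, fetching one row per request takes about 500.8 s, almost all of it latency. One set-oriented request moving the same bytes takes about 0.85 s.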
25. DDB: the Trust Issue
- Customers serve themselves
- Follow the rules posted on the door
- No overhead, no staff!
- Clerks serve customers
- Take order, fill order, fill out invoice, collect money.
- Overhead: staff, training, rules, ...
[Figure: "Client/Server Groceries" and "DDB Grocery"]