Title: A Database View of Intelligent Disks
1. A Database View of Intelligent Disks
- James Hamilton
- JamesRH_at_microsoft.com
- Microsoft SQL Server
2. Overview
- Do we agree on the assumptions?
- A database perspective
- Scalable DB clusters are inevitable
- Affordable SANs are (finally) here
- Admin Mgmt costs dominate
- Intelligent disks are coming
- DB exploitation of intelligent disk
- Failed DB machines all over again?
- Use of intelligent disk NASD model?
- full server slice vs. a file block server
- Conclusions
3. Do we agree on the assumptions?
- From the NASD web page:
- "Computer storage systems are built from sets of disks, tapes, disk arrays, tape robots, and so on, connected to one or more host computers. We are moving towards a world where the storage devices are no longer directly connected to their host systems, but instead talk to them through an intermediate high-performance, scalable network such as FibreChannel. To fully benefit from this, we believe that the storage devices will have to become smarter and more sophisticated."
- Premise: the conclusion is 100% correct; we'll question the assumptions that led to it
- There are alternative architectures with strong advantages for DB workloads
4. Clusters Are Inevitable: Query
- Data-intensive application workloads (data warehousing, data mining, complex query) are growing quickly
- Greg's law: DB capacity growing 2X every 9-12 months (Patterson)
- DB capacity requirements growing super-Moore
- Complex query workloads tend to scale with DB size: many CPUs required
- "Shared memory is fine as long as you don't share it" (Helland)
- Clusters are the only DB architecture with sufficient scale
- We only debate at what point, not if, clusters are required
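The super-Moore claim above can be made concrete with a little arithmetic, sketched here in Python. The doubling periods come from the slide (Moore's law at roughly 18 months, Greg's law at 9-12 months); the 3-year horizon is an illustrative assumption:

```python
# Hedged illustration: hardware capacity growth (2x every ~18 months)
# vs. "Greg's law" DB capacity growth (2x every 9-12 months, per the
# slide). The horizon below is an assumption, not from the talk.

def growth_factor(months: float, doubling_months: float) -> float:
    """Capacity multiple after `months`, doubling every `doubling_months`."""
    return 2 ** (months / doubling_months)

months = 3 * 12                      # illustrative 3-year horizon
moore = growth_factor(months, 18)    # hardware: 4x in 3 years
db_lo = growth_factor(months, 12)    # DB demand, slow case: 8x
db_hi = growth_factor(months, 9)     # DB demand, fast case: 16x

print(f"{moore:.0f}x hardware vs. {db_lo:.0f}x-{db_hi:.0f}x DB demand")
```

The gap widens every year, which is the sense in which single-box scaling cannot keep up and clustered growth becomes unavoidable.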
5. Clusters Are Inevitable: TP Apps
- Most database transaction workloads currently hosted on SMPs
- Prior to the web, TP workloads tended to be reasonably predictable
- TP workloads scale with customer base/business size
- Load changes at the speed of business change (typically slow)
- Web puts the back office in the front office
- Much of the world has direct access: very volatile
- Capacity planning goes from black art to impossible
- Server capacity variation is getting much wider
- Need near-infinite, incremental growth capability with the potential to later de-scale
- Wheel-in/wheel-out upgrade model doesn't work
- Clusters are the only DB architecture with sufficient incremental growth capability
6. Clusters Are Inevitable: Availability
- Non-cluster server architectures suffer from many single points of failure
- Web-enabled direct server access model is driving high availability requirements
- Recent high-profile failures at eTrade and Charles Schwab
- Web model enabling competition in access to information
- Drives much faster server-side software innovation, which negatively impacts quality
- Dark machine room approach requires auto-admin and data redundancy (Inktomi model)
- 42% of system failures are admin error (Gray)
- Paging an admin at 2am and hoping for a quality response is dangerous
- Fail-fast design approach is robust, but only acceptable with redundant access to redundant copies of data
- Cluster architecture is required for availability
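The fail-fast-plus-redundancy argument rests on simple availability arithmetic: independent replicas multiply down the probability that all copies are unavailable at once. A minimal sketch, with the per-node availability figure assumed purely for illustration:

```python
# Hedged illustration: availability of N independent replicas.
# The 0.99 per-node figure is an assumption, not from the talk.

def redundant_availability(a: float, copies: int) -> float:
    """System is up if at least one of `copies` independent replicas is up."""
    return 1 - (1 - a) ** copies

per_node = 0.99
for n in (1, 2, 3):
    # 1 copy ~0.99, 2 copies ~0.9999, 3 copies ~0.999999
    print(n, redundant_availability(per_node, n))
```

This is why a fail-fast node is acceptable in a cluster: losing one replica barely moves system availability, while the same failure on a non-redundant SMP is an outage.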
7. Shared-Nothing Clusters Are Inevitable
- Data-intensive application capacity growth requirement is seriously super-Moore
- Increasing proportion of apps are becoming data intensive
- E.g., high-end web sites are typically DB-backed
- Transaction workloads now change very rapidly and unpredictably
- High availability increasingly important
- Conclusion: cluster database architecture is required
- Supported by Oracle, IBM, Informix, Tandem, ...
- Why don't clusters dominate today?
- High inter-server communications costs
- Admin/management costs out of control
8. Affordable SANs Are (Finally) Here
- TCP/IP send/receive costs on many O/Ss in the 15K-instruction range
- Some more than 30K
- Communications costs make many cluster database application models impractical
- Bandwidth is important, but the prime issues are CPU consumption and, to a lesser extent, latency
- A system area network (SAN) is used to connect clustered servers together
- Typically high bandwidth
- Send/receive without O/S kernel transition (50 to 100 instructions common)
- Round-trip latency in the 15-microsecond range
- SANs not new (e.g., Tandem)
- Commodity-priced parts are new (Myrinet, Giganet, ServerNet, etc.) and available today
- www.viarch.org
9. Admin/Mgmt Costs Dominate
- Bank of America: "You keep explaining to me how I can solve your problems"
- Admin costs are the single largest driver of IT costs
- Admitting we have a problem is the first step to a cure
- Most commercial DBs now focusing on admin costs
- SQL Server
- Enterprise manager (MMC framework--same as O/S)
- Integrated security with O/S
- Index tuning wizard (Surajit Chaudhuri)
- Auto-statistics creation
- Auto-file grow/shrink
- Auto memory resource allocation
- Install and run model is near
- Trades processor resources for admin costs
10. Intelligent Disks Are Coming
- Fatalism: they're building them, so we might as well figure out how to exploit them (Patterson trying to get us DB guys to catch on)
- Reality: disk manufacturers work with very thin margins and will continue to try to add value to their devices (Gibson)
- Many existing devices already (under-)exploiting commodity procs (e.g., 68020)
- Counter-argument: prefer a general-purpose processor for DB workloads
- Dynamic workload requirements: computing joins, aggregations, applying filters, etc.
- What if it was both a general-purpose proc and embedded on the disk controller?
11. DB Exploitation of Intelligent Disk
- Each disk includes network, CPU, memory, and drive subsystem
- All in the disk package: it already had power, chassis, and PCB
- Scales as a unit in small increments
- Runs a full standard O/S (e.g., Linux, NT, ...)
- Each is a node in a single-image, shared-nothing database cluster
- Continues the long-standing DB trend of moving function to the data
- Stored procedures
- Joins done at the server
- Internally as well: SARGable predicates run in the storage engine
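Running SARGable predicates in the storage engine is essentially predicate pushdown: a simple column-vs-constant test is evaluated where the data lives, so only qualifying rows cross the interconnect. A minimal sketch (the function and field names are hypothetical, not an actual SQL Server or NASD interface):

```python
# Hedged sketch of predicate pushdown to an intelligent disk.
# A SARGable predicate (column op constant) is cheap enough to
# evaluate at the storage layer; names here are illustrative only.
import operator

OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}

def storage_scan(rows, column, op, constant):
    """Runs on the disk unit: filter rows before shipping them to the host."""
    test = OPS[op]
    return [r for r in rows if test(r[column], constant)]

# Host-side query processing only sees pre-filtered rows.
table = [{"id": 1, "qty": 5}, {"id": 2, "qty": 50}, {"id": 3, "qty": 12}]
shipped = storage_scan(table, "qty", ">", 10)
print(shipped)  # only the rows with qty > 10 leave the disk
```

Joins and aggregation still run on the host; the win is that a selective predicate reduces the data shipped by orders of magnitude, which is exactly the "move function to the data" trend named above.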
12. DB Exploitation of Intelligent Disk
- Client systems are sold complete
- Include O/S, relevant device drivers, office productivity apps, ...
- Server systems require weeks to months of capacity planning, training, installing, configuring, and testing before going live
- Let's make the client model work for servers
- Purchase a system (frame + 2 disk, CPU, memory, and network units)
- Purchase server-slices as required, when required
- Move to a design point where H/W is close to free and admin costs dominate design decisions
- High hardware volume still drives significant revenue
13. DB Exploitation of Intelligent Disk
- Each slice contains S/W for file, DB, www, mail, directory, ... no install
- Adding capacity is plugging in a few more slices and choosing the personality to extend
- Due to the large number of components in the system, reliability is an issue
- Nothing fails fast; it just eventually performs poorly enough to be fired: typically devices don't just quit (Patterson)
- Introspection is key: dedicate some resources to tracking intermittent errors and predicting failure
- Take action prior to failure: a RAID-like model where disks fail but the system keeps running
- Add slices when a capacity increase or accumulating failures require it
14. Failed DB Machines All Over Again?
- Numerous past projects, both commercial and research
- Britton Lee probably best remembered
- "Solutions looking for a problem" (Stonebraker)
- What went wrong?
- Special-purpose hardware with low volume
- High, difficult-to-amortize engineering costs
- Fell off the general-purpose system technology curve
- Database sizes were smaller and most server systems were not single-function machines
- Non-standard models for admin, management, security, programming, etc.
15. How About H/W DB Accelerators?
- Many efforts to produce DB accelerators
- E.g., ICL CAFS
- I saw at least one of these proposals a year while I was working on DB2
- Why not?
- The additional H/W only addresses a tiny portion of total DB function
- Device driver support required
- Substantial database engineering investment required to exploit
- Device must have intimate knowledge of the database physical row format in addition to logical properties like international sort orders (bug-for-bug semantic match)
- Low volume, so devices quickly fall off the commodity technology curve
- ICL CAFS supported a single proc; general commodity SMPs made it irrelevant
16. Use of Intelligent Disk: NASD?
- NASD has architectural advantages when data can be sent from the block server directly to the client
- Many app models require significant server-side processing, preventing direct transfer (e.g., all database processing)
- Could treat the intermediate server as a NASD client
- Gives up the advantage of not transferring data through an intermediate server
- Each set of disk resources requires additional network, memory, and CPU resources
- Why not add them together as a self-contained, locally attached unit?
- Rather than directly transferring from the disk to the client, move intermediate processing to the data (a continuation of long database tradition)
17. Use of Intelligent Disk: NASD Model?
- Making the disk unit a full server-slice allows use of existing:
- Commodity operating system
- Device driver framework and drivers
- File system (API and on-disk format)
- No client changes required
- Object naming and directory lookup
- Leverage ongoing DB engineering investment
- LOB apps (SAP, PeopleSoft, ...)
- Security, admin, and mgmt infrastructure
- Customer training and experience
- Program development environment investment
- If delivered as peer nodes in a cluster, no mass infrastructure re-write required prior to intelligent disk adoption
18. Use of Intelligent Disk: NASD Eng. Costs
- New device driver model is hard to sell
- OS/2 never fully got driver support
- NT still has less support than Win95/98
- Typical UNIX systems support far fewer devices
- Getting new file system adoption is difficult
- HPFS on OS/2 never got heavy use
- After a decade, NTFS is now getting server use
- Will O/S and file system vendors want new server-side infrastructure?
- What is the upside for them?
- If written and evangelized by others, will it be adopted without system vendor support?
- Intelligent disk is the right answer; the question is what architecture exploits them best and promotes fastest adoption
19. Conclusions
- Intelligent disk will happen
- An opportunity for all of us to substantially improve server-side infrastructure
- NASD could happen, but alternative architectures also based upon intelligent disk appear to:
- Require less infrastructure re-work
- Offer more benefit to non-file app models (e.g., DB)
- Intelligent disk could form a generalized, scalable server-side component
- CPU, network, memory, and disk
- Emulate the client-side sales and distribution model: all software and hardware included in the package
- Client-side usage model: use until it fails, then discard