Title: Automated administration for storage system
1Automated administration for storage system
- Presentation by Amitayu Das
2Introduction
- Major challenges in storage management
- System design and configuration (device
management)
- Capacity planning (space management)
- Performance tuning (performance management)
- High Availability (availability management)
- Automation (all of the above, in a self-managing
manner)
3Motivation
- Large disk arrays and networked storage lead to
huge storage capacities and high-bandwidth access,
which facilitate consolidated storage systems.
- Enterprise-scale storage systems contain hundreds
of host computers and storage devices and up to
tens of thousands of disks.
- Designing, deploying and managing such systems at
runtime is very costly (often costlier than
procuring them).
- Look at the problems in greater detail.
4Storage System life-cycle
5Storage administration functions
- Data protection
- Performance tuning
- Planning and deployment
- Monitoring and record-keeping
- Diagnosis and repair
6A few notable attempts
- System-managed storage (IBM)
- Attribute-managed storage (HP)
- Replication
- RAID
- Online snapshot support
- Remote replication
- Online archival
- Interposed request routing
- Smart file-system switches
7Design problem
- Given a pool of resources and a workload, determine
an appropriate choice of devices, configure them and
assign the workload to the configured storage.
- The solution is not straightforward because:
- The system is huge and has thousands of design
choices, many of which have unforeseen
consequences.
- Personnel with detailed knowledge of
applications' storage behavior are in short
supply and hence quite expensive.
- The design process is tedious and complicated to do
by hand, usually leading to solutions that are
grossly over-provisioned, substantially
under-performing or, in the worst case, both.
- Once a design is in place, implementing it is
time-consuming, tedious and error-prone.
- A mistake in any of these steps is difficult to
identify and can result in a failure to meet the
performance requirements.
8Storage System life-cycle design/configuration
9System design and assignment problem
[Figure: applications provide workload requirements and storage devices provide device abilities to the assignment engine, which produces the workload-to-storage assignment and the storage system configuration.]
10Initial system design
- Problem: convert workloads, business needs and
device characteristics into an assignment of stores
and streams to devices
- One approach: constraint-based multi-dimensional
bin-packing
- Sample constraints of device 1
- - Sum of store sizes ≤ capacity
- - Sum of stream utilizations ≤ 1.0
- Sample objective functions
- - Minimize cost
- - Balance load
[Figure: a store's required size and I/O rate set against a drive's capacity. How many drives? Holding which data?]
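
The bin-packing formulation above can be sketched in a few lines. The following is a minimal first-fit-decreasing heuristic, assuming identical devices and only the two sample constraints from the slide (capacity and utilization). The names `Store`, `Device` and `first_fit_decreasing` are illustrative and are not taken from Minerva or any other tool. It uses device count as a stand-in for cost and makes no attempt to balance load.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    name: str
    size_gb: float      # space the store needs
    utilization: float  # fraction of one device's time its stream consumes


@dataclass
class Device:
    capacity_gb: float
    stores: list = field(default_factory=list)

    def fits(self, store):
        """Check both sample constraints for this device."""
        used = sum(s.size_gb for s in self.stores)
        busy = sum(s.utilization for s in self.stores)
        return (used + store.size_gb <= self.capacity_gb
                and busy + store.utilization <= 1.0)


def first_fit_decreasing(stores, capacity_gb):
    """Assign stores to as few identical devices as this heuristic finds."""
    devices = []
    # Place the most demanding stores first, judged by their tightest dimension.
    order = sorted(stores,
                   key=lambda s: max(s.size_gb / capacity_gb, s.utilization),
                   reverse=True)
    for store in order:
        if store.size_gb > capacity_gb or store.utilization > 1.0:
            raise ValueError(f"{store.name} cannot fit on any single device")
        target = next((d for d in devices if d.fits(store)), None)
        if target is None:
            # No existing device can hold it: open a new one.
            target = Device(capacity_gb)
            devices.append(target)
        target.stores.append(store)
    return devices
```

Calling `first_fit_decreasing(stores, capacity_gb=100)` answers both questions in the figure for this toy setting: the length of the returned list is the number of drives, and each `Device.stores` list says which data it holds. A real solver would add further dimensions (for example availability) and an explicit objective function.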
11Initial system design > disk arrays
- Problem
- extending the single disk solution to disk arrays
- The space of array designs is potentially huge
- LUN sizes and RAID levels, stripe unit sizes,
disks in LUNs
- More work needed before the solver can run
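
To give a feel for how quickly the array design space grows, here is a small sketch that enumerates LUN configurations. Every value range in it (RAID levels, stripe unit sizes, disks per LUN) is an assumption chosen for illustration, not a parameter set from the slides or from Minerva, and LUN size is left out entirely.

```python
from itertools import product

# All value ranges below are assumptions chosen for illustration.
RAID_LEVELS = ("RAID1/0", "RAID5")
STRIPE_UNIT_KB = (16, 32, 64, 128)
DISKS_PER_LUN = range(2, 9)  # 2 to 8 disks


def lun_designs():
    """Yield every valid (raid_level, stripe_unit_kb, disks) combination."""
    for raid, stripe, disks in product(RAID_LEVELS, STRIPE_UNIT_KB, DISKS_PER_LUN):
        if raid == "RAID1/0" and disks % 2 != 0:
            continue  # mirrored pairs need an even number of disks
        if raid == "RAID5" and disks < 3:
            continue  # RAID 5 needs at least three disks
        yield raid, stripe, disks


def array_design_count(luns_per_array):
    """Count array designs when each LUN is configured independently."""
    return len(list(lun_designs())) ** luns_per_array
```

Even with these few options, the count grows exponentially with the number of LUNs per array, which is why the design space has to be narrowed before the solver can run.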
12Minerva
- Control flow. The array designer is called as a
subroutine by the allocator.
- Minerva's role in the storage system life cycle.
Inputs and outputs are shown.
13Minerva running a sample workload
14Merits/demerits
- Merits
- Reasonable automation
- Demerits
- Requires accurate models of workloads,
performance requirements, and devices
- Addresses only the mechanisms, not the policy
15Storage System life-cycle redesign/reconfigure
16System redesign/reconfiguration
[Figure: events triggering redesign/reconfiguration (e.g., hardware/software upgraded) take a running system to a reconfigured system.]
17Iterative storage management loop
[Figure: loop of Analyze workload -> Design new system -> Implement design, entered on events triggering reconfiguration.]
18Hippodrome
- Two objectives
- The automated loop must converge on a viable
design that meets the workload's requirements
without over- or under-provisioning.
- It must converge to a stable final system as
quickly as possible, with as little input as
possible from its users.
19Components of Hippodrome
- Analysis component (1)
- Performance model component (2)
- Solver components (3)
- Migration component (4)
[Figure: the four numbered components connected in a loop. Labels: workload, summary, candidate design, utilization (design), finalized design.]
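
The interplay of the four components can be illustrated with a toy loop. This is a sketch under strong assumptions, not Hippodrome's implementation: the workload is reduced to a single I/O rate, a design is just a device count, the observed workload is assumed to be capped by what the current system can deliver, and the function names and the 0.8 utilization target are hypothetical.

```python
def analyze(observed_iops):
    """Component 1: summarize the observed workload."""
    return {"iops": observed_iops}


def utilization(summary, n_devices, device_iops):
    """Component 2: performance model predicting utilization of a design."""
    return summary["iops"] / (n_devices * device_iops)


def solve(summary, device_iops, target_util=0.8):
    """Component 3: smallest design the model says meets the target."""
    n = 1
    while utilization(summary, n, device_iops) > target_util:
        n += 1
    return n


def run_loop(true_demand_iops, device_iops, start_devices=1, max_iters=20):
    """Iterate analyze -> solve -> migrate until the design stops changing."""
    n = start_devices
    history = [n]
    for _ in range(max_iters):
        # Assumption: the running system can only expose as much load
        # as it is able to serve.
        observed = min(true_demand_iops, n * device_iops)
        new_n = solve(analyze(observed), device_iops)
        if new_n == n:
            break  # converged on a stable design
        n = new_n  # component 4: migrate to the new design
        history.append(n)
    return history
```

With a true demand of 1000 I/Os per second and devices that serve 100 each, the loop grows the design step by step, because each iteration only sees the load the previous design could carry, and stops once the design no longer changes.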
20Issues in system design and allocation
- What optimization algorithms are most effective?
- What optimization objectives and constraints
produce reasonable designs?
- e.g., cost of reconfiguring the system
- What's the right part of the storage design space
to explore?
- e.g., RAID level vs. stripe unit size vs. cache
management parameters
- What are reasonable general guidelines for
tagging a store's RAID level?
- What (other) decompositions of the design and
allocation problem are reasonable?
- How to generalize system design?
- for SAN environment
- for host and applications
21Issues in reconfiguration
- How to do system discovery?
- e.g., existing state, presence of new devices
- Dealing with inconsistent information
- In a scalable fashion
- How to abstractly describe storage devices?
- For system discovery output
- For input to tools that perform changes
- How to automate the physical redesign process?
- e.g., physical space allocation etc.
- Events trigger redesign decision
- How do we decide when to reconfigure?
- Reconfiguration inputs
- current system configuration/assignment
- desired system configuration/assignment
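
Given the two reconfiguration inputs, one simple way to derive the work to be done is to diff the assignments. The sketch below assumes an assignment is a plain mapping from store name to device name; `plan_migration` is a hypothetical helper, and it ignores ordering constraints, free-space checks and the cost of moving data.

```python
def plan_migration(current, desired):
    """Compare two store -> device assignments.

    Returns (moves, removed): moves is a list of (store, source, target)
    tuples, where source is None for a store that is new in the desired
    design; removed lists stores that no longer appear in it.
    """
    moves = []
    for store, target in sorted(desired.items()):
        source = current.get(store)
        if source != target:
            moves.append((store, source, target))
    removed = sorted(set(current) - set(desired))
    return moves, removed
```

An empty `moves` list and an empty `removed` list mean the running system already matches the desired design, so no reconfiguration is needed.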
22Self-* storage architecture
23Administration and organization
- Administrative interface
- Supervisors
- Administrative assistants
- Data access and storage
- Routers
- Workers
24Merits
- Simpler storage administration
- Data protection
- Performance tuning
- Planning and deployment
- Monitoring and record-keeping
- Diagnosis and repair
25Demerits
- The proposed solution is too simplistic to handle
the issues raised.
- The authors provide a solution from a high-level
viewpoint, but it is not complete in any sense.
- The implementation and evaluation are not
convincing enough.
- Not all aspects of self-* have been addressed
as claimed.
26Storage System life-cycle virtualization
[Figure: life-cycle loop driven by (dynamic) business requirements: Design/redesign, Configure/reconfigure, Monitor, Performance tuning.]
27Runtime management problem
- Often, enterprise customers outsource their
storage needs to data centers.
- At data centers, different workloads, applications
and services share the underlying storage
infrastructure.
- Sharing (of disk drives, storage caches, network
links, controllers etc.) can lead to interference
between users/applications and hence to possible
violations of performance-based QoS guarantees.
- To prevent that, data centers need to insulate
users from each other: virtualization.
28Need for virtualization
- At data centers, the many different enterprise
servers that support different business
processes, such as Web servers, file servers and
database servers, may have very different
performance requirements on their backend storage
server.
- Sophisticated resource allocation and scheduling
technology is required to isolate these logical
storage servers effectively, as if they were
separate physical storage servers.
- Storage virtualization refers to the technology
that allows creation of a set of logical storage
devices from a single physical storage structure.
29Storage virtualization
[Figure: layered stack. Application clients see an abstract interface of virtual disks; storage virtualization and storage management sit in the operating system, above the hardware resources (disks, controllers), i.e. the physical disks.]
- Examples: LVM, xFS, StorageTank
- Hides physical details from high-level
applications
30Dimensions of virtualization
- Commercial storage virtualization systems are
rather limited because they virtualize only
storage capacity.
- However, from the standpoint of storage clients
or enterprise servers, virtual storage devices
should be as tangible as physical disks.
- Need to virtualize efficiently any standard
attribute associated with a physical disk, such
as capacity, bandwidth, latency, availability
etc.
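
To make the idea of several virtualized attributes concrete, here is a minimal admission-control sketch in which a virtual disk is described by capacity, bandwidth and a latency bound. It is an illustration under simplifying assumptions, not the mechanism of any system cited in this talk: resources are treated as additive, latency is checked only against a fixed floor with no queueing model, and the class and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VirtualDisk:
    name: str
    capacity_gb: float
    bandwidth_mbps: float
    max_latency_ms: float  # latency bound the client asks for


@dataclass
class PhysicalPool:
    capacity_gb: float
    bandwidth_mbps: float
    min_latency_ms: float  # best latency the hardware can deliver
    admitted: list = field(default_factory=list)

    def admit(self, vd):
        """Admit vd only if every requested dimension can still be met."""
        used_cap = sum(d.capacity_gb for d in self.admitted)
        used_bw = sum(d.bandwidth_mbps for d in self.admitted)
        ok = (used_cap + vd.capacity_gb <= self.capacity_gb
              and used_bw + vd.bandwidth_mbps <= self.bandwidth_mbps
              and vd.max_latency_ms >= self.min_latency_ms)
        if ok:
            self.admitted.append(vd)
        return ok
```

A capacity-only virtualization layer would perform just the first of the three checks. Enforcing the bandwidth and latency reservations at runtime additionally needs a scheduler, which this sketch does not model.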
31Hardware Organization
[Figure: hardware organization. Labels: clients, object interface, file interface, storage manager, data/commands, control messages, Gigabit network.]
32A 2-level CVC Scheduler
[Figure: a client, a storage manager and three storage servers, with steps numbered 1 to 7.]
33References
- Hippodrome: running circles around storage
administration. Eric Anderson et al., FAST '02,
pp. 175-188, January 2002.
- Minerva: an automated resource provisioning tool
for large-scale storage systems. G. Alvarez et
al., ACM Transactions on Computer Systems 19(4),
483-518, November 2001.
- Ergastulum: quickly finding near-optimal storage
system designs. Eric Anderson et al., Technical
Report, HP Laboratories.
- Disk Array Models in Minerva. Arif Merchant et
al., Technical Report, HP Laboratories.
- Self-* Storage: Brick-based Storage with
Automated Administration. G. Ganger et al.,
Technical Report, 2003.
34References
- SIGMETRICS '00 Tutorial, HP Laboratories.
- Optimization algorithms
- - Bin-packing heuristics [Coffman84]
- - Toyoda gradient [Toyoda75]
- - Simulated annealing [Drexl88]
- - Relaxation approaches [Pattipati90, Trick92]
- - Genetic algorithms [Chu97]
- Multidimensional Storage Virtualization. Lan
Huang et al., SIGMETRICS '04, New York, June
2004.
- An Interposed 2-Level I/O Scheduling Framework
for Performance Virtualization. J. Zhang et al.,
SIGMETRICS '05.
- Efficiency-aware disk scheduler
- - Cello, Prism, YFQ
35THANK YOU !!!