Title: Xin Chen
1. Metadata Management in Distributed File Systems
- Xin Chen
- (xchen21_at_tntech.edu)
- Nov. 27, 2007
- Department of Electrical and Computer Engineering
- Tennessee Technological University
2. Outline
- Introduction
- Metadata Management in Distributed File Systems
- Case Study: Dynamic Metadata Management for Petabyte-scale File Systems
- Conclusions
3. What is Metadata?
- Two common definitions of metadata are:
  - Data about data
  - Data needed to make other data useful
4. How to Find Books in the Library?
5. Types of Metadata
- File System Metadata
  - Timestamps, file attributes, and some special-purpose information.
- Image Metadata
  - Subjects, related emotions, and other descriptive phrases
- Program Metadata
  - Company that published the program, the date the program was created, the version number.
- Digital library metadata
  - Descriptive: information describing the intellectual content of the object
  - Structural: information that ties each object to others to make up logical units
  - Administrative: information used to manage the object or control access to it.
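As a concrete illustration of file system metadata, the following minimal Python sketch reads the timestamps and attributes that the operating system keeps for a file. It uses only the standard library. The temporary file and the helper name `describe` are assumptions made for this example and do not come from the slides.

```python
import os
import stat
import tempfile
import time


def describe(path):
    """Return a small dict of file system metadata for `path`."""
    st = os.stat(path)
    return {
        "size_bytes": st.st_size,                   # length of the file data
        "mode": stat.filemode(st.st_mode),          # file type and permission bits
        "modified": time.ctime(st.st_mtime),        # time of last content change
        "accessed": time.ctime(st.st_atime),        # time of last access
        "is_directory": stat.S_ISDIR(st.st_mode),   # True for directories
    }


# Hypothetical example file, created only for this demonstration.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello")
    example_path = f.name

info = describe(example_path)
print(info)
os.remove(example_path)
```

None of these values are part of the file's contents; they are data about the file, which is exactly what a metadata server has to store and serve.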
6. Outline
- Introduction
- Metadata Management in Distributed File Systems
- Case Study: Dynamic Metadata Management for Petabyte-scale File Systems
- Conclusions
7. Metadata Management in Distributed File Systems
- In short, metadata management is the act of imposing management discipline on the collection and control of metadata [2].
- Create/delete metadata
- Manage file system namespace (directory structure)
- Control inconsistencies and redundancies of metadata
8. Importance of Metadata Management
- Good metadata performance is critical to overall system performance.
- Efficient metadata management leads to efficient use of storage resources.
- Efficient metadata management leads to long-term scalability of the system.
9. A Typical Distributed File System Architecture
10. Issues of Metadata Management in Distributed File Systems
- Consistency
- Workload Partitioning
- Traffic Management
- Metadata Storage
11. Consistency
- Updates must be consistent across the metadata servers (MDSs); clients should have a consistent view of the file system.
- Distributed locking vs. centralized management
12. Workload Partitioning
- Balance MDS loads, maximize efficiency and throughput
13. Traffic Management
- Effectively adapt to a changing workload.
- Ideal case
  - Unpopular items: clients directly contact the corresponding MDS.
  - Popular items: those items are replicated across several nodes.
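The ideal case above can be sketched as a small routing rule. This is a minimal illustration under stated assumptions, not the mechanism of any particular file system: the class name `TrafficManager`, the cluster size, the popularity threshold, the byte-sum placement rule, and the example paths are all hypothetical. Popular items are replicated on every node here for simplicity, whereas the slide only says "several nodes".

```python
import random
from collections import Counter

NUM_MDS = 4          # hypothetical cluster size
HOT_THRESHOLD = 3    # hypothetical: requests seen before an item counts as popular


class TrafficManager:
    def __init__(self, num_mds=NUM_MDS, hot_threshold=HOT_THRESHOLD, seed=0):
        self.num_mds = num_mds
        self.hot_threshold = hot_threshold
        self.hits = Counter()              # request count per path
        self.rng = random.Random(seed)     # seeded so runs are repeatable

    def authority(self, path):
        # Hypothetical placement rule: byte sum of the path modulo cluster size.
        return sum(path.encode()) % self.num_mds

    def replicas(self, path):
        """MDS nodes that may serve `path` right now."""
        if self.hits[path] >= self.hot_threshold:
            return list(range(self.num_mds))   # popular: replicated on all nodes
        return [self.authority(path)]          # unpopular: authority only

    def route(self, path):
        """Pick the MDS a client should contact, then record the request."""
        target = self.rng.choice(self.replicas(path))
        self.hits[path] += 1
        return target


tm = TrafficManager()
cold = [tm.route("/home/a/notes.txt") for _ in range(2)]   # stays unpopular
hot = [tm.route("/lib/libc.so") for _ in range(50)]        # becomes popular
print(cold, sorted(set(hot)))
```

Requests for the unpopular path always reach its single authoritative MDS, while requests for the popular path may be served by any node once the threshold is crossed.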
14. Metadata Storage
- Fast commits and efficient reads
- Ideally, the MDS memory is able to satisfy most reads.
15. Outline
- Introduction
- Metadata Management in Distributed File Systems
- Case Study: Dynamic Metadata Management for Petabyte-scale File Systems
- Conclusions
16. Dynamic Metadata Management for Petabyte-scale File Systems
- Sage Weil, Kristal T. Pollack, Scott A. Brandt, and Ethan L. Miller
- University of California, Santa Cruz
- SC '04: Proceedings of the 2004 ACM/IEEE Conference on Supercomputing, 2004
17. Overview
- Introduction
- Background
- Dynamic Metadata Management
- Evaluation
18. Motivations
- In petabyte-scale distributed file systems, the behavior of the metadata server is critical to overall system performance and scalability.
- More than 50% of all file system operations are metadata operations, making the performance of the MDS cluster of critical importance.
- Metadata exhibit a higher degree of interdependence, making the design of a scalable system much more challenging.
19. Issues
Goal: MDSs should be able to continually adapt to current demands to maintain high system performance and long-term scalability.
20. Main Contribution
- In this paper, a dynamic sub-tree partitioning and adaptive metadata management system is designed to efficiently manage metadata workloads that evolve over time.
21. Overview
- Introduction
- Background
- Dynamic Metadata Management
- Evaluation
22. System Overview
- Petabytes of storage (10^15 bytes, millions of gigabytes)
- Billions of files ranging from bytes to terabytes
- 10,000 to 100,000 clients
- 10 Metadata servers
- 1,000 object-based storage devices (OSDs)
Storage system architecture [4]
23. Metadata Workloads
- More than 50% of all file system operations are about metadata
  - Open, close
  - Readdir, setattr, ls
- Hot spots
  - Directories near the root of the hierarchy are necessarily popular
  - Popular files: many opens of the same file (e.g. /lib/)
  - Popular directories: many creates in the same directory (e.g. /tmp)
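Why directories near the root are necessarily popular can be seen by counting path traversals: opening any file touches every ancestor directory on its path. The following is a minimal sketch of that counting argument. The function name `ancestor_hits` and the example paths are hypothetical.

```python
from collections import Counter
from pathlib import PurePosixPath


def ancestor_hits(paths):
    """Count how often each directory is traversed when opening `paths`."""
    hits = Counter()
    for p in paths:
        # .parents yields every ancestor directory, ending at the root "/".
        for parent in PurePosixPath(p).parents:
            hits[str(parent)] += 1
    return hits


# Hypothetical workload of three opens.
hits = ancestor_hits(["/home/a/x.txt", "/home/b/y.txt", "/usr/lib/z.so"])
print(hits.most_common(2))   # [('/', 3), ('/home', 2)]
```

The root is traversed by every request, so whichever MDS is responsible for it becomes a hot spot unless that metadata is cached or replicated.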
24. Current Partitioning Approaches
- Static sub-tree partitioning
  - Coarse distribution
  - Imbalanced distribution
- Hash-based partitioning
  - Finer distribution
  - Ignores hierarchical structure
  - Hot spots still exist
  - Eliminates all hierarchical locality
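The two approaches can be contrasted with a minimal sketch. Everything named here is an assumption for illustration: the cluster size, the `STATIC_SUBTREES` table, the use of CRC32 as the hash, and the example paths.

```python
import zlib

NUM_MDS = 4  # hypothetical cluster size

# Hypothetical static assignment of top-level subtrees to MDS ranks.
STATIC_SUBTREES = {"/home": 0, "/usr": 1, "/var": 2, "/tmp": 3}


def static_subtree_mds(path):
    """Static sub-tree partitioning: the top-level directory decides the MDS."""
    top = "/" + path.strip("/").split("/")[0]
    return STATIC_SUBTREES[top]


def hashed_mds(path, num_mds=NUM_MDS):
    """Hash-based partitioning: a hash of the full path name decides the MDS."""
    return zlib.crc32(path.encode()) % num_mds


files = ["/home/alice/a.txt", "/home/alice/b.txt", "/home/bob/c.txt"]
by_subtree = [static_subtree_mds(p) for p in files]
by_hash = [hashed_mds(p) for p in files]
print(by_subtree)   # [0, 0, 0]: all of /home lands on one MDS
print(by_hash)      # depends only on each path's hash, not on its directory
```

The static table keeps related files together but cannot react if /home becomes far busier than the other subtrees. The hash ignores the directory structure, so files from the same directory are generally placed independently of one another.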
25. Overview
- Introduction
- Background
- Dynamic Metadata Management
- Evaluation
26. Dynamic Sub-tree Partitioning
- Sub-tree based partitioning
  - Greater MDS independence
  - Greater locality of reference within the workload
- Dynamic partitioning strategy
  - Metadata storage
  - Traffic control
  - Flexible resource utilization policies
27. Example
- Whether a directory is hashed is determined dynamically.
28. Load Balancing
- The metadata partition is modified over time by allowing MDS nodes to transfer authority for subtrees of the directory hierarchy.
- If a directory is large or busy, its contents can be selectively hashed across the cluster.
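The two mechanisms above, transferring authority for a subtree and selectively hashing one directory, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the class `DynamicPartition`, its method names, the CRC32 hash, and the example paths are all hypothetical, and the policy that decides when to transfer or hash is left out.

```python
import zlib


class DynamicPartition:
    def __init__(self, num_mds):
        self.num_mds = num_mds
        self.subtree_owner = {"/": 0}   # subtree root -> authoritative MDS
        self.hashed_dirs = set()        # directories whose entries are hashed
        self.load = [0] * num_mds       # requests served per MDS

    def authority(self, path):
        """Return the MDS that is authoritative for `path`."""
        parent = path.rsplit("/", 1)[0] or "/"
        if parent in self.hashed_dirs:
            # Entries of a hashed directory are spread across the cluster.
            return zlib.crc32(path.encode()) % self.num_mds
        # Otherwise the deepest enclosing subtree root decides.
        best = "/"
        for root in self.subtree_owner:
            inside = path == root or path.startswith(root + "/")
            if root != "/" and inside and len(root) > len(best):
                best = root
        return self.subtree_owner[best]

    def request(self, path):
        """Serve one request and record the load on the responsible MDS."""
        mds = self.authority(path)
        self.load[mds] += 1
        return mds

    def transfer(self, subtree_root, new_mds):
        """Hand authority for a whole subtree to another MDS."""
        self.subtree_owner[subtree_root] = new_mds

    def hash_directory(self, directory):
        """Mark a large or busy directory as hashed."""
        self.hashed_dirs.add(directory)


part = DynamicPartition(num_mds=3)
for _ in range(10):
    part.request("/home/alice/a.txt")        # everything starts on MDS 0
part.transfer("/home/alice", 1)               # move the busy subtree to MDS 1
moved = part.authority("/home/alice/a.txt")   # now served by MDS 1
other = part.authority("/home/bob/b.txt")     # unaffected, still MDS 0
part.hash_directory("/tmp")
spread = {part.authority(f"/tmp/job{i}") for i in range(100)}
print(part.load, moved, other, sorted(spread))
```

Because authority follows the deepest matching subtree root, moving /home/alice leaves the rest of /home where it was, which preserves locality for everything that was not moved.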
29. Overview
- Introduction
- Background
- Dynamic Metadata Management
- Evaluation
30. Evaluation
- The authors implemented their dynamic metadata management system within an event-driven simulation environment.
- The purpose is to validate the design hypotheses:
  - To show the relative performance and scalability of the different metadata management approaches
  - To show that subtree partitioning is efficient
  - To show the benefit of exploiting directory locality
  - To show the ability of dynamic partitioning to control traffic.
31. Simulation
- The focus of the simulation efforts is on MDS behavior and workload generation, and not on the underlying disk storage behavior.
32. Results: Performance and Scalability
33. Results: Traffic Control
34. Outline
- Introduction
- Metadata Management in Distributed File Systems
- Case Study: Dynamic Metadata Management for Petabyte-scale File Systems
- Conclusions
35. Conclusions
- Metadata management is critical to overall system performance and scalability.
- Several aspects should be taken into account in the design of metadata management:
  - Consistency
  - Partitioning
  - Traffic control
  - Metadata storage
- Dynamic subtree partitioning shows its advantages for load balancing.
36. References
- [1] Metadata. From Wikipedia, http://en.wikipedia.org/wiki/Metadata#Definitions
- [2] Metadata Management: An Essential Ingredient for Information Lifecycle Management, http://www.sun.com/storagetek/white-papers/Metadata_Management.pdf
- [3] S. A. Brandt, E. L. Miller, D. E. Long, and L. Xue. Efficient metadata management in large distributed storage systems. In Proc. 20th IEEE / 11th NASA Goddard Conference on Mass Storage Systems and Technologies, pages 290-298, April 2003.
- [4] S. A. Weil, K. T. Pollack, S. A. Brandt, and E. L. Miller. Dynamic metadata management for petabyte-scale file systems. In Proc. ACM/IEEE Conference on Supercomputing, November 2004.
- [5] G. A. Gibson and R. Van Meter. Network attached storage architecture. Communications of the ACM, 43(11):37-45, 2000.
- [6] Robyne M. Sumpter. Whitepaper on Data Management v1.0. Lawrence Livermore National Laboratory, February 10, 1994.
- [7] D. Roselli, J. Lorch, and T. Anderson. A comparison of file system workloads. In Proceedings of the 20th USENIX Annual Technical Conference, pages 41-54, June 2000.