Xin Chen - PowerPoint PPT Presentation

1
Metadata Management in Distributed File Systems
  • Xin Chen
  • (xchen21@tntech.edu)
  • Nov. 27, 2007
  • Department of Electrical and Computer Engineering
  • Tennessee Technological University

2
Outline
  • Introduction
  • Metadata Management in Distributed File Systems
  • Case Study: Dynamic Metadata Management for
    Petabyte-scale File Systems
  • Conclusions

3
What is Metadata?
  • Two common definitions of metadata are:
    • Data about data
    • Data needed to make other data useful

4
How to Find Books in the Library?
5
Types of Metadata
  • File System Metadata
    • Timestamps, file attributes, and some
      special-purpose information.
  • Image Metadata
    • Subjects, related emotions, and other descriptive
      phrases.
  • Program Metadata
    • Company that published the program, the date the
      program was created, and the version number.
  • Digital Library Metadata
    • Descriptive: information describing the
      intellectual content of the object.
    • Structural: information that ties each object to
      others to make up logical units.
    • Administrative: information used to manage the
      object or control access to it.
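File system metadata of the kind listed above (timestamps, file attributes) is what a POSIX-style stat call reports. As a small illustration, the sketch below reads a few such fields with Python's standard os.stat; the helper name describe is hypothetical.

```python
import os
import stat
import tempfile
import time


def describe(path: str) -> dict:
    """Collect a few file-system metadata fields for `path` (hypothetical helper)."""
    st = os.stat(path)
    return {
        "size_bytes": st.st_size,                   # file attribute: length in bytes
        "is_regular_file": stat.S_ISREG(st.st_mode),
        "permissions": stat.filemode(st.st_mode),   # e.g. '-rw-------'
        "modified": time.ctime(st.st_mtime),        # timestamp of last modification
    }


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"hello")
        path = f.name
    try:
        print(describe(path))
    finally:
        os.remove(path)
```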

6
Outline
  • Introduction
  • Metadata Management in Distributed File Systems
  • Case Study: Dynamic Metadata Management for
    Petabyte-scale File Systems
  • Conclusions

7
Metadata Management in Distributed File Systems
  • In short, metadata management is the act of
    imposing management discipline on the collection
    and control of metadata [2]:
    • Create/delete metadata
    • Manage the file system namespace (directory
      structure)
    • Control inconsistencies and redundancies of
      metadata
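The namespace-management tasks above can be sketched as a toy in-memory directory tree. This is a minimal sketch under stated assumptions: the Inode and Namespace classes are hypothetical, and a real MDS would also persist changes and handle concurrent clients.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Inode:
    """Hypothetical minimal metadata record for one file or directory."""
    name: str
    is_dir: bool
    children: Dict[str, "Inode"] = field(default_factory=dict)


class Namespace:
    """Toy hierarchical namespace supporting create, delete, and readdir."""

    def __init__(self) -> None:
        self.root = Inode("/", is_dir=True)

    def _lookup(self, path: str) -> Inode:
        node = self.root
        for part in filter(None, path.split("/")):
            if not node.is_dir or part not in node.children:
                raise FileNotFoundError(path)
            node = node.children[part]
        return node

    def _parent_and_name(self, path: str):
        parent_path, _, name = path.rstrip("/").rpartition("/")
        return self._lookup(parent_path or "/"), name

    def create(self, path: str, is_dir: bool = False) -> None:
        parent, name = self._parent_and_name(path)
        if not parent.is_dir:
            raise NotADirectoryError(path)
        if name in parent.children:              # no duplicate (redundant) entries
            raise FileExistsError(path)
        parent.children[name] = Inode(name, is_dir)

    def delete(self, path: str) -> None:
        parent, name = self._parent_and_name(path)
        node = parent.children.get(name)
        if node is None:
            raise FileNotFoundError(path)
        if node.is_dir and node.children:        # never orphan entries
            raise OSError(f"directory not empty: {path}")
        del parent.children[name]

    def readdir(self, path: str) -> List[str]:
        return sorted(self._lookup(path).children)
```

Rejecting duplicate names and refusing to delete non-empty directories are two simple ways the tree stays free of inconsistencies.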

8
Importance of Metadata Management
  • Good metadata performance is critical to overall
    system performance.
  • Efficient metadata management leads to efficient
    use of storage resources.
  • Efficient metadata management leads to long-term
    scalability of the system.

9
A Typical Distributed File System Architecture
10
Issues of Metadata Management in Distributed File Systems
  • Consistency
  • Workload Partitioning
  • Traffic Management
  • Metadata Storage

11
Consistency
  • Updates must be consistent across MDSs, and clients
    should have a consistent view of the file system.

Distributed Locking vs. Centralized Management
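One way to read centralized management is that each metadata item has a single authoritative MDS that orders every update, while other MDSs hold read-only replicas. The sketch below assumes a per-item version counter for that ordering; the classes and the versioning scheme are illustrative assumptions, not a protocol from the cited work.

```python
from typing import Dict, List, Tuple


class Replica:
    """Read-only copy of metadata held by a non-authoritative MDS (hypothetical)."""

    def __init__(self) -> None:
        self.data: Dict[str, Tuple[int, dict]] = {}   # key -> (version, attributes)

    def apply(self, key: str, version: int, attrs: dict) -> None:
        current = self.data.get(key, (0, {}))[0]
        if version > current:                          # drop stale or duplicate updates
            self.data[key] = (version, attrs)


class Authority:
    """The single MDS that serializes all updates to the items it owns (hypothetical)."""

    def __init__(self, replicas: List[Replica]) -> None:
        self.versions: Dict[str, int] = {}
        self.replicas = replicas

    def update(self, key: str, attrs: dict) -> int:
        version = self.versions.get(key, 0) + 1        # one total order per item
        self.versions[key] = version
        for replica in self.replicas:                  # propagate to every replica
            replica.apply(key, version, attrs)
        return version
```

Because only the authority assigns versions, a delayed or repeated message cannot roll a replica back to an older state.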
12
Workload Partitioning
  • Balance MDS loads, maximize efficiency and
    throughput

13
Traffic Management
  • Effectively adapt to a changing workload.
  • Ideal case:
    • Unpopular items: clients directly contact the
      corresponding MDS.
    • Popular items: those items are replicated across
      several nodes.
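The ideal traffic-management case can be sketched as a router that sends requests for an item to its authoritative MDS until the item becomes popular, then spreads requests over replicas. The threshold, replica count, and wrap-around replica placement are hypothetical parameters chosen for illustration.

```python
import random
from collections import Counter
from typing import Dict, List


class TrafficRouter:
    """Route metadata requests; replicate an item once it becomes popular.

    popular_threshold and replica_count are hypothetical tuning knobs.
    """

    def __init__(self, authority: Dict[str, int], num_mds: int,
                 popular_threshold: int = 100, replica_count: int = 3) -> None:
        self.authority = authority               # path -> authoritative MDS id
        self.num_mds = num_mds
        self.popular_threshold = popular_threshold
        self.replica_count = replica_count
        self.hits: Counter = Counter()
        self.replicas: Dict[str, List[int]] = {}

    def route(self, path: str) -> int:
        """Return the MDS id that should serve this request."""
        self.hits[path] += 1
        if path in self.replicas:                # popular: any replica may answer
            return random.choice(self.replicas[path])
        home = self.authority[path]
        if self.hits[path] >= self.popular_threshold:
            # Replicate on the home MDS and the next nodes (wrap-around placement).
            self.replicas[path] = [(home + i) % self.num_mds
                                   for i in range(self.replica_count)]
        return home                              # unpopular: contact the owner directly
```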

14
Metadata Storage
  • Fast commits and efficient reads
  • Ideally, the MDS memory is able to satisfy most
    reads.
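A minimal sketch of the storage goal, assuming an append-only journal for fast commits and an LRU cache so that most reads are served from MDS memory. The backing dict stands in for on-disk metadata, and the capacity parameter is hypothetical.

```python
from collections import OrderedDict
from typing import Dict, List, Optional, Tuple


class MetadataStore:
    """Toy MDS storage: journal appends for commits, LRU cache for reads."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.cache: "OrderedDict[str, dict]" = OrderedDict()
        self.journal: List[Tuple[str, dict]] = []   # sequential log of updates
        self.backing: Dict[str, dict] = {}          # stand-in for slow storage
        self.disk_reads = 0

    def commit(self, key: str, attrs: dict) -> None:
        self.journal.append((key, attrs))   # cheap sequential append = fast commit
        self.backing[key] = attrs           # simplification: a real MDS writes back lazily
        self._cache_put(key, attrs)

    def read(self, key: str) -> Optional[dict]:
        if key in self.cache:               # memory hit: no disk I/O
            self.cache.move_to_end(key)
            return self.cache[key]
        self.disk_reads += 1                # miss: fall back to slow storage
        attrs = self.backing.get(key)
        if attrs is not None:
            self._cache_put(key, attrs)
        return attrs

    def _cache_put(self, key: str, attrs: dict) -> None:
        self.cache[key] = attrs
        self.cache.move_to_end(key)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry
```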

15
Outline
  • Introduction
  • Metadata Management in Distributed File Systems
  • Case Study: Dynamic Metadata Management for
    Petabyte-scale File Systems
  • Conclusions

16
  • Dynamic Metadata Management for Petabyte-scale
    File Systems
  • Sage A. Weil, Kristal T. Pollack, Scott A. Brandt,
    and Ethan L. Miller
  • University of California, Santa Cruz
  • SC '04: Proceedings of the 2004 ACM/IEEE
    Conference on Supercomputing, 2004

17
Overview
  • Introduction
  • Background
  • Dynamic Metadata Management
  • Evaluation

18
Motivations
  • In a petabyte-scale distributed file system, the
    behavior of the metadata server is critical to
    overall system performance and scalability.
  • More than 50% of all file system operations are
    metadata operations, making the performance of
    the MDS cluster of critical importance.
  • Metadata exhibit a higher degree of
    interdependence than file data, making the design
    of a scalable system much more challenging.

19
Issues
Goal: MDSs should be able to continually adapt to
current demands to maintain high system
performance and long-term scalability.
20
Main Contribution
  • In this paper, a dynamic subtree partitioning
    and adaptive metadata management system is
    designed to efficiently manage metadata
    workloads that evolve over time.

21
Overview
  • Introduction
  • Background
  • Dynamic Metadata Management
  • Evaluation

22
System Overview
  • Petabytes of storage (1 PB = 10^15 bytes, so
    millions of gigabytes in total)
  • Billions of files ranging from bytes to terabytes
  • 10,000 to 100,000 clients
  • 10 Metadata servers
  • 1,000 object-based storage devices (OSDs)

Storage System Architecture [4]
23
Metadata Workloads
  • More than 50% of all file system operations are
    metadata operations
    • Open, close
    • Readdir, setattr, ls
  • Hot spots
    • Directories near the root of the hierarchy are
      necessarily popular
    • Popular files: many opens of the same file (e.g.,
      /lib/)
    • Popular directories: many creates in the same
      directory (e.g., /tmp)
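Hot spots like these can be detected by keeping a popularity counter per directory that decays over time, so recent bursts (such as many creates in /tmp) stand out while old activity fades. The sketch assumes exponential decay with a configurable half-life; the names and threshold are illustrative, not taken from the paper.

```python
import math
from typing import Dict


class DecayingCounter:
    """Popularity counter whose value halves every `half_life` seconds."""

    def __init__(self, half_life: float) -> None:
        self.rate = math.log(2) / half_life
        self.value = 0.0
        self.last = 0.0                      # time of the most recent update

    def hit(self, now: float, amount: float = 1.0) -> None:
        self.value = self.get(now) + amount  # decay old value, then add the new hit
        self.last = now

    def get(self, now: float) -> float:
        return self.value * math.exp(-self.rate * (now - self.last))


def hot_directories(counters: Dict[str, DecayingCounter], now: float,
                    threshold: float) -> Dict[str, float]:
    """Return directories whose decayed popularity exceeds `threshold`."""
    return {d: c.get(now) for d, c in counters.items() if c.get(now) > threshold}
```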

24
Current Partitioning Approaches
  • Static subtree partitioning
    • Coarse distribution
    • Imbalanced distribution
  • Hash-based partitioning
    • Finer distribution
    • Ignores hierarchical structure
    • Hot spots still exist
    • Eliminates all hierarchical locality
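The difference between the two approaches shows up in how each maps a path to an MDS. In this sketch the prefix table, the number of MDSs, and the choice of SHA-1 as the hash are assumptions for illustration.

```python
import hashlib
from typing import Dict


def static_subtree_mds(path: str, table: Dict[str, int]) -> int:
    """Static subtree partitioning: the longest matching prefix in a fixed table wins.

    `table` must contain "/" so every path has an owner.
    """
    best, owner = "", table["/"]
    for prefix, mds in table.items():
        inside = path == prefix or path.startswith(prefix.rstrip("/") + "/")
        if inside and len(prefix) > len(best):
            best, owner = prefix, mds
    return owner


def hashed_mds(path: str, num_mds: int) -> int:
    """Hash-based partitioning: hash the full path, ignoring the hierarchy."""
    digest = hashlib.sha1(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_mds
```

With a static table, every file under /home/alice lands on the same MDS, which preserves locality but lets one MDS absorb a hot subtree; hashing spreads siblings across the cluster, so listing a single directory may touch every MDS.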

25
Overview
  • Introduction
  • Background
  • Dynamic Metadata Management
  • Evaluation

26
Dynamic Subtree Partitioning
  • Subtree-based partitioning
    • Greater MDS independence
    • Greater locality of reference within the workload
  • Dynamic partitioning strategy
  • Metadata storage
  • Traffic control
  • Flexible resource utilization policies

27
Example
  • Whether a directory is hashed is dynamically
    determined.

28
Load Balancing
  • The metadata partition is modified over time by
    allowing MDS nodes to transfer authority for
    subtrees of the directory hierarchy.
  • If a directory is large or busy, its contents can
    be selectively hashed across the cluster.
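The two mechanisms on this slide, transferring subtree authority and selectively hashing a busy directory, can be combined into a toy rebalancing step. This is a hypothetical policy for illustration (move the subtree whose load is closest to half the load gap; treat hashed load as evenly spread), not the algorithm from the paper.

```python
from typing import Dict, Set


class Cluster:
    """Toy dynamic subtree partitioning (hypothetical policy, not the paper's algorithm).

    authority:      subtree root -> MDS currently authoritative for it
    hash_threshold: load above which a directory's contents get hashed
    """

    def __init__(self, authority: Dict[str, int], num_mds: int,
                 hash_threshold: float) -> None:
        self.authority = dict(authority)
        self.num_mds = num_mds
        self.hash_threshold = hash_threshold
        self.hashed: Set[str] = set()

    def mds_load(self, subtree_load: Dict[str, float]) -> Dict[int, float]:
        """Sum measured subtree loads per MDS."""
        load = {m: 0.0 for m in range(self.num_mds)}
        for root, value in subtree_load.items():
            if root in self.hashed:
                for m in load:                       # assume hashing spreads load evenly
                    load[m] += value / self.num_mds
            else:
                load[self.authority[root]] += value
        return load

    def rebalance(self, subtree_load: Dict[str, float]) -> None:
        # 1. A directory too hot for any single MDS gets its contents hashed.
        for root, value in subtree_load.items():
            if value > self.hash_threshold:
                self.hashed.add(root)
        # 2. Transfer authority for one subtree from the busiest to the idlest MDS.
        load = self.mds_load(subtree_load)
        busiest = max(load, key=load.get)
        idlest = min(load, key=load.get)
        gap = load[busiest] - load[idlest]
        candidates = [r for r, m in self.authority.items()
                      if m == busiest and r not in self.hashed
                      and 0.0 < subtree_load.get(r, 0.0) < gap]
        if candidates:
            # Prefer the subtree whose load is closest to half the gap.
            root = min(candidates, key=lambda r: abs(subtree_load[r] - gap / 2))
            self.authority[root] = idlest
```

Moving a subtree with load l from the busiest to the idlest MDS changes the gap g to |g - 2l|, which is why only subtrees with 0 < l < g are candidates.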

29
Overview
  • Introduction
  • Background
  • Dynamic Metadata Management
  • Evaluation

30
Evaluation
  • The authors implemented their dynamic metadata
    management system within an event-driven
    simulation environment.
  • The purpose is to validate the design hypotheses:
    • To show the relative performance and scalability
      of the different metadata management strategies
    • To show that subtree partitioning is efficient
    • To show the benefit of exploiting directory
      locality
    • To show the ability of dynamic partitioning to
      control traffic

31
Simulation
  • The focus of the simulation efforts is on MDS
    behavior and workload generation, not on the
    underlying disk storage behavior.
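The event-driven approach can be illustrated with a minimal discrete-event simulator that, like the evaluation described here, models only MDS queueing and workload generation and ignores disk behavior. Poisson arrivals, a uniformly random choice of MDS, and a fixed service time are simplifying assumptions, not the paper's workload model.

```python
import heapq
import random
from typing import List, Tuple


def simulate(num_requests: int, num_mds: int, arrival_rate: float,
             service_time: float, seed: int = 0) -> float:
    """Toy discrete-event simulation of metadata requests queueing at MDSs.

    Returns the mean response time (queueing delay plus service time).
    """
    rng = random.Random(seed)
    # Event = (time, sequence number for tie-breaking, kind, mds, arrival time)
    events: List[Tuple[float, int, str, int, float]] = []
    seq = 0
    t = 0.0
    for _ in range(num_requests):                 # workload generation
        t += rng.expovariate(arrival_rate)        # Poisson arrivals
        heapq.heappush(events, (t, seq, "arrive", rng.randrange(num_mds), t))
        seq += 1

    busy_until = [0.0] * num_mds                  # when each MDS becomes free
    total_latency, completed = 0.0, 0
    while events:
        now, _, kind, mds, arrived = heapq.heappop(events)
        if kind == "arrive":
            start = max(now, busy_until[mds])     # FIFO: wait for earlier requests
            busy_until[mds] = start + service_time
            heapq.heappush(events, (busy_until[mds], seq, "done", mds, arrived))
            seq += 1
        else:                                     # "done": request finished
            total_latency += now - arrived
            completed += 1
    return total_latency / completed
```

Comparing one MDS against four under the same offered load shows how adding servers shortens queues, which is the kind of scalability question the simulation is built to answer.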

32
Results: Performance and Scalability
33
Results: Traffic Control
34
Outline
  • Introduction
  • Metadata Management in Distributed File Systems
  • Case Study: Dynamic Metadata Management for
    Petabyte-scale File Systems
  • Conclusions

35
Conclusions
  • Metadata management is critical to overall system
    performance and scalability.
  • Several aspects should be taken into account in
    the design of metadata management:
    • Consistency
    • Partitioning
    • Traffic control
    • Metadata storage
  • Dynamic subtree partitioning shows its advantages
    for load balancing.

36
References
  • [1] Metadata. From Wikipedia,
    http://en.wikipedia.org/wiki/Metadata#Definitions
  • [2] Metadata Management: An Essential Ingredient for
    Information Lifecycle Management,
    http://www.sun.com/storagetek/white
    papers/Metadata_Management.pdf
  • [3] S. A. Brandt, E. L. Miller, D. E. Long, and L.
    Xue. Efficient metadata management in large
    distributed storage systems. In Proc. 20th IEEE /
    11th NASA Goddard Conference on Mass Storage
    Systems and Technologies, pages 290-298, April
    2003.
  • [4] S. A. Weil, K. T. Pollack, S. A. Brandt, and E.
    L. Miller. Dynamic metadata management for
    petabyte-scale file systems. In Proc. ACM/IEEE
    Conference on Supercomputing (SC '04), November
    2004.
  • [5] G. A. Gibson and R. Van Meter. Network attached
    storage architecture. Communications of the ACM,
    43(11):37-45, 2000.
  • [6] Robyne M. Sumpter. Whitepaper on Data Management
    v1.0. Lawrence Livermore National Laboratory,
    February 10, 1994.
  • [7] D. Roselli, J. Lorch, and T. Anderson. A
    comparison of file system workloads. In
    Proceedings of the 20th USENIX Annual Technical
    Conference, pages 41-54, June 2000.

37
  • Thank You!