Transcript and Presenter's Notes

Title: A Tutorial


1
A Tutorial
  • Designing Cluster Computers
  • and
  • High Performance Storage Architectures
  • At
  • HPC ASIA 2002, Bangalore INDIA
  • December 16, 2002
  • By

N. Seetharama Krishna Centre for Development of
Advanced Computing Pune University Campus, Pune
INDIA e-mail krishna_at_cdacindia.com
Dheeraj Bhardwaj Department of Computer Science
Engineering Indian Institute of Technology,
Delhi INDIA e-mail dheerajb_at_cse.iits.ac.in
2
Acknowledgments
  • All the contributors of LINUX
  • All the contributors of Cluster Technology
  • All the contributors in the art and science of
    parallel computing
  • Department of Computer Science Engineering, IIT
    Delhi
  • Centre for Development of Advanced Computing,
    (C-DAC) and collaborators

3
Disclaimer
  • The information and examples provided are based on a Red Hat Linux 7.2 installation on Intel PC platforms (our specific hardware specifications)
  • Much of it should be applicable to other versions of Linux
  • There is no warranty that the materials are error
    free
  • Authors will not be held responsible for any
    direct, indirect, special, incidental or
    consequential damages related to any use of these
    materials

4
Outline
  • Introduction
  • Overview of storage components
  • Overview of Storage Models
  • File Systems
  • I/O
  • Designing the Storage Architectures
  • Discussions
  • Introduction
  • Brief history of storage technologies
  • Importance of storage subsystems
  • Recent requirements and developments

5
Introduction
Brief History of Storage Technologies - Make 2-3
slides
6
Introduction
Importance of Storage Subsystems
  • Greater demand from technical and commercial users for
  • Higher capacity to meet the growing demands
  • Higher performance for meeting the increased user
    base
  • Very high performance to meet the balance between
    compute and I/O in technical computing.

7
Introduction
Importance of Storage Subsystems
  • Greater demand from technical and commercial users for
  • Manageability challenges in managing data
  • A large user base demands
  • Large capacity
  • Ever-increasing demand for throughput
  • Ever-changing application configuration needs

8
Introduction
  • Required Capabilities
  • Meet the demands of multi-teraflop compute power
  • Scalable from 1 TF needs to 10 TF needs
  • Network-centered Architecture
  • Scalable in performance and capacity
  • Centralized back up and archive and management

9
Introduction
  • Required Capabilities
  • In Built Parallel operation
  • A Design Based on Standard Components
  • Multiple Hierarchies and Class of Service
  • Heterogeneous compute systems support
  • Large file size support
  • Balanced architecture for mixed work load

10
Introduction
Today's Storage Challenges
  • Managing the increasing Volume of Data
  • Providing continuous access to information
  • Adopting an evolving set of Storage Technologies
  • Investment protection on legacy resources
  • Multi-vendor interoperability issues

11
Introduction
Today's Storage Challenges
  • Solution
  • An open, standards-based approach to storage
    management must be the rule, not the exception
  • Open standards address key concerns
  • Supporting changing requirements
  • Managing heterogeneous device topologies
  • Incorporating best-of-breed products to create a
    complete storage solution.

12
Objective
  • To create state-of-the-art scalable, enterprise-wide, interoperable, manageable, modular and high performance storage involving
  • Study of existing technologies
  • Sizing the requirements: capacity and performance
  • Architecture to meet HPC and Non HPC user
    community.
  • Meet the mixed and ever changing work load
    patterns.

13
Objective
  • To create state-of-the-art scalable, enterprise-wide, interoperable, manageable, modular and high performance storage involving
  • Central storage facility accessible to authenticated in-house and remote users.
  • Central Back up facility to take backup of
    storage as well as local clients.
  • Cost effective Storage Solution

14
Outline
  • Introduction
  • Overview of storage components
  • Overview of Storage Models
  • File Systems
  • Parallel I/O
  • Storage management Software
  • Security
  • Designing the Storage Architectures
  • Discussions
  • Overview of Storage Components
  • Disks
  • Interfaces
  • Protocols (SCSI,FC-AL,iSCSI,FC-IP)
  • Secondary Storage (RAID)
  • Tertiary Storage (Backup tapes)

15
Storage Components - Disks
Please add at least one slide for one component
16
Storage Components - Interfaces
Please add at least one slide for one component
17
Storage Components - Protocols
Please add at least one slide for one component
18
Storage Components Secondary Storage (RAID)
Please add at least one slide for one component
19
Storage Components Tertiary Storage (Tape)
Please add at least one slide for one component
20
Outline
  • Introduction
  • Overview of storage components
  • Overview of Storage Models
  • File Systems
  • Parallel I/O
  • Storage management Software
  • Security
  • Designing the Storage Architectures
  • Discussions
  • Overview of Storage Models
  • DAS
  • NAS
  • SAN
  • FAS (NAS and SAN co-exist)

21
Overview of Storage Models - DAS
  • Direct Attached Storage (DAS) Model

22
Direct Attached Storage
Please write features, advantages and disadvantages
23
Network Attached Storage (NAS)
  • Network Attached Storage (NAS) Model

24
Network Attached Storage (NAS)
Please write features, advantages and disadvantages
25
Storage Area Network (SAN)
  • Storage Area Network (SAN) Model

26
Storage Area Network (SAN)
Please write features, advantages and disadvantages
27
Fiber Attached Storage (FAS)
  • Fiber Attached Storage (FAS): NAS and SAN co-exist

28
NAS and SAN co-exist
Justify NAS and SAN co-existence. Pick up from our papers.
29
Advantages of FAS
  • Centralizing management to improve staff
    efficiency for monitoring and administration
  • Enabling storage to be more readily available to
    any servers on the network, making stored
    information a more valuable asset, and increasing
    the utility of the network itself.
  • Improving the availability, usefulness, and
    distribution of business applications.
  • Making automation simpler, and reducing IT
    operational costs and staffing requirements.
  • Providing greater visibility into the
    availability and performance of storage
    components.
  • Facilitating continuous availability
    requirements.

30
Outline
  • Introduction
  • Overview of storage components
  • Overview of Storage Models
  • File Systems
  • Parallel I/O
  • Storage management Software
  • Security
  • Designing the Storage Architectures
  • Discussions
  • File Systems
  • Overview
  • File System Calculations
  • VFS
  • CFS
  • PFS
  • HPSS

31
File System Calculation
Aggregate Bandwidth Rates for One Parallel Job

Teraflops: 1
Memory Size (GB): 700
I/O Rates (GB/s): 1.17 to 2
  • Assumptions
  • For the lower memory estimate it is assumed that an n-teraflops machine requires n^(3/4) TB of memory.
  • For the higher memory estimate it is assumed that an n-teraflops machine requires 2/3 × n terabytes.
  • Reference
  • Statement of Work: SGS File System, DOE National Nuclear Security Administration, USA, April 2001.

32
Assumptions for File System Capacity Calculations
  • The lower I/O rate estimate is based on the throughput needed to store one half of the smaller memory in five minutes:
  • (1/2 × 700 GB) / (5 × 60 s) ≈ 1.17 GB/s
  • The higher I/O rate estimate assumes that applications store one byte for every 500 floating point operations, a common rule of thumb:
  • 1 TF / 500 flops = 2 GB/s
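The two rules of thumb above can be sketched in a few lines of Python; the values are taken from the slides, and the variable names are our own:

```python
# Two I/O-rate rules of thumb for a 1 TF machine (values from the slides).
memory_gb = 700

# Lower rate: store half of memory in five minutes
io_low_gb_s = (memory_gb / 2) / (5 * 60)   # ~1.17 GB/s

# Higher rate: one byte per 500 floating point operations
flops = 1e12                               # 1 TF
io_high_gb_s = (flops / 500) / 1e9         # 2.0 GB/s
```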

33
Assumptions for File System Capacity Calculations
  • For the number of directories it is assumed that every user will have approximately 5000 directories:
  • 300 users × 5000 directories = 1.5 × 10^6 directories
  • For the number of files it is assumed that there are a minimum of 25 and a maximum of 200,000 files per directory:
  • Minimum: 1.5 × 10^6 directories × 25 files = 37.5 × 10^6 files
  • Maximum: 1.5 × 10^6 directories × 2 × 10^5 files = 3 × 10^11 files
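The directory and file counts above can be reproduced directly (a minimal sketch using the slide's assumptions):

```python
# Directory and file counts from the slide's assumptions.
users = 300
dirs_per_user = 5000
directories = users * dirs_per_user        # 1.5 x 10^6 directories

files_min = directories * 25               # 37.5 x 10^6 files
files_max = directories * 200_000          # 3 x 10^11 files
```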

34
Assumptions for File System capacity calculations
  • File system size is derived using the formula:
  • File system size = 1.25 × (7 to 18 × peak performance) TB
  • Minimum: 1.25 × (7 × 1 TF) = 8.75 TB
  • Minimum from file counts: 37.5 × 10^6 files × 256 KB ≈ 9.6 TB
  • Maximum: 1.25 × (18 × 1 TF) = 22.5 TB
  • For the number of devices per subsystem we assume 72 GB drives:
  • 8.75 TB / 72 GB ≈ 121 drives
  • 22.5 TB / 72 GB ≈ 312 drives
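A short sketch reproducing the sizing arithmetic above; the 72 GB drive size and 256 KB average file size are the slide's assumptions, and the truncating division matches the slide's drive counts:

```python
# Sizing sketch using the slide's formula and assumptions (1 TF peak).
peak_tf = 1
fs_min_tb = 1.25 * (7 * peak_tf)               # 8.75 TB
fs_max_tb = 1.25 * (18 * peak_tf)              # 22.5 TB

# Cross-check from file counts: 37.5 x 10^6 files at ~256 KB each
size_from_files_tb = 37.5e6 * 256e3 / 1e12     # ~9.6 TB

# Drive counts assuming 72 GB drives (truncated, as on the slide)
drive_gb = 72
drives_min = int(fs_min_tb * 1000 / drive_gb)  # 121 drives
drives_max = int(fs_max_tb * 1000 / drive_gb)  # 312 drives
```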

35
In Summary
File System Capacities

Teraflops: 1
Number of Users: 300
Number of Directories: 1.5 × 10^6
Number of Files: 37.5 × 10^6 to 3 × 10^11
File System Size (TB): 8.75 to 22.5
Number of Devices/Subsystem: 121 to 312 (72 GB drives)
36
I/O Bandwidth
  • The file system maximum sustained bandwidth can be obtained from the formula:
  • B_fs = N × B_drives × E
  • Minimum: B_fs = 121 × 100 MB/s × 0.85 ≈ 10.28 GB/s
  • Maximum: B_fs = 312 × 100 MB/s × 0.85 ≈ 26.52 GB/s

B_fs = file system maximum sustained bandwidth
N = number of drives
B_drives = sustained bandwidth of the slowest disk
E = file system efficiency factor (0.85)
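The bandwidth formula can be written as a small helper; the function name and the unit conversion are our own, while the 100 MB/s drive bandwidth and 0.85 efficiency defaults come from the slide:

```python
def fs_bandwidth_gb_s(n_drives, drive_mb_s=100.0, efficiency=0.85):
    """Maximum sustained file system bandwidth: B_fs = N x B_drives x E."""
    return n_drives * drive_mb_s * efficiency / 1000.0  # MB/s -> GB/s
```

With the drive counts from the sizing slides, `fs_bandwidth_gb_s(121)` gives about 10.3 GB/s and `fs_bandwidth_gb_s(312)` about 26.5 GB/s.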
37
Parallel File System
  • PFS is designed as a client-server system with multiple I/O servers, which have disks/RAID attached to them. Each PFS file is striped across the disks on the I/O nodes.
  • PFS also has a manager that handles only metadata operations such as permission checking for file creation, open, close and remove operations.
  • Direct Parallel I/O
  • All participating clients access the storage directly via requests to the parallel I/O servers.
  • This provides the maximum throughput, as it bypasses the overheads of intermediate file servers.
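The striping described above can be sketched as a simple round-robin mapping from file offsets to I/O nodes. This is an illustrative model, not the API of any particular PFS; the function name, its parameters, and the stripe size are all assumptions:

```python
# Minimal sketch of round-robin striping: a PFS-style mapping of file
# blocks onto I/O nodes (illustrative only, not a real PFS interface).
def stripe_location(offset, stripe_size, n_io_nodes):
    """Map a byte offset to (io_node, local_offset) under round-robin striping."""
    block = offset // stripe_size              # which stripe block of the file
    io_node = block % n_io_nodes               # blocks rotate across I/O nodes
    local_offset = (block // n_io_nodes) * stripe_size + offset % stripe_size
    return io_node, local_offset
```

For example, with a 64 KB stripe over 4 I/O nodes, consecutive 64 KB blocks of a file land on nodes 0, 1, 2, 3, 0, 1, and so on.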

38
Cluster File System
39
Outline
  • Introduction
  • Overview of storage components
  • Overview of Storage Models
  • File Systems
  • Parallel I/O
  • Storage management Software
  • Security
  • Designing the Storage Architectures
  • Discussions
  • Parallel I/O
  • Introduction
  • Parallel I/O Approaches
  • (You can add some more)

40
Introduction
Parallel vs. Serial I/O: write the basic differences
41
I/O Approaches
  • The following I/O approaches can be used for data distribution across the participating processors in a parallel program:
  • UNIX I/O on NFS
  • Parallel I/O on NFS
  • UNIX I/O with PFS support
  • Parallel I/O with PFS support
  • Direct Parallel I/O
  • UNIX I/O on NFS
  • In UNIX I/O, the process with rank zero reads the input file using standard UNIX read, partitions it, and distributes it to the other processors.
  • The file is NFS-mounted on the processor with process rank zero only.
  • Parallel I/O on NFS
  • All the processors open the file concurrently and read their required data blocks by moving the offset pointer to the beginning of their corresponding data block in the input file.
  • The file is NFS-mounted from the server to all the compute nodes.
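The "Parallel I/O on NFS" approach above, where every process seeks to its own block of the shared file, can be sketched in Python. Here `rank` and `nprocs` stand in for an MPI rank and communicator size, and the function name and block-partitioning scheme are illustrative assumptions:

```python
# Sketch of offset-based parallel reads: each process opens the shared
# (NFS-mounted) file and reads only its own contiguous block.
import os

def read_my_block(path, rank, nprocs):
    size = os.path.getsize(path)
    block = size // nprocs
    start = rank * block
    # The last rank picks up the remainder of the file.
    length = size - start if rank == nprocs - 1 else block
    with open(path, "rb") as f:
        f.seek(start)            # move the offset pointer to this rank's block
        return f.read(length)
```

Concatenating the blocks read by ranks 0 to nprocs-1 reconstructs the original file, so no rank-zero scatter step is needed.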

42
I/O Approaches
  • UNIX I/O with PFS support
  • Define these terms
  • Parallel I/O with PFS support
  • Define these terms
  • Direct Parallel I/O

43
Outline
  • Introduction
  • Overview of storage components
  • Overview of Storage Models
  • File Systems
  • Parallel I/O
  • Storage management Software
  • Security
  • Designing the Storage Architectures
  • Discussions
  • Storage Management Software
  • Overview
  • Features
  • Details of available software and their features
  • Etc

44
Storage Management Software
Please make a few slides, say 8-10
45
Outline
  • Introduction
  • Overview of storage components
  • Overview of Storage Models
  • File Systems
  • Parallel I/O
  • Storage management Software
  • Security
  • Designing the Storage Architectures
  • Discussions
  • Storage Security
  • Overview
  • Other aspects

46
Storage Security
Make some slides on security aspects of storage systems, e.g. Kerberos etc.
47
Outline
  • Introduction
  • Overview of storage components
  • Overview of Storage Models
  • File Systems
  • Parallel I/O
  • Storage management Software
  • Security
  • Designing the Storage Architectures
  • Discussions
  • Design of Storage Architecture
  • Approach
  • Traditional
  • Ideal
  • Logical
  • Proposed
  • Etc

48
Approach on Architecture
Compute Nodes
  • File Servers and File Systems
  • To support high bandwidth we have to use special-purpose file systems rather than traditional file systems such as UFS or CIFS.
  • A Cluster File System (CFS) is a highly available, distributed, cache-coherent file system that allows a UFS file system to be accessed concurrently on multiple cluster nodes.
  • Parallel File System (PFS) is necessary to stripe
    the data file across the multiple disks to
    increase the total I/O throughput.
  • A set of File Servers configured with cluster
    file system (CFS) and parallel file system (PFS)
    ensures the high availability and throughput of
    the data to the users
  • Distribution Networks
  • As of today, there are two networks (standard Ethernet and proprietary) available for connecting compute nodes to file servers for data transfer.
  • A third approach, extending the SAN directly to the compute nodes and avoiding file servers (direct parallel I/O), would reduce the network bottleneck but is an expensive option.

[Diagram: compute nodes C1 to C70 on the PARAM System Area Network, connected through a Gigabit switch and a fiber switch to the storage array]
49
Design of Architecture
  • We propose an architecture that mixes DAS, NAS and SAN, connected together to the high performance computing cluster.
  • We have chosen Direct Attached Storage, directly connected to the application server, to cater for its application development needs such as compilers, tools, source codes etc.
  • It is advisable to keep the application and data storage spaces separate to get the best performance and to avoid a single point of failure.
  • To achieve high throughput, build a massively scalable storage system by combining multiple disk arrays, or use a single large array with a large number of FC-AL interfaces.
  • To achieve a throughput of multiple gigabytes per second at the file system level, we have to size the storage array output at twice the requirement.

50
Design of Architecture
  • We also have to size the number of disks so that they can deliver the desired sustained performance.
  • Our approach of keeping application data on DAS, sequential users' data on NAS, and high performance computing data on SAN-attached storage will automatically separate the data from each other.
  • The highly automated tape library, connected to the storage array, NAS and DAS over the fiber channel interface and accompanied by the data acquisition backup master server, helps take online backups in a server-free and LAN-free environment.
  • This frees the file servers' CPUs from backup and restore jobs so they can focus on serving the high performance computing users.

51
Scalability
  • The quantities that should scale are:
  • Access
  • Storage capacity
  • SAN
  • I/O bandwidth
  • Access: parallel access to multiple devices.
  • Storage capacity: this can be addressed in two ways.
  • A big monolithic storage box supports several hundred disks, but realizing a large disk array may have limitations in terms of bandwidth scalability and reliability.
  • Multiple RAID arrays connected to the fiber channel SAN can be configured as a single storage unit to enhance capacity without affecting bandwidth.
  • SAN: chassis-based storage directors can scale from eight ports to a few hundred ports, providing non-blocking, full-fledged scalability in the SAN.
  • I/O bandwidth: a Parallel File System (PFS) stripes the data file across the multiple disks in the array through the I/O nodes to increase the total I/O throughput.

52
Typical Storage Architecture
[Diagram: Clusters A and B, each on its own system area network with its own file system, connected over a LAN to NFS/CIFS clients, visualization, NFS/CIFS servers, a backup/archive system with a tape system, a storage area network, and a WAN link to other sites]
53
Ideal Storage Architecture
[Diagram: Clusters A and B on their system area networks sharing file system servers (CFS/PFS) and GPFS over a storage area network; NFS/CIFS clients, visualization, and NFS/CIFS servers on a Gigabit LAN; a backup/archive server with a tape system, and a WAN link to other sites]
54
Physical Storage Components connectivity
[Diagram: I/O storage nodes, plus an I/O spare / backup device / storage manager node (M0), dual-attached through 32-port switches A and B to four disk subsystems and a tape library]
55
Network Based Scalable High Performance Storage
Architecture
[Diagram: PARAM 20000 compute nodes C1 to C70 on the PARAM System Area Network, connected over Gigabit and Fast Ethernet to a cluster of file servers (FS1 to FS8) running the cluster file system and to miscellaneous servers (M0 to M7); DAS; a NAS server (1 TB to 3 TB); a 2 Gbps FC-AL switch into the storage area network with a storage array (2 TB to 20 TB) and backup library; per-project storage (Project 1 to Project n, MIS) scaling from 20 TB to 200 TB; a router provides Internet connectivity]

Legend: M0 Scheduler; M1 Spare Server; M2 Developmental User Nodes; M3 Storage Management Server; M4 Visualization Server; M5 Gateway/Authentication Server; M6 Backup Server; M7 Spare Server; FS1 to FS8 File Servers; C1 to C70 Compute Nodes.
56
Outline
  • Introduction
  • Overview of storage components
  • Overview of Storage Models
  • File Systems
  • Parallel I/O
  • Storage management Software
  • Security
  • Designing the Storage Architectures
  • Discussions
  • Discussions
  • Suggested technologies
  • Future
  • Other aspects
  • Conclusion

57
Recommended Technologies
  • Disks: min 72 GB, dual-port FC-AL, 10,000 RPM
  • Protocol: SCSI
  • Interface: FC-AL
  • Storage connectivity: 2 Gb/s multi-path fiber switches
  • Storage array: host-intelligence based, with a modular, linearly scalable architecture
  • File system access: direct, PFS and NFS v4 through a Gigabit network
  • File system: POSIX-compliant (IEEE/ANSI 1003.x) cluster file system with PFS
  • Backup: fiber tape libraries with HSM
  • Compute node access: through NFS and PFS on Gigabit Ethernet
  • Architecture: FAS based, a combination of DAS, NAS and SAN

58
Futuristic C-DAC Enterprise File System by 2005
[Diagram: PARAM 20000, R&D project SMP/NUMA systems, special purpose computers, DB servers, and visualization workstations sharing the enterprise file system; a suitable architecture for GRID]