1. A Tutorial
- Designing Cluster Computers and High Performance Storage Architectures
- At HPC ASIA 2002, Bangalore, INDIA
- December 16, 2002
- By
N. Seetharama Krishna, Centre for Development of Advanced Computing, Pune University Campus, Pune, INDIA. E-mail: krishna_at_cdacindia.com
Dheeraj Bhardwaj, Department of Computer Science and Engineering, Indian Institute of Technology, Delhi, INDIA. E-mail: dheerajb_at_cse.iitd.ac.in
2. Acknowledgments
- All the contributors to LINUX
- All the contributors to cluster technology
- All the contributors to the art and science of parallel computing
- Department of Computer Science and Engineering, IIT Delhi
- Centre for Development of Advanced Computing (C-DAC) and collaborators
3. Disclaimer
- The information and examples provided are based on a Red Hat Linux 7.2 installation on Intel PC platforms (our specific hardware specifications)
- Much of it should be applicable to other versions of Linux
- There is no warranty that the materials are error free
- The authors will not be held responsible for any direct, indirect, special, incidental, or consequential damages related to any use of these materials
4. Outline
- Introduction
- Overview of Storage Components
- Overview of Storage Models
- File Systems
- I/O
- Designing the Storage Architectures
- Discussions
- Introduction
  - Brief history of storage technologies
  - Importance of storage subsystems
  - Recent requirements and developments
5. Introduction
Brief History of Storage Technologies
[Placeholder: make 2-3 slides]
6. Introduction
Importance of Storage Subsystems
- Greater demand from technical and commercial users for:
  - Higher capacity to meet growing demands
  - Higher performance to serve an increased user base
  - Very high performance to balance compute and I/O in technical computing
7. Introduction
Importance of Storage Subsystems
- Greater demand from technical and commercial users for:
  - Manageability, to meet the challenges of managing data
- A large user base demands:
  - Large capacity
  - Ever increasing throughput
  - Ever changing application configuration needs
8. Introduction
- Required Capabilities
  - Meet the demands of multi-teraflop compute power
  - Scalable from 1 TF needs to 10 TF needs
  - Network-centered architecture
  - Scalable in performance and capacity
  - Centralized backup, archive, and management
9. Introduction
- Required Capabilities
  - Built-in parallel operation
  - A design based on standard components
  - Multiple hierarchies and classes of service
  - Support for heterogeneous compute systems
  - Large file size support
  - Balanced architecture for mixed workloads
10. Introduction
Today's Storage Challenges
- Managing the increasing volume of data
- Providing continuous access to information
- Adopting an evolving set of storage technologies
- Protecting investment in legacy resources
- Multi-vendor interoperability issues
11. Introduction
Today's Storage Challenges
- Solution
  - An open, standards-based approach to storage management must be the rule, not the exception
  - Open standards address key concerns:
    - Supporting changing requirements
    - Managing heterogeneous device topologies
    - Incorporating best-of-breed products to create a complete storage solution
12. Objective
- To create state-of-the-art scalable, enterprise-wide, interoperable, manageable, modular, and high performance storage, involving:
  - Study of existing technologies
  - Sizing the requirements: capacity and performance
  - An architecture to meet the HPC and non-HPC user communities
  - Meeting mixed and ever changing workload patterns
13. Objective
- To create state-of-the-art scalable, enterprise-wide, interoperable, manageable, modular, and high performance storage, involving:
  - A central storage facility accessible to authenticated in-house and remote users
  - A central backup facility to back up the central storage as well as local clients
  - A cost effective storage solution
14. Outline
- Introduction
- Overview of Storage Components
- Overview of Storage Models
- File Systems
- Parallel I/O
- Storage Management Software
- Security
- Designing the Storage Architectures
- Discussions
- Overview of Storage Components
  - Disks
  - Interfaces
  - Protocols (SCSI, FC-AL, iSCSI, FCIP)
  - Secondary storage (RAID)
  - Tertiary storage (backup tapes)
15. Storage Components - Disks
[Placeholder: add at least one slide for this component]
16. Storage Components - Interfaces
[Placeholder: add at least one slide for this component]
17. Storage Components - Protocols
[Placeholder: add at least one slide for this component]
18. Storage Components - Secondary Storage (RAID)
[Placeholder: add at least one slide for this component]
19. Storage Components - Tertiary Storage (Tape)
[Placeholder: add at least one slide for this component]
20. Outline
- Introduction
- Overview of Storage Components
- Overview of Storage Models
- File Systems
- Parallel I/O
- Storage Management Software
- Security
- Designing the Storage Architectures
- Discussions
- Overview of Storage Models
  - DAS
  - NAS
  - SAN
  - FAS (NAS and SAN co-existing)
21. Overview of Storage Models - DAS
- Direct Attached Storage (DAS) Model
22. Direct Attached Storage
[Placeholder: write features, advantages, and disadvantages]
23. Network Attached Storage (NAS)
- Network Attached Storage (NAS) Model
24. Network Attached Storage (NAS)
[Placeholder: write features, advantages, and disadvantages]
25. Storage Area Network (SAN)
- Storage Area Network (SAN) Model
26. Storage Area Network (SAN)
[Placeholder: write features, advantages, and disadvantages]
27. Fiber Attached Storage (FAS)
- Fiber Attached Storage (FAS): NAS and SAN co-exist
28. NAS and SAN Co-exist
[Placeholder: justify NAS and SAN co-existence; pick up from our papers]
29. Advantages of FAS
- Centralizing management to improve staff efficiency for monitoring and administration
- Enabling storage to be more readily available to any server on the network, making stored information a more valuable asset and increasing the utility of the network itself
- Improving the availability, usefulness, and distribution of business applications
- Making automation simpler, and reducing IT operational costs and staffing requirements
- Providing greater visibility into the availability and performance of storage components
- Facilitating continuous availability requirements
30. Outline
- Introduction
- Overview of Storage Components
- Overview of Storage Models
- File Systems
- Parallel I/O
- Storage Management Software
- Security
- Designing the Storage Architectures
- Discussions
- File Systems
  - Overview
  - File System Calculations
  - VFS
  - CFS
  - PFS
  - HPSS
31. File System Calculation
Aggregate Bandwidth Rates for One Parallel Job
  Teraflops: 1
  Memory size: 700 GB
  I/O rate: 1.17 to 2 GB/s
- Assumptions
  - For the lower memory estimate, it is assumed that an n-teraflop machine requires (3/4)n TB of memory.
  - For the higher memory estimate, it is assumed that an n-teraflop machine requires (2/3)n TB of memory.
- Reference
  - Statement of Work: SGS File System. Report, DOE National Nuclear Security Administration, USA, April 2001.
32. Assumptions for File System Capacity Calculations
- The lower I/O rate estimate is based on the throughput needed to store one half of the smaller memory in five minutes:
  (1/2 × 700 GB) / (5 × 60 s) ≈ 1.17 GB/s
- The higher I/O rate estimate assumes that applications will store one byte for every 500 floating point operations, a common rule of thumb:
  1 TF / (500 flops per byte) = 2 GB/s
33. Assumptions for File System Capacity Calculations
- For the number of directories, it is assumed that every user will have approximately 5,000 directories:
  300 users × 5,000 directories = 1.5 × 10^6 directories
- For the number of files, it is assumed that there are a minimum of 25 and a maximum of 200,000 files per directory:
  Minimum: 1.5 × 10^6 directories × 25 files = 37.5 × 10^6 files
  Maximum: 1.5 × 10^6 directories × 2 × 10^5 files = 3 × 10^11 files
34. Assumptions for File System Capacity Calculations
- File system size is derived using the formula:
  File system size = 1.25 × (7 to 18 × peak performance) TB
  Minimum: 1.25 × (7 × 1 TF) = 8.75 TB
  Minimum from file count: 37.5 × 10^6 files × 256 KB ≈ 9.6 TB
  Maximum: 1.25 × (18 × 1 TF) = 22.5 TB
- For the number of devices per subsystem, we assume 72 GB drives (reproduced in the C sketch below):
  8.75 TB / 72 GB ≈ 121 drives
  22.5 TB / 72 GB ≈ 312 drives
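The sizing above is pure arithmetic from the stated assumptions; as a sanity check, here is a minimal C sketch that reproduces the numbers. All constants come from these slides, and 1 TB is treated as 1000 GB:

/* Reproduces the file system sizing arithmetic from slides 33-34.
 * All constants come from the stated assumptions; 1 TB = 1000 GB. */
#include <stdio.h>

int main(void)
{
    double users             = 300;
    double dirs_per_user     = 5000;
    double min_files_per_dir = 25;
    double max_files_per_dir = 200000;

    double dirs      = users * dirs_per_user;       /* 1.5e6  */
    double min_files = dirs * min_files_per_dir;    /* 37.5e6 */
    double max_files = dirs * max_files_per_dir;    /* 3e11   */

    /* File system size = 1.25 x (7 to 18 x peak TF) TB, for 1 TF. */
    double peak_tf  = 1.0;
    double min_size = 1.25 * 7.0  * peak_tf;        /* 8.75 TB */
    double max_size = 1.25 * 18.0 * peak_tf;        /* 22.5 TB */

    /* Number of 72 GB drives, rounded down as on the slide. */
    int min_drives = (int)(min_size * 1000.0 / 72.0);   /* 121 */
    int max_drives = (int)(max_size * 1000.0 / 72.0);   /* 312 */

    printf("directories: %.2e, files: %.2e to %.2e\n",
           dirs, min_files, max_files);
    printf("size: %.2f to %.2f TB, drives: %d to %d\n",
           min_size, max_size, min_drives, max_drives);
    return 0;
}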
35. In Summary
File System Capacities
  Teraflops: 1
  Number of users: 300
  Number of directories: 1.5 × 10^6
  Number of files: 37.5 × 10^6 to 3 × 10^11
  File system size: 8.75 to 22.5 TB
  Number of devices per subsystem: 121 to 312 (72 GB drives)
36. I/O Bandwidth
- The file system maximum sustained bandwidth can be obtained from the formula (checked in the sketch below):
  B_fs = N × B_drive × E
  where
  B_fs = file system maximum sustained bandwidth
  N = number of drives
  B_drive = sustained bandwidth of the slowest disk (100 MB/s here)
  E = file system efficiency factor (0.85)
- Minimum: B_fs = 121 × 100 MB/s × 0.85 ≈ 10.28 GB/s
- Maximum: B_fs = 312 × 100 MB/s × 0.85 ≈ 26.52 GB/s
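The same formula written as a tiny C check, using this slide's values (100 MB/s per drive, efficiency 0.85) and converting MB/s to GB/s by dividing by 1000:

/* B_fs = N x B_drive x E, with the values from this slide. */
#include <stdio.h>

static double fs_bandwidth_gbs(int n_drives, double drive_mbs, double eff)
{
    return n_drives * drive_mbs * eff / 1000.0;  /* MB/s -> GB/s */
}

int main(void)
{
    printf("minimum: %.2f GB/s\n", fs_bandwidth_gbs(121, 100.0, 0.85)); /* ~10.28 */
    printf("maximum: %.2f GB/s\n", fs_bandwidth_gbs(312, 100.0, 0.85)); /* ~26.52 */
    return 0;
}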
37. Parallel File System
- PFS is designed as a client-server system with multiple I/O servers, each with disks or RAID attached. Each PFS file is striped across the disks on the I/O nodes (see the striping sketch below).
- PFS also has a manager that handles only metadata operations, such as permission checking for file creation, open, close, and remove operations.
- Direct Parallel I/O
  - All participating clients access the storage directly via requests to the parallel I/O servers.
  - This provides the maximum throughput, as it bypasses the overhead of intermediate file servers.
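To make the striping concrete, here is a minimal sketch of the round-robin mapping a parallel file system performs from a logical file offset to an I/O server and a server-local offset. The 64 KB stripe unit and 8 servers are illustrative assumptions, not values from the tutorial:

/* Minimal sketch of round-robin striping arithmetic; the stripe
 * size and server count below are illustrative assumptions. */
#include <stdio.h>
#include <stdint.h>

#define STRIPE_SIZE (64 * 1024)  /* bytes per stripe unit (assumed) */
#define NUM_SERVERS 8            /* number of I/O servers (assumed) */

/* Map a logical file offset to the I/O server holding it and the
 * offset within that server's local portion of the file. */
static void map_offset(uint64_t file_offset, int *server, uint64_t *local_offset)
{
    uint64_t stripe_index = file_offset / STRIPE_SIZE;  /* which stripe unit */
    uint64_t within       = file_offset % STRIPE_SIZE;  /* offset inside it  */

    *server = (int)(stripe_index % NUM_SERVERS);        /* round-robin pick  */
    /* Each server stores every NUM_SERVERS-th stripe unit back to back. */
    *local_offset = (stripe_index / NUM_SERVERS) * STRIPE_SIZE + within;
}

int main(void)
{
    uint64_t offsets[] = { 0, 65536, 1048576, 10000000 };
    for (int i = 0; i < 4; i++) {
        int server;
        uint64_t local;
        map_offset(offsets[i], &server, &local);
        printf("file offset %10llu -> server %d, local offset %llu\n",
               (unsigned long long)offsets[i], server,
               (unsigned long long)local);
    }
    return 0;
}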
38. Cluster File System
39. Outline
- Introduction
- Overview of Storage Components
- Overview of Storage Models
- File Systems
- Parallel I/O
- Storage Management Software
- Security
- Designing the Storage Architectures
- Discussions
- Parallel I/O
  - Introduction
  - Parallel I/O approaches
  - (You can add some more)
40. Introduction
Parallel vs. Serial I/O
[Placeholder: write the basic differences]
41. I/O Approaches
- The following I/O approaches can be used for data distribution across the participating processes in a parallel program:
  - UNIX I/O on NFS
  - Parallel I/O on NFS
  - UNIX I/O with PFS support
  - Parallel I/O with PFS support
  - Direct parallel I/O
- UNIX I/O on NFS
  - The process with rank zero reads the input file using standard UNIX reads, partitions it, and distributes the pieces to the other processes.
  - The file is NFS-mounted only on the node running the rank-zero process.
- Parallel I/O on NFS (see the MPI-IO sketch after this slide)
  - All processes open the file concurrently and read their required data blocks by moving the offset pointer to the beginning of their corresponding data block in the input file.
  - The file is NFS-mounted from the server on all the compute nodes.
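A minimal MPI-IO sketch of the "Parallel I/O on NFS" pattern just described: every process opens the file concurrently and reads only its own block at its own offset. The file name input.dat and the equal-block partitioning are illustrative assumptions:

/* Each process opens the shared file concurrently and reads its own
 * contiguous block; "input.dat" and the partitioning are assumed. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* All processes open the (NFS- or PFS-mounted) file concurrently. */
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "input.dat",
                  MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);

    MPI_Offset file_size;
    MPI_File_get_size(fh, &file_size);

    /* Partition the file into contiguous blocks, one per process;
     * the last rank picks up any remainder. */
    MPI_Offset block  = file_size / nprocs;
    MPI_Offset offset = (MPI_Offset)rank * block;
    if (rank == nprocs - 1)
        block = file_size - offset;

    /* Each process moves to its own offset and reads independently
     * (block is assumed to fit in an int for this sketch). */
    char *buf = malloc((size_t)block);
    MPI_File_read_at(fh, offset, buf, (int)block, MPI_BYTE,
                     MPI_STATUS_IGNORE);

    printf("rank %d read %lld bytes at offset %lld\n",
           rank, (long long)block, (long long)offset);

    free(buf);
    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}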
42. I/O Approaches
- UNIX I/O with PFS support
  - [Placeholder: define these terms]
- Parallel I/O with PFS support
  - [Placeholder: define these terms]
- Direct parallel I/O
43. Outline
- Introduction
- Overview of Storage Components
- Overview of Storage Models
- File Systems
- Parallel I/O
- Storage Management Software
- Security
- Designing the Storage Architectures
- Discussions
- Storage Management Software
  - Overview
  - Features
  - Details of available software and their features
  - Etc.
44. Storage Management Software
[Placeholder: make a few slides, say 8-10]
45. Outline
- Introduction
- Overview of Storage Components
- Overview of Storage Models
- File Systems
- Parallel I/O
- Storage Management Software
- Security
- Designing the Storage Architectures
- Discussions
- Storage Security
  - Overview
  - Other aspects
46. Storage Security
[Placeholder: make some slides on security aspects of storage systems, e.g. Kerberos]
47. Outline
- Introduction
- Overview of Storage Components
- Overview of Storage Models
- File Systems
- Parallel I/O
- Storage Management Software
- Security
- Designing the Storage Architectures
- Discussions
- Design of Storage Architecture
  - Approach
  - Traditional
  - Ideal
  - Logical
  - Proposed
  - Etc.
48. Approach on Architecture
- File Servers and File Systems
  - To support high bandwidth, we have to use special purpose file systems rather than traditional file systems such as UFS or CIFS.
  - A Cluster File System (CFS) is a highly available, distributed, cache-coherent file system that allows a UFS file system to be accessed concurrently on multiple cluster nodes.
  - A Parallel File System (PFS) is necessary to stripe data files across multiple disks to increase the total I/O throughput.
  - A set of file servers configured with CFS and PFS ensures high availability and high throughput of data to the users.
- Distribution Networks
  - As of today, there are two networks (standard Ethernet and proprietary) available for connecting compute nodes to file servers for data transfer.
  - A third approach, extending the SAN directly to the compute nodes and avoiding file servers (direct parallel I/O), would reduce the network bottleneck but is an expensive option.
[Diagram: compute nodes C1-C70 on the PARAM System Area Network, connected through a Gigabit switch and a fiber switch to the storage array.]
49. Design of Architecture
- We propose an architecture that is a mix of DAS, NAS, and SAN connected together to the high performance computing cluster.
- We have chosen Direct Attached Storage, connected directly to the application server, to cater for its application development needs such as compilers, tools, and source code.
- It is advisable to keep the application and data storage spaces separate, to get the best performance and to avoid a single point of failure.
- To achieve high throughput, build a massively scalable storage system by combining multiple disk arrays, or use a single large array with a large number of FC-AL interfaces.
- To achieve a throughput of multiple gigabytes per second at the file system level, we have to size the storage array output at twice the requirement.
50. Design of Architecture
- We also have to size the number of disks so that they can deliver the desired sustained performance.
- Our approach of keeping application data on DAS, sequential users' data on NAS, and high performance computing data on SAN attached storage automatically separates these classes of data from each other.
- The highly automated tape library, connected to the storage array, NAS, and DAS with a Fibre Channel interface and accompanied by the backup master server, helps take online backups in a server-free and LAN-free environment.
- This frees the CPUs of the file servers from backup and restore jobs, letting them focus on serving the high performance computing users.
51. Scalability
- The quantities which should scale are:
  - Access
  - Storage capacity
  - SAN
  - I/O bandwidth
- Access: parallel access to multiple devices.
- Storage capacity: this can be addressed in two ways:
  - A big monolithic storage box supports several hundred disks, but realizing a large disk array may have limitations in terms of bandwidth scalability and reliability.
  - Multiple RAID arrays connected to the Fibre Channel SAN and configured as a single storage unit enhance the capacity without affecting the bandwidth.
- SAN: chassis-based storage directors that can scale from eight ports to a few hundred ports. This provides non-blocking, full-fledged scalability in the SAN.
- I/O bandwidth: a Parallel File System (PFS) that stripes data files across the multiple disks in the array through the I/O nodes to increase the total I/O throughput.
52. Typical Storage Architecture
[Diagram: Clusters A and B, each on its own System Area Network with its own cluster file system, connect over a LAN to a Storage Area Network hosting a backup/archive system and a tape system; NFS/CIFS servers serve NFS/CIFS clients and visualization systems, with a WAN link to other sites.]
53. Ideal Storage Architecture
[Diagram: Clusters A and B, each on its own System Area Network, share file system servers (CFS/PFS, e.g. GPFS) over a Gigabit LAN; the servers attach to a Storage Area Network with a backup/archive server and tape system, and NFS/CIFS servers serve NFS/CIFS clients and visualization, with a WAN link to other sites.]
54. Physical Storage Components Connectivity
[Diagram: I/O storage nodes, plus an I/O spare / backup device / storage manager node (M0), connect through two 32-port switches (A and B) to four disk subsystems and a tape library.]
55. Network Based Scalable High Performance Storage Architecture
[Diagram: PARAM 20000 compute nodes C1-C70 on the PARAM System Area Network connect over Gigabit and Fast Ethernet to a cluster of file servers (FS1-FS8) running the cluster file system and to miscellaneous servers (M0-M7) with DAS; the file servers attach through a 2 Gbps FC-AL switch to the Storage Area Network, which comprises a storage array (2 TB to 20 TB), a NAS server (1 TB to 3 TB) serving Project 1 to Project n and MIS users over trunked and Gigabit Ethernet, and a backup library (20 TB to 200 TB); a router provides Internet connectivity.]
Legend: M0 = scheduler; M1 = spare server; M2 = developmental user nodes; M3 = storage management server; M4 = visualization server; M5 = gateway and authentication server; M6 = backup server; M7 = spare server; FS1-FS8 = file servers; C1-C70 = compute nodes.
56. Outline
- Introduction
- Overview of Storage Components
- Overview of Storage Models
- File Systems
- Parallel I/O
- Storage Management Software
- Security
- Designing the Storage Architectures
- Discussions
- Discussions
  - Suggested technologies
  - Future
  - Other aspects
  - Conclusion
57. Recommended Technologies
- Disks: minimum 72 GB, dual-port FC-AL, 10,000 RPM
- Protocol: SCSI
- Interface: FC-AL
- Storage connectivity: 2 Gb/s multi-path fiber switches
- Storage array: host-intelligence based, with a modular and linearly scalable architecture
- File system access: direct, PFS, and NFS v4 through a Gigabit network
- File system: POSIX-compliant (IEEE/ANSI 1003.x) cluster file system with PFS
- Backup: fiber tape libraries with HSM
- Compute node access: through NFS and PFS on Gigabit Ethernet
- Architecture: FAS based, a combination of DAS, NAS, and SAN
58. Futuristic C-DAC Enterprise File System by 2005
[Diagram: visualization workstations, R&D project SMP/NUMA systems, special purpose computers, the PARAM 20000, and DB servers all sharing the enterprise file system; an architecture suitable for the GRID.]