1
The Panda library for fast and easy I/O on parallel computers
http://drl.cs.uiuc.edu/panda/
  • Marianne Winslett
  • Department of Computer Science
  • University of Illinois at Urbana-Champaign

2
Outline
  • Motivations
  • Panda goals
  • Panda's high-level interfaces
  • High-performance I/O strategies
  • Panda and CSAR relationships
  • Conclusions
  • Current and future research directions

3
Motivations
  • High I/O demands in many scientific applications
  • Hard-to-use I/O facilities
  • Explicit file pointer manipulations
  • Careful hand-selection of file system call
    options
  • Tedious and slow data migration
  • Non-portable I/O codes
  • Poor I/O performance

4
Panda goals
  • Ease of use and automatic data management
  • Application portability
  • High I/O performance
  • Collective I/O for arrays
  • Commodity-parts only

5
Our approach
  • High-level array-oriented interface
  • Easy to use and automatic data management
  • Portable
  • Flexible underlying implementations
  • High performance I/O techniques
  • A research project
  • Development partner: HDF (http://hdf.ncsa.uiuc.edu)

6
The Panda library for parallel I/O
  • Target platforms
  • Distributed memory multiprocessors
  • Clusters of workstations
  • Target applications
  • SPMD scientific applications
  • Data type supported
  • Multidimensional arrays
  • Collective I/O operations

7
The Panda I/O library for collective I/O on arrays
Compute Nodes
SPMD Application
Panda Collective I/O Interface
Panda Clients
MPI
MPI
MPI
MPI
I/O Nodes
Panda Servers
Network
8
Panda's high-level interfaces
  • SPMD style applications
  • HPF style data distributions in memory and on
    disk
  • Operation types
  • checkpoint/restart, timestep output operation,
    reading/writing out-of-core arrays

9
High-level Array Interface
  int array_size[] = {512, 512, 512};
  int array_rank = 3;
  int mem_layout[] = {8, 8};
  int layout_rank = 2;
  int disk_layout[] = {8, 1};
  int distribution[] = {BLOCK, BLOCK, NONE};

  // Logical array layouts in memory and on disk (HPF style)
  ArrayLayout memory("mem", layout_rank, mem_layout);
  ArrayLayout disk("disk", layout_rank, disk_layout);

  // Array objects
  Array *temperature = new Array("a", array_rank, array_size,
                                 sizeof(int), memory, distribution,
                                 disk, distribution);
  Array *density = new Array("b", array_rank, array_size,
                             sizeof(float), memory, distribution,
                             disk, distribution);

10
High-level array I/O interface
  // ArrayList object to describe all arrays in a collective I/O
  ArrayList *simulation = new ArrayList(
      "Sim1",                 // user-specified name
      "simulation1.schema");  // self-describing schema file
  simulation->include(temperature);
  simulation->include(density);

  // Simulation runs for 100 timesteps
  for (int i = 0; i < 100; i++) {
      compute_next_timestep();
      // Collective I/O operation to output the arrays
      simulation->timestep();
  }

11
Secrets to high-performance I/O
  • Devising high performance I/O strategies for
  • different platforms
  • IBM SPs, Cray T3E, Origin 2000, Intel Paragon,
    Workstation clusters
  • different I/O patterns
  • Automatic selection of proper I/O strategies
    without human intervention

12
Secrets to high-performance I/O: performance factors of collective I/O
  • File system utilization
  • Server-directed I/O
  • Flexible array layouts on disk
  • Communication system utilization
  • Panda internal parallelism
  • I/O load balancing
  • Heterogeneous disks (NOWs, MPPs)
  • Uneven data distributions (AMR)
  • Data migration

13
Server-directed I/O - a strategy for obtaining
long sequential reads and writes to disk
Compute Nodes
SPMD Application
Panda Collective I/O Interface
Panda Clients
MPI
MPI
MPI
MPI
I/O Nodes
Panda Servers
Network
14
Why server-directed I/O?
  • No costly modifications to standard OS and file
    systems
  • Logical-level data management
  • Allows gathering/scattering larger amounts of data per request
  • Flexible control
  • Data accesses do not depend on physical location
    of disk blocks
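The idea above can be sketched in a few lines. This is an illustrative sketch, not Panda's actual code: `plan_gather` and `GatherRequest` are hypothetical names, and it assumes a row-major 2-D array with a (BLOCK, *) in-memory distribution.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of server-directed I/O: an I/O server that owns rows
// [row_begin, row_end) of a row-major 2-D array works out which compute node
// holds each row under a (BLOCK, *) in-memory distribution, gathers the rows
// in file order, and can then issue one long sequential write with no seeks.
struct GatherRequest {
    int source_node;  // compute node holding the row in memory
    std::size_t row;  // global row index to fetch
};

std::vector<GatherRequest> plan_gather(std::size_t total_rows, int compute_nodes,
                                       std::size_t row_begin, std::size_t row_end) {
    // BLOCK distribution: each compute node holds a contiguous band of rows.
    std::size_t rows_per_node = (total_rows + compute_nodes - 1) / compute_nodes;
    std::vector<GatherRequest> plan;
    for (std::size_t r = row_begin; r < row_end; ++r)
        plan.push_back({static_cast<int>(r / rows_per_node), r});
    return plan;  // requests already in file order: one sequential write
}
// e.g. plan_gather(12, 4, 4, 8) -> rows 4,5 from node 1 and rows 6,7 from node 2
```

Because the server drives the transfer, the request order matches the disk order, which is what yields long sequential reads and writes.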

15
Server-directed I/O A strategy for obtaining long
sequential reads and writes to disk
[Diagram: the same array shown with its distribution in memory and two disk layouts across 3 I/O nodes: a (3 x 1) (BLOCK, *) layout and a (4 x 3) (BLOCK, BLOCK) layout]
16
Secrets to high-performance I/O
  • File system utilization
  • Server-directed I/O
  • Flexible array layouts on disk
  • Communication system utilization
  • Panda internal parallelism
  • I/O load balancing
  • Heterogeneous disks (NOWs, MPPs)
  • Uneven data distributions (AMR)
  • Data migration

17
Flexible array layouts
  • Why?
  • Improve data locality
  • Speed up data access
  • How?
  • Storage of arrays in chunks on disk
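Chunked storage comes down to a small address computation. The sketch below is illustrative (`locate` and `Loc` are hypothetical names), assuming a 2-D array whose dimensions divide evenly into chunks stored contiguously in row-major chunk order.

```cpp
#include <cstddef>

struct Loc {
    std::size_t chunk;   // which chunk on disk
    std::size_t offset;  // element offset inside that chunk
};

// Hypothetical sketch: map logical element (i, j) to its chunk and offset
// under a (BLOCK, BLOCK) style chunked disk layout. A whole chunk is one
// contiguous region, so nearby elements land in the same chunk, improving
// locality and speeding up access.
Loc locate(std::size_t i, std::size_t j,
           std::size_t chunk_rows, std::size_t chunk_cols, std::size_t cols) {
    std::size_t chunks_per_band = cols / chunk_cols;  // chunks across one row band
    return {
        (i / chunk_rows) * chunks_per_band + j / chunk_cols,
        (i % chunk_rows) * chunk_cols + j % chunk_cols,
    };
}
// e.g. 512x512 array in 64x64 chunks: locate(70, 130, 64, 64, 512)
//      -> chunk 10, offset 386
```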

18
Array in-memory and on-disk layouts
[Diagram: array layout in memory (12 compute nodes) and array layout on disk (3 I/O nodes)]
19
Performance results on the NAS IBM SP2
  • Total nodes: 150
  • MPI-F message-passing library
  • latency: 46 microsec
  • bandwidth between two nodes: 34 MB/s
  • AIX 3.2.5 file system on each node
  • write throughput per node: 2.23 MB/s
  • read throughput per node: 2.85 MB/s

20
Write performance when the memory distribution differs from the disk distribution
21
Secrets to high-performance I/O
  • File system utilization
  • Server-directed I/O
  • Flexible array layouts on disk
  • Communication system utilization
  • Panda internal parallelism
  • I/O load balancing
  • Heterogeneous disks (NOWs, MPPs)
  • Uneven data distributions (AMR)
  • Data migration

22
Communication system optimizations
  • Reducing the number of messages
  • Selecting optimal I/O nodes
  • Message combining
  • Communication scheduling

23
Selecting optimal I/O nodes
Problem
  • Select I/O nodes so as to minimize data transfer over the network
Example
  • Array distributed (BLOCK, BLOCK) across 2x2 compute nodes and (BLOCK, *) across 2 I/O nodes
  • Fixed I/O nodes: 0, 1
  • Optimal I/O nodes: 0, 2
Panda's solution
  • Model the selection as a weighted-edge graph assignment problem (Hungarian algorithm, polynomial time)
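The selection can be phrased as an assignment problem: a cost matrix records the traffic each node would ship if it hosted each server role, and the cheapest role-to-node assignment wins. Panda uses the Hungarian algorithm; the brute-force sketch below (`best_assignment` is a hypothetical helper) only illustrates the objective.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical sketch: cost[n][s] = data volume node n would ship over the
// network if it hosted I/O server role s. Return the minimum total traffic
// over all assignments of roles to distinct nodes. Panda solves this with
// the Hungarian algorithm in polynomial time; brute force is enough to see
// why picking nodes 0 and 2 beats the fixed choice 0 and 1 in the example.
long best_assignment(const std::vector<std::vector<long>>& cost) {
    int nodes = static_cast<int>(cost.size());
    int roles = static_cast<int>(cost[0].size());
    std::vector<int> perm(nodes);
    std::iota(perm.begin(), perm.end(), 0);  // candidate node orderings
    long best = -1;
    do {
        long total = 0;
        for (int s = 0; s < roles; ++s) total += cost[perm[s]][s];
        if (best < 0 || total < best) best = total;
    } while (std::next_permutation(perm.begin(), perm.end()));
    return best;
}
// Example (unit = one array quadrant): nodes 0,1 hold role 0's data and
// nodes 2,3 hold role 1's. best_assignment({{1,2},{1,2},{2,1},{2,1}}) -> 2,
// versus cost 3 for the fixed choice of nodes 0 and 1.
```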
24
Panda write performance on an 8-node
FDDI-connected HP workstation cluster
(BLOCK, BLOCK) in memory, (BLOCK, *) on disk
[Bar chart: Panda response time (sec) vs. number of I/O nodes (2, 4, 8) for 4 MB, 16 MB, and 64 MB arrays, comparing fixed and optimal I/O node selection]
25
Message combining for fine grained data
distributions in memory
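A minimal sketch of the idea, with a hypothetical `pack_strips` helper: under CYCLIC(K), a client owns every P-th strip of K elements, and instead of sending each strip to a server as its own message, it packs all of them into one buffer.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of message combining: a client that owns every
// `nodes`-th strip of k elements under a CYCLIC(k) distribution copies all
// of its strips into one contiguous buffer, so the server receives a single
// large message instead of one small message per strip.
std::vector<int> pack_strips(const std::vector<int>& global, std::size_t k,
                             std::size_t nodes, std::size_t my_rank) {
    std::vector<int> buf;
    for (std::size_t strip = my_rank; strip * k < global.size(); strip += nodes)
        for (std::size_t e = 0; e < k && strip * k + e < global.size(); ++e)
            buf.push_back(global[strip * k + e]);
    return buf;  // send buf in one message; the server unpacks symmetrically
}
// e.g. 12 elements, k = 2, 2 clients: client 0 owns strips 0, 2, 4 and packs
// elements {0,1, 4,5, 8,9} into one message instead of three.
```

The finer the distribution (small K), the more messages combining saves, which is why it matters most for fine-grained in-memory distributions.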
26
Panda write performance on a NOW: (CYCLIC(K), CYCLIC(K), BLOCK) in memory, (BLOCK, BLOCK, *) on disk, 4 compute nodes, 2 I/O nodes
[Bar chart: aggregate throughput (MB/sec) for 8, 16, 32, and 64 MB arrays with K = 8, 16, 32, 64, with and without message combining]
27
Secrets to high-performance I/O
  • File system utilization
  • Server-directed I/O
  • Flexible array layouts on disk
  • Communication system utilization
  • Panda internal parallelism
  • I/O load balancing
  • Heterogeneous disks (NOWs, MPPs)
  • Uneven data distributions (AMR)
  • Data migration

28
Panda internal parallelism
  • Overlap communication and file system activities
    whenever possible
  • Select proper disk unit size
  • Select proper message-passing mechanisms
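The benefit of overlap can be seen in a toy time model, a sketch under the assumption of perfect double buffering (`pipelined_time` is a hypothetical helper): the write of chunk i proceeds while chunk i+1 is received, so total time tends toward the slower activity rather than the sum.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sketch: finish time of a receive/write pipeline with double
// buffering. Writing chunk i must wait for its own receive and for the
// previous write to complete; receives proceed in the background.
double pipelined_time(const std::vector<double>& recv_sec,
                      const std::vector<double>& write_sec) {
    double recv_done = 0.0, write_done = 0.0;
    for (std::size_t i = 0; i < recv_sec.size(); ++i) {
        recv_done += recv_sec[i];  // chunk i fully received
        write_done = std::max(write_done, recv_done) + write_sec[i];
    }
    return write_done;
}
// e.g. three chunks, 1 s to receive and 2 s to write each:
// pipelined_time({1,1,1}, {2,2,2}) -> 7, versus 9 with no overlap.
```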

29
Communication scheduling
Increase Panda internal parallelism
[Diagram: message schedules before and after reordering]
30
Secrets to high-performance I/O
  • File system utilization
  • Server-directed I/O
  • Flexible array layouts on disk
  • Communication system utilization
  • Panda internal parallelism
  • I/O load balancing
  • Heterogeneous disks (NOWs, MPPs)
  • Uneven data distributions (AMR)
  • Data migration

31
Heterogeneous disks
Problems
  • Different I/O capabilities at different disks
  • Unbalanced I/O workload
  • Poor I/O performance
[Diagram: nine array chunks assigned evenly across disks vs. redistributed in proportion to each disk's speed]
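One balancing policy can be sketched as giving each server a share of the data proportional to its measured disk bandwidth. This is illustrative only; `proportional_shares` is a hypothetical helper, not Panda's actual policy.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: split `total` units of data across servers in
// proportion to their disk bandwidths, so fast and slow disks finish at
// roughly the same time. The rounding remainder goes to the last server.
std::vector<std::size_t> proportional_shares(std::size_t total,
                                             const std::vector<double>& bandwidth) {
    double sum = 0.0;
    for (double b : bandwidth) sum += b;
    std::vector<std::size_t> share(bandwidth.size());
    std::size_t assigned = 0;
    for (std::size_t i = 0; i + 1 < bandwidth.size(); ++i) {
        share[i] = static_cast<std::size_t>(total * (bandwidth[i] / sum));
        assigned += share[i];
    }
    share.back() = total - assigned;  // last server absorbs the remainder
    return share;
}
// e.g. 90 chunks over disks at 1x and 3x speed:
// proportional_shares(90, {1.0, 3.0}) -> {22, 68}
```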
32
Panda performance with heterogeneous disks
33
Uneven data distribution
Problems
  • Uneven data distribution
  • Unbalanced I/O workload
  • Poor I/O performance
A balanced workload gives good performance; an unbalanced load gives poor performance.
34
Panda performance for Timestep operations
35
Panda performance for visualization operations
36
Secrets to high-performance I/O
  • File system utilization
  • Server-directed I/O
  • Flexible array layouts on disk
  • Communication system utilization
  • Panda internal parallelism
  • I/O load balancing
  • Heterogeneous disks (NOWs, MPPs)
  • Uneven data distributions (AMR)
  • Data migration

37
Data migration
[Diagram: computation, I/O, and migration phases; Panda clients pass data to Panda servers, which migrate it to tertiary storage while computation continues]
Problems
  • Tertiary storage systems are loosely coupled with the rest of the system
  • Slow data migration
Solutions
  • Integrate data migration and parallel I/O
  • Overlap data migration with computation
38
Data Migration Performance
H3expresso, 32 compute nodes
[Chart: total elapsed time (sec) vs. number of I/O nodes (0, 1, 2, 4) for No I/O, No Migration, Panda Migration, and Naive Migration]
39
Automatic parallel I/O performance optimization
  • Motivations
  • No single I/O strategy works well in general.
  • Performance robustness is a serious problem.
  • Solution
  • Develop an arsenal of strategies that work well
    under different conditions and provide
    predictable performance
  • Then automate strategy selection without human
    intervention
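The selection step can be sketched as ordinary cost-based optimization. `Plan` and `pick_plan` are hypothetical names, and the stored constant stands in for Panda's real performance model.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch of model-based plan selection: each candidate I/O
// execution plan carries the time the performance model predicts for the
// current workload and platform; the optimizer keeps the cheapest one.
// Assumes a non-empty candidate list.
struct Plan {
    std::string disk_layout;  // e.g. "(BLOCK, BLOCK)"
    double predicted_sec;     // output of the performance model
};

const Plan& pick_plan(const std::vector<Plan>& candidates) {
    const Plan* best = &candidates[0];
    for (const Plan& p : candidates)
        if (p.predicted_sec < best->predicted_sec) best = &p;
    return *best;
}
// e.g. pick_plan({{"(BLOCK, *)", 4.2}, {"(BLOCK, BLOCK)", 3.1}})
//      -> the (BLOCK, BLOCK) plan
```

Because every candidate is scored by the same model, the choice needs no human intervention: better models directly yield better plans.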

40
The state-of-the-art parallel I/O system
[Diagram: a rocket simulation issues timestep and checkpoint operations through a parallel I/O interface; parallel I/O clients communicate over the network with parallel I/O servers backed by secondary and tertiary storage]
41
Automatic performance optimization: a model-based approach
[Diagram: workload and platform characteristics feed the Panda optimizer, whose performance model and optimization algorithms produce I/O execution plans against secondary storage]
42
Performance studies
  • Platforms
  • CTC SP and ANL SP
  • Benchmarks
  • Entire array benchmark
  • Out-of-core benchmark
  • Optimized parameters
  • Array disk layouts
  • Disk unit sizes
  • Communication strategies
  • Performance metrics
  • Peak file system bandwidth utilization per I/O
    node

43
Platform characteristics

                     CTC SP2                  ANL SP
  Each node          POWER2                   POWER2 Super Chip
  Interconnect type  High-performance switch  TB3 switch
  Speed              40 MB/s                  150 MB/s
  MPI latency        56 microsec              29 microsec
  MPI bandwidth      32.4 MB/s                90 MB/s
  AIX JFS reads      6.6 MB/s                 7.1 MB/s
  AIX JFS writes     6.3 MB/s                 6.6 MB/s
44
Array disk layout selection on the CTC SP2
(entire array benchmark)
45
Array disk layout selection on the CTC SP2
(out-of-core benchmark)
[Bar chart: fraction of peak AIX JFS throughput per I/O node for Panda-selected vs. default disk layouts, with 8, 16, and 32 compute nodes and 4, 5, 6, or 8 I/O nodes]
46
Application experience - Panda and
Cactus/H3expresso
  • Cactus and H3expresso (Ed Seidel's group)
  • Cactus: modular computational infrastructure for rapid development of large-scale numerical codes
  • H3expresso: solves the Einstein equations in 3D
  • Periodic output of simulation results
  • A 144x144x144 run of 100 iterations outputs 570 MB

47
How can Panda help CSAR scientists?
  • Address major I/O issues faced by CSAR
    applications (AMR, Data migration)
  • Relieve CSAR scientists of scientific data management headaches
  • High-performance I/O strategies for a wide range
    of situations
  • User-friendly parallel I/O system
  • Automatic data management and I/O performance
    optimization

48
How can CSAR scientists help Panda?
  • Help the Panda developers understand the I/O
    needs of CSAR applications
  • Provide application testbed suites for evaluating Panda's strategies

49
Conclusions
  • High-level array I/O interface provides
  • Ease of use
  • Application portability
  • Flexibility for underlying implementations
  • Our optimization strategies provide
  • High performance for a wide range of system
    conditions
  • Automatic performance optimization strategy can
  • Select high quality I/O plans without human
    intervention

50
Work in progress
  • I/O support for uneven data distributions
  • AMR-style applications
  • I/O for Windows NT workstation cluster
    environment
  • I/O support on Cray T3E and Origin 2000
  • New release Panda 4.0
  • Remote I/O / data migration facilities

51
Related Work
  • Parallel I/O research
  • Collective I/O
  • Performance modeling
  • Automatic performance optimization
  • PPFS
  • TIP system
  • Database large query optimization
  • Attribute-managed storage systems