Coupling Parallel IO with Remote Data Access - PowerPoint PPT Presentation

Provided by: ekow
Learn more at: https://sdm.lbl.gov

1
Coupling Parallel IO with Remote Data Access
  • Ekow Otoo, Arie Shoshani, Doron Rotem,
  • and Alex Sim
  • Lawrence Berkeley National Lab.

2
Outline
  • Project objectives
  • Status and accomplishments
  • Usage in an application
  • Extensions
  • Other future work

3
Project Objectives
  • Development of the MpiioSrm library
  • mpiiosrm.h libmpiiosrm.a, libmpiiosrm.so
  • Allows near-online access to files on a mass
    storage system (e.g., HPSS) from MPI
    applications on a Linux cluster
  • Access files from local and remote MSS with MPI
    applications.
  • Target: applications on a Linux cluster having
  • a local parallel file system (PVFS2), and
  • HPSS as the remote mass storage system

4
Status 1: Libmpiiosrm Module Dependencies
Layers, from top to bottom:
  • MPI Applications
  • High Level Access and Control: mpiiosrm
  • Record Structured File Access
  • Low Level File System Access: ADIO
  • File systems: PVFS2, GPFS, UFS, XFS, Other

5
Status 2: Main Functions
  • Functions in libmpiiosrm.a
  • (1) MPI_File_srm_proxy_init()
  • (2) MPI_File_srm_open() in place of
    MPI_File_open()
  • (3) MPI_File_srm_close() in place of
    MPI_File_close()
  • (4) MPI_File_srm_delete() in place of
    MPI_File_delete()
  • (5) MPI_File_srm_proxy_destroy()
  • Functions (2) and (4) each take a file name as one
    of their parameters.
  • Note name changes from last meeting.

6
Status 3: Major Changes Since Last AHM
  • Function names revised
  • MPI_File_srm_proxy_init() starts an SRM client
    as a detached thread.
  • Only the process with the proxy_rank spawns this
    thread.
  • Use of PVFS2
  • srm_put() implemented for MPI_File writes, i.e.,
    files can now migrate from PVFS2 to HPSS.
  • Still being tested.

7
Usage in an Application
  • Steps for reading remote files
  • Prepare an input file for the program
  • A file containing the file names to be read from
    HPSS if not found in local parallel file system.
  • Run grid-proxy-init (password, etc.)
  • User requires a grid certificate
  • Start a nameserver, a drmServer and a trmServer
  • Compile the program to be executed
  • Run mpiexec -n XX <progname> to access files
    given in the input file.

8
Usage in an Application: Input File Layout
  • Implicit layout of parallel files
  • Uses default PVFS configuration
  • Alternatively use keys of MPI-IO File hints
  • Specify only pairs of source and destination URLs
  • Explicit layout specifies for each file
  • Pairs of source and destination URL
  • Start_IO_Node
  • Striping factor
  • Striping unit
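
The explicit layout above can be sketched as a small parser. This is only an illustration: the slides do not give the input file syntax, so the whitespace-separated line format, the example URLs, and the names `FileLayout` and `parse_layout_line` are all assumptions. The hint keys `striping_factor`, `striping_unit`, and `start_iodevice` are the ones ROMIO documents for MPI-IO file hints; whether MpiioSrm uses exactly these keys is not stated in the source.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FileLayout:
    """One input-file entry (hypothetical structure, not the library's)."""
    source_url: str
    dest_url: str
    start_io_node: Optional[int] = None
    striping_factor: Optional[int] = None
    striping_unit: Optional[int] = None

    def hints(self) -> Dict[str, str]:
        """Translate explicit layout fields into MPI-IO style hint pairs."""
        pairs = {
            "start_iodevice": self.start_io_node,
            "striping_factor": self.striping_factor,
            "striping_unit": self.striping_unit,
        }
        # Implicit layout: nothing is set, so the file system defaults apply.
        return {key: str(value) for key, value in pairs.items() if value is not None}


def parse_layout_line(line: str) -> FileLayout:
    """Parse 'src dst' (implicit) or 'src dst node factor unit' (explicit)."""
    fields = line.split()
    if len(fields) == 2:
        return FileLayout(fields[0], fields[1])
    if len(fields) == 5:
        src, dst, node, factor, unit = fields
        return FileLayout(src, dst, int(node), int(factor), int(unit))
    raise ValueError(f"expected 2 or 5 fields, got {len(fields)}: {line!r}")


# Hypothetical entries; the URL schemes and paths are placeholders.
explicit = parse_layout_line(
    "srm://hpss.example.org/data/run1.dat pvfs2:/mnt/pvfs2/run1.dat 0 4 65536"
)
implicit = parse_layout_line(
    "srm://hpss.example.org/data/run2.dat pvfs2:/mnt/pvfs2/run2.dat"
)
```

In an MPI program, each pair returned by `hints()` would be passed to `MPI_Info_set()` before the file is opened.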

9
Usage in an Application: Program Skeleton
  • MPI_Init()
  • MPI_Info_create()
  • MPI_Info_set()
  • MPI_File_srm_proxy_init()
  • MPI_File_srm_open()
  • <Process file read with standard MPI_File_*
    operations>
  • MPI_File_srm_close()
  • MPI_File_srm_proxy_destroy()
  • MPI_Finalize()

10
Extensions
  • File Control
  • Data vs File Access
  • Multi-site Access
  • Fault tolerance / Failsafe

[Diagram components: MPI Applications, MPI-IO, Srm-Client, Srm-Server, DRM, TRM]

11
Extensions - 2
  • Control of Prefetching of File Bundles
  • Process files, one at a time, by availability
  • Process files, one at a time, by sequence
  • Process files by bundles
  • Data Access instead of File Access only
  • Allow for file filtering at the source SRM
  • Use of select criteria and indexes to generate
    only relevant data
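
The three prefetching policies listed above differ only in when a staged file becomes eligible for processing. The sketch below is one reading of the slide, not the library's behavior: it simulates the policies over a list of file names in the order they arrive from the mass storage system. All function names and file names are hypothetical.

```python
from typing import Iterable, List, Sequence


def by_availability(arrivals: Iterable[str]) -> List[str]:
    """Process each file as soon as it has been staged."""
    return list(arrivals)


def by_sequence(requested: Sequence[str], arrivals: Iterable[str]) -> List[str]:
    """Process files strictly in the requested order, waiting if needed."""
    staged, order, nxt = set(), [], 0
    for name in arrivals:
        staged.add(name)
        # Drain every file that is now next in line and already staged.
        while nxt < len(requested) and requested[nxt] in staged:
            order.append(requested[nxt])
            nxt += 1
    return order


def by_bundles(bundles: Sequence[Sequence[str]],
               arrivals: Iterable[str]) -> List[List[str]]:
    """Process a bundle only once every file in it has been staged."""
    staged, done, order = set(), set(), []
    for name in arrivals:
        staged.add(name)
        for n, bundle in enumerate(bundles):
            if n not in done and staged.issuperset(bundle):
                done.add(n)
                order.append(list(bundle))
    return order


# Hypothetical request and arrival order.
requested = ["a.dat", "b.dat", "c.dat", "d.dat"]
arrivals = ["c.dat", "d.dat", "a.dat", "b.dat"]
bundles = [["a.dat", "b.dat"], ["c.dat", "d.dat"]]

print(by_availability(arrivals))
print(by_sequence(requested, arrivals))
print(by_bundles(bundles, arrivals))
```

Processing by availability never stalls on a slow file, while processing by sequence may leave staged files idle until an earlier one arrives.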

12
Extensions - 3
  • Multi-Site Access
  • Extend access to other MSS implementing SRM
    specs.
  • Access files from multiple sites in a session
  • Extensions to Xrootd servers
  • Fault Tolerance and Failsafe Operations
  • Easier now with multiple srm_client proxies being
    spawned as threads
  • Access from C and Fortran

13
Other Future Work
  • Parallel Multidimensional Index Schemes
  • Repertoire of high and low dimensional indexing
    methods for scientific applications
  • High dimensions
  • Bitmaps (John, Kurt, etc)
  • Others
  • Low Dimensions (1 to 8)
  • R-Tree, Order Preserving Extendible hashing,
  • Multi-level Grid File
  • String Searching Methods: Suffix trees,
    PATRICIA, etc.

14
Other Future Work - cont.
  • Extendible Multidimensional Array Files
  • With extendibility in all dimensions, not just
    one
  • For both dense arrays and sparse arrays
  • Efficiently accessible in MPI with irregular
    distributed array method using map arrays.
  • Multi-resolution array files

15
Other Proposed Activities cont.
Array mapping methods for k dimensions (N elements per dimension, element size s, integer size w; element access cost is for E extensions and constant k)
  • Conventional Method, extendible in 1 dimension only: element access O(1); storage size wk + sN^k
  • Index Array, extendible in any dimension: element access O(1); storage size wNk(k+1) + sN^k
  • Index Array Tree, extendible in any dimension: element access O(ln E); storage size w((k+6)E - 3) + sN^k

16
Example of Mapping Function
[Figure: grid of array cells labeled with their linear storage addresses, with index axes i0 and i1, alongside a Red-Black-Like Binary Tree.]

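
A mapping function for an array that is extendible in every dimension can be sketched as follows. This is a minimal two-dimensional illustration of one known technique (per-dimension records of extension segments, searched by index), not necessarily the authors' scheme. The class name, the growth history, and the use of binary search over the records are assumptions. Each extension appends a new segment at the end of the linear storage, so existing elements never move.

```python
from bisect import bisect_right


class ExtendibleArray2D:
    """Address mapping for a 2-D array that can grow along either dimension."""

    def __init__(self, rows: int, cols: int):
        self.shape = [rows, cols]
        # Per dimension: first index covered by each segment, in growth order.
        self.firsts = [[0], [0]]
        # Per dimension: (start_address, stride) of each segment.
        # The initial block is row-major; dimension 1 gets a sentinel record
        # with start address -1 so that it never wins the comparison below.
        self.records = [[(0, cols)], [(-1, 0)]]

    def extend(self, dim: int, count: int) -> None:
        """Append `count` indices along `dim` as one new segment."""
        start = self.shape[0] * self.shape[1]   # next free linear address
        stride = self.shape[1 - dim]            # extent of the other dimension
        self.firsts[dim].append(self.shape[dim])
        self.records[dim].append((start, stride))
        self.shape[dim] += count

    def address(self, i: int, j: int) -> int:
        """Return the linear address of element (i, j)."""
        if not (0 <= i < self.shape[0] and 0 <= j < self.shape[1]):
            raise IndexError((i, j))
        index = (i, j)
        best = None
        for dim in (0, 1):
            # Latest segment along `dim` whose first index is <= index[dim].
            n = bisect_right(self.firsts[dim], index[dim]) - 1
            start, stride = self.records[dim][n]
            # The element lives in whichever candidate was allocated later.
            if best is None or start > best[0]:
                best = (start, stride, self.firsts[dim][n], dim)
        start, stride, first, dim = best
        return start + (index[dim] - first) * stride + index[1 - dim]


# Hypothetical growth history: start at 3 x 3, then alternate extensions.
arr = ExtendibleArray2D(3, 3)
for dim, count in [(0, 1), (1, 2), (0, 3), (1, 3), (0, 2), (1, 2)]:
    arr.extend(dim, count)

addresses = [arr.address(i, j) for i in range(arr.shape[0])
             for j in range(arr.shape[1])]
```

With binary search, the lookup cost grows with the logarithm of the number of extensions, which is in the spirit of the tree-based variant; a directly indexed table would trade extra storage for constant-time access.
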
17
  • The End