Title: SCEC Data Management in SRB Digital Library
1SCEC Data Management in SRB Digital Library
- Reagan Moore
- George Kremenek
- Yifeng Cui
- Yuanfang Hu
- Jing Zhu
- Marcio Faerman
- San Diego Supercomputer Center
- University of California, San Diego
- SCEC/CME All Hands Meeting
2Outline
- SCEC data size in SRB digital library (DL)
- Data organization in SRB DL
- Data integrity in SRB DL
- Metadata management in SRB DL
3SCEC Data Size
- 3D ground motion collection for the LA basin
- 60 scenarios, 150M data / scenario
- TeraShake 1 & 2
- 4 TeraShake1 runs, 6 TeraShake2 runs (wave propagation & dynamic rupture)
- Each simulation generates
- Surface data and its derivatives
- 1.2 TB surface velocity data
- 0.4 TB velocity magnitude data
- 1.2 TB surface displacement data
- 0.4 TB displacement magnitude data
- 1.2 TB surface seismograms data
- 5 TB volume data (optional)
- Each file has one backup copy at HPSS or SRB tape
- Other data: checkpoints and visualizations
- Total 168 TB, 3.5 million files
4SCEC Data Organization
- 3D ground motion collection for the LA basin
- Organized by scenarios
- TeraShake 1 & 2 simulation
- simulation id
  - visualization
  - checkpoint
  - input
  - output
    - surface-velocity
      - vmag
      - peak
    - surface-displacement
      - dmag
      - peak
    - surface-seismograms
    - volume
5Data Integrity in SRB DL
- Replication
- Output data of each simulation run have backup copies at HPSS or SRB tape
- MD5 checksums
- Surface velocity data of each simulation run have MD5 checksums stored as metadata
- Mutual data backup
- Surface data and seismogram data back each other up
- Codes convert surface data to seismogram data, or seismogram data to surface data
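The checksum check described on this slide can be sketched as follows. This is a minimal illustration in Python, not the code used for the SCEC collections: the function names `md5_of_file` and `verify_file` are hypothetical, and the sketch assumes the recorded checksum has already been read from the file's SRB metadata as a hex string.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for very large files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, recorded_md5):
    """Compare a file's current MD5 with the value recorded as metadata.

    `recorded_md5` is assumed to be a hex string; surrounding whitespace
    and letter case are ignored.
    """
    return md5_of_file(path) == recorded_md5.strip().lower()
```

Reading in chunks matters here because individual simulation outputs are large; loading a whole file into memory just to hash it would not scale.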
6Metadata Management in SRB DL
- Collection level
- Every SCEC collection has a metadata attribute that briefly describes the kind of data in the collection
- File level
- Every surface velocity file has a metadata attribute that stores its md5 checksum for data integrity
- Every seismogram file has metadata attributes that describe properties associated with the file
- Example: Sufmeta xhist00001
    0 simulation_level leaf
    1 simulation_id 7
    2 data_product_type seismogram
    3 data_product_component east
    4 data_product_file_sequence_number 1
    5 computation_endian big
    6 computation_float_size 4
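The `computation_float_size` and `computation_endian` attributes are what a reader needs in order to interpret the raw bytes of a data file. The sketch below shows one way such metadata might be applied in Python. It assumes, purely for illustration, that the file body is a flat sequence of IEEE floats with no header; the slides do not describe the actual file layout, and `decode_floats` is a hypothetical helper.

```python
import struct


def decode_floats(raw, float_size=4, endian="big"):
    """Decode raw bytes into Python floats using metadata-supplied settings.

    Assumption: `raw` is a headerless run of IEEE floats. `float_size` is
    the size in bytes (4 or 8) and `endian` is "big" or "little".
    """
    codes = {4: "f", 8: "d"}
    if float_size not in codes:
        raise ValueError("unsupported float size: %r" % (float_size,))
    if endian not in ("big", "little"):
        raise ValueError("unsupported byte order: %r" % (endian,))
    if len(raw) % float_size != 0:
        raise ValueError("byte count is not a multiple of the float size")
    prefix = ">" if endian == "big" else "<"
    count = len(raw) // float_size
    return list(struct.unpack("%s%d%s" % (prefix, count, codes[float_size]), raw))


# Usage with the values from the example metadata (float size 4, big endian):
sample = struct.pack(">3f", 1.0, 2.5, -3.0)
values = decode_floats(sample, float_size=4, endian="big")
```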
7SRB Metadata Management Tool Set
- Provides facilities to create, display, update, and remove SRB metadata
- For either SRB collections or SRB files
- In an easy-to-use manner
- Each operation takes an input file in a specific format
- Suitable for large-scale metadata operations
- Millions of files in SCEC simulations
- Tool set is located in the SCEC CVS directory /home/cvs/SRB_Tools
8Create Operation
- Function: create new metadata entries for collections/files according to the attribute lists in the input file
- Input file format
- SRB collection
- <dir> collection path (relative or absolute path)
- metadata attribute n value n
- SRB file
- <file> file path (relative or absolute path)
- metadata attribute n value n
- Example
- Command line: ./create.pl input_create
    more input_create
    <dir>/home/sceclib.scec/test/my_test
    DC.title This is a test collection
    DC.date 2006-07-10
    <file>/home/sceclib.scec/test/my_test/test1.dat
    DC.title test file 1
    <file>/home/sceclib.scec/test/my_test/test2.dat
    DC.title test file 2
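For large collections, an input file like the one above would normally be generated rather than typed. The following Python sketch shows one way to do that. It is not part of the SRB tool set; `format_entry` and `build_input` are hypothetical names, and the sketch assumes one tag or attribute per line with the attribute name and value separated by a single space, as the example suggests.

```python
def format_entry(kind, path, attributes):
    """Format one collection ("dir") or file ("file") entry.

    Assumed layout: a "<dir>path" or "<file>path" line followed by one
    "attribute value" line per metadata attribute.
    """
    if kind not in ("dir", "file"):
        raise ValueError("kind must be 'dir' or 'file'")
    lines = ["<%s>%s" % (kind, path)]
    for name, value in attributes.items():
        lines.append("%s %s" % (name, value))
    return "\n".join(lines)


def build_input(entries):
    """Join (kind, path, attributes) tuples into the text of an input file."""
    return "\n".join(format_entry(*entry) for entry in entries) + "\n"


# Usage: reproduce the example input file from this slide.
base = "/home/sceclib.scec/test/my_test"
text = build_input([
    ("dir", base, {"DC.title": "This is a test collection",
                   "DC.date": "2006-07-10"}),
    ("file", base + "/test1.dat", {"DC.title": "test file 1"}),
    ("file", base + "/test2.dat", {"DC.title": "test file 2"}),
])
```

The resulting `text` can be written to a file such as `input_create` and passed to `./create.pl`.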
9Display Operation
- Function: display the metadata of collections/files listed in the input file
- Input file format
- SRB collection: <dir> collection path (relative or absolute path)
- SRB file: <file> file path (relative or absolute path)
- Example
- Command line: ./display.pl input_display
    more input_display
    <dir>/home/sceclib.scec/test/my_test
    <file>/home/sceclib.scec/test/my_test/test1.dat
    <file>/home/sceclib.scec/test/my_test/test2.dat
- Output
    ./display.pl input_display
    /home/sceclib.scec/test/my_test
    0 DC.title This is a test collection
    1 DC.date 2006-07-10
    /home/sceclib.scec/test/my_test/test1.dat
    0 DC.title test file 1
    /home/sceclib.scec/test/my_test/test2.dat
    0 DC.title test file 2
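If the display output needs to be consumed by another script, it can be parsed into a dictionary. The Python sketch below assumes the layout suggested by the example: each path on its own line, followed by lines of the form "index attribute value". The real output of `display.pl` may differ in detail, and `parse_display` is a hypothetical helper.

```python
def parse_display(text):
    """Parse display-style output into {path: {attribute: value}}.

    Assumption: a line whose first token is an integer index is a metadata
    line belonging to the most recent path; any other non-empty line is
    treated as a path.
    """
    result = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        parts = line.split(None, 2)
        if current is not None and len(parts) >= 2 and parts[0].isdigit():
            # parts = [index, attribute, value]; the value may be empty.
            result[current][parts[1]] = parts[2] if len(parts) == 3 else ""
        else:
            current = line
            result[current] = {}
    return result


# Usage with the output from this slide:
output = """/home/sceclib.scec/test/my_test
0 DC.title This is a test collection
1 DC.date 2006-07-10
/home/sceclib.scec/test/my_test/test1.dat
0 DC.title test file 1
/home/sceclib.scec/test/my_test/test2.dat
0 DC.title test file 2
"""
metadata = parse_display(output)
```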
10Update Operation
- Function: update metadata entries for collections/files according to the attribute lists in the input file. If a metadata attribute already exists, its value is updated; otherwise, it is created as a new metadata entry
- Input file format
- SRB collection
- <dir> collection path (relative or absolute path)
- metadata attribute n value n
- SRB file
- <file> file path (relative or absolute path)
- metadata attribute n value n
- Example
- Command line: ./update.pl input_update
    more input_update
    <dir>/home/sceclib.scec/test/my_test
    DC.place SDSC
    <file>/home/sceclib.scec/test/my_test/test1.dat
    DC.date 2006-07-13
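The update rule (overwrite an attribute that already exists, create one that does not) is an "upsert". The sketch below models it on plain Python dictionaries to make the semantics concrete. It does not talk to SRB, and `apply_update` is a hypothetical name rather than a function of the tool set.

```python
def apply_update(existing, updates):
    """Return (merged, created, changed) after applying an update.

    `existing` and `updates` map attribute names to values. Attributes
    already present are overwritten; missing ones are created. The input
    dictionaries are left unmodified.
    """
    merged = dict(existing)
    created, changed = [], []
    for name, value in updates.items():
        if name not in merged:
            created.append(name)
        elif merged[name] != value:
            changed.append(name)
        merged[name] = value
    return merged, created, changed


# Usage: the collection from the example gains a new DC.place attribute.
collection = {"DC.title": "This is a test collection", "DC.date": "2006-07-10"}
merged, created, changed = apply_update(collection, {"DC.place": "SDSC"})
```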
11Remove Operation
- Function: remove all metadata of collections/files listed in the input file
- Input file format
- SRB collection: <dir> collection path (relative or absolute path)
- SRB file: <file> file path (relative or absolute path)
- Example
- Command line: ./remove.pl input_remove
    more input_remove
    <dir>/home/sceclib.scec/test/my_test
    <file>/home/sceclib.scec/test/my_test/test1.dat
    <file>/home/sceclib.scec/test/my_test/test2.dat
12Summary
- SRB digital library provides data management facilities to SCEC
- Large space to hold 100 TB of data
- Replication, checksum, and mutual backup mechanisms for data integrity
- Easy-to-use metadata management tool set