Title: Effective Use of NERSC File Systems
1. Effective Use of NERSC File Systems
- Thomas M. DeBoni
- NERSC/USG
2. Effective Use of NERSC File Systems
- Contents
- Home Directories
- Scratch Space
- Mass Storage
- Networked File Systems
- Resource Conservation
- Examples
- Details
- See also
http://home.nersc.gov/training/tutorials/file.management.html
3. Home Directories
- Your private portion of the file systems on a computer
- Default current working directory when you log in
- Known as the system environment variable HOME
- Usage limited in bytes
  - Max size is 2 GB on T3E, 5 GB on J-90s
  - Warnings issued at 75-90% usage
  - There's not enough space for everybody to use this much at the same time, so migration sometimes happens
- Usage limited in inodes
  - An inode is a file or directory
  - Max number is 6000 on T3E, unsettled on J-90s
  - Warnings issued at 75-90% usage
4. Home Directories, cont.
- HOME
  - is routinely backed up
  - is shared among all J-90s
  - is NOT the fastest file system available
  - should be used for development, debugging, pre- and post-processing, and other administrative tasks
  - should NOT be routinely used by large jobs requiring high performance
- Startup files (.cshrc, .login, etc.) may change your working directory on login (see the sketch below)
- Remove all references to WRK from these files
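- For example, a line like the following in .login (the variable name WRK is only illustrative of the kind of reference to remove) would silently change your working directory at every login:

    # hypothetical .login fragment -- remove lines like this so sessions
    # start in HOME rather than in a work area
    if ($?WRK) cd $WRK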
5. Home Directories, cont.
- Files can be migrated to backing store
  - Largest and oldest files first
  - De-migrate with the dmget command before using (a sketch follows the listing), or
  - Automatically de-migrated when referenced, but with unknown delay
- Example listing:

    killeen 257 ls -al
    total 64
    drwx------   2 u10101   zzz    4096 Sep 21 11:11 .
    drwxr-xr-x   5 u10101   zzz    4096 Sep 21 11:11 ..
    mrw-------   1 u10101   zzz    2414 Sep 21 11:11 decomp.job.log
    mrw-------   1 u10101   zzz    2712 Sep 21 11:11 decomp.job.out
    -rw-------   1 u10101   zzz    2381 Sep 21 11:11 decomp.job2.log
    -rw-------   1 u10101   zzz   11490 Sep 21 11:11 decomp.job2.out

- An "m" in the first column means the file has been migrated
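- A minimal sketch of de-migrating files up front in a batch script, so the job does not stall on its first reference (the directory and file names are illustrative):

    #!/bin/csh -f
    cd $HOME/runs                        # directory holding the migrated files
    # bring the migrated files back to disk before the job needs them
    dmget decomp.job.log decomp.job.out
    # ... the rest of the job can now read them without de-migration delays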
6. Home Directories, cont.
- A word about quotas
- Use the quota command to view them:

    mcurie 154 quota
    File system: /u5
    User deboni, Id 9950
                   Aggregate blocks (512 bytes)      Inodes
    User Quota     3906240  ( 19.2%)                 3500  ( 17.2%)
    Warning        3515616  ( 21.4%)                 2975  ( 20.2%)
    Usage           751416                            602

- User Quota is the maximum usage allowed; Warning is the level at which a warning will be issued; Usage is your current usage
- The percentages show current usage as a fraction of the maximum and of the warning level
7. Scratch Space
- Also known as temporary storage or working storage
- A pool of fast RAID drives
- The fastest file system available
- Unique to each batch system
- Not backed up
- Usage limits are larger than for HOME
  - 75 GB and 5000-6000 inodes on T3E, unsettled on J-90s
- Should be used for large files and high-performance jobs
- This is transient space, and persistence will vary with usage and demand
8. Scratch Space, cont.
- System environment variable TMPDIR
  - Created for each session or batch job
  - Randomly named, so always use TMPDIR to refer to it
  - Deleted at the end of the session or job
- Use TMPDIR if you want the OS to manage your scratch space usage for you (see the sketch below)
- E.g., on the J-90s, you can't log on to the batch machines, and each has its own scratch space, so you can't get at it directly, as you can on the T3E.
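- A minimal batch-script sketch of letting TMPDIR do the management (the program and file names are illustrative):

    #!/bin/csh -f
    # TMPDIR is created for this job and deleted automatically when it ends
    cd $TMPDIR
    hsi "get foo.input"                  # fetch input into the job's scratch space
    ./myprog < foo.input > foo.output
    hsi "put foo.output"                 # save results before TMPDIR disappears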
9. Scratch Space, cont.
- /tmp or /usr/tmp
  - Create directories there for yourself
  - Watch out for name collisions with other users' directories
  - Delete files and directories as you finish with them
- This space will be scavenged depending on demand; largest and oldest files are usually deleted first
- It should be safe for 7 to 14 days
- You must manage this scratch space for yourself (see the sketch below)
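- A sketch of managing it yourself (the directory naming scheme is only a suggestion for avoiding collisions):

    #!/bin/csh -f
    set mydir = /tmp/$USER.myrun    # a name unlikely to collide with other users'
    mkdir -p $mydir
    cd $mydir
    # ... run the job, keeping large intermediate files here ...
    # clean up as soon as you are done, rather than waiting to be scavenged
    cd /tmp
    rm -rf $mydir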
10. Scratch Space, cont.
- Pre-staging files to scratch space is a good idea, but...
  - You don't know when your batch job will run, so it may not work when batch queues are heavily loaded
- Staging files in a batch script is a good idea, but...
  - It idles your processor ensemble and uses up serial time
- So, do it both ways:

    #!/bin/csh -f
    # Change to scratch directory
    cd /tmp/mydir
    # Check for presence of pre-staged files
    if (-e foo.input) then
        echo "input file prestaged"
    else
        echo "fetching input file"
        hsi "get foo.input"
    endif
11. Scratch Space, cont.
- What about intermediate I/O? Example to follow
- What about I/O from parallel programs?
  - This is a deep topic
  - Beware of interleaving of multiple outputs to a single file
  - J-90 codes typically use a small number of files at a time
  - T3E codes may use hundreds of files at a time
  - Special mechanisms exist to manage parallel files and I/O
  - General rule: the more you do, the faster it should be
- See these sources for further info:
  - http://home.nersc.gov/software/prgenv/opt/binary.html
  - http://home.nersc.gov/training/tutorials/T3E/IO/
  - http://www.cray.com/swpubs/
    (See, especially, the "CRAY T3E Fortran Optimization Guide", SG-2518 3.0)
12. Mass Storage
- NERSC provides the High Performance Storage System (HPSS)
  - A modern, flexible, hierarchical system
  - Optimized for large files and fast transfers
  - Built on multiple disk farms and tape libraries
  - Used for system backups, file migration, and user archives
- Has multiple user interface utilities (see the sketch below)
  - HSI - powerful, flexible, convenient utility, from SDSC and NERSC
  - pftp - parallel ftp, locally customized, fastest for large files
  - ftp - traditional version, available everywhere
- The proper place to save large, important, long-lived files (e.g. raw output and restart data)
- Requires a separate storage account (DCE account), but can authenticate automatically after initial setup
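- For instance, storing and retrieving a single restart file looks roughly like this with the two main utilities (directory and file names are illustrative):

    # HSI: a one-line command, with semicolons separating subcommands
    hsi "cd my_HPSS_directory; put bigjob.restart"

    # pftp: a short session fed from a here-document
    pftp -i -v archive << EOF
    cd my_HPSS_directory
    get bigjob.restart
    quit
    EOF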
13. Networked File Systems
- Networked or distributed file systems are intended to decouple a file's physical location from its logical location
- Can be very convenient, but also dangerous
- There are three of interest:
  - NFS - developed at Sun; has become a standard in workstation environments
    - Used as little as possible at NERSC, due to security and performance concerns
  - AFS - more modern, global in scope, with pretty good security
    - Used at NERSC via the gateway system dano.nersc.gov
    - Use AFS with care - it can ruin performance
  - DFS - a coming standard that NERSC is evolving toward; also has good security and will be global in scope
14. Resource Conservation
- Critical resources are expensive and rare
- They are shared among (competed for by) all users
- Four critical resources relate to file system use:
  - Storage space - the actual files and bytes of data
  - File system entries - inodes; one per file or directory
  - Bandwidth - bits per second, in transfers between devices
  - Time - servers, I/O devices, and CPU cycles
- NERSC meters (charges for) all these types
- Resource conservation must be engineered in (don't depend on luck)
15. Bandwidth Conservation
- Design parallel I/O carefully
  - Human-readable I/O should probably be done by a single (master) process(or) to/from a single file
  - Binary I/O may be done by one process(or) or by many, as required
  - Binary data may occupy many files which match the problem decomposition for parallel execution
  - Limits exist on the number of files that can be open at any one time
  - Flushing larger buffers is usually a better idea than flushing smaller ones more often
- For further info, see:
  - Cray publication "Application Programmer's I/O Guide"
  - NERSC web doc http://home.nersc.gov/training/tutorials/T3E/IO/
  - Man page for the assign command
16. Bandwidth Conservation, cont.
- Transfer files carefully
  - Session setup is not free - move files to/from mass storage in as few sessions or commands as possible; don't run pftp in a loop
  - Use multiple-file transfer commands, such as mget, when possible
  - Use the appropriate utility for the job
- Meta-data operations do not involve actual file access
  - Renaming files or directories, or moving files around within HPSS
  - Changing file or directory permissions in HPSS
  - Use hsi for these sorts of operations, for efficiency (see the sketch below)
- For further info see:
  - NERSC web doc http://home.nersc.gov/hardware/storage/hpss.html
  - NERSC web doc http://home.nersc.gov/hardware/storage/hsi.html
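- For example, one pftp session with mget moves a whole set of files at once, and renames and permission changes can be batched into a single hsi command (names are illustrative):

    # one session, many files -- not one pftp invocation per file
    pftp -i -v archive << EOF
    cd my_HPSS_directory
    mget run1.*
    quit
    EOF

    # meta-data work (renames, permissions) batched through hsi
    hsi "cd my_HPSS_directory; mv run1.output run1.output.old; chmod 640 run1.output.old"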
17. Bandwidth and Time Conservation
- Use the fastest utility available
  - Use pftp when moving files within the NERSC domain
  - Use multiple-file transfer commands, such as mget, when possible
  - Use ftp when moving files into or out of the NERSC domain
- Use the fastest devices and networks available
  - This is a deep area and oversimplified here...
- Don't make a fast machine wait on a slow one
  - Pre-stage and de-migrate files needed by batch jobs into fast storage space
  - Sometimes a multiple-step process is better (see the sketch below)
    - First, move files from outside NERSC onto a NERSC computer (a workstation)
    - Then, move files from the NERSC computer to the destination device
  - Avoid networked file systems
- For further info see:
  - NERSC web doc http://home.nersc.gov/hardware/storage/hpss.html
  - Man pages for ftp, pftp, and hsi
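- A rough sketch of the multiple-step approach (host and path names are illustrative):

    # Step 1 (from the machine outside NERSC): land the file on a NERSC
    # workstation first, using whatever wide-area tool is available, e.g.
    #   ftp mywork.nersc.gov        (then: put bigjob.input)

    # Step 2 (on the NERSC workstation): move it on to its destination
    # with the faster, NERSC-local pftp
    pftp -i -v archive << EOF
    cd my_HPSS_directory
    put bigjob.input
    quit
    EOF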
18. Bandwidth and Storage Conservation
- Shrink files, if appropriate
  - If the file contains redundant or unimportant data, such as white space in formatted output
  - Use the Unix commands compress and gzip
- Combine files into archives (see the sketch below)
  - If the files are small, transferring them individually may involve more setup time than transfer time
  - Use the Unix commands tar, ar, and cpio
- For more info, see:
  - Man pages for all the above commands
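- A minimal sketch of bundling and shrinking a set of small files before storing them (file names are illustrative); Examples 5 and 6 below show fuller versions using compress and cpio:

    # many small files become one archive, then shrink it
    tar cf run1.tar run1.output* run1.restart*
    gzip run1.tar
    # one transfer instead of many small ones
    hsi "put run1.tar.gz"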
19. Example 1 - batch pftp multiple file access with a here-doc

    #!/bin/csh
    # ...
    # First, copy the source from the submitting directory and compile it.
    pftp -i -v archive << EOF
    cd my_HPSS_directory
    mget data myprog
    quit
    EOF

    ja
    ./myprog <data >outfile
    ja -cst

    # Save the output file in HPSS.
    pftp -i -v archive << EOF
    cd my_HPSS_directory
    mput outfile restart
    quit
    EOF

- The indented pftp command blocks are here-documents
20. Example 2 - batch hsi multiple file access

    #!/bin/csh
    # ...
    # First, copy the source from the submitting directory and compile it.
    hsi archive "cd my_HPSS_directory; mget data myprog"

    ja
    ./myprog <data >outfile
    ja -cst

    # Save the output file in HPSS.
    hsi archive "cd my_HPSS_directory; mput outfile restart"
    exit
21. Example 3 - Minimizing parallel CPU idling during pftp I/O

    #!/bin/csh -f
    # preliminary job steps, including fetching executables and input files
    . . .
    # parallel code execution with mpprun
    set i = 1
    mpprun -n 128 a.out < bigjob.input > bigjob.output
    # Intermediate file movement, to save the output file to mass storage
    mv bigjob.output bigjob.output$i
    mv bigjob.restart bigjob.restart$i
    # Generate a separate serial job to do the actual I/O
    echo "pftp -i -v archive <<EOF\
    mkdir bigjobs/job.06.15.99\
    cd bigjobs/job.06.15.99\
    mput bigjob.output$i bigjob.restart$i\
    ls\
    quit\
    EOF" | qsub -q serial
22. Example 4 - Minimizing parallel CPU idling during HSI I/O

    #!/bin/csh -f
    # preliminary job steps, including fetching executables and input files
    . . .
    # parallel code execution with mpprun
    set i = 1
    mpprun -n 128 a.out < bigjob.input > bigjob.output
    # Intermediate file movement, to save the output file to mass storage
    mv bigjob.output bigjob.output$i
    mv bigjob.restart bigjob.restart$i
    # Generate a separate serial job to do the actual I/O
    echo "hsi archive 'mkdir bigjobs/job.06.15.99; cd bigjobs/job.06.15.99; mput bigjob.output$i bigjob.restart$i; ls'" | qsub -q serial
    . . .
    # further parallel code execution, perhaps through shell script looping
    @ i = $i + 1
    . . .
23. Example 5 - Minimizing usage with tar and compress

    mcurie 181 ls -al STD* bigjob*
    -rw-r--r--  1 deboni  mpccc      0 Dec 23 12:26 STDIN.e48938
    -rw-r--r--  1 deboni  mpccc      0 Dec 23 12:27 STDIN.e48939
    -rw-r--r--  1 deboni  mpccc   1126 Dec 23 12:26 STDIN.l48938
    -rw-r--r--  1 deboni  mpccc   1126 Dec 23 12:27 STDIN.l48939
    -rw-r--r--  1 deboni  mpccc   7181 Dec 23 12:26 STDIN.o48938
    -rw-r--r--  1 deboni  mpccc   7181 Dec 23 12:27 STDIN.o48939
    -rw-------  1 deboni  mpccc    486 Feb  4 11:24 bigjob.output
    -rw-------  1 deboni  mpccc    486 Feb  4 12:10 bigjob.output1
    -rw-------  1 deboni  mpccc    486 Feb  4 12:10 bigjob.output2
    -rw-------  1 deboni  mpccc    972 Feb  4 11:24 bigjob.restart
    -rw-------  1 deboni  mpccc    972 Feb  4 12:10 bigjob.restart1
    -rw-------  1 deboni  mpccc    972 Feb  4 12:10 bigjob.restart2
    -------------------- total space: 20988 bytes and 12 inodes

    mcurie 182 tar cf batch.tar STD* bigjob*
    mcurie 183 ls -al batch.tar
    -rw-------  1 deboni  mpccc  65536 Feb  5 09:15 batch.tar
    mcurie 184 compress batch.tar
    mcurie 185 ls -al batch.tar
24. Example 6 - Minimizing usage with cpio

    cd $HOME
    /bin/find . -type f -size -15000c -atime +90 ! \
        \( -type m -o -type M \) -print > hitlist
    vi hitlist
    cat hitlist | cpio -co > myfiles.cpio
    cat hitlist | xargs rm -f

- Here's what the above commands (NOT a shell script!) do:
  1) First, cd to the home directory and generate a list of eligible files.
  2) The find command will find regular files smaller than 15000 characters that have not been accessed in 90 days and are not migrated.
  3) Use "vi" to examine the list and delete items from it that you do not want removed.
  4) The fourth line will create the cpio file archive.
  5) The fifth line will remove all the files now stored in the archive.
25. Details - some useful ftp and pftp commands

    FTP command        Meaning or action                        PFTP variant
    get <rf> <lf>      retrieve a file                          pget
    put <lf> <rf>      store a file                             pput
    mget <f> <f> ...   retrieve multiple files                  mpget
    mput <f> <f> ...   store multiple files                     mpput
    del <f>            delete a file
    mdel <f> <f> ...   delete multiple files
    mkdir <d>          create a remote directory
    rmdir <d>          delete a remote directory
    cd <d>             change to remote directory
    lcd <d>            change to local directory
    ls, dir            list files in remote directory
    ldir               list files in local directory
    !<cmd>             perform <cmd> locally, outside ftp/pftp

    <f> = file name, <lf> = local file name, <rf> = remote file name, <d> = directory name

- Caveats
  - Be aware of where your actions will take place
  - Watch out for name collisions
26. Details - HSI commands
- HPSS File and Directory Commands

    get, mget, recv              - Copy file(s) from HPSS to a local directory
    cget                         - Copy file from HPSS to a local directory if not already there
    put, mput, replace,
    save, store, send            - Copy local file(s) to HPSS
    cput                         - Copy local file to HPSS if it doesn't already exist there
    cp, copy                     - Copy file within HPSS
    mv, move, rename             - Rename/relocate an HPSS file
    delete, mdelete, erase, rm   - Remove a file from HPSS
    ls, list                     - List directory
27. Details - HSI commands, cont.
- HPSS File and Directory Commands, cont.

    find               - Traverse a directory tree looking for a file
    mkdir, md, add     - Create an HPSS directory
    rmdir, rd, remove  - Delete an HPSS directory
    pwd                - Print current directory
    cd, cdls           - Change current directory

- Local File and Directory Commands

    lcd, lcdls         - Change local directory
    lls                - List local directory
    lpwd               - Print current local directory
    !                  - Issue shell command
28. Details - HSI commands, cont.
- File Administrative Information

    chmod              - Change permissions of a file or directory
    umask              - Set file creation permission mask

- Miscellaneous HSI commands

    help               - Display help file
    quit, exit, end    - Terminate HSI
    in                 - Read commands from a local file
    out                - Write HSI output to a local file
    log                - Write commands and responses to a log file
    prompt             - Toggles prompting for mget, mput, mdelete
29. Details - HSI commands, cont.
- HSI can accept input several different ways
  - From a command session, consisting of multiple lines and ending with an explicit termination command
  - From a single-line command, with semicolons (;) separating commands:

      hsi "mkdir foo; cd foo; put data_file"

  - From a command file:

      hsi in command_file

- HSI can read from standard input and write to standard output:

      tar cvf - . | hsi put - : datadir.tar
      hsi get - : datadir.tar | tar xvf -

- Wildcards are supported, but quoting must be used in one-line commands to prevent shell interpretation:

      hsi "cd foo; mget data*"
30. Details - HSI commands, cont.
- WARNING: For 'get' and 'put' operations, HSI uses a different syntax than ftp: a colon (:) is used to separate local and remote file names.

      put local_file : hpss_file
      get local_file : hpss_file

- Recursive operations are allowed for the following commands:

      cget, chgrp, chmod, chown, cput, delete, get, ls,
      mdelete, mget, mput, put, rm

- Special commands exist for setting up variables whose values are directories, commands, and command-sets.
- The complete HSI manual is online at http://home.nersc.gov/hardware/storage/hsi.html
31. Details - Tasks that HSI Simplifies
- Accessing segmented CFS files
  - CFS handled files larger than 400 MB by splitting them into smaller subfiles and storing the subfiles. HSI is the only utility that can read and rejoin segmented CFS files to reproduce their original state. The procedure for handling such files is quite simple: just read the first of the segmented subfiles from the archive storage system.
- Renaming/moving or copying an entire subdirectory
  - <mv/cp> path1 path2 renames/copies path1 to path2
- Changing the permissions of several files at once (see the sketch below)
  - chmod perms files changes the permissions of all files to perms; the file specifications may include wildcards; the permissions may be given as octal numbers or via symbolic designators.
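- For instance (directory and file names are illustrative):

    # rename an entire subdirectory within HPSS
    hsi "mv old_runs archived_runs"
    # change permissions on several files at once, using octal perms and a wildcard
    hsi "chmod 640 archived_runs/*.restart"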
32. Details - Getting Access to AFS Directories
- Don't do this in batch jobs!

    killeen 210 telnet dano.nersc.gov
    Trying 128.55.200.40...
    Connected to dano.nersc.gov.
    Escape character is '^]'.
    Hello killeen.nersc.gov.

    > WARNING: Unauthorized access to this computer system is      <
    > prohibited, and is subject to criminal and civil penalties.  <
    > Use constitutes consent to security testing and monitoring.  <

    UNIX(r) System V Release 4.0 (dano)

    login: u10101
    Password:
33. Details - Getting Access to AFS Directories, cont.

    AFS gateway user interface

    used to enable AFS access on J90's and T3E
    (select "enable attached hosts" (1) before exiting)

    1) enable attached hosts (knfs)
    2) disable attached hosts (unlog)
    3) list tokens (tokens)
    4) authenticate to another cell (klog)

    5) help
    0) exit (logoff)

    enter command(0-5): 3

    Tokens held by the Cache Manager:
34. Details - Getting Access to AFS Directories, cont.

    enter command(0-5): 0
    Connection closed by foreign host.

    killeen 213 pwd
    /U3/u10101
    killeen 214 cd /afs/nersc.gov
    killeen 215 pwd
    /afs/nersc.gov

- Option 4 is used to attach other cells, regardless of location, but you must have a login and password to use in the klog process.
35. Details - Dealing With Your DCE Account
- DCE is a modern authentication methodology that will likely evolve into general use at NERSC
- Right now, it merely controls access to HPSS
- DCE accounts and login/password info must be obtained from NERSC support staff
- An initial login is necessary to change from the initial password and to set up future automatic authentication
- It has occasionally been necessary for a few users to re-initialize their accounts
- Both procedures are easy
- DCE is currently most reliable on killeen.nersc.gov
36. Details - Dealing With Your DCE Account, cont.
- Initial setup: once you have your initial DCE login and password, change it with the following procedure, on any NERSC mainframe:

    dce_login
    Enter Principal Name: <HPSS_user_name>
    Enter Password: <current_or_temporary_HPSS/DCE_password>
    chpass -p
    Changing registry password for HPSS_user_name
    New password: <new_HPSS/DCE_passwd>
    Re-enter new password: <new_HPSS/DCE_passwd>
    Enter your previous password for verification: <current_or_temporary_HPSS/DCE_password>
    kdestroy
    exit

- You will need to log in to HPSS only on your next use, and thereafter you will be automatically authenticated.
37. Details - Dealing With Your DCE Account, cont.
- If you should get the following message from HPSS...

    mcurie 224 hsi hpss
    credential user mismatch
    use -l option to generate a new cred file
    DCE Principal:

  ...it means automatic authentication has failed.
- You must authenticate manually until you re-initialize authentication:

    mcurie 232 hsi -l hpss
    DCE Principal: u10101
    Password:
    -----------------------------------------------------------
                 NERSC HPSS USER SYSTEM (hpss)
    -----------------------------------------------------------
    V1.5 Username: u10101  UID: 0123
    ? quit

- Subsequent usage should not require full login.
- In rare and unusual situations, do rm .hsipw and then repeat the above.