Title: Memory Leak Detection in CAM
1. Memory Leak Detection in CAM
- Dirty method (Sidd)
- Sophisticated method using the TotalView debugger (Juli)
2. Signature of a Memory Leak

    ps -l -u sghosh

         F S  UID   PID  PPID   C PRI NI   ADDR     SZ WCHAN TTY  TIME CMD
    300001 A 3092 21846 65764 120  60 20 647591 196872         -  0:05 cam
    300001 A 3092 21846 65764 120  60 20 647591 474588         -  0:35 cam
    300001 A 3092 21846 65764  64  60 20 647591 505712         -  1:05 cam
    300001 A 3092 21846 65764  67  60 20 647591 517560         -  1:35 cam
    300001 A 3092 21846 65764  70  60 20 647591 529456         -  2:05 cam
    300001 A 3092 21846 65764  73  60 20 647591 537180         -  2:35 cam
    300001 A 3092 21846 65764  78  60 20 647591 549284         -  3:05 cam
    300001 A 3092 21846 65764  81  60 20 647591 561168         -  3:35 cam
    300001 A 3092 21846 65764  83  60 20 647591 569064         -  4:05 cam
    300001 A 3092 21846 65764  87  60 20 647591 580768         -  4:35 cam

The SZ column keeps growing for the same process (PID 21846) as CPU time accumulates.
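3. A small C routine to get memory usage

    #include <sys/resource.h>
    #include <stdlib.h>

    /* Called from Fortran, which passes arguments by reference. */
    void getrss( int *mem )
    {
        struct rusage usage;
        int rc;

        rc = getrusage( RUSAGE_SELF, &usage );
        *mem = usage.ru_maxrss;   /* peak resident set size */
    }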
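The behaviour of the routine above can be checked outside CAM with a small standalone sketch. The helper `leak_and_measure` is hypothetical (it is not part of the slides): it allocates a buffer, touches it so the pages become resident, never frees it, and returns the growth reported by `getrss`. The 4096-byte stride is an assumed page size, and the return-code check is an addition to the original routine.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/resource.h>

/* Same idea as the slide's routine: report the peak resident set size. */
void getrss(int *mem)
{
    struct rusage usage;
    int rc = getrusage(RUSAGE_SELF, &usage);
    *mem = (rc == 0) ? (int)usage.ru_maxrss : 0;
}

/* Hypothetical helper: leak nbytes on purpose and return the
   growth seen by getrss. */
int leak_and_measure(size_t nbytes)
{
    int before, after;
    volatile char *p;
    size_t i;

    getrss(&before);
    p = malloc(nbytes);
    assert(p != NULL);
    /* Touch one byte per (assumed 4 KiB) stride so the pages
       actually become resident. */
    for (i = 0; i < nbytes; i += 4096)
        p[i] = 1;
    getrss(&after);
    return after - before;   /* p is never freed: a deliberate leak */
}
```

Calling `leak_and_measure((size_t)64 * 1024 * 1024)` should report positive growth on a typical system. Note that `ru_maxrss` is a high-water mark, so it never decreases when memory is freed; this is consistent with a wrapper that reports only when the value increases.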
4. And a Fortran wrapper

    subroutine getmem( file, line )
       use mpi
       character(*) :: file
       integer :: line, mem, err
       integer, save :: prevmem = 0, tid = 0

       ! Look up the MPI rank until the first report has been printed
       if ( prevmem .eq. 0 ) call MPI_Comm_rank( MPI_COMM_WORLD, tid, err )
       call getrss( mem )
       ! Only task 0 reports, and only when memory usage has grown
       if ( tid .eq. 0 .and. mem .gt. prevmem ) then
          write(6,'("From getrss",(a60),"",i6,2x,i8,2x,i8)') &
             trim(file), line, (mem-prevmem), mem
          prevmem = mem
       end if
    end subroutine getmem
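For code written in C rather than Fortran, the same file-and-line reporting can be sketched with a macro. This is an illustrative analogue, not part of the original slides: the names `getmem_c` and `GETMEM` are hypothetical, and the sketch omits the MPI rank check that the Fortran wrapper performs.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

static long prevmem = 0;   /* last reported peak resident set size */

/* Hypothetical C analogue of getmem: print the growth since the last
   report, tagged with the caller's file and line, and return it.
   Returns 0 when nothing has grown or getrusage fails. */
long getmem_c(const char *file, int line)
{
    struct rusage usage;
    long mem, grown;

    assert(file != NULL);
    if (getrusage(RUSAGE_SELF, &usage) != 0)
        return 0;
    mem = usage.ru_maxrss;
    grown = mem - prevmem;
    if (grown <= 0)
        return 0;
    printf("From getrss %s %d %ld %ld\n", file, line, grown, mem);
    prevmem = mem;
    return grown;
}

/* Expands to a call carrying the location of the macro use. */
#define GETMEM() getmem_c(__FILE__, __LINE__)
```

Placing `GETMEM();` after each call of interest gives the same kind of trace as the Fortran wrapper: one line per location where the peak memory has increased since the previous report.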
5. Insert a call to this Fortran wrapper after each routine in the main time loop

    do while ( .not. nlend )
       ! Phase 1 of atmosphere run
       call atm_run1( atm_out, atm_in )
       call getmem(__FILE__,__LINE__)

And the filtered stdout (columns: source file, line, growth since the previous report, current value):

    0:From getrss ../cam.F90 160 3948 571728
    0:From getrss ../cam.F90 160 3948 575676
    0:From getrss ../cam.F90 160 3880 579556
    0:From getrss ../cam.F90 160 3860 583416
    0:From getrss ../cam.F90 160 3864 587280
    0:From getrss ../cam.F90 160 3860 591140
    0:From getrss ../cam.F90 160 3860 595000
    0:From getrss ../cam.F90 160 3860 598860
    0:From getrss ../cam.F90 160 3864 602724
    0:From getrss ../cam.F90 160 3860 606584
    0:From getrss ../cam.F90 160 3860 610444
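The size of the leak can be estimated from the last column of the output above. The sketch below is illustrative only: `mean_growth` is a hypothetical helper, and the array simply copies the eleven readings from the slide.

```c
#include <assert.h>

/* Readings copied from the filtered stdout above (last column). */
static const int cam_rss[] = {
    571728, 575676, 579556, 583416, 587280, 591140,
    595000, 598860, 602724, 606584, 610444
};

/* Hypothetical helper: average growth between consecutive readings. */
double mean_growth(const int *rss, int n)
{
    assert(n >= 2);
    return (double)(rss[n - 1] - rss[0]) / (n - 1);
}
```

For these readings the total growth is 610444 - 571728 = 38716 over 10 intervals, so `mean_growth(cam_rss, 11)` is about 3872 per report. Assuming `ru_maxrss` is reported in kilobytes, as it is on AIX and Linux, that is just under 4 MB leaked each time the call at line 160 is reached.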
6. Line 160 of ../cam.F90

    call atm_run1( atm_out, atm_in )
    call getmem(__FILE__,__LINE__)

- The routine atm_run1, or a routine below it in the call stack, has the leak.
- Insert similar calls there, right after each subsequent routine call.
- The relevant portion of stdout is now:

    0:From getrss ../cam_comp.F90 222 3860 579584
    0:From getrss ../cam_comp.F90 231   20 579604
    0:From getrss ../cam_comp.F90 222 3860 583464
    0:From getrss ../cam_comp.F90 222 3864 587328
    0:From getrss ../cam_comp.F90 222 3860 591188
    0:From getrss ../cam_comp.F90 222 3860 595048
    0:From getrss ../cam_comp.F90 222 3860 598908
    0:From getrss ../cam_comp.F90 222 3864 602772
    0:From getrss ../cam_comp.F90 222 3860 606632

- In a couple of similar steps we arrive at the leaking routine.
- Examine all the allocate/deallocate statements there and fix!
7. Link to malloc replacement library

    configure -spmd -nosmp -dyn fv -res 1x1.25 \
        -cam_exedir ../run -usr_src USRSRC -ldflags \
        "-L/usr/local/totalview/toolworks/totalview.7.1.0-1/rs6000/lib \
        -L/usr/local/totalview/toolworks/totalview.7.1.0-1/rs6000/lib \
        /usr/local/totalview/toolworks/totalview.7.1.0-1/rs6000/lib/aix_malloctype64_5.o"

    build-namelist -csmdata /fis/cgd/cseg/csm/inputdata \
        -o ../run/namelist -namelist "&camexp nsrest=0 nelapse=-5 mss_irt=0 nrefrq=0 /"
8. Run CAM under TotalView (bluesky)

    #!/bin/csh
    #@ account_no = XXXXXXXX
    #@ wall_clock_limit = 14400
    #@ output = out.$(jobid)
    #@ error = err.$(jobid)
    #@ job_type = parallel
    #@ network.MPI = csss,shared,IP
    #@ node_usage = shared
    #@ node = 2
    #@ total_tasks = 2
    #@ class = share
    #@ queue

    setenv MP_PGMMODEL SPMD
    setenv MP_COREFILE_FORMAT xxx
    setenv XLSMPOPTS "stack=100000000"
    setenv OMP_NUM_THREADS 1
    setenv MP_LABELIO yes
    setenv MP_PROCS 2

    totalview poe -a ./cam
11. Analyzing a memory leak takes a while
12. Drilling down to line numbers