Title: Preliminary Look
1 Preliminary Look
- Steven Pieper's Quantum Monte Carlo code
2 What is the code? (As far as I can tell)
- Quantum Monte Carlo
- Generate input file
- Chomp (all the time)
- Solving wave equation/minimizing potentials
- Send minimized nuclei to master
- Possibly redistribute work
- Synchronize
- Lots of repetition
3 Help from the source
- Profiling is already built in
- Time, FLOPS, memory
- Memory profiling works only on IBM/Mac
- FLOP counts are kept in the code itself
- MPI
- Tracks the amount communicated in messages
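To illustrate the kind of built-in profiling described above, here is a minimal sketch of a per-section accumulator for call counts, wall time, and hand-counted flops. The class and method names (`SectionProfiler`, `record`, `timed`, `mflops`, `msec_per_case`) are hypothetical and are not taken from the code itself; the sketch only assumes that each section knows its own flop count.

```python
import time
from collections import defaultdict


class SectionProfiler:
    """Accumulates calls, wall time, and hand-counted flops per named section."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.seconds = defaultdict(float)
        self.flops = defaultdict(int)

    def record(self, name, seconds, flops):
        """Charge one call, its wall time, and its flop count to `name`."""
        self.calls[name] += 1
        self.seconds[name] += seconds
        self.flops[name] += flops

    def timed(self, name, flops, func, *args):
        """Run func(*args) and charge its wall time and a known flop count."""
        start = time.perf_counter()
        result = func(*args)
        self.record(name, time.perf_counter() - start, flops)
        return result

    def mflops(self, name):
        """Millions of floating-point operations per second for one section."""
        secs = self.seconds[name]
        return self.flops[name] / secs / 1e6 if secs > 0 else 0.0

    def msec_per_case(self, name):
        """Average wall time per call, in milliseconds."""
        n = self.calls[name]
        return 1e3 * self.seconds[name] / n if n else 0.0


prof = SectionProfiler()
prof.record("Wavefunctions", seconds=2.0, flops=4_000_000)
prof.record("Wavefunctions", seconds=2.0, flops=4_000_000)
print(prof.mflops("Wavefunctions"))         # 2.0
print(prof.msec_per_case("Wavefunctions"))  # 2000.0
```

The three derived columns in a report of this kind (msec/case, Flop/case, MFLOPS) all follow from the same three accumulated totals, which is why counting flops in the source is enough to get a rate without hardware counters.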
4 How does it perform?
- Computation takes 80-95% of wall-clock time
- MPI is very simple
- One whole nucleus per proc
- Memory Limitation
- Keeps MPI costs very low
- i.e. can run on ethernet
- So, it scales quite well.
5 Basic Algorithm
- Init fbn wave function
- Init some positions (randomly)
- Init wave functions and probability density
- Propagate in time/synchronize procs
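The steps above can be sketched with a toy, serial example. This is not the nuclear code's propagator: it swaps in plain Metropolis sampling of a one-dimensional Gaussian trial function, and every name here (`trial_wavefunction`, `run_walkers`, `alpha`, `step_size`) is an assumption made for illustration only.

```python
import math
import random


def trial_wavefunction(x, alpha=0.5):
    """Hypothetical Gaussian trial function psi(x) = exp(-alpha * x^2)."""
    return math.exp(-alpha * x * x)


def run_walkers(n_walkers=200, n_steps=500, step_size=1.0, seed=1):
    """Toy walk: init positions, init density, then propagate step by step."""
    rng = random.Random(seed)
    # Init some positions (randomly).
    positions = [rng.uniform(-1.0, 1.0) for _ in range(n_walkers)]
    # Init wave function values and the probability density |psi|^2.
    density = [trial_wavefunction(x) ** 2 for x in positions]
    # Propagate: propose a move per walker, accept with the Metropolis rule.
    for _ in range(n_steps):
        for i in range(n_walkers):
            x_new = positions[i] + rng.uniform(-step_size, step_size)
            p_new = trial_wavefunction(x_new) ** 2
            if rng.random() < p_new / density[i]:
                positions[i], density[i] = x_new, p_new
        # A parallel code would synchronize processes here; this sketch is serial.
    return positions


walkers = run_walkers()
second_moment = sum(x * x for x in walkers) / len(walkers)
print(len(walkers), second_moment)
```

For this trial function the sampled density is proportional to exp(-2 alpha x^2), so the second moment should settle near 1/(4 alpha) = 0.5, up to statistical noise.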
6
                     Number     Total   msec/   Flop/     MFLOPS
                     of cases   min.    case    case
Wavefunctions        670250     1.8     0.163   165499    1014.881
2-body Prop.         0          0.0     0.000   0         0.000
3-body Prop. Vijk    0          0.0     0.000   0         0.000
Propagation step     0          0.0     0.000   0         0.000
Other Propagation    0          0.0     0.000   0         0.000
Other vij            0          0.0     0.000   0         0.000
- Total accounted for time 110926. MFLOP, 1.8 Min, 1014.881 MFLOPS
- MFLOP/Wall-second 2857.285
- 96.1% of total compute time is accounted for
- ..
- Master got 53.45 Mbytes, 1.3767 Mbytes/sec
- Master sent 0.00 Mbytes, 0.0000 Mbytes/sec
- Master got 30005 messages, 772.8822 messages/sec
7 Totals for all slaves
                       Num        Total Time   Time/case
                       Cases      Min.         Sec.
Propagation            4059060    116.9        0.002
Branching              379        0.0          0.003
Energies               223050     59.2         0.016
  Kinetic                         22.2         0.006
  2-b Potential                   31.9         0.009
  3-b Potential                   4.4          0.001
  Densities                       0.7          0.000
Other compute          223050     0.3          0.000
All compute            223050     176.4        0.047
Config. write (wall)   223050     6.7          0.002

Above include:
                     Number      Total   msec/   Flop/     MFLOPS
                     of cases    min.    case    case
Wavefunctions        22612208    62.2    0.165   165031    1000.523
2-body Prop.         4059060     39.8    0.588   324854    552.399
3-body Prop. Vijk    4282110     60.1    0.842   805563    957.102
Propagation step     4059060     7.9     0.117   34435     293.892
Other Propagation    4059060     0.6     0.009   2115      237.629
Other vij            223050      4.7     1.251   1210116   967.236
8
Total accounted for time   8918108. MFLOP, 175.2 Min, 848.397 MFLOPS
MFLOP/Wall-second          15321.884
99.3% of total compute time is accounted for
Master got    505.62 Mbytes, 0.8687 Mbytes/sec
Master sent   5.21 Mbytes, 0.0089 Mbytes/sec
Master got    223639 messages, 384.2262 messages/sec
Master wall min. in loop    9.7
Master idle wall min.       9.4
Total available wall min.   184.3
Total compute min.          176.4
Efficiency                  95.7%
Speed up                    18.2
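The summary figures can be cross-checked with a few lines of arithmetic. The formulas below are inferred from the numbers, not stated by the source: efficiency is taken as compute time over available wall time, and speed-up as total compute time over the master's wall time in the loop. Both happen to reproduce the reported values.

```python
# Figures copied from the run summary on slides 7 and 8.
total_mflop = 8918108.0   # total accounted floating-point work, in MFLOP
accounted_min = 175.2     # accounted compute time, minutes
compute_min = 176.4       # total compute time across slaves, minutes
available_min = 184.3     # total available wall time across slaves, minutes
master_loop_min = 9.7     # master wall time in the main loop, minutes

# Assumed definitions (inferred, not given in the source).
mflops = total_mflop / (accounted_min * 60.0)
efficiency = 100.0 * compute_min / available_min
speedup = compute_min / master_loop_min
accounted_pct = 100.0 * accounted_min / compute_min

print(f"{mflops:.1f} MFLOPS, {efficiency:.1f}% efficient, "
      f"speed-up {speedup:.1f}, {accounted_pct:.1f}% accounted")
# prints: 848.4 MFLOPS, 95.7% efficient, speed-up 18.2, 99.3% accounted
```

The small gap between the computed 848.4 and the reported 848.397 MFLOPS is consistent with the accounted time having been rounded to one decimal in the printout.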
9 Data Structures
- Pretty straightforward arrays
- Wave function solution on the grid
- Grows 2 with number of particles
- Quickly moves from FLOPS to memory bound