Title: QDP and Chroma
1QDP and Chroma
- Robert Edwards
- Jefferson Lab
- Collaborators
- Balint Joo
2SciDAC Software Structure
3Overlapping communications and computations
- C(x) = A(x) * shift(B, mu)
- Send face forward non-blocking to neighboring node.
- Receive face into pre-allocated buffer.
- Meanwhile do A*B on interior sites.
- Wait on receive to perform A*B on the face.
- Lazy Evaluation (C style)
- Shift(tmp, B, mu)
- Mult(C, A, tmp)
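The overlap pattern above can be sketched in a single process. This is only an illustration under stated assumptions: the lattice is one-dimensional, the "face" is the one site whose neighbour lives on another node, and `FaceReceive` is a hypothetical stand-in for a non-blocking receive (its "transfer" is just a copy that completes in `wait()`). No real message passing takes place.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a non-blocking receive of one face site.
// A real code would wrap a message-passing handle here.
struct FaceReceive {
    const double* remote;   // where the neighbour's face value lives
    double buffer;          // pre-allocated receive buffer
    bool done;
    explicit FaceReceive(const double* r) : remote(r), buffer(0.0), done(false) {}
    void start() { done = false; }                    // post the receive
    void wait()  { buffer = *remote; done = true; }   // block until the data is here
};

// C(x) = A(x) * B(x+1) on a local sub-lattice of n sites.
// Site n-1 needs B from the neighbouring node (the face).
std::vector<double> mult_shift(const std::vector<double>& A,
                               const std::vector<double>& B,
                               const double& neighbour_face)
{
    assert(A.size() == B.size() && !A.empty());
    const std::size_t n = A.size();
    std::vector<double> C(n);

    FaceReceive recv(&neighbour_face);
    recv.start();                               // 1. post the non-blocking receive

    for (std::size_t x = 0; x + 1 < n; ++x)     // 2. interior sites need no communication
        C[x] = A[x] * B[x + 1];

    recv.wait();                                // 3. wait for the face to arrive
    C[n - 1] = A[n - 1] * recv.buffer;          // 4. finish the face site
    return C;
}
```

The interior loop is the work that hides the transfer latency; only the last assignment depends on the received data.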
4QMP Simple Example
- char buf[size];
- QMP_msgmem_t mm;
- QMP_msghandle_t mh;
- mm = QMP_declare_msgmem(buf, size);
- mh = QMP_declare_send_relative(mm, +x);
- QMP_start(mh);
- // Do computations
- QMP_wait(mh);
- Receiving node coordinates with the same steps, except:
- mh = QMP_declare_receive_from(mm, -x);
- Multiple calls
5Data Parallel QDP/C, C++ API
- Hides architecture and layout
- Operates on lattice fields across sites
- Linear algebra tailored for QCD
- Shifts and permutation maps across sites
- Reductions
- Subsets
- Entry/exit: attach to existing codes
6QDP Type Structure
- Lattice Fields have various kinds of indices
- Color: U^{ab}(x); Spin: \Gamma_{\alpha\beta}; Mixed: \psi^{a}_{\alpha}(x), Q^{ab}_{\alpha\beta}(x)
- Tensor Product of Indices forms Type
- QDP forms these types via nested C++ templating
- Formation of new types (e.g., half fermion) possible
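How nested templates can build such types may be easier to see in a small sketch. The wrappers below (`Scalar`, `Vector`, `Matrix`, `Lattice`) and the `Count` trait are hypothetical names chosen for illustration; they are not the QDP class definitions, only an assumed minimal model of "one template level per kind of index".

```cpp
#include <array>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;
constexpr int Nc = 3;   // number of colours
constexpr int Ns = 4;   // number of spin components

// Each template adds one index level around an inner type.
template <class T>        struct Scalar  { T elem; };                     // no index at this level
template <class T, int N> struct Vector  { std::array<T, N> elem; };      // one index
template <class T, int N> struct Matrix  { std::array<T, N * N> elem; };  // two indices
template <class T>        struct Lattice { std::vector<T> site; };        // site index x

// Types as tensor products: (Lattice) x Spin x Colour x Complex.
using ColorMatrix     = Scalar<Matrix<Complex, Nc>>;           // U^{ab}
using SpinMatrix      = Matrix<Scalar<Complex>, Ns>;           // Gamma_{alpha beta}
using DiracFermion    = Vector<Vector<Complex, Nc>, Ns>;       // psi^a_alpha
using DiracPropagator = Matrix<Matrix<Complex, Nc>, Ns>;       // Q^{ab}_{alpha beta}
using HalfFermion     = Vector<Vector<Complex, Nc>, Ns / 2>;   // new type: two spin components

using LatticeColorMatrix  = Lattice<ColorMatrix>;
using LatticeDiracFermion = Lattice<DiracFermion>;

// Compile-time count of complex numbers stored per site.
template <class T> struct Count;
template <> struct Count<Complex> {
    static constexpr std::size_t value = 1;
};
template <class T> struct Count<Scalar<T>> {
    static constexpr std::size_t value = Count<T>::value;
};
template <class T, int N> struct Count<Vector<T, N>> {
    static constexpr std::size_t value = N * Count<T>::value;
};
template <class T, int N> struct Count<Matrix<T, N>> {
    static constexpr std::size_t value = N * N * Count<T>::value;
};

static_assert(Count<ColorMatrix>::value == 9, "3x3 colour matrix");
static_assert(Count<HalfFermion>::value == 6, "2 spins x 3 colours");
```

A new type such as the half fermion costs one alias; nothing else has to be written for it.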
7Data-parallel Operations
- Unary and binary
- -a; a-b; ...
- Unary functions
- adj(a), cos(a), sin(a), ...
- Random numbers
- // platform independent
- random(a), gaussian(a)
- Comparisons (booleans)
- a < b, ...
- Broadcasts
- a = 0, ...
- Reductions
- sum(a), ...
- Fields have various types (indices): Tensor Product
8QDP Expressions
- QDP/C++ code
- multi1d<LatticeColorMatrix> u(Nd);
- LatticeDiracFermion b, c, d;
- int mu;
- c[even] = u[mu] * shift(b,mu) + 2 * d;
- PETE: Portable Expression Template Engine
- Temporaries eliminated, expressions optimised
9Linear Algebra Implementation
- Naïve ops involve lattice temps: inefficient
- Eliminate lattice temps: PETE
- Allows further combining of operations (adj(x)*y)
- Overlap communications/computations
- Full performance expressions at site level
  // Lattice operation
  A = adj(B) + 2 * C;
  // Lattice temporaries
  t1 = 2 * C;
  t2 = adj(B);
  t3 = t2 + t1;
  A = t3;
  // Merged lattice loop
  for (i = ... ; ... ; ...)
    A[i] = adj(B[i]) + 2 * C[i];
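The difference between lattice temporaries and a merged loop can be illustrated with a minimal expression-template sketch. This is not PETE itself; the class names (`Expr`, `Field`, `Add`, `Scale`, `Adj`) are assumptions made for the example, and the "lattice" holds one complex number per site so that `adj` reduces to complex conjugation.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// CRTP base: tags a type as an expression and recovers its concrete type.
template <class E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// A toy lattice field: one complex number per site.
struct Field : Expr<Field> {
    std::vector<Complex> v;
    explicit Field(std::size_t n, Complex init = Complex(0.0, 0.0)) : v(n, init) {}
    Complex operator[](std::size_t i) const { return v[i]; }
    std::size_t size() const { return v.size(); }

    // Assignment is where the single merged site loop runs.
    template <class E>
    Field& operator=(const Expr<E>& e) {
        const E& rhs = e.self();
        assert(rhs.size() == v.size());
        for (std::size_t i = 0; i < v.size(); ++i) v[i] = rhs[i];
        return *this;
    }
};

// Expression nodes hold references only; no lattice-sized storage.
template <class L, class R>
struct Add : Expr<Add<L, R>> {
    const L& l; const R& r;
    Add(const L& l_, const R& r_) : l(l_), r(r_) {}
    Complex operator[](std::size_t i) const { return l[i] + r[i]; }
    std::size_t size() const { return l.size(); }
};

template <class R>
struct Scale : Expr<Scale<R>> {
    double a; const R& r;
    Scale(double a_, const R& r_) : a(a_), r(r_) {}
    Complex operator[](std::size_t i) const { return a * r[i]; }
    std::size_t size() const { return r.size(); }
};

template <class R>
struct Adj : Expr<Adj<R>> {
    const R& r;
    explicit Adj(const R& r_) : r(r_) {}
    Complex operator[](std::size_t i) const { return std::conj(r[i]); }
    std::size_t size() const { return r.size(); }
};

template <class L, class R>
Add<L, R> operator+(const Expr<L>& l, const Expr<R>& r) { return Add<L, R>(l.self(), r.self()); }

template <class R>
Scale<R> operator*(double a, const Expr<R>& r) { return Scale<R>(a, r.self()); }

template <class R>
Adj<R> adj(const Expr<R>& r) { return Adj<R>(r.self()); }
```

With these definitions, `A = adj(B) + 2.0 * C;` builds a small object of type `Add<Adj<Field>, Scale<Field>>` and evaluates it site by site inside `operator=`, so the temporaries `t1`, `t2`, `t3` never exist as lattice-sized arrays.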
10QDP Optimization
- Optimizations under the hood
- Select numerically intensive operations through template specialization.
- PETE recognises expression templates like
- z = a * x + y
- from type information at compile time.
- Calls machine specific optimised routine (axpyz)
- Optimized routine can use assembler, reorganize loops etc.
- Optimized routines can be selected at configuration time.
- Unoptimized fallback routines exist for portability
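A compact way to picture the dispatch described above: the expression `a * x + y` has its own C++ type, so a template specialization keyed on that type can route the assignment to a dedicated kernel while every other expression falls through to a generic loop. The sketch below assumes real-valued fields; `Evaluator`, `Scaled`, `Sum`, and the counter `axpyz_calls` are hypothetical names, and the `axpyz` here is a plain loop standing in for a machine-specific routine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

template <class E> struct Evaluator;   // defined below

struct Field {
    std::vector<double> v;
    explicit Field(std::size_t n, double init = 0.0) : v(n, init) {}
    double operator[](std::size_t i) const { return v[i]; }
    std::size_t size() const { return v.size(); }

    // The evaluator is selected from the expression's type at compile time.
    template <class E>
    Field& operator=(const E& e) { Evaluator<E>::run(*this, e); return *this; }
};

struct Scaled {                         // represents a * x
    double a; const Field& x;
    double operator[](std::size_t i) const { return a * x[i]; }
    std::size_t size() const { return x.size(); }
};

template <class L, class R>
struct Sum {                            // represents l + r
    const L& l; const R& r;
    double operator[](std::size_t i) const { return l[i] + r[i]; }
    std::size_t size() const { return l.size(); }
};

inline Scaled operator*(double a, const Field& x) { return Scaled{a, x}; }
inline Sum<Scaled, Field> operator+(const Scaled& l, const Field& r) { return Sum<Scaled, Field>{l, r}; }
inline Sum<Field, Field>  operator+(const Field& l, const Field& r)  { return Sum<Field, Field>{l, r}; }

int axpyz_calls = 0;   // counts kernel invocations, for demonstration only

// Stand-in for a machine-specific kernel: z = a*x + y.
void axpyz(double* z, double a, const double* x, const double* y, std::size_t n) {
    ++axpyz_calls;
    for (std::size_t i = 0; i < n; ++i) z[i] = a * x[i] + y[i];
}

// Generic, unoptimized fallback: plain site loop.
template <class E>
struct Evaluator {
    static void run(Field& z, const E& e) {
        assert(e.size() == z.size());
        for (std::size_t i = 0; i < z.size(); ++i) z.v[i] = e[i];
    }
};

// Specialization: the type Sum<Scaled, Field> is exactly "a*x + y".
template <>
struct Evaluator<Sum<Scaled, Field>> {
    static void run(Field& z, const Sum<Scaled, Field>& e) {
        assert(e.size() == z.size());
        axpyz(z.v.data(), e.l.a, e.l.x.v.data(), e.r.v.data(), z.size());
    }
};
```

Writing `z = 3.0 * x + y;` reaches `axpyz`, while `z = x + y;` uses the fallback loop; the choice is made entirely by the compiler from the expression type.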
11Performance Test Case - Wilson Conjugate Gradient
  LatticeFermion psi, p, r;
  Real c, cp, a, d;
  Subset s;
  for(int k = 1; k <= MaxCG; ++k) {
    // c = | r[k-1] |**2
    c = cp;
    // a[k] = | r[k-1] |**2 / <M p[k], M p[k]>
    // Mp = M(u) * p
    M(mp, p, PLUS);   // Dslash
    // d = | mp |**2
    d = norm2(mp, s);
    a = c / d;
    // Psi[k] += a[k] p[k]
    psi[s] += a * p;
    // r[k] -= a[k] Mdag.M.p[k]
    M(mmp, mp, MINUS);
    r[s] -= a * mmp;
    cp = norm2(r, s);
    if ( cp <= rsd_sq ) return;
    // b[k+1] = | r[k] |**2 / | r[k-1] |**2
    b = cp / c;
    // p[k+1] = r[k] + b[k+1] p[k]
    p[s] = r + b * p;
  }
- In C++ significant room for perf. degradation
- Performance limitations in Lin. Alg. Ops (VAXPY) and norms
- Optimization
- Funcs return container holding function type and operands
- At "=", replace expression with optimized code by template specialization
- Performance
- QDP overhead: 1% of peak
- Wilson: QCDOC 283 Mflops/node @ 350 MHz, 4^4/node
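For readers who want to run the iteration from this slide, here is a self-contained sketch with plain `std::vector<double>` in place of lattice fermions. It makes several assumptions not present in the slide: fields are real, `MINUS` applies the transpose as a stand-in for the adjoint, there is no subset argument, and the initial residual is computed explicitly. The function name `cg` and its return convention are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
enum Sign { PLUS, MINUS };   // MINUS applies the transpose (stand-in for Mdag)
using LinOp = std::function<void(Vec& out, const Vec& in, Sign sign)>;

double norm2(const Vec& v) {
    double s = 0.0;
    for (double x : v) s += x * x;
    return s;
}

// Solves (M^T M) psi = chi by conjugate gradient, starting from the given psi.
// Returns the number of iterations used, or -1 if max_cg was exhausted.
int cg(const LinOp& M, const Vec& chi, Vec& psi, double rsd_sq, int max_cg)
{
    assert(psi.size() == chi.size());
    const std::size_t n = chi.size();
    Vec mp(n), mmp(n), r(n);

    // r[0] = chi - M^T M psi[0],  p[1] = r[0]
    M(mp, psi, PLUS);
    M(mmp, mp, MINUS);
    for (std::size_t i = 0; i < n; ++i) r[i] = chi[i] - mmp[i];
    Vec p = r;
    double cp = norm2(r);
    if (cp <= rsd_sq) return 0;

    for (int k = 1; k <= max_cg; ++k) {
        const double c = cp;                 // c = |r[k-1]|^2
        M(mp, p, PLUS);                      // mp = M p
        const double d = norm2(mp);          // d = <M p, M p>
        const double a = c / d;
        for (std::size_t i = 0; i < n; ++i) psi[i] += a * p[i];
        M(mmp, mp, MINUS);                   // mmp = M^T M p
        for (std::size_t i = 0; i < n; ++i) r[i] -= a * mmp[i];
        cp = norm2(r);
        if (cp <= rsd_sq) return k;
        const double b = cp / c;             // b = |r[k]|^2 / |r[k-1]|^2
        for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + b * p[i];
    }
    return -1;
}
```

Each iteration costs two operator applications, two norms and three vector updates of the axpy kind, which is why the slide singles out VAXPY and norms as the linear-algebra bottlenecks.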
12Chroma
- A lattice QCD toolkit/library built on top of QDP
- Library is a module: can be linked with other codes.
- Features
- Utility libraries (gluonic measure, smearing, etc.)
- Fermion support (DWF, Overlap, Wilson, Asqtad)
- Applications
- Spectroscopy, Props & 3-pt funcs, eigenvalues
- Heatbath, HMC
- Optimization hooks: level 3 Wilson-Dslash for Pentium, QCDOC, BG/L, IBM SP-like nodes (via Bagel)
13Software Map
- Autoconf/make based.
- Installed packages leave a bin script for other packages
14Chroma Lib Structure
- Chroma Lattice Field Theory library
- Support for gauge and fermion actions
- Boson action support
- Fermion action support
- Fermion actions
- Fermion boundary conditions
- Inverters
- Fermion linear operators
- Quark propagator solution routines
- Gauge action support
- Gauge actions
- Gauge boundary conditions
- IO routines
- Enums
- Measurement routines
- Eigenvalue measurements
- Gauge fixing routines
- Gluonic observables
- Hadronic observables
- Inline measurements
- Eigenvalue measurements
- Glue measurements
- Hadron measurements
- Smear measurements
- Psibar-psi measurements
- Schroedinger functional
- Smearing routines
- Trace-log support
- Gauge field update routines
- Heatbath
- Molecular dynamics support
- Hamiltonian systems
- HMC trajectories
15Fermion Actions
- Actions are factory objects (foundries)
- Do not hold gauge fields, only params
- Factory/creation functions with gauge field argument
- Takes a gauge field - creates a State and applies fermion BC.
- Takes a State - creates a Linear Operator (dslash)
- Takes a State - creates quark prop. solvers
- Linear Ops are function objects
- E.g., class Foo { int operator() (int x); } fred; // int z = fred(1)
- Argument to CG, MR, etc.: simple functions
- Created with XML
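The chain "action, then State, then linear operator" can be sketched with toy types. Everything below is hypothetical and chosen only to mirror the bullets: `ToyFermAct` stores parameters but no gauge field, `createState` applies a boundary sign, and `linOp` returns a function object that a solver could call as `M(out, in)`. The one-dimensional hopping term is an assumed stand-in, not the Wilson dslash.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

using Field = std::vector<double>;   // toy stand-in for gauge and fermion fields alike

// Gauge field with boundary conditions already applied.
struct State { Field u; };

// Linear operators are function objects: M(out, in).
struct LinearOperator {
    virtual ~LinearOperator() = default;
    virtual void operator()(Field& out, const Field& in) const = 0;
};

// Toy hopping operator on a periodic 1-D lattice:
//   out(x) = in(x) - kappa * u(x) * in(x+1)
class ToyLinOp : public LinearOperator {
public:
    ToyLinOp(std::shared_ptr<const State> s, double kappa)
        : state_(std::move(s)), kappa_(kappa) {}

    void operator()(Field& out, const Field& in) const override {
        const Field& u = state_->u;
        const std::size_t n = u.size();
        assert(in.size() == n);
        out.resize(n);
        for (std::size_t x = 0; x < n; ++x)
            out[x] = in[x] - kappa_ * u[x] * in[(x + 1) % n];
    }
private:
    std::shared_ptr<const State> state_;
    double kappa_;
};

// The action is a factory: it stores parameters only, never a gauge field.
class ToyFermAct {
public:
    ToyFermAct(double kappa, double bc_sign) : kappa_(kappa), bc_sign_(bc_sign) {}

    // Takes a gauge field, creates a State and applies the boundary condition.
    std::shared_ptr<const State> createState(const Field& u) const {
        assert(!u.empty());
        auto s = std::make_shared<State>();
        s->u = u;
        s->u.back() *= bc_sign_;   // sign on the link that wraps around
        return s;
    }

    // Takes a State, creates a linear operator bound to it.
    std::unique_ptr<LinearOperator> linOp(std::shared_ptr<const State> s) const {
        return std::unique_ptr<LinearOperator>(new ToyLinOp(std::move(s), kappa_));
    }
private:
    double kappa_;
    double bc_sign_;
};
```

Because the operator is an object with `operator()`, an inverter only needs to know how to call it; the gauge field and boundary conditions travel inside the State.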
16Fermion Actions - XML
<FermionAction>
  <FermAct>WILSON</FermAct>
  <Kappa>0.11</Kappa>
  <FermionBC>
    <FermBC>SIMPLE_FERMBC</FermBC>
    <boundary>1 1 1 -1</boundary>
  </FermionBC>
  <AnisoParam>
    <anisoP>false</anisoP>
    <t_dir>3</t_dir>
    <xi_0>1.0</xi_0>
    <nu>1.0</nu>
  </AnisoParam>
</FermionAction>
- Tag FermAct is key in lookup map of constructors
- During construction, action reads XML
- FermBC tag invokes another lookup
- XPath used in chroma/mainprogs/main/propagator.cc
- /propagator/Params/FermionAction/FermAct
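A lookup map of constructors keyed by the FermAct string might look roughly like the sketch below. It is an assumed minimal design, not Chroma's factory code: `Params` is a flat string map standing in for the parsed XML, and `registry`, `registerAction`, and `createAction` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

using Params = std::map<std::string, std::string>;   // stand-in for parsed XML

struct FermionAction {
    virtual ~FermionAction() = default;
    virtual std::string name() const = 0;
};

// During construction the action reads its own parameters.
class WilsonAction : public FermionAction {
public:
    explicit WilsonAction(const Params& p) : kappa_(std::stod(p.at("Kappa"))) {}
    std::string name() const override { return "WILSON"; }
    double kappa() const { return kappa_; }
private:
    double kappa_;
};

using Creator = std::function<std::unique_ptr<FermionAction>(const Params&)>;

// The lookup map: tag value -> constructor wrapper.
std::map<std::string, Creator>& registry() {
    static std::map<std::string, Creator> r;
    return r;
}

bool registerAction(const std::string& key, Creator c) {
    assert(c && "creator must be callable");
    return registry().emplace(key, std::move(c)).second;
}

// The FermAct entry selects which constructor runs.
std::unique_ptr<FermionAction> createAction(const Params& p) {
    const std::string& key = p.at("FermAct");
    const auto it = registry().find(key);
    if (it == registry().end())
        throw std::runtime_error("unknown FermAct: " + key);
    return it->second(p);
}

// Registration happens once, during static initialization.
const bool wilson_registered = registerAction(
    "WILSON",
    [](const Params& p) { return std::unique_ptr<FermionAction>(new WilsonAction(p)); });
```

A nested tag such as FermBC could be handled the same way, with the action's constructor performing a second lookup in a separate registry.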
17HMC and Monomials
<Monomials>
  <elem>
    <Name>TWO_FLAVOR_WILSON_FERM_MONOMIAL</Name>
    <FermionAction>
      <FermAct>WILSON</FermAct>
      ...
    </FermionAction>
    <InvertParam>
      <invType>CG_INVERTER</invType>
      <RsdCG>1.0e-7</RsdCG>
      <MaxCG>1000</MaxCG>
    </InvertParam>
    <ChronologicalPredictor>
      <Name>LAST_SOLUTION_4D_PREDICTOR</Name>
    </ChronologicalPredictor>
  </elem>
  <elem> ... </elem>
</Monomials>
- HMC built on Monomials
- Monomials define Nf, gauge, etc.
- Only provide Mom -> deriv(U) and S(U). Pseudoferms not visible.
- Have Nf=2 and rational Nf=1
- Both 4D and 5D versions.
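The idea that a monomial exposes only S(U) and its derivative can be shown with a toy model. The interface and the two polynomial monomials below are hypothetical; real monomials act on gauge links and may hide pseudofermion fields, whereas here U is just a vector of real numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

using Field = std::vector<double>;   // toy stand-in for the gauge field U

// A monomial exposes only its action and the derivative of that action.
struct Monomial {
    virtual ~Monomial() = default;
    virtual double S(const Field& U) const = 0;
    virtual void deriv(Field& F, const Field& U) const = 0;   // F = dS/dU
};

// Toy monomial: S(U) = (c/2) * sum_x U(x)^2
struct QuadraticMonomial : Monomial {
    double c;
    explicit QuadraticMonomial(double c_) : c(c_) {}
    double S(const Field& U) const override {
        double s = 0.0;
        for (double u : U) s += 0.5 * c * u * u;
        return s;
    }
    void deriv(Field& F, const Field& U) const override {
        F.resize(U.size());
        for (std::size_t x = 0; x < U.size(); ++x) F[x] = c * U[x];
    }
};

// Toy monomial: S(U) = g * sum_x U(x)^4
struct QuarticMonomial : Monomial {
    double g;
    explicit QuarticMonomial(double g_) : g(g_) {}
    double S(const Field& U) const override {
        double s = 0.0;
        for (double u : U) s += g * u * u * u * u;
        return s;
    }
    void deriv(Field& F, const Field& U) const override {
        F.resize(U.size());
        for (std::size_t x = 0; x < U.size(); ++x) F[x] = 4.0 * g * U[x] * U[x] * U[x];
    }
};

using MonomialList = std::vector<std::unique_ptr<Monomial>>;

// The integrator only ever sees sums over the list.
double totalAction(const MonomialList& ms, const Field& U) {
    double s = 0.0;
    for (const auto& m : ms) s += m->S(U);
    return s;
}

void totalDeriv(Field& F, const MonomialList& ms, const Field& U) {
    F.assign(U.size(), 0.0);
    Field tmp;
    for (const auto& m : ms) {
        m->deriv(tmp, U);
        for (std::size_t x = 0; x < U.size(); ++x) F[x] += tmp[x];
    }
}
```

Adding a new term to the Hamiltonian then means appending one more object to the list; the update code does not change.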
18Gauge Monomials
- Gauge monomials
- Plaquette
- Rectangle
- Parallelogram
- Monomial constructor will invoke constructor for Name in GaugeAction
<Monomials>
  <elem> ... </elem>
  <elem>
    <Name>WILSON_GAUGEACT_MONOMIAL</Name>
    <GaugeAction>
      <Name>WILSON_GAUGEACT</Name>
      <beta>5.7</beta>
      <GaugeBC>
        <Name>PERIODIC_GAUGEBC</Name>
      </GaugeBC>
    </GaugeAction>
  </elem>
</Monomials>
19Chroma Inline Measurements
<InlineMeasurements>
  <elem>
    <Name>MAKE_SOURCE</Name>
    <Param>...</Param>
    <Prop>
      <source_file>./source_0</source_file>
      <source_volfmt>MULTIFILE</source_volfmt>
    </Prop>
  </elem>
  <elem>
    <Name>PROPAGATOR</Name>
    <Param>...</Param>
    <Prop>
      <source_file>./source_0</source_file>
      <prop_file>./propagator_0</prop_file>
      <prop_volfmt>MULTIFILE</prop_volfmt>
    </Prop>
  </elem>
  <elem> ... </elem>
</InlineMeasurements>
- HMC has Inline meas.
- Chroma.cc is Inline only code.
- Former mainprogs now inline meas.
- Meas. are registered with constructor call.
- Meas. given gauge field, no return value.
- Only communicate to each other via disk (maybe mem. buf.??)
20For More Information
- U.S. Lattice QCD Home Page
- http://www.usqcd.org/
- The JLab Lattice Portal
- http://lqcd.jlab.org/
- High Performance Computing at JLab
- http://www.jlab.org/hpc/