Disconnected Diagrams, Multigrid, Nvidia - PowerPoint PPT Presentation

1
Disconnected Diagrams, Multi-grid, Nvidia
all that†
Richard Brower (Boston University), James Brannick (Penn),
Ron Babich (BU), Kipton Barros (BU), Mike Clark (BU),
George Fleming (Yale), James Osborn (Argonne), Claudio Rebbi (BU)
QCDNA 2008, Regensburg, Sept 5, 2008
† WARNING: Much here is a FUTURE plan, NOT proven
results, but .....
2
What is QCD?
Acronym  Definition
QCD      Qualified Charitable Distribution (IRS)
QCD      Quality, Cost, Delivery
QCD      Quantum Chromodynamics
QCD      QuarkCopyDesk (file extension)
QCD      Quasi-Cyclic Dyadic
QCD      Quick Change Directory
QCD      Quick Claim Deed (real estate)
QCD      Quintessential CD (PC media player)
QCD      Quit Claim Deed (real estate)
QCD      Quality Control Department
3
What do these mean?
  • QCD From Wikipedia, the free encyclopedia
  • Quintessential Player, formerly known as
    Quintessential CD
  • Quality, Cost, Delivery, a three-letter
    acronym used in lean manufacturing
  • Quad City DJ's, Southern rap group
  • Quick Control Dial, a control on many DSLR
    cameras, like the Canon EOS 40D
  • Quote-Comma-Delimited known also as
    Comma-separated values
  • Quantum chromodynamics, the theory
    describing the Strong Interaction

4
Outline
  • Physics (How strange is the proton?)
  • Algorithms (Multi-grid to the rescue?)
  • Hardware (GPU propagator farm?)

5
Physics: Disconnected Diagrams
  • Connected vs.
    Disconnected
  • Want matrix element

6
How strange† is the proton? Who cares?
  • Violation of Standard Model
  • Dark Matter (Neutralino scattering)
  • NuTeV anomaly
  • Nucleon Physics (include u/d and s quarks)
  • iso-scalar Form Factors, nucleon structure
    function, Spin crisis for proton, matrix element
    etc.

† see Lattice 2008: http://conferences.jlab.org/lattice2008/parallel-bytopic-struct.html
S. Collins, G. Bali, A. Schafer: Hunting for the strangeness ... nucleon
Takumi Doi et al.: Strangeness and glue in the nucleon from lattice QCD
Ron Babich et al.: Strange quark content of the nucleon
7
(No Transcript)
8
Direct detection of dark matter
  • In SUSY, the neutralino scatters from a nucleon
    via Higgs exchange
  • The strange scalar matrix element is a major
    uncertainty
  • Uncertainty in $f_{T_s}$ gives up to a factor of 4
    uncertainty in the cross-section!
  • Bottino et al., hep-ph/0111229
  • Ellis et al., hep-ph/0502001

9
Nuclear Experiment
PVES and BNL E734 (νp scattering)
Parity-violating electron scattering (SAMPLE,
HAPPEx, PVA4, G0)
J. Liu et al., arXiv:0706.0226 [nucl-ex]
(see also Young et al., nucl-ex/0605010)
Pate et al., arXiv:0805.2889 [hep-ex]
10
Algorithm
  • Monte Carlo update (long autocorrelation
    times)
  • Global heat bath, aka stochastic estimator
    (zero autocorrelations)
  • Find $\phi = D^{-1}\eta$ for Gaussian, gauge,
    or $Z_2$ noise $\eta$
  • (Zero autocorrelations!)
  • With $\langle \eta^\dagger_x \eta_y \rangle = \delta_{xy}$

$A_{xy}$
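The stochastic estimator on this slide can be illustrated with a minimal sketch, assuming a small real symmetric toy matrix in place of the lattice Dirac operator. The helper names (`solve`, `toy_operator`, `z2_trace_inverse`, `exact_trace_inverse`) and the 1D operator are hypothetical and chosen only for illustration; they are not from the talk.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def toy_operator(n, mass=1.0):
    """Periodic 1D nearest-neighbour operator, an assumed stand-in for D."""
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        d[i][i] = 2.0 + mass
        d[i][(i + 1) % n] -= 1.0
        d[i][(i - 1) % n] -= 1.0
    return d


def z2_trace_inverse(d, n_samples, rng):
    """Estimate Tr D^{-1} as the average of eta^T D^{-1} eta over Z2 noise vectors."""
    n = len(d)
    total = 0.0
    for _ in range(n_samples):
        eta = [rng.choice((-1.0, 1.0)) for _ in range(n)]  # <eta_x eta_y> = delta_xy
        phi = solve(d, eta)                                # phi = D^{-1} eta
        total += sum(e * p for e, p in zip(eta, phi))
    return total / n_samples


def exact_trace_inverse(d):
    """Exact Tr D^{-1} from one solve per unit vector, for comparison."""
    n = len(d)
    return sum(solve(d, [1.0 if j == i else 0.0 for j in range(n)])[i] for i in range(n))


if __name__ == "__main__":
    d = toy_operator(8)
    print("exact   :", exact_trace_inverse(d))
    print("estimate:", z2_trace_inverse(d, 2000, random.Random(0)))
```

Each noise vector is drawn independently, so successive estimates are uncorrelated; the price is a statistical error that falls only as one over the square root of the number of samples.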
11
Improving Stochastic Estimate
  • Variance reduction
  • Dilution vs hopping parameter† (short
    distance)
  • Multi-grid vs deflation/truncation† (long
    distance)
  • Curing volume divergence
  • Trace versus gauge fluctuations
  • Better and more sources (all-to-all?)
  • Full multi-grid $O(N \log N)$ trace?

† S. Collins, G. Bali, A. Schafer: Hunting for
the strangeness ... nucleon
12
Trace estimation
  • Two sources of error: gauge noise and error in
    the trace. In this calculation, we largely eliminate
    the second source by calculating a nearly exact
    trace on four time-slices.
  • 864 sources (x12 for color/spin). A given source
    is nonzero on 4 sites on each of 4 time-slices.
  • Minimal spatial separation between sites is .
    Small residual contamination is
    gauge-variant and averages to zero.
  • Equivalent to using a single stochastic source
    with
    extreme dilution.

$4 \times 6^3 = 864$
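The remark that well-separated sources act like a single stochastic source with extreme dilution can be illustrated with a toy sketch. It assumes a real symmetric matrix with exponentially decaying off-diagonal elements as a stand-in for the propagator; the function names and the 1D periodic lattice are hypothetical and are not the setup used in the talk.

```python
import math
import random


def toy_propagator(n, mass=1.0):
    """Assumed stand-in for D^{-1} on a periodic 1D lattice: A_ij = exp(-mass * distance)."""
    def dist(i, j):
        d = abs(i - j)
        return min(d, n - d)
    return [[math.exp(-mass * dist(i, j)) for j in range(n)] for i in range(n)]


def dilution_sets(n, stride):
    """Split sites 0..n-1 into `stride` interleaved subsets (site i goes to subset i % stride)."""
    return [[i for i in range(n) if i % stride == k] for k in range(stride)]


def diluted_estimate(a, subsets, rng):
    """One Z2 noise vector split into diluted sources; returns the sum of eta_S^T A eta_S."""
    n = len(a)
    eta = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    total = 0.0
    for s in subsets:
        total += sum(eta[i] * a[i][j] * eta[j] for i in s for j in s)
    return total


def diluted_variance(a, subsets):
    """Exact variance for real symmetric A: 2 * sum of A_ij^2 over i != j in the same subset."""
    return 2.0 * sum(a[i][j] ** 2 for s in subsets for i in s for j in s if i != j)


if __name__ == "__main__":
    n = 16
    a = toy_propagator(n)
    for stride in (1, 4, 16):
        print("subsets:", stride, "variance:", diluted_variance(a, dilution_sets(n, stride)))
    # With one site per subset the estimate equals the trace for every noise draw.
    print("full dilution:", diluted_estimate(a, dilution_sets(n, n), random.Random(1)))
```

Only pairs of sites that share a source contribute noise, so spreading the nonzero sites far apart suppresses the contamination by the decay of the propagator, at the cost of one solve per subset.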
13
Preliminary Methods
  • Configurations were provided by the LHPC
    Spectrum Collaboration
  • anisotropic lattice with
  • 2 dynamical flavors, Wilson fermion and gauge
    actions
  • 863 configurations
  • 64 (x 12) inversions per configuration at the
    light quark mass, for the nucleon correlators
  • 864 (x 12) inversions per configuration at the
    strange mass, for the trace

14
Strange scalar form factor
15
Ratio approach
  • Conventionally, one extracts the (e.g.
    zero-momentum) form factor from the large t
    behavior of the ratio
  • (or from a similar expression integrated over
    time).
  • Instead, we fit the numerator directly, since
    this allows us
      • to avoid contamination from backward-propagating
        states, which are problematic due to the short
        temporal extent of our lattice (
        ).
      • to explicitly take into account the contribution
        of (forward-propagating) excited states.
  • In the following, we always treat the system
    symmetrically with

16
(No Transcript)
17
Direct fit
  • First, we perform a fit to the nucleon two-point
    function, of the form
  • The coefficients and masses are very
    well-determined, since we are required to
    calculate correlators from all initial times (a
    total of 863 × 64 = 55,232).
  • Next, we perform a fit to the three-point
    function,
  • Here j1 and j2 are the form factors for the
    proton and its first excited state, and j12 is a
    transition matrix element between them. In
    practice, we expect j2 and j12 to absorb the
    contribution of still higher states, and trust
    only j1 to be reliable.

18
Strange scalar form factor
  • For the renormalization-invariant quantity $f_{T_s}$,
    we estimate
  • where we have inserted the physical nucleon
    mass. The second error is the uncertainty in
    relating this mass to the lattice scale, the
    first error is statistical, and no other
    systematics are included.
  • Note that the matrix element in the numerator was
    calculated for a world
    with a 400 MeV pion. If we work
    consistently in such a world by inserting our
    calculated nucleon mass, the scale dependence
    drops out, and
    we find

19
Momentum dependence of $G_S^s(q^2)$
PRELIMINARY
20
Strange axial form factor
PRELIMINARY
  • Results have not been renormalized.
  • Calculated value is distinct from zero at the 3σ
    level.

21
Error $O(L^{3/2}) \Rightarrow$ as $L^3 \Rightarrow \infty$ For Exact Trace in a
Connect correlator,
22
(No Transcript)
23
(No Transcript)
24
(No Transcript)
25
(No Transcript)
26
(No Transcript)
27
(No Transcript)
28
(No Transcript)
29
(No Transcript)
30
(No Transcript)
31
(No Transcript)
32
(No Transcript)
33
(No Transcript)
34
(No Transcript)
35
Most Important New Trick: Multi-grid Variance
Reduction
  • The signal and variance of the first term are down
    by 1 to 2 orders of magnitude because $D_c \approx D$
  • The coarse-level trace for $D_c^{-1}$ is as cheap to
    calculate as the level down operator inverse.
  • This can of course be done recursively, giving (I
    think) an $O(N \log N)$ trace calculation to fixed
    tolerance.
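The formula behind "the first term" is on slides that did not survive in this transcript. As a hedged sketch only, one common two-level split reads as follows, assuming a prolongator $P$ with $P^\dagger P = 1$ and a Galerkin coarse operator $D_c = P^\dagger D P$ (these symbols and definitions are assumptions, not taken from the slides):

```latex
\mathrm{Tr}\, D^{-1}
  = \mathrm{Tr}\left[ D^{-1} - P D_c^{-1} P^\dagger \right]
  + \mathrm{Tr}\left[ D_c^{-1} \right],
\qquad D_c = P^\dagger D P, \quad P^\dagger P = 1 .
```

The second term follows from cyclicity of the trace, $\mathrm{Tr}[P D_c^{-1} P^\dagger] = \mathrm{Tr}[D_c^{-1} P^\dagger P]$. If the coarse operator captures the long-distance part of $D^{-1}$, the first term is small and can be estimated stochastically, while the second is a trace on the coarse lattice to which the same split can be applied again.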

36
HARDWARE
  • Graphics hardware is well suited to highly
    parallel numerical tasks.
  • Hardware vendors provide development tools to
    support high performance computing.
  • NVIDIA's CUDA offers direct access to graphics
    hardware through a programming language similar
    to C.
  • The Dirac-Wilson operator runs at an effective
    68 Gigaflops on the Tesla C870 GPU.
  • On the recently released GTX 280 GPU it runs at
    92 Gigaflops, and we expect improvement pending
    code optimization.
  • (Now 98 Gigaflops; hope to get O(150) Gigaflops)

37
Nvidia GPU architecture
38
Two Generations: Consumer vs HPC GPUs
  • Consumer cards ⇒ High Performance (HPC) GPUs
  • I. 8800 GTX ⇒ Tesla C870
  • (16 multi-processors with 8 cores each)
  • II. GTX 280 ⇒ Tesla C1060
  • (30 multi-processors with 8 cores each)

39
C870 code using 60% of the memory bandwidth.
40
(No Transcript)
41
(No Transcript)
42
(No Transcript)
43
(No Transcript)
44
http://www.scala-lang.org/
45
Future software Plans
  • Need to find out why we are only saturating 60% of
    memory bandwidth
  • Further reduce memory traffic
  • 8 real numbers per SU(3) matrix (2/3 of the 12 used
    now)
  • share spinors in $4^3$ blocks (5/9 of what is used now)
  • Generalize to clover Wilson and Domain Wall
    operators (slightly better flops/mem ratio).
  • DMA between GPUs on Quad system and network for
    cluster
  • Start to design SciDAC API for many-core
    technologies.
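The memory-traffic item on SU(3) storage can be illustrated with a minimal sketch of the 12-real-number scheme, assuming that "12 used now" refers to storing the first two rows and rebuilding the third as the complex conjugate of their cross product. The function names and the example matrix are hypothetical; the 8-number parametrization is not shown.

```python
import cmath
import math


def cross_conj(a, b):
    """Third row of an SU(3) matrix from its first two rows: c = conj(a x b)."""
    return [
        (a[1] * b[2] - a[2] * b[1]).conjugate(),
        (a[2] * b[0] - a[0] * b[2]).conjugate(),
        (a[0] * b[1] - a[1] * b[0]).conjugate(),
    ]


def compress(u):
    """Keep only the first two rows: 6 complex = 12 real numbers."""
    out = []
    for row in u[:2]:
        for z in row:
            out.extend((z.real, z.imag))
    return out


def decompress(p):
    """Rebuild the full 3x3 matrix from the 12 stored real numbers."""
    zs = [complex(p[2 * i], p[2 * i + 1]) for i in range(6)]
    a, b = zs[:3], zs[3:]
    return [a, b, cross_conj(a, b)]


def matmul(x, y):
    """3x3 matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def example_su3():
    """An SU(3) matrix built from two real rotations and a diagonal phase matrix with unit determinant."""
    ph = [cmath.exp(0.3j), cmath.exp(-1.1j), cmath.exp(0.8j)]  # phases sum to zero
    d = [[ph[i] if i == j else 0j for j in range(3)] for i in range(3)]
    c, s = math.cos(0.7), math.sin(0.7)
    r12 = [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]
    c2, s2 = math.cos(-0.4), math.sin(-0.4)
    r23 = [[1.0, 0.0, 0.0], [0.0, c2, s2], [0.0, -s2, c2]]
    return matmul(matmul(r12, d), r23)


if __name__ == "__main__":
    u = example_su3()
    v = decompress(compress(u))
    err = max(abs(v[i][j] - u[i][j]) for i in range(3) for j in range(3))
    print("max reconstruction error:", err)
```

The reconstruction trades a few extra multiplications per link for one third less gauge-field data loaded from memory, which is the favourable direction when the kernel is limited by memory bandwidth.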

46
Tesla 10-Series: What's the Big Deal?
47
Consumer Chip GTX 280 ⇒ Tesla C1060
48
1U Quad S1070 System: $8K
49
CUDA 2.0 (Compute Unified Device Architecture)
  • Can compile CUDA code into highly efficient
    SSE-based multi-threaded C code

50
Need a GPU Dirac Propagator Farm
  • The Clark-Kennedy RHMC Paradox (the faster you go,
    the harder it is to keep up)
  • Analysis is now the Achilles heel
  • Solution: dedicated analysis farm.
  • GPU can deliver O(10) to O(100) gain in flops/$
  • Two quad Tesla ⇒ 1 Sustained Teraflop!
  • Two quad Tesla @ $25K vs. one BG/L rack @ $2,000K

51
Commercial Break
  • BOSTON POST DOC IN SEPT 2009
  • PetaAPPS/SciDAC fellow
  • (QCDNA in Boston Fall 2009?)