Monte Carlo Linear Algebra Techniques and Their Parallelization - PowerPoint PPT Presentation

Transcript and Presenter's Notes
1
Monte Carlo Linear Algebra Techniques and Their
Parallelization
  • Ashok Srinivasan
  • Computer Science
  • Florida State University

www.cs.fsu.edu/asriniva
2
Outline
  • Background
  • Monte Carlo Matrix-vector multiplication
  • MC linear solvers
  • Non-diagonal splitting
  • Dense implementation
  • Sparse implementation
  • Parallelization
  • Conclusions and future work

3
Background
  • MC linear solvers are old!
  • von Neumann and Ulam (1950)
  • Were not competitive with deterministic
    techniques
  • Advantages of MC
  • Can give approximate solutions fast
  • Feasible in applications such as preconditioning, graph partitioning, information retrieval, pattern recognition, etc.
  • Can yield selected components fast
  • Are very latency tolerant

4
Matrix-vector multiplication
  • Compute $C^j h$, where $C \in \mathbb{R}^{n \times n}$
  • Choose probability and weight matrices such that
  • $C_{ij} = P_{ij} W_{ij}$ and $h_i = p_i w_i$
  • Take a random walk, based on these probabilities
  • Define random variables $X_i$
  • $X_0 = w_{k_0}$, and $X_i = X_{i-1} W_{k_i k_{i-1}}$
  • $E(X_j \delta_{i k_j}) = (C^j h)_i$
  • Each random walk can be used to estimate the $k_j$-th component of $C^j h$ (see the sketch below)
  • Convergence rate is independent of $n$

[Diagram: random walk $k_0 \to k_1 \to k_2 \to \cdots \to k_j$, started with probability $p_{k_0}$ and transition probabilities $P_{k_1 k_0}, P_{k_2 k_1}, \ldots$; at the end, update $(C^j h)_{k_j}$]
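As a concrete illustration, here is a minimal NumPy sketch of this estimator. The |entry|-proportional choices $P_{ij} = |C_{ij}| / \sum_i |C_{ij}|$ and $p_i = |h_i| / \sum_i |h_i|$ are assumptions (the slides only require $C_{ij} = P_{ij} W_{ij}$ and $h_i = p_i w_i$), as are the function name and the requirement that $C$ has no zero column and $h$ no zero entry:

```python
import numpy as np

def mc_matvec_power(C, h, j, n_walks=100_000, rng=None):
    """Estimate C^j h by random walks of length j."""
    rng = np.random.default_rng() if rng is None else rng
    C = np.asarray(C, dtype=float)
    h = np.asarray(h, dtype=float)
    n = C.shape[0]

    # C_ij = P_ij * W_ij with P column-stochastic: a step k -> k'
    # is taken with probability P[k', k].
    P = np.abs(C) / np.abs(C).sum(axis=0)
    W = np.divide(C, P, out=np.zeros_like(C), where=P > 0)
    p = np.abs(h) / np.abs(h).sum()               # h_i = p_i * w_i
    w = np.divide(h, p, out=np.zeros_like(h), where=p > 0)

    est = np.zeros(n)
    for _ in range(n_walks):
        k = rng.choice(n, p=p)                    # k_0 ~ p
        X = w[k]                                  # X_0 = w_{k_0}
        for _ in range(j):
            k2 = rng.choice(n, p=P[:, k])
            X *= W[k2, k]                         # X_i = X_{i-1} W_{k_i k_{i-1}}
            k = k2
        est[k] += X                               # walk ends at k_j
    return est / n_walks                          # E(X_j δ_{i k_j}) = (C^j h)_i
```

Each walk contributes to exactly one component, which is why selected components can be estimated cheaply and the accuracy depends on the number of walks, not on $n$.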
5
Matrix-vector multiplication ... continued
  • $\sum_{j=0}^{m} C^j h$ too can be similarly estimated
  • $\sum_{j=0}^{m} (BC)^j B h$ will be needed by us
  • It can be estimated using probabilities on both matrices, $B$ and $C$ (see the sketch below)
  • Length of the random walk is twice that of the previous case

[Diagram: random walk $k_0 \to k_1 \to k_2 \to k_3 \to \cdots \to k_{2m+1}$, started with probability $p_{k_0}$ and transition probabilities $P_{k_1 k_0}, P_{k_2 k_1}, P_{k_3 k_2}, \ldots$; after each application of $B$, update $((BC)^j B h)_{k_{2j+1}}$]
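A sketch of this two-matrix variant, under the same assumed |entry|-proportional probabilities: each walk has length $2m+1$, alternates $B$- and $C$-based transitions, and contributes a term after every application of $B$.

```python
import numpy as np

def mc_alternating_sum(B, C, h, m, n_walks=100_000, rng=None):
    """Estimate sum_{j=0}^{m} (BC)^j B h with walks of length 2m+1
    that alternate B- and C-based transitions."""
    rng = np.random.default_rng() if rng is None else rng
    B, C = np.asarray(B, dtype=float), np.asarray(C, dtype=float)
    h = np.asarray(h, dtype=float)
    n = h.size

    def tables(M):
        P = np.abs(M) / np.abs(M).sum(axis=0)     # column-stochastic
        return P, np.divide(M, P, out=np.zeros_like(M), where=P > 0)

    PB, WB = tables(B)
    PC, WC = tables(C)
    p = np.abs(h) / np.abs(h).sum()
    w = np.divide(h, p, out=np.zeros_like(h), where=p > 0)

    est = np.zeros(n)
    for _ in range(n_walks):
        k = rng.choice(n, p=p)                    # k_0 ~ p
        X = w[k]
        for step in range(2 * m + 1):             # visits k_1 .. k_{2m+1}
            P, W = (PB, WB) if step % 2 == 0 else (PC, WC)
            k2 = rng.choice(n, p=P[:, k])
            X *= W[k2, k]
            k = k2
            if step % 2 == 0:                     # B was just applied:
                est[k] += X                       # adds ((BC)^j B h)_{k_{2j+1}}
    return est / n_walks
```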
6
MC linear solvers
  • Solve $Ax = b$
  • Split $A$ as $A = N - M$
  • Write the fixed point iteration
  • $x_{m+1} = N^{-1} M x_m + N^{-1} b = C x_m + h$
  • If we choose $x_0 = h$, then we get
  • $x_m = \sum_{j=0}^{m} C^j h$
  • Estimate the above using the Markov chain technique mentioned earlier

7
Current techniques
  • Choose $N$ to be a diagonal matrix
  • Ensures efficient computation of $C$
  • $C$ is sparse when $A$ is sparse
  • Example: $N =$ diagonal of $A$ yields the Jacobi iteration, and the corresponding MC estimate (see the sketch below)
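To make the scheme concrete, here is a minimal dense sketch of the MC solver with this Jacobi splitting. The |entry|-proportional probabilities and the function name are assumptions, and convergence of $x_m$ requires the spectral radius of $C$ to be below 1:

```python
import numpy as np

def mc_jacobi_solve(A, b, m=20, n_walks=200_000, rng=None):
    """Estimate x_m = sum_{j=0}^{m} C^j h for C = N^{-1} M, h = N^{-1} b,
    with the Jacobi splitting N = diag(A), M = N - A."""
    rng = np.random.default_rng() if rng is None else rng
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    d = np.diag(A)
    C = -(A - np.diag(d)) / d[:, None]            # C = N^{-1} M, zero diagonal
    h = np.asarray(b, dtype=float) / d            # h = N^{-1} b

    P = np.abs(C) / np.abs(C).sum(axis=0)         # assumes no zero column
    W = np.divide(C, P, out=np.zeros_like(C), where=P > 0)
    p = np.abs(h) / np.abs(h).sum()
    w = np.divide(h, p, out=np.zeros_like(h), where=p > 0)

    est = np.zeros(n)
    for _ in range(n_walks):
        k = rng.choice(n, p=p)
        X = w[k]
        est[k] += X                               # j = 0 term: h
        for _ in range(m):
            k2 = rng.choice(n, p=P[:, k])
            X *= W[k2, k]
            k = k2
            est[k] += X                           # adds (C^j h)_{k_j}
    return est / n_walks
```

Note that every partial product along a walk is reused: one length-$m$ walk contributes one sample to each of the $m+1$ terms of the sum.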

8
Properties of MC linear solvers
  • MC techniques estimate the result of a stationary
    iteration
  • Errors from the iterative process
  • Errors from MC
  • Reduce the error by
  • Variance reduction techniques
  • Residual correction
  • Choose a better iterative scheme!

9
Non-diagonal splitting
  • Observations
  • It is possible to construct an efficient MC technique for specific splittings, even if explicit construction of $C$ is computationally expensive
  • It may be possible to represent $C$ implicitly and sparsely, even if $C$ is not actually sparse

10
Our example
  • Choose N to be the diagonal and sub-diagonal of A

[Diagram: $N$ is the lower bidiagonal matrix with diagonal $d_1, d_2, \ldots, d_n$ and sub-diagonal $s_2, \ldots, s_n$; its inverse $N^{-1}$ is lower triangular]
  • Computing $C = N^{-1} M$ explicitly is too expensive
  • Compute $x_m = \sum_{j=0}^{m} (N^{-1} M)^j N^{-1} b$ instead

11
Computing N-1
  • Using O(n) storage and precomputation time, any element of $N^{-1}$ can be computed in constant time (see the sketch below)
  • Define $T(1) = 1$, $T(i+1) = T(i)\, s_{i+1} / d_{i+1}$
  • $(N^{-1})_{ij} =$
  • $0$, if $i < j$
  • $1/d_i$, if $i = j$
  • $(-1)^{i-j}\, (1/d_j)\, T(i)/T(j)$, otherwise
  • The entire $N^{-1}$, if needed, can be computed in O(n²) time
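A direct transcription of these formulas (0-indexed, with s[i] denoting the sub-diagonal entry at position (i, i-1); s[0] is unused; the function names are illustrative):

```python
import numpy as np

def precompute_T(d, s):
    """O(n) precomputation: T[0] = 1, T[i] = T[i-1] * s[i] / d[i]."""
    T = np.ones(len(d))
    for i in range(1, len(d)):
        T[i] = T[i - 1] * s[i] / d[i]
    return T

def ninv_entry(i, j, d, T):
    """(N^{-1})_{ij} in constant time, given the precomputed T."""
    if i < j:
        return 0.0
    if i == j:
        return 1.0 / d[i]
    return (-1) ** (i - j) / d[j] * T[i] / T[j]
```

Filling an $n \times n$ array by calling ninv_entry for every pair reproduces the O(n²) construction of the full inverse mentioned in the last bullet.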

12
Dense implementation
  • Compute N-1 and store in O(n2) space
  • Choose probabilities proportional to the weight
    of the elements
  • Use the alias method to sample (see the sketch below)
  • Precomputation time proportional to the number of
    elements
  • Constant time to generate each sample
  • Estimate $\sum_{j=0}^{m} (N^{-1} M)^j N^{-1} b$
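For reference, here is a compact version of Walker's alias method, which gives the O(n) setup and constant-time-per-sample behavior cited above; this is a standard textbook construction, not code from the presentation:

```python
import numpy as np

def build_alias(p):
    """Walker's alias tables for a discrete distribution p.

    O(n) setup; afterwards each sample costs O(1).
    """
    prob = np.asarray(p, dtype=float) * len(p)
    alias = np.zeros(len(p), dtype=int)
    small = [i for i, q in enumerate(prob) if q < 1.0]
    large = [i for i, q in enumerate(prob) if q >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        alias[s] = l                      # s's leftover mass comes from l
        prob[l] -= 1.0 - prob[s]
        (small if prob[l] < 1.0 else large).append(l)
    for i in small + large:               # numerical leftovers round to 1
        prob[i] = 1.0
    return prob, alias

def alias_sample(prob, alias, rng):
    """Draw one index in constant time."""
    i = rng.integers(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

Building prob, alias = build_alias(P[:, k]) once per column lets every subsequent transition from state k be sampled in O(1).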

13
Experimental results
[Plot: experimental results, walk length 2]
14
Sparse implementation
  • We cannot use O(n²) space or time!
  • Sparse implementation for $M$ is simple
  • Sparse representation of $N^{-1}$ (see the sketch below):
  • Choose $P_{ij} =$
  • $0$, if $i < j$
  • $1/(n-j+1)$, otherwise
  • Sampled from the uniform distribution
  • Choose $W_{ij} = (N^{-1})_{ij} / P_{ij}$
  • Constant time to determine any $W_{ij}$
  • Minor modifications are needed when $s_i = 0$
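A sketch of this constant-time column sampler, written 0-indexed so the slide's $1/(n-j+1)$ becomes $1/(n-j)$; it reuses $d$ and the precomputed $T$ from slide 11 and assumes all $s_i \neq 0$ (the slide notes minor modifications otherwise):

```python
import numpy as np

def sample_ninv_col(j, n, d, T, rng):
    """Sample a row i from column j of N^{-1} and return (i, W_ij).

    P_ij is uniform over the n - j rows i >= j; W_ij = (N^{-1})_{ij} / P_ij.
    Both the sample and its weight cost O(1), so N^{-1} never needs to
    be stored explicitly.
    """
    i = int(rng.integers(j, n))                  # uniform row in {j, ..., n-1}
    if i == j:
        val = 1.0 / d[i]
    else:
        val = (-1) ** (i - j) / d[j] * T[i] / T[j]
    return i, (n - j) * val
```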

15
Parallelization
[Diagram: each processor runs the same algorithm with its own random number generator (Proc 2 with RNG 2, Proc 3 with RNG 3, ...); reported values: 23.7, 23.2, 23.6, 23.5]
  • MC is embarrassingly parallel
  • Identical algorithms are run independently on each processor; only the random number sequences differ (see the sketch below)
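A minimal sketch of this replicated parallelization using Python's multiprocessing: the integrand is only a stand-in for the solver's random walks, and the use of SeedSequence.spawn for independent streams is an assumption (the presentation's runs used one RNG per MPI process).

```python
import numpy as np
from multiprocessing import Pool

def worker(arg):
    """Identical algorithm on every process; only the RNG stream differs."""
    seed_seq, n_samples = arg
    rng = np.random.default_rng(seed_seq)        # independent stream
    # Stand-in estimator: in the solver, this would run n_samples random
    # walks and return a partial estimate vector instead.
    x = rng.random(n_samples)
    return np.exp(x).sum()                       # integral of e^x over [0, 1]

if __name__ == "__main__":
    n_procs, samples_each = 4, 250_000
    streams = np.random.SeedSequence(12345).spawn(n_procs)  # non-overlapping
    with Pool(n_procs) as pool:
        partials = pool.map(worker, [(s, samples_each) for s in streams])
    print(sum(partials) / (n_procs * samples_each))          # ≈ e - 1
```

Because the workers never communicate until the final reduction, the method tolerates high latency and scales trivially with the number of processors.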

16
MPI vs OpenMP on Origin 2000
  • Cache misses cause poor performance of the
    OpenMP parallelization

17
Conclusions and future work
  • Demonstrated that it is possible to have effective MC implementations with non-diagonal splittings too
  • Need to extend this to better iterative schemes
  • Non-replicated parallelization needs to be
    considered