Title: An Overview of TSFCore
1An Overview of TSFCore
- Roscoe A. Bartlett
- 9211, Optimization and Uncertainty Estimation
Sandia is a multiprogram laboratory operated by
Sandia Corporation, a Lockheed Martin
Company, for the United States Department of
Energy under contract DE-AC04-94AL85000.
2TSFCore SAND Reports
Get most recent copy at Trilinos/doc/TSFCore
3Nonlinear Equations: Foundation for all our Work!
- Applications
- Discretized PDEs (e.g. finite element, finite volume, finite difference etc.)
- Network problems (e.g. Xyce)
4Nonlinear Equations Sensitivities
- Related Algorithms
- Gradient-based optimization
- SAND
- NAND
- Nonlinear equations (NLS)
- Multidisciplinary analysis
- Linear (matrix) analysis
- Block iterative solvers
- Eigenvalue problems
- Uncertainty quantification
- SFE
- Stability analysis / continuation
- Transients (ODEs, DAEs)
B. van Bloemen Waanders, R. A. Bartlett, K. R.
Long and P. T. Boggs. Large Scale Non-Linear
Programming PDE Applications and Dynamical
Systems, Sandia National Laboratories,
SAND2002-3198, 2002
5Applications, Algorithms, Linear-Algebra Software
- Key points
- Complex algorithms
- Complex software
- Complex interfaces
- Complex computers
- Duplication of effort?
APP: Application (e.g. MPSalsa, Xyce, SIERRA, NEVADA etc.)
LAL: Linear-Algebra Library (e.g. Petra/Ifpack, PETSc, Aztec etc.)
ANA: Abstract Numerical Algorithm (e.g. optimization, nonlinear solvers, stability analysis, SFE, transient solvers etc.)
6TSFCore
- Key points
- Maximizing development impact
- Software can be run on more sophisticated computers
- Fosters improved algorithm development
7Requirements for TSFCore
- TSFCore should
- Be portable to ASCI platforms
- Provide for stable and accurate numerical computations
- Represent a minimal but complete interface that will result in implementations that are
- Near optimal in computational speed
- Near optimal in storage
- Be independent of computing environment (SPMD, MS, CS etc.)
- Be easy to develop adapters for existing libraries (e.g. Epetra, PETSc etc.)
8Example ANA Linear Conjugate Gradient Solver
9TSFCore Basic Linear Algebra Interfaces
<<create>>
Warning! Unified Modeling Language (UML) Notation!
10TSFCore Basic Linear Algebra Interfaces
<<create>>
11TSFCore Basic Linear Algebra Interfaces
<<create>>
12TSFCore Basic Linear Algebra Interfaces
<<create>>
13TSFCore Basic Linear Algebra Interfaces
- The Key to success!
- Reduction/Transformation Operators
- Supports all needed vector operations
- Data/parallel independence
- Optimal performance
R. A. Bartlett, B. G. van Bloemen Waanders and M.
A. Heroux. Vector Reduction/Transformation
Operators, Accepted to ACM TOMS, 2003
14Background for TSFCore
- 1996: Hilbert Class Library (HCL), Symes and Gockenbach
- Abstract vector spaces, vectors, linear operators
- 2000: Epetra, Heroux
- Concrete multi-vectors
- 2001: Trilinos Solver Framework (TSF) 0.1, Long
- 2001: AbstractLinAlgPack (ALAP) (MOOCHO LA interfaces), Bartlett
- Reduction/transformation operators (RTOp)
- Abstract multi-vectors
15TSFCore Basic Linear Algebra Interfaces
16TSFCore Details
- All interfaces are templated on Scalar type (support real and complex)
- Smart reference-counted pointer class Teuchos::RefCountPtr<> used for all dynamic memory management
- Many operations have default implementations based on very few pure virtual methods
- RTOp operators (and wrapper functions) are provided for many common level-1 vector and multi-vector operations
- Default implementation provided for MultiVector (MultiVectorCols)
- Default implementations provided for serial computation: VectorSpace (SerialVectorSpace), VectorSpaceFactory (SerialVectorSpaceFactory), Vector (SerialVector)
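The point about default implementations resting on very few pure virtual methods can be illustrated with a small sketch. The class names `SketchVector` and `SketchSerialVector` and their element-access methods are hypothetical and are not the TSFCore interfaces (which route vector operations through RTOp operators rather than per-element access); real scalars are assumed. The sketch only shows the pattern: a concrete subclass supplies a handful of methods and inherits everything else.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical interface for illustration only; not the TSFCore API.
template<class Scalar>
class SketchVector {
public:
  virtual ~SketchVector() {}

  // The only methods a concrete subclass has to implement.
  virtual std::size_t dim() const = 0;
  virtual Scalar get(std::size_t i) const = 0;
  virtual void set(std::size_t i, Scalar value) = 0;

  // Default implementations written purely in terms of the methods above.
  // (Real scalars assumed: no complex conjugation in dot().)
  virtual Scalar dot(const SketchVector<Scalar>& x) const {
    Scalar result = Scalar(0);
    for (std::size_t i = 0; i < dim(); ++i) result += get(i) * x.get(i);
    return result;
  }
  // this += alpha * x
  virtual void axpy(Scalar alpha, const SketchVector<Scalar>& x) {
    for (std::size_t i = 0; i < dim(); ++i) set(i, get(i) + alpha * x.get(i));
  }
};

// Serial subclass: supplies storage and the three pure virtual methods only.
template<class Scalar>
class SketchSerialVector : public SketchVector<Scalar> {
public:
  explicit SketchSerialVector(std::size_t n) : data_(n, Scalar(0)) {}
  std::size_t dim() const override { return data_.size(); }
  Scalar get(std::size_t i) const override { return data_[i]; }
  void set(std::size_t i, Scalar value) override { data_[i] = value; }
private:
  std::vector<Scalar> data_;
};
```

A subclass for a different storage scheme would override the same three methods and could additionally override `dot()` or `axpy()` where a faster specialized version exists.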
17Vector-Vector Operations Provided with TSFCore
namespace TSFCore {
template<class Scalar> Scalar sum( const Vector<Scalar>& v );              // result = sum(v(i))
template<class Scalar> Scalar norm_1( const Vector<Scalar>& v );           // result = ||v||1
template<class Scalar> Scalar norm_2( const Vector<Scalar>& v );           // result = ||v||2
template<class Scalar> Scalar norm_inf( const Vector<Scalar>& v_rhs );     // result = ||v||inf
template<class Scalar> Scalar dot( const Vector<Scalar>& x
  ,const Vector<Scalar>& y );                                              // result = x'*y
template<class Scalar> Scalar get_ele( const Vector<Scalar>& v, Index i ); // result = v(i)
template<class Scalar> void set_ele( Index i, Scalar alpha
  ,Vector<Scalar>* v );                                                    // v(i) = alpha
template<class Scalar> void assign( Vector<Scalar>* y, const Scalar& alpha ); // y = alpha
template<class Scalar> void assign( Vector<Scalar>* y
  ,const Vector<Scalar>& x );                                              // y = x
template<class Scalar> void Vp_S( Vector<Scalar>* y, const Scalar& alpha ); // y += alpha
template<class Scalar> void Vt_S( Vector<Scalar>* y, const Scalar& alpha ); // y *= alpha
template<class Scalar> void Vp_StV( Vector<Scalar>* y, const Scalar& alpha
  ,const Vector<Scalar>& x );                                              // y = alpha*x + y
template<class Scalar> void ele_wise_prod( const Scalar& alpha
  ,const Vector<Scalar>& x, const Vector<Scalar>& v, Vector<Scalar>* y );  // y(i) += alpha*x(i)*v(i)
template<class Scalar> void ele_wise_divide( const Scalar& alpha
  ,const Vector<Scalar>& x, const Vector<Scalar>& v, Vector<Scalar>* y );  // y(i) = alpha*x(i)/v(i)
template<class Scalar> void seed_randomize( unsigned int );                // Seed for randomize()
template<class Scalar> void randomize( Scalar l, Scalar u, Vector<Scalar>* v ); // v(i) = random(l,u)
} // end namespace TSFCore
18TSFCore Vectors and Vector Spaces
Mathematical notation
C++ code
template<class Scalar>
Scalar foo( const VectorSpace<Scalar>& S )
{
  Teuchos::RefCountPtr<Vector<Scalar> >
    x = S.createMember(),        // create x
    y = S.createMember();        // create y
  assign( &*x, 1.0 );            // x = 1
  randomize( -1.0, +1.0, &*y );  // y = rand(-1,1)
  Vp_StV( &*y, -2.0, *x );       // y += -2.0 * x
  Scalar gamma = dot(*x,*y);     // gamma = x'*y
  return gamma;
}
19TSFCore Applying a Linear Operator
C++ Prototype
namespace TSFCore {
enum ETransp { NOTRANS, TRANS, CONJTRANS };
template<class Scalar>
class LinearOp : public virtual OpBase<Scalar> {
public:
  virtual void apply(
    ETransp M_trans, const Vector<Scalar>& x, Vector<Scalar>* y
    ,Scalar alpha = 1.0, Scalar beta = 0.0
    ) const = 0;
};
} // end namespace TSFCore
Example
template<class Scalar>
void myOp( const Vector<Scalar>& x, const LinearOp<Scalar>& M
  ,Vector<Scalar>* y )
{
  M.apply( NOTRANS, x, y );
}
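To make the role of the `alpha` and `beta` defaults concrete, here is a minimal sketch of a dense operator with an `apply()` of the same shape. It assumes the usual BLAS-like meaning `y = alpha*op(M)*x + beta*y`, which the slide does not spell out. The class `SketchDenseOp` is hypothetical, works on `std::vector<double>` instead of TSFCore vectors, and treats `CONJTRANS` like `TRANS` because the data are real.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

enum ETransp { NOTRANS, TRANS, CONJTRANS };  // enumerator names follow the slide

// Hypothetical dense, row-major, real-valued operator (not a TSFCore class).
class SketchDenseOp {
public:
  SketchDenseOp(std::size_t rows, std::size_t cols, const std::vector<double>& a)
    : rows_(rows), cols_(cols), a_(a) { assert(a_.size() == rows_ * cols_); }

  // Assumed semantics: y = alpha*op(M)*x + beta*y.
  void apply(ETransp trans, const std::vector<double>& x, std::vector<double>* y,
             double alpha = 1.0, double beta = 0.0) const {
    const bool t = (trans != NOTRANS);        // CONJTRANS equals TRANS for real data
    const std::size_t m = t ? cols_ : rows_;  // length of y
    const std::size_t n = t ? rows_ : cols_;  // length of x
    assert(x.size() == n && y->size() == m);
    for (std::size_t i = 0; i < m; ++i) {
      double sum = 0.0;
      for (std::size_t j = 0; j < n; ++j)
        sum += (t ? a_[j * cols_ + i] : a_[i * cols_ + j]) * x[j];
      // With beta == 0 the old y is never read, so stale values cannot leak in.
      (*y)[i] = alpha * sum + (beta == 0.0 ? 0.0 : beta * (*y)[i]);
    }
  }
private:
  std::size_t rows_, cols_;
  std::vector<double> a_;
};
```

With the defaults, `op.apply(NOTRANS, x, &y)` overwrites `y` with `M*x`; passing `alpha = -1.0, beta = 1.0` instead turns the same call into the residual-style update `y = y - M*x`.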
20Example ANA Linear Conjugate Gradient Solver
21Multi-vector Conjugate-Gradient Solver: Single Iteration
template<class Scalar>
void CGSolver<Scalar>::doIteration(
  const LinearOp<Scalar>& M, ETransp opM_notrans
  ,ETransp opM_trans, MultiVector<Scalar>* X, Scalar a
  ,const LinearOp<Scalar>* M_tilde_inv
  ,ETransp opM_tilde_inv_notrans, ETransp opM_tilde_inv_trans
  ) const
{
  const Index m = currNumSystems_;
  int j;
  if( M_tilde_inv )
    M_tilde_inv->apply( opM_tilde_inv_notrans, *R_, &*Z_ );
  else
    assign( &*Z_, *R_ );
  dot( *Z_, *R_, &rho_[0] );
  if( currIteration_ == 1 ) {
    assign( &*P_, *Z_ );
  }
  else {
    for(j=0;j<m;++j) beta_[j] = rho_[j]/rho_old_[j];
    update( *Z_, &beta_[0], 1.0, &*P_ );
  }
  M.apply( opM_notrans, *P_, &*Q_ );
  dot( *P_, *Q_, &gamma_[0] );
  for(j=0;j<m;++j) alpha_[j] = rho_[j]/gamma_[j];
  update( &alpha_[0], +1.0, *P_, X );
  update( &alpha_[0], -1.0, *Q_, &*R_ );
}
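The iteration above operates on multi-vectors and TSFCore operators. As a self-contained illustration of the same sequence of steps (optional preconditioner, `rho`, `beta`, direction update, operator apply, `gamma`, `alpha`, solution and residual updates), the following sketch runs preconditioned CG for a single right-hand side on `std::vector<double>`. All names here (`cgSolve`, `Op`, `cgDot`, `cgAxpy`) are hypothetical, the operator is assumed symmetric positive definite, and the stopping test on the relative residual is an assumption that does not appear on the slide.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

typedef std::vector<double> Vec;
// An operator is any callable computing out = Op * in (out is pre-sized).
typedef std::function<void(const Vec&, Vec&)> Op;

double cgDot(const Vec& a, const Vec& b) {
  double s = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
  return s;
}

void cgAxpy(double alpha, const Vec& x, Vec& y) {  // y += alpha * x
  for (std::size_t i = 0; i < x.size(); ++i) y[i] += alpha * x[i];
}

// Solves M x = b for SPD M; MtildeInv may be null (no preconditioner).
// Returns the number of iterations performed.
int cgSolve(const Op& M, const Op* MtildeInv, const Vec& b, Vec& x,
            double tol, int maxIter) {
  const std::size_t n = b.size();
  Vec r(n), z(n), p(n), q(n);
  M(x, q);
  for (std::size_t i = 0; i < n; ++i) r[i] = b[i] - q[i];  // r = b - M*x
  const double bnorm = std::sqrt(cgDot(b, b));
  double rhoOld = 0.0;
  for (int iter = 1; iter <= maxIter; ++iter) {
    if (std::sqrt(cgDot(r, r)) <= tol * bnorm) return iter - 1;
    if (MtildeInv) (*MtildeInv)(r, z); else z = r;   // z = M_tilde_inv * r
    const double rho = cgDot(z, r);
    if (iter == 1) {
      p = z;
    } else {
      const double beta = rho / rhoOld;
      for (std::size_t i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
    }
    M(p, q);                                         // q = M * p
    const double gamma = cgDot(p, q);
    const double alpha = rho / gamma;
    cgAxpy(+alpha, p, x);                            // x += alpha * p
    cgAxpy(-alpha, q, r);                            // r -= alpha * q
    rhoOld = rho;
  }
  return maxIter;
}
```

Because the solver only sees `Op` callables and a few vector helpers, swapping the storage or the operator implementation does not touch the algorithm, which is the separation between ANA and linear-algebra library that the slides argue for.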
22The TSFCore Trilinos package
- packages/TSFCore
- src
- interfaces
- Core: VectorSpace, Vector, LinearOp etc
- Solvers: Iterative linear solver interfaces (unofficial!)
- Nonlin: Nonlinear problem interfaces (unofficial!)
- utilities
- Core: Testing etc
- Solvers: Some iterative solvers (CG, BiCG, GMRES)
- Nonlin: Testing etc
- adapters
- mpi-base: Node classes for MPI-based vector spaces
- Epetra: EpetraVectorSpace, EpetraVector etc
- examples
23TSFCore::Nonlin: Interfaces to Nonlinear Problems
- Supported Areas
- NAND optimization
- SAND optimization
- Nonlinear equations
- Multidisciplinary analysis
- Stability analysis / continuation
- SFE
24TSFCore::Nonlin: Interfaces to Nonlinear Problems
State constraints and response functions
- Supported Areas
- SAND
- Nonlinear equations
- Multidisciplinary analysis
- Stability analysis / continuation
- SFE
25Summary
SAND Reports:
R. A. Bartlett, M. A. Heroux and K. R. Long. TSFCore: A Package of Light-Weight Object-Oriented Abstractions for the Development of Abstract Numerical Algorithms and Interfacing to Linear Algebra Libraries and Applications, Sandia National Laboratories, SAND2003-1378, 2003
R. A. Bartlett. TSFCore::Nonlin: An Extension of TSFCore for the Development of Nonlinear Abstract Numerical Algorithms and Interfacing to Nonlinear Applications, Sandia National Laboratories, SAND2003-1377, 2003
Location: Trilinos/doc/TSFCore
26The End
Thank You!
27Extra Slides
28Examples of Non-Standard Vector Operations
Currently in MOOCHO: > 40 vector operations!
29Goals for a Vector Interface
Compute efficiency => Near optimal performance
Optimization developers add new operations => Independence of linear algebra library developers
Compute environment independence => Flexible optimization software
Minimal number of methods => Easy to write adapters
30Approaches to Developing Vector Interfaces
(1) Linear algebra library allows direct access to vector elements
(2) Optimizer-specific interfaces
(3) General-purpose primitive vector operations
31Vector Reduction/Transformation Operators Defined
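- Reduction/Transformation Operators (RTOp) Defined
- $z_1^i \ldots z_q^i \leftarrow \mathrm{op}_t( i, v_1^i \ldots v_p^i, z_1^i \ldots z_q^i )$ : element-wise transformation
- $\beta \leftarrow \mathrm{op}_r( i, v_1^i \ldots v_p^i, z_1^i \ldots z_q^i )$ : element-wise reduction
- $\beta_2 \leftarrow \mathrm{op}_{rr}( \beta_1, \beta_2 )$ : reduction of intermediate reduction objects
- $v_1 \ldots v_p \in \mathbb{R}^n$ : $p$ non-mutable input vectors
- $z_1 \ldots z_q \in \mathbb{R}^n$ : $q$ mutable input/output vectors
- $\beta$ : reduction target object (may be non-scalar (e.g. $\{ y_k, k \}$), or NULL)
- Key to Optimal Performance
- $\mathrm{op}_t()$ and $\mathrm{op}_r()$ applied to entire sets of subvectors ($i = a \ldots b$) independently
- $z_1^{a:b} \ldots z_q^{a:b}, \beta \leftarrow \mathrm{op}( a, b, v_1^{a:b} \ldots v_p^{a:b}, z_1^{a:b} \ldots z_q^{a:b}, \beta )$
- Communication between sets of subvectors only for $\beta \ne$ NULL, $\mathrm{op}_{rr}( \beta_1, \beta_2 ) \rightarrow \beta_2$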
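A small sketch may help make the definition concrete. It implements one user-defined reduction whose target object is non-scalar: the largest absolute element together with its index. The operator is applied to whole subvectors independently, and the intermediate objects are then folded together by a separate reduce-reduction step. The names `MaxAbsEle`, `reduceSubvector`, `reduceReduct` and `applyReduction` are hypothetical and are not the RTOp interface; the chunk loop merely stands in for processes that each own one subvector.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Reduction target object {y_k, k}: largest |v(i)| and its index.
struct MaxAbsEle {
  double value;
  long index;
  MaxAbsEle() : value(-1.0), index(-1) {}  // "empty" state
};

// Reduction of intermediate objects: fold b1 into b2 (ties keep the lower index).
void reduceReduct(const MaxAbsEle& b1, MaxAbsEle* b2) {
  if (b1.index < 0) return;  // nothing to fold
  if (b2->index < 0 || b1.value > b2->value ||
      (b1.value == b2->value && b1.index < b2->index))
    *b2 = b1;
}

// Element-wise reduction applied to a whole subvector i = a .. b-1 in one call.
MaxAbsEle reduceSubvector(const std::vector<double>& v, std::size_t a, std::size_t b) {
  MaxAbsEle beta;
  for (std::size_t i = a; i < b; ++i) {
    const double absVal = std::fabs(v[i]);
    if (absVal > beta.value) { beta.value = absVal; beta.index = static_cast<long>(i); }
  }
  return beta;
}

// Driver standing in for a distributed vector: each chunk plays one process.
MaxAbsEle applyReduction(const std::vector<double>& v, std::size_t numChunks) {
  assert(numChunks > 0);
  MaxAbsEle total;
  const std::size_t n = v.size();
  for (std::size_t c = 0; c < numChunks; ++c) {
    const std::size_t a = c * n / numChunks, b = (c + 1) * n / numChunks;
    reduceReduct(reduceSubvector(v, a, b), &total);  // the only cross-chunk step
  }
  return total;
}
```

The vector implementation never needs to know what the operator computes, and the operator never needs to know how the elements are partitioned; only `reduceReduct` crosses chunk boundaries.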
32Object-Oriented Design for User Defined RTOp
Operators
- Advantages
- Functionality
- Linear-algebra implementations can be changed with no impact on optimizer
- Optimizer developers can unilaterally add new vector operations
- Performance
- Near optimal performance (large subvectors)
- Multiple simultaneous global reductions => no sequential bottlenecks
- No unnecessary temporary vectors or multiple vector read/writes
- Disadvantages
- New concepts, initially harder to understand interfaces?
33RTOp vs. Primitives Communication
128 processors on CPlant
- Compare
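- RTOp (all-at-once reduction (i.e. ISIS++ QMR solver))
- $\{ \alpha, \gamma, \xi, \rho, \varepsilon \} \leftarrow \{ (x^T x)^{1/2}, (v^T v)^{1/2}, (w^T w)^{1/2}, w^T v, v^T t \}$
- Primitives (5 separate reductions)
- $\alpha \leftarrow (x^T x)^{1/2}$, $\gamma \leftarrow (v^T v)^{1/2}$, $\xi \leftarrow (w^T w)^{1/2}$, $\rho \leftarrow w^T v$, $\varepsilon \leftarrow v^T t$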
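The comparison is between one reduction that carries five values and five reductions that each carry one. The sketch below shows the all-at-once variant: each simulated process makes a single pass over its local elements to accumulate all five partial sums, and one combine step per process merges them. The names `FiveSums`, `localSums`, `combine` and `fusedReduction` are hypothetical, and `combine` is only a stand-in for a real global reduction such as an MPI all-reduce over a five-element buffer.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

typedef std::vector<double> Vec;

// Reduction object carrying all five partial sums at once.
struct FiveSums {
  double xx, vv, ww, wv, vt;
  FiveSums() : xx(0.0), vv(0.0), ww(0.0), wv(0.0), vt(0.0) {}
};

// One pass over local elements [a, b) accumulates every sum.
FiveSums localSums(const Vec& x, const Vec& v, const Vec& w, const Vec& t,
                   std::size_t a, std::size_t b) {
  FiveSums s;
  for (std::size_t i = a; i < b; ++i) {
    s.xx += x[i] * x[i];
    s.vv += v[i] * v[i];
    s.ww += w[i] * w[i];
    s.wv += w[i] * v[i];
    s.vt += v[i] * t[i];
  }
  return s;
}

// Stand-in for the global reduction: one call merges all five values.
void combine(const FiveSums& in, FiveSums* inout) {
  inout->xx += in.xx; inout->vv += in.vv; inout->ww += in.ww;
  inout->wv += in.wv; inout->vt += in.vt;
}

struct QmrScalars { double alpha, gamma, xi, rho, epsilon; };

// All-at-once evaluation of the five scalars over numProcs simulated processes.
QmrScalars fusedReduction(const Vec& x, const Vec& v, const Vec& w, const Vec& t,
                          std::size_t numProcs) {
  assert(numProcs > 0);
  const std::size_t n = x.size();
  FiveSums total;
  for (std::size_t p = 0; p < numProcs; ++p)
    combine(localSums(x, v, w, t, p * n / numProcs, (p + 1) * n / numProcs), &total);
  QmrScalars out;
  out.alpha = std::sqrt(total.xx);
  out.gamma = std::sqrt(total.vv);
  out.xi = std::sqrt(total.ww);
  out.rho = total.wv;
  out.epsilon = total.vt;
  return out;
}
```

With primitive operations the same five scalars would require five separate passes over the data and five separate global synchronizations, so the latency of the reduction is paid five times instead of once.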
34RTOp vs. Primitives Multiple Ops and Temporaries
1 processor (gcc 3.1 under Linux)
- Compare
- RTOp (all-at-once reduction)
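- $\max \{ \alpha : x + \alpha d \ge b \} \Rightarrow \min \{ \max( (b - x_i)/d_i, 0 ), \text{ for } i = 1 \ldots n \} \rightarrow \alpha$
- Primitives (5 temporaries, 6 vector operations)
- $-x_i \rightarrow u_i$, $u_i + b \rightarrow v_i$, $v_i / d_i \rightarrow w_i$, $0 \rightarrow y_i$, $\max\{ w_i, y_i \} \rightarrow z_i$, $\min\{ z_i, i = 1 \ldots n \} \rightarrow \alpha$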
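The two variants being timed can be sketched side by side. Both compute the quantity min over i of max((b - x(i))/d(i), 0): the fused version does so in a single pass with no temporaries, while the primitives version chains six vector operations through five temporary vectors. The function names `maxStepFused` and `maxStepPrimitives` are hypothetical, `b` is taken to be a scalar bound, and every `d(i)` is assumed to be nonzero.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

typedef std::vector<double> Vec;

// All-at-once version: one loop, no temporary vectors.
double maxStepFused(const Vec& x, const Vec& d, double b) {
  assert(!x.empty() && x.size() == d.size());
  double alpha = std::numeric_limits<double>::infinity();
  for (std::size_t i = 0; i < x.size(); ++i)
    alpha = std::min(alpha, std::max((b - x[i]) / d[i], 0.0));
  return alpha;
}

// Primitives version: five temporaries (u, v, w, y, z) and six vector operations.
double maxStepPrimitives(const Vec& x, const Vec& d, double b) {
  assert(!x.empty() && x.size() == d.size());
  const std::size_t n = x.size();
  Vec u(n), v(n), w(n), y(n), z(n);
  for (std::size_t i = 0; i < n; ++i) u[i] = -x[i];                  // -x -> u
  for (std::size_t i = 0; i < n; ++i) v[i] = u[i] + b;               // u + b -> v
  for (std::size_t i = 0; i < n; ++i) w[i] = v[i] / d[i];            // v / d -> w
  for (std::size_t i = 0; i < n; ++i) y[i] = 0.0;                    // 0 -> y
  for (std::size_t i = 0; i < n; ++i) z[i] = std::max(w[i], y[i]);   // max(w, y) -> z
  return *std::min_element(z.begin(), z.end());                      // min(z) -> alpha
}
```

Both functions return the same value; the difference is that the second one reads and writes memory six times over and allocates five vectors of length n to get there.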
35Parallel Scalability of MOOCHO
Question: Does OO C++ allow for good scalability for massively parallel computing (i.e. 100 to 10000 processors)?
Serial overhead of MOOCHO (n=2, Np=1): 0.41 milliseconds per rSQP iteration
Overhead of MPI communication (Np=4): 0.42 milliseconds per global reduction
Where is the parallel bottleneck? Is it OO C++ or MPI?
Answer => MPI
- Red Hat Linux cluster (4 nodes)
- 2.0 GHz Intel P4 processors
- MPICH 1.2.2.1