Title: Domain Decomposition and Parallel Finite Element Methods
1Domain DecompositionandParallel Finite Element
Methods
Abhijit Bose Center for Advanced Computing,
and Dept. of Electrical Eng.. and Computer
Science University of Michigan Ann Arbor, MI
48109 (734) 615 1490 Email abose_at_umich.edu
2 What is Domain Decomposition ? - divide a
large problem into smaller problems so that
each small problem can be solved on a
single node of a multi-processor
system - most commonly used in solving partial
and ordinary differential equations in
computational fluid dynamics, structural
mechanics, electrical circuits,
automotive engineering - often used for solving
large systems of finite element and finite
volume meshes on distributed memory
supercomputers and clusters
3Computational Fluid Dynamics
4Outline Part I An introduction to Domain
Decomposition via the Substructuring
Method Part II A practical implementation of
some of the concepts in C
5Part I
6(No Transcript)
7(No Transcript)
8Processor zero effectively serialized gt Lack of
scalability gt Poor Performance!!
9But, Can we solve the Schur complement problem in
parallel ?
10(No Transcript)
11(No Transcript)
12(No Transcript)
13(No Transcript)
14(No Transcript)
15(No Transcript)
16(No Transcript)
17(No Transcript)
18Part II Implementation in C (a) Mesh
Partitioning - data structure often requires
in-house development (b) Parallel
Linear Solvers - many publicly available
packages available nowadays (c)
Recursive Interface Partitioning - data
structure often requires in-house
development
19 Examples are taken from the Pfinics Project
Scalable Parallel h-p Finite Element Methods
Code Developed by Abhijit Bose (Univ. of
Michigan)
Valmor F. de Almeida (ORNL) Some of the
software are in public domain. Please send an
email to abose_at_umich.edu with requests for
software. A Globus-enabled version of
Pfinics that can be installed in wide-area
distributed Grid Computing environments will
be available in 1Q/2003. The next few slides
will introduce some basic C techniques before
we move on to the Parallel data structures part.
20Why Use Object-Oriented Programming for
Developing Parallel Adaptive Finite
Element/Volume Codes ?
- Growing number of libraries for writing
object-oriented scientific computing codes - OOP offers several advantages
abstract data types
class
hierarchies
inheritance
polymorphism
virtual functions - Optimizing compilers are increasingly available
21- Grid based methods (finite elements, finite
volumes) offer a natural object-oriented
programming model - Specific examples from Pfinics project
- Support for hierarchical classes
Global Computational Domain
Subdomains
Surfaces
Elements (h, p, h-p adaptivity)
Nodes
22Classes for Communication
PEList
NeighPE
NeighPE
NeighPE
PE 0
NeighNode
NeighNode
NeighNode
NeighNode
PE 1
NeighNode
NeighNode
PE 2
NeighNode can be list of nodes or list of
nodal objects to be communicated with neighbors
23Processor-level Class PEList class PEList
private class NeighNodeList
Public
NeighNodeList(int NeighNodeIn)link(0),
NeighNode(NeighNodeIn)
NeighNodeList link int
NeighNode public NeighPEList(int
NeighPEIn)link(0),NeighPE(NeighPEIn),
first(0), last(0)
NeighPEList link NeighNodeList
first, last NeighNodeList ptemp int
NeightPE // functions to manipulate the
linked-lists as follows
24Processor-level Class PEList (continued)
void add (int NeighNodeIn) int CountNodes(
) void getNodeList(PfVecI NodeList) publ
ic NeighPEList first, last
NeighPEList ptemp PEList ( ) void
add(int NeighPEIn, int NeighNodeIn) int
GetprocIntfcList(PfVecI ProcIntfcList) int
MaxmIntfcNodes( ) int SendNodalObject(int
PEIndx)
25Constructing Derived Classes - Very Useful for
Grid-based Methods
Base
Base
dc1
dcB1
dcA1
dc2
dcB2
dcA2
dc derived class
dcA2_1
dcA2_2
Virtual functions do NOT span multiple
inheritance hierarchies Examples of derived
classes nodes, elements, patches,
subdomains Advice keep it simple - use only 2
levels whenever possible
26Early versus Late Binding Early binding same
as Fortran subroutine and C function calls
Relative function addresses are determined at
compilation time Late binding Relative
addresses can only be determined at
run-time e.g. function pointers. In many cases,
function pointers can be replaced by
virtual functions. Example (late binding)
include ltiostream.hgt include ltmath.hgt //
function prototyping void CalcSin(double a)
void CalcCosin(double a) void
CalcTangent(double a) // define a function
pointer (with input type as double) void (fun
) (double ) CalcSin, CalcCosin,
CalcTangent main( ) double a int ia cin
gtgt a cin gtgt ia funia(a) // write code for
CalcSin, CalcCosin and CalcTangent below...
27Virtual Functions Provides a late binding
mechanism for resolving common namespaces Consider
the following problem (without virtual
functions)
Base class Element
nnodes gnodes printNode( )
inh. inherited
nnodes(inh.) gnodes(inh.) printNode()(inh.) n3Dfac
es
nnodes(inh.) gnodes(inh.) printNode()(inh.) nrefLe
vels
nnodes(inh.) gnodes(inh.) printNode()(inh.) nfaces
Elem3Dcube
Elem2Dtriangle
Elem2Dquad
28Virtual Functions (continued)
class Element protected
int nnodes, gnodes public
void printNode( ) // define
ElementprintNode( ) also...
Base Class Definition
class Elem2Dquad public Element int
nfaces public
Elem2Dquad(int, int, int) void
printNode( )
Derived Class Definition
29Virtual Functions (continued)
Next, if you define a main program as follows
main ( ) Element elements 3 //
define pointers to class objects elements0
Elem2Dquad(2,3,4) // initialize elements
elements1 Elem3Dcube(3,4,5) // with
constructors elements0-gtprintNode( )
Question printNode( ) is defined in both the
base class Element and the
derived class Elem2Dquadwhich printNode()
gets printed in the above line
? Answer printNode( ) from the base class !
Reason Elem2Dquad made no attempt to override
printNode( ) function of its
parent and uses the one it inherits. Fix
Virtual Function fixes this problem !.see next
slide
30Virtual Functions (continued) change the based
class Element as follows
class Element protected
int nnodes, gnodes public
virtual void printNode( ) //
define ElementprintNode( ) also...
Base Class Definition
Now, Elem2Dquads function printNode( ) will be
used ! If an element type is changed gt NO need
to change code Possible cases for using virtual
functions quadrature rules (different for
different element types) adaptivity rules (quads
get divided into 4 quads, triangles
into 3 triangles,.etc)
31Virtual Functions (continued) You can still use
the Base classs printNode( ) if you wish
elements0-gtElementprintNode( ) This is
useful if you want to hide certain data defined
for one of the derived classes. Possible
scenario (credit reporting agency) Base Class
customer (data credit rating, amount of
loan) Derived Class customerJoe( additional
data bad past credit history) You may want to
use base class customers print function that
only prints customerJoes credit rating and
amount of loan while responding to a banks
request but does not give away customerJoes past
credit history.
32 Derived classes for various element types Define
a basic element class called Element with the
attributes class Element protected
int eltype // element type ID
int nnodes // number of nodes for field
variables int gnodes // number of
nodes for geometry variables int
kbasis // geometry basis function order
int nedges // number of edges (2D) or
surfaces (3D) int nqpx,nqpy,nqpz //
quadrature points in x, y, z direction
PfVecI plevels // p-level descriptor for
elements int refnlevel //
h-refinement level for an element real
xc, yc, zc //coordinates node
nodes // nodes for this element public
Element(int, int, int, int, int)
Element( )
33Derived classes for various element types
(continued) (file Element.C) include
Element.h ElementElement(int A, int B, int C,
int D, int E,) eltype(A),
nnode(B), gnode(C), kbasis(D), nedges(E) //
define common properties here via virtual
functions public void
GaussTable( ) void
DivideElement( ) An example program for
mixing element types (increasingly
popular) include ElementLibrary.h //contains
header files for element types main( )
Elem2Dquad quad2d Elem2Dtriangle
triang2d Element elem new Element 2 //
allocates memory for 2 elements elem0 quad2d
elem1 triang2d cout ltlt elem0 ltlt elem1
34 Derived classes for various element types
(continued) protected access provides both
data protection and inheritance !! Variables
defined with protected specifier are only
available to Element class AND any other
class derived from it. PfVecI is a class that
dynamically constructs p-degree polynomials
for higher order finite elements 2D
rectangular arbitrary p-degree finite element
(file Element2Dquad.h) ifndef
_Element2Dquad_H define _Element2Dquad_H include
ltiostream.hgt // C I/O class
library include Element.h // basic
finite element class include ../PFMemory/PfAlloc
.h // dynamic memory allocation include
../PFSystem/PfMathF.h // Math and Fortran
interface .. (next page)
35 Derived classes for various element types
(continued) class Element2Dquad public Element
protected char myname25 //
ascii name of an element public
Element2Dquad(int21, int9, int8, int2,
int4, char
2D p-Quad) friend ostream operator ltlt
(ostream out, Element2Dquad
em) void
DerivePLevels(PfAlloc P) ..other functions
.. int GenDofs(PfAlloc P, PfVecD pf, int
nelem, int ktype, int
npoin, int iprvr, int idsub, int numConstrt)
friend void FortranShared(SubDomainPE PE, Solver
solv, Element2Dquad
em) Element2Dquad( ) endif
36How about sending groups of data across
processors ? An example for sending mesh-related
information to other processors given in the
following slides. (will go over the actual
code) domain a prototype of a class domain
containing the entire mesh elregion, tempfield
temporary data structure to pack the data
before
message-passing Basic steps find out type of
data (can do hardware-probing)
define a new datatype for holding different
types of data pack data
send data via the new
datatype MPI 1.1 standard does not provide C
bindings for message-passing MPI 2.0 does have
bindings for C (All non-static member
functions in class MPI will be virtual
functions)
37include mpicomm.h void sendsubdom( domain
dom, PfVecI ProcElemList, PfVecI
LocalNodes, PfVecI Llnods, PfVecI
XLlnods,int nelem, int TotalProcElem, int
ndimLlnods, int npoinLocal, int iproc)
elregion eltTotalProcElem tempfield
fieldsnf int ie, blen1 1, 1,
blen2 1, 1, 1 int
nsize,je,lnode,ke,headernheader,
nN,ndata 0 double val0 PfVecD
flddata MPI_Comm comm
38MPI_Datatype elregion_type, tempfield_type
MPI_Datatype type12 MPI_INT, MPI_INT
MPI_Datatype type23 MPI_INT, MPI_INT,
MPI_INT MPI_Aint
disp12, disp23 comm MPI_COMM_WORLD //
gather element type,region index information, //
tags 4. Use hardware probing of lengths //
- safe implementation.
MPI_Address(elt0.reg, disp10)
MPI_Address(elt0.type, disp1) for (i1
i gt 0 i--) disp1i - disp10
MPI_Type_struct(2, blen1, disp1, type1,
elregion_type)
MPI_Type_commit(elregion_type)
39for ( ie 0 ie lt TotalProcElem ie )
eltie.type dom-gtelt
ProcElemList(ie) .type eltie.reg
dom-gtelt ProcElemList(ie) .reg
// gather field information - integer data
MPI_Address(fields0.type, disp20)
MPI_Address(fields0.dim, disp21)
MPI_Address(fields0.size, disp22) for
(i2 i gt 0 i--) disp2i - disp20
MPI_Type_struct(3, blen2, disp2, type2,
tempfield_type)
MPI_Type_commit(tempfield_type)
40for ( ie 0 ie lt dom-gtnf ie )
fieldsie.type dom-gtfld ie.type
fieldsie.dim dom-gtfld ie.dim
fieldsie.size dom-gtfld ie.dim
npoinLocal ndata
dom-gtfld ie.dim npoinLocal //
calculate headers as data packet 1 , tag 1
ndimLnods XLlnods(TotalProcElem)-XLlnods(0)
header(0) dom-gtdim header(1)
dom-gtnf header(2)
npoinLocal header(3) TotalProcElem
header(4) ndimLnods header(5) ndata
41// gather the data associated with the fields //
align the data field-wise in a vector of //
type double. flddata.resize(ndata) nsize
0 for ( ie 0 ie lt dom-gtnf ie )
for ( je 0 je lt fieldsie.dim je )
for ( ke 0 ke lt npoinLocal ke )
lnode LocalNodes(ke) nN dom-gtnN
val0 dom-gtfld ie.data je
(nN-1) lnode
flddata(nsizeje(npoinLocal-1)ke)
val0
42 nsize fieldsie.size //finally, send the
data packets.blocking call //
here. MPI_Send(header(0),nheader,MPI_INT,iproc,1
,comm) MPI_Send(XLlnods(0),TotalProcElem,MPI_INT
,iproc,2,comm) MPI_Send(Llnods(0),ndimLnods,MPI
_INT,iproc,3,comm) MPI_Send(elt0,TotalProcEle
m,elregion_type,iproc,4,comm) MPI_Send(fields0
,dom-gtnf,tempfield_type,iproc,5,comm) MPI_Send(f
lddata(0,ndata,MPI_DOUBLE,iproc,6,comm) //
deallocate everything here. // all structure
and sv objects .code deleted.
43Object-oriented scientific software on the web
Directories www.cs.bham.ac.uk/jdm/cpp.html
oonumerics.org/oon Public
Domain Finite Element/Finite Volume Software
MOUSE (finite volume, fire8.vug.uni-duisbur
g.de/MOUSE PETSc (www-fp.mcs.anl.gov/
petsc) Diffpack (used to be freenow
the group has started a
company. www.nobjects.com/Diffpack
slightly old freeware may still be
on the web ) Overture
(www.llnl.gov/casc/Overture ) FEMLib
(quattro.me.uiuc.edu/tiller/femlib.html
no longer supported )
Pfinics (contact abose_at_umich.edu, webpage
under
development) C Scientific Library (and other
excellent software) www.netlib.org
and nhse.cs.utk.edu
44Graph Partitioning Tools METIS, hMETIS and
ParMETIS
Objective Given a connectivity graph of vertices
and edges, we want to partition it into k
equal-sized parts such that the number of
interface edges is minimized. METIS and
ParMETIS are very popular graph partitioning
tools that o can also produce
fill-reducing orderings for sparse matrices o
have some support for partitioning finite
element/volume meshes o consistently produce
better partitions than spectral partitioning
algorithms (such as Chaco) o are
very fast (Metis can partition a graph with 1
million vertices in 256 parts in less
than 20 seconds on a typical desktop PC) For
very large graphs (e.g. an automobile or an
aircraft), the connectivity matrix can be very
large for these, one needs to solve the
partitioning problem in parallel
(ParMETIS) hMETIS is a serial hypergraph
partitioning tool (e.g. can be used in
partitioning large VLSI circuits) More
information and software are available at
http//www-users.cs.u
mn.edu/karypis/metis/
45Examples of Mesh Partitioning Using METIS and
ParMETIS (Courtesy Dr. Valmor de Almeida, Oak
Ridge National Laboratory)
46(Courtesy Dr. Valmor de Almeida, Oak Ridge
National Laboratory)
It is often necessary to reorder the partition
numbers as well the element numbers within a
partition
47(Courtesy Dr. Valmor de Almeida, Oak Ridge
National Laboratory)
48(No Transcript)
49Part II(a) Programming Techniques for Mesh
Partitioning
Given global domain with a finite element
or finite volume discretization,
associated physical properties, weights to
different parts of the domain Objective
generate a partition of the global
domain into a specified number of
subdomains such that gt all subdomains
have roughly equal number of unknowns
to compute gt the communication time
among neighboring subdomains is
minimized
50Part II(a) Programming Techniques for Mesh
Partitioning
Four components global domain data structure
(includes physical properties and a given
discretization) GlobalDomain.h
adjacency graph for the given finite
element discretization (requires knowledge
of specific element types) Element.h
(base class) graph partitioning software such
as Metis, Chaco generation of detailed
communication lists
51Data Structures for Mesh Partitioning
Component GlobalDomain object extern "C" //
METIS Graph generation Package Library Call int
KMETIS(int nelem, int Xadj, int adjncy, int
vwgts, int ewgts, int weightflag,
int nparts, int options, int
numbering, int edgecut, int
partition) int PMETIS(int nelem, int Xadj,
int adjncy, int vwgts, int ewgts,
int weightflag, int nparts, int
options, int numbering, int edgecut,
int partition)
52Data Structures for Mesh Partitioning
class Element2Dquad class GlobalDomain
private int ktype // elemnt type
int numPE // number of
colors/processors int numEdges // number
of edges in element graph int nelem
// number of elements in domain int dimn
// dimension of global domain int npoin
// number of grid points in domain int
idsub // p.d.e identification tag int
idfemtp // fem analysis type(1h, 2p,
3h-p) int numConstrt // number of
constraint variables int MaxmP //
maxm order of polynomial basis int iprvr
// number of primary vars int iscvr
// number of secondary vars int
dim_ielord // fortran routines use this value
53Data Structures for Mesh Partitioning
// Element graph-related data structure double
rdata3 // temporary real array PfVecI
ElemColor // partition level of elements
PfVecI lnods // nodal connectivity vector
PfVecI Xlnods // pointers to lnods
PfMatI ielord // p-level descriptor for
elements PfMatI ElemGraph // element
neighbor connectivities PfVecI Xadj //
adjacency pointers for elements PfVecI adjncy
// adjacency vector for elements PfMatD
coord // coordinates of all nodes public
GlobalDomain(int NumProcs)
GlobalDomain( ) friend void
ReadGlobalData(GlobalDomain domain,
Element2Dquad em,char fname)
friend void DrawPartitions(GlobalDomain domain,
Element2Dquad em,char
fname) void GetElemGraph( ) void
PrepProcFiles(char fname,int typeintfc)
54Data Structures for Mesh Partitioning
class Element // a base element class -
specific types are inherited. protected
int ktype_el // number of element type i.d.
int nnode // number of solution nodes
int ngnode // number of geometry
nodes int kbasis // element basis
function degree int nfc // number
of faces of this element int Maxndfc
// number of maxm. nodes per face int
maxnfc // maxm. number of faces allowed
int maxdofn // maxm. dofs per node
int MaxNevab // maxm. unknowns per element
char myname25 // ascii label of my name
public Element(int A,int
B,int C,int D,int E,int F, int G,
char H ) Element( ) //null constructor
for allocating memory friend ostream
operatorltlt(ostream out,
Element em) Element( )
55First Step create an adjacency graph
- Objective is to generate a graph of
connectivities among - the elements in the mesh, i.e. if element i is
connected - to element j via one or more nodes, there is
an edge - from i to j in the graph.
- Generating adjacency graphs is time-consuming,
especially - for large meshes,
-
- Faster algorithms available
- void GlobalDomainGetElemGraph( )
-
- int ielem, ipos
- /
- Calculate number of edges and element-connectivit
y graph from the given mesh. - /
- FindElemGraph(lnods(0),Xlnods(0),ElemGraph(0,0)
,nelem,numEdges)
56First Step create an adjacency graph
/ Now transfer domain.ElemGraph matrix to
vectors domain. Xadj and domain.adjncy and
deallocate domain.ElemGraph. /
Xadj.newsize(nelem 1) adjncy.newsize(2num
Edges) ipos 0 for (ielem 0
ielem lt nelem ielem)
Xadj(ielem) ipos ipos
ElemGraph(ielem,0) Xadj(nelem)
Xadj(nelem-1) ElemGraph(nelem-1,0) ipos
0 for (ielem 0 ielem lt nelem ielem)
int Jstrt Xadj(ielem)
int Jstop Xadj(ielem1) int entries
Jstop - Jstrt
57First Step create an adjacency graph
for ( int ientries 0 ientries lt entries
ientries ) adjncy(iposientries)
ElemGraph(ielem,ientries1) ipos ipos
entries
58Second Step partition element graph via Metis
// Metis Options (see Metis manual for more
information) int options4 int
weightflag 0 int numbering 0 int
edgecut 0 options0 0 if(
PMETIS(nelem,Xadj(0),adjncy(0),NULL,NULL,
weightflag, numPE,options0,numberin
g, edgecut,ElemColor(0)) )
cerr ltlt Problem in GlobalDomainGetElemGraph
module " exit(0) // output of this
function is Elemcolor array
59Linked List Structure PEElemList
60- Finally, we have the following
- a list of subdomains with ordered lists of
interior and boundary - element numbers
- a list of nodes on each of the interfaces that a
subdomain shares - with its neighbors
- linked lists or arrays for communication with
neighboring - subdomains
- a list of data objects (e.g. field values, mesh
partitions in case - of dynamic partitioning) that need to be
communicated with - neighbors, in the solver or during periodic
updates
61The next step Choosing a particular linear
system solver Options AZTEC (Parallel
iterative solvers)
(http//www.cs.sandia.gov/CRF/aztec1.html
) PARMS, SPARSKIT and PSPARSKIT (Parallel
iterative solvers)
(http//www-users.cs.umn.edu/saad/software/home.h
tml ) Parallel MA28 (LU factorization MPI
version, commercial)
(http//www.cse.clrc.ac.uk/Activity/HSL
) Pfinics (Schur complement method MPI
version, free academic license)
(abose_at_umich.edu) PETSc (Parallel preconditioned
Linear solvers) (http//www-fp.mcs.a
nl.gov/petsc/ )