Title: PARALLEL FULLY AUTOMATIC HP-ADAPTIVE 2D FINITE ELEMENT PACKAGE
1 PARALLEL FULLY AUTOMATIC HP-ADAPTIVE 2D FINITE ELEMENT PACKAGE
- Principal investigator
  - Leszek Demkowicz (ICES, UT Austin)
- Team
  - Maciej Paszynski (ICES, UT Austin)
  - Jason Kurtz (ICES, UT Austin)
- Collaborators
  - Waldemar Rachowicz (Cracow University of Technology)
  - Timothy Walsh (Sandia National Laboratories)
  - David Pardo (ICES, UT Austin)
  - Dong Xue (ICES, UT Austin)
  - Kent Milfeld (TACC, UT Austin)
2 INTRODUCTION
- We are working on:
  - Parallel fully automatic hp-adaptive finite element 2D and 3D codes
  - The code automatically produces a sequence of optimal meshes with a global exponential convergence rate
  - Currently, we have a running 2D version of the code for the Laplace equation
  - All stages of the code are fully parallel
  - The code will soon be extended to solve the 3D Helmholtz and time-harmonic Maxwell equations
- The work is driven by 3 challenging applications:
  - Simulation of EM waves in the human head
  - Calculation of radar cross-sections (3D scattering problems)
  - Simulation of Logging While Drilling EM measuring devices
3 PLAN OF THE PRESENTATION
- General idea of hp-adaptive strategy
- Parallel data structure and data migration
- Parallel direct solver
- Parallel mesh refinements and mesh reconciliation
- Communication strategy
- Results
- Conclusions
4 GENERAL IDEA OF HP-ADAPTIVE STRATEGY
5 ORTHOTROPIC HEAT EQUATION
- 5 materials, some orthotropic, some not
- Large jumps in material data generate singularities at 3-material interfaces
- Requires anisotropic refinements
6 COARSE MESH, FINE MESH AND OPTIMAL MESH
- The initial mesh is the coarse mesh for the 1st step of the iteration
- The optimal mesh becomes the coarse mesh for the 2nd step of the iteration (see the sketch below)
Figure: coarse mesh, fine mesh, and optimal mesh
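The iteration pictured on this slide can be written down compactly. Below is a minimal Python sketch of the automatic hp-adaptive loop; the helper names (solve, global_hp_refine, relative_error, select_optimal_refinements, apply_refinements) are hypothetical placeholders, not the actual package API.

```python
# Sketch of the automatic hp-adaptive loop (hypothetical helper names,
# not the actual package API).

def hp_adaptive_loop(coarse_mesh, tolerance, max_iters=20):
    for it in range(max_iters):
        u_coarse = solve(coarse_mesh)               # coarse mesh solve
        fine_mesh = global_hp_refine(coarse_mesh)   # global hp-refinement -> fine mesh
        u_fine = solve(fine_mesh)                   # fine mesh solve (reference solution)

        # Estimated error = difference between the fine and coarse solutions.
        error = relative_error(u_fine, u_coarse)
        if error < tolerance:
            return coarse_mesh, u_fine

        # Select, element by element, the h-, p-, or hp-refinements that give
        # the best estimated error decrease; the result is the optimal mesh,
        # which becomes the coarse mesh of the next iteration.
        refinements = select_optimal_refinements(coarse_mesh, u_coarse, u_fine)
        coarse_mesh = apply_refinements(coarse_mesh, refinements)
    return coarse_mesh, u_fine
```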
8 PARALLEL DATA STRUCTURES AND DATA MIGRATION
9 PARALLEL DATA STRUCTURES
- Refinement trees are grown vertically from the initial mesh on each process
- Each process generates initial mesh elements in only a portion of the global geometry
- Identical copies of the global geometry are stored on each process
10 DATA MIGRATION
- Load balancing is performed by the ZOLTAN library
- ZOLTAN provides 6 different domain decomposition algorithms
- Initial mesh elements, together with their refinement trees, migrate between subdomains (see the sketch below)
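A minimal Python sketch of the per-process data layout described on slides 9 and 10: every process keeps an identical copy of the global geometry, owns only some of the initial mesh elements, and each owned element carries its own refinement tree that migrates with it. The class and method names are hypothetical; the actual partitioning decisions are delegated to the ZOLTAN library, which is not called here.

```python
from dataclasses import dataclass, field

@dataclass
class RefinementTree:
    """Refinement history grown 'vertically' from one initial mesh element."""
    element_id: int
    children: list = field(default_factory=list)   # nested RefinementTree nodes

@dataclass
class ProcessData:
    """Data owned by one process (hypothetical layout)."""
    global_geometry: object                                # identical copy on every process
    initial_elements: dict = field(default_factory=dict)   # element id -> RefinementTree

    def pack_for_migration(self, element_ids):
        """Remove the listed initial mesh elements (with their whole
        refinement trees) so they can be sent to another subdomain."""
        return {eid: self.initial_elements.pop(eid) for eid in element_ids}

    def unpack_migrated(self, packed):
        """Receive initial mesh elements migrated from another subdomain."""
        self.initial_elements.update(packed)
```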
11 PARALLEL DIRECT SOLVER
12 PARALLEL FRONTAL SOLVER WITH FAKE ELEMENTS
- Both the coarse and fine mesh problems are solved using the parallel frontal solver
- The frontal solver is an extension of Gaussian elimination
- Assembly and elimination are performed together on the frontal submatrix of the global matrix
- Domain decomposition approach
Reference: T. Walsh, L. Demkowicz, A Parallel Multifrontal Solver for hp-Adaptive Finite Elements, TICAM Report 99-01
13 PARALLEL FRONTAL SOLVER WITH FAKE ELEMENTS
Global matrix
14 PARALLEL FRONTAL SOLVER WITH FAKE ELEMENTS
Distribution of the global matrix over the processors
15 PARALLEL FRONTAL SOLVER WITH FAKE ELEMENTS
1. Run the forward elimination stage with fake elements over each subdomain
16 PARALLEL FRONTAL SOLVER WITH FAKE ELEMENTS
After the forward elimination with fake elements, the frontal matrices contain contributions to the interface problem
17 PARALLEL FRONTAL SOLVER WITH FAKE ELEMENTS
2. Formulate the interface problem
18 PARALLEL FRONTAL SOLVER WITH FAKE ELEMENTS
3. Solve the interface problem
- Broadcast the solution together with the upper triangular form of the interface problem matrix
19 PARALLEL FRONTAL SOLVER WITH FAKE ELEMENTS
4. The backward substitution can finally be run in parallel over each subdomain (see the sketch below)
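The solver slides above can be illustrated with a small dense Schur-complement (sub-structuring) sketch in numpy: each subdomain first eliminates its interior unknowns ("forward elimination with fake elements"), the results are assembled into an interface problem, the interface problem is solved, and the interior unknowns are recovered by backward substitution. This is only an illustration of the idea with a two-subdomain toy layout; the actual package uses the parallel frontal solver described in the Walsh & Demkowicz report cited above.

```python
import numpy as np

def subdomain_eliminate(A_ii, A_ib, A_bi, A_bb, f_i, f_b):
    """Forward elimination of the interior (i) unknowns of one subdomain.
    Returns this subdomain's contribution to the interface (b) problem."""
    Aii_inv_Aib = np.linalg.solve(A_ii, A_ib)
    Aii_inv_fi = np.linalg.solve(A_ii, f_i)
    S = A_bb - A_bi @ Aii_inv_Aib          # Schur complement contribution
    g = f_b - A_bi @ Aii_inv_fi            # modified right-hand side
    return S, g, Aii_inv_Aib, Aii_inv_fi

def solve_two_subdomains(blocks1, blocks2):
    """Assemble and solve the interface problem, then back-substitute.
    Assumes both subdomains share the same set of interface unknowns."""
    S1, g1, Z1, w1 = subdomain_eliminate(*blocks1)
    S2, g2, Z2, w2 = subdomain_eliminate(*blocks2)
    u_b = np.linalg.solve(S1 + S2, g1 + g2)   # interface problem
    u_i1 = w1 - Z1 @ u_b                      # backward substitution, subdomain 1
    u_i2 = w2 - Z2 @ u_b                      # backward substitution, subdomain 2
    return u_i1, u_i2, u_b
```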
20 PARALLEL MESH REFINEMENTS AND MESH RECONCILIATION
21 MESH REGULARITY
Each quad element has 4 vertex nodes, 4 mid-edge nodes, and 1 middle node
22 MESH REGULARITY
- Isotropic h-refinement: the element is broken into 4 sons (in both the horizontal and vertical directions)
- The big element has 2 smaller neighbors
- The mid-edge nodes and the common vertex node of the smaller elements remain constrained, with no new nodes generated
- A mesh with constrained nodes is called an irregular mesh
23 MESH REGULARITY
- It may happen that the smaller elements need to be broken once again
- The newly created nodes are called multiply constrained nodes
- Multiply constrained nodes are not allowed
24 MESH REGULARITY
- 1-irregularity rule: an edge of a given element can be broken only once without breaking the neighboring elements
- The neighboring element must be broken in this example (see the sketch below)
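A minimal sketch of how the 1-irregularity rule can be enforced during refinement, assuming a hypothetical mesh object that stores an isotropic refinement level per element and can enumerate edge neighbors; the real code also handles anisotropic refinements and the constrained-node bookkeeping.

```python
def break_element(mesh, elem_id):
    """h-refine one quad element while keeping the mesh 1-irregular.

    Assumed (hypothetical) mesh interface:
      mesh.level[e]     - isotropic refinement level of element e
      mesh.neighbors(e) - elements sharing an edge with e
      mesh.refine(e)    - breaks e into 4 sons and increments mesh.level[e]
    """
    for nbr in list(mesh.neighbors(elem_id)):
        # The shared edge may be broken only once relative to the neighbor:
        # if the neighbor is already coarser, break it first so that no
        # multiply constrained nodes appear.
        if mesh.level[nbr] < mesh.level[elem_id]:
            break_element(mesh, nbr)
    mesh.refine(elem_id)
```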
25 PARALLEL MESH REFINEMENTS - EXAMPLE 1
26 PARALLEL MESH REFINEMENTS - EXAMPLE 1
27 PARALLEL MESH RECONCILIATION - EXAMPLE 1
28 PARALLEL MESH RECONCILIATION - EXAMPLE 1
29 PARALLEL MESH REFINEMENTS AND MESH RECONCILIATION - EXAMPLE 1
Conclusion: It is required to exchange refinement tree data between neighboring subdomains.
30 PARALLEL MESH REFINEMENTS AND MESH RECONCILIATION - CODING THE REFINEMENT TREES
Refinement trees are exchanged between neighboring subdomains in a compressed binary format
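One simple way to realize such a compressed binary format is to walk the refinement tree in pre-order and emit one bit per node (1 = broken into 4 sons, 0 = leaf). The sketch below only illustrates this idea; the package's actual wire format is not documented here and must also encode anisotropic refinements and polynomial orders.

```python
def encode_tree(node):
    """Pre-order bit string: '1' + encodings of the 4 sons, or '0' for a leaf.
    A node is represented as a list of its 4 sons, or None for a leaf."""
    if node is None:
        return "0"
    return "1" + "".join(encode_tree(son) for son in node)

def decode_tree(bits, pos=0):
    """Inverse of encode_tree; returns (tree, next position)."""
    if bits[pos] == "0":
        return None, pos + 1
    sons, pos = [], pos + 1
    for _ in range(4):
        son, pos = decode_tree(bits, pos)
        sons.append(son)
    return sons, pos

# Example: root broken into 4 sons, first son broken once again.
tree = [[None, None, None, None], None, None, None]
bits = encode_tree(tree)               # "1" + "10000" + "000" -> "110000000"
assert decode_tree(bits)[0] == tree
```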
31 MESH RECONCILIATION - EXAMPLE 2
Figure: constrained nodes; coarse mesh -> (global hp-refinement) fine mesh -> optimal mesh
32 PARALLEL MESH REFINEMENTS AND MESH RECONCILIATION - EXAMPLE 2
Conclusion: It is required to exchange information about constrained nodes located at the same place on both neighboring subdomains. The mesh reconciliation must be performed after creation of the fine mesh (after the global hp-refinement) and also after creation of the optimal mesh.
33 PARALLEL MESH REFINEMENTS AND MESH RECONCILIATION - EXAMPLE 3
When an interface edge with a constrained node is broken, the orders of the newly created nodes must be established. This is done by comparing the coarse and fine grid solutions over the edge. But in the current subdomain there is no element containing this entire edge, so we need to ask the neighboring subdomain for the orders of the just-created nodes.
34 PARALLEL MESH REFINEMENTS AND MESH RECONCILIATION - EXAMPLE 3
Conclusion: After breaking an interface edge with a constrained node, it is necessary to ask the neighboring subdomain for the order of approximation of the newly created nodes (see the sketch below).
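A minimal mpi4py sketch of the neighbor query described in this conclusion; the edge identifiers, message tags, and function names are hypothetical, and in the real package this exchange is part of the personalized communication strategy described in the Paszynski & Milfeld report cited later.

```python
from mpi4py import MPI

TAG_ORDER_QUERY = 101   # hypothetical message tags
TAG_ORDER_REPLY = 102

def ask_neighbor_for_orders(comm, neighbor_rank, edge_id):
    """Request from the neighboring subdomain the polynomial orders of the
    nodes just created on the shared interface edge."""
    comm.send(edge_id, dest=neighbor_rank, tag=TAG_ORDER_QUERY)
    return comm.recv(source=neighbor_rank, tag=TAG_ORDER_REPLY)

def answer_order_query(comm, neighbor_rank, local_edge_orders):
    """Serve one query from the neighbor: look up the orders established by
    comparing the coarse and fine grid solutions over the edge."""
    edge_id = comm.recv(source=neighbor_rank, tag=TAG_ORDER_QUERY)
    comm.send(local_edge_orders[edge_id], dest=neighbor_rank, tag=TAG_ORDER_REPLY)
```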
35 PARALLEL MESH REFINEMENTS - SUMMARY
- The mesh refinement algorithm runs on each subdomain separately
- The 1-irregularity rule is enforced: an edge of a given element can be broken only once without breaking the neighboring elements
- Nodes situated on the global interface are treated in the same way as internal nodes
- After the parallel mesh refinements it is necessary to run the mesh reconciliation algorithm
36 PARALLEL MESH RECONCILIATION - SUMMARY
Two adjacent elements from neighboring subdomains; the first one is not refined, the second one is refined:
- Create a constrained node on the interface edge (in order to have the same number of degrees of freedom)
37 PARALLEL MESH RECONCILIATION - SUMMARY
Two adjacent elements from neighboring subdomains; both refined:
- Create constrained nodes on the interface edges
- Exchange constrained node data between the subdomains
- Remove interface constrained nodes situated at the same place on both subdomains
38 PARALLEL MESH RECONCILIATION - SUMMARY
Two adjacent elements from neighboring subdomains; the first one is not refined, the second one is refined:
- Exchange refinement trees between the subdomains
- The second one is refined once again
- Break the element
- Remove the constrained node
- Create a constrained node
39 PARALLEL MESH REFINEMENTS - SUMMARY
- We can summarize our algorithm in the following stages (sketched below):
  1. Parallel mesh refinements
  2. Exchange of information about interface edge refinement trees, constrained nodes, and orders of approximation along the interface
  3. Mesh reconciliation
- The repetition of stages 2 and 3 may be required if some of the interface edges were modified during the last iteration.
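A minimal sketch of the three stages and the repetition criterion above, assuming mpi4py and hypothetical helpers (refine_locally, exchange_interface_data, reconcile); the allreduce lets every subdomain learn whether any interface edge was modified anywhere, in which case stages 2 and 3 are repeated.

```python
from mpi4py import MPI

def parallel_refinement_step(comm, subdomain, requested_refinements):
    # Stage 1: parallel mesh refinements, performed independently on each
    # subdomain (the 1-irregularity rule is enforced locally).
    refine_locally(subdomain, requested_refinements)

    while True:
        # Stage 2: exchange interface edge refinement trees, constrained
        # nodes, and orders of approximation with the neighboring subdomains.
        interface_data = exchange_interface_data(comm, subdomain)

        # Stage 3: mesh reconciliation against the received interface data;
        # returns True if any interface edge of this subdomain was modified.
        modified = reconcile(subdomain, interface_data)

        # Repeat stages 2 and 3 until no subdomain modified an interface edge.
        if comm.allreduce(1 if modified else 0, op=MPI.SUM) == 0:
            break
```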
40 RESULTS
41 RESULTS - THE LAPLACE EQUATION OVER THE L-SHAPE DOMAIN
Optimal mesh obtained after parallel iterations over 3 subdomains. Exponential convergence is obtained down to an accuracy of 1% relative error.
42 RESULTS - THE BATTERY PROBLEM
The solution with an accuracy of 0.1% relative error. Exponential convergence curve for the parallel execution (16 processors).
43 RESULTS - THE BATTERY PROBLEM
Optimal mesh obtained after parallel iterations over 15 subdomains, giving an accuracy of 0.1% relative error.
44 RESULTS - THE BATTERY PROBLEM
Partition of the hp-refined mesh over 15 processors
45 PERFORMANCE MEASUREMENTS
Computation times for each part of the sequential code, measured during each hp-adaptive iteration
46 PERFORMANCE MEASUREMENTS
Computation times for each part of the parallel code, executed over 16 processors, measured during each hp-adaptive iteration
47 PERFORMANCE MEASUREMENTS
Computation times for each part of the parallel code, executed over 32 processors, measured during each hp-adaptive iteration
48 PERFORMANCE MEASUREMENTS
When the load over one element is higher than the overall load of all other elements, the optimal load balance uses only 2 processors
49 PERFORMANCE MEASUREMENTS
Load distribution over 32 processors during particular iterations; only 16 processors are working
50 PERFORMANCE MEASUREMENTS
- The load balancing is performed at the level of initial mesh elements.
- In the Sandia battery problem, there are 16 initial mesh elements covering the areas with the strongest singularities.
- Most of the hp-refinements are required in the neighborhood of these areas.
- Optimal load balancing therefore needs only 16 processors.
51 PERFORMANCE MEASUREMENTS
- Observations
  - When a singularity is covered by an initial mesh element, the computational load over this element is high
  - Is load balancing at a finer level than the initial mesh elements necessary?
  - The singularity inside an initial mesh element can be solved to an accuracy of 0.1% relative error within 20 seconds
52 PERFORMANCE MEASUREMENTS
Computation times for each part of the parallel code, executed over 16 processors, measured during each hp-adaptive iteration
53 PERFORMANCE MEASUREMENTS
- Observations
  - When a singularity is covered by an initial mesh element, the computational load over this element is high
  - Is load balancing at the level of initial mesh elements enough?
  - The singularity within an initial mesh element can be solved to an accuracy of 0.1% relative error within 20 seconds
  - The problem over any larger subdomain can also be solved to an accuracy of 0.1% relative error within 20 seconds (assuming the communication scales well)
54 PERFORMANCE MEASUREMENTS
- Observations
  - The number of hp-adaptive strategy iterations required to obtain the solution with a given accuracy is LOWER in the parallel implementation
  - (the parallel direct solver, based on the domain decomposition approach, has less error than the direct solver over the entire domain)
55 CONCLUSIONS
- We have developed a parallel fully automatic hp-adaptive 2D code for the Laplace equation, where:
  - Load balancing is performed by the ZOLTAN library
  - Both the coarse and fine mesh problems are solved by the parallel frontal solver
  - The mesh is refined fully in parallel
- References
  - M. Paszynski, J. Kurtz, L. Demkowicz, Parallel Fully Automatic hp-Adaptive 2D Finite Element Package, ICES Report 04-07
  - M. Paszynski, K. Milfeld, h-Relation Personalized Communication Strategy for HP-Adaptive Computations, ICES Report 04-40
56 FUTURE WORK
- Future work will include:
  - Implementation of the parallel version of the 3D code
  - Extending the code to solve the 3D Helmholtz and time-harmonic Maxwell equations
  - Parallel version of the two-grid solver
- Challenging applications
  - Simulation of EM waves in the human head
  - Calculation of radar cross-sections (3D scattering problems)
  - Simulation of Logging While Drilling EM measuring devices
- References
  - M. Paszynski, J. Kurtz, L. Demkowicz, Parallel Fully Automatic hp-Adaptive 2D Finite Element Package, ICES Report 04-07
  - M. Paszynski, K. Milfeld, h-Relation Personalized Communication Strategy for HP-Adaptive Computations, ICES Report 04-40