Title: PARALLEL FULLY AUTOMATIC HP-ADAPTIVE 2D FINITE ELEMENT PACKAGE
1 PARALLEL FULLY AUTOMATIC HP-ADAPTIVE 2D FINITE ELEMENT PACKAGE
- Principal investigator
  - Leszek Demkowicz (ICES, UT Austin)
- Team
  - Maciej Paszynski (ICES, UT Austin)
  - Jason Kurtz (ICES, UT Austin)
- Collaborators
  - Waldemar Rachowicz (Cracow University of Technology)
  - Timothy Walsh (Sandia National Laboratories)
  - David Pardo (ICES, UT Austin)
  - Dong Xue (ICES, UT Austin)
  - Kent Milfeld (TACC, UT Austin)
2 INTRODUCTION
- We are working on:
  - Parallel fully automatic hp-adaptive finite element 2D and 3D codes
  - The code automatically produces a sequence of optimal meshes with a global exponential convergence rate
  - Currently, we have a running 2D version of the code for the Laplace equation
  - All stages of the code are fully parallel
  - The code will soon be extended to solve the 3D Helmholtz and time-harmonic Maxwell equations
- The work is driven by 3 challenging applications:
  - Simulation of EM waves in the human head
  - Calculation of radar cross-sections (3D scattering problems)
  - Simulation of Logging While Drilling EM measuring devices
3 PLAN OF THE PRESENTATION
- General idea of hp-adaptive strategy
- Parallel data structure and data migration
- Parallel direct solver
- Parallel mesh refinements and mesh reconciliation
- Communication strategy
- Results
- Conclusions
4 GENERAL IDEA OF HP-ADAPTIVE STRATEGY
5 ORTHOTROPIC HEAT EQUATION
- 5 materials, some orthotropic, some not
- Large jumps in material data generate singularities at 3-material interfaces
- Requires anisotropic refinements
6 COARSE MESH, FINE MESH AND OPTIMAL MESH
- The initial mesh is the coarse mesh for the 1st step of the iteration
- The optimal mesh becomes the coarse mesh for the 2nd step of the iteration (see the sketch below)
Figure: coarse mesh, fine mesh, and optimal mesh
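The iteration pictured on this slide can be written down compactly. Below is a minimal Python sketch of the automatic hp-adaptive loop; the helper names (solve, global_hp_refine, relative_error, select_optimal_refinements, apply_refinements) are hypothetical placeholders, not the actual package API.

```python
# Sketch of the automatic hp-adaptive loop (hypothetical helper names,
# not the actual package API).

def hp_adaptive_loop(coarse_mesh, tolerance, max_iters=20):
    for it in range(max_iters):
        u_coarse = solve(coarse_mesh)               # coarse mesh solve
        fine_mesh = global_hp_refine(coarse_mesh)   # global hp-refinement -> fine mesh
        u_fine = solve(fine_mesh)                   # fine mesh solve (reference solution)

        # Estimated error = difference between the fine and coarse solutions.
        error = relative_error(u_fine, u_coarse)
        if error < tolerance:
            return coarse_mesh, u_fine

        # Select, element by element, the h-, p-, or hp-refinements that give
        # the best estimated error decrease; the result is the optimal mesh,
        # which becomes the coarse mesh of the next iteration.
        refinements = select_optimal_refinements(coarse_mesh, u_coarse, u_fine)
        coarse_mesh = apply_refinements(coarse_mesh, refinements)
    return coarse_mesh, u_fine
```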
8 PARALLEL DATA STRUCTURES AND DATA MIGRATION
9 PARALLEL DATA STRUCTURES
- Refinement trees are grown vertically from the initial mesh on each process
- Each process generates initial mesh elements in only a portion of the global geometry
- Identical copies of the global geometry are stored on each process
10 DATA MIGRATION
- Load balancing is performed by the ZOLTAN library
- ZOLTAN provides 6 different domain decomposition algorithms
- Initial mesh elements, together with their refinement trees, migrate between subdomains (see the sketch below)
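A minimal Python sketch of the per-process data layout described on slides 9 and 10: every process keeps an identical copy of the global geometry, owns only some of the initial mesh elements, and each owned element carries its own refinement tree that migrates with it. The class and method names are hypothetical; the actual partitioning decisions are delegated to the ZOLTAN library, which is not called here.

```python
from dataclasses import dataclass, field

@dataclass
class RefinementTree:
    """Refinement history grown 'vertically' from one initial mesh element."""
    element_id: int
    children: list = field(default_factory=list)   # nested RefinementTree nodes

@dataclass
class ProcessData:
    """Data owned by one process (hypothetical layout)."""
    global_geometry: object                                # identical copy on every process
    initial_elements: dict = field(default_factory=dict)   # element id -> RefinementTree

    def pack_for_migration(self, element_ids):
        """Remove the listed initial mesh elements (with their whole
        refinement trees) so they can be sent to another subdomain."""
        return {eid: self.initial_elements.pop(eid) for eid in element_ids}

    def unpack_migrated(self, packed):
        """Receive initial mesh elements migrated from another subdomain."""
        self.initial_elements.update(packed)
```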
11 PARALLEL DIRECT SOLVER
12 PARALLEL FRONTAL SOLVER WITH FAKE ELEMENTS
- Both the coarse and fine mesh problems are solved using the parallel frontal solver
- The frontal solver is an extension of Gaussian elimination
- Assembly and elimination are performed together on the frontal submatrix of the global matrix
- Domain decomposition approach
Reference: T. Walsh, L. Demkowicz, A Parallel Multifrontal Solver for hp-Adaptive Finite Elements, TICAM Report 99-01
13 PARALLEL FRONTAL SOLVER WITH FAKE ELEMENTS
Global matrix
14 PARALLEL FRONTAL SOLVER WITH FAKE ELEMENTS
Distribution of the global matrix over the processors
15 PARALLEL FRONTAL SOLVER WITH FAKE ELEMENTS
1. Run the forward elimination stage with fake elements over each subdomain
16 PARALLEL FRONTAL SOLVER WITH FAKE ELEMENTS
After the forward elimination with fake elements, the frontal matrices contain contributions to the interface problem
17 PARALLEL FRONTAL SOLVER WITH FAKE ELEMENTS
2. Formulate the interface problem
18 PARALLEL FRONTAL SOLVER WITH FAKE ELEMENTS
3. Solve the interface problem
- Broadcast the solution together with the upper triangular form of the interface problem matrix
19 PARALLEL FRONTAL SOLVER WITH FAKE ELEMENTS
4. The backward substitution can finally be run in parallel over each subdomain (see the sketch below)
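The solver slides above can be illustrated with a small dense Schur-complement (sub-structuring) sketch in numpy: each subdomain first eliminates its interior unknowns ("forward elimination with fake elements"), the results are assembled into an interface problem, the interface problem is solved, and the interior unknowns are recovered by backward substitution. This is only an illustration of the idea with a two-subdomain toy layout; the actual package uses the parallel frontal solver described in the Walsh & Demkowicz report cited above.

```python
import numpy as np

def subdomain_eliminate(A_ii, A_ib, A_bi, A_bb, f_i, f_b):
    """Forward elimination of the interior (i) unknowns of one subdomain.
    Returns this subdomain's contribution to the interface (b) problem."""
    Aii_inv_Aib = np.linalg.solve(A_ii, A_ib)
    Aii_inv_fi = np.linalg.solve(A_ii, f_i)
    S = A_bb - A_bi @ Aii_inv_Aib          # Schur complement contribution
    g = f_b - A_bi @ Aii_inv_fi            # modified right-hand side
    return S, g, Aii_inv_Aib, Aii_inv_fi

def solve_two_subdomains(blocks1, blocks2):
    """Assemble and solve the interface problem, then back-substitute.
    Assumes both subdomains share the same set of interface unknowns."""
    S1, g1, Z1, w1 = subdomain_eliminate(*blocks1)
    S2, g2, Z2, w2 = subdomain_eliminate(*blocks2)
    u_b = np.linalg.solve(S1 + S2, g1 + g2)   # interface problem
    u_i1 = w1 - Z1 @ u_b                      # backward substitution, subdomain 1
    u_i2 = w2 - Z2 @ u_b                      # backward substitution, subdomain 2
    return u_i1, u_i2, u_b
```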
20 PARALLEL MESH REFINEMENTS AND MESH RECONCILIATION
21 MESH REGULARITY
Each quad element has 4 vertex nodes, 4 mid-edge nodes, and 1 middle node
22 MESH REGULARITY
- Isotropic h-refinement: the element is broken into 4 sons (in both the horizontal and vertical directions)
- The big element has 2 smaller neighbors
- The mid-edge nodes and the common vertex node of the smaller elements remain constrained, with no new nodes generated
- A mesh with constrained nodes is called an irregular mesh
23 MESH REGULARITY
- It may happen that the smaller elements need to be broken once again
- The newly created nodes are called multiply constrained nodes
- Multiply constrained nodes are not allowed
24 MESH REGULARITY
- 1-irregularity rule: an edge of a given element can be broken only once without breaking the neighboring elements
- The neighboring element must be broken in this example (see the sketch below)
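A minimal sketch of how the 1-irregularity rule can be enforced during refinement, assuming a hypothetical mesh object that stores an isotropic refinement level per element and can enumerate edge neighbors; the real code also handles anisotropic refinements and the constrained-node bookkeeping.

```python
def break_element(mesh, elem_id):
    """h-refine one quad element while keeping the mesh 1-irregular.

    Assumed (hypothetical) mesh interface:
      mesh.level[e]     - isotropic refinement level of element e
      mesh.neighbors(e) - elements sharing an edge with e
      mesh.refine(e)    - breaks e into 4 sons and increments mesh.level[e]
    """
    for nbr in list(mesh.neighbors(elem_id)):
        # The shared edge may be broken only once relative to the neighbor:
        # if the neighbor is already coarser, break it first so that no
        # multiply constrained nodes appear.
        if mesh.level[nbr] < mesh.level[elem_id]:
            break_element(mesh, nbr)
    mesh.refine(elem_id)
```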
25 PARALLEL MESH REFINEMENTS - EXAMPLE 1
26 PARALLEL MESH REFINEMENTS - EXAMPLE 1
27 PARALLEL MESH RECONCILIATION - EXAMPLE 1
28 PARALLEL MESH RECONCILIATION - EXAMPLE 1
29 PARALLEL MESH REFINEMENTS AND MESH RECONCILIATION - EXAMPLE 1
Conclusion: It is required to exchange refinement tree data between neighboring subdomains.
30 PARALLEL MESH REFINEMENTS AND MESH RECONCILIATION - CODING THE REFINEMENT TREES
Refinement trees are exchanged between neighboring subdomains in a compressed binary format
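One simple way to realize such a compressed binary format is to walk the refinement tree in pre-order and emit one bit per node (1 = broken into 4 sons, 0 = leaf). The sketch below only illustrates this idea; the package's actual wire format is not documented here and must also encode anisotropic refinements and polynomial orders.

```python
def encode_tree(node):
    """Pre-order bit string: '1' + encodings of the 4 sons, or '0' for a leaf.
    A node is represented as a list of its 4 sons, or None for a leaf."""
    if node is None:
        return "0"
    return "1" + "".join(encode_tree(son) for son in node)

def decode_tree(bits, pos=0):
    """Inverse of encode_tree; returns (tree, next position)."""
    if bits[pos] == "0":
        return None, pos + 1
    sons, pos = [], pos + 1
    for _ in range(4):
        son, pos = decode_tree(bits, pos)
        sons.append(son)
    return sons, pos

# Example: root broken into 4 sons, first son broken once again.
tree = [[None, None, None, None], None, None, None]
bits = encode_tree(tree)               # "1" + "10000" + "000" -> "110000000"
assert decode_tree(bits)[0] == tree
```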
31 MESH RECONCILIATION - EXAMPLE 2
Figure: constrained nodes; coarse mesh -> (global hp-refinement) fine mesh -> optimal mesh
32 PARALLEL MESH REFINEMENTS AND MESH RECONCILIATION - EXAMPLE 2
Conclusion: It is required to exchange information about constrained nodes located at the same place on both neighboring subdomains. The mesh reconciliation must be performed after creation of the fine mesh (after the global hp-refinement) and also after creation of the optimal mesh.
33 PARALLEL MESH REFINEMENTS AND MESH RECONCILIATION - EXAMPLE 3
When an interface edge with a constrained node is broken, the orders of the newly created nodes must be established. This is done by comparing the coarse and fine grid solutions over the edge. But in the current subdomain there is no element containing this entire edge, so we need to ask the neighboring subdomain for the orders of the just-created nodes.
34 PARALLEL MESH REFINEMENTS AND MESH RECONCILIATION - EXAMPLE 3
Conclusion: After breaking an interface edge with a constrained node, it is necessary to ask the neighboring subdomain for the order of approximation of the newly created nodes (see the sketch below).
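A minimal mpi4py sketch of the neighbor query described in this conclusion; the edge identifiers, message tags, and function names are hypothetical, and in the real package this exchange is part of the personalized communication strategy described in the Paszynski & Milfeld report cited later.

```python
from mpi4py import MPI

TAG_ORDER_QUERY = 101   # hypothetical message tags
TAG_ORDER_REPLY = 102

def ask_neighbor_for_orders(comm, neighbor_rank, edge_id):
    """Request from the neighboring subdomain the polynomial orders of the
    nodes just created on the shared interface edge."""
    comm.send(edge_id, dest=neighbor_rank, tag=TAG_ORDER_QUERY)
    return comm.recv(source=neighbor_rank, tag=TAG_ORDER_REPLY)

def answer_order_query(comm, neighbor_rank, local_edge_orders):
    """Serve one query from the neighbor: look up the orders established by
    comparing the coarse and fine grid solutions over the edge."""
    edge_id = comm.recv(source=neighbor_rank, tag=TAG_ORDER_QUERY)
    comm.send(local_edge_orders[edge_id], dest=neighbor_rank, tag=TAG_ORDER_REPLY)
```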
35 PARALLEL MESH REFINEMENTS - SUMMARY
- The mesh refinement algorithm runs on each subdomain separately
- The 1-irregularity rule is enforced: an edge of a given element can be broken only once without breaking the neighboring elements
- Nodes situated on the global interface are treated in the same way as internal nodes
- After the parallel mesh refinements it is necessary to run the mesh reconciliation algorithm
36 PARALLEL MESH RECONCILIATION - SUMMARY
Two adjacent elements from neighboring subdomains; the first one is not refined, the second one is refined:
- Create a constrained node on the interface edge (in order to have the same number of degrees of freedom)
37 PARALLEL MESH RECONCILIATION - SUMMARY
Two adjacent elements from neighboring subdomains; both refined:
- Create constrained nodes on the interface edges
- Exchange constrained node data between the subdomains
- Remove interface constrained nodes situated at the same place on both subdomains
38 PARALLEL MESH RECONCILIATION - SUMMARY
Two adjacent elements from neighboring subdomains; the first one is not refined, the second one is refined:
- Exchange refinement trees between the subdomains
- The second one is refined once again
- Break the element
- Remove the constrained node
- Create a constrained node
39 PARALLEL MESH REFINEMENTS - SUMMARY
- We can summarize our algorithm in the following stages (sketched below):
  1. Parallel mesh refinements
  2. Exchange of information about interface edge refinement trees, constrained nodes, and orders of approximation along the interface
  3. Mesh reconciliation
- The repetition of stages 2 and 3 may be required if some of the interface edges were modified during the last iteration.
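A minimal sketch of the three stages and the repetition criterion above, assuming mpi4py and hypothetical helpers (refine_locally, exchange_interface_data, reconcile); the allreduce lets every subdomain learn whether any interface edge was modified anywhere, in which case stages 2 and 3 are repeated.

```python
from mpi4py import MPI

def parallel_refinement_step(comm, subdomain, requested_refinements):
    # Stage 1: parallel mesh refinements, performed independently on each
    # subdomain (the 1-irregularity rule is enforced locally).
    refine_locally(subdomain, requested_refinements)

    while True:
        # Stage 2: exchange interface edge refinement trees, constrained
        # nodes, and orders of approximation with the neighboring subdomains.
        interface_data = exchange_interface_data(comm, subdomain)

        # Stage 3: mesh reconciliation against the received interface data;
        # returns True if any interface edge of this subdomain was modified.
        modified = reconcile(subdomain, interface_data)

        # Repeat stages 2 and 3 until no subdomain modified an interface edge.
        if comm.allreduce(1 if modified else 0, op=MPI.SUM) == 0:
            break
```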
40 RESULTS
41 RESULTS - THE LAPLACE EQUATION OVER THE L-SHAPE DOMAIN
Optimal mesh obtained after parallel iterations over 3 subdomains. Exponential convergence is obtained down to an accuracy of 1% relative error.
42 RESULTS - THE BATTERY PROBLEM
The solution with an accuracy of 0.1% relative error. Exponential convergence curve for the parallel execution (16 processors).
43 RESULTS - THE BATTERY PROBLEM
Optimal mesh obtained after parallel iterations over 15 subdomains, giving an accuracy of 0.1% relative error.
44 RESULTS - THE BATTERY PROBLEM
Partition of the hp-refined mesh over 15 processors
45 PERFORMANCE MEASUREMENTS
Computation times for each part of the sequential code, measured during each hp-adaptive iteration
46 PERFORMANCE MEASUREMENTS
Computation times for each part of the parallel code, executed over 16 processors, measured during each hp-adaptive iteration
47 PERFORMANCE MEASUREMENTS
Computation times for each part of the parallel code, executed over 32 processors, measured during each hp-adaptive iteration
48 PERFORMANCE MEASUREMENTS
When the load over one element is higher than the overall load of all other elements, the optimal load balance uses only 2 processors
49 PERFORMANCE MEASUREMENTS
Load distribution over 32 processors during particular iterations; only 16 processors are working
50 PERFORMANCE MEASUREMENTS
- The load balancing is performed at the level of initial mesh elements.
- In the Sandia battery problem, there are 16 initial mesh elements covering the areas with the strongest singularities.
- Most of the hp-refinements are required in the neighborhood of these areas.
- Optimal load balancing therefore needs only 16 processors.
51 PERFORMANCE MEASUREMENTS
- Observations
  - When a singularity is covered by an initial mesh element, the computational load over this element is high
  - Is load balancing at a finer level than the initial mesh elements necessary?
  - The singularity inside an initial mesh element can be solved to an accuracy of 0.1% relative error within 20 seconds
52 PERFORMANCE MEASUREMENTS
Computation times for each part of the parallel code, executed over 16 processors, measured during each hp-adaptive iteration
53 PERFORMANCE MEASUREMENTS
- Observations
  - When a singularity is covered by an initial mesh element, the computational load over this element is high
  - Is load balancing at the level of initial mesh elements enough?
  - The singularity within an initial mesh element can be solved to an accuracy of 0.1% relative error within 20 seconds
  - The problem over any larger subdomain can also be solved to an accuracy of 0.1% relative error within 20 seconds (assuming the communication scales well)
54 PERFORMANCE MEASUREMENTS
- Observations
  - The number of hp-adaptive strategy iterations required to obtain the solution with a given accuracy is LOWER in the parallel implementation
  - (the parallel direct solver, based on the domain decomposition approach, has less error than the direct solver over the entire domain)
55 CONCLUSIONS
- We have developed a parallel fully automatic hp-adaptive 2D code for the Laplace equation, where:
  - Load balancing is performed by the ZOLTAN library
  - Both the coarse and fine mesh problems are solved by the parallel frontal solver
  - The mesh is refined fully in parallel
- References
  - M. Paszynski, J. Kurtz, L. Demkowicz, Parallel Fully Automatic hp-Adaptive 2D Finite Element Package, ICES Report 04-07
  - M. Paszynski, K. Milfeld, h-Relation Personalized Communication Strategy for HP-Adaptive Computations, ICES Report 04-40
56 FUTURE WORK
- Future work will include:
  - Implementation of the parallel version of the 3D code
  - Extending the code to solve the 3D Helmholtz and time-harmonic Maxwell equations
  - Parallel version of the two-grid solver
- Challenging applications
  - Simulation of EM waves in the human head
  - Calculation of radar cross-sections (3D scattering problems)
  - Simulation of Logging While Drilling EM measuring devices
- References
  - M. Paszynski, J. Kurtz, L. Demkowicz, Parallel Fully Automatic hp-Adaptive 2D Finite Element Package, ICES Report 04-07
  - M. Paszynski, K. Milfeld, h-Relation Personalized Communication Strategy for HP-Adaptive Computations, ICES Report 04-40