Title: Introduction to PETSc
Introduction to PETSc
- VIGRE Seminar, Wednesday, November 8, 2006
Parallel Computing
- How (basically) does it work?
- Assign each processor a number
- The same program goes to all
- Each uses separate memory
- They pass information back and forth as necessary
Parallel Computing
- Example 1: Matrix-Vector Product
- The matrix and the vector are inputs into the program.
- The control node (0) reads in the matrix and distributes the rows amongst the processors:
  0: (a, b, c)   1: (d, e, f)   2: (g, h, i)
- The control node also sends the vector (j, k, l) to each processor's memory.
- Each processor computes its own dot product:
  0: aj + bk + cl   1: dj + ek + fl   2: gj + hk + il
- The processors send their results to the control node, which outputs the product vector (a rough MPI sketch follows below).
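A rough sketch of this example in plain MPI (which PETSc builds on). The 3x3 values and the assumption of exactly three processes are illustrative; the hard-coded matrix stands in for data read on the control node.

    /* Example 1 in plain MPI: scatter rows, broadcast the vector,
       compute local dot products, gather results on rank 0.        */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
      double A[3][3] = {{1,2,3},{4,5,6},{7,8,9}};  /* matrix, meaningful on rank 0 only */
      double x[3]    = {1,1,1};                    /* the vector (j, k, l)              */
      double row[3], y[3], yi;
      int    rank, size;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      if (size != 3) {                             /* this sketch assumes 3 processes   */
        if (rank == 0) printf("run with -np 3\n");
        MPI_Finalize();
        return 1;
      }

      /* control node (0) distributes one row to each processor */
      MPI_Scatter(A, 3, MPI_DOUBLE, row, 3, MPI_DOUBLE, 0, MPI_COMM_WORLD);
      /* control node sends the whole vector to every processor */
      MPI_Bcast(x, 3, MPI_DOUBLE, 0, MPI_COMM_WORLD);

      /* each processor computes its own dot product */
      yi = row[0]*x[0] + row[1]*x[1] + row[2]*x[2];

      /* results return to the control node, which outputs the product */
      MPI_Gather(&yi, 1, MPI_DOUBLE, y, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
      if (rank == 0) printf("y = (%g, %g, %g)\n", y[0], y[1], y[2]);

      MPI_Finalize();
      return 0;
    }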
Parallel Computing
- Example 2: Matrix-Vector Product
- Suppose for memory reasons each processor only has part of the vector:
  0: (a, b, c), j   1: (d, e, f), k   2: (g, h, i), l
- Before the multiply, each processor sends the necessary entries elsewhere:
  0 gets k from 1 and l from 2; 1 gets j from 0 and l from 2; 2 gets j from 0 and k from 1.
- After the multiply, the space is freed again for other uses.
Parallel Computing
- Example 3: Matrix-Matrix Product
- The previous case illustrates how to multiply matrices stored across multiple processors:
  0: (a, b, c), (j, k, l)   1: (d, e, f), (m, n, o)   2: (g, h, i), (p, q, r)
- Each column of the second matrix is distributed for processing in turn:
  0: (a, b, c)·(j, m, p) = α,  (a, b, c)·(k, n, q) = β,  (a, b, c)·(l, o, r) = γ
  1: (d, e, f)·(j, m, p) = δ,  (d, e, f)·(k, n, q) = ε,  (d, e, f)·(l, o, r) = ζ
  2: (g, h, i)·(j, m, p) = η,  (g, h, i)·(k, n, q) = θ,  (g, h, i)·(l, o, r) = ι
- The result is a matrix with the same parallel row structure as the first matrix and column structure as the second:
  0: (α, β, γ)   1: (δ, ε, ζ)   2: (η, θ, ι)
- The original entries could also have been sub-matrices, as long as they were compatible.
Parallel Computing
- Example 4: Block Diagonal Product
- Suppose the second matrix is block diagonal:
  0: (A, B, C), (J, 0, 0)   1: (D, E, F), (0, K, 0)   2: (G, H, I), (0, 0, L)
- Much less information needs to be passed between the processors:
  0: AJ = α, BK = β, CL = γ   1: DJ = δ, EK = ε, FL = ζ   2: GJ = η, HK = θ, IL = ι
Parallel Computing
- When is it worth it to parallelize?
- There is a time cost associated with passing messages
- The amount of message passing is dependent on the problem and the program (algorithm)
- Therefore, the benefits depend more on the structure of the problem and the program than on the size/speed of the parallel network (diminishing returns).
Parallel Networks
- How do I use multiple processors?
- This depends on the network, but
- Most networks use some variation of PBS, a job scheduler, and mpirun or mpiexec.
- A parallel program needs to be submitted as a batch job.
Parallel Networks
- Suppose I have a program myprog, which gets data from data.dat, and which I call in the following fashion when only using one processor:
- ./myprog -f data.dat
- I would write a file myprog.pbs that looks like the following
Parallel Networks
- #PBS -q compute (name of the processing queue; not necessary on all networks)
- #PBS -N myprog (the name of the job)
- #PBS -l nodes=2:ppn=1,walltime=00:10:00 (number of nodes and number of processes per node, maximum time to allow the program to run)
- #PBS -o /home/me/mydir/myprog.out (where the output of the program should be written)
- #PBS -e /home/me/mydir/myprog.err (where the error stream should be written)
- These are the headers that tell the job scheduler how to handle your job.
Parallel Networks
- Although what follows depends on the MPI software that the network runs, it should look something like this (a full script is assembled below):
- cd $PBS_O_WORKDIR (makes the processors run the program in the directory where myprog.pbs is saved)
- mpirun -machinefile $PBS_NODEFILE -np 2 myprog -f mydata.dat (tells the MPI software which processors to use and how many processes to start; notice that command line arguments follow as usual)
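Putting the pieces from the last two slides together, a complete myprog.pbs might look like the following. This is a sketch: the queue name, paths, and exact mpirun flags depend on the network and its MPI installation.

    #PBS -q compute
    #PBS -N myprog
    #PBS -l nodes=2:ppn=1,walltime=00:10:00
    #PBS -o /home/me/mydir/myprog.out
    #PBS -e /home/me/mydir/myprog.err

    # run in the directory where myprog.pbs was submitted from
    cd $PBS_O_WORKDIR
    mpirun -machinefile $PBS_NODEFILE -np 2 myprog -f mydata.dat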
Parallel Networks
- Once the .pbs file is written, it can be submitted to the job scheduler with qsub:
- qsub myprog.pbs
- You can check to see if your job is running with the command qstat.
Parallel Networks
- Some systems (but not all) will allow you to simulate running your program in parallel on one processor, which is useful for debugging:
- mpirun -np 3 myprog -f mydata.dat
Parallel Networks
- What parallel systems are available?
- RTC: Rice Terascale Cluster, 244 processors.
- ADA: Cray XD1, 632 processors.
- caamster: CAAM department exclusive, 8(?) processors.
PETSc
- What do I use PETSc for?
- File I/O with minimal understanding of MPI
- Vector and matrix based data management (in particular sparse)
- Linear algebra routines familiar from the famous serial packages
PETSc
- At the moment, ada and caamster (and harvey) have PETSc installed
- You can download and install PETSc on your own machine (requires cygwin for Windows), for educational and debugging purposes
PETSc
- PETSc builds on existing software (BLAS and LAPACK); which implementations to use can be specified at configuration
- Has a (slower) debugging configuration and a (faster, tacit) optimized configuration
PETSc
- Installation comes with documentation, examples, and manual pages.
- The biggest part of learning how to use PETSc is learning how to use the manual pages.
PETSc
- It is extremely useful to have an environment variable PETSC_DIR in your shell of choice, which gives the path to the installation of PETSc, e.g.
- PETSC_DIR=/usr/local/src/petsc-2.3.1-p13/
- export PETSC_DIR
PETSc
- Makefile
- You can pretty much copy/paste/modify the makefiles in the examples, but here is the basic setup
PETSc
- Makefile
- (...) (Other definitions for CFLAGS, etc.)
- LOCDIR = /mydir
- include ${PETSC_DIR}/bmake/common/base
- (This is why it is useful to have this variable saved)
- myprog: myprog.o chkopts
-         ${CLINKER} -o myprog myprog.o ${PETSC_LIB}
-         ${RM} myprog.o
PETSc
- Headers
- #include "petsc.h" in all files, unless the routines that you use need more specific headers.
- How do you know? Consult the manual pages!
PETSc
- Data Types
- PETSc has a slew of its own data types: PetscInt, PetscReal, PetscScalar, etc.
- Usually aliases of normal data types: PetscInt = int, PetscReal = double
- Safer to use for compatibility
PETSc
- Usage in C/C++
- The top of the program should begin:
- static char help[] = "Your message here.";
- int main(int argc, char **argv)
- (... declarations ...)
- PetscInitialize(&argc, &argv, PETSC_NULL, help);
PETSc
- Usage in C/C++
- The program should end:
- (...)
- PetscFinalize();
- return 0;
PETSc
- Usage in C/C++
- When first programming, include the following variable:
- PetscErrorCode ierr;
- Where you'd call a PETSc routine,
- Routine(arg);
- write instead
- ierr = Routine(arg); CHKERRQ(ierr);
PETSc
- Usage in C/C++
- When you try to run your program, you will be informed of any problems with incompatible data types/dimensions/etc. (a minimal skeleton follows below).
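Putting the last few slides together, a minimal skeleton might look like this. The help message and the PetscPrintf call are placeholders; every real PETSc call would be wrapped with ierr and CHKERRQ as described above.

    static char help[] = "Your message here.\n";

    #include "petsc.h"

    int main(int argc, char **argv)
    {
      PetscErrorCode ierr;

      PetscInitialize(&argc, &argv, PETSC_NULL, help);

      /* ... declarations and PETSc calls, each written as
         ierr = Routine(args); CHKERRQ(ierr);               */
      ierr = PetscPrintf(PETSC_COMM_WORLD, "%s", help); CHKERRQ(ierr);

      PetscFinalize();
      return 0;
    }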
PETSc
- Data
- Any data type larger than a scalar has a Create and a Destroy routine.
- If you run ./myprog -log_summary, you get a count of objects created and destroyed for each data type, to find memory leaks.
PETSc
- Example: Vec
- Two types: global and local
- Dependent on function: do other processors need to see this data?
- Basic usage:
- Vec X;
- VecCreate(PETSC_COMM_WORLD or PETSC_COMM_SELF, &X);
PETSc
- Example: Vec
- Advanced usage (a short sketch follows the list):
- VecCreateSeq(PETSC_COMM_SELF, n, &X)
- VecCreateSeqWithArray(PETSC_COMM_SELF, n, vals, &X)
- VecLoad(instream, VECSEQ, &X)
- VecCreateMPI(PETSC_COMM_WORLD, n, PETSC_DETERMINE, &X)
- VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, N, &X)
- VecCreateMPIWithArray(PETSC_COMM_WORLD, n, N, vals, &X)
- VecLoad(instream, VECMPI, &X)
- VecDuplicate(Y, &X)
- MatGetVecs(M, &X, PETSC_NULL)
- MatGetVecs(M, PETSC_NULL, &X)
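As a small illustration of the second VecCreateMPI form above, a fragment (inside a program set up as in the earlier skeleton). VecGetOwnershipRange and VecGetLocalSize are not on the slide; they are standard Vec manual-page routines used here to show what the parallel layout looks like.

    Vec      X;
    PetscInt N = 100, low, high, nlocal;

    /* global vector of global size N; PETSc decides the local sizes */
    ierr = VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, N, &X); CHKERRQ(ierr);
    /* which global entries does this process own? the range [low, high) */
    ierr = VecGetOwnershipRange(X, &low, &high); CHKERRQ(ierr);
    ierr = VecGetLocalSize(X, &nlocal); CHKERRQ(ierr);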
PETSc
- Example: Vec
- If not created with an array or loaded from a file, values still need to be set
- To copy the values of another Vec with the same parallel structure, use VecCopy(Y, X).
- To set all values to a single scalar value, use VecSet(X, alpha).
PETSc
- Example: Vec
- There are other routines for more complicated ways to set values
- PETSc guards the block of data where the actual values are stored very closely
- An assembly routine must be called after these other routines
PETSc
- Example: Vec
- Other routines (a small assembly sketch follows the list):
- VecSetValue
- VecSetValueLocal (different indexing used)
- VecSetValues
- VecSetValuesLocal
- VecSetValuesBlocked
- VecSetValuesBlockedLocal
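A minimal sketch of how these routines fit with the assembly requirement mentioned above. X is an already created parallel Vec; VecGetOwnershipRange (from the manual pages) is used so each process sets only the entries it owns, and the entry values are illustrative.

    PetscInt    i, low, high;
    PetscScalar v;

    ierr = VecGetOwnershipRange(X, &low, &high); CHKERRQ(ierr);
    for (i = low; i < high; i++) {
      v = (PetscScalar) i;                                   /* arbitrary value */
      ierr = VecSetValue(X, i, v, INSERT_VALUES); CHKERRQ(ierr);
    }
    /* the assembly routines that must follow the VecSetValue* calls */
    ierr = VecAssemblyBegin(X); CHKERRQ(ierr);
    ierr = VecAssemblyEnd(X); CHKERRQ(ierr);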
PETSc
- Example: Vec
- Once a vector is assembled, there are routines for (almost) every function we could want from a vector: AXPY, dot product, absolute value, pointwise multiplication, etc. (a short sketch follows below).
- Call VecDestroy(X) to free its array when it isn't needed anymore.
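A sketch of a few of those routines. X, Y, and W are assumed to be assembled Vecs with the same parallel layout, and the calling sequences should be checked against the manual pages of the installed PETSc version.

    PetscScalar alpha = 2.0, dot;
    PetscReal   nrm;

    ierr = VecAXPY(Y, alpha, X); CHKERRQ(ierr);       /* Y <- alpha*X + Y     */
    ierr = VecDot(X, Y, &dot); CHKERRQ(ierr);         /* dot product          */
    ierr = VecNorm(X, NORM_2, &nrm); CHKERRQ(ierr);   /* 2-norm               */
    ierr = VecAbs(X); CHKERRQ(ierr);                  /* absolute value       */
    ierr = VecPointwiseMult(W, X, Y); CHKERRQ(ierr);  /* W_i <- X_i * Y_i     */

    ierr = VecDestroy(X); CHKERRQ(ierr);              /* free when done       */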
PETSc
- Example: Mat
- Like Vec, a Mat can be global or local (MPI/Seq)
- A Mat can take on a large number of data structures to optimize products (*) and solves (\), even though the same routines are used on all structures (a creation/assembly sketch follows the list of formats below).
PETSc
- Example: Mat
- Row compressed
- Block row compressed
- Symmetric block row compressed
- Block diagonal
- And even dense
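A sketch of creating, assembling, and applying a sparse (row compressed) parallel matrix. MatCreateMPIAIJ, MatGetOwnershipRange, and MatSetValues come from the Mat manual pages rather than the slides, and the size and preallocation numbers are placeholders; MatMult is the same call regardless of the storage format.

    Mat      A;
    Vec      X, Y;
    PetscInt N = 100, i, Istart, Iend;

    /* parallel AIJ (row compressed) matrix, PETSc chooses the row split */
    ierr = MatCreateMPIAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE,
                           N, N, 3, PETSC_NULL, 1, PETSC_NULL, &A); CHKERRQ(ierr);
    ierr = MatGetOwnershipRange(A, &Istart, &Iend); CHKERRQ(ierr);
    for (i = Istart; i < Iend; i++) {
      PetscScalar v = 2.0;                               /* diagonal entry */
      ierr = MatSetValues(A, 1, &i, 1, &i, &v, INSERT_VALUES); CHKERRQ(ierr);
    }
    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);

    /* compatible vectors, as on the Vec slides, then Y <- A*X */
    ierr = MatGetVecs(A, &X, &Y); CHKERRQ(ierr);
    ierr = VecSet(X, 1.0); CHKERRQ(ierr);
    ierr = MatMult(A, X, Y); CHKERRQ(ierr);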
PETSc
- File I/O
- The equivalent to a stream is a viewer (more on this below).
- PETSc has equivalent routines to printf, but you must decide if you want every node to print or just the control node
- To ensure clarity when multiple nodes print, use PetscSynchronizedPrintf followed by PetscSynchronizedFlush (see the sketch below).
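A small sketch of the difference; the rank variable and the message text are illustrative, and the calling sequences follow the 2.3-era manual pages.

    PetscMPIInt rank;

    ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank); CHKERRQ(ierr);

    /* only the control node (rank 0) prints this */
    ierr = PetscPrintf(PETSC_COMM_WORLD, "Hello from the control node\n"); CHKERRQ(ierr);

    /* every node prints, in rank order, without interleaving */
    ierr = PetscSynchronizedPrintf(PETSC_COMM_WORLD, "Hello from process %d\n", rank); CHKERRQ(ierr);
    ierr = PetscSynchronizedFlush(PETSC_COMM_WORLD); CHKERRQ(ierr);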
PETSc
- File I/O
- The equivalent to a stream is a viewer, but a viewer organizes data across multiple processors.
- A viewer combines an output location (file/stdout/stderr) with a format.
- Most data types have a View routine, such as MatView(M, viewer) (a short sketch follows below).
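For example, assuming a Mat M and a Vec X that are already assembled; the binary file name is illustrative, and the Destroy calling sequence follows the 2.3-era manual pages.

    PetscViewer viewer;

    /* ASCII view of the matrix to standard output */
    ierr = MatView(M, PETSC_VIEWER_STDOUT_WORLD); CHKERRQ(ierr);

    /* binary view of the vector to a file, collected across processors */
    ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "x.dat", FILE_MODE_WRITE, &viewer); CHKERRQ(ierr);
    ierr = VecView(X, viewer); CHKERRQ(ierr);
    ierr = PetscViewerDestroy(viewer); CHKERRQ(ierr);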
PETSc
- File I/O
- On a batch server, ASCII I/O can be horrendously slow.
- PETSc only reads data into a parallel format when it is stored in binary form.
- Lots of output data is likely; binary is more compressed than ASCII.
PETSc
- I have ASCII input data. Solution?
- Write a wrapper program (sketched below):
- Runs on one processor
- Creates the data to be used in parallel, and views it to a binary input file
- In parallel, it will be automatically distributed
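A sketch of such a wrapper, assuming one value per line in an ASCII file data.txt and a vector of known length n; the file names, the length, and the ASCII layout are illustrative, and the Destroy calling sequences follow the 2.3-era manual pages.

    static char help[] = "Converts ASCII input data to PETSc binary format.\n";

    #include "petsc.h"
    #include <stdio.h>

    int main(int argc, char **argv)
    {
      Vec            X;
      PetscViewer    viewer;
      PetscErrorCode ierr;
      PetscInt       i, n = 100;
      double         val;
      FILE           *fp;

      PetscInitialize(&argc, &argv, PETSC_NULL, help);   /* run on one processor */

      /* read the ASCII data into a sequential Vec */
      fp = fopen("data.txt", "r");
      ierr = VecCreateSeq(PETSC_COMM_SELF, n, &X); CHKERRQ(ierr);
      for (i = 0; i < n; i++) {
        fscanf(fp, "%lf", &val);
        ierr = VecSetValue(X, i, val, INSERT_VALUES); CHKERRQ(ierr);
      }
      fclose(fp);
      ierr = VecAssemblyBegin(X); CHKERRQ(ierr);
      ierr = VecAssemblyEnd(X); CHKERRQ(ierr);

      /* view it to a binary file; a parallel run can then load it with VecLoad */
      ierr = PetscViewerBinaryOpen(PETSC_COMM_SELF, "data.bin", FILE_MODE_WRITE, &viewer); CHKERRQ(ierr);
      ierr = VecView(X, viewer); CHKERRQ(ierr);
      ierr = PetscViewerDestroy(viewer); CHKERRQ(ierr);

      ierr = VecDestroy(X); CHKERRQ(ierr);
      PetscFinalize();
      return 0;
    }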
PETSc
- Next Time: Issues for Large Dynamical Systems
- Time Stepping
- Updating algebraically
- Managing lots of similar equations (Scattering/Gathering)