Title: Account Setup and MPI Introduction
1. Account Setup and MPI Introduction
- Parallel Computing
- Bioinformatics Lab
Sylvain Pitre (spitre_at_scs.carleton.ca)
Web: http://cgmlab.carleton.ca
2. Overview
- CGM Cluster specs 
- Account Creation 
- Logging in Remotely (Putty, X-Win32) 
- Account Setup for MPI 
- Checking Cluster Load 
- Listing Your Jobs 
- MPI Introduction and Basics
3. CGM Lab Cluster
4. CGM Lab Cluster (2)
- 8 dual-core workstations (16 processor cores in total)
- Named cgm01, cgm02, ..., cgm08.
- Intel Core 2 Duo 1.6GHz, 4GB DDR2 RAM, 320GB disks.
- The server (cgm01) has an extra terabyte (1TB) of disk space.
- Connected through a dedicated gigabit switch.
- Running Fedora 8 (64-bit).
- OpenMPI (http://www.open-mpi.org/)
- cgmXX.carleton.ca (SSH, where XX = 01 to 08)
- Putty (terminal): http://www.putty.nl/download.html
- WinSCP (file transfer): http://winscp.net/eng/index.php
- X-Win32: http://www.starnet.com/
5. CGM Lab Cluster (3)
- Accounts are handled by LDAP (Lightweight Directory Access Protocol) on the server.
- User files are stored on the server and accessed by every workstation using NFS (Network File System).
- The same login and password will work on any workstation.
6. CGM Lab Cluster (4)
[Diagram: cgm01 (NFS and LDAP server) and workstations cgm02-cgm08 connected to the Carleton network.]
7. Account Creation
- To get an account, send an email to Sylvain Pitre (spitre_at_scs.carleton.ca).
- Include in your email:
- your full name
- your email address (if different from the one used to send the email)
- your supervisor's name (or course professor)
- your preferred login name (8 characters max)
8. Logging In Remotely
- You can log in remotely to the cluster using SSH (Secure Shell).
- Users familiar with unix/linux should already know how to do this.
- Windows users can use Putty, a lightweight SSH client (see link on slide 4).
- Windows users can also log in with X-Win32.
- DNS names: cgmXX.carleton.ca (XX = 01 to 08)
- Log in to any node except cgm01 (the server).
9. Logging in with Putty
- Under Host Name, enter the cgm machine you want to log into (cgm03 in this case), then click Open.
- A terminal will open and ask for your username and then your password.
- That's it! You are logged into one of the cgm nodes.
10. Login with X-Win32
- You can also log in to the nodes using X-Win32.
- Open the X-Win32 Configuration program (X-Config).
- Under the Sessions tab, click on Wizard.
- Enter a name for the session (e.g. cgm03), under Type click on ssh, then click Next.
- As host, enter the name of the node you wish to connect to (e.g. cgm03.carleton.ca), then click Next.
- Enter your login name and password and click Next.
- For Command, click on Linux, then click Finish.
- The new session is now added to your Sessions window.
11. Login with X-Win32 (2)
- Click on the newly created session, then click on Launch.
- After launching the session, you might be asked to accept a key (click on Yes).
- You should now be at a terminal.
- You can work in this terminal if you wish (as in Putty), but if you want the graphical interface, type:
- gnome-session &
- After a few seconds the visual interface will start up.
- Now you have access to all the menus and windows of the Fedora 8 interface (using Gnome).
12. Login with X-Win32 (3)
13. Account Setup
- First time login:
- Once you have your account (send me an email to get one) and log in, change your password with the passwd command.
- If you are unfamiliar with unix/linux:
- I strongly recommend reading some tutorials and playing around with commands (but be careful!).
- The rest of the slides assume you have some basic unix/linux knowledge.
14. Password-less SSH
- In order to run MPI on different nodes transparently, we need to set up SSH so it doesn't constantly ask us for a password. Type:
- ssh-keygen -t rsa
- cd .ssh
- cp id_rsa.pub authorized_keys2
- chmod go-rwx authorized_keys2
- ssh-agent $SHELL
- ssh-add
- cd ..
15. Password-less SSH (2)
- Now, after your initial login, you should be able to SSH into any other cgmXX machine without a password. SSH to every workstation once in order to add that node to your known_hosts. Type:
- ssh cgm01 date (answer yes when asked)
- ssh cgm02 date
- ...
- ssh cgm08 date
16. Ready for MPI!
- After completing the steps above, your account is ready to run MPI jobs.
- Running big jobs on multiple processors:
- Since there is no job scheduler, jobs are launched manually, so please be considerate. Use nodes that are not in use or that have a lower load (I'll show you how to check).
- If you need all the nodes for a longer period of time, we'll try to reserve them for you.
17. Network vs. Local Files
- If you need to do a lot of disk I/O, it is preferable to use the local disk's /tmp directory.
- Since your home directory is mounted over NFS, all files written to it are sent to the server (network bottleneck).
- To reduce network transfers, place your large input/output files in /tmp on your local node.
- Make the filename unique so jobs do not overwrite each other's files (see the sketch below).
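A minimal sketch (not from the original slides) of writing per-rank scratch files under /tmp; the "scratch" file name prefix and the file contents are just illustrative assumptions:

    #include "mpi.h"
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank;
        char path[64];
        FILE *f;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Embed the rank in the name so processes on the same node
           never write to the same /tmp file. */
        snprintf(path, sizeof(path), "/tmp/scratch_rank%d.dat", rank);

        f = fopen(path, "w");          /* local-disk I/O, no NFS traffic */
        if (f) {
            fprintf(f, "scratch data for rank %d\n", rank);
            fclose(f);
        }

        MPI_Finalize();
        return 0;
    }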
18. Checking Cluster Load
- To check the load on each workstation, type the command: load
19. Listing Your Jobs
- To list all of your jobs (processes) across the cluster, type: listjobs
20. MPI Introduction
- Message Passing Interface (MPI)
- A portable message-passing standard that facilitates the development of parallel applications and libraries.
- For parallel computers and clusters.
- Not a language in its own right. It is used as a library with another language, like C or Fortran.
- Different implementations: OpenMPI, LAM/MPI, MPICH.
- Portable: not limited to a specific architecture.
21. MPI Basics
- Every node (process) executes the same code.
- Nodes can follow different paths (master/slave model), but don't abuse this!
- Communication is done by message passing.
- Every node has a unique rank (ID) from 0 to p-1, where p is the number of processes.
- The total number of nodes is known to every node.
- Messages can be synchronous or asynchronous.
- Thread safe.
22. Compiling/Running MPI Programs
- Compiler: mpicc
- Command line:
- mpirun -np <p> --hostfile <hostfile> <prog> <params>
- Where <p> is the number of processes you want to use. It can be greater than the number of processors available (used for overloading or simulation).
23. Hostfile
- To run a job on more than one node, a hostfile must be used.
- What's in a hostfile?
- Node name or IP.
- How many processor slots on each node (1 by default).
- Example:
- cgm01 slots=2
- cgm02 slots=2
- ...
24. MPI Startup/Finalize
- #include "mpi.h"
- int main(int argc, char *argv[]) {
-   int rank, wsize;
-   MPI_Init(&argc, &argv);
-   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
-   MPI_Comm_size(MPI_COMM_WORLD, &wsize);
-   /* CODE */
-   MPI_Finalize();
-   return 0;
- }
25. MPI Types
MPI C Type            C Type
MPI_CHAR              char
MPI_SHORT             signed short int
MPI_INT               signed int
MPI_LONG              signed long int
MPI_UNSIGNED_CHAR     unsigned char
MPI_UNSIGNED_SHORT    unsigned short int
MPI_UNSIGNED          unsigned int
MPI_UNSIGNED_LONG     unsigned long int
MPI_FLOAT             float
MPI_DOUBLE            double
MPI_LONG_DOUBLE       long double
MPI_BYTE              -
MPI_PACKED            -
26. MPI Functions
- Send/receive 
- Broadcast 
- All to all 
- Gather/Scatter 
- Reduce 
- Barrier 
- Other
27. MPI Send/Receive (synch)
28. MPI Send/Receive (synch)
- Communication between nodes (processors).
- Blocking:
- int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
- int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)
- buf: send (or receive) buffer address
- count: number of entries in buffer
- datatype: data type of entries
- dest: destination process rank
- source: source process rank
- tag: message tag
- comm: communicator
- status: status after operation (returned)
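A minimal sketch (not from the original slides) of a blocking exchange: rank 0 sends one int to rank 1. It needs at least 2 processes (e.g. mpirun -np 2 ./a.out):

    #include "mpi.h"
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, value = 0;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;                      /* something to send */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            printf("Rank 1 received %d from rank 0\n", value);
        }

        MPI_Finalize();
        return 0;
    }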
29. MPI Send/Receive (asynch)
- A buffer can be used with asynchronous messages.
- Problems occur when the buffer becomes empty or full.
30. MPI Send/Receive (asynch)
- Non-blocking (not guaranteed to be received):
- int MPI_Isend(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm, MPI_Request *request)
- int MPI_Irecv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Request *request)
- Parameters are the same as MPI_Send() and MPI_Recv(), plus a request handle that can be tested or waited on (e.g. with MPI_Wait()).
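A minimal sketch (not from the original slides) of a non-blocking exchange completed with MPI_Wait(); run it with at least 2 processes:

    #include "mpi.h"
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, value = 0;
        MPI_Request req;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 7;
            MPI_Isend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
            /* other work could overlap with the send here */
            MPI_Wait(&req, &status);   /* buffer is safe to reuse after this */
        } else if (rank == 1) {
            MPI_Irecv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
            MPI_Wait(&req, &status);   /* data is valid only after the wait */
            printf("Rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }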
31. MPI Broadcast
- One to all (including itself).
32. MPI Broadcast (syntax)
- int MPI_Bcast(void *buf, int count, MPI_Datatype datatype, int root, MPI_Comm comm)
- buf: send buffer address
- count: number of entries in buffer
- datatype: data type of entries
- root: rank of root
- comm: communicator
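A minimal sketch (not from the original slides): rank 0 broadcasts one integer to every process, including itself:

    #include "mpi.h"
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, n = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0)
            n = 100;                   /* only the root has the value initially */

        MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);   /* now every rank has it */
        printf("Rank %d sees n = %d\n", rank, n);

        MPI_Finalize();
        return 0;
    }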
33. MPI All to All
- Every process sends a message to every process (including itself).
- int MPI_Alltoall(void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, MPI_Comm comm)
- sendbuf: send buffer address
- sendcount: number of elements sent to each process
- sendtype: data type of send elements
- recvbuf: receive buffer address (loaded)
- recvcount: number of elements received from each process
- recvtype: data type of receive elements
- comm: communicator
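A minimal sketch (not from the original slides): each process sends one distinct int to every process with MPI_Alltoall(); the rank*100+i encoding is just an illustrative assumption so the exchange is visible in the output:

    #include "mpi.h"
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[]) {
        int rank, p, i;
        int *sendbuf, *recvbuf;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        sendbuf = malloc(p * sizeof(int));
        recvbuf = malloc(p * sizeof(int));
        for (i = 0; i < p; i++)
            sendbuf[i] = rank * 100 + i;   /* element i is destined for process i */

        MPI_Alltoall(sendbuf, 1, MPI_INT, recvbuf, 1, MPI_INT, MPI_COMM_WORLD);

        /* recvbuf[i] now holds the value that process i sent to this rank */
        for (i = 0; i < p; i++)
            printf("Rank %d got %d from rank %d\n", rank, recvbuf[i], i);

        free(sendbuf);
        free(recvbuf);
        MPI_Finalize();
        return 0;
    }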
34. MPI All to All
35. MPI All to All (alternative)
- MPI_Alltoallv()
- Sends data to all processes, with per-process counts and displacements.
- int MPI_Alltoallv(void *sendbuf, int *sendcounts, int *sdispls, MPI_Datatype sendtype, void *recvbuf, int *recvcnts, int *rdispls, MPI_Datatype recvtype, MPI_Comm comm)
36. MPI Gather (Description)
- MPI_Gather()
- Each process in comm sends the contents of sendbuf to the process with rank root. The process root concatenates the received data in process rank order in recvbuf: that is, the data from process 0 is followed by the data from process 1, which is followed by the data from process 2, etc. The recv arguments are significant only on the process with rank root. The argument recvcount indicates the number of items received from each process, not the total number received.
37. MPI Scatter (Description)
- MPI_Scatter()
- The process with rank root distributes the contents of sendbuf among the processes. The contents of sendbuf are split into p segments, each consisting of sendcount items. The first segment goes to process 0, the second to process 1, etc. The send arguments are significant only on process root.
38. MPI Gather/Scatter
[Diagram: Gather collects one block from every process onto the root; Scatter distributes one block from the root to every process.]
39. MPI Gather/Scatter (syntax)
- int MPI_Gather(void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm)
- int MPI_Scatter(void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm)
- sendbuf: send buffer address
- sendcount: number of send buffer elements
- sendtype: data type of send elements
- recvbuf: receive buffer address (loaded)
- recvcount: number of elements each process receives
- recvtype: data type of receive elements
- root: rank of sending (scatter) or receiving (gather) process
- comm: communicator
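A minimal sketch (not from the original slides) of the scatter/compute/gather pattern: the root scatters fixed-size chunks, every rank doubles its chunk, and the root gathers the results back in rank order. The chunk size of 4 is just an assumption for the sketch:

    #include "mpi.h"
    #include <stdio.h>
    #include <stdlib.h>

    #define CHUNK 4   /* elements handled by each process (assumption) */

    int main(int argc, char *argv[]) {
        int rank, p, i;
        int *full = NULL;
        int part[CHUNK];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        if (rank == 0) {               /* only the root needs the full array */
            full = malloc(p * CHUNK * sizeof(int));
            for (i = 0; i < p * CHUNK; i++)
                full[i] = i;
        }

        /* each rank receives CHUNK consecutive elements from the root */
        MPI_Scatter(full, CHUNK, MPI_INT, part, CHUNK, MPI_INT, 0, MPI_COMM_WORLD);

        for (i = 0; i < CHUNK; i++)    /* local work on the chunk */
            part[i] *= 2;

        /* the root collects the processed chunks back in rank order */
        MPI_Gather(part, CHUNK, MPI_INT, full, CHUNK, MPI_INT, 0, MPI_COMM_WORLD);

        if (rank == 0) {
            for (i = 0; i < p * CHUNK; i++)
                printf("%d ", full[i]);
            printf("\n");
            free(full);
        }

        MPI_Finalize();
        return 0;
    }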
40. MPI Gatherv/Scatterv
- Similar to gather/scatter, but allows varying amounts of data to be sent instead of a fixed amount.
- For example, varying parts of an array can be scattered/gathered in one step.
- See the Parallel Image Processing example to see how they can be used.
41. MPI Gatherv/Scatterv (Syntax)
- int MPI_Scatterv(void *sendbuf, int *sendcounts, int *displs, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm)
- int MPI_Gatherv(void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int *recvcounts, int *displs, MPI_Datatype recvtype, int root, MPI_Comm comm)
- sendcounts: number of send buffer elements for each process
- recvcounts: number of elements received from each process
- displs: displacement for each process
- Other parameters are the same as for gather/scatter.
42. MPI Reduce
- Gather results and reduce them to one value using an operation (Max, Min, Sum, Product).
43. MPI Reduce (syntax)
- int MPI_Reduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm)
- sendbuf: send buffer address
- recvbuf: receive buffer address
- count: number of send buffer elements
- datatype: data type of send elements
- op: reduce operation
-   - MPI_MAX: Maximum
-   - MPI_MIN: Minimum
-   - MPI_SUM: Sum
-   - MPI_PROD: Product
- root: root process rank for result
- comm: communicator
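A minimal sketch (not from the original slides): every rank contributes its own rank number, and MPI_Reduce() with MPI_SUM leaves the total on rank 0:

    #include "mpi.h"
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, p, sum = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        /* every rank contributes its rank; root 0 receives the sum */
        MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("Sum of ranks 0..%d = %d\n", p - 1, sum);

        MPI_Finalize();
        return 0;
    }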
44. MPI Barrier
- Blocks until all processes have called it.
- int MPI_Barrier(MPI_Comm comm)
- comm: communicator
45. Other MPI Routines
- MPI_Allgather(): Gather values and distribute them to all processes (see the sketch after this list).
- MPI_Allgatherv(): Gather values into specified locations and distribute them to all processes.
- MPI_Reduce_scatter(): Combine values and scatter the results.
- MPI_Wait(): Waits for an MPI send/receive to complete, then returns.
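A minimal sketch (not from the original slides) of MPI_Allgather(): each rank contributes one value (its rank squared, an arbitrary choice for the sketch) and every rank ends up with the full array:

    #include "mpi.h"
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[]) {
        int rank, p, i, mine;
        int *all;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        mine = rank * rank;            /* one value per process */
        all = malloc(p * sizeof(int));

        /* like MPI_Gather, but every rank receives the full array */
        MPI_Allgather(&mine, 1, MPI_INT, all, 1, MPI_INT, MPI_COMM_WORLD);

        for (i = 0; i < p; i++)
            printf("Rank %d: all[%d] = %d\n", rank, i, all[i]);

        free(all);
        MPI_Finalize();
        return 0;
    }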
46. Parallel Problem Examples
- Embarrassingly Parallel 
- Simple Image Processing (Brightness, Negative) 
- Pipelined computations 
- Sorting 
- Synchronous computations 
- Heat Distribution Problem 
- Cellular Automata 
- Divide and Conquer 
- N-Body Problem
47. MPI Hello World!
- #include "mpi.h"
- #include <stdio.h>
- int main(int argc, char *argv[]) {
-   int rank, wsize;
-   MPI_Init(&argc, &argv);
-   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
-   MPI_Comm_size(MPI_COMM_WORLD, &wsize);
-   printf("Hello World! I am processor %d.\n", rank);
-   MPI_Finalize();
-   return 0;
- }
48. Parallel Image Processing
- Input: an image of size MxN.
- Output: the negative of the image.
- Each processor should have an equal share of the work, roughly (MxN)/P pixels.
- Master/slave model:
- The master reads in the image and distributes the pixels to the slave nodes. Once done, the slaves return their results to the master, who outputs the negative image.
49. Parallel Image Processing (2)
- Workload:
- If we have 32 pixels to process and 4 CPUs, each CPU will process 8 pixels.
- For P0, the work will start at pixel 0 (displacement) and process 8 pixels (count).
50. Parallel Image Processing (3)
- Find the displacement/count for each processor.
- The master processor scatters the image.
- Execute the negative operation.
- Gather the results on the master processor.
- Displacement (displs) tells you where to start; count (counts) tells you how many pixels to do.
- MPI_Scatterv(image, counts, displs, MPI_CHAR, image, counts[myId], MPI_CHAR, 0, MPI_COMM_WORLD)
- MPI_Gatherv(image, counts[myId], MPI_CHAR, image, counts, displs, MPI_CHAR, 0, MPI_COMM_WORLD)
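A minimal self-contained sketch (not the course's actual program) of the scatter/negative/gather pattern. The 512x512 image size, the way the remainder pixels are given to the first ranks, and the use of MPI_UNSIGNED_CHAR for pixel bytes are illustrative assumptions:

    #include "mpi.h"
    #include <stdlib.h>

    int main(int argc, char *argv[]) {
        int myId, p, i;
        int M = 512, N = 512;              /* assumed image size */
        unsigned char *image = NULL, *part;
        int *counts, *displs;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myId);
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        /* Split the M*N pixels as evenly as possible; any remainder
           goes to the first (M*N) % p ranks. */
        counts = malloc(p * sizeof(int));
        displs = malloc(p * sizeof(int));
        for (i = 0; i < p; i++) {
            counts[i] = M * N / p + (i < (M * N) % p ? 1 : 0);
            displs[i] = (i == 0) ? 0 : displs[i - 1] + counts[i - 1];
        }

        if (myId == 0)
            image = calloc(M * N, 1);      /* master would read the image here */
        part = malloc(counts[myId]);

        /* Master scatters each rank's pixel range. */
        MPI_Scatterv(image, counts, displs, MPI_UNSIGNED_CHAR,
                     part, counts[myId], MPI_UNSIGNED_CHAR, 0, MPI_COMM_WORLD);

        for (i = 0; i < counts[myId]; i++) /* negative: invert each pixel */
            part[i] = 255 - part[i];

        /* Master gathers the processed pixels back in rank order. */
        MPI_Gatherv(part, counts[myId], MPI_UNSIGNED_CHAR,
                    image, counts, displs, MPI_UNSIGNED_CHAR, 0, MPI_COMM_WORLD);

        /* Master would write the negative image out here. */
        free(part); free(counts); free(displs);
        if (myId == 0) free(image);
        MPI_Finalize();
        return 0;
    }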
51. MPI Timing
- Calculate the wall clock time of some code. Can be executed by the master to find out the total runtime.
- double start, total;
- start = MPI_Wtime();
- // Do some work!
- total = MPI_Wtime() - start;
- printf("Total Runtime: %f\n", total);
52. Compiling & Running Your First MPI Program
- Download the MPI_hello.tar.gz example from the cgmlab.carleton.ca website. In the terminal, type:
- wget http://cgmlab.carleton.ca/files/MPI_hello.tar.gz
- Uncompress the files by typing:
- tar zxvf MPI_hello.tar.gz
- Compile the program by typing:
- make
- Run the program on all 16 cores by typing:
- mpirun -np 16 --hostfile hostfile ./hello
53. What To Do Next?
- There is also a prefix sums example on the cgmlab.carleton.ca website.
- Try other examples you find on the web.
- Find MPI tutorials online or in books.
- Write your own MPI programs.
- Have fun :)
54. References
- Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers, Barry Wilkinson and Michael Allen, Prentice Hall, 1999.
- MPI Information/Tutorials:
- http://www-unix.mcs.anl.gov/mpi/learning.html
- A draft of a Tutorial/User's Guide for MPI by Peter Pacheco: ftp://math.usfca.edu/pub/MPI/mpi.guide.ps
- OpenMPI (http://www.open-mpi.org/)