Transcript and Presenter's Notes

Title: Introduction to OpenMP Programming


1
Introduction to Symmetric Multiprocessors
Süha TUNA
Informatics Institute (Bilisim Enstitüsü)
UHeM Summer Workshop - 21.06.2012
2
Outline
  • Shared Memory Architecture
  • SMP Architectures (NUMA, ccNUMA)
  • Cache Coherency Protocols
  • Snoopy
  • Directory Based
  • What is a Thread?
  • What is a Process?
  • Thread vs. Process
  • OpenMP vs MPI

3
(No Transcript)
4
(No Transcript)
5
  • Log in to your UYBHM node using ssh
  • Run the cpuinfo command

bash: $ ssh du??@wsl-node??.uybhm.itu.edu.tr
bash: $ cpuinfo

Architecture   : x86_64
Hyperthreading : disabled
Packages       : 2
Cores          : 4
Processors     : 4

=====  Processor identification  =====
Processor   Thread   Core   Package
0           0        0      0
1           0        0      3
2           0        1      0
3           0        1      3
6
  • Run the cpuinfo command

bash: $ cpuinfo

Architecture   : x86_64
Hyperthreading : disabled
Packages       : 2
Cores          : 4
Processors     : 4

=====  Processor identification  =====
Processor   Thread   Core   Package
0           0        0      0
1           0        0      3
2           0        1      0
3           0        1      3

=====  Processor placement  =====
Package   Cores   Processors
0         0,1     0,2
3         0,1     1,3

=====  Cache sharing  =====
Cache   Size    Processors
L1      32 KB   no sharing
L2      4 MB    (0,2)(1,3)
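
A minimal sketch, assuming a Linux node where glibc provides sched_getcpu(), that lets each OpenMP thread report the processor it runs on; the output can be compared against the cpuinfo placement above. Build with gcc -fopenmp.

/* Each OpenMP thread prints its id and the processor it currently runs on. */
#define _GNU_SOURCE
#include <stdio.h>
#include <sched.h>   /* sched_getcpu(), glibc-specific */
#include <omp.h>

int main(void)
{
    #pragma omp parallel
    {
        printf("thread %d runs on processor %d\n",
               omp_get_thread_num(), sched_getcpu());
    }
    return 0;
}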
7
(No Transcript)
8
Shared Memory Architecture
  • NUMA Architecture Types
  • ccNUMA stands for cache-coherent NUMA architecture.
  • Cache coherence is the integrity of data stored in the local caches of a shared resource.

9
Shared Memory Architecture
  • Coherence defines the behavior of reads and writes to the same memory location.
  • If each processor has a cache that reflects the state of various parts of memory, it is possible that two or more caches hold copies of the same line.
  • If two threads make inappropriately serialized changes to those data items, both caches can end up with different, incorrect versions of the line of memory.
  • The system's state is no longer coherent! (A sketch of such a race follows below.)
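
A minimal sketch of the race described above: two threads increment one shared counter with no synchronization, so concurrent read-modify-write updates to the same memory location are lost and the printed total usually falls short of the expected 2000000. Build with gcc -fopenmp.

#include <stdio.h>

int main(void)
{
    long counter = 0;                    /* shared by both threads */
    #pragma omp parallel num_threads(2)
    {
        for (int i = 0; i < 1000000; i++)
            counter++;                   /* unsynchronized read-modify-write */
    }
    printf("counter = %ld\n", counter);  /* expected 2000000, usually less */
    return 0;
}

Placing #pragma omp atomic immediately before the increment serializes the updates and restores the expected result.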

10
(No Transcript)
11
Shared Memory Architecture
  • Solution: cache coherence protocols!
  • A protocol takes one of two kinds of action when a cache line L is written:
  • Invalidate all copies of L in the other caches of the machine, or
  • Update those copies with the new value being written.
  • Most modern cache-coherent multiprocessors use the invalidation technique rather than the update technique, since it is easier to implement in hardware. (Its cost is visible in the sketch below.)
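
The cost of invalidation is easiest to see through false sharing: threads write to distinct variables that happen to share one cache line, so every write invalidates the line in the other cores' caches and the line ping-pongs between them. A sketch, assuming a 64-byte cache line; build with gcc -fopenmp.

#include <stdio.h>
#include <omp.h>

#define NTHREADS 4
#define ITERS    10000000L

struct padded { long value; char pad[64 - sizeof(long)]; };

long          packed[NTHREADS];       /* all counters on one cache line */
struct padded spread[NTHREADS];       /* one counter per 64-byte line   */

int main(void)
{
    double t0 = omp_get_wtime();
    #pragma omp parallel num_threads(NTHREADS)
    {
        int id = omp_get_thread_num();
        for (long i = 0; i < ITERS; i++)
            packed[id]++;             /* invalidates neighbours' copies */
    }
    double t1 = omp_get_wtime();
    #pragma omp parallel num_threads(NTHREADS)
    {
        int id = omp_get_thread_num();
        for (long i = 0; i < ITERS; i++)
            spread[id].value++;       /* private line: no ping-pong */
    }
    double t2 = omp_get_wtime();
    printf("false sharing: %.3f s, padded: %.3f s\n", t1 - t0, t2 - t1);
    return 0;
}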

12
Main Definitions
  • Process
  • It is the "heaviest" unit of kernel scheduling.
  • It is the unit of resource allocation.
  • Processes execute independently; they interact with each other via interprocess communication mechanisms.
  • Processes own resources allocated by the operating system, including memory (address space) and state information (see the sketch after this list).
  • Each process has its own register set (fast temporary storage).
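
A minimal POSIX sketch of this separation: after fork(), the parent and the child hold separate copies of the address space, so the child's write to x never becomes visible to the parent.

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int x = 1;
    pid_t pid = fork();                   /* create a second process  */
    if (pid == 0) {                       /* child: private copy of x */
        x = 42;
        printf("child:  x = %d\n", x);    /* prints 42                */
        return 0;
    }
    wait(NULL);                           /* parent waits for the child */
    printf("parent: x = %d\n", x);        /* still prints 1           */
    return 0;
}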

13
Main Definitions
  • Thread
  • It is the "lightest" unit of kernel scheduling.
  • It is the unit of execution.
  • At least one thread exists within each process. If multiple threads exist within a process, they share the same memory and file resources.
  • Threads share the address space; each thread keeps its own register set and stack.
  • Threads do not own resources.

"An execution entity having a serial flow of control, a set of private variables, and access to shared variables." — OpenMP Architecture Review Board
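
A minimal sketch of this definition: every thread follows its own serial flow of control with a private copy of mine, while all threads read and modify the shared variable total through the same memory. Build with gcc -fopenmp.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int total = 0;                          /* shared: one copy for all     */
    #pragma omp parallel
    {
        int mine = omp_get_thread_num();    /* private: one copy per thread */
        #pragma omp atomic                  /* serialize the shared update  */
        total += mine;
    }
    printf("sum of thread ids = %d\n", total);
    return 0;
}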
14
(No Transcript)
15
OpenMP vs. MPI
Pros of OpenMP:
  • Considered by some to be easier to program and debug than MPI: data layout and decomposition are handled automatically by directives.
  • Allows incremental parallelism: directives can be added one at a time, so the program can be parallelized one portion after another with no dramatic change to the code (see the sketch after this list).
  • Unified code for both serial and parallel applications: OpenMP constructs are treated as comments when sequential compilers are used.
  • Original (serial) code statements need not, in general, be modified when parallelized with OpenMP. This reduces the chance of inadvertently introducing bugs and helps maintenance as well.
  • Both coarse-grained and fine-grained parallelism are possible.
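
A sketch of incremental parallelism on a simple SAXPY-style loop: the directive is the only change to the serial code, and a compiler without OpenMP support treats it as a comment and still produces a correct serial program. Build with gcc -fopenmp.

#include <stdio.h>

#define N 1000000

static float x[N], y[N];

int main(void)
{
    for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    #pragma omp parallel for    /* the single added line */
    for (int i = 0; i < N; i++)
        y[i] = 2.0f * x[i] + y[i];

    printf("y[0] = %.1f\n", y[0]);   /* 4.0 in both serial and parallel builds */
    return 0;
}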
16
OpenMP vs. MPI
Cons of OpenMP:
  • Currently runs efficiently only on shared-memory multiprocessor platforms.
  • Requires a compiler that supports OpenMP.
  • Scalability is limited by the memory architecture.
  • Reliable error handling is missing.
  • Lacks fine-grained mechanisms to control the thread-processor mapping.
  • Synchronization between subsets of threads is not allowed.
  • Mostly used for loop parallelization.
  • Can be difficult to debug, due to implicit communication between threads via shared variables.
17
OpenMP vs. MPI
Pros of MPI:
  • Does not require a shared-memory architecture, which is more expensive than a distributed-memory architecture.
  • Can be used on a wider range of problems, since it exploits both task parallelism and data parallelism.
  • Can run on both shared-memory and distributed-memory architectures (see the sketch after this list).
  • Highly portable, with implementation-specific optimizations for most hardware.
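
A minimal sketch of this portability: the same SPMD source runs unchanged on one shared-memory node or across a distributed-memory cluster, with each process reporting its rank. Build with mpicc, run with mpirun -np 4.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id         */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */
    printf("process %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}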
18
OpenMP vs. MPI
Cons of MPI:
  • Requires more programming changes to go from the serial to the parallel version.
  • Can be harder to debug.
19
OpenMP vs. MPI
Different MPI and OpenMP applications for matrix
multiplication
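
A sketch of the OpenMP side of such a comparison: the classic triple loop, with the rows of C divided among threads by a single directive. An MPI version would instead distribute blocks of rows of A among processes and gather the partial results. Build with gcc -fopenmp.

#include <stdio.h>

#define N 512

static double a[N][N], b[N][N], c[N][N];

int main(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a[i][j] = 1.0;
            b[i][j] = 2.0;
        }

    #pragma omp parallel for        /* each thread computes disjoint rows of C */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }

    printf("c[0][0] = %.1f\n", c[0][0]);   /* expect 2*N = 1024.0 */
    return 0;
}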
20
MPI vs. OpenMP Programming
Message-Passing Parallelism
Shared-Memory Parallelism