Title: cOMPunity: The OpenMP Community
1 cOMPunity: The OpenMP Community
- Barbara Chapman
- cOMPunity
- University of Houston
2 Contents
- A brief cOMPunity history
- Board of Directors
- Finances
- Workshops
- Meantime, in the rest of the world
- Membership
- Our web presence
- Participation in ARB committee work
- OpenMP Futures
3 cOMPunity
- Goals
- Provide continuity for workshops
- Participate in work of ARB
- Promote API
- Provide information, primarily via website
- To join ARB, it was necessary to found a (non-profit) company
- Based officially in US state of Delaware
- Non-profit company, founded end of 2001
4 Board of Directors
- Must have 4 directors according to by-laws
- Current
- Barbara Chapman (CEO, Finances)
- Mark Bull (Secretary)
- Dieter an Mey (web services)
- Mitsuhisa Sato (Asia)
- Alistair Rendell (Australasia)
- Eduard Ayguade (OpenMP language)
- Mike Voss (workshops)
- Rudi Eigenmann (workshops)
We need to hold elections soon
5 Finances
- Goal: build up reserve to ensure workshops
- Expenses
- Annual fee to agent in State of Delaware
- Delaware franchise tax (only $25 at present)
- Web domain registration
- ARB has waived membership fee
- Income: membership fees and surplus from workshops
- Current balance ca. $18,400
- Non-profit status recognized for federal tax purposes
All cOMPunity work is done without pay.
6 OpenMP Focus Workshops
- First workshop in Lund, Sweden, 1999
- Since 2000 organized annually
- EWOMP in Europe
- WOMPAT in North America
- WOMPEI in Asia
- Strong regional participation
- Aachen introduced OMPlab to format
7 wallcraf@nlr.com
8 Comments from First Workshop
- It is easy to get OpenMP code up and running. Can do this incrementally.
- It is also easy to combine OpenMP and MPI
- Straightforward migration path for MPI code
- OpenMP is well suited to SPMD-style coding
- It is not easy to optimize for cache, but it is essential for good performance
- Compilers should do the cache optimization
9 OpenMP Language Comments
- Some workarounds are required when porting
- Fortran 90 constructs not truly supported
- Array reductions not possible
- Need threadprivate for variables
- I/O needs more work and consideration
- Extensions proposed for I/O, synchronization
- Extensions may be required for new kinds of HPC applications
- Libraries are needed
10 Performance Comments
- Scalable applications have been developed
- Some specific performance problems, e.g. I/O and memory allocation
- Significant differences in overheads for OpenMP constructs on different platforms
- On cc-NUMA systems, performance too dependent on OS and system load
- EPCC OpenMP Microbenchmark available at http://www.epcc.ed.ac.uk/research/openmpbench/
11 OpenMP Language Comments
- Major problem with OpenMP is range of applicability
- Needs significant extension for use on cc-NUMA and distributed memory systems
- Data and thread placement may need to be coordinated by user explicitly
- Some vendor-specific extensions, but no standards
12 Summary of First Meeting
- High level of satisfaction with development experience, but understanding of cache optimizations often limited
- Mostly SPMD programming style adopted
- Many using OpenMP together with MPI
- Some language and performance problems identified
- Much discussion of cc-NUMA performance
- Confidence in market for OpenMP expressed
13 Uptake of OpenMP
- Widely available
- Single source code
- Ease of programming
- Major industry codes ported to OpenMP
- Many hybrid codes in use
Many more users: experts and novice parallel programmers
14 ECHAM5: OpenMP vs. MPI
[Speedup chart for ECHAM5 on an IBM p690, OpenMP vs. MPI, plotted against processor count. Courtesy of Oswald Hahn, MPI for Meteorology]
15 What has the ARB Been Doing?
- OpenMP 1.0 to 2.0 to 2.5
- Much clearer specs
- Some nasty parts
- Especially flush
- Memory model
- This is actually a problem elsewhere too (e.g. Pthreads)
- Tools work didn't produce interfaces for performance tools or debuggers
- Lots of ideas for new features waiting for discussion
16 Life is Short, Remember?
It's official: OpenMP is easier to use than MPI!
17 Life is Short, Remember?
It's official: OpenMP is easier to use than MPI + UPC! (Not actually tested on a real subject)
18 Workshops: What has changed?
- Workshops now merged
- One international workshop (IWOMP)
- Will rotate location
- IWOMP 2006 in Europe
- IWOMP 2007 in Australia
- Need a steering committee
- Email suggestions for this to chapman@cs.uh.edu
How does this affect the date of the event?
19 Workshops: Format and Content
- What about the content?
- Should we be working harder to get new kinds of users? If so, how?
- Publishable papers?
- Contributions from other sources
- OMP Lab
- Tutorial
20 Status: Membership
- BOD wanted membership in cOMPunity to be more-or-less free to academics
- Solution was to make it part of workshop registration
- In other words, participants at workshops are members
- In the past, de facto discount for attendance at multiple workshops
- Now there is only one annual workshop
- So we are (pretty much) the current members
- You can join individually too (fee is $50)
21 Membership (ctd.)
- What is the benefit of being a member?
- Ability to participate in ARB deliberations
- This needs to be better organized
- Members-only discussion list
- New proposal: membership runs for two years from the attended workshop, up to the workshop in that year (no matter what the date)
22 Our Web Presence
- www.compunity.org
- Seems to be pretty useful
- Was managed at UH, input from BOD members
- Now managed by RWTH Aachen
- www.iwomp.org
23 Participation in ARB
- Participation in ARB committees
- ARB, Tools, Futures and Language
- ARB: Barbara Chapman
- Reports are produced by Matthijs van Waveren (Fujitsu)
- Tools: various, including originators of the POMP interface
- Language: UPC Barcelona, but no regular participation on the OpenMP 2.5 committee
24 Challenges and Opportunities
- Single processor optimization
- Multiple virtual processors on a single chip need multi-threading
- Applications outside scientific computing
- Compute-intensive commercial and financial applications need HPC technology.
- Multiprocessor game platforms are coming.
- Clusters and distributed shared memory
- Clusters are the fastest growing HPC platform. Can OpenMP play a greater role?
Does OpenMP have the right language features for these?
25 Completeness
- If we don't cover a broad enough range of parallel applications, someone else will.
- Explicit Threading, Distributed Programming?
- Is OpenMP able to meet the needs of asynchronous or scalable computing?
- Is there an inherent problem, or is some work on the language needed?
The risk: fragmentation of parallel programming APIs, bad for HPC
26 OpenMP 3.0
- A list of proposed features prepared this week
- Not all of them have a concrete proposal
- Listed in following slides
- Order of listing does NOT imply anything with regard to priority, overall importance or status of proposal
27 OpenMP 3.0 Suggested Features
- Task Queues
- There is a proposal
- Semaphores
- There is a proposal
- Collapse clause to allow parallelization of perfect loop nests (see the sketch after this list)
- There is a proposal
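A minimal sketch of how a collapse clause could look on a perfect nest; the names (scale_nest, a, n, m, s) are illustrative and the clause form follows what was under discussion for 3.0:

#include <omp.h>

/* Illustrative only: apply the proposed collapse clause to a perfectly
   nested loop pair so both loops form a single parallel iteration space. */
void scale_nest(int n, int m, double a[n][m], double s)
{
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++)
            a[i][j] *= s;
}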
28 OpenMP 3.0 Suggested Features
- Parallelization of wavefront loop nests (see the sketch after this list)
- There is a proposal
- Thread groups, named sections and precedence relationships
- There is a proposal
- Add internal control variable and environment variable to control slave thread stack size
- There is a proposal
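For the wavefront item, a sketch of the manual restructuring such a proposal would replace: points on one anti-diagonal are independent, so each diagonal becomes a parallel loop (illustrative names wavefront, a, n):

#include <omp.h>

/* Wavefront dependence: a[i][j] needs a[i-1][j] and a[i][j-1], so neither
   loop can be parallelized directly, but all points on one anti-diagonal
   (i + j == d) are independent of each other. */
void wavefront(int n, double a[n][n])
{
    for (int d = 2; d <= 2 * (n - 1); d++) {          /* serial over diagonals */
        int ilo = (d - (n - 1) > 1) ? d - (n - 1) : 1;
        int ihi = (d - 1 < n - 1) ? d - 1 : n - 1;
        #pragma omp parallel for
        for (int i = ilo; i <= ihi; i++) {            /* parallel along a diagonal */
            int j = d - i;
            a[i][j] = a[i - 1][j] + a[i][j - 1];
        }
    }
}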
29 OpenMP 3.0 Suggested Features
- Automatic data scope clause
- There is a proposal
- SCHEDULE clause for sections
- There is no proposal
- Error reporting mechanism
- There is no proposal
30 OpenMP 3.0 Suggested Features
- More kinds of schedules, including one where enough can be assumed to make NOWAIT useful
- There are several proposals
- Reductions with user-defined functions (esp. min/max reductions in C/C++)
- There is no proposal
- Array reductions in C/C++ (see the workaround sketch after this list)
- There is no proposal
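For the array-reduction item, the usual hand-coded workaround at the 2.5 level: a private partial array per thread, merged under a critical section (illustrative names histogram, hist, NBINS; assumes non-negative data values):

#include <omp.h>

#define NBINS 16

/* Hand-coded array reduction: each thread accumulates into a private copy
   of the array, then merges it into the shared result in a critical section. */
void histogram(const int *data, int n, long hist[NBINS])
{
    #pragma omp parallel
    {
        long local[NBINS] = {0};
        #pragma omp for
        for (int i = 0; i < n; i++)
            local[data[i] % NBINS]++;
        #pragma omp critical
        for (int b = 0; b < NBINS; b++)
            hist[b] += local[b];
    }
}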
31 OpenMP 3.0 Suggested Features
- Reduce clause/construct to force reduction inside a parallel region
- There is no proposal
- Insist on (instead of permitting) multiple copies of internal control variables
- There is a proposal
- Define interactions with standard thread APIs
- There is no proposal
32 OpenMP 3.0 Suggested Features
- INDIRECT clause to specify partially parallel loops
- There is a proposal
- Add library routines to support nested parallelism (team ids, global thread ids, etc.)
- There is no proposal
- If POMP-like profiling interface never happens, some basic profiling mechanism
- There is no proposal
33 OpenMP 3.0 Suggested Features
- Support for default(private) in C/C++
- There is no proposal
- Additional clauses to make workshare more flexible
- There is no proposal
- Include F2003 in set of base languages
- There is no proposal
34 OpenMP 3.0 Suggested Features
- Non-flushing lock routines
- There is no proposal
- Support for atomic writes (see the workaround sketch after this list)
- There is no proposal
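For the atomic-write item, the common workaround while atomic only covers update forms such as x += expr or x++: guard the plain store with a critical section (illustrative names write_shared, shared_val):

#include <omp.h>

double shared_val;   /* shared location written by several threads */

/* A plain store is not a legal atomic statement at the 2.5 level, so the
   write is protected by a named critical section instead. */
void write_shared(double v)
{
    #pragma omp critical (shared_val_write)
    shared_val = v;
}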
35 OpenMP 3.0 Proposed Fixes
- Remove possibility of storage reuse for private variables
- Define more clearly where constructors/destructors are called
- Define clearly how threadprivate objects should be initialized
- Widen scope of persistence of threadprivate in nested parallel regions
36 OpenMP 3.0 Proposed Fixes
- Allow unsigned integers as parallel loop iteration variables in C/C++
- Fix C/C++ directive grammar
- Address reading of environment variables when libraries are loaded multiple times
37 Validating OpenMP 2.5 for Fortran and C/C++
- Matthias Mueller
- HLRS
- High Performance Computing Center Stuttgart
- University of Houston
38 Moving OpenMP Forward
- What else matters?
- Modularity? Libraries? ...?
- Even more widely
- Some users have been asking for a variety of hints and/or assertions to give more information to the compiler
- This is not really OpenMP specific
39 Moving OpenMP Forward
- Tools committee
- Many users complain about relative lack of tools
- How can we help get better tools?
- Can we share infrastructure to get more open source tools?
- What kind of tool support is (most) important?
40 cOMPunity Activities
- Participation in ARB Committees: ARB, Futures/Language, Tools
- Requires commitment
- Workshops
- Web presence
- Other?
- Need to participate in 3.0 effort
41 Outlook
Let's round up those cycles!!!
42 Elections
- Current officers are willing to serve
- Must have at least four
- Roles
- Chair
- Secretary
- Finances
- Outreach
- Workshops
- Regional events
43 OpenMP ARB: Current Organization
44 OpenMP 3.0: Pointer chasing loops
- Can OpenMP today handle pointer chasing loops?

nodeptr list, p;
for (p = list; p != NULL; p = p->next)
    process(p->data);
45 OpenMP 3.0: Pointer chasing loops
- A better way has been proposed: WorkQueuing

#pragma omp parallel taskq private(p)
for (p = list; p != NULL; p = p->next) {
    #pragma omp task
    process(p->data);
}

- Key concept: separate work iteration from work generation, which are combined in omp for
- Syntactic variations have been proposed by Sun and the Nanos threads group
- This method is very flexible
Reference: Shah, Haab, Petersen and Throop, EWOMP 1999 paper.
46 Parallelization of Loop Nests

do i = 1, 33
  do j = 1, 33
    ... loop body ...
  end do
end do

- With 32 threads, how can we get good load balance without manually collapsing the loops? (A manual-collapse sketch follows below.)
- Can we handle non-rectangular and/or imperfect nests?
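For comparison, the manual collapsing the first question refers to, sketched in C with an illustrative body() routine: flattening the nest into one 1089-iteration loop balances well on 32 threads.

#include <omp.h>

void body(int i, int j);   /* stands in for the original loop body */

/* Manually collapsed 33x33 nest: one loop over 33*33 = 1089 iterations
   divides far more evenly over 32 threads than 33 outer iterations would. */
void collapsed_nest(void)
{
    #pragma omp parallel for
    for (int k = 0; k < 33 * 33; k++) {
        int i = k / 33 + 1;   /* recover the original 1..33 indices */
        int j = k % 33 + 1;
        body(i, j);
    }
}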
47 Portability of OpenMP
- Thread stacksize
- Different vendor defaults
- Different ways to request a given size
- Need to standardize this
- Behavior of code between parallel regions
- Do threads sleep? Busy-wait? Can the user control this?
- Again, need to standardize options
48 OpenMP Enhancements: OpenMP must be more modular
- Define how OpenMP interfaces to other stuff
- How can an OpenMP program work with components implemented with OpenMP?
- How can OpenMP work with other thread environments?
- Support library writers
- OpenMP needs an analog to MPI's contexts.
We don't have any solid proposals on the table to deal with these problems.
49 Automatic Data Scoping
- Create a standard way to ask the compiler to figure out data scoping.
- When in doubt, the compiler serializes the construct

int j;
double x, result[COUNT];
#pragma omp parallel for default(automatic)
for (j = 0; j < COUNT; j++) {
    x = bigCalc(j);
    result[j] = hugeCalc(x);
}

Ask the compiler to figure out that x should be private.
50 Execution with Reduced Synchronization

!$OMP PARALLEL
!$OMP DO
      do i = 1, imt
         RHOKX(imt,i) = 0.0
      enddo
!$OMP END DO
!$OMP DO
      do i = 1, imt
         do j = 1, jmt
            if (k .le. KMU(j,i)) then
               RHOKX(j,i) = DXUR(j,i)*p5*RHOKX(j,i)
            endif
         enddo
      enddo
!$OMP END DO
!$OMP DO
      do i = 1, imt
         do j = 1, jmt
            if (k .gt. KMU(j,i)) then
               RHOKX(j,i) = 0.0
            endif
         enddo
      enddo
!$OMP END DO
      if (k == 1) then
!$OMP DO
         do i = 1, imt
            do j = 1, jmt
               RHOKMX(j,i) = RHOKX(j,i)
            enddo
         enddo
!$OMP END DO
!$OMP DO
         do i = 1, imt
            do j = 1, jmt
               SUMX(j,i) = 0.0
            enddo
         enddo
!$OMP END DO
      endif
!$OMP SINGLE
      factor = dzw(kth-1)*grav*p5
!$OMP END SINGLE
!$OMP DO
      do i = 1, imt
         do j = 1, jmt
            SUMX(j,i) = SUMX(j,i) + factor*(RHOKX(j,i) + RHOKMX(j,i))
         enddo
      enddo
!$OMP END DO
!$OMP END PARALLEL

Part of computation of gradient of hydrostatic pressure in the POP code
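A sketch of the kind of reduced synchronization the "make NOWAIT useful" item on slide 30 asks for: if the schedule guarantees that the same thread gets the same iterations in two loops with identical bounds (as schedule(static) was later specified to do), the barrier between them can be dropped (illustrative names):

#include <omp.h>

/* Two consecutive loops over the same iteration space: with a static
   schedule, iteration i lands on the same thread in both loops, so the
   second loop only reads what that thread just wrote and the implicit
   barrier after the first loop can be dropped with nowait. */
void scale_then_accumulate(int n, const double *x, double *rho, double *sum, double factor)
{
    #pragma omp parallel
    {
        #pragma omp for schedule(static) nowait
        for (int i = 0; i < n; i++)
            rho[i] = factor * x[i];

        #pragma omp for schedule(static)
        for (int i = 0; i < n; i++)
            sum[i] += rho[i];
    }
}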
51 Producer/Consumer Example
- Correct version according to 2.5

Consumer:
      do
!$omp    flush(flag)
      while (flag .eq. 0)
!$omp flush(data)
      ... = data

Producer:
      data = ...
!$omp flush(data,flag)
      flag = 1
!$omp flush(flag)
52 Workshops
- Since 2000 organized annually
- EWOMP in Europe
- WOMPAT in North America
- WOMPEI in Asia
- Strong regional participation
- Aachen introduced OMPlab to format
- These have been a niche event
- Most OpenMP users are satisfied (or at least not thinking about how it could evolve)
- OpenMP is supposed to be easy, right?
53 What's in a Flush?
- Flush writes data to and reads from memory
- It doesn't synchronize threads
- According to the new rules:
- Compiler is free to reorder flush directives if they are on different variables (see the sketch after this list)
- Two flushes of the same variables must be seen by all threads in the same order
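A C sketch of the same producer/consumer discipline under these rules: the producer must flush data and flag together before setting the flag, since separate flushes of different variables may be reordered (illustrative names producer, consumer, data, flag):

#include <omp.h>

int data, flag = 0;   /* illustrative shared variables */

void producer(void)
{
    data = 42;
    /* One flush covering both variables: flushes of disjoint variable sets
       may be reordered, so flushing data and flag separately here would not
       keep the write to data ahead of the write to flag. */
    #pragma omp flush(data, flag)
    flag = 1;
    #pragma omp flush(flag)
}

void consumer(void)
{
    int done = 0;
    while (!done) {
        #pragma omp flush(flag)    /* re-read flag from memory each pass */
        done = flag;
    }
    #pragma omp flush(data)        /* the producer's data is now visible */
    /* ... = data; */
}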