Multi Processing - PowerPoint PPT Presentation

1 / 38
About This Presentation
Title:

Multi Processing

Description:

Multi Processing – PowerPoint PPT presentation

Number of Views:91
Avg rating:3.0/5.0
Slides: 39
Provided by: machinec
Category:

less

Transcript and Presenter's Notes

Title: Multi Processing


1
Multi Processing Matrix Math Linear
Algebra Or Why Work Twice as Hard to Write Ten
Times as Much Code That No One Can Even
Follow by Brad Morantz PhD
2
Goals
  • Solve the problem clearly and efficiently
  • Focus is on the problem
  • Generate code that
  • Runs fast
  • Parallel processing code
  • Is easy to understand
  • Written in math
  • Understandable by mathematicians and SME (subject
    matter experts)?
  • Will be able to understand it in six months
  • Is standard and portable
  • Taken anywhere
  • Run on any platform
  • That is close to the mathematical formulation
  • Code that is clear and concise

3
Typical Serial Processing
It can take a while until everyone has had a drink
4
Parallel Processing
Many drinking fountains
This will process three times as fast
Queuing would be even faster If the processes
were of varying time
The compiler does this for you automatically
5
Example
  • I want to declare some matrices, fill them
  • with values calculated in long equations, and
    then multiply them.
  • And then test this program on various machines
    and with varying number of processors.
  • I did this and got interesting results (see
  • pg 7)?

6
The code
  • program tryitout
  • implicit none
  • ! written by brad M to see if this works
  • real16, dimension(,), allocatable A, B, C
    ! 16 byte floating point matrix, 2D
  • integer4 row, i, j
  • real4 starttime, donetime
  • print, 'What size array? Start with square
    array'
  • read, row
  • allocate(A(row,row), B(row,row), C(row,row))?
  • call cpu_time(starttime)?
  • blockit forall (i 1row1, j 1row1)?
  • A(i, j) ((real(i)) 2) sin(real(i)/real(j))?
  • B(i, j) (real(i))/(real(j)) cos (real(i/j))?
  • end forall blockit

7
Results
  • For 1000 x 1000 matrix
  • On my old Athlon 2000 (Linux)?
  • 142 seconds
  • My dual Athlon 1800 (Linux)?
  • One processor 152 seconds
  • Two processors 76.5 seconds
  • My wifes Athlon 3000-64 (Linux)?
  • 5 seconds
  • Old Job on Dual Quad Core Xeon (Linux)?
  • One processor 2.8 seconds
  • Eight processors 0.24 seconds
  • My desk machine here (Core duo Windows)?
  • 126.25 seconds

8
The Solution
  • Design and develop parallel algorithm
  • Test out algorithm
  • Make flowchart
  • Develop the algorithm in Matlab or Mathematica
  • Then put it into Fortran 95/2003
  • VV
  • Mathematician to validate algorithm
  • Test matrix and verify output

9
Think Parallel(the hardest part)?
  • Programmer must think parallel
  • Think matrices not scalars
  • Think parallel not serial
  • Think wide not narrow
  • Algorithm must be parallel
  • Think in parallel mode
  • Will this go into a matrix
  • How can I put this into a vector
  • How can this be made more parallel
  • How can this be done in parallel mode

10
Please . . .
  • Do not confuse this with the old Fortran
  • Keep an open mind
  • Realize that this is for mathematics
  • Think in matrices
  • Think about parallel operation
  • Think What if the J-3 committee knew Matlab when
    they wrote the spec for F95
  • My focus
  • solve a math problem
  • get it done quickly
  • with a highly portable code

11
Where is parallelization?
  • The old way (uses 1 cpu)?
  • Do 100 J1, 1000, 1
  • X(J) J2
  • 100 continue
  • New way (uses all CPUs)?
  • Forall (j 110001) X(j) j2
  • What is really happening? (Compiler_at_work)?
  • The compiler first checks for dependence
  • Then it divides the 1000 by the of CPUs, e.g.4
  • Then it puts the code across the processors

12
Suppose . .
  • You had a matrix of numbers (a whole bunch of
    numbers) and wanted a slice of it
  • And there were some missing numbers (signified by
    the value 9999.0)?
  • And you wanted to get the mean of columns in this
    slice, creating a new matrix of 1 less dimension
  • And you wanted to utilize all of your processors.
  • And you want the code to be maintainable, easy to
    understand, portable

13
Matrix
1 2 3 4 5 6 7 8 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9
8 7 6 5 4 3 2 1
1 2 3 4 5 6 7 8 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9
8 7 6 5 4 3 2 1 1 2 3 4 5 6 7 8 9 8 7 6 5 4 3 2 1
2 3 4 5 6 7 8 9 0 8 7 6 5 4 3 2
1 2 3 4 5 6 7 8 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9
8 7 6 5 4 3 2 1 1 2 3 4 5 6 7 8 9 8 7 6 5 4 3 2 1
2 3 4 5 6 7 8 9 0 8 7 6 5 4 3 2
1 2 3 4 5 6 7 8 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9
8 7 6 5 4 3 2 1 1 2 3 4 5 6 7 8 9 8 7 6 5 4 3 2 1
2 3 4 5 6 7 8 9 0 8 7 6 5 4 3 2
1 2 3 4 5 6 7 8 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9
8 7 6 5 4 3 2 1 1 2 3 4 5 6 7 8 9 8 7 6 5 4 3 2 1
2 3 4 5 6 7 8 9 0 8 7 6 5 4 3 2
slice of the matrix
1 2 3 4 5 6 7 8 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9
8 7 6 5 4 3 2 1 1 2 3 4 5 6 7 8 9 8 7 6 5 4 3 2 1
2 3 4 5 6 7 8 9 0 8 7 6 5 4 3 2
1 2 3 4 5 6 7 8 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9
8 7 6 5 4 3 2 1 1 2 3 4 5 6 7 8 9 8 7 6 5 4 3 2 1
2 3 4 5 6 7 8 9 0 8 7 6 5 4 3 2
1 2 3 4 5 6 7 8 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9
8 7 6 5 4 3 2 1 1 2 3 4 5 6 7 8 9 8 7 6 5 4 3 2 1
2 3 4 5 6 7 8 9 0 8 7 6 5 4 3 2
1 2 3 4 5 6 7 8 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9
8 7 6 5 4 3 2 1 1 2 3 4 5 6 7 8 9 8 7 6 5 4 3 2 1
2 3 4 5 6 7 8 9 0 8 7 6 5 4 3 2
. . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . .
It could be more than the 2 dimensions shown here
1 2 3 4 5 6 7 8 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9
8 7 6 5 4 3 2 1 1 2 3 4 5 6 7 8 9 8 7 6 5 4 3 2 1
2 3 4 5 6 7 8 9 0 8 7 6 5 4 3 2
1 2 3 4 5 6 7 8 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9
8 7 6 5 4 3 2 1 1 2 3 4 5 6 7 8 9 8 7 6 5 4 3 2 1
2 3 4 5 6 7 8 9 0 8 7 6 5 4 3 2
1 2 3 4 5 6 7 8 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9
8 7 6 5 4 3 2 1 1 2 3 4 5 6 7 8 9 8 7 6 5 4 3 2 1
2 3 4 5 6 7 8 9 0 8 7 6 5 4 3 2
1 2 3 4 5 6 7 8 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9
8 7 6 5 4 3 2 1 1 2 3 4 5 6 7 8 9 8 7 6 5 4 3 2 1
2 3 4 5 6 7 8 9 0 8 7 6 5 4 3 2
1 2 3 4 5 6 7 8 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9
8 7 6 5 4 3 2 1 1 2 3 4 5 6 7 8 9 8 7 6 5 4 3 2 1
2 3 4 5 6 7 8 9 0 8 7 6 5 4 3 2
14
In C or C
  • How many LOCs (lines of code)?
  • How long would it take?
  • How much other stuff would you need?
  • How long would it take to have working code?
  • How easy would it be to maintain the code?
  • Could people reading the code figure it out
    easily?
  • Would it be standard?
  • How portable?
  • Would the code run in 5 or 10 years?

15
In Fortran 95
  • mean(row,) sum(inarray(firstlast,), dim1,
    maskinarray(firstlast,) .NE. 9999.0)
  • ! note arrays must be conformable
  • adjust() count(inarray(firstlast,) .EQ.
    9999.0, dim1)
  • mean(row, ) mean(row,)/(last- first 1
    adjust())
  • ! this is code out of a program
  • This does it all in these 3 lines.
  • And it uses all of the processors.
  • It is even doing it on slices out of a big
    matrix. It creates a matrix of means.
  • Actual working code out of one of my programs
  • Could have been in 1 line, but would have been
    hard to read

16
Compile Line
  • g95 mean.f90 -o means -fbounds-check
  • the last option will look for any out of bounds
    conditions in any array and report what line of
    code, which array, which dimension, limit, and
    attained value.
  • Multiprocessing
  • ifort mean.f90 -o means parallel
  • In Windows use /Qparallel
  • It can be set that it only parallelizes loops
    more than some number N
  • it is this simple
  • P.S G95 is promising to go to multiprocessing,
    and it is free.
  • ifort is from Intel and is available on Raytheon
    STARS

17
Parallel Processing Matrix Functions
  • ALL are all values in mask true?
  • ANY are any values in mask true?
  • COUNT dimensional reduction, mask ok
  • MAXVAL reduction or scalar, mask OK
  • MINVAL reduction or scalar, mask OK
  • PRODUCT of element of array along DIM
  • SUM reduction or scalar, mask OK
  • MATMUL - matrix multiply
  • MERGE- merge 2 array with mask
  • SPREAD (Source, Dim, Ncopies)?

18
More
  • PACK(Array, Mask ,Vector)?
  • UNPACK(Vector, Mask, Field)?
  • TRANSPOSE
  • CSHIFT -circular shift
  • EOSHIFT end off shift
  • MAXLOC first location of max value
  • MINLOC first location of min value
  • FORALL parallel looping
  • WHERE masking function
  • ELSEWHERE masking function

19
More Array Functions
  • LBOUND - inquiry
  • UBOUND - inquiry
  • SIZE - inquiry
  • SHAPE - inquiry
  • RESHAPE - transformational
  • DOT_PRODUCT- dot product multiplication
  • ALLOCATE assign storage
  • ALLOCATED- inquiry

20
Elemental Matrix Math
  • A, B, C are matrices, same size
  • C AB multiplication
  • C AB addition
  • C A/B division
  • C A-B subtraction
  • C 0.0 initialization
  • Automatic multiprocessing
  • If enabled
  • Depending on compiler and machine

21
Independence
  • Compiler checks for independence
  • One cell depends on value of another
  • e.g A(x, y) A(x1, y)?
  • Order of operation must be independent
  • If compiler finds dependence in values or timing,
    it will not parallelize

22
Matrix Multiplication
  • A,B,C are matrices
  • C matmul(A,B)?
  • Arrays must be of same type
  • integer, real, complex, or logical
  • n,m x m,k n,k
  • m x m,k k
  • Automatic multiprocessing
  • depending on compiler and machine
  • if enabled

23
Addressing Matrices
  • Address a row
  • A(row,)?
  • Address a column
  • A(, col)?
  • Address a slice
  • A(df, h12)?
  • Address a matrix
  • A
  • print, A
  • Does this look like Matlab?

24
Assignments
  • A, B same size matrices
  • A B
  • A(fg, 3) B(fg, 5)?
  • c A(x,y) B(p,q)?
  • D A(13, 24) B(2325, 1618)?
  • Automatic multiprocessing
  • If enabled
  • Target must be same shape and size as source
  • Or source can be a scalar

25
MAXVAL
  • MAXVAL(ARRAY, DIM, MASK)?
  • creates a new array or assigns
  • 1 less dimension
  • max value along dimension DIM
  • corresponding to TRUE elements of MASK
  • Ex maxivals MAXVAL(A, DIM2, MASKA.GE.0.0)
  • MINVAL is the same, but gets the minimum

26
MINLOC
  • MINLOC(ARRAY, DIM, MASK)?
  • Creates or assigns array of 1 less dimension
  • Finds location of first element of ARRAY having
    minimum value of elements as defined by MASK
  • DIM is optional
  • Ex MINLOC(A, DIM1, A.GT.12)?

27
FORALL
  • Like the do loop, but uses all available CPUs
  • FORALL(I1N, J1M) A(I,J) IJ
  • Loop1 FORALL (I1N0.50)?
  • WHERE (A(2I).GE.43), A(INT(I)) I2
  • end loop1
  • (Naming loops is optional, but when used,
    compiler will match names)?
  • Part of HPF specification (Rice/MIT)?
  • automatic multiprocessing

28
WHERE
  • Masked array assignment
  • hot WHERE(A gt 27.32)?
  • temp temp degree
  • ELSEWHERE
  • cold cold degree
  • END WHERE hot
  • WHERE (A lt 0.0) A -A
  • Single line, creates non-negative matrix
  • A could be N dimension matrix
  • Can use multiprocessors
  • No need for indices

29
Impact
  • Intel has 8 core processors
  • Expect available in 2008
  • Announced 80 core processor
  • AMD will be try to match or exceed
  • This is a preview of what is coming
  • Neither makes a single core CPU any more
  • New AMD has 4 cores on 1 die and has native 128
    bit FP processor ( I just built a machine using
    this new Phenom chip)?

30
Overview Fortran 95/2003
  • Automatic Parallelization
  • Matrix functions
  • Object oriented
  • Complex math built in
  • 16 byte floating point (32 byte complex)?
  • Bit and string functions
  • Structures and arrays
  • ISO standard
  • Backward compatibility
  • Dynamic allocation
  • From matlab code to Fortran in minutes

31
Other Mathematical Features
  • 128 bit (16 byte) variables/ 32 byte complex
  • Compiler takes care of the work
  • Complex math (on all complex numbers)?
  • Arrays of structures
  • Structures of arrays (including allocatable)?
  • RECURSIVE
  • Operator function overloading
  • Modules
  • PURE
  • Floating point exception handling
  • IEEE floating point math

32
Resources
  • www.g95.org
  • free compiler, no multi-processing
  • promises multiprocessing in future
  • lots of links
  • Intel (lots of info on optimization)?
  • www.intel.com/software/products/compilers
  • Free compiler for home use only
  • With automatic multi-processing (APO)?
  • Available on STARS
  • www.fortran.com (here in Tucson)?
  • Amazon.com
  • Half.com

33
Intel Notes
  • It is Sometimes easier for the Fortan compiler to
    optimize matrix code than for the C compiler
    because the C standard permits more hidden
    aliasing of pointers than does Fortran
  • I have the Intel Notes on this

34
Summary
  • Fortran 95/2003
  • is for computationally intensive applications
  • Provides fast processing
  • Inherent parallelism
  • Compiles to assembly code
  • Runs efficiently
  • Converts quickly and easily from Matlab etc.
  • Deliverable tool
  • Multiprocessing with little effort

35
More Sample Code
  • mean sum(array,dim 2)/cols
  • ans A(,36, 2n-1.4x) B
  • divisor real(top-bottom 1)?
  • xbar(,spot) real(sum(numbers(bottomtop,
    ),dim1))/divisor
  • forall(ibottomtop,j1numobs)
  • diff(j,i) real(numbers(i,j))-xbar(j,spot)?
  • diffsq diff diff
  • divisor real(top-bottom)?
  • stddev(,spot) sqrt(sum(diffsq(,bottom
    top),dim2)/divisor)?
  • ! This is real code out of a program

36
Sample Math Program
  • Program calculate
  • ! declare variables
  • real16 answer
  • real8, dimension(,), allocatable arrayA,
    arrayB
  • integer4 row, col, numvars
  • logical, dimension(), allocatable exists
  • ! do some work
  • print, 'How many variables are in the data set?
    integer only'
  • read(,) numvars
  • allocate(exists(numvars))?
  • exists .FALSE.
  • !allocate array and then make sure empty
  • loop1 do row 1, 25, 0.5
  • answer answer row
  • end do loop1
  • col int(answer)?
  • allocate (arrayA(numvars,col),arrayB(numvars,col))
    ?
  • arrayB sqrt(answer) arrayA
  • ! this program is really nonsense

37
Neural Net
  • How many LOC's in C/C?
  • Wait until you see it in Fortran 95/2003
  • 4 lines!!
  • Maintainable
  • Mathematical
  • Easy to understand
  • Function linear . . . .
  • Function logistic . .(sigmoidal or hyperbolic
    tan)?
  • Mid logistic(matmul(input, weight1))?
  • Out linear(matmul(mid, weight2))?

38
Contact Information
  • Brad Morantz
  • Tel 520-665-7026
  • In directory
  • Web page www.machine-cognition.com
  • Personal e-mail bradscientist_at_ieee.org
  • As a decision scientist, my goal is to solve
  • problems and find the best possible solution.
  • The best answers are based upon knowledge
  • and information.
Write a Comment
User Comments (0)
About PowerShow.com