Friday, September 22, 2006 PowerPoint PPT Presentation


Title: Friday, September 22, 2006


1
Friday, September 22, 2006
  • If one ox could not do the job they did not try
    to grow a bigger ox, but used two oxen.
  • Grace Murray Hopper
  • (1906-1992)

2
Today
  • Block matrix operations
  • Network topologies

3
Strided access
  • Stride: a sequence of memory reads and writes to
    addresses, each of which is separated from the
    last by a constant interval called the "stride
    length"
  • Unit stride: a stride length of one element, so
    consecutive accesses touch adjacent addresses
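
To make the stride definition concrete, here is a minimal C sketch. It assumes a row-major matrix stored in a flat array; the function names `sum_strided`, `sum_by_rows`, and `sum_by_cols` are hypothetical and chosen only for illustration. Walking along rows is unit stride, while walking down columns has a stride of `cols` elements.

```c
#include <assert.h>
#include <stddef.h>

/* Sum every stride-th element of a[0..n-1]; stride == 1 is unit stride. */
static double sum_strided(const double *a, size_t n, size_t stride) {
    assert(stride > 0);
    double s = 0.0;
    for (size_t i = 0; i < n; i += stride)
        s += a[i];
    return s;
}

/* Row-major matrix, rows outer: consecutive addresses (unit stride). */
static double sum_by_rows(const double *m, size_t rows, size_t cols) {
    double s = 0.0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            s += m[i * cols + j];
    return s;
}

/* Same matrix, columns outer: each access is cols elements past the last. */
static double sum_by_cols(const double *m, size_t rows, size_t cols) {
    double s = 0.0;
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            s += m[i * cols + j];
    return s;
}
```

Both traversals compute the same sum; only the order of memory accesses, and therefore the cache behavior, differs.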

4
    do i = 1, N
      do j = 1, N
        A(i) = A(i) + B(j)
      enddo
    enddo

N is large, so B(j) cannot remain in cache until it is used again in the
next iteration of the outer loop. There is little reuse between touches.
How many cache misses occur for A and B?

5
Blocking
Inner loop split into strips of length S:
    do i = 1, N
      do j = 1, N, S
        do jj = j, MIN(j+S-1, N)
          A(i) = A(i) + B(jj)
        enddo
      enddo
    enddo

Original loop, for comparison:
    do i = 1, N
      do j = 1, N
        A(i) = A(i) + B(j)
      enddo
    enddo

6
Blocking
Strip loop moved outermost:
    do j = 1, N, S
      do i = 1, N
        do jj = j, MIN(j+S-1, N)
          A(i) = A(i) + B(jj)
        enddo
      enddo
    enddo

Original loop, for comparison:
    do i = 1, N
      do j = 1, N
        A(i) = A(i) + B(j)
      enddo
    enddo

S is the maximum number of elements of B that can remain in cache between
two iterations of the i loop. This transformation is called blocking or
strip mining. How many cache misses occur for A and B?

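
One way to explore the cache-miss question is to count misses in a toy cache model. The sketch below is only an illustration under stated assumptions: a fully associative LRU cache of `CACHE_LINES` lines holding `LINE_ELEMS` elements each, A and B laid out back to back, 0-based C loops in place of the 1-based loops on the slide, and one modeled access to A(i) and one to B(j) per inner iteration. All sizes and names here are hypothetical, not taken from the slides.

```c
#include <assert.h>

/* Assumed toy parameters: N elements per array, 4-element cache lines,
   a 4-line cache, and a block size S that fits in that cache. */
enum { N = 64, LINE_ELEMS = 4, CACHE_LINES = 4, S = 8 };

typedef struct {
    long tag[CACHE_LINES];       /* cache-line number held, -1 if empty */
    long last_used[CACHE_LINES]; /* logical time of the last access */
    long clock;
} Cache;

typedef struct { long a; long b; } Misses;

static void cache_reset(Cache *c) {
    for (int k = 0; k < CACHE_LINES; k++) {
        c->tag[k] = -1;
        c->last_used[k] = 0;
    }
    c->clock = 0;
}

/* Access one element address; return 1 on a miss, 0 on a hit. */
static int touch(Cache *c, long addr) {
    assert(addr >= 0);
    long line = addr / LINE_ELEMS;
    int victim = 0;
    c->clock++;
    for (int k = 0; k < CACHE_LINES; k++) {
        if (c->tag[k] == line) { c->last_used[k] = c->clock; return 0; }
        if (c->last_used[k] < c->last_used[victim]) victim = k;
    }
    c->tag[victim] = line;          /* evict the least recently used line */
    c->last_used[victim] = c->clock;
    return 1;
}

/* Original loop: A occupies addresses 0..N-1, B occupies N..2N-1. */
static Misses count_naive(void) {
    Cache c;
    Misses m = {0, 0};
    cache_reset(&c);
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            m.a += touch(&c, i);
            m.b += touch(&c, N + j);
        }
    return m;
}

/* Blocked loop with the strip loop outermost. */
static Misses count_blocked(void) {
    Cache c;
    Misses m = {0, 0};
    cache_reset(&c);
    for (int j = 0; j < N; j += S)
        for (int i = 0; i < N; i++)
            for (int jj = j; jj < j + S && jj < N; jj++) {
                m.a += touch(&c, i);
                m.b += touch(&c, N + jj);
            }
    return m;
}
```

With these assumed parameters the model gives 16 misses on A and 1024 on B for the original loop, against 128 on A and 16 on B for the blocked loop. That matches the usual estimates of N/L and N²/L before blocking, and N²/(S·L) and N/L after, where L is the number of elements per cache line.
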
7
Operation Count vs. Memory Operations
  • Example: matrix multiplication
  • Previous example?

8
  • Block matrix operations

9
Matrix multiplication
    int i, j, k;
    for (i = 0; i < n; i++)
      for (j = 0; j < n; j++)
        for (k = 0; k < n; k++)
          c[i][j] = c[i][j] + a[i][k] * b[k][j];

Remember to initialize c[i][j] to zero.

10
Matrix multiplication with blocking
    int i, j, k, ii, jj, kk;
    for (ii = 0; ii < n; ii += S)
      for (jj = 0; jj < n; jj += S)
        for (kk = 0; kk < n; kk += S)
          for (i = ii; i < min((ii + S), n); i++)
            for (j = jj; j < min((jj + S), n); j++)
              for (k = kk; k < min((kk + S), n); k++)
                c[i][j] = c[i][j] + a[i][k] * b[k][j];

Remember to initialize c[i][j] to zero.

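
The blocked loop nest can be turned into a compilable function roughly as follows. This is a sketch under assumptions not stated on the slide: matrices are stored row-major in flat arrays, `MIN` is a local macro standing in for `min`, and the names `matmul_naive` and `matmul_blocked` are hypothetical. The unblocked version is included so the two results can be compared.

```c
#include <assert.h>
#include <stddef.h>

#define MIN(x, y) ((x) < (y) ? (x) : (y))

/* Reference version: c = a * b for n x n row-major matrices. */
static void matmul_naive(size_t n, const double *a, const double *b, double *c) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            c[i * n + j] = 0.0;
            for (size_t k = 0; k < n; k++)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
        }
}

/* Blocked version: same arithmetic, performed tile by tile (tile size S). */
static void matmul_blocked(size_t n, size_t S,
                           const double *a, const double *b, double *c) {
    assert(S > 0);
    for (size_t x = 0; x < n * n; x++)
        c[x] = 0.0;                       /* initialize c to zero first */
    for (size_t ii = 0; ii < n; ii += S)
        for (size_t jj = 0; jj < n; jj += S)
            for (size_t kk = 0; kk < n; kk += S)
                for (size_t i = ii; i < MIN(ii + S, n); i++)
                    for (size_t j = jj; j < MIN(jj + S, n); j++)
                        for (size_t k = kk; k < MIN(kk + S, n); k++)
                            c[i * n + j] += a[i * n + k] * b[k * n + j];
}
```

The `MIN` bounds handle the case where n is not a multiple of S. A reasonable starting point for S is a value for which three S x S tiles fit in cache together, though the best value depends on the machine.
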
11
Exercise
  • Matrix Vector Multiplication

13
Cache coherence in multiprocessor systems
  • Suppose two processors on a shared bus have
    loaded the same variable.
  • If one processor changes the value of that variable,
    the other processor's cached copy becomes stale, so the
    protocol must either:
  • Invalidate other copies
  • Update other copies

15
Cache coherence in multiprocessor systems
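  • What if a processor reads a data item only once
    initially?
  • With an update protocol, every later write by another
    processor still sends an update to that unused copy.
  • The invalidate protocol is more commonly used.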
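
The read-once scenario can be illustrated with a deliberately simplified count of coherence messages. The sketch below assumes two processors, one shared variable, and one bus message per invalidation or update. It ignores the initial read misses and any write-back traffic. The function `bus_messages` and the `Protocol` type are hypothetical names for this illustration.

```c
#include <assert.h>

typedef enum { INVALIDATE, UPDATE } Protocol;

/* P1 has read x once and holds a copy; P0 then writes x `writes` times.
   Returns the number of coherence messages P0 puts on the bus. */
static int bus_messages(Protocol p, int writes) {
    assert(writes >= 0);
    int p1_has_copy = 1;
    int messages = 0;
    for (int w = 0; w < writes; w++) {
        if (!p1_has_copy)
            continue;            /* P0 holds the only copy: write stays local */
        messages++;              /* one invalidate or one update on the bus */
        if (p == INVALIDATE)
            p1_has_copy = 0;     /* P1's copy is dropped after the first write */
    }
    return messages;
}
```

In this model, ten writes cost one message under the invalidate protocol and ten under the update protocol, even though P1 never reads the variable again.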

17
False Sharing (multiprocessor)
  • Two processors are accessing different data items
    in the same cache block.
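  • What happens if they both attempt to write to it?
  • Each write invalidates (or updates) the other
    processor's copy of the whole block, so the block
    bounces between the two caches even though no data
    item is actually shared.
  • Remedy: padding in data structures, so that each item
    gets its own cache block (tradeoff: space vs. time)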
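
Padding can be sketched in C as below. The 64-byte `CACHE_LINE` value is an assumption (the real size depends on the processor), the struct and function names are hypothetical, and the offset check is only meaningful if each struct starts on a cache-line boundary.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 64   /* assumed cache-block size in bytes */

/* Two per-processor counters packed together: likely in the same block. */
struct counters_shared {
    long c0;   /* written by processor 0 */
    long c1;   /* written by processor 1 */
};

/* Same counters with padding so that c1 starts CACHE_LINE bytes after c0. */
struct counters_padded {
    long c0;
    char pad[CACHE_LINE - sizeof(long)];
    long c1;
};

/* True if two byte offsets within a line-aligned object share a block. */
static int same_block(size_t off_a, size_t off_b) {
    return off_a / CACHE_LINE == off_b / CACHE_LINE;
}
```

The padded struct is larger than the packed one, which is the space side of the tradeoff. In exchange, writes to `c0` and `c1` no longer contend for the same cache block.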

18
Network Topologies
  • Bus-based, crossbar, and multistage networks
  • Earth Simulator: crossbar
  • IBM SP-2: multistage network

19
Network Topologies
A completely connected network needs a large number of links.
The central node is a bottleneck in a star topology.

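
The link counts behind this comparison are easy to compute. The sketch below assumes p nodes in total, with the hub of the star counted as one of the p. The function names are hypothetical.

```c
#include <assert.h>

/* Completely connected: one link per pair of nodes, p(p-1)/2 in total. */
static long links_complete(long p) {
    assert(p >= 1);
    return p * (p - 1) / 2;
}

/* Star: every non-hub node has a single link to the hub, p-1 in total. */
static long links_star(long p) {
    assert(p >= 1);
    return p - 1;
}
```

For 64 nodes this gives 2016 links when completely connected and 63 in a star, but in the star every message between two leaf nodes must pass through the hub.
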
20
Network Topologies
  • 1-D torus
  • Intel Paragon: 2-D mesh
  • BlueGene/L: 3-D torus
  • Cray T3E: 3-D cube

21
  • 2-D and 3-D meshes are common in parallel
    computers
  • Regularly structured computation maps naturally
    to 2-D mesh.
  • 3-D network topologies suit problems such as weather
    modeling and structural modeling
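
As an illustration of how a regular 2-D computation maps onto a 2-D mesh, the sketch below computes the mesh neighbors of a processor. It assumes processors are numbered row by row (rank = row * cols + col) and that the mesh has no wraparound links. The `Neighbors` type and `mesh_neighbors` function are hypothetical names.

```c
#include <assert.h>

/* Ranks of the four mesh neighbors; -1 means there is no neighbor
   in that direction because the processor sits on the mesh edge. */
typedef struct { int north, south, west, east; } Neighbors;

static Neighbors mesh_neighbors(int rank, int rows, int cols) {
    assert(rows > 0 && cols > 0 && rank >= 0 && rank < rows * cols);
    int r = rank / cols;   /* row of this processor */
    int c = rank % cols;   /* column of this processor */
    Neighbors nb;
    nb.north = (r > 0)        ? rank - cols : -1;
    nb.south = (r < rows - 1) ? rank + cols : -1;
    nb.west  = (c > 0)        ? rank - 1    : -1;
    nb.east  = (c < cols - 1) ? rank + 1    : -1;
    return nb;
}
```

If each processor owns one tile of a 2-D grid, these four neighbors are the processors it exchanges tile boundaries with, so communication travels over direct mesh links.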