1
Optimizing Matrix Multiplication with a
Classifier Learning System
  • Xiaoming Li (presenter)
  • María Jesús Garzarán
  • University of Illinois at Urbana-Champaign

2
Tuning library for recursive matrix multiplication
  • Uses cache-aware algorithms that take architectural features into account
  • Memory hierarchy
  • Register file
  • Takes input characteristics into account
  • Matrix sizes
  • The tuning process is automatic.

3
Recursive Matrix Partitioning
  • Previous approaches
  • Multiple recursive steps
  • Only divide by half

(Figure: matrices A and B)
4
Recursive Matrix Partitioning
  • Previous approaches
  • Multiple recursive steps
  • Only divide by half

(Figure: matrices A and B, Step 1)
5
Recursive Matrix Partitioning
  • Previous approaches
  • Multiple recursive steps
  • Only divide by half

(Figure: matrices A and B, Step 2)
6
Recursive Matrix Partitioning
  • Our approach is more general
  • No need to divide by half
  • May use a single step to reach the same partition
  • Faster and more general

(Figure: matrices A and B partitioned in a single step, Step 1)
7
Our approach
  • A general framework to describe a family of
    recursive matrix multiplication algorithms, where
    given the input dimensions of the matrices, we
    determine
  • Number of partition levels
  • How to partition at each level
  • An intelligent search method based on a
    classifier learning system
  • Search for the best partitioning strategy in a
    huge search space
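
The framework described above — pick (pm, pn, pk) partition factors for each recursion level, then recurse — can be sketched as follows. The function name `recursive_mmm` and the `factors` argument are illustrative, and this sketch assumes the factors divide the dimensions evenly (the padding slides later handle the general case); it is not the authors' implementation.

```python
import numpy as np

def recursive_mmm(A, B, C, factors, level=0):
    """Accumulate C += A @ B, partitioning at each recursion level
    by that level's (pm, pn, pk) factors."""
    if level == len(factors):
        C += A @ B                      # leaf: plain multiply
        return
    pm, pn, pk = factors[level]
    M, K = A.shape
    N = B.shape[1]
    bm, bn, bk = M // pm, N // pn, K // pk   # assumes even division
    for i in range(pm):                 # loop over the sub-blocks
        for j in range(pn):
            for k in range(pk):
                recursive_mmm(A[i*bm:(i+1)*bm, k*bk:(k+1)*bk],
                              B[k*bk:(k+1)*bk, j*bn:(j+1)*bn],
                              C[i*bm:(i+1)*bm, j*bn:(j+1)*bn],
                              factors, level + 1)
```

A `factors` list such as `[(2, 2, 3), (2, 1, 1)]` describes two recursion levels with different partitionings, which is exactly the flexibility the framework claims over divide-by-half.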

8
Outline
  • Background
  • Partition Methods
  • Classifier Learning System
  • Experimental Results

9
Recursive layout framework
  • Multiple levels of recursion
  • Takes into account the cache hierarchy

10
Recursive layout framework
  • Multiple levels of recursion
  • Takes into account the cache hierarchy

(Figure: one level of 2×2 recursive layout, blocks numbered 1–4)
11
Recursive layout in our framework
  • Multiple levels of recursion
  • Takes into account the cache hierarchy

12
Recursive layout framework
  • Multiple levels of recursion
  • Takes into account the cache hierarchy

13
Recursive layout framework
  • Multiple levels of recursion
  • Takes into account the cache hierarchy

(Figure: two levels of 2×2 recursive layout; blocks stored in the order
 1  2  5  6
 3  4  7  8
 9 10 13 14
11 12 15 16)
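
The numbering in this figure is what two levels of row-major 2×2 partitioning produce: the outer level orders the four quadrants, the inner level orders the four blocks inside each quadrant. A small sketch of that index computation (the function name is ours):

```python
def block_index(row, col, levels=2):
    """1-based storage index of block (row, col) in a layout built
    from `levels` recursive 2x2 partitions, each level ordering its
    four sub-blocks row-major (TL, TR, BL, BR)."""
    idx = 0
    for lv in range(levels):
        shift = levels - 1 - lv          # most significant bit first
        r, c = (row >> shift) & 1, (col >> shift) & 1
        idx = idx * 4 + (2 * r + c)      # position within this level
    return idx + 1
```

With `levels=2` this reproduces the 4×4 numbering shown above, e.g. `block_index(0, 2)` is 5 because column 2 falls in the top-right quadrant.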
14
Padding
  • Necessary when the partition factor is not a
    divisor of the matrix dimension.

(Figure: a 2000-wide matrix to be divided by 3)
15
Padding
  • Necessary when the partition factor is not a
    divisor of the matrix dimension.

(Figure: padded to 2001, divided by 3 into 667-wide tiles)
16
Padding
  • Necessary when the partition factor is not a
    divisor of the matrix dimension.

(Figure: the 2001 matrix with 667-wide tiles, to be divided by 4 at the next level)
17
Padding
  • Necessary when the partition factor is not a
    divisor of the matrix dimension.

(Figure: padded to 2004, giving 668-wide tiles)
18
Recursive layout in our framework
  • Multiple levels of recursion
  • Supports the cache hierarchy
  • Square tiles → rectangular tiles
  • Fits non-square matrices

19
Recursive layout in our framework
  • Multiple levels of recursion
  • Supports the cache hierarchy
  • Square tiles → rectangular tiles
  • Fits non-square matrices

(Figure: rectangular tiling, 8 × 9)
20
Recursive layout in our framework
  • Multiple levels of recursion
  • Supports the cache hierarchy
  • Square tiles → rectangular tiles
  • Fits non-square matrices

(Figure: tiling padded to 8 × 10)
21
Recursive layout in our framework
  • Multiple levels of recursion
  • Supports the cache hierarchy
  • Square tiles → rectangular tiles
  • Fits non-square matrices

(Figure: 4 × 3 tiling)
22
Outline
  • Background
  • Partition Methods
  • Classifier Learning System
  • Experimental Results

23
Two methods to partition matrices
  • Partition by Block (PB)
  • Specify the size of each tile
  • Example
  • Dimensions (M, N, K) = (100, 100, 40)
  • Tile size (bm, bn, bk) = (50, 50, 20)
  • Partition factors (pm, pn, pk) = (2, 2, 2)
  • Tiles need not be square
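
The PB primitive maps directly to a ceiling division per dimension; a sketch (the function name is ours):

```python
import math

def partition_by_block(dims, tile):
    """Partition-by-block: the factor along each dimension is the
    number of tiles of the requested size needed to cover it."""
    return tuple(math.ceil(d / b) for d, b in zip(dims, tile))

# Slide's example: (M, N, K) = (100, 100, 40), tiles (50, 50, 20)
# -> partition factors (2, 2, 2)
```

The ceiling handles the case where the tile size does not divide the dimension, which is exactly when the padding of the earlier slides kicks in.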

24
Two methods to partition matrices
  • Partition by Size (PS)
  • Specify the maximum size of the three tiles
  • Keep the ratios between dimensions constant
  • Example
  • (M, N, K) = (100, 100, 50)
  • Maximum tile size for M, N = 1250
  • (pm, pn, pk) = (2, 2, 1)
  • Generalization of the divide-by-half approach
  • Tile size = 1/4 of the matrix size
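
A sketch of the PS primitive under one plausible reading: apply the same factor to every partitioned dimension (so the ratios between dimensions stay constant) and grow it until the output tile fits under the size bound. The slide's own example uses a bound of 1250 with a size accounting the transcript does not fully spell out, so the threshold semantics below are an assumption, as is the function name.

```python
def partition_by_size(dims, max_tile):
    """Partition-by-size: one common factor p for M and N, keeping
    their ratio constant; stop once the (M/p) x (N/p) output tile
    has at most `max_tile` elements.  K is left unpartitioned in
    this sketch."""
    M, N, K = dims
    p = 1
    while (M / p) * (N / p) > max_tile:
        p += 1
    return (p, p, 1)

# Divide-by-half as a special case: a bound of 1/4 of the output
# matrix size yields p = 2, i.e. factors (2, 2, 1).
```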

25
Outline
  • Background
  • Partition Methods
  • Classifier Learning System
  • Experimental Results

26
Classifier Learning System
  • Use the two partition primitives to determine how the input matrices are partitioned
  • Determine the partition factors at each level
  • f(M, N, K) → (pm_i, pn_i, pk_i), i = 0, 1, 2 (only three levels are considered)
  • The partition factors depend on the matrix size
  • E.g., the partition factors of a (1000 x 1000) matrix should be different from those of a (50 x 1000) matrix.
  • The partition factors also depend on architectural characteristics, such as cache size.

27
Determine the best partition factors
  • The search space is huge → exhaustive search is impossible
  • Our proposal: use a multi-step classifier learning system
  • It creates a table that, given the matrix dimensions, determines the partition factors

28
Classifier Learning System
  • The result of the classifier learning system is a table with two columns
  • Column 1 (Pattern): a string of 0, 1, and don't-care symbols that encodes the dimensions of the matrices
  • Column 2 (Action): the partition method for one step
  • Built using the partition-by-block and partition-by-size primitives with different parameters
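
Such a table is typically matched with a ternary alphabet; in standard learning classifier systems the wildcard is written '#' (the slide's wildcard symbol was lost in transcription). A sketch of the match step, with a purely illustrative encoding helper — the slides only say 5 bits per dimension, not how the bits are derived:

```python
def matches(pattern, encoded_dims):
    """Ternary match: '0'/'1' must equal the bit, '#' matches either."""
    return len(pattern) == len(encoded_dims) and \
        all(p in ('#', b) for p, b in zip(pattern, encoded_dims))

def encode_dims(M, N, K, bits=5):
    """Encode each dimension in `bits` bits (illustrative: keep the
    top `bits` bits of each value's binary form)."""
    def top(v):
        return format(v, 'b').zfill(bits)[:bits]
    return top(M) + top(N) + top(K)
```

A pattern with many '#'s matches a wide range of matrix sizes; specializing positions narrows the range, which is what the mutation step described later adjusts.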

29
Learn with Classifier System
30
Learn with Classifier System
(Figure: dimensions encoded with 5 bits per dimension)
31
Learn with Classifier System
(Figure: 24 × 16 example)
32
Learn with Classifier System
(Figure: 24 × 16 example)
33
Learn with Classifier System
(Figure: partitioned to 12 × 8)
34
Learn with Classifier System
(Figure: partitioned to 12 × 8)
35
Learn with Classifier System
(Figure: partitioned to 12 × 8)
36
Learn with Classifier System
(Figure: partitioned to 4 × 4)
37
How does the classifier learning algorithm work?
  • Update the table based on performance and accuracy feedback from
    previous runs.
  • Mutate the condition part of the table to adjust
    the range of matching matrix dimensions.
  • Mutate the action part to find the best partition
    method for the matching matrices.
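
One feedback step of such a system might look like the sketch below: matching rules are credited with the observed reward, and with small probability a condition position is generalized to '#' or specialized to the observed bit, adjusting the range of matrix dimensions the rule matches. This is a loose illustration of the loop the slide describes, not the authors' code, and the table fields and action strings are made up for the example.

```python
import random

def learning_step(table, encoded_dims, reward, mutate_prob=0.1, rng=random):
    """Credit every matching rule with `reward`; occasionally mutate
    one condition position, toggling it between '#' (more general)
    and the observed bit (more specific)."""
    for rule in table:
        if all(p in ('#', b) for p, b in zip(rule['pattern'], encoded_dims)):
            rule['fitness'] += reward
            if rng.random() < mutate_prob:
                i = rng.randrange(len(rule['pattern']))
                pat = list(rule['pattern'])
                pat[i] = '#' if pat[i] != '#' else encoded_dims[i]
                rule['pattern'] = ''.join(pat)
```

Mutating the action part (swapping in a different PB/PS call) would follow the same reward-guided pattern.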

38
Outline
  • Background
  • Partition Methods
  • Classifier Learning System
  • Experimental Results

39
Experimental Results
  • Experiments on three platforms
  • Sun UltraSparcIII
  • P4 Intel Xeon
  • Intel Itanium2
  • Matrices of sizes from 1000 x 1000 to 5000 x 5000

40
Algorithms
  • Classifier MMM: our approach
  • Includes the overhead of copying in and out of the recursive layout
  • ATLAS: library generated by ATLAS using its search procedure, without hand-written code
  • Has some form of blocking for L2
  • L1: one level of tiling
  • Tile size the same as that of ATLAS for L1
  • L2: two levels of tiling
  • L1 tile and L2 tile sizes the same as ATLAS's L1 tile

41
(No Transcript)
42
(No Transcript)
43
Conclusion and Future Work
  • Preliminary results demonstrate the effectiveness of our approach
  • Sun UltraSparcIII and Xeon: 18% and 5% improvement, respectively
  • Itanium: -14%
  • The padding mechanism needs improvement
  • Reduce the amount of padding
  • Avoid unnecessary computation on padding

44
Thank you!