Data Reuse in Embedded Processors - PowerPoint PPT Presentation

Provided by: petert45
Learn more at: http://www.ece.uah.edu

1
Data Reuse in Embedded Processors
  • Peter Trenkle
  • CPE631 Project Presentation

2
Problem Statement
  • Multi-media embedded applications have many
    recurring, time-consuming, long-latency
    instructions
  • Floating-point operations
  • Time-consuming instructions (multiplies and
    divides), which can cause 15-30 cycle delays in
    embedded processors

3
Problem Statement
  • Due to the demand for higher portability of
    computing power, power consumption is a major
    design constraint in embedded systems, so a
    decreased clock speed is important
  • Long-latency instructions have the potential to
    cause data hazards, thus decreasing performance

4
Goals
  • Develop a methodology to increase embedded
    application performance
  • Decrease the need to go through a complete
    multiply or divide instruction; opportunities
    exist for program speedup
  • Decrease the embedded system's clock frequency,
    reducing power consumption
  • Decrease the number of data hazards due to long
    latencies

5
Applications of Solution
  • Image processing
  • Low local entropy of processed data sets
  • Speech encoding
  • Human speech characteristics
  • High-speed signal processing
  • Values may change very little over a short run,
    which saves duplicating instructions

6
Solution: Data Reuse
  • Establish a memo table of a set length on an ARM
    processor that holds the operands and results of
    past multiply and/or divide instructions
  • Send the operands to both the memo table and the
    multiply/division unit; on a hit in the memo
    table, complete a multi-cycle instruction in one
    clock cycle
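
A rough way to see the potential gain is to estimate the expected latency per multiply or divide. The symbols below are assumptions introduced for illustration and are not taken from the slides: h is the memo table hit rate and L is the latency of the full operation in cycles. A hit completes in one cycle, so the average cost is approximately:

```latex
\bar{c} = h \cdot 1 + (1 - h)\,L
```

For example, with L = 20 cycles (inside the 15-30 cycle range quoted earlier) and a purely hypothetical hit rate of h = 0.5, the average is 0.5 + 0.5 × 20 = 10.5 cycles instead of 20.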

7
Diagram of Memo Table
[Diagram labels: Operand 1, Operand 2, Multiply/Division Unit, Memo Table, Hit/Miss, Operation Complete, Result]

8
Definition of the Memo Table
  • The memo table is set up as a lookup table that
    holds the most recently used entries
  • Each entry consists of a long tag (the two
    operands) and the result
  • Look-up and calculation are done in parallel to
    avoid adding latency
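
The definition above can be modeled in software with a minimal sketch like the following. It is an illustration under assumptions, not the presenter's implementation: the names `MemoTable` and `multiply` are hypothetical, the "most recently used" behavior is modeled as LRU eviction, and software can only model the outcome of the parallel look-up, not its timing.

```python
from collections import OrderedDict


class MemoTable:
    """Fixed-size table keyed by an (operand1, operand2) tag."""

    def __init__(self, entries):
        self.entries = entries
        self.table = OrderedDict()  # least recently used entry comes first

    def lookup(self, a, b):
        """Return the stored result on a hit, or None on a miss."""
        tag = (a, b)
        if tag in self.table:
            self.table.move_to_end(tag)  # mark as most recently used
            return self.table[tag]
        return None

    def insert(self, a, b, result):
        """Store a computed result, evicting the least recently used entry if full."""
        tag = (a, b)
        self.table[tag] = result
        self.table.move_to_end(tag)
        if len(self.table) > self.entries:
            self.table.popitem(last=False)


def multiply(table, a, b):
    """Return (result, hit); a hit stands in for the one-cycle completion."""
    cached = table.lookup(a, b)
    if cached is not None:
        return cached, True
    result = a * b  # the full multi-cycle operation in real hardware
    table.insert(a, b, result)
    return result, False
```

Calling `multiply(table, 3, 5)` twice on a fresh `MemoTable(2)` misses the first time and hits the second time.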

9
Constraints of the Memo Table
  • Trivial Calculations, such as multiplying by 0,
    are not logged into the table
  • Trivial calculations can be handled by the
    execution unit
  • If an operand differs from the operand stored in
    the table only by sign (it is the negative of
    the stored value), the look-up still results in
    a hit
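
One possible reading of these constraints is sketched below. The helper names are hypothetical, only multiplication by 0 is treated as trivial (the slides give it as one example), and the way the result's sign is restored after a sign-insensitive hit is an assumption, since the slides do not describe it. A plain dict stands in for the table, so no replacement policy is modeled here.

```python
def is_trivial(a, b):
    """Multiplying by 0 is left to the execution unit and never logged."""
    return a == 0 or b == 0


def memo_key(a, b):
    """Tag that ignores operand signs, so (3, 5) and (-3, 5) share an entry."""
    return (abs(a), abs(b))


def restore_sign(magnitude, a, b):
    """Apply the sign of a * b to a stored magnitude."""
    return -magnitude if (a < 0) != (b < 0) else magnitude


def memo_multiply(table, a, b):
    """Return (result, hit) using a dict as an unbounded stand-in table."""
    if is_trivial(a, b):
        return 0, False  # handled directly, table untouched
    key = memo_key(a, b)
    if key in table:
        return restore_sign(table[key], a, b), True
    table[key] = abs(a) * abs(b)
    return a * b, False
```

With this sketch, `memo_multiply(table, 3, 5)` followed by `memo_multiply(table, -3, 5)` produces a miss and then a hit that returns -15.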

10
Current Implementations
  • One paper deals with this concept: "Accelerating
    Multi-Media Processing by Implementing Memoing in
    Multiplication and Division Units" (Citron,
    Feitelson, Rudolph)
  • This paper dealt with the Pentium Pro, Alpha
    21164, UltraSPARC-II, and MIPS R10000, leading
    microprocessors at the time

11
Experiment Configuration
  • A modified sim-safe application saves all
    instructions to a file (safet)
  • A C program was created to read in the data and
    simulate hit rates of certain instructions if
    loaded into the memo table (insomnia)
  • Floating-point-intensive MiBench benchmarks were
    used (rsynth, lame)

12
Configuration: Safet
  • Modified version of sim-safe (which performs
    functional simulation, checking for correct
    memory references); the command line allows
    specifying a log file and the number of
    instructions for data retrieval
  • Creates 300 MB to 4 GB of opcode and operand
    data; solutions discarded
  • Shows most instructions run by the benchmark
13
Configuration: Insomnia
  • Insomnia allows specification of the log file,
    number of instructions per log file, replacement
    policy, number of entries in the memo table,
    number of log files, and the opcode to be observed
  • Insomnia returns the number of times opcode was
    called, number of memo table hits, number of zero
    operands, and number of negative operand hits
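
The four counts that Insomnia reports can be illustrated with a short sketch. This is not the Insomnia source (which the slides describe as a C program); the function name, the tuple-based trace format, and the choice to count a negative operand hit as a subset of all hits are assumptions. To keep it short, only a FIFO table is modeled.

```python
from collections import deque


def simulate(trace, opcode, entries):
    """Count calls, hits, zero operands, and negated-operand hits for one opcode.

    trace is an iterable of (opcode, operand1, operand2) tuples.
    """
    table = deque(maxlen=entries)  # FIFO: appending to a full deque drops the oldest tag
    stats = {"calls": 0, "hits": 0, "zero_operands": 0, "negative_hits": 0}
    for op, a, b in trace:
        if op != opcode:
            continue
        stats["calls"] += 1
        if a == 0 or b == 0:
            stats["zero_operands"] += 1  # trivial, never logged in the table
            continue
        if (a, b) in table:
            stats["hits"] += 1
        elif (abs(a), abs(b)) in {(abs(x), abs(y)) for x, y in table}:
            stats["hits"] += 1           # matches a stored tag except for sign
            stats["negative_hits"] += 1
        else:
            table.append((a, b))         # miss: log the new operand pair
    return stats
```

A call such as `simulate(trace, 102, 16)` would then return the counts for opcode 102 with a 16-entry table.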

14
Configuration: Benchmarks
  • Uses MiBench ARM processor benchmarks
  • Rsynth: text-to-speech encoder; the program
    executes 82 million instructions to encode to
    speech a review of 'Apocalypse Now'
  • Lame: WAV-to-MP3 encoder
  • In both benchmarks, over 20% of total
    instructions are floating point, making them
    prime candidates for memo table implementation

15
Experiments Run
  • The opcode chosen to experiment with was 102
    (MUL)
  • It was run with the following table lengths
    (4,8,16,32,64,128,256)
  • Three different replacement policies were run
    (FIFO, LRU, and Random)
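
The experiment grid described above (seven table lengths and three replacement policies) can be sketched as follows. The function names are hypothetical and the random policy's fixed seed is an assumption added for repeatability. The sketch does not reproduce the presentation's measurements; it only shows how the policies differ.

```python
import random
from collections import OrderedDict


def hit_rate(operand_pairs, entries, policy, seed=0):
    """Fraction of operand pairs found in a memo table of the given size."""
    rng = random.Random(seed)  # fixed seed keeps the Random policy repeatable
    table = OrderedDict()      # key order doubles as the eviction order
    hits = 0
    for pair in operand_pairs:
        if pair in table:
            hits += 1
            if policy == "LRU":
                table.move_to_end(pair)  # only LRU refreshes an entry on a hit
            continue
        if len(table) >= entries:
            if policy == "Random":
                del table[rng.choice(list(table))]
            else:
                table.popitem(last=False)  # oldest inserted (FIFO) or least recently used (LRU)
        table[pair] = True
    return hits / len(operand_pairs) if operand_pairs else 0.0


def sweep(operand_pairs):
    """Hit rate for every table length and replacement policy in the experiments."""
    return {
        (policy, entries): hit_rate(operand_pairs, entries, policy)
        for policy in ("FIFO", "LRU", "Random")
        for entries in (4, 8, 16, 32, 64, 128, 256)
    }
```

`operand_pairs` is expected to be a list of `(operand1, operand2)` tuples for a single opcode. FIFO and LRU differ only in whether a hit moves the entry to the back of the eviction order.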

16
Results
  • Opcode 102 (MUL) from rsynth has been tested
  • Rsynth has over 82 million instructions
  • Opcode 102 has only 134,000 entries

17
Results from LRU Replacement

18
Results from FIFO Replacement

19
Results from Random Replacement

20
Analysis of Results
  • The order of multiplications helped the hit
    rates of smaller memo tables
  • Example
  • 102 1 5
  • 102 1 5
  • ..
  • 102 1 3
  • 102 1 3
  • ..
  • With this operand ordering, a single-entry memo
    table would have a significant hit rate
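
The effect of operand ordering on a single-entry table can be checked with a small sketch. The run lengths below are hypothetical, because the slide elides the real counts with "..", and the function name is an assumption.

```python
def single_entry_hit_rate(operand_pairs):
    """Hit rate of a one-entry memo table: a hit means the pair repeats the previous one."""
    hits = 0
    last = None
    for pair in operand_pairs:
        if pair == last:
            hits += 1
        last = pair  # the single entry always holds the latest pair
    return hits / len(operand_pairs)


# Hypothetical traces: the same eight multiplications in two different orders.
grouped = [(1, 5)] * 4 + [(1, 3)] * 4
interleaved = [(1, 5), (1, 3)] * 4
```

On the grouped trace every multiplication except the first of each run hits (6 of 8, a rate of 0.75), while the interleaved trace never hits, even though both contain the same operations.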

21
Analysis of Results
  • For better results, other benchmarks with more
    representative operand ordering should be used
  • MUL accounts for less than 1% of the total
    operations, while FP ADD accounts for close to
    20%; a memo table could possibly be used to
    optimize FP ADD instead

22
Conclusions
  • For future tests, the number of operands present
    in the code should also be analyzed to determine
    the best instruction to memoize
  • The potential for better performance exists, but
    verifying it fully requires many different
    applications