Title: Portable Execution Time Analysis Method
1. Portable Execution Time Analysis Method
Keiji Yamamoto (University of Tokyo)
Yutaka Ishikawa (University of Tokyo)
Toshihiro Matsui (Advanced Industrial Science and Technology)
2. Background
- Real-time systems must finish tasks within fixed time bounds
- Controllers in cars, robots, ...
- Worst Case Execution Time (WCET) must be known
3. Goal of this work
- To design a static WCET prediction tool
- Requirements
- Guarantee of WCET
- Portability
- Various architectures are supported
- Pentium, XScale
4. WCET Analysis Approach
- Flow Analysis
- Determines the dynamic behavior of the program
- Program code is represented as a graph structure (Control Flow Graph, CFG)
- Timing Analysis
- Determines the execution time of each program part on the hardware
- Calculation
- Determines WCET using flow and timing information
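The calculation step can be illustrated with a minimal sketch. It assumes a loop-collapsed, acyclic CFG whose blocks are stored in topological order, and it uses a simple longest-path computation; the slides do not state which calculation method RETAS uses, so the struct layout, the function name `wcet_longest_path`, and the limit of 64 blocks are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SUCC   2
#define MAX_BLOCKS 64

/* One basic block of a loop-collapsed (acyclic) control flow graph. */
struct block {
    unsigned long cycles;     /* time of one execution (from timing analysis) */
    unsigned long max_count;  /* maximum executions, e.g. a loop bound (from flow analysis) */
    int succ[MAX_SUCC];       /* indices of successor blocks, -1 if unused */
};

/* Longest path from block 0 to any exit.
 * Assumption: blocks are in topological order, so every successor
 * index is larger than the index of the block that refers to it. */
unsigned long wcet_longest_path(const struct block *b, size_t n)
{
    unsigned long best[MAX_BLOCKS];

    assert(n > 0 && n <= MAX_BLOCKS);

    /* Walk backwards so that successors are finished before predecessors. */
    for (size_t i = n; i-- > 0; ) {
        unsigned long tail = 0;
        for (int s = 0; s < MAX_SUCC; s++) {
            int j = b[i].succ[s];
            if (j >= 0) {
                assert((size_t)j > i && (size_t)j < n);
                if (best[j] > tail)
                    tail = best[j];
            }
        }
        best[i] = b[i].cycles * b[i].max_count + tail;
    }
    return best[0];
}
```

With a hypothetical four-block graph (an entry of 5 cycles, a branch of 10 cycles, a loop body of 3 cycles bounded at 10 iterations, and an exit of 2 cycles), the longer path goes through the loop body and the function returns 37.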
5. Problems of Flow Analysis
- Analyzing assembly code
- No portability
- Data flow analysis is difficult
- Variable names and type information are lost
- Analyzing C source code
- Compiler optimization cannot be taken into consideration
6. Problems of Timing Analysis
- Model-based approach
- Constructs pipeline, I-cache, D-cache, and branch prediction models
- Accuracy of these models cannot be guaranteed
- The internal behavior of a processor is not disclosed
- Measurement-based approach
- All factors can be included in a measurement result
- The scope of this approach is limited
7. Our Approach
- Flow Analysis
- Analyzes the compiler's intermediate representation
- Independent of both low-level and high-level languages
- Compiler optimization can be taken into consideration
- Data flow analysis is performed on the intermediate representation
- Calculates loop bounds
- Timing Analysis
- Simulation-based approach
- For memory access latency
- Measurement-based approach
- For instruction execution time
8. WCET Analysis System - RETAS
9. Flow Analysis
10. Flow Analysis
- Modified GCC 4.0.2
- Flow analyzer is implemented as part of the optimization phase
- Analyzes the intermediate representations of GCC
- TREE, RTL (Register Transfer Language)
- Independent of language and architecture
- Compiler optimization is taken into account
- Loop bounds are calculated automatically
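As an illustration of automatic loop bound calculation, the sketch below handles only the simplest case: a counted loop of the form `for (i = init; i < limit; i += step)` with constants known at analysis time. The function name `loop_bound` is hypothetical, and the actual analyzer works on GCC's intermediate representation rather than on values passed in like this.

```c
#include <assert.h>

/* Number of iterations of: for (i = init; i < limit; i += step)
 * Assumption: init, limit and step are compile-time constants,
 * step is positive, and (limit - init + step) does not overflow. */
long loop_bound(long init, long limit, long step)
{
    assert(step > 0);
    if (init >= limit)
        return 0;                              /* body never runs */
    return (limit - init + step - 1) / step;   /* ceiling division */
}
```

For example, `loop_bound(0, 10, 3)` evaluates to 4, matching the iterations i = 0, 3, 6, 9.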
11. Dynamic Flow
- Dynamic flow information is described as an annotation
- Using #pragma
- The annotation is extracted inside the compiler
- The #pragma is translated into a function call by a preprocessor
    for (c = 0; c < n; c++) {
    #pragma WCET ("loop", 10)   /* WCET_LOOP (10); */
        sum += c;
    }
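The effect of the translation can be sketched as ordinary C. Here the annotation has already been rewritten into a call to `WCET_LOOP`, which is defined as a run-time no-op so the example compiles on its own. That definition, the function `sum_below`, and the assertion on `n` are assumptions for illustration; in the system described here the call is recognized inside the modified compiler.

```c
#include <assert.h>

/* Hypothetical marker function: does nothing at run time.
 * A flow analyzer would read the constant argument as the
 * maximum iteration count of the enclosing loop. */
static inline void WCET_LOOP(int max_iterations)
{
    (void)max_iterations;
}

/* The annotated loop after the pragma has been turned into a call. */
int sum_below(int n)
{
    int sum = 0;

    assert(n <= 10);  /* the annotation is only valid if this holds */
    for (int c = 0; c < n; c++) {
        WCET_LOOP(10);
        sum += c;
    }
    return sum;
}
```

The annotation is needed because `n` is not known statically; the bound of 10 is a promise made by the programmer, not something the analyzer derives.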
12. Timing Analysis
- Memory access time and the remaining execution time are calculated separately
- Simulation-based
- RTL-level memory access latency analysis
- Measurement-based
- Calculates the execution time of instructions excluding memory access
13. Memory Access Latency Analysis
- RTL-level simulator
- Analyzes the memory access pattern
- Independent of architecture
- Cache simulator
- Calculates memory access latency from the memory access pattern
- Dependent on architecture
- Set-associative, LRU, round-robin
- Cache size, line size, and number of ways are set by parameters
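A parameterized cache simulator of this kind might look like the following sketch. It implements only LRU replacement (round-robin is omitted), and all names (`struct cache`, `cache_init`, `cache_access`, `cache_latency`, `cache_free`) as well as the hit and miss cycle costs are hypothetical, not taken from RETAS.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct cache {
    unsigned sets, ways, line_size;
    uint64_t *tag;     /* sets * ways tags */
    uint64_t *stamp;   /* time of last use; 0 marks an empty line */
    uint64_t clock;    /* incremented on every access */
    unsigned long hits, misses;
};

/* Cache size, line size and associativity are plain parameters. */
int cache_init(struct cache *c, unsigned size, unsigned line_size, unsigned ways)
{
    assert(size > 0 && line_size > 0 && ways > 0);
    assert(size % (line_size * ways) == 0);
    c->sets = size / (line_size * ways);
    c->ways = ways;
    c->line_size = line_size;
    c->clock = 0;
    c->hits = c->misses = 0;
    c->tag = calloc((size_t)c->sets * ways, sizeof *c->tag);
    c->stamp = calloc((size_t)c->sets * ways, sizeof *c->stamp);
    return (c->tag && c->stamp) ? 0 : -1;
}

/* Simulate one access; returns 1 on a hit and 0 on a miss. */
int cache_access(struct cache *c, uint64_t addr)
{
    uint64_t line = addr / c->line_size;
    unsigned set = (unsigned)(line % c->sets);
    uint64_t tag = line / c->sets;
    uint64_t *tags = c->tag + (size_t)set * c->ways;
    uint64_t *stamps = c->stamp + (size_t)set * c->ways;
    unsigned victim = 0;

    c->clock++;
    for (unsigned w = 0; w < c->ways; w++) {
        if (stamps[w] != 0 && tags[w] == tag) {
            stamps[w] = c->clock;      /* refresh LRU position */
            c->hits++;
            return 1;
        }
        if (stamps[w] < stamps[victim])
            victim = w;                /* empty or least recently used way */
    }
    tags[victim] = tag;                /* fill on miss */
    stamps[victim] = c->clock;
    c->misses++;
    return 0;
}

/* Total latency for the access pattern simulated so far. */
unsigned long cache_latency(const struct cache *c,
                            unsigned long hit_cycles, unsigned long miss_cycles)
{
    return c->hits * hit_cycles + c->misses * miss_cycles;
}

void cache_free(struct cache *c)
{
    free(c->tag);
    free(c->stamp);
}
```

Feeding the addresses produced by an architecture-independent access-pattern analysis into `cache_access`, then calling `cache_latency` with the target's hit and miss costs, keeps the architecture-dependent part confined to a few parameters.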
14. Measurement-based Analysis
- Assembly code is divided into basic blocks
- Timing measurement code is inserted before and after each basic block
- The instrumented basic blocks are executed on a real machine
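The idea of bracketing a block with timing code can be sketched at the C level. This is only an analogy: the system described here instruments assembly basic blocks, whereas the sketch times a C function using the POSIX `clock_gettime` call. The names `now_ns`, `measure_block_max_ns`, `sample_block`, and `sample_calls` are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Current monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
}

/* Run `block` `runs` times and return the largest observed time. */
uint64_t measure_block_max_ns(void (*block)(void), unsigned runs)
{
    uint64_t worst = 0;

    assert(block != NULL && runs > 0);
    for (unsigned r = 0; r < runs; r++) {
        uint64_t start = now_ns();   /* timing code before the block */
        block();
        uint64_t end = now_ns();     /* timing code after the block */
        if (end - start > worst)
            worst = end - start;
    }
    return worst;
}

/* Stand-in for a basic block under measurement. */
unsigned sample_calls = 0;
void sample_block(void)
{
    volatile int sink = 0;
    for (int i = 0; i < 100; i++)
        sink += i;
    sample_calls++;
}
```

A call such as `measure_block_max_ns(sample_block, 5)` returns the largest of five observations. An observed maximum is not by itself a guaranteed worst case, and the timer calls add overhead that a real tool would have to account for.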
15. Evaluation on SimpleScalar
- Evaluation environment
- Default parameters are used
16. Evaluation on SimpleScalar
- Estimated time is larger than the observed time
- The error of fibonacci
- The pipeline overlap between basic blocks
17. Accuracy of the RTL-Level Simulator
- The memory accesses of SimpleScalar and the RTL-level simulator are compared
- RTL-level simulation is equivalent to binary-level simulation
- The number of load instructions is exactly equal
- Cache accesses are also almost equal
18. Evaluation on a Real Machine
19. XScale Processor
- Observed < Estimated
- Predicted safely
- The prediction error is several percent
- It is accurate because of the simple architecture
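The two properties reported here, safety and prediction error, can be written down as a small sketch. The slides do not define how the error is computed, so the formula below (overestimation relative to the observed time) is an assumption, and the function names `is_safe` and `overestimation_percent` are hypothetical.

```c
#include <assert.h>

/* An estimate is safe when it is not below the observed time. */
int is_safe(unsigned long observed, unsigned long estimated)
{
    return observed <= estimated;
}

/* Overestimation as a percentage of the observed time (assumed definition). */
double overestimation_percent(unsigned long observed, unsigned long estimated)
{
    assert(observed > 0);
    return ((double)estimated - (double)observed) * 100.0 / (double)observed;
}
```

With made-up values of 1000 observed cycles and 1050 estimated cycles, the estimate is safe and the overestimation is 5 percent.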
20. Pentium-M Processor
- Observed < Estimated
- Predicted safely
- The prediction error is larger than on XScale
21. Limitations
- Factors not implemented
- Instruction cache
- Branch prediction
- Some processors provide an instruction that measures execution time in cycles
- Simple architectures do not implement this instruction
22. Conclusion (1/2)
- New WCET analysis method
- Flow analysis
- Uses the intermediate representation of GCC
- Architecture independent
- Timing analysis
- Memory access latency
- Using the RTL-level simulator and cache simulator, cache hit and main memory access counts are estimated
- Measures the execution time of each basic block
23. Conclusion (2/2)
- Evaluated on several architectures
- SimpleScalar, XScale, Pentium-M
- Our system estimates safe WCET
- On complex architectures, the prediction error is large
- Future work
- Take the instruction cache into account
24. End