Title: Scratchpad Allocation for Concurrent Embedded Software
1. Scratchpad Allocation for Concurrent Embedded Software
- Vivy Suhendra
- Abhik Roychoudhury
- Tulika Mitra
- National University of Singapore
2. Scratchpad Memory
- Software-managed on-chip fast memory
- Better timing predictability than caches
- Real-time guarantees
- Content selection
- Beneficial memory blocks
- Runtime management
- Usage by active tasks
3. Contribution
- Sequential application
- Individual tasks
- Worst-case execution time (WCET)
Scratchpad Allocation
- Concurrent application
- Interacting tasks
- Control/data dependency
- Preemption
- Worst-case response time (WCRT)
4. System Model
- Message Sequence Chart (MSC)
[Figure: MSC with processes Radio Control, SPI Control, and FBW Main; labels: task, message communication, preemptive]
5. Motivating Example
[Figure: scratchpad reload points (labels: Reload, Reload)]
- Time-multiplexing: more opportunities!
6. Considerations
- Sharing decision and lifetime analysis depend on each other
- Feedback loop
7. Workflow
8. Workflow
- Start
- Initialize: empty allocation, full interference
- Task analysis (output: task WCETs, memory profiles)
- WCRT analysis (output: task lifetimes, interference graph)
- Interference improves?
- Yes: scratchpad sharing scheme and allocation (output: scratchpad allocation decision), then back to task analysis
- No: stop
9. Task Analysis
- Static timing analysis
- Worst-case execution time path
- Tool: Chronos
- Micro-architecture modeling
- Infeasible path detection
- Memory profile
- This work: code allocation (basic blocks)
- Area of code blocks
- Gain if allocated
- Execution frequency × reduction in latency
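As a rough illustration of the memory profile, the sketch below computes the gain of a basic block as its execution frequency times the latency saved per execution. The `BasicBlock` type, the block names, and the one-access-per-fetch cost model are assumptions made for illustration. The latencies and fetch width are the ones listed under the evaluation parameters, and the frequencies would come from the WCET path found by static analysis.

```python
from dataclasses import dataclass

# Assumed cost model: one memory access per fetch-width chunk of code.
SPM_LATENCY = 1      # cycles, scratchpad
MEM_LATENCY = 100    # cycles, main memory
FETCH_WIDTH = 16     # bytes per fetch


@dataclass
class BasicBlock:
    name: str        # hypothetical label
    size: int        # area in bytes
    frequency: int   # executions along the WCET path


def gain(block: BasicBlock) -> int:
    """Cycles saved on the WCET path if the block is placed in scratchpad."""
    fetches = -(-block.size // FETCH_WIDTH)  # ceiling division
    return block.frequency * fetches * (MEM_LATENCY - SPM_LATENCY)


profile = [BasicBlock("bb0", 32, 10), BasicBlock("bb1", 20, 1)]
gains = {b.name: gain(b) for b in profile}
```

A small, frequently executed block can therefore be worth more scratchpad space than a large block that runs once.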
11. WCRT Analysis
- Yen & Wolf, TPDS 1998
- Compute earliest and latest start and finish times
- Computation time: Finish(t) = Start(t) + WCRT(t)
- Task dependencies: Start(u) ≥ Finish(t) when u depends on t
- ⇒ iterative tightening of bounds
- WCRT(t) is a function of
- WCET(t), and
- Delay by higher-priority tasks whose lifetimes overlap t's
- ⇒ fixed-point computation
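The fixed-point idea can be sketched as follows. This is a simplified illustration, not the Yen and Wolf algorithm itself. It assumes a single processor, source tasks released at time 0, each task running once, and a larger `priority` value meaning higher priority. The `Task` type and the example task set are hypothetical.

```python
from typing import NamedTuple


class Task(NamedTuple):
    bcet: int      # best-case execution time
    wcet: int      # worst-case execution time
    priority: int  # assumption: larger value means higher priority


def wcrt_analysis(tasks, preds):
    """tasks: name -> Task; preds: name -> list of predecessor names.

    Assumes dict keys are in topological order of the dependencies.
    """
    names = list(tasks)
    # Transitive predecessors: dependent tasks never preempt each other.
    anc = {}
    for t in names:
        anc[t] = set(preds[t])
        for p in preds[t]:
            anc[t] |= anc[p]
    # Start from full interference among independent tasks.
    interferers = {
        t: {u for u in names
            if tasks[u].priority > tasks[t].priority
            and u not in anc[t] and t not in anc[u]}
        for t in names
    }
    while True:
        # WCRT(t) = WCET(t) + delay by the remaining interferers.
        wcrt = {t: tasks[t].wcet + sum(tasks[u].wcet for u in interferers[t])
                for t in names}
        earliest_start, latest_finish = {}, {}
        for t in names:
            earliest_start[t] = max(
                (earliest_start[p] + tasks[p].bcet for p in preds[t]), default=0)
            latest_start = max((latest_finish[p] for p in preds[t]), default=0)
            latest_finish[t] = latest_start + wcrt[t]
        # Keep only interferers whose lifetime still overlaps the task's.
        refined = {
            t: {u for u in interferers[t]
                if earliest_start[u] < latest_finish[t]
                and earliest_start[t] < latest_finish[u]}
            for t in names
        }
        if refined == interferers:   # fixed point reached
            return wcrt, earliest_start, latest_finish
        interferers = refined


tasks = {"A": Task(2, 4, 1), "B": Task(1, 3, 2), "C": Task(1, 2, 3)}
preds = {"A": [], "B": ["A"], "C": []}
wcrt, earliest_start, latest_finish = wcrt_analysis(tasks, preds)
```

In this example C can no longer preempt B once the lifetimes are computed, so WCRT(B) tightens from 5 to 3 in the second iteration.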
12. WCRT Analysis
Application WCRT
13. WCRT Analysis
- Changed interference pattern after allocation?
Problem returns!
14. Adjustment
W2
Slack enforcement
16. Profile-based Knapsack (PK)
Distribute space among tasks based on memory profiles
17. Profile-based Knapsack (PK)
- Integer Linear Programming
- Objective: minimize
- Capacity constraint
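The capacity-constrained selection can be illustrated with a small 0/1 knapsack. The slide formulates PK as an ILP; the sketch below swaps in a dynamic program and maximizes total gain, which is only a stand-in for the ILP objective. The block names, sizes, and gains are hypothetical.

```python
def knapsack_allocate(blocks, capacity):
    """blocks: list of (name, size_in_bytes, gain); sizes are positive ints.

    Returns (best_total_gain, chosen_block_names) for the given capacity.
    """
    # best[c] holds the best (gain, chosen names) using at most c bytes.
    best = [(0, [])] * (capacity + 1)
    for name, size, gain in blocks:
        # Descending capacities so that each block is used at most once.
        for cap in range(capacity, size - 1, -1):
            candidate = best[cap - size][0] + gain
            if candidate > best[cap][0]:
                best[cap] = (candidate, best[cap - size][1] + [name])
    return best[capacity]


# Hypothetical memory profiles pooled from several tasks.
blocks = [
    ("fm1.bb3", 64, 9900),
    ("fm1.bb7", 128, 11000),
    ("fr0.bb1", 96, 15000),
    ("fs0.bb2", 160, 16000),
]
total_gain, chosen = knapsack_allocate(blocks, capacity=256)
```

Here the 256-byte scratchpad is filled exactly by the two blocks with the best combined gain, and the blocks of `fm1` stay in main memory.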
18. Interference Clustering (IC)
[Figure: tasks fm1, fm2, fm4, fr0, fr1, fs0]
- Isolate interference in clusters
- Intra-cluster: space distribution
- Inter-cluster: time-multiplexing
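One way to picture the clustering step is as connected components of the interference graph: tasks in the same component split the scratchpad space, and different components reuse the whole scratchpad at different times. The sketch below assumes that reading of "cluster". The task names are taken from the slide, but the interference edges are invented for illustration.

```python
def interference_clusters(tasks, edges):
    """Group tasks into connected components of the interference graph."""
    adjacent = {t: set() for t in tasks}
    for a, b in edges:
        adjacent[a].add(b)
        adjacent[b].add(a)
    seen, clusters = set(), []
    for start in tasks:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:                      # depth-first traversal
            node = stack.pop()
            component.append(node)
            for neighbor in adjacent[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(sorted(component))
    return clusters


tasks = ["fm1", "fm2", "fm4", "fr0", "fr1", "fs0"]
edges = [("fm1", "fr0"), ("fr0", "fs0"), ("fm2", "fr1")]  # hypothetical
clusters = interference_clusters(tasks, edges)
```

With these edges, `fm4` forms a cluster of its own and could use the full scratchpad while it runs.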
19. Graph Coloring (GC)
- Refined interference relation
- Same color: time-multiplexing
- Inter-color: space distribution
- ILP adjustment
[Figure: tasks fm1, fm2, fm4, fr0, fr1, fs0]
20. Interference Reduction
[Figure: tasks fm1, fm2, fm4, fr0, fr1, fs0, with slack enforcement]
Eliminate chosen interference to allow better sharing scheme
21. Critical Path Interference Reduction (CR)
- Find interferences on critical path
- (t, u): t on critical path, u preempts t
- Identify interference with worst impact
- Longest duration of preemption
- Eliminate the interference
- Impose slack
- Re-analyze WCRT
- Propagate lifetime shift
- Iterate
- Stop when no more improvement
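The iteration above can be sketched as a greedy loop. The two callbacks, `analyze` and `eliminate`, are hypothetical placeholders for the WCRT analysis and the slack enforcement. The toy model used to exercise the loop, with fixed preemption durations and slack costs, is invented for illustration.

```python
def reduce_interference(analyze, eliminate, state):
    """Greedy critical-path interference reduction (sketch).

    analyze(state) -> (application_wcrt, [(t, u, preemption_duration), ...])
        listing interferences where t is on the critical path.
    eliminate(state, t, u) -> new state with slack imposed so that u
        no longer preempts t.
    """
    wcrt, candidates = analyze(state)
    while candidates:
        t, u, _ = max(candidates, key=lambda c: c[2])  # worst impact
        trial = eliminate(state, t, u)                 # impose slack
        trial_wcrt, trial_candidates = analyze(trial)  # re-analyze WCRT
        if trial_wcrt >= wcrt:                         # no more improvement
            break
        state, wcrt, candidates = trial, trial_wcrt, trial_candidates
    return state, wcrt


# Toy model: removing an interference saves its duration but costs slack.
DURATION = {("A", "C"): 5, ("B", "C"): 3}
SLACK_COST = {("A", "C"): 1, ("B", "C"): 4}


def toy_analyze(removed):
    left = [(t, u, d) for (t, u), d in DURATION.items() if (t, u) not in removed]
    wcrt = 20 + sum(d for _, _, d in left) + sum(SLACK_COST[p] for p in removed)
    return wcrt, left


def toy_eliminate(removed, t, u):
    return removed | {(t, u)}


final_state, final_wcrt = reduce_interference(toy_analyze, toy_eliminate, frozenset())
```

In the toy run the first elimination lowers the application WCRT from 28 to 24, and the second is rejected because its slack costs more than the preemption it removes.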
22. Extension to Multiprocessor
[Figure: CPU1, CPU2, SPM1, SPM2]
23. UAV Application: PapaBench
Autopilot
Fly-By-Wire
24. Evaluation Parameters
- Scratchpad latency: 1 cycle
- Main memory latency: 100 cycles
- Fetch width: 16 B (2 instructions)
- Task code sizes: 96 B to 6.3 KB
- Total scratchpad size: 512 B to 8 KB
- 1, 2, 4 processors
25. 1-PE Configuration
26. 2-PE Configuration
27. 4-PE Configuration
28. Concluding Remarks
- Scratchpad allocation considering the concurrent application
- Process interaction significantly affects application response time
- Justifies interference reduction via slack enforcement
29. {vivy, abhik, tulika} _at_ comp.nus.edu.sg
30. Related Work
31. Graph Coloring (GC)
- NP-complete: need heuristics
- Welsh-Powell algorithm
- Initialize all nodes to uncolored
- Traverse the nodes in decreasing order of degree
- Assign color 1 to an uncolored node if no adjacent node has been assigned color 1
- Repeat second step with colors 2, 3, etc.
- Until no node is uncolored
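The steps above can be sketched as follows. The graph is given as an adjacency map; the task names are taken from the earlier slides, while the interference edges are hypothetical. Ties in degree are broken by name, which is an arbitrary choice made here to keep the result deterministic.

```python
def welsh_powell(adjacent):
    """Greedy coloring: adjacent maps each node to the set of its neighbors."""
    # Traverse nodes in decreasing order of degree (ties broken by name).
    order = sorted(adjacent, key=lambda n: (-len(adjacent[n]), n))
    color = {}            # all nodes start uncolored
    current = 0
    while len(color) < len(adjacent):
        current += 1      # colors 1, 2, 3, ...
        for node in order:
            if node in color:
                continue
            # Assign the color if no adjacent node already has it.
            if all(color.get(m) != current for m in adjacent[node]):
                color[node] = current
    return color


# Hypothetical interference graph.
graph = {
    "fm1": {"fr0", "fs0"},
    "fm2": {"fr1"},
    "fm4": set(),
    "fr0": {"fm1", "fs0"},
    "fr1": {"fm2"},
    "fs0": {"fm1", "fr0"},
}
coloring = welsh_powell(graph)
```

Tasks that end up with the same color do not interfere, so under GC they can time-multiplex the same scratchpad region.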
32. UAV Application: PapaBench
33. Algorithm Runtime