Title: Soft Real-Time Scheduling on Simultaneous Multithreaded Processors
1. Soft Real-Time Scheduling on Simultaneous Multithreaded Processors
- Rohit Jain,
- Christopher J. Hughes
- Sarita V. Adve
- IEEE Real-Time Systems Symposium (RTSS 2002)
- Borrowed from Yen-Sheng Chang
- and presented by Cristiano Pereira
- 4/22/2005
2. Outline
- Introduction
- SMT review (hyperthreading)
- Resource Sharing Algorithms
- Co-Scheduling Algorithms
- Experimental Methodology
- Results
- Conclusions
3. Introduction
- Simultaneous multithreading (SMT) improves
processor throughput by processing instructions
from multiple threads each cycle.
- Two decisions with SMT:
- Co-schedule selection
- Affects thread utilization
- Resource sharing
- How processor resources are shared among
co-scheduled threads
- Problem unique to SMT processors
- The choice of co-scheduling and resource sharing
algorithms may be tightly coupled.
4. Introduction: Objective
- To find the best algorithm to increase
schedulability of soft real-time tasks by
exploiting co-scheduling and resource sharing on
SMT processors
5. SMT Review
[Figure: issue-slot usage over time for a traditional (single-issue) processor, a superscalar processor, a multithreaded processor, and a simultaneous multithreaded processor]
6. Related Work
- ICOUNT (seeks to maximize throughput)
- Without consideration of any real-time deadline.
- Symbiotic Job Scheduling for SMT
- One interactive task with other non-real-time tasks
- Multiprocessor scheduling (without SMT)
7. Resource Sharing Algorithms (1)
- Threads share most processor resources
- Instruction fetch mechanism
- Instruction window
- Functional units
- Caches
- Previous work on resource sharing algorithms has
focused on maximizing total throughput (IPC)
- No deadline concerns
- May have a positive or negative impact on
schedulability
8. Resource Sharing Algorithms (2)
- Throughput-driven (dynamic)
- ICOUNT gives fetch priority to the thread with
the fewest instructions in the instruction window
(sketched below)
- Performance prediction is difficult.
- Performance guarantees (static)
- A fixed set of resources is reserved for a given
job
- May be suboptimal!
- Performance prediction is easy (identical to a
uniprocessor).
- Resources controlled per thread in this work
- Fetch bandwidth and instruction window (ICOUNT)
- Other resources are thread-blind (e.g., a
functional unit goes to the oldest instruction)
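The ICOUNT idea can be captured in a few lines. A minimal sketch of the policy, assuming a simplified model where one thread wins the fetch slot each cycle (real SMT front ends typically fetch from more than one thread per cycle); this is not the paper's simulator code:

def icount_pick(threads):
    # threads: dict mapping thread id -> instructions currently occupying its
    # front-end / instruction-window slots. Favoring the emptiest thread
    # automatically throttles threads that clog the pipeline.
    return min(threads, key=threads.get)

occupancy = {0: 12, 1: 5, 2: 9}
assert icount_pick(occupancy) == 1  # thread 1 holds the fewest instructions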
9. Co-Scheduling Algorithms (1)
- Partitioning (bin-packing flavor algorithms)
- Allows admission control
- Global scheduling
- Task migration on an SMT processor is free.
- Symbiosis-aware vs. symbiosis-oblivious
- On SMT processors, the execution time of a job
depends on the jobs co-scheduled with it (one way
to quantify this is sketched below).
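As a point of reference, one common way to quantify this co-schedule dependence is a weighted-speedup-style symbiosis factor; the formulation below is an assumed stand-in in that spirit, not necessarily the paper's exact definition:

\[ \mathrm{Sym}(S) = \sum_{i \in S} \frac{\mathrm{IPC}_i^{S}}{\mathrm{IPC}_i^{\mathrm{alone}}} \]

where IPC_i^S is job i's IPC when co-scheduled with the other jobs in S, IPC_i^alone is its IPC running by itself, and larger values mean the co-scheduled jobs interfere less.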
10. Co-Scheduling Algorithms (2)
- Design space explored
- EDF as the underlying algorithm.
- Partitioning → EDF schedules the tasks within a
context
- Global → EDF chooses the next task.
11. Predicting Execution Time, Utilization, and Symbiosis (1)
- These algorithms need to know execution times,
utilizations, and symbiosis relations
- Both are derived from two predicted quantities,
IPC and instruction count: execution time (in
cycles) ≈ instruction count / IPC, and
utilization = execution time / period
12. Predicting Execution Time, Utilization, and Symbiosis (2)
- IPC
- Job IPC in single-thread mode
- → obtained by profiling one frame of each frame
type.
- Job IPC with static resource sharing
- → obtained by profiling each allocation in
single-threaded mode
- Job IPC with dynamic resource sharing
- → obtained by profiling all possible co-schedules
(N-tuples) to get the task IPCs.
- Average job IPC with dynamic resource sharing
- → when the IPCs depend on the yet-unknown
co-schedule, approximate as the job IPC averaged
across all possible co-schedules
- Instruction count
- Use the average instruction count over a large
number of frames as the prediction (a worked
sketch follows this slide).
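Putting these pieces together, a minimal sketch of the prediction step; the helper name and data layout are illustrative assumptions, not the paper's code:

from statistics import mean

def predicted_utilization(instr_count, ipc_by_coschedule, period_cycles):
    # instr_count: average instructions per frame for the task.
    # ipc_by_coschedule: the task's profiled IPC in each possible co-schedule.
    avg_ipc = mean(ipc_by_coschedule)      # co-schedule-averaged IPC
    exec_cycles = instr_count / avg_ipc    # predicted execution time in cycles
    return exec_cycles / period_cycles     # predicted utilization

# Example: 3M instructions/frame, IPC of 2-3 depending on the co-schedule,
# 2M-cycle period -> utilization of about 0.6.
print(predicted_utilization(3_000_000, [2.0, 2.5, 3.0], 2_000_000))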
13. Partitioning Algorithms (1)
- A partition on an SMT processor is a set of tasks
such that no two of them will execute
simultaneously.
- An SMT processor with N hardware contexts supports
up to N partitions.
14. Partitioning Algorithms (2)
- PART-NOSYM-DYN-b
- Bin-packing-based algorithm using the
first-fit-decreasing-utilization (FFDU) heuristic
(sketched below).
- Uses IPC averaged across all co-schedules
- PART-NOSYM-DYN-e
- Corrects underestimated IPC by increasing the
utilization threshold by the smallest amount
- No task set is ever rejected
- Simulates the schedule for a hyperperiod to
determine whether it would meet the deadlines.
- Complexity is high, but it gives partitioning the
fairest showing against global scheduling.
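A minimal sketch of the FFDU bin-packing step used by the PART variants. The per-context threshold of 1.0 is the plain uniprocessor EDF bound; the -b and -e variants layer IPC estimation and threshold adjustment on top of this, which is not shown here:

def ffdu_partition(utilizations, n_contexts, threshold=1.0):
    # utilizations: list of predicted task utilizations.
    # Returns a list of per-context task-index lists, or None if admission fails.
    partitions = [[] for _ in range(n_contexts)]
    load = [0.0] * n_contexts
    # Consider tasks in decreasing utilization order (the "D" in FFDU).
    for task in sorted(range(len(utilizations)),
                       key=lambda t: utilizations[t], reverse=True):
        for c in range(n_contexts):             # first context where it fits
            if load[c] + utilizations[task] <= threshold:
                partitions[c].append(task)
                load[c] += utilizations[task]
                break
        else:
            return None                          # admission control: reject task set
    return partitions

print(ffdu_partition([0.6, 0.5, 0.4, 0.3], n_contexts=2))  # -> [[0, 2], [1, 3]]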
15. Partitioning Algorithms (3)
- PART-NOSYM-STAT
- Independent of the co-schedule because of static
resource allocation
- Uses the IPC of each resource configuration
- FFDU heuristic with an EDF admission test (the
per-context test is sketched below)
- C1, C2, ..., CN denote the N hardware contexts.
- Initially, all resources are allocated to C1.
- Resources are re-allocated from C1 to Ck so that
Ck can accommodate the new task.
- If C1 does not have enough resources → move the
smallest-utilization task from C1 to another
context that can accommodate it.
- If no such context is found → fail.
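A heavily simplified sketch of why static resource sharing enables a clean admission test; this shows the underlying per-context check, not the full PART-NOSYM-STAT re-allocation procedure. With a fixed allocation, each task's utilization is independent of its co-runners, so each context can be checked with the uniprocessor EDF bound:

def edf_admissible(context_tasks):
    # context_tasks: (exec_cycles, period_cycles) pairs assigned to one
    # hardware context under a fixed (static) resource allocation.
    total_util = sum(c / p for c, p in context_tasks)
    return total_util <= 1.0  # uniprocessor EDF bound per context

# Example: utilizations 0.4 + 0.5 = 0.9 on one context -> admissible.
print(edf_admissible([(400_000, 1_000_000), (500_000, 1_000_000)]))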
16. Partitioning Algorithms (4)
- PART-SYM-DYN-b
- Uses utilization averaged across all co-schedules
- Maximizes average symbiosis among tasks in
different partitions, while keeping the total
utilization of tasks in each partition balanced.
- Builds a weighted hypergraph with nodes
representing the tasks (construction sketched
below)
- A hypergraph is a graph in which generalized
edges (called hyperedges) may connect more than
two nodes.
- The weight on a hyperedge (u1, u2, ..., uN) is
the inverse of the symbiosis factor of the
co-schedule formed by tasks u1, u2, ..., uN.
- Each node is weighted with its task's utilization.
- A hypergraph-partitioning algorithm is used:
- The sum of node weights (utilization) is
balanced across partitions.
- The weight of the hyperedges crossing partitions
is minimized (maximizing symbiosis).
- PART-SYM-DYN-e
- Enhanced version, analogous to PART-NOSYM-DYN-e.
Reference: B. L. Chamberlain, "Graph Partitioning
Algorithms for Distributing Workloads of Parallel
Computations."
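A minimal sketch of constructing the hypergraph that PART-SYM-DYN-b hands to a partitioner; the data layout and the symbiosis() callback (the weighted-speedup-style metric sketched earlier) are illustrative assumptions rather than the paper's implementation:

from itertools import combinations

def build_hypergraph(tasks, utilization, symbiosis, n_contexts):
    # Node weights: per-task utilization (to be balanced across partitions).
    node_weights = {t: utilization[t] for t in tasks}
    # One hyperedge per possible co-schedule; weight = 1 / symbiosis factor,
    # so cut minimization favors leaving highly symbiotic tasks in different
    # partitions, where they may actually be co-scheduled.
    hyperedges = {group: 1.0 / symbiosis(group)
                  for group in combinations(tasks, n_contexts)}
    return node_weights, hyperedges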
17. Global Scheduling Algorithms (1)
18. Global Scheduling Algorithms (2)
- Symbiosis-Oblivious Global Scheduling
- GLOB-NOSYM-PLAIN
- Uses IPC averaged across all co-schedules
- Global EDF: the N tasks with the earliest
deadlines are chosen (sketched below)
- Tasks with arbitrarily low utilization can miss
deadlines (Dhall effect)
- GLOB-NOSYM-US
- EDF-US[m/(2m-1)] algorithm
- If Ti has utilization > N/(2N-1), give it the
highest priority
- i.e., the highest-utilization tasks in the task
set get the highest priority.
Reference: A. Srinivasan and S. Baruah,
"Deadline-Based Scheduling of Periodic Task
Systems on Multiprocessors."
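A minimal sketch of the co-schedule selection step for the symbiosis-oblivious global algorithms, assuming a simplified model where the N winners are picked by sorting; the EDF-US variant simply promotes heavy tasks ahead of deadline order:

def pick_coschedule(ready_jobs, n_contexts, util_threshold=None):
    # ready_jobs: list of (deadline, utilization, job_id) tuples.
    # util_threshold=None gives plain global EDF; setting it to N/(2N-1)
    # gives the EDF-US behavior of prioritizing heavy tasks.
    def priority(job):
        deadline, util, _ = job
        heavy = util_threshold is not None and util > util_threshold
        return (not heavy, deadline)  # heavy tasks sort first, then EDF
    return [job_id for _, _, job_id in sorted(ready_jobs, key=priority)[:n_contexts]]

jobs = [(100, 0.9, "A"), (50, 0.2, "B"), (80, 0.3, "C")]
print(pick_coschedule(jobs, 2))                      # plain EDF -> ['B', 'C']
print(pick_coschedule(jobs, 2, util_threshold=2/3))  # EDF-US   -> ['A', 'B']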
19. Global Scheduling Algorithms (3)
- Symbiosis-Aware Global Scheduling
- GLOB-SYM-PLAIN
- Extends EDF to exploit symbiosis in a
straightforward way (sketched below).
- It first selects the task with the earliest
deadline.
- For the other (N-1) contexts, it chooses the set
of tasks that maximizes symbiosis when running
with the first task.
- Positive → improves schedulability (better
overall throughput)
- Negative → can reduce schedulability (the choice
ignores real-time characteristics)
- GLOB-SYM-US
- In the presence of high-utilization tasks,
GLOB-SYM-PLAIN impairs schedulability
- Addresses this negative of GLOB-SYM-PLAIN
- Defaults to GLOB-NOSYM-US if some task Ti has
utilization Ui > N/(2N-1)
- Otherwise, it behaves as GLOB-SYM-PLAIN
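A minimal sketch of GLOB-SYM-PLAIN's selection step under the same simplifying assumptions; symbiosis() is the hedged metric from earlier, and the exhaustive search over companion sets is only reasonable for small N and few ready jobs:

from itertools import combinations

def pick_symbiotic_coschedule(ready_jobs, n_contexts, symbiosis):
    # ready_jobs: list of (deadline, job_id); symbiosis: function mapping a
    # tuple of job ids to a predicted symbiosis factor (higher is better).
    anchor = min(ready_jobs)[1]                 # earliest-deadline job stays EDF
    others = [j for _, j in ready_jobs if j != anchor]
    if len(others) < n_contexts - 1:
        return (anchor, *others)                # not enough jobs to fill the contexts
    # Exhaustively score every companion set for the anchor job.
    best = max(combinations(others, n_contexts - 1),
               key=lambda group: symbiosis((anchor, *group)))
    return (anchor, *best)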
20. Comparison of Properties
21. Experiment Setup (1)
- Randomly generated task sets
- Two workloads: utilizations follow either a normal
or a bimodal distribution
- Number of tasks follows a uniform distribution
(mean 8)
- Periods drawn from the set {100, 200, ..., 1600}
with uniform probability
- Randomly generated single-thread IPCs (IPCi, mean
3) and co-schedule effects on them (IPCij); a
generation sketch follows this slide
- Metric
- Success ratio: percentage of task sets successfully
scheduled by an algorithm (at most 5% deadline
misses)
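A hedged sketch of the random task-set generation described above. The mean task count (8), the period range, the IPC mean (3), and the normal/bimodal split come from the slide; the specific spreads, the exact period set, and the bimodal mixture weights below are illustrative assumptions, not the paper's parameters:

import random

def generate_task_set(bimodal=False):
    n_tasks = random.randint(4, 12)          # uniform, mean 8 (range assumed)
    periods = [100, 200, 400, 800, 1600]     # assumed expansion of "100, 200, ..., 1600"
    tasks = []
    for i in range(n_tasks):
        if bimodal:
            # Assumed mixture: mostly light tasks plus some heavy ones.
            util = random.uniform(0.6, 0.9) if random.random() < 0.3 else random.uniform(0.05, 0.3)
        else:
            # Assumed normal distribution of utilizations, clipped to (0, 1).
            util = min(0.95, max(0.05, random.gauss(0.3, 0.1)))
        tasks.append({
            "id": i,
            "period": random.choice(periods),
            "utilization": util,
            "ipc": max(0.5, random.gauss(3.0, 0.5)),  # single-thread IPC, mean 3
        })
    return tasks

print(generate_task_set(bimodal=True)[:2])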
22. Experiment Setup (2)
- RSIM simulator
- Real workload
23. Results
- Best algorithm → GLOB-SYM-US
- Partitioning vs. global algorithms
- Global is generally better
- Enhanced versions and symbiosis awareness make
PART more competitive with GLOB-SYM-US.
- For the bimodal workload, the enhanced
partitioning variants are the best (they
distribute the high-utilization tasks)
- Symbiosis awareness
- Partitioning → often helps
- Global scheduling → helpful for high utilization,
not helpful for medium utilization
- PLAIN vs. US: US does better for the bimodal
distribution, as expected
24. Experimental Methodology
- Metrics
- Critical serial utilization (CSU)
- The total utilization obtained by uniformly
increasing the utilization of all tasks until a
further increase causes the task set to become
unschedulable (soft real-time: up to 5% deadline
misses tolerated). A sketch of the CSU search
follows this slide.
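A minimal sketch of how the CSU metric can be computed; the step search is an assumed procedure, and is_schedulable stands in for actually running one of the scheduling algorithms with the soft real-time miss allowance:

def critical_serial_utilization(base_utils, is_schedulable, step=0.01, max_scale=10.0):
    # Uniformly inflate all utilizations until the next step would make the
    # task set unschedulable; report the total utilization reached.
    scale = 1.0
    while scale + step <= max_scale and is_schedulable(
            [u * (scale + step) for u in base_utils]):
        scale += step
    return scale * sum(base_utils)

# Toy example: pretend "schedulable" just means total utilization <= 2
# (e.g., two contexts); the real test would run a scheduling algorithm.
print(critical_serial_utilization([0.3, 0.4, 0.5], lambda us: sum(us) <= 2.0))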
25. Results
- Best algorithm → GLOB-SYM-US
- Static vs. dynamic resource sharing
- Static resource sharing generally implies lower
throughput than dynamic resource sharing.
- Partitioning vs. global algorithms
- The enhanced versions are more competitive with
GLOB-SYM-US.
- Symbiosis awareness
- Partitioning → often helps
- Global scheduling → it depends
26. Conclusions
- Best algorithm
- Global scheduling that exploits symbiosis,
prioritizes high-utilization tasks, and uses
dynamic resource sharing.
- Requires a lot of profiling
- Two alternatives
- Partitioning algorithm with static resource
sharing (PART-NOSYM-STAT)
- Worse schedulability and somewhat more complex.
- Provides strict admission control and requires
less profiling.
- Earliest-deadline-first global algorithm
(GLOB-NOSYM-PLAIN)
- Does not provide strict admission control, but
requires no profiling.
27. Conclusions (cont.)
- Dynamic resource sharing is better than static
for schedulability
- Partitioning algorithms can be made competitive
with global scheduling algorithms, but with more
complexity.
- Symbiosis awareness
- Beneficial for partitioning algorithms, because
they do not entirely ignore the real-time
constraints
- Can hurt or help global scheduling algorithms,
depending on the relative magnitude of the
symbiosis factors and the total utilization of
the applications.