Title: Soft Real-Time Scheduling on Simultaneous Multithreaded Processors
1. Soft Real-Time Scheduling on Simultaneous Multithreaded Processors
- Rohit Jain,
- Christopher J. Hughes
- Sarita V. Adve
- IEEE Real-Time Systems Symposium (RTSS 2002)
- Borrowed from Yen-Sheng Chang
- and presented by Cristiano Pereira
- 4/22/2005
2. Outline
- Introduction
- SMT review (hyperthreading)
- Resource Sharing Algorithms
- Co-Scheduling Algorithms
- Experimental Methodology
- Results
- Conclusions
3. Introduction
- Simultaneous multithreading (SMT) improves
processor throughput by processing instructions
from multiple threads each cycle.
- Two decisions with SMT:
- Co-schedule selection
- Affects thread utilization
- Resource sharing
- How processor resources are shared among
co-scheduled threads
- Problem unique to SMT processors
- The choice of co-scheduling and resource sharing
algorithms may be tightly coupled.
4. Introduction: Objective
- To find the best algorithm to increase
schedulability of soft real-time tasks by
exploiting co-scheduling and resource sharing on
SMT processors
5. SMT Review
[Figure: issue-slot usage over time for a traditional (single-issue) processor, a superscalar processor, a multithreaded processor, and a simultaneous multithreaded processor]
6. Related Work
- ICOUNT (seeks to maximize throughput)
- Without consideration of any real-time deadline.
- Symbiotic Job Scheduling for SMT
- One interactive task with other non-real-time tasks
- Multiprocessor scheduling (without SMT)
7. Resource Sharing Algorithms (1)
- Threads share most processor resources
- Instruction fetch mechanism
- Instruction window
- Functional units
- Caches
- Previous work on resource sharing algorithms has
focused on maximizing total throughput (IPC)
- No deadline concerns
- May have a positive or negative impact on
schedulability
8. Resource Sharing Algorithms (2)
- Throughput-driven (dynamic)
- ICOUNT gives fetch priority to the thread with
the fewest instructions in the instruction window
(sketched below)
- Performance prediction is difficult.
- Performance guarantees (static)
- A fixed set of resources is reserved for a given
job
- May be suboptimal!
- Performance prediction is easy (identical to a
uniprocessor).
- Resources controlled per thread in this work
- Fetch bandwidth and instruction window (ICOUNT)
- Other resources are thread-blind (e.g., a
functional unit goes to the oldest instruction)
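The ICOUNT idea can be captured in a few lines. A minimal sketch of the policy, assuming a simplified model where one thread wins the fetch slot each cycle (real SMT front ends typically fetch from more than one thread per cycle); this is not the paper's simulator code:

def icount_pick(threads):
    # threads: dict mapping thread id -> instructions currently occupying its
    # front-end / instruction-window slots. Favoring the emptiest thread
    # automatically throttles threads that clog the pipeline.
    return min(threads, key=threads.get)

occupancy = {0: 12, 1: 5, 2: 9}
assert icount_pick(occupancy) == 1  # thread 1 holds the fewest instructions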
9. Co-Scheduling Algorithms (1)
- Partitioning (bin-packing flavor algorithms)
- Allows admission control
- Global scheduling
- Task migration on an SMT processor is free.
- Symbiosis-aware vs. symbiosis-oblivious
- On SMT processors, the execution time of a job
depends on the jobs co-scheduled with it (one way
to quantify this is sketched below).
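As a point of reference, one common way to quantify this co-schedule dependence is a weighted-speedup-style symbiosis factor; the formulation below is an assumed stand-in in that spirit, not necessarily the paper's exact definition:

\[ \mathrm{Sym}(S) = \sum_{i \in S} \frac{\mathrm{IPC}_i^{S}}{\mathrm{IPC}_i^{\mathrm{alone}}} \]

where IPC_i^S is job i's IPC when co-scheduled with the other jobs in S, IPC_i^alone is its IPC running by itself, and larger values mean the co-scheduled jobs interfere less.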
10. Co-Scheduling Algorithms (2)
- Design space explored
- EDF as the underlying algorithm.
- Partitioning → EDF schedules the tasks within a
context
- Global → EDF chooses the next task.
11. Predicting Execution Time, Utilization, and Symbiosis (1)
- These algorithms need to know execution times,
utilizations, and symbiosis relations
- Both are derived from two predicted quantities,
IPC and instruction count: execution time (in
cycles) ≈ instruction count / IPC, and
utilization = execution time / period
12. Predicting Execution Time, Utilization, and Symbiosis (2)
- IPC
- Job IPC in single-thread mode
- → obtained by profiling one frame of each frame
type.
- Job IPC with static resource sharing
- → obtained by profiling each allocation in
single-threaded mode
- Job IPC with dynamic resource sharing
- → obtained by profiling all possible co-schedules
(N-tuples) to get the task IPCs.
- Average job IPC with dynamic resource sharing
- → when the IPCs depend on the yet-unknown
co-schedule, approximate as the job IPC averaged
across all possible co-schedules
- Instruction count
- Use the average instruction count over a large
number of frames as the prediction (a worked
sketch follows this slide).
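Putting these pieces together, a minimal sketch of the prediction step; the helper name and data layout are illustrative assumptions, not the paper's code:

from statistics import mean

def predicted_utilization(instr_count, ipc_by_coschedule, period_cycles):
    # instr_count: average instructions per frame for the task.
    # ipc_by_coschedule: the task's profiled IPC in each possible co-schedule.
    avg_ipc = mean(ipc_by_coschedule)      # co-schedule-averaged IPC
    exec_cycles = instr_count / avg_ipc    # predicted execution time in cycles
    return exec_cycles / period_cycles     # predicted utilization

# Example: 3M instructions/frame, IPC of 2-3 depending on the co-schedule,
# 2M-cycle period -> utilization of about 0.6.
print(predicted_utilization(3_000_000, [2.0, 2.5, 3.0], 2_000_000))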
13. Partitioning Algorithms (1)
- A partition on an SMT processor is a set of tasks
such that no two of them will execute
simultaneously.
- An SMT processor with N hardware contexts supports
up to N partitions.
14. Partitioning Algorithms (2)
- PART-NOSYM-DYN-b
- Bin-packing-based algorithm using the
first-fit-decreasing-utilization (FFDU) heuristic
(sketched below).
- Uses IPC averaged across all co-schedules
- PART-NOSYM-DYN-e
- Corrects underestimated IPC by increasing the
utilization threshold by the smallest amount
- No task set is ever rejected
- Simulates the schedule for a hyperperiod to
determine whether it would meet the deadlines.
- Complexity is high, but it gives partitioning the
fairest showing against global scheduling.
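A minimal sketch of the FFDU bin-packing step used by the PART variants. The per-context threshold of 1.0 is the plain uniprocessor EDF bound; the -b and -e variants layer IPC estimation and threshold adjustment on top of this, which is not shown here:

def ffdu_partition(utilizations, n_contexts, threshold=1.0):
    # utilizations: list of predicted task utilizations.
    # Returns a list of per-context task-index lists, or None if admission fails.
    partitions = [[] for _ in range(n_contexts)]
    load = [0.0] * n_contexts
    # Consider tasks in decreasing utilization order (the "D" in FFDU).
    for task in sorted(range(len(utilizations)),
                       key=lambda t: utilizations[t], reverse=True):
        for c in range(n_contexts):             # first context where it fits
            if load[c] + utilizations[task] <= threshold:
                partitions[c].append(task)
                load[c] += utilizations[task]
                break
        else:
            return None                          # admission control: reject task set
    return partitions

print(ffdu_partition([0.6, 0.5, 0.4, 0.3], n_contexts=2))  # -> [[0, 2], [1, 3]]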
15. Partitioning Algorithms (3)
- PART-NOSYM-STAT
- Independent of the co-schedule because of static
resource allocation
- Uses the IPC of each resource configuration
- FFDU heuristic with an EDF admission test (the
per-context test is sketched below)
- C1, C2, ..., CN denote the N hardware contexts.
- Initially, all resources are allocated to C1.
- Resources are re-allocated from C1 to Ck so that
Ck can accommodate the new task.
- If C1 does not have enough resources → move the
smallest-utilization task from C1 to another
context that can accommodate it.
- If no such context is found → fail.
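A heavily simplified sketch of why static resource sharing enables a clean admission test; this shows the underlying per-context check, not the full PART-NOSYM-STAT re-allocation procedure. With a fixed allocation, each task's utilization is independent of its co-runners, so each context can be checked with the uniprocessor EDF bound:

def edf_admissible(context_tasks):
    # context_tasks: (exec_cycles, period_cycles) pairs assigned to one
    # hardware context under a fixed (static) resource allocation.
    total_util = sum(c / p for c, p in context_tasks)
    return total_util <= 1.0  # uniprocessor EDF bound per context

# Example: utilizations 0.4 + 0.5 = 0.9 on one context -> admissible.
print(edf_admissible([(400_000, 1_000_000), (500_000, 1_000_000)]))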
16. Partitioning Algorithms (4)
- PART-SYM-DYN-b
- Uses utilization averaged across all co-schedules
- Maximizes average symbiosis among tasks in
different partitions, while keeping the total
utilization of tasks in each partition balanced.
- Builds a weighted hypergraph with nodes
representing the tasks (construction sketched
below)
- A hypergraph is a graph in which generalized
edges (called hyperedges) may connect more than
two nodes.
- The weight on a hyperedge (u1, u2, ..., uN) is
the inverse of the symbiosis factor of the
co-schedule formed by tasks u1, u2, ..., uN.
- Each node is weighted with its task's utilization.
- A hypergraph-partitioning algorithm is used:
- The sum of node weights (utilization) is
balanced across partitions.
- The weight of the hyperedges crossing partitions
is minimized (maximizing symbiosis).
- PART-SYM-DYN-e
- Enhanced version, analogous to PART-NOSYM-DYN-e.
Reference: B. L. Chamberlain, "Graph Partitioning
Algorithms for Distributing Workloads of Parallel
Computations."
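A minimal sketch of constructing the hypergraph that PART-SYM-DYN-b hands to a partitioner; the data layout and the symbiosis() callback (the weighted-speedup-style metric sketched earlier) are illustrative assumptions rather than the paper's implementation:

from itertools import combinations

def build_hypergraph(tasks, utilization, symbiosis, n_contexts):
    # Node weights: per-task utilization (to be balanced across partitions).
    node_weights = {t: utilization[t] for t in tasks}
    # One hyperedge per possible co-schedule; weight = 1 / symbiosis factor,
    # so cut minimization favors leaving highly symbiotic tasks in different
    # partitions, where they may actually be co-scheduled.
    hyperedges = {group: 1.0 / symbiosis(group)
                  for group in combinations(tasks, n_contexts)}
    return node_weights, hyperedges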
17. Global Scheduling Algorithms (1)
18. Global Scheduling Algorithms (2)
- Symbiosis-Oblivious Global Scheduling
- GLOB-NOSYM-PLAIN
- Uses IPC averaged across all co-schedules
- Global EDF: the N tasks with the earliest
deadlines are chosen (sketched below)
- Tasks with arbitrarily low utilization can miss
deadlines (Dhall effect)
- GLOB-NOSYM-US
- EDF-US[m/(2m-1)] algorithm
- If Ti has utilization > N/(2N-1), give it the
highest priority
- i.e., the highest-utilization tasks in the task
set get the highest priority.
Reference: A. Srinivasan and S. Baruah,
"Deadline-Based Scheduling of Periodic Task
Systems on Multiprocessors."
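A minimal sketch of the co-schedule selection step for the symbiosis-oblivious global algorithms, assuming a simplified model where the N winners are picked by sorting; the EDF-US variant simply promotes heavy tasks ahead of deadline order:

def pick_coschedule(ready_jobs, n_contexts, util_threshold=None):
    # ready_jobs: list of (deadline, utilization, job_id) tuples.
    # util_threshold=None gives plain global EDF; setting it to N/(2N-1)
    # gives the EDF-US behavior of prioritizing heavy tasks.
    def priority(job):
        deadline, util, _ = job
        heavy = util_threshold is not None and util > util_threshold
        return (not heavy, deadline)  # heavy tasks sort first, then EDF
    return [job_id for _, _, job_id in sorted(ready_jobs, key=priority)[:n_contexts]]

jobs = [(100, 0.9, "A"), (50, 0.2, "B"), (80, 0.3, "C")]
print(pick_coschedule(jobs, 2))                      # plain EDF -> ['B', 'C']
print(pick_coschedule(jobs, 2, util_threshold=2/3))  # EDF-US   -> ['A', 'B']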
19. Global Scheduling Algorithms (3)
- Symbiosis-Aware Global Scheduling
- GLOB-SYM-PLAIN
- Extends EDF to exploit symbiosis in a
straightforward way (sketched below).
- It first selects the task with the earliest
deadline.
- For the other (N-1) contexts, it chooses the set
of tasks that maximizes symbiosis when running
with the first task.
- Positive → improves schedulability (better
overall throughput)
- Negative → can reduce schedulability (the choice
ignores real-time characteristics)
- GLOB-SYM-US
- In the presence of high-utilization tasks,
GLOB-SYM-PLAIN impairs schedulability
- Addresses this negative of GLOB-SYM-PLAIN
- Defaults to GLOB-NOSYM-US if some task Ti has
utilization Ui > N/(2N-1)
- Otherwise, it behaves as GLOB-SYM-PLAIN
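A minimal sketch of GLOB-SYM-PLAIN's selection step under the same simplifying assumptions; symbiosis() is the hedged metric from earlier, and the exhaustive search over companion sets is only reasonable for small N and few ready jobs:

from itertools import combinations

def pick_symbiotic_coschedule(ready_jobs, n_contexts, symbiosis):
    # ready_jobs: list of (deadline, job_id); symbiosis: function mapping a
    # tuple of job ids to a predicted symbiosis factor (higher is better).
    anchor = min(ready_jobs)[1]                 # earliest-deadline job stays EDF
    others = [j for _, j in ready_jobs if j != anchor]
    if len(others) < n_contexts - 1:
        return (anchor, *others)                # not enough jobs to fill the contexts
    # Exhaustively score every companion set for the anchor job.
    best = max(combinations(others, n_contexts - 1),
               key=lambda group: symbiosis((anchor, *group)))
    return (anchor, *best)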
20. Comparison of Properties
21. Experiment Setup (1)
- Randomly generated task sets
- Two workloads: utilizations follow either a normal
or a bimodal distribution
- Number of tasks follows a uniform distribution
(mean 8)
- Periods drawn from the set {100, 200, ..., 1600}
with uniform probability
- Randomly generated single-thread IPCs (IPCi, mean
3) and co-schedule effects on them (IPCij); a
generation sketch follows this slide
- Metric
- Success ratio: percentage of task sets successfully
scheduled by an algorithm (at most 5% deadline
misses)
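A hedged sketch of the random task-set generation described above. The mean task count (8), the period range, the IPC mean (3), and the normal/bimodal split come from the slide; the specific spreads, the exact period set, and the bimodal mixture weights below are illustrative assumptions, not the paper's parameters:

import random

def generate_task_set(bimodal=False):
    n_tasks = random.randint(4, 12)          # uniform, mean 8 (range assumed)
    periods = [100, 200, 400, 800, 1600]     # assumed expansion of "100, 200, ..., 1600"
    tasks = []
    for i in range(n_tasks):
        if bimodal:
            # Assumed mixture: mostly light tasks plus some heavy ones.
            util = random.uniform(0.6, 0.9) if random.random() < 0.3 else random.uniform(0.05, 0.3)
        else:
            # Assumed normal distribution of utilizations, clipped to (0, 1).
            util = min(0.95, max(0.05, random.gauss(0.3, 0.1)))
        tasks.append({
            "id": i,
            "period": random.choice(periods),
            "utilization": util,
            "ipc": max(0.5, random.gauss(3.0, 0.5)),  # single-thread IPC, mean 3
        })
    return tasks

print(generate_task_set(bimodal=True)[:2])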
22. Experiment Setup (2)
- RSIM simulator
- Real workload
23. Results
- Best algorithm → GLOB-SYM-US
- Partitioning vs. global algorithms
- Global is generally better
- Enhanced versions and symbiosis awareness make
PART more competitive with GLOB-SYM-US.
- For the bimodal workload, the enhanced
partitioning variants are the best (they
distribute the high-utilization tasks)
- Symbiosis awareness
- Partitioning → often helps
- Global scheduling → helpful for high utilization,
not helpful for medium utilization
- PLAIN vs. US: US does better for the bimodal
distribution, as expected
24. Experimental Methodology
- Metrics
- Critical serial utilization (CSU)
- The total utilization obtained by uniformly
increasing the utilization of all tasks until a
further increase causes the task set to become
unschedulable (soft real-time: up to 5% deadline
misses tolerated). A sketch of the CSU search
follows this slide.
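A minimal sketch of how the CSU metric can be computed; the step search is an assumed procedure, and is_schedulable stands in for actually running one of the scheduling algorithms with the soft real-time miss allowance:

def critical_serial_utilization(base_utils, is_schedulable, step=0.01, max_scale=10.0):
    # Uniformly inflate all utilizations until the next step would make the
    # task set unschedulable; report the total utilization reached.
    scale = 1.0
    while scale + step <= max_scale and is_schedulable(
            [u * (scale + step) for u in base_utils]):
        scale += step
    return scale * sum(base_utils)

# Toy example: pretend "schedulable" just means total utilization <= 2
# (e.g., two contexts); the real test would run a scheduling algorithm.
print(critical_serial_utilization([0.3, 0.4, 0.5], lambda us: sum(us) <= 2.0))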
25. Results
- Best algorithm → GLOB-SYM-US
- Static vs. dynamic resource sharing
- Static resource sharing generally implies lower
throughput than dynamic resource sharing.
- Partitioning vs. global algorithms
- The enhanced versions are more competitive with
GLOB-SYM-US.
- Symbiosis awareness
- Partitioning → often helps
- Global scheduling → it depends
26. Conclusions
- Best algorithm
- Global scheduling that exploits symbiosis,
prioritizes high-utilization tasks, and uses
dynamic resource sharing.
- Requires a lot of profiling
- Two alternatives
- Partitioning algorithm with static resource
sharing (PART-NOSYM-STAT)
- Worse schedulability and somewhat more complex.
- Provides strict admission control and requires
less profiling.
- Earliest-deadline-first global algorithm
(GLOB-NOSYM-PLAIN)
- Does not provide strict admission control, but
requires no profiling.
27. Conclusions (cont.)
- Dynamic resource sharing is better than static
for schedulability
- Partitioning algorithms can be made competitive
with global scheduling algorithms, but with more
complexity.
- Symbiosis awareness
- Beneficial for partitioning algorithms, because
they do not entirely ignore the real-time
constraints
- Can hurt or help global scheduling algorithms,
depending on the relative magnitude of the
symbiosis factors and the total utilization of
the applications.