Soft Real-Time Scheduling on Simultaneous Multithreaded Processors

1
Soft Real-Time Scheduling on Simultaneous
Multithreaded Processors

Rohit Jain,
Christopher J. Hughes,
Sarita V. Adve
IEEE REAL-TIME SYSTEMS SYMPOSIUM (RTSS02)
Presented by Yen-Sheng Chang
Monday, November 24, 2003

2
Outline

Abstract
Introduction to SMT (hyperthreading)
Related Works
Resource Sharing Algorithms
Co-Scheduling Algorithms
Experimental Methodology
Results
Conclusions

3
Abstract

Simultaneous multithreading (SMT) improves
processor throughput by processing instructions
from multiple threads each cycle.
Two Decisions with SMT
co-schedule selection
Which threads to run simultaneously (the
co-schedule)
resource sharing
How to share processor resources among
co-scheduled threads.
The choice of co-scheduling and resource sharing
algorithm may be tightly coupled.

4
Abstract (concluded)

We find (using simulation) that the best
algorithm uses global scheduling, exploits
symbiosis, prioritizes high-utilization tasks,
and uses dynamic resource sharing.
Drawbacks of the best algorithm
Significant profiling overhead
No admission control
Alternatives trade off schedulability (our approach)

5
Introduction to SMT (hyperthreading)
Traditional Pipelining (single-issue)
Superscalar (wide-issue)
2-issue
6
SMT (cont.)
Multithreaded Processor
7
SMT (cont.)
[Figure: comparison of traditional (single-issue), superscalar, multithreaded, and simultaneous multithreaded processors]
8
SMT (cont.)

Review of two questions
co-schedule
resource sharing

Architecture

9
Related Works

co-schedule selection
The fine-grained resource sharing problem is
unique to SMT.
ICOUNT (seeks to maximize throughput)
Does not consider real-time deadlines.
Symbiotic Job Scheduling for SMT
Symbiosis exploited

10
Related Works

resource sharing
Transparent Threads
no real-time tasks
Applications of Thread Prioritization in SMT
one interactive task with other non-real-time
tasks.
Real-Time Scheduling on Multithreaded
Processors
no SMT, no resource sharing problems

11
Resource Sharing Algorithms

The threads share most processor resources
Instruction fetch mechanism
Instruction window
Execution units (functional units)
Caches
Previous work has focused mostly on resource
sharing algorithms to maximize total throughput
in terms of completed instructions per cycle
(IPC)
Maximizing throughput may help or hurt
schedulability.

12
Resource Sharing Algorithms (cont.)

Throughput-driven resource sharing
ICOUNT gives priority to the thread that has the
fewest instructions in the instruction window.
Referred to as the dynamic algorithm.
Performance prediction is difficult.
Resource sharing with performance guarantees
Static resource sharing algorithm where a fixed
set of resources is reserved for a given job.
May be suboptimal!
Performance prediction is easy (identical to
uniprocessor).
Resources controlled by thread-specific resource
sharing algorithms
SMT is particularly sensitive to how instruction
fetch bandwidth is shared.
Fetch bandwidth and instruction window.
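
The ICOUNT rule on this slide can be sketched in a few lines. This is only an illustration: the function name and the dictionary of per-thread window occupancy are assumptions, not part of the slides or of any real simulator.

```python
def icount_pick(window_counts):
    """Pick the thread that should get fetch priority under an ICOUNT-style rule.

    window_counts: hypothetical mapping of thread id -> number of that
    thread's instructions currently in the instruction window.
    Ties go to the lowest thread id.
    """
    # Iterate thread ids in sorted order so min() returns the lowest id on ties.
    return min(sorted(window_counts), key=lambda tid: window_counts[tid])
```

For example, with occupancies {0: 12, 1: 5, 2: 9}, thread 1 gets priority because it holds the fewest instructions.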

13
Co-Scheduling Algorithms

Design Space explored
Partitioning vs. Global scheduling
Symbiosis-aware vs. Symbiosis-oblivious
Prediction of Execution Time, Utilization, and
Symbiosis
Partitioning Algorithms
Global Scheduling Algorithms

14
Co-Scheduling Algorithms (cont.)

Partitioning vs. Global scheduling
Admission control
Task migration on an SMT processor is free.
Symbiosis-aware vs. Symbiosis-oblivious
With dynamic resource sharing on an SMT
processor, the execution time of a job depends on
which other jobs are co-scheduled with it.

15
Co-Scheduling Algorithms (cont.)

Design space explored
EDF as the underlying algorithm.
Partitioning → EDF schedules the tasks within a
context
Global scheduling → EDF chooses the next task.
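
A minimal sketch of the two uses of EDF named above, assuming each ready job is a (deadline, task name) pair. The function names and data layout are hypothetical and chosen only for illustration.

```python
import heapq


def edf_global_pick(ready_jobs, num_contexts):
    """Global scheduling: run the num_contexts ready jobs with the earliest deadlines.

    ready_jobs: list of (deadline, task_name) tuples (assumed layout).
    """
    return [name for _, name in heapq.nsmallest(num_contexts, ready_jobs)]


def edf_partitioned_pick(partitions):
    """Partitioning: each context runs EDF over its own partition only.

    partitions: one list of (deadline, task_name) tuples per hardware context.
    Returns the chosen task per context, or None for an idle context.
    """
    return [min(jobs)[1] if jobs else None for jobs in partitions]
```

The only difference between the two is the pool EDF draws from: one shared ready queue for global scheduling, one queue per context for partitioning.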

16
Predicting Execution Time, Utilization, and
Symbiosis

Review

IPC and instruction count
17
Predicting Execution Time, Utilization, and
Symbiosis (cont.)

IPC
Job IPC in single-thread mode
→ profile one frame of each frame type.
Job IPC with static resource sharing
→ profile each allocation in single-threaded
mode.
Job IPC with dynamic resource sharing
→ profile all possible co-schedules to obtain
the task IPCs.
Average job IPC with dynamic resource sharing
→ when the IPCs depend on the co-schedule, an
approximation must be made.
→ we use the job IPC averaged across all
possible co-schedules, as measured above.
Instruction count
Use the average instruction count of a large
number of frames as the prediction.
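
The predictions above can be combined as follows. This sketch assumes the standard relations execution time = instruction count / (IPC × clock frequency) and utilization = execution time / period; the function names, the clock-frequency parameter, and the period parameter are illustrative assumptions rather than details taken from the slides.

```python
def average_ipc(ipc_by_coschedule):
    """Average a job's IPC over all profiled co-schedules (assumed dict: co-schedule -> IPC)."""
    values = list(ipc_by_coschedule.values())
    return sum(values) / len(values)


def predict_execution_time(instr_count, ipc, clock_hz):
    """Predicted execution time in seconds from instruction count and IPC."""
    return instr_count / (ipc * clock_hz)


def predict_utilization(instr_count, ipc, clock_hz, period_s):
    """Predicted utilization: fraction of each period the job needs."""
    return predict_execution_time(instr_count, ipc, clock_hz) / period_s
```

With dynamic resource sharing, the value passed as ipc would be the averaged IPC, which is why the resulting utilization is only an approximation.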

18
Partitioning Algorithm

Differences between multiprocessor and SMT
A partition in SMT is a set of tasks such that no
two will execute simultaneously. (Thus, on an SMT
supporting N contexts, up to N partitions may be
created.)
Migrating tasks on an SMT processor is free!
SMT can allocate resources among the contexts.

19
Partitioning Algorithm (cont.)

PART-NOSYM-DYN-b
Bin-packing based algorithm uses the
first-fit-decreasing-utilization (FFDU)
heuristic.
Utilization is only approximated (hence the need
for an enhanced version)
PART-NOSYM-DYN-e
Modifies the admission test so that no task-set
is ever rejected in this phase.
Simulates the schedule for a hyperperiod, to
determine if it would meet the deadlines.
Complexity is high, but it gives partitioning the
fairest showing against global scheduling.
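
The first-fit-decreasing-utilization heuristic can be sketched as generic bin packing. This is a simplified illustration, not the algorithm as evaluated: it assumes fixed per-task utilizations and uses the uniprocessor EDF bound (total utilization at most 1 per context) as the admission test. All names are hypothetical.

```python
def ffdu_partition(utilizations, num_contexts, bound=1.0):
    """First-fit-decreasing-utilization partitioning.

    utilizations: assumed dict of task name -> utilization.
    Returns one list of task names per context, or None if some task
    fits in no context (the task set is rejected).
    """
    partitions = [[] for _ in range(num_contexts)]
    load = [0.0] * num_contexts
    # Consider tasks from highest to lowest utilization.
    order = sorted(utilizations, key=lambda t: utilizations[t], reverse=True)
    for task in order:
        for k in range(num_contexts):
            # Small epsilon guards against floating-point round-off at the bound.
            if load[k] + utilizations[task] <= bound + 1e-12:
                partitions[k].append(task)
                load[k] += utilizations[task]
                break
        else:
            return None  # no context can accommodate this task
    return partitions
```

The enhanced version described above would replace the early rejection with a simulation of the schedule over a hyperperiod.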

20
Partitioning Algorithm (cont.)

PART-NOSYM-STAT
Independent of co-schedule
Only dependent on resource allocation
→ No need for an enhanced version!
FFDU heuristic with an EDF admission test.
C1, C2, ..., CN denote the N hardware contexts.
Initially, all resources are allocated to C1.
Reallocate resources from C1 to Ck such that
Ck can accommodate the task.
If C1 does not have enough resources → Fail
Move the smallest-utilization task from C1 to
another context that can accommodate it.
If no such context is found → Fail

21
Partitioning Algorithm (cont.)

PART-SYM-DYN-b
Maximizes average symbiosis among tasks in
different partitions, while keeping the total
utilization of tasks in each partition reasonably
balanced.
Weighted hypergraph with nodes representing the
tasks
A hypergraph is a graph in which generalized
edges (called hyperedges) may connect more than
two nodes.
The weight on a hyperedge (u1, u2, ..., uN) is the
inverse of the symbiosis factor of the
co-schedule formed by tasks u1, u2, ..., uN.
Each node is weighted with its task's
utilization.
A hypergraph-partitioning algorithm is used.
The sum of node-weights (utilization) is
balanced.
The weight of the hyperedges is minimized
(maximizing symbiosis)
PART-SYM-DYN-e

Reference: B. L. Chamberlain. Graph Partitioning
Algorithms for Distributing Workloads of
Parallel Computations
22
Global Scheduling Algorithms
23
Global Scheduling Algorithms (cont.)

Symbiosis-Oblivious Global Scheduling
GLOB-NOSYM-PLAIN
EDF
GLOB-NOSYM-US
EDF-US[m/(2m-1)] algorithm
Giving the highest priority to high utilization
tasks in the task set.

Reference: A. Srinivasan and S. Baruah.
Deadline-based Scheduling of Periodic Task
Systems on Multiprocessors
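
A minimal sketch of the priority rule behind GLOB-NOSYM-US, assuming m contexts and jobs given as (name, utilization, deadline) tuples. Tasks whose utilization exceeds m/(2m-1) are placed ahead of all others; the rest are ordered by EDF. Breaking ties among the high-utilization tasks by deadline is a choice made here for determinism, and the names are hypothetical.

```python
def edf_us_order(jobs, m):
    """Return job names in priority order (highest priority first).

    jobs: list of (name, utilization, deadline) tuples (assumed layout).
    m: number of processors, or hardware contexts on an SMT.
    """
    threshold = m / (2 * m - 1)
    # False sorts before True, so high-utilization tasks come first;
    # within each group, earlier deadlines come first.
    ranked = sorted(jobs, key=lambda j: (j[1] <= threshold, j[2]))
    return [name for name, _, _ in ranked]
```

With m = 2 the threshold is 2/3, so a task with utilization 0.8 runs ahead of lighter tasks even when its deadline is later.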
24
Global Scheduling Algorithms (cont.)

Symbiosis-Aware Global Scheduling
GLOB-SYM-PLAIN
Extends EDF to exploit symbiosis in a
straightforward way.
It first selects the task with the earliest
deadline.
For the other (N-1) tasks, it chooses the set
that maximizes symbiosis when running with the
first task.
Positive → Improves schedulability (by improving
overall throughput)
Negative → May reduce schedulability (the
symbiosis-based choice ignores real-time
characteristics)
GLOB-SYM-US
Addresses the drawback of GLOB-SYM-PLAIN
Defaults to GLOB-NOSYM-US if a task Ti has
utilization Ui > N/(2N-1)
Otherwise, it defaults to GLOB-SYM-PLAIN
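
The GLOB-SYM-PLAIN selection step can be sketched as follows. The sketch assumes unique task names, ready jobs given as (deadline, name) pairs, and a symbiosis function that maps a set of task names to a profiled symbiosis factor; all of these are illustrative assumptions. It searches the candidate sets exhaustively, which is only practical for a small number of ready tasks.

```python
from itertools import combinations


def glob_sym_plain(ready, num_contexts, symbiosis):
    """Choose a co-schedule: earliest-deadline task plus its most symbiotic partners.

    ready: list of (deadline, task_name) tuples with unique names (assumed layout).
    symbiosis: hypothetical callable taking a frozenset of names, returning a float.
    """
    if not ready:
        return []
    first = min(ready)[1]  # the task with the earliest deadline always runs
    others = [name for _, name in ready if name != first]
    k = min(num_contexts - 1, len(others))
    # Try every set of k partners and keep the one with the highest symbiosis.
    best = max(combinations(others, k),
               key=lambda rest: symbiosis(frozenset((first,) + rest)))
    return [first, *best]
```

Note that only the first task is chosen by deadline. The partners are chosen purely by symbiosis, which is the source of the negative effect listed on this slide.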

25
Co-Scheduling Algorithms (concluded)
26
Experimental Methodology

Metrics
critical serial utilization (CSU)
The total utilization obtained by uniformly
increasing the utilization of all tasks until a
further increase causes the task-set to become
unschedulable. (5% deadline misses → soft real-time)
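
One way to compute a CSU-style metric is a search over a uniform scaling factor. This sketch assumes schedulability is monotone in the scale and takes the schedulability check as a caller-supplied function (a hypothetical stand-in for the simulation); the names and the tolerance are illustrative.

```python
def critical_serial_utilization(utilizations, is_schedulable, tol=1e-3):
    """Largest total utilization, under uniform scaling, that stays schedulable.

    utilizations: list of per-task serial utilizations.
    is_schedulable: hypothetical callable on a list of scaled utilizations.
    """
    lo, hi = 0.0, 1.0
    # Grow the scale until the task set first becomes unschedulable.
    while is_schedulable([u * hi for u in utilizations]):
        lo, hi = hi, hi * 2
        if hi > 1e6:
            raise ValueError("task set never becomes unschedulable")
    # Binary search for the critical scale between lo and hi.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if is_schedulable([u * mid for u in utilizations]):
            lo = mid
        else:
            hi = mid
    return lo * sum(utilizations)
```

For a soft real-time check, is_schedulable could accept a task set as long as the fraction of missed deadlines stays within the allowed limit.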

27
Experiment Setup
RSIM simulator
Real Workload
28
Results

Best algorithm → GLOB-SYM-US
Static vs. Dynamic resource sharing
Static resource sharing generally implies lower
throughput than dynamic resource sharing.
Partitioning vs. Global algorithm
The enhanced partitioning versions are more
competitive with GLOB-SYM-US.
Symbiosis-awareness
Partitioning → often helps
Global scheduling → it depends

29
Conclusions

Best algorithm
Global scheduling, exploits symbiosis,
prioritizes high utilization tasks, uses dynamic
resource sharing.
Requires a lot of profiling
Two alternatives
Partitioning algorithm that utilizes static
resource sharing
(PART-NOSYM-STAT)
Worse schedulability and somewhat more complex.
Provides strict admission control and requires
less profiling.
Earliest deadline first global algorithms
(GLOB-NOSYM-PLAIN)
Do not provide strict admission control, but
require no profiling.

30
Conclusions (concluded)

Dynamic resource sharing is better than static
for schedulability
Partitioning algorithm can be made competitive
with global scheduling algorithm, but with more
complexity.
Symbiosis-awareness
Beneficial for partitioning algorithms because
they do not entirely ignore real-time constraints
Can hurt or help global scheduling algorithms,
depending on the relative magnitude of the
symbiosis factors and total utilization of the
applications.