Soft Real-Time Scheduling on Simultaneous Multithreaded Processors

1
Soft Real-Time Scheduling on Simultaneous
Multithreaded Processors
  • Rohit Jain
  • Christopher J. Hughes
  • Sarita V. Adve
  • IEEE Real-Time Systems Symposium (RTSS '02)
  • Presented by Yen-Sheng Chang
  • Monday, November 24, 2003

2
Outline
  • Abstract
  • Introduction to SMT (hyperthreading)
  • Related Works
  • Resource Sharing Algorithms
  • Co-Scheduling Algorithms
  • Experimental Methodology
  • Results
  • Conclusions

3
Abstract
  • Simultaneous multithreading (SMT) improves
    processor throughput by processing instructions
    from multiple threads each cycle.
  • Two Decisions with SMT
  • co-schedule selection
  • Which threads to run simultaneously (the
    co-schedule)
  • resource sharing
  • How to share processor resources among
    co-scheduled threads.
  • The choice of co-scheduling and resource sharing
    algorithm may be tightly coupled.

4
Abstract (concluded)
  • We find (using simulation) that the best
    algorithm uses global scheduling, exploits
    symbiosis, prioritizes high-utilization tasks,
    and uses dynamic resource sharing.
  • Significant profiling overhead
  • No admission control
  • Trade off schedulability!!! (our approach)

5
Introduction to SMT (hyperthreading)
[Figure: traditional pipelining (single-issue) vs. superscalar (wide-issue, shown as 2-issue)]
6
SMT (cont.)
[Figure: multithreaded processor]
7
SMT (cont.)
[Figure: traditional (single-issue), superscalar, multithreaded, and simultaneous multithreaded execution compared]
8
SMT (cont.)
  • Review of two questions
  • co-schedule
  • resource sharing
  • Architecture

9
Related Works
  • co-schedule selection
  • The fine-grained resource sharing problem is
    unique to SMT.
  • ICOUNT (seeks to maximize throughput)
  • Does not consider real-time deadlines.
  • Symbiotic Job Scheduling for SMT
  • Symbiosis exploited

10
Related Works
  • resource sharing
  • Transparent Threads
  • no real-time tasks
  • Applications of Thread Prioritization in SMT
  • one interactive task with other non-real-time
    tasks.
  • Real-Time Scheduling on Multithreaded
    Processors
  • no SMT, no resource sharing problems

11
Resource Sharing Algorithms
  • The threads share most processor resources
  • Instruction fetch mechanism
  • Instruction window
  • Execution units (functional units)
  • Caches
  • Previous work has focused mostly on resource
    sharing algorithms to maximize total throughput
    in terms of completed instructions per cycle
    (IPC)
  • May have negative/positive impact on the
    schedulability.

12
Resource Sharing Algorithms (cont.)
  • Throughput-driven resource sharing
  • ICOUNT gives priority to the thread that has the
    fewest instructions in the instruction window.
  • Referred to as the dynamic algorithm.
  • Performance prediction is difficult.
  • Resource sharing with performance guarantees
  • Static resource sharing algorithm where a fixed
    set of resources is reserved for a given job.
  • May be suboptimal!
  • Performance prediction is easy (identical to
    uniprocessor).
  • Resources controlled by thread-specific resource
    sharing algorithms
  • SMT is particularly sensitive to the instruction
    fetch bandwidth sharing.
  • Fetch bandwidth, instruction window.
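
The ICOUNT rule described on this slide can be sketched in a few lines. This is only an illustration of the priority rule as the slide states it (fewest instructions in the instruction window goes first); the function names, the `window_counts` input, and the `max_threads` knob are assumptions made for the example, not part of the original work.

```python
def icount_fetch_order(window_counts):
    """Order thread ids by fetch priority: fewest instructions in the window first.

    window_counts maps a thread id to its current instruction-window occupancy
    (a hypothetical input for this sketch). Ties are broken by thread id so the
    result is deterministic.
    """
    return sorted(window_counts, key=lambda tid: (window_counts[tid], tid))


def threads_to_fetch(window_counts, max_threads):
    """Pick the threads allowed to fetch this cycle (max_threads is an assumed knob)."""
    return icount_fetch_order(window_counts)[:max_threads]


occupancy = {0: 12, 1: 3, 2: 7}
print(icount_fetch_order(occupancy))  # thread 1 has the emptiest window, so it leads
print(threads_to_fetch(occupancy, 2))
```

Because the ordering changes every cycle with window occupancy, a thread's progress depends on its co-runners, which is why the slide notes that performance prediction is difficult under the dynamic algorithm.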

13
Co-Scheduling Algorithms
  • Design Space explored
  • Partitioning vs. Global scheduling
  • Symbiosis-aware vs. Symbiosis-oblivious
  • Prediction of Execution Time, Utilization, and
    Symbiosis
  • Partitioning Algorithms
  • Global Scheduling Algorithms

14
Co-Scheduling Algorithms (cont.)
  • Partitioning vs. Global scheduling
  • Admission control
  • Task migration on an SMT processor is free.
  • Symbiosis-aware vs. Symbiosis-oblivious
  • With dynamic resource sharing on an SMT
    processor, the execution time of a job depends on
    which other jobs are co-scheduled with it.

15
Co-Scheduling Algorithms (cont.)
  • Design space explored
  • EDF as the underlying algorithm.
  • Partitioning → EDF schedules the tasks within a
    context
  • Global scheduling → EDF chooses the next task.

16
Predicting Execution Time, Utilization, and
Symbiosis
  • Review

[Figure labels: IPC, instruction count]
17
Predicting Execution Time, Utilization, and
Symbiosis (cont.)
  • IPC
  • Job IPC in single-thread mode
  • → profiling one frame of each frame type.
  • Job IPC with static resource sharing
  • → profiling each allocation in single-threaded
    mode
  • Job IPC with dynamic resource sharing
  • → profiles all possible co-schedules to obtain
    the task IPCs.
  • Average job IPC with dynamic resource sharing
  • → when the IPCs depend on the co-schedule, an
    approximation must be made.
  • → we use the job IPC averaged across all
    possible co-schedules, as measured above.
  • Instruction count
  • Use the average instruction count of a large
    number of frames as the prediction.
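
As a rough sketch of how these two predictions combine, execution time can be estimated as instruction count divided by (IPC × clock frequency), and utilization as execution time divided by the task period. The slides do not spell this formula out, so treat it as the standard relationship rather than the authors' exact method; `clock_hz`, `period_s`, and all function names are assumptions for the example.

```python
def predicted_execution_time(instruction_count, ipc, clock_hz):
    """Seconds needed to retire instruction_count instructions at the given IPC."""
    return instruction_count / (ipc * clock_hz)


def predicted_utilization(instruction_count, ipc, clock_hz, period_s):
    """Fraction of each period the job is expected to occupy."""
    return predicted_execution_time(instruction_count, ipc, clock_hz) / period_s


def average_ipc(ipc_by_coschedule):
    """Average a job's IPC over every profiled co-schedule.

    ipc_by_coschedule maps a co-schedule label to the job's measured IPC in it
    (hypothetical profiling data).
    """
    values = list(ipc_by_coschedule.values())
    return sum(values) / len(values)


# A job profiled in two co-schedules, with an average frame of 2M instructions.
ipc = average_ipc({"with_task_B": 1.5, "with_task_C": 2.5})
print(ipc)
print(predicted_utilization(2_000_000, ipc, clock_hz=1e9, period_s=0.01))
```

With dynamic resource sharing the averaged IPC is only an approximation, since the actual IPC depends on which jobs happen to be co-scheduled.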

18
Partitioning Algorithm
  • Differences between multiprocessor and SMT
  • A partition in SMT is a set of tasks such that no
    two will execute simultaneously. (Thus, on an SMT
    supporting N contexts, up to N partitions may be
    created)
  • Migrating tasks on an SMT processor is free!
  • SMT can allocate resources among the contexts.

19
Partitioning Algorithm (cont.)
  • PART-NOSYM-DYN-b
  • Bin-packing based algorithm uses the
    first-fit-decreasing-utilization (FFDU)
    heuristic.
  • Utilization is only approximated (hence the need
    for an enhanced version)
  • PART-NOSYM-DYN-e
  • Modifies the admission test so that no task-set
    is ever rejected in this phase.
  • Simulates the schedule for a hyperperiod, to
    determine if it would meet the deadlines.
  • Complexity is high, but it gives partitioning the
    fairest showing against global scheduling.
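
The first-fit-decreasing-utilization heuristic named on this slide can be sketched as follows. The per-partition admission test used here (total utilization at most 1, the uniprocessor EDF bound) is an assumption for the example, as are the function name and the input format; with dynamic resource sharing the utilizations themselves are only approximations.

```python
def ffdu_partition(utilizations, num_contexts):
    """Assign tasks to partitions with first-fit-decreasing-utilization.

    utilizations maps a task name to its predicted utilization. Returns a list
    of partitions (lists of task names), or None if some task fits nowhere.
    """
    partitions = [[] for _ in range(num_contexts)]
    loads = [0.0] * num_contexts
    # Consider tasks from highest to lowest utilization (name breaks ties).
    ordered = sorted(utilizations.items(), key=lambda kv: (-kv[1], kv[0]))
    for task, u in ordered:
        for i in range(num_contexts):
            # Assumed admission test: EDF on one context needs total U <= 1.
            if loads[i] + u <= 1.0 + 1e-12:
                partitions[i].append(task)
                loads[i] += u
                break
        else:
            return None  # no partition can accommodate this task
    return partitions


print(ffdu_partition({"a": 0.6, "b": 0.5, "c": 0.4, "d": 0.3}, 2))
print(ffdu_partition({"a": 0.9, "b": 0.9, "c": 0.9}, 2))
```

The enhanced variant described above would replace the `return None` rejection with a simulation of the schedule over a hyperperiod; that step is not shown here.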

20
Partitioning Algorithm (cont.)
  • PART-NOSYM-STAT
  • Independent of co-schedule
  • Only dependent on resource allocation
  • → No need for an enhanced version!
  • FFDU heuristic with an EDF admission test.
  • C1, C2, ..., CN denote the N hardware contexts.
  • Initially, all resources are allocated to C1.
  • Reallocate resources from C1 to Ck so that Ck
    can accommodate the task.
  • If C1 does not have enough resources → fail.
  • Move the smallest-utilization task from C1 to
    another context that can accommodate it.
  • If no such context is found → fail.

21
Partitioning Algorithm (cont.)
  • PART-SYM-DYN-b
  • Maximizes average symbiosis among tasks in
    different partitions, while keeping the total
    utilization of tasks in each partition reasonably
    balanced.
  • Weighted hypergraph with nodes representing the
    tasks
  • A hypergraph is a graph in which generalized
    edges (called hyperedges) may connect more than
    two nodes.
  • The weight on a hyperedge (u1, u2, ..., uN) is the
    inverse of the symbiosis factor of the
    co-schedule formed by tasks u1, u2, ..., uN.
  • Each node is weighted with its task's
    utilization.
  • A hypergraph-partitioning algorithm is used.
  • The sum of node-weights (utilization) is
    balanced.
  • The weight of the hyperedges is minimized
    (maximizing symbiosis)
  • PART-SYM-DYN-e

Reference: B. L. Chamberlain. Graph Partitioning
Algorithms for Distributing Workloads of
Parallel Computations.
22
Global Scheduling Algorithms
23
Global Scheduling Algorithms (cont.)
  • Symbiosis-Oblivious Global Scheduling
  • GLOB-NOSYM-PLAIN
  • EDF
  • GLOB-NOSYM-US
  • EDF-US[m/(2m-1)] algorithm
  • Giving the highest priority to high utilization
    tasks in the task set.
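
A minimal sketch of the EDF-US[m/(2m-1)] priority rule: tasks whose utilization exceeds m/(2m-1) get top priority, and the remaining tasks are ordered by earliest deadline. Here m is taken to be the number of hardware contexts; the job dictionaries and the function name are assumptions for the example.

```python
def edf_us_order(ready_jobs, num_contexts):
    """Sort ready jobs by EDF-US[m/(2m-1)] priority, highest priority first.

    Each job is a dict with "name", "utilization" and "deadline" keys
    (a hypothetical representation for this sketch).
    """
    threshold = num_contexts / (2 * num_contexts - 1)
    return sorted(
        ready_jobs,
        key=lambda j: (
            0 if j["utilization"] > threshold else 1,  # heavy tasks first
            j["deadline"],                              # then plain EDF
            j["name"],                                  # deterministic tie-break
        ),
    )


jobs = [
    {"name": "A", "utilization": 0.2, "deadline": 5},
    {"name": "B", "utilization": 0.8, "deadline": 20},
    {"name": "C", "utilization": 0.3, "deadline": 10},
]
# With 2 contexts the threshold is 2/3, so B jumps ahead despite its late deadline.
print([j["name"] for j in edf_us_order(jobs, 2)])
```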

Reference: A. Srinivasan and S. Baruah.
Deadline-based Scheduling of Periodic Task
Systems on Multiprocessors.
24
Global Scheduling Algorithms (cont.)
  • Symbiosis-Aware Global Scheduling
  • GLOB-SYM-PLAIN
  • Extends EDF to exploit symbiosis in a
    straightforward way.
  • It first selects the task with the earliest
    deadline.
  • For the other (N-1) tasks, it chooses the set
    that maximizes symbiosis when running with the
    first task.
  • Positive → improves schedulability (improves
    overall throughput)
  • Negative → can reduce schedulability (the
    remaining tasks are chosen without regard to
    deadlines)
  • GLOB-SYM-US
  • Addresses the drawback of GLOB-SYM-PLAIN
  • Defaults to GLOB-NOSYM-US if a task Ti has
    utilization Ui > N/(2N-1)
  • Otherwise, it defaults to GLOB-SYM-PLAIN
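
The two symbiosis-aware selection rules on this slide can be sketched as below. GLOB-SYM-PLAIN picks the earliest-deadline job and then the set of N-1 other jobs with the highest symbiosis alongside it; GLOB-SYM-US falls back to utilization-first ordering when a heavy task is present. The job dictionaries, the `symbiosis` lookup table keyed by sets of task names, the default factor of 1.0, and the choice to test only ready jobs for heaviness are all assumptions for the example.

```python
from itertools import combinations


def glob_sym_plain(ready_jobs, num_contexts, symbiosis):
    """Earliest-deadline job first, then the N-1 co-runners with the best symbiosis."""
    first = min(ready_jobs, key=lambda j: (j["deadline"], j["name"]))
    others = [j for j in ready_jobs if j is not first]
    k = min(num_contexts - 1, len(others))

    def score(group):
        names = frozenset([first["name"], *(j["name"] for j in group)])
        return symbiosis.get(names, 1.0)  # assumed default for unprofiled sets

    best = max(combinations(others, k), key=score)
    return [first, *best]


def glob_sym_us(ready_jobs, num_contexts, symbiosis):
    """Use utilization-first EDF if a heavy job is ready, else GLOB-SYM-PLAIN."""
    threshold = num_contexts / (2 * num_contexts - 1)
    if any(j["utilization"] > threshold for j in ready_jobs):
        ordered = sorted(
            ready_jobs,
            key=lambda j: (j["utilization"] <= threshold, j["deadline"], j["name"]),
        )
        return ordered[:num_contexts]
    return glob_sym_plain(ready_jobs, num_contexts, symbiosis)


jobs = [
    {"name": "A", "utilization": 0.2, "deadline": 5},
    {"name": "B", "utilization": 0.3, "deadline": 10},
    {"name": "C", "utilization": 0.3, "deadline": 12},
]
table = {frozenset({"A", "B"}): 1.1, frozenset({"A", "C"}): 1.4}
print([j["name"] for j in glob_sym_us(jobs, 2, table)])  # A runs with C, not B
```

Note how C is chosen over B even though B has the earlier deadline; this is the deadline-blind choice that can reduce schedulability.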

25
Co-Scheduling Algorithms (concluded)
26
Experimental Methodology
  • Metrics
  • critical serial utilization (CSU)
  • The total utilization obtained by uniformly
    increasing the utilization of all tasks until a
    further increase causes the task-set to become
    unschedulable. (5% deadline misses → soft real-time)
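
The CSU definition above amounts to a search over a uniform scaling factor. The sketch below steps the factor upward until the task set stops being schedulable and reports the last schedulable total. The `is_schedulable` callback stands in for whatever schedulability check is used (simulation, in the presented work); the step size, the `max_scale` cap, and the simple stand-in test in the usage lines are assumptions for the example.

```python
def critical_serial_utilization(utilizations, is_schedulable, step=0.01, max_scale=100.0):
    """Largest total utilization reached by uniform scaling while still schedulable.

    utilizations: list of per-task utilizations at scale 1.0.
    is_schedulable: hypothetical callback taking the scaled list, returning bool.
    Returns 0.0 if even the smallest scale tried is unschedulable.
    """
    best_total = 0.0
    k = 1
    while k * step <= max_scale:
        scaled = [u * (k * step) for u in utilizations]
        if not is_schedulable(scaled):
            break  # a further increase made the task set unschedulable
        best_total = sum(scaled)
        k += 1
    return best_total


# Stand-in test for illustration only: treat the set as schedulable while U <= 1.
def toy_test(us):
    return sum(us) <= 1.0 + 1e-9


print(critical_serial_utilization([0.2, 0.3], toy_test, step=0.1))
```

On an SMT processor a good algorithm can reach a CSU above 1, since utilizations are measured against serial (single-thread) execution.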

27
Experiment Setup
RSIM simulator
Real Workload
28
Results
  • Best algorithm → GLOB-SYM-US
  • Static vs. Dynamic resource sharing
  • Static resource sharing generally implies lower
    throughput than dynamic resource sharing.
  • Partitioning vs. Global algorithm
  • The enhanced versions are more competitive with
    GLOB-SYM-US.
  • Symbiosis-awareness
  • Partitioning → often helps
  • Global scheduling → it depends

29
Conclusions
  • Best algorithm
  • Global scheduling, exploits symbiosis,
    prioritizes high utilization tasks, uses dynamic
    resource sharing.
  • Requires a lot of profiling
  • Two alternatives
  • Partitioning algorithm that utilizes static
    resource sharing
  • (PART-NOSYM-STAT)
  • Worse schedulability and somewhat more complex.
  • Provides strict admission control and requires
    less profiling.
  • Earliest-deadline-first global algorithm
  • (GLOB-NOSYM-PLAIN)
  • Does not provide strict admission control, but
    requires no profiling.

30
Conclusions (concluded)
  • Dynamic resource sharing is better than static
    for schedulability
  • Partitioning algorithm can be made competitive
    with global scheduling algorithm, but with more
    complexity.
  • Symbiosis-awareness
  • Beneficial for partitioning algorithms because
    they do not entirely ignore real-time constraints
  • Can hurt or help global scheduling algorithms,
    depending on the relative magnitude of the
    symbiosis factors and total utilization of the
    applications.

31
  • Thank you