Title: Group Scheduling in System Software
1 Group Scheduling in System Software
Michael Frisbie, Douglas Niehaus
University of Kansas
niehaus_at_ittc.ku.edu
Venkita Subramonian, Christopher Gill
Computer Science and Engineering, Washington University
cdgill_at_cse.wustl.edu
2 Motivation
- Real-time and embedded systems have widely varying computation semantics
- Varying policies for controlling specific computations
- Varying levels of focus for systems
- Single purpose embedded systems
- General purpose systems with a RT application
- Multi-media, machine control and user-interface
- No single policy is likely to be appropriate for all
3 Motivation
- Computation in computer systems is not done exclusively under the publicly exposed scheduling model
- OS computational components
- Interrupts, SoftIRQs, Tasklets, and Bottom halves
- Execute outside exposed scheduling policies
- Manifest as noise in the scheduling model
- Active middleware
- Linux PThreads library, TAO
- Manifest as competing load
4 Goals
- Highly configurable framework within which a wide range of policies can be specified
- Selection of predefined scheduling semantics
- Implementation of customized schedulers
- Application computation oriented representation
- Representation of all computation components on the system under the scheduling framework
- Current semantics available as default policies
- Requires some new types of information
5 Platform
- Linux
- Growing popularity for real-time and embedded systems
- Middleware version for portability and range of mechanism control
- KU Real-Time Linux (KURT-Linux)
- OS computation components integration
- Interrupt handling modifications
- Number of related projects
- Data Streams Performance Evaluation Framework
- Ability to gather detailed scenario-oriented data
6 Group Scheduling
- Application computation centric scheduling view
- Computations are implemented by a group of one or more computation components
- Threads, IRQ handlers, SoftIRQs, Tasklets, BHs
- Flexible framework for composing and configuring the system scheduling decision function (SSDF)
- SSDF chooses the computation component using the CPU at any given time
- Framework explicitly supports description of both computations and relations among computations
7 Group Structure
- Group a set of computation components with an associated Scheduling Decision Function (SDF)
- Elements within a group can be threads, other groups, or other computation components
- Elements can belong to more than one group
- Scheduling decision tree (SDT) composed of one or more groups
- Control semantics for computation components
- SDTs for computations are composed to form the System Scheduling Decision Tree (SSDT)
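
The group structure above can be sketched as a small data model. This is an illustrative sketch only, not the KURT-Linux or middleware code: the class names `Component` and `Group`, the `runnable` flag, and the `sequential_sdf` policy are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Union


@dataclass
class Component:
    """A computation component (thread, IRQ handler, tasklet, ...). Hypothetical type."""
    name: str
    runnable: bool = True


@dataclass
class Group:
    """A set of members plus the SDF that chooses among them. Hypothetical type."""
    name: str
    sdf: Callable[["Group"], Optional[Component]]
    members: List[Union[Component, "Group"]] = field(default_factory=list)

    def decide(self) -> Optional[Component]:
        # Delegate the choice to this group's scheduling decision function.
        return self.sdf(self)


def resolve(member: Union[Component, Group]) -> Optional[Component]:
    """Reduce a member to a runnable component, recursing into nested groups."""
    if isinstance(member, Group):
        return member.decide()
    return member if member.runnable else None


def sequential_sdf(group: Group) -> Optional[Component]:
    """Assumed example SDF: the first member, in order, that yields a component."""
    for member in group.members:
        choice = resolve(member)
        if choice is not None:
            return choice
    return None


irq = Component("irq_handler", runnable=False)
video = Component("video_thread")
audio = Component("audio_thread")

# A group may contain other groups, and an element may belong to several groups.
media = Group("media", sequential_sdf, [video, audio])
ssdt = Group("ssdt", sequential_sdf, [irq, media, audio])
```

Calling `ssdt.decide()` walks the tree from the top-level group down to a single component. Giving each group its own `sdf` is what lets different subtrees apply different policies.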
8 System Scheduling Decision Tree (SSDT)
- Controls the system's computation components
- Explicitly or implicitly
- Ultimate goal is to make all of it explicit and easily configurable across a wide semantic range
- Can co-exist with the default system scheduler
- Semantic hooks
- SSDT invocation before default scheduler (DS)
- Method of making DS skip components under SSDT control
9 First-Refusal (FR) SSDT
- FR-SSDT uses a sequential SDF at the top level
- SDT controlling components under the group scheduling model has first refusal
- Linux SDF (default scheduler) makes the decision if no component under the group model should run
- Exclusion of components from DS ensures precise control as needed
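
A minimal sketch of the first-refusal arrangement, assuming components are identified by name and that the group policy exposes an `eligible` set (components it is currently willing to run). The function names and the `eligible` set are assumptions for illustration; they are not taken from the implementation.

```python
from typing import List, Optional, Set


def group_sdt(controlled: List[str], eligible: Set[str]) -> Optional[str]:
    """Stand-in for the group SDT: first controlled component it agrees to run."""
    for name in controlled:
        if name in eligible:
            return name
    return None  # refusal: nothing under group control should run now


def default_scheduler(run_queue: List[str], runnable: Set[str],
                      excluded: Set[str]) -> Optional[str]:
    """Stand-in for the Linux SDF: skips components excluded by group control."""
    for name in run_queue:
        if name in runnable and name not in excluded:
            return name
    return None


def first_refusal_ssdt(controlled: List[str], eligible: Set[str],
                       run_queue: List[str], runnable: Set[str]) -> Optional[str]:
    """Sequential top-level SDF: the group SDT first, then the default scheduler."""
    choice = group_sdt(controlled, eligible & runnable)
    if choice is not None:
        return choice
    return default_scheduler(run_queue, runnable, excluded=set(controlled))
```

The exclusion matters when a controlled component is runnable but the group policy declines to run it. Without the `excluded` set, the default scheduler could pick that component anyway and defeat the group's decision.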
10 MLFQ SDT Example
- Top level priority SDF maintains the priority equivalence class view
- Each priority class is a group using a round-robin policy to share the CPU among members
- Dynamic priority adjustment of processes can move them among priority classes
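
The MLFQ arrangement can be sketched as a priority SDF over round-robin groups. The class name `MlfqSdt`, the convention that level 0 is the highest priority, and the `demote` operation are assumptions for illustration. What triggers a priority adjustment is left to the caller.

```python
from collections import deque
from typing import Deque, List, Optional


class MlfqSdt:
    """Priority SDF on top; each priority class is a round-robin group."""

    def __init__(self, levels: int) -> None:
        # One queue per priority class; index 0 is assumed to be the highest.
        self.queues: List[Deque[str]] = [deque() for _ in range(levels)]

    def add(self, name: str, level: int = 0) -> None:
        self.queues[level].append(name)

    def pick(self) -> Optional[str]:
        """Choose from the highest non-empty class, rotating within that class."""
        for queue in self.queues:
            if queue:
                name = queue[0]
                queue.rotate(-1)  # the chosen member moves to the back of its class
                return name
        return None

    def demote(self, name: str) -> None:
        """Dynamic priority adjustment: move a member one class lower."""
        for level, queue in enumerate(self.queues):
            if name in queue:
                queue.remove(name)
                target = min(level + 1, len(self.queues) - 1)
                self.queues[target].append(name)
                return
        raise KeyError(name)
```

With members "a" and "b" at level 0 and "c" at level 1, repeated calls to `pick()` alternate between "a" and "b". "c" runs only once both have been demoted or removed.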
11 Related Work
- Hierarchical Scheduling
- Regehr and Stankovic (RTSS 2001)
- Likely computationally equivalent (capability)
- Distinguished by which abstractions are emphasized
- CPU Inheritance Scheduling
- Ford and Susarla (Flux Project)
- Group scheduling emphasizes
- Application structure reflected in groups
- Integration of all computation components
- Interrupts, tasklets, etc.
12 Kernel Implementation
- Modifies the default Linux scheduler to give the GS framework a chance to choose
- Hook to make the default scheduler exclude a component is the most subtle change
- Existing code is changed to consult the exclusion notation, rather than trying to remove the component from the base data structure
- Control for components other than threads is the most significant feature for real-time systems
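
The exclusion notation can be illustrated with a per-task mark that the default selection loop consults. This is a Python sketch of the idea only, not kernel code: the `Task` type, the `gs_excluded` field, and `default_pick` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    name: str
    gs_excluded: bool = False  # hypothetical mark: task is under group scheduling control


def default_pick(run_queue: List[Task]) -> Optional[Task]:
    """Default-scheduler stand-in: first queued task that is not marked excluded."""
    for task in run_queue:
        if task.gs_excluded:
            continue  # consult the mark; the queue itself is left untouched
        return task
    return None


run_queue = [Task("rt_control", gs_excluded=True), Task("shell"), Task("editor")]
chosen = default_pick(run_queue)
```

Because the marked task stays in the run queue, returning it to default-scheduler control only requires clearing the mark. No queue insertion or removal is involved.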
13 Middleware Implementation
- Currently controls only threads at the user level
- Part of DARPA PCES2 project
- Layering on top of supplied Linux scheduler requires indirect control through available mechanisms
- Separates managing and managed threads into equivalence classes to determine CPU use
- Uses Fixed Priority POSIX scheduling model as implemented by Linux SCHED_FIFO
14 Middleware Implementation
- Scheduler has two threads
- SSDT thread selects current thread
- API thread processes group operation requests
- Block Catcher detects when current thread blocks
- Signals SSDT thread
- Uses SIGSTOP and SIGCONT to control availability of thread for execution
- Model is incomplete because it cannot know when a previously blocked thread becomes unblocked
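
The SIGSTOP/SIGCONT control can be sketched as follows. The class `StopContSwitcher` and its methods are hypothetical names, and the sketch addresses managed tasks by process ID through `os.kill`. That is an assumption made for the example: on current Linux threading (NPTL) a stop signal applies to the whole process, so per-thread control by signal depends on the threading library in use.

```python
import os
import signal
from typing import Callable, Optional


class StopContSwitcher:
    """Keep at most one managed task available to run by stopping the others."""

    def __init__(self, send: Callable[[int, int], None] = os.kill) -> None:
        self._send = send  # injectable so the logic can be exercised without real signals
        self.current: Optional[int] = None

    def manage(self, task_id: int) -> None:
        """Bring a task under control; it starts out stopped."""
        self._send(task_id, signal.SIGSTOP)

    def switch_to(self, task_id: int) -> None:
        """Make task_id the current task: stop the old one, continue the new one."""
        if task_id == self.current:
            return
        if self.current is not None:
            self._send(self.current, signal.SIGSTOP)
        self._send(task_id, signal.SIGCONT)
        self.current = task_id
```

The sketch has no way to learn that a stopped or blocked task has become ready again, which is the incompleteness noted above.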
15 Thread Priority Classes
- Reaper spawns scheduler and then blocks
- Scheduler SSDT thread chooses current thread
- API thread processes group operation requests
- Block Catcher detects current thread block, signals SSDT
- Non-current threads are both at lower priority and stopped with SIGSTOP
- Linux threads at level 0
16 Context Switch Event Sequence
- Thread A is current thread
- Timer or other event blocks or pre-empts Thread A
- Scheduling Thread runs and selects Thread B, blocks in nanosleep
- Context switch to Thread B begins its execution
17 Middleware Implementation Tradeoffs
- Portable standards based implementation
- POSIX fixed priority scheduling
- Socket based group API access
- Significantly greater context switch delay compared to existing kernel based implementation
- SSDT thread context switch and Block Catcher as well if current thread blocks
- Most significant need is SSDT thread notification that a thread unblocks
- Scheduler Activations
18 Performance Evaluation
- Metric
- Scheduling overhead
- Context switch latency (A to B)
- Parameters
- Number of Processes
- CPU bound or I/O bound
- User/Kernel Implementation
- Others
- Signal delivery details and semantics
19 Context Switch Overhead - Kernel
20 Kernel Performance
- Constant with respect to CPU or I/O bound
- Considerably lower than MW version
- Simple SSDT
- Does not require signal delivery
21 Context Switch Overhead Compute Bound MW
22 Context Switch Overhead Blocking MW
23 Middleware Performance
- Different with respect to CPU or I/O bound
- Requires signal delivery
- Block Catcher mechanism adds latency
- Considerably higher than kernel version
- Simple SSDT
- Some extension to existing system semantics required for completeness
- Unblocking notification upcall
24 Group Scheduling Summary
- Provides a flexible control framework
- Within which resource control and distributed end-to-end scheduling constraints can be expressed and enforced
- Portable middleware version
- Limited by lack of unblocking notification upcall
- Implementation under KURT-Linux is simple
- ACE system call wrappers
- VxWorks threads state change notification
25 Current Status
- Integration of all KURT-Linux OS computational components under group scheduling framework
- Recently completed
- Michael Frisbie's Master's Thesis topic
- We are currently working on Group Scheduling control of service classes in
- Event Channel
- TAO based computations
- Includes control of middleware threads, queues, etc.
26 Future Work
- Middleware use of group scheduling to provide support for service classes in Event Channel and TAO
- Concurrency constraint representation in KURT-Linux to permit fine grain computation component control under group scheduling
- Experimentation with application aware scheduling decision functions
- Integrated DSKI/DSUI instrumentation to diagnose/deduce scheduling-related optimizations and fine-grain points of inefficiency (cruft sleuthing)