Real-Time Scheduling Analysis for Multiprocessor Platforms

1
Real-Time Scheduling Analysis for Multiprocessor
Platforms
  • Marko Bertogna
  • PhD dissertation
  • Scuola Superiore S.Anna,
  • Pisa, Italy

2
Overview
  • The Multicore Revolution
  • Real-Time Multiprocessor Systems: existing
    results
  • Schedulability Analysis for global schedulers
  • Experimental evaluation
  • Conclusions
  • Other research activities

3
Main Contributions
  • Systematization of existing results for RT
    scheduling and schedulability analysis on MP
  • Polynomial and pseudo-polynomial schedulability
    tests for work-conserving schedulers, FP, EDF
    and EDZL
  • Experimental comparison of existing techniques

4
Real-Time Systems
  • Solid theory of single processor systems
  • Optimal schedulers, tight schedulability tests,
    shared resource protocols, bandwidth reservation
    schemes, hierarchical schedulers, OS, etc.
  • Far fewer results for multiprocessors
  • Many NP-hard problems, few optimal results,
    heuristic approaches, simplified task models,
    only sufficient schedulability tests, etc.
  • Do we really need to investigate multiprocessor
    real-time systems?

5
As Moore's law goes on...
  • Number of transistors per chip doubles every 18
    to 24 months

6
...heating becomes a problem
  • Power P grows with supply voltage V and
    frequency f: clock speed limited to less than
    4 GHz

7
Solution
Use a higher number of slower logic gates
Denser chips with transistors operating at lower
frequencies
MULTICORE SYSTEMS
8
The Multicore invasion
  • Intel's Core 2, Itanium, Xeon: 2, 4 cores
  • AMD's Opteron, Athlon 64 X2, Phenom: 2, 4 cores
  • IBM-Toshiba-Sony Cell processor: 8 cores (PSX3)
  • Microsoft's Xenon: 3 cores (Xbox 360)
  • ARM's MPCore: 4 cores
  • Sun's Niagara UltraSPARC: 8 cores
  • Tilera's TILE64: 64 cores
  • Nios II: x soft cores
  • TI, Freescale, Atmel, Broadcom, Picochip
    (picoArray: up to 300 DSP cores), ...

9
Identical vs heterogeneous cores
ARM's MPCore
  • 4 identical ARMv6 cores
STI's Cell Processor
  • One Power Processor Element (PPE)
  • 8 Synergistic Processing Elements (SPE)

10
System model
  • Platform with m identical processors
  • Task set $\tau$ with n periodic or sporadic tasks
    $\tau_i$
  • Period or minimum inter-arrival time $T_i$
  • Worst-case execution time $C_i$
  • Deadline $D_i$
  • Utilization $U_i = C_i/T_i$, density
    $\lambda_i = C_i/\min(D_i, T_i)$
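
The two per-task quantities above can be computed directly. A minimal sketch, assuming integer task parameters; the `Task` tuple and the example task set are illustrative and not taken from the slides:

```python
from fractions import Fraction
from typing import NamedTuple


class Task(NamedTuple):
    C: int  # worst-case execution time
    D: int  # relative deadline
    T: int  # period or minimum inter-arrival time


def utilization(task):
    """U_i = C_i / T_i."""
    return Fraction(task.C, task.T)


def density(task):
    """lambda_i = C_i / min(D_i, T_i)."""
    return Fraction(task.C, min(task.D, task.T))


# Hypothetical task set, used only to exercise the two definitions.
tasks = [Task(C=1, D=2, T=4), Task(C=2, D=5, T=5), Task(C=3, D=6, T=10)]
u_tot = sum(utilization(t) for t in tasks)
lam_tot = sum(density(t) for t in tasks)
```

Density equals utilization for implicit deadlines (D = T) and exceeds it when D < T, which is why the density-based tests are the more pessimistic ones for constrained deadlines.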

11
Problems addressed
  • Run-time scheduling problem
  • Schedulability problem

[Figure: tasks t1 to t5 to be assigned to CPU1, CPU2 and CPU3]
12
Assumptions
  • Independent tasks
  • Job-level parallelism prohibited: the same job
    cannot execute simultaneously on more than one
    processor
  • Preemption and migration support: a preempted
    task can resume its execution on a different
    processor
  • Cost of preemption/migration integrated into task
    WCET

13
Global vs partitioned scheduling
  • Single system-wide queue or multiple
    per-processor queues

[Figure: global scheduler vs. partitioned scheduler]
14
Partitioned Scheduling
  • The scheduling problem reduces to a bin-packing
    problem plus a uniprocessor scheduling problem
  • Bin-packing problem: NP-hard in the strong sense;
    various heuristics used (FF, NF, BF, FFDU, BFDD,
    etc.)
  • Uniprocessor scheduling problem: well known
    (EDF: $U_{tot} \le 1$; RM: RTA; ...)
  • Global (work-conserving) and partitioned
    approaches are incomparable

[Figure: tasks t1 to t5 partitioned among the processors]
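
A minimal sketch of partitioning as bin packing, assuming implicit deadlines, EDF on each processor (so a processor accepts a task while its total utilization stays at most 1) and a first-fit heuristic over tasks sorted by decreasing utilization. The function name and inputs are illustrative:

```python
from fractions import Fraction


def first_fit_edf(utilizations, m):
    """Assign tasks to m processors with first-fit, largest utilization first.

    Returns one list of task indices per processor, or None if some task
    fits on no processor.
    """
    load = [Fraction(0)] * m
    bins = [[] for _ in range(m)]
    order = sorted(range(len(utilizations)),
                   key=lambda i: utilizations[i], reverse=True)
    for i in order:
        for p in range(m):
            # Uniprocessor EDF test for implicit deadlines: U <= 1.
            if load[p] + utilizations[i] <= 1:
                load[p] += utilizations[i]
                bins[p].append(i)
                break
        else:
            return None  # task i does not fit anywhere
    return bins


fits = first_fit_edf([Fraction(3, 5), Fraction(3, 5),
                      Fraction(2, 5), Fraction(2, 5)], m=2)
fails = first_fit_edf([Fraction(3, 5)] * 3, m=2)
```

The second call fails even though the total utilization (9/5) is below m = 2: no two of the three tasks fit on the same processor, so no partition exists.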
15
Global scheduling
  • The m highest priority ready jobs are always the
    ones executing
  • Work-conserving scheduler: no processor is ever
    idled when a task is ready to execute.

16
Global scheduling advantages
  • Load automatically balanced
  • Easier re-scheduling (dynamic loads, selective
    shutdown, etc.)
  • Lower average response time (see queueing theory)
  • More efficient reclaiming and overload management
  • Number of preemptions
  • Migration cost can be mitigated by proper HW
    (e.g., MPCore's Direct Data Intervention)
  • Few schedulability tests → further research needed

17
Uniprocessor scheduling
  • EDF optimal for arbitrary job collections
  • Exact schedulability conditions:
  • linear test for implicit deadlines:
    $U_{tot} \le 1$
  • pseudo-polynomial test for constrained and
    arbitrary deadlines [Baruah et al. 90]
  • Optimal priority assignments for sporadic and
    synchronous periodic task systems:
  • RM for implicit deadlines
  • DM for constrained deadlines
  • Exact pseudo-polynomial schedulability test for
    FP: Response Time Analysis (RTA)

18
Global Scheduling
  • No optimal scheduler known for general task
    models
  • Pfair optimal for implicit deadlines:
    $U_{tot} \le m$
  • preemption and synchronization issues
  • Classic schedulers are not optimal (Dhall's
    effect)
  • Hybrid schedulers: EDF-US, RM-US, DM-DS,
    AdaptiveTkC, fpEDF, EDF(k), EDZL, ...

m light tasks, 1 heavy task: $U_{tot} \to 1$
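
Dhall's effect can be checked with a few lines of arithmetic. The sketch below assumes the classic construction (m light tasks with C = 2ε and T = D = 1, plus one heavy task with C = 1 and T = D = 1 + ε, all released at time 0 under global EDF). These parameters are an illustrative assumption and are not stated on the slide:

```python
from fractions import Fraction


def dhall_example(m, eps):
    """Return (total utilization, heavy task misses its deadline?)."""
    light_c, light_d = 2 * eps, Fraction(1)
    heavy_c, heavy_d = Fraction(1), 1 + eps
    # The m light jobs have the earlier deadline, so under global EDF they
    # occupy all m processors during [0, 2*eps); the heavy job starts only
    # afterwards and runs without further interruption.
    heavy_finish = light_c + heavy_c
    u_tot = m * light_c / light_d + heavy_c / heavy_d
    return u_tot, heavy_finish > heavy_d


u_tot, misses = dhall_example(m=4, eps=Fraction(1, 100))
```

With ε = 1/100 the heavy task finishes at 1.02, after its deadline at 1.01, although the total utilization is only about 1.07 on 4 processors. Shrinking ε pushes the utilization towards 1 while the miss persists.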
19
Global scheduling main results
  • Only sufficient schedulability tests
  • Utilization-based tests (implicit deadlines)
  • EDF [Goossens et al.]:
    $U_{tot} \le m(1 - U_{max}) + U_{max}$
  • fpEDF [Baruah]: $U_{tot} \le (m+1)/2$
  • RM-US [Andersson et al.]: $U_{tot} \le m^2/(3m-2)$
  • Polynomial tests
  • EDF, FP [Baker]: $O(n^2)$ and $O(n^3)$ tests
  • EDZL [Cirinei, Baker]: $O(n^2)$ test
  • Pseudo-polynomial tests
  • EDF, FP [Fisher, Baruah]: load-based tests
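
The three utilization bounds listed above are one-line checks. A minimal sketch, assuming implicit deadlines and per-task utilizations of at most 1; the function names are illustrative, and each test is only sufficient for the scheduler named in its docstring:

```python
from fractions import Fraction


def gfb_edf(utils, m):
    """Global EDF (Goossens et al.): U_tot <= m(1 - U_max) + U_max."""
    u_max = max(utils)
    return sum(utils) <= m * (1 - u_max) + u_max


def fpedf_bound(utils, m):
    """fpEDF (Baruah): U_tot <= (m + 1) / 2."""
    return sum(utils) <= Fraction(m + 1, 2)


def rm_us_bound(utils, m):
    """RM-US (Andersson et al.): U_tot <= m^2 / (3m - 2)."""
    return sum(utils) <= Fraction(m * m, 3 * m - 2)


utils = [Fraction(1, 2)] * 3  # hypothetical task set, U_tot = 3/2
results = (gfb_edf(utils, 2), fpedf_bound(utils, 2), rm_us_bound(utils, 2))
```

For this task set on 2 processors the first two bounds are met with equality, while the RM-US bound (equal to 1 for m = 2) is exceeded.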

20
Density-based tests
  • EDF: $\lambda_{tot} \le m(1 - \lambda_{max}) + \lambda_{max}$
  • EDF-DS[1/2]: $\lambda_{tot} \le (m+1)/2$
  • DM: $\lambda_{tot} \le m(1 - \lambda_{max})/2 + \lambda_{max}$
  • DM-DS[1/3]: $\lambda_{tot} \le (m+1)/3$

EDF-DS[1/2] [ECRTS05]
Gives highest priority to (at most m-1) tasks
having $\lambda_t > 1/2$, and schedules the remaining ones
with EDF
DM-DS[1/3] [OPODIS05]
Gives highest priority to (at most m-1) tasks
having $\lambda_t > 1/3$, and schedules the remaining ones
with DM (only constrained deadlines)
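
A sketch of how the EDF-DS[1/2] rule compares with the plain EDF density bound. It assumes "heavy" means density strictly above 1/2 (the boundary case is an assumption) and that, when more than m-1 tasks qualify, the largest densities are promoted first. Function names and the example densities are illustrative:

```python
from fractions import Fraction


def edf_density_test(densities, m):
    """Plain global EDF: lambda_tot <= m(1 - lambda_max) + lambda_max."""
    lam_max = max(densities)
    return sum(densities) <= m * (1 - lam_max) + lam_max


def edf_ds_half(densities, m):
    """Return (promoted task indices, EDF-scheduled indices, bound holds?)."""
    half = Fraction(1, 2)
    heavy = sorted((i for i, d in enumerate(densities) if d > half),
                   key=lambda i: densities[i], reverse=True)[: m - 1]
    rest = [i for i in range(len(densities)) if i not in heavy]
    passes = sum(densities) <= Fraction(m + 1, 2)
    return heavy, rest, passes


dens = [Fraction(3, 4), Fraction(1, 5), Fraction(1, 5), Fraction(3, 10)]
plain = edf_density_test(dens, m=2)
heavy, rest, hybrid = edf_ds_half(dens, m=2)
```

Here the single heavy task makes the plain EDF bound fail (29/20 > 5/4), while the hybrid bound (3/2) still holds once that task is given top priority.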
21
Critical instant
  • A particular configuration of releases that leads
    to the largest possible response time of a task.
  • Possible to derive exact schedulability tests
    analyzing just the critical instant situation.
  • Uniprocessor FP and EDF: a critical instant is
    when
  • all tasks arrive synchronously
  • all jobs are released as soon as permitted
  • Response Time Analysis for uniprocessors
  • FP: the response time of task k is given by the
    fixed point of $R_k$ in the iteration
    $R_k \leftarrow C_k + \sum_{i<k} \lceil R_k / T_i \rceil C_i$
    (tasks i < k have higher priority)
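
A minimal sketch of uniprocessor response time analysis for fixed priorities, assuming integer parameters, constrained deadlines and tasks listed in decreasing priority order. It iterates the classic recurrence R = C_k + Σ ⌈R/T_i⌉ C_i over the higher-priority tasks; the function name and the `(C, D, T)` tuple layout are illustrative:

```python
def rta_uniprocessor(tasks):
    """tasks: list of (C, D, T), highest priority first.

    Returns the worst-case response time of each task, or None for a task
    whose iteration exceeds its deadline.
    """
    result = []
    for k, (c_k, d_k, _) in enumerate(tasks):
        r = c_k
        while True:
            # -(-r // t_i) is integer ceiling division.
            nxt = c_k + sum(-(-r // t_i) * c_i for c_i, _, t_i in tasks[:k])
            if nxt == r or nxt > d_k:
                break
            r = nxt
        result.append(nxt if nxt <= d_k else None)
    return result


wcrt = rta_uniprocessor([(1, 4, 4), (2, 6, 6), (3, 12, 12)])
```

For the lowest-priority task the iteration visits 3, 6, 7, 9, 10 and stops at the fixed point 10, which is within its deadline of 12.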

22
Multiprocessor anomaly
  • Synchronous periodic arrival of jobs is not a
    critical instant for multiprocessors

Example: t1 = (1,1,2), t2 = (1,1,3), t3 = (5,6,6)
[Figure: two schedules, the synchronous periodic situation and the one with the second job of t2 delayed by one unit; from Bar07]
Need to find pessimistic situations to derive
sufficient schedulability tests
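
The anomaly can be reproduced with a small slot-by-slot simulation. The sketch assumes each triple reads (C, D, T), a platform with m = 2 processors and global EDF with unit-length time slots. None of these is stated on the slide, and the simulator is illustrative only:

```python
def global_edf_misses(jobs, m, horizon):
    """jobs: list of (task name, release, execution time, absolute deadline).

    Simulates global EDF in unit slots over [0, horizon) and returns the
    names of tasks with a job finishing after its deadline. Assumes every
    deadline is at most `horizon`.
    """
    remaining = [c for (_, _, c, _) in jobs]
    finish = {}
    for t in range(horizon):
        ready = [j for j, (_, r, _, _) in enumerate(jobs)
                 if r <= t and remaining[j] > 0]
        ready.sort(key=lambda j: jobs[j][3])  # earliest deadline first
        for j in ready[:m]:                   # the m highest-priority jobs run
            remaining[j] -= 1
            if remaining[j] == 0:
                finish[j] = t + 1
    return {name for j, (name, _, _, d) in enumerate(jobs)
            if finish.get(j, horizon + 1) > d}


def jobs_for(t2_second_release):
    """Jobs of t1, t2, t3 in [0, 6); only t2's second release varies."""
    return [
        ("t1", 0, 1, 1), ("t1", 2, 1, 3), ("t1", 4, 1, 5),
        ("t2", 0, 1, 1),
        ("t2", t2_second_release, 1, t2_second_release + 1),
        ("t3", 0, 5, 6),
    ]


synchronous = global_edf_misses(jobs_for(3), m=2, horizon=6)
delayed = global_edf_misses(jobs_for(4), m=2, horizon=6)
```

In the synchronous case t3 always finds a free processor from time 1 onwards and completes exactly at 6. Delaying t2's second job to time 4 makes it collide with t1's third job, so both processors are taken for one slot and t3 ends with one unit of work unfinished at its deadline.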
23
Introducing the interference
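$I_k$: total interference suffered by task $\tau_k$
$I_k^i$: interference of task $\tau_i$ on task $\tau_k$
[Figure: schedule on CPU1, CPU2 and CPU3 over the window from $r_k$ to $r_k + R_k$, showing where $\tau_k$ executes and the interference contributions $I_k^1, \dots, I_k^8$]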
24
Limiting the interference
It is sufficient to consider at most the portion
($R_k - C_k + 1$) of each term $I_k^i$ in the sum
[Figure: schedule on CPU1, CPU2 and CPU3 over the window from $r_k$ to $r_k + R_k$, showing where $\tau_k$ executes and the interference contributions $I_k^1, \dots, I_k^8$]
It can be proved that $WCRT_k$ is given by the fixed
point of
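
A hedged reconstruction of the fixed point referred to above, assuming it follows the usual form of response-time analysis for global work-conserving schedulers, with each interference term capped at $R_k - C_k + 1$ and the sum divided by the m processors:

```latex
R_k = C_k + \left\lfloor \frac{1}{m} \sum_{i \neq k}
      \min\bigl( I_k^i(R_k),\; R_k - C_k + 1 \bigr) \right\rfloor
```

The division by m reflects that $\tau_k$ is delayed only while all m processors are busy executing other tasks.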
25
Bounding the interference
  • Exactly computing the interference is complex
  • Pessimistic assumptions
  • Bound the interference of a task with the
    workload
  • Use an upper bound on the workload.

26
Bounding the workload
  • Consider a situation in which
  • The first job executes as close as possible to
    its deadline
  • Successive jobs execute as soon as possible

[Equation: workload bound, with one term for the jobs excluding the last one and one term for the last job]
27
RTA for generic global schedulers
  • An upper bound on the WCRT of task k is given by
    the fixed point of Rk in the iteration
  • The slack of task k is at least $D_k - R_k$
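
A hedged reconstruction of the iteration, assuming the workload scenario described earlier (first job as late as possible, successive jobs as soon as possible) and the $(R_k - C_k + 1)$ cap on each interference term. The symbols $N_i$ and $W_i$ are introduced here for illustration:

```latex
\begin{align*}
N_i(L) &= \left\lfloor \frac{L + D_i - C_i}{T_i} \right\rfloor \\
W_i(L) &= N_i(L)\, C_i + \min\bigl( C_i,\; L + D_i - C_i - N_i(L)\, T_i \bigr) \\
R_k &\leftarrow C_k + \left\lfloor \frac{1}{m} \sum_{i \neq k}
      \min\bigl( W_i(R_k),\; R_k - C_k + 1 \bigr) \right\rfloor \\
S_k &\geq D_k - R_k
\end{align*}
```

$N_i(L)\,C_i$ accounts for the jobs that fit entirely in a window of length $L$, and the $\min$ term for the last, possibly truncated job. The iteration starts from $R_k = C_k$ and stops at a fixed point or once $R_k$ exceeds $D_k$.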
28
Improvement using slack values
  • Consider a situation in which
  • The first job executes as close as possible to
    its deadline
  • Successive jobs execute as soon as possible

[Equation: workload bound using slack values, with one term for the jobs excluding the last one and one term for the last job]
29
Improvement using slack values
  • Consider a situation in which
  • The first job executes as close as possible to
    its deadline
  • Successive jobs execute as soon as possible

30
RTA for generic global schedulers
  • An upper bound on the WCRT of task k is given by
    the fixed point of Rk in the iteration

If a fixed point $R_k \le D_k$ is reached for every
task k in the system, the task set is schedulable
with any work-conserving global scheduler.
31
Iterative schedulability test
  1. All slacks initialized to zero
  2. Compute slack lower bound for tasks 1, ..., n:
     if higher than old value → update slack bound;
     if lower, do nothing
  3. If all tasks have a positive slack lower bound →
     return success
  4. If no slack has been updated for tasks 1, ..., n →
     return fail
  5. Otherwise, return to point 2
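
The loop above can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: integer parameters, constrained deadlines, a workload bound in which the slack of the interfering task shortens the carry-in, and success defined as every task reaching a fixed point R_k ≤ D_k (that is, a non-negative slack bound, slightly weaker than the "positive" wording of the slide). All function names are hypothetical:

```python
def workload(task, length, slack):
    """Upper bound on the work of `task` in a window of `length` slots."""
    c, d, t = task
    n = (length + d - c - slack) // t
    return n * c + min(c, length + d - c - slack - n * t)


def response_bound(tasks, k, m, slack):
    """Fixed point of the response-time iteration, or None if it exceeds D_k."""
    c_k, d_k, _ = tasks[k]
    r = c_k
    while r <= d_k:
        interference = sum(
            min(workload(tasks[i], r, slack[i]), r - c_k + 1)
            for i in range(len(tasks)) if i != k)
        nxt = c_k + interference // m
        if nxt == r:
            return r
        r = nxt
    return None


def iterative_test(tasks, m):
    """tasks: list of (C, D, T). True if the sufficient test succeeds."""
    slack = [0] * len(tasks)
    while True:
        updated, all_ok = False, True
        for k in range(len(tasks)):
            r = response_bound(tasks, k, m, slack)
            if r is None:
                all_ok = False
            elif tasks[k][1] - r > slack[k]:
                slack[k] = tasks[k][1] - r   # tighter slack lower bound
                updated = True
        if all_ok:
            return True
        if not updated:
            return False


ok = iterative_test([(1, 4, 4), (1, 4, 4), (2, 5, 5)], m=2)
bad = iterative_test([(2, 2, 2), (1, 2, 2)], m=1)
```

Slack bounds only ever increase and are bounded by D_k - C_k, so the outer loop terminates. Updated slacks are used immediately by the tasks analysed later in the same round.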

32
RTA refinement for Fixed Priority
  • The interference on higher priority tasks is
    always null
  • An upper bound on the WCRT of task k can be given
    by the fixed point of Rk in the iteration

33
RTA refinement for EDF
  • A different bound can be derived analyzing the
    worst-case workload in a situation in which
  • The interfering and interfered tasks have a
    common deadline
  • All jobs execute as late as possible
  • An upper bound on the WCRT of task k is given by
    the fixed point of Rk in the iteration

34
Complexity
  • Pseudo-polynomial complexity
  • Fast average behavior
  • We verified the schedulability of millions of
    task sets in a few minutes on a normal device.
  • Lower complexity for Fixed Priority systems
  • at most one slack update per task, if slacks are
    updated in decreasing priority order.
  • Possible to reduce complexity limiting the number
    of rounds

35
Polynomial complexity test
  • A simpler test can be derived avoiding the
    iterations on the response times
  • A lower bound on the slack of $\tau_k$ is given by
  • The iteration on the slack values is the same
  • Performance comparable to RTA-based test
  • Complexity down to $O(n^2)$
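
A hedged reconstruction of the slack lower bound, assuming it is obtained by evaluating the interference directly over a window of length $D_k$ instead of iterating on $R_k$. Here $W_i(D_k)$ denotes an upper bound on the workload of task $\tau_i$ in such a window:

```latex
S_k \geq D_k - C_k - \left\lfloor \frac{1}{m} \sum_{i \neq k}
      \min\bigl( W_i(D_k),\; D_k - C_k + 1 \bigr) \right\rfloor
```

Under this reading the bound is meaningful only when the sum is smaller than $m (D_k - C_k + 1)$. Each evaluation touches the other $n - 1$ tasks once, so one pass over all tasks costs $O(n^2)$.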

36
Experimental results for EDF
  • 2 processors
  • Constrained deadlines
  • 1,000,000 task sets generated
  • Our test is consistently superior at all
    utilizations

[Figure: task sets vs. task set utilization; curves: total generated task sets, I-BCL EDF, Goossens et al. 03, Baker et al. 07, Bertogna et al. 05, our test; annotation: improvement over existing solutions]
37
Experimental results for FP
  • 2 processors
  • Constrained deadlines
  • 1,000,000 task sets generated
  • Our test is consistently superior at all
    utilizations

[Figure: task sets vs. task set utilization; curves: total generated task sets, I-BCL FP, Bertogna et al. 05, Baker et al. 07, density bound, our test]
38
FP vs EDF
  • 4 processors
  • Constrained deadlines
  • 1,000,000 task sets generated
  • Our FP test is consistently superior to all
    tests at every utilization

[Figure: task sets vs. task set utilization; curves: total generated task sets, I-BCL FP, Baker et al. 07, I-BCL EDF, Goossens et al. 03, our FP test, our EDF test]
39
Conclusions
  • Multiprocessor Real-Time systems are a promising
    field to explore.
  • Still few existing results, far from tight
    conditions.
  • We contributed to filling this gap.
  • Future work
  • Find tighter schedulability tests.
  • Use our techniques to analyze the efficiency of
    other scheduling algorithms (EDZL, EDF-US, FP-DS,
    etc.).
  • Take into account exclusive resource access.
  • Integrate into a resource reservation framework.

40
The end
41
Other research activities
  • Limited-preemption EDF
  • Reducing Resource Holding Times
  • Shared resources and open environments

42
ARM's MPCore
43
Frequency and power
  • f: operating frequency
  • V: supply voltage ($V = 0.3 + 0.7 f$)
  • Reducing the voltage causes a higher frequency
    reduction
  • $I_{leak}$: leakage current (becomes non-negligible)
  • $P = P_{dynamic} + P_{static}$: power consumed
  • $P_{dynamic} \propto A C V^2 f$ (main contributor
    until hundreds of nm)
  • $P_{static} \propto V I_{leak}$ (always present, due
    to subthreshold and gate-oxide leakage)
  • Reducing V allows a quadratic reduction of
    $P_{dynamic}$
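
A worked example of why several slower cores can beat one fast core on dynamic power. It assumes the voltage/frequency relation reads V = 0.3 + 0.7 f in normalized units, sets the activity factor A and capacitance C to 1, and ignores static power. All of these are simplifying assumptions for illustration:

```python
from fractions import Fraction


def supply_voltage(f):
    """Normalized supply voltage for normalized frequency f (assumed relation)."""
    return Fraction(3, 10) + Fraction(7, 10) * f


def dynamic_power(f, cores=1):
    """Dynamic power proportional to V^2 * f per core, with A = C = 1."""
    v = supply_voltage(f)
    return cores * v * v * f


single = dynamic_power(Fraction(1))            # one core at full frequency
dual = dynamic_power(Fraction(1, 2), cores=2)  # two cores at half frequency
```

Two cores at half frequency offer the same aggregate cycle rate as one core at full frequency, but under these assumptions they draw 169/400 (about 42%) of the dynamic power, because the voltage drops to 0.65 and enters quadratically.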

44
Power density
45
How many cores in the future?
  • Intel's 80-core prototype already available
  • Able to transfer 1 TB of data/s (Core 2 Duo
    reaches 1.66 GB/s)
  • To be released in 5 years

46
Beyond 2 billion transistors/chip
  • Intel's Tukwila
  • Itanium based
  • 2.046 billion FETs
  • Quad-core
  • 65 nm technology
  • 2 GHz at 170 W
  • 30 MB cache
  • 2-way SMT → 8 threads/ck

47
Intel's timeline
48
  • From 4004 (1971) to Pentium D (2005)
  • Tech: 10 µm → 65 nm (150×)
  • f: 100 kHz → 3 GHz (25,000×)
  • MOS: 2,300 → 291,000,000 (125,000×)
  • P: 0.2 W → 100 W (500×)
  • Vdd reduced (from 5 V to 1 V)
  • Not all MOS change state
  • Great part of chip occupied by cache
  • $f \propto V_{dd} - V_{tt}$
  • $I_{leak} \propto V_{dd},\ 1/V_{tt}$

49
Intel Pentium IV (2000)
Intel 4004 (1971)
50
Itanium temperature plot
51
Problems addressed
  • Run-time scheduling problem
  • Schedulability problem

[Figure: tasks t1 to t5 to be assigned to CPU1, CPU2 and CPU3]
52
  • Incandescent light bulb 25-100 W
  • Compact fluorescent lights 5-30 W
  • Typical car 25 kW
  • Human climbing stairs 200 W
  • 1 kWh = 1 kW constantly supplied for 1 h
  • ENEL: 0.13-0.18 €/kWh

53
Density and utilization bounds
54
Uniprocessor feasibility
55
Uniprocessor static priority run-time scheduling
56
Uniprocessor static priority feasibility
57
Uniprocessor static priority schedulability
58
Multiprocessor feasibility
59
Multiprocessor run-time scheduling
60
Feasibility conditions
  • $U_{tot} > m$: not feasible
  • load > m: not feasible
  • $\sum_i C_i / \min(D_i, T_i) \le m$: feasible
  • In between (???): sufficient feasibility and
    schedulability tests
61
Multiprocessor static job priority feasibility
62
Multiprocessor static job priority schedulability
63
Multiprocessor static priority run-time scheduling
64
Multiprocessor static priority feasibility
65
Multiprocessor static priority schedulability
66
RTA for Uniprocessors
  • For FP, the worst-case response time of a task is
    given by the first instance released at a
    critical instant
  • For EDF, it is given by an instance in a busy
    interval starting with a critical instant
  • With these observations it is possible to compute
    the WCRT of all tasks. Example: for FP, the WCRT
    of a task k is given by the fixed point of
    $R_k = C_k + \sum_{i<k} \lceil R_k / T_i \rceil C_i$
    (tasks i < k have higher priority)

67
RTA refinement for EDF
  • The previous bound is still valid
  • A different bound can be derived analyzing the
    worst-case workload in a situation in which
  • The interfering and interfered tasks have a
    common deadline
  • All jobs execute as late as possible

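[Figure: timeline of task $\tau_i$ (parameters $D_i$, $T_i$, $C_i$, slack $S_i$) with its jobs executing as late as possible, sharing a deadline with the window of length $D_k$]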
68
RTA refinement for EDF
  • A different bound can be derived analyzing the
    worst-case workload in a situation in which
  • The interfering and interfered tasks have a
    common deadline
  • All jobs execute as late as possible

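[Figure: timeline of task $\tau_i$ (parameters $D_i$, $T_i$, $C_i$, slack $S_i$) with its jobs executing as late as possible, sharing a deadline with the window of length $D_k$; the accompanying equations are not preserved]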
69
Polynomial complexity test
  • A lower bound on the slack of $\tau_k$ is given by
  • For EDF
  • For FP

70
Limiting the number of iterations