1
Effect of Context Aware Scheduler on TLB

Satoshi Yamada and Shigeru Kusakabe
Kyushu University

2
Contents

Introduction
Effect of Sibling Threads on TLB
Context Aware Scheduler (CAS)
Benchmark Applications and Measurement Environment
Result
Related Work
Conclusion

3
Contents

Introduction
What is Context?
Motivation
Task Switch and Cache
Approach of our Scheduler
Effect of Sibling Threads on TLB
Context Aware Scheduler (CAS)
Benchmark Applications and Measurement Environment
Result
Related Work
Conclusion

4
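What is context?

Definition in this presentation
Context = memory address space
Task switch: switching execution from one thread or process to another
Context switch: a task switch in which the memory address space also changes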

5
Motivation

Native OS threads are used more and more today
Java, Perl, Python, Erlang, and Ruby
OpenMP, MPI
As the number of threads increases, the overhead due to task switches tends to get heavier
Agarwal, et al., Cache performance of operating system and multiprogramming workloads (1988)

6
Task Switch and Cache

Overhead due to a task switch
includes that of loading the working set of the next process
is deeply related to the utilization of caches
Mogul, et al., The effect of context switches on cache performance (1991)

[Figure: the working sets of Process A and Process B together overflow the cache]
7
Approach of our Scheduler

Three solutions to reduce the overhead due to task switches
Agarwal, et al., Cache performance of operating system and multiprogramming workloads (1988)
1. Increase the size of caches
2. Reuse the shared data among threads
3. Utilize tagged caches and/or restrain cache flushes

We utilize sibling threads to achieve 2. and 3.
We mainly discuss 3.
8
Contents

Introduction
Effect of Sibling Threads on TLB
Working Set and Task Switch
TLB tag and Task Switch
Advantage of Sibling Threads
Effect of Sibling Threads on Task Switches
Context Aware Scheduler (CAS)
Benchmark Applications and Measurement Environment
Result
Related Work
Conclusion

9
Working Set and Task Switch

Task Switch with small overhead
Task Switch with large overhead

[Figure: working sets of Process A and Process B in each of the two cases]
10
TLB and Task Switch
[Figure: example entries of a tagged TLB and a non-tagged TLB]

Tagged TLB: TLB flush is not necessary (ARM, MIPS, etc.)
Non-tagged TLB: TLB flush is necessary (x86, etc.)
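
The difference between the two TLB types can be illustrated with a small simulation. This is a minimal sketch and not part of the presentation: the `TLB` class, the `run` helper, the unbounded TLB capacity, and the page numbers are all assumptions made for illustration.

```python
class TLB:
    """Toy TLB with unlimited capacity (an assumption, real TLBs are small)."""

    def __init__(self, tagged):
        self.tagged = tagged
        self.entries = set()
        self.asid = None      # identifier of the current address space
        self.misses = 0

    def switch_address_space(self, asid):
        if asid == self.asid:
            return            # same address space: nothing to do
        self.asid = asid
        if not self.tagged:
            self.entries.clear()   # non-tagged TLB must be flushed

    def access(self, vpn):
        # A tagged TLB keeps the address-space id next to the page number.
        key = (self.asid, vpn) if self.tagged else vpn
        if key not in self.entries:
            self.misses += 1
            self.entries.add(key)


def run(tagged, schedule, pages):
    """Run one time slice per entry in schedule and return the TLB miss count."""
    tlb = TLB(tagged)
    for asid in schedule:
        tlb.switch_address_space(asid)
        for vpn in pages:
            tlb.access(vpn)
    return tlb.misses


print(run(True, [1, 2, 1, 2], range(4)))   # tagged, alternating address spaces
print(run(False, [1, 2, 1, 2], range(4)))  # non-tagged, alternating address spaces
print(run(False, [1, 1, 2, 2], range(4)))  # non-tagged, same address space grouped
```

In this toy model the non-tagged TLB misses on every page after each change of address space, while grouping time slices of the same address space removes half of those misses.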

11
Advantage of Sibling Threads
[Figure: task_struct of a parent and its child, with mm, signal, and file fields referring to mm_struct and signal_struct; one panel shows creating a THREAD, the other creating a PROCESS with fork()]

Advantages on task switches
Higher possibility of sharing data among sibling threads
Context switch does not happen
Restrains TLB flushes on a non-tagged TLB
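
Why a task switch between sibling threads is not a context switch can be sketched as follows. This is an illustrative model only: `Task`, `new_process`, `fork`, `create_thread`, and `is_context_switch` are hypothetical names, and an integer stands in for the pointer from task_struct to mm_struct.

```python
import itertools
from dataclasses import dataclass

_next_mm = itertools.count(1)  # hypothetical source of address-space ids


@dataclass
class Task:
    name: str
    mm: int  # stand-in for the task_struct pointer to an mm_struct


def new_process(name):
    return Task(name, next(_next_mm))


def fork(parent, name):
    # Creating a PROCESS: the child gets its own mm_struct.
    return Task(name, next(_next_mm))


def create_thread(parent, name):
    # Creating a THREAD: the child shares the parent's mm_struct.
    return Task(name, parent.mm)


def is_context_switch(prev, nxt):
    # The address space changes only when the two tasks use different mm_structs.
    return prev.mm != nxt.mm


parent = new_process("parent")
thread = create_thread(parent, "thread")
child = fork(parent, "child")
print(is_context_switch(parent, thread))  # False: sibling threads
print(is_context_switch(parent, child))   # True: separate processes
```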

12
Effect of Sibling Threads on Task Switches: Measurement
We use the idea of the lat_ctx program in LMbench
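
The lat_ctx idea is to connect tasks in a ring of pipes and time how long a token takes to travel around it. The following is a rough sketch of that idea, not the authors' measurement program: `token_ring_latency` is a hypothetical name, it uses threads only, and the result includes pipe and interpreter overhead, so the numbers are not comparable to LMbench output.

```python
import os
import threading
import time


def token_ring_latency(n_tasks=2, rounds=1000):
    """Average seconds per token hand-off in a ring of n_tasks threads."""
    # pipes[i] feeds worker i; pipes[n_tasks] brings the token back to the caller.
    pipes = [os.pipe() for _ in range(n_tasks + 1)]

    def worker(i):
        read_fd, write_fd = pipes[i][0], pipes[i + 1][1]
        for _ in range(rounds):
            token = os.read(read_fd, 1)   # block until the token arrives
            os.write(write_fd, token)     # hand it to the next task

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_tasks)]
    for t in threads:
        t.start()

    start = time.perf_counter()
    for _ in range(rounds):
        os.write(pipes[0][1], b"x")       # inject the token
        os.read(pipes[n_tasks][0], 1)     # wait until it has gone around the ring
    elapsed = time.perf_counter() - start

    for t in threads:
        t.join()
    for read_fd, write_fd in pipes:
        os.close(read_fd)
        os.close(write_fd)
    # Each round has n_tasks + 1 hand-offs (caller -> worker 0 -> ... -> caller).
    return elapsed / (rounds * (n_tasks + 1))


print(token_ring_latency(n_tasks=2, rounds=200))
```

Comparing sibling threads with processes, as the slides do, would require a second variant in which each task is a separate process.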
13
Effect of Sibling Threads on Task Switches: Results
(sibling threads / process)
14
Contents

Introduction
Effect of Sibling Threads on TLB
Context Aware Scheduler (CAS)
O(1) Scheduler in Linux
Context Aware Scheduler (CAS)
Benchmark Applications and Measurement Environment
Result
Related Work
Conclusion

15
O(1) Scheduler in Linux

Structure
active queue and expired queue
priority bitmap and array of linked lists of threads
Behavior
search the priority bitmap and choose a thread with the highest priority
Scheduling overhead
independent of the number of threads
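
The structure and behavior listed above can be sketched in a few lines. This is a simplified model rather than kernel code: `PrioArray` and `RunQueue` are illustrative names, a Python integer plays the role of the priority bitmap, and time slices are not modeled. It follows the Linux convention of 140 priority levels in which a smaller index means a higher priority.

```python
from collections import deque

NUM_PRIOS = 140  # Linux 2.6 O(1) scheduler: 140 levels, smaller index = higher priority


class PrioArray:
    def __init__(self):
        self.bitmap = 0                                   # bit p set <=> queue p is non-empty
        self.queues = [deque() for _ in range(NUM_PRIOS)]

    def enqueue(self, task, prio):
        self.queues[prio].append(task)
        self.bitmap |= 1 << prio

    def dequeue_highest(self):
        if self.bitmap == 0:
            return None
        # Index of the lowest set bit = highest priority with a runnable task.
        prio = (self.bitmap & -self.bitmap).bit_length() - 1
        task = self.queues[prio].popleft()
        if not self.queues[prio]:
            self.bitmap &= ~(1 << prio)
        return task, prio


class RunQueue:
    def __init__(self):
        self.active = PrioArray()
        self.expired = PrioArray()

    def pick_next(self):
        # When the active array is empty, the two arrays swap roles.
        if self.active.bitmap == 0:
            self.active, self.expired = self.expired, self.active
        return self.active.dequeue_highest()


rq = RunQueue()
rq.active.enqueue("A", 100)
rq.active.enqueue("B", 120)
print(rq.pick_next())  # ('A', 100)
```

The work done in `pick_next` depends on the fixed number of priority levels, not on how many threads are queued, which is the point of the "independent of the number of threads" line.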

[Figure: active and expired queues of a processor, each with a priority bitmap ordered from high to low and lists holding threads A to D]
16
Context Aware Scheduler (CAS) (1/2)
[Figure: regular O(1) scheduler runqueue with its priority bitmap, holding threads A to E]

CAS creates auxiliary runqueues per context
CAS compares Preg and Paux
Preg: the highest priority in the regular O(1) scheduler runqueue
Paux: the highest priority in the auxiliary runqueue
if Preg - Paux ≤ threshold, then we choose Paux
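
The selection rule can be written as a single comparison. This is a hedged sketch, not the authors' implementation: `cas_choose` is a hypothetical name, the comparison is assumed to be "less than or equal", and priorities use Linux numbering (smaller number = higher priority), so the gap that the slide writes as Preg - Paux is computed here as `p_aux - p_reg`.

```python
def cas_choose(p_reg, p_aux, threshold):
    """Return "aux" if the auxiliary-runqueue thread is chosen, else "regular".

    p_reg: best priority level in the regular runqueue.
    p_aux: best priority level in the auxiliary runqueue of the current
           context, or None when that runqueue is empty.
    """
    if p_aux is None:
        return "regular"
    # Number of levels by which the sibling thread is below the regular pick.
    gap = p_aux - p_reg
    return "aux" if gap <= threshold else "regular"


print(cas_choose(100, 102, threshold=2))    # aux
print(cas_choose(100, 102, threshold=1))    # regular
print(cas_choose(100, None, threshold=10))  # regular
```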

17
Context Aware Scheduler (CAS) (2/2)
[Figure: regular O(1) scheduler runqueue and auxiliary runqueues per context, holding threads A to E]
CAS with threshold 2: A, C, E, B, D (context switch: 1 time)
O(1) scheduler: A, B, C, D, E (context switches: 4 times)
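
The two execution orders on this slide can be reproduced with a toy simulation. Everything specific in it is an assumption chosen to match the slide's counts: threads A and B sit at priority level 0, threads C, D, and E sit at level 2 (smaller number = higher priority), A, C, and E share context X, B and D share context Y, and each thread runs once to completion. The function names are illustrative.

```python
def count_context_switches(order, ctx_of):
    """Count adjacent pairs in the execution order whose contexts differ."""
    return sum(1 for a, b in zip(order, order[1:]) if ctx_of[a] != ctx_of[b])


def o1_order(threads):
    # sorted() is stable, so threads on the same level keep their queue order.
    return [name for name, prio, ctx in sorted(threads, key=lambda t: t[1])]


def cas_order(threads, threshold):
    pending = list(threads)
    order, current_ctx = [], None
    while pending:
        reg = min(pending, key=lambda t: t[1])             # best of the regular runqueue
        siblings = [t for t in pending if t[2] == current_ctx]
        chosen = reg
        if siblings:
            aux = min(siblings, key=lambda t: t[1])        # best of the auxiliary runqueue
            if aux[1] - reg[1] <= threshold:
                chosen = aux
        pending.remove(chosen)
        order.append(chosen[0])
        current_ctx = chosen[2]
    return order


# (name, priority level, context): hypothetical values, see the text above.
threads = [("A", 0, "X"), ("B", 0, "Y"), ("C", 2, "X"), ("D", 2, "Y"), ("E", 2, "X")]
ctx_of = {name: ctx for name, prio, ctx in threads}

for label, order in [("O(1)", o1_order(threads)), ("CAS", cas_order(threads, 2))]:
    print(label, order, count_context_switches(order, ctx_of))
```

Under these assumptions the O(1) order is A, B, C, D, E with 4 context switches, and CAS with threshold 2 gives A, C, E, B, D with 1.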
18
Contents

Introduction
Effect of Sibling Threads on TLB
Context Aware Scheduler (CAS)
Benchmark Applications and Measurement Environment
Measurement Environment
Benchmarks
Measurements
Scheduler
Result
Related Work
Conclusion

19
Measurement Environment

Intel Core 2 Duo 1.86 GHz

Spec of each memory hierarchy
20
Benchmarks
21
Measurements
Chat, SysBench, Volano, DaCapo
DTLB and ITLB misses (user/kernel spaces)
Elapsed time of executing the 4 applications
22
Scheduler

O(1) scheduler in Linux 2.6.21
CAS
threshold 1
threshold 10

23
Contents

Introduction
Effect of Sibling Threads on TLB
Context Aware Scheduler (CAS)
Benchmark Applications and Measurement Environment
Result
TLB misses
Process Time
Elapsed Time
Comparison with the Completely Fair Scheduler
Related Work
Conclusion

24
TLB misses
(million times)
25
Why is a larger threshold better?
A larger threshold can aggregate more sibling threads
Dynamic priority works against a small threshold
[Figure: priority bitmap example]
26
Process Time
(seconds)
27
Elapsed Time
(seconds)
28
Comparison with the Completely Fair Scheduler (CFS)

What is CFS?
Introduced in Linux 2.6.23
Removes the heuristic calculation of dynamic priority
Does not consider the address space in scheduling
Why compare?
To investigate whether applying CAS to CFS is valuable
Can the CAS idea reduce TLB misses and process time in CFS?

29
TLB misses
30
Process Time and Total Elapsed Time
(seconds)
31
Contents

Introduction
Effect of Sibling Threads on TLB
Context Aware Scheduler (CAS)
Benchmark Applications and Measurement Environment
Result
Related Work
Conclusion

32
Sujay Parekh, et al., Thread Sensitive Scheduling for SMT Processors (2000)

Parekh's scheduler
tries executing groups of threads in parallel and samples information about
IPC
TLB misses
L2 cache misses, etc.
schedules based on the sampled information

[Figure: timeline alternating between Sampling Phase and Scheduling Phase]
33
Contents

Introduction
Effect of Sibling Threads on TLB
Context Aware Scheduler (CAS)
Benchmark Applications and Measurement Environment
Result
Related Work
Conclusion

34
Conclusion

Conclusion
CAS is effective in reducing TLB misses
CAS enhances the throughput of every application
Future Work
Evaluation on other architectures
Applying CAS to the CFS scheduler
Extension to SMP platforms

35
additional slides
36
Effect of sibling threads on context switches
(counts)
37
Result of Cache Misses
(thousand times)
38
Result of Cache Misses
(thousand times)
39
Memory Consumption of CAS

Additional memory consumption of CAS
About 40 bytes per thread
About 150 KB per thread group
6 × 150 KB + 1700 × 40 bytes ≈ 970 KB

40
Effective and Ineffective Cases of CAS

Effective case
Consecutive threads share a certain amount of data
Ineffective case
Consecutive threads do not share data

[Figure: working sets of A and B in the cache for each case]
41
Pranay Koka, et al., Opportunities for Cache Friendly Process (2005)

Koka's scheduler
traces the execution of each thread
focuses on the memory space shared between threads

[Figure: timeline alternating between Tracing Phase and Scheduling Phase]
42
Extension to SMP

Aggregation into limited processors

CPU 0
CPU 1
43
Extension to SMP

Execute threads with the same address space in parallel

CPU 0
CPU 1
44
TLB misses and Total Elapsed Time
46
Widespread multithreading

Multithreading hides the latency of disk I/O and network access
Threads in many languages, such as Java, Perl, and Python, correspond to OS threads

[Figure: Thread A runs while Thread B waits for the disk]
More context switches happen today
The process scheduler in the OS is more responsible for system performance
47
Context Aware (CA) scheduler
Our CA scheduler aggregates sibling threads
[Figure: execution order of threads A to E under the Linux O(1) scheduler and under the CA scheduler]
Linux O(1) scheduler: context switches between processes: 3 times
CA scheduler: context switches between processes: 1 time
48
Results of Context Switch
(microseconds)
[Figure: working sets of Processes A, B, and C relative to the cache; L2 cache size: 2MB]
49
Overhead due to a context switch, measured by lat_ctx in LMbench
50
Fairness
O(1) scheduler keeps fairness by epochs
cycles of the active queue and the expired queue
CA scheduler also follows epochs
guarantees the same level of fairness as the O(1) scheduler

[Figure: active and expired queues of Processor 0, each with a priority bitmap and lists holding threads A to D]
51
Influence of sibling threads on the overhead of context switch
Ratio of each event (process / sibling threads)
52
Results of TLB misses (million times)