Capriccio: Scalable Threads for Internet Services - PowerPoint PPT Presentation

About This Presentation

Title:

Capriccio: Scalable Threads for Internet Services

Description:

Capriccio: Scalable Threads for Internet Services, by Rob von Behren, Jeremy Condit, Feng Zhou, George Necula and Eric Brewer, University of California at Berkeley.

Slides: 32

Learn more at: https://users.cs.northwestern.edu

Transcript and Presenter's Notes

1
Capriccio: Scalable Threads for Internet Services

Rob von Behren, Jeremy Condit, Feng Zhou, George
Necula and Eric Brewer
University of California at Berkeley
Presenter: Olusanya Soyannwo

2
Outline

Motivation
Background
Goals
Approach
Experiments
Results
Related work
Conclusion and future work

EECS Advanced Operating Systems
Northwestern University
3
Motivation

Increasing scalability demands for Internet
services
Hardware improvements are limited by existing
software
Current implementations are event-based

4
Background: Event-Based Systems - Drawbacks

Event systems hide the control flow
Difficult to understand and debug
Programmers need to match related events
Burdens programmers

5
Goals: Capriccio

Support for existing thread API
Scalability to hundreds of thousands of threads
Automate application-specific customization

6
Approach: Capriccio

Thread package
Cooperative scheduling
Linked stacks
Address the problem of stack allocation for large
numbers of threads
Combination of compile-time and run-time analysis
Resource-aware scheduler

7
Approach: User-Level Threads - The Choice

POSIX API
(-) Complex preemption
(-) Bad interaction with the kernel scheduler
Performance
Lower thread synchronization overhead
No kernel crossing for preemptive threading
More efficient memory management at user level
Flexibility
Decoupling user and kernel threads allows faster
innovation
Can use new kernel thread features without
changing application code
Scheduler tailored for applications

8
Approach: User-Level Threads - Disadvantages

Additional overhead
Replacing blocking calls with non-blocking calls
Synchronization across multiple CPUs

9
Approach: User-Level Threads - Implementation

Context switches
Built on top of Edgar Toernig's coroutine library
Fast context switches when threads voluntarily
yield
I/O
Capriccio intercepts blocking I/O calls
Uses epoll for asynchronous I/O
Scheduling
Very much like an event-driven application
Events are hidden from programmers
Synchronization
Supports cooperative threading on single-CPU
machines
Requires only Boolean checks
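Why Boolean checks suffice can be shown with a small Python sketch. This is not Capriccio's code (Capriccio is a C library built on a coroutine package). Here generators stand in for user-level threads and `yield` stands in for a voluntary context switch. The names `CoopMutex`, `run` and `worker` are hypothetical. On a single CPU, a thread can lose the processor only at an explicit yield. Testing and setting a plain flag therefore cannot be interleaved with another thread, and no atomic instructions are needed.

```python
from collections import deque


class CoopMutex:
    """Hypothetical mutex for cooperatively scheduled threads on one CPU."""

    def __init__(self):
        self.locked = False

    def acquire(self):
        # Generator helper: let other threads run until the flag is clear.
        # The check and the set cannot be separated by a context switch,
        # because switches only happen at `yield`.
        while self.locked:
            yield
        self.locked = True

    def release(self):
        self.locked = False


def run(threads):
    """Round-robin scheduler: resume each thread until it yields or finishes."""
    ready = deque(threads)
    while ready:
        thread = ready.popleft()
        try:
            next(thread)       # run until the thread voluntarily yields
        except StopIteration:
            continue           # thread finished; drop it
        ready.append(thread)   # otherwise put it back in the run queue


def worker(name, mutex, log, steps=2):
    for i in range(steps):
        yield from mutex.acquire()
        log.append(f"{name} enter {i}")
        yield                  # yield while holding the lock
        log.append(f"{name} exit {i}")
        mutex.release()
        yield


m = CoopMutex()
log = []
run([worker("A", m, log), worker("B", m, log)])
print(log)
```

Each thread yields while it holds the lock, yet the log shows every "enter" immediately followed by the matching "exit" of the same thread. A thread that finds the flag set simply yields and retries later.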

10
Approach: Linked Stack

The problem: fixed stacks
Overflow vs. wasted space
Limits the number of threads
The solution: linked stacks
Allocate space as needed
Compiler analysis
Add runtime checkpoints
Guarantee enough space until the next check

[Figure: fixed stacks vs. a linked stack]
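A linked stack grows by chaining chunks together instead of reserving one large contiguous region per thread. The Python model below is a hypothetical simulation, not Capriccio's implementation. A checkpoint receives the worst-case stack need of the code up to the next checkpoint. In Capriccio that bound comes from the compiler analysis; here it is passed in by hand. The checkpoint links a new chunk only when the current one cannot hold that amount. The 4096-byte minimum chunk and all frame sizes are illustrative assumptions.

```python
class LinkedStack:
    """Toy model of a linked stack: a chain of chunks instead of one fixed region.

    The minimum chunk size and all sizes are illustrative assumptions, in bytes.
    """

    def __init__(self, min_chunk=4096):
        self.min_chunk = min_chunk
        self.chunks = [[min_chunk, 0]]  # each chunk: [capacity, bytes in use]
        self.frames = []                # live frames as (chunk index, size)

    def checkpoint(self, bound):
        """Guarantee `bound` bytes of stack until the next checkpoint.

        `bound` stands in for the worst-case stack use that the compiler
        analysis would compute for the code up to the next checkpoint.
        """
        capacity, used = self.chunks[-1]
        if capacity - used < bound:
            # Link a new chunk: at least min_chunk, larger if the bound requires it.
            self.chunks.append([max(bound, self.min_chunk), 0])

    def push(self, size):
        """Allocate a frame. No check is needed because a checkpoint reserved the space."""
        capacity, used = self.chunks[-1]
        assert capacity - used >= size, "frame exceeds the bound checked at the checkpoint"
        self.chunks[-1][1] += size
        self.frames.append((len(self.chunks) - 1, size))

    def pop(self):
        """Free the most recent frame and unlink trailing chunks that became empty."""
        index, size = self.frames.pop()
        self.chunks[index][1] -= size
        while len(self.chunks) > 1 and self.chunks[-1][1] == 0:
            self.chunks.pop()


s = LinkedStack(min_chunk=4096)
s.checkpoint(3000)
s.push(3000)         # fits in the initial 4096-byte chunk
s.checkpoint(2000)   # only 1096 bytes left, so a new 4096-byte chunk is linked
s.push(2000)
print(s.chunks)      # [[4096, 3000], [4096, 2000]]
s.pop()              # returning frees the second chunk
print(s.chunks)      # [[4096, 3000]]
```

The unused 1096 bytes left in the first chunk illustrate the trade-off between the two parameters. Larger chunks waste more space, while smaller chunks and more frequent checkpoints cost more checks and more chunk allocations.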
11
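Approach: Linked Stack

Parameters
MaxPath: maximum stack space a path may use between checkpoints
MinChunk: minimum size of a newly allocated stack chunk
Steps
Break cycles
Trace back
Special cases
Function pointers
External calls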

[Figure: example call graph annotated with stack frame sizes; MaxPath = 8]
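The compile-time half of this scheme decides where the checkpoints go. The sketch below is a simplified, hypothetical version of that analysis. It assumes an acyclic call graph, meaning cycles have already been broken by placing a checkpoint in each one, and it assumes every single frame fits within MaxPath. It traces back from the leaves. Whenever a call would make a checkpoint-free path longer than MaxPath, the sketch places a checkpoint on that call edge. The graph, the frame sizes and the name `place_checkpoints` are invented for illustration. Function pointers and external calls are not modeled.

```python
def place_checkpoints(graph, frame, root, max_path):
    """Return (checkpointed call edges, worst-case checkpoint-free cost per node).

    graph: dict mapping each function to the functions it calls (assumed acyclic)
    frame: dict mapping each function to its stack frame size
    Assumes frame[f] <= max_path for every f, so every cost stays <= max_path.
    """
    checkpoints = set()
    cost = {}  # stack needed from entering a node to the next checkpoint or leaf

    def visit(node):
        if node in cost:
            return cost[node]
        worst = 0
        for callee in graph.get(node, []):
            callee_cost = visit(callee)  # trace back: handle callees first
            if frame[node] + callee_cost > max_path:
                # Too long: check for (and if needed link) a new chunk on this call.
                checkpoints.add((node, callee))
            else:
                worst = max(worst, callee_cost)
        cost[node] = frame[node] + worst
        return cost[node]

    visit(root)  # the thread's entry point is checked when the thread starts
    return checkpoints, cost


graph = {"main": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["d"], "d": []}
frame = {"main": 3, "a": 3, "b": 2, "c": 5, "d": 2}
checkpoints, cost = place_checkpoints(graph, frame, "main", max_path=8)
print(sorted(checkpoints))  # [('a', 'c'), ('b', 'c')]
```

The path main, a, c, d would need 3 + 3 + 5 + 2 = 13 units, which exceeds MaxPath = 8. The sketch therefore checks on the calls into `c`, and each checkpoint-free stretch then needs at most 7 units below the check or 6 units from `main`.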
16
Approach: Scheduling

Advantages of event-based scheduling
Tailored for applications
With event handlers
Events provide two important pieces of
information for scheduling
Whether a process is close to completion
Whether the system is overloaded

17
Approach: Scheduling - The Blocking Graph
[Figure: example blocking graph with nodes Main, ThreadCreate, Read, Sleep, Close and two Write nodes]

Thread-based
View applications as a sequence of stages,
separated by blocking calls
Analogous to an event-based scheduler
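A blocking graph can be built at run time from the points where threads block. The Python sketch below shows one hypothetical way to do the bookkeeping; it is not Capriccio's implementation. Each node is a blocking point. Each edge records how often threads went from one blocking point to the next and the total time that took. Identifying a blocking point by a plain string label such as `"read"` is an assumption made for readability. The class and method names are also hypothetical.

```python
from collections import defaultdict


class BlockingGraph:
    """Hypothetical run-time blocking graph.

    Nodes are blocking points (plain string labels here). Each edge
    (src, dst) stores how many times a thread went from src to dst and
    the total time spent doing so.
    """

    def __init__(self):
        self.edges = defaultdict(lambda: [0, 0.0])  # (src, dst) -> [count, total time]
        self.last = {}                              # thread id -> (last node, time reached)

    def on_block(self, tid, node, now):
        """Record that thread `tid` reached blocking point `node` at time `now`."""
        if tid in self.last:
            prev, then = self.last[tid]
            edge = self.edges[(prev, node)]
            edge[0] += 1
            edge[1] += now - then
        self.last[tid] = (node, now)

    def avg_time(self, src, dst):
        """Average time observed along edge src -> dst (0.0 if never seen)."""
        count, total = self.edges.get((src, dst), (0, 0.0))
        return total / count if count else 0.0


g = BlockingGraph()
# Two threads that each go main -> read -> write, with different timings.
for tid, times in [(1, (0.0, 2.0, 3.0)), (2, (0.0, 4.0, 6.0))]:
    for node, t in zip(("main", "read", "write"), times):
        g.on_block(tid, node, t)
print(g.avg_time("main", "read"))   # 3.0
print(g.avg_time("read", "write"))  # 1.5
```

Because the edges summarize how long threads typically spend between blocking points, a scheduler can treat each node like an event handler's queue. This is how the thread-based view becomes analogous to an event-based scheduler.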

18
Approach: Resource-Aware Scheduling

Track resources used along blocking graph (BG) edges
Memory, file descriptors, CPU
Predict the future from the past
Algorithm
Increase use when underutilized
Decrease use near saturation
Advantages
Operate near the knee without thrashing
Automatic admission control
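The algorithm above can be read as a feedback rule over blocking-graph nodes. In the hypothetical Python sketch below, each node carries a prediction, learned from past runs, of how much of a resource its threads will acquire (positive) or release (negative) after leaving it. When utilization is low, the scheduler favors nodes that increase use. Near saturation, it favors nodes that release the resource. This choice also gives a form of admission control, because new work stops being favored. The thresholds, the single-resource model and the function name `pick_node` are assumptions, not Capriccio's actual policy.

```python
def pick_node(nodes, used, capacity, low=0.5, high=0.9):
    """Choose which blocking-graph node's ready threads to run next.

    nodes: dict node -> (ready thread count, predicted resource delta)
    used, capacity: current and maximum use of one resource (e.g. memory)
    low, high: hypothetical utilization thresholds
    """
    candidates = {n: delta for n, (ready, delta) in nodes.items() if ready > 0}
    if not candidates:
        return None
    utilization = used / capacity
    if utilization >= high:
        # Near saturation: favor threads predicted to release the resource.
        return min(candidates, key=candidates.get)
    if utilization <= low:
        # Underutilized: favor threads that will consume more (admit more work).
        return max(candidates, key=candidates.get)
    # In between: stay neutral, here by picking the node with the most ready threads.
    return max(candidates, key=lambda n: nodes[n][0])


# Predicted deltas in, say, kilobytes of memory per thread (illustrative values).
nodes = {"accept": (5, 64.0), "read": (9, 8.0), "close": (2, -72.0)}
print(pick_node(nodes, used=950, capacity=1000))  # close
print(pick_node(nodes, used=100, capacity=1000))  # accept
print(pick_node(nodes, used=700, capacity=1000))  # read
```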

19
Experiment: Threading Microbenchmarks

SMP, two 2.4 GHz Xeon processors
1 GB memory
Two 10K RPM SCSI Ultra II hard drives
Linux 2.5.70
Compared Capriccio, LinuxThreads, and Native
POSIX Threads for Linux (NPTL)

20
Experiment: Thread Scalability

Producer-consumer microbenchmark
LinuxThreads begins to degrade after 20 threads
NPTL degrades after 100 threads
Capriccio scales to 32K producers and consumers
(64K threads total)

21
Results: Thread Primitive Latency (µs)
Operation                Capriccio   LinuxThreads   NPTL
Thread creation          21.5        21.5           17.7
Thread context switch    0.24        0.71           0.65
Uncontended mutex lock   0.04        0.14           0.15
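Taking ratios of the table's entries shows the size of the gap for the two operations that Capriccio performs entirely at user level:

```latex
\text{context switch:}\quad \frac{0.71}{0.24} \approx 2.96, \qquad \frac{0.65}{0.24} \approx 2.71
\\
\text{uncontended mutex lock:}\quad \frac{0.14}{0.04} = 3.5, \qquad \frac{0.15}{0.04} = 3.75
```

Context switches and uncontended locks are therefore roughly 3 to 4 times cheaper than with LinuxThreads or NPTL. Thread creation is the exception: it ties LinuxThreads and is slower than NPTL (21.5 vs. 17.7).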
22
Results: Thread Scalability
23
Results: I/O Performance

Network performance
Token passing among pipes
Simulates the effect of slow client links
10% overhead compared to epoll
Twice as fast as both LinuxThreads and NPTL with
more than 1000 threads
Disk I/O performance comparable to kernel threads

24
Results: Runtime Overhead

Tested with Apache 2.0.44
Stack linking
73% slowdown for a null call
3-4% overall
Resource statistics
2% (always on)
0.1% (with sampling)
Stack traces
8% overhead

25
Results: Web Server Performance
26
Related Work