Title: Capriccio: Scalable Threads for Internet Services
1. Capriccio: Scalable Threads for Internet Services
- Rob von Behren, Jeremy Condit, Feng Zhou, George Necula and Eric Brewer
- University of California at Berkeley
- Presenter: Olusanya Soyannwo
2. Outline
- Motivation
- Background
- Goals
- Approach
- Experiments
- Results
- Related work
- Conclusion and future work
EECS Advanced Operating Systems
Northwestern University
3. Motivation
- Increasing scalability demands for Internet services
- Hardware improvements are limited by existing software
- Current implementations are event based
4. Background: Event-Based Systems - Drawbacks
- Event systems hide the control flow
- Difficult to understand and debug
- Programmers need to match related events
- Burdens programmers
5. Goals: Capriccio
- Support for existing thread API
- Scalability to hundreds of thousands of threads
- Automate application-specific customization
6. Approach: Capriccio
- Thread package
- Cooperative scheduling
- Linked stacks
- Address the problem of stack allocation for large numbers of threads
- Combination of compile-time and run-time analysis
- Resource-aware scheduler
7. Approach: User-Level Threads - The Choice
- POSIX API
- (-) Complex preemption
- (-) Bad interaction with kernel scheduler
- Performance
- Ease thread synchronization overhead
- No kernel crossing for preemptive threading
- More efficient memory management at user level
- Flexibility
- Decoupling user and kernel threads allows faster innovation
- Can use new kernel thread features without changing application code
- Scheduler tailored for applications
8. Approach: User-Level Threads - Disadvantages
- Additional Overhead
- Replacing blocking calls with non-blocking calls
- Multiple CPU synchronization
9. Approach: User-Level Threads - Implementation
- Context Switches
- Built on top of Edgar Toernig's coroutine library
- Fast context switches when threads voluntarily yield
- I/O
- Capriccio intercepts blocking I/O calls
- Uses epoll for asynchronous I/O
- Scheduling
- Very much like an event-driven application
- Events are hidden from programmers
- Synchronization
- Supports cooperative threading on single-CPU machines
- Requires only Boolean checks
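The cooperative model on this slide can be illustrated with a minimal sketch. It is not Capriccio's implementation (which is written in C on top of a coroutine library): Python generators stand in for user-level threads, a bare `yield` stands in for an intercepted blocking call, and the names `Scheduler` and `worker` are hypothetical.

```python
from collections import deque


class Scheduler:
    """Round-robin scheduler for cooperative threads written as generators."""

    def __init__(self):
        self.ready = deque()   # threads that can run
        self.log = []          # record of what ran, for illustration only

    def spawn(self, thread):
        self.ready.append(thread)

    def run(self):
        while self.ready:
            thread = self.ready.popleft()
            try:
                next(thread)              # run until the thread yields
            except StopIteration:
                continue                  # thread finished, drop it
            self.ready.append(thread)     # voluntary yield: requeue it


def worker(name, steps, log):
    """A thread body; each yield marks a point where it would block."""
    for i in range(steps):
        log.append((name, i))
        yield
```

Because a thread gives up the CPU only at a `yield`, the code between two yields runs without interruption, which is why synchronization on a single CPU can reduce to simple flag checks.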
10. Approach: Linked Stack
- The problem: fixed stacks
- Overflow vs. wasted space
- Limits thread numbers
- The solution: linked stacks
- Allocate space as needed
- Compiler analysis
- Add runtime checkpoints
- Guarantee enough space until next check
[Figures: Fixed Stacks; Linked Stack]
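A toy model may help show what a runtime checkpoint does. This is a sketch under stated assumptions, not Capriccio's code: sizes are abstract units, returning from calls (and unlinking chunks) is not modeled, and `LinkedStack`, `checkpoint` and `push_frame` are hypothetical names.

```python
class LinkedStack:
    """Toy model of a stack that grows in discontiguous chunks."""

    def __init__(self, min_chunk):
        self.min_chunk = min_chunk
        self.chunks = [min_chunk]   # sizes of the chunks allocated so far
        self.free = min_chunk       # space left in the current chunk

    def checkpoint(self, needed):
        """Ensure `needed` units are free before the next checkpoint."""
        if self.free < needed:
            size = max(needed, self.min_chunk)
            self.chunks.append(size)   # link a new chunk
            self.free = size

    def push_frame(self, size):
        """Consume stack space for one call; relies on a prior checkpoint."""
        if size > self.free:
            raise RuntimeError("checkpoint bound violated")
        self.free -= size
```

The point of the compiler analysis is to choose where `checkpoint` is called and with what bound, so that `push_frame` never fails between two checkpoints.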
11. Approach: Linked Stack
- Parameters
- MaxPath
- MinChunk
- Steps
- Break cycles
- Trace back
- Special Cases
- Function pointers
- External calls
[Figure: example call graph; node labels 3, 3, 5, 2, 2, 4, 3, 6; MaxPath = 8]
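The two steps named above (break cycles, then trace back) can be sketched as a single depth-first pass over a call graph. This is a simplified illustration, not the authors' algorithm as published: it assumes every frame fits within MaxPath, ignores MinChunk, function pointers and external calls, and the function name `place_checkpoints` and the example graph in the usage note are hypothetical (the edges of the slide's figure are not recoverable from the text).

```python
def place_checkpoints(calls, frame, root, max_path):
    """Return the set of call edges (caller, callee) that get a checkpoint.

    calls: dict mapping a function to the list of functions it calls
    frame: dict mapping a function to its stack frame size
    """
    checkpoints = set()
    longest = {}        # longest unchecked path starting at each function
    on_path = set()     # functions on the current depth-first path

    def visit(f):
        on_path.add(f)
        best = frame[f]
        for g in calls.get(f, []):
            if g in on_path:
                checkpoints.add((f, g))    # back edge: break the cycle
                continue
            if g not in longest:
                visit(g)                   # callees are resolved first
            if frame[f] + longest[g] > max_path:
                checkpoints.add((f, g))    # too long: check before calling g
            else:
                best = max(best, frame[f] + longest[g])
        on_path.discard(f)
        longest[f] = best

    visit(root)
    return checkpoints
```

For instance, with frame sizes `{"main": 3, "a": 3, "b": 5, "c": 2, "d": 6}`, calls `main -> a, b`, `a -> c`, `b -> d`, `c -> a`, and MaxPath 8, the sketch places one checkpoint on the recursive edge `c -> a` and one on `b -> d`, where the unchecked path would otherwise reach 11.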
15. Approach: Linked Stack (contd)
[Figure: example call graph; node labels 3, 3, 2, 3, 2, 4, 3, 6; MaxPath = 8]
16. Approach: Scheduling
- Advantages of event-based scheduling
- Tailored for applications
- With event handlers
- Events provide two important pieces of information for scheduling
- Whether a process is close to completion
- Whether a system is overloaded
17. Approach: Scheduling - The Blocking Graph
[Figure: blocking graph with nodes Main, Threadcreate, Write, Read, Sleep, Close, Write]
- Thread-based
- View applications as a sequence of stages, separated by blocking calls
- Analogous to event-based scheduler
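One way to picture the blocking graph is as a structure learned at run time: each blocking point a thread reaches becomes a node, and an edge connects two consecutive blocking points of the same thread. The sketch below assumes nodes are identified by a plain label (a real system would need something more precise, such as the call site), and `BlockingGraph` and `record` are hypothetical names.

```python
from collections import defaultdict


class BlockingGraph:
    """Edges between consecutive blocking points, learned at run time."""

    def __init__(self):
        self.edges = defaultdict(int)   # (from_node, to_node) -> traversals
        self.last = {}                  # thread id -> last blocking point

    def record(self, thread_id, blocking_point):
        """Call whenever a thread reaches a blocking point."""
        prev = self.last.get(thread_id)
        if prev is not None:
            self.edges[(prev, blocking_point)] += 1
        self.last[thread_id] = blocking_point
```

Each edge then plays the role of a stage, in the same way an event handler does in an event-based system.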
18. Approach: Resource-Aware Scheduling
- Track resources used along blocking graph (BG) edges
- Memory, file descriptors, CPU
- Predict future from the past
- Algorithm
- Increase use when underutilized
- Decrease use near saturation
- Advantages
- Operate near the knee w/o thrashing
- Automatic admission control
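The policy on this slide can be sketched for a single resource. Everything here is an assumption made for illustration: the 0.9 high-water mark, the smoothing factor, the use of an exponentially weighted average as the "predict future from the past" step, and the names `ResourceAwareScheduler`, `observe` and `pick`.

```python
class ResourceAwareScheduler:
    """Pick the next blocking-graph node based on predicted resource use."""

    def __init__(self, capacity, high_water=0.9, alpha=0.5):
        self.capacity = capacity
        self.high_water = high_water   # fraction of capacity treated as "near saturation"
        self.alpha = alpha             # weight given to the newest observation
        self.predicted = {}            # node -> smoothed resource change

    def observe(self, node, delta):
        """Record how much of the resource a run from `node` consumed (+) or freed (-)."""
        old = self.predicted.get(node, delta)
        self.predicted[node] = self.alpha * delta + (1 - self.alpha) * old

    def pick(self, ready_nodes, in_use):
        """Favor consumers when underutilized, releasers near saturation."""
        saturated = in_use >= self.high_water * self.capacity

        def expected(node):
            return self.predicted.get(node, 0.0)

        if saturated:
            return min(ready_nodes, key=expected)
        return max(ready_nodes, key=expected)
```

Preferring nodes that free the resource once usage is high is what gives admission control as a side effect: new work is simply not scheduled until capacity returns.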
19. Experiment: Threading Microbenchmarks
- SMP, two 2.4 GHz Xeon processors
- 1 GB memory
- Two 10K RPM SCSI Ultra II hard drives
- Linux 2.5.70
- Compared Capriccio, LinuxThreads, and Native POSIX Threads for Linux (NPTL)
20. Experiment: Thread Scalability
- Producer-consumer microbenchmark
- LinuxThreads begins to degrade after 20 threads
- NPTL degrades after 100
- Capriccio scales to 32K producers and consumers (64K threads total)
21. Results: Thread Primitive Latency (µs)
                          Capriccio   LinuxThreads   NPTL
Thread creation           21.5        21.5           17.7
Thread context switch     0.24        0.71           0.65
Uncontended mutex lock    0.04        0.14           0.15
22. Results: Thread Scalability
23. Results: I/O Performance
- Network performance
- Token passing among pipes
- Simulates the effect of slow client links
- 10% overhead compared to epoll
- Twice as fast as both LinuxThreads and NPTL with more than 1000 threads
- Disk I/O comparable to kernel threads
24. Results: Runtime Overhead
- Tested Apache 2.0.44
- Stack linking
- 73% slowdown for null call
- 3-4% overall
- Resource statistics
- 2% (on all the time)
- 0.1% (with sampling)
- Stack traces
- 8% overhead
25. Results: Web Server Performance
26. Related Work
- Programming models for high concurrency
- Event-based models are a result of poor thread implementations
- User-Level Threads
- Capriccio is unique
- Kernel Threads
- NPTL
- Application Specific Optimization
- SPIN, Exokernel
- Burden on programmers
- Portability
- Asynchronous I/O
- Stack Management
- Using the heap requires a garbage collector (Standard ML of New Jersey)
27. Related Work (contd)
- Resource-Aware Scheduling
- Several systems are similar to Capriccio
28. Future Work
- Threading
- Multi-CPU support
- Kernel interface
- (enabled) Compile-time techniques
- Variations on linked stacks
- Static blocking graph
- Scheduling
- More sophisticated prediction
29. Conclusion
- Capriccio simplifies high concurrency
- Scalable high performance
- Control over concurrency model
- Stack safety
- Resource-aware scheduling
- Enables compiler support, invariants
- Issues
- Additional burden to programmer
- Resource-controlled scheduling? What hysteresis?
30. OTHER GRAPHS
31. OTHER GRAPHS