Title: A language-based technique to combine events and threads
1. A language-based technique to combine events and threads
- Peng Li, Steve Zdancewic
- University of Pennsylvania
2. Outline
- Motivation and challenge
- Implementation
- Programming interfaces
- Discussion
3. The C10K problem (http://www.kegel.com/c10k.html)
- The problem: one server must serve 10,000 clients
- Network servers, peer-to-peer systems
- Where is the bottleneck?
- Hardware is powerful enough
- Steady network transfer: 1 Gbps / 10K clients = 100 Kbps per client
- The bottleneck is in the software
- Programming with 10K clients is not a trivial
task!
4. Programming models for C10K: threads vs. events
- The multithreaded approach
- One server thread per client
- Blocking I/O (usually)
- OS and runtime library take care of scheduling
- The event-driven approach
- One server thread serves many clients
- Non-blocking/asynchronous I/O
- The programmer manually interleaves computation for multiple clients
5. The debate over threads vs. events
- "Why threads are a bad idea" (USENIX ATC 1999)
- "Why events are a bad idea" (HotOS 2003)
- Ease of programming: threads win
- Threads provide a good abstraction
- Event-driven programming is difficult
- Programs must be written in continuation-passing style (CPS)
- Performance: events win
- Minimal overhead, asynchronous I/O
- (Cooperative) threads can also be very lightweight
- But that requires substantial engineering under the hood (OS / compiler support)
- Flexibility: events win
- Event-driven systems give programmers more control
- Scheduling and I/O functionality can be tailored to the application's needs
- Thread schedulers are difficult to customize
- They are implemented in separate libraries / kernel address spaces
- Customizing them requires unsafe, low-level hacks
6. The hybrid concurrency model: concepts
- A hybrid approach
- At the high level: threads
- Cheap, cooperative threads
- One cheap thread per client
- Blocking I/O
- Internally
- Cheap threads are represented in CPS
- Thread continuations can be used as event handlers
- The I/O schedulers: events
- Event-driven system
- Non-blocking/asynchronous I/O
7. The hybrid concurrency model: design goals
[Diagram: the user application on top of the operating system]
- A uniform programming environment
- Same language
- Same address space
- Shared data structures
- Shared libraries
- Compiled and linked together
8. Challenge: providing abstractions inside the application
- The hybrid concurrency model
- Abstractions are inside the user application
- Abstraction mechanism
- Programming-language-level abstractions
- Lightweight, transparent to the programmer
- In most multithreading systems
- The thread abstraction is outside the user application
- Abstraction mechanisms
- OS, VM, compiler transformations, etc.
- Heavyweight, opaque to the programmer
[Diagram: the application with threads sits on a thread abstraction provided by the runtime libraries and the OS]
9. Outline
- Motivation and challenge
- Implementation
- Programming interfaces
- Discussion
10. Implementing the unified concurrency model
- "A poor man's concurrency monad", Koen Claessen, JFP 1999
- Monads: provide an embedded language with thread primitives
- Higher-order functions: represent computation internally in CPS
- Lazy data structures: implement inversion of control
11. Basic concept: system calls
- Primitives for the cheap cooperative threads
- Thread control primitives: fork, yield, stop
- I/O primitives: read, write, readiness notification
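A rough sketch of what this system-call interface might look like is below. The names and types are illustrative only; they assume the thread monad M that later slides build in CPS, and an EpollEvent type for readiness events.

```haskell
-- Illustrative interface only; the actual definitions come later in the talk.
sys_fork       :: M () -> M ()             -- create a new cheap thread
sys_yield      :: M ()                     -- voluntarily reschedule
sys_ret        :: M ()                     -- terminate the current thread ("stop")
sys_nbio       :: IO a -> M a              -- perform one non-blocking I/O action (read, write, ...)
sys_epoll_wait :: Fd -> EpollEvent -> M () -- readiness notification: block until the fd is ready
```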
12. Basic concept: trace
- Trace: a tree of system calls
- Generated at run time, can have infinite size
- A system call creates a trace node
- sys_fork creates a branching node SYS_FORK
13. Representing traces in Haskell
- A lazy tree structure
- Potentially infinite size
- Nodes are computed only when needed
- Provides the event abstraction
- Lazy computation provides control inversion
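A minimal sketch of such a trace type is shown below. The constructor names follow the system calls mentioned in the talk; the Fd and EpollEvent details are assumptions made only to keep the example self-contained.

```haskell
import System.Posix.Types (Fd)

-- Assumed helper type for readiness notification (not from the talk).
data EpollEvent = EpollRead | EpollWrite

-- The lazy trace tree: each node is one system call made by a thread.
data Trace
  = SYS_NBIO (IO Trace)                  -- run a non-blocking I/O action, then continue
  | SYS_FORK Trace Trace                 -- branching node: child thread and parent continuation
  | SYS_YIELD Trace                      -- reschedule, then continue with the child node
  | SYS_EPOLL_WAIT Fd EpollEvent Trace   -- wait for fd readiness, then continue
  | SYS_RET                              -- leaf: the thread has stopped
```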
14. How stuff works (conceptually)
- Threads create the trace
- Scheduler (event loops) consumes the trace
- Example trace node from the diagram: SYS_NBIO (write_nb)
15. Summary so far
- Monads: provide an embedded language with thread primitives
- Higher-order functions: represent computation internally in CPS
- Lazy data structures: traces (implement inversion of control)
16. Next
- Monads: provide an embedded language with thread primitives
- Higher-order functions: represent computation internally in CPS
- Lazy data structures: traces (implement inversion of control)
17. What the cheap threads look like
- Nested function calls
- Exception handling
- System calls
- Conditional branches
- Function calls to the I/O library
- Recursion
18. Designing the thread language
- An embedded language in Haskell
- Syntax
- Standard control-flow primitives
- Sequential composition, branches, loops, nested function calls, exceptions
- System calls
- fork, yield, I/O, …
- Semantics
- Thread programs produce traces!
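As an illustration of the kind of thread program this language admits, here is a sketch only: it assumes the syscall interface sketched earlier plus a hypothetical non-blocking read wrapper read_nb :: Fd -> IO Int.

```haskell
-- Ordinary Haskell control flow (conditionals, recursion) mixed with
-- system calls; running this program unfolds into a trace.
echo_loop :: Fd -> M ()
echo_loop fd = do
  n <- sys_nbio (read_nb fd)   -- non-blocking read via the I/O library
  if n == 0
    then sys_ret               -- peer closed the connection: stop this thread
    else do sys_yield          -- cooperatively give other threads a turn
            echo_loop fd       -- recurse to serve the next request
```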
19. Problem: composition of traces
- Naive ideas
- Each system call generates a trace node
- A threaded computation generates a trace
- Technical challenge: how to sequentially combine two traces?
- Example program: do { read_request; write_response }
- It doesn't work out easily!
20. The composable design: Continuation-Passing Style (CPS)
- Continuation-passing style
- A threaded computation of type a is represented by a function of type (a -> Trace) -> Trace
- It takes a function that generates a Trace, and generates a Trace
- The thread language is implemented as a CPS monad
- Each primitive operation (system call) is represented as a CPS computation that generates a trace node
- The bind operation (written >>=) sequentially combines CPS computations
21. The thread abstraction: the CPS monad (hidden from the user)
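A minimal sketch of this hidden machinery, in the spirit of Claessen's concurrency monad and reusing the Trace type sketched earlier; details are illustrative rather than the library's exact code.

```haskell
-- A computation of type a is a function from its continuation to a trace.
newtype M a = M ((a -> Trace) -> Trace)

instance Functor M where
  fmap f (M g) = M (\k -> g (k . f))

instance Applicative M where
  pure x      = M (\k -> k x)
  M f <*> M g = M (\k -> f (\h -> g (k . h)))

instance Monad M where
  -- bind composes two CPS computations sequentially
  M g >>= f   = M (\k -> g (\a -> let M h = f a in h k))

-- Each system call wraps its continuation k in one trace node.
sys_nbio :: IO a -> M a
sys_nbio io = M (\k -> SYS_NBIO (fmap k io))

sys_fork :: M () -> M ()
sys_fork (M g) = M (\k -> SYS_FORK (g (\_ -> SYS_RET)) (k ()))

sys_yield :: M ()
sys_yield = M (\k -> SYS_YIELD (k ()))

sys_ret :: M ()
sys_ret = M (\_ -> SYS_RET)

sys_epoll_wait :: Fd -> EpollEvent -> M ()
sys_epoll_wait fd ev = M (\k -> SYS_EPOLL_WAIT fd ev (k ()))

-- Running a thread means unfolding it into its (lazy) trace.
build_trace :: M a -> Trace
build_trace (M g) = g (\_ -> SYS_RET)
```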
22. Haskell's do-syntax for monads
- A special syntax to simplify programming with monads
- Automatic overloading of operations such as >>
- A monad is an embedded language!
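For example, the two definitions below are equivalent: the compiler rewrites the first into the second, where (>>) is the overloaded sequencing operator of the thread monad sketched above.

```haskell
-- Written with the do-syntax:
fork_two :: M ()
fork_two = do
  sys_fork sys_yield   -- spawn a thread that yields once and finishes
  sys_fork sys_yield   -- spawn another one
  sys_ret              -- and stop

-- What it desugars to: plain sequencing in the CPS monad.
fork_two' :: M ()
fork_two' = sys_fork sys_yield >> (sys_fork sys_yield >> sys_ret)
```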
23. The big picture
- The monad interface: an embedded language with the do-syntax and system calls
- The monad implementation: continuation-passing-style computation
- The side effect of the monad: lazy traces
24. Outline
- Motivation and challenge
- Implementation
- Programming interfaces
- Discussion
25. Now we only care about the interfaces
- Embedded language with the do-syntax and system calls
- Lazy traces for control inversion
26. Programming with threads
- The same programming style as in C/Java
- Wrap low-level I/O system calls in higher-level library interfaces
- Code example: writing a blocking sock_accept call using non-blocking accept calls (sketched below)
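A sketch of what that code example might look like, assuming the syscall sketches above plus a hypothetical non-blocking wrapper accept_nb :: Fd -> IO Fd that returns a negative fd when no connection is pending.

```haskell
-- A blocking-style accept, built from non-blocking primitives.
sock_accept :: Fd -> M Fd
sock_accept server_fd = do
  fd <- sys_nbio (accept_nb server_fd)         -- try a non-blocking accept
  if fd >= 0
    then return fd                             -- got a connection: the "blocking" call returns
    else do sys_epoll_wait server_fd EpollRead -- otherwise wait until the socket is readable
            sock_accept server_fd              -- and retry
```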
27. Programming with events
- The scheduler (main event loop) has an abstract programming interface: traces
- The scheduler's job: traverse the trace tree
- Reading a trace node = running an event handler
- Tree traversal strategy = thread scheduling algorithm
- The scheduler plays the active role of control
- Control inversion is provided by laziness
28. A simple round-robin scheduler
- Fetch a thread
- Run the thread until it makes the next syscall
- Take the child node
- Interpret the syscall
- Throw it back to the ready queue
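A sketch of such a loop, using a Chan of traces as the ready queue and the Trace type sketched earlier; the SYS_EPOLL_WAIT case is handed off to a separate epoll worker and elided here.

```haskell
import Control.Concurrent.Chan (Chan, readChan, writeChan)

round_robin :: Chan Trace -> IO ()
round_robin ready = do
  t <- readChan ready          -- fetch a thread; pattern matching on the lazy node
  case t of                    -- forces it to run until its next syscall
    SYS_NBIO io    -> io >>= writeChan ready            -- run the I/O, requeue the child node
    SYS_YIELD t'   -> writeChan ready t'                -- just requeue the continuation
    SYS_FORK t1 t2 -> writeChan ready t1 >> writeChan ready t2
    SYS_RET        -> return ()                         -- thread finished: drop it
    _              -> return ()                         -- SYS_EPOLL_WAIT etc.: handled elsewhere
  round_robin ready
```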
29. A real event-driven system
- Each event loop (a worker) runs in an OS thread
- Event loops synchronize using queues
- Example configuration
- Some worker threads for CPU-intensive computations and non-blocking I/O (like the round-robin scheduler)
- Some worker threads for executing blocking I/O calls (such as fopen)
- Dedicated worker threads for monitoring epoll/AIO events
30. Event loops for epoll/AIO
- epoll: high-performance select() in Linux
- AIO: high-performance asynchronous file I/O in Linux
- The event loops are really simple
- Wrap the C function epoll_wait() using the Haskell Foreign Function Interface (FFI)
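A heavily simplified sketch of such a dedicated worker; it assumes a hypothetical FFI wrapper wait_ready :: IO [Fd] around epoll_wait(), and a shared table mapping each fd to the trace of the thread blocked on it.

```haskell
import Control.Concurrent.MVar (MVar, modifyMVar)
import Control.Concurrent.Chan (Chan, writeChan)
import qualified Data.Map as Map
import System.Posix.Types (Fd)

type Waiting = MVar (Map.Map Fd Trace)

epoll_worker :: Waiting -> Chan Trace -> IO ()
epoll_worker waiting ready = do
  fds <- wait_ready                 -- block inside epoll_wait() via the FFI (assumed wrapper)
  mapM_ wake fds                    -- wake every thread whose fd became ready
  epoll_worker waiting ready
  where
    wake fd = do
      -- remove the blocked thread's continuation from the table and requeue it
      mt <- modifyMVar waiting (\m -> return (Map.delete fd m, Map.lookup fd m))
      maybe (return ()) (writeChan ready) mt
```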
31. Adding a user-level TCP stack
- Define/interpret the TCP syscalls (22 lines)
- Event loop for incoming packets (7 lines)
- Event loop for timers (9 lines)
32. Outline
- Motivation and challenge
- Implementation
- Programming interfaces
- Discussion
33. How about performance?
- Haskell is a pure, lazy, functional programming language
- It runs slightly slower than C
- It needs garbage collection
- However:
- CPS threads are really cheap and lightweight
- We can take advantage of high-performance, event-driven I/O interfaces (epoll/AIO)
- Mastering continuation-passing in C is not easier than learning Haskell!
34. How lightweight?
- A minimal CPS thread uses only 48 bytes at run time
- No thread-local stack; everything is heap-allocated
- Actual memory usage depends on thread-local state
- 1 GB of RAM can hold about 22 million minimal CPS threads (1 GB / 48 B ≈ 22M)
35. How scalable?
- We compared Haskell CPS threads with C threads using Linux NPTL
- NPTL: Native POSIX Thread Library
- I/O scalability tests
- FIFO pipes
- Disk head scheduling
- CPS threads scale better than NPTL
- They perform like an ideal event-driven system!
36. Disk head scheduling performance
37. FIFO performance with idle threads
38. How can this help system design?
- Use programming-language techniques to provide all the necessary abstractions
[Diagram: layered system design]
- User application: thread scheduler, blocking I/O, TCP (flexible and type-safe; mostly event-driven)
- OS kernel: keep it thin and simple
- Hardware: purely event-driven
39. Conclusion
- With better programming languages, we can get the best of both worlds
- The expressiveness and simplicity of threads
- The scalability and flexibility of event-driven systems
- There are other benefits, too
- Type safety is a big one
- Multiprocessor support is good (see our paper)
- It is easy to implement event-driven OS components directly in user applications