Title: Combining Events and Threads for Scalable Network Services
1. Combining Events and Threads for Scalable Network Services
- Peng Li and Steve Zdancewic
- University of Pennsylvania
- PLDI 2007, San Diego
2. Overview
- A Haskell framework for massively concurrent network applications
- Servers, P2P systems, load generators
- Massive concurrency
- 1,000 threads? (easy)
- 10,000 threads? (common)
- 100,000 threads? (challenging)
- 1,000,000 threads? (20 years later?)
- 10,000,000 threads? (in 15 minutes)
- How to write such programs?
- The very first decision to make: the programming model
- Shall we use threads or events?
3. Threads vs. Events
- The event-driven model
- One thread for 10,000 clients
- Asynchronous I/O
- Scheduling: programmer
    while (1) {
        nfds = epoll_wait(kdpfd, events, MAXEVT, -1);
        for (n = 0; n < nfds; ++n)
            handle_event(events[n]);
    }
- The multithreaded model
- One thread per client
- Synchronous I/O
- Scheduling: OS/runtime libs
    int send_data(int fd1, int fd2) {
        while (!EOF(fd1)) {
            size = read_chunk(fd1, buf, count);  /* read from the source */
            write_chunk(fd2, buf, size);         /* write to the destination */
        }
    }
4. Can we get the best of both worlds?
One application program
- Programming with each client: threads
- Synchronous I/O
- Intuitive control-flow primitives
The bridge between threads and events? (some kind of continuation support)
- Resource scheduling: events
- Written as part of the application
- Tailored to the application's needs
5. Roads to lightweight, application-level concurrency
- Direct language support for continuations
- Good if you have them
- Source-to-source CPS translations
- Requires hacking on compiler/runtime
- Often not very elegant
- Other solutions?
- (no language support)
- (no compiler/runtime hacks)
6. The poor man's concurrency monad
- "A Poor Man's Concurrency Monad" by Koen Claessen, JFP 1999 (Functional Pearl)
- The thread interface
- The CPS monad
- The event interface
- A lazy, tree-like data structure called a trace
SYS_NBIO(write_nb)
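The two interfaces above can be sketched in a few lines of Haskell. This is a minimal illustration in the spirit of Claessen's monad, not the framework's actual source: only the node name SYS_NBIO comes from the slide, while SYS_FORK, SYS_YIELD, SYS_RET, and the names M, sysNbio, sysFork, sysYield, buildTrace, describe, and example are assumptions chosen for this sketch.

```haskell
-- Event interface: a lazy tree of "system calls" that a scheduler walks.
-- SYS_NBIO is the node named on the slide; the other constructors are assumed.
data Trace
  = SYS_NBIO (IO Trace)   -- perform a nonblocking I/O action, then continue
  | SYS_FORK Trace Trace  -- the child's trace, then the parent's continuation
  | SYS_YIELD Trace       -- let other threads run, then continue
  | SYS_RET               -- the thread has finished

-- The CPS monad: a computation receives its continuation and produces a trace.
newtype M a = M { runM :: (a -> Trace) -> Trace }

instance Functor M where
  fmap f (M g) = M (\k -> g (k . f))

instance Applicative M where
  pure x = M (\k -> k x)
  M f <*> M g = M (\k -> f (\h -> g (k . h)))

instance Monad M where
  M g >>= f = M (\k -> g (\a -> runM (f a) k))

-- Thread interface: each "system call" emits one trace node.
sysNbio :: IO a -> M a
sysNbio io = M (\k -> SYS_NBIO (fmap k io))

sysFork :: M () -> M ()
sysFork child = M (\k -> SYS_FORK (buildTrace child) (k ()))

sysYield :: M ()
sysYield = M (\k -> SYS_YIELD (k ()))

-- Convert a thread into its trace; the final continuation ends the thread.
buildTrace :: M a -> Trace
buildTrace m = runM m (const SYS_RET)

-- Name of the node at the head of a trace (for inspection only).
describe :: Trace -> String
describe (SYS_NBIO _)   = "SYS_NBIO"
describe (SYS_FORK _ _) = "SYS_FORK"
describe (SYS_YIELD _)  = "SYS_YIELD"
describe SYS_RET        = "SYS_RET"

-- A thread written in ordinary do-notation.
example :: M ()
example = do
  sysFork (sysNbio (putStrLn "child"))
  sysYield
  sysNbio (putStrLn "parent")
```

Because the trace is lazy, building it runs no I/O: the head of `buildTrace example` is a SYS_FORK node, and the rest of the tree is only unfolded when a scheduler asks for it.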
7. Questions on the poor man's approach
- Does it work for high-performance network services?
- (using a pure, lazy, functional language?)
- How does the design scale up to real systems?
- Symmetric multiprocessing? Synchronization? I/O?
- How cheap is it?
- How much does a poor man's thread cost?
- How poor is it?
- Does it offer acceptable performance?
8. Our experiment
- A high-performance Haskell framework for massively concurrent network services!
- Supported features
- Linux Asynchronous IO (AIO)
- epoll() and nonblocking IO
- OS thread pools
- SMP support
- Thread synchronization primitives
- Applications developed
- IO benchmarks on FIFO pipes / disk head scheduling
- A simple web server for static files
- HTTP load generator
- Prototype of an application-level TCP stack
- We used the Glasgow Haskell Compiler (GHC)
9. Multithreaded code example
Nested function calls
Exception handling
Conditional branches
Synchronous call to I/O lib
Recursion
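The code that these labels annotated is not preserved in this transcript. As a stand-in, here is a hedged sketch of what thread code in this style can look like, mirroring the send_data loop from the Threads vs. Events slide. The in-memory "files" (IORef lists) and the names readChunk, writeChunk, sendData, and runThread are assumptions made for the sketch, and exception handling is left out to keep it short.

```haskell
import Data.IORef

-- A reduced trace and CPS monad, just enough for one thread.
data Trace = SYS_NBIO (IO Trace) | SYS_RET

newtype M a = M { runM :: (a -> Trace) -> Trace }

instance Functor M where
  fmap f (M g) = M (\k -> g (k . f))

instance Applicative M where
  pure x = M (\k -> k x)
  M f <*> M g = M (\k -> f (\h -> g (k . h)))

instance Monad M where
  M g >>= f = M (\k -> g (\a -> runM (f a) k))

sysNbio :: IO a -> M a
sysNbio io = M (\k -> SYS_NBIO (fmap k io))

-- Hypothetical I/O library: take up to n items from an in-memory source.
readChunk :: IORef [a] -> Int -> M [a]
readChunk src n =
  sysNbio (atomicModifyIORef' src (\xs -> (drop n xs, take n xs)))

-- Hypothetical I/O library: append a chunk to an in-memory sink.
writeChunk :: IORef [a] -> [a] -> M ()
writeChunk dst chunk = sysNbio (modifyIORef' dst (++ chunk))

-- Thread code: reads like blocking code, returns the number of items copied.
sendData :: IORef [a] -> IORef [a] -> M Int
sendData src dst = do
  chunk <- readChunk src 4          -- synchronous-looking call to the I/O library
  if null chunk                     -- conditional branch
    then return 0
    else do
      writeChunk dst chunk          -- nested function call
      rest <- sendData src dst      -- recursion
      return (length chunk + rest)

-- Minimal single-threaded driver: walk the trace until the thread returns.
runThread :: M a -> IO a
runThread m = do
  result <- newIORef Nothing
  let finish a = SYS_NBIO (writeIORef result (Just a) >> return SYS_RET)
      go SYS_RET       = return ()
      go (SYS_NBIO io) = io >>= go
  go (runM m finish)
  readIORef result >>= maybe (error "thread produced no result") return
```

The thread body never mentions continuations or callbacks; the monad turns each I/O call into a trace node behind the scenes, which is what lets a scheduler suspend the thread at that point.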
10. Event-driven code example
A wrapper function to the C library call using
the Haskell Foreign Function Interface (FFI)
An event loop running in a separate OS thread
Put events in queues for processing in other OS
threads
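The event-loop code itself is also missing from this transcript. The sketch below shows the general shape of such a loop under simplifying assumptions: it uses a GHC lightweight thread (forkIO) and an in-memory Chan as the ready queue instead of a dedicated OS thread, epoll, and FFI wrappers. The names workerLoop, bumper, and runDemo, and all trace constructors other than SYS_NBIO, are hypothetical.

```haskell
import Control.Concurrent
import Control.Monad (replicateM_)

-- Trace nodes; only SYS_NBIO is named on the slides, the rest are assumed.
data Trace
  = SYS_NBIO (IO Trace)
  | SYS_FORK Trace Trace
  | SYS_YIELD Trace
  | SYS_RET

-- Event loop: take one trace from the ready queue, run a single step,
-- and put whatever continuation remains back on the queue.
workerLoop :: Chan Trace -> IO ()
workerLoop ready = do
  t <- readChan ready
  case t of
    SYS_NBIO io           -> io >>= writeChan ready
    SYS_FORK child parent -> writeChan ready child >> writeChan ready parent
    SYS_YIELD next        -> writeChan ready next
    SYS_RET               -> return ()
  workerLoop ready

-- A hand-written thread: bump a shared counter n times, yielding after
-- each step, then report completion on the done channel.
bumper :: MVar Int -> Chan () -> Int -> Trace
bumper counter done n
  | n <= 0    = SYS_NBIO (writeChan done () >> return SYS_RET)
  | otherwise = SYS_NBIO $ do
      modifyMVar_ counter (return . (+ 1))
      return (SYS_YIELD (bumper counter done (n - 1)))

-- Start one event loop, fork the given number of threads from a root
-- trace, wait for all of them, and return the final counter value.
runDemo :: Int -> Int -> IO Int
runDemo threads steps = do
  ready   <- newChan
  counter <- newMVar 0
  done    <- newChan
  _ <- forkIO (workerLoop ready)
  writeChan ready
    (foldr SYS_FORK SYS_RET (replicate threads (bumper counter done steps)))
  replicateM_ threads (readChan done)
  readMVar counter
```

Calling `runDemo 100 5` schedules 100 threads round-robin through a single loop. Running several copies of workerLoop against the same queue would be one way to spread the work over more workers, at the cost of needing the synchronization the slides list as a supported feature.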
11. A complete event-driven I/O subsystem
One virtual processor event loop for each CPU
Haskell Foreign Function Interface (FFI)
Each event loop runs in a separate OS thread
12. Modular and customizable I/O system (add a TCP stack if you like)
Define / interpret TCP syscalls (22 lines)
Event loop for incoming packets (7 lines)
Event loop for timers (9 lines)
13. How cheap is a poor man's thread?
- Minimal memory consumption: 48 bytes
- Each thread just loops and does nothing
- Actual size is determined by thread-local state
- Even an Ethernet packet can be >1,000 bytes
- Pay as you go: only pay for things needed
- In contrast
- A Linux POSIX thread's stack is 2 MB by default
- The state-of-the-art user-level thread system (Capriccio) uses at least a few KB for each thread
- Observation
- The poor man's thread is extremely memory-efficient
- (Challenging most event-driven systems)
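To put these figures next to the 10,000,000-thread target from the Overview slide, the arithmetic can be checked with a small calculation. This is only a back-of-the-envelope sketch: it uses the minimal 48-byte figure (real threads carry more state), reads "2 MB" as 2 MiB, and counts stack reservation rather than memory actually touched. All names are made up for the example.

```haskell
-- Total bytes needed for a number of threads at a fixed per-thread cost.
totalBytes :: Integer -> Integer -> Integer
totalBytes threads bytesPerThread = threads * bytesPerThread

mib :: Integer
mib = 1024 * 1024

poorMansThread, posixDefaultStack :: Integer
poorMansThread    = 48        -- minimal size quoted on the slide
posixDefaultStack = 2 * mib   -- "2 MB by default", read as 2 MiB (assumption)

-- Convert a byte count to MiB for display.
inMiB :: Integer -> Double
inMiB bytes = fromIntegral bytes / fromIntegral mib

-- One line of the comparison for a given thread count.
report :: Integer -> String
report threads =
  show threads ++ " threads: "
    ++ show (inMiB (totalBytes threads poorMansThread)) ++ " MiB (monadic) vs "
    ++ show (inMiB (totalBytes threads posixDefaultStack)) ++ " MiB (default POSIX stacks)"
```

For 10,000,000 threads this gives 480,000,000 bytes (about 458 MiB) at 48 bytes each, against 20,000,000 MiB (about 19 TiB) of default stack reservation.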
14. I/O scalability test
- Comparison against the Linux Native POSIX Thread Library (NPTL)
- Highly optimized OS thread implementation
- Each NPTL thread's stack limited to 32 KB
- Mini-benchmarks used
- Disk head scheduling (all threads running)
- FIFO pipe scalability with idle threads (128 threads running)
15. A simple web server
16. How poor is the poor man's monad?
- Not too shabby
- Benchmarks show comparable (if not higher) performance to existing, optimized systems
- An elegant design is more important than a 10% performance improvement
- Added benefit: type safety for many dangerous things
- Continuations, thread queues, schedulers, asynchronous I/O
17. Related Work
- We are motivated by two projects
- Twisted: the Python event-driven framework for scalable internet applications
- - The programmer must write code in CPS
- Capriccio: a high-performance user-level thread system for network servers
- - Requires C compiler hacks
- - Difficult to customize (e.g. adding SMP support)
- Continuation-based concurrency
- Wand 80, Shivers 97, ...
- Other languages and programming models
- CML, Erlang, ...
18. Conclusion
- Haskell and the Poor Man's Concurrency Monad are a promising solution for high-performance, massively concurrent networking applications
- Get the best of both threads and events!
- This poor man's approach is actually very cheap, and not so poor!
- http://www.cis.upenn.edu/lipeng/homepage/unify.html