Title: An Efficient Threading Model to Boost Server Performance
1An Efficient Threading Model to Boost Server
Performance
2Motivation
- Complex mainstream servers are multi-threaded
- Apache 2.0
- MySQL
- Variety of threading models
- Effects on server performance?
- Want higher performance
3Thesis Contributions
- Examine thread architectures
- User thread per kernel thread
- Blocking I/O vs. non-blocking I/O
- N-to-M threads with non-blocking I/O
- Novel thread model
- Architectural benefits over other thread models
- Higher performance for Apache and MySQL
4Talk Outline
- Contrast threading architectures
- Benefits of N-to-M threads with non-blocking I/O
- Large I/O transfer optimization
- Evaluation
- Apache
- MySQL
- Related works
- Conclusion
5User Thread / Kernel Thread
[Figure: user threads mapped onto kernel threads under the N-to-M, 1-to-1, and N-to-1 models]
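The N-to-M mapping can be sketched as follows. This is an illustration under stated assumptions, not ServLib's implementation: Python generators stand in for user threads, `threading.Thread` objects stand in for kernel threads, and the names `user_thread` and `run_n_to_m` are hypothetical. CPython's global interpreter lock means the sketch shows the mapping only, not a parallel speedup.

```python
import queue
import threading

def user_thread(name, steps):
    """A user-level thread modeled as a generator.
    Each yield is a point where the library may switch to another user thread."""
    for step in range(steps):
        yield (name, step)

def run_n_to_m(user_threads, m):
    """Multiplex N user threads over m kernel (OS) threads via a shared run queue."""
    run_queue = queue.Queue()
    for ut in user_threads:
        run_queue.put(ut)
    trace = []                      # (kernel thread id, user thread name, step)
    lock = threading.Lock()

    def kernel_thread(kid):
        while True:
            try:
                ut = run_queue.get_nowait()   # pick any runnable user thread
            except queue.Empty:
                return                        # nothing runnable: this kernel thread exits
            try:
                name, step = next(ut)         # run the user thread until its next yield
            except StopIteration:
                continue                      # user thread finished; do not requeue it
            with lock:
                trace.append((kid, name, step))
            run_queue.put(ut)                 # still runnable: back on the queue

    workers = [threading.Thread(target=kernel_thread, args=(k,)) for k in range(m)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return trace
```

With m=1 the same function behaves like N-to-1; with m equal to the number of user threads it approaches 1-to-1.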
6Blocking I/O / Non-blocking I/O
- Blocking I/O
- Issue application I/O as is
- I/O blocks => thread blocks
- Non-blocking I/O
- Issue the application's blocking I/O in a non-blocking manner
- Use event notification mechanism
- Library schedules I/O for different threads
- Return to application when I/O finishes
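The non-blocking approach can be sketched as below. This is a minimal illustration, assuming Python generators as user threads and the standard `selectors` module as the event notification mechanism; `reader` and `run` are hypothetical names, and none of this is taken from ServLib itself.

```python
import selectors
import socket

def reader(sock, results, name):
    """User thread that issues what looks like a blocking read.
    Yielding the socket hands control to the library until data is available."""
    data = yield sock
    results[name] = data

def run(threads):
    """Library side: issue each read in a non-blocking manner and resume
    the owning user thread only when its I/O has finished."""
    sel = selectors.DefaultSelector()
    for t in threads:
        sock = next(t)                  # run the thread up to its read request
        sock.setblocking(False)         # the library never blocks on this socket
        sel.register(sock, selectors.EVENT_READ, t)
    pending = len(threads)
    while pending:
        for key, _ in sel.select():     # wait for event notification
            data = key.fileobj.recv(4096)
            sel.unregister(key.fileobj)
            try:
                key.data.send(data)     # return to the application with the data
            except StopIteration:
                pass                    # user thread ran to completion
            pending -= 1
    sel.close()
```

The application code in `reader` keeps its blocking style; only the library knows the socket is non-blocking.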
7Threading Models
- X => feasible
- - => not feasible
- => novel
81-to-1 threads/blocking I/O
- Context switches increase for I/O-intensive workloads
- Kernel-level context switches
9N-to-1 threads/non-blocking I/O
- Blocks on page faults or open() calls
- Cannot use multiple processors on an SMP
- Event notification
- select()/poll() don't scale well
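One way to see the scaling concern is to compare the two interface styles. In the sketch below, `select.select()` is handed the full interest set on every call, while a selector is populated once and then only queried; `selectors.DefaultSelector` picks kqueue on FreeBSD and epoll on Linux where available. The helper names are hypothetical and the sketch compares interfaces only, it does not measure cost.

```python
import select
import selectors
import socket

def make_idle_connections(n):
    """Create n connected socket pairs; all start with nothing to read."""
    return [socket.socketpair() for _ in range(n)]

def ready_via_select(socks):
    """select()-style: the whole interest set is passed in on every call."""
    readable, _, _ = select.select(socks, [], [], 0)
    return readable

def make_selector(socks):
    """Registered style: the interest set is handed to the kernel once."""
    sel = selectors.DefaultSelector()
    for s in socks:
        sel.register(s, selectors.EVENT_READ)
    return sel

def ready_via_selector(sel):
    """Later calls only ask which registered sockets are ready."""
    return [key.fileobj for key, _ in sel.select(timeout=0)]
```

With many mostly idle connections, the first style repeats per-descriptor work on every call even when a single socket is ready.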
10N-to-M threads/blocking I/O
- Employs scheduler activations to handle blocking events
- Blocking I/O => context switch overhead
- Frequent blocking I/O => reduces to 1-to-1 threads
11Non-blocking I/O
Blocking I/O
Non-blocking I/O
12N-to-M threads/non-blocking I/O
- Compared to 1-to-1 threads/blocking I/O
- Fewer kernel threads
- Library context switches less expensive
- Non-blocking I/O allows batching of events across
user/kernel boundary
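The batching point can be illustrated with a small sketch: several sockets become readable, and one notification call reports all of them. The code assumes Python's `selectors` module as a stand-in for the event mechanism, and `drain_ready_events` and `demo` are hypothetical names. Only the notifications are batched here; each `recv()` is still its own system call.

```python
import selectors
import socket

def drain_ready_events(sel):
    """Make one event-notification call and collect data from every ready socket."""
    events = sel.select(timeout=0)   # one call reports all ready descriptors
    return {key.data: key.fileobj.recv(4096) for key, _ in events}

def demo(n=5):
    """Make n sockets readable, then pick up all n events with a single call."""
    sel = selectors.DefaultSelector()
    pairs = [socket.socketpair() for _ in range(n)]
    for i, (server_side, client_side) in enumerate(pairs):
        server_side.setblocking(False)
        sel.register(server_side, selectors.EVENT_READ, i)
        client_side.sendall(f"req{i}".encode())
    batch = drain_ready_events(sel)
    sel.close()
    for server_side, client_side in pairs:
        server_side.close()
        client_side.close()
    return batch
```

A 1-to-1 blocking design would instead wake one kernel thread per completed I/O.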
13N-to-M threads/non-blocking I/O (contd.)
- Compared to N-to-M threads/blocking I/O
- Non-blocking I/O allows batching of events across the user/kernel boundary
- Compared to N-to-1 threads/non-blocking I/O
- A kernel thread per CPU on an SMP
- Does not stall in case of page faults
14Large I/O in Traditional Libraries
REPEAT
15ServLib Large I/O Optimization
REPEAT
16ServLib Thread Library
- N-to-M threads/non-blocking I/O
- Exports POSIX threads (pthreads) API
- Transparently linked to multi-threaded servers
- Employs FreeBSD's kevent() event notification mechanism
17Performance Evaluation
- Compare ServLib with
- N-to-1 threads/non-blocking I/O (libc_r)
- 1-to-1 threads/blocking I/O (linuxthreads)
- Two server applications
- Apache web server (version 2.0.43)
- Synthetic workload
- Trace based workload
- MySQL database server (version 3.23.55)
- TPC-W workload
18Apache Synthetic Workload
- Synthetic Workload
- Concurrent clients requesting the same file
- Vary file size
- Hardware
- 2.4 GHz Intel Xeon server
- 2 GB memory
- 2x Gigabit network connection between server and client
- Server CPU bottleneck in these tests
21Analysis
- Collected kernel profile statistics
- 1-to-1 threads
- 40x more context switches than ServLib
- Effect of I/O optimization in ServLib
- N-to-1 threads
- Effect of I/O optimization in ServLib
- poll() is the 4th most costly system call
- kevent() is inexpensive
22Apache I/O Optimization Test
- Experiment on large I/O optimization
- Turn off optimization
- 5% reduction in overall performance
23Apache Trace Based Workload
- Trace based workload
- Rice CS trace, NASA trace
- Play trace log from client machine
- Ignore the first run
- Collect results for second run (warm cache)
- Working set size less than main memory
24Traces Characteristics
27MySQL Tests
- Trace of database queries for TPC-W workload
- Database size 400 MB
- Server CPU bottleneck in these tests
30MySQL Tests Analysis
- Collected kernel profile statistics
- 1-to-1 threads
- 3x more context switches than ServLib
- Kernel level synchronization more expensive
- N-to-1 threads
- 20x more poll() than ServLib
- 7x more poll() than 1-to-1 threads
31Future Work
- Investigate effects of preemption
- Experiments
- Tests on an SMP
- N-to-M threads with blocking I/O
- Optimize N-to-1 threads to use kevent()
32Related Works: Server Architectures
- Flash web server (USENIX 1999)
- Hybrid architecture
- Staged Event-Driven Architecture (SOSP 2001)
- QoS for internet services
33Related Works: Thread Libraries
- State Threads
- N-to-1 thread library
- Not pthreads compatible
- For Internet server applications
- GNU Pth
- N-to-1 thread library
- Not pthreads compatible
- Threads for event-driven applications
- Solaris's N-to-M threads with blocking I/O
- Linux's 1-to-1 threads with blocking I/O
34Conclusions
- N-to-M threads with non-blocking I/O
- Novel
- High performance
- Boost server performance
- 10-20% for Apache
- 10-15% for MySQL
35Thank You!