An Efficient Threading Model to Boost Server Performance

Slides: 36
Provided by: eecgTo

1
An Efficient Threading Model to Boost Server
Performance
  • Anupam Chanda

2
Motivation
  • Complex mainstream servers are multi-threaded
    • Apache 2.0
    • MySQL
  • Variety of threading models
  • Effects on server performance?
  • Want higher performance

3
Thesis Contributions
  • Examine thread architectures
  • User thread per kernel thread
  • Blocking I/O vs. non-blocking I/O
  • N-to-M threads with non-blocking I/O
  • Novel thread model
  • Architectural benefits over other thread models
  • Higher performance for Apache and MySQL

4
Talk Outline
  • Contrast threading architectures
  • Benefits of N-to-M threads with non-blocking I/O
  • Large I/O transfer optimization
  • Evaluation
    • Apache
    • MySQL
  • Related works
  • Conclusion

5
User Thread / Kernel Thread
[Figure: user-thread to kernel-thread mappings, labeled N-to-M, 1-to-1, and N-to-1]
6
Blocking I/O / Non-blocking I/O
  • Blocking I/O
    • Issue application I/O as is
    • I/O blocks => thread blocks
  • Non-blocking I/O
    • Issue application blocking I/O in a non-blocking
      manner
    • Use event notification mechanism
    • Library schedules I/O for different threads
    • Return to application when I/O finishes
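
The non-blocking bullets above can be sketched in a few lines of Python. This is only an illustration, not ServLib code: the function name blocking_style_recv is hypothetical, and Python's selectors module stands in for the library's event notification mechanism. A real thread library would switch to another user thread where this sketch waits.

```python
import selectors
import socket


def blocking_style_recv(sock, nbytes, selector):
    """Looks like a blocking recv() to the caller, but issues it non-blocking.

    Hypothetical helper for illustration only.
    """
    sock.setblocking(False)
    while True:
        try:
            # Try the I/O right away; this succeeds if data is already there.
            return sock.recv(nbytes)
        except BlockingIOError:
            # Not ready yet. A thread library would schedule another user
            # thread here; this sketch just waits for the readiness event.
            selector.register(sock, selectors.EVENT_READ)
            try:
                selector.select()
            finally:
                selector.unregister(sock)


if __name__ == "__main__":
    a, b = socket.socketpair()
    sel = selectors.DefaultSelector()
    b.sendall(b"hello")
    print(blocking_style_recv(a, 5, sel))
```

The caller sees ordinary blocking semantics, which is what lets a library of this kind be linked to an existing multi-threaded server without source changes.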

7
Threading Models
  • X => feasible
  • - => not feasible
  • => novel

8
1-to-1 threads/blocking I/O
  • Context switches increase for I/O intensive
    workloads
  • Kernel level context switches

9
N-to-1 threads/non-blocking I/O
  • Blocks on page faults or open() calls
  • Cannot use multiple processors on an SMP
  • Event notification
    • select()/poll() don't scale well
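
A rough cost model shows why select()/poll() scale poorly compared with a kevent()-style interface. It is a simplification, not a measurement: it counts only descriptors examined, and both function names are hypothetical. With poll(), every call passes the full interest set and each descriptor is checked; with kevent(), interest is registered once and each call reports only ready events.

```python
def poll_style_cost(num_registered, ready_per_call, calls):
    """Descriptors examined when every call rescans the full interest set."""
    # ready_per_call does not matter here: idle descriptors are scanned too.
    return num_registered * calls


def kevent_style_cost(num_registered, ready_per_call, calls):
    """Descriptors examined when interest is registered once up front."""
    # One-time registration, then only the ready events on each call.
    return num_registered + ready_per_call * calls


if __name__ == "__main__":
    # 1000 mostly idle connections, 10 ready per call, 100 calls.
    print(poll_style_cost(1000, 10, 100))
    print(kevent_style_cost(1000, 10, 100))
```

Under this model the poll-style cost grows with the number of open connections even when almost all of them are idle.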

10
N-to-M threads/blocking I/O
  • Employs scheduler activations to handle blocking
    events
  • Blocking I/O => context switch overhead
  • Frequent blocking I/O => reduces to 1-to-1 threads

11
[Figure: diagram with labels Non-blocking I/O, Blocking I/O, and Non-blocking I/O]
12
N-to-M threads/non-blocking I/O
  • Compared to 1-to-1 threads/blocking I/O
    • Fewer kernel threads
    • Library context switches less expensive
    • Non-blocking I/O allows batching of events across
      user/kernel boundary
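
Batching across the user/kernel boundary can be illustrated as follows: a single event-notification call returns every connection that has become ready, so the library crosses into the kernel once rather than once per thread. This is a sketch under assumptions, with hypothetical helper names and Python's selectors module standing in for kevent().

```python
import selectors
import socket


def make_ready_connections(selector, count):
    """Hypothetical helper: create `count` socket pairs with data pending."""
    pairs = []
    for i in range(count):
        reader, writer = socket.socketpair()
        reader.setblocking(False)
        # `data=i` tags the connection so the caller can tell them apart.
        selector.register(reader, selectors.EVENT_READ, data=i)
        writer.sendall(b"x")
        pairs.append((reader, writer))
    return pairs


def collect_ready(selector, timeout=1.0):
    """One notification call; returns the tags of all ready connections."""
    return [key.data for key, _mask in selector.select(timeout)]


if __name__ == "__main__":
    sel = selectors.DefaultSelector()
    pairs = make_ready_connections(sel, 3)
    print(sorted(collect_ready(sel)))
```

With blocking I/O, each of these three reads would instead be a separate blocking call made by a separate kernel thread.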

13
N-to-M threads/non-blocking I/O (contd.)
  • Compared to N-to-M threads/blocking I/O
    • Non-blocking I/O allows batching of events across
      user/kernel boundary
  • Compared to N-to-1 threads/non-blocking I/O
    • A kernel thread per CPU on an SMP
    • Does not stall in case of page faults
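
The N-to-M structure itself can be shown with a toy scheduler: N user-level threads, modeled here as Python generators, are multiplexed over M operating-system threads that share one ready queue. All names are hypothetical and this is not how ServLib is implemented. Note also that CPython threads do not execute bytecode in parallel, so the sketch shows the mapping only, not the SMP speedup.

```python
import queue
import threading


def user_thread(name, steps):
    """A user-level thread modeled as a generator (illustrative only)."""
    for _ in range(steps):
        yield  # point where the library would switch to another user thread
    return name


def run_n_to_m(user_threads, num_kernel_threads):
    """Run N generator-based user threads on M OS threads."""
    ready = queue.Queue()
    for ut in user_threads:
        ready.put(ut)
    finished = []
    finished_lock = threading.Lock()

    def kernel_thread():
        while True:
            try:
                ut = ready.get_nowait()
            except queue.Empty:
                # Nothing runnable right now. Any user thread still in
                # flight is held by another kernel thread, which requeues it.
                return
            try:
                next(ut)  # run the user thread until its next yield
            except StopIteration as done:
                with finished_lock:
                    finished.append(done.value)
            else:
                ready.put(ut)  # still runnable: back on the ready queue

    workers = [threading.Thread(target=kernel_thread)
               for _ in range(num_kernel_threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return finished


if __name__ == "__main__":
    threads = [user_thread(i, 3) for i in range(8)]
    print(sorted(run_n_to_m(threads, 2)))
```

Because a user thread is either on the ready queue or held by exactly one kernel thread, no generator is ever resumed by two kernel threads at once.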

14
Large I/O in Traditional Libraries
REPEAT
15
ServLib Large I/O Optimization
REPEAT
16
ServLib Thread Library
  • N-to-M threads/non-blocking I/O
  • Exports POSIX threads (pthreads) API
  • Transparently linked to multi-threaded servers
  • Employs FreeBSD's kevent() event notification
    mechanism

17
Performance Evaluation
  • Compare ServLib with
    • N-to-1 threads/non-blocking I/O (libc_r)
    • 1-to-1 threads/blocking I/O (linuxthreads)
  • Two server applications
    • Apache web server (version 2.0.43)
      • Synthetic workload
      • Trace based workload
    • MySQL database server (version 3.23.55)
      • TPC-W workload

18
Apache Synthetic Workload
  • Synthetic workload
    • Concurrent clients requesting the same file
    • Vary file size
  • Hardware
    • 2.4 GHz Intel Xeon server
    • 2 GB memory
    • 2x Gigabit network connection between server and
      client
  • Server CPU bottleneck in these tests

19
(No Transcript)
20
(No Transcript)
21
Analysis
  • Collected kernel profile statistics
  • 1-to-1 threads
    • 40x more context switches than ServLib
    • Effect of I/O optimization in ServLib
  • N-to-1 threads
    • Effect of I/O optimization in ServLib
    • poll() 4th most costly system call
    • kevent() inexpensive

22
Apache I/O Optimization Test
  • Experiment on large I/O optimization
  • Turn off optimization
  • 5% reduction in overall performance

23
Apache Trace Based Workload
  • Trace based workload
  • Rice CS trace, NASA trace
  • Play trace log from client machine
  • Ignore the first run
  • Collect results for second run (warm cache)
  • Working set size less than main memory
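
The warm-cache methodology above (replay the trace, ignore the first run, report the second) can be sketched as a small harness. The names replay_trace and fetch are hypothetical, and requests per second is an assumed metric; the slides do not say which metric was reported or how the replay client was built.

```python
import time


def replay_trace(requests, fetch, runs=2):
    """Replay the trace `runs` times; report only the last run's throughput.

    `fetch` is a caller-supplied function that issues one request.
    Earlier runs serve only to warm the server's cache.
    """
    throughput = None
    for _ in range(runs):
        start = time.perf_counter()
        for request in requests:
            fetch(request)
        elapsed = time.perf_counter() - start
        # Overwritten on every run, so only the final (warm) run survives.
        throughput = len(requests) / elapsed if elapsed > 0 else float("inf")
    return throughput


if __name__ == "__main__":
    seen = []
    rate = replay_trace(["/index.html", "/a.gif"], seen.append)
    print(len(seen), rate > 0)
```

Discarding the first run only gives a fully warm cache when the working set fits in main memory, which is the condition stated in the last bullet.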

24
Trace Characteristics
25
(No Transcript)
26
(No Transcript)
27
MySQL Tests
  • Trace of database queries for TPC-W workload
  • Database size 400 MB
  • Server CPU bottleneck in these tests

28
(No Transcript)
29
(No Transcript)
30
MySQL Tests Analysis
  • Collected kernel profile statistics
  • 1-to-1 threads
    • 3x more context switches than ServLib
    • Kernel level synchronization more expensive
  • N-to-1 threads
    • 20x more poll() than ServLib
    • 7x more poll() than 1-to-1 threads

31
Future Work
  • Investigate effects of preemption
  • Experiments
  • Tests on an SMP
  • N-to-M threads with blocking I/O
  • Optimize N-to-1 threads to use kevent()

32
Related Works: Server Architectures
  • Flash web server (USENIX 1999)
    • Hybrid architecture
  • Staged Event Driven Architecture (SOSP 2001)
    • QoS for Internet services

33
Related Works: Thread Libraries
  • State Threads
    • N-to-1 thread library
    • Not pthreads compatible
    • For Internet server applications
  • GNU Pth
    • N-to-1 thread library
    • Not pthreads compatible
    • Threads for event-driven applications
  • Solaris's N-to-M threads with blocking I/O
  • Linux's 1-to-1 threads with blocking I/O

34
Conclusions
  • N-to-M threads with non-blocking I/O
    • Novel
    • High performance
  • Boost server performance
    • 10-20% for Apache
    • 10-15% for MySQL

35
Thank You!