Capriccio: Scalable Threads for Internet Service

Slides: 47
Provided by: csF2
Learn more at: http://www.cs.fsu.edu

1
Capriccio: Scalable Threads for Internet Service
2
Introduction
  • Internet services have ever-increasing
    scalability demands
  • Current hardware is meeting these demands
  • Software has lagged behind
  • Recent approaches are event-based
  • Pipeline stages of events

3
Drawbacks of Events
  • Event systems hide the control flow
  • Difficult to understand and debug
  • Eventually evolved into call-and-return event
    pairs
  • Programmers need to match related events
  • Need to save/restore states
  • Capriccio: instead of adopting the event-based
    model, fix the thread-based model
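
The contrast on this slide can be illustrated with a minimal sketch. All names here (`on_request`, `on_file_read`, `handle_request`, `fake_read`) are hypothetical and are not from Capriccio or any event framework; the point is only that the event version must carry state between handlers by hand, while the thread version keeps it in local variables.

```python
# Event style: control flow is split across handlers, and the state that
# links them (here, `ctx`) must be saved and restored by hand.
def on_request(ctx, path):
    ctx["path"] = path            # save state for the matching handler
    return ("read_file", path)    # ask the event loop to start the read

def on_file_read(ctx, data):
    return f"{ctx['path']}: {len(data)} bytes"   # restore the saved state

# Thread style: the same logic as one call-and-return sequence. Local
# variables survive across the blocking call automatically.
def handle_request(path, read_file):
    data = read_file(path)        # would block only the calling thread
    return f"{path}: {len(data)} bytes"

def fake_read(path):
    # Hypothetical stand-in for real file I/O.
    return b"hello"

ctx = {}
op, arg = on_request(ctx, "/index.html")
event_result = on_file_read(ctx, fake_read(arg))
thread_result = handle_request("/index.html", fake_read)
print(event_result == thread_result)
```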

4
Goals of Capriccio
  • Support for existing thread API
  • Few changes to existing applications
  • Scalability to thousands of threads
  • One thread per execution
  • Flexibility to address application-specific
    needs

[Figure: ease of programming versus performance, with points labeled Threads, Events, and Ideal]
5
Thread Design Principles
  • Kernel-level threads are for true concurrency
  • User-level threads provide a clean programming
    model with useful invariants and semantics
  • Decouple user from kernel level threads
  • More portable

6
Capriccio
  • Thread package
  • All thread operations are O(1)
  • Linked stacks
  • Address the problem of stack allocation for large
    numbers of threads
  • Combination of compile-time and run-time analysis
  • Resource-aware scheduler

7
Thread Design and Scalability
  • POSIX API
  • Backward compatible

8
User-Level Threads
  • + Performance
  • + Flexibility
  • - Complex preemption
  • - Bad interaction with kernel scheduler

9
Flexibility
  • Decoupling user and kernel threads allows faster
    innovation
  • Can use new kernel thread features without
    changing application code
  • Scheduler tailored for applications
  • Lightweight

10
Performance
  • Reduce the overhead of thread synchronization
  • No kernel crossings for mutex operations, even
    with preemptive threading
  • More efficient memory management at user level

11
Disadvantages
  • Blocking calls must be replaced with nonblocking
    ones so that the thread package keeps the CPU
  • Translation overhead
  • Problems with multiple processors
  • Synchronization becomes more expensive

12
Context Switches
  • Built on top of Edgar Toernig's coroutine library
  • Fast context switches when threads voluntarily
    yield
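
As a rough illustration of cooperative switching, the sketch below uses Python generators in place of a coroutine library. It is not Capriccio's implementation; `worker` and `run` are made-up names, and the only switch points are the explicit `yield` statements.

```python
from collections import deque

def worker(name, steps, log):
    for i in range(steps):
        log.append((name, i))
        yield                      # voluntary yield: the only switch point

def run(threads):
    ready = deque(threads)         # O(1) enqueue and dequeue
    while ready:
        t = ready.popleft()
        try:
            next(t)                # resume the thread until its next yield
            ready.append(t)        # still alive: back of the run queue
        except StopIteration:
            pass                   # thread finished

log = []
run([worker("a", 2, log), worker("b", 3, log)])
print(log)
```

Because a switch happens only at a `yield`, the scheduler never has to save state it did not expect to save, which is what keeps this kind of switch cheap.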

13
I/O
  • Capriccio intercepts blocking I/O calls
  • Uses epoll for asynchronous I/O
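
One way to picture the interception is a thread that writes what looks like a blocking read while a scheduler waits on readiness behind the scenes. The sketch below uses Python's standard `selectors` module (which picks epoll on Linux) and generators as stand-ins; `blocking_style_reader` and `run` are hypothetical names, not Capriccio APIs.

```python
import selectors
import socket
from collections import deque

def blocking_style_reader(sock, out):
    # To the thread body this reads like a blocking call. The yield hands
    # the socket to the scheduler, which resumes us once data is ready.
    yield sock
    out.append(sock.recv(1024))

def run(threads):
    sel = selectors.DefaultSelector()    # epoll-backed on Linux
    ready = deque(threads)
    waiting = 0
    while ready or waiting:
        while ready:
            t = ready.popleft()
            try:
                sock = next(t)           # run until the thread "blocks"
            except StopIteration:
                continue                 # thread finished
            sel.register(sock, selectors.EVENT_READ, t)
            waiting += 1
        if waiting:
            for key, _ in sel.select():  # sleep until some socket is ready
                sel.unregister(key.fileobj)
                waiting -= 1
                ready.append(key.data)   # make the waiting thread runnable
    sel.close()

a, b = socket.socketpair()
out = []
b.sendall(b"ping")
run([blocking_style_reader(a, out)])
a.close()
b.close()
print(out)
```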

14
Scheduling
  • Very much like an event-driven application
  • Events are hidden from programmers

15
Synchronization
  • Supports cooperative threading on single-CPU
    machines
  • Requires only Boolean checks
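
The "Boolean checks" point can be sketched as follows, assuming a single CPU and threads that switch only at explicit yields. `CoopMutex`, `critical`, and `run` are illustrative names. Since no other thread can run between the check and the set, a plain flag is enough and no atomic instruction is needed.

```python
from collections import deque

class CoopMutex:
    """Mutex for cooperative threads on one CPU: a plain Boolean flag."""
    def __init__(self):
        self.locked = False

    def acquire(self):
        # Generator: yields to the scheduler while the flag is set.
        while self.locked:
            yield
        self.locked = True    # safe: nothing runs between check and set

    def release(self):
        self.locked = False

def critical(name, mutex, trace):
    yield from mutex.acquire()
    trace.append(f"{name} enter")
    yield                     # give up the CPU while holding the lock
    trace.append(f"{name} exit")
    mutex.release()

def run(threads):
    ready = deque(threads)
    while ready:
        t = ready.popleft()
        try:
            next(t)
            ready.append(t)
        except StopIteration:
            pass

trace = []
m = CoopMutex()
run([critical("x", m, trace), critical("y", m, trace)])
print(trace)
```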

16
Threading Microbenchmarks
  • SMP, two 2.4 GHz Xeon processors
  • 1 GB memory
  • Two 10K RPM SCSI Ultra II hard drives
  • Linux 2.5.70
  • Compared Capriccio, LinuxThreads, and Native
    POSIX Threads for Linux

17
Latencies of Thread Primitives
18
Thread Scalability
  • Producer-consumer microbenchmark
  • LinuxThreads begins to degrade after 20 threads
  • NPTL degrades after 100
  • Capriccio scales to 32K producers and consumers
    (64K threads total)

19
Thread Scalability
20
I/O Performance
  • Network performance
  • Token passing among pipes
  • Simulates the effect of slow client links
  • 10% overhead compared to epoll
  • Twice as fast as both LinuxThreads and NPTL with
    more than 1000 threads
  • Disk I/O comparable to kernel threads

21
Linked Stack Management
  • LinuxThreads allocates 2MB per stack
  • 1 GB of VM holds only 500 threads

Fixed Stacks
22
Linked Stack Management
  • But most threads consume only a few KB of stack
    space at a given time
  • Dynamic stack allocation can significantly reduce
    virtual memory usage

Linked Stack
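
The arithmetic behind the fixed-stack limit can be checked directly. The 8 KB per-thread figure below is an assumption standing in for "a few KB"; it does not come from the slides.

```python
GIB = 1 << 30
MIB = 1 << 20
KIB = 1 << 10

fixed_stack = 2 * MIB                 # per-thread stack, per the slides
fixed_threads = GIB // fixed_stack    # threads that fit in 1 GB of VM

assumed_linked_stack = 8 * KIB        # assumption: stand-in for "a few KB"
linked_threads = GIB // assumed_linked_stack

print(fixed_threads, linked_threads)  # 512 131072
```

The first number is the "only 500 threads" from the slide (512 exactly); the second shows the order of magnitude that becomes possible if stacks are sized by actual use.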
23
Compiler Analysis and Linked Stacks
  • Whole-program analysis
  • Based on the call graph
  • Problematic for recursions
  • Static estimation may be too conservative

24
Compiler Analysis and Linked Stacks
  • Grow and shrink the stack size on demand
  • Insert checkpoints to determine whether we need
    to allocate more before the next checkpoint
  • Results in noncontiguous stacks
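
A toy model of the checkpoint idea, under stated assumptions: the stack is a list of chunks, each checkpoint is given a byte bound that must be available before the next checkpoint, and a chunk linked at a checkpoint is unlinked when that call returns. `LinkedStack`, its methods, and the sizes used are hypothetical and only illustrate the grow and shrink behavior.

```python
class LinkedStack:
    def __init__(self, chunk_size):
        self.chunk_size = chunk_size
        self.chunks = [chunk_size]     # free bytes left in each linked chunk

    def checkpoint(self, bound):
        """Ensure `bound` bytes are free; return True if a chunk was linked."""
        if self.chunks[-1] >= bound:
            return False
        self.chunks.append(max(self.chunk_size, bound))   # grow on demand
        return True

    def call(self, bound, frame_bytes, body=lambda: None):
        linked = self.checkpoint(bound)
        self.chunks[-1] -= frame_bytes   # callee's frame lives in the top chunk
        body()
        self.chunks[-1] += frame_bytes   # frame popped on return
        if linked:
            self.chunks.pop()            # shrink: unlink the extra chunk

stack = LinkedStack(chunk_size=2048)
chunks_seen = []

def inner():
    chunks_seen.append(len(stack.chunks))

def outer():
    # Only 548 bytes remain in the first chunk, so this call links a new one.
    stack.call(1000, 1000, inner)

stack.call(1500, 1500, outer)
print(chunks_seen, stack.chunks)
```

The two chunks are separate allocations, which is why the resulting stack is noncontiguous.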

25
Placing Checkpoints
  • One checkpoint in every cycle in the call graph
  • Bound the size between checkpoints with the
    deepest call path
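
The two rules on this slide can be sketched on a small call graph. This is a simplified illustration, not the analysis from the Capriccio compiler: it assumes a checkpoint is placed on every depth-first back edge (which breaks every cycle reachable from the root) and then bounds stack use by the deepest checkpoint-free path. The graph, frame sizes, and function names are invented for the example.

```python
def place_checkpoints(graph, root):
    """Return the call edges that close a cycle (DFS back edges)."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {f: WHITE for f in graph}
    back_edges = set()

    def dfs(f):
        color[f] = GRAY
        for g in graph[f]:
            if color[g] == GRAY:
                back_edges.add((f, g))    # edge back into the active path
            elif color[g] == WHITE:
                dfs(g)
        color[f] = BLACK

    dfs(root)
    return back_edges

def stack_bound(graph, frame, checkpoints, f, memo=None):
    """Max bytes used from f along any path that crosses no checkpoint."""
    memo = {} if memo is None else memo
    if f not in memo:
        deepest = max(
            (stack_bound(graph, frame, checkpoints, g, memo)
             for g in graph[f] if (f, g) not in checkpoints),
            default=0,
        )
        memo[f] = frame[f] + deepest
    return memo[f]

# Hypothetical program: walk and visit are mutually recursive.
graph = {"main": ["parse", "walk"], "parse": [], "walk": ["visit"], "visit": ["walk"]}
frame = {"main": 64, "parse": 256, "walk": 128, "visit": 96}

cps = place_checkpoints(graph, "main")
print(cps)
print(stack_bound(graph, frame, cps, "main"), stack_bound(graph, frame, cps, "walk"))
```

With the back edge removed the remaining graph is acyclic, so the recursion in `stack_bound` terminates and each bound is finite.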

26
Dealing with Special Cases
  • Function pointers
  • Don't know which procedure will be called at
    compile time
  • Can find a potential set of procedures

27
Dealing with Special Cases
  • External functions
  • Allow programmers to annotate external library
    functions with trusted stack bounds
  • Allow larger stack chunks to be linked for
    external functions

28
Tuning the Algorithm
  • Stack space can be wasted
  • Internal and external fragmentation
  • Tradeoffs
  • Number of stack linkings
  • External fragmentation

29
Memory Benefits
  • Tuning can be application-specific
  • No preallocation of large stacks
  • Reduced memory requirement for running a large
    number of threads
  • Better paging behavior
  • Stacks: LIFO

30
Case Study: Apache 2.0.44
  • Maximum stack allocation chunk: 2 KB
  • Apache under SPECweb99
  • Overall slowdown is about 3%
  • Dynamic allocation: 0.1%
  • Link to large chunks for external functions: 0.5%
  • Stack removal: 10%

31
Resource-Aware Scheduling
  • Advantages of event-based scheduling
  • Tailored for applications
  • With event handlers
  • Events provide two important pieces of
    information for scheduling
  • Whether a process is close to completion
  • Whether a system is overloaded

32
Resource-Aware Scheduling
  • Thread-based
  • View applications as sequence of stages,
    separated by blocking calls
  • Analogous to event-based scheduler

33
Blocking Graph
  • Node: a location in the program that blocked
  • Edge between two nodes if they were consecutive
    blocking points
  • Generated at runtime
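
A minimal sketch of building such a graph at runtime, assuming the thread package calls a hook each time a thread blocks. The `BlockingGraph` class, the `on_block` hook, and the use of plain strings as node identifiers are all assumptions made for illustration.

```python
from collections import defaultdict

class BlockingGraph:
    def __init__(self):
        self.edges = defaultdict(int)   # (from_node, to_node) -> times taken
        self.last = {}                  # thread id -> previous blocking point

    def on_block(self, thread_id, location):
        prev = self.last.get(thread_id)
        if prev is not None:
            # Consecutive blocking points in one thread form an edge.
            self.edges[(prev, location)] += 1
        self.last[thread_id] = location

g = BlockingGraph()
for loc in ["accept", "read", "write", "read"]:
    g.on_block(1, loc)
for loc in ["accept", "read", "write"]:
    g.on_block(2, loc)
print(dict(g.edges))
```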

34
Resource-Aware Scheduling
  • 1. Keep track of resource utilization
  • 2. Annotate each node and its outgoing edges with
    the resources used
  • 3. Dynamically prioritize nodes
  • Prefer nodes that release resources
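
The three steps can be sketched as a single scheduling decision. This is only a sketch under assumptions: `node_delta` holds a hypothetical per-node estimate of how resource use changes when a thread runs from that node, and the 0.8 threshold is an arbitrary illustrative value, not a number from Capriccio.

```python
def pick_next(ready, node_delta, usage, capacity, threshold=0.8):
    """ready: list of (thread, node) pairs in FIFO order.

    node_delta[node] is the expected change in resource use when a thread
    runs from that blocking-graph node (negative means it releases).
    """
    if usage >= threshold * capacity:
        # Near capacity: prefer the thread expected to release the most.
        return min(ready, key=lambda tn: node_delta[tn[1]])
    return ready[0]                      # otherwise plain FIFO

ready = [("t1", "open"), ("t2", "close"), ("t3", "read")]
node_delta = {"open": +1, "close": -1, "read": 0}   # file descriptors

print(pick_next(ready, node_delta, usage=90, capacity=100))
print(pick_next(ready, node_delta, usage=10, capacity=100))
```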

35
Resources
  • CPU
  • Memory (malloc)
  • File descriptors (open, close)

36
Pitfalls
  • Tricky to determine the maximum capacity of a
    resource
  • Thrashing depends on the workload
  • Disk can handle more requests that are sequential
    instead of random
  • Resources interact
  • VM vs. disk
  • Applications may manage memory themselves

37
Yield Profiling
  • User threads are problematic if a thread fails to
    yield
  • Such threads are easy to detect, since their
    running times are orders of magnitude larger
  • Yield profiling identifies places where programs
    fail to yield sufficiently often
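
A small sketch of the detection idea, assuming the runtime records how long each thread ran between yields and groups the samples by program location. The function name, the sample data, and the factor of 100 (a stand-in for "orders of magnitude") are assumptions.

```python
import statistics

def find_yield_hogs(run_times, factor=100):
    """run_times: location -> list of seconds run between yields.

    Flags locations whose worst run is far above the typical run.
    """
    all_samples = [t for samples in run_times.values() for t in samples]
    typical = statistics.median(all_samples)
    return sorted(
        loc for loc, samples in run_times.items()
        if max(samples) > factor * typical
    )

run_times = {
    "parse": [0.0001, 0.0002],
    "send": [0.0001],
    "compress": [0.5, 0.4],     # hypothetical code that rarely yields
}
print(find_yield_hogs(run_times))
```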

38
Web Server Performance
  • 4x500 MHz Pentium server
  • 2GB memory
  • Intel e1000 Gigabit Ethernet card
  • Linux 2.4.20
  • Workload: requests for 3.2 GB of static file data

39
Web Server Performance
  • Request frequencies match those of the SPECweb99
  • Each client repeatedly connects to the server and
    issues a series of five requests, separated by
    20 ms pauses
  • Apache's performance improved by 15% with
    Capriccio

40
Resource-Aware Admission Control
  • Consumer-producer applications
  • Producer loops, adding memory, and randomly
    touching pages
  • Consumer loops, removing memory from the pool and
    freeing it
  • Fast producer may run out of virtual address space

41
Resource-Aware Admission Control
  • Touching pages too quickly will cause thrashing
  • Capriccio can quickly detect the overload
    conditions and limit the number of producers

42
Programming Models for High Concurrency
  • Event
  • Application-specific optimization
  • Thread
  • Efficient thread runtimes

43
User-Level Threads
  • Capriccio is unique
  • Blocking graph
  • Resource-aware scheduling
  • Targets a large number of blocking threads
  • POSIX compliant

44
Application-Specific Optimization
  • Most approaches require programmers to tailor
    their application to manage resources
  • Nonstandard APIs, less portable

45
Stack Management
  • No garbage collection

46
Future Work
  • Multi-CPU machines
  • Profiling tools for system tuning