Capriccio: Scalable Threads for Internet Services

1
Capriccio: Scalable Threads for Internet
Services
  • Rob von Behren, Jeremy Condit, Feng Zhou, George
    C. Necula, and Eric Brewer
  • University of California, Berkeley
  • 2003

2
Intro
  • User-level thread package
  • Co-operative scheduling
  • Supports the POSIX API
  • Provides:
    • Scalability
    • Efficient stack management
    • Resource-aware scheduling

3
Implementation
  • Context switching
    • Built on top of a co-routine library
    • Fast context switch when threads voluntarily
      yield, either explicitly or by making a blocking
      I/O call
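
To make the voluntary-yield idea concrete, here is a minimal sketch that uses Python generators as a stand-in for a co-routine library. It is not Capriccio's code; `worker` and `run_round_robin` are hypothetical names chosen for illustration.

```python
from collections import deque

def worker(name, steps, log):
    """A cooperative 'thread': does one step of work, then voluntarily yields."""
    for i in range(steps):
        log.append((name, i))
        yield  # explicit yield point: control returns to the scheduler

def run_round_robin(threads):
    """Resume each runnable thread in turn until all of them have finished."""
    ready = deque(threads)
    while ready:
        t = ready.popleft()
        try:
            next(t)          # switch into the thread until its next yield
            ready.append(t)  # still alive: back of the run queue
        except StopIteration:
            pass             # thread finished, drop it

log = []
run_round_robin([worker("a", 2, log), worker("b", 3, log)])
print(log)
```

No thread is ever interrupted between yield points, which is what makes the switch cheap: only the state the thread itself chose to suspend has to be saved.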

4
Implementation
  • I/O
    • Intercepts blocking I/O calls at the library level
    • Internally converted to:
      • epoll for sockets and pipes
      • AIO for disk I/O
      • Fallback: poll and a kernel thread pool
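
The conversion of a blocking call into a readiness wait can be sketched as follows. This is an illustration under stated assumptions, not Capriccio's implementation: `coop_recv` and `reader` are hypothetical names, and Python's standard `selectors` module stands in for the epoll layer (it selects epoll on Linux when available).

```python
import selectors
import socket

sel = selectors.DefaultSelector()  # uses epoll on Linux where available

def coop_recv(sock, nbytes):
    """Looks like a blocking recv to its caller, but yields instead of blocking."""
    sock.setblocking(False)
    while True:
        try:
            return sock.recv(nbytes)
        except BlockingIOError:
            yield sock  # tell the scheduler which descriptor we are waiting on

def reader(sock, out):
    """A cooperative thread that reads one message."""
    data = yield from coop_recv(sock, 1024)
    out.append(data)

a, b = socket.socketpair()
out = []
t = reader(a, out)

waiting_on = next(t)                      # no data yet: the thread yields its socket
sel.register(waiting_on, selectors.EVENT_READ)
b.sendall(b"hello")                       # data arrives while the thread is parked
ready = sel.select(timeout=1)             # scheduler side: check for I/O readiness
sel.unregister(waiting_on)
try:
    next(t)                               # resume the thread: recv now succeeds
except StopIteration:
    pass

print(out)
a.close()
b.close()
sel.close()
```

The calling code in `reader` still reads like ordinary blocking code; only the wrapper knows that the wait was turned into a yield.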

5
Implementation
  • Scheduler
    • A lot like an event-driven application
    • Alternates between running threads and checking
      for I/O completion
    • Modular: select between different schedulers at
      runtime
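
A minimal sketch of such a loop, assuming generator-based threads and simulated I/O. The names `FakeIO`, `fifo`, `lifo`, `thread`, and `run` are hypothetical; the scheduling policy is passed in as a function to mirror the runtime-selectable scheduler.

```python
from collections import deque

def fifo(ready):
    """Policy: run the thread that has been ready the longest."""
    return ready.popleft()

def lifo(ready):
    """Policy: run the most recently readied thread."""
    return ready.pop()

class FakeIO:
    """Stand-in for a pending I/O request that completes after `polls` checks."""
    def __init__(self, polls):
        self.polls = polls

    def done(self):
        self.polls -= 1
        return self.polls <= 0

def thread(name, trace, io_polls=0):
    trace.append(f"{name}:start")
    if io_polls:
        yield FakeIO(io_polls)  # "blocking" call: hand the request to the scheduler
    else:
        yield None              # plain voluntary yield
    trace.append(f"{name}:end")

def run(threads, pick):
    """Alternate between running one ready thread and checking I/O completion."""
    ready, blocked = deque(threads), []
    while ready or blocked:
        if ready:
            t = pick(ready)
            try:
                request = next(t)
            except StopIteration:
                t = None                      # thread finished
            if t is not None:
                if request is None:
                    ready.append(t)           # yielded voluntarily, still runnable
                else:
                    blocked.append((request, t))
        # Check for I/O completion and wake threads whose requests finished.
        still_blocked = []
        for request, t in blocked:
            if request.done():
                ready.append(t)
            else:
                still_blocked.append((request, t))
        blocked = still_blocked

fifo_trace = []
run([thread("a", fifo_trace, io_polls=2), thread("b", fifo_trace)], fifo)
lifo_trace = []
run([thread("a", lifo_trace, io_polls=2), thread("b", lifo_trace)], lifo)
print(fifo_trace)
print(lifo_trace)
```

Swapping `fifo` for `lifo` changes the execution order without touching the threads or the I/O check, which is the point of keeping the policy modular.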

6
Implementation
  • Synchronization
    • Relies on co-operative scheduling
    • Single CPU: check a Boolean locked/unlocked flag
    • Multiple CPUs: spin locks or optimistic concurrency
      control primitives
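
For the single-CPU case, a Boolean flag is enough because a thread can only lose the CPU at a yield point. The sketch below illustrates this with generators; `CoopMutex` and `worker` are hypothetical names, and the round-robin loop is a stand-in for the real scheduler.

```python
from collections import deque

class CoopMutex:
    """Mutex for cooperative threads on one CPU: a plain Boolean flag."""
    def __init__(self):
        self.locked = False

    def acquire(self):
        """Generator: yields to the scheduler until the flag is clear."""
        while self.locked:
            yield
        # No atomic instruction needed: nothing can preempt us between
        # the check above and the assignment below.
        self.locked = True

    def release(self):
        self.locked = False

def worker(name, mutex, log):
    yield from mutex.acquire()
    log.append(f"{name} enter")
    yield                       # yield while still holding the lock
    log.append(f"{name} leave")
    mutex.release()

log = []
mutex = CoopMutex()
ready = deque([worker("a", mutex, log), worker("b", mutex, log)])
while ready:
    t = ready.popleft()
    try:
        next(t)
        ready.append(t)
    except StopIteration:
        pass
print(log)
```

With several CPUs the check-then-set sequence is no longer safe, which is why the slide lists spin locks or optimistic concurrency control for that case.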

7
Linked Stack Management
  • Stacks for threads are bounded conservatively
  • LinuxThreads: 2 MB of stack per thread (?)
  • Affects scalability, wastes memory
  • Better option: allocate stack as needed (dynamic
    stack allocation/de-allocation)

8
[Figure: stacks and overflow]
http://capriccio.cs.berkeley.edu/publications.html

9
Compiler analysis and Linked Stacks
  • Compiler analysis of the program (CIL toolkit) to
    generate a call graph
  • Weighted directed call graph
  • Each node represents a function, weighted with
    its max stack frame size
  • Edge A->B represents a direct function call from
    A to B

10
Call graph
  • Paths represent sequences of stack frames
  • Path length: sum of the weights of its nodes
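
As a small illustration of the weighted call graph and the path-length definition, the sketch below uses made-up function names and frame sizes (in bytes); none of these values come from the presentation.

```python
# Node weights: maximum stack frame size of each function (hypothetical values).
frame_size = {"main": 64, "parse": 128, "send": 32}

# Directed edges: caller -> list of functions it calls directly.
calls = {"main": ["parse", "send"], "parse": ["send"], "send": []}

def path_length(path):
    """Sum of node weights along a call path, after checking each edge exists."""
    for caller, callee in zip(path, path[1:]):
        if callee not in calls[caller]:
            raise ValueError(f"no call edge {caller}->{callee}")
    return sum(frame_size[f] for f in path)

deep = path_length(["main", "parse", "send"])
shallow = path_length(["main", "send"])
print(deep, shallow)
```

Each path corresponds to one possible sequence of live stack frames, so its length is the stack space that sequence needs.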

11
Usage
  • Without recursive functions: calculate the longest
    path to get a static stack size
  • Might be too conservative
  • So use dynamic stacks with checkpoints
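
For an acyclic call graph, the longest weighted path can be computed with a memoized depth-first search. The sketch below assumes hypothetical function names and frame sizes, and `worst_case_stack` is an illustrative name rather than anything from the Capriccio tooling.

```python
from functools import lru_cache

# Hypothetical frame sizes (bytes) and call edges; the graph has no recursion.
frame_size = {"main": 64, "parse": 128, "send": 32, "log_error": 512}
calls = {
    "main": ["parse", "log_error"],
    "parse": ["send"],
    "send": [],
    "log_error": [],
}

@lru_cache(maxsize=None)
def worst_case_stack(func):
    """Longest weighted path starting at `func` (own frame plus deepest callee)."""
    deepest_callee = max((worst_case_stack(c) for c in calls[func]), default=0)
    return frame_size[func] + deepest_callee

bound = worst_case_stack("main")
print(bound)
```

The bound is driven by the rarely taken `log_error` branch, while the common path through `parse` and `send` needs far less; this is the sense in which a static bound can be too conservative.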

12
Checkpoints
  • Piece of code inserted at call sites (edges)
  • Checks whether enough stack space is left to reach
    the next checkpoint without stack overflow
  • If not, allocates a new stack chunk and moves SP
    to it
  • Stack chunk is de-allocated when the function returns
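
The checkpoint behavior can be simulated without touching a real stack pointer. In this sketch, which is only a model under stated assumptions, `LinkedStack`, `CHUNK_SIZE`, and the byte counts are hypothetical, and every call made through `call` is treated as a checkpointed call site.

```python
CHUNK_SIZE = 256  # hypothetical minimum size of a stack chunk, in bytes

class LinkedStack:
    """Models a thread stack as a chain of chunks, tracking free bytes in each."""
    def __init__(self):
        self.chunks = [CHUNK_SIZE]

    def call(self, needed, body):
        """Checkpointed call: `needed` bounds stack use until the next checkpoint."""
        grew = self.chunks[-1] < needed
        if grew:
            # Not enough room: link a new chunk (the real code would move SP here).
            self.chunks.append(max(CHUNK_SIZE, needed))
        self.chunks[-1] -= needed
        try:
            return body()
        finally:
            if grew:
                self.chunks.pop()          # chunk de-allocated on return
            else:
                self.chunks[-1] += needed  # space given back to the current chunk

stack = LinkedStack()
chunks_seen = []

def leaf():
    chunks_seen.append(len(stack.chunks))
    return "ok"

def middle():
    return stack.call(200, leaf)   # 200 bytes do not fit in the 156 left

result = stack.call(100, middle)
print(result, chunks_seen, stack.chunks)
```

The second chunk exists only for the duration of the inner call, so a thread that rarely goes deep rarely holds more than one chunk.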

13
Placing checkpoints
  • Instead of a checkpoint at every call site, analyze
    the call graph
  • Ensure a bound on the stack space between
    checkpoints when creating them
  • Algorithm:
    • Break cycles
    • Scan nodes upwards
    • Break edges/call sites based on a path limit
      parameter (MaxPath)
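
The three steps can be sketched roughly as below. This is a simplified reading of the slide, not the published algorithm: it assumes cycles are broken by checkpointing DFS back edges, that every frame fits within `max_path`, and all function names and sizes are hypothetical.

```python
def place_checkpoints(calls, frame_size, root, max_path):
    """Return the set of call edges (caller, callee) that get a checkpoint."""
    checkpoints = set()

    # Step 1: break cycles by checkpointing every back edge found by DFS.
    state = {}
    def find_back_edges(f):
        state[f] = "active"
        for g in calls[f]:
            if state.get(g) == "active":
                checkpoints.add((f, g))
            elif g not in state:
                find_back_edges(g)
        state[f] = "done"
    find_back_edges(root)

    # Step 2: work bottom-up. longest[f] is the longest checkpoint-free
    # path starting at f, including f's own frame.
    longest = {}
    def visit(f):
        if f in longest:
            return longest[f]
        deepest = 0
        for g in calls[f]:
            if (f, g) in checkpoints:
                continue
            below = visit(g)
            # Step 3: break the edge if following it would exceed the limit.
            if frame_size[f] + below > max_path:
                checkpoints.add((f, g))
            else:
                deepest = max(deepest, below)
        longest[f] = frame_size[f] + deepest
        return longest[f]
    visit(root)
    return checkpoints

# Hypothetical program: c calls a again, so the graph contains a cycle.
frame_size = {"main": 100, "a": 300, "b": 200, "c": 400}
calls = {"main": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["a"]}
placed = place_checkpoints(calls, frame_size, "main", max_path=600)
print(sorted(placed))
```

A larger `max_path` yields fewer checkpoints but larger chunks, so the parameter trades call overhead against memory use.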

14
Benefits and Cost of stack linking
  • Benefits
    • Pre-allocation of large stacks is unnecessary,
      which reduces virtual memory pressure
    • Improved paging behavior, because stack chunk
      reuse reduces the application's working set
  • Cost
    • 73% slow-down in function calls alone, but offset
      by low number of calls (5)

15
Resource aware scheduling
  • SEDA: multiple event-driven stages, useful for
    scheduling (queue size, stage part)
  • Capriccio: pseudo-stages separated by blocking
    points
  • Blocking point: a place in the program where threads
    block

16
Blocking graph
  • Created at runtime by observing blocking point
    transitions
  • Node: location in the program where a thread blocks
  • Edge: connects consecutive blocking points
  • Edges are annotated with weighted averages of
    resource usage
  • Nodes: weighted average of outgoing edge values
  • Updated as threads traverse nodes/edges
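
A blocking graph with averaged edge annotations might be kept as in the sketch below. The exact averaging scheme is an assumption here: edges use an exponentially weighted moving average with a made-up factor `ALPHA`, and nodes weight their outgoing edges by traversal count. `BlockingGraph` and the node names are hypothetical.

```python
ALPHA = 0.5  # hypothetical smoothing factor for the moving average

class BlockingGraph:
    def __init__(self):
        self.edge_avg = {}    # (src, dst) -> averaged resource use on that edge
        self.edge_count = {}  # (src, dst) -> number of traversals

    def record(self, src, dst, used):
        """Called when a thread moves from blocking point src to dst."""
        edge = (src, dst)
        old = self.edge_avg.get(edge)
        if old is None:
            self.edge_avg[edge] = used
        else:
            self.edge_avg[edge] = ALPHA * used + (1 - ALPHA) * old
        self.edge_count[edge] = self.edge_count.get(edge, 0) + 1

    def node_avg(self, node):
        """Average of outgoing edge values, weighted by how often each is taken."""
        out = [e for e in self.edge_avg if e[0] == node]
        total = sum(self.edge_count[e] for e in out)
        if total == 0:
            return 0.0
        return sum(self.edge_avg[e] * self.edge_count[e] for e in out) / total

g = BlockingGraph()
g.record("accept", "read", 10.0)
g.record("accept", "read", 20.0)
g.record("accept", "close", 5.0)
print(g.edge_avg[("accept", "read")], g.node_avg("accept"))
```

Because the graph is built from observed transitions, it needs no annotations from the programmer and adapts as the workload changes.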

17
Usage
  • Track the utilization level of each resource
  • Determine dynamically whether it is at its limit
  • Annotate each node with the resources used on its
    outgoing edges
  • Predict the impact on resources if threads from a
    particular node are scheduled
  • Dynamically prioritize nodes (hence threads)
    based on the above
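
One way to turn these predictions into priorities is sketched below. The policy is an assumption made for illustration: when a resource is at its limit, nodes predicted to release it run first; otherwise nodes that use it run first. `prioritize`, the node names, and the numbers are hypothetical.

```python
def prioritize(predicted_change, usage, limit):
    """Order blocking-graph nodes for scheduling.

    predicted_change maps node name -> expected change in resource usage
    when a thread at that node runs (negative means it frees the resource).
    """
    if usage >= limit:
        # At the limit: favor nodes whose threads are predicted to free it.
        return sorted(predicted_change, key=lambda n: predicted_change[n])
    # Headroom left: favor nodes that put the resource to use.
    return sorted(predicted_change, key=lambda n: -predicted_change[n])

predicted = {"accept": 3, "read": 1, "close": -4}  # e.g. file descriptors
overloaded = prioritize(predicted, usage=95, limit=90)
relaxed = prioritize(predicted, usage=40, limit=90)
print(overloaded, relaxed)
```

Threads inherit the priority of the node they are blocked at, so the same ordering applies to the run queue.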

18
Cost
  • Stack traces to locate blocking points
    • Overhead: 8% of exec time for Apache with
      Capriccio
    • 36% for Knot
  • Aggregating resource utilization figures
    • Overhead: 0.1%

19
Apache: spin polling on fd, blocking I/O calls
Haboob/Capriccio: poll() for network I/O,
kernel thread pool for disk I/O

20
Discussion