Title: Magpie: Profiling for Performance Analysis of Distributed Systems
1 Magpie: Profiling for Performance Analysis of Distributed Systems
- Rebecca Isaacs
- (joint work with Paul Barham)
- 4 July 2002
2 What is Magpie?
- A tool for characterising the workload of a distributed system based on detailed observations of system activity
- Online measurements are taken by a set of distributed profiling components
- System resource consumption is accounted to individual requests
- e.g. CPU, disk accesses and network bandwidth used by an HTTP request in a web server
- Offline processing of the recorded data derives a characterization of the system workload
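The idea of accounting resource consumption to individual requests can be sketched in a few lines. This is an illustration only: the record layout, the request ids and the resource names (cpu_ms, disk_reads, net_bytes) are assumptions made for the example, not Magpie's actual trace format.

```python
from collections import defaultdict

# Hypothetical observation records: (request_id, resource, amount).
# The field names and units are illustrative, not Magpie's trace format.
observations = [
    ("req-1", "cpu_ms", 12.0),
    ("req-2", "cpu_ms", 7.5),
    ("req-1", "disk_reads", 3),
    ("req-1", "net_bytes", 2048),
    ("req-2", "net_bytes", 1024),
    ("req-1", "cpu_ms", 4.0),
]


def account_per_request(records):
    """Sum each resource's consumption separately for every request id."""
    totals = defaultdict(lambda: defaultdict(float))
    for request_id, resource, amount in records:
        totals[request_id][resource] += amount
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {rid: dict(res) for rid, res in totals.items()}


usage = account_per_request(observations)
```

The resulting per-request totals are the kind of raw material that the offline processing step could then turn into a workload characterization.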
3 Motivation: Performance Modelling
- Goal is to derive a generative model of the system workload suitable for input to a performance modeller
- Scope (currently) is multi-tier server farms running .NET web sites
- Advantages of Magpie
- Acquire a workload description with less human effort than conventional benchmarking
- Extract a detailed model from a representative system
- Not just a long-term average across all transactions
- Measure with a realistic mix of transaction types
- Build a probabilistic model of the usage profile which includes hidden transaction types, e.g. error conditions
- Complex behaviour may not be easily observable manually, e.g. the web transaction type discriminator is not necessarily the URL
4 Profiling Components (1)
- Windows XP has efficient low-level event tracing built in to the kernel
- Perfinfo is a command-line tool for turning on or off tracing of specific system activities
- Magpie runs perfinfo on both servers to capture
- Context switches
- File IO
- Disk IO
- Network send and receive
- Process and thread creation and deletion
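One use of a context-switch trace is to work out how long each thread held the CPU. The sketch below assumes a single CPU and a simplified event format of (timestamp, thread switched in); the thread names and time units are made up for illustration and do not reflect the real kernel trace layout.

```python
from collections import defaultdict


def cpu_time_per_thread(switches, end_time):
    """Attribute CPU time to threads from a context-switch trace.

    switches: time-ordered list of (timestamp, thread_id), each meaning
    that thread_id was switched in at that timestamp (single CPU assumed).
    end_time: timestamp at which the trace stops.
    """
    totals = defaultdict(int)
    # Each thread runs from its switch-in until the next switch-in.
    for (start, tid), (next_start, _) in zip(switches, switches[1:]):
        totals[tid] += next_start - start
    # The last thread runs until the end of the trace.
    if switches:
        last_start, last_tid = switches[-1]
        totals[last_tid] += end_time - last_start
    return dict(totals)


# Hypothetical trace: timestamps in arbitrary ticks.
trace = [(0, "IIS.918"), (40, "idle"), (55, "IIS.9b4"), (90, "IIS.918")]
cpu = cpu_time_per_thread(trace, end_time=100)
```

Combined with a mapping from threads to request identifiers, intervals like these could be charged to the request a thread was serving at the time.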
5 Profiling Components (2)
- ISAPI filter
- DLL loaded into IIS (web server) process
- Filter registers with IIS to receive particular event notifications
- Can examine and modify both incoming and outgoing streams of data
- Magpie ISAPI filter
- Allocates a unique identifier to each incoming request and adds it to the HTTP header
- Records cycle counter and resource usage at entry and exit
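The tagging step can be illustrated with a small Python analogue. This is not the ISAPI API: the header name X-Magpie-Request-Id, the id format and the two hook functions are hypothetical, and time.perf_counter_ns stands in for the hardware cycle counter.

```python
import itertools
import time

REQUEST_ID_HEADER = "X-Magpie-Request-Id"  # hypothetical header name
_next_id = itertools.count(1)
log = []  # (request_id, phase, counter) records


def on_request_entry(headers):
    """Tag an incoming request with a fresh id and log the entry counter."""
    request_id = format(next(_next_id), "08x")
    tagged = dict(headers)  # leave the caller's headers untouched
    tagged[REQUEST_ID_HEADER] = request_id
    log.append((request_id, "entry", time.perf_counter_ns()))
    return tagged


def on_request_exit(headers):
    """Log the exit counter against the id carried in the header."""
    log.append((headers[REQUEST_ID_HEADER], "exit", time.perf_counter_ns()))


tagged_headers = on_request_entry({"Host": "example.test"})
on_request_exit(tagged_headers)
```

Because the identifier travels in the header, later components can pick it up and attach it to their own observations.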
6 Profiling Components (3)
- HTTP Module
- Part of ASP.NET
- Each request is processed by multiple HTTP modules, e.g. session, authentication etc.
- Magpie HTTP Module
- Stores request identifier in (managed) thread local state
- Records cycle counter, managed thread id and resource usage
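Keeping the request identifier in thread-local state means any code running on that thread can find out which request it is serving. The following is a Python analogue using threading.local, not ASP.NET code; the function names and request ids are invented for the example.

```python
import threading

_state = threading.local()  # one independent slot per thread


def begin_request(request_id):
    """Remember which request the current thread is serving."""
    _state.request_id = request_id


def current_request_id():
    """Return the current thread's request id, or None if unset."""
    return getattr(_state, "request_id", None)


results = {}


def worker(request_id):
    begin_request(request_id)
    # Code deeper in the call stack can recover the id without it
    # being passed as a parameter.
    results[threading.current_thread().name] = current_request_id()


threads = [
    threading.Thread(target=worker, args=(rid,), name=f"worker-{rid}")
    for rid in ("a1", "b2")
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each worker sees only its own identifier, and the main thread, which never called begin_request, sees none.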
7 Profiling Components (4)
- Common Language Runtime Profiling API
- Two COM interfaces
- Profiler implements the notifications API, e.g. function enter/leave, thread mapping, garbage collection
- Runtime implements an API which allows the profiler to get more information
- Magpie CLR Profiler
- Monitors CLR to OS thread mappings
- Records thread ids, cycle counter and resource usage
- Intercepts JIT compilation of relevant ADO.NET functions
- Inserts calls to profiling functions
- Modifies SQL stored procedure invocations
8 Profiling Components (5)
- SQL Profiler
- Logs selected events (can be user defined) to table or file
- Magpie SQL Profiling
- Wraps original stored procedures
- Runs extended stored procedure to get cycle counter and resource usage stats before and after executing original request
- Generates trace events before and after executing original request
- Recorded by the SQL Profiler in output trace
- Data includes request identifier, cycle counter and resource usage
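The wrapping pattern described for stored procedures (emit a trace event, run the original, emit another trace event) can be sketched with a Python decorator. This is only an analogy: it is not T-SQL, the procedure get_customer is hypothetical, and time.perf_counter_ns stands in for the cycle counter.

```python
import functools
import time

trace = []  # (request_id, procedure, phase, counter) records


def profiled(proc):
    """Wrap a procedure so trace events bracket the original call."""
    @functools.wraps(proc)
    def wrapper(request_id, *args, **kwargs):
        trace.append((request_id, proc.__name__, "before", time.perf_counter_ns()))
        try:
            return proc(*args, **kwargs)
        finally:
            # Emitted even if the original procedure raises.
            trace.append((request_id, proc.__name__, "after", time.perf_counter_ns()))
    return wrapper


@profiled
def get_customer(customer_id):
    # Hypothetical stand-in for an original stored procedure.
    return {"id": customer_id, "name": "example"}


# The wrapper takes the request identifier as an extra first argument.
row = get_customer("bad0019d", 42)
```

The extra leading argument mirrors the idea that the invocation is modified to carry the request identifier, so the trace events can be tied back to the originating request.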
9 Magpie Measurement Infrastructure
[Diagram: Client(s), Web Server(s) and SQL Server(s) (Stored Procs, DBMS, Cache), with a Kernel under each server. Annotations: Tag each request; Store request id in TLS; Modify SQL RPC; Wrap stored procs with profiling. Kernel events: context switches, disk and file IO, network send and receive.]
Observations are ordered by cycle counter
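Ordering observations by cycle counter amounts to merging several time-ordered event streams into one timeline. The sketch below uses heapq.merge and assumes that each stream is already sorted and that the counter values are directly comparable across sources; the event tuples are invented for illustration.

```python
import heapq

# Hypothetical per-source streams of (cycle_counter, source, event),
# each already sorted by cycle counter.
kernel_events = [(100, "kernel", "context switch"), (250, "kernel", "net send")]
iis_events = [(120, "isapi", "request entry"), (300, "isapi", "request exit")]
sql_events = [(180, "sql", "proc before"), (220, "sql", "proc after")]

# Lazily merge the sorted streams into a single ordered timeline.
timeline = list(
    heapq.merge(kernel_events, iis_events, sql_events, key=lambda e: e[0])
)
```

Merging sorted streams avoids re-sorting the whole trace, which matters when the individual logs are large.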
10 What really happens in a simple request?
[Diagram: Client, Web Server and SQL Server, with labelled messages: http://someurl.aspx, SQL request, data, web page.]
11 Magpie observations of CPU used by one request
[Figure: CPU usage by threads for request bad0019d. IIS threads panel: ASP.564, IIS.918, IIS.9b4, with time labels 39.65s and 40.15s. SQL threads panel: SQL.fa4, SQL.f5c, with time labels 38.32s and 38.68s.]
12 Models of the simple request
[Diagram: "Typically assumed structure" (IIS and SQL, with labels 20, 80 and 100) compared with "Actual structure observed by Magpie" (IIS and SQL).]
13 Simulation Case Study
- Compare SEQUENTIAL transaction with PIPELINED transaction
- Saturation test with 1000 requests
- Equal resource demands (22ms comp IIS, 20ms SQL, 3x1k net)

Single IIS Thread, Single SQL Thread
        RPS     Average Resp Time   IIS Util (%)   SQL Util (%)
SEQ     22.5    65ms                50             45
PIPE    37.8    28ms                84             76

2 IIS Threads, Single SQL Thread
        RPS     Average Resp Time   IIS Util (%)   SQL Util (%)
SEQ     30.7    74ms                68             62
PIPE    40.1    50ms                90             82
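The utilisation law (utilisation = throughput x service demand) gives a quick consistency check on these figures. The sketch assumes that 22ms and 20ms are the per-request CPU demands on IIS and SQL, and that the utilisation columns are percentages; both are readings of the slide rather than stated facts.

```python
IIS_DEMAND_S = 0.022  # assumed per-request IIS compute demand (22ms)
SQL_DEMAND_S = 0.020  # assumed per-request SQL compute demand (20ms)

# (configuration, transaction type) -> (RPS, IIS util, SQL util) from the slide.
reported = {
    ("1 IIS thread", "SEQ"): (22.5, 50, 45),
    ("1 IIS thread", "PIPE"): (37.8, 84, 76),
    ("2 IIS threads", "SEQ"): (30.7, 68, 62),
    ("2 IIS threads", "PIPE"): (40.1, 90, 82),
}


def utilisation_pct(throughput_rps, demand_s):
    """Utilisation law: U = X * D, expressed as a percentage."""
    return 100.0 * throughput_rps * demand_s


rows = []
for (config, kind), (rps, iis_util, sql_util) in reported.items():
    rows.append((
        config, kind,
        utilisation_pct(rps, IIS_DEMAND_S), iis_util,
        utilisation_pct(rps, SQL_DEMAND_S), sql_util,
    ))

# Largest difference between predicted and reported utilisation, in points.
max_gap = max(max(abs(r[2] - r[3]), abs(r[4] - r[5])) for r in rows)
```

Under these assumptions the predicted and reported utilisations agree to within two percentage points in every row, which supports reading the columns as percentages.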
14 Constructing Models with Machine Learning?
- Learn probabilistic models of resource usage by different request types
- Possibly apply coupled hidden Markov models?
[Diagram: Web and SQL state sequences along a time axis, with states including Compute, Disk IO, Receive Pkt, Waiting, Send Pkt, etc.]
15 Future Work
- Investigate ways of extracting models from the data, esp. machine learning
- Use Magpie to learn parameters in the live system in order to calibrate hardware device models (very speculative)
- Explore other types of distributed system, e.g. peer-to-peer