Title: SC2000 Tutorial
1 Tutorial on Using NetLogger Toolkit version 2.x
Dan Gunter Brian L. Tierney
Distributed Systems Department Lawrence Berkeley
National Laboratory
2Overview
- The Problem
- When building distributed systems, we often
observe unexpectedly low performance - the reasons for which are usually not obvious
- The bottlenecks can be in any of the following
components - the applications
- the operating systems
- the disks or network adapters on either the
sending or receiving host - the network switches and routers, and so on
- The Solution
- Highly instrumented systems with precision timing
information and analysis tools
3Bottleneck Analysis
- Distributed system users and developers often
assume the problem is network congestion - This is often not true
- In our experience tuning distributed
applications, performance problems are due to - network problems 40
- host problems 20
- application design problems/bugs 40
- 50 client , 50 server
- Therefore it is equally important to instrument
the applications
4NetLogger Toolkit
- We have developed the NetLogger Toolkit (short
for Networked Application Logger), which
includes - tools to make it easy for distributed
applications to log interesting events at every
critical point - tools for host and network monitoring
- The approach is novel in that it combines
network, host, and application-level monitoring
to provide a complete view of the entire system. - This has proven invaluable for
- isolating and correcting performance bottlenecks
- debugging distributed applications
5NetLogger Components
- NetLogger Toolkit contains the following
components - NetLogger message format
- NetLogger client library (C, Java, Python)
- NetLogger visualization tools
- NetLogger host/network monitoring tools
- Source code (BSD-like open source) and binaries
are available at - http//www-didc.lbl.gov/NetLogger/
- Additional critical component for distributed
applications - NTP (Network Time Protocol) or GPS host clock is
required to synchronize the clocks of all systems
6NetLogger Methodology
- NetLogger is both a methodology for analyzing
distributed systems, and a set of tools to help
implement the methodology. - You can use the NetLogger methodology without
using any of the LBNL provided tools. - The NetLogger methodology consists of the
following - All components must be instrumented to produce
monitoring These components include application
software, middleware, operating system, and
networks. The more components that are
instrumented the better. - All monitoring events must use a common format
and common set of attributes and a globally
synchronized timestamp - Log all of the following events Entering and
exiting any program or software component, and
begin/end of all IO (disk and network) - Collect all log data in a central location
- Use event correlation and visualization tools to
analyze the monitoring event logs
7When to use NetLogger
- When you want to
- do performance/bottleneck analysis on distributed
applications - determine which hardware components to upgrade to
alleviate bottlenecks - do real-time or post-mortem analysis of
applications - correlate application performance with system
information (ie TCP retransmission's) - works best with applications where you can follow
a specific item (data block, message, object)
through the system
8When NOT to use NetLogger
- Analyzing massively parallel programs (e.g. MPI)
- Current visualization tools dont scale beyond
tracking about 20 types of events at a time - However, NetLogger is useful for watch a few
nodes, and for collecting MPI log files from all
the nodes - Analyzing many very short events
- system will become overwhelmed if too many events
- we typically use NetLogger to monitor events that
take gt .5 ms - e.g probably dont want to use to instrument the
UNIX kernel
9NetLogger Client Library
- There are 2 versions of the client library
- Netlogger-lite only supports NetLogger format
(single .c and .h file) - Recommended for most use-cases
- C, Python (native or via SWIG), and Java (native
only) - NetLogger-full supports all formats, reconnect,
backup URLs, and more - C, Python (via SWIG), Java (via SWIG)
10NetLogger Event Format
- NetLogger supports the following message formats
- NetLogger format native ASCII format, easy to
read and parse, enforces data model - NetLogger binary for applications that generate
1000s of events / second - ULM for compatibility with NetLogger 1.0
- XML if you want to run very slow -)
11NetLogger Data Model
- NetLogger message format is designed to help
enforce not just a common format, but also a
common data model - Why bother with a data model?
- Makes it easier to write generic consumers of the
data for analysis, visualization - Relational databases need as much structure as
they can get - Elegance if we can describe a general principle,
we should!
12NetLogger Message Format (1)
- Each NetLogger message is called an event
- The event is subdivided into fields
- Each field has three parts, typecode, key, and
value - t DATE 2004-04-15T213601.425059
- Typecodes
- d Double (64 bits)
- i Integer (32 bits)
- l Long integer (64 bits)
- s String (up to 4K)
- t Timestamp (ISO8601 date format)
13NetLogger Message Format (2)
- The following fields are generated automatically
by NetLogger (all APIs) - DATE The time that this event occurred.
- LVL The logging level of this event.
- Fatal, Error, Warn, Info, Debug, Debug1, Debug2,
Debug3, or User. The default level is Info. - EVNT This field is the name of the event.
- Any number of additional fields are allowed
- Well, technically, 128 total is the limit
14NetLogger Message Format (3)
- Sample NetLogger event
- t DATE 2004-04-15T213601.425059
- s LVL Info
- s HOST foo.lbl.gov
- s PRGM test_program
- s EVNT SendData
- i Send.Size 49332
- This says program named test_program on host
foo.lbl.gov performed event named SendData with a
size of 49332 bytes, on April 15 at 936 PM
(UTC). - User-defined data elements are used to store
information about the logged event - for example - EVNT SendData / Send.Size 49332
- the number of bytes of data sent
- EVNT netstat.retranssegs / NS.RTS 2
- the number of TCP retransmits since the previous
event
15NetLogger APIs
- NetLogger Toolkit includes application libraries
for generating NetLogger messages - Can send log messages to
- file
- host/port (netlogd)
- syslogd
- C, Java, and Python APIs are currently supported
- Compatible with standard Java and Python logging
packages - Perl, TCL, etc.
- Easy to generate compatible format using string
print functions
16NetLogger C API
17C API deployment
- Designed to be lightweight
- One .c and one .h file
- To build from source, also need
netlogger_stdint.h for portability - lt 3000 lines of code, total
- Can just copy into your source code go
18NetLogger C API Design
Format a modules event at a given logging level
for a given target type
Format an event belonging to a module to a
stream using write() with variable arguments
public interface exists for each layer
Abstraction
Format a stream of records
Format a record into a string
19Basic C API concepts
- Module
- Name for a globally accessible output
destination. Multiple modules may be active at
any one time. - Logging levels
- Each event has a level, 0..9 are reserved, but
any posint is OK (appears as NL_USER in output) - Each module has a level, meaning only events at
this level of detail or lower levels of details
(henceforth just lower levels) will be written - Target types
- application, dbg, path, node or any
user - Determines additional auto-generated fields, such
as file name and line number for dbg - Determines required fields, such as SRC and DST
for path
20C API Example
- // create module named dbg
- NL_logger_module("dbg", NULL, NL_LVL_INFO,NL_TYPE_
DBG, "") - // write a loop.start event to the dbg module
- NL_debug("dbg","loop.start","starti stopi", 0,
10) - for ( ctr0 ctr lt 10 ctr )
- // wrap do_something() with some log events
- NL_debug1("dbg","do.something.start","inde
xi", ctr) - do_something()
- NL_debug1("dbg","do.something.end","index
i", ctr) -
- // write a loop.end event
- NL_debug("dbg","loop.end","starti stopi", 0,
10) - // close all log modules
- NL_logger_del()
21Sample output (1)
- If you compile and run this example, you will get
no output! - Reasons
- Destination for logging was specified as NULL
- Message level (debug) is more detailed than the
default message level for the module
(NL_LVL_INFO) - Both of these can be changed without even
re-compiling the application
gcc simple_example.c ./a.out huh?!
22C API, Dynamic Log Locations / Levels
- The environment variable NL_DEST provides a
default location for logging - NL_DESTx-netlog//some.remote.host15001
- Changing log levels on the fly is very useful for
debugging - Very simple to do this with C API
- NLCONFIGLEVEL/tmp/level
- echo 3 gt /tmp/level
- You can even do per-module levels
- NLCONFIGLEVEL/tmp/level
- echo 5 dbg\n3 gt /tmp/level
23C API Example output (2)
- Applying what we now know to the example
gcc simple_example.c export
NLCONFIGLEVEL/tmp/dbglevel echo 6 gt
/tmp/dbglevel export NL_DEST- stdout
./a.out t DATE 2004-09-29T224238.759067 s LVL
Debug s HOST 131.243.2.143 s TGT dbg s EVNT
loop.start s FILE example.c i LINE 9 s FN
main i start 0 i stop 10 t DATE
2004-09-29T224238.759268
s LVL Debug1 s HOST 131.243.2.143 s TGT dbg s
EVNT do.something.start s FILE example.c i
LINE 11 s FN main i index 0 t DATE
2004-09-29T224238.759326 s LVL Debug1 s HOST
131.243.2.143 s TGT dbg s EVNT
do.something.end s FILE example.c i LINE 13 s
FN main i index 0
etc..
24C API Performance
- To /dev/null
- 250,000 events/sec
- To a file in /tmp
- 95,000 events/sec
- These results were with the highest API layer.
Using the next layer down gives about a 10
speedup. - These tests were on a faster machine than the
Python and Java ones
25NetLogger Python API
26Python API Design
Python stdlib logging module
wraps
nllogging.py
nllite.so C library
calls
nllite.py
calls (optionally)
27Sample usage of nllogging.py
- import logging
- import sys
- import gov.lbl.dsd.netlogger.nllogging as \
nllogging - create a logging 'channel'
- logger logging.getLogger('sample')
- proxy Handler class w/ NLHandlerProxy
- logger.addHandler(nllogging.NLHandlerProxy(logging
.StreamHandler)) - write some messages
- logger.warn("this log message is ignored")
- for i in range(10)
- logger.warn("hi","foo"i,"bar"i/2.0))
28Sample usage of nllite.py
- import gov.lbl.dsd.netlogger.nllite as \ nllite
- get file object for NetLogger URL
- ofile nllite.urlfile("-")
- open log stream
- log nllite.LogOutputStream(ofile)
- write some messages
- for i in range(10) log.write("hi","foo"i,"ba
r"i/2.0, - levelnllite.Level.WARN)
29Python API Performance
logging to a file in /tmp
nllite.py API
nllogging.py API
Normal usage
pure Python
800
7000
25000
n/a
Python / C with SWIG
High-perf usage
Units events/second
30Choosing a Python API
- Use the nllogging API if you
- Want compatibility with existing instrumentation
- Dont log more than a few dozen events a second
from any one process - Want to use the log4j-style configuration files
built in to this interface - Use the nllite API if you
- Are sometimes logging hundreds of events a second
from a single process - Dont need log4j config files
- Want to write over a socket directly
- Need the C librarys dynamic levels
31NetLogger Java API
32Java API Design Goals
- Interoperate with log4j and commons logging APIs
- Logging calls should not take more than one or
two lines of code - Efficiency
33Java API Usage
- Java code
- public static logSomething(void)
- private static final org.apache.commons.logging.L
og log org.apache.commons.logging.LogFactory.ge
tLog(nl") - for (int i 0 i lt 10 i)
- log.debug(new LogMessage("dbg", "my.event").
add("index", i)) -
-
- Log4j.properties
- Send nl' messages to the NETLOGGER appender
- log4j.logger.nlDEBUG, NETLOGGER
- Append to file
- log4j.appender.NETLOGGERorg.apache.log4j.FileAppe
nder - log4j.appender.NETLOGGER.File/tmp/foo1
34Java API Performance
Format-only
Write to /tmp
New message every call
Normal case
45,000
90,000
Reuse message object
800,000
80,000
HotSpot test?
Unitsevents/sec
35Java API Deployment
netlogd (collect log files)
WAN
Log4J properties file
nlforward (send files to net)
LogMessage.java (nllite.jar)
disk
Will talk about this part next
APP
36Using NetLogger for Distributed Applications
37Recommended approach
netlogd (collect log files)
WAN
node
N
38Why not just use a socket?
- Distributed communication is unreliable
- Putting reconnect and buffering logic into the
instrumentation library is not a good idea - Bloats and complicates the code
- Cleanup is a nightmare
- Who deals with the file if the app crashes?
- Separating it out has many advantages
- Writing log files will have almost no impact on
application - No chance that network congestion will cause
application to block - Logged events will eventually get sent to
netlogd, even if program exits or crashes
39nlforward
- Scan a directory for files matching some glob or
regex pattern, and forward them to a NetLogger
URL (file, socket) - Scan interval is configurable to reduce load
- Can optionally age out files that havent been
modified in some time period - If you want, can even remove them!
- Example
- nlforward -e 600 -s 5 ./'.log' -p fwd.state
x-netlog//some.host14380 - Forward all files ending in .log to
some.host14380, scanning every 5 sec for
changes and timing out files after 10 min.
40netlogd
- Use netlogd to collect NetLogger messages at a
central host - use to avoid the need to sort/merge several log
files from several places
Network 3
Network 2
Network 1
netlogd
NetLogger data
41NetLogger Event Life Lines
42Event ID
- In order to associate a group of events into a
lifeline, you must assign an event ID to each
NetLogger event - Sample Event Ids
- file name
- block ID
- frame ID
- user name
- host name
- combination of the above
- etc.
43Sample NetLogger Use with Event IDs
- include nl_log.h
- main()
- int i0, size 65536, num_blocks 100
- NL_logger_module(foo",NULL, NL_LVL_INFO,
NL_TYPE_APP, "") - for ( i0 i lt num_blocks i )
- / read block of data)
- NL_write(INFO, foo",read.start","indexi
read.sizei", i, size) - read_data(i)
- NL_write(INFO, foo",read.end","indexi
read.sizei", i, size) - / process block of data /
- NL_write(INFO, foo",process.start","indexi",i
,) - process_data(i)
- NL_write(INFO, foo",read.start","indexi
read.sizei", i, size) - / write results /
- NL_write(INFO, foo",write.start","indexi
write.sizei", i, size) - send_data(i)
- NL_write(INFO, foo",write.start","indexi
write.sizei", i, size) -
44NetLogger Host/Network Tools
- Wrapped UNIX network and OS monitoring tools to
log interesting events using the same log
format - netstat (TCP retransmissions, etc.)
- vmstat (system load, available memory, etc.)
- iostat (disk activity)
- ping
- These tools have been wrapped with python
programs which - parse the output of the system utility
- build NetLogger messages containing the results
- Also have a NetLogger wrapper for Ganglia
45Sample NetLogger System Monitoring Tool
- Example nl_vmstat -t 60 -d 5000 -m 2
logger.lbl.gov - program will run vmstat every 5 seconds for 1
hour, and send the results to netlogd on host
logger.lbl.gov - Generates the following information
- CPU usage by User
- CPU usage by System
46NetLogger Visualization Tools
- Exploratory, interactive analysis of the log data
has proven to be the most important means of
identifying problems - this is provided by nlv (NetLogger Visualization)
- nlv functionality
- can display several types of NetLogger events at
once - user configurable which events to plot, and the
type of plot to draw (lifeline, load-line, or
point) - play, pause, rewind, slow motion, zoom in/out,
and so on - nlv can be run post-mortem or in real-time
- real-time mode done by reading the output of
netlogd as it is being written
47NLV Graph Types
- nlv supports graphing of points, load-lines,
and lifelines
48NLV Screenshot
Menu bar
Title
Scale for load-line/ points
Events
Zoom box
Time axis
Max window size
Legend
Window size
Playback speed
Zoom-box actions
Summary line
Playback controls
Zoom window controls
You are here
49NLV Configuration
- NLV is very flexible, with many options settable
in the configuration file. - Format
- eventset /-eventset_name
- type ltline,point,loadgt
- id list of ULM field names used to determine
which NetLogger messages get grouped into the
same graph primitive - group list of ULM field names which will be
mapped to the same color - val field_name min_val max_val
- annotate list of field names to display in
with annotate option - events list of all event IDs in this
lifeline -
- Each nlv graph object needs to be defined by an
eventset - Events and event-sets both use "" and "-" to
indicate default (i.e. on startup) visibility
50Example NLV Configuration
- display vmstat info as a loadline
- eventset VMSTAT
- type load
- loadline constructed from messages with the
same HOST and NL.EVNT - id HOST NL.EVNT
- messages with the same HOST get the same color
- group HOST
- list of NL.EVNT values in this set_
- events VMSTAT_SYS_TIME VMSTAT_USER_TIME
-
- display netstat TCP retransmits as a point
- eventset NETSTAT
- type point
- truncate values outside the range 0 to 999
- val NS.VAL 0.0 999.0
- point constructed from messages from the same
HOST and PROG - id HOST PROG
- messages with the same HOST get the same color
51Example NLV Configuration
- display server data as a lifeline
- eventset SERVER_READ
- type line
- lifeline constructed from messages from the
same client and server - id CLIENT_HOST DPSS.SERV
- messages with the same DPSS.SERV get the same
color - group DPSS.SERV
- events
- APP_SENT
- DPSS_SERV_IN
- DPSS_START_READ
- DPSS_END_READ
- DPSS_START_WRITE
- APP_RECEIVE
52How to Instrument Your Application
- Youll probably want to add a NetLogger event to
the following places in your distributed
application - before and after all disk I/O
- before and after all network I/O
- entering and leaving each distributed component
- before and after any significant computation
- e.g. an FFT operation
- before and after any significant graphics call
- e.g. certain CPU intensive OpenGL calls
- This is usually an iterative process
- add more NetLogger events as you zero in on the
bottleneck
53Does NetLogger affect application performance?
- Only if you use it incorrectly or log too much
- There are several things to be careful of when
doing this type of monitoring - If logging to disk, dont log to a nfs mounted
disk - best to log to /tmp, which may actually be RAM
(Solaris) - Dont send log messages to a slow (i.e. 10BT) or
congested network, as youll just make it worse - log to a local file instead
- Sample NetLoggerWrite Performance 100000
calls/sec - can make 1000 NetLoggerWrite calls / sec and only
effect your application by 1
54NetLogger Case Studies
55Example HPSS Storage Manager Application
- NetLogger was used to test and verify the results
of a Storage Access Coordination System (STACS)
by LBNLs Data Management Group - STACS is designed to optimize the use of a disk
cache with an HPSS Mass Storage system, and tries
to minimize tape mount requests by clustering
related data on the same tape - NetLogger was used to look at
- per-query latencies
- to show that subsequent fetches of spatially
clustered data "hit" in the cache. - (http//gizmo.lbl.gov/sm/)
56STACS Instrumentation Points
Monitoring Points A) request arrives at HPSS B)
start transfer from tape C) tape transfer
finished D) file available to client E) file
retrieved by client F) file released by client
57NLV for STACS Tracking File Requests
58Tracking Files and System Performance
59Example Parallel Data Block Server
- The Distributed Parallel Storage Server (DPSS)
- provides high-speed parallel access to remote
data - Unique features of the DPSS
- On a high-speed network, can actually access
remote data faster that from a local disk - 70 MB/sec (DPSS) vs 22 MB/sec (local disk)
- Only need to send parts of the file currently
required over the network - e.g. client may only need 100 MB from a 2 GB
data set - analogous to http model
- NetLogger was used for performance tuning and
debugging of the DPSS
60DPSS Cache Architecture
61NetLogger Results for the DPSS
62NetLogger Results for the DPSS
63NetLogger Results for the DPSS over a WAN
64Example NLV of DPSS with a HENP client
65Example Babar data analysis 2 nodes with
Objectivity Error
66Example Matisse Project
67Example Combined Host and Application Monitoring
68Future directions
- Data queries
- Data grouping
- Discovery queries
- Statistical transformations
Subject, Operation, Parameters
Metadata ref Data
Monitoring service
Metadata refs
Host, network sensors
Instrumentation
Metadata can be inline or by reference
69 Getting NetLogger
- Source code and binaries are available at
- http//dsd.lbl.gov/NetLogger/Toolkit/
- Code should run on all Unix platforms
- Tested under Linux, Solaris, FreeBSD, and Mac OSX
- Python and Java code should even run under
(gasp!) Windows - Were not Windows developers, so some tweaks may
be necessary
70For More Information
- Email
- Dan Gunter, dkgunter_at_lbl.gov
- Brian Tierney, bltierney_at_lbl.gov
- http//dsd.lbl.gov/netlogger-toolkit/
- New distribution combining netlogger-full and
netlogger-lite - Subversion repo. on TO-DO list!
- http//dsd.lbl.gov/nlv/
- NetLogger Visualization tool
- http//dsd.lbl.gov/NetLogger/
- Older netlogger-full web pages, still lots of
useful stuff