SC2000 Tutorial

About This Presentation

Title:

SC2000 Tutorial

Description:

When building distributed systems, we often observe unexpectedly low performance ... the disks or network adapters on either the sending or receiving host ... – PowerPoint PPT presentation

Number of Views:70

Avg rating:3.0/5.0

Slides: 71

Provided by: briant2

Category:

more less

Transcript and Presenter's Notes

Title: SC2000 Tutorial

1

Tutorial on Using NetLogger Toolkit version 2.x
Dan Gunter Brian L. Tierney
Distributed Systems Department Lawrence Berkeley
National Laboratory
2
Overview

The Problem
When building distributed systems, we often
observe unexpectedly low performance
the reasons for which are usually not obvious
The bottlenecks can be in any of the following
components
the applications
the operating systems
the disks or network adapters on either the
sending or receiving host
the network switches and routers, and so on
The Solution
Highly instrumented systems with precision timing
information and analysis tools

3
Bottleneck Analysis

Distributed system users and developers often
assume the problem is network congestion
This is often not true
In our experience tuning distributed
applications, performance problems are due to
network problems 40
host problems 20
application design problems/bugs 40
50 client , 50 server
Therefore it is equally important to instrument
the applications

4
NetLogger Toolkit

We have developed the NetLogger Toolkit (short
for Networked Application Logger), which
includes
tools to make it easy for distributed
applications to log interesting events at every
critical point
tools for host and network monitoring
The approach is novel in that it combines
network, host, and application-level monitoring
to provide a complete view of the entire system.
This has proven invaluable for
isolating and correcting performance bottlenecks
debugging distributed applications

5
NetLogger Components

NetLogger Toolkit contains the following
components
NetLogger message format
NetLogger client library (C, Java, Python)
NetLogger visualization tools
NetLogger host/network monitoring tools
Source code (BSD-like open source) and binaries
are available at
http//www-didc.lbl.gov/NetLogger/
Additional critical component for distributed
applications
NTP (Network Time Protocol) or GPS host clock is
required to synchronize the clocks of all systems

6
NetLogger Methodology

NetLogger is both a methodology for analyzing
distributed systems, and a set of tools to help
implement the methodology.
You can use the NetLogger methodology without
using any of the LBNL provided tools.
The NetLogger methodology consists of the
following
All components must be instrumented to produce
monitoring These components include application
software, middleware, operating system, and
networks. The more components that are
instrumented the better.
All monitoring events must use a common format
and common set of attributes and a globally
synchronized timestamp
Log all of the following events Entering and
exiting any program or software component, and
begin/end of all IO (disk and network)
Collect all log data in a central location
Use event correlation and visualization tools to
analyze the monitoring event logs

7
When to use NetLogger

When you want to
do performance/bottleneck analysis on distributed
applications
determine which hardware components to upgrade to
alleviate bottlenecks
do real-time or post-mortem analysis of
applications
correlate application performance with system
information (ie TCP retransmission's)
works best with applications where you can follow
a specific item (data block, message, object)
through the system

8
When NOT to use NetLogger

Analyzing massively parallel programs (e.g. MPI)
Current visualization tools dont scale beyond
tracking about 20 types of events at a time
However, NetLogger is useful for watch a few
nodes, and for collecting MPI log files from all
the nodes
Analyzing many very short events
system will become overwhelmed if too many events
we typically use NetLogger to monitor events that
take gt .5 ms
e.g probably dont want to use to instrument the
UNIX kernel

9
NetLogger Client Library

There are 2 versions of the client library
Netlogger-lite only supports NetLogger format
(single .c and .h file)
Recommended for most use-cases
C, Python (native or via SWIG), and Java (native
only)
NetLogger-full supports all formats, reconnect,
backup URLs, and more
C, Python (via SWIG), Java (via SWIG)

10
NetLogger Event Format

NetLogger supports the following message formats
NetLogger format native ASCII format, easy to
read and parse, enforces data model
NetLogger binary for applications that generate
1000s of events / second
ULM for compatibility with NetLogger 1.0
XML if you want to run very slow -)

11
NetLogger Data Model

NetLogger message format is designed to help
enforce not just a common format, but also a
common data model
Why bother with a data model?
Makes it easier to write generic consumers of the
data for analysis, visualization
Relational databases need as much structure as
they can get
Elegance if we can describe a general principle,
we should!

12
NetLogger Message Format (1)

Each NetLogger message is called an event
The event is subdivided into fields
Each field has three parts, typecode, key, and
value
t DATE 2004-04-15T213601.425059
Typecodes
d Double (64 bits)
i Integer (32 bits)
l Long integer (64 bits)
s String (up to 4K)
t Timestamp (ISO8601 date format)

13
NetLogger Message Format (2)

The following fields are generated automatically
by NetLogger (all APIs)
DATE The time that this event occurred.
LVL The logging level of this event.
Fatal, Error, Warn, Info, Debug, Debug1, Debug2,
Debug3, or User. The default level is Info.
EVNT This field is the name of the event.
Any number of additional fields are allowed
Well, technically, 128 total is the limit

14
NetLogger Message Format (3)

Sample NetLogger event
t DATE 2004-04-15T213601.425059
s LVL Info
s HOST foo.lbl.gov
s PRGM test_program
s EVNT SendData
i Send.Size 49332
This says program named test_program on host
foo.lbl.gov performed event named SendData with a
size of 49332 bytes, on April 15 at 936 PM
(UTC).
User-defined data elements are used to store
information about the logged event - for example
EVNT SendData / Send.Size 49332
the number of bytes of data sent
EVNT netstat.retranssegs / NS.RTS 2
the number of TCP retransmits since the previous
event

15
NetLogger APIs

NetLogger Toolkit includes application libraries
for generating NetLogger messages
Can send log messages to
file
host/port (netlogd)
syslogd
C, Java, and Python APIs are currently supported
Compatible with standard Java and Python logging
packages
Perl, TCL, etc.
Easy to generate compatible format using string
print functions

16
NetLogger C API
17
C API deployment

Designed to be lightweight
One .c and one .h file
To build from source, also need
netlogger_stdint.h for portability
lt 3000 lines of code, total
Can just copy into your source code go

18
NetLogger C API Design
Format a modules event at a given logging level
for a given target type
Format an event belonging to a module to a
stream using write() with variable arguments
public interface exists for each layer
Abstraction
Format a stream of records
Format a record into a string
19
Basic C API concepts

Module
Name for a globally accessible output
destination. Multiple modules may be active at
any one time.
Logging levels
Each event has a level, 0..9 are reserved, but
any posint is OK (appears as NL_USER in output)
Each module has a level, meaning only events at
this level of detail or lower levels of details
(henceforth just lower levels) will be written
Target types
application, dbg, path, node or any
user
Determines additional auto-generated fields, such
as file name and line number for dbg
Determines required fields, such as SRC and DST
for path

20
C API Example

// create module named dbg
NL_logger_module("dbg", NULL, NL_LVL_INFO,NL_TYPE_
DBG, "")
// write a loop.start event to the dbg module
NL_debug("dbg","loop.start","starti stopi", 0,
10)
for ( ctr0 ctr lt 10 ctr )
// wrap do_something() with some log events
NL_debug1("dbg","do.something.start","inde
xi", ctr)
do_something()
NL_debug1("dbg","do.something.end","index
i", ctr)
// write a loop.end event
NL_debug("dbg","loop.end","starti stopi", 0,
10)
// close all log modules
NL_logger_del()

21
Sample output (1)

If you compile and run this example, you will get
no output!
Reasons
Destination for logging was specified as NULL
Message level (debug) is more detailed than the
default message level for the module
(NL_LVL_INFO)
Both of these can be changed without even
re-compiling the application

gcc simple_example.c ./a.out huh?!
22
C API, Dynamic Log Locations / Levels

The environment variable NL_DEST provides a
default location for logging
NL_DESTx-netlog//some.remote.host15001
Changing log levels on the fly is very useful for
debugging
Very simple to do this with C API
NLCONFIGLEVEL/tmp/level
echo 3 gt /tmp/level
You can even do per-module levels
NLCONFIGLEVEL/tmp/level
echo 5 dbg\n3 gt /tmp/level

23
C API Example output (2)

Applying what we now know to the example

gcc simple_example.c export
NLCONFIGLEVEL/tmp/dbglevel echo 6 gt
/tmp/dbglevel export NL_DEST- stdout
./a.out t DATE 2004-09-29T224238.759067 s LVL
Debug s HOST 131.243.2.143 s TGT dbg s EVNT
loop.start s FILE example.c i LINE 9 s FN
main i start 0 i stop 10 t DATE
2004-09-29T224238.759268
s LVL Debug1 s HOST 131.243.2.143 s TGT dbg s
EVNT do.something.start s FILE example.c i
LINE 11 s FN main i index 0 t DATE
2004-09-29T224238.759326 s LVL Debug1 s HOST
131.243.2.143 s TGT dbg s EVNT
do.something.end s FILE example.c i LINE 13 s
FN main i index 0
etc..
24
C API Performance

To /dev/null
250,000 events/sec
To a file in /tmp
95,000 events/sec
These results were with the highest API layer.
Using the next layer down gives about a 10
speedup.
These tests were on a faster machine than the
Python and Java ones

25
NetLogger Python API
26
Python API Design
Python stdlib logging module
wraps
nllogging.py
nllite.so C library
calls
nllite.py
calls (optionally)
27
Sample usage of nllogging.py

import logging
import sys
import gov.lbl.dsd.netlogger.nllogging as \
nllogging
create a logging 'channel'
logger logging.getLogger('sample')
proxy Handler class w/ NLHandlerProxy
logger.addHandler(nllogging.NLHandlerProxy(logging
.StreamHandler))
write some messages
logger.warn("this log message is ignored")
for i in range(10)
logger.warn("hi","foo"i,"bar"i/2.0))

28
Sample usage of nllite.py

import gov.lbl.dsd.netlogger.nllite as \ nllite
get file object for NetLogger URL
ofile nllite.urlfile("-")
open log stream
log nllite.LogOutputStream(ofile)
write some messages
for i in range(10) log.write("hi","foo"i,"ba
r"i/2.0,
levelnllite.Level.WARN)

29
Python API Performance
logging to a file in /tmp
nllite.py API
nllogging.py API
Normal usage
pure Python
800
7000
25000
n/a
Python / C with SWIG
High-perf usage
Units events/second
30
Choosing a Python API

Use the nllogging API if you
Want compatibility with existing instrumentation
Dont log more than a few dozen events a second
from any one process
Want to use the log4j-style configuration files
built in to this interface
Use the nllite API if you
Are sometimes logging hundreds of events a second
from a single process
Dont need log4j config files
Want to write over a socket directly
Need the C librarys dynamic levels

31
NetLogger Java API
32
Java API Design Goals

Interoperate with log4j and commons logging APIs
Logging calls should not take more than one or
two lines of code
Efficiency

33
Java API Usage

Java code
public static logSomething(void)
private static final org.apache.commons.logging.L
og log org.apache.commons.logging.LogFactory.ge
tLog(nl")
for (int i 0 i lt 10 i)
log.debug(new LogMessage("dbg", "my.event").
add("index", i))
Log4j.properties
Send nl' messages to the NETLOGGER appender
log4j.logger.nlDEBUG, NETLOGGER
Append to file
log4j.appender.NETLOGGERorg.apache.log4j.FileAppe
nder
log4j.appender.NETLOGGER.File/tmp/foo1

34
Java API Performance
Format-only
Write to /tmp
New message every call
Normal case
45,000
90,000
Reuse message object
800,000
80,000
HotSpot test?
Unitsevents/sec
35
Java API Deployment
netlogd (collect log files)
WAN
Log4J properties file
nlforward (send files to net)
LogMessage.java (nllite.jar)
disk
Will talk about this part next
APP
36
Using NetLogger for Distributed Applications
37
Recommended approach
netlogd (collect log files)
WAN
node
N
38
Why not just use a socket?

Distributed communication is unreliable
Putting reconnect and buffering logic into the
instrumentation library is not a good idea
Bloats and complicates the code
Cleanup is a nightmare
Who deals with the file if the app crashes?
Separating it out has many advantages
Writing log files will have almost no impact on
application
No chance that network congestion will cause
application to block
Logged events will eventually get sent to
netlogd, even if program exits or crashes

39
nlforward

Scan a directory for files matching some glob or
regex pattern, and forward them to a NetLogger
URL (file, socket)
Scan interval is configurable to reduce load
Can optionally age out files that havent been
modified in some time period
If you want, can even remove them!
Example
nlforward -e 600 -s 5 ./'.log' -p fwd.state
x-netlog//some.host14380
Forward all files ending in .log to
some.host14380, scanning every 5 sec for
changes and timing out files after 10 min.

40
netlogd

Use netlogd to collect NetLogger messages at a
central host
use to avoid the need to sort/merge several log
files from several places

Network 3
Network 2
Network 1
netlogd
NetLogger data
41
NetLogger Event Life Lines
42
Event ID

In order to associate a group of events into a
lifeline, you must assign an event ID to each
NetLogger event
Sample Event Ids
file name
block ID
frame ID
user name
host name
combination of the above
etc.

43
Sample NetLogger Use with Event IDs

include nl_log.h
main()
int i0, size 65536, num_blocks 100
NL_logger_module(foo",NULL, NL_LVL_INFO,
NL_TYPE_APP, "")
for ( i0 i lt num_blocks i )
/ read block of data)
NL_write(INFO, foo",read.start","indexi
read.sizei", i, size)
read_data(i)
NL_write(INFO, foo",read.end","indexi
read.sizei", i, size)
/ process block of data /
NL_write(INFO, foo",process.start","indexi",i
,)
process_data(i)
NL_write(INFO, foo",read.start","indexi
read.sizei", i, size)
/ write results /
NL_write(INFO, foo",write.start","indexi
write.sizei", i, size)
send_data(i)
NL_write(INFO, foo",write.start","indexi
write.sizei", i, size)

44
NetLogger Host/Network Tools

Wrapped UNIX network and OS monitoring tools to
log interesting events using the same log
format
netstat (TCP retransmissions, etc.)
vmstat (system load, available memory, etc.)
iostat (disk activity)
ping
These tools have been wrapped with python
programs which
parse the output of the system utility
build NetLogger messages containing the results
Also have a NetLogger wrapper for Ganglia

45
Sample NetLogger System Monitoring Tool

Example nl_vmstat -t 60 -d 5000 -m 2
logger.lbl.gov
program will run vmstat every 5 seconds for 1
hour, and send the results to netlogd on host
logger.lbl.gov
Generates the following information
CPU usage by User
CPU usage by System

46
NetLogger Visualization Tools

Exploratory, interactive analysis of the log data
has proven to be the most important means of
identifying problems
this is provided by nlv (NetLogger Visualization)
nlv functionality
can display several types of NetLogger events at
once
user configurable which events to plot, and the
type of plot to draw (lifeline, load-line, or
point)
play, pause, rewind, slow motion, zoom in/out,
and so on
nlv can be run post-mortem or in real-time
real-time mode done by reading the output of
netlogd as it is being written

47
NLV Graph Types

nlv supports graphing of points, load-lines,
and lifelines

48
NLV Screenshot
Menu bar
Title
Scale for load-line/ points
Events
Zoom box
Time axis
Max window size
Legend
Window size
Playback speed
Zoom-box actions
Summary line
Playback controls
Zoom window controls
You are here
49
NLV Configuration

NLV is very flexible, with many options settable
in the configuration file.
Format
eventset /-eventset_name
type ltline,point,loadgt
id list of ULM field names used to determine
which NetLogger messages get grouped into the
same graph primitive
group list of ULM field names which will be
mapped to the same color
val field_name min_val max_val
annotate list of field names to display in
with annotate option
events list of all event IDs in this
lifeline
Each nlv graph object needs to be defined by an
eventset
Events and event-sets both use "" and "-" to
indicate default (i.e. on startup) visibility

50
Example NLV Configuration

display vmstat info as a loadline
eventset VMSTAT
type load
loadline constructed from messages with the
same HOST and NL.EVNT
id HOST NL.EVNT
messages with the same HOST get the same color
group HOST
list of NL.EVNT values in this set_
events VMSTAT_SYS_TIME VMSTAT_USER_TIME
display netstat TCP retransmits as a point
eventset NETSTAT
type point
truncate values outside the range 0 to 999
val NS.VAL 0.0 999.0
point constructed from messages from the same
HOST and PROG
id HOST PROG
messages with the same HOST get the same color

51
Example NLV Configuration

display server data as a lifeline
eventset SERVER_READ
type line
lifeline constructed from messages from the
same client and server
id CLIENT_HOST DPSS.SERV
messages with the same DPSS.SERV get the same
color
group DPSS.SERV
events
APP_SENT
DPSS_SERV_IN
DPSS_START_READ
DPSS_END_READ
DPSS_START_WRITE
APP_RECEIVE

52
How to Instrument Your Application

Youll probably want to add a NetLogger event to
the following places in your distributed
application
before and after all disk I/O
before and after all network I/O
entering and leaving each distributed component
before and after any significant computation
e.g. an FFT operation
before and after any significant graphics call
e.g. certain CPU intensive OpenGL calls
This is usually an iterative process
add more NetLogger events as you zero in on the
bottleneck

53
Does NetLogger affect application performance?

Only if you use it incorrectly or log too much
There are several things to be careful of when
doing this type of monitoring
If logging to disk, dont log to a nfs mounted
disk
best to log to /tmp, which may actually be RAM
(Solaris)
Dont send log messages to a slow (i.e. 10BT) or
congested network, as youll just make it worse
log to a local file instead
Sample NetLoggerWrite Performance 100000
calls/sec
can make 1000 NetLoggerWrite calls / sec and only
effect your application by 1

54
NetLogger Case Studies
55
Example HPSS Storage Manager Application

NetLogger was used to test and verify the results
of a Storage Access Coordination System (STACS)
by LBNLs Data Management Group
STACS is designed to optimize the use of a disk
cache with an HPSS Mass Storage system, and tries
to minimize tape mount requests by clustering
related data on the same tape
NetLogger was used to look at
per-query latencies
to show that subsequent fetches of spatially
clustered data "hit" in the cache.
(http//gizmo.lbl.gov/sm/)

56
STACS Instrumentation Points
Monitoring Points A) request arrives at HPSS B)
start transfer from tape C) tape transfer
finished D) file available to client E) file
retrieved by client F) file released by client
57
NLV for STACS Tracking File Requests
58
Tracking Files and System Performance
59
Example Parallel Data Block Server

The Distributed Parallel Storage Server (DPSS)
provides high-speed parallel access to remote
data
Unique features of the DPSS
On a high-speed network, can actually access
remote data faster that from a local disk
70 MB/sec (DPSS) vs 22 MB/sec (local disk)
Only need to send parts of the file currently
required over the network
e.g. client may only need 100 MB from a 2 GB
data set
analogous to http model
NetLogger was used for performance tuning and
debugging of the DPSS

60
DPSS Cache Architecture
61
NetLogger Results for the DPSS
62
NetLogger Results for the DPSS
63
NetLogger Results for the DPSS over a WAN
64
Example NLV of DPSS with a HENP client
65
Example Babar data analysis 2 nodes with
Objectivity Error
66
Example Matisse Project
67
Example Combined Host and Application Monitoring
68
Future directions

Data queries
Data grouping
Discovery queries
Statistical transformations

Subject, Operation, Parameters
Metadata ref Data
Monitoring service
Metadata refs
Host, network sensors
Instrumentation
Metadata can be inline or by reference
69
Getting NetLogger