A Mobile-Agent-Based Performance-Monitoring System at RHIC

1
A Mobile-Agent-Based Performance-Monitoring
System at RHIC

Richard Ibbotson

2
Overview

Motivation for a new monitoring system
Design of the Instrumentation system
Use of mobile agents (mobile programs vs remote
procedures)
How it works, what it does and doesn't do
Practical experiences with a test instrument
What works well and what doesn't
Future enhancements

3
Monitoring System Purpose

The system should
Provide performance monitoring at service-level
End-to-end tests yielding mixed information on
the functioning of several services
Track performance changes during configuration
changes
Monitor current health of system
Provide some error-tracking/reporting
capabilities
Be a tool for administrators and experimenters
It will not
Provide detailed system information for fault
diagnosis (system-specific, vendor-supplied tools
already exist)

4
Desired Features of the System

View / compare past and current measurements
Inspect correlations between metrics
Allow variation of sampling rate
Automatically execute scheduled measurements
Perform measurements on demand at shorter
intervals
Perform OS-independent measurements
Use a small fraction of available resources

5
Components of the System

Instruments which perform measurements
Centralized database of Instruments (code) and
time-stamped results
Allows simple addition of new metrics
Allows previously run tests to be reproduced
Mechanism for remote execution of Instruments
IBM Aglets mobile-agent system
(http://www.trl.ibm.co.jp/aglets)

[Diagram labels: parameters, code, monitor, sequence of measurements]
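As a rough illustration of the centralized database on this slide, the sketch below keeps Instrument code and time-stamped results in SQLite. The slides do not give the real schema or database engine, so every table, column, and function name here (instruments, results, create_store, record_result, history) is an assumption made for illustration.

```python
import sqlite3
import time


def create_store(path=":memory:"):
    """Open a hypothetical store for Instrument code and time-stamped results."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS instruments (
            name TEXT PRIMARY KEY,   -- assumed identifier of the Instrument
            code TEXT NOT NULL       -- source of the Instrument, kept for reproducibility
        );
        CREATE TABLE IF NOT EXISTS results (
            instrument  TEXT NOT NULL REFERENCES instruments(name),
            params      TEXT NOT NULL,  -- parameter set used for the run
            host        TEXT NOT NULL,  -- target host that was measured
            measured_at REAL NOT NULL,  -- UNIX timestamp of the measurement
            value       REAL NOT NULL   -- measured quantity, e.g. seconds
        );
    """)
    return db


def add_instrument(db, name, code):
    """Register a new metric by storing its code centrally."""
    db.execute("INSERT INTO instruments VALUES (?, ?)", (name, code))
    db.commit()


def record_result(db, instrument, params, host, value, measured_at=None):
    """Store one time-stamped measurement."""
    if measured_at is None:
        measured_at = time.time()
    db.execute(
        "INSERT INTO results VALUES (?, ?, ?, ?, ?)",
        (instrument, params, host, measured_at, value),
    )
    db.commit()


def history(db, instrument, host):
    """Return (timestamp, value) pairs in time order for one Instrument and host."""
    rows = db.execute(
        "SELECT measured_at, value FROM results "
        "WHERE instrument = ? AND host = ? ORDER BY measured_at",
        (instrument, host),
    )
    return rows.fetchall()
```

Storing the code next to the results is what would let a previously run test be reproduced later with the same parameters.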
6
Mobile Agents vs. RPC

Remote Procedure Call

A pre-defined procedure on the remote host executes
and returns the result
[Diagram labels: User's system, Remote system, Search request, Local search utility, Dataset to search]

Mobile Agent

Daemon on the remote host accepts the agent and allows
execution
Increased network load for large agents
[Diagram labels: User's system, Remote system, Search request, Local search utility, Dataset to search]
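The contrast on this slide can be mimicked in a single process. The toy below is not the Aglets API: RemoteHost, rpc_search, accept_agent, and AGENT_SOURCE are hypothetical names, and shipping code is imitated by passing source text to exec. It only shows that RPC is limited to procedures installed in advance, while an agent carries its own logic to the data.

```python
class RemoteHost:
    """Toy stand-in for a remote system holding a dataset (hypothetical)."""

    def __init__(self, dataset):
        self.dataset = dataset

    def rpc_search(self, term):
        # RPC style: only this pre-defined procedure can be invoked remotely.
        return [item for item in self.dataset if term in item]

    def accept_agent(self, agent_source):
        # Mobile-agent style: a daemon accepts code sent by the caller and
        # runs it next to the data. There is no sandboxing in this sketch;
        # a real agent system restricts what the agent may do.
        namespace = {}
        exec(agent_source, namespace)
        return namespace["run"](self.dataset)


# Logic written by the sender, after the remote host was deployed.
AGENT_SOURCE = '''
def run(dataset):
    return sorted(item.upper() for item in dataset if item.startswith("nfs"))
'''
```

The agent's source travels over the network with each dispatch, which is the extra load the slide notes for large agents.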
7
Advantages of Mobile Agents

Metrics can be defined at any time, and
implemented on the central host
Performance is measured on the relevant host
Aglets system is Java-based, providing
platform-independent execution
Sophisticated security model exists for
restricting actions of the agents

8
Use of Mobile Agents In Monitoring

The simplest approach, Single-Remote-Host, was
implemented for the initial configuration
Waiting between tests is done on the central server
for reliability

[Diagram labels: Itinerary approach, Single-Remote-Host approach, Central server, Target host]
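To make the two mobility patterns concrete, the sketch below lists the network transfers each one would generate. It is an assumption-laden model, not a description of the deployed system: the function names and the (source, destination) hop representation are invented for illustration.

```python
def single_remote_host_hops(central, targets):
    """One trip per target: out to the target and straight back to the central server."""
    hops = []
    for target in targets:
        hops.append((central, target))
        hops.append((target, central))
    return hops


def itinerary_hops(central, targets):
    """One agent visits every target in turn and returns only at the end."""
    if not targets:
        return []
    stops = [central, *targets, central]
    return list(zip(stops, stops[1:]))


def transfers_touching(host, hops):
    """Count the transfers that arrive at or leave from the given host."""
    return sum(1 for src, dst in hops if host in (src, dst))
```

In this model, three targets give six transfers through the central server for the Single-Remote-Host pattern and two for an itinerary.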
9
Anatomy of an Instrument
The code defining a specific implementation of an
Instrument is about 30 lines
[Class diagram: two "Inherits from" relationships; class names not preserved]
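The class names in the inheritance diagram are not preserved in this transcript, so the following is only a guess at the general shape: a base class that handles timing and result packaging, and a short subclass that defines one measurement. Instrument, SleepInstrument, measure, and run are hypothetical names, and the real system is Java-based rather than Python.

```python
import time
from abc import ABC, abstractmethod


class Instrument(ABC):
    """Hypothetical base class: shared timing and result packaging."""

    name = "instrument"

    def __init__(self, **params):
        # Parameters would come from the central database at run-time.
        self.params = params

    @abstractmethod
    def measure(self):
        """Perform the measured operation once."""

    def run(self, host="localhost"):
        """Time one call to measure() and return a time-stamped result."""
        start = time.perf_counter()
        self.measure()
        elapsed = time.perf_counter() - start
        return {
            "instrument": self.name,
            "host": host,
            "params": self.params,
            "timestamp": time.time(),
            "seconds": elapsed,
        }


class SleepInstrument(Instrument):
    """Trivial concrete Instrument, present only to show the pattern."""

    name = "sleep"

    def measure(self):
        time.sleep(self.params.get("duration", 0.01))
```

With the shared behaviour in the base class, a new metric only needs to supply measure(), which is consistent with a specific Instrument staying short.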
10
Test Instrument: File Access

NFS write-access time used as a test of concept
File size and location (file-system) are passed as
parameters from the database (specified at run-time)
Measurements are started by automated process as
specified by Schedule table in database
Tested access to one file-system on several
client computers
Linux (PIII) system with NFSv2, 1KB blocksize
Linux (PIII) system with NFSv2, 8KB blocksize
Linux (PIII) system with NFSv3
Solaris system with NFSv3
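A write-timing measurement of this kind might look like the sketch below. It is an illustration under stated assumptions, not the Instrument used in the talk: time_file_write is a hypothetical name, and its block_size argument is only the application's write chunk size, not the NFS mount block size listed above. Pointing directory at an NFS-mounted path would be what makes it an NFS test.

```python
import os
import tempfile
import time


def time_file_write(directory, size_bytes, block_size=8192):
    """Write size_bytes to a new file under directory; return (seconds, bytes_written)."""
    block = b"\0" * block_size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as handle:
            remaining = size_bytes
            while remaining > 0:
                chunk = block[:min(block_size, remaining)]
                handle.write(chunk)
                remaining -= len(chunk)
            # Push data out of user-space and OS buffers before stopping the clock.
            handle.flush()
            os.fsync(handle.fileno())
        elapsed = time.perf_counter() - start
        written = os.path.getsize(path)
    finally:
        # Remove the test file so repeated runs do not fill the file-system.
        os.remove(path)
    return elapsed, written
```

File size and location map onto the size_bytes and directory arguments, mirroring the parameters the slide says are specified at run-time.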

11
Report Generation Tool

Sample tests are carried out automatically by a
Scheduler Aglet
Reports are requested via an HTML form. Users
specify a test type, parameter set and target
host. A Perl CGI script queries the database and
plots results using Gnuplot.

12
Sample Report for File Access

Results indicate server load, client config
[Plot annotations: Nightly backups, Weekly de-frag]

13
Problems With the Mobile Agents

Transfer is interrupted when several agents move to
or from the same host within about 1-2 sec
Small size of Aglets currently used (about 15 KB)
cannot explain the effective dead-time
The failure is presented to the Aglet as a
refusal (can detect, wait and retry)
Congestion at central host can be relieved by
following a circuit before returning (multiple
hosts)
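Because the failure shows up as a refusal, the detect, wait and retry behaviour can be sketched as a small wrapper. Nothing here is taken from the Aglets API: TransferRefused and dispatch_with_retry are hypothetical names, and doubling the wait after each refusal is an added assumption, since the slides say only that the agent can wait and retry.

```python
import time


class TransferRefused(Exception):
    """Hypothetical signal that the destination host refused the agent."""


def dispatch_with_retry(dispatch, attempts=5, initial_wait=1.0, sleep=time.sleep):
    """Call dispatch() until it succeeds or the attempts are used up.

    The wait doubles after each refusal (an assumption, not from the slides).
    The last refusal is re-raised so the caller can report the failure.
    """
    wait = initial_wait
    for attempt in range(1, attempts + 1):
        try:
            return dispatch()
        except TransferRefused:
            if attempt == attempts:
                raise
            sleep(wait)
            wait *= 2
```

Passing sleep as an argument lets the wrapper be exercised without real delays.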

14
Future System Development

Solve transfer interruption problem
Development of other mobility patterns
NFS read-access may be tested by writing on one
host and timing a read on a different host (to
avoid caching)
Use of itinerary can ease network congestion at
the central server
A tracking / error-reporting system is being
developed, and will be connected to a paging
system

15
Summary

Initial implementation is proving useful
Mobile agent architecture adds design work but
eases implementation, adds flexibility
Transfer interruption causes scalability
problems, but these are not insurmountable
Plan to have expanded system running before
data-taking begins