Title: MonALISA
1. An Agent-Based, Dynamic Service System to
Monitor, Control and Optimize Distributed
Systems
March 2006
Iosif Legrand, California Institute of
Technology
2. The MonALISA Framework
- MonALISA is a dynamic, distributed service system
capable of collecting any type of information from
different systems, analyzing it in near real
time, and providing support for automated control
decisions and global optimization of workflows in
complex grid systems.
- The MonALISA system is designed as an ensemble
of autonomous, multi-threaded, self-describing,
agent-based subsystems which are registered as
dynamic services and are able to collaborate and
cooperate in performing a wide range of
monitoring tasks. These agents can analyze and
process the information in a distributed way
and provide optimization decisions in large-scale
distributed applications.
3. MonALISA is a Dynamic, Distributed Service
Architecture
- The framework is based on a hierarchical
structure of loosely coupled agents acting as
distributed services: independent,
autonomous entities able to discover one another
and to cooperate using a dynamic set of proxies
or self-describing protocols.
- An agent-based architecture provides the ability
to invest the system with increasing degrees of
intelligence, to reduce complexity, and to make
global systems manageable in real time. For
effective use of distributed resources, these
services provide adaptability and
self-organization.
4. The MonALISA Discovery System and Services
Fully distributed system with no single point of
failure
[Diagram: layers of the architecture, top to bottom]
- Global services or clients: clients, HL services,
repositories
- Proxies: dynamic load balancing, scalability,
replication, security, AAA for clients
- MonALISA services (agents): distributed system for
gathering and analyzing information
- Network of JINI-LUSs (secure, public): distributed
dynamic discovery, based on a lease mechanism and REN
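The lease-based discovery named on this slide can be illustrated with a toy registry. This is a minimal sketch of the general lease idea only, not the JINI lookup service API: the class `LeaseRegistry`, its methods, and the service names are hypothetical. A service stays discoverable only while it keeps renewing its lease, so a crashed service drops out of the lookup without any explicit cleanup.

```python
import time
from typing import Callable, Dict, List


class LeaseRegistry:
    """Toy lookup service (hypothetical): entries vanish unless renewed."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._expiry: Dict[str, float] = {}

    def register(self, service: str, lease_seconds: float) -> None:
        # Grant a lease: the service is visible until the expiry time.
        self._expiry[service] = self._clock() + lease_seconds

    def renew(self, service: str, lease_seconds: float) -> None:
        # Renewing simply pushes the expiry time forward.
        self.register(service, lease_seconds)

    def lookup(self) -> List[str]:
        now = self._clock()
        # Drop every service whose lease has run out, report the rest.
        self._expiry = {s: t for s, t in self._expiry.items() if t > now}
        return sorted(self._expiry)


# A fake clock makes the example deterministic.
now = [0.0]
registry = LeaseRegistry(clock=lambda: now[0])
registry.register("farm-A", lease_seconds=30)
registry.register("farm-B", lease_seconds=30)

now[0] = 20.0
registry.renew("farm-A", lease_seconds=30)  # farm-A now expires at t=50

now[0] = 40.0
print(registry.lookup())  # prints ['farm-A']; farm-B expired at t=30
```

The design point is that failure detection needs no central monitor: a service that stops renewing is simply forgotten.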
5. Monitoring Grid Sites, Running Jobs, Network
Traffic and Topology
[Figure panels: Jobs; Topology; Accounting]
6. Monitoring OSG Resources, Jobs and Accounting
[Figure panels: Running Jobs; Accounting]
42 sites, 4,000 nodes (10,000 CPUs)
Thousands of jobs, 60,000 parameters
7. Monitoring CMS Jobs
[Figure panels: Running Jobs per Site; Type of Jobs;
Rate for Processing Events per Site; Integrated No.
of Events]
8. FTP Data Transfer between GRID Sites
[Figure: Total FTP Traffic per VO]
9. Monitoring Internet2 Backbone Network
- Test for a Land Speed Record
- 7 Gb/s in a single TCP stream from Geneva to
Caltech
10. The UltraLight Network
[Figure label: BNL ESnet IN/OUT]
11. Monitoring the GLORIAD Ring
12. Monitoring Network Topology: Latency, Routers
13. Bandwidth Challenge at SC2005
151 Gb/s
500 TB total in 4 h
14. Available Bandwidth Measurements
- One-to-one real-time bandwidth estimation.
15. Monitoring the Execution of Jobs and the Time
Evolution
[Figure labels: Split Jobs; Lifelines for Jobs;
Submit a Job; DAG]
16. ApMon: Application Monitoring
- Library of APIs (C, C++, Java, Perl, Python) that
can be used to send any information to MonALISA
services
- Flexibility, dynamic configuration, high
communication performance
- Automated system monitoring
- Accounting information
[Diagram labels: Application, ApMon and MonALISA
Service (two of each); MonALISA hosts; Config Servlet;
dynamic reloading; System Monitoring; No Lost Packets;
ApMon Config]
ApMon configuration generated automatically by a
servlet / CGI script
Sample parameters shown in the diagram:
load1 0.24
processes 97
pages_in 83
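The idea of an application pushing named parameter values to a monitoring service can be sketched as follows. This is a minimal illustration and does not use the ApMon library or its wire format: the functions `send_parameters` and `receive_parameters`, the JSON encoding, the choice of UDP as transport, and the cluster and node names are all assumptions made for the example. The parameter values are the ones shown on the slide.

```python
import json
import socket


def send_parameters(sock, address, cluster, node, params):
    """Send one datagram carrying a set of named parameter values."""
    payload = json.dumps({"cluster": cluster, "node": node, "params": params})
    sock.sendto(payload.encode("utf-8"), address)


def receive_parameters(sock):
    """Block until one datagram arrives and decode it."""
    data, _sender = sock.recvfrom(65535)
    return json.loads(data.decode("utf-8"))


# Stand-in for a monitoring service: a UDP socket on an OS-chosen local port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)

# The application side: fire-and-forget, no connection setup.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_parameters(
    sender,
    receiver.getsockname(),
    "demo_cluster",  # hypothetical cluster name
    "node01",        # hypothetical node name
    {"load1": 0.24, "processes": 97, "pages_in": 83},
)

message = receive_parameters(receiver)
sender.close()
receiver.close()
print(message["params"])
```

A connectionless send keeps the cost to the monitored application low, which is one plausible reading of the "high communication performance" bullet.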
17. End User / Client Agent: LISA (Localhost
Information Service Agent)
- Authorization
- Service discovery
- Local detection of the hardware and software
configuration
- Complete end-system monitoring: per-process load,
I/O and network throughput, etc.
- End-to-end performance measurements
- Will act as an active listener for all events
related to the requests generated by its local
applications.
18. MonALISA Agents to Create on Demand an
Optical Path or Tree
[Diagram with numbered steps 1 to 4; labels: Discovery
and Secure Connection; ML Daemon; Control and Monitor
the switch; ML proxy services used in Agent
Communication]
Runs an ML Daemon:
> ml_path IP1 IP4
copy file IP4
Time to create a path on demand: < 1 s,
independent of the location and the number of
connections
19. Monitoring and Controlling Optical Planes
[Figure labels: Controlling; Port power monitoring]
20. Monitoring Optical Switches: Agents to Create on
Demand an Optical Path
21. The Functionality of the VINCI System
[Diagram: agents at Site A, Site B and Site C,
connected through ML proxy services, operate at each
network layer]
- Layer 3: routers
- Layer 2: Ethernet, LAN-PHY or WAN-PHY
- Layer 1: DWDM, fiber
22. Vertical Integration of Services
Real-time correlations and feedback
between major layers are crucial for dynamic
load balancing, adaptability and
self-organization
[Diagram labels: MPLS/GMPLS/TL1; Networking; Farms;
Data Servers; Applications; User; Job, Job1, Job2,
Job3, Job 31, Job 32]
Note: Grid services will have to interact and
negotiate with network services for network
resources
On-demand, dynamic and self-adapting use of
networking resources
23. Communities using MonALISA
- Major Communities
- OSG
- CMS
- ALICE
- D0
- STAR
- VRVS
- LCG RUSSIA
- SE Europe GRID
- APAC Grid
- UNAM Grid
- ABILENE
- ULTRALIGHT
- GLORIAD
- LHC Net
- RoEduNET
- Demonstrated at
- SC2003
- Telecom 2003
- WSIS 2003
- SC 2004
- I2 2005
- TERENA 2005
- IGrid 2005
- SC 2005
- CENIC 2006
- MonALISA
- Running 24 x 7 at 250 sites
- Collecting 250,000 parameters in near real time
- Update rate of 25,000 parameter updates per
second
- Monitoring
- 12,000 computers
- > 100 WAN links
- Thousands of Grid jobs running concurrently
[Figure labels: CMS-DC04; GRID3; VRVS; ALICE]
24. The MonALISA Architecture Provides
- Distributed registration and discovery for
services and applications.
- Monitoring of all aspects of complex systems:
- System information for computer nodes and
clusters
- Network information: WAN and LAN
- Monitoring the performance of applications, jobs
or services
- The end-user systems and their performance
- Video streaming
- Can interact with any other services to provide,
in near real time, customized information based on
monitoring data
- Secure, remote administration for services and
applications
- Agents to supervise applications, trigger alarms,
restart or reconfigure them, and notify
other services when certain conditions are
detected.
- The MonALISA framework is used to develop
higher-level decision services, implemented as a
distributed network of communicating agents, to
perform global optimization tasks.
- Graphical user interfaces to visualize complex
information
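The supervising agents described on this slide (trigger alarms, restart applications, notify other services when a condition is detected) can be sketched as a small threshold watcher. This is an illustrative sketch only, not MonALISA code: the class `SupervisorAgent`, the load threshold, and the application name are hypothetical.

```python
from typing import Callable, List, Tuple


class SupervisorAgent:
    """Hypothetical agent: restarts an app and notifies listeners on alarm."""

    def __init__(self, threshold: float, restart: Callable[[str], None]) -> None:
        self.threshold = threshold
        self.restart = restart
        self.listeners: List[Callable[[str, float], None]] = []

    def subscribe(self, listener: Callable[[str, float], None]) -> None:
        # Other services register here to be told about alarms.
        self.listeners.append(listener)

    def observe(self, app: str, load: float) -> bool:
        """Process one measurement; return True if an alarm was raised."""
        if load <= self.threshold:
            return False
        self.restart(app)  # corrective action first
        for listener in self.listeners:
            listener(app, load)  # then notify the subscribers
        return True


restarted: List[str] = []
alerts: List[Tuple[str, float]] = []

agent = SupervisorAgent(threshold=5.0, restart=restarted.append)
agent.subscribe(lambda app, load: alerts.append((app, load)))

agent.observe("transfer-daemon", 0.24)  # below threshold: nothing happens
agent.observe("transfer-daemon", 9.5)   # above threshold: restart and notify

print(restarted, alerts)
```

Passing the restart action and the listeners in as callables keeps the detection logic separate from what is done about it, so the same agent could reconfigure rather than restart.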