Title: Monitoring of Interactive Grid Applications
1Monitoring of Interactive Grid Applications
Marian Bubak with Bartosz Balis, Wlodek Funika, Tomasz Szepieniec, Roland Wismueller
Institute of Computer Science and ACC CYFRONET AGH, Cracow, Poland
LRR-TUM, Muenchen, Germany
Institute for Software Science, University of Vienna, Austria
EU CrossGrid Project
www.eu-crossgrid.org
2Outline
- Motivation
- CrossGrid in a nutshell
- Applications and their requirements
- Architecture
- Tools for applications development
- Monitoring system
- Concept of Grid application monitoring
- Grid extensions for OMIS
- Design of OCM-G
- Security
- Status
3 EU Funded Grid Project Space (Kyriakos Baxevanidis)
4CrossGrid Collaboration
Ireland: TCD Dublin
Poland: Cyfronet and INP (Cracow); PSNC (Poznan); ICM and IPJ (Warsaw)
Germany: FZK Karlsruhe; TUM Munich; USTU Stuttgart
Netherlands: UvA Amsterdam
Slovakia: II SAS Bratislava
Austria: U.Linz
Spain: CSIC Santander, Valencia, RedIris; UAB Barcelona; USC Santiago; CESGA
Greece: Algosystems and Demo (Athens); AuTh (Thessaloniki)
Portugal: LIP Lisbon
Italy: DATAMAT
Cyprus: UCY Nikosia
5Biomedical Application
[Figure labels: CT / MRI scan; Segmentation; Visualization; LB flow simulation; Medical; HDB; VE; DB; WD; PC; PDA; Interaction. Annotations: 10 simulations/day, 60 GB, 20 MB/s.]
6VR-Interaction
7Cascade of Flood Simulations
Data sources
Meteorological simulations
Hydrological simulations
Users
Hydraulic simulations
Output visualization
8Example of the Flood Simulation - Flow and Water Depth
9Distributed Data Analysis in High Energy Physics
- Objectives
- Distributed data access
- Distributed data mining techniques with neural networks
- Issues
- Typical interactive requests will run on O(TB) of distributed data
- Transfer/replication time for the whole data set is about one hour
- Data is transferred once, in advance of the interactive session
- Allocation, installation and set-up of the corresponding database servers before the interactive session
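A rough back-of-the-envelope check of the figures above may help. The slide only says O(TB) and "about one hour", so the exact values below (1 TB in decimal units, 3600 seconds) and the helper name are assumptions made purely for illustration.

```python
# Illustrative arithmetic only: both constants are assumed, not taken
# from measurements in the presentation.
DATA_BYTES = 1e12          # assumed: 1 TB (decimal)
TRANSFER_SECONDS = 3600.0  # assumed: one hour


def required_bandwidth_mb_s(data_bytes: float, seconds: float) -> float:
    """Sustained bandwidth in MB/s (decimal) needed to move data_bytes in seconds."""
    return data_bytes / seconds / 1e6


bandwidth = required_bandwidth_mb_s(DATA_BYTES, TRANSFER_SECONDS)
print(f"{bandwidth:.1f} MB/s")  # about 277.8 MB/s
```

Under these assumptions the transfer takes far longer than an interactive request can wait, which is consistent with moving the data once, before the session starts.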
10Weather Forecast and Air Pollution Modeling
- Distributed/parallel codes on the Grid
- Coupled Ocean/Atmosphere Mesoscale Prediction System
- STEM-II Air Pollution Code
- Integration of distributed databases
- Data mining applied to downscaling weather forecast
11Key Features of CrossGrid Applications
- Data
- Data sources and databases geographically distributed
- To be selected on demand
- Processing
- Large processing capacity required, both HPC and HTC
- Interactive
- Presentation
- Complex data require versatile 3D visualisation
- Support for interaction and feedback to other components
12Overview of the CrossGrid Architecture
[Figure: layered architecture diagram; labels grouped by layer]
Applications: 1.1 BioMed; 1.2 Flooding; 1.3 Interactive Distributed Data Access; 1.3 Data Mining on Grid (NN); 1.4 Meteo Pollution
Supporting Tools: 3.1 Portal and Migrating Desktop; 2.2 MPI Verification; 2.3 Metrics and Benchmarks; 2.4 Performance Analysis
Applications Development Support: MPICH-G; 1.1, 1.2 HLA and others
App. Spec Services: 1.1 Grid Visualisation Kernel; 1.1 User Interaction Services
Generic Services: 3.1 Roaming Access; 3.2 Scheduling Agents; 3.3 Grid Monitoring; 3.4 Optimization of Grid Data Access; DataGrid Replica Manager; Globus Replica Manager; GRAM; GSI; Replica Catalog; GIS / MDS; GridFTP; Globus-IO; DataGrid Job Submission Service
Fabric: Resource Manager (CE); Resource Manager (SE); Resource Manager; 3.4 Optimization of Local Data Access; CPU; Secondary Storage; Tertiary Storage; Instruments (Satellites, Radars)
13Tool Environment
manual information transfer
14Tools Environment and Grid Monitoring
Applications
Portals (3.1)
G-PM Performance Measurement Tools (2.4)
MPI Debugging and Verification (2.2)
Metrics and Benchmarks (2.3)
Grid Monitoring (3.3) (OCM-G, RGMA)
The application programming environment requires information from the Grid about the current status of applications, and it should be able to manipulate them.
15Monitoring of Grid Applications
- To monitor: obtain information on or manipulate the target application
- e.g. read the status of the application's processes, suspend the application, read / write memory, etc.
- Monitoring module needed by tools
- Debuggers
- Performance analyzers
- Visualizers
- ...
16CrossGrid Monitoring System
17Concept of Grid Applications Monitoring
- OCM-G: Grid-enabled OMIS-Compliant Monitor
- OMIS: On-line Monitoring Interface Specification
- Application-oriented
- information about running applications
- On-line
- information collected at runtime
- immediately delivered to consumers
- Information collected via instrumentation
- activated / deactivated on demand
- information of interest defined at runtime (lower overhead)
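The idea of instrumentation that is activated and deactivated on demand can be sketched as follows. This is a minimal illustration, not the OCM-G mechanism: the `Instrumentation` class, its method names, and the `send` function are all hypothetical.

```python
from collections import Counter
from functools import wraps


class Instrumentation:
    """Hypothetical registry of probes that can be switched on and off at runtime."""

    def __init__(self):
        self.active = set()      # names of probes currently enabled
        self.counts = Counter()  # number of recorded events per probe

    def activate(self, name):
        self.active.add(name)

    def deactivate(self, name):
        self.active.discard(name)

    def probe(self, name):
        """Decorator: record a call only while the named probe is active."""
        def decorate(func):
            @wraps(func)
            def wrapper(*args, **kwargs):
                if name in self.active:  # cheap check when monitoring is off
                    self.counts[name] += 1
                return func(*args, **kwargs)
            return wrapper
        return decorate


inst = Instrumentation()


@inst.probe("send")
def send(data):
    return len(data)


send(b"abc")            # probe inactive: nothing recorded
inst.activate("send")
send(b"abc")
send(b"de")             # two calls recorded while active
inst.deactivate("send")
send(b"x")              # inactive again
print(inst.counts["send"])  # 2
```

While a probe is inactive the only cost is a set lookup, which is one plausible reading of the "lower overhead" remark.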
18Monitoring Autonomous System
- Separate monitoring system
- Tool / Monitor interface: OMIS
19Why OMIS ?
- Universal generic interface supporting different tools
- May be extended to add new grid-oriented functionality
- Fits the GGF's Grid Monitoring Architecture (GMA)
- e.g., the event-action paradigm enables a data-subscription scenario
20Very Short Overview of OMIS
- Target system view
- hierarchical set of objects
- nodes, processes, threads
- For the Grid: a new object type, sites
- objects identified by tokens, e.g. n_1, p_1, etc.
- Three types of services
- information services
- manipulation services
- event services
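The hierarchical target system view can be pictured as a small tree of objects, each identified by a token. The sketch below is illustrative only: the class and helper names are hypothetical, and the `s` prefix for site tokens is an assumption, since the slide gives only `n_1` and `p_1` as examples.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class MonitoredObject:
    token: str
    children: list = field(default_factory=list)


class TokenFactory:
    """Hands out tokens such as n_1, p_1, t_1 (prefix 's' for sites is assumed)."""

    def __init__(self):
        self._counters = {}

    def new(self, prefix):
        counter = self._counters.setdefault(prefix, count(1))
        return f"{prefix}_{next(counter)}"


def add_child(parent, prefix, tokens):
    child = MonitoredObject(tokens.new(prefix))
    parent.children.append(child)
    return child


def walk(obj, depth=0):
    """Yield (depth, token) pairs in depth-first order."""
    yield depth, obj.token
    for child in obj.children:
        yield from walk(child, depth + 1)


tokens = TokenFactory()
site = MonitoredObject(tokens.new("s"))  # site -> node -> process -> threads
node = add_child(site, "n", tokens)
proc = add_child(node, "p", tokens)
add_child(proc, "t", tokens)
add_child(proc, "t", tokens)

for depth, token in walk(site):
    print("  " * depth + token)
```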
21OMIS Services
- Information services
- obtain information on target system
- e.g. node_get_info: obtain information on nodes in the target system
- Manipulation services
- perform manipulations on the target system
- e.g. thread_stop: stop specified threads
- Event services
- detect events in the target system
- e.g. thread_started_libcall: detect invocations of specified functions
- Information and manipulation services together are called actions
22OMIS Requests
- Services are combined into two types of monitoring requests
- Unconditional requests
- to be executed immediately
- executed only once
- Conditional requests
- to execute actions whenever the event occurs
- actions can be executed multiple times
23OMIS Unconditional Requests
[Figure: a request consisting of actions and their operands. Example: stop thread t_1]
24OMIS Conditional Requests
Event with operands: thread_started_libcall(t_1, MPI_Send)
Action with operands: counter_inc(c_1)
Meaning: whenever thread t_1 invokes MPI_Send, increment counter c_1
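The difference between the two request types can be mimicked with a toy event-action dispatcher. This is a sketch under stated assumptions: `ToyMonitor` and its methods are hypothetical, events are modelled as plain tuples, and none of this reproduces the OMIS request syntax or a real monitor.

```python
from collections import defaultdict


class ToyMonitor:
    """Toy model of request handling; not the OMIS syntax or a real monitor."""

    def __init__(self):
        self.counters = defaultdict(int)
        self.stopped = set()
        self.conditional = defaultdict(list)  # event key -> [(action, operands)]

    # Actions (information / manipulation services in OMIS terms)
    def thread_stop(self, thread):
        self.stopped.add(thread)

    def counter_inc(self, counter):
        self.counters[counter] += 1

    def unconditional(self, action, *operands):
        """Unconditional request: execute the action immediately, exactly once."""
        action(*operands)

    def register(self, event, action, *operands):
        """Conditional request: run the action every time the event occurs."""
        self.conditional[event].append((action, operands))

    def notify(self, event):
        """Called by (hypothetical) instrumentation when an event is detected."""
        for action, operands in self.conditional[event]:
            action(*operands)


mon = ToyMonitor()
mon.unconditional(mon.thread_stop, "t_2")

send_event = ("thread_started_libcall", "t_1", "MPI_Send")
mon.register(send_event, mon.counter_inc, "c_1")
for _ in range(3):
    mon.notify(send_event)
mon.notify(("thread_started_libcall", "t_1", "MPI_Recv"))  # no matching request

print(mon.counters["c_1"])  # 3
```

Registering once and reacting many times is also what makes a data-subscription style of use possible: the tool subscribes with one request and then receives results on every occurrence.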
25New OMIS Services for Grid (1/3)
- Services related to the new object: site
- site_attach: attach to a site
- site_get_info: return information on a site
- site_get_nodelist: return a list of nodes on a site
- Services for application-related metrics
- hardware_read_counter: return value of a hardware performance counter
26New OMIS Services for Grid (2/3)
- Services for infrastructure-related metrics
- network_get_info: return information on a network connection
- Benchmark-related services
- benchmark_get_result: return a result of a benchmark
- benchmark_execute: execute benchmark
27New OMIS Services for Grid (3/3)
- Services for application handling
- app_attach: attach to an application
- app_attach2: attach to an application
- app_get_list: get a list of running applications
- app_get_proclist: return process list of an application
- Services related to probes
- thread_executes_probe: a probe has been executed
28Grid-enabled OMIS-Compliant Monitor
- Features
- Permanent Grid service
- External interface: OMIS
- Architecture: two types of components
- Local Monitors
- Service Managers
29Components of OCM-G
- Service Managers
- one per site in the system
- permanent
- request distribution
- reply collection
- Local Monitors
- one per (node, user) pair
- transient (created or destroyed when needed)
- handle local objects
- actual execution of requests
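The division of labour between the two component types can be sketched as below. This is only an illustration of the roles listed above: the class names mirror the slide, but the methods, the string replies, and the in-process calls (instead of network messages) are assumptions.

```python
class LocalMonitor:
    """One per (node, user) pair; executes requests on local objects."""

    def __init__(self, node, user):
        self.node, self.user = node, user

    def execute(self, request):
        # A real LM would act on local processes; this stand-in only echoes.
        return f"{self.node}/{self.user}: {request} done"


class ServiceManager:
    """One per site; distributes requests and collects replies."""

    def __init__(self, site):
        self.site = site
        self.local_monitors = {}  # (node, user) -> LocalMonitor

    def local_monitor(self, node, user):
        # LMs are transient: created when first needed ...
        key = (node, user)
        if key not in self.local_monitors:
            self.local_monitors[key] = LocalMonitor(node, user)
        return self.local_monitors[key]

    def release(self, node, user):
        # ... and destroyed when no longer needed.
        self.local_monitors.pop((node, user), None)

    def handle(self, request, user, nodes):
        # Request distribution followed by reply collection.
        return [self.local_monitor(n, user).execute(request) for n in nodes]


sm = ServiceManager("site_a")
replies = sm.handle("thread_stop", "alice", ["n_1", "n_2"])
print(replies)
```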
30Monitoring Environment
- OCM-G Components
- Service Managers
- Local Monitors
- Application processes
- Tool(s)
- External name service
- Component discovery
31OCM-G Unconditional Requests
- Immediate response from the OCM-G
32OCM-G Conditional Request
- Two stages
- Request registration (msgs 1-1.2.2)
- Request executed when event occurs (msgs 2-2.3.1)
33OCM-G SM and LM Modules
- Core
- Initialization of the OCM-G components
- Initial preprocessing of all messages
34OCM-G SM and LM Modules
- Communication
- Uniform interface for component-to-component communication
35OCM-G SM and LM Modules
- Internal localization
- Internal name service
- Tokens
36OCM-G SM and LM Modules
- External localization
- Uniform access to external information services
37OCM-G SM and LM Modules
- Services
- Implementation of OMIS services
38OCM-G SM and LM Modules
- Request management
- OMIS requests analysis and distribution
- Reply handling
39OCM-G SM and LM Modules
- Application context
- Represents information about applications
40OCM-G SM and LM Modules
- User
- User management
- Authentication and authorization
41OCM-G SM and LM Modules
- Application module
- Part of OCM-G linked to the application
42Security Issues
- OCM-G components handle multiple users, tools and applications
- possibility to issue a fake request (e.g., posing as a different user)
- authentication and authorization needed
- LMs are allowed to perform manipulations
- an unauthorized user could do anything
43Security - Solutions
- LMs are user-bound
- Run as user processes
- Security ensured by OS mechanisms
- Service Managers are permanent
- Run as unprivileged processes (nobody)
- User Grid Id checked internally (partial security)
- Grid certificates for users, tools and SMs incorporated (ultimate security)
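The internal Grid Id check in a Service Manager might look roughly like the sketch below. Everything here is hypothetical (class name, method names, the ownership table), and the check compares only the claimed identity, which is why the slide calls it partial security; certificate-based authentication is not modelled.

```python
class AuthorizationError(Exception):
    """Raised when the requester does not own the target application."""


class ServiceManagerGate:
    """Hypothetical stand-in for the identity check inside a Service Manager."""

    def __init__(self):
        self.app_owner = {}  # application name -> Grid id of its owner

    def register_app(self, app, owner_grid_id):
        self.app_owner[app] = owner_grid_id

    def forward(self, requester_grid_id, app, request):
        # Only the claimed id is compared here; nothing is verified
        # cryptographically, which Grid certificates would add.
        if self.app_owner.get(app) != requester_grid_id:
            raise AuthorizationError(f"{requester_grid_id} does not own {app}")
        return f"{request} forwarded for {app}"


gate = ServiceManagerGate()
gate.register_app("app_1", "alice")
print(gate.forward("alice", "app_1", "thread_stop"))

try:
    gate.forward("mallory", "app_1", "thread_stop")
    rejected = False
except AuthorizationError as err:
    rejected = True
    print("rejected:", err)
```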
44Status
- OCM implementation for clusters
- Software requirements specification
- OMIS extensions for the Grid
- OCM-G concept and OO design
- 1st prototype in December 2002
- Available via a public software licence
- More: www.eu-crossgrid.org