Title: Autonomous Configuration of Grid Monitoring Systems

1. Autonomous Configuration of Grid Monitoring Systems
- Kenichiro Shirose (1), Satoshi Matsuoka (1,3), Hidemoto Nakada (1,2), Hirotaka Ogawa (2)
- (1) Tokyo Institute of Technology
- (2) National Institute of Advanced Industrial Science and Technology
- (3) National Institute of Informatics
2. Grid Monitoring System (1/2)
- In a practical Grid deployment, monitoring is a must at all levels:
- Resource brokering
- Accounting and auditing
- System administration
- User feedback
3. Grid Monitoring System (2/2)
- Monitoring system components are fundamentally distributed and subject to faults and reconfigurations
- Components are heavily mutually dependent
- Too many monitors → the probing effect becomes a potential problem
[Figure: a data request program querying data collect programs, each gathering measurements from sensors]
4. Goal
- Design and implementation of an autonomically managed Grid monitoring system
- Propose a framework for autonomic management of Grid monitoring systems
- Implement a prototype that autonomously configures and tolerates faults of NWS sensors
5. Grid Monitoring System Architecture
- Examples: NWS [Wolski et al.], MDS [Globus Alliance], R-GMA [EU DataGrid Project], Hawkeye [Condor Project]
- These systems consist of the common components of the Grid Monitoring Architecture [GGF, 2002]:
- Directory Service: supports publication and discovery of component information
- Producer: retrieves data from sources and makes them available to others
- Consumer: receives data from Producers and processes them
- Sources of data: collect the data
- (We employ NWS as the target, but the results are generalizable)
6. Overview of NWS
- NWS [Wolski 99] measures CPU and network availability on the Grid and forecasts their future values
- Nameserver: manages information on NWS components (clients inquire of the Nameserver)
- Memoryhost: gathers data from sensors (clients request monitored data from it)
- Sensor programs run on each machine and send their data to a Memoryhost
7. NWS clique-based network performance measurement
- Eliminates the O(n²) traffic pressure of all-pairs end-to-end network measurements
- Sensors on representative nodes measure end-to-end performance between cliques
- The measured value between the representing nodes is returned as an approximation for any pair of nodes drawn from the respective clique pair
[Figure: two cliques, each with a representing node; the representatives measure inter-clique performance]
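The clique approximation reduces to a simple lookup. A minimal sketch, assuming dict-based structures (`clique_of`, `rep_of`, `measured`) that are illustrative and not part of the NWS API:

```python
def approx_performance(a, b, clique_of, rep_of, measured):
    """Look up an approximate network measurement between nodes a and b.

    clique_of: node -> clique id
    rep_of:    clique id -> representing node of that clique
    measured:  sorted (x, y) pair -> measured value between x and y
    (All three structures are illustrative, not the NWS API.)
    """
    def key(x, y):
        return tuple(sorted((x, y)))

    ca, cb = clique_of[a], clique_of[b]
    if ca == cb:
        # within a clique, sensors measure every pair directly
        return measured[key(a, b)]
    # across cliques, the value between the representing nodes
    # stands in for any pair drawn from the two cliques
    return measured[key(rep_of[ca], rep_of[cb])]
```

With c cliques of k nodes each, only O(k²) intra-clique pairs plus O(c²) representative pairs are ever measured, instead of O((ck)²).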
8. Requirements of autonomic management on the Grid
- Must be aware of the correct configuration and fault-recovery tactics, based on various information including its own probing (especially the status of nodes and the network topology)
- Additional requirements:
- Applicability: support multiple, existing systems
- Scalability: scale to a large number of nodes
- Autonomy: manage with little or no user intervention
- Extensibility: possibility to incorporate various autonomic, self-management features
9. Four Steps of Autonomic Management
- Loop with a given time interval:
- Step 1: Forecast network topology; check status of nodes and processes
- Step 2: Form node groups (reconfiguring groups when nodes are added, removed, etc.)
- Step 3: Decide configuration (for all nodes at startup; for halted components at recovery)
- Step 4: Start up the components on assigned nodes
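The four-step loop above might be sketched as follows; the callables are placeholders for the prototype's real probing, grouping, configuration, and startup actions, and the `rounds` parameter is an illustrative stand-in for the real loop, which runs indefinitely:

```python
import time

def management_loop(nodes, probe, form_groups, configure, start,
                    rounds, interval=0.0):
    """Run the four management steps `rounds` times, pausing `interval`
    seconds between iterations (the real prototype loops forever)."""
    for _ in range(rounds):
        status = probe(nodes)             # step 1: topology and process status
        groups = form_groups(status)      # step 2: form node groups
        plan = configure(groups, status)  # step 3: decide the configuration
        start(plan)                       # step 4: start components on nodes
        time.sleep(interval)
```

Keeping the steps as injected callables also matches the deck's Applicability requirement: another monitoring system could supply its own implementations of the same four steps.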
10. Implementation of an Autonomic Grid Monitoring System Prototype
- Supports autonomic configuration, execution, and recovery of NWS components
- Input: a list of nodes (with attribute information)
- Four action steps:
- Measure RTT between nodes and collect the PIDs of NWS components
- Form node groups based on RTT
- (Re)configure the components
- Execute the components
11. RTT Measurement
- Each node runs an initialization script to measure RTT to the others and to collect the PIDs of the NWS components on the machine
- The management node executes ping and ps systematically, in parallel, in an n-by-n fashion
[Figure: management node dispatching initialization scripts; resource nodes exchange ICMP probes and return RTT and PID data]
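A minimal sketch of the per-node measurement, assuming Linux-style `ping` summary output and `ps -o pid,comm`-style listings; the NWS component names matched here are assumptions about the deployed binaries:

```python
import re
import subprocess

def parse_ping_rtt(ping_output):
    """Extract the average RTT (ms) from a Linux ping summary line,
    e.g. 'rtt min/avg/max/mdev = 0.123/0.456/0.789/0.100 ms'."""
    m = re.search(r"= [\d.]+/([\d.]+)/", ping_output)
    return float(m.group(1)) if m else None

def rtt_via_ping(host, count=1, timeout=5):
    """Measure RTT to host with the system ping (illustrative helper)."""
    out = subprocess.run(["ping", "-c", str(count), host],
                         capture_output=True, text=True,
                         timeout=timeout).stdout
    return parse_ping_rtt(out)

def nws_pids(ps_output):
    """Pick PIDs of NWS components out of `ps -o pid,comm`-style output.
    The component names are assumptions about the installed binaries."""
    pids = {}
    for line in ps_output.splitlines():
        for comp in ("nws_nameserver", "nws_memory", "nws_sensor"):
            if comp in line:
                pids[comp] = int(line.split()[0])
    return pids
```

Each node would report its `rtt_via_ping` results and `nws_pids` dictionary back to the management node, which assembles the full n-by-n RTT matrix.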
12. Forming node groups
- For each node, find the most proximal node based on RTT
- Initialize: H is the set of all nodes, i = 1, m is an element of H
- Until H has no elements:
- If m ∈ H (it does not yet belong to any group): move m from H to Gi; m ← the most proximal node to m
- Else if i = 1 or m ∈ Gi (it belongs to the newest group): i ← i + 1; Gi ← ∅; m ← an element of H
- Else (m ∈ Gj for some j ∈ {1, 2, …, i−1}): merge Gi into Gj; Gi ← ∅; m ← an element of H
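One way to realize the grouping procedure above in code. This is a sketch: picking the smallest remaining node as the next element, and the tie-breaking inside `nearest`, are assumptions the slides leave open.

```python
def form_groups(nodes, rtt):
    """Group nodes by chains of most-proximal neighbours (slide 12).

    rtt[a][b] is the round-trip time from a to b.
    """
    def nearest(m):
        # most proximal node to m; ties broken by node order (assumption)
        return min((n for n in nodes if n != m), key=lambda n: rtt[m][n])

    H = set(nodes)
    groups = [set()]          # groups[i] plays the role of G_{i+1}
    i = 0
    m = min(H)                # "an element of H" (deterministic choice)
    while True:
        if m in H:            # m belongs to no group yet: extend the chain
            H.remove(m)
            groups[i].add(m)
            m = nearest(m)
        elif m in groups[i]:  # chain closed inside the newest group
            if not H:
                break
            groups.append(set())
            i += 1
            m = min(H)
        else:                 # m sits in an older group G_j: merge
            j = next(k for k in range(i) if m in groups[k])
            groups[j] |= groups[i]
            groups[i] = set()
            if not H:
                break
            m = min(H)
    return [g for g in groups if g]
```

On the six-node example of slides 30-35 (where 5 and 6, and 1 and 4, are mutual nearest neighbours, 3 points to 2, and 2 points to 5), this yields the two groups {2, 3, 5, 6} and {1, 4}.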
13. Configuration of components and execution
- Decide the nodes on which the NWS Nameserver and Memoryhosts will run:
- Nameserver: the node with the most connectivity to other nodes and the minimum average RTT from them
- Memoryhost: in each group, the node that is the most proximal node for the largest number of group members
- These nodes also serve as the clique representatives
- Create a script that executes the respective NWS components on the respective nodes
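The two placement rules can be sketched as follows, assuming `rtt` is the dict-of-dicts RTT matrix from the measurement step; the tie-breaking by node order is an assumption, not stated in the slides:

```python
def choose_nameserver(nodes, rtt):
    """Pick the Nameserver node: reachable by the most peers, then the
    minimum average RTT from those peers (sketch of slide 13's rule)."""
    def score(n):
        peers = [rtt[p][n] for p in nodes if p != n and n in rtt.get(p, {})]
        if not peers:
            return (0, float("inf"))      # unreachable node: worst score
        return (-len(peers), sum(peers) / len(peers))
    return min(sorted(nodes), key=score)

def choose_memoryhost(group, rtt):
    """Pick a group's Memoryhost: the node that is the most proximal
    node for the largest number of members of its group."""
    members = sorted(group)
    def nearest(m):
        return min((n for n in members if n != m), key=lambda n: rtt[m][n])
    votes = {n: 0 for n in members}
    for m in members:
        votes[nearest(m)] += 1
    return max(members, key=lambda n: votes[n])
```

The chosen nodes then double as the clique representatives for the inter-group NWS measurements of slide 7.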
14. Fault Recovery (1/2)
- Status of the nodes and components:
- Nodes: whether data collection executed properly
- Components: whether they are running
- The current prototype handles two types of faults in NWS:
- The node itself is active, but a component is not → restart the component
- The node executing the Nameserver or a Memoryhost is inactive → select an alternative node
15. Fault Recovery (2/2)
- When an alternative node for the Nameserver or a Memoryhost is selected, the other Sensorhosts and Memoryhosts must be notified of the change
- For a Memoryhost: select another node in the same (clique) group; sensors in the group are restarted with the new configuration
- For the Nameserver: an appropriate node must be selected globally; all components in the system are restarted with the new configuration
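The recovery policy of these two slides can be sketched as a small decision function; the action names and the shapes of `status` and `roles` are illustrative, not taken from the prototype:

```python
def plan_recovery(status, roles):
    """Map observed faults to recovery actions (slides 14-15, sketched).

    status: node -> {"node_alive": bool, "component_running": bool}
    roles:  node -> "nameserver" | "memoryhost" | "sensor"
    """
    actions = []
    for node, s in status.items():
        if not s["node_alive"]:
            # a dead Nameserver/Memoryhost node needs a replacement,
            # and dependent components must pick up the new configuration
            if roles.get(node) == "nameserver":
                actions.append(("elect_nameserver_globally", node))
                actions.append(("restart_all_components", node))
            elif roles.get(node) == "memoryhost":
                actions.append(("elect_memoryhost_in_group", node))
                actions.append(("restart_group_sensors", node))
        elif not s["component_running"]:
            # node is up but its component died: just restart it
            actions.append(("restart_component", node))
    return actions
```

Note the asymmetry the slides describe: a Memoryhost fault only restarts the sensors of one clique, while a Nameserver fault restarts every component in the system.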
16. Evaluation Environment
- Installed our prototype on the Titech Campus Grid at Tokyo Institute of Technology
- 15 dedicated PC clusters on campus (over 800 processors), in production usage
- Each cluster connects to SuperTITANET, a multi-gigabit campus network backbone
- Node spec:
- CPU: PentiumIII 1.4GHz × 2
- Memory: 512MB-1GB
- NIC: 100Base-T (WAN, per node) and Myrinet (cluster-local LAN)
- We use 1 node of each cluster (the SCore control node) for this evaluation
17. Experiments
- NWS autonomic configuration: quality and time required
- Fault recovery: Memoryhost fault scenario
18. RTT measurement of the test configuration
- RTT within a campus: under 0.450 ms; RTT between campuses: over 0.450 ms
[Figure: RTT map of the Oookayama and Suzukakedai campuses, about 30 km apart; legend: under 0.450 ms, 0.450 ms to 1000 ms, over 1000 ms]
19. Result of autonomic NWS configuration
- The clusters were appropriately grouped into two groups, one for each campus
[Figure: placement of the NWS Nameserver, Memoryhosts, and Sensorhosts across the Oookayama and Suzukakedai campuses, with data flows from sensors to Memoryhosts and the inter-clique network performance measurement]
20. Configuration Time
- Configuration time is O(N) (N = number of nodes)
- Most of the time is spent executing ping or the NWS components
21. Configuration time
- Configuration time is O(N) (N = number of machines), due to sequential execution of RTT measurement and NWS components
- With parallel execution, we could reduce this to O(1) (independent of the number of machines)
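The proposed parallelization can be sketched with a thread pool; `probe` is a placeholder for the per-host ping or NWS startup command. Wall-clock time then approaches that of the slowest single probe rather than the sum over all hosts:

```python
from concurrent.futures import ThreadPoolExecutor

def measure_all(hosts, probe, max_workers=64):
    """Run `probe` against every host concurrently and collect the
    results into a dict (a sketch of the O(1)-time proposal; `probe`
    stands in for the real ping/NWS commands)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves host order, so zip pairs hosts with results
        return dict(zip(hosts, pool.map(probe, hosts)))
```

Strictly, the dispatch itself still does O(N) work, but since each probe mostly waits on the network, the elapsed time becomes essentially independent of the number of machines once enough workers are available.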
22. Fault Recovery: case of Memoryhost (1)
[Figure: a clique with its NWS Nameserver, Memoryhost, and Sensorhosts; data flows from sensors to the Memoryhost]
- When the Memoryhost node goes down, the management feature detects it
23. Fault Recovery: case of Memoryhost (2)
[Figure: the same clique after recovery]
- A new Memoryhost starts running, and all Sensorhosts in the same clique are restarted
24. Problems of Our Prototype Implementation
- Single point of failure in the system: some components are centrally managed
- Solutions: replication of the management function; distributed management algorithms
- Excessive time required for data collection and component execution
- Solutions: O(1) pinging; parallel execution of each command
25. Conclusion
- Proposed autonomic management of Grid monitoring systems, consisting of grouping of machines, configuration, and halt recovery
- Implemented the autonomic management feature for NWS
- On a testbed of 15 PC clusters, initial configuration completes in 2 minutes
26. Future Work
- Eliminate the single point of failure: distributed management architecture
- Support Grid monitoring systems in general
- Adapt to large-scale environments: evaluation with a larger set of machines in a WAN environment
27. First and Second Steps
- Forecast network topology; check status of nodes and processes
- Using network performance metrics, check whether nodes and monitoring components are accessible
- Form node groups
- Use a proper algorithm for forming groups
- For new nodes, edit and reform the group membership
28. Third and Fourth Steps
- Decide configuration
- In initial configuration: for all machines
- In recovery configuration: for halted components and the other components related to them
- Start up the components on assigned nodes
- Execute the processes and register the components' information in the Information Service
29. Data Collection Step (1/2)
- Executes ping and ps systematically, in parallel, in an n-by-n fashion
- Each node runs a script to measure RTT to the others and to collect the PIDs of the NWS components on the machine
- One node gathers all the data generated by the resource nodes
30. Example (1/5)
[Figure: six ungrouped nodes, labeled 1-6]
31. Example (2/5)
- A node (6) was selected and moved to G1; the most proximal node from it still belongs to H
[Figure: node 6 in G1; the remaining nodes still in H]
32. Example (3/5)
- The most proximal node from 5 was 6, already in G1, so a new element, node 3, was selected; node 2 (the most proximal node from 3) belongs to H, so 2 will belong to G2
[Figure: G1 = {5, 6}; G2 forming around nodes 3 and 2]
33. Example (4/5)
- The most proximal node from 2 was 5, which is in G1, so G2 is merged into G1
[Figure: G1 = {2, 3, 5, 6}]
34. Example (5/5)
- A new element, node 4, was selected; the most proximal node from it is 1, which joins G2
[Figure: G1 = {2, 3, 5, 6}; G2 = {4, 1}]
35. Forming Result
- The most proximal node from 1 was 4, already in G2; H is the empty set, so forming is over
[Figure: final groups {2, 3, 5, 6} and {1, 4}]
36. Fault Recovery: case of Nameserver (1)
[Figure: the NWS Nameserver node goes down; Memoryhosts and Sensorhosts with data flows from sensors to Memoryhosts]
37. Fault Recovery: case of Nameserver (2)
[Figure: the same system after recovery]
- A new Nameserver starts running, and all other components are restarted
38. Time for fault recovery
- For 7 sites, Nameserver-down case:
- Reconfiguration: 1 second
- Restart: 37 seconds
- (Data collection time still to be measured)
39. Sorting Result