Performance Analysis of the Globus Toolkit Monitoring and Discovery Service (MDS2)

1
Performance Analysis of the Globus Toolkit
Monitoring and Discovery Service (MDS2)

Xuehai Zhang, University of Chicago
Dr. Jennifer Schopf, Argonne National Lab

2
Grid Monitoring and Information Services

Why are they important?
Resource selection, scheduling
Prediction, system status monitoring, event
notification
Few quantitative performance studies have been
done
Studying their performance will help in
Deployment
Performance tuning
Development of future systems

3
Performance Study of MDS2

MDS2 is the most common information and
monitoring service in production Grids
Zhang, Freschl, and Schopf (HPDC 03)
Evaluated MDS2 scalability and compared with two
other services, R-GMA and Hawkeye
The approach is coarse-grained and focuses on
end-to-end performance only
This study
Revisits MDS2 scalability at a finer granularity
using NetLogger instrumentation
Enables us to better understand what the
performance bottlenecks are and where they lie

4
Outline

Problem
MDS2 and NetLogger instrumentation
Experimental setup
Experiment results and analysis
Conclusion and future work

5
Monitoring and Discovery Service (MDS2)

Part of the Globus Toolkit
Based on Lightweight Directory Access Protocol
(LDAP)
Uses a hierarchical architecture
Grid Index Information Service (GIIS)
Grid Resource Information Service (GRIS)
Information Providers (IPs)

6
NetLogger Instrumentation

NetLogger is a toolkit to debug distributed
applications and identify bottlenecks
Developed at Lawrence Berkeley National Lab
Instruments applications by logging interesting
events at every critical point
We used NetLogger to divide the end-to-end path
of an MDS2 query into 7 phases
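
The idea of logging timestamped events at critical points and deriving per-phase times from them can be sketched as follows. This is a minimal illustration in Python, not the NetLogger API: the `EventLog` class, the `mark` method, and the event names in the usage note are all hypothetical.

```python
import time


class EventLog:
    """Collects (event_name, timestamp) pairs at instrumented points.

    Hypothetical helper for illustration; it does not mirror NetLogger's API.
    """

    def __init__(self, clock=time.perf_counter):
        self._clock = clock  # injectable so the sketch can be tested deterministically
        self.events = []

    def mark(self, name):
        """Record that the point called `name` was reached just now."""
        self.events.append((name, self._clock()))


def phase_durations(events):
    """Return {event_name: seconds until the next event} for consecutive events.

    Each phase is taken to start at one logged event and end at the next one,
    so N events yield N - 1 phase durations.
    """
    durations = {}
    for (start_name, start_ts), (_, end_ts) in zip(events, events[1:]):
        durations[start_name] = end_ts - start_ts
    return durations
```

Calling `log.mark("connect_start")`, `log.mark("bind_start")`, and so on around each step of a query, then passing `log.events` to `phase_durations`, gives one duration per phase; the phase with the largest duration is the candidate bottleneck.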

7
Outline

Problem
MDS2 and NetLogger instrumentation
Experimental setup
Experiment results and analysis
Conclusion and future work

8
Performance Topics

Topic 1: MDS2 GRIS vs. User
Two configuration scenarios
GRIS always caches data
GRIS never caches data
Topic 2: MDS2 GIIS vs. User
As a directory server, GIIS is configured to
always cache data
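
The difference between the "always caches" and "never caches" scenarios can be illustrated with a small time-to-live cache placed in front of a data source. This is a sketch under assumptions: `CachingProvider`, `fetch`, and `ttl_seconds` are hypothetical names, and the code does not reproduce how MDS2 itself is configured.

```python
import time


class CachingProvider:
    """Wraps a data source and re-invokes it only when the cached value expired.

    ttl_seconds=float("inf") models "always caches" (one invocation, then reuse);
    ttl_seconds=0 models "never caches" (every query invokes the source).
    """

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch          # hypothetical stand-in for an information provider
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._expires_at = None      # None means nothing has been cached yet
        self.invocations = 0         # how many times the source was actually invoked

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._fetch()
            self.invocations += 1
            self._expires_at = now + self._ttl
        return self._value
```

With caching, the cost of invoking the source is paid once and later queries are served from memory; without caching, every query pays it again, which is where a server-side bottleneck would be expected to appear.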

9
Experimental Setup

We deployed and studied MDS v2.2 and v2.4
Both were instrumented with NetLogger v2.0.13
Server-side testbed: Lucky nodes at ANL
7 dual-processor Linux boxes
Hostnames: lucky{0,1,3-7}.mcs.anl.gov
lucky0 and lucky6 ran Linux kernel 2.4.10 and the
rest ran kernel 2.4.19
Two 1133 MHz Intel PIII CPUs (with a 512KB cache
per CPU) and 512 MB RAM
Interconnect is 100 Mbps Ethernet

10
Experimental Setup (cont'd)

Client-side testbed at University of Chicago
(UC)
20 Linux boxes
15 machines, each with a 1208 MHz uniprocessor
and 256 MB RAM
5 machines with 756 MHz CPUs and 256 MB RAM
Simulation of concurrent users
Multiple processes evenly distributed across
all client machines
Continuous queries separated by a 1-second wait
period
A 100 Mbps network connects ANL and UC
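
The load-generation pattern above (many simulated users, each issuing queries in a loop with a fixed wait between them) can be sketched as follows. This is an illustration only: it uses threads on one machine in place of the processes spread across client machines in the study, and `simulate_users` and the `query` callable are hypothetical.

```python
import threading
import time


def simulate_users(query, num_users, duration_s, wait_s=1.0):
    """Run `num_users` concurrent query loops for roughly `duration_s` seconds.

    `query` is any zero-argument callable standing in for one request.
    Returns the list of per-query response times in seconds.
    """
    response_times = []
    lock = threading.Lock()
    deadline = time.monotonic() + duration_s

    def user_loop():
        # Each simulated user queries, records the elapsed time, then waits.
        while time.monotonic() < deadline:
            start = time.monotonic()
            query()
            elapsed = time.monotonic() - start
            with lock:
                response_times.append(elapsed)
            time.sleep(wait_s)

    threads = [threading.Thread(target=user_loop) for _ in range(num_users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return response_times
```

For example, `simulate_users(my_query, num_users=50, duration_s=600)` would mimic 50 users querying for 10 minutes with the default 1-second wait, where `my_query` is whatever function performs a single request.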

11
Performance Metrics

Throughput
The average number of requests processed by an
MDS2 service component per second
Observed Response Time (ORT) and Request
Processing Time (RPT)
ORT: the average time from when the user sends a
request until it receives the response, measured
at the client side
RPT: the average time for an MDS2 service
component to handle a user request, measured at
the server side
ORT is always greater than RPT
ORT = T_Client-Connect + T_Client-Bind + RPT
+ T_Client-EndConnect
RPT = T_Server-InitSearch + T_Server-SearchIndex
+ T_Server-Invoking + T_Server-GenResult
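
As a worked example of these metrics, the sketch below reads ORT and RPT as sums of the seven query phases named on this slide. The function names are hypothetical, and the timings in the usage note are made-up values chosen only to show the arithmetic, not measurements from the study.

```python
SERVER_PHASES = (
    "Server-InitSearch",
    "Server-SearchIndex",
    "Server-Invoking",
    "Server-GenResult",
)
CLIENT_PHASES = ("Client-Connect", "Client-Bind", "Client-EndConnect")


def request_processing_time(phases):
    """RPT: sum of the four server-side phase times."""
    return sum(phases[name] for name in SERVER_PHASES)


def observed_response_time(phases):
    """ORT: the three client-side phase times plus RPT."""
    return sum(phases[name] for name in CLIENT_PHASES) + request_processing_time(phases)


def throughput(requests_processed, elapsed_seconds):
    """Average number of requests processed per second."""
    return requests_processed / elapsed_seconds
```

With illustrative timings in milliseconds of 40 (Client-Connect), 10 (Client-Bind), 5 (Server-InitSearch), 15 (Server-SearchIndex), 200 (Server-Invoking), 30 (Server-GenResult), and 2 (Client-EndConnect), RPT is 250 ms and ORT is 302 ms, so ORT exceeds RPT by exactly the client-side phases.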

12
Performance Metrics (cont'd)

CPU_Load
CPU_Load = CPU_User + CPU_System
CPU_User: the percentage of CPU time spent in
user mode
CPU_System: the percentage of CPU time spent in
system mode
Load1
Average number of processes ready to run during
the last minute
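
One way such host metrics could be computed is sketched below. It assumes cumulative CPU tick counters sampled at two points in time (simplified here to user, system, and idle; real counters usually have more fields) and uses Python's `os.getloadavg()` for the 1-minute load average on Unix-like systems. The function names are hypothetical, and this is not the measurement tooling used in the study.

```python
import os


def cpu_load(before, after):
    """Return (cpu_user, cpu_system, cpu_load) as percentages.

    `before` and `after` are dicts of cumulative tick counts with the keys
    "user", "system", and "idle" (a simplifying assumption for this sketch).
    """
    delta = {key: after[key] - before[key] for key in ("user", "system", "idle")}
    total = sum(delta.values())
    if total == 0:
        return 0.0, 0.0, 0.0  # no ticks elapsed between the two samples
    cpu_user = 100.0 * delta["user"] / total
    cpu_system = 100.0 * delta["system"] / total
    return cpu_user, cpu_system, cpu_user + cpu_system


def load1():
    """1-minute load average; os.getloadavg() is only available on Unix-like systems."""
    return os.getloadavg()[0]
```

For instance, if 30 user ticks, 20 system ticks, and 50 idle ticks elapse between the two samples, CPU_User is 30%, CPU_System is 20%, and CPU_Load is 50%.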

13
Outline

Problem
MDS2 and NetLogger instrumentation
Experimental setup
Experiment results and analysis
Conclusion and future work

14
Experiment 1: GRIS Scalability (with users)

10 reporting Information Providers
Up to 600 users
10 minutes querying
Query asks for all the data from all Information
Providers (10KB)
Each data point is the average of 100 data

Configurations tested: caching / without caching
15
Experiment 1 Result: GRIS Query Phases Performance

Without data caching, the bottleneck lies in the
server-side Server-Invoking phase
This is due to the high cost of invoking
Information Providers
GRIS performance with data caching depends on the
client-side Client-Connect time
V2.4 GRIS outperforms V2.2 GRIS, which we
attribute to better memory use

16
Experiment 1 Result: Load1

The GRIS host has a higher load with more users
because more queries contend more intensely
GRIS without data caching incurs a lower load than
GRIS with data caching because processes are
blocked waiting for resources

17
Experiment 1 Summary

Enabling caching at the MDS2 GRIS can bypass the
performance bottleneck and support more users
MDS2 GRIS should run on a well-connected machine
Duplicating MDS2 GRIS can improve performance

18
Experiment 2: GIIS Scalability (with users)

5 reporting GRISes, each with 10 Information
Providers
Up to 600 users
10 minutes querying
Query asks for all the data from all the
reporting GRIS (50KB)
Each data point is the average of 100 data

Configuration tested: caching
19
Experiment 2 Result: GIIS Query Phases Performance

GIIS generally exhibits high scalability due to
data caching
However, it is constrained by the client-side
Client-Connect phase
V2.4 GIIS outperforms V2.2 GIIS
GIIS with data caching is similar to but more
efficient than GRIS with data caching

20
Experiment 2 Result: Load1

GIIS host experiences a higher load with the
increasing number of users

21
Experiment 2 Summary

GIIS with data caching is highly scalable and
provides an efficient directory service
When serving a large number of users, its
performance is constrained by the users'
connection time
Duplicate the GIIS to maintain quality of service
when there is a large number of users

22
Conclusion

Studied the scalability of MDS2 at a finer
granularity using NetLogger instrumentation
Located performance bottlenecks and constraints
for MDS2 GRIS and GIIS
Caching or pre-fetching the data is much more
important than we expected
Placing primary components at well-connected
sites can improve performance too

23
Future Work

Do more NetLogger-assisted experiments to address
other features of MDS2 GRIS and GIIS
Study more monitoring and information services
Study how access control affects performance
Perform WAN environment experiments

24
Contact Information