Title: Performance Analysis of the Globus Toolkit Monitoring and Discovery Service MDS2
1Performance Analysis of the Globus Toolkit
Monitoring and Discovery Service (MDS2)
- Xuehai Zhang, University of Chicago
- Dr. Jennifer Schopf, Argonne National Lab
2Grid Monitoring and Information Services
- Why are they important?
- Resource selection, scheduling
- Prediction, system status monitoring, event notification
- Few quantitative performance studies have been done
- Studying their performance will help with
- Deployment
- Performance tuning
- Development of future systems
3Performance Study of MDS2
- MDS2 is the most common information and monitoring service in production Grids
- Zhang, Freschl, and Schopf (HPDC 03)
- Evaluated MDS2 scalability and compared it with two other services, R-GMA and Hawkeye
- The approach is coarse-grained and focuses on end-to-end performance only
- This study
- Revisits MDS2 scalability at a finer granularity using NetLogger instrumentation
- Enables us to better understand what the performance bottlenecks are and where they occur
4Outline
- Problem
- MDS2 and NetLogger instrumentation
- Experimental setup
- Experiment results and analysis
- Conclusion and future work
5Monitoring and Discovery Service (MDS2)
- Part of the Globus Toolkit
- Based on the Lightweight Directory Access Protocol (LDAP)
- Uses a hierarchical architecture
- Grid Index Information Service (GIIS)
- Grid Resource Information Service (GRIS)
- Information Providers (IPs)
6NetLogger Instrumentation
- NetLogger is a toolkit to debug distributed applications and identify bottlenecks
- Developed at Lawrence Berkeley National Lab
- Instruments applications by logging interesting events at every critical point
- We used NetLogger to divide the end-to-end path of an MDS2 query into 7 phases
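The idea of dividing a query into phases by logging timestamped events can be sketched as follows. This is a minimal Python illustration, not NetLogger's actual API: the `EventLog` class and the `<phase>.start` / `<phase>.end` event naming are hypothetical. The phase names are the seven terms that appear in the ORT and RPT definitions on the Performance Metrics slide, and their ordering here is an assumption.

```python
import time

# Phase names taken from the ORT/RPT definitions in this deck;
# the order along the query path is assumed.
PHASES = [
    "Client-Connect",
    "Client-Bind",
    "Server-InitSearch",
    "Server-SearchIndex",
    "Server-Invoking",
    "Server-GenResult",
    "Client-EndConnect",
]


class EventLog:
    """Hypothetical event log: one (timestamp, name) pair per instrumented point."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self.events = []

    def write(self, name):
        """Record an event with the current timestamp."""
        self.events.append((self._clock(), name))

    def phase_durations(self):
        """Pair each '<phase>.start' with '<phase>.end'; return seconds per phase."""
        starts, durations = {}, {}
        for ts, name in self.events:
            phase, _, kind = name.rpartition(".")
            if kind == "start":
                starts[phase] = ts
            elif kind == "end" and phase in starts:
                durations[phase] = ts - starts.pop(phase)
        return durations


def run_demo():
    """Log a start and end event for every phase using a fake, deterministic clock."""
    ticks = iter(range(100))
    log = EventLog(clock=lambda: next(ticks))
    for phase in PHASES:
        log.write(phase + ".start")
        log.write(phase + ".end")
    return log.phase_durations()
```

In a real deployment the client-side and server-side events would be written by different processes on different hosts, so the clocks would need to be synchronized before durations are compared across hosts.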
8Performance Topics
- Topic 1: MDS2 GRIS vs. User
- Two configuration scenarios
- GRIS always caches data
- GRIS never caches data
- Topic 2: MDS2 GIIS vs. User
- As a directory server, GIIS is configured to always cache data
9Experimental Setup
- We deployed and studied MDS v2.2 and v2.4
- Both were instrumented with NetLogger v2.0.13
- Server-side testbed: Lucky nodes at ANL
- 7 dual-processor Linux boxes
- Hostnames: lucky{0,1,3-7}.mcs.anl.gov
- lucky0 and lucky6 ran Linux kernel 2.4.10 and the rest ran kernel 2.4.19
- Two 1133 MHz Intel PIII CPUs (with a 512 KB cache per CPU) and 512 MB RAM
- Interconnect is 100 Mbps Ethernet
10Experimental Setup (contd)
- Client-side testbed at University of Chicago (UC)
- 20 Linux boxes
- 15 machines equipped with a 1208 MHz uni-processor and 256 MB RAM
- 5 machines with 756 MHz CPUs and 256 MB RAM
- Simulation of concurrent users
- Simulated by multiple processes evenly distributed to all client machines
- Continuous queries separated by a 1-second wait period
- A 100 Mbps network connects ANL and UC
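The user simulation described above can be sketched roughly as below. This is an assumed reconstruction in Python, not the scripts used in the study: `distribute_users`, `simulated_user`, and the `send_query` callback are hypothetical names. In the study each user was a separate process; here a user is a plain function so that the logic is easy to follow.

```python
import time


def distribute_users(n_users, n_machines):
    """Return users per client machine; counts differ by at most one."""
    base, extra = divmod(n_users, n_machines)
    return [base + 1 if i < extra else base for i in range(n_machines)]


def simulated_user(send_query, duration_s, wait_s=1.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Issue queries one after another, waiting wait_s after each response,
    until duration_s seconds have elapsed. Returns the number of queries sent."""
    completed = 0
    deadline = clock() + duration_s
    while clock() < deadline:
        send_query()       # blocks until the (hypothetical) query returns
        completed += 1
        sleep(wait_s)      # the 1-second wait period between queries
    return completed
```

For example, `distribute_users(600, 20)` assigns 30 simulated users to each of the 20 client machines. The `clock` and `sleep` parameters exist only so that the loop can be exercised with a fake clock.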
11Performance Metrics
- Throughput
- The average number of requests processed by an MDS2 service component per second
- Observed Response Time (ORT) and Request Processing Time (RPT)
- ORT: the average time from when the user sends a request until it receives the response, calculated at the client side
- RPT: the average time for an MDS2 service component to handle a user request, calculated at the server side
- ORT is always greater than RPT
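- $\mathrm{ORT} = T_{\text{Client-Connect}} + T_{\text{Client-Bind}} + \mathrm{RPT} + T_{\text{Client-EndConnect}}$
- $\mathrm{RPT} = T_{\text{Server-InitSearch}} + T_{\text{Server-SearchIndex}} + T_{\text{Server-Invoking}} + T_{\text{Server-GenResult}}$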
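As a small worked illustration of these definitions, the sketch below computes RPT, ORT, and throughput from per-phase timings. The function names are hypothetical, and the timing values in `EXAMPLE_MS` are invented for illustration only; they are not measurements from this study.

```python
CLIENT_PHASES = ("Client-Connect", "Client-Bind", "Client-EndConnect")
SERVER_PHASES = ("Server-InitSearch", "Server-SearchIndex",
                 "Server-Invoking", "Server-GenResult")


def rpt(timings):
    """Request Processing Time: sum of the four server-side phases."""
    return sum(timings[p] for p in SERVER_PHASES)


def ort(timings):
    """Observed Response Time: the three client-side phases plus RPT."""
    return sum(timings[p] for p in CLIENT_PHASES) + rpt(timings)


def throughput(n_requests, elapsed_s):
    """Average number of requests processed per second."""
    return n_requests / elapsed_s


# Invented per-phase timings in milliseconds (illustrative, not measured).
EXAMPLE_MS = {
    "Client-Connect": 40,
    "Client-Bind": 10,
    "Server-InitSearch": 5,
    "Server-SearchIndex": 15,
    "Server-Invoking": 200,
    "Server-GenResult": 20,
    "Client-EndConnect": 10,
}
```

With these made-up numbers, `rpt(EXAMPLE_MS)` is 240 ms and `ort(EXAMPLE_MS)` is 300 ms. Because the client-side phases add non-negative time on top of RPT, ORT cannot be smaller than RPT.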
12Performance Metrics (contd)
- CPU_Load
- CPU_Load = CPU_User + CPU_System
- CPU_User: the percentage of CPU time spent in user mode
- CPU_System: the percentage of CPU time spent in system mode
- Load1
- Average number of processes ready to run during
the last 1 minute
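One way such host metrics could be derived is sketched below. This is an assumption about how the quantities might be computed, since the slides do not say which tool collected them. `cpu_load` takes two samples of cumulative CPU tick counters (the sample layout is hypothetical), and `load1` reads the one-minute load average through Python's `os.getloadavg()`, which is available on Unix systems.

```python
import os


def cpu_load(before, after):
    """Return (CPU_User, CPU_System, CPU_Load) in percent.

    `before` and `after` are dicts of cumulative CPU ticks with the keys
    'user', 'system', and 'idle' (a hypothetical sample layout).
    """
    delta = {k: after[k] - before[k] for k in ("user", "system", "idle")}
    total = sum(delta.values())
    if total == 0:
        return 0.0, 0.0, 0.0
    cpu_user = 100.0 * delta["user"] / total
    cpu_system = 100.0 * delta["system"] / total
    # CPU_Load = CPU_User + CPU_System
    return cpu_user, cpu_system, cpu_user + cpu_system


def load1():
    """One-minute load average as reported by the operating system (Unix only)."""
    return os.getloadavg()[0]
```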
14Experiment 1: GRIS Scalability (with users)
- 10 reporting Information Providers
- Up to 600 users
- 10 minutes of querying
- Query asks for all the data from all Information Providers (10 KB)
- Each data point is the average of 100 measurements
- Two configurations: with caching and without caching
15Experiment 1 Result: GRIS Query Phases Performance
- Without data caching, the bottleneck lies in the server-side Server-Invoking phase
- This is due to the high cost of invoking Information Providers
- GRIS performance with data caching depends on the client-side Client-Connect time
- V2.4 GRIS outperforms V2.2 GRIS, which is attributed to better memory use
16Experiment 1 Result: Load1
- GRIS host has a higher load with more users because of more intensive contention among more queries
- GRIS without data caching places a lower load on the host than GRIS with data caching because processes are blocked waiting for resources
17Experiment 1 Summary
- Enabling caching at MDS2 GRIS can bypass the performance bottleneck and support more users
- MDS2 GRIS should run on a well-connected machine
- Duplicating MDS2 GRIS can improve performance
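The effect of caching on the expensive provider invocation can be illustrated with a simple time-to-live cache. This is a generic sketch, not the MDS2 implementation: `CachedProvider`, `invoke`, and `ttl_s` are hypothetical names, and the mapping of the "always caches" and "never caches" scenarios to TTL values is an assumption made for illustration.

```python
import time


class CachedProvider:
    """Wraps a provider call and reuses its result until ttl_s seconds have passed."""

    def __init__(self, invoke, ttl_s, clock=time.monotonic):
        self._invoke = invoke        # the expensive call (e.g. running a provider)
        self._ttl_s = ttl_s
        self._clock = clock
        self._value = None
        self._expires_at = None      # None means nothing has been cached yet

    def query(self):
        """Return cached data if still fresh; otherwise invoke the provider."""
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._invoke()
            self._expires_at = now + self._ttl_s
        return self._value
```

In this sketch `ttl_s=0` behaves like the "never caches" scenario, since every query invokes the provider, while `ttl_s=float("inf")` behaves like the "always caches" scenario after the first query.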
18Experiment 2: GIIS Scalability (with users)
- 5 reporting GRIS, each with 10 Information Providers
- Up to 600 users
- 10 minutes of querying
- Query asks for all the data from all the reporting GRIS (50 KB)
- Each data point is the average of 100 measurements
- Configuration: with caching
19Experiment 2 Result: GIIS Query Phases Performance
- GIIS generally exhibits high scalability due to data caching
- However, it is constrained by the client-side Client-Connect phase
- V2.4 GIIS outperforms V2.2 GIIS
- GIIS with data caching is similar to but more efficient than GRIS with data caching
20Experiment 2 Result: Load1
- GIIS host experiences a higher load as the number of users increases
21Experiment 2 Summary
- GIIS with data caching is highly scalable and provides an efficient directory service
- When serving a large number of users, its performance is constrained by the users' connection time
- Duplicate the GIIS to maintain quality of service when there is a larger number of users
22Conclusion
- Studied the scalability of MDS2 at a finer granularity using NetLogger instrumentation
- Located performance bottlenecks and constraints for MDS2 GRIS and GIIS
- Caching or pre-fetching the data is much more important than we expected
- Placing primary components at well-connected sites can improve performance too
23Future Work
- Do more NetLogger-assisted experiments to address other features of MDS2 GRIS and GIIS
- Study more monitoring and information services
- Study how access control affects performance
- Perform WAN environment experiments
24Contact Information
- Me
- Xuehai Zhang, University of Chicago
- Email
- hai_at_cs.uchicago.edu
- Web
- http://people.cs.uchicago.edu/hai
- Advisor and co-author
- Dr. Jennifer Schopf, Argonne National Lab
- jms_at_mcs.anl.gov