Title: Taiwan UniGrid
- Yeh-Ching Chung
- Department of Computer Science
- National Tsing Hua University
- Hsin-Chu, 300, Taiwan
Outline
- Introduction
- Portal
- Broker and Scheduler
- Resource Information Service
- Storage Service
- Applications
- Conclusion
Introduction (1)
- The purpose of grid computing is to integrate various resources within a large network environment.
- The purpose of the UniGrid project is to build a platform for academic research in Taiwan using grid-related technologies.
Introduction (2)
- Nine institutes have joined to develop the system.
- ????
- ???????
- ??????
- ???????
- ???????
- ???????
- ???????
- ????????????
- ?????????
Introduction (3)
- All institutes participating in the UniGrid project contribute some of their resources.
- These resources can be used collaboratively for large-scale applications.
Introduction (4)
Portal
- The UniGrid portal provides an interface through which UniGrid users can use the resources available in the UniGrid system.
- Functionalities of the portal:
- System status monitoring
- Single sign-on
- User workflow management
- Project information
System Status Monitoring (1)
- UniGrid users can examine the status of system resources through the portal.
- The portal gathers the current system information from the information service and presents it to the users.
System Status Monitoring (2)
- Screenshot of the system status monitoring web page
Single Sign-On (1)
- Single sign-on is a mechanism whereby a single authentication lets a user access all resources for which he has access permission, without the need to enter multiple passwords.
- All user account information is kept in a database at the portal site.
- When a user requests a service, his verification data is passed to that service.
- The request is granted only if the identity is verified by the verification web service.
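The verification flow above can be sketched as follows. This is a minimal illustration, not the UniGrid implementation: the slides do not say what the verification data looks like (elsewhere the deck mentions CA certificates and proxy-init), so here an HMAC-signed token stands in for it, and `SECRET`, `issue_token`, `verify_token` and `handle_request` are hypothetical names.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical secret shared by the portal and the verification web service.
SECRET = b"example-shared-secret"


def issue_token(user: str) -> str:
    """Portal side: after the single login succeeds, sign the user name once."""
    sig = hmac.new(SECRET, user.encode(), hashlib.sha256).hexdigest()
    return f"{user}:{sig}"


def verify_token(token: str) -> Optional[str]:
    """Verification service: return the user name if the signature checks out."""
    user, _, sig = token.rpartition(":")
    expected = hmac.new(SECRET, user.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison of the presented and expected signatures.
    return user if hmac.compare_digest(sig.encode(), expected.encode()) else None


def handle_request(token: str, acl: dict, service: str) -> bool:
    """A UniGrid service grants the request only if the identity is verified
    and the user has permission for that service; no password is re-entered."""
    user = verify_token(token)
    return user is not None and service in acl.get(user, set())
```

The user logs in once, and every later request carries only the token, which each service forwards to the verifier. A production design would also need token expiry and revocation, which this sketch omits.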
Single Sign-On (2)
- User identity verification through the single sign-on service
User Workflow Management (1)
- A UniGrid user can design and save his own workflows at the UniGrid portal.
- A user can select any workflow he designed and execute it through the UniGrid portal.
- A user can also monitor the status of his workflows through the UniGrid portal.
User Workflow Management (2)
- Diagram of a workflow, showing parallel execution and sequential execution
User Workflow Management (3)
- The workflows of each user are stored in the portal storage in XML format:

    <flow name="testflow" numstages="3">
      <stage name="stage1" numjobs="1">
        <job id="0">
          <sortkey>1</sortkey>
          <runtype>mpi</runtype>
          <workdir>/home/test/</workdir>
          <filename>mm_mpi</filename>
          <runrp>true</runrp>
          <datafile/>
          <argu>256</argu>
          <otherurl/>
          <cpuno>4</cpuno>
        </job>
      </stage>
      <!-- ... -->
    </flow>
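To show how such a workflow description can be consumed, here is a small parser sketch using Python's standard `xml.etree.ElementTree`. The meaning of the fields is an assumption read off the tag names (for example `cpuno` as the number of CPUs and `argu` as the program arguments); `load_jobs` is a hypothetical helper, and the sample keeps only the one stage shown in the slide.

```python
import xml.etree.ElementTree as ET

# The sample from the slide; the remaining stages are omitted there as well.
WORKFLOW_XML = """
<flow name="testflow" numstages="3">
  <stage name="stage1" numjobs="1">
    <job id="0">
      <sortkey>1</sortkey>
      <runtype>mpi</runtype>
      <workdir>/home/test/</workdir>
      <filename>mm_mpi</filename>
      <runrp>true</runrp>
      <datafile/>
      <argu>256</argu>
      <otherurl/>
      <cpuno>4</cpuno>
    </job>
  </stage>
</flow>
"""


def load_jobs(xml_text):
    """Flatten a <flow> document into one dict per job, keeping its stage name."""
    root = ET.fromstring(xml_text.strip())
    jobs = []
    for stage in root.findall("stage"):
        for job in stage.findall("job"):
            jobs.append({
                "stage": stage.get("name"),
                "id": int(job.get("id")),
                "runtype": job.findtext("runtype"),
                # Assumed: the executable is workdir + filename.
                "executable": job.findtext("workdir", "") + job.findtext("filename", ""),
                "args": (job.findtext("argu") or "").split(),
                "cpus": int(job.findtext("cpuno", "1")),
            })
    return jobs
```

For the sample, `load_jobs(WORKFLOW_XML)` yields one MPI job in `stage1` that runs `/home/test/mm_mpi 256` on 4 CPUs.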
User Workflow Management (4)
- Screenshot of the workflow editing web page
User Workflow Management (5)
- When a user submits a workflow, the portal passes the selected workflow information to the resource broker.
- Upon receiving an execution request, the resource broker finds the resources required by that workflow and schedules its execution.
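One way a broker can honor the parallel and sequential parts of a workflow is sketched below. It assumes, without confirmation from the slides, that the stages of a flow run one after another while the jobs inside a stage run concurrently. `run_workflow` and the `run_job` callback are hypothetical; in the real system `run_job` would hand a job to a local scheduler.

```python
from concurrent.futures import ThreadPoolExecutor


def run_workflow(stages, run_job):
    """stages: list of stages, each a list of job descriptions.
    Stages run sequentially; the jobs inside one stage run in parallel.
    Returns the per-stage lists of job results, in submission order."""
    results = []
    with ThreadPoolExecutor() as pool:
        for stage in stages:
            # list(...) waits for every job of this stage before the next stage starts.
            results.append(list(pool.map(run_job, stage)))
    return results
```

For example, `run_workflow([["a", "b"], ["c"]], str.upper)` returns `[["A", "B"], ["C"]]`, and job `"c"` starts only after both jobs of the first stage have finished.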
User Workflow Management (6)
User Workflow Management (7)
- Users can examine the execution status of their workflows through the portal's workflow monitoring system.
- All workflow execution information is stored in a database on the machine where the resource broker is installed.
- The portal queries the database to obtain the current status of a particular workflow.
- The status information is processed and presented as web pages.
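The query-and-render step can be sketched with Python's built-in `sqlite3`. The slides do not name the database product or its schema, so the `job_status` table, its columns, and the helper functions below are hypothetical.

```python
import html
import sqlite3


def make_demo_db():
    """In-memory stand-in for the broker-side status database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job_status (workflow TEXT, stage TEXT, job_id INTEGER, state TEXT)"
    )
    conn.executemany(
        "INSERT INTO job_status VALUES (?, ?, ?, ?)",
        [
            ("testflow", "stage1", 0, "DONE"),
            ("testflow", "stage2", 0, "RUNNING"),
            ("otherflow", "stage1", 0, "QUEUED"),
        ],
    )
    return conn


def workflow_status(conn, workflow):
    """Portal side: fetch the current state of every job in one workflow."""
    return conn.execute(
        "SELECT stage, job_id, state FROM job_status WHERE workflow = ? "
        "ORDER BY stage, job_id",
        (workflow,),
    ).fetchall()


def render_status_page(workflow, rows):
    """Turn the query result into a minimal HTML table."""
    body = "".join(
        f"<tr><td>{html.escape(stage)}</td><td>{job_id}</td>"
        f"<td>{html.escape(state)}</td></tr>"
        for stage, job_id, state in rows
    )
    return f"<h2>{html.escape(workflow)}</h2><table>{body}</table>"
```

The parameterized query (`?`) keeps a user-supplied workflow name from being interpreted as SQL.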
User Workflow Management (8)
- Screenshot of the workflow monitoring web page
User Workflow Management (9)
- Screenshot of the UniGrid workflow management web page
Broker and Scheduler (1)
- The broker provides a uniform interface to access the available resources in the UniGrid system.
- The broker uses the resource information service to obtain the current status of the resources in the system.
- After this information is gathered, the broker allocates the resources that meet the requirements of the current job.
- The jobs are then passed to the corresponding local schedulers to be executed locally.
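The slides do not state the broker's allocation rule. The following minimal sketch therefore assumes a greedy policy: nodes reported by the information service are taken in order of increasing load until the requested CPU count is covered. `Node`, its fields, and `allocate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    site: str
    free_cpus: int
    load: float  # assumed load figure reported by the information service


def allocate(nodes, cpus_needed):
    """Greedily pick the least-loaded nodes until the CPU request is met.
    Returns a list of (node name, cpus taken), or None if the request cannot be met."""
    chosen, remaining = [], cpus_needed
    for node in sorted(nodes, key=lambda n: n.load):
        if remaining <= 0:
            break
        if node.free_cpus > 0:
            take = min(node.free_cpus, remaining)
            chosen.append((node.name, take))
            remaining -= take
    return chosen if remaining <= 0 else None
```

The resulting (node, CPU count) pairs are what the broker would then hand to the corresponding local schedulers.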
Broker and Scheduler (2)
Broker and Scheduler (3)
- Each participating organization has a local scheduler (Condor) installed to schedule the jobs assigned to that organization.
- Condor:
- A scheduler for large collections of distributively owned computing resources
- Developed by researchers at the University of Wisconsin
- Specialized for compute-intensive jobs
- Uses the ClassAd mechanism to match job requirements to machine status and schedules the jobs according to the matching results
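The core idea of ClassAd matchmaking is two-sided: a job's Requirements must accept the machine, the machine's Requirements must accept the job, and Rank orders the acceptable matches. The sketch below imitates that idea with Python dicts and lambdas. It is not the ClassAd language or any HTCondor API, and the ad contents are made up.

```python
def matches(job, machine):
    """Two-sided match: each ad's Requirements must accept the other ad."""
    return job["Requirements"](machine) and machine["Requirements"](job)


def best_match(job, machines):
    """Among mutually acceptable machines, pick the one the job ranks highest."""
    candidates = [m for m in machines if matches(job, m)]
    return max(candidates, key=job["Rank"], default=None)


# Example ads; attribute values are made up for illustration.
JOB = {
    "Owner": "alice",
    "ImageSize": 512,
    "Requirements": lambda m: m["OpSys"] == "LINUX" and m["Memory"] >= 1024,
    "Rank": lambda m: m["Memory"],  # prefer machines with more memory
}
MACHINES = [
    {"Name": "uninode5", "OpSys": "LINUX", "Memory": 2048,
     "Requirements": lambda j: True},
    {"Name": "uninode7", "OpSys": "LINUX", "Memory": 4096,
     "Requirements": lambda j: j["Owner"] != "alice"},  # owner policy rejects alice
    {"Name": "uninode9", "OpSys": "LINUX", "Memory": 1024,
     "Requirements": lambda j: j["ImageSize"] <= 1024},
]
```

Here `uninode7` has the most memory but refuses the job, so `best_match(JOB, MACHINES)` selects `uninode5`.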
Related Research (1)
- Tools have been developed to simulate different load sharing and scheduling policies on computing grids and to analyze their performance.
- Queuing methods:
- Independent clusters
- Multiple queues
- Forwarding to no-need-to-wait site
- Forwarding to shortest-queue site
- Forwarding to least-load site
Related Research (2)
- Queuing methods (contd.)
- Single queue
- Multi-pool centralized queue
- Single-pool centralized queue
- One big cluster
- Two-level scheduling
- Empty queue only
- Shortest queue first
- Least load first
- Two-level local queues
- Forwarding to shortest-queue site
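The three forwarding rules in the list can be made concrete with a small site-selection sketch. The definitions are assumptions, since the slides name the policies without defining them: "no need to wait" is taken to mean an empty queue plus enough idle processors, and "load" is taken to be the fraction of busy processors. `Site` and `forward` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Site:
    name: str
    total_procs: int
    busy_procs: int
    queue_len: int  # jobs waiting in the site's local queue

    @property
    def free_procs(self) -> int:
        return self.total_procs - self.busy_procs

    @property
    def load(self) -> float:
        # Assumed load metric: fraction of busy processors.
        return self.busy_procs / self.total_procs


def forward(job_procs: int, sites: list, policy: str) -> Optional[Site]:
    """Choose the site a newly arrived job is forwarded to.
    None means no site qualifies and the job stays in its home queue."""
    if policy == "no-need-to-wait":
        ready = [s for s in sites if s.queue_len == 0 and s.free_procs >= job_procs]
        return ready[0] if ready else None
    if policy == "shortest-queue":
        return min(sites, key=lambda s: s.queue_len)
    if policy == "least-load":
        return min(sites, key=lambda s: s.load)
    raise ValueError(f"unknown policy: {policy}")
```

With sites A (16 CPUs, all busy, 3 queued), B (8 CPUs, 2 busy, 1 queued) and C (32 CPUs, 16 busy, none queued), shortest-queue picks C but least-load picks B. The two policies can therefore disagree, which is the kind of difference the simulation tools compare.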
Related Research (3)
- Scheduling policies:
- Non-FCFS
- Multi-pool centralized queue
- Single-pool centralized queue
- FCFS
- Two-level scheduling
- The performance of non-FCFS scheduling is three times better than that of FCFS.
Related Research (4)
- Implementation approaches:
- Multi-pool centralized queue
- Global queue scheduling in the broker, no local queuing system
- Global queue scheduling in the broker, with processor availability ensured through the local queuing system
- Single-pool centralized queue
- Global queue scheduling in the broker, no local queuing system
Related Research (5)
- Two-level scheduling (empty-queue-only multi-pool grid):
- Global queue in the broker, local queues in the local queuing systems
Related Research (6)
Related Research (7)
- Simulation results (contd.)
Related Research (8)
- Discussion:
- Non-FCFS methods can effectively improve overall system utilization and performance.
- The smallest-first non-FCFS policy outperforms all other policies in terms of waiting time and waiting ratio.
- As far as the worst case is concerned, the backfilling policy is superior because it does not allow jobs to be delayed by backfilling activities.
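Here is a toy simulation of why a non-FCFS order can cut waiting time. It assumes one cluster, jobs described as (name, processors, runtime) that all arrive at time 0, and "smallest" read as "fewest processors". Jobs start strictly in queue order, with no backfilling. The reported three-fold improvement comes from the cited study and is not reproduced here. `simulate`, `fcfs` and `smallest_first` are hypothetical names.

```python
import heapq


def simulate(jobs, total_procs, order):
    """jobs: (name, procs, runtime) tuples, all submitted at t=0.
    Jobs are started strictly in the order produced by `order` (no backfilling).
    Returns {name: waiting time}."""
    if any(p > total_procs for _, p, _ in jobs):
        raise ValueError("job larger than the machine")
    queue = list(order(jobs))
    free, now = total_procs, 0.0
    running = []  # min-heap of (finish_time, procs)
    waits = {}
    while queue:
        name, procs, runtime = queue[0]
        if procs <= free:
            queue.pop(0)
            waits[name] = now
            free -= procs
            heapq.heappush(running, (now + runtime, procs))
        else:
            # Advance time to the next job completion and release its processors.
            finish, released = heapq.heappop(running)
            now, free = finish, free + released
    return waits


def fcfs(jobs):
    return list(jobs)


def smallest_first(jobs):
    return sorted(jobs, key=lambda j: j[1])  # fewest processors first


def mean_wait(waits):
    return sum(waits.values()) / len(waits)
```

On a 4-processor cluster with jobs A (4 procs, 10 s), B (1 proc, 2 s) and C (1 proc, 2 s), FCFS makes B and C wait 10 s behind A (mean 20/3 s). Smallest-first runs B and C immediately and delays A by only 2 s (mean 2/3 s).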
Resource Information Services
- The resource information service provides information about the current status of resources; this information can be used by other services of the system.
- Functionalities of the resource information service:
- Information system
- Performance visualization of MPI parallel program execution
Information System (1)
- Provides an interface for other services to query various information about computing nodes.
- Statistics about individual nodes are obtained using MDS (Monitoring and Discovery Service), provided by the Globus Toolkit.
- The current network status between machines is gathered using NWS (Network Weather Service).
- Node information is updated automatically when a computing node is added or removed.
Information System (2)
- The Network Weather Service (NWS):
- A distributed system that periodically monitors and dynamically forecasts the performance that various network and computational resources can deliver over a given time interval
- Developed by researchers at UCSB
- Uses numerical models to forecast what the conditions will be over a given time frame
- Because this functionality is analogous to weather forecasting, the system is called the Network Weather Service
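The forecasting idea can be illustrated with an adaptive predictor in the spirit of NWS: keep several simple predictors, charge each one its error on every new measurement, and forecast with whichever has done best so far. The predictor set and the error measure (cumulative absolute error) here are simplifications chosen for illustration; NWS itself uses a larger family of models. `AdaptiveForecaster` is a hypothetical name.

```python
from statistics import mean, median


def last_value(history):
    return history[-1]


def running_mean(history):
    return mean(history)


def sliding_median(history, window=5):
    return median(history[-window:])


PREDICTORS = {"last": last_value, "mean": running_mean, "median": sliding_median}


class AdaptiveForecaster:
    """Keep several simple predictors and trust the one with the lowest
    cumulative absolute error so far."""

    def __init__(self):
        self.history = []
        self.errors = {name: 0.0 for name in PREDICTORS}

    def observe(self, value):
        # Before recording the new measurement, charge each predictor its error.
        if self.history:
            for name, predict in PREDICTORS.items():
                self.errors[name] += abs(predict(self.history) - value)
        self.history.append(value)

    def forecast(self):
        """Return (predictor name, forecast); needs at least one observation."""
        best = min(self.errors, key=self.errors.get)
        return best, PREDICTORS[best](self.history)
```

Feeding bandwidth samples 10, 10, 10, 50, 10, 10 makes the sliding median the winner: it ignores the one-off spike that misleads the last-value and mean predictors.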
Information System (3)
Information System (4)
- Screenshot of the node status web page
Performance Visualization of MPI Programs (1)
- Input: any application (subject to the availability of a compiler on the grid platform)
- Output: a performance visualization of the execution of this application
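The slides do not describe the visualization tool's trace format, so the following is only a sketch of the general idea. It takes hypothetical per-rank trace events (rank, state, start, end) and draws a text timeline with one row per MPI process, similar in spirit to the four-node chart on the next slide. `render_timeline` and the state codes are assumptions.

```python
def render_timeline(events, width=40):
    """events: (rank, state, start, end) tuples, times in seconds.
    state is a one-character code, e.g. 'C' for computation, 'M' for MPI calls."""
    t_end = max(end for _, _, _, end in events)
    if t_end <= 0:
        raise ValueError("trace must span a positive time interval")
    ranks = sorted({rank for rank, _, _, _ in events})
    rows = {rank: ["."] * width for rank in ranks}  # '.' = no recorded activity
    for rank, state, start, end in events:
        first = int(start / t_end * width)
        last = max(first + 1, int(end / t_end * width))  # draw at least one cell
        for col in range(first, min(last, width)):
            rows[rank][col] = state
    return "\n".join(f"rank {rank}: {''.join(rows[rank])}" for rank in ranks)
```

In the rendered output, long runs of 'M' on one rank while the other ranks compute point to communication waits worth investigating.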
Performance Visualization of MPI Programs (2)
- Execution of a parallel application using 4 computing nodes
Related Research (1)
- Communication localization and data partitioning techniques in cluster-based grid systems:
- Localized communication enhances the performance of parallel applications on the grid
- Adaptive data partitioning for identical-cluster and non-identical-cluster grid topologies
- In-core and out-of-core applications
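The goal of communication localization is to place processes that exchange a lot of data in the same cluster, so that less traffic crosses the slow inter-cluster links. The sketch below measures inter-cluster volume from a communication table and finds the best placement by brute force. It is for tiny illustrative cases only and does not reproduce the cited techniques. Unequal `cluster_sizes` can model non-identical clusters. Both function names are hypothetical.

```python
from itertools import permutations


def inter_cluster_volume(comm, mapping):
    """comm[i][j]: data sent from process i to process j.
    mapping[i]: cluster that process i is placed on."""
    n = len(comm)
    return sum(
        comm[i][j] for i in range(n) for j in range(n) if mapping[i] != mapping[j]
    )


def best_mapping(comm, cluster_sizes):
    """Try every placement respecting the cluster sizes; return (cost, mapping).
    Exponential in the number of processes, so only for small examples."""
    slots = [c for c, size in enumerate(cluster_sizes) for _ in range(size)]
    best = None
    for perm in sorted(set(permutations(slots))):
        cost = inter_cluster_volume(comm, perm)
        if best is None or cost < best[0]:
            best = (cost, perm)
    return best
```

For four processes where 0 and 2 (and likewise 1 and 3) exchange 9 units each way, the naive block placement (0,0,1,1) sends 36 units across clusters. Grouping {0,2} and {1,3} reduces this to 4.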
Related Research (2)
- Communication localization techniques for identical clusters
- Figures: original and localized communication patterns
Related Research (3)
- Communication localization techniques for non-identical clusters
- Figure: original communication table
Related Research (4)
- Communication localization techniques for non-identical clusters (contd.)
- Figure: localized communication table
Storage Service
- The goal of the storage service is to provide a collaborative space where UniGrid users can share their data and resources with others.
- Components of the storage service:
- Virtual storage system
- Data management system
Virtual Storage System (1)
- Virtual storage system architecture
Virtual Storage System (2)
- The virtual storage system is implemented in Java as a web service.
- UniGrid services access the virtual storage system when they need to fetch or modify users' data files.
- A client program is available for users to manage their own storage space.
- Files are stored on a master file server, and replicas of the files are distributed to other machines.
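The master/replica arrangement can be sketched in a few lines. This is a conceptual model only, not the Java web service: it assumes that writes go to the master and are copied synchronously to every replica, and that reads prefer a nearby replica but fall back to the master. `VirtualStorage` and its methods are hypothetical.

```python
class VirtualStorage:
    """Toy model: one master store plus named replica stores (path -> bytes)."""

    def __init__(self, replica_names):
        self.master = {}
        self.replicas = {name: {} for name in replica_names}

    def write(self, path, data):
        # The master holds the authoritative copy; replicas are updated
        # synchronously here (an assumption made to keep the sketch short).
        self.master[path] = data
        for store in self.replicas.values():
            store[path] = data

    def read(self, path, prefer=None):
        # Serve from the preferred replica if it has the file, else from the master.
        replica = self.replicas.get(prefer, {})
        if path in replica:
            return replica[path]
        return self.master[path]

    def delete(self, path):
        self.master.pop(path, None)
        for store in self.replicas.values():
            store.pop(path, None)
```

A real deployment would more likely propagate replicas asynchronously and would need a consistency policy for concurrent updates.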
Virtual Storage System (3)
Virtual Storage System (4)
- Screenshot of the storage service client program
Data Management (1)
- The data management system is a Web-based Replica Access and Management System.
- It consists of the registration, search, and manager systems.
- The registration system is used to manage user access to the UniGrid system.
- The search system combines RLS (Replica Location Service) with Web technologies.
- The manager system offers a friendly interface for administrators that makes it easy to maintain the contents of the database.
- Structure of the data management system
Data Management (2)
Data Management (3)
- The registration system
- Security:
- We designed a web registration system.
- Users need to register with the portal and log in with a CA-signed certificate proxy (proxy-init).
- Account management:
- Administrator
- User
- The detailed structure of the Web service system
Data Management (4)
- The search system
- Replica index and replica location:
- Basic commands can be executed on the LRC (Local Replica Catalog) server.
- Information on the RLI (Replica Location Index) server can be updated using batch commands.
- Services:
- We offer job submission, file listing, file upload, and data replication services on a single server.
- The detailed structure of the Web service system
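The two-level lookup behind RLS works as follows. A Local Replica Catalog maps logical file names (LFNs) to physical file names (PFNs) at one site. A Replica Location Index, refreshed by batch updates from the catalogs, records which catalogs know a given LFN. The sketch models only this concept; it is not the Globus RLS client API, and the class names and URLs are hypothetical.

```python
from collections import defaultdict


class LocalReplicaCatalog:
    """Maps logical file names (LFNs) to physical file names (PFNs) at one site."""

    def __init__(self, site):
        self.site = site
        self.mappings = defaultdict(set)

    def add(self, lfn, pfn):
        self.mappings[lfn].add(pfn)


class ReplicaLocationIndex:
    """Maps each LFN to the set of LRCs that hold mappings for it."""

    def __init__(self):
        self.index = defaultdict(set)

    def update_from(self, lrc):
        # Batch update: the catalog reports every LFN it currently knows.
        for lfn in lrc.mappings:
            self.index[lfn].add(lrc)


def locate(rli, lfn):
    """Ask the index which catalogs know lfn, then collect their physical copies."""
    return sorted(pfn for lrc in rli.index.get(lfn, ()) for pfn in lrc.mappings[lfn])
```

A search then returns every physical copy of a logical file, from which the web interface can list or replicate data.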
Data Management (5)
- The manager system
- We plan to design a friendly interface for administrators that makes it easy to maintain the contents of the metadata database, update the RLS database, and manage user accounts.
- The detailed structure of the Web service system
Applications 1
Simulations of atmospheric circulations with the NTU/Purdue nonhydrostatic numerical model
- Model characteristics:
- Nonhydrostatic
- Explicit forward-backward integration for both high-frequency waves and gravity waves
- Implicit diffusion scheme with a TKE prognostic equation
- Time-split schemes for high-frequency waves, gravity waves, diffusion, and surface processes
- Physical processes:
- Cloud microphysics
- Surface similarity equation
- 3-layer soil model
- Coriolis force
Performance with the UniGrid
- Chart of measured times (labels not recoverable); values shown: 12 hr, 50 sec, 30 min, 5 hr 17 min and 12 hr, 50 sec, 31 min, 5 hr 35 min
Commands for submitting jobs
- Job submission command:

    /opt/mpich/pgi/bin/mpirun -nolocal -machinefile host -np 8 nonh3d.exe > test

- Machine file host (NTU nodes, hostname:processes):

    uninode11:2
    uninode12:2
    uninode14:2
    uninode15:2

- makefile:

    OBJ = nonh3d.o tograds.o copy.o update.o sound.o adv.o cloud1.o dampini.o \
          initial.o restart.o nbr2d.o startend.o tkeeq.o updtrp.o \
          sprogi4.o sprogi2.o diffxy.o diffz.o pbl.o
    EXE = ../nonh3d.exe
    # PGI compiler options; alternative option sets are kept as comments
    OPT = -O3 -Mextend -Msave -Bstatic -byteswapio
    #OPT = -O3 -static -ffixed-line-length-80
    #OPT = -O3 -static
    #OPT = -O3
    #OPT = -static

    # Link all objects with the MPICH Fortran wrapper
    $(EXE): $(OBJ)
    	/opt/mpich/pgi/bin/mpif77 $(OPT) -o $(EXE) $(OBJ)

    # Compile each Fortran source file to an object file
    .f.o:
    	/opt/mpich/pgi/bin/mpif77 $(OPT) -c $<

    clean:
    	rm -f *.o ../nonh3d.exe

- Machine file host1 (one line per process):

    uninode11
    uninode11
    uninode10
    uninode10
    uninode12
    uninode12
    uninode14
    uninode14
    uninode15
    uninode15
    uninode5
    uninode5
    uninode7
    uninode7
    uninode9
    uninode9
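The two machine files above use two conventions: `host` gives `hostname:count` entries, while `host1` repeats a hostname once per process. This sketch converts the first form into the second. It imitates the convention and is not MPICH's own parser; `expand_machinefile` is a hypothetical helper.

```python
def expand_machinefile(lines):
    """Turn 'host:count' entries (count defaults to 1) into one hostname per
    process slot. Text after '#' is treated as a comment."""
    hosts = []
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        for entry in line.split():
            name, _, count = entry.partition(":")
            hosts.extend([name] * int(count or 1))
    return hosts
```

Expanding the four `host` entries gives 4 × 2 = 8 slots, which matches the `-np 8` in the submission command. The 16 lines of `host1` would likewise support up to 16 processes.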
Three-dimensional simulation of a thermal bubble in an isentropic environment
The initial spherical bubble develops into a mushroom-like shape. Two isentropic surfaces are shown; the one corresponding to the higher potential temperature is in pink.
Two-dimensional simulation of a sea breeze
Figure: total water mixing ratio in the x-z plane, with the sea breeze front (SBF) marked
The figure shows the total water mixing ratio
(vapor plus liquid) over land after 2.5 hr. The
label under the x-axis is the distance from the
coastline. Water vapor is pumped up from the
ground surface in the convective boundary layer
(with the red/orange color representing high
water vapor content in the air). The location of
the sea breeze front (SBF) is shown.
Applications 2
- FASTA
- Compares a protein sequence to another protein
sequence or to a protein database, or a DNA
sequence to another DNA sequence or a DNA library
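FASTA is fast because it does not align everything immediately. Its first stage looks for short identical words (k-tuples, `ktup`) shared by the two sequences and counts them along each diagonal (subject offset minus query offset). Only the densest diagonals are examined further. The sketch below covers just this first stage; the later FASTA steps (rescoring with a substitution matrix, joining regions, banded alignment) are omitted. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def ktup_diagonals(query, subject, ktup=2):
    """Count identical k-tuple hits on each diagonal d = j - i, where i is the
    query offset and j is the subject offset of the shared word."""
    index = defaultdict(list)
    for j in range(len(subject) - ktup + 1):
        index[subject[j:j + ktup]].append(j)
    diagonals = Counter()
    for i in range(len(query) - ktup + 1):
        for j in index.get(query[i:i + ktup], ()):
            diagonals[j - i] += 1
    return diagonals


def best_diagonal(query, subject, ktup=2):
    """Return (diagonal, hit count) for the densest diagonal, or None."""
    diag = ktup_diagonals(query, subject, ktup)
    return diag.most_common(1)[0] if diag else None
```

Comparing `ACGTACGT` with itself, diagonal 0 collects all 7 word hits, and the repeat also produces 3 hits on each of diagonals +4 and -4.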
Applications 3
- ClustalW
- A general purpose multiple sequence alignment
program for DNA or proteins.
Conclusions and Future Work
- A prototype of the UniGrid system has been developed.
- Enhance the data grid part of UniGrid.
- Promote the UniGrid system to universities in Taiwan.