Title: Timeshared Parallel Machines
1. Timeshared Parallel Machines
- Need resource management
- Shrink and expand individual jobs to available
sets of processors
- Example: Machine with 100 processors
- Job1 arrives, can use 20-150 processors
- Assign 100 processors to it
- Job2 arrives, can use 30-70 processors,
and will pay more if we meet its deadline
- Make resource allocation decisions
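The two-job example above can be sketched as a small allocator. This is only an illustration under assumed rules: the names `Job`, `payoff`, and `allocate` are hypothetical, and the policy (give every job its minimum, then hand spare processors to the better-paying jobs first) is one plausible choice, not necessarily the one used by the actual system.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    min_pe: int      # fewest processors the job can run on
    max_pe: int      # most processors the job can use
    payoff: float    # hypothetical weight: higher means the job pays more


def allocate(jobs, total_pe):
    """Return {job name: processors}, or None if the minimums do not fit."""
    if sum(j.min_pe for j in jobs) > total_pe:
        return None  # cannot admit every job, even fully shrunk
    alloc = {j.name: j.min_pe for j in jobs}
    spare = total_pe - sum(alloc.values())
    # Expand the better-paying jobs first, up to their maximum.
    for j in sorted(jobs, key=lambda j: j.payoff, reverse=True):
        extra = min(spare, j.max_pe - j.min_pe)
        alloc[j.name] += extra
        spare -= extra
    return alloc


job1 = Job("Job1", min_pe=20, max_pe=150, payoff=1.0)
job2 = Job("Job2", min_pe=30, max_pe=70, payoff=2.0)

print(allocate([job1], 100))        # Job1 alone gets the whole machine
print(allocate([job1, job2], 100))  # Job1 shrinks to make room for Job2
```

With these assumed payoffs, Job1 starts on all 100 processors and is shrunk to 30 once Job2 arrives and is expanded to its maximum of 70.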
2. Multiple Parallel Machines
- Faucet submits a request
- CPU seconds, min-max CPUs, deadline, interactive?
- Parallel machines submit bids
- A job for 100 CPU hours may get a lower price bid if
- It has a less tight deadline,
- more flexible PE range
- A job that requires 15 CPU minutes and a deadline
of 1 minute
- Will generate a variety of bids
- A machine with idle time on its hands: low bid
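The bidding behavior described above can be illustrated with a toy pricing function. Everything here is an assumption made for the sketch: the `Request` fields mirror the request parameters on the slide, but the `bid` function, its pricing formula, and the perfect-speedup estimate are hypothetical and not taken from the actual system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    cpu_seconds: float
    min_pe: int
    max_pe: int
    deadline_s: float   # seconds from now


def bid(req: Request, free_pe: int, base_rate: float = 1.0) -> Optional[float]:
    """Hypothetical price for a request, or None if the machine declines."""
    usable = min(free_pe, req.max_pe)
    if usable < req.min_pe:
        return None                       # not enough free processors
    fastest = req.cpu_seconds / usable    # assumes perfect speedup
    if fastest > req.deadline_s:
        return None                       # deadline cannot be met
    tightness = fastest / req.deadline_s          # in (0, 1]; 1 means no slack
    flexibility = 1.0 - req.min_pe / req.max_pe   # in [0, 1); wider range is higher
    # Tight deadlines raise the price; a flexible PE range lowers it.
    return base_rate * req.cpu_seconds * (1.0 + tightness) * (1.0 - 0.5 * flexibility)


# 100 CPU hours with a loose (one day) and a tight (two hour) deadline.
loose = Request(cpu_seconds=360_000, min_pe=10, max_pe=100, deadline_s=86_400)
tight = Request(cpu_seconds=360_000, min_pe=10, max_pe=100, deadline_s=7_200)
print(bid(loose, free_pe=100), bid(tight, free_pe=100))

# 15 CPU minutes due in 1 minute: machines with different idle capacity bid differently.
urgent = Request(cpu_seconds=900, min_pe=1, max_pe=64, deadline_s=60)
for free in (8, 16, 64):
    print(free, bid(urgent, free_pe=free))
```

Under this formula the machine with the most idle processors produces the lowest bid for the urgent job, and a machine that cannot meet the deadline does not bid at all.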
3. How to make all of this work?
- The key: fine-grained resource management model
- Work units are objects and threads
- rather than processes
- Data units are object data, thread stacks, ..
- Rather than pages
- Work/Data units can be migrated automatically
- during a run
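As a rough illustration of why object-sized work units make this practical, the sketch below moves work units off processors that a job has to give up. The function `shrink`, the `placement` and `loads` dictionaries, and the greedy rule (heaviest displaced unit goes to the least-loaded remaining processor) are all assumptions for this example, not the runtime's actual migration strategy.

```python
def shrink(placement, loads, keep):
    """Move work units off released processors.

    placement: {unit: processor}, loads: {unit: load}, keep: processors retained.
    Returns a new placement that uses only processors in `keep`.
    """
    keep = list(keep)
    if not keep:
        raise ValueError("at least one processor must be retained")
    new_placement = {u: p for u, p in placement.items() if p in keep}
    pe_load = {p: 0.0 for p in keep}
    for u, p in new_placement.items():
        pe_load[p] += loads[u]
    # Place the heaviest displaced units first, each on the least-loaded processor.
    displaced = [u for u in placement if u not in new_placement]
    for u in sorted(displaced, key=lambda u: loads[u], reverse=True):
        target = min(keep, key=lambda p: pe_load[p])
        new_placement[u] = target
        pe_load[target] += loads[u]
    return new_placement


placement = {"a": 0, "b": 1, "c": 2, "d": 3}   # four objects on four processors
loads = {"a": 4.0, "b": 1.0, "c": 3.0, "d": 2.0}
print(shrink(placement, loads, keep=[0, 1]))   # processors 2 and 3 are released
```

Because each unit is small, only the displaced objects move; units already on retained processors stay where they are.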
4. Anonymous Compute Power
What is needed to make this metaphor work?
- Timeshared parallel machines in the background
- Effective resource management
- Quality of computational service: contracts/guarantees
- Front ends that will allow agents to submit jobs on
users' behalf: Computational Faucets
5. Computational Faucets
- What does a Computational faucet do?
- Submit requests to the grid
- Evaluate bids and decide whom to assign work
- Monitor applications (for performance and
correctness)
- Provide interface to users
- Interacting with jobs, and monitoring behavior
- What does it look like?
A browser!
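The "evaluate bids" step listed above could look roughly like the following. This is a minimal sketch under assumptions: the `Bid` record, its fields, the cluster names, and the rule "cheapest bid that promises to meet the deadline" are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    cluster: str      # hypothetical cluster name
    price: float
    finish_s: float   # promised completion time, seconds from now


def choose_bid(bids, deadline_s):
    """Pick the cheapest bid that promises to meet the deadline, else None."""
    feasible = [b for b in bids if b.finish_s <= deadline_s]
    if not feasible:
        return None
    # Break price ties in favor of the earlier finish.
    return min(feasible, key=lambda b: (b.price, b.finish_s))


bids = [
    Bid("cluster-a", price=120.0, finish_s=50.0),
    Bid("cluster-b", price=90.0, finish_s=75.0),   # cheapest, but too late
    Bid("cluster-c", price=100.0, finish_s=55.0),
]
winner = choose_bid(bids, deadline_s=60.0)
print(winner.cluster if winner else "no acceptable bid")
```

A real faucet might weigh other factors, such as past reliability of a cluster, but the slide does not specify them.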
6. Faucets QoS
- User specifies desired job parameters such as
program executable name, executable platform, min
PE, max PE, estimated CPU-seconds (for various
PE), priority, etc.
- User does not specify machine. Faucet software
contacts a central server and obtains a list of
available workstation clusters, then negotiates
with clusters and chooses one to submit the job.
- User can view status of clusters.
- Planned: file transfer, user authentication,
merge with Appspector for job monitoring.
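The job parameters listed above could be captured in a small record like the one below. The class name `JobSpec`, its field names, the example executable and platform strings, and the validation rules are assumptions made for this sketch; the slide only names the parameters, not their format.

```python
from dataclasses import dataclass, field


@dataclass
class JobSpec:
    executable: str
    platform: str
    min_pe: int
    max_pe: int
    # Estimated CPU-seconds for various PE counts: {PE count: CPU-seconds}
    cpu_seconds: dict = field(default_factory=dict)
    priority: int = 0

    def validate(self):
        if not 1 <= self.min_pe <= self.max_pe:
            raise ValueError("need 1 <= min PE <= max PE")
        for pe in self.cpu_seconds:
            if not self.min_pe <= pe <= self.max_pe:
                raise ValueError(f"estimate for {pe} PEs is outside the PE range")

    def wall_time(self, pe):
        """Estimated wall-clock seconds on `pe` processors (estimate must exist)."""
        return self.cpu_seconds[pe] / pe


# Hypothetical job: total CPU time grows with PE count, wall-clock time still drops.
spec = JobSpec("sim.exe", "linux-x86", min_pe=8, max_pe=16,
               cpu_seconds={8: 800.0, 16: 960.0})
spec.validate()
print(spec.wall_time(8), spec.wall_time(16))
```

Giving estimates for several PE counts lets a cluster judge how well the job scales before it decides how many processors to offer.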
[Figure: diagram showing a Web Browser, a Faucet Client, a Central Server, and three Workstation Clusters]
7. Time-shared Parallel Machines
- To bid effectively (profitably) in such an
environment, a parallel machine must be able to
run well-paying (important) jobs, even when it is
already running others.
- Allows a suitably written Charm/Converse
program running on a workstation cluster to
dynamically change the number of CPUs it is
running on, in response to a network (CCS)
request.
- Works in coordination with a Cluster Manager to
give a job as many CPUs as are available when
there are no other jobs, while providing the
flexibility to accept new jobs and scale down.
8. Appspector
- Appspector provides a web interface for submitting
and monitoring parallel jobs.
- Submission: user specifies machine, login,
password, program name (which must already be
available on the target machine).
- Jobs can be monitored from any computer with a
web browser. Advanced program information can be
shown on the monitoring screen using CCS.
9. BioCoRE
Goal: Simulate the process of doing research.
Provide a web-based way to virtually bring
scientists together.
- Project Based
- Workbench for Modeling
- Conferences/Chat Rooms
- Lab Notebook
- Joint Document Preparation
http://www.ks.uiuc.edu/Research/biocore/