Deploying a High Throughput Computing Cluster - PowerPoint PPT Presentation

Provided by: vish5
Learn more at: https://www.isi.edu

Transcript and Presenter's Notes

1
Deploying a High Throughput Computing Cluster
  • Jim Basney and Miron Livny
  • Presented by
  • Vishal Singh

2
Seminar Overview
  • I. Introduction
    - Primary goal of Condor
    - Condor overview
  • II. Challenges of deploying an HTC environment
    - Layered software architecture
    - Protocol flexibility
    - Remote file access
    - Checkpointing
  • III. System administration of an HTC environment
    - Access policies
    - Reliability
    - System log file management
    - Security
  • IV. Summary

3
Goals
  • Globus: The Globus project is developing
    fundamental technologies needed to build
    computational grids. Grids are persistent
    environments that enable software applications to
    integrate instruments, displays, computational
    and information resources that are managed by
    diverse organizations in widespread locations.
  • Condor: The goal of the Condor project is to
    develop, implement, deploy, and evaluate
    mechanisms and policies that support High
    Throughput Computing (HTC) on large collections
    of distributively owned computing resources.

4
Condor Overview
  • Three entities
  • Customer Agent: Manages a queue of
    application descriptions and sends resource
    requests to the matchmaker.
  • Resource Agent: Implements the policies of the
    resource owner and sends resource offers to the
    matchmaker.
  • Matchmaker: Finds a match between the resource
    requests and the resource offers and notifies the
    agents when a match is found.
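
The interaction between the three entities can be sketched in a few lines of Python. This is only an illustration of the matchmaking idea, not Condor's actual implementation: the `Request` and `Offer` classes, their fields, and the `match` function are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Request:
    """Resource request sent by a customer agent (hypothetical fields)."""
    owner: str
    min_memory_mb: int
    arch: str


@dataclass
class Offer:
    """Resource offer sent by a resource agent (hypothetical fields)."""
    name: str
    memory_mb: int
    arch: str
    # The resource owner's policy: True if this request may use the resource.
    accepts: Callable[[Request], bool]


def match(request: Request, offers: List[Offer]) -> Optional[Offer]:
    """Return the first offer that satisfies both sides, or None."""
    for offer in offers:
        meets_requirements = (
            offer.memory_mb >= request.min_memory_mb
            and offer.arch == request.arch
        )
        if meets_requirements and offer.accepts(request):
            return offer
    return None


offers = [
    Offer("ws1", 64, "INTEL", lambda r: r.owner != "guest"),
    Offer("ws2", 256, "INTEL", lambda r: True),
]
chosen = match(Request("alice", 128, "INTEL"), offers)
print(chosen.name if chosen else "no match")
```

A match requires agreement from both sides: the offer must meet the customer's requirements, and the owner's policy must accept the request.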

5
Four Primary Challenges
  • Utilization of heterogeneous resources
  • Evolution of network protocols
  • Remote file access
  • Utilization of non-dedicated resources

6
Layered Software Architecture
Reason: Portability of the HTC system
  • Network API: provides both connection-oriented
    and connectionless, reliable and unreliable
    interfaces.
  • Process management API: provides the ability to
    create, suspend, unsuspend, and kill a process.
  • Workstation statistics API: reports the
    information necessary to
    1. implement the resource owner policies
    2. verify that customer application
       requirements are met.
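
One way to picture the layering is an abstract process management interface with a platform-specific implementation underneath. The sketch below is an assumption-laden illustration, not Condor code: the class names `ProcessAPI` and `PosixProcessAPI` are invented here, and the implementation assumes a POSIX system where `SIGSTOP` and `SIGCONT` are available.

```python
import signal
import subprocess
import sys
from abc import ABC, abstractmethod


class ProcessAPI(ABC):
    """Platform-neutral process management layer (hypothetical interface)."""

    @abstractmethod
    def create(self, argv): ...

    @abstractmethod
    def suspend(self, proc): ...

    @abstractmethod
    def unsuspend(self, proc): ...

    @abstractmethod
    def kill(self, proc): ...


class PosixProcessAPI(ProcessAPI):
    """POSIX-only implementation; other platforms would get their own class."""

    def create(self, argv):
        return subprocess.Popen(argv)

    def suspend(self, proc):
        proc.send_signal(signal.SIGSTOP)   # pause the process

    def unsuspend(self, proc):
        proc.send_signal(signal.SIGCONT)   # resume the process

    def kill(self, proc):
        proc.kill()                        # SIGKILL on POSIX
        return proc.wait()                 # reap and return the exit status


def demo(api: ProcessAPI) -> int:
    """Exercise all four operations through the abstract interface."""
    proc = api.create([sys.executable, "-c", "import time; time.sleep(30)"])
    api.suspend(proc)
    api.unsuspend(proc)
    return api.kill(proc)
```

Higher layers would call only `ProcessAPI` methods, so porting to a new platform means writing one new subclass rather than touching the scheduling logic.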

7
[Figure: Layered resource management architecture in Condor]
8
Protocol Flexibility
Why?
It is inconvenient to frequently update components in an
HTC environment, so new features are often not deployed until a
future major system upgrade.
A general-purpose data format may help.
[Figure: Example of protocol data format]
Backward compatibility is ensured.
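
To illustrate why a general-purpose data format helps, here is a minimal sketch in which messages are sets of named attributes. The `name = value` wire format and the attribute names (`Name`, `Memory`, `CheckpointServer`) are assumptions made for this example, not the format shown on the original slide. An older receiver reads only the attributes it knows and ignores the rest, so a newer sender can add fields without breaking it.

```python
import ast


def encode(attrs: dict) -> str:
    """Serialize attributes as one 'name = value' line each."""
    return "\n".join(f"{name} = {value!r}" for name, value in attrs.items())


def decode(text: str) -> dict:
    """Parse 'name = value' lines back into a dictionary."""
    attrs = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, value = line.partition("=")
        attrs[name.strip()] = ast.literal_eval(value.strip())
    return attrs


def old_receiver(text: str):
    """Version 1 component: knows only Name and Memory."""
    msg = decode(text)
    return msg["Name"], msg.get("Memory", 0)


def new_receiver(text: str):
    """Version 2 component: also understands CheckpointServer."""
    msg = decode(text)
    return msg["Name"], msg.get("Memory", 0), msg.get("CheckpointServer")


old_message = encode({"Name": "ws1", "Memory": 64})
new_message = encode({"Name": "ws1", "Memory": 64, "CheckpointServer": "ckpt1"})

print(old_receiver(new_message))   # unknown attribute is ignored
print(new_receiver(old_message))   # missing attribute falls back to None
```

Because each side looks attributes up by name and supplies defaults for missing ones, old and new components can be mixed during a gradual upgrade.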
9
Remote File Access
  • Guarantees an HTC application access to its data
    files from any workstation in the cluster.

Three implementation options
  • Distributed file system
    - Requires authentication of the customer
      application to the file system.
    - Privileges need to be assigned.
  • Data file staging
    - Large data files result in high start-up and
      tear-down costs.

10
Remote File Access (cont.)
  • Redirect file I/O system calls
    - The HTC environment must interpose itself
      between the application and the operating
      system and service file system calls.

System call interposition
How?
  • Link the application with an interposition
    library, or trap system calls through the
    operating system.
  • The HTC environment then invokes an RPC to
    service the call.

Benefits
- No file system requirements on the remote workstation.
Drawbacks
- Many high-latency operations reduce the performance
  of the application.
- Developing and maintaining a portable
  interposition system is difficult.
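
The redirection idea can be sketched as a file-like object whose every operation is forwarded to a server on the submitting machine. Everything here is a simplified stand-in: `ShadowServer`, `InterposedFile`, and the `rpc_*` methods are hypothetical names, and the "RPC" is an ordinary in-process method call rather than a network round trip.

```python
import os
import tempfile


class ShadowServer:
    """Stands in for the process that services file calls on the
    submitting machine. Each method represents one remote call."""

    def __init__(self):
        self._open_files = {}
        self._next_handle = 0
        self.rpc_count = 0          # one increment per simulated round trip

    def rpc_open(self, path):
        self.rpc_count += 1
        handle = self._next_handle
        self._next_handle += 1
        self._open_files[handle] = open(path, "rb")
        return handle

    def rpc_read(self, handle, size):
        self.rpc_count += 1
        return self._open_files[handle].read(size)

    def rpc_close(self, handle):
        self.rpc_count += 1
        self._open_files.pop(handle).close()


class InterposedFile:
    """What the application sees instead of a local file object."""

    def __init__(self, server, path):
        self._server = server
        self._handle = server.rpc_open(path)

    def read(self, size=-1):
        return self._server.rpc_read(self._handle, size)

    def close(self):
        self._server.rpc_close(self._handle)


def demo():
    """Read a file in two small chunks and count the round trips."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello from the submit machine")
    try:
        server = ShadowServer()
        f = InterposedFile(server, tmp.name)
        chunks = [f.read(5), f.read(5)]
        f.close()
        return chunks, server.rpc_count
    finally:
        os.remove(tmp.name)
```

Each small read costs a full round trip, which is the high-latency drawback listed above; buffering or larger reads would reduce the number of calls.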
11
Checkpointing
  • What is a checkpoint?

A snapshot of the state of an executing program.
Uses
  • Provide reliability
  • Enable preemptive-resume scheduling

Can be
  • Kernel-level checkpointing
    - Often not provided by workstation
      operating systems.
  • User-level checkpointing
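
A rough sketch of preemptive-resume scheduling with user-level checkpoints follows. It is much simpler than real process checkpointing: rather than capturing a whole process image, the application saves its own state dictionary with `pickle`. The function names, the checkpoint interval of 10 iterations, and the `stop_at` parameter (which simulates preemption) are all assumptions made for this example.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write the snapshot to a temporary file, then rename it into place
    so a crash mid-write never leaves a corrupt checkpoint."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path, default):
    """Return the saved state, or the default if no checkpoint exists."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default


def run(path, limit, stop_at=None):
    """Sum the integers below `limit`, checkpointing every 10 iterations.
    If `stop_at` is given, the job is 'preempted' at that iteration."""
    state = load_checkpoint(path, {"i": 0, "total": 0})
    while state["i"] < limit:
        if stop_at is not None and state["i"] == stop_at:
            return None                      # simulated preemption
        state["total"] += state["i"]
        state["i"] += 1
        if state["i"] % 10 == 0:
            save_checkpoint(path, state)
    return state["total"]


def demo():
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "job.ckpt")
        first = run(path, 100, stop_at=55)   # preempted; last checkpoint at i=50
        second = run(path, 100)              # resumes from i=50, not from 0
        return first, second
```

The work done between the last checkpoint and the preemption (iterations 50 to 54 here) is repeated on resume, which is the usual trade-off between checkpoint frequency and lost work.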

12
Progress
  • I. Introduction
    - Primary goal of Condor
    - Condor overview
  • II. Challenges of deploying an HTC environment
    - Layered software architecture
    - Protocol flexibility
    - Remote file access
    - Checkpointing
  • III. System administration of an HTC environment
    - Access policies
    - Reliability
    - System log file management
    - Security
  • IV. Summary

13
System Administration
The administrator has to answer to:
  • Resource owners
    - Must enforce the access policies of resource
      owners.
  • Customers
    - Must ensure customers receive valuable services
      from the HTC environment.
  • Policy makers
    - Must demonstrate that the HTC environment is
      meeting its stated goals.

14
Access policies
  • Answer the question of who can use a resource,
    and when.

One method of policy specification is through
expressions.
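
As a hedged illustration of expression-based policies, the sketch below writes an owner's policy as three boolean expressions over the machine's current state. The attribute names (`keyboard_idle_s`, `load_avg`, `suspended_s`), the thresholds, and the `decide` function are invented for this example and are not Condor's actual policy syntax.

```python
# Owner policy as named boolean expressions over machine state.
# All attribute names and thresholds are illustrative assumptions.
policy = {
    # Start a job only after 15 idle minutes on a lightly loaded machine.
    "start": lambda m: m["keyboard_idle_s"] > 15 * 60 and m["load_avg"] < 0.3,
    # Suspend the job as soon as the owner touches the keyboard.
    "suspend": lambda m: m["keyboard_idle_s"] < 60,
    # Vacate if the job has been suspended for more than 10 minutes.
    "vacate": lambda m: m["suspended_s"] > 10 * 60,
}


def decide(policy, machine, job_running):
    """Evaluate the owner's expressions and return the action to take."""
    if not job_running:
        return "start" if policy["start"](machine) else "idle"
    if policy["vacate"](machine):
        return "vacate"
    if policy["suspend"](machine):
        return "suspend"
    return "continue"


idle_machine = {"keyboard_idle_s": 3600, "load_avg": 0.1, "suspended_s": 0}
owner_returned = {"keyboard_idle_s": 5, "load_avg": 0.8, "suspended_s": 0}
long_suspended = {"keyboard_idle_s": 5, "load_avg": 0.8, "suspended_s": 900}

print(decide(policy, idle_machine, job_running=False))
print(decide(policy, owner_returned, job_running=True))
print(decide(policy, long_suspended, job_running=True))
```

Keeping the policy as data rather than hard-coded logic lets each resource owner state different conditions without changing the resource agent itself.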
15
Access policies (cont.)
  • Can be optimized for throughput
  • E.g.:
    - For low-bandwidth networks, a longer Vacate
      interval may be negotiated.
    - Vacate need not be attempted when the chances
      of a successful checkpoint are low.
    - The administrator may steer matchmaking to
      utilize resources efficiently when network
      bandwidth is limited.
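
The reasoning behind the Vacate examples is simple arithmetic: a checkpoint can only succeed if the image can be transferred before the Vacate interval expires. The sketch below assumes the transfer time is just image size divided by available bandwidth (ignoring protocol overhead and contention); the function and parameter names are hypothetical.

```python
def checkpoint_transfer_seconds(image_mb: float, bandwidth_mbit_s: float) -> float:
    """Estimated time to ship a checkpoint image: megabytes * 8 bits,
    divided by the available bandwidth in megabits per second."""
    return image_mb * 8 / bandwidth_mbit_s


def should_attempt_checkpoint(image_mb, bandwidth_mbit_s, vacate_interval_s):
    """Attempt the checkpoint only if it is expected to finish in time."""
    return checkpoint_transfer_seconds(image_mb, bandwidth_mbit_s) <= vacate_interval_s


# A 64 MB image with a 60-second Vacate interval:
print(checkpoint_transfer_seconds(64, 10))        # fast network
print(should_attempt_checkpoint(64, 10, 60))
print(checkpoint_transfer_seconds(64, 1))         # slow network
print(should_attempt_checkpoint(64, 1, 60))
print(should_attempt_checkpoint(64, 1, 600))      # longer negotiated interval
```

On the slow link the 64 MB image needs far longer than 60 seconds, so the job should either skip the checkpoint attempt or be granted a longer interval.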

16
Reliability
Complications
  • Distinguish between normal and abnormal
    terminations
  • Choose the correct checkpoint to use for restart
  • Decide when it is safe to restart the application
  • The problem of one bad node in the HTC
    environment

Heuristically determine
- if an application fails consistently on different
  nodes
- if different applications fail on the same node
Implication: The HTC environment must be prepared
for failures and must automate failure recovery
for common failures.
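
The two heuristics can be sketched as a small function over a list of failure records. This is an illustrative assumption about how such a check might look, not the method used in Condor: the `(job, node)` record format, the `diagnose` function, and the threshold of two distinct counterparts are all invented here.

```python
from collections import defaultdict


def diagnose(failures, threshold=2):
    """Classify suspects from (job, node) failure records.

    A job that fails on `threshold` or more different nodes is suspected
    to be faulty itself; a node on which `threshold` or more different
    jobs fail is suspected to be a bad node."""
    nodes_per_job = defaultdict(set)
    jobs_per_node = defaultdict(set)
    for job, node in failures:
        nodes_per_job[job].add(node)
        jobs_per_node[node].add(job)

    suspect_jobs = {j for j, nodes in nodes_per_job.items() if len(nodes) >= threshold}
    suspect_nodes = {n for n, jobs in jobs_per_node.items() if len(jobs) >= threshold}
    return suspect_jobs, suspect_nodes


failures = [
    ("jobA", "n1"), ("jobA", "n2"), ("jobA", "n3"),   # same job, different nodes
    ("jobB", "n7"), ("jobC", "n7"),                   # different jobs, same node
]
suspect_jobs, suspect_nodes = diagnose(failures)
print(suspect_jobs, suspect_nodes)
```

An automated recovery step could then hold the suspect jobs for the customer's attention and stop matching new jobs to the suspect nodes.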
17
Problem Diagnosis via System Logs
System logs are primary tools for diagnosing
system failures.
HTC Environment Logs
18
Monitoring and Accounting
The HTC environment provides system monitoring and
accounting facilities to the administrator.
Observations
1. Approximately 100 resources were added to the
   cluster during the month.
2. Resource availability followed a daily cyclic
   pattern, where more resources were available for
   HTC during the night.
3. On average, more resources were available on
   weekends compared to weekdays.
19
Security
  • An HTC environment is potentially vulnerable to:

Resource attack
- An unauthorized user gains access to a resource
  via the HTC environment.
- An authorized user violates the resource owner's
  access policy.
Customer attack
- A customer's account or data files are
  compromised via the HTC environment.
Steps to be taken
- Protecting the resources requires an effective
  user authentication mechanism.
- The HTC environment must ensure that all
  resource agents are trustworthy.
- Unencrypted network streams and buffer-overflow
  attacks are potential vulnerabilities.

20
Summary
  • The HTC software must be portable, reliable,
    and maintainable.
  • A layered architecture with flexible network
    protocols provides such a framework.
  • Remote file access and checkpointing allow HTC
    to utilize distributively owned, non-dedicated
    resources.
  • Development and maintenance costs must be
    balanced.
  • The HTC software must provide secure services
    with effective logging.

21
Conclusion
Deploying an HTC environment means efficiently
managing the complexities described above for all
three entities: resource owners, customers, and
policy makers. It is not exotic scheduling
algorithms and mechanisms that make an HTC
environment successful, but an emphasis on
usability, flexibility, reliability, and
maintainability.
Web site
Condor website: http://www.cs.wisc.edu/condor