Title: HTC in Research & Education
1. HTC in Research & Education
2. Claims for benefits provided by Distributed Processing Systems
- High Availability and Reliability 
- High System Performance 
- Ease of Modular and Incremental Growth 
- Automatic Load and Resource Sharing 
- Good Response to Temporary Overloads 
- Easy Expansion in Capacity and/or Function 
"What is a Distributed Data Processing System?", P.H. Enslow, Computer, January 1978
3. Democratization of Computing: You do not need to be a super-person to do super-computing
4. Searching for small RNA candidates in a kingdom: 45 CPU days
[Pipeline diagram: NCBI FTP sequence files (.ffn, .fna, .ptt, .gbk) are parsed by IGRExtract3 and FFN_parse into intergenic regions (IGRs); candidates are screened with RNAMotif, FindTerm, TransTerm, QRNA, Patser, and repeated BLAST runs against terminators, secondary-structure conservation, homology, paralogy, synteny, TFBS matrices, and known sRNAs and riboswitches; sRNAPredict selects candidate loci and sRNA_Annotate emits annotated candidate sRNA-encoding genes.]
5. Education and Training
- Computer Science: develop and implement novel HTC technologies (horizontal)
- Domain Sciences: develop and implement end-to-end HTC capabilities that are fully integrated in the scientific discovery process (vertical)
- Experimental Methods: develop and implement a curriculum that harnesses HTC capabilities to teach how to use modeling and numerical data to answer scientific questions
- System Management: develop and implement a curriculum that uses HTC resources to teach how to build, deploy, maintain, and operate distributed systems
6- "As we look to hire new graduates, both at the 
 undergraduate and graduate levels, we find that
 in most cases people are coming in with a good,
 solid core computer science traditional education
 ... but not a great, broad-based education in all
 the kinds of computing that near and dear to our
 business."
- Ron BrachmanVice President of Worldwide Research 
 Operations, Yahoo
7. Yahoo! Inc., a leading global Internet company, today announced that it will be the first in the industry to launch an open source program aimed at advancing the research and development of systems software for distributed computing. Yahoo's program is intended to leverage its leadership in Hadoop, an open source distributed computing sub-project of the Apache Software Foundation, to enable researchers to modify and evaluate the systems software running on a 4,000-processor supercomputer provided by Yahoo. Unlike other companies and traditional supercomputing centers, which focus on providing users with computers for running applications and for coursework, Yahoo's program focuses on pushing the boundaries of large-scale systems software research.
8. 1986-2006: Celebrating 20 years since we first installed Condor in our CS department
9. Integrating Linux Technology with Condor. Kim van der Riet, Principal Software Engineer
10. What will Red Hat be doing?
- Red Hat will be investing in the Condor project locally in Madison, WI, in addition to driving work required in upstream and related projects. This work will include:
- Engineering on Condor features and infrastructure
- Should result in tighter integration with related technologies
- Tighter kernel integration
- Information transfer between the Condor team and Red Hat engineers working on things like Messaging, Virtualization, etc.
- Creating and packaging Condor components for Linux distributions
- Support for Condor packaged in Red Hat distributions
- All work goes back to upstream communities, so this partnership will benefit all.
- Shameless plug: If you want to be involved, Red Hat is hiring...
11. High Throughput Computing on Blue Gene
- IBM Rochester: Amanda Peters, Tom Budnik
- With contributions from:
- IBM Rochester: Mike Mundy, Greg Stewart, Pat McCarthy
- IBM Watson Research: Alan King, Jim Sexton
- UW-Madison Condor: Greg Thain, Miron Livny, Todd Tannenbaum
12. Condor and IBM Blue Gene Collaboration
- Both IBM and Condor teams engaged in adapting 
 code to bring Condor and Blue Gene technologies
 together
- Initial Collaboration (Blue Gene/L) 
- Prototype/research Condor running HTC workloads 
 on Blue Gene/L
- Condor developed dispatcher/launcher running HTC 
 jobs
- Prototype work for Condor being performed on 
 Rochester On-Demand Center Blue Gene system
- Mid-term Collaboration (Blue Gene/L) 
- Condor supports HPC workloads along with HTC 
 workloads on Blue Gene/L
- Long-term Collaboration (Next Generation Blue 
 Gene)
- I/O Node exploitation with Condor 
- Partner in design of HTC services for Next 
 Generation Blue Gene
- Standardized launcher, boot/allocation services, 
 job submission/tracking via database, etc.
- Study ways to automatically switch between 
 HTC/HPC workloads on a partition
- Data persistence (persisting data in memory 
 across executables)
- Data affinity scheduling 
- Petascale environment issues 
13. The Grid: Blueprint for a New Computing Infrastructure, edited by Ian Foster and Carl Kesselman, July 1998, 701 pages.
The grid promises to fundamentally change the way 
we think about and use computing. This 
infrastructure will connect multiple regional and 
national computational grids, creating a 
universal source of pervasive and dependable 
computing power that supports dramatically new 
classes of applications. The Grid provides a 
clear vision of what computational grids are, why 
we need them, who will use them, and how they 
will be programmed. 
14. "We claim that these mechanisms, although originally developed in the context of a cluster of workstations, are also applicable to computational grids. In addition to the required flexibility of services in these grids, a very important concern is that the system be robust enough to run in production mode continuously even in the face of component failures."
Miron Livny & Rajesh Raman, "High Throughput Resource Management", in The Grid: Blueprint for a New Computing Infrastructure.
16. CERN 92
17. The search for SUSY (Super-Symmetry)
- Sanjay Padhi is a UW Chancellor Fellow who is working in the group of Prof. Sau Lan Wu located at CERN (Geneva)
- Using Condor technologies he established a grid access point in his office at CERN
- Through this access point he managed to harness, in 3 months (12/05-2/06), more than 500 CPU years from the LHC Computing Grid (LCG), the Open Science Grid (OSG), the Grid Laboratory Of Wisconsin (GLOW), and local group-owned desktop resources.
18. High Throughput Computing
- We first introduced the distinction between High Performance Computing (HPC) and High Throughput Computing (HTC) in a seminar at the NASA Goddard Space Flight Center in July of 1996, and a month later at the European Laboratory for Particle Physics (CERN). In June of 1997, HPCWire published an interview on High Throughput Computing.
19. Why HTC?
- For many experimental scientists, scientific 
 progress and quality of research are strongly
 linked to computing throughput. In other words,
 they are less concerned about instantaneous
 computing power. Instead, what matters to them is
 the amount of computing they can harness over a
 month or a year --- they measure computing power
 in units of scenarios per day, wind patterns per
week, instruction sets per month, or crystal
 configurations per year.
20. High Throughput Computing is a 24-7-365 activity
FLOPY ≠ (60 × 60 × 24 × 7 × 52) × FLOPS
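A quick sanity check of the arithmetic behind the slogan: a year holds 60 × 60 × 24 × 7 × 52 = 31,449,600 seconds, yet no system sustains its peak FLOPS through all of them. Here is a minimal sketch of the distinction; the 1 TFLOPS machine and the utilization figures are illustrative assumptions, not measurements:

```python
# FLOPY (floating-point operations per year) is not peak FLOPS times the
# seconds in a year: outages, scheduling gaps, and idle cycles make
# sustained throughput, not instantaneous speed, the quantity that counts.

SECONDS_PER_YEAR = 60 * 60 * 24 * 7 * 52  # 31,449,600

def flopy(peak_flops: float, utilization: float) -> float:
    """Operations actually delivered in a year at a given average utilization."""
    return peak_flops * utilization * SECONDS_PER_YEAR

peak = 1e12  # hypothetical 1 TFLOPS machine
print(f"upper bound (100%): {flopy(peak, 1.00):.3e} FLOP/year")
print(f"at 30% utilization: {flopy(peak, 0.30):.3e} FLOP/year")
```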
21. High Throughput Computing
EPFL 97
- Miron Livny 
- Computer Sciences 
- University of Wisconsin-Madison 
- miron@cs.wisc.edu
22. Customers of HTC
- Most HTC applications follow the Master-Worker paradigm, where a group of workers executes a loosely coupled heap of tasks controlled by one or more masters (see the sketch below)
- Job Level: tens to thousands of independent jobs
- Task Level: a parallel application (PVM, MPI-2) that consists of a small group of master processes and tens to hundreds of worker processes
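To make the Master-Worker paradigm concrete, here is a minimal sketch using Python's standard multiprocessing module; the squaring task, pool size, and task count are illustrative assumptions, not part of the original slide:

```python
# Master-worker in miniature: one master process hands a loosely coupled
# heap of independent tasks to a pool of workers and collects results in
# whatever order the workers finish them.
from multiprocessing import Pool

def run_task(task_id: int) -> tuple[int, int]:
    """Stand-in for one independent unit of work (hypothetical)."""
    return task_id, task_id * task_id

if __name__ == "__main__":
    tasks = range(1000)              # the "heap" of tasks
    results = {}
    with Pool(processes=8) as pool:  # the workers
        # imap_unordered yields results as they complete; the master
        # does not care about ordering, only about total throughput.
        for task_id, value in pool.imap_unordered(run_task, tasks):
            results[task_id] = value
    print(len(results), "tasks completed")
```

The same shape scales from a single machine's process pool up to tens of thousands of independent jobs on a grid; only the dispatch mechanism changes.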
23. The Challenge
-  Turn large collections of existing 
 distributively owned computing resources into
 effective High Throughput Computing Environments
- Minimize Wait while Idle
24. Obstacles to HTC
- Ownership Distribution (Sociology)
- Size and Uncertainties (Robustness)
- Technology Evolution (Portability)
- Physical Distribution (Technology)
25. Sociology
- Make owners (and system administrators) happy.
- Give owners full control of:
- when and by whom private resources are used for HTC
- impact of HTC on private Quality of Service
- membership and information on HTC related activities
- No changes to existing software, and make it easy to install, configure, monitor, and maintain (a policy sketch follows below)
Happy owners → more resources → higher throughput
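One way to picture "full control" is as an owner-supplied predicate that the HTC system must satisfy before, and while, it borrows a resource. A minimal sketch in that spirit; the idle-time and load thresholds are illustrative assumptions, and this is not Condor's actual policy syntax:

```python
# Owner-controlled usage policy: HTC work may start only when the
# predicate holds, and must vacate the instant the owner returns.
from dataclasses import dataclass

@dataclass
class MachineState:
    keyboard_idle_secs: float  # time since the owner last touched the machine
    load_avg: float            # current CPU load average

def may_start_htc(m: MachineState) -> bool:
    """Owner's rule: borrow the machine only when it is clearly idle."""
    return m.keyboard_idle_secs > 15 * 60 and m.load_avg < 0.3

def must_vacate(m: MachineState) -> bool:
    """Any owner activity reclaims the machine immediately."""
    return m.keyboard_idle_secs < 5

print(may_start_htc(MachineState(keyboard_idle_secs=1800, load_avg=0.1)))  # True
print(must_vacate(MachineState(keyboard_idle_secs=2.0, load_avg=0.8)))     # True
```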
26. Sociology
- Owners look for a verifiable contract with the HTC environment that spells out the rules of engagement.
- System administrators do not like weird 
 distributed applications that have the potential
 of interfering with the happiness of their
 interactive users.
27. Robustness
- To be effective, an HTC environment must run as a 24-7-365 operation.
- Customers count on it
- Debugging and fault isolation may be a very time-consuming process
- In a large distributed system, everything that might go wrong will go wrong (see the requeue sketch below)
Robust system → less down time → higher throughput
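Since failure is the normal case at this scale, a throughput-oriented system requeues failed work instead of stopping. A minimal sketch of that pattern; the task runner and the retry limit are illustrative assumptions:

```python
# Requeue-on-failure: a failed task goes back on the queue rather than
# halting the system, so throughput degrades gracefully under faults.
from collections import deque

MAX_ATTEMPTS = 3

def execute(task: str) -> None:
    """Stand-in for running a task on some remote resource (hypothetical);
    assume it raises an exception when a component fails."""

def run_all(tasks: list[str]) -> list[str]:
    """Run every task, retrying failures; return the tasks given up on."""
    queue = deque((task, 1) for task in tasks)
    abandoned = []
    while queue:
        task, attempt = queue.popleft()
        try:
            execute(task)
        except Exception:
            if attempt < MAX_ATTEMPTS:
                queue.append((task, attempt + 1))  # requeue and try again later
            else:
                abandoned.append(task)  # escalate to a human for fault isolation
    return abandoned
```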
28. Portability
- To be effective, the HTC software must run on and support the latest and greatest hardware and software.
- Owners select hardware and software according to their needs and tradeoffs
- Customers expect it to be there.
- Application developers expect only a few (if any) changes to their applications.
Portability → more platforms → higher throughput
29. Technology
- An HTC environment is a large, dynamic and evolving Distributed System
- Autonomous and heterogeneous resources 
- Remote file access 
- Authentication 
- Local and wide-area networking
30. Robust and Portable Mechanisms Hold The Key To High Throughput Computing
Policies play only a secondary role in HTC
31. Leads to a bottom-up approach to building and operating distributed systems
32. My jobs should run ...
- on my laptop if it is not connected to the network
- on my group resources if my certificate expired
- on my campus resources if the meta-scheduler is down
- on my national resources if the trans-Atlantic link was cut by a submarine (see the fallback sketch below)
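A minimal sketch of the bottom-up fallback this slide describes: try each resource layer from the widest pool inward and settle on the closest one that is reachable. The layer names and the availability probes are illustrative assumptions:

```python
# Bottom-up placement: an outage at any outer layer (severed link, dead
# meta-scheduler, expired certificate) pushes the job inward instead of
# stranding it; the laptop is the layer of last resort.
from typing import Callable

def place_job(job: str, layers: list[tuple[str, Callable[[], bool]]]) -> str:
    for name, is_reachable in layers:
        if is_reachable():
            return name  # run on the widest reachable layer
    raise RuntimeError(f"no resource layer can run {job!r}")

# Hypothetical probes, widest pool first; here every outer layer is down.
layers = [
    ("national", lambda: False),  # trans-Atlantic link cut
    ("campus",   lambda: False),  # meta-scheduler down
    ("group",    lambda: False),  # certificate expired
    ("laptop",   lambda: True),   # always available locally
]
print(place_job("analysis-42", layers))  # -> laptop
```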
33. The Open Science Grid (OSG)
- Miron Livny, OSG PI and Facility Coordinator
- Computer Sciences Department
- University of Wisconsin-Madison
Supported by the Department of Energy Office of 
Science SciDAC-2 program from the High Energy 
Physics, Nuclear Physics and Advanced Software 
and Computing Research programs, and the 
National Science Foundation Math and Physical 
Sciences, Office of CyberInfrastructure and 
Office of International Science and Engineering 
Directorates. 
34. The Evolution of the OSG
[Timeline, 1999-2009: the PPDG (DOE), GriPhyN (NSF), iVDGL (NSF), and DOE Science Grid projects come together as Trillium and then Grid3, which evolves into the OSG; in parallel, LIGO moves from preparation to operation, the LHC from construction and preparation to operations, and the European grid and Worldwide LHC Computing Grid grow alongside campus and regional grids.]
35. The Open Science Grid vision
- Transform processing- and data-intensive science through a cross-domain, self-managed, national distributed cyber-infrastructure that brings together campus and community infrastructure and facilitates the needs of Virtual Organizations (VOs) at all scales
36. D0 Data Re-Processing
[Plots: Total Events and OSG CPU-Hours/Week]
- 12 sites contributed up to 1,000 jobs/day
- ~2M CPU hours, 286M events, 286K jobs on OSG
- 48TB input data, 22TB output data
37. The Three Cornerstones: National, Campus, Community
Need to be harmonized into a well-integrated whole.
38. OSG challenges
- Develop the organizational and management 
 structure of a consortium that drives such a
 Cyber Infrastructure
- Develop the organizational and management 
 structure for the project that builds, operates
 and evolves such Cyber Infrastructure
- Maintain and evolve a software stack capable of 
 offering powerful and dependable capabilities
 that meet the science objectives of the NSF and
 DOE scientific communities
- Operate and evolve a dependable and well managed 
 distributed facility
39. 6,400 CPUs available
- Campus Condor pool backfills idle nodes in PBS clusters: provided 5.5 million CPU-hours in 2006, all from idle nodes in clusters
- Use on TeraGrid: 2.4 million hours in 2006 spent building a database of hypothetical zeolite structures; 2007: 5.5 million hours allocated to TG
http://www.cs.wisc.edu/condor/PCW2007/presentations/cheeseman_Purdue_Condor_Week_2007.ppt
40. Clemson Campus Condor Pool
- Machines in 27 different locations on campus
- 1,700 job slots
- >1.8M hours served in 6 months
- Users from Industrial and Chemical Engineering, and Economics
- Fast ramp-up of usage
- Accessible to the OSG through a gateway
41. Grid Laboratory of Wisconsin
2003 initiative funded by NSF(MIR)/UW at $1.5M. Second phase funded in 2007 by NSF(MIR)/UW at $1.5M. Six initial GLOW sites:
- Computational Genomics, Chemistry 
- Amanda, Ice-cube, Physics/Space Science 
- High Energy Physics/CMS, Physics 
- Materials by Design, Chemical Engineering 
- Radiation Therapy, Medical Physics 
- Computer Science
Diverse users with different deadlines and usage 
patterns. 
42. GLOW Usage 4/04-11/08: Over 35M CPU hours served!
43. The next 20 years
- We all came to this meeting because we believe in 
 the value of HTC and are aware of the challenges
 we face in offering researchers and educators
 dependable HTC capabilities.
- We all agree that HTC is not just about technologies but is also very much about people: users, developers, administrators, accountants, operators, policy makers, ...