Title: Mark Silberstein, CS, Technion

1. Superlink-Online: Harnessing the world's computers to hunt for disease-provoking genes
Computational Biology Laboratory / Distributed Systems Laboratory
- Mark Silberstein, CS, Technion
- Dan Geiger, Computational Biology Lab
- Assaf Schuster, Distributed Systems Lab
- Genetics research institutes in Israel, the EU, and the US
2. Purpose of disease gene hunting
- Why search?
  - Detection of diseases before birth
  - Risk assessment and corresponding lifestyle changes
  - Finding the mutant proteins and developing medicine
  - Understanding basic biological functions
- How to search?
  - Find families segregating the disease (linkage analysis), or collect unrelated healthy and affected persons (association analysis or LD mapping)
  - Take a simple blood test from some individuals
  - Analyze the DNA in the lab
  - Compute the most likely location of the disease gene
3. Steps in Gene Hunting
Linkage analysis (10^6-10^7 bp)
4. Recombination During Meiosis
5. Familial Onychodysplasia and dysplasia of distal phalanges (ODP)
[Pedigree figure; affected individuals include III-15, IV-10, IV-7]
6. Family Pedigree
7. Marker Information Added
8. Maximum Likelihood Evaluation
The computational problem: find a value of θ maximizing Pr(data | θ).
LOD score (to quantify how confident we are):
Z(θ) = log10 [ Pr(data | θ) / Pr(data | θ = ½) ]
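As a sketch, the LOD score can be computed from pointwise likelihoods. The `likelihood` function below is a caller-supplied stand-in for the actual Bayesian-network pedigree likelihood that Superlink evaluates:

```python
import math

def lod_score(likelihood, theta):
    """LOD score Z(theta) = log10( Pr(data|theta) / Pr(data|theta=1/2) )."""
    return math.log10(likelihood(theta) / likelihood(0.5))

def argmax_lod(likelihood, grid=None):
    """Grid-search the recombination fraction theta in (0, 1/2].

    The grid starts at 0.01 to avoid log of zero for toy likelihoods
    that vanish at theta = 0.
    """
    grid = grid or [i / 100 for i in range(1, 51)]
    return max(grid, key=lambda t: lod_score(likelihood, t))
```

For example, a toy binomial likelihood for 2 recombinants out of 10 informative meioses, `lambda t: (1 - t) ** 8 * t ** 2`, is maximized at θ = 0.2.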
9. Results of Multipoint Analysis
10. The Bayesian network model
[Figure: Bayesian network spanning loci 1-4 (locus 2 is the disease locus), with selector variables S, paternal/maternal inheritance variables L (e.g. L_i3f, L_i3m), genotype variables X_i, and phenotype variables Y]
This model depicts the qualitative relations between the variables. We also need to specify the joint distribution over these variables.
11. The Computational Task
- Computing Pr(data | θ) for a specific value of θ
- Exponential time and space in:
  - number of variables
    - five per person
    - markers
    - gene loci
  - number of values per variable
    - alleles
    - non-typed persons
  - table dimensionality
  - cycles in the pedigree
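To make the blow-up concrete: the size of a probability table is the product of its variables' domain sizes, so it grows exponentially in the number of variables. The numbers below are illustrative, not from an actual Superlink pedigree:

```python
def table_size(domain_sizes):
    """Number of entries in a probability table over the given
    variables: the product of their domain sizes."""
    size = 1
    for d in domain_sizes:
        size *= d
    return size

# 5 variables per person per locus; a marker with 4 alleles gives
# each allele variable 4 values (an untyped person leaves all open).
print(table_size([4] * 5))       # one person, one locus: 1024 entries
print(table_size([4] * 5 * 10))  # ten persons jointly: 4**50 ≈ 1.3e30 entries
```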
12. Task length distribution
- Task length is unknown upon submission
  - From seconds to millennia
- Computing the task length in advance is NP-hard
- So we estimate task length as we go
Bins: <3 minutes, <2 hours, <2 days, <2 weeks, <3 months, >3 months
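The "estimate as we go" idea can be sketched as an escalation loop: try the task under each bin's time limit in turn, and promote it to the next (more heavily parallelized) bin when the limit expires. The bin limits and the `run_task` interface here are a hypothetical simplification of the real scheduler:

```python
# Time limits (seconds) matching the bins on this slide.
BINS = [3 * 60, 2 * 3600, 2 * 86400, 14 * 86400, 90 * 86400]

def run_with_escalation(run_task, bins=BINS):
    """Try the task under increasing time limits; each timeout
    promotes it to a bin served by a larger resource pool.
    `run_task(limit)` returns a result, or None on timeout."""
    for limit in bins:
        result = run_task(limit)
        if result is not None:
            return result, limit
    raise RuntimeError("task exceeds the largest bin; needs manual handling")
```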
13. Divisible Tasks through Variable Conditioning
Conditioning splits one long task into many independent jobs, at the cost of non-trivial parallelization overhead.
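A minimal sketch of the conditioning idea, assuming the likelihood decomposes as a sum over assignments to the conditioned variables (toy functions, not the Superlink kernel):

```python
from itertools import product

def conditioned_jobs(partial_likelihood, cond_domains):
    """Split Pr(data) = sum_v Pr(data, v), where v ranges over joint
    assignments to the conditioning variables, into one independent
    job per assignment; each job can run on a different machine."""
    return [lambda v=v: partial_likelihood(v) for v in product(*cond_domains)]

def combine(results):
    """Merging the jobs' outputs is just a sum."""
    return sum(results)
```

With two binary conditioning variables this yields four independent jobs whose results sum back to the full likelihood.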
14. Free resource pools (grids)
- Weak or no quality of service
  - Random failures of execution machines
  - Preemption by higher-priority tasks
  - Hardware bugs may lead to incorrect results
  - Potentially unbounded execution and queue waiting times
- Dynamic, abrupt changes in resource availability
- High network delays (communication over WAN)
- Multiple concurrent tasks
15. Terminology
- Basic unit of execution: a batch job
  - Non-interactive mode: enqueue, wait, execute, return
  - Self-contained execution sandbox
- A linkage analysis request: a task
  - A bag of (millions of) jobs
  - Turnaround time is important
16. Requirements
- The system must be geneticist-friendly
  - Interactive experience: low response time for short tasks, prompt user feedback
  - Simple, secure, reliable, stable, overload-resistant; concurrent tasks, multiple users...
- Fast computation of previously infeasible long tasks via parallel execution
- Harness all available resources: grids, clouds, clusters
  - And use them efficiently!
17. Grids or Clouds?
[Plot: remaining jobs in queue vs. time, showing a long tail due to failures]
- Small tasks are severely slow on grids
  - A task that takes 5 minutes on a 10-node dedicated cluster may take several hours on a grid
Should we move scientific loads to the cloud? YES!
18. Grids or Clouds?
- Consider 3.2×10^6 jobs, 40 minutes each
- It took 21 days on 6,000-8,000 CPUs
- It would cost about $10K on Amazon's EC2
Should we move scientific loads to the cloud? NO!
19. Clouds or Grids? Clouds and Grids!
- Opportunistic resources (grids): throughput computing
- Dedicated resources (clouds, clusters): burst computing
20. Cheap and Expensive Resources
- Task sensitivity to QoS differs across execution stages (see the remaining-jobs-in-queue curve)
- Use cheap, unreliable resources for the bulk of the work:
  - Grids
  - Community grids
  - Non-dedicated clusters
- Use expensive, reliable resources for the tail:
  - Dedicated clusters
  - Clouds
- Dynamically detect entering "tail mode"
  - Switch to expensive resources (gracefully)
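One simple way to realize the tail-mode switch; the threshold rule below is an assumed heuristic for illustration, not necessarily the one Superlink-online uses:

```python
def in_tail_mode(remaining_jobs, total_jobs, throughput, tail_fraction=0.05):
    """Heuristic tail detector: the bag is in its tail when few jobs
    remain but, at the current cheap-pool throughput (jobs/hour),
    finishing them would still take over an hour."""
    few_left = remaining_jobs <= tail_fraction * total_jobs
    slow_finish = throughput == 0 or remaining_jobs / throughput > 1.0
    return few_left and slow_finish

def route(job, remaining_jobs, total_jobs, throughput):
    """Send tail jobs to the reliable (expensive) pool, the rest to grids."""
    if in_tail_mode(remaining_jobs, total_jobs, throughput):
        return "dedicated/cloud", job
    return "grid", job
```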
21. Glue pools together via an overlay
[Diagram: submitters route jobs to multiple grids (e.g. a submitter to Grid 2) through the overlay]
Issues: granularity, load balancing, firewalls, failed resources, scheduler scalability
22. Practical considerations
- Overlay scalability and firewall penetration
  - The server may not initiate a connection to the agent
- Compatibility with community grids
  - The server is based on BOINC
  - Agents are upgraded BOINC clients
- Elimination of failed resources from scheduling
  - Performance statistics are analyzed
- Resource allocation depends on the task state
  - Dynamic policy updates via the Condor ClassAd mechanism
23. (No transcript)
24. Superlink-online 1.0: http://bioinfo.cs.technion.ac.il
25. Task Submission
26. Superlink-online statistics
- 1720 CPU years for 18,000 tasks during 2006-2008 (and counting)
- 37 citations (several mutations found)
  - Examples: ichthyosis, "uncomplicated" hereditary spastic paraplegia (1-9 people per 100,000)
- Over 250 (and counting) Israeli and international users
  - Soroka Hospital (Be'er Sheva), Galil Ma'aravi Hospital (Nahariya), Rabin Hospital (Petah Tikva), Rambam Hospital (Haifa), Beney Tzion Hospital (Haifa), Sha'arey Tzedek Hospital (Jerusalem), Hadassa Hospital (Jerusalem), Afula Hospital; NIH; universities and research centers in the US, France, Germany, UK, Italy, Austria, Spain, Taiwan, Australia, and others...
- Task example: 250 days on a single computer, 7 hours on 300-700 computers
- Short tasks: a few seconds, even during severe overload
27. Using our system in Israeli hospitals
- Rabin Hospital, by Motti Shochat's group
  - New locus for mental retardation
  - Infantile bilateral striatal necrosis
- Soroka Hospital, by Ohad Birk's group
  - Lethal congenital contractural syndrome
  - Congenital cataract
- Rambam Hospital, by Eli Shprecher's group
  - Congenital recessive ichthyosis
  - CEDNIK syndrome
- Galil Ma'aravi Hospital, by Tzipi Falik's group
  - Familial onychodysplasia and dysplasia
  - Familial juvenile hypertrophy
28. Utilizing Community Computing
3.4 TFLOPS, 3,000 users, from 75 countries
29. Superlink-online V2 (beta) deployment
- Submission server
- Resource pools: EGEE-II BIOMED VO, OSG GLOW VO, UW-Madison Condor pool, dedicated cluster, Superlink@Campus, Superlink@Technion
- 12,000 hosts operational during the last month
30. 3.1 million jobs in 21 days
- With only 60 dedicated CPUs
31. Conclusions
- Our system integrates clusters, grids, clouds, community grids, etc.
- Geneticist-friendly
- Minimizes use of expensive resources while providing QoS for tasks
- Generic mechanism for scheduling policy
  - Can dynamically reroute jobs from one pool to another according to a given optimization function (budget, energy, etc.)
32. Why GPUs?
- Memory bandwidth: 88 GB/s peak (56 GB/s observed) on an NVIDIA GTX 8800 (~$550)
- Memory bandwidth: 21 GB/s peak on a 3.0 GHz Intel Core 2 Quad (~$1100)
- Growth: CPUs ~1.4× annually, GPUs ~1.7× annually
33. NVIDIA Compute Unified Device Architecture (CUDA)
[Diagram: GPU memory hierarchy, contrasting on-chip memory (1-cycle latency, TB/s aggregate bandwidth) with off-chip global memory]
34. Key ideas (joint work with John Owens, UC Davis)
- Software-managed cache
  - We implement the cache replacement policy in software
- Maximization of data reuse
  - Better compute-to-memory-access ratio
- A simple model for performance bounds
  - Yes, we are (optimal)
- Use special function units (SFUs) for hardware-assisted execution
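When the kernel's access pattern is known in advance, a software-managed cache can use offline-optimal (Belady) replacement: evict the resident item whose next use is farthest in the future. The Python simulation below only illustrates the policy; the actual implementation manages GPU shared memory inside the CUDA kernel:

```python
def belady_misses(accesses, cache_size):
    """Count misses for a cache using Belady's offline-optimal
    replacement over a fully known access sequence."""
    cache, misses = set(), 0
    for i, item in enumerate(accesses):
        if item in cache:
            continue  # hit: data is already resident
        misses += 1
        if len(cache) >= cache_size:
            # Evict the item whose next access is farthest away
            # (infinity if it is never accessed again).
            def next_use(x):
                try:
                    return accesses.index(x, i + 1)
                except ValueError:
                    return float("inf")
            cache.remove(max(cache, key=next_use))
        cache.add(item)
    return misses
```

Because the replacement decisions use the full future access sequence, no online policy (LRU included) can incur fewer misses on the same trace, which is the sense in which the slide's "optimal" claim can hold for a software-managed cache.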
35. Results summary
- Experiment setup:
  - CPU: single core, Intel Core 2 2.4 GHz, 4 MB L2
  - GPU: NVIDIA G80 (GTX 8800), 750 MB GDDR4, 128 SPs, 16 KB shared memory / 512 threads
  - Only kernel runtime included (no memory transfers, no CPU setup time)
[Chart: speedups for hardware vs. software-managed caching]
- With SFUs, expf is only about 6× slower than a simple operation on the GPU, but about 200× slower on the CPU
36. Acknowledgments
- Superlink-online team
  - Alumni: Anna Tzemach, Julia Stolin, Nikolay Dovgolevsky, Maayan Fishelson, Hadar Grubman, Ophir Etzion
  - Current: Artyom Sharov, Oren Shtark
- Prof. Miron Livny (Condor pool, UW-Madison; OSG)
- EGEE BIOMED VO and OSG GLOW VO
- Microsoft TCI program, NIH grant, SciDAC Institute for Ultrascale Visualization
If your grid is underutilized, let us know!
Visit us at http://bioinfo.cs.technion.ac.il/superlink-online
Superlink@TECHNION project home page: http://cbl-boinc-server2.cs.technion.ac.il/superlinkattechnion
37. Visit us at http://bioinfo.cs.technion.ac.il/superlink-online