Title: Internet-Based TSP Computation with Javelin Michael Neary
1 Internet-Based TSP Computation with Javelin
Michael Neary & Peter Cappello
Computer Science, UCSB
2 Introduction: Goals
- Service parallel applications that are
- Large: too big for a cluster
- Coarse-grain: to hide communication latency
- Simplicity of use
- Design focus: decomposition & composition of computation.
- Scalable high performance
- despite large communication latency
- Fault-tolerance
- 1000s of hosts, each dynamically disassociates.
3 Introduction: Some Related Work
4 Introduction: Some Applications
- Search for extra-terrestrial life
- Computer-generated animation
- Computer modeling of drugs for
- Influenza
- Cancer
- Reducing chemotherapy's side-effects
- Financial modeling
- Storing nuclear waste
5 Outline
- Architecture
- Model of Computation
- API
- Scalable Computation
- Experimental Results
- Conclusions & Future Work
6 Architecture: Basic Components
Clients
Brokers
Hosts
7-11 Architecture: Broker Discovery
[Diagram, animated over five slides. Labels: brokers (B), Broker Naming System, host (H); slide 10 adds the message PING (BID?).]
12-15 Architecture: Network of Broker-Managed Host Trees
- Each broker manages a tree of hosts
- Brokers form a network
- Client contacts broker
- Client gets host trees
16 Scalable Computation: Deterministic Work-Stealing Scheduler
[Diagram: a HOST with a task container and the operations addTask( task ), getTask( ), and stealTask( ).]
17 Scalable Computation: Deterministic Work-Stealing Scheduler
Task getWork( )
{
    if ( my deque has a task )
        return task;
    else if ( any child has a task )
        return child's task;
    else
        return parent.getWork( );
}
[Diagram labels: CLIENT, HOSTS]
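The getWork( ) rule above can be sketched in Java as follows. This is a minimal single-threaded illustration, not Javelin's implementation: the class name HostNode, its fields, and the use of String tasks are assumptions made for the example. The slides do not say which end of the deque a thief takes from; stealing from the end opposite to local work is a common convention and is assumed here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical host in a host tree; names are illustrative, not Javelin's API.
class HostNode {
    final HostNode parent;                          // null at the root of the tree
    final List<HostNode> children = new ArrayList<>();
    final Deque<String> deque = new ArrayDeque<>(); // local task container

    HostNode(HostNode parent) {
        this.parent = parent;
        if (parent != null) parent.children.add(this);
    }

    void addTask(String task) { deque.addFirst(task); }

    // Called by another host; takes from the end opposite to local work (assumption).
    String stealTask() { return deque.pollLast(); }

    // Deterministic order: own deque first, then each child, then the parent.
    String getWork() {
        String task = deque.pollFirst();
        if (task != null) return task;
        for (HostNode child : children) {
            task = child.stealTask();
            if (task != null) return task;
        }
        return parent == null ? null : parent.getWork();
    }
}
```

In this sketch a host with an empty deque and no busy children climbs toward the root until some ancestor can steal a task from one of its other subtrees, or returns null when the root finds nothing.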
18 Models of Computation
- Master-slave
- AFAIK all proposed commercial applications
- Branch-and-bound optimization
- A generalization of master-slave.
19-25 Models of Computation: Branch & Bound
[Search-tree diagram, animated over seven slides. Bounds and node labels shown at each step:]
- UPPER = ∞, LOWER = 0 (node labels: 0)
- UPPER = ∞, LOWER = 2 (node labels: 0, 2)
- UPPER = ∞, LOWER = 3 (node labels: 0, 2, 3)
- UPPER = 4, LOWER = 4 (node labels: 0, 2, 3, 4)
- UPPER = 3, LOWER = 3 (node labels: 0, 2, 3, 3, 4)
- UPPER = 3, LOWER = 6 (node labels: 0, 2, 3, 6, 3, 4)
- UPPER = 3, LOWER = 7
26 Models of Computation: Branch & Bound
- Tasks created dynamically
- Upper bound is shared
- To detect termination, scheduler detects tasks that have been
- Completed
- Killed (bounded)
27 API
public class Host implements Runnable
{
    . . .
    public void run()
    {
        while ( (node = jDM.getWork()) != null )
        {
            if ( isAtomic() )
                compute(); // search space; return result
            else
            {
                child = node.branch(); // put children in child array
                for (int i = 0; i < node.numChildren; i++)
                    if ( child[i].setLowerBound() < UpperBound )
                        jDM.addWork( child[i] );
                    //else child is killed implicitly
            }
        }
    }
}
28 API
private void compute()
{
    . . .
    boolean newBest = false;
    while ( (node = stack.pop()) != null )
    {
        if ( node.isComplete() )
        {
            if ( node.getCost() < UpperBound )
            {
                newBest = true;
                UpperBound = node.getCost();
                jDM.propagateValue( UpperBound );
                best = Node( child[i] );
            }
        }
        else
        {
            child = node.branch();
            for (int i = 0; i < node.numChildren; i++)
                if ( child[i].setLowerBound() < UpperBound )
                    stack.push( child[i] );
                //else child is killed implicitly
        }
    }
}
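To make the compute() loop concrete, here is a minimal, self-contained sketch of the same stack-based branch-and-bound pattern applied to a tiny TSP instance. It is not the Javelin code: the class TinyTsp, its Node type, and the choice of lower bound (the cost of the partial path, valid only for non-negative distances) are assumptions for illustration, and there is no jDM or bound propagation here.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Illustrative sequential branch-and-bound for a small TSP; names are hypothetical.
class TinyTsp {
    // A search node: a partial tour that starts at city 0, plus its cost so far.
    static final class Node {
        final int[] path;
        final int cost;
        Node(int[] path, int cost) { this.path = path; this.cost = cost; }
    }

    // Returns the cost of a cheapest tour for distance matrix d (non-negative entries).
    static int solve(int[][] d) {
        int n = d.length;
        int upperBound = Integer.MAX_VALUE;            // no complete tour found yet
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(new Node(new int[] {0}, 0));
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            int last = node.path[node.path.length - 1];
            if (node.path.length == n) {               // complete: close the tour
                int total = node.cost + d[last][0];
                if (total < upperBound) upperBound = total;
                continue;
            }
            for (int city = 1; city < n; city++) {     // branch on the next city
                if (contains(node.path, city)) continue;
                int lower = node.cost + d[last][city]; // simple lower bound (assumption)
                if (lower < upperBound) {
                    int[] p = Arrays.copyOf(node.path, node.path.length + 1);
                    p[p.length - 1] = city;
                    stack.push(new Node(p, lower));
                }                                      // else: child is killed implicitly
            }
        }
        return upperBound;
    }

    private static boolean contains(int[] a, int x) {
        for (int v : a) if (v == x) return true;
        return false;
    }
}
```

A tighter lower bound (for example one based on a minimum spanning tree) would prune more of the tree; the partial-path cost is used only to keep the sketch short.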
29-33 Scalable Computation: Weak Shared Memory Model
- Slow propagation of bound affects performance, not correctness.
[Diagram, animated over five slides: Propagate bound]
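One way to see why a late bound costs only performance is sketched below. Each host keeps its own copy of the upper bound and merges propagated values by taking the minimum, so a stale copy can only prune less, never prune a node that might still improve the result. The class LocalBound and its methods are hypothetical names, not part of the Javelin API.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical per-host copy of the shared upper bound.
class LocalBound {
    private final AtomicInteger value = new AtomicInteger(Integer.MAX_VALUE);

    // Merge a propagated bound. Messages may arrive late or out of order;
    // keeping the minimum makes the update idempotent and order-independent.
    int receive(int propagated) {
        return value.accumulateAndGet(propagated, Math::min);
    }

    // A node is pruned only if it cannot beat the bound this host currently knows.
    boolean canPrune(int lowerBound) {
        return lowerBound >= value.get();
    }
}
```

A host that has not yet heard of a better bound simply explores some nodes that a fully informed host would have killed.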
34 Scalable Computation: Fault Tolerance via Eager Scheduling
- When
- All tasks have been assigned
- Some results have not been reported
- A host wants a new task
- Re-assign a task!
- Eager scheduling tolerates faults & balances the load.
- Computation completes, if at least 1 host communicates with client.
35 Scalable Computation: Fault Tolerance via Eager Scheduling
- Scheduler must know which
- Tasks have completed
- Nodes have been killed
- Performance ? balance
- Centralized schedule info
- Decentralized computation
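The eager scheduling rule can be sketched for a fixed set of tasks as follows. This is a simplified illustration under stated assumptions: tasks are plain integer ids, the class name EagerScheduler and its methods are hypothetical, and the dynamic task creation and killed-node bookkeeping of branch and bound are left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical eager scheduler over integer task ids (not Javelin's scheduler).
class EagerScheduler {
    private final Deque<Integer> unassigned = new ArrayDeque<>();
    private final Set<Integer> pending = new LinkedHashSet<>(); // assigned, no result yet

    EagerScheduler(int taskCount) {
        for (int id = 0; id < taskCount; id++) unassigned.add(id);
    }

    // Returns a task id, or null once every task has a reported result.
    Integer nextTask() {
        Integer id = unassigned.poll();
        if (id != null) {
            pending.add(id);
            return id;
        }
        if (pending.isEmpty()) return null;
        // All tasks are assigned but some results are missing: re-assign one.
        id = pending.iterator().next();
        pending.remove(id);
        pending.add(id);          // move to the back so re-assignments rotate
        return id;
    }

    // Returns false for duplicate results, which are simply ignored.
    boolean reportResult(int id) {
        return pending.remove(Integer.valueOf(id));
    }

    boolean isDone() {
        return unassigned.isEmpty() && pending.isEmpty();
    }
}
```

Because any idle host can be handed an unfinished task, a crashed or slow host delays the computation without blocking it, which is also how the load gets balanced.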
36 Experimental Results
37 Experimental Results
Example of a bad graph
38 Conclusions
- Javelin 2 relieves the designer/programmer of managing a set of inter-networked processors that is
- Dynamic
- Faulty
- A wide set of applications is covered by
- Master-slave model
- Branch & bound model
- Weak shared memory performs well.
- Use multicast (?) for
- Code distribution
- Propagating values
39 Future Work
- Improve support for long-lived computation
- Do not require that the client run continuously.
- A dag model of computation
- with limited weak shared memory.
40 Future Work: Jini/JavaSpaces Technology
Continuously disperse Tasks among brokers via a physics model
[Diagram labels: hosts (H), TaskManager aka Broker]
41 Future Work: Jini/JavaSpaces Technology
- TaskManager uses persistent JavaSpace
- Host management trivial
- Eager scheduling simple
- No single point of failure
- Fat tree topology
42 Future Work: Advanced Issues
- Privacy of data & algorithm
- Algorithms
- New computational complexity model
- Minimize communication between machines
- N-body problem,
- Accounting: Associate specific work with specific host
- Correctness
- Compensation (how to quantify?)
- Create international open source organization
- System infrastructure
- Application codes
44 Models of Computation: Branch & Bound
UPPER = 3, LOWER = 0
45-48 Architecture: Broker Name Service (BNS)
[Diagram, animated over four slides, showing BNS, BROKER, and HOST with these steps:]
1. Register with BNS
2. Get broker list
3. Ping brokers on list
4. Connect to selected broker
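The four steps can be sketched as an in-memory simulation. Several things here are assumptions rather than facts from the slides: that the broker performs step 1 and the host performs steps 2 to 4 (as the diagram labels suggest), that the host selects the broker with the lowest ping round-trip time, and all class and method names. No real networking is done; the ping is passed in as a function.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

// Hypothetical in-memory stand-in for the Broker Name Service.
class BrokerNameService {
    private final List<String> brokers = new ArrayList<>();

    // Step 1: a broker registers its address.
    void register(String brokerAddress) { brokers.add(brokerAddress); }

    // Step 2: a host asks for the current broker list (a copy is returned).
    List<String> brokerList() { return new ArrayList<>(brokers); }
}

class HostBootstrap {
    // Steps 3 and 4: ping every listed broker and pick one to connect to.
    // `ping` returns a round-trip time in ms, or a negative value if there was no answer.
    // Choosing the lowest round-trip time is an assumed selection policy.
    static String selectBroker(List<String> brokers, ToLongFunction<String> ping) {
        String best = null;
        long bestRtt = Long.MAX_VALUE;
        for (String broker : brokers) {
            long rtt = ping.applyAsLong(broker);
            if (rtt >= 0 && rtt < bestRtt) {
                bestRtt = rtt;
                best = broker;
            }
        }
        return best;   // null if no broker answered
    }
}
```

Any other selection policy, such as the broker's current load, would fit the same shape by changing what the ping function returns.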