CX: A Scalable, Robust Network for Parallel Computing (transcript)

1
CX: A Scalable, Robust Network for Parallel Computing
  • Peter Cappello, Dimitrios Mourloukos
  • Computer Science
  • UCSB

2
Outline
  1. Introduction
  2. Related work
  3. API
  4. Architecture
  5. Experimental results
  6. Current & future work

3–8
Introduction
  • "Listen to the technology!" (Carver Mead)
  • What is the technology telling us?
  • The Internet's idle cycles/sec are growing rapidly.
  • Bandwidth is increasing and getting cheaper.
  • Communication latency is not decreasing.
  • Human technology is getting neither cheaper nor faster.

9–11
Introduction: Project Goals
  • Minimize job completion time, despite large communication latency.
  • Jobs complete with high probability, despite faulty components.
  • The application program is oblivious to:
  • the number of processors
  • inter-process communication
  • fault tolerance

12–13
Introduction: Fundamental Issue - Heterogeneity
[Figure: five machines M1–M5 run five different operating systems OS1–OS5 (a heterogeneous machine/OS mix); a functionally homogeneous JVM layered over all of them hides the heterogeneity.]
14
Outline
  1. Introduction
  2. Related work
  3. API
  4. Architecture
  5. Experimental results
  6. Current & future work

15
Related work
  • Cilk → Cilk-NOW → Atlas
  • DAG computational model
  • Work-stealing

16
Related work
  • Linda → Piranha → JavaSpaces
  • Space-based coordination
  • Decoupled communication

17
Related work
  • Charlotte (Milan project / Calypso prototype)
  • High performance
  • Fault tolerance achieved via eager scheduling, not via transactions

18
Related work
  • SuperWeb → Javelin → Javelin++
  • Architecture: client, broker, host

19
Outline
  1. Introduction
  2. Related work
  3. API
  4. Architecture
  5. Experimental results
  6. Current & future work

20
API
  • DAG computational model
  • int f( int n )
  •   if ( n < 2 )
  •     return n
  •   else
  •     return f( n-1 ) + f( n-2 )
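
As a rough illustration, the recursion above might be written as CX-style tasks along these lines. This is a minimal sketch: the Task base class, the constructors, and the integer argument addresses are assumptions made for this example, not the actual CX API.

    // Sketch: f(n) decomposed into DAG tasks (hypothetical CX-style API).
    abstract class Task {
        abstract void execute();
        void spawn(Task t)              { /* runtime: hand t to the task server */ }
        void setArg(int argAddr, int v) { /* runtime: feed a successor's input  */ }
    }

    class Fib extends Task {
        final int n, argAddr;            // input, and where the result goes
        Fib(int n, int argAddr) { this.n = n; this.argAddr = argAddr; }
        void execute() {
            if (n < 2) {
                setArg(argAddr, n);          // base case: the result is n itself
            } else {
                Sum sum = new Sum(argAddr);  // composition task (hypothetical)
                spawn(sum);
                spawn(new Fib(n - 1, 0));    // its result becomes sum.in0
                spawn(new Fib(n - 2, 1));    // its result becomes sum.in1
            }
        }
    }

    class Sum extends Task {
        final int argAddr;
        int in0, in1;                    // filled in via setArg before execution
        Sum(int argAddr) { this.argAddr = argAddr; }
        void execute() { setArg(argAddr, in0 + in1); }   // compose
    }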

21–24
DAG Computational Model
int f( int n ) { if ( n < 2 ) return n; else return f( n-1 ) + f( n-2 ); }
[Figure: the method invocation tree for f(4), built up call by call: f(4) invokes f(3) and f(2); f(3) invokes f(2) and f(1); and so on down to the base cases f(1) and f(0).]
25–28
DAG Computational Model / API
f(n):
  execute( )
    if ( n < 2 )
      setArg( ArgAddr, n )
    else
      spawn ( sum )        // composition task: execute( ) { setArg( ArgAddr, in0 + in1 ) }
      spawn ( f(n-1) )
      spawn ( f(n-2) )
[Figure: the task DAG for f(4), unfolding as each decomposition task spawns f(n-1), f(n-2), and a composition task that receives in0 and in1.]

29
Outline
  1. Introduction
  2. Related work
  3. API
  4. Architecture
  5. Experimental results
  6. Current & future work

30
Architecture: Basic Entities
[Diagram: a Consumer registers with the Production Network, issues spawn / getResult calls, and unregisters; the Production Network is built on a Cluster Network.]
31
Architecture: Cluster
[Diagram: a cluster is one Task Server together with its attached Producers.]
32–35
A Cluster at Work
[Animation: the Task Server keeps READY and WAITING task queues. The task f(4) arrives in the READY queue and is assigned to a Producer; the server retains its own copy of f(4) while the producer works.]
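
The slide sequence suggests a simple pull loop on the producer's side. The sketch below captures that loop; the connection interface and method names are hypothetical, not CX's actual API.

    // Sketch: a producer repeatedly takes a ready task, runs it, and
    // signals completion so the server can drop its retained copy.
    abstract class Task { abstract void execute(); }

    interface TaskServerConnection {
        Task takeReadyTask();          // blocks until the server assigns a task
        void signalComplete(Task t);   // lets the server remove the task
    }

    class Producer implements Runnable {
        private final TaskServerConnection server;
        Producer(TaskServerConnection server) { this.server = server; }
        public void run() {
            while (true) {
                Task t = server.takeReadyTask();
                t.execute();           // may spawn subtasks or set a successor's arg
                server.signalComplete(t);
            }
        }
    }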
36
Decompose
  • execute( )
  •   if ( n < 2 )
  •     setArg( ArgAddr, n )
  •   else
  •     spawn ( sum )      // the composition task (see slide 55, "Compose")
  •     spawn ( f(n-1) )
  •     spawn ( f(n-2) )

37–44
A Cluster at Work
[Animation: executing f(4) spawns f(3), f(2), and a composition task. f(3) and f(2) enter the Task Server's READY queue; the composition task waits in WAITING for its two arguments. Producers take ready tasks and decompose them in turn, filling the queues with subtasks down to f(1) and f(0).]
45
Compute Base Case
  • execute( )
  •   if ( n < 2 )
  •     setArg( ArgAddr, n )      // base case: send n to the waiting task
  •   else
  •     spawn ( sum )
  •     spawn ( f(n-1) )
  •     spawn ( f(n-2) )

46–54
A Cluster at Work
[Animation: producers compute the base cases f(1) and f(0); each result is delivered by setArg to the composition task that is waiting for it. A waiting task that has received all of its arguments moves to the READY queue; completed tasks are removed from the server.]
55
Compose
  • execute( )
  •   setArg( ArgAddr, in0 + in1 )

56–75
A Cluster at Work
[Animation: composition tasks fire as their arguments arrive, computing in0 + in1 and passing the sum up the spawn tree via setArg. The queues drain step by step until the root composition task produces the final result R.]
76
A Cluster at Work
  • The Result object is sent to the Production Network.
  • The Production Network returns it to the Consumer.
[Diagram: the final result R leaves the Task Server.]
77
Task Server Proxy: Overlap Communication with Computation
[Diagram: each Producer runs a Task Server Proxy containing a priority queue of ready tasks, an INBOX, and an OUTBOX; a COMM thread exchanges tasks and results with the Task Server while a COMP thread computes, overlapping communication with computation.]
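
One way to realize this overlap is with two threads sharing bounded queues. The sketch below assumes a simple blocking-queue design with hypothetical stub interfaces; the real proxy also manages priorities and the WAITING set.

    // Sketch: COMM prefetches tasks and returns results; COMP only ever
    // touches local queues, so it is never blocked by network latency.
    import java.util.concurrent.*;

    class TaskServerProxy {
        static class Task {}
        static class Result {}
        interface ServerStub { Task fetchTask(); void sendResult(Result r); }
        interface Computation { Result compute(Task t); }

        private final BlockingQueue<Task> inbox = new LinkedBlockingQueue<>(2);
        private final BlockingQueue<Result> outbox = new LinkedBlockingQueue<>();

        void start(ServerStub server, Computation comp) {
            new Thread(() -> {                  // COMM thread
                try {
                    while (true) {
                        inbox.put(server.fetchTask());   // keep up to 2 prefetched
                        Result r;
                        while ((r = outbox.poll()) != null) server.sendResult(r);
                    }
                } catch (InterruptedException e) { /* shut down */ }
            }).start();
            new Thread(() -> {                  // COMP thread
                try {
                    while (true) outbox.put(comp.compute(inbox.take()));
                } catch (InterruptedException e) { /* shut down */ }
            }).start();
        }
    }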
78
Architecture: Work Stealing & Eager Scheduling
  • A task is removed from the server only after a completion signal is received.
  • A task may be assigned to multiple producers:
  • balances the task load among producers of varying processor speeds
  • tasks on failed/retreating producers are re-assigned (see the sketch below).
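
A minimal sketch of eager scheduling at the server, assuming task ids and explicit completion signals (not CX's actual data structures):

    // Sketch: a ready queue plus an 'assigned but uncompleted' map. Idle
    // producers re-execute uncompleted tasks, so a fast producer can
    // finish work that a slow or failed producer still holds.
    import java.util.*;

    class EagerScheduler {
        static class Task { final String id; Task(String id) { this.id = id; } }

        private final Deque<Task> ready = new ArrayDeque<>();
        private final Map<String, Task> assigned = new LinkedHashMap<>();

        synchronized void add(Task t) { ready.addLast(t); }

        synchronized Task assign() {
            Task t = ready.pollFirst();
            if (t == null && !assigned.isEmpty())
                t = assigned.values().iterator().next();   // re-issue a task
            if (t != null) assigned.put(t.id, t);
            return t;
        }

        // A task disappears only on a completion signal; duplicate signals
        // from re-issued copies are harmless.
        synchronized void complete(String taskId) { assigned.remove(taskId); }
    }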

79
Architecture: Scalability
  • A cluster tolerates producer:
  • retreat
  • failure
  • A single task server, however, is:
  • a bottleneck
  • a single point of failure
  • We therefore introduce a network of task servers.

80
Scalability: Class Loading
  1. The CX class loader loads classes (from the Consumer's JAR) into each server's class cache.
  2. A Producer loads classes from its server (sketched below).
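
Step 2 can be pictured as a custom class loader on the producer; the class-cache stub below is a hypothetical interface, not CX's:

    // Sketch: resolve class bytes from the task server's class cache.
    class CxClassLoader extends ClassLoader {
        interface ClassCache { byte[] getClassBytes(String name); }

        private final ClassCache server;
        CxClassLoader(ClassCache server) { this.server = server; }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            byte[] b = server.getClassBytes(name);  // bytes from the consumer JAR
            if (b == null) throw new ClassNotFoundException(name);
            return defineClass(name, b, 0, b.length);
        }
    }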
81–83
Scalability: Fault Tolerance
Replicate a server's tasks on its sibling. When a server fails, its sibling restores the state to a replacement server.
84
Architecture: Production Network of Clusters
  • The network tolerates a single server failure.
  • It then restores its ability to tolerate a single failure
  • → the ability to tolerate a sequence of failures.

85
Outline
  1. Introduction
  2. Related work
  3. API
  4. Architecture
  5. Experimental results
  6. Current & future work

86
Preliminary experiments
  • Experiments run on a Linux cluster
  • 100-port Lucent P550 Cajun Gigabit Switch
  • Each machine:
  • 2 Intel EtherExpress Pro 100 Mb/s Ethernet cards
  • Red Hat Linux 6.0
  • JDK 1.2.2_RC3
  • Heterogeneous:
  • processor speeds
  • processors/machine

87
Fibonacci Tasks with Synthetic Load
f(n):
  execute( )
    if ( n < 2 )
      synthetic workload( ); setArg( ArgAddr, n )
    else
      synthetic workload( ); spawn ( sum ); spawn ( f(n-1) ); spawn ( f(n-2) )
Composition task: execute( )  synthetic workload( ); setArg( ArgAddr, in0 + in1 )
[Figure: as before, f(n) spawns f(n-1), f(n-2), and a composition task.]
88
TSEQ vs. T1 (seconds), Computing F(8)

Workload   TSEQ      T1        Efficiency
4.522      497.420   518.816   0.96
3.740      415.140   436.897   0.95
2.504      280.448   297.474   0.94
1.576      179.664   199.423   0.90
0.914      106.024   120.807   0.88
0.468       56.160    65.767   0.85
0.198       24.750    29.553   0.84
0.058        8.120    11.386   0.71
89
Average task time: Workload 1: 1.8 sec; Workload 2: 3.7 sec.
Parallel efficiency for F(13): 0.87. Parallel efficiency for F(18): 0.99.
90
Outline
  1. Introduction
  2. Related work
  3. API
  4. Architecture
  5. Experimental results
  6. Current & future work

91
Current work
  • Implement a CX market maker (broker)
  • Solves the discovery problem between Consumers and Production networks
  • Enhance the Producer with Lea's Fork/Join Framework
  • See gee.cs.oswego.edu

92–93
Current work
  • Enhance the computational model: branch & bound.
  • Propagate new bounds through the production network in 3 steps.
[Diagram: a search tree mapped onto the production network; when a branch finds a better bound, the bound propagates through the network and inferior subtrees are terminated.]
94
Current work
  • Investigate computations that appear ill-suited to adaptive parallelism:
  • SOR
  • N-body.

95
End of CX Presentation
  • www.cs.ucsb.edu/research/cx
  • Next release: end of June; includes source.
  • E-mail: cappello@cs.ucsb.edu

96
Introduction: Fundamental Issues
  • Communication latency
  • Long latency → overlap computation with communication.
  • Robustness
  • Massive parallelism → faults
  • Scalability
  • Massive parallelism → login privileges cannot be required.
  • Ease of use
  • Jini → easy upgrade of system components

97
Related work
  • Market mechanisms
  • Huberman; Waldspurger; Malone; Miller & Drexler; Newhouse & Darlington

98
Related work
  • CX integrates:
  • the DAG computational model
  • a work-stealing scheduler
  • space-based, decoupled communication
  • fault tolerance via eager scheduling
  • market mechanisms (an incentive to participate)

99
Architecture: Task Identifier
  • The DAG has a spawn tree.
  • TaskID = path id
  • Root.TaskID = 0
  • The TaskID is used to detect duplicate:
  • Tasks
  • Results.

[Figure: the spawn tree of F(4); each child task is labeled 0, 1, or 2 within its parent, so a task's ID is the sequence of labels on the path from the root.]
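
To make the path idea concrete, here is a small sketch; the dotted-string encoding and class names are assumptions for illustration only:

    // Sketch: path-based task ids for duplicate detection.
    import java.util.*;

    final class TaskId {
        final String path;                        // root task: "0"
        TaskId(String path) { this.path = path; }
        TaskId child(int i) { return new TaskId(path + "." + i); }
        @Override public boolean equals(Object o) {
            return o instanceof TaskId && ((TaskId) o).path.equals(path);
        }
        @Override public int hashCode() { return path.hashCode(); }
    }

    class DuplicateFilter {
        private final Set<TaskId> seen = new HashSet<>();
        // Eager scheduling may run a task on several producers; the server
        // keeps only the first task/result it sees for a given id.
        boolean firstTime(TaskId id) { return seen.add(id); }
    }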



100
Architecture: Basic Entities
  • Consumer
  • Seeks computing resources.
  • Producer
  • Offers computing resources.
  • Task Server
  • Coordinates task distribution among its
    producers.
  • Production Network
  • A network of task servers and their associated producers.

101
Defining Parallel Efficiency
  • Scalar: a homogeneous set of P machines
  • Parallel efficiency = ( T1 / P ) / TP
  • Vector: a heterogeneous set of P machines
  • P = ( P1, P2, …, Pd ), where there are
  • P1 machines of type 1,
  • P2 machines of type 2, …,
  • Pd machines of type d
  • Parallel efficiency = ( 1 / TP ) / ( P1/T1 + P2/T2 + … + Pd/Td )
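
A quick numeric check of these definitions, with hypothetical numbers:

    // Sketch: the vector definition reduces to the scalar one when all
    // machines are of a single type.
    class ParallelEfficiency {
        // counts[i] machines of type i; t[i] = sequential time T_i on type i.
        static double vector(int[] counts, double[] t, double tP) {
            double idealRate = 0;
            for (int i = 0; i < counts.length; i++)
                idealRate += counts[i] / t[i];        // sum of P_i / T_i
            return (1.0 / tP) / idealRate;
        }
        public static void main(String[] args) {
            // P = 8 identical machines, T1 = 80 s, TP = 12 s:
            // scalar form (T1/P)/TP = (80/8)/12 ≈ 0.83; the vector form agrees.
            System.out.println(vector(new int[]{8}, new double[]{80.0}, 12.0));
        }
    }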

102
Future work
  • Support special hardware / data: inter-server task movement.
  • Diffusion model:
  • Tasks are homogeneous gas atoms diffusing through the network.
  • N-body model: each kind of atom (task) has its own
  • mass (resistance to movement: code size, input size, …)
  • attraction/repulsion to different servers
  • or to other massive entities, such as
  • special processors
  • a large database.

103
Future Work
  • A CX preprocessor to simplify the API.