Title: DAS 3 and StarPlane have Landed
1DAS 3 and StarPlane have Landed
Freek Dijkstra
2DAS history
- Project to prove distributed clusters are as effective as supercomputers
- Simple Computer Science grid that works
3Parallel to Distributed Computing
- Cluster Computing
- Parallel languages (Orca, Spar)
- Parallel applications
- Distributed Computing
- Parallel processing on multiple clusters
- Study non-trivially parallel applications
- Exploit hierarchical structure for locality optimizations
- Grid Computing
4DAS-2 Usage
- 200 users, 25 Ph.D. theses
- Simple, clean, laboratory-like system
- Example Applications
- Solving Awari (3500-year-old game)
- HIRLAM: weather forecasting
- GRAPE: simulation hardware for astrophysics
- Manta: distributed supercomputing in Java
- Ensflow: stochastic ocean flow model
http://www.cs.vu.nl/das2/
5Grid Computing
- Ibis: Java-centric grid computing
- Satin: divide-and-conquer on grids
- Zorilla: P2P distributed supercomputing
- KOALA: co-allocation of grid resources
- CrossGrid: interactive simulation and visualization of a biomedical system
- VL-e: scientific collaboration using the grid (e-Science)
- LambdaRAM: share memory among cluster nodes
Applications
Grid Middleware
Computing Clusters Network
6Colourful Future DAS-3
- Timeline (2004 to 2006)
- Autumn: DAS-3 proposal initiated
- Summer: Proposal accepted
- September: European tender preparation
- December: Tender call
- February: Five proposals received
- April: ClusterVision chosen
- June: Pilot cluster at VU
- August: Intended installation
- End: Official ending DAS-2
- Funding
- NWO, NCF, VL-e (UvA, Delft, part VU), MultimediaN (UvA), Universiteit Leiden
7DAS-2 Cluster
head node
1 Gbit/s Ethernet
Myrinet
To local University and wide area interconnect
100 Mb/s Ethernet
2 Gbit/s
Fast interconnect
Local interconnect
32-72 compute nodes
8DAS-3 Cluster
head node
10 Gbit/s Ethernet
Myrinet
Nortel
To SURFnet
To local University
10 Gbit/s Ethernet
1 Gbit/s Ethernet
10 Gbit/s
Fast interconnect
Local interconnect
32-85 compute nodes
9Heterogeneous Clusters
10Problem space
DAS-2
CPU
Data
Network
DAS-3 StarPlane
11SURFnet6
- In the Netherlands, SURFnet connects 180 institutions:
- universities
- academic hospitals
- most polytechnics
- research centers
- User base of 750k users
- 6000 km of fiber
- Comparable to the railway system
12Common Photonic Layer (CPL)
- 5 rings
- Initially 36 lambdas (4x9)
- Later 72 lambdas (8x9)
- Throughput of each lambda is up to 10 Gb/s now
- Later up to 40 Gb/s per lambda
13Quality of Service (QoS) by providing wavelengths
- Old Quality of Service
- One fiber, with a single lambda
- Set part of it aside on request
- Rest gets less service
- New Quality of Service
- One fiber, multiple lambda (separate colours)
- Move requests to other lambdas as needed
- Rest also gets happier!
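
The "new" model above, where requests are moved onto other lambdas instead of squeezing everyone onto one, can be sketched as a simple first-fit placement. This is only an illustration and not SURFnet's actual provisioning logic: the function name `assign_requests`, the request sizes, and the default per-lambda capacity of 10 Gb/s are assumptions chosen for the example.

```python
def assign_requests(requests_gbps, num_lambdas, capacity_gbps=10.0):
    """First-fit placement of bandwidth requests onto lambdas.

    Returns (placement, free): placement[i] is the index of the lambda
    that carries request i (None if no lambda has room), and free is the
    capacity left on each lambda afterwards.
    """
    free = [capacity_gbps] * num_lambdas
    placement = []
    for demand in requests_gbps:
        chosen = None
        for idx in range(num_lambdas):
            # Take the first lambda that still has enough spare capacity.
            if free[idx] >= demand:
                free[idx] -= demand
                chosen = idx
                break
        placement.append(chosen)
    return placement, free


if __name__ == "__main__":
    # Hypothetical demands in Gb/s, spread over three lambdas.
    placement, free = assign_requests([8.0, 6.0, 2.0, 10.0], num_lambdas=3)
    print("placement:", placement)
    print("free capacity:", free)
```

With a single lambda the same demands would compete for 10 Gb/s; with several, a large request simply lands on a different colour and the remaining traffic keeps its share.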
14StarPlane Topology
- 4 DAS-3 sites, with 5 clusters
- Interconnected with 4 to 8 dedicated lambdas of 10 Gb/s each
- Same fiber as for regular Internet
- External Connectivity
- Grid 5000
- GridLab
- Media archives in Hilversum
15StarPlane Project
- StarPlane will use the SURFnet6 infrastructure to interconnect the DAS-3 sites
- The novelty: give flexibility directly to the applications by allowing them to choose the logical topology in real time
- Ultimate goal: configuration in under a second
- People and Timeline
- 1 postdoc, 1 AIO (Ph.D. student), 1 scientific programmer (Jason Maassen, VU; Li Xu, UvA; JP Velders, UvA)
- February 2006 to February 2010
- Funding
- NWO, with major contributions from SURFnet and Nortel.
16Application - Network Interaction
Network
Application
Use
Request: star, ring, full mesh
Configuration
Control Plane
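
To make the idea of an application requesting a logical topology more concrete, the sketch below lists the site-to-site links that a star, a ring, and a full mesh would need. It is a minimal illustration, not the StarPlane control plane interface: the function names and the site labels are hypothetical, and only the site count of four is taken from the talk.

```python
from itertools import combinations


def _link(a, b):
    """Return an undirected link as a sorted pair, so (a, b) equals (b, a)."""
    return tuple(sorted((a, b)))


def star(sites, hub):
    """Every other site connects only to the hub."""
    return sorted(_link(hub, s) for s in sites if s != hub)


def ring(sites):
    """Each site connects to its successor; the last wraps to the first."""
    n = len(sites)
    links = {_link(sites[i], sites[(i + 1) % n]) for i in range(n)}
    # Drop the degenerate self-link that appears when there is one site.
    return sorted(link for link in links if link[0] != link[1])


def full_mesh(sites):
    """Every pair of sites gets a direct link."""
    return sorted(_link(a, b) for a, b in combinations(sites, 2))


if __name__ == "__main__":
    # Hypothetical labels for four sites.
    sites = ["site-A", "site-B", "site-C", "site-D"]
    topologies = {
        "star": star(sites, hub="site-A"),
        "ring": ring(sites),
        "full mesh": full_mesh(sites),
    }
    for name, links in topologies.items():
        print(f"{name}: {len(links)} links -> {links}")
```

For four sites this gives 3 links for the star, 4 for the ring, and 6 for the full mesh. If each link were mapped to one dedicated lambda (an assumption made here for illustration), the choice of topology would directly determine how many lambdas a request consumes.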
17Application - Network Interaction
Application Initiated Network Configuration
Network
time
Work Flow Manager
Workflow Initiated Network Configuration
App1
App2
App3
Network
time
18StarPlane Applications
- Large stand-alone file transfers
- User-driven file transfers
- Nightly backups
- Transfer of medical data files (MRI)
- Large file (speedier) Stage-in/Stage-out
- MEG modeling (magnetoencephalography)
- Analysis of video data
- Applications with static bandwidth requirements
- Distributed game-tree search
- Remote data access for analysis of video data
- Remote visualization
- Applications with dynamic bandwidth requirements
- Remote data access for MEG modeling
- SCARI
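
For the large file transfers listed above, a back-of-the-envelope estimate shows why a dedicated lambda matters. The sketch below computes an idealized transfer time from file size and link rate. The function name, the 1 TB file size, and the `efficiency` parameter are assumptions for illustration; real transfers also depend on protocol overhead, disk speed, and end-host tuning.

```python
def transfer_seconds(size_bytes, rate_gbps, efficiency=1.0):
    """Idealized time to move size_bytes over a link of rate_gbps.

    efficiency is the fraction of the nominal rate actually achieved
    (1.0 means the full line rate, which is an optimistic assumption).
    """
    if rate_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("rate must be positive and efficiency in (0, 1]")
    bits = size_bytes * 8
    return bits / (rate_gbps * 1e9 * efficiency)


if __name__ == "__main__":
    one_terabyte = 1e12  # hypothetical file size in bytes
    for rate in (1, 10):
        seconds = transfer_seconds(one_terabyte, rate)
        print(f"{rate} Gb/s: {seconds / 60:.1f} minutes")
```

At the full line rate, 1 TB takes 8000 seconds on a 1 Gb/s path and 800 seconds on a 10 Gb/s lambda.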
19Conclusions
- This fall, DAS-3 will be available at a university near you
- StarPlane allows applications to configure the network
- We aim for fast (subsecond) lambda switching.
- Workflow systems and/or applications need to become network aware
- For details see the StarPlane poster this evening!
20DAS 3 and StarPlane have Landed
- Architecture, Status ...
- ... and Application Research
21Network Memory
- LambdaRAM software uses memory in the local cluster as a local cache.
- Faster than caching on disk (access time: 1 ms for network, 10 ms for disk)
(Very) high-res remote image
Blue box: active (visualized) zoom region
Green area: cached on other cluster nodes
http://www.evl.uic.edu/cavern/optiputer/lambdaram.html
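
The access times quoted on this slide (1 ms over the network, 10 ms from disk) can be combined into a rough average access time for a given cache hit ratio. This is a simplified model, not a description of how LambdaRAM is implemented: the function name and the hit ratios are assumptions, and only the two latency figures come from the slide.

```python
def mean_access_ms(hit_ratio, cache_ms=1.0, disk_ms=10.0):
    """Average access time when a fraction hit_ratio of reads is served
    from memory on other cluster nodes and the rest falls back to disk."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * disk_ms


if __name__ == "__main__":
    for ratio in (0.0, 0.5, 0.8, 1.0):  # hypothetical hit ratios
        print(f"hit ratio {ratio:.1f}: {mean_access_ms(ratio):.1f} ms")
```

Under this model an 80% hit ratio brings the average down from 10 ms to roughly 2.8 ms.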