Title: TeraGrid and I-WIRE: Models for the Future
1. TeraGrid and I-WIRE: Models for the Future?
- Rick Stevens and Charlie Catlett
- Argonne National Laboratory
- The University of Chicago
2. TeraGrid Interconnect Objectives
- Traditional: interconnect sites/clusters using a WAN
  - WAN bandwidth balances cost and utilization; the objective is to keep utilization high to justify the high cost of WAN bandwidth
- TeraGrid: build a wide-area machine-room network
  - TeraGrid WAN objective is to handle peak machine-to-machine (M2M) traffic
  - Partnering with Qwest to begin with 40 Gb/s and grow to 80 Gb/s within 2 years
- Long-term TeraGrid objective
  - Build a petaflops-capable distributed system, requiring petabytes of storage and a terabit/second network
  - The current effort is a step toward this goal
  - A terabit/second network will require many lambdas operating at a minimum of OC-768, and its architecture is not yet clear (see the estimate below)
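A back-of-envelope estimate (Python; nominal SONET line rates, not payload rates) of why a terabit/second backplane points toward OC-768 lambdas:

import math

# How many lambdas does a 1 Tb/s backplane need at each line rate?
TARGET_GBPS = 1000                           # 1 Tb/s
RATES = {"OC-192": 9.95, "OC-768": 39.8}     # approximate line rates, Gb/s

for name, gbps in RATES.items():
    print(f"{name}: ~{math.ceil(TARGET_GBPS / gbps)} lambdas for 1 Tb/s")
# -> roughly 101 OC-192 lambdas vs. ~26 OC-768 lambdas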
3. Outline and Major Issues
- Trends in national cyberinfrastructure development
- TeraGrid as a model for advanced grid infrastructure
- I-WIRE as a model for advanced regional fiber infrastructure
- What is needed for these models to succeed
- Recommendations
4. Trends: Cyberinfrastructure
- Advent of regional dark fiber infrastructure
  - Community owned and managed (via 20-year IRUs)
  - Typically supported by state or local resources
- Lambda services (IRUs) are viable replacements for bandwidth service contracts
  - Need to be structured with built-in capability escalation (BRI)
  - Need strong operating capability to exploit this
- Regional (NGO) groups are moving faster (much faster!) than national network providers and agencies
- A viable path to putting bandwidth on a Moore's-law curve
- A source of new ideas for national infrastructure architecture
5. Traditional Cluster Network Access
[Diagram: cluster nodes with GbE links, an OC-12 site connection, and an OC-48 WAN cloud.]
High-performance cluster system interconnect using Myrinet with very high bisection bandwidth (hundreds of GB/s), with an external connection of n x GbE, where n is a small integer. (Figure of merit: time to move the entire contents of memory; a rough comparison follows below.)
Traditionally, high-performance computers have
been islands of capability separated by wide area
networks that provide a fraction of a percent of
the internal cluster network bandwidth.
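A rough illustration of that gap (Python; the cluster size and link counts are assumed for illustration, not actual TeraGrid figures), comparing the time to move a terabyte of memory over the internal fabric versus a handful of GbE uplinks:

# Illustrative numbers only.
MEMORY_TB      = 1.0       # assumed total cluster memory, TB
INTERNAL_GBps  = 200.0     # assumed Myrinet bisection bandwidth, GB/s
EXTERNAL_LINKS = 4         # assumed n x GbE uplinks
GBE_GBps       = 0.125     # 1 Gb/s = 0.125 GB/s

total_bytes = MEMORY_TB * 1e12
internal_s  = total_bytes / (INTERNAL_GBps * 1e9)
external_s  = total_bytes / (EXTERNAL_LINKS * GBE_GBps * 1e9)
print(f"internal fabric: {internal_s:.0f} s; external 4 x GbE: {external_s / 3600:.1f} h")
# -> ~5 s internally vs. ~0.6 h externally: the WAN path offers a
#    fraction of a percent of the internal bandwidth.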
6. To Build a Distributed Terascale Cluster
[Diagram: two clusters, each node with an external GbE link, joined through a big, fast wide-area interconnect.]
(Figure of merit: time to move the entire contents of memory and the application state on rotating disk.)
Target aggregate: 5 GB/s = 200 nodes x 25 MB/s (about 20% of a GbE per node).
TeraGrid is building a machine-room network across the country while increasing external cluster bandwidth to many GbE. This requires edge systems that handle n x 10 GbE and hubs that handle a minimum of 10 x 10 GbE.
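A quick check of that aggregate (Python):

NODES         = 200
PER_NODE_MBps = 25
GBE_MBps      = 125        # 1 Gb/s = 125 MB/s

aggregate_GBps = NODES * PER_NODE_MBps / 1000
print(f"aggregate: {aggregate_GBps:.1f} GB/s "
      f"({PER_NODE_MBps / GBE_MBps:.0%} of a GbE per node)")
# -> 5.0 GB/s (40 Gb/s), i.e. 20% of each node's GbE and on the order
#    of four 10 Gb/s WAN lambdas.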
7. 13.6 TF Linux TeraGrid
[Diagram: the 13.6 TF Linux TeraGrid spanning ANL, Caltech, NCSA, and SDSC, interconnected via Starlight and Los Angeles hub routers (Juniper M160/M40, Extreme Black Diamond) over OC-192/10 GbE, with existing OC-3 to OC-48 connections to ESnet, HSCC, MREN/Abilene, CalREN, vBNS, and NTON. New IA-64 (quad-processor McKinley) and IA-32 nodes sit on Myrinet Clos spines alongside existing site resources such as Chiba City (574p IA-32), HP X-Class and V2500, SGI Origins, the 1176p IBM SP Blue Horizon, Sun Starcat/E10K servers, FibreChannel storage, and HPSS/UniTree archives.]
- Argonne: 64 nodes, 1 TF, 0.25 TB memory, 25 TB disk
- Caltech: 32 nodes, 0.5 TF, 0.4 TB memory, 86 TB disk
- NCSA: 500 nodes, 8 TF, 4 TB memory, 240 TB disk
- SDSC: 256 nodes, 4.1 TF, 2 TB memory, 225 TB disk
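The headline figure is the sum of the four site contributions (Python):

sites = {                      # (TF, TB memory, TB disk)
    "ANL":     (1.0, 0.25,  25),
    "Caltech": (0.5, 0.4,   86),
    "NCSA":    (8.0, 4.0,  240),
    "SDSC":    (4.1, 2.0,  225),
}
tf, mem, disk = (sum(v[i] for v in sites.values()) for i in range(3))
print(f"{tf:.1f} TF, {mem:.2f} TB memory, {disk:.0f} TB disk")
# -> 13.6 TF, 6.65 TB memory, 576 TB disk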
8. TeraGrid Network Architecture
- Cluster interconnect using a multi-stage switch/router tree with multiple 10 GbE external links
- Separation of cluster aggregation and site border routers is necessary for operational reasons
- Phase 1: four routers or switch/routers
  - Each with three OC-192 or 10 GbE WAN PHY interfaces
  - MPLS to allow >10 Gb/s between any two sites
- Phase 2: add core routers or switch/routers
  - Each with ten OC-192 or 10 GbE WAN PHY interfaces
  - Ideally expandable with additional 10 Gb/s interfaces (see the capacity sketch below)
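A minimal sketch (Python; assumes each OC-192 or 10 GbE WAN PHY carries a nominal 10 Gb/s) of per-device WAN capacity in each phase, and of why MPLS is needed to push a single site pair past 10 Gb/s:

LAMBDA_GBPS = 10                        # nominal OC-192 / 10 GbE WAN PHY
phases = {"Phase 1": 3, "Phase 2": 10}  # WAN links per router

for phase, links in phases.items():
    print(f"{phase}: {links} x {LAMBDA_GBPS} Gb/s = {links * LAMBDA_GBPS} Gb/s per device")
# A single site-to-site flow is capped at one 10 Gb/s lambda; MPLS
# label-switched paths let a site pair stripe traffic across several
# lambdas, so aggregate pairwise bandwidth can exceed 10 Gb/s.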
9. Option 1: Full Mesh with MPLS
[Diagram: hub facilities at One Wilshire (carrier fiber collocation, Los Angeles) and 455 N. Cityfront Plaza (Qwest fiber collocation, Chicago), roughly 2200 mi apart, linked by OC-192 lambdas over Ciena CoreStream DWDM; Starlight at 710 N. Lakeshore is about 1 mi from the Chicago hub. Caltech, SDSC (via the Qwest San Diego POP), ANL, and NCSA attach over regional spans of roughly 20-140 mi (regional DWDM TBD), each through a site border router or switch/router and a cluster aggregation switch/router, with 10 GbE to the clusters and an IP router for other site resources.]
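With four sites the mesh itself stays small; a sketch (Python, illustrative) of the site pairs that need MPLS label-switched paths provisioned across the hub lambdas:

from itertools import combinations

sites = ["Caltech", "SDSC", "ANL", "NCSA"]
pairs = list(combinations(sites, 2))   # full mesh: n*(n-1)/2 pairs
print(f"{len(pairs)} site pairs:")
for a, b in pairs:
    print(f"  {a} <-> {b}")
# -> 6 pairs, each carried as one or more label-switched paths over
#    the Los Angeles and Chicago hubs.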
10. Expansion Capability: Starlights
[Diagram: the same Los Angeles / Chicago hub layout as Option 1, with the hubs doubling as regional fiber aggregation points. An IP router (packets) or lambda router (circuits) at each hub lets additional sites and networks attach, e.g., via Starlight at 710 N. Lakeshore.]
11. Partnership Toward Terabit/s Networks
- Aggressive current-generation TeraGrid backplane
  - 3 x 10 GbE per site today, with 40 Gb/s in the core
  - Grow to 80 Gb/s or higher in the core within 18-24 months
  - Requires hundreds of Gb/s in core/hub devices
- Architecture evaluation for the next-generation backplane
  - Higher lambda counts, alternative topologies
  - OC-768 lambdas (see the estimate below)
- Parallel persistent testbed
  - Use one or more Qwest 10 Gb/s lambdas to keep next-generation technology and architecture testbeds running at all times
  - Partnership with Qwest and local fiber/transport infrastructure to test OC-768 and additional lambdas
  - Multiple additional dedicated regional 10 Gb/s lambdas and dark fiber for OC-768 testing can be provided beginning 2Q 2002 via I-WIRE
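A rough sense of the scaling step from today's backplane toward OC-768 (Python; nominal line rates, and the current four-site, three-lambda-per-site topology is assumed):

OC192_GBPS, OC768_GBPS = 10, 40        # nominal line rates, Gb/s
SITES, LAMBDAS_PER_SITE = 4, 3

today    = SITES * LAMBDAS_PER_SITE * OC192_GBPS
upgraded = SITES * LAMBDAS_PER_SITE * OC768_GBPS
print(f"aggregate site attach: {today} Gb/s today, {upgraded} Gb/s with OC-768")
# -> 120 Gb/s vs. 480 Gb/s; reaching a terabit/s still requires higher
#    lambda counts and/or alternative topologies on top of OC-768.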
12. I-WIRE Logical and Transport Topology
[Diagram: I-WIRE logical and transport topology linking Starlight (NU-Chicago), Argonne, the Qwest POP at 455 N. Cityfront, the UC Gleacher Center at 450 N. Cityfront, UIC, UIUC/NCSA, UChicago, IIT, McLeodUSA at 151/155 N. Michigan (Doral Plaza), Level(3) at 111 N. Canal, and the Illinois Century Network (James R. Thompson Center, City Hall, State of Illinois Building).]
- Next steps
  - Fiber to Fermilab and other sites
  - Additional fiber to ANL and UIC
  - DWDM terminals at the Level(3) and McLeodUSA locations
  - Experiments with OC-768 and optical switching/routing
13. Gigapops → Terapops (OIX)
Gigapop data from Internet2
14. Leverage Regional/Community Fiber
15. Recommendations
- The ANIR program should support
  - Interconnection of fiber islands via bit-rate-independent or advanced lambdas (BRI λs)
  - Hardware to light up community fibers and build out advanced testbeds
  - People resources to run these research-community-driven infrastructures
- A next-generation connection program will not help advance the state of the art
- Lambda services need to be BRI (bit-rate independent)