Title: The Energy Sciences Network BESAC August 2004
1The Energy Sciences Network BESAC August 2004
- William E. Johnston, ESnet Dept. Head and Senior Scientist
- R. P. Singh, Federal Project Manager
- Michael S. Collins, Stan Kluz, Joseph Burrescia, and James V. Gagliardi, ESnet Leads
- Gizella Kapus, Resource Manager
- and the ESnet Team
- Lawrence Berkeley National Laboratory
Mary Anne Scott, Program Manager, Advanced Scientific Computing Research, Office of Science, Department of Energy
2What is ESnet?
- Mission
- Provide interoperable, effective and reliable communications infrastructure and leading-edge network services that support missions of the Department of Energy, especially the Office of Science
- Vision
- Provide seamless and ubiquitous access, via shared collaborative information and computational environments, to the facilities, data, and colleagues needed to accomplish their goals.
- Role
- A component of the Office of Science infrastructure critical to the success of its research programs (program funded through ASCR/MICS; managed and operated by ESnet staff at LBNL).
3Why is ESnet important?
- Enables thousands of DOE, university and industry scientists and collaborators worldwide to make effective use of unique DOE research facilities and computing resources independent of time and geographic location
- Direct connections to all major DOE sites
- Access to the global Internet (managing 150,000 routes at 10 commercial peering points)
- User demand has grown by a factor of more than 10,000 since its inception in the mid 1990s, a 100 percent increase every year since 1990
- Capabilities not available through commercial networks
- Architected to move huge amounts of data between a small number of sites
- High bandwidth peering to provide access to US, European, Asia-Pacific, and other research and education networks.
Objective: Support scientific research by providing seamless and ubiquitous access to the facilities, data, and colleagues
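The growth figures on this slide can be sanity-checked with compound-growth arithmetic. The sketch below assumes that "a 100 percent increase every year" means traffic doubles annually, and it counts the 14 years from 1990 to the date of this talk (2004). The function names are illustrative only.

```python
import math

def growth_factor(annual_increase_pct, years):
    """Cumulative multiplier after compounding a yearly percentage increase."""
    return (1 + annual_increase_pct / 100) ** years

def years_to_reach(factor, annual_increase_pct):
    """Years of compounding needed to reach a given cumulative multiplier."""
    return math.log(factor) / math.log(1 + annual_increase_pct / 100)

# Assumption: doubling every year from 1990 through 2004 (14 years).
print(growth_factor(100, 14))                 # 16384.0
print(round(years_to_reach(10_000, 100), 1))  # 13.3
```

Under these assumptions, doubling each year passes the 10,000x mark after a little more than 13 years, which is consistent with "a factor of more than 10,000".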
4How is ESnet Managed?
- A community endeavor
- Strategic guidance from the OSC programs
- Energy Science Network Steering Committee (ESSC)
- BES represented by Nestor Zaluzec, ANL and Jeff Nichols, ORNL
- Network operation is a shared activity with the community
- ESnet Site Coordinators Committee
- Ensures the right operational sociology for success
- Complex and specialized, both in the network engineering and the network management, in order to provide its services to the laboratories in an integrated support environment
- Extremely reliable in several dimensions
- Taken together these points make ESnet a unique facility supporting DOE science that is quite different from a commercial ISP or University network
5what now???
- VISION - A scalable, secure, integrated network environment for ultra-scale distributed science is being developed to make it possible to combine resources and expertise to address complex questions that no single institution could manage alone.
- Network Strategy
- Production network
- Base TCP/IP services; 99.9% reliable
- High-impact network
- Increments of 10 Gbps; switched lambdas (other solutions); 99% reliable
- Research network
- Interfaces with production, high-impact and other research networks; start electronic and advance towards optical switching; very flexible (UltraScience Net)
- Revisit governance model
- SC-wide coordination
- Advisory Committee involvement
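To make the two reliability targets concrete, the sketch below converts each into a yearly outage budget. It assumes that "reliable" means availability measured over a 365-day year; the slide does not define the term, so treat this as one possible reading.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years

def max_downtime_hours(availability_pct, hours=HOURS_PER_YEAR):
    """Hours of outage per period permitted by an availability target."""
    return hours * (1 - availability_pct / 100)

# Assumption: the slide's percentages are availability targets.
for label, pct in [("production", 99.9), ("high-impact", 99.0)]:
    print(f"{label}: {max_downtime_hours(pct):.2f} h/yr")
# production: 8.76 h/yr
# high-impact: 87.60 h/yr
```

On this reading, the production network may be down for under nine hours a year, while the high-impact network tolerates roughly ten times that.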
6Where do you come in?
- Early identification of requirements
- Evolving programs
- New facilities
- Participation in management activities
- Interaction with BES representatives on ESSC
- Next ESSC meeting on Oct 13-15 in DC area
7What Does ESnet Provide?
- A network connecting DOE Labs and their collaborators that is critical to the future process of science
- An architecture tailored to accommodate DOE's large-scale science
- move huge amounts of data between a small number of sites
- High bandwidth access to DOE's primary science collaborators: Research and Education institutions in the US, Europe, Asia Pacific, and elsewhere
- Full access to the global Internet for DOE Labs
- Comprehensive user support, including owning all trouble tickets involving ESnet users (including problems at the far end of an ESnet connection) until they are resolved; 24x7 coverage
- Grid middleware and collaboration services supporting collaborative science
- trust, persistence, and science oriented policy
8What is ESnet Today?
- Essentially all of the national data traffic supporting US science is carried by two networks: ESnet and Internet-2 / Abilene (which plays a similar role for the university community)
9How Do Networks Work?
- Accessing a service, Grid or otherwise, such as a Web server, FTP server, etc., from a client computer and client application (e.g. a Web browser) involves
- Target host names
- Host addresses
- Service identification
- Routing
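The first three steps in the list above can be illustrated with Python's standard library. This is a minimal sketch, not a description of any ESnet tool: `resolve_service` and `DEFAULT_PORTS` are hypothetical names, and `localhost` stands in for a real server so the example runs without network access. Routing, the fourth step, is handled by the network and is not visible to the client.

```python
import socket
from urllib.parse import urlsplit

# Well-known default ports for a few services (service identification).
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}

def resolve_service(url):
    """Split a URL into host name, service port, and resolved host addresses."""
    parts = urlsplit(url)
    host = parts.hostname                              # target host name
    port = parts.port or DEFAULT_PORTS[parts.scheme]   # service identification
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = sorted({info[4][0] for info in infos}) # host addresses
    return host, port, addresses

print(resolve_service("http://localhost/index.html"))
```

The addresses returned depend on the local resolver configuration, so no fixed output is shown.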
10How Do Networks Work?
- core routers
- focus on high-speed packet forwarding
- peering routers
- exchange reachability information (routes)
- implement/enforce routing policy for each provider
- provide cyberdefense
- border/gateway routers
- implement separate site and network provider policy (including site firewall policy)
[Figure: routers along a path between LBNL, ESnet (core network), a big ISP (e.g. SprintLink), and Google, Inc. Labels: border router, gateway router, core router, peering router, DNS]
11ESnet Core is a High-Speed Optical Network
[Figure: ESnet site connection to the core. Labels: ESnet site, site LAN, site IP router, Site - ESnet network policy demarcation (DMZ), ESnet IP router, ESnet hub, 10GE, ESnet core optical fiber ring]
- Wave division multiplexing
- today typically 64 x 10 Gb/s optical channels per fiber
- channels (referred to as lambdas) are usually used in bi-directional pairs
- Lambda channels are converted to electrical channels
- usually SONET data framing or Ethernet data framing
- can be clear digital channels (no framing, e.g. for digital HDTV)
A ring topology network is inherently reliable: all single point failures are mitigated by routing traffic in the other direction around the ring.
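The ring-reliability claim can be checked with a small graph search. This is a sketch under stated assumptions: the hub codes are the ones named elsewhere in this talk, but the ring order used here is hypothetical, and `ring_links` and `find_path` are illustrative helpers rather than ESnet software.

```python
from collections import deque

def ring_links(hubs):
    """Links of a ring: each hub is connected to the next, and the last to the first."""
    return {frozenset((hubs[i], hubs[(i + 1) % len(hubs)])) for i in range(len(hubs))}

def find_path(hubs, src, dst, failed=frozenset()):
    """Breadth-first search for a path from src to dst that avoids failed links."""
    links = ring_links(hubs) - set(failed)
    neighbors = {hub: [] for hub in hubs}
    for link in links:
        a, b = tuple(link)
        neighbors[a].append(b)
        neighbors[b].append(a)
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in neighbors[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # dst is unreachable

# Assumption: a hypothetical ring order for the hubs.
hubs = ["SNV", "ELP", "ATL", "DC", "AOA", "CHI"]
cut = {frozenset(("SNV", "CHI"))}
print(find_path(hubs, "SNV", "CHI"))              # ['SNV', 'CHI']
print(find_path(hubs, "SNV", "CHI", failed=cut))  # ['SNV', 'ELP', 'ATL', 'DC', 'AOA', 'CHI']
```

With one link cut, traffic still arrives by going the long way around. Cutting both links at a hub isolates it, which is why the claim is limited to single point failures.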
12ESnet Provides Full Internet Service to DOE Facilities and Collaborators with High-Speed Access to all Major Science Collaborators
[Figure: ESnet network map. Labels: ESnet IP; ESnet core (Packet over SONET optical ring and hubs); SEA HUB; SNV HUB; Chi NAP; NY-NAP; QWEST ATM; MAE-E; MAE-W; PAIX-E; PAIX-W; Fix-W; Equinix; peering points; high-speed peering points. International connections: CA*net4, MREN, Netherlands, Russia, StarTap, Taiwan (ASCC), Japan]
42 end user sites:
- Office of Science Sponsored (22)
- NNSA Sponsored (12)
- Joint Sponsored (3)
- Other Sponsored (NSF LIGO, NOAA)
- Laboratory Sponsored (6)
Link types: International (high speed); OC192 (10 Gb/s optical); OC48 (2.5 Gb/s optical); Gigabit Ethernet (1 Gb/s); OC12 ATM (622 Mb/s); OC12; OC3 (155 Mb/s); T3 (45 Mb/s); T1-T3; T1 (1 Mb/s)
13ESnet's Peering Infrastructure Connects the DOE Community With its Collaborators
[Figure: ESnet peering map (connections to other networks), distinguishing University, International, and Commercial peerings. International networks shown: CA*net4, CERN, MREN, Netherlands, Russia, StarTap, Taiwan (ASCC), Taiwan (TANet2), Australia, Singaren, GEANT (Germany, France, Italy, UK, etc.), SInet (Japan), KEK, Japan, Russia (BINP), KDDI (Japan), France. Hubs, exchange points, and sites shown: PNW-GPOP, SEA HUB, Distributed 6TAP (19 peers), Abilene, Abilene 7 Universities, CalREN2, NYC HUBS, LBNL, SNV HUB, PAIX-W, MAX GPOP, MAE-W, FIX-W, LANL, CENIC, SDSC, ATL HUB, TECHnet. Peer counts at the other labeled points range from 1 to 39.]
ESnet provides access to all of the Internet by managing the full complement of Global Internet routes (about 150,000) at 10 general/commercial peering points, plus high-speed peerings w/ Abilene and the international R&E networks. This is a lot of work, and is very visible, but provides full access for DOE.
14What is Peering?
- Peering points exchange routing information that says which packets I can get closer to their destination
- ESnet daily peering report (top 20 of about 100)
- This is a lot of work
Peering with this outfit is not random; it carries routes that ESnet needs (e.g. to the Russian Backbone Net)
15What is Peering?
- Why so many routes? So that when I want to get to someplace out of the ordinary, I can get there. For example: http://www-sbras.nsc.ru/eng/sbras/copan/microel_main.html (Technological Design Institute of Applied Microelectronics, Novosibirsk, Russia)
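One way to see why carrying many routes matters is the standard longest-prefix-match rule for IP forwarding: among all routes that cover a destination, the most specific one wins. The sketch below is illustrative only. The prefixes are documentation-only address blocks, the peer names are hypothetical, and real peering routers also apply routing policy before selecting a route.

```python
import ipaddress

def best_route(routing_table, destination):
    """Return (prefix, next_hop) of the most specific matching route, or None."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in routing_table if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda entry: entry[0].prefixlen)

# Hypothetical routes; prefixes are reserved documentation ranges.
table = [
    (ipaddress.ip_network("0.0.0.0/0"), "commercial-peer"),
    (ipaddress.ip_network("198.51.100.0/24"), "research-peer"),
    (ipaddress.ip_network("198.51.100.128/25"), "specialized-peer"),
]

print(best_route(table, "198.51.100.200")[1])  # specialized-peer
print(best_route(table, "203.0.113.9")[1])     # commercial-peer
```

Without the default route, a destination not covered by any learned prefix is simply unreachable, which is the situation a full routing table avoids.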
16Predictive Drivers for the Evolution of ESnet
August 13-15, 2002
Organized by Office of Science: Mary Anne Scott (Chair), Dave Bader, Steve Eckstrand, Marvin Frazier, Dale Koelling, Vicky White
Workshop Panel Chairs: Ray Bair and Deb Agarwal; Bill Johnston and Mike Wilde; Rick Stevens; Ian Foster and Dennis Gannon; Linda Winkler and Brian Tierney; Sandy Merola and Charlie Catlett
- The network is needed for
- long term (final stage) data analysis
- control loop data analysis (influence an experiment in progress)
- distributed, multidisciplinary simulation
- The network and middleware requirements to support DOE science were developed by the OSC science community representing major DOE science disciplines
- Climate
- Spallation Neutron Source
- Macromolecular Crystallography
- High Energy Physics
- Magnetic Fusion Energy Sciences
- Chemical Sciences
- Bioinformatics
Available at www.es.net/research
17The Analysis was Driven by the Evolving Process
of Science
18Evolving Quantitative Science Requirements for
Networks
19Observed Drivers for ESnet Evolution
- Are we seeing the predictions of two years ago come true?
- Yes!
20OSC Traffic Increases by 1.9-2.0 X Annually
ESnet is currently transporting about 250 terabytes/mo. (250,000,000 MBy/mo.)
[Chart: ESnet Monthly Accepted Traffic, TBytes/Month]
Annual growth in the past five years has increased from 1.7x annually to just over 2.0x annually.
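For a sense of scale, the monthly volume can be converted into an average bit rate. The sketch assumes decimal units (1 terabyte = 10^12 bytes) and a 30-day month; both are assumptions, and the function name is illustrative.

```python
def average_rate_gbps(terabytes, days):
    """Average bit rate in Gb/s for a volume moved over a number of days (decimal units)."""
    bits = terabytes * 1e12 * 8
    seconds = days * 86_400
    return bits / seconds / 1e9

monthly_tb = 250
print(monthly_tb * 1_000_000)                       # 250000000 (MBy, as on the slide)
print(round(average_rate_gbps(monthly_tb, 30), 2))  # 0.77
```

Averaged over the month this is under 1 Gb/s. It says nothing about peak load, which the slide does not report.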
21ESnet Top 20 Data Flows, 24 hr. avg., 2004-04-20
ESnet is Engineered to Move a Lot of Data
A small number of science users account for a significant fraction of all ESnet traffic
[Chart: top 20 data flows, with a 1 Terabyte/day scale marker. Labeled flows:]
- SLAC (US) → IN2P3 (FR)
- Fermilab (US) → CERN
- SLAC (US) → INFN Padua (IT)
- Fermilab (US) → U. Chicago (US)
- U. Toronto (CA) → Fermilab (US)
- Helmholtz-Karlsruhe (DE) → SLAC (US)
- CEBAF (US) → IN2P3 (FR)
- INFN Padua (IT) → SLAC (US)
- Fermilab (US) → JANET (UK)
- SLAC (US) → JANET (UK)
- DOE Lab → DOE Lab
- Argonne (US) → Level3 (US)
- DOE Lab → DOE Lab
- Fermilab (US) → INFN Padua (IT)
- Argonne → SURFnet (NL)
- IN2P3 (FR) → SLAC (US)
- Since BaBar data analysis started, the top 20 ESnet flows have consistently accounted for 50% of ESnet's monthly total traffic (130 of 250 TBy/mo)
22ESnet Top 10 Data Flows, 1 week avg., 2004-07-01
- The traffic is not transient: daily and weekly averages are about the same.
- SLAC is a prototype for what will happen when Climate, Fusion, SNS, Astrophysics, etc., start to ramp up the next generation science
- SLAC (US) → INFN Padua (IT): 5.9 Terabytes
- SLAC (US) → IN2P3 (FR): 5.3 Terabytes
- FNAL (US) → IN2P3 (FR): 2.2 Terabytes
- FNAL (US) → U. Nijmegen (NL): 1.0 Terabytes
- SLAC (US) → Helmholtz-Karlsruhe (DE): 0.9 Terabytes
- CERN → FNAL (US): 1.3 Terabytes
- U. Toronto (CA) → Fermilab (US): 0.9 Terabytes
- FNAL (US) → Helmholtz-Karlsruhe (DE): 0.6 Terabytes
- U. Wisc. (US) → FNAL (US): 0.6 Terabytes
- FNAL (US) → SDSC (US): 0.6 Terabytes
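The observation that SLAC dominates the list can be quantified by totaling the ten flows and grouping them by endpoint. This sketch uses only the values listed above. It treats each flow as an unordered pair of endpoints, and `share_for_site` is an illustrative helper.

```python
# Top 10 flows in terabytes, keyed by their two endpoints (direction not used).
flows_tb = {
    ("SLAC", "INFN Padua"): 5.9,
    ("SLAC", "IN2P3"): 5.3,
    ("FNAL", "IN2P3"): 2.2,
    ("FNAL", "U. Nijmegen"): 1.0,
    ("SLAC", "Helmholtz-Karlsruhe"): 0.9,
    ("CERN", "FNAL"): 1.3,
    ("U. Toronto", "FNAL"): 0.9,
    ("FNAL", "Helmholtz-Karlsruhe"): 0.6,
    ("U. Wisc.", "FNAL"): 0.6,
    ("FNAL", "SDSC"): 0.6,
}

def share_for_site(flows, site):
    """Fraction of the listed volume carried by flows that touch the given site."""
    total = sum(flows.values())
    touching = sum(tb for ends, tb in flows.items() if site in ends)
    return touching / total

print(f"{sum(flows_tb.values()):.1f} TB")        # 19.3 TB
print(f"{share_for_site(flows_tb, 'SLAC'):.1%}")  # 62.7%
```

Three SLAC flows carry well over half of the volume in this top-10 list, which supports treating SLAC as the prototype for other programs as they ramp up.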
23ESnet is a Critical Element of Large-Scale Science
- ESnet is a critical part of the large-scale science infrastructure of high energy physics experiments, climate modeling, magnetic fusion experiments, astrophysics data analysis, etc.
- As other large-scale facilities such as SNS turn on, this will be true across DOE
24Science Mission Critical Infrastructure
- ESnet is a visible and critical piece of general DOE science infrastructure
- if ESnet fails, tens of thousands of DOE and University users know it within minutes if not seconds
- Requires high reliability and high operational security in the
- network operations, and
- ESnet infrastructure support: the systems that support the operation and management of the network and services
- Secure and redundant mail and Web systems are central to the operation and security of ESnet
- trouble tickets are by email
- engineering communication by email
- engineering database interface is via Web
- Secure network access to Hub equipment
- Backup secure telephony access to all routers
- 24x7 help desk (joint w/ NERSC) and 24x7 on-call network engineers
25Automated, real-time monitoring of traffic levels
and operating state of some 4400 network entities
is the primary network operational and diagnosis
tool
[Figure panels: Network Configuration; Performance; OSPF Metrics (routing and connectivity); SecureNet; Hardware Configuration; IBGP Mesh (routing and connectivity)]
26ESnet's Physical Infrastructure
[Picture: equipment rack detail at NYC Hub, 32 Avenue of the Americas (one of ESnet's core optical ring sites)]
27Typical Equipment of an ESnet Core Network Hub
ESnet core equipment at Qwest 32 AofA HUB, NYC, NY ($1.8M, list)
- Qwest DS3 DCX
- Sentry power 48v 30/60 amp panel ($3,900 list)
- AOA Performance Tester ($4,800 list)
- Sentry power 48v 10/25 amp panel ($3,350 list)
- DC / AC Converter ($2,200 list)
- Cisco 7206 AOA-AR1 (low speed links to MIT and PPPL) ($38,150 list)
- Lightwave Secure Terminal Server ($4,800 list)
- Juniper T320 AOA-CR1 (core router) ($1,133,000 list)
- Juniper OC192 Optical Ring Interface (the AOA end of the OC192 to CHI) ($195,000 list)
- Juniper OC48 Optical Ring Interface (the AOA end of the OC48 to DC-HUB) ($65,000 list)
- Juniper M20 AOA-PR1 (peering router) ($353,000 list)
28Disaster Recovery and Stability
- Engineers, 24x7 Network Operations Center, generator backed power
- Spectrum (net mgmt system)
- DNS (name to IP address translation)
- Eng database
- Load database
- Config database
- Public and private Web
- E-mail (server and archive)
- PKI cert. repository and revocation lists
- collaboratory authorization service
- Remote Engineer
- partial duplicate infrastructure
Duplicate Infrastructure: currently deploying full replication of the NOC databases and servers and Science Services databases in the NYC Qwest carrier hub
- The network must be kept available even if, e.g.,
the West Coast is disabled by a massive
earthquake, etc.
- Reliable operation of the network involves
- remote NOCs
- replicated support infrastructure
- generator backed UPS power at all critical network and infrastructure locations
- non-interruptible core: ESnet core operated without interruption through
- N. Calif. power blackout of 2000
- the 9/11/2001 attacks, and
- the Sept., 2003 NE States power blackout
29ESnet WAN Security and Cyberattack Defense
- Cyber defense is a new dimension of ESnet security
- Security is now inherently a global problem
- As the entity with a global view of the network, ESnet has an important role in overall security
30 minutes after the Sapphire/Slammer worm was released, 75,000 hosts running Microsoft's SQL Server (port 1434) were infected. ("The Spread of the Sapphire/Slammer Worm," David Moore (CAIDA & UCSD CSE), Vern Paxson (ICIR & LBNL), Stefan Savage (UCSD CSE), Colleen Shannon (CAIDA), Stuart Staniford (Silicon Defense), Nicholas Weaver (Silicon Defense & UC Berkeley EECS), http://www.cs.berkeley.edu/~nweaver/sapphire, Jan. 2003)
30ESnet and Cyberattack Defense
[Chart: the Sapphire/Slammer worm infection hits, creating almost a full Gb/s (1000 megabit/sec.) traffic spike on the ESnet backbone]
31Cyberattack Defense
- Lab first response: filter incoming traffic at their ESnet gateway router
- ESnet first response: filters to assist a site
- ESnet second response: filter traffic from outside of ESnet
- ESnet third response: shut down the main peering paths and provide only limited bandwidth paths for specific lifeline services
[Figure: attack traffic paths through peering routers, ESnet routers, gateway routers, and Lab border routers (LBNL shown as one Lab), with X marks on the attack traffic paths]
- Sapphire/Slammer worm infection created a Gb/s of
traffic on the ESnet core until filters were put
in place (both into and out of sites) to damp it
out.
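The filtering responses described on this slide amount to dropping packets that match a rule while letting other traffic through. The sketch below models that idea in plain Python. It is not router configuration and not ESnet's actual filter: `Packet`, `DropRule`, and `apply_filters` are hypothetical names, the addresses are documentation-only ranges, and the use of UDP for port 1434 is background knowledge about the worm rather than something stated on the slide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    protocol: str   # "udp" or "tcp"
    dst_port: int

@dataclass(frozen=True)
class DropRule:
    protocol: str
    dst_port: int

    def matches(self, packet: Packet) -> bool:
        """True when the packet's protocol and destination port match this rule."""
        return (packet.protocol == self.protocol
                and packet.dst_port == self.dst_port)

def apply_filters(packets, rules):
    """Keep only the packets that no drop rule matches."""
    return [p for p in packets if not any(r.matches(p) for r in rules)]

# Assumption: the worm traffic is identified by UDP destination port 1434.
slammer_rule = DropRule(protocol="udp", dst_port=1434)
traffic = [
    Packet("203.0.113.7", "192.0.2.10", "udp", 1434),  # worm probe
    Packet("203.0.113.7", "192.0.2.10", "tcp", 80),    # ordinary Web request
    Packet("198.51.100.4", "192.0.2.25", "udp", 53),   # DNS query
]
print(len(apply_filters(traffic, [slammer_rule])))  # 2
```

The same rule can be applied in both directions (into and out of a site) by running inbound and outbound traffic through `apply_filters` separately.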
32Science Services Support for Shared,
Collaborative Science Environments
- X.509 identity certificates and Public Key Infrastructure provide the basis of secure, cross-site authentication of people and systems (www.doegrids.org)
- ESnet negotiates the cross-site, cross-organization, and international trust relationships to provide policies that are tailored to collaborative science in order to permit sharing computing and data resources, and other Grid services
- Certification Authority (CA) issues certificates after validating request against policy
- This service was the basis of the first routine sharing of HEP computing resources between US and Europe
33Science Services Public Key Infrastructure
Report as of July 15, 2004
34Voice, Video, and Data Tele-Collaboration Service
- Another highly successful ESnet Science Service is the audio, video, and data teleconferencing service to support human collaboration
- Seamless voice, video, and data teleconferencing is important for geographically dispersed scientific collaborators
- ESnet currently provides to more than a thousand DOE researchers and collaborators worldwide
- H.323 (IP) videoconferences (4000 port hours per month and rising)
- audio conferencing (2500 port hours per month) (constant)
- data conferencing (150 port hours per month)
- Web-based, automated registration and scheduling for all of these services
- Huge cost savings for the Labs
35ESnet's Evolution over the Next 10-20 Years
- Upgrading ESnet to accommodate the anticipated increase from the current 100%/yr traffic growth to 300%/yr over the next 5-10 years is priority number 7 out of 20 in DOE's "Facilities for the Future of Science: A Twenty Year Outlook"
- Based on the requirements of the OSC Network Workshops, ESnet must address
- Capable, scalable, and reliable production IP networking
- University and international collaborator connectivity
- Scalable, reliable, and high bandwidth site connectivity
- Network support of high-impact science
- provisioned circuits with guaranteed quality of service (e.g. dedicated bandwidth)
- Science Services to support Grids, collaboratories, etc
36New ESnet Architecture to Accommodate OSC
- The future requirements cannot be met with the current, telecom provided, hub and spoke architecture of ESnet
[Figure: ESnet core ring with hubs at Chicago (CHI), New York (AOA), Washington, DC (DC), Sunnyvale (SNV), Atlanta (ATL), and El Paso (ELP), and connections to DOE sites]
- The core ring has good capacity and resiliency against single point failures, but the point-to-point tail circuits are neither reliable nor scalable to the required bandwidth
37Evolving Requirements for DOE Science Network
Infrastructure
[Figure: evolving site interconnection requirements in four stages: 1-3 yr, 2-4 yr, 3-5 yr, and 4-7 yr requirements. Legend: S = storage, C = compute, I = instrument, CC = cache and compute. Annotations: guaranteed bandwidth paths; 1-40 Gb/s, end-to-end; 100-200 Gb/s, end-to-end]
38
- ESnet new architecture goals: full redundant connectivity for every site and high-speed access for every site (at least 10 Gb/s)
- Two part strategy
- 1) MAN rings provide dual site connectivity and much higher site bandwidth
- 2) A second backbone will provide
- multiply connected MAN rings for protection against hub failure
- extra backbone capacity
- a platform for provisioned, guaranteed bandwidth circuits
- alternate path for production IP traffic
- carrier neutral hubs
[Figure: proposed architecture. Labels: ESnet existing core; NLR (2nd backbone); existing hubs; new hubs; Chicago (CHI); New York (AOA); Washington, DC (DC); Sunnyvale (SNV); Atlanta (ATL); El Paso (ELP); DOE/OSC sites; Europe; Asia-Pacific]
39Conclusions
- ESnet is an infrastructure that is critical to DOE's science mission
- Focused on the Office of Science Labs, but serves many other parts of DOE
- ESnet is working hard to meet the current and future networking needs of DOE mission science in several ways
- Evolving a new high speed, high reliability, leveraged architecture
- Championing several new initiatives which will keep ESnet's contributions relevant to the needs of our community
40Reference -- Planning Workshops
- High Performance Network Planning Workshop, August 2002
- http://www.doecollaboratory.org/meetings/hpnpw
- DOE Workshop on Ultra High-Speed Transport Protocols and Network Provisioning for Large-Scale Science Applications, April 2003
- http://www.csm.ornl.gov/ghpn/wk2003
- Science Case for Large Scale Simulation, June 2003
- http://www.pnl.gov/scales/
- DOE Science Networking Roadmap Meeting, June 2003
- http://www.es.net/hypertext/welcome/pr/Roadmap/index.html
- Workshop on the Road Map for the Revitalization of High End Computing, June 2003
- http://www.cra.org/Activities/workshops/nitrd
- http://www.sc.doe.gov/ascr/20040510_hecrtf.pdf (public report)
- ASCR Strategic Planning Workshop, July 2003
- http://www.fp-mcs.anl.gov/ascr-july03spw
- Planning Workshops-Office of Science Data-Management Strategy, March and May 2004
- http://www-conf.slac.stanford.edu/dmw2004 (report coming soon)