Title: Enabling Platforms for high-performance computational Grids oriented to scalable virtual organization
1 Enabling Platforms for high-performance computational Grids oriented to scalable virtual organization (GRID.IT)
P. Castoldi, F. Baroncelli, F. Cugini, B. Martini, V. Martini, F. Paolucci, L. Valcarenghi
TERENA Workshop on "Service Oriented Optical Networks", Catania, May 14th 2006
2 Facts about the GRID.IT project
15 Workpackages
WP1 - Grid Oriented Optical Switching Paradigms
WP2 - High Performance Photonic Testbed
WP3 - Grid Deployment
WP4 - Security
WP5 - Data Intensive Core Services
WP6 - Knowledge Services for Intensive Data Analysis
WP7 - Grid Portals
WP8 - High-performance Component-based Programming Environments
WP9 - Grid-enabled Scientific Libraries
WP10 - Grid Applications for Astrophysics
WP11 - Grid Applications for Earth Observation Systems Application
WP12 - Grid Applications for Biology
WP13 - Grid Applications for Molecular Virtual Reality
WP14 - Grid Applications for Geophysics
WP15 - Management
- National project funded by the Ministry of University and Research under the FIRB (Fundamental Research Incentive Fund) line
- Duration: 3+1 years (Nov. 02 - Oct. 06)
- 4 clusters of partners
  - CNIT (5 universities)
    - UTDallas, CNIT subcontractor
  - CNR, National Research Council (3 institutes)
  - INFN, National Institute for Nuclear Physics (3 institutes)
  - ASI, Italian Space Agency
CNIT (National Inter-university Consortium for Telecommunications) is a non-profit consortium of 34 Italian universities operating in the telecom area. It coordinates large research initiatives with its own researchers and with staff from the affiliated universities.
3 Global Grid Computing
Global Grid Computing expands the resource horizon from LAN to WAN (not limited to optical networks)
- Bottlenecks
  - Computational, storage, etc. resources (CPU): same as before
  - Network resources become scarce and difficult to reserve
- Requirements
  - QoS-enabled network connectivity
  - Network resource monitoring, adaptation and availability
  - Application task staging should be network-aware
  - Grid users should have some ability to trigger network connectivity
- A solution (main streamline)
  - A new functional layer, consisting of network middleware, is introduced to meet the above requirements (general concept)
4 A network-centric view of a Grid
A Grid network is an overlay L7 network on top of an independent L1/2/3 network
- The Grid Network Interface should (*)
  - hide network details (e.g., topology, configuration) from the Grid middleware
  - be as simple as possible
  - allow end-to-end, on-demand, and real-time service requests
Note that the network service interfaces for Grid will have a higher level of abstraction (hiding details) than what is provided by a traditional Service Network or Element Management System (*)
(*) Draft-ggf-masum-grid-network-services
Figure (layered architecture) labels: Application, L7 Grid Middleware, L7 Grid Resources, Scheduler, GRAM, Network Interface, L1/2/3 Resource Management System, L1/2/3 Resources, Network Monitoring, Network Configurator, OIF-UNI, Transport Network.
GRAM: Grid Resource Allocation Manager
5 Application services and network services
- Customers run application services that exploit a stack of network protocols for their connectivity needs
- Application services are abstract descriptions of the application logic
- Network protocols (transport and ancillary functions such as routing, signaling, link management) are logically classified into a few categories of network services: connectionless IP, L1/L2/L3 VPN
Figure labels: customers (Customer1 to Customer5, End-user); application services (Internet Access, Hosting, VoIP/MoIP, Storage, Grid); network protocols (PSTN, IP, MPLS, POS, ATM, TDM, Fast/Giga Ethernet, SDH, SONET); WDM services.
6 From UNI to a service interface
- Via the User to Network Interface (UNI), client networks can request some network services in Control Plane-enabled ASTN/GMPLS networks, but (e.g.)
  - it provides only point-to-point network services
  - it does not coordinate services provided by an arbitrary set of edge nodes
  - it is not designed to be used by applications
- Consistently with existing approaches (IMS and NGN) and with the efforts of other EU projects (MUPBED, NOBEL), applications (e.g. Grid) should be enabled to set up an application platform, i.e. a network service tailored to their needs
- To this purpose, a Service Plane is used that exports a new service interface towards applications, namely the user-to-service interface (USI).
7 The SO-ASTN
Figure (SO-ASTN architecture) labels: Client/Access Network, CPE, Edge CPE, UNI, Control Plane, Management Plane, Transport Plane, CCI, NMI-A, NMI-T.
8 The service interface
- UPI is an interim human-to-human or machine-to-machine interface, mediated by the Management Plane (MP), currently used as a service interface.
- The USI is an evolved machine-to-machine interface that must enable the application entity to request services
  - provided by different administrative network domains
  - without dealing with the network technology details
  - without dealing with the network topology details
- The USI must support
  - both executive services spanning multiple administrative domains and informative services on a single administrative domain
  - the transparency of applications across multiple domains
  - session-based services (e.g., high-definition video-telephony)
  - non-session-based services (e.g., e-Business transactions)
9 On-Demand VPN via USI: Experimental demonstration
Figure (testbed) labels: Site A, Site B, Site C; AE-A, AE-B, AE-C; DSE 1, DSE 2, DSE 3; ER 1 (Router ID 1.1.1.1), ER 2 (Router ID 2.2.2.2), ER 3 (Router ID 3.3.3.3); CR 1, CR 2, CR 3; MPLS, OSPF, RSVP-TE; VPN. The numbers on the arrows refer to the steps below.
1 - VPN Service Request (B, C, bandwidth)
2 - VPN Config (router ID, groups name, VPN id, bandwidth)
3 - VPN Routing Configuration (local address, groups name, routing instance)
4 - Tunnel LSP Set-up (egress router 1, bandwidth)
5 - DSE ACK
6 - VPN ACK
AE: Application Entity; ER: Edge Router; CR: Core Router; DSE: Distributed Service Element
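The six-step exchange can be traced with a small simulation. This is only a sketch under stated assumptions: the step numbers and message names come from the slide legend, while the `Message` type, the `vpn_setup_messages` function, the choice of sender and receiver for each step, and the mapping of sites to DSEs and edge routers are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Message:
    step: int       # step number from the slide legend (1-6)
    sender: str
    receiver: str
    name: str


def vpn_setup_messages(requester, entry_dse, site_elements):
    """Build the ordered message list for one on-demand VPN request.

    site_elements maps each site to its (DSE, edge router) pair. The
    mapping and the sender/receiver roles are illustrative assumptions.
    """
    dses = [dse for dse, _ in site_elements.values()]
    routers = [er for _, er in site_elements.values()]
    msgs = [Message(1, requester, entry_dse, "VPN Service Request")]
    msgs += [Message(2, entry_dse, dse, "VPN Config") for dse in dses]
    msgs += [Message(3, dse, er, "VPN Routing Configuration")
             for dse, er in site_elements.values()]
    # One unidirectional tunnel LSP per ordered pair of edge routers.
    msgs += [Message(4, src, dst, "Tunnel LSP Set-up")
             for src, dst in permutations(routers, 2)]
    msgs += [Message(5, dse, entry_dse, "DSE ACK") for dse in dses]
    msgs.append(Message(6, entry_dse, requester, "VPN ACK"))
    return msgs


if __name__ == "__main__":
    # Hypothetical assignment of DSEs and edge routers to sites B and C.
    sites = {"B": ("DSE 2", "ER 2"), "C": ("DSE 3", "ER 3")}
    for m in vpn_setup_messages("AE-A", "DSE 1", sites):
        print(f"{m.step}: {m.sender} -> {m.receiver}: {m.name}")
```

The point of the sketch is that the application entity sees only steps 1 and 6; everything in between is handled by the service elements and the routers.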
10 Supporting functions for the SP
RESOURCE MONITORING
- Network topology: why?
  - Grid applications need the network topology to optimally allocate tasks among different sites.
  - A detailed topology detector is needed in order to satisfy QoS requirements
- So far
  - Existing tools provide Grids with only end-to-end network parameters, which are not sufficient for guaranteed-bandwidth connection requests (LSP, VPN)
RESOURCE PROVISIONING
- Path Computation Element (PCE): why?
  - Definition: an entity capable of computing a network path or route based on a network graph and applying computational constraints.
  - Advantages
    - Traffic Engineering (TE) route computation may be highly CPU-intensive; a PCE offloads it from the router CPU.
    - Optimal TE solutions, administrative policies and optimal management solutions
    - Useful in scenarios where the node has limited visibility of the network topology to the destination (multi-area, multi-domain, multi-layer)
INTEGRATED FAULT TOLERANCE
- Combining network and application resilience mechanisms: why?
  - Grid fault-tolerance schemes alone may not be as efficient as network resilience schemes
  - An application-layer scheme may not fully restore the previous QoS connectivity
11 Centralized TDS
- Based on a central resource broker.
- The broker holds the list of routers and has administrator privileges on them.
- The broker directly queries the routers with router-based requests.
- Three kinds of topologies can be discovered
- The Grid topology is discovered or updated within a few seconds
Message sequence (from the figure):
1. Topology request
2. USI Queries
3. XML Replies
4. XML Topology file
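The broker workflow can be sketched in a few lines. This is a minimal illustration, not the project's implementation: `query_router` is a hypothetical stub that returns canned XML instead of contacting a router, the reply format is assumed to match the `<node>` elements of the topology XML shown on the PCE slide, and the function names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Canned replies standing in for real router queries (hypothetical data,
# shaped like the <node> elements of the topology XML on the PCE slide).
FAKE_REPLIES = {
    "10.10.1.1": "<node><node-id>10.10.1.1</node-id>"
                 "<link><adj-node-id>10.10.2.1</adj-node-id>"
                 "<available-bw7>1000</available-bw7></link></node>",
    "10.10.2.1": "<node><node-id>10.10.2.1</node-id>"
                 "<link><adj-node-id>10.10.1.1</adj-node-id>"
                 "<available-bw7>1000</available-bw7></link></node>",
}


def query_router(router_id):
    """Stub for the query/reply steps: a real broker would contact the router."""
    return FAKE_REPLIES[router_id]


def discover_topology(router_ids, query=query_router):
    """Query every router in the broker's list and merge the XML replies."""
    topology = ET.Element("topology")
    for router_id in router_ids:
        topology.append(ET.fromstring(query(router_id)))
    return topology


def topology_to_xml(topology):
    """Serialize the merged tree, i.e. the XML topology file of the last step."""
    return ET.tostring(topology, encoding="unicode")


if __name__ == "__main__":
    print(topology_to_xml(discover_topology(["10.10.1.1", "10.10.2.1"])))
```

Keeping the router list and the query function inside the broker reflects the centralized design: clients only issue the topology request and receive the merged file.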
12 TDS XML Topologies and Retrieval Strategies
Topologies
TDS Triggering Mechanisms
- EVENT-DRIVEN BASED
  - Network status changes: active network monitoring
  - SNMP traps sent by VO nodes
- TIMEOUT BASED
  - Periodical polling
  - Delivery time < Timeout
  - No active monitoring
TDS Update Methods
- GLOBAL
  - Refresh entire topology at each invocation
  - Large number of messages exchanged
- INCREMENTAL
  - Update of existing topology
  - Low network load
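The difference between the two update methods can be illustrated with a toy model. This is a sketch under assumptions: the topology is held as a dictionary of directed links, one query per router is counted as one message, and the names `global_refresh`, `incremental_update` and `make_query` are hypothetical.

```python
def make_query(state):
    """Return a query function over an in-memory network state (test double)."""
    def query_links(router_id):
        # Links that originate at the queried router, with available bandwidth.
        return {link: bw for link, bw in state.items() if link[0] == router_id}
    return query_links


def global_refresh(router_ids, query_links):
    """GLOBAL: rebuild the whole topology, one query per router."""
    topology = {}
    for router_id in router_ids:
        topology.update(query_links(router_id))
    return topology, len(router_ids)


def incremental_update(topology, changed_router, query_links):
    """INCREMENTAL: re-query only the router that reported a change."""
    updated = {link: bw for link, bw in topology.items()
               if link[0] != changed_router}
    updated.update(query_links(changed_router))
    return updated, 1


if __name__ == "__main__":
    state = {("A", "B"): 1000, ("B", "A"): 1000,
             ("B", "C"): 1000, ("C", "B"): 1000}
    query = make_query(state)
    topology, messages = global_refresh(["A", "B", "C"], query)
    print("global refresh:", messages, "queries")
    state[("B", "C")] = 400  # e.g. a change signalled by an SNMP trap
    topology, messages = incremental_update(topology, "B", query)
    print("incremental update:", messages, "query")
```

Both methods end with the same topology; the incremental one gets there with a single query, which is the "low network load" property, at the price of relying on a trigger that identifies the changed node.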
13 Path Computation Element (PCE)
Figure (PCE workflow, steps numbered 1 to 5) labels: TED, TED download, XSLT elaboration, C elaboration, Topology, LSP Traffic Matrix, LP formulation, PCE, LP elaboration, Router configuration, LSP strict routes.
TED database excerpt shown in the figure:
  <ted-database junos:style="detail">
    <ted-database-id>10.10.14.1-1</ted-database-id>
    <ted-database-type>Net</ted-database-type>
    <ted-database-age>22648</ted-database-age>
    <ted-database-link-in>2</ted-database-link-in>
    <ted-database-link-out>2</ted-database-link-out>
    <ted-database-protocol>OSPF(0.0.0.0)</ted-database-protocol>
    <ted-link junos:style="database">
      <ted-link-to>10.10.13.2</ted-link-to>
      <ted-link-local-address>0.0.0.0</ted-link-local-address>
      <ted-link-remote-address>0.0.0.0</ted-link-remote-address>
      <ted-link-metric>0</ted-link-metric>
      <switching-capability-descriptor heading="ISCD(1)">
        <switching-type>Packet</switching-type>
        <encoding-type>Packet</encoding-type>
        <maximum-lsp-bw0>0 0bps</maximum-lsp-bw0>
        <maximum-lsp-bw1>1 0bps</maximum-lsp-bw1>
        <maximum-lsp-bw2>2 0bps</maximum-lsp-bw2>
        <maximum-lsp-bw3>3 0bps</maximum-lsp-bw3>
        <maximum-lsp-bw4>4 0bps</maximum-lsp-bw4>
        <maximum-lsp-bw5>5 0bps</maximum-lsp-bw5>
        <maximum-lsp-bw6>6 0bps</maximum-lsp-bw6>
        <maximum-lsp-bw7>7 0bps</maximum-lsp-bw7>
      </switching-capability-descriptor>
    </ted-link>
    ...
  </ted-database>
Topology XML shown in the figure:
  <topology>
    <node>
      <node-id>10.10.1.1</node-id>
      <num-links>2</num-links>
      <link>
        <adj-node-id>10.10.2.1</adj-node-id>
        <available-bw7>1000</available-bw7>
      </link>
      <link>
        <adj-node-id>10.10.3.1</adj-node-id>
        <available-bw7>1000</available-bw7>
      </link>
    </node>
    ...
  </topology>
PCE functions for optimal TE solution
1, 2, 3 - Download of the relevant information from the TE Database (TED), XSLT elaboration and C elaboration to produce the LP formulation
4 - The PCE runs the LP formulation to identify the Label Switched Path (LSP) traffic allocation that minimizes the maximum link bandwidth utilization (least-fill policy)
5 - The PCE configures the LSPs on every ingress router (strict routes)
Results show that the PCE performs fast and achieves optimal bandwidth utilization compared with the CSPF algorithm performed by the nodes
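The least-fill objective can be illustrated on a toy network. The sketch below is not the LP formulation used in the project: it swaps the LP solver for a brute-force search over loop-free paths, which is only practical on very small graphs. The XML layout follows the topology file shown on the slide; the four-node network, the demand values and the function names are assumptions made for the example, and "least-fill" is interpreted here as minimizing the maximum link utilization.

```python
import itertools
import xml.etree.ElementTree as ET

# Hypothetical four-node topology in the format of the slide's topology XML.
TOPOLOGY_XML = """
<topology>
  <node><node-id>10.10.1.1</node-id>
    <link><adj-node-id>10.10.2.1</adj-node-id><available-bw7>1000</available-bw7></link>
    <link><adj-node-id>10.10.3.1</adj-node-id><available-bw7>1000</available-bw7></link>
  </node>
  <node><node-id>10.10.2.1</node-id>
    <link><adj-node-id>10.10.4.1</adj-node-id><available-bw7>1000</available-bw7></link>
  </node>
  <node><node-id>10.10.3.1</node-id>
    <link><adj-node-id>10.10.4.1</adj-node-id><available-bw7>1000</available-bw7></link>
  </node>
  <node><node-id>10.10.4.1</node-id></node>
</topology>
"""


def parse_topology(xml_text):
    """Return {(src, dst): available bandwidth} for every directed link."""
    links = {}
    for node in ET.fromstring(xml_text).findall("node"):
        src = node.findtext("node-id")
        for link in node.findall("link"):
            dst = link.findtext("adj-node-id")
            links[(src, dst)] = float(link.findtext("available-bw7"))
    return links


def simple_paths(links, src, dst, visited=()):
    """Yield every loop-free path from src to dst as a tuple of node ids."""
    visited = visited + (src,)
    if src == dst:
        yield visited
        return
    for (a, b) in links:
        if a == src and b not in visited:
            yield from simple_paths(links, b, dst, visited)


def least_fill(links, demands):
    """Pick one path per demand so that the maximum link utilization is minimal.

    demands is a list of (src, dst, bandwidth). Returns (utilization, paths),
    or None if no feasible allocation exists. Assumes every link advertises
    a positive available bandwidth.
    """
    candidates = [list(simple_paths(links, s, d)) for s, d, _ in demands]
    best = None
    for choice in itertools.product(*candidates):
        load = dict.fromkeys(links, 0.0)
        for path, (_, _, bw) in zip(choice, demands):
            for hop in zip(path, path[1:]):
                load[hop] += bw
        worst = max(load[link] / links[link] for link in links)
        if worst <= 1.0 and (best is None or worst < best[0]):
            best = (worst, list(choice))
    return best


if __name__ == "__main__":
    links = parse_topology(TOPOLOGY_XML)
    demands = [("10.10.1.1", "10.10.4.1", 600), ("10.10.1.1", "10.10.4.1", 600)]
    print(least_fill(links, demands))
```

With two 600-unit demands between the same pair of nodes, routing both on one path would overload it, so the search spreads them over the two disjoint paths. The returned paths are full node sequences, which is the information needed to configure strict routes on the ingress router.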
14 Cooperative application-network QoS-Aware Fault Tolerance
- Assumption
  - Qualified applications (e.g. visualization) require communication QoS guarantees
  - QoS parameter: minimum bandwidth
- Objective
  - Maximize recovered connections and minimize required network resources upon network link failure
- Possible approach
  - Integrating QoS-unaware layer (application) and QoS-capable layer (network) fault tolerance to obtain QoS-aware integrated fault tolerance
  - QoS-capable layer fault tolerance: (G)MPLS path restoration
  - Software layer fault tolerance: service replication (server migration)
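One way to combine the two mechanisms is to try the network-layer recovery first and fall back to the application-layer one. The sketch below assumes that ordering, which the slide does not prescribe; the topology, the node names and the functions `find_path` and `recover` are hypothetical, and a plain breadth-first search stands in for the actual (G)MPLS restoration procedure.

```python
from collections import deque


def find_path(capacity, src, dst, min_bw, failed=frozenset()):
    """Breadth-first search over links that are up and offer at least min_bw."""
    parents = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for (a, b), bw in capacity.items():
            usable = (a, b) not in failed and bw >= min_bw
            if a == node and b not in parents and usable:
                parents[b] = node
                queue.append(b)
    return None


def recover(capacity, client, primary, backup, min_bw, failed):
    """Try path restoration to the primary server, then service replication."""
    path = find_path(capacity, client, primary, min_bw, failed)
    if path:
        return "path restoration", path
    path = find_path(capacity, client, backup, min_bw, failed)
    if path:
        return "service replication", path
    return "unrecovered", None


if __name__ == "__main__":
    # Directed links with residual bandwidth (hypothetical values).
    capacity = {("client", "r1"): 1000, ("r1", "primary"): 1000,
                ("client", "r2"): 1000, ("r2", "primary"): 200,
                ("r2", "backup"): 1000}
    failed = {("r1", "primary")}
    for min_bw in (100, 500):
        print(min_bw, recover(capacity, "client", "primary", "backup",
                              min_bw, failed))
```

In this toy network a low-bandwidth connection is recovered by rerouting to the primary server, while a connection that needs more bandwidth than the alternative route offers is recovered by switching to the backup server. This is the case the minimum-bandwidth QoS parameter is meant to capture.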
15 Integrated Fault Tolerance Advantages: Path Restoration and Service Replication
Figure labels: Client, Primary Video Server, Backup Video Server, Primary LSP, Backup LSP, another primary LSP, LSP to Backup Video Server.
16 Conclusions
- The problem of providing a connection-oriented service in a WAN environment to individual qualified applications (e.g. Grid) has been addressed from an architectural point of view with regard to
  - The Service Plane and service interface
  - A Centralized Topology Discovery Service (TDS)
  - Path Computation Element (PCE)
  - Integrated resilience scheme
- But?
  - People working on grid computing are mainly computer scientists
  - People working on networks are telecommunication engineers
  - It is not easy to create a common view on the topic.
18 E-mail: castoldi_at_sssup.it
Sant'Anna School, CNIT, CNR research area, Via Moruzzi 1, 56124 Pisa, Italy