Title: Data Mining Engineering
1 Grid Computing
Concepts, Techniques, and Applications
Peter Brezany, Institut für Softwarewissenschaft, Universität Wien
Tel. 01/4277 38825, e-mail brezany_at_par.univie.ac.at
Office hours: Tuesday 13.00-14.00
2 Learning Objectives
- Motivation for Grids
- Basic concepts
- Existing architectures
- New developments
- Web Services
- Integration of Web Services and Grid Services
- OGSA (Open Grid Services Architecture)
3 Introduction
- Grid Computing is a relatively new research area.
- Formerly known only in scientific circles and for big-science applications.
- Now closer to everyday life (e-business, medicine, etc.).
- Large companies (IBM, Sun, Microsoft) are now getting involved as well.
- Grid Computing is about the shared use of different kinds of resources: a modern sharing community.
4 Introductory Visions
- Example: water supply
  - Formerly: private spring / well
  - Today: reservoir → pipes → tap
- Example: energy supply
  - Formerly: own generator
  - Today: large generator → power lines → socket
- Power Grid → Computational Grid / Grid Computing (e.g. NASA Information Power Grid, www.ipg.nasa.gov)
- Logical consequence of Grid Computing: computing power (and much more) from the socket
- Many computers connected into one large network
Advantages
- Completely new possibilities for collaboration between companies
- Savings on hardware by renting (cf. generator / spring)
- Renting expensive software instead of buying it
- Offering resources such as computing power oneself
5 Grid Computing Vision
- "The Internet is about getting computers to talk together. Grid computing is about getting computers to work together."
- Tom Hawk, IBM's general manager of Grid computing
6 Grid Computing Vision (2)
- Asked "What did you have in mind when you first developed the Web?", Tim Berners-Lee replied: "The dream behind the Web is of a common information space in which we communicate by sharing information."
- Applied to Grid computing, this sentence can be rephrased as: "The dream behind Grid computing is a common resource space in which we can work together using shared resources."
7 Web Compared to the Grid
8 Web Compared to the Grid (2)
9 Web Compared to the Grid (3)
Source: Norman Paton
10 Grid Computing - Definition
- Definition according to www.globus.org [1]
- The Grid is an infrastructure that allows the integrated, collaborative use of resources. Resources are not limited to computing power and storage: entire devices of any kind can be shared in the Grid, for example high-performance computers, networks, databases, telescopes, microscopes, and even electron accelerators. The goal of the Grid is to let users access devices as if they owned them, without having to buy them.
- Characteristics of Grid applications
  - Large data volumes
  - High computational cost
  - Secure resource sharing between independent organizations
[1] Practically all major Grid projects build on the Globus middleware (1998: Globus 1, 2001: Globus 2, 2003: Globus 3, 2005: Globus 4).
11 Abstract Grid Architecture
Figure (diagram not reproduced). Legend:
gn_i: Grid node, e.g. Computing Element (CE), Storage Element (SE), telescope, microscope, etc.
n_i: compute node
io_i: I/O node
Labels in the diagram: gn1 to gn6, n1 to n4, io1, io2, IN, WN
Example: gn2 is a combined CE and SE (research of Prof. Schikuta, Univ. Vienna)
12 Grid Problem
- The Grid problem
  - Coordinated resource sharing and joint problem solving in dynamic, multi-institutional organizations.
- Sharing here means
  - Direct access to computers, software, data, devices, etc.
  - Sharing rules between providers and users define what is available to whom, how, and when.
- A set of individuals and/or institutions together with its sharing rules forms a VO (Virtual Organization).
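The idea that sharing rules define what is available to whom, how, and when can be pictured as plain data plus a lookup. The following is a minimal sketch under stated assumptions: the types `SharingRule` and `VirtualOrganization`, the method `isAllowed`, and the user and resource names are all hypothetical and do not come from Globus or any other Grid toolkit.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: these types are hypothetical and not part of any Grid toolkit.
final class SharingRulesSketch {

    /** One sharing rule: who may use what, how (operation), and when (hour-of-day window). */
    static final class SharingRule {
        final String user;
        final String resource;
        final String operation;
        final int fromHour; // inclusive
        final int toHour;   // exclusive

        SharingRule(String user, String resource, String operation, int fromHour, int toHour) {
            this.user = user;
            this.resource = resource;
            this.operation = operation;
            this.fromHour = fromHour;
            this.toHour = toHour;
        }

        boolean permits(String u, String r, String op, int hour) {
            return user.equals(u) && resource.equals(r) && operation.equals(op)
                    && hour >= fromHour && hour < toHour;
        }
    }

    /** A VO modeled as a set of sharing rules; access is denied unless a rule permits it. */
    static final class VirtualOrganization {
        private final List<SharingRule> rules = new ArrayList<>();

        void addRule(SharingRule rule) {
            rules.add(rule);
        }

        boolean isAllowed(String user, String resource, String operation, int hour) {
            for (SharingRule rule : rules) {
                if (rule.permits(user, resource, operation, hour)) {
                    return true;
                }
            }
            return false;
        }
    }

    /** Example VO: alice may run jobs on cluster-y at night, bob may read archive-z all day. */
    static VirtualOrganization exampleVo() {
        VirtualOrganization vo = new VirtualOrganization();
        vo.addRule(new SharingRule("alice", "cluster-y", "run", 20, 24));
        vo.addRule(new SharingRule("bob", "archive-z", "read", 0, 24));
        return vo;
    }

    public static void main(String[] args) {
        VirtualOrganization vo = exampleVo();
        System.out.println(vo.isAllowed("alice", "cluster-y", "run", 22)); // inside the window
        System.out.println(vo.isAllowed("alice", "cluster-y", "run", 10)); // outside the window
    }
}
```

Real VOs add authentication, delegation, and accounting on top of such rules; the sketch only shows the default-deny lookup.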
13 Grid Prerequisites
- Shared use of geographically separated resources
- No common central site
- No central control
- Nobody is omniscient
- No pre-existing trust relationships between participants
- Complex requirements
  - Run program X on the computers of Y (contract P), with the data coming from Z (contract Q). Y and Z need not have any relationship with each other. (Delegation)
14 Virtual Organization (VO)
- Purpose, goal, size, duration, structure, etc. vary
- Requirements of VOs
  - Highly flexible sharing relationships (from client/server to peer-to-peer)
  - Sophisticated and precise control
  - Fine-grained and coarse-grained access control
  - Accounting
  - Scheduling
15 VO Example
- A car manufacturer commissions
  - an application service provider (ASP): financial forecasting
  - a storage service provider (SSP): (historical) data
  - cycle providers: computing power for the analysis
- in order to carry out scenario analyses for a new factory (or site).
16 VO Example (2)
Figure: An actual organization can participate in
one or more VOs by sharing some or all of
its resources. We show three actual organizations
(the ovals), and two VOs: P, which links
participants in an aerospace design consortium,
and Q, which links colleagues who have agreed to
share spare computing cycles, for example to run
ray tracing computations. The organization on the
left participates in P, the one to the right
participates in Q, and the third is a member of
both P and Q. The policies governing access
to resources (summarized in quotes) vary
according to the actual organizations, resources,
and VOs involved.
17 Definitions: Protocol, Service, API, SDK
- Protocol
  - A set of rules that the endpoints of telecommunication systems follow to exchange information
  - A standard protocol ensures interoperability
- Service
  - A network-enabled entity with a specific capability
  - Defined by the protocol and by its reaction to a protocol message (service = protocol + behavior)
- Application Program Interface (API)
  - Standard interface for accessing functionality (one protocol can have several APIs)
  - Enables portability
- Software Development Kit (SDK)
  - Implements an API (e.g. Globus Toolkit)
18 Grid Protocol Architecture vs. IP Architecture
Application
19 Grid Architecture (1)
- Fabric
  - Computers / file systems / archives / networks / sensors / ...
  - Operations such as open, read, write, close, ...
  - Hardly any restrictions at the low level as long as the interfaces are satisfied
- Connectivity (neck)
  - Communication (IP, DNS, routing, ...)
  - Security (Grid Security Infrastructure, GSI)
    - Uniform authentication
    - Single sign-on
    - Delegation
    - Public-key technology
20 Grid Architecture (2)
- Resource Layer (neck)
  - Grid Resource Allocation Management (GRAM)
    - Allocation, reservation, monitoring, and control of compute resources
  - GridFTP protocol (FTP extensions)
    - High-speed data access and transport
  - Grid Resource Information Service (GRIS)
    - Access to structure and status information
  - Network reservation, monitoring, and control
  - Builds on the Connectivity Layer (GSI, IP).
21 Grid Architecture (3)
- Collective Layer
  - Global protocols and services
  - Builds on the neck and is completely independent of the individual resources
  - Directory services
  - Monitoring and diagnostic services
  - Data replication services
  - etc.
- Applications
  - Use services of any layer
22 Hourglass Model: Internet Today
23 Hourglass Model: Globus Grid
24 Data Grid
- Original motivation: scientific applications are very data-intensive, and a very large number of researchers from all over the world want fast access to this data.
- Prospective applications of Data Grids: medical Grids, e-business and e-commerce Grids.
25 Model Architecture for Data Grids
Figure (diagram not reproduced). Labels: Application; Attribute Specification; Metadata Catalog; Logical Collection and Logical File Name; Replica Catalog; Multiple Locations; Replica Selection; Performance Information and Predictions; MDS; NWS; Selected Replica; GridFTP commands; Replica Location 1, Replica Location 2, Replica Location 3 (Disk Cache, Tape Library, Disk Array, Disk Cache).
26 Storage Model
- Two different kinds of files
  - Master files (owned by their creators)
  - Replica files. There may be many replicas of a master file. Replicas are owned by, managed by, and may be deleted by the Grid.
- The notion of replicas is new, and critical in a Grid environment. Example:
  - Before a DataGrid job can run at site A, data at site B may need to be copied to site A.
  - This data may then be used by subsequent jobs at site A, or may be needed by jobs at site C, which has a better network connection to site A than to site B. For this reason, the data should be kept at site A as long as possible.
  - The ReplicaManager keeps track of all replica data so that the replica selection service can select the optimal replica to use for a given job, or request the creation of a new replica.
27 Data Replication Across Grid Nodes
By providing a copy (replica) of a data item close to a client application, access times can be reduced. Replication can also help in load balancing and can improve reliability.
Figure: the Replica Catalog maps the logical name file1.DB to the physical names X/data/file1.DB, Y/data/file1.DB, and Z/data/file1.DB, where X, Y, and Z are Grid sites.
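The mapping from a logical file name to its physical replicas, and the choice of one replica for a job, can be sketched in a few lines. This is a minimal sketch under stated assumptions: the class `ReplicaCatalogSketch`, its methods, and the per-site cost numbers are hypothetical illustrations, not the interface of the Globus replica catalog or of the ReplicaManager mentioned earlier.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: not the API of any real replica catalog.
final class ReplicaCatalogSketch {

    // logical file name -> physical names of its replicas ("site/path")
    private final Map<String, List<String>> replicas = new HashMap<>();

    void register(String logicalName, String physicalName) {
        replicas.computeIfAbsent(logicalName, k -> new ArrayList<>()).add(physicalName);
    }

    List<String> lookup(String logicalName) {
        List<String> found = replicas.get(logicalName);
        if (found == null) {
            return Collections.<String>emptyList();
        }
        return Collections.unmodifiableList(found);
    }

    /**
     * Replica selection: pick the replica at the site with the lowest estimated access cost.
     * The cost map stands in for performance predictions; its values are assumptions.
     * Returns null when no replica exists at a site with a known cost.
     */
    String selectReplica(String logicalName, Map<String, Double> costBySite) {
        String best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (String physical : lookup(logicalName)) {
            int slash = physical.indexOf('/');
            String site = slash < 0 ? physical : physical.substring(0, slash);
            Double cost = costBySite.get(site);
            if (cost != null && cost < bestCost) {
                bestCost = cost;
                best = physical;
            }
        }
        return best;
    }

    /** The situation from the slide: file1.DB replicated at sites X, Y, and Z. */
    static ReplicaCatalogSketch exampleCatalog() {
        ReplicaCatalogSketch catalog = new ReplicaCatalogSketch();
        catalog.register("file1.DB", "X/data/file1.DB");
        catalog.register("file1.DB", "Y/data/file1.DB");
        catalog.register("file1.DB", "Z/data/file1.DB");
        return catalog;
    }

    public static void main(String[] args) {
        Map<String, Double> cost = new HashMap<>();
        cost.put("X", 40.0); // made-up cost estimates per site
        cost.put("Y", 5.0);
        cost.put("Z", 12.0);
        System.out.println(exampleCatalog().selectReplica("file1.DB", cost));
    }
}
```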
28 SQLDatabaseService
This service makes it possible to efficiently store, retrieve, and query very large amounts of metadata held in any type of local or remote RDBMS. The database can be used to implement catalogs.
Spitfire project: a set of Grid-enabled database middleware services for access to relational databases.
29 State of the Art in 2002
- The concepts discussed so far are implemented by several SDKs, e.g. Globus (U.S.), UNICORE (EU project), European Data Grid (EU project), etc.
- Well known only in scientific circles, with a focus on big-science applications.
- Almost no integration of database technologies; flat files are used instead.
- Need to move closer to everyday life (e-business, medicine, etc.).
- Web developments, in particular Web Service technologies, were ignored.
- Large companies (IBM, Sun, Microsoft, etc.) are now starting to take part as well.
- Direction: Web Services
30 Integration of Grid and Web Services: Open Grid Services Architecture - OGSA
- Integration of Grid and Web technologies: at first only an initiative of the Globus project and IBM, now a task of the Global Grid Forum.
- Extension of Web Service standards such as SOAP and WSDL with the open specifications of Globus.
- OGSA: a set of specifications and standards intended to combine the advantages of Grid computing with those of Web Services. The aim is to create a platform that makes the shared use of application and computing resources over the Internet attractive for the commercial sector as well.
- The new set of OGSA specifications extends standards such as XML, WSDL, and SOAP with Grid computing standards developed by the Globus project team.
31 Web Services: The Construction Kit for Distributed Systems
- Software service: accepts a digital request (query, etc.) and returns a digital response.
- Web service: short for "Web of Services".
32 The Evolution of Software Services
Programs in assembler, C, etc.: component functions communicate within a single memory space.
Remote programs can collaborate.
33 The Evolution of Software Services (2)
Two LANs: one uses CORBA and one uses DCOM.
Two connected LANs: they use a CORBA/DCOM bridge.
34 The Evolution of Software Services (3)
35 The Evolution of Software Services (4)
SOAP is a universal protocol that connects everything.
36 SOAP: Simple Object Access Protocol
The new standard for network communication between software services. SOAP messages are XML documents sent over HTTP.
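To make "a SOAP message is an XML document" concrete, the following sketch builds a SOAP 1.1 envelope for a `buy` request and reads it back with the JDK's DOM parser. It is a minimal sketch under stated assumptions: the application namespace `urn:example:trader` and the element names `buy`, `quantity`, and `symbol` are hypothetical choices modeled on the stock-purchase example in these slides, and nothing is actually sent over HTTP.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Illustrative sketch only: builds and inspects a SOAP 1.1 message, without any network I/O.
final class SoapMessageSketch {

    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"; // SOAP 1.1 envelope
    static final String APP_NS = "urn:example:trader"; // hypothetical application namespace

    /** Builds the XML that a client would place in the body of an HTTP POST. */
    static String buildBuyRequest(int quantity, String symbol) {
        // Note: symbol is not XML-escaped here, which is fine for plain ticker symbols.
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<soap:Envelope xmlns:soap=\"" + SOAP_NS + "\">"
                + "<soap:Body>"
                + "<t:buy xmlns:t=\"" + APP_NS + "\">"
                + "<quantity>" + quantity + "</quantity>"
                + "<symbol>" + symbol + "</symbol>"
                + "</t:buy>"
                + "</soap:Body>"
                + "</soap:Envelope>";
    }

    /** Parses a message with a namespace-aware DOM parser. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return factory.newDocumentBuilder().parse(new ByteArrayInputStream(bytes));
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed XML message", e);
        }
    }

    /** The operation name is the local name of the first element inside soap:Body. */
    static String operationName(Document doc) {
        Element body = (Element) doc.getElementsByTagNameNS(SOAP_NS, "Body").item(0);
        for (Node n = body.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                return n.getLocalName();
            }
        }
        return null;
    }

    /** Text content of the first element with the given (unprefixed) tag name. */
    static String text(Document doc, String tag) {
        return doc.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) {
        String message = buildBuyRequest(54, "IBM");
        System.out.println(message);
        Document doc = parse(message);
        System.out.println(operationName(doc) + " " + text(doc, "quantity") + " " + text(doc, "symbol"));
    }
}
```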
37 SOAP: Simple Object Access Protocol (2)
A SOAP processor converts XML messages into native calls.
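The conversion step can be illustrated with Java reflection: once the operation name and the textual arguments have been taken out of the XML message, the processor looks up a matching method and invokes it. This is a minimal sketch under stated assumptions: `dispatch` and `convert` are hypothetical helpers, the nested `Trader` class is a stand-in for a published service object, and real SOAP processors handle far more types, overloading, and fault reporting.

```java
import java.lang.reflect.Method;

// Illustrative sketch only: a toy "SOAP processor" step that turns text into a native call.
final class SoapDispatchSketch {

    /** Stand-in for a published service object (hypothetical, modeled on the slides' Trader). */
    public static final class Trader {
        public float buy(int quantity, String symbol) {
            return 117.4f * quantity; // fixed demo price per share
        }
    }

    /** Converts one textual argument from the message into the Java parameter type. */
    static Object convert(String raw, Class<?> type) {
        if (type == int.class) {
            return Integer.parseInt(raw);
        }
        if (type == float.class) {
            return Float.parseFloat(raw);
        }
        if (type == String.class) {
            return raw;
        }
        throw new IllegalArgumentException("unsupported parameter type: " + type);
    }

    /** Finds a public method by name and argument count, converts the arguments, invokes it. */
    static Object dispatch(Object target, String operation, String... rawArgs) {
        for (Method m : target.getClass().getMethods()) {
            if (m.getName().equals(operation) && m.getParameterCount() == rawArgs.length) {
                Class<?>[] types = m.getParameterTypes();
                Object[] args = new Object[rawArgs.length];
                for (int i = 0; i < rawArgs.length; i++) {
                    args[i] = convert(rawArgs[i], types[i]);
                }
                try {
                    return m.invoke(target, args);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("invocation failed", e);
                }
            }
        }
        throw new IllegalArgumentException("no such operation: " + operation);
    }

    public static void main(String[] args) {
        // "buy", "54", "IBM" would come from the parsed XML message.
        Object result = dispatch(new Trader(), "buy", "54", "IBM");
        System.out.println("result = " + result);
    }
}
```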
38 SOAP: Simple Object Access Protocol (3)
The client needs the WSDL before it calls the service.
39 Publishing a Service
Example: stock purchase, using the GLUE package
ITrader.java
    package example.soap;

    // An interface for buying stock
    public interface ITrader {
        /**
         * Purchase the specified stock.
         * @param quantity The number of shares to purchase.
         * @param symbol The ticker symbol of the company.
         * @throws TradeException if the symbol is not recognized.
         * @return The cost of the purchase.
         */
        float buy(int quantity, String symbol) throws TradeException;
    }
40 Publishing a Service (2)
Trader.java
    package example.soap;

    public class Trader implements ITrader {
        public float buy(int quantity, String symbol) throws TradeException {
            // fixed demo price per share for the two known symbols
            if (symbol.equals("IBM")) {
                return 117.4f * quantity;
            } else if (symbol.equals("MSFT")) {
                return 117.4f * quantity;
            } else {
                throw new TradeException("symbol " + symbol + " not recognized");
            }
        }
    }
41 Publishing a Service (3)
TraderServer.java
    package example.soap;

    import -----.registry.Registry;
    import -----.server.http.HTTP;

    public class TraderServer {
        public static void main(String[] args) throws Exception {
            // start a web server on port 8003, accept messages via /soap
            HTTP.startup("http://localhost:8003/soap");
            // publish an instance of Trader
            Registry.publish("trader", new Trader());
        }
    }
42 Binding to a Web Service
Once an object has been published as a Web service, a SOAP client can bind to it and invoke it.
TraderClient.java
    package example.soap;

    import -----.registry.Registry;

    public class TraderClient {
        public static void main(String[] args) throws Exception {
            // the URL of the web service WSDL file
            String url = "http://localhost:8003/soap/trader.wsdl";
            // read the WSDL file and bind to its associated web service
            ITrader trader = (ITrader) Registry.bind(url, ITrader.class);
            // invoke the web service as if it was a local object
            float ibmCost = trader.buy(54, "IBM");
            System.out.println("IBM cost is " + ibmCost);
        }
    }
43 The Client Proxy
The binding process returns a proxy that implements a Java interface whose methods mirror the methods of the remote service.
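One way to picture such a proxy is with `java.lang.reflect.Proxy` from the JDK. The following is a minimal sketch under stated assumptions: it is not how GLUE implements binding, the nested `ITrader` interface is a simplified copy without the checked exception, and instead of serializing the call to SOAP and sending it over HTTP the handler only records the call and returns a canned reply.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: a dynamic proxy standing in for a remote service.
final class ClientProxySketch {

    /** Simplified local copy of the service interface (hypothetical). */
    interface ITrader {
        float buy(int quantity, String symbol);
    }

    /** What the proxy "sent"; a real proxy would put a SOAP message on the wire instead. */
    static final List<String> sentMessages = new ArrayList<>();

    /** Returns an object that implements ITrader but forwards every call to the handler. */
    static ITrader bind() {
        InvocationHandler handler = (proxy, method, args) -> {
            if (!method.getName().equals("buy")) {
                throw new UnsupportedOperationException(method.getName());
            }
            sentMessages.add(method.getName() + Arrays.toString(args));
            // Canned reply in place of the remote service's response.
            return 117.4f * (Integer) args[0];
        };
        return (ITrader) Proxy.newProxyInstance(
                ITrader.class.getClassLoader(),
                new Class<?>[] { ITrader.class },
                handler);
    }

    public static void main(String[] args) {
        ITrader trader = bind();
        float cost = trader.buy(54, "IBM"); // looks like a local call
        System.out.println("IBM cost is " + cost);
        System.out.println("recorded calls: " + sentMessages);
    }
}
```

The client code only sees the interface, which is why the remote call in the client listing reads like a call on a local object.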
44 WSDL
- WSDL: Web Services Description Language
- WSDL describes what a Web service can do, where it is located, and how it can be invoked.
- In theory, an application can choose the best service among several services.
45 UDDI
- UDDI: Universal Description, Discovery, and Integration
- UDDI makes it possible to publish and query information about services.
- Example: ACME credit checks
46 UDDI (2)
UDDI acts as a matchmaker between service providers and service consumers.
47 UDDI (3)
Figure (diagram not reproduced): the service provider publishes to the service registry, the service consumer finds the service there and then binds to the service provider.
48 UDDI (4)
Public UDDI operators synchronize their contents regularly.
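The publish / find / bind triangle can be sketched with an in-memory registry. This is a minimal sketch under stated assumptions: `ServiceRegistrySketch`, its `Entry` type, the category strings, and the example URLs are hypothetical and are not the UDDI data model or API; the bind step is only indicated in a comment because it requires a SOAP toolkit.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: an in-memory stand-in for a service registry.
final class ServiceRegistrySketch {

    /** What a provider publishes: who offers which kind of service, and where its WSDL is. */
    static final class Entry {
        final String provider;
        final String category;
        final String wsdlUrl;

        Entry(String provider, String category, String wsdlUrl) {
            this.provider = provider;
            this.category = category;
            this.wsdlUrl = wsdlUrl;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    /** Publish: called by the service provider. */
    void publish(String provider, String category, String wsdlUrl) {
        entries.add(new Entry(provider, category, wsdlUrl));
    }

    /** Find: called by the service consumer; returns all entries in the category. */
    List<Entry> find(String category) {
        List<Entry> matches = new ArrayList<>();
        for (Entry e : entries) {
            if (e.category.equals(category)) {
                matches.add(e);
            }
        }
        return matches;
    }

    /** Example content; provider names and URLs are made up. */
    static ServiceRegistrySketch exampleRegistry() {
        ServiceRegistrySketch registry = new ServiceRegistrySketch();
        registry.publish("ACME", "credit-check", "http://acme.example/credit.wsdl");
        registry.publish("OtherBank", "credit-check", "http://otherbank.example/credit.wsdl");
        registry.publish("ACME", "stock-quotes", "http://acme.example/quotes.wsdl");
        return registry;
    }

    public static void main(String[] args) {
        for (Entry e : exampleRegistry().find("credit-check")) {
            // Bind: the consumer would now read e.wsdlUrl and bind to the provider directly.
            System.out.println(e.provider + " -> " + e.wsdlUrl);
        }
    }
}
```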
49 Open Grid Services Architecture - OGSA
- Classification
  - Radical refactoring of the old Grid (Globus 2)
  - Integration of technologies from the Grid and Web Service communities
- Goals
  - Enabling distributed, heterogeneous, and dynamic VOs
  - Efficient resource sharing
  - Platform and programming-language independence
  - Based on open standards
  - Virtualization
  - e-business and e-science applications, including commercial use
  - Built on modern technologies (Web Services, Grid technologies)
- Players: who is behind it?
  - Global Grid Forum (originally initiated by Globus and IBM), in addition ANL, NASA, US DOE, US NSF, HP-Compaq, Intel, Microsoft, Sun, ...
50 Service - Grid Service - Grid
- A service is a network-enabled entity that offers its functionality through message exchange.
- A Grid Service is a Web Service that implements the OGSA interfaces described in WSDL and follows the associated conventions.
- A Grid is an extensible, dynamic set of individual Grid Services that can be combined in different ways to meet the individual requirements of VOs.
51 OGSA Architecture
Client
Definition: WSDL, messages: e.g. SOAP, transport: e.g. HTTP
Grid Service
Factory
Web Service
Business logic (Web Service community)
OGSA (Grid community)
Notification
serviceData
Conventions
Hosting Environment
Hardware
52 Grid and Web Services Convergence?
Figure: Grid and Web timelines, 1991 → 2004
GT: Globus Toolkit, OGSI: Open Grid Services Infrastructure
However, despite enthusiasm for OGSI, adoption within the Web community turned out to be problematic.
53 Web Service Invocation
54 Web Service Invocation (2)
55 The Idea of Grid Services
- Web Services are stateless and persistent.
- Grid Services are stateful and persistent or transient.
- Lifecycle Management
- Notifications
- Service Data Elements
- GSH / GSR (Grid Service Handle / Grid Service Reference)
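The difference between a stateless operation and a stateful, transient service instance with an explicit lifecycle can be sketched as follows. This is a minimal sketch under stated assumptions: `Factory`, `Accumulator`, and the handle strings such as `instance-1` are hypothetical and only loosely mirror the OGSI ideas of factories, service data, and handles; real Grid Service Handles are URIs and instances live in a hosting environment.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: stateless call vs. stateful, transient instances behind a factory.
final class GridServiceSketch {

    /** Stateless: the result depends only on the arguments of this one call. */
    static int statelessAdd(int a, int b) {
        return a + b;
    }

    /** Stateful instance: remembers a running total between invocations. */
    static final class Accumulator {
        private int value; // plays the role of a service data element
        private boolean destroyed;

        int add(int x) {
            ensureAlive();
            value += x;
            return value;
        }

        int getValue() {
            ensureAlive();
            return value;
        }

        void destroy() {
            destroyed = true;
        }

        private void ensureAlive() {
            if (destroyed) {
                throw new IllegalStateException("instance has been destroyed");
            }
        }
    }

    /** Factory: creates transient instances and hands out a handle for each of them. */
    static final class Factory {
        private final Map<String, Accumulator> instances = new HashMap<>();
        private int next = 1;

        String create() {
            String handle = "instance-" + next++; // hypothetical handle format
            instances.put(handle, new Accumulator());
            return handle;
        }

        /** Resolves a handle to the live instance (loosely, handle -> reference). */
        Accumulator resolve(String handle) {
            Accumulator instance = instances.get(handle);
            if (instance == null) {
                throw new IllegalArgumentException("unknown handle: " + handle);
            }
            return instance;
        }

        /** Lifecycle management: explicit destruction ends the instance's lifetime. */
        void destroy(String handle) {
            Accumulator instance = instances.remove(handle);
            if (instance != null) {
                instance.destroy();
            }
        }
    }

    public static void main(String[] args) {
        Factory factory = new Factory();
        String handle = factory.create();
        factory.resolve(handle).add(5);
        System.out.println(factory.resolve(handle).add(2)); // state carried over from the first call
        factory.destroy(handle);
    }
}
```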
56 Terminology
- Web Service: A software component identified by a URI [RFC 2396], whose public interfaces and bindings are defined and described using XML. Its definition can be discovered by other software systems. These systems may then interact with the Web service in a manner prescribed by its definition, using XML-based messages conveyed by Internet protocols. (Web Services Glossary, WS-Arch W3C Working Group, Draft 14 May 2003)
- Web Service Consumer: A software component that sends messages to a Web Service.
- Stateful Web Service: A Web Service that maintains some state between different operation invocations issued by the same or different Web Service Consumers.
- Grid Service(s): A general term used to refer to all aspects of OGSI. The term Grid Service is sometimes used to refer to a Grid Service Description document and/or a Grid Service Instance for a particular service.
- Grid Service Description: A WSDL(-like) document that defines the interface of Grid Service Instances. The defined interface must extend the OGSI GridService portType.
- Grid Service Instance: A stateful Web service whose interface adheres to that defined by a Grid Service Description and whose lifetime management properties are well defined.
- Service Data Element: An attribute-like construct exposing state information through operations defined by the GridService portType.
- Grid Service Handle: A URI that permanently identifies a Grid Service Instance.
- Grid Service Reference: A temporal, binding-specific endpoint that provides access to a Grid Service Instance.
57 Grid Service, OGSA, OGSI, GT3
OGSA: Open Grid Services Architecture
58 Grid Service, OGSA, OGSI, GT3 (2)
- Grid Services are defined by OGSA. The Open Grid Services Architecture (OGSA) aims to define a new common and standard architecture for grid-based applications. Right at the center of this new architecture is the concept of a Grid Service. OGSA defines what Grid Services are, what they should be capable of, and what types of technologies they should be based on, but it doesn't give a technical and detailed specification (which would be needed to implement a Grid Service).
- Grid Services are specified by OGSI. The Open Grid Services Infrastructure is a formal and technical specification of the concepts described in OGSA, including Grid Services.
- The Globus Toolkit 3 is an implementation of OGSI. GT3 is a usable implementation of everything that is specified in OGSI (and, therefore, of everything that is defined in OGSA).
- Grid Services are based on Web Services. Grid Services are an extension of Web Services. We'll see what Web Services are in the next page, and what Grid Services are in the page after that.
- I still don't get it: what is the difference between OGSA, OGSI, and GT3? Consider the following simple example. Suppose you want to build a new house. The first thing you need to do is to hire an architect to draw up all the plans, so you can get an idea of what your house will look like. Once you're happy with the architect's job, it's time to hire an engineer who will make detailed blueprints that specify construction details (like where to put the master beams, the power cables, the plumbing, etc.). The engineer then passes all those blueprints to qualified professional workers (construction workers, electricians, plumbers, etc.) who will actually build the house. We could say that OGSA (the definition) is the architect, OGSI (the specification) is the engineer, and GT3 (the implementation) is the workers.
59 OGSA - GridService
60 GT 3 Architecture I
- Grid Services, which we have already seen, are the 'GT3 Core' layer. Let's take a look at the rest of the layers from the bottom up.
- GT3 Security Services: Security is an important factor in grid-based applications. GT3 Security Services can help us restrict access to our Grid Services, so only authorized clients can use them. For example, we said that only our New York, Los Angeles, and Seattle offices could access MathService. We want to make sure only those offices have access to MathService and, of course, we want all the data exchanged between MathService and clients to be encrypted so we can keep malicious users from intercepting our data. Besides the usual security measures (putting the web server behind a firewall, etc.) GT3 gives us one more layer of security with technologies such as SSL and X.509 digital certificates.
- GT3 Base Services: This layer actually includes a whole lot of interesting services.
- Managed Job Service: Suppose some particular operation in MathService might take hours or even days to be done. Of course, we don't want to simply stand in front of a computer waiting for the result to arrive (especially if, after 8 hours of waiting, all we get might simply be an error message!). We need to be able to check on the progress of the operation periodically, and have some control over it (pause it, stop it, etc.). This is usually called job management (in this case, the term 'job' is used instead of 'operation'). The Managed Job Service allows us to treat our invocations like jobs, and manage them accordingly.
61 GT 3 Architecture II
- Index Service: Remember from a short introduction to Web Services that we usually know what type of Web Service we need, but we have no idea of where it is. This also happens with Grid Services: we might know we need a Grid Service which meets certain requirements, but we have no idea of what its location is. While this was solved in Web Services with UDDI, GT3 has its own Index Service. For example, we could have several dozen MathServices all around the country, each with different characteristics (some might be better suited for statistical analysis, while others might be better for performing simulations). The Index Service will allow us to query which MathService meets our particular requirements.
- Reliable File Transfer (RFT) Service: This service allows us to perform large file transfers between the client and the Grid Service. For example, suppose we have an operation in MathService which has to crunch several gigabytes of raw data (for a statistical analysis, for example). Of course, we're not going to send all that information as parameters. We'll be able to send it as a file. Furthermore, RFT guarantees the transfer will be reliable (hence its name). For example, if a file transfer is interrupted (due to a network failure, for example), RFT allows us to restart the file transfer from the moment it broke down, instead of starting all over again.
- GT3 Data Services: This layer includes Replica Management, which is very useful in applications that have to deal with very big sets of data. When working with large amounts of data, we're usually not interested in downloading the whole thing; we just want to work with a small part of all that data. Replica Management keeps track of those subsets of data we will be working with.
- Other Grid Services: Other non-GT3 services can run on top of the GT3 Architecture.
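The restart behaviour described for RFT (continuing from the point of interruption rather than from the beginning) comes down to remembering an offset. The following is a minimal sketch under stated assumptions: it is not how RFT is implemented, the method `transfer` and its parameters are hypothetical, the "file" is an in-memory byte array, and the network failure is simulated by stopping after a byte budget.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Illustrative sketch only: resuming a chunked transfer from a remembered offset.
final class ResumableTransferSketch {

    /**
     * Copies source[offset..] into sink in chunks and returns the new offset.
     * Once at least failAfterBytes bytes have been sent in this call, the method stops
     * before the next chunk to simulate an interrupted connection. The returned offset
     * is the checkpoint from which the next call resumes.
     */
    static int transfer(byte[] source, int offset, ByteArrayOutputStream sink,
                        int chunkSize, int failAfterBytes) {
        int sent = 0;
        while (offset < source.length) {
            if (sent >= failAfterBytes) {
                return offset; // simulated failure: report how far we got
            }
            int n = Math.min(chunkSize, source.length - offset);
            sink.write(source, offset, n);
            offset += n;
            sent += n;
        }
        return offset;
    }

    public static void main(String[] args) {
        byte[] data = new byte[10];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        ByteArrayOutputStream sink = new ByteArrayOutputStream();

        int checkpoint = transfer(data, 0, sink, 3, 4); // interrupted part-way
        System.out.println("interrupted at offset " + checkpoint);

        checkpoint = transfer(data, checkpoint, sink, 3, Integer.MAX_VALUE); // resume, no failure
        System.out.println("complete: " + Arrays.equals(data, sink.toByteArray()));
    }
}
```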
62 Service Data
63 Service Data
64 Service Data
65 Notification Interfaces
66 Motivation for Notifications
67 Pull-Notifications
68 Notifications in GT3
69 Challenge: Advanced Grid Applications. Example: Knowledge Discovery in Grid Databases
70 Motivation
Figure labels: Business; Medicine; Scientific experiments; Data and data exploration cloud; Simulations; Earth observations
71 The Knowledge Discovery Process
Figure labels (stages, from data to knowledge): Cleaning and Integration; Data Warehouse; Selection and Transformation; Data Mining; Evaluation and Presentation; Knowledge. Additional labels: OLAP; OLAP Queries; Online Analytical Mining.
72 The GridMiner Project in Vienna
- GridMiner: a knowledge discovery Grid infrastructure (http://www.gridminer.org/)
- OGSA-based architecture
- Workflow management
- Grid-aware data preprocessing and data mining services
- Data mediation service
- OLAP service
- GUI
- Implementation on top of Globus Toolkit 3.0
- Application: management of patients with traumatic brain injuries
73 GridMiner Architecture
GridMiner Workflow: GM DSCE (Dynamic Service Control)
GridMiner Core: GMPPS (Preprocessing), GMDMS (Data Mining), GMPRS (Presentation), GMDIS (Integration), GMOMS (OLAM)
GridMiner Base: GMMS (Mediation), GMIS (Information), GMRB (Resource Broker), GMCMS (OLAP / Cubes)
Grid Core: Grid Core Services, Security, File and Database Access Service, Replica Management
Fabric: Grid Resources, Data Sources
74 Collaboration of GM-Services
Example 3
75 The Control Layer
- Control Layer
  - Provision of the whole knowledge discovery process to a client
- Knowledge discovery process in GridMiner
  - Services to execute are not known in advance
  - Order of service execution
  - Sequential and concurrent execution
- Approaches investigated
  - Data Mining Query Language
  - Standard workflow orchestration approach (BPEL4WS, WSFL, GSFL, ...)
  - Our approach: Dynamic Service Control
76 The Control Layer: Standard Service Orchestration Approach (BPEL4WS)
77 Workflow Models
Composition by Service Publisher
Composition by Service Consumer
78 The Control Layer - Approaches: Dynamic Service Control
- Dynamic Service Control Language (DSCL)
  - based on XML
  - easy to use
  - supports OGSA Grid Services
  - specially designed to support knowledge discovery processes
- Dynamic Service Control Engine (DSCE)
  - processes the workflow according to DSCL
Figure labels: Client (notification sink); DSCL; subscribe; start, stop, resume; (re)connect; query results; notify; DSCE; Service A, Service B, Service C, Service D (OGSA Grid Services)
79 Dynamic Service Control Language (DSCL)
- Features
  - Control flow
    - concurrent execution of activities
    - sequential execution of activities
  - Activities
    - creation of new Grid Service Instances
    - invoking operations on Grid Service Instances
    - querying information of Grid Service Instances
    - destroying Grid Service Instances
80 DSCL - Structure
dscl
  variables
  composition
    create Service
    invoke
    query SDE
    create Service
    invoke
    query SDE
    create Service
    invoke
81 DSCL - Variables
- Initializing by simple type value
  xsi:type="xsd:int" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:ns1="http://ogsa.globus.org" 4711
- Initializing by arrays
  xsi:type="soapenc:Array" soapenc:arrayType="xsd:int[2]" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:ns1="http://ogsa.globus.org" 23 ncitem-112 le
82 DSCL Control Flow
Figure: act1 is followed by act2.1 and act2.2, which run in parallel.
dscl
  variables
  composition
    sequence
      createService activityID="act1"
      parallel
        invoke activityID="act2.1"
        invoke activityID="act2.2"
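The control flow on this slide (act1 first, then act2.1 and act2.2 concurrently) can be mimicked with two small combinators. This is a minimal sketch under stated assumptions: it is not the DSCE implementation, the helpers `sequence`, `parallel`, and `activity` are hypothetical, and each activity only appends its ID to a log instead of creating or invoking a Grid Service.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch only: sequential and concurrent composition of activities.
final class ControlFlowSketch {

    /** Runs the steps one after another, in the given order. */
    static Runnable sequence(Runnable... steps) {
        return () -> {
            for (Runnable step : steps) {
                step.run();
            }
        };
    }

    /** Runs all branches concurrently and waits until every branch has finished. */
    static Runnable parallel(Runnable... branches) {
        return () -> {
            // Assumes at least one branch; a fixed pool of size 0 is not allowed.
            ExecutorService pool = Executors.newFixedThreadPool(branches.length);
            try {
                List<Future<?>> futures = new ArrayList<>();
                for (Runnable branch : branches) {
                    futures.add(pool.submit(branch));
                }
                for (Future<?> future : futures) {
                    future.get(); // wait for completion, propagate failures
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            } finally {
                pool.shutdown();
            }
        };
    }

    /** Stand-in for createService / invoke: just records that the activity ran. */
    static Runnable activity(String id, List<String> log) {
        return () -> log.add(id);
    }

    /** The composition from the slide: sequence(act1, parallel(act2.1, act2.2)). */
    static List<String> runExample() {
        List<String> log = Collections.synchronizedList(new ArrayList<String>());
        sequence(
                activity("act1", log),
                parallel(activity("act2.1", log), activity("act2.2", log))
        ).run();
        return log;
    }

    public static void main(String[] args) {
        // act1 always comes first; the order of act2.1 and act2.2 may vary between runs.
        System.out.println(runExample());
    }
}
```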
83 Grid Database Access
84 Grid Database Access With OGSA-DAI (DAI: Data Access and Integration)
OGSA-DAI provides the Grid Database Service (GDS).
The GDS receives a query via a Perform Document.
The GDS engine processes the specified activities.
The GDS returns the results.
85 Grid Database Access With OGSA-DAI
86 Grid Data Mediation Service - Architecture
87 GDMS Example Scenario
- Heterogeneities
  - Name in A is Alexander Wöhrer
  - Name in C has to be combined
- Distribution
  - 3 data sources
88 Grid and Web Services Convergence: Yes!
Web Services Resource Framework - WSRF
Figure labels: Grid, Web
The definition of WSRF means that the Grid and Web communities can move forward on a common base.
First publications on WSRF: January 2004
89 WSRF
Service Requestor
Grid Service A
OGSI Grid Service
Service Requestor
Web Services
Resource A
Resource B
Resource C
Web Service and WS-Resource Combination in WS-RSFM
90 Literature
- F. Berman, G. Fox, T. Hey (Eds.): Grid Computing: Making the Global Infrastructure a Reality. Wiley, 2003.
- www.globus.org
- www.gridminer.org (our research project)
- Many documents on the Web