Data Mining Engineering - PowerPoint PPT Presentation


Title: Data Mining Engineering


1
Grid Computing
Concepts, Techniques, and Applications
Peter Brezany, Institut für Softwarewissenschaft, Universität Wien
Tel. 01/4277 38825, E-mail: brezany_at_par.univie.ac.at
Office hours: Tuesday 13.00-14.00
2
Learning Objectives
  • Motivation for Grids
  • Basic concepts
  • Existing architectures
  • New developments
  • Web Services
  • Integration of Web Services and Grid Services
  • OGSA (Open Grid Services Architecture)

3
Introduction
  • Grid Computing: a relatively new research area
  • Formerly known only in scientific circles and in big-science applications.
  • Now closer to everyday life (e-business, medicine, etc.)
  • Large companies (IBM, Sun, Microsoft) are now joining in as well.
  • Grid Computing is about the shared use of different kinds of resources:
    a modern sharing community

4
Introductory Visions
  • Example: water supply
  • Formerly: a private spring / well
  • Today: water collection point → pipes → tap
  • Example: energy supply
  • Formerly: a generator
  • Today: large generator → power lines → socket
  • Power Grid → Computational Grid / Grid Computing
  • (e.g. NASA Information Power Grid (www.ipg.nasa.gov))
  • Logical consequence, Grid Computing: computing power (and much more)
    from the socket
  • Many computers connected into one large network. Advantages:
  • Completely new possibilities for cooperation between companies
  • Hardware savings (renting) (cf. generator / spring)
  • Renting expensive software instead of buying it
  • Offering, for example, computing power oneself

5
Grid Computing Vision
  • "The Internet is about getting computers to talk together.
    Grid computing is about getting computers to work together."
  • Tom Hawk, IBM's general manager of Grid computing

6
Grid Computing Vision (2)
  • Tim Berners-Lee replies to the question "What did
    you have in mind when you first developed the
    Web?" by saying:
  • "The dream behind the Web is of a common
    information space in which we communicate by
    sharing information."
  • Applied to Grid computing, this sentence
    can be rephrased as:
  • "The dream behind Grid computing is of a common
    resource space in which we can work together
    using shared resources."

7
The Web Compared to the Grid
8
The Web Compared to the Grid (2)
9
The Web Compared to the Grid (3)
Source: Norman Paton
10
Grid Computing - Definition
  • Definition according to www.globus.org [1]
  • The Grid is an infrastructure that enables the integrated, collaborative
    use of resources. Resources are not only computing power and storage
    space: whole devices of any kind can be shared in the Grid, for example
    high-performance computers, networks, databases, telescopes, microscopes,
    and even electron accelerators. The goal of the Grid is to let users
    access devices as if they owned them, without having to buy them.
  • Characteristics of Grid applications
  • - Large data volumes
  • - High computational cost
  • Secure resource sharing between independent organizations
  • [1] Practically all major Grid projects build on the Globus middleware
    (1998: Globus 1, 2001: Globus 2, 2003: Globus 3, 2005: Globus 4)

11
Abstract Grid Architecture
[Figure labels: Grid nodes gn1 to gn6; compute nodes n1 to n4; I/O nodes io1, io2; IN; WN]
Legend: gn_i = Grid node, e.g. Computing Element (CE), Storage Element (SE), telescope, microscope, etc.; n_i = compute node; io_i = I/O node
Example: gn2 is a combined CE and SE
(research of Prof. Schikuta, Univ. Vienna)
12
The Grid Problem
  • The Grid problem
  • Coordinated resource sharing and joint problem solving in dynamic,
    multi-institutional organizations.
  • Sharing here means
  • Direct access to computers, software, data, devices, etc.
  • Sharing rules between providers and users define what is available
    to whom, how, and when.
  • A set of individuals and/or institutions bound by such sharing rules
    forms a VO (Virtual Organization)

13
Grid Prerequisites
  • Shared use of geographically separated resources
  • No common central site
  • No central control
  • No one is omniscient
  • No trust relationships among the participants
  • Complex requirements
  • Run program X on the computers of Y (contract P), with the data
    coming from Z (contract Q). Y and Z need not have any relationship
    with each other. (Delegation)

14
Virtual Organization (VO)
  • Purpose, goal, size, duration, structure, etc. vary
  • Requirements of VOs
  • Highly flexible sharing relationships (from client/server to P2P)
  • Sophisticated and precise control
  • Fine-grained and coarse-grained access control
  • Accounting
  • Scheduling

15
VO Example
  • A car manufacturer commissions
  • an application service provider (ASP) for financial forecasting,
  • a storage service provider (SSP) for (historical) data,
  • and cycle providers for the computing power needed for the analysis,
  • in order to carry out scenario analyses for a new factory (or site).

16
VO Example (2)
Figure: An actual organization can participate in
one or more VOs by sharing some or all of
its resources. We show three actual organizations
(the ovals), and two VOs: P, which links
participants in an aerospace design consortium,
and Q, which links colleagues who have agreed to
share spare computing cycles, for example to run
ray tracing computations. The organization on the
left participates in P, the one to the right
participates in Q, and the third is a member of
both P and Q. The policies governing access
to resources (summarized in quotes) vary
according to the actual organizations, resources,
and VOs involved.
17
Definitions: Protocol, Service, API, SDK
  • Protocol
  • A set of rules that endpoints of telecommunication systems follow
    to exchange information
  • A standard protocol ensures interoperability
  • Service
  • A network-enabled entity with a specific capability
  • Defined by its protocol and its reaction to a protocol message
  • (service = protocol + behavior)
  • Application Program Interface (API)
  • A standard interface for access to functionality (one protocol can
    have several APIs)
  • Enables portability
  • Software Development Kit (SDK)
  • Implements an API (e.g. Globus Toolkit)

18
Grid Protocol Architecture vs. IP Architecture
Application
19
Grid Architecture (1)
  • Fabric
  • (computers / file systems / archives / networks / sensors / ...)
  • (open, read, write, close, ...)
  • Hardly any restrictions at the low level, as long as the
    interfaces are satisfied
  • Connectivity (neck)
  • Communication (IP, DNS, routing, ...)
  • Security (Grid Security Infrastructure, GSI)
  • - Uniform authentication
  • - Single sign-on
  • - Delegation
  • - Public key technology

20
Grid Architecture (2)
  • Resource Layer (neck)
  • Grid Resource Allocation Management (GRAM)
  • Allocation, reservation, monitoring, and control of compute resources
  • GridFTP protocol (FTP extensions)
  • High-speed data access and transport
  • Grid Resource Information Service (GRIS)
  • Access to structure and status information
  • Network reservation, monitoring, and control
  • Builds on the Connectivity Layer (GSI, IP).

21
Grid Architecture (3)
  • Collective Layer
  • Global protocols and services
  • Builds on the neck and is completely independent of the resources
  • Directory services
  • Monitoring and diagnostic services
  • Data replication services
  • etc.
  • Applications
  • Use services of any layer

22
Hourglass Model: The Internet Today
23
Hourglass Model: Globus Grid
24
Data Grid
  • Original motivation: scientific applications are very data-intensive,
    and an enormous number of researchers from all over the world want
    fast access to these data.
  • Prospective applications of Data Grids: Medical Grids, e-business
    and e-commerce Grids.

25
Model Architecture for Data Grids
[Figure labels: Application; Attribute Specification; Metadata Catalog; Logical Collection and Logical File Name; Replica Catalog; Multiple Locations; Replica Selection; Selected Replica; MDS; NWS; Performance Information and Predictions; GridFTP commands; Replica Locations 1, 2, 3 (Disk Cache, Tape Library, Disk Array, Disk Cache)]
26
Storage Model
  • Two different kinds of files:
  • Master files (owned by their creators)
  • Replica files. There may be many replicas of a
    master file.
  • Replicas are owned by, managed by, and may be
    deleted by the Grid.
  • The notion of replicas is new, and critical in a
    Grid environment. Example:
  • Before a DataGrid job can run at site A, data at
    site B may need to be copied to site A.
  • This data may then be used by subsequent jobs at
    site A, or may be needed by jobs at site C, which
    has a better network connection to site A than
    to site B. For this reason, the data should be kept
    at site A as long as possible.
  • The ReplicaManager keeps track of all replica
    data so that the replica selection service can
    select the optimal replica to use for a given
    job, or request the creation of a new replica.

27
Data Replication Across Grid Nodes
By providing a copy (replica) of a data item
close to a client application, access times can
be reduced. Replication can also help in load
balancing and can improve reliability.
[Figure: the Replica Catalog maps the logical name file1.DB to the physical names X/data/file1.DB, Y/data/file1.DB, and Z/data/file1.DB, where X, Y, Z are Grid sites]
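The mapping from a logical name to its physical replicas, followed by replica selection, can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class ReplicaCatalogSketch, its methods, and the per-site cost table are hypothetical and are not the API of Globus or the European DataGrid software. Only the name file1.DB and the sites X, Y, Z are taken from the slide.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; not the Globus or DataGrid replica catalog API.
class ReplicaCatalogSketch {
    // Logical file name -> physical replica names of the form "site/path".
    private final Map<String, List<String>> replicas = new HashMap<>();

    void register(String logicalName, String physicalName) {
        replicas.computeIfAbsent(logicalName, k -> new ArrayList<>()).add(physicalName);
    }

    List<String> lookup(String logicalName) {
        return replicas.getOrDefault(logicalName, new ArrayList<>());
    }

    // Replica selection: choose the replica at the site with the lowest cost.
    // The cost table is a hypothetical stand-in for performance predictions.
    String select(String logicalName, Map<String, Integer> costBySite) {
        String best = null;
        int bestCost = Integer.MAX_VALUE;
        for (String physical : lookup(logicalName)) {
            int slash = physical.indexOf('/');
            String site = slash < 0 ? physical : physical.substring(0, slash);
            int cost = costBySite.getOrDefault(site, Integer.MAX_VALUE);
            if (best == null || cost < bestCost) {
                best = physical;
                bestCost = cost;
            }
        }
        return best; // null if no replica is registered
    }

    // The three replicas shown on the slide.
    static ReplicaCatalogSketch example() {
        ReplicaCatalogSketch catalog = new ReplicaCatalogSketch();
        catalog.register("file1.DB", "X/data/file1.DB");
        catalog.register("file1.DB", "Y/data/file1.DB");
        catalog.register("file1.DB", "Z/data/file1.DB");
        return catalog;
    }

    public static void main(String[] args) {
        Map<String, Integer> cost = new HashMap<>();
        cost.put("X", 40);
        cost.put("Y", 5);
        cost.put("Z", 20);
        System.out.println(example().select("file1.DB", cost));
    }
}
```

With the assumed costs, the client is directed to the replica at site Y, the cheapest site in the table. A real replica selection service would derive such costs from monitoring data rather than from a fixed table.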
28
SQLDatabaseService
This service makes it possible to efficiently store, retrieve,
and query very large amounts of metadata held in
any type of local or remote RDBMS. The database
can be used for the implementation of
catalogs. Spitfire project: a set of Grid-enabled
database middleware services for access to
relational databases.
29
State of the Art in 2002
  • The concepts discussed so far are implemented by several SDKs,
    e.g. Globus (U.S.), UNICORE (EU project), European Data Grid
    (EU project), etc.
  • Well known only in scientific circles, with a focus on
    big-science applications.
  • Almost no integration of database technologies; flat files are used.
  • Need to move closer to everyday life (e-business, medicine, etc.).
  • Web developments and Web Service technologies are ignored.
  • Large companies (IBM, Sun, Microsoft, etc.) are now beginning
    to join in as well.
  • Direction: Web Services

30
Integration of Grid and Web Services: Open Grid
Services Architecture (OGSA)
  • Integration of Grid and Web technologies: at first only an
    initiative of the Globus project and IBM, now a task of the
    Global Grid Forum.
  • Extension of Web Service standards such as SOAP and WSDL with
    the open specifications of Globus.
  • OGSA: a set of specifications and standards intended to combine
    the advantages of Grid computing with those of Web Services.
    The aim is to create a platform that makes sharing applications
    and computing resources over the Internet attractive for the
    commercial sector as well.
  • The new set of OGSA specifications extends standards such as
    XML, WSDL, and SOAP with Grid computing standards developed by
    the Globus project team.

31
Web Services: The Building Blocks for Distributed Systems
  • A software service accepts a digital request (query, etc.)
    and returns a digital response.
  • Web service: short for Web of Services

32
The Evolution of Software Services
Programs in assembler, C, etc.: component functions
communicate within a single memory space.
Remote programs can collaborate.
33
The Evolution of Software Services (2)
2 LANs, one using CORBA and one using DCOM
2 connected LANs: they use a
CORBA/DCOM bridge
34
The Evolution of Software Services (3)
35
The Evolution of Software Services (4)
SOAP is a universal protocol that connects
everything.
36
SOAP: Simple Object Access Protocol
The new standard for network communication
between software services. SOAP messages are
XML documents sent over HTTP.
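To make "an XML document sent over HTTP" concrete, the following Java sketch builds a SOAP 1.1 envelope for a buy request and reads the operation name back out of the body. It is a minimal illustration, not GLUE code: the class SoapEnvelopeSketch, the service namespace urn:example:trader, and the element names quantity and symbol are assumptions. Only the SOAP 1.1 envelope namespace is standard. The HTTP transport itself is left out.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Illustrative sketch only; a SOAP toolkit normally generates and parses this.
class SoapEnvelopeSketch {
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";
    static final String TRADER_NS = "urn:example:trader"; // hypothetical namespace

    // Builds the request document; symbol is assumed to be a plain ticker
    // symbol that needs no XML escaping.
    static String buyRequest(int quantity, String symbol) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<soap:Envelope xmlns:soap=\"" + SOAP_NS + "\">"
            + "<soap:Body>"
            + "<t:buy xmlns:t=\"" + TRADER_NS + "\">"
            + "<quantity>" + quantity + "</quantity>"
            + "<symbol>" + symbol + "</symbol>"
            + "</t:buy>"
            + "</soap:Body>"
            + "</soap:Envelope>";
    }

    // What a SOAP processor does first: find the operation element in the body.
    static String operationName(String envelope) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(envelope.getBytes(StandardCharsets.UTF_8)));
            Node body = doc.getElementsByTagNameNS(SOAP_NS, "Body").item(0);
            for (Node n = body.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    return n.getLocalName();
                }
            }
            return null; // empty body
        } catch (Exception e) {
            throw new IllegalStateException("not a well-formed SOAP envelope", e);
        }
    }

    public static void main(String[] args) {
        String request = buyRequest(54, "IBM");
        System.out.println(request);
        System.out.println("operation: " + operationName(request));
    }
}
```

In a deployment the envelope would be the body of an HTTP POST to the service endpoint. The sketch stops at the document level to stay independent of any particular toolkit.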
37
SOAP: Simple Object Access Protocol (2)
A SOAP processor converts XML messages into
native calls.
38
SOAP: Simple Object Access Protocol (3)
The client needs the WSDL before it calls the
service.
39
Publishing a Service
Example: buying stock, using the GLUE package
ITrader.java
package example.soap;

// An interface for buying stock
public interface ITrader {
  /**
   * Purchase the specified stock.
   * @param quantity The number of shares to purchase.
   * @param symbol The ticker symbol of the company.
   * @throws TradeException if the symbol is not recognized.
   * @return The cost of the purchase.
   */
  float buy(int quantity, String symbol) throws TradeException;
}
40
Publishing a Service (2)
Trader.java
package example.soap;

// TradeException is not shown on the slides; it is assumed to be
// an exception class defined elsewhere in example.soap.
public class Trader implements ITrader {
  public float buy(int quantity, String symbol) throws TradeException {
    if (symbol.equals("IBM")) {
      return 117.4f * quantity;
    } else if (symbol.equals("MSFT")) {
      return 117.4f * quantity;
    } else {
      throw new TradeException("symbol " + symbol + " not recognized");
    }
  }
}
41
Publishing a Service (3)
TraderServer.java
package example.soap;

import -----.registry.Registry;
import -----.server.http.HTTP;

public class TraderServer {
  public static void main(String[] args) throws Exception {
    // start a web server on port 8003, accept messages via /soap
    HTTP.startup("http://localhost:8003/soap");
    // publish an instance of Trader
    Registry.publish("trader", new Trader());
  }
}
42
Binding to a Web Service
Once an object has been published as a Web service,
a SOAP client can bind to it and invoke it.
TraderClient.java
package example.soap;

import -----.registry.Registry;

public class TraderClient {
  public static void main(String[] args) throws Exception {
    // the URL of the web service WSDL file
    String url = "http://localhost:8003/soap/trader.wsdl";
    // read the WSDL file and bind to its associated web service
    ITrader trader = (ITrader) Registry.bind(url, ITrader.class);
    // invoke the web service as if it was a local object
    float ibmCost = trader.buy(54, "IBM");
    System.out.println("IBM cost is " + ibmCost);
  }
}
43
The Client Proxy
The binding process returns a proxy that
implements a Java interface whose methods
mirror the methods of the remote object.
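How a proxy can implement an interface it has never seen at compile time can be shown with the dynamic proxies of the Java standard library (java.lang.reflect.Proxy). The sketch below is not how GLUE is implemented internally, which the slides do not show. The class ClientProxySketch, the simplified ITrader interface without TradeException, and the canned reply are assumptions for illustration. Where a real proxy would send a SOAP message and wait for the answer, this one only records the call.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; a real SOAP proxy would serialize each call to XML.
class ClientProxySketch {
    // Simplified version of the ITrader interface from the slides.
    interface ITrader {
        float buy(int quantity, String symbol);
    }

    // Stand-in for the network: every intercepted call is recorded here.
    static final List<String> sentMessages = new ArrayList<>();

    // Returns an object that implements ITrader without any hand-written class.
    static ITrader bind() {
        InvocationHandler handler = (proxy, method, args) -> {
            // A real proxy would build and send a SOAP request at this point.
            sentMessages.add(method.getName() + Arrays.toString(args));
            // Canned reply standing in for the remote service's answer.
            return 117.4f * (Integer) args[0];
        };
        return (ITrader) Proxy.newProxyInstance(
            ITrader.class.getClassLoader(),
            new Class<?>[] { ITrader.class },
            handler);
    }

    public static void main(String[] args) {
        ITrader trader = bind();
        float ibmCost = trader.buy(54, "IBM"); // looks like a local call
        System.out.println("IBM cost is " + ibmCost);
        System.out.println("recorded calls: " + sentMessages);
    }
}
```

The client code only ever sees the ITrader interface, which is why the call reads exactly like a local method invocation.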
44
WSDL
  • WSDL: Web Services Description Language
  • WSDL describes what a Web Service can do, where it is located,
    and how it can be invoked.
  • In theory, an application can choose the best service for its
    needs from among several services.

45
UDDI
  • UDDI: Universal Description, Discovery, and
    Integration
  • UDDI makes it possible to publish and query
    information about services.
  • Example: ACME credit checks

46
UDDI (2)
UDDI acts as a matchmaker between
service providers and consumers.
47
UDDI (3)
[Figure: the Service Provider publishes to the Service Registry; the Service Consumer finds the service in the registry and then binds to the Service Provider]
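The publish / find / bind cycle can be sketched with an in-memory registry. This is an illustration of the roles only and does not use the UDDI API: the class ServiceRegistrySketch, the category strings, and the URL http://acme.example/credit.wsdl are hypothetical. The ACME credit check and the trader WSDL URL echo examples from earlier slides.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not the UDDI data model or API.
class ServiceRegistrySketch {
    static final class Entry {
        final String provider;
        final String category;
        final String wsdlUrl;

        Entry(String provider, String category, String wsdlUrl) {
            this.provider = provider;
            this.category = category;
            this.wsdlUrl = wsdlUrl;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Publish: a service provider registers a description of its service.
    void publish(String provider, String category, String wsdlUrl) {
        entries.add(new Entry(provider, category, wsdlUrl));
    }

    // Find: a service consumer queries the registry by category.
    List<Entry> find(String category) {
        List<Entry> hits = new ArrayList<>();
        for (Entry e : entries) {
            if (e.category.equals(category)) {
                hits.add(e);
            }
        }
        return hits;
    }

    static ServiceRegistrySketch example() {
        ServiceRegistrySketch registry = new ServiceRegistrySketch();
        registry.publish("ACME", "credit-check", "http://acme.example/credit.wsdl");
        registry.publish("Trader", "stock-trading", "http://localhost:8003/soap/trader.wsdl");
        return registry;
    }

    public static void main(String[] args) {
        for (Entry e : example().find("credit-check")) {
            // Bind: the consumer would now read the WSDL at this URL
            // and bind to the provider directly.
            System.out.println(e.provider + " -> " + e.wsdlUrl);
        }
    }
}
```

The registry is involved only in the first two steps. After the consumer has the WSDL location, it talks to the provider directly.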
48
UDDI (4)
Public UDDI operators synchronize
their contents regularly.
49
Open Grid Services Architecture (OGSA)
  • Classification
  • A radical refactoring of the old Grid (Globus 2)
  • Integration of technologies from the Grid and Web Service communities
  • Goals
  • Enable distributed, heterogeneous, and dynamic VOs
  • Efficient resource sharing
  • Platform and programming language independence
  • Based on open standards
  • Virtualization
  • e-business and e-science applications, including commercial use
  • Built on modern technologies (Web Services, Grid technologies)
  • Players: who is behind it?
  • Global Grid Forum (originally initiated by Globus and IBM), plus
    ANL, NASA, US DOE, US NSF, HP-Compaq, Intel, Microsoft, Sun, ...

50
Service - Grid Service - Grid
  • A service is a network-enabled entity that offers its
    functionality through message exchange.
  • A Grid Service is a Web Service that implements the OGSA
    interfaces described in WSDL and follows the associated
    conventions.
  • A Grid is an extensible, dynamic set of individual Grid Services
    that can be combined in different ways to meet the individual
    requirements of VOs.

51
OGSA Architecture
[Figure labels: Client; Definition: WSDL; Messages: e.g. SOAP; Transport: e.g. HTTP; Grid Service; Factory; Web Service; business logic (Web Service community); OGSA (Grid community): Notification, serviceData, conventions; Hosting Environment; Hardware]
52
Grid and Web Services Convergence?
[Figure: timeline from 1991 to 2004 with two lines of development, Grid and Web]
GT: Globus Toolkit; OGSI: Open Grid Services Infrastructure.
However, despite enthusiasm for OGSI, adoption within the Web
community turned out to be problematic.
53
Web Service Invocation
54
Web Service Invocation (2)
55
The Idea of Grid Services
  • Web Services are stateless and persistent.
  • Grid Services are stateful and persistent or
    transient.
  • Lifecycle Management
  • Notifications
  • Service Data Elements
  • GSH / GSR

56
Terminology
  • Web Service: A software component identified by a
    URI [RFC 2396], whose public interfaces and
    bindings are defined and described using XML. Its
    definition can be discovered by other software
    systems. These systems may then interact with the
    Web service in a manner prescribed by its
    definition, using XML-based messages conveyed by
    Internet protocols. (Web Services Glossary,
    WS-Arch W3C Working Group, Draft 14 May 2003)
  • Web Service Consumer: A software component that
    sends messages to a Web Service.
  • Stateful Web Service: A Web Service that
    maintains some state between different operation
    invocations issued by the same or different Web
    Service Consumers.
  • Grid Service(s): A general term used to refer to
    all aspects of OGSI. The term Grid Service is
    sometimes used to refer to a Grid Service
    Description document and/or a Grid Service
    Instance for a particular service.
  • Grid Service Description: A WSDL(-like) document
    that defines the interface of Grid Service
    Instances. The defined interface must extend the
    OGSI GridService portType.
  • Grid Service Instance: A stateful Web service
    whose interface adheres to that defined by a Grid
    Service Description and whose lifetime management
    properties are well defined.
  • Service Data Element: An attribute-like construct
    exposing state information through operations
    defined by the GridService portType.
  • Grid Service Handle: A URI that permanently
    identifies a Grid Service Instance.
  • Grid Service Reference: A temporary,
    binding-specific endpoint that provides access to
    a Grid Service Instance.
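The relationship between a factory, stateful instances, handles, and references can be sketched as follows. This is a simplified model and not the OGSI or GT3 API: the class GridServiceSketch, the handle URLs under host.example, and the add operation are hypothetical, and a plain Java object reference stands in for a Grid Service Reference.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; not the OGSI interfaces or the GT3 API.
class GridServiceSketch {
    // A stateful, transient service instance.
    static final class Instance {
        private int total;          // state kept between invocations
        private boolean destroyed;  // lifetime management

        int add(int n) {
            if (destroyed) {
                throw new IllegalStateException("instance destroyed");
            }
            total += n;
            return total;
        }

        void destroy() {
            destroyed = true;
        }
    }

    // Creates instances and resolves handles to references.
    static final class Factory {
        private final Map<String, Instance> byHandle = new HashMap<>();
        private int next = 1;

        // Returns a handle: a URI-like string that names the instance.
        String createService() {
            String handle = "http://host.example/ogsa/services/Adder/" + next++;
            byHandle.put(handle, new Instance());
            return handle;
        }

        // Resolves a handle to something the client can invoke.
        Instance resolve(String handle) {
            Instance instance = byHandle.get(handle);
            if (instance == null) {
                throw new IllegalArgumentException("unknown handle: " + handle);
            }
            return instance;
        }

        void destroy(String handle) {
            Instance instance = byHandle.remove(handle);
            if (instance != null) {
                instance.destroy();
            }
        }
    }

    public static void main(String[] args) {
        Factory factory = new Factory();
        String h1 = factory.createService();
        String h2 = factory.createService();
        factory.resolve(h1).add(5);
        System.out.println(h1 + " total: " + factory.resolve(h1).add(2));
        System.out.println(h2 + " total: " + factory.resolve(h2).add(1));
        factory.destroy(h1);
    }
}
```

Each instance keeps its own running total, which is the difference from a stateless Web Service where every call would start from scratch.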

57
Grid Service, OGSA, OGSI, GT3
OGSA: Open Grid Services Architecture
58
Grid Service OGSA OGSI GT3 (2)
  • Grid Services are defined by OGSA. The Open Grid
    Services Architecture (OGSA) aims to define a new
    common and standard architecture for grid-based
    applications. Right at the center of this new
    architecture is the concept of a Grid Service.
    OGSA defines what Grid Services are, what they
    should be capable of, what types of technologies
    they should be based on, but doesn't give a
    technical and detailed specification (which would
    be needed to implement a Grid Service).
  • Grid Services are specified by OGSI. The Open
    Grid Services Infrastructure is a formal and
    technical specification of the concepts described
    in OGSA, including Grid Services.
  • The Globus Toolkit 3 is an implementation of
    OGSI. GT3 is a usable implementation of
    everything that is specified in OGSI (and,
    therefore, of everything that is defined in
    OGSA).
  • Grid Services are based on Web Services. Grid
    Services are an extension of Web Services. We'll
    see what Web Services are in the next page, and
    what Grid Services are in the page after that.
  • I still don't get it: what is the difference
    between OGSA, OGSI, and GT3? Consider the
    following simple example. Suppose you want to
    build a new house. The first thing you need to do
    is to hire an architect to draw up all the plans,
    so you can get an idea of what your house will
    look like. Once you're happy with the architect's
    job, it's time to hire an engineer who will make
    detailed blueprints that specify construction
    details (like where to put the master beams, the
    power cables, the plumbing, etc.). The engineer
    then passes all those blueprints to qualified
    professional workers (construction workers,
    electricians, plumbers, etc) who will actually
    build the house. We could say that OGSA (the
    definition) is the architect, OGSI (the
    specification) is the engineer, and GT3 (the
    implementation) is the workers.

59
OGSA - GridService
60
GT 3 Architecture I
  • Grid Services, which we have already seen, are
    the 'GT3 Core' layer. Let's take a look at the
    rest of the layers from the bottom up:
  • GT3 Security Services: Security is an important
    factor in grid-based applications. GT3 Security
    Services can help us restrict access to our Grid
    Services, so only authorized clients can use
    them. For example, we said that only our New
    York, Los Angeles, and Seattle offices could
    access MathService. We want to make sure only
    those offices have access to MathService and, of
    course, we want all the data exchanged between
    MathService and clients to be encrypted so we can
    keep malicious users from intercepting our data.
    Besides the usual security measures (putting the
    web server behind a firewall, etc.) GT3 gives us
    one more layer of security with technologies such
    as SSL and X.509 digital certificates.
  • GT3 Base Services: This layer actually includes a
    whole lot of interesting services:
  • Managed Job Service: Suppose some particular
    operation in MathService might take hours or even
    days to be done. Of course, we don't want to
    simply stand in front of a computer waiting for
    the result to arrive (especially if, after 8 hours
    of waiting, all we get might simply be an error
    message!) We need to be able to check on the
    progress of the operation periodically, and have
    some control over it (pause it, stop it, etc.)
    This is usually called job management (in this
    case, the term 'job' is used instead of
    'operation'). The Managed Job Service allows us
    to treat our invocations like jobs, and manage
    them accordingly.

61
GT 3 Architecture II
  • Index Service: Remember from a short introduction
    to Web Services that we usually know what type of
    Web Service we need, but we have no idea of where
    they are. This also happens with Grid Services:
    we might know we need a Grid Service which meets
    certain requirements, but we have no idea of what
    its location is. While this was solved in Web
    Services with UDDI, GT3 has its own Index
    Service. For example, we could have several dozen
    MathServices all around the country, each with
    different characteristics (some might be better
    suited for statistical analysis, while others
    might be better for performing simulations).
    Index Service will allow us to query which
    MathService meets our particular requirements.
  • Reliable File Transfer (RFT) Service: This
    service allows us to perform large file transfers
    between the client and the Grid Service. For
    example, suppose we have an operation in
    MathService which has to crunch several gigabytes
    of raw data (for a statistical analysis, for
    example). Of course, we're not going to send all
    that information as parameters. We'll be able to
    send it as a file. Furthermore, RFT guarantees
    the transfer will be reliable (hence its name).
    For example, if a file transfer is interrupted
    (due to a network failure, for example), RFT
    allows us to restart the file transfer from the
    moment it broke down, instead of starting all
    over again.
  • GT3 Data Services: This layer includes Replica
    Management, which is very useful in applications
    that have to deal with very big sets of data.
    When working with large amounts of data, we're
    usually not interested in downloading the whole
    thing, we just want to work with a small part of
    all that data. Replica Management keeps track of
    those subsets of data we will be working with.
  • Other Grid Services: Other non-GT3 services can
    run on top of the GT3 Architecture.
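The restart behaviour described for the Reliable File Transfer service above, continuing from the point of failure instead of starting over, comes down to remembering how far the transfer got. The sketch below shows only that idea on in-memory byte arrays. It is not the RFT interface: the class ResumableTransferSketch and its method names are hypothetical, and the "network failure" is simulated by limiting the first chunk.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative sketch only; not the GT3 Reliable File Transfer API.
class ResumableTransferSketch {
    // Copies at most maxBytes from source to dest, starting at offset.
    // Returns the new offset, which serves as the restart marker.
    static int transferChunk(byte[] source, byte[] dest, int offset, int maxBytes) {
        int remaining = source.length - offset;
        int count = Math.min(remaining, maxBytes);
        System.arraycopy(source, offset, dest, offset, count);
        return offset + count;
    }

    // Simulates an interrupted transfer that is resumed from the marker.
    static boolean demo() {
        byte[] source = "several gigabytes of raw data".getBytes(StandardCharsets.US_ASCII);
        byte[] dest = new byte[source.length];
        // First attempt "breaks down" after 10 bytes.
        int marker = transferChunk(source, dest, 0, 10);
        // Second attempt continues at the marker instead of at byte 0.
        marker = transferChunk(source, dest, marker, source.length);
        return marker == source.length && Arrays.equals(source, dest);
    }

    public static void main(String[] args) {
        System.out.println("transfer complete: " + demo());
    }
}
```

A real service would store the marker persistently, so that the transfer can be resumed even if the client itself restarts.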

62
Service Data
63
Service Data
64
Service Data
65
Notification Interfaces
66
Motivation for Notifications
67
Pull-Notifications
68
Notifications in GT3
69
Challenge: Advanced Grid Applications
Example: Knowledge Discovery in Grid Databases
70
Motivation
Business
Medicine
Scientific experiments
Data and data exploration
cloud
Simulations
Earth observations
71
The Knowledge Discovery Process
Knowledge
OLAP Queries
OLAP
Online Analytical Mining
Evaluation and Presentation
Data Mining
Selection and Transformation
Data Warehouse
Cleaning and Integration
72
The GridMiner Project in Vienna
  • GridMiner: a knowledge discovery Grid
    infrastructure (http://www.gridminer.org/)
  • OGSA-based architecture
  • Workflow management
  • Grid-aware data preprocessing and data mining
    services
  • Data mediation service
  • OLAP service
  • GUI
  • Implementation on top of Globus Toolkit 3.0
  • Application: management of patients with
    traumatic brain injuries

73
GridMiner Architecture
GridMiner Workflow: GM DSCE (Dynamic Service Control)
GridMiner Core: GMPPS (Preprocessing), GMDMS (Data Mining), GMPRS (Presentation), GMDIS (Integration), GMOMS (OLAM)
GridMiner Base: GMMS (Mediation), GMIS (Information), GMRB (Resource Broker), GMCMS (OLAP / Cubes)
Grid Core: Grid Core Services (Security, File and Database Access Service, Replica Management)
Fabric: Grid Resources, Data Sources
74
Collaboration of GM-Services
Example 3
75
The Control Layer
  • Control Layer
  • Provision of the whole knowledge discovery
    process to a client
  • Knowledge discovery process in GridMiner
  • services to execute are not known in advance
  • order of service execution
  • sequential and concurrent execution
  • Approaches investigated
  • Data Mining Query Language
  • Standard Workflow Orchestration Approach
    (BPEL4WS, WSFL, GSFL, ...)
  • Our approach: Dynamic Service Control

76
The Control Layer: Standard Service Orchestration
Approach (BPEL4WS)
77
Workflow Models
Composition by Service Publisher
Composition by Service Consumer
78
The Control Layer - Approaches: Dynamic Service
Control
  • Dynamic Service Control Language (DSCL)
  • based on XML
  • easy to use
  • supports OGSA Grid Services
  • specially designed to support knowledge discovery
    processes
  • Dynamic Service Control Engine (DSCE)
  • processes workflow according to DSCL

[Figure labels: Client (subscribe; notification sink; start, stop, resume; (re)connect; query results); DSCL; notify; DSCE; OGSA Grid Services (Service A, Service B, Service C, Service D)]
79
Dynamic Service Control Language (DSCL)
  • Features
  • Control flow
  • concurrent execution of activities
  • sequential execution of activities
  • Activities
  • creation of new Grid Service Instances
  • invoking operations on Grid Service Instances
  • querying information of Grid Service Instances
  • destroying of Grid Service Instances

80
DSCL - Structure
dscl
  variables
  composition
    create Service
    invoke
    query SDE
    create Service
    invoke
    query SDE
    create Service
    invoke
81
DSCL - Variables
  • Initializing with a simple-type value
  • Initializing with an array

<... xsi:type="xsd:int"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema"
     xmlns:ns1="http://ogsa.globus.org">4711</...>
<... xsi:type="soapenc:Array" soapenc:arrayType="xsd:int[2]"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema"
     xmlns:ns1="http://ogsa.globus.org"> 23 ncitem-112 le
82
DSCL Control Flow
[Figure labels: act1, act2.1, act2.2]
dscl
  variables
  composition
    sequence
      createService activityID="act1"
      parallel
        invoke activityID="act2.1"
        invoke activityID="act2.2"
    sequence
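The control flow in this example, one activity followed by two activities that run concurrently, can be mimicked in Java with CompletableFuture. This is a sketch of sequential and parallel composition in general, not of the DSCE implementation, which the slides do not show. The class ControlFlowSketch, the Activity interface, and the step, sequence, and parallel helpers are hypothetical. The activity IDs act1, act2.1, and act2.2 are taken from the slide.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

// Illustrative sketch only; not the Dynamic Service Control Engine.
class ControlFlowSketch {
    interface Activity {
        void run(List<String> log);
    }

    // A leaf activity; a real engine would create or invoke a Grid Service here.
    static Activity step(String activityId) {
        return log -> log.add(activityId);
    }

    // Runs the parts one after another.
    static Activity sequence(Activity... parts) {
        return log -> {
            for (Activity part : parts) {
                part.run(log);
            }
        };
    }

    // Starts all parts concurrently and waits until every one has finished.
    static Activity parallel(Activity... parts) {
        return log -> {
            CompletableFuture<?>[] futures = new CompletableFuture<?>[parts.length];
            for (int i = 0; i < parts.length; i++) {
                Activity part = parts[i];
                futures[i] = CompletableFuture.runAsync(() -> part.run(log));
            }
            CompletableFuture.allOf(futures).join();
        };
    }

    // The composition from the slide: act1, then act2.1 and act2.2 in parallel.
    static List<String> runExample() {
        List<String> log = new CopyOnWriteArrayList<>(); // safe for concurrent adds
        sequence(step("act1"), parallel(step("act2.1"), step("act2.2"))).run(log);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(runExample());
    }
}
```

The log always starts with act1, while the order of act2.1 and act2.2 may differ from run to run, which is exactly the freedom a parallel block grants.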

83
Grid Database Access
84
Grid Database Access with OGSA-DAI (DAI: Data
Access and Integration)
OGSA-DAI provides the Grid Database Service (GDS).
The GDS receives a query via a Perform Document. The GDS engine
processes the specified activities. The GDS returns the results.
85
Grid Database Access With OGSA-DAI
86
Grid Data Mediation Service - Architecture
87
GDMS Example Scenario
  • Heterogeneities
  • Name in A is Alexander Wöhrer
  • Name in C has to be combined
  • Distribution
  • 3 data sources

88
Grid and Web Services Convergence: Yes!
Web Services Resource Framework (WSRF)
[Figure labels: Grid, Web]
The definition of WSRF means that the Grid and Web
communities can move forward on a common base.
First publications on WSRF: January 2004
89
WSRF
Service Requestor
Grid Service A
OGSI Grid Service
Service Requestor
Web Services
Resource A
Resource B
Resource C
Web Service and WS-Resource combination in WSRF
90
Literature
  • F. Berman, G. Fox, T. Hey (Eds.): Grid Computing: Making the
    Global Infrastructure a Reality. Wiley, 2003
  • www.globus.org
  • www.gridminer.org (our research project)
  • Many documents on the Web