Title: MARIO CANNATARO
1Knowledge Discovery and Ontology-based Services
on the Grid
- MARIO CANNATARO
- University Magna Græcia of Catanzaro, Italy
- cannataro_at_unicz.it
Chicago, October 5, 2003
2OUTLINE
- Exploiting data on the Grid
- Services for future Grids
- Knowledge Discovery and Ontologies
- Building knowledge services: the KNOWLEDGE GRID
- KNOWLEDGE GRID Architecture and Services
- A case study: ontology-based application design
- Conclusions and future work
3EXPLOITING DATA ON THE GRID
- The main challenges for future Grids will be:
- exploiting the overwhelming amount of data
produced by applications and Grid operation
- exploiting the semantics of Grid resources
and services when using them
- Grid Intelligence reflects how data and
information available over Grids can be
effectively acquired, represented, exchanged,
integrated, and converted into useful knowledge
- Transforming challenges into opportunities:
- Knowledge discovery and knowledge management
services to enable new applications and enhance
Grid management
4INTELLIGENT SERVICES FOR FUTURE GRIDS
- Ontologies and metadata are the basic elements
through which new Grid services can be built
- Ontologies enable semantic modeling of Grid
resources, services, and users' tasks/needs
- Resource ontologies and metadata allow
intelligent searching and browsing
- Data Mining and Knowledge Management could enable
high-level services based on the semantics of
stored data for:
- application services implementing data analysis
(Data-Centric Grids)
- Grid management services based on analysis of
usage data
- Peer-to-Peer and Ubiquitous Computing may be the
orthogonal key technologies to build basic
services:
- presence management, resource discovery and
sharing, collaboration and self-configuration
5GRID KNOWLEDGE BASE
- "Grid knowledge base" is used here to indicate all
the data stored, maintained, and updated by the
Grid, both for application and operation purposes
- e.g. Globus metadata, Grid services usage data,
application data, etc.
- A challenge will be their seamless integration
and utilization.
- Metadata and ontologies define semantic views on
the Grid knowledge base
- Each object on the Grid is classified through
one or more ontologies into the knowledge base
- Needed: building and integrating resource/service
ontologies
6ONTOLOGY-BASED SERVICES
- Leverage the semantic models of Grid objects to
navigate and search the Grid knowledge base
- e.g. ontology-based Grid programming:
- choosing, accessing and using software and data
components, classified through ontologies, to
build distributed component-based applications on
the Grid
- e.g. request-resource matchmaking:
- find the best match (not necessarily an identical one)
between job requests and Grid resources, both
modeled through ontologies
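
The request-resource matchmaking idea above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the matchmaker used by any real Grid system: the toy is-a hierarchy, the node names, and the scoring rule (fewer is-a steps between the offered and the requested concept means a better match) are all hypothetical.

```python
# Hypothetical is-a hierarchy (child concept -> parent concept).
IS_A = {
    "LinuxCluster": "Cluster",
    "Cluster": "ComputeResource",
    "Workstation": "ComputeResource",
    "ComputeResource": "GridResource",
}


def ancestors(concept):
    """Return [concept, parent, grandparent, ...] up to the root."""
    chain = [concept]
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def match_distance(requested, offered):
    """Number of is-a steps from the offered concept up to the requested one.

    0 means the concepts are identical; None means the offered concept is
    not a specialization of the requested one, so it cannot satisfy it.
    """
    chain = ancestors(offered)
    return chain.index(requested) if requested in chain else None


def best_match(requested, resources):
    """Pick the resource whose concept is closest to the requested concept.

    Ties are broken alphabetically by resource name.
    """
    scored = []
    for name, concept in resources.items():
        distance = match_distance(requested, concept)
        if distance is not None:
            scored.append((distance, name))
    return min(scored)[1] if scored else None


# Hypothetical Grid resources, each classified by one ontology concept.
resources = {
    "node-a": "Workstation",
    "node-b": "LinuxCluster",
    "node-c": "Cluster",
}

# An identical concept is preferred when one exists.
print(best_match("Cluster", resources))

# Without node-c, the request is still satisfied by a more specific concept.
without_exact = {"node-a": "Workstation", "node-b": "LinuxCluster"}
print(best_match("Cluster", without_exact))
```

The second call shows the "not necessarily identical" case: a LinuxCluster is accepted for a Cluster request because the ontology states that it is a kind of Cluster.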
7KNOWLEDGE DISCOVERY SERVICES
- They are used to extract knowledge from the data
stored in the Grid knowledge base
- ontologies could guide the selection of
useful data
- data mining is applied to the selected data
- e.g. choosing the best GridFTP connection
parameters (TCP buffer size, parallel streams)
by
- clustering GridFTP usage data and
- classifying new transfer requests with data
mining techniques
- e.g. a Grid-based document management service
- that classifies Grid documents
- using ontologies and text mining functions
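
The GridFTP example above (cluster past usage data, then classify a new request) might look like the sketch below. The slides do not name specific algorithms, so the choices here are assumptions: a plain k-means over (file size, round-trip time) and nearest-centroid classification. The log records are invented, and features are left unnormalized for brevity.

```python
from math import dist

# Hypothetical GridFTP usage log, one tuple per past transfer:
# (file_size_mb, rtt_ms, tcp_buffer_kb, parallel_streams, throughput_mbps)
LOG = [
    (10, 5, 64, 1, 80),
    (12, 6, 128, 2, 95),
    (15, 4, 64, 2, 90),
    (900, 120, 1024, 8, 400),
    (1000, 110, 2048, 4, 350),
    (950, 130, 512, 8, 300),
]


def assign(point, centroids):
    """Index of the centroid nearest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: dist(point, centroids[c]))


def kmeans(points, initial_centroids, rounds=10):
    """Plain k-means with caller-supplied initial centroids."""
    centroids = list(initial_centroids)
    for _ in range(rounds):
        labels = [assign(p, centroids) for p in points]
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids


def best_parameters(log, centroids):
    """For each cluster, keep the parameters of its fastest past transfer."""
    best = {}
    for size, rtt, buffer_kb, streams, throughput in log:
        c = assign((size, rtt), centroids)
        if c not in best or throughput > best[c][2]:
            best[c] = (buffer_kb, streams, throughput)
    return {c: (b, s) for c, (b, s, _) in best.items()}


def suggest(request, centroids, params):
    """Classify a new (file_size_mb, rtt_ms) request and return
    (tcp_buffer_kb, parallel_streams) for its cluster."""
    return params[assign(request, centroids)]


points = [(size, rtt) for size, rtt, *_ in LOG]
centroids = kmeans(points, [points[0], points[-1]])
params = best_parameters(LOG, centroids)

print(suggest((20, 8), centroids, params))     # small file, short RTT
print(suggest((800, 100), centroids, params))  # large file, long RTT
```

A real service would normalize the features and choose the number of clusters from the data. The point of the sketch is only the two-phase structure: mine the usage data offline, then classify each incoming request.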
8SEMANTIC COMPRESSION AND SYNTHESIS
- Semantic (lossy or lossless) compression and
synthesis could be used
- to offer different views of the Grid knowledge
base
- depending on many factors: user/service goals,
scope of resource information
- Contents can be reorganized according to
- some aggregating functions (schema extraction and
restructuring),
- resulting in a synthetic (compressed) yet
meaningful version
- Different views of Grid resources, exposing
different levels of detail, could be provided to
different classes of users/services
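
As a rough illustration of views with different levels of detail, the sketch below builds a lossless view (all records) and a lossy, aggregated view (per-site totals) from the same data. The record fields, the user classes, and the choice of aggregation are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical detailed resource records in the Grid knowledge base.
RESOURCES = [
    {"host": "n1.site-a", "site": "site-a", "type": "cluster", "cpus": 64},
    {"host": "n2.site-a", "site": "site-a", "type": "cluster", "cpus": 32},
    {"host": "w1.site-a", "site": "site-a", "type": "workstation", "cpus": 4},
    {"host": "n1.site-b", "site": "site-b", "type": "cluster", "cpus": 128},
]


def detailed_view(resources):
    """Lossless view: every record, unchanged."""
    return list(resources)


def summary_view(resources):
    """Lossy view: aggregate per (site, type), dropping host-level detail."""
    totals = defaultdict(lambda: {"hosts": 0, "cpus": 0})
    for record in resources:
        entry = totals[(record["site"], record["type"])]
        entry["hosts"] += 1
        entry["cpus"] += record["cpus"]
    return dict(totals)


# Hypothetical mapping from user/service class to the view it receives.
VIEWS = {"administrator": detailed_view, "scheduler": summary_view}


def view_for(user_class, resources):
    """Return the view of the knowledge base assigned to a user class."""
    return VIEWS[user_class](resources)


print(view_for("scheduler", RESOURCES)[("site-a", "cluster")])
print(len(view_for("administrator", RESOURCES)))
```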
9CONTEXT-AWARENESS
- When new devices and resources are allowed to
enter / exit the Grid in a very dynamic way
- new services able to adapt themselves to the
environment have to be developed
- Ubiquitous Computing and Adaptive Hypermedia
techniques could be used
- to cope with the dynamic nature of the environment and
- to adapt services
10PARALLEL DISTRIBUTED KNOWLEDGE DISCOVERY
SYSTEMS ON GRIDS
- When
- large data sets are coupled with
- geographic distribution of data, users, and
systems,
- mining such data requires
- parallel and distributed knowledge discovery
(PDKD) systems.
- The basic principles that motivate the
architecture design of Grid-aware PDKD systems:
- Data heterogeneity and large data size
- Algorithm integration and independence
- Grid awareness
- Openness, scalability, security and data privacy
11THE KNOWLEDGE GRID
- KNOWLEDGE GRID: a PDKD architecture that
integrates data mining techniques and
computational Grid resources.
- In the KNOWLEDGE GRID architecture, data mining
tools are integrated with lower-level Grid
mechanisms and services and exploit Data Grid
services.
- A KNOWLEDGE GRID application uses:
- A set of KNOWLEDGE GRID-enabled computers
(K-GRID nodes)
- declaring their availability to participate in
some PDKD computation, and connected by
- A Grid infrastructure
- offering basic Grid services (authentication,
data location, service level negotiation) and
implementing the KNOWLEDGE GRID services.
12KNOWLEDGE GRID ENVIRONMENT
[Figure: KNOWLEDGE GRID environment. Diagram labels: KNOWLEDGE GRID services; basic Grid infrastructure; K-GRID nodes with K-GRID tools and Grid middleware; generic Grid node; LAN cluster of cluster elements containing data sets and/or DM algorithms.]
13KNOWLEDGE GRID SERVICES
- The KNOWLEDGE GRID services are organized in two
hierarchical layers:
- Core K-Grid layer and
- High-level K-Grid layer.
- The former refers to services directly
implemented on top of generic Grid services.
- The latter is used to describe, develop, and
execute PDKD computations over the KNOWLEDGE
GRID.
14KNOWLEDGE GRID ARCHITECTURE
[Figure: KNOWLEDGE GRID architecture diagram]
15APPLICATION COMPOSITION STEPS
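[Figure: application composition steps, with the component involved at each step]
- Metadata about K-GRID resources (KMRs)
- Search and selection of resources (DAS / TAAS)
- Metadata about the selected K-GRID resources (TMR)
- Design of the PDKD computation (EPMS)
- Execution plan (KEPR)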
16APPLICATION EXECUTION STEPS
17OBJECTS and LINKS
- Objects represent resources
- Links represent relations among resources
18VEGA
[Figure: VEGA screenshot showing the Hosts pane and the Resources pane]
19VEGA
A K-GRID application can be composed of several
workspaces
20APPLICATION EXECUTION
21ONTOLOGY-BASED APPLICATION DESIGN
- Problem: making it easier to develop
distributed Data Mining applications on the
KNOWLEDGE GRID
- many pre-existing software tools are available
- many different implementations and methodologies
- Solution: develop a Domain Ontology for the Data
Mining domain, and use it to enhance the design
of component-based distributed data mining
applications
- DAMON: DAta Mining ONtology
22DAMON GOALS
- What are we going to use DAMON for?
- Classifying Data Mining (DM) software on the basis
of parameters useful for selecting the most
suitable tools to solve a KDD problem:
- The Data Mining task performed by the software
- The methodologies that the software uses in the data
mining process
- The kind of data sources the software works on
- The degree of required interaction with the user
- What types of questions should the ontology
answer?
- to assist the user by suggesting the
software to use on the basis of the user's
requirements/needs
- to allow semantic (concept-based) search of
data mining software and other data mining
resources
23DAMON TAXONOMIES
- Task: the knowledge discovery goal that the
software is intended for
- Method: a DM methodology used to discover the
knowledge
- Algorithm: the way through which a DM task is
performed
- Software: an implementation of a DM algorithm
- Suite: implements a set of DM algorithms
- Data Source: the input on which DM algorithms
work to extract new knowledge
- Human Interaction: specifies how much human
interaction with the discovery process is
required and/or supported
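
One way to picture how these taxonomies support software selection is the sketch below, which tags each software entry with one value per taxonomy and filters on the user's requirements. It is an illustration only, not DAMON's actual representation: the Suite taxonomy is left out, "ToolA" and "ToolB" are invented entries, and the data source and human interaction values given for C5.0 are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Software:
    """A DM software entry classified along the taxonomies listed above."""
    name: str
    task: str
    method: str
    algorithm: str
    data_source: str
    human_interaction: str


# Hypothetical catalog. Only the task/method/algorithm of C5.0 come from
# the slides; every other value is a placeholder for illustration.
CATALOG = [
    Software("C5.0", "Classification", "Decision Tree", "C5", "flat file", "low"),
    Software("ToolA", "Clustering", "Partitioning", "AlgoA", "relational DB", "low"),
    Software("ToolB", "Classification", "Neural Network", "AlgoB", "flat file", "high"),
]


def select(catalog, **requirements):
    """Return the entries whose attributes equal every given requirement."""
    return [
        software
        for software in catalog
        if all(getattr(software, key) == value for key, value in requirements.items())
    ]


for software in select(CATALOG, task="Classification", human_interaction="low"):
    print(software.name)
```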
24DAMON: A VIEW OVER THE GRID KNOWLEDGE BASE
- The Data Mining knowledge base used to support
knowledge discovery programming has two
conceptual layers:
- at the top layer, the DAMON ontology gives general
information about the Data Mining domain,
- at the bottom layer, specific information about
installed software components and data sources
is maintained where the resources reside.
- From an architectural point of view, the ontology
is a central resource, whereas specific metadata
are distributed.
- As an example, DAMON stores the fact that the
C5.0 Software implements the C5 Algorithm, which
uses the Decision Tree method, which is a
Classification method.
- The C5.0 Software node of the ontology contains the
URLs of the KNOWLEDGE GRID metadata files
describing the details of all the installed
instances of that software.
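
The C5.0 example above can be written down as a small set of triples plus a lookup from software nodes to metadata URLs. This is a sketch under assumptions: the triple encoding and the relation names are one possible representation, not DAMON's actual format, and the example.org URLs are placeholders.

```python
# Facts from the example above, as (subject, relation, object) triples.
TRIPLES = [
    ("C5.0", "implements", "C5"),
    ("C5", "uses", "Decision Tree"),
    ("Decision Tree", "is_a", "Classification"),
]

# Hypothetical metadata URLs held by each software node (bottom layer);
# the real files would live on the nodes where the software is installed.
METADATA_URLS = {
    "C5.0": [
        "http://example.org/kgrid/node1/c50.xml",
        "http://example.org/kgrid/node2/c50.xml",
    ],
}


def reachable(start):
    """All concepts reachable from `start` by following any relation."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for subject, _, obj in TRIPLES:
            if subject == node and obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen


def metadata_for(concept):
    """Metadata URLs of every software node linked to `concept`."""
    urls = []
    for software, software_urls in METADATA_URLS.items():
        if concept == software or concept in reachable(software):
            urls.extend(software_urls)
    return urls


print(sorted(reachable("C5.0")))
print(metadata_for("Classification"))
```

Asking for "Classification" returns the metadata of C5.0 even though no triple links the two directly, which mirrors the split described above: general knowledge sits in the central ontology, and instance details stay in distributed metadata files.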
25ONTOLOGY-BASED SERVICES ACCESSING THE GRID
KNOWLEDGE BASE
- DAMON is used as an ontology-based assistant that
helps the KNOWLEDGE GRID application designer in:
- application formulation and design,
- selection and composition of the most suitable Data
Mining components for a specific knowledge
discovery process,
- semantic search of data mining software.
26SEMANTIC SEARCH OF DATA MINING SOFTWARE
- Ontology-based resource selection.
- Browsing and searching the ontology allows a user
to locate the most appropriate tasks, methods,
algorithms and finally data mining software to be
used in a certain phase of the KDD process.
- The user can query very detailed information
about Data Mining resources, using several kinds
of inference that can broaden queries.
- Metadata access.
- The ontology returns the metadata URLs of all
selected resources available on the KNOWLEDGE
GRID nodes
- metadata are used to access and use software
(technical parameters, policies, location and
configuration, etc.).
27KNOWLEDGE GRID WITH ONTOLOGY SERVICES
[Figure: KNOWLEDGE GRID architecture extended with ontology services]
28CONCLUSIONS AND FUTURE WORK
- The increasing volume of data available on the
Grid is a challenge but also an opportunity
- Ontologies and Knowledge Discovery are key
technologies to extract and synthesize useful
information for applications and Grid management
- The KNOWLEDGE GRID allows developing distributed
data mining applications on the Grid
- ontology-based services are used for component
search and selection through a domain ontology,
DAMON
- We are defining a set of Grid Services using the
OGSA conventions and mechanisms that export
the functionalities and operations of the KNOWLEDGE
GRID
29THANKS
30MAIN REFERENCES
- M. Cannataro, D. Talia, The Knowledge Grid,
Communications of the ACM, 46(1), 2003.
- M. Cannataro, D. Talia, P. Trunfio, Distributed
Data Mining on the Grid, Future Generation
Computer Systems, 18(8), 2002.
- M. Cannataro, C. Comito, A Data Mining
Ontology for Grid Programming, 1st Int. Workshop
on Semantics in Peer-to-Peer and Grid Computing,
May 2003.