Title: MARIO CANNATARO


1
Knowledge Discovery and Ontology-based Services
on the Grid
  • MARIO CANNATARO
  • University Magna Græcia of Catanzaro, Italy
  • cannataro_at_unicz.it

Chicago, October 5, 2003
2
OUTLINE
  • Exploiting data on the Grid
  • Services for future Grids
  • Knowledge Discovery and Ontologies
  • Building knowledge services: the KNOWLEDGE GRID
  • KNOWLEDGE GRID Architecture and Services
  • A case study: ontology-based application design
  • Conclusions and future work

3
EXPLOITING DATA ON THE GRID
  • Main challenges for future Grids will be:
  • the exploitation of the overwhelming amount of
    data produced by applications and Grid operation
  • the exploitation of the semantics of Grid
    resources and services when using them
  • Grid Intelligence reflects how data and
    information available over Grids can be
    effectively acquired, represented, exchanged,
    integrated, and converted into useful knowledge
  • Transforming challenges into opportunities:
  • Knowledge discovery and knowledge management
    services to enable new applications and enhance
    Grid management

4
INTELLIGENT SERVICES FOR FUTURE GRIDS
  • Ontologies and metadata are the basic elements
    through which new Grid services can be built
  • Ontologies enable semantic modeling of Grid
    resources, services, and users' tasks/needs
  • Resource ontologies and metadata allow
    intelligent searching and browsing
  • Data Mining and Knowledge Management could enable
    high-level services based on the semantics of
    stored data, for:
  • application services implementing data analysis
    (DataCentric Grids)
  • Grid management services based on analysis of
    usage data
  • Peer-To-Peer and Ubiquitous Computing may be the
    orthogonal key technologies to build basic
    services:
  • presence management, resource discovery and
    sharing, collaboration and self-configuration

5
GRID KNOWLEDGE BASE
  • The term Grid knowledge base is used here to
    indicate all the data stored, maintained, and
    updated by the Grid, both for application and
    operation purposes
  • e.g. Globus metadata, Grid services usage data,
    application data, etc.
  • A challenge will be their seamless integration
    and utilization.
  • metadata and ontologies define semantic views on
    the Grid knowledge base
  • each object on the Grid is classified through
    one or more ontologies into the knowledge base
  • Needed: building and integrating resource/service
    ontologies

6
ONTOLOGY-BASED SERVICES
  • Leverage the semantic models of Grid objects to
    navigate and search the Grid knowledge base
  • e.g. ontology-based Grid programming:
  • choosing, accessing and using software and data
    components, classified through ontologies, to
    build distributed component-based applications on
    the Grid
  • e.g. request-resource matchmaking:
  • find the best match (not necessarily identity)
    between job requests and Grid resources, both
    modeled through ontologies
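
The matchmaking bullet can be illustrated with a minimal sketch. It assumes a toy is-a taxonomy of resource concepts; the concept names, the `PARENT` table, the node names and the distance-based scoring are all hypothetical and are not taken from the talk. The point is only that a request for a general concept can be satisfied by a more specific resource, so the match need not be an identity.

```python
# Hypothetical toy taxonomy: child concept -> parent concept (is-a).
PARENT = {
    "LinuxCluster": "Cluster",
    "Cluster": "ComputeResource",
    "Workstation": "ComputeResource",
    "ComputeResource": "GridResource",
    "StorageElement": "GridResource",
}

def ancestors(concept):
    """Return the concept followed by all of its ancestors, nearest first."""
    chain = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain

def match_distance(requested, offered):
    """Number of is-a steps from the offered concept up to the requested one.

    Returns 0 for an identical concept and None when the offered concept
    is not a kind of the requested concept.
    """
    chain = ancestors(offered)
    return chain.index(requested) if requested in chain else None

def best_match(requested, resources):
    """Pick the resource whose concept is closest to the requested concept."""
    scored = []
    for name, concept in resources.items():
        distance = match_distance(requested, concept)
        if distance is not None:
            scored.append((distance, name))
    return min(scored)[1] if scored else None

# Hypothetical Grid resources, each classified by one concept.
resources = {
    "node-a": "StorageElement",
    "node-b": "LinuxCluster",
    "node-c": "Workstation",
}
best = best_match("Cluster", resources)
print(best)
```

Here the request for a `Cluster` is matched by `node-b`, which is classified as the more specific `LinuxCluster`.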

7
KNOWLEDGE DISCOVERY SERVICES
  • They are used to extract knowledge from the data
    stored in the Grid knowledge base:
  • ontologies could guide the selection of useful
    data
  • data mining is applied to the selected data
  • e.g. choosing the best GridFTP connection
    parameters (TCP buffer size, parallel streams)
    by
  • clustering GridFTP usage data and
  • classifying new transfer requests with data
    mining techniques
  • e.g. a Grid-based document management service
  • that classifies Grid documents
  • using ontologies and text mining functions
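
The GridFTP example can be sketched as follows. This is only an illustration under stated assumptions: the usage log, the two features (file size and round-trip time), the use of k-means for clustering and of nearest-centroid assignment for classification are all choices made here, not details given in the talk.

```python
import math

# Hypothetical usage log:
# (file size MB, round-trip ms, TCP buffer KB, parallel streams, throughput Mb/s)
LOG = [
    (10, 5, 64, 1, 80),
    (12, 6, 128, 2, 95),
    (8, 4, 64, 2, 90),
    (900, 120, 1024, 4, 300),
    (1000, 110, 2048, 8, 420),
    (950, 130, 2048, 4, 350),
]

def features(record):
    """Map a transfer to a 2-D point; the size is log-scaled so it does not dominate."""
    return (math.log10(record[0]), record[1] / 100.0)

def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, centroids, rounds=10):
    """Plain k-means with fixed starting centroids, so the result is deterministic."""
    for _ in range(rounds):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else old
            for g, old in zip(groups, centroids)
        ]
    return centroids

points = [features(r) for r in LOG]
centroids = kmeans(points, [points[0], points[-1]])

# For each cluster, remember the fastest transfer that was observed in it.
best = {}
for record, p in zip(LOG, points):
    c = nearest(p, centroids)
    if c not in best or record[4] > best[c][4]:
        best[c] = record

def recommend(size_mb, rtt_ms):
    """Classify a new request by nearest centroid and reuse that cluster's best parameters."""
    c = nearest(features((size_mb, rtt_ms)), centroids)
    _, _, buffer_kb, streams, _ = best[c]
    return buffer_kb, streams

print(recommend(20, 8))
print(recommend(800, 100))
```

A small, nearby transfer falls into the first cluster and a large, distant one into the second, so each request inherits the buffer size and stream count of the fastest similar transfer in the log.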

8
SEMANTIC COMPRESSION AND SYNTHESIS
  • Semantic (lossy or lossless) compression and
    synthesis could be used
  • to offer different views of the Grid knowledge
    base
  • depending on many factors: user/service goals,
    scope of resource information
  • Contents can be reorganized according to
  • some aggregating functions (schema extraction and
    restructuring),
  • resulting in a synthetic (compressed) yet
    meaningful version
  • Different views of Grid resources exposing
    different levels of detail could be provided to
    different classes of users/services

9
CONTEXT-AWARENESS
  • When new devices and resources are allowed to
    enter/exit the Grid in a very dynamic way,
  • new services able to adapt themselves to the
    environment have to be developed
  • Ubiquitous Computing and Adaptive Hypermedia
    techniques could be used
  • to cope with the dynamic nature of the
    environment and
  • to adapt services

10
PARALLEL DISTRIBUTED KNOWLEDGE DISCOVERY
SYSTEMS ON GRIDS
  • When large data sets are coupled with the
    geographic distribution of data, users, and
    systems,
  • mining such data requires implementing
    parallel and distributed knowledge discovery
    (PDKD) systems.
  • The basic principles that motivate the
    architecture design of grid-aware PDKD systems:
  • Data heterogeneity and large data size
  • Algorithm integration and independence
  • Grid awareness
  • Openness, Scalability, Security and data privacy

11
THE KNOWLEDGE GRID
  • KNOWLEDGE GRID: a PDKD architecture that
    integrates data mining techniques and
    computational Grid resources.
  • In the KNOWLEDGE GRID architecture, data mining
    tools are integrated with lower-level Grid
    mechanisms and services and exploit Data Grid
    services.
  • A KNOWLEDGE GRID application uses:
  • A set of KNOWLEDGE GRID-enabled computers (K-GRID
    nodes)
  • that declare their availability to participate in
    some PDKD computation and are connected by
  • A Grid infrastructure
  • offering basic Grid services (authentication,
    data location, service level negotiation) and
    implementing the KNOWLEDGE GRID services.

12
KNOWLEDGE GRID ENVIRONMENT
[Figure labels: KNOWLEDGE GRID services; Basic Grid Infrastructure; K-GRID tools; Grid Middleware; LAN; Cluster Element; K-GRID node; Generic Grid node; Cluster containing data sets and/or DM algorithms]
13
KNOWLEDGE GRID SERVICES
  • The KNOWLEDGE GRID services are organized in two
    hierarchical layers:
  • the Core K-Grid layer and
  • the High-level K-Grid layer.
  • The former refers to services directly
    implemented on top of generic Grid services.
  • The latter is used to describe, develop, and
    execute PDKD computations over the KNOWLEDGE
    GRID.

14
KNOWLEDGE GRID ARCHITECTURE
[Figure: KNOWLEDGE GRID architecture]
15
APPLICATION COMPOSITION STEPS
  • Metadata about K-grid resources (KMRs)
  • Search and selection of resources (DAS / TAAS)
  • Metadata about the selected K-grid resources
    (TMR)
  • Design of the PDKD computation (EPMS)
  • Execution Plan (KEPR)
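
The composition steps can be read as a small pipeline from resource metadata to an execution plan. The sketch below is an assumption-laden illustration: the `ResourceMetadata` record, the resource and host names, and the `select` and `design` helpers are hypothetical, and the comments only map each stage to the label shown on the slide without claiming to reproduce the actual services.

```python
from dataclasses import dataclass

@dataclass
class ResourceMetadata:
    name: str
    kind: str   # "data" or "software" (hypothetical classification)
    host: str

# Step 1: metadata about K-grid resources (labelled KMRs on the slide).
kmr = [
    ResourceMetadata("customers.db", "data", "node1"),
    ResourceMetadata("clusterer", "software", "node2"),
    ResourceMetadata("weather.csv", "data", "node3"),
]

# Step 2: search and selection of resources (labelled DAS / TAAS on the slide).
def select(repository, wanted_names):
    return [m for m in repository if m.name in wanted_names]

# Step 3: metadata about the selected resources (labelled TMR on the slide).
tmr = select(kmr, {"customers.db", "clusterer"})

# Step 4: design of the computation (labelled EPMS on the slide).
# Here the design simply pairs every selected data set with every selected tool.
def design(selected):
    data = [m for m in selected if m.kind == "data"]
    tools = [m for m in selected if m.kind == "software"]
    return [{"run": t.name, "on": d.name, "at": d.host} for d in data for t in tools]

# Step 5: the resulting execution plan (labelled KEPR on the slide).
execution_plan = design(tmr)
print(execution_plan)
```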
16
APPLICATION EXECUTION STEPS
17
OBJECTS and LINKS
  • Objects represent resources
  • Links represent relations among resources
18
VEGA
[Figure: VEGA screen showing the Hosts pane and the Resources pane]
19
VEGA
A K-GRID application can be composed of several
workspaces
20
APPLICATION EXECUTION
21
ONTOLOGY-BASED APPLICATION DESIGN
  • Problem: making it easier to develop
    distributed Data Mining applications on the
    Knowledge Grid
  • many available pre-existing software tools
  • many different implementations and methodologies
  • Solution: develop a Domain Ontology for the Data
    Mining domain, and use it to enhance the design
    of component-based distributed data mining
    applications
  • DAMON: DAta Mining ONtology

22
DAMON GOALS
  • What we are going to use DAMON for:
  • Classify Data Mining (DM) software on the basis
    of some parameters useful to select the most
    suitable ones to solve a KDD problem:
  • The Data Mining task performed by the software
  • The methodologies that the software uses in the
    data mining process
  • The kind of data sources the software works on
  • The degree of required interaction with the user
  • What types of questions the ontology should
    answer:
  • to assist the user by suggesting the software
    to use on the basis of the user's
    requirements/needs
  • to allow semantic (concept-based) search of
    data mining software and other data mining
    resources

23
DAMON TAXONOMIES
  • Task: the knowledge discovery goal that the
    software is intended for
  • Method: a DM methodology used to discover the
    knowledge
  • Algorithm: the way through which a DM task is
    performed
  • Software: an implementation of a DM algorithm
  • Suite: implements a set of DM algorithms
  • Data Source: the input on which DM algorithms
    work to extract new knowledge
  • Human Interaction: specifies how much human
    interaction with the discovery process is
    required and/or supported

24
DAMON: A VIEW OVER THE GRID KNOWLEDGE BASE
  • The Data Mining knowledge base used to support
    knowledge discovery programming has two
    conceptual layers:
  • at the top layer, the DAMON ontology gives general
    information about the Data Mining domain,
  • at the bottom layer, specific information about
    installed software components and data sources
    is maintained where the resources reside.
  • From an architectural point of view the ontology
    is a central resource, whereas the specific
    metadata are distributed.
  • As an example, DAMON stores the fact that the
    C5.0 Software implements the C5 Algorithm, which
    uses the Decision Tree method, which is a
    Classification method.
  • The C5.0 Software node of the ontology contains
    the URLs of the KNOWLEDGE GRID metadata files
    describing details about all the installed
    instances of that software.
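
The C5.0 example can be written down as a handful of facts plus a link to the distributed metadata. This is a minimal sketch and not DAMON's actual representation: the triple encoding, the relation names `implements`, `uses` and `is_a` (paraphrased from the bullet text), and the metadata URLs are assumptions made for illustration.

```python
# Facts from the slide, encoded as (subject, relation, object) triples.
FACTS = [
    ("C5.0 Software", "implements", "C5 Algorithm"),
    ("C5 Algorithm", "uses", "Decision Tree"),
    ("Decision Tree", "is_a", "Classification"),
]

# Hypothetical metadata URLs attached to the software node (top layer pointing
# to the bottom layer, one file per installed instance).
METADATA_URLS = {
    "C5.0 Software": [
        "http://node1.example.org/metadata/c50.xml",
        "http://node2.example.org/metadata/c50.xml",
    ],
}

def follow(start):
    """Walk the chain of facts from a node up to the most general concept."""
    path = [start]
    current = start
    while True:
        targets = [obj for subj, _, obj in FACTS if subj == current]
        if not targets:
            return path
        current = targets[0]
        path.append(current)

chain = follow("C5.0 Software")
print(" -> ".join(chain))
print(METADATA_URLS["C5.0 Software"])
```

The ontology part (`FACTS`) is small and central, while the URLs stand for metadata files that stay on the nodes where the software is installed.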

25
ONTOLOGY-BASED SERVICES ACCESSING THE GRID
KNOWLEDGE BASE
  • DAMON is used as an ontology-based assistant that
    helps the KNOWLEDGE GRID application designer in:
  • application formulation and design,
  • selection and composition of the most suitable
    Data Mining components for a specific knowledge
    discovery process,
  • semantic search of data mining software.

26
SEMANTIC SEARCH OF DATA MINING SOFTWARE
  • Ontology-based resource selection.
  • Browsing and searching the ontology allows a user
    to locate the most appropriate tasks, methods,
    algorithms and finally data mining software to be
    used in a certain phase of the KDD process.
  • The user can query very detailed information
    about Data Mining resources, using several kinds
    of inference that can broaden queries.
  • Metadata access.
  • The ontology returns the metadata URLs of all
    selected resources available on the KNOWLEDGE
    GRID nodes
  • metadata are used to access and use software
    (technical parameters, policies, location and
    configuration, etc.).
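
A concept-based search with query broadening could look like the sketch below. It is an assumption-based illustration: the C5.0 chain comes from the slides, while the `KMeansTool` entries, the metadata URLs and the reachability-based inference are hypothetical additions used to show that a query for a general concept also finds software linked to it only indirectly.

```python
# (subject, relation, object) triples. The C5.0 chain is from the slides;
# the k-means entries are hypothetical and added only for contrast.
FACTS = [
    ("C5.0 Software", "implements", "C5 Algorithm"),
    ("C5 Algorithm", "uses", "Decision Tree"),
    ("Decision Tree", "is_a", "Classification"),
    ("KMeansTool", "implements", "K-Means Algorithm"),
    ("K-Means Algorithm", "uses", "Partitioning"),
    ("Partitioning", "is_a", "Clustering"),
]
SOFTWARE = {"C5.0 Software", "KMeansTool"}

# Hypothetical metadata URLs of installed instances.
METADATA_URLS = {
    "C5.0 Software": ["http://node1.example.org/metadata/c50.xml"],
    "KMeansTool": ["http://node3.example.org/metadata/kmeans.xml"],
}

def reaches(node, target):
    """True if target can be reached from node by following any chain of facts."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(obj for subj, _, obj in FACTS if subj == current)
    return False

def search(concept):
    """Return {software: metadata URLs} for all software related to the concept."""
    return {
        name: METADATA_URLS.get(name, [])
        for name in sorted(SOFTWARE)
        if reaches(name, concept)
    }

print(search("Classification"))
print(search("Clustering"))
```

The search for `Classification` returns the C5.0 entry even though the software is only linked to that concept through its algorithm and method, and the returned URLs are what a client would then use to access the installed instances.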

27
KNOWLEDGE GRID WITH ONTOLOGY SERVICES
[Figure: KNOWLEDGE GRID architecture with ontology services]
28
CONCLUSIONS AND FUTURE WORK
  • The increasing volume of data available on the
    Grid is a challenge but also an opportunity
  • Ontology and Knowledge Discovery are key
    technologies to extract and synthesize useful
    information for applications and Grid management
  • The KNOWLEDGE GRID allows the development of
    distributed data mining applications on the Grid
  • ontology-based services are used for component
    search and selection, using the DAMON domain
    ontology
  • We are defining a set of Grid Services, using the
    OGSA conventions and mechanisms, that export the
    functionalities and operations of the KNOWLEDGE
    GRID

29
THANKS
30
MAIN REFERENCES
  • M. Cannataro, D. Talia, The Knowledge Grid,
    Communications of the ACM, 46(1), 2003.
  • M. Cannataro, D. Talia, P. Trunfio, Distributed
    Data Mining on the Grid, Future Generation
    Computer Systems, 18(8), 2002.
  • M. Cannataro, C. Comito, A Data Mining
    Ontology for Grid Programming, 1st Int. Workshop
    on Semantics in Peer-to-Peer and Grid Computing,
    May 2003.