Knowledge Management Challenges in Knowledge Discovery Systems

TAKMA'05, Copenhagen, Denmark, August 22-26, 2005

1
Knowledge Management Challenges in Knowledge
Discovery Systems
TAKMA'05, Copenhagen, Denmark, August 22-26, 2005
  • Mykola Pechenizkiy, Seppo Puuronen, Department of
    Computer Science, University of Jyväskylä, Finland
  • Alexey Tsymbal, Department of Computer Science,
    Trinity College Dublin, Ireland

2
Outline
  • Introduction
  • KDD
  • Selection of DM strategy for a problem at hand
  • Meta-learning
  • Our goal: to propose a knowledge-driven approach
    to enhance the selection of DM strategies in KDSs.
  • Need for KM
  • What are the challenges
  • KM processes with respect to the problem of DM
    strategy selection
  • Further research
  • Discussion

3
Knowledge discovery as a process
Fayyad, U., Piatetsky-Shapiro, G., Smyth, P.,
Uthurusamy, R., Advances in Knowledge Discovery
and Data Mining, AAAI/MIT Press, 1997.
4
CRISP-DM
http://www.crisp-dm.org/
5
KDD Process: Vertical Solutions
Reinartz, T. 1999, Focusing Solutions for Data
Mining. LNAI 1623, Berlin Heidelberg.
6
The Search for Scientific Methods and
Meta-Learning
  • Adequate scientific methods make induction easier
    with a smaller number of examples.
  • The choice of methods needs to be based on a
    higher level induction or on meta-learning in the
    context of machine learning.
  • Knowledge concerning the most appropriate method
    for a given goal can be obtained by induction on
    the "database" of the history of science: a
    collection of problems with different methods,
    different goals and different degrees of success
    (Laudan).
  • Meta-learning can produce rules concerning the
    use of the alternative strategies, methodological
    knowledge, or correct predictions concerning the
    best rank of strategies for a new task.

7
Dynamic Selection of DM Methods
  • Dynamic selection of DM methods in KDSs has been
    under active study.
  • 2 contexts of dynamic selection:
  • (1) multi-classifier systems that apply different
    ensemble techniques (Dietterich, 1997).
  • Their general idea is usually to select one
    classifier dynamically, taking into account its
    local performance (e.g. generalisation accuracy)
    in the instance space.
  • (2) multistrategy learning (Michalski),
  • which applies a strategy selection approach that
    takes into account characteristics of the
    classification problem (meta-data).
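
The first context above, selecting one classifier by its local performance, can be sketched as follows. This is a minimal illustration and not the method of any cited system: the two threshold classifiers, the validation points and the neighbourhood size k are all hypothetical.

```python
def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def local_accuracy(classifier, neighbours):
    """Fraction of neighbouring validation instances labelled correctly."""
    hits = sum(1 for x, y in neighbours if classifier(x) == y)
    return hits / len(neighbours)

def dynamic_select(classifiers, validation, query, k=2):
    """Name of the classifier that is most accurate on the k validation
    instances nearest to the query (local performance in instance space)."""
    neighbours = sorted(validation, key=lambda xy: euclidean(xy[0], query))[:k]
    return max(classifiers,
               key=lambda name: local_accuracy(classifiers[name], neighbours))

# Two hypothetical base classifiers, each reliable in only one region.
def threshold_on_x(p):
    return 1 if p[0] > 0.5 else 0

def threshold_on_y(p):
    return 1 if p[1] > 0.5 else 0

classifiers = {"by_x": threshold_on_x, "by_y": threshold_on_y}

# Hypothetical labelled validation instances: (point, true label).
validation = [
    ((0.1, 0.2), 0), ((0.2, 0.9), 0), ((0.3, 0.8), 0),
    ((0.9, 0.1), 0), ((0.8, 0.9), 1), ((0.7, 0.2), 0),
]

upper_left = dynamic_select(classifiers, validation, (0.25, 0.85), k=2)
lower_right = dynamic_select(classifiers, validation, (0.85, 0.15), k=2)
print(upper_left, lower_right)
```

Neither classifier is best everywhere, so the selection differs between the two query points; this is the situation in which dynamic selection can pay off.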

8
Selection of the most appropriate DM technique
  • Motivation:
  • No Free Lunch theorem
  • many empirical studies show that one learning
    strategy can perform significantly better than
    another on a group of problems characterised by
    some shared properties (Kiang, 2003).
  • Problem:
  • Selection is usually not straightforward.
  • Some knowledge is required to decide which
    techniques to select and how to construct a DM
    strategy for the problem at hand.
  • We distinguish 2 levels of knowledge:
  • (1) the knowledge extracted, by applying a DM
    technique, from the data that represent the
    problem to be mined;
  • (2) the higher-level knowledge (from the KDS
    perspective) required for managing technique
    selection, combination and application, i.e.
    meta-knowledge.

9
Meta-learning
  • or "learning to learn": the effort to
    automatically induce dependencies between
    learning tasks and learning strategies.
  • It is based on the assumptions that it is possible
  • to evaluate and compare learning strategies,
  • to measure the benefits of early learning on
    subsequent learning,
  • to use such evaluations to reason about learning
    strategies,
  • and to select useful strategies while
    disregarding useless or misleading ones
    (Schmidhuber et al., 1996).
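
As a rough illustration of inducing a dependency between learning tasks and learning strategies, the sketch below describes each past task by a few meta-features and recommends the strategy recorded for the most similar one. The choice of meta-features, the strategy names and the entries of the meta-base are invented for the example and are not experimental results; the features are also left unscaled for brevity.

```python
import math

def meta_features(rows, labels):
    """Describe a dataset by simple characteristics (meta-data):
    log10 of the instance count, feature count, class entropy in bits."""
    n = len(rows)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return (math.log10(n), float(len(rows[0])), entropy)

def recommend(meta_base, query):
    """1-nearest-neighbour meta-learner: reuse the strategy stored for
    the past task whose meta-features are closest to the query."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    _, strategy = min(meta_base, key=lambda entry: dist(entry[0], query))
    return strategy

# Hypothetical meta-base: (meta-features of a past task, best strategy).
meta_base = [
    ((2.0, 4.0, 1.0), "naive_bayes"),
    ((4.0, 50.0, 0.5), "pca_then_knn"),
]

# A new task: 100 instances, 4 features, two balanced classes.
rows = [[0.1, 0.2, 0.3, 0.4] for _ in range(100)]
labels = [0] * 50 + [1] * 50

features = meta_features(rows, labels)
print(recommend(meta_base, features))
```

The recommendation is only as good as the meta-base: if no stored task resembles the new one, the nearest neighbour is still returned, which is one reason the representativeness of meta-data matters.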

10
in Meta-learning
  • in the context of classifier ensembles, where
    only the data itself is used to make decisions
    about method selection,
  • rather good practical results have been shown in
    experiments, supported by theoretical studies as
    well
  • in dynamic integration of DM strategies for a
    data set at hand
  • a multistrategy approach based on the ideas of
    constructive induction and conceptual clustering
    (Michalski, 1997)
  • several studies on automatic classifier selection
    via meta-learning (Kalousis, 2002)
  • No practical success!

11
Meta-Learning
12
Problems with Meta-Learning for DM Strategy Selection (SS)
  • Representativeness of meta-data samples
  • Meta-learning space is large
  • Computationally expensive to produce meta-data
    samples
  • Curse of dimensionality
  • Many possibly irrelevant features with respect to
    the collected/produced meta-data
  • Complexity of statistical measures
  • Why do we need to spend time to characterize the
    dataset if we can use this time to try different
    DM approaches and select the best one?
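
The alternative raised in the last question, trying the candidate approaches and keeping the best, can be sketched with plain cross-validation. The two learners and the one-dimensional data set are hypothetical stand-ins; the point is only that the cost grows with the number of candidate strategies times the number of folds.

```python
def majority_class(train):
    """Learner that always predicts the most frequent training label."""
    labels = [y for _, y in train]
    winner = max(set(labels), key=labels.count)
    return lambda x: winner

def one_nn(train):
    """Learner that predicts the label of the nearest training instance."""
    return lambda x: min(train, key=lambda xy: abs(xy[0] - x))[1]

def cross_val_accuracy(learner, data, folds=4):
    """Accuracy pooled over interleaved cross-validation folds."""
    correct = 0
    for i in range(folds):
        test = data[i::folds]
        train = [d for j, d in enumerate(data) if j % folds != i]
        model = learner(train)
        correct += sum(1 for x, y in test if model(x) == y)
    return correct / len(data)

def select_by_trial(learners, data):
    """Evaluate every candidate and return (best name, all scores)."""
    scores = {name: cross_val_accuracy(fn, data) for name, fn in learners.items()}
    return max(scores, key=scores.get), scores

# Hypothetical task: the label is 1 when x is at least 10.
data = [(x, int(x >= 10)) for x in range(20)]
learners = {"majority_class": majority_class, "one_nn": one_nn}

best, scores = select_by_trial(learners, data)
print(best)
```

With two cheap learners the exhaustive trial is trivial; with many strategies, each combining several techniques, the same loop is what meta-knowledge is meant to shortcut.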

13
Our goal and focus: KM perspective
  • to propose a knowledge-driven approach to enhance
    the dynamic integration of DM strategies in
    knowledge discovery systems
  • focus on KM aimed at organising a systematic
    process of knowledge capture and refinement over
    time.
  • We consider the basic knowledge management
    processes of
  • knowledge creation and identification,
  • representation, collection and organization,
  • sharing and integration,
  • adaptation and application
  • with respect to the introduced concept of
    meta-knowledge.

14
Introducing KM to DM SS
  • Generally, the problem of knowledge capture,
    storage, and dissemination is similar to data and
    information management in ISs, and therefore some
    executives prefer to view KM as a natural
    extension to IS functions (Alavi and Leidner,
    1999).
  • Zack (1999): the most practical way to define KM
    is to show, on the existing IT infrastructure,
    the involvement of
  • (1) knowledge repositories,
  • (2) best-practices and lessons-learned systems,
  • (3) expert networks (here, DM experts), and
  • (4) communities of practice (here, end-users).

15
Transformations of data and knowledge concepts
(adapted from Spiegler, 2000)
Knowledge is "justified belief that increases an
entity's capacity for effective action" (Nonaka,
1994). There is a long history of epistemological
debate; knowledge is discussed from different
perspectives in Polanyi (1962).
16
Different types of knowing
17
Knowledge distribution and knowledge integration
  • 4 potential sources of knowledge that have to be
    integrated into the repository of a KDS:
  • (1) knowledge from an expert in data-mining,
    knowledge discovery, statistics and related
    fields
  • (2) knowledge from a data-mining practitioner
  • (3) knowledge from laboratory experiments on
    synthetic data sets and, finally,
  • (4) knowledge from field experiments on
    real-world problems.
  • Besides this, research and business communities,
    and similar KDSs themselves, can organize
    different trusted networks, where participants
    are motivated to share their knowledge.

18
Knowledge Repository Lifecycle (1 of 2)
  • Once the repository is created it tends to grow,
    and at some point it naturally begins to collapse
    under its own weight, requiring major
    reorganization.
  • It needs to be updated continuously:
  • some content needs to be deleted (if misleading),
    deactivated or archived (if it is potentially
    useful).
  • if similar contributions are combined,
    generalized and restructured, the content may
    become less fragmented and redundant.
  • The process of filtering knowledge claims into
    accepted or suppressed ones is important:
  • when plenty of claims are produced automatically,
    they need to be filtered automatically.
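
A minimal sketch of such automatic maintenance is given below, under strong simplifying assumptions: a claim is a textual rule with a support count and a success rate, "similar" means an identical rule text, and the two thresholds are arbitrary. All names and example claims are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Claim:
    rule: str            # textual form of the knowledge claim
    support: int         # number of experiments behind the claim
    success_rate: float  # fraction of those in which the claim held
    status: str = "active"

def merge_duplicates(claims):
    """Combine claims with the same rule, pooling their support and
    averaging the success rate weighted by support."""
    merged = {}
    for c in claims:
        if c.rule not in merged:
            merged[c.rule] = Claim(c.rule, c.support, c.success_rate)
            continue
        m = merged[c.rule]
        total = m.support + c.support
        if total:
            m.success_rate = (m.success_rate * m.support
                              + c.success_rate * c.support) / total
        m.support = total
    return list(merged.values())

def filter_claims(claims, min_support=5, min_success=0.6):
    """Archive weakly supported claims (potentially useful), delete
    well-tested but misleading ones, keep the rest active."""
    for c in claims:
        if c.support < min_support:
            c.status = "archived"
        elif c.success_rate < min_success:
            c.status = "deleted"
        else:
            c.status = "active"
    return claims

# Hypothetical automatically produced claims.
incoming = [
    Claim("high-dimensional data: try PCA first", 4, 0.75),
    Claim("high-dimensional data: try PCA first", 4, 0.75),
    Claim("always prefer 1-NN", 10, 0.3),
    Claim("noisy labels: try ensembles", 2, 0.9),
]

repository = filter_claims(merge_duplicates(incoming))
statuses = {c.rule: c.status for c in repository}
print(statuses)
```

Merging before filtering matters here: each PCA claim alone would be archived for lack of support, while the combined claim is accepted.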

19
Knowledge Repository Lifecycle (2 of 2)
  • "knowing when" and "knowing where" contexts:
  • when the environment changes, any general rule
    that does not specify its context could become
    invalid.
  • some knowledge should exist that would guide an
    organization to change the repository when the
    environment calls for it.
  • Some knowledge claims are naturally in constant
    competition with the other claims.
  • Disagreements within the knowledge repository
    need to be resolved by means of generalization of
    some parts and contextualization of the others.
  • In order to increase the quality and validity of
    knowledge, it needs to be continually tested,
    improved or removed.
  • Some basic principles of triggers can be
    introduced.

20
Knowledge validity and knowledge quality
  • The contexts "knowing when" and "knowing where"
    can be discovered before they appear in a real
    situation.
  • Active learning
  • Zooming in and zooming out procedures
  • Search for balance between generality,
    compactness, interpretability, and
    understandability and sensitiveness to the
    context, exactness, precision, and adequacy of
    (meta-)knowledge.
  • context conditions can be important for knowledge
    quality estimation
  • The quality of knowledge can be estimated by its
    ability to help a KDS produce solutions faster
    and more effectively.
  • Knowledge claims have both a degree of utility
    and a degree of satisfaction.
  • To determine the relative quality of a validated
    knowledge claim, evaluation criteria should be
    defined:
  • complexity, usefulness, and predictive power are
    well formalised and easy to estimate
  • understandability, reliability of source,
    explanatory power are rather subjective and
    therefore inaccurate.
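
One way such criteria could be combined is a weighted mean in which the subjective criteria count less than the formalised ones. The sketch below is an assumption made for illustration, not a scheme proposed in the slides: the criterion names, the [0, 1] scale, the use of "simplicity" as the inverse of complexity, and the weight of 0.5 are all hypothetical.

```python
# Criteria the slide calls well formalised; "simplicity" stands in for
# low complexity so that higher is always better (an assumption).
OBJECTIVE = ("simplicity", "usefulness", "predictive_power")
# Criteria the slide calls rather subjective.
SUBJECTIVE = ("understandability", "source_reliability", "explanatory_power")

def claim_quality(scores, subjective_weight=0.5):
    """Weighted mean of criterion scores in [0, 1]; subjective criteria
    are down-weighted because their estimates are less accurate."""
    total = 0.0
    weight_sum = 0.0
    for name, value in scores.items():
        if name in OBJECTIVE:
            weight = 1.0
        elif name in SUBJECTIVE:
            weight = subjective_weight
        else:
            raise KeyError(f"unknown criterion: {name}")
        total += weight * value
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0

# Hypothetical scores for one validated knowledge claim.
scores = {
    "simplicity": 0.5,
    "usefulness": 1.0,
    "predictive_power": 1.0,
    "understandability": 0.0,
    "source_reliability": 0.5,
    "explanatory_power": 0.5,
}

quality = claim_quality(scores)
objective_only = claim_quality(scores, subjective_weight=0.0)
print(round(quality, 3), round(objective_only, 3))
```

Setting subjective_weight to 0 ignores the subjective criteria entirely, which makes the score reproducible at the price of discarding what users think of the claim.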

21
Limitations
  • The goal of KM here is to make more effective and
    efficient use of available DM techniques.
  • The most important issues in knowledge
    management:
  • (1) executive/strategic management,
  • (2) operational management,
  • the identification of available knowledge,
  • seeking ways to capture it in a KM process,
  • and analysing the ability to design a KM
    (sub)system including its tools and applications
  • (3) costs, benefits, and risks management, and
  • (4) standards in the KM technology and
    communication.

22
Further Research
  • Implementation of the presented knowledge-driven
    framework for a KDS that contains a limited
    number of DM techniques of a certain type:
  • feature extraction techniques and classification
    techniques
  • Evaluation of the framework in practice for
    real-world problems in a distributed environment

23
Thank You!
  • Feedback is very welcome
  • Questions
  • Suggestions
  • Guidelines
  • Collaboration
  • Contact Info
  • Mykola Pechenizkiy
  • Department of Computer Science and Information
    Systems,
  • University of Jyväskylä, FINLAND
  • E-mail mpechen_at_cs.jyu.fi
  • Tel. 358 14 2602472 Fax 358 14 260 3011
  • http://www.cs.jyu.fi/mpechen