Title: Why Data Mining Research Does Not Contribute to Business
1. Why Data Mining Research Does Not Contribute to Business?
DMBiz05, Porto, Portugal, October 3, 2005
- Mykola Pechenizkiy, Seppo Puuronen: Department of
Computer Science, University of Jyväskylä, Finland
- Alexey Tsymbal: Department of Computer Science,
Trinity College Dublin, Ireland
2. Outline
- Introduction: what is our message?
- Where are we? Rigor vs. relevance in DM
- Towards the new framework for DM research
- DM system as an adaptive Information System (IS)
- DM research as IS development: the DM system as an
artefact
- DM success model: success factors
- Further plans and discussion
3. Our Message
- DM is still a technology surrounded by great
expectations that it will enable organizations to
benefit more from their huge databases.
- There are some success stories in which
organizations have gained a competitive
advantage from DM.
- Still, the strong focus of most DM researchers on
technology-oriented topics does not support
expanding the scope to less rigorous but
practically very relevant sub-areas.
- Research in the IS discipline has a strong
tradition of taking into account the human and
organizational aspects of systems besides the
technical ones.
4. Our Message
- The maturation of DM-supporting processes that
take into account human and organizational
aspects is still in its infancy.
- The DM community might benefit, at least from a
practical point of view, from looking at older
sub-areas of IT that have a tradition of
considering solution-driven concepts with a focus
also on human and organizational aspects.
- By becoming more amenable to the research results
of the IS community, the DM community might be
able to increase its collective understanding of
  - how DM artifacts are developed: conceived,
  constructed, and implemented,
  - how DM artifacts are used, supported, and evolved,
  - how DM artifacts impact and are impacted by the
  contexts in which they are embedded.
5. Existing Frameworks for DM
- Theory-oriented
  - Databases
  - Statistics
  - Machine learning
  - Data compression
- Process-oriented
  - Fayyad's
  - CRISP-DM
  - Reinartz's
- The reductionist approach of viewing DM as
statistics has the advantages of a strong
background and easily formulated problems.
- DM tasks such as clustering, regression, and
classification fit easily into the
theory-oriented approaches.
- More recent (process-oriented) frameworks address
the issues related to a view of DM as a process,
and its iterative and interactive nature.
6. Rigor and Relevance in DM Research
- Lin (in Wu et al.) notes that a successful new
industry (such as DM) can follow these consecutive
phases:
  - (1) discovering a new idea,
  - (2) ensuring its applicability,
  - (3) producing small-scale systems to test the market,
  - (4) better understanding of the new technology, and
  - (5) producing a fully scaled system.
- At present there are several dozen DM systems,
none of which is comparable in scale to a DBMS.
- This fact indicates that we are still in the 3rd
phase in the DM area!
7. Rigor vs. Relevance in DM Research
8. Where is the focus?
- Still on speeding up, scaling up, and increasing
the accuracy of DM techniques.
- Piatetsky-Shapiro: "we see many papers proposing
incremental refinements in association rules
algorithms, but very few papers describing how
the discovered association rules are used".
- Lin claims that the R&D goals of DM are quite
different,
  - since research is knowledge-oriented while
  development is profit-oriented.
  - Thus, DM research is concentrated on the
  development of new algorithms or their
  enhancements,
  - but the DM developers in domain areas are aware
  of cost considerations: investment in research,
  product development, marketing, and product
  support.
- However, we believe that the study of the DM
development and DM use processes is as important
as the technological aspects, and therefore such
research activities are likely to emerge within
the DM field.
Towards the new framework for DM research
9. DMS in the Kernel of an Organization
Environment
- DM is a fundamentally application-oriented area
motivated by business and scientific needs to
make sense of mountains of data.
- A DM system (DMS) is generally used by human
beings to support or perform some task(s) in an
organizational environment; both the users and
the organization have their own desires related
to the DMS.
- Further, the organization has its own environment,
which has its own interests related to the DMS,
e.g. that people's privacy is not violated.
10. The ISs-based paradigm for DM
Ives, B., Hamilton, S., and Davis, G. (1980). A
Framework for Research in Computer-based MIS.
Management Science, 26(9), 910-934.
"Information systems are powerful instruments for
organizational problem solving through formal
information processing."
Lyytinen, K. (1987). Different perspectives on
ISs: problems and solutions. ACM Computing
Surveys, 19(1), 5-46.
11. DM Artifact Development
A multimethodological approach to the
construction of an artefact for DM
Adapted from Nunamaker, W., Chen, M., and
Purdin, T. 1990-91, Systems development in
information systems research, Journal of
Management Information Systems, 7(3), 89-106.
12. The Action Research and Design Science Approach
to Artifact Creation
13. DM Artifact Use: Success Model (1 of 3)
Adapted from the D&M (DeLone and McLean) IS Success Models
14. DM Artifact Use: Success Model (2 of 3)
- What are the key factors of successful use and
impact of a DMS at both the individual and
organizational levels?
  - How is the system used, supported, and
  evolved?
  - How does the system impact, and how is it
  impacted by, the contexts in which it is embedded?
- Coppock: the failure factors of DM-related
projects
  - have nothing to do with the skill of the modeler
  or the quality of data,
  - but do include the following:
    - persons in charge of the project did not
    formulate actionable insights,
    - the sponsors of the work did not communicate the
    insights derived to key constituents,
    - the results do not agree with institutional truths.
- Leadership, communication skills, and
understanding of the culture of the organization
are no less important than the traditionally
emphasized technological job of turning data into
insights.
15. DM Artifact Use: Success Model (3 of 3)
- Hermiz states his belief that there are four
critical success factors for DM projects:
  - (1) having a clearly articulated business problem
  that needs to be solved and for which DM is a
  proper tool,
  - (2) ensuring that the problem being pursued is
  supported by the right type of data of sufficient
  quality and in sufficient quantity for DM,
  - (3) recognizing that DM is a process with many
  components and dependencies: the entire project
  cannot be "managed" in the traditional sense of
  the business word,
  - (4) planning to learn from the DM process
  regardless of the outcome, and clearly
  understanding that there is no guarantee that
  any given DM project will be successful.
16. New Research Framework for DM Research
17. New Research Framework for DM Research
18. Further Work
- Definition of the concept of relevance in DM
research
- Revision of the book chapter
- Further work on the new framework for DM
research
- Organization of a workshop, working conference,
or ST on more social directions in DM research,
likely with one of the focuses on IS as a sister
discipline
- SIAM DM 2006: interests include
  - Human Factors and Social Issues
    - Ethics of Data Mining
    - Intellectual Ownership
    - Privacy Models
    - Privacy Preservation Techniques
    - Risk Analysis
    - User Interfaces
    - Data and Result Visualization
19. Thank You!
- Feedback is very welcome
  - Questions
  - Suggestions
  - Collaboration
- Book chapter draft is available on request from
  - Mykola Pechenizkiy
  - Department of Computer Science and Information
  Systems,
  - University of Jyväskylä, FINLAND
  - E-mail: mpechen_at_cs.jyu.fi
  - Tel. +358 14 2602472, Fax +358 14 260 3011
  - http://www.cs.jyu.fi/mpechen