An Evaluation of Commercial Data Mining - PowerPoint PPT Presentation

Slides: 21
Provided by: g01d

Transcript and Presenter's Notes

Title: An Evaluation of Commercial Data Mining


1
An Evaluation of Commercial Data Mining
  • Proposed and Presented by Emily Davis
  • Supervisor: John Ebden

2
Statement of the Problem
  • An Evaluation of Commercial Data Mining
    Capabilities, for example Oracle9i's Data Mining
    Suite.

3
Background
  • Data mining is a relatively new offshoot of
    database technology which has arisen as a result
    of the ability of computers to:
      • Store vast quantities of data in data warehouses.
      • Implement ingenious algorithms for the mining of
        data.
      • Use these algorithms to analyse these vast
        quantities of data in a reasonable amount of time.

4
  • Data mining discovers the patterns in data that
    represent knowledge.
  • Of interest are which algorithms data mining
    suites use, how well each category of algorithm
    performs on data, and what kind of results it
    produces.
  • Another important issue is the usability of the
    algorithms.
  • Random Number Example taken from
    http://www.saltspring.com/brochmann/math/mining/mining1.html

5
  •               data a       data b       data c
     1.00000000   0.71132700   0.15379400   1.88403600
     2.00000000   0.62219935   0.83119106   3.73797189
     3.00000000   0.33872289   0.80881084   3.10387831
     4.00000000   0.54262732   0.35427095   2.14806749
     5.00000000   0.50631348   0.71599532   3.16061290
     6.00000000   0.00132503   0.22447315   0.67606951
     7.00000000   0.76211535   0.94620700   4.36285170
     8.00000000   0.91026206   0.89499186   4.50549970
     9.00000000   0.92640874   0.47156928   3.26752532
     10.0000000   0.49323546   0.27673696   1.81668179
     11.0000000   0.04501477   0.30142353   0.99430013
     12.0000000   0.49180000   0.17909135   1.52087404
     13.0000000   0.06747225   0.85629071   2.70381663
     14.0000000   0.84239974   0.41916601   2.94229750

6
  • 49.0000000 0.07845276 0.69584199 2.24443147
  • 50.0000000 0.07548299 0.52973340 1.74016616
  • 51.0000000 0.72301849 0.97594044 ????????
  • Data a and data b are random numbers generated in Excel.
  • Data c = 2(data a) + 3(data b).

7
  • 51st value calculated by Excel: 4.37385831
  • Value calculated using Knowledge Miner, a
    Macintosh data mining tool: 4.34791231,
    from the equation
    data c = 1.97(data a) + 2.96(data b) + 0.0324
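
The experiment on slides 5 to 7 can be reproduced in outline. The following is a minimal sketch, not Knowledge Miner's actual method, which the slides do not describe. It assumes ordinary least squares as the fitting technique, uses Python's `random` module in place of Excel, and all function names are hypothetical.

```python
import random


def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Augmented matrix [matrix | rhs], copied so the inputs are not modified.
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def fit_linear_model(rows, targets):
    """Least-squares fit of target = w1*x1 + ... + wk*xk + w0 (normal equations)."""
    design = [list(row) + [1.0] for row in rows]  # trailing 1.0 is the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Assumption: 50 uniform random rows stand in for the Excel columns.
rng = random.Random(0)
data_a = [rng.random() for _ in range(50)]
data_b = [rng.random() for _ in range(50)]
data_c = [2 * a + 3 * b for a, b in zip(data_a, data_b)]

w_a, w_b, w_0 = fit_linear_model(list(zip(data_a, data_b)), data_c)

# Row 51 from slide 6, whose data c value was withheld.
a51, b51 = 0.72301849, 0.97594044
predicted_c51 = w_a * a51 + w_b * b51 + w_0
print(f"data c = {w_a:.4f}(data a) + {w_b:.4f}(data b) + {w_0:.4f}")
print(f"predicted row 51: {predicted_c51:.8f}")
```

Because the generated data contain no noise, a least-squares fit recovers the generating rule almost exactly. Knowledge Miner's coefficients on slide 7 differ slightly, which suggests it searches for the equation in a different way.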

8
  • Experiment repeated using three columns of random
    numbers and this equation:
  • Data d = 23(data a)-4.5(data b)(data a
    data c) .
  • The last five entries for Data D were missing
    from the column.

9
  • Values generated by Excel and the corresponding
    values predicted by Knowledge Miner:
     Excel        Knowledge Miner
     14.7314558   14.7341613
     12.0720505   12.0731391
     22.0008992   22.0080223
     7.52633344   7.52465867
     5.25167700   5.24861860
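
To see how close the predictions on this slide are, the two lists can be compared entry by entry. This is a small illustrative sketch using only the ten numbers shown above; the function names are hypothetical.

```python
# Values copied from slide 9.
excel_values = [14.7314558, 12.0720505, 22.0008992, 7.52633344, 5.25167700]
predicted_values = [14.7341613, 12.0731391, 22.0080223, 7.52465867, 5.24861860]


def absolute_errors(actual, predicted):
    """Absolute difference between each prediction and the true value."""
    return [abs(p - a) for a, p in zip(actual, predicted)]


def relative_errors_percent(actual, predicted):
    """Absolute error expressed as a percentage of the true value."""
    return [100 * abs(p - a) / abs(a) for a, p in zip(actual, predicted)]


abs_err = absolute_errors(excel_values, predicted_values)
rel_err = relative_errors_percent(excel_values, predicted_values)

for actual, predicted, err, pct in zip(excel_values, predicted_values, abs_err, rel_err):
    print(f"{actual:12.8f} {predicted:12.8f} abs={err:.8f} rel={pct:.4f}%")

print(f"largest absolute error: {max(abs_err):.8f}")
print(f"largest relative error: {max(rel_err):.4f}%")
```

Every prediction lies within 0.06% of the value Excel calculated.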

10
Plan of Action
  • Literature Survey (and other resources)
  • Install Software for Oracle
  • Get to know the Oracle Suite
  • Evaluate Oracle9i's Data Mining Suite

11
Install Software for Oracle
  • Including JDeveloper
  • May be extended to the installation of other
    commercial data mining suites, e.g.
      • DB2's Intelligent Miner
      • Informix's Data Mine

12
Investigate Oracle9i's Data Mining Suite
  • Two major algorithm types: supervised and
    unsupervised learning.
  • A Medical Example
  • Supervised learning: researchers input medical
    profiles into a leukaemia model to predict
    propensity for the disease.
  • Unsupervised learning searches for clusters of
    related information in data sets to reveal
    insights about diseases and patient populations.
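
The difference between the two algorithm types can be shown on toy data. The sketch below is illustrative only: the two-feature "profiles" and their labels are made up, and nearest-centroid classification and k-means clustering are stand-ins chosen for brevity, not the algorithms shipped in Oracle's suite.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def nearest(point, centres):
    """Index of the centre closest to point (Euclidean distance)."""
    return min(range(len(centres)), key=lambda i: math.dist(point, centres[i]))


# Supervised: labels are known, so the model learns one centre per label.
def train_nearest_centroid(points, labels):
    classes = sorted(set(labels))
    centres = [centroid([p for p, lab in zip(points, labels) if lab == c])
               for c in classes]
    return classes, centres


def predict(model, point):
    classes, centres = model
    return classes[nearest(point, centres)]


# Unsupervised: no labels, so k-means has to discover the groups itself.
def k_means(points, initial_centres, iterations=10):
    centres = list(initial_centres)
    for _ in range(iterations):
        groups = [[] for _ in centres]
        for p in points:
            groups[nearest(p, centres)].append(p)
        # Keep the old centre if a group ends up empty.
        centres = [centroid(g) if g else c for g, c in zip(groups, centres)]
    return [nearest(p, centres) for p in points]


# Made-up profiles: two hypothetical measurements per patient.
profiles = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (4.0, 4.2), (4.3, 3.9), (3.9, 4.1)]
labels = ["low risk"] * 3 + ["high risk"] * 3

model = train_nearest_centroid(profiles, labels)
prediction = predict(model, (4.1, 4.0))

clusters = k_means(profiles, initial_centres=[profiles[0], profiles[-1]])
print(prediction, clusters)
```

The supervised model needs the `labels` list to be trained. The clustering call never sees it and only reports which profiles group together.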

13
Get to know the Oracle DM Suite (a major task).
  • Explore JDeveloper, Oracle9i's Java-based API.
  • JDeveloper complies with JDM (Java Data Mining),
    used by Oracle, Sun, IBM and others.
  • Explore DM4J (Data Mining for Java), the new
    Graphical User Interface for Oracle DM.

14
Addressing the Problem
  • Run the different algorithms available in the
    data mining suite.
  • Document and analyse results in terms of
    performance and effectiveness of algorithm.
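
One possible way to record both performance and effectiveness for each algorithm is a small harness that times a model and measures its error on held-back rows. This is a hedged sketch and not part of the original plan. The `evaluate` function and both rule functions are hypothetical, and the second rule assumes the operators lost from the slide 7 equation are plus signs.

```python
import time


def evaluate(model_fn, inputs, expected):
    """Run model_fn over inputs; return (elapsed seconds, mean absolute error)."""
    start = time.perf_counter()
    predictions = [model_fn(row) for row in inputs]
    elapsed = time.perf_counter() - start
    mae = sum(abs(p - e) for p, e in zip(predictions, expected)) / len(expected)
    return elapsed, mae


def exact_rule(row):
    """The generating rule from slide 6."""
    a, b = row
    return 2 * a + 3 * b


def mined_rule(row):
    """The equation reported on slide 7 (plus signs assumed)."""
    a, b = row
    return 1.97 * a + 2.96 * b + 0.0324


# Rows 49 to 51 from slide 6; the row 51 target is the Excel value from slide 7.
inputs = [(0.07845276, 0.69584199), (0.07548299, 0.52973340), (0.72301849, 0.97594044)]
expected = [2.24443147, 1.74016616, 4.37385831]

exact_seconds, exact_mae = evaluate(exact_rule, inputs, expected)
mined_seconds, mined_mae = evaluate(mined_rule, inputs, expected)
print(f"exact rule: {exact_seconds:.6f}s, MAE {exact_mae:.8f}")
print(f"mined rule: {mined_seconds:.6f}s, MAE {mined_mae:.8f}")
```

In a real evaluation each `model_fn` would wrap a call into the data mining suite under test, and the inputs would be a much larger held-back data set.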

15
Expected Results
  • The ability to say conclusively whether Oracle's
    data mining capabilities are inferior or superior
    to anything else in the market place and why this
    can be stated.

16
Possible Extensions to the Project
  • To have sufficient knowledge of the topic to give
    recommendations or feedback:
      • to Oracle regarding their data mining suite.
      • to IT customers wanting to purchase data mining
        suites.
  • Explore the field of random stereograms: could a
    computer see them? If not, why not?

17
Literature Survey
  • Principles of Data Mining by David Hand, Heikki
    Mannila and Padhraic Smyth, Cambridge,
    Massachusetts, MIT Press, 2001: algorithmic
    concepts
  • Data Mining: Concepts and Techniques by Jiawei
    Han and Micheline Kamber, San Francisco,
    California, Morgan Kaufmann, 2001: algorithmic
    evaluations
  • Data Mining: A Tutorial-Based Primer by Richard
    J. Roiger and Michael W. Geatz, Boston,
    Massachusetts, Addison Wesley, 2003: practical
    knowledge and processing

18
  • Data Mining by Pieter Adriaans and Dolf Zantinge,
    Harlow, England, Addison Wesley, 1996: real-life
    application
  • Data Mining and Statistical Analysis Using SQL by
    Robert P. Trueblood and John N. Lovett, Jnr.,
    USA, Apress, 2001: statistical principles
  • Data Mining Using SAS Applications by George
    Fernandez, USA, Chapman and Hall/CRC, 2003:
    methodologies

19
  • Mastering Data Mining: The Art and Science of
    Customer Relationship Management by Michael J.A.
    Berry and Gordon S. Linoff, USA, Wiley Computer
    Publishing, 2000: building effective models
  • Data Preparation for Data Mining by Dorian Pyle,
    San Francisco, California, Morgan Kaufmann, 2000:
    demo code, 10 Golden Rules.

20
  • The white paper "Data Mining: Beyond Algorithms"
    by Dr Akeel Al-Attar, available at
    http://www.attar.com/tutor/mining.htm
  • Summary from the KDD-03 panel "Data Mining: The
    Next Ten Years", available at
    http://www.acm.org/sigs/sigkdd/explorations/issue5-2/pnl_10yrs_final1.pdf
  • Oracle Website
  • Oracle Magazine