Data Mining with Oracle using Classification and Clustering Algorithms


1
Data Mining with Oracle using Classification and
Clustering Algorithms
  • Proposed and Presented by
  • Nhamo Mdzingwa
  • Supervisor: John Ebden

2
Presentation Outline
  • Problem Statement
  • Objective
  • Background
  • Expected Results
  • Possible Extensions
  • Plan of action
  • Timeline
  • Literature Survey
  • Questions

3
Problem Statement
  • The commercial world is reacting fast to the
    growth and potential of the DM area, and a wide
    range of tools is being marketed as DM suites.
  • Examples of these are
  • Oracle DM
  • DB2's Intelligent Miner
  • Informix's Data Mine
  • SQL Data miner
  • GhostMiner
  • Clementine 9.0 (SPSS)
  • SAS
  • Gornish systems, etc.

4
Problem
  • It is vital to know which algorithms a DM suite
    uses and which algorithm to use on a particular
    data set.
  • It is equally important to know how well each
    algorithm performs in terms of accuracy,
    efficiency and effectiveness when using a
    particular DM suite, e.g. Oracle DM.

5
Objective
  • Investigate two types of algorithms available in
    Oracle Data Mining (ODM).
  • Apply the two algorithms to actual data.
  • Analyse and evaluate the results in terms of
    performance.

6
What is Data Mining? (Background)
  • Simply put, DM is knowledge discovery.
  • DM is the process of automatic discovery of
    hidden patterns and relationships within
    enormous amounts of data.
  • It is a powerful new technology that allows
    businesses to make proactive, knowledge-driven
    decisions as it tries to predict the future.
  • The data (which represents knowledge) is normally
    stored in databases and data warehouses (typical
    size: terabytes).

7
Automatic discovery is implemented by the use of
algorithms provided by DM suites
  • E.g. Oracle offers
  • Adaptive Bayes Network supporting decision
    trees (classification)
  • Naive Bayes (classification)
  • Model Seeker (classification)
  • k-Means (clustering)
  • O-Cluster (clustering)
  • Predictor Variance (attribute importance)
  • Apriori (association rules)
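
To make one of the listed algorithms concrete, the following is a minimal Python sketch of basic k-means (Lloyd's algorithm) on 2-D points. It is an illustration only and not Oracle's implementation; the function name `k_means`, its parameters and the sample points are assumptions made for this example.

```python
import random

def k_means(points, k, iterations=20, seed=0):
    """Cluster 2-D points into k groups with basic k-means (illustrative only)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append((sum(p[0] for p in cluster) / len(cluster),
                                      sum(p[1] for p in cluster) / len(cluster)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its old centroid
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

# Hypothetical sample data: two visibly separate groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, clusters = k_means(points, k=2)
```

No class labels are supplied: the grouping is derived from the distances between the points alone, which is what makes clustering different from classification.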

8
  • Algorithms are grouped as either supervised or
    unsupervised learning strategies.

9
The data mining process involves a series of
steps to define a business problem, gather and
prepare the data, build and evaluate mining
models, and apply the models and disseminate the
new information.
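
The prepare, build, evaluate and apply steps described above can be sketched in a few lines of Python. This is a toy outline rather than the ODM workflow: the nearest-class-mean model, the function names and the sample records are assumptions made purely for illustration.

```python
import random

def prepare(records):
    """Prepare step: drop records with a missing attribute value or label."""
    return [r for r in records if r[0] is not None and r[1] is not None]

def split(records, test_fraction=0.25, seed=0):
    """Hold back part of the data so the model is evaluated on unseen records."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def build(train):
    """Build step: store the mean attribute value of each class."""
    sums, counts = {}, {}
    for value, label in train:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def apply_model(model, value):
    """Apply step: predict the class whose mean is closest to the value."""
    return min(model, key=lambda label: abs(model[label] - value))

def evaluate(model, test):
    """Evaluate step: fraction of held-out records classified correctly."""
    correct = sum(1 for value, label in test if apply_model(model, value) == label)
    return correct / len(test)

# Hypothetical records: (attribute value, class label); one has a missing value.
records = [(1.0, "low"), (2.0, "low"), (1.5, "low"), (2.5, "low"),
           (9.0, "high"), (8.0, "high"), (8.5, "high"), (9.5, "high"),
           (None, "low")]
train, test = split(prepare(records))
model = build(train)
accuracy = evaluate(model, test)
prediction = apply_model(model, 3.0)  # score a new, unseen record
```

Evaluating on held-out records rather than on the training data gives a fairer estimate of how the model will behave when it is applied to new data.
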

10
Expected Results
  • Aim to state conclusively which algorithm is
    most effective and suitable for data mining on
    a given dataset, since datasets differ.

11
Possible Extensions to the Project
  • Testing the same algorithms with different
    tools offered by other vendors,
  • e.g. testing with the DM suite in SQL and
    checking whether the results are similar.
  • If they are not, investigating why the results
    differ could be another extension.

12
Plan of Action
  • Carry out a literature search,
  • mainly to obtain background knowledge and an
    understanding of the field.
  • Get to know the Oracle DM suite.
  • Do the DM tutorials provided by Oracle.
  • The server Ora1 is the machine I'll be working
    with.
  • It already has JDeveloper, the Oracle 10g
    database and Oracle 9i DM installed.

13
Timeline
Continuing from the literature and tutorials already done:
  • Investigate clustering and classification algorithms (theory): 2nd term, 15 to 30 April
  • Find suitable computerised case studies of the use of the above algorithms, with or without Oracle: 2nd term, end of May
  • Search for databases for testing (possibilities: AIDS data, faculty data): 2nd term, end of May
  • Apply the algorithms to the data found, then critically analyse and assess the results: second semester
  • Write up paper: September vacation and 3rd term
  • Final project write-up: due 7/11

14
Literature Survey
  • Richard J. Roiger and Michael W. Geatz, Data
    Mining: A Tutorial-Based Primer. Boston,
    Massachusetts, Addison Wesley, 2003.
  • This book provides the background and practical
    knowledge required for the project research and
    also presents different methodologies used in
    data mining that may be useful.

15
  • David Hand, Heikki Mannila and Padhraic Smyth,
    Principles of Data Mining. Cambridge,
    Massachusetts, MIT Press, 2001.
  • Jesus Mena, Data Mining Your Website. Digital
    Press, 1999.
  • Jiawei Han and Micheline Kamber, Data Mining:
    Concepts and Techniques. San Francisco,
    California, Morgan Kaufmann, 2001.
  • Robert P. Trueblood and John N. Lovett, Jnr., Data
    Mining and Statistical Analysis Using SQL. USA,
    Apress.
  • http://www.lc.leidenuniv.nl/awcourse/oracle/datamine.920/a95961/preface.htm
  • http://www.oracle.com/technology/products/oracle9i/htdocs/o9idm_faq.html
  • http://fas.sfu.ca/cs/research/groups/DB/sections/publication/kdd/kdd.html

16
Questions? Thank you