Title: Data Mining with Oracle using Classification and Clustering Algorithms
- Proposed and presented by Nhamo Mdzingwa
- Supervisor: John Ebden
Presentation Outline
- Problem Statement
- Objective
- Background
- Expected Results
- Possible Extensions
- Plan of Action
- Timeline
- Literature Survey
- Questions
Problem Statement
- The commercial world is reacting quickly to the growth potential of data mining (DM), and a wide range of tools are now being marketed as DM suites.
- Examples include:
- Oracle DM
- DB2's Intelligent Miner
- Informix's Data Mine
- SQL Data Miner
- Ghost Miner
- Clementine 9.0 (SPSS)
- SAS
- Gornish Systems, etc.
Problem
- It is vital to know which algorithms a DM suite uses and which algorithm to apply to a particular data set.
- Secondly, it is important to know how well each algorithm performs in terms of accuracy, efficiency and effectiveness when used within a particular DM suite, e.g. Oracle DM.
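The slide names accuracy and efficiency as criteria without fixing how they are measured. As a minimal sketch, assuming a classification task where the actual class of each test record is known, accuracy can be taken as the fraction of correct predictions, a confusion matrix shows which classes get mixed up, and wall-clock timing gives a crude efficiency figure. The helper names (`accuracy`, `confusion_matrix`, `timed`) are hypothetical and not part of Oracle DM.

```python
import time
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


def accuracy(actual, predicted):
    """Fraction of records whose predicted label matches the actual label."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def timed(fn, *args):
    """Run fn(*args); return (result, elapsed seconds) as a rough efficiency measure."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

For example, with actual labels `["yes", "no", "yes", "yes", "no"]` and predictions `["yes", "no", "no", "yes", "yes"]`, three of five records match, so `accuracy` returns 0.6. Timing a single run is noisy; averaging several runs gives a fairer comparison between algorithms.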
Objective
- Investigate two types of algorithms (classification and clustering) available in Oracle Data Mining (ODM).
- Apply the two algorithms to actual data.
- Analyse and evaluate the results in terms of performance.
What is Data Mining? (Background)
- Simply put, DM is knowledge discovery.
- DM is the process of automatically discovering hidden patterns and relationships within enormous amounts of data.
- It is a powerful new technology that allows businesses to make proactive, knowledge-driven decisions, because it aims to predict future trends and behaviour.
- The data that holds this knowledge is normally stored in databases and data warehouses (typically terabytes in size).
Automatic discovery is implemented using algorithms provided by DM suites
- For example, Oracle offers:
- Adaptive Bayes Network, supporting decision trees (classification)
- Naive Bayes (classification)
- Model Seeker (classification)
- k-Means (clustering)
- O-Cluster (clustering)
- Predictor Variance (attribute importance)
- Apriori (association rules)
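To make one of these concrete, the following is a textbook Naive Bayes classifier for categorical attributes with add-one (Laplace) smoothing, in plain Python. It is not Oracle's implementation, whose internals (for example how it bins numeric attributes) may differ; the class and method names are hypothetical. For each class c it scores log P(c) plus the sum of log P(value | c) over the record's attributes, and predicts the highest-scoring class.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Minimal categorical Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.total = len(labels)
        # value_counts[(attribute index, class)][value] -> count
        self.value_counts = defaultdict(Counter)
        # distinct values seen for each attribute, used in the smoothing denominator
        self.values = defaultdict(set)
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.value_counts[(i, label)][value] += 1
                self.values[i].add(value)
        return self

    def predict(self, row):
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log prior P(class)
            score = math.log(count / self.total)
            for i, value in enumerate(row):
                seen = self.value_counts[(i, label)][value]
                # smoothed log likelihood P(value | class); unseen values never give log(0)
                score += math.log((seen + 1) / (count + len(self.values[i])))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Working in log space avoids numeric underflow when many attributes are multiplied together, and the smoothing keeps a single unseen attribute value from ruling a class out entirely.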
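- Algorithms are grouped as either supervised or unsupervised learning strategies: supervised algorithms (e.g. classification) learn from records whose target value is known, while unsupervised algorithms (e.g. clustering) find structure in the data without a target.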
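Clustering illustrates the unsupervised case: no target attribute is given, and the algorithm must find the groups itself. Below is a minimal sketch of textbook k-Means (Lloyd's algorithm), assuming numeric attributes and Euclidean distance. ODM's own k-Means is not reproduced here and may differ, for example in how it initialises centroids; `k_means` is a hypothetical helper name.

```python
import math
import random


def k_means(points, k, iterations=100, seed=0):
    """Plain Lloyd's k-Means on tuples of numbers; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its member points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:  # converged: assignments can no longer change
            break
        centroids = new_centroids
    return centroids, labels
```

The cluster numbers returned are arbitrary: running with a different `seed` may swap which group is called 0 and which is called 1, even when the grouping itself is identical.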
The data mining process involves a series of steps: define the business problem, gather and prepare the data, build and evaluate mining models, and then apply the models and disseminate the new information.
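The prepare, build, evaluate and apply steps can be sketched end to end under simplifying assumptions: numeric attributes, min-max normalisation as the data preparation step (fitted on the training data only), a random hold-out test set, and a nearest-centroid classifier standing in for a real mining model. All function names are hypothetical; this illustrates the workflow, not how ODM performs it.

```python
import random


def fit_min_max(rows):
    """Data preparation: learn each column's minimum and maximum from training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(rows, lows, highs):
    """Rescale each column using the training range (constant columns map to 0.0)."""
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


def split(rows, labels, test_fraction=0.3, seed=0):
    """Shuffle, then hold out a test set the model never sees while being built."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(idx) * test_fraction))
    cut = len(idx) - n_test

    def take(part):
        return [rows[i] for i in part], [labels[i] for i in part]

    return take(idx[:cut]), take(idx[cut:])


def build_nearest_centroid(rows, labels):
    """Build step: a deliberately simple stand-in model with one mean vector per class."""
    centroids = {}
    for label in set(labels):
        members = [r for r, lab in zip(rows, labels) if lab == label]
        centroids[label] = tuple(sum(col) / len(members) for col in zip(*members))

    def predict(row):
        return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, centroids[lab])))

    return predict


def run_process(rows, labels):
    """Prepare, build, apply and evaluate; returns accuracy on the hold-out set."""
    (train_x, train_y), (test_x, test_y) = split(rows, labels)
    lows, highs = fit_min_max(train_x)  # prepare, using training statistics only
    model = build_nearest_centroid(apply_min_max(train_x, lows, highs), train_y)  # build
    predictions = [model(r) for r in apply_min_max(test_x, lows, highs)]  # apply
    correct = sum(p == a for p, a in zip(predictions, test_y))
    return correct / len(test_y)  # evaluate
```

Fitting the normalisation on the training rows alone keeps information about the test records from leaking into the model, so the hold-out accuracy stays an honest estimate.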
Expected Results
- The aim is to state conclusively which algorithm is most effective and suitable for mining a given dataset, since datasets differ.
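Because results depend on the dataset, one common way to compare algorithms on a particular dataset is k-fold cross-validation: every record is tested exactly once by a model built without it, and the algorithm with the higher average accuracy is preferred for that data. The sketch below assumes numeric attributes and uses two simple stand-ins (a majority-class baseline and 1-nearest-neighbour) rather than Oracle's algorithms; all names are hypothetical.

```python
from collections import Counter


def k_fold_accuracy(rows, labels, build, k=5):
    """Average accuracy of `build` over k folds; each record is tested exactly once."""
    n = len(rows)
    if n < k:
        raise ValueError("need at least k records")
    folds = [list(range(i, n, k)) for i in range(k)]  # round-robin fold assignment
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_x = [rows[i] for i in range(n) if i not in held_out]
        train_y = [labels[i] for i in range(n) if i not in held_out]
        model = build(train_x, train_y)
        correct = sum(model(rows[i]) == labels[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / k


def build_majority(rows, labels):
    """Baseline: always predict the most frequent training label."""
    most_common = Counter(labels).most_common(1)[0][0]
    return lambda row: most_common


def build_one_nn(rows, labels):
    """1-nearest-neighbour: copy the label of the closest training record."""
    def predict(row):
        distances = [sum((a - b) ** 2 for a, b in zip(row, r)) for r in rows]
        return labels[distances.index(min(distances))]
    return predict
```

Comparing each candidate against a trivial baseline such as `build_majority` shows whether an algorithm is actually learning something from the data or merely reflecting the class proportions.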
Possible Extensions to the Project
- Testing the same algorithms with tools offered by other vendors, e.g. the DM suite in SQL, and checking whether the results are similar.
- If they are not, investigating why the results differ could be a further extension.
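Checking whether two tools give "similar" results needs a concrete measure. For clustering, the cluster numbers each tool assigns are arbitrary, so comparing them directly says little. A minimal sketch of one standard comparison, the Rand index, follows: it is the fraction of record pairs on which two clusterings agree (both group the pair together, or both separate it). For classifiers, whose class names are shared, the plain share of identical predictions serves. Both function names are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Share of record pairs on which two clusterings agree: both put the pair
    in the same cluster, or both put it in different clusters (range 0 to 1)."""
    if len(labels_a) != len(labels_b) or len(labels_a) < 2:
        raise ValueError("need two labelings of the same records (at least 2)")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j]) for i, j in pairs
    )
    return agree / len(pairs)


def prediction_agreement(preds_a, preds_b):
    """For classifiers the class names are shared, so direct agreement works."""
    if len(preds_a) != len(preds_b) or not preds_a:
        raise ValueError("need two non-empty prediction lists of equal length")
    return sum(a == b for a, b in zip(preds_a, preds_b)) / len(preds_a)
```

For instance, `rand_index([0, 0, 1, 1], [5, 5, 7, 7])` is 1.0 because the groupings are identical even though the cluster numbers differ. The plain Rand index can look high by chance when there are many clusters; the adjusted Rand index corrects for this.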
Plan of Action
- Carry out a literature search
- mainly to obtain background knowledge and an understanding of the field.
- Get to know the Oracle DM suite
- Do the DM tutorials provided by Oracle.
- The server Ora1 is the machine I'll be working with.
- It already has JDeveloper, the Oracle 10g database and Oracle9i DM installed.
Timeline
- Continuation from the literature survey and tutorials already done
- Investigate clustering and classification algorithms (theory): 2nd term, 15 to 30 April
- Find suitable computerised case studies of the use of the above algorithms, with or without Oracle: 2nd term, end of May
- Search for databases to test on (possibilities: AIDS data, faculty data): 2nd term, end of May
- Apply the algorithms to the data found, then critically analyse and assess the results: second semester
- Write up paper: September vacation and 3rd term
- Final project write-up: due 7/11
Literature Survey
- Richard J. Roiger and Michael W. Geatz, Data Mining: A Tutorial-Based Primer. Boston, Massachusetts: Addison Wesley, 2003.
- This book will provide the background and practical knowledge required for the project research, and it also presents different data mining methodologies that may be useful.
- David Hand, Heikki Mannila and Padhraic Smyth, Principles of Data Mining. Cambridge, Massachusetts: MIT Press, 2001.
- Jesus Mena, Data Mining Your Website. Digital Press, 1999.
- Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques. San Francisco, California: Morgan Kaufmann, 2001.
- Robert P. Trueblood and John N. Lovett, Jr., Data Mining and Statistical Analysis Using SQL. USA: Apress.
- http://www.lc.leidenuniv.nl/awcourse/oracle/datamine.920/a95961/preface.htm
- http://www.oracle.com/technology/products/oracle9i/htdocs/o9idm_faq.html
- http://fas.sfu.ca/cs/research/groups/DB/sections/publication/kdd/kdd.html
Questions? Thank you