Title: Data Mining
Data Mining
- Database Systems
- Timothy Vu
Mining
Mining is the extraction of valuable minerals or
other geological materials from the earth.
Commonly mined materials include bauxite, coal,
diamonds, iron, precious metals, lead, limestone,
nickel, phosphate, rock salt, tin, uranium,
petroleum, natural gas, and even water. What is
mined is usually something valuable, rare, or useful.
What is Data Mining?
Data mining, sometimes called knowledge discovery in
databases (KDD), is the process of automatically
searching large volumes of data for patterns. To do
this, data mining uses computational techniques from
statistics, machine learning, and pattern recognition:
- Machine learning: a method for creating computer
programs by analyzing data sets.
- Pattern recognition: classifying data (patterns)
based on either a priori knowledge or statistical
information extracted from the patterns.
Why Data Mining?
- Data mining is a technique that helps individuals
or companies find useful information in large
amounts of data and make better decisions. It can:
  - Reduce risks
  - Find problems and issues
  - Save money
  - Make high-confidence predictions
  - Simplify information
Discussion Topics
1) Classification
2) Regression
3) Association
4) Clustering
Classifiers
- Decision-tree classifiers: each leaf node has an
associated class, and each internal node has a
predicate.
- Bayesian classifiers: find the distribution of
attribute values for each class in the training
data; a new instance is assigned the class with the
maximum predicted probability.
- Neural-net classifiers: use the training data to
train artificial neural networks.
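A minimal Python sketch of the first two classifiers, under stated assumptions. The decision tree is hand-built rather than learned from training data, and its attributes (`income`, `degree`), threshold, and class labels are hypothetical; it only shows classes at the leaves and predicates at the internal nodes. The Bayesian part is a naive Bayes classifier, which additionally assumes attributes are independent given the class and uses a simple add-one smoothing; both choices are assumptions, not something the slides specify.

```python
from collections import Counter, defaultdict

# --- Decision tree: leaves hold a class, internal nodes hold a predicate ---
class Leaf:
    def __init__(self, label):
        self.label = label

class Node:
    def __init__(self, predicate, if_true, if_false):
        self.predicate = predicate  # function: record -> bool
        self.if_true = if_true
        self.if_false = if_false

def classify_tree(tree, record):
    # Follow predicates from the root until a leaf (a class) is reached.
    while isinstance(tree, Node):
        tree = tree.if_true if tree.predicate(record) else tree.if_false
    return tree.label

# Hypothetical hand-built tree for illustration only.
credit_tree = Node(lambda r: r["income"] >= 50_000,
                   Leaf("good"),
                   Node(lambda r: r["degree"] == "yes",
                        Leaf("average"), Leaf("bad")))

# --- Naive Bayes: per-class distribution of attribute values ---
def train_naive_bayes(rows, labels):
    class_counts = Counter(labels)
    # value_counts[cls][attribute_index][value] -> count in the training data
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for row, cls in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[cls][i][value] += 1
    return class_counts, value_counts

def classify_bayes(model, row):
    class_counts, value_counts = model
    total = sum(class_counts.values())
    best_cls, best_score = None, -1.0
    for cls, n in class_counts.items():
        score = n / total  # prior P(class)
        for i, value in enumerate(row):
            counts = value_counts[cls][i]
            # Add-one smoothing so a value never seen with this class
            # does not force the whole product to zero.
            score *= (counts[value] + 1) / (n + len(counts) + 1)
        if score > best_score:  # keep the class with maximum probability
            best_cls, best_score = cls, score
    return best_cls

if __name__ == "__main__":
    print(classify_tree(credit_tree, {"income": 30_000, "degree": "yes"}))
    rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
    labels = ["no", "no", "yes", "yes"]
    print(classify_bayes(train_naive_bayes(rows, labels), ("rainy", "mild")))
```

With these toy inputs the tree prints `average` (income below the threshold, degree present) and naive Bayes prints `yes`, because `rainy` was only seen with class `yes`.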
Regression
Regression deals with the prediction of a value
rather than a class.
- Linear regression: predicts a value Y from input
attributes X1, ..., Xn using a linear polynomial,
Y = a0 + a1*X1 + ... + an*Xn. Finding the
coefficients that best fit the data is called
curve fitting.
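A minimal Python sketch of linear regression with a single input attribute, Y = a0 + a1*X. It picks the coefficients by ordinary least squares (minimizing the sum of squared errors), which is one common definition of "best fit" and an assumption here; the function names `fit_line` and `predict` are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a0 + a1*x; returns the coefficients (a0, a1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); both scaled by n, which cancels.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a1 = sxy / sxx
    a0 = mean_y - a1 * mean_x  # the fitted line passes through the means
    return a0, a1

def predict(coeffs, x):
    a0, a1 = coeffs
    return a0 + a1 * x

if __name__ == "__main__":
    coeffs = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
    print(coeffs, predict(coeffs, 5))
```

For data lying exactly on y = 1 + 2x the fit recovers a0 = 1 and a1 = 2, and predicts 11 at x = 5. With several attributes the same idea leads to solving the normal equations, which this sketch does not cover.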
Associations
Finding associations, or relationships, between
two or more items.
- Support: a measure of what fraction of the
population satisfies both the antecedent and the
consequent of the rule. Example: milk => screwdrivers
(few purchases include both, so support is low).
- Confidence: a measure of how often the consequent
is true when the antecedent is true.
Example: milk => bread
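A minimal Python sketch of support and confidence for a rule antecedent => consequent, computed over a list of transactions. The basket data and function names are hypothetical; each transaction and each side of a rule is a list of item names.

```python
def _count(transactions, items):
    """Number of transactions containing every item in `items`."""
    wanted = set(items)
    return sum(1 for t in transactions if wanted <= set(t))

def support(transactions, antecedent, consequent):
    """Fraction of all transactions satisfying both sides of the rule."""
    both = _count(transactions, list(antecedent) + list(consequent))
    return both / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Among transactions satisfying the antecedent, the fraction that
    also satisfy the consequent."""
    ante = _count(transactions, antecedent)
    if ante == 0:
        return 0.0
    return _count(transactions, list(antecedent) + list(consequent)) / ante

# Hypothetical purchase data for illustration only.
baskets = [
    ["milk", "bread"],
    ["milk", "bread", "eggs"],
    ["milk", "eggs"],
    ["bread", "screwdrivers"],
    ["milk", "bread"],
]

if __name__ == "__main__":
    print(support(baskets, ["milk"], ["screwdrivers"]))  # no basket has both
    print(support(baskets, ["milk"], ["bread"]))         # 3 of 5 baskets
    print(confidence(baskets, ["milk"], ["bread"]))      # 3 of the 4 milk baskets
```

Here milk => screwdrivers has support 0.0, while milk => bread has support 0.6 and confidence 0.75.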
Clustering
Clustering is the classification of similar
objects into different groups, or more precisely,
the partitioning of a data set into subsets
(clusters) so that the data in each subset
(ideally) share some common trait, often
proximity according to some defined distance
measure.
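A minimal Python sketch of one common clustering algorithm, k-means (the slides do not name a specific algorithm, so this choice is an assumption). It uses Euclidean distance as the "defined distance measure"; the number of clusters `k`, the iteration limit, and the random seed are illustrative parameters.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Partition 2-D points into k clusters by Euclidean distance.
    Returns (centroids, clusters) from the final assignment step."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                xs, ys = zip(*cluster)
                new_centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
            else:
                new_centroids.append(centroids[i])  # leave an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    _, groups = kmeans(pts, k=2)
    print(sorted(sorted(g) for g in groups))
```

On these two well-separated groups of points, k-means places each group in its own cluster. k-means needs `k` chosen in advance and can depend on its starting centroids, which is why the seed is a parameter.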
Applications of Data Mining
1. Predictions
   - Stock Market
   - Earthquakes
   - NBA Games
2. Association
   - Store Inventory
   - Fashion Trends
3. Descriptive Patterns
   - Disease Analysis
   - Image Recognition
   - Fraud Detection
Gather Data
Electrocardiogram
Disease Analysis