Database Management System: Recent Advances (Presentation Transcript)
1
Database Management System: Recent Advances
  • By
  • Prof. Dr. O.P. Vyas
  • M.Tech. (CS), Ph.D. (I.I.T. Kharagpur)
  • DAAD Fellow (Germany)
  • AOTS Fellow (Japan)
  • Professor & Head (Computer Science)
  • Pt. R.S. University, Raipur (CG)
  • Visiting Prof., Rostock University, Germany

2
Contents: ADBMS
  • Concepts of Association Rule Mining
  • ARM basics
  • Problems with Apriori
  • Apriori vs. FP-tree
  • ARM variants
  • Classification Rule Mining
  • Classification techniques
  • Classifiers
  • Various classifiers
  • Classification & Prediction
  • Classification accuracy
  • Mining complex data types
  • Complex data types
  • Data mining process integration with existing
    technology

3
Timeline of Data Mining Development

Time        | Area | Contribution
Late 1700s  | Stat | Bayes theorem of probability
Early 1900s | Stat | Regression analysis
Early 1920s | Stat | Maximum likelihood estimate
Early 1940s | AI   | Neural networks
Early 1950s |      | Nearest neighbor
Early 1950s |      | Single link
Late 1950s  | AI   | Perceptron
Late 1950s  | Stat | Resampling, bias reduction, jackknife estimator
Early 1960s | AI   | ML started
Early 1960s | DB   | Batch reports
Mid 1960s   |      | Decision trees
Mid 1960s   | Stat | Linear models for classification
Mid 1960s   | IR   | Similarity measures
Mid 1960s   | IR   | Clustering
Mid 1960s   | Stat | Exploratory data analysis (EDA)
Late 1960s  | DB   | Relational data model
Early 1970s | IR   | SMART IR systems
Mid 1970s   | AI   | Genetic algorithms
Late 1970s  | Stat | Estimation with incomplete data (EM algorithm)
Late 1970s  | Stat | K-means clustering
Early 1980s | AI   | Kohonen self-organizing map
Mid 1980s   | AI   | Decision tree algorithms
Early 1990s | DB   | Association rule algorithms
1990s       |      | Web and search engines
1990s       | DB   | Data warehousing
1990s       | DB   | Online analytic processing (OLAP)
4
Data Mining Functionalities
5
Association Rules
  • Retail shops are often interested in associations
    between different items that people buy.
  • Someone who buys bread is quite likely also to
    buy milk
  • A person who bought the book Database System
    Concepts is quite likely also to buy the book
    Operating System Concepts.
  • Association information can be used in several
    ways.
  • E.g. when a customer buys a particular book, an
    online shop may suggest associated books.
  • Association rules:
  • bread → milk
    {DB-Concepts, OS-Concepts} → Networks
  • Left-hand side = antecedent, right-hand side =
    consequent
  • An association rule is a pattern that states that
    when the antecedent occurs, the consequent occurs
    with a certain probability.

6
Association Rules (Cont.)
  • Rules have an associated support, as well as an
    associated confidence.
  • Support is a measure of what fraction of the
    population satisfies both the antecedent and the
    consequent of the rule.
  • E.g., suppose only 0.001 percent of all purchases
    include milk and screwdrivers. The support for
    the rule milk → screwdrivers is low.
  • We usually want rules with a reasonably high
    support
  • Rules with low support are usually not very
    useful
  • Confidence is a measure of how often the
    consequent is true when the antecedent is true.
  • E.g., the rule bread → milk has a confidence of
    80 percent if 80 percent of the purchases that
    include bread also include milk.
  • We usually want rules with reasonably large
    confidence.
  • A rule with a low confidence is not meaningful.
  • Note that the confidence of bread → milk may be
    very different from the confidence of milk →
    bread, although both have the same support; a
    small computational sketch of both measures
    follows below.
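
In code, the two measures reduce to simple counting. A minimal Python sketch; the basket data and item names here are made up for illustration:

transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "butter"},
    {"milk", "screwdriver"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    # Fraction of transactions containing every item in the itemset.
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    # How often the consequent holds when the antecedent holds.
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

print(support({"bread", "milk"}, transactions))       # 0.5
print(confidence({"bread"}, {"milk"}, transactions))  # 0.75
print(confidence({"milk"}, {"bread"}, transactions))  # 0.6: same support, different confidence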

7
The A.R.M. model: data
  • A.R.M. was initially applied to Market Basket
    Analysis on transaction data of supermarket
    sales.
  • I = {i1, i2, …, im}: a set of items.
  • Transaction t:
  • t is a set of items, and t ⊆ I.
  • Transaction database T: a set of transactions,
    T = {t1, t2, …, tn}.

8
Transaction data: supermarket data
  • Market basket transactions:
  • t1: {bread, cheese, milk}
  • t2: {apple, eggs, salt, yogurt}
  • …
  • tn: {biscuit, eggs, milk}
  • Concepts:
  • An item: an item/article in a basket
  • I: the set of all items sold in the store
  • A transaction: items purchased in a basket; it
    may have a TID (transaction ID)
  • A transactional dataset: a set of transactions

9
Transaction data: a set of documents
  • A text document data set. Each document is
    treated as a bag of keywords:
  • doc1: {Student, Teach, School}
  • doc2: {Student, School}
  • doc3: {Teach, School, City, Game}
  • doc4: {Baseball, Basketball}
  • doc5: {Basketball, Player, Spectator}
  • doc6: {Baseball, Coach, Game, Team}
  • doc7: {Basketball, Team, City, Game}

10
The model: rules
  • A transaction t contains X, a set of items
    (itemset) in I, if X ⊆ t.
  • An association rule is an implication of the
    form
  • X → Y, where X, Y ⊂ I, and X ∩ Y = ∅
  • An itemset is a set of items.
  • E.g., X = {milk, bread, cereal} is an itemset.
  • A k-itemset is an itemset with k items.
  • E.g., {milk, bread, cereal} is a 3-itemset.

11
Rule strength measures
  • Support: the rule holds with support sup in T
    (the transaction data set) if sup% of
    transactions contain X ∪ Y.
  • sup = Pr(X ∪ Y)
  • Confidence: the rule holds in T with confidence
    conf if conf% of transactions that contain X also
    contain Y.
  • conf = Pr(Y | X)
  • An association rule is a pattern that states that
    when X occurs, Y occurs with a certain probability.

12
Mining Association Rules: An Example
Let us take min. support = 50% and min. confidence
= 50%.
  • For rule A → C:
  • support = support({A, C}) = 50%
  • confidence = support({A, C}) / support({A}) = 66.6%
  • A → C (support 50%, confidence 66.6%)
  • C → A (support 50%, confidence 100%)
  • The Apriori principle:
  • Any subset of a frequent itemset must be frequent

13
The Apriori Algorithm
  • Join step: Ck is generated by joining Lk-1 with
    itself
  • Prune step: any (k-1)-itemset that is not
    frequent cannot be a subset of a frequent
    k-itemset
  • Pseudo-code (a runnable sketch follows below):
  • Ck: candidate itemsets of size k
  • Lk: frequent itemsets of size k
  • L1 = {frequent items}
  • for (k = 1; Lk != ∅; k++) do begin
  •     Ck+1 = candidates generated from Lk
  •     for each transaction t in the database do
  •         increment the count of all candidates in
        Ck+1 that are contained in t
  •     Lk+1 = candidates in Ck+1 with min_support
  • end
  • return ∪k Lk
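
The pseudo-code compresses to a short runnable sketch. This is a minimal, unoptimized Python rendering of the join/prune/count loop; itemsets are frozensets and minsup is an absolute count:

from itertools import combinations

def apriori(transactions, minsup):
    # Level-wise search: compute L1, then join/prune/count until Lk is empty.
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    counts = {frozenset([i]): sum(1 for t in transactions if i in t) for i in items}
    Lk = {s for s, c in counts.items() if c >= minsup}            # L1
    frequent = {s: c for s, c in counts.items() if c >= minsup}
    k = 1
    while Lk:
        # Join step: size-(k+1) unions of frequent k-itemsets.
        candidates = {a | b for a in Lk for b in Lk if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in Lk for s in combinations(c, k))}
        # One pass over the data counts all surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        Lk = {c for c, n in counts.items() if n >= minsup}
        frequent.update({c: n for c, n in counts.items() if n >= minsup})
        k += 1
    return frequent   # maps each frequent itemset to its support count

# Usage: apriori([{"bread", "milk"}, {"bread", "butter"},
#                 {"bread", "milk", "butter"}], minsup=2)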

14
The Apriori Algorithm: Example

(Figure: a worked Apriori trace. Database D is scanned to count the 1-itemset candidates C1, which are pruned to L1; joining L1 gives C2, and a second scan of D prunes it to L2; joining L2 gives C3, and a final scan yields L3.)
15
Generating rules from frequent itemsets
  • Frequent itemsets ≠ association rules
  • One more step is needed to generate association
    rules (see the sketch below)
  • For each frequent itemset X,
  • for each proper nonempty subset A of X:
  • let B = X - A
  • A → B is an association rule if
  • confidence(A → B) ≥ minconf, where
  • support(A → B) = support(A ∪ B) = support(X), and
  • confidence(A → B) = support(A ∪ B) / support(A)
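
A sketch of this step in the same style, assuming frequent is the itemset-to-support-count dictionary returned by the Apriori sketch above:

from itertools import combinations

def generate_rules(frequent, minconf):
    # For each frequent X and proper nonempty subset A, test A => X - A.
    rules = []
    for X, supX in frequent.items():
        if len(X) < 2:
            continue
        for r in range(1, len(X)):
            for A in map(frozenset, combinations(X, r)):
                # frequent[A] exists by the Apriori principle: every subset
                # of a frequent itemset is frequent, so its count was recorded.
                conf = supX / frequent[A]
                if conf >= minconf:
                    rules.append((set(A), set(X - A), conf))
    return rules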

16
Generating association rules..
  • Once the frequent itemsets from transactions in a
    database D have been found, it is straightforward
    to generate strong association rules from them
    (where strong association rules satisfy both
    minimum support and minimum confidence).
  • To recap, in order to obtain A → B, we need
    support(A ∪ B) and support(A).
  • All the required information for confidence
    computation has already been recorded in itemset
    generation. No need to see the data T any more.
  • This step is not as time-consuming as frequent
    itemsets generation.

17
Goal and key features
  • Goal Find all rules that satisfy the
    user-specified minimum support (minsup) and
    minimum confidence (minconf).
  • Key Features
  • Completeness find all rules.
  • No target item(s) on the right-hand-side
  • Mining with data on hard disk (not in memory)

18
Mining Association Rules in Large Databases
  • Association rule mining: association rules can be
    classified into categories based on different
    criteria, such as:
  • 1. Based on the types of values handled in the
    rule, associations can be classified into Boolean
    vs. quantitative. A Boolean association shows
    relationships between discrete (categorical)
    objects. A quantitative association is a
    multidimensional association. Example of a
    quantitative association rule, where X is a
    variable representing a customer:
  • age(X, 30..39) ∧ income(X, 42K..48K) → buys(X,
    high-resolution TV)
  • Note that the quantitative attributes age and
    income have been discretized.
  • 2. Based on the dimensions of data involved in
    the rule.
  • E.g., purchase(X, "computer") → purchase(X,
    "financial software") is a single-dimensional
    association rule; if the date/time of purchase
    is added, it becomes multidimensional.
  • 3. Multilevel association rule mining
  • 4. Multidimensional A.R.M.

19
Mining Multiple-Level Association Rules
  • Items often form hierarchies
  • Flexible support settings
  • Items at the lower level are expected to have
    lower support
  • Exploration of shared multi-level mining (Agrawal
    & Srikant @ VLDB'95, Han & Fu @ VLDB'95)

20
Multi-level Association: Redundancy Filtering
  • Some rules may be redundant due to ancestor
    relationships between items.
  • Example:
  • milk → wheat bread [support = 8%, confidence = 70%]
  • 2% milk → wheat bread [support = 2%, confidence = 72%]
  • We say the first rule is an ancestor of the
    second rule.
  • A rule is redundant if its support is close to
    the expected value, based on the rule's
    ancestor.

21
Mining Multi-Dimensional Association
  • Single-dimensional rules:
  • buys(X, "milk") → buys(X, "bread")
  • Multi-dimensional rules: ≥ 2 dimensions or
    predicates
  • Inter-dimension assoc. rules (no repeated
    predicates):
  • age(X, "19-25") ∧ occupation(X, "student") →
    buys(X, "coke")
  • Hybrid-dimension assoc. rules (repeated
    predicates):
  • age(X, "19-25") ∧ buys(X, "popcorn") → buys(X,
    "coke")
  • Categorical attributes: finite number of possible
    values, no ordering among values (data cube
    approach)
  • Quantitative attributes: numeric, implicit
    ordering among values (discretization, clustering,
    and gradient approaches)

22
Mining Association Rules in Large Databases
  • Mining single-dimensional Boolean association
    rules from transactional databases
  • The Apriori algorithm: an influential algorithm
    for mining frequent itemsets for Boolean
    association rules; it uses prior knowledge of
    frequent itemset properties.
  • Apriori employs an iterative approach known as a
    level-wise search, where k-itemsets are used to
    explore (k+1)-itemsets.
  • First, the set of frequent 1-itemsets is found;
    this set is denoted L1. L1 is used to find
    the set of frequent 2-itemsets, L2, and so on,
    until no more frequent k-itemsets can be found.
  • Finding each Lk requires one full scan of
    the database.

23
Many ARM algorithms
  • There are a large number of them!!
  • They use different strategies and data
    structures.
  • Their resulting sets of rules are all the same.
  • Given a transaction data set T, a minimum
    support and a minimum confidence, the set of
    association rules existing in T is uniquely
    determined.
  • Any algorithm should find the same set of rules,
    although their computational efficiencies and
    memory requirements may differ. We study
    only one: the Apriori algorithm.

24
On Apriori Algorithm
  • Seems to be very expensive
  • Level-wise search
  • K = the size of the largest itemset
  • It makes at most K passes over the data
  • In practice, K is bounded (around 10).
  • The algorithm is very fast. Under some
    conditions, all rules can be found in linear
    time.
  • Scales up to large data sets
  • Clearly the space of all association rules is
    exponential, O(2^m), where m is the number of
    items in I.
  • The mining exploits sparseness of data, and high
    minimum support and minimum confidence values.
  • Still, it typically produces a huge number of
    rules: thousands, tens of thousands, millions, ...

25
UCI KDD Archive: http://kdd.ics.uci.edu
  • This is an online repository of large data sets
    which encompasses a wide variety of data types,
    analysis tasks, and application areas.
  • The primary role of this repository is to enable
    researchers in knowledge discovery and data
    mining to scale existing and future data analysis
    algorithms to very large and complex data sets.
  • The archive is intended to serve as a permanent
    repository of publicly accessible data sets for
    research in KDD and data mining. It complements
    the original UCI Machine Learning Archive, which
    typically focuses on smaller classification-oriented
    data sets.

26
ARM Implementations
  • Many implementations of the Apriori algorithm are
    available:
  • http://www.cs.bme.hu/~bodon/en/apriori/
    (Apriori implementation of Ferenc Bodon)
  • http://www.csc.liv.ac.uk/~frans/KDD/Software/Apriori-T_GUI/aprioriT_GUI.html
  • Apriori-T (Apriori Total) is an Association
    Rule Mining (ARM) algorithm developed by the
    LUCS-KDD research team. The code obtainable from
    this page is a GUI version that includes (for
    comparison purposes) implementations of Brin's
    DIC algorithm (Brin et al. 1997) and Toivonen's
    negative-border ARM approach (Toivonen 1996).
  • http://www.csc.liv.ac.uk/~frans/KDD/Software/FPgrowth/fpGrowth.html
    (implementation of the FP-growth method)
  • DBMiner is a data mining system which runs on top
    of the Microsoft SQL Server 7.0 Plato system.

27
A.R.M. Implementations: Example
  • In DBMiner, three kinds of associations can be
    mined:
  • 1. Inter-dimensional association: associations
    among or across two or more dimensions.
  • Customer-Country("Canada") =>
    Product-SubCategory("Coffee"), i.e. Canadian
    customers are likely to buy coffee.
  • 2. Intra-dimensional association: associations
    present within one dimension, grouped by another
    one or several dimensions. For example, if you
    want to find out which products customers in
    Canada are likely to purchase together:
  • Within Customer-Country("Canada"),
    Product-ProductName("CarryBags") =>
    Product-ProductName("Tents"), i.e. customers in
    Canada who buy carry-bags are also likely to
    buy tents.
  • 3. Hybrid association: associations combining
    elements of both inter- and intra-dimensional
    association mining. For example:
  • Within Customer-Country("Canada"),
    Product("Carry Bags") => Product("Tents"),
    Time("Q3"), i.e. customers in Canada who buy
    carry-bags also tend to buy tents, and do so most
    often in the 3rd quarter of the year (Jul, Aug,
    Sep).

28
Visualization of Association Rules: Plane Graph
29
Problems with association mining
  • Single minsup: it assumes that all items in the
    data are of the same nature and/or have similar
    frequencies.
  • Not true: in many applications, some items appear
    very frequently in the data, while others rarely
    appear.
  • E.g., in a supermarket, people buy food
    processors and cooking pans much less frequently
    than they buy bread and milk.

30
Rare Item Problem
  • If the frequencies of items vary a great deal, we
    will encounter two problems
  • If minsup is set too high, those rules that
    involve rare items will not be found.
  • To find rules that involve both frequent and rare
    items, minsup has to be set very low. This may
    cause combinatorial explosion because those
    frequent items will be associated with one
    another in all possible ways.

31
Is Apriori Fast Enough? Performance Bottlenecks
  • The core of the Apriori algorithm:
  • use frequent (k - 1)-itemsets to generate
    candidate frequent k-itemsets
  • use database scans and pattern matching to collect
    counts for the candidate itemsets
  • The bottleneck of Apriori: candidate generation
  • Huge candidate sets:
  • 10^4 frequent 1-itemsets will generate 10^7
    candidate 2-itemsets
  • To discover a frequent pattern of size 100, e.g.,
    {a1, a2, …, a100}, one needs to generate 2^100 ≈
    10^30 candidates.
  • Multiple scans of the database:
  • needs (n + 1) scans, where n is the length of the
    longest pattern

32
Mining Frequent Patterns Without Candidate
Generation
  • FP-Tree (Frequent Pattern Tree) algorithm.
  • To break the two bottlenecks of the Apriori family
    of algorithms, association rule mining methods
    using tree structures have been designed. FP-Tree
    [Han et al. 2000], frequent pattern mining, is
    another milestone in the development of
    association rule mining; it breaks the two
    bottlenecks of Apriori.
  • The frequent itemsets are generated with only two
    passes over the database and without any
    candidate generation process. FP-Tree was
    introduced by Han et al. in [Han et al. 2000].
  • By avoiding the candidate generation process and
    making fewer passes over the database, FP-Tree is
    an order of magnitude faster than the Apriori
    algorithm. The frequent pattern generation
    process includes two sub-processes: constructing
    the FP-Tree, and generating frequent patterns
    from the FP-Tree.

33
FP Tree
  • Compress a large database into a compact
    Frequent-Pattern tree (FP-tree) structure
  • highly condensed, but complete for frequent
    pattern mining
  • avoids costly database scans
  • Develop an efficient, FP-tree-based frequent
    pattern mining method
  • A divide-and-conquer methodology: decompose
    mining tasks into smaller ones
  • Avoid candidate generation: sub-database test
    only!
  • Some researchers have found that when the
    dataset is very sparse, the FP-tree approach shows
    bottlenecks and Apriori gives comparatively
    better performance!

34
Construct FP-tree from a Transaction DB
TID   Items bought                 (Ordered) frequent items
100   f, a, c, d, g, i, m, p       f, c, a, m, p
200   a, b, c, f, l, m, o          f, c, a, b, m
300   b, f, h, j, o                f, b
400   b, c, k, s, p                c, b, p
500   a, f, c, e, l, p, m, n       f, c, a, m, p

min_support = 0.5
  • Steps:
  • Scan the DB once, find frequent 1-itemsets
    (single item patterns)
  • Order frequent items in frequency-descending
    order
  • Scan the DB again, construct the FP-tree (a
    sketch follows below)
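
A compact sketch of the two scans just described; the Node class and its field names are illustrative, not from the slides, and minsup is an absolute count:

from collections import Counter, defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fptree(transactions, minsup):
    # Scan 1: frequent single items, in global frequency-descending order.
    freq = Counter(i for t in transactions for i in t)
    order = sorted((i for i, c in freq.items() if c >= minsup),
                   key=freq.get, reverse=True)
    root, header = Node(None, None), defaultdict(list)
    # Scan 2: insert each transaction, filtered and reordered,
    # so that common prefixes share tree branches.
    for t in transactions:
        node = root
        for item in (i for i in order if i in t):
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])   # node-links per item
            node = node.children[item]
            node.count += 1
    return root, header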

35
Benefits of the FP-tree Structure
  • Completeness:
  • never breaks a long pattern of any transaction
  • preserves complete information for frequent
    pattern mining
  • Compactness:
  • reduces irrelevant information (infrequent items
    are gone)
  • frequency-descending ordering: more frequent
    items are more likely to be shared
  • never larger than the original database (not
    counting node-links and counts)
  • Example: for the Connect-4 DB, the compression
    ratio can be over 100

36
Mining Frequent Patterns Using FP-tree
  • General idea (divide-and-conquer):
  • recursively grow frequent pattern paths using the
    FP-tree
  • Method:
  • for each item, construct its conditional
    pattern-base, and then its conditional FP-tree
  • repeat the process on each newly created
    conditional FP-tree
  • until the resulting FP-tree is empty, or it
    contains only one path (a single path will
    generate all the combinations of its sub-paths,
    each of which is a frequent pattern)

37
Market Basket Analysis: Purpose
  • When the supermarket revolution first sparked off
    in the 1920s, one could not even dream of
    retailing as it exists today; by the 1950s it had
    won acclaim and acceptance almost globally. This
    is one retailing sector that is spreading very
    fast in India, but still the majority of the
    retailing sector, including this one, is not
    properly managed.
  • Retail management has long been in focus for
    marketing strategists, as organized retailing is
    assuming significant attention. M.B.A. (Market
    Basket Analysis) is one such effort.
  • In supermarket retailing, MBA has endeavored to
    study and analyze the combination of various
    items accumulated in a shopping basket, and was
    intended to establish associations between the
    various items bought by the customer.
  • Market basket analysis is a generic term for
    methodologies that study the composition of a
    basket of products (i.e. a shopping basket)
    purchased by a household during a single shopping
    trip.
  • The idea is that market baskets reflect
    interdependencies between products or purchases
    made in different product categories, and that
    these interdependencies can be useful to support
    retail marketing decisions.

38
MBA
  • Our data mining approach to supermarket business
    data will record all the supermarket transactions
    in tabular form, and an appropriate algorithm will
    process the transaction data to provide
    significant associations between various items.
  • From a marketing perspective, the research is
    motivated by the fact that some recent trends in
    retailing pose important challenges to retailers
    in order to stay competitive. In fact, on the
    level of the retailer, a number of trends can be
    identified, including concentration,
    internationalization, decreasing profit margins
    and an increase in discounting.
  • Recently, a number of advances in data mining
    (association rules) and statistics offer new
    opportunities to analyze such data.

39
Data Mining Functionalities
40
(Diagram: data mining branches into clustering, association mining, and classification; each branch has its techniques, such as associative classification, and its application domains.)

Classification mining analyzes a set of training
data (i.e. a set of objects whose class labels
are known) and constructs a model for each class
based on the features in the data. A set of
classification rules is generated by the
classification process, and these can be used to
classify future data, as well as to develop a
better understanding of each class in the database.
41
(Diagram: classification techniques include associative classification, which combines association and classification mining; example algorithms are CBA, CMAR, CPAR and MCLP, obtained by modifying the base algorithms.)
42
Supervised vs. Unsupervised Learning
  • Learning: training data are analyzed by a
    classification algorithm.
  • Supervised learning (classification): learning of
    the model is supervised in that it is told to
    which class each training sample belongs.
  • Supervision: the training data (observations,
    measurements, etc.) are accompanied by labels
    indicating the class of the observations.
  • New data is classified based on the training set.
  • Unsupervised learning (clustering):
  • The class labels of the training data are unknown.
  • Given a set of measurements, observations, etc.,
    the aim is to establish the existence of
    classes or clusters in the data.

43
Classification vs. Prediction
  • Classification:
  • classifies data (constructs a model) based on the
    training set and the values (class labels) of a
    classifying attribute, and uses it to classify
    new data;
  • predicts categorical class labels.
  • Prediction can be viewed as the
    construction and use of a model to assess the
    class of an unlabeled sample, or to assess the
    value or value ranges of an attribute that a
    given sample is likely to have.
  • Classification and regression are two prediction
    methods (discrete vs. continuous).
  • Regression models continuous-valued functions,
    i.e., predicts unknown or missing values.
  • Typical applications: credit approval, target
    marketing, medical diagnosis, treatment
    effectiveness analysis.

44
Data Classification: A Two-Step Process
  • Model construction: describing a set of
    predetermined classes
  • Learning:
  • each tuple/sample is assumed to belong to a
    predefined class, as determined by the class
    label attribute
  • the set of tuples used for model construction is
    the training set (given data)
  • the model is represented as classification rules,
    decision trees, or mathematical formulae
  • Model usage: for classifying future or unknown
    objects
  • Classification:
  • estimate the accuracy of the model (a sketch
    follows below)
  • the known label of each test sample is compared
    with the classified result from the model
  • the accuracy rate is the percentage of test set
    samples that are correctly classified by the
    model
  • the test set is independent of the training set;
    otherwise over-fitting will occur
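
A minimal sketch of the accuracy-estimation step; the model object and its predict method are a hypothetical interface:

import random

def holdout_split(data, test_fraction=0.3, seed=0):
    # Keep the test set independent of the training set to avoid over-fitting.
    data = data[:]
    random.Random(seed).shuffle(data)
    cut = int(len(data) * (1 - test_fraction))
    return data[:cut], data[cut:]

def accuracy(model, test_set):
    # Percentage of test samples whose predicted label matches the known label.
    correct = sum(1 for x, label in test_set if model.predict(x) == label)
    return 100.0 * correct / len(test_set)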

45
Illustrating Classification Task
46
Classification Process (1): Model Construction (Learning)

(Diagram: the training data are fed to a classification algorithm, which produces the classifier.)

Classification rules: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
47
Classification Process (2): Use the Model in Prediction (Classification)
(Jeff, Professor, 4)
Tenured?
48
Examples of Classification Task
  • Predicting tumor cells as benign or malignant
  • Classifying credit card transactions as
    legitimate or fraudulent
  • Classifying secondary structures of protein as
    alpha-helix, beta-sheet, or random coil
  • Categorizing news stories as finance, weather,
    entertainment, sports, etc

58
Classification and Prediction
  • What is classification? What is prediction?
  • Issues regarding classification and prediction
  • Classification by decision tree induction
  • Bayesian Classification
  • Classification by backpropagation
  • Classification based on concepts from association
    rule mining
  • Other Classification Methods
  • Prediction
  • Classification accuracy
  • Summary

59
Decision Tree Classifiers: Survey Paper
60
Bayesian Classification
  • Bayesian classifiers are statistical classifiers
    which predict class membership probabilities, such
    as the probability that a given sample belongs to
    a particular class.
  • Bayesian classification is based on Bayes'
    theorem, and it has been observed that a simple
    Bayesian classifier, known as the naïve Bayesian
    classifier, is comparable in performance with
    decision tree and neural network classifiers.
  • Naïve Bayesian classifiers assume that the effect
    of an attribute value on a given class is
    independent of the values of the other
    attributes (conditional independence).
  • Bayesian belief networks are graphical models
    which, unlike naïve Bayesian classifiers, allow
    the representation of dependencies among subsets
    of attributes. They can be used for classification.

61
Bayesian Classification: Why?
  • Probabilistic learning: calculate explicit
    probabilities for hypotheses; among the most
    practical approaches to certain types of learning
    problems
  • Incremental: each training example can
    incrementally increase/decrease the probability
    that a hypothesis is correct. Prior knowledge
    can be combined with observed data.
  • Probabilistic prediction: predict multiple
    hypotheses, weighted by their probabilities
  • Standard: even when Bayesian methods are
    computationally intractable, they can provide a
    standard of optimal decision making against which
    other methods can be measured

62
Bayes' Theorem
  • Given training data D, the posterior probability
    of a hypothesis h, P(h|D), follows Bayes' theorem:
    P(h|D) = P(D|h) P(h) / P(D)
  • MAP (maximum a posteriori) hypothesis:
    h_MAP = argmax_h P(h|D) = argmax_h P(D|h) P(h)
  • Practical difficulty: requires initial knowledge
    of many probabilities, and significant
    computational cost

63
Naïve Bayes Classifier (I)
  • A simplified assumption: attributes are
    conditionally independent given the class, so
    P(C|x) ∝ P(C) ∏i P(xi|C)
  • Greatly reduces the computation cost: only count
    the class distribution.

64
Naive Bayesian Classifier (II)
  • Given a training set, we can compute the
    probabilities

65
Naïve Bayesian Classification
  • Naïve assumption: attribute independence
  • P(x1, …, xk | C) = P(x1 | C) · … · P(xk | C)
  • If the i-th attribute is categorical: P(xi | C)
    is estimated as the relative frequency of samples
    having value xi as the i-th attribute in class C
  • If the i-th attribute is continuous: P(xi | C)
    is estimated through a Gaussian density function
  • Computationally easy in both cases (see the
    sketch below)
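
A minimal sketch of both estimation cases (relative frequencies for categorical attributes, a Gaussian density for continuous ones); the data layout is illustrative and no smoothing is applied:

import math
from collections import Counter, defaultdict

def train_nb(samples):
    # samples: list of (attribute_tuple, class_label), categorical attributes.
    prior = Counter(c for _, c in samples)           # class frequencies
    cond = defaultdict(Counter)                      # value counts per (attr, class)
    for x, c in samples:
        for i, v in enumerate(x):
            cond[(i, c)][v] += 1
    return prior, cond, len(samples)

def classify_nb(x, prior, cond, n):
    # Pick the class maximizing P(C) times the product of P(xi | C).
    best, best_p = None, -1.0
    for c, nc in prior.items():
        p = nc / n                                   # P(C)
        for i, v in enumerate(x):
            p *= cond[(i, c)][v] / nc                # relative frequency, no smoothing
        if p > best_p:
            best, best_p = c, p
    return best

def gaussian(x, mean, var):
    # Density a continuous attribute would use in place of the frequency ratio.
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

prior, cond, n = train_nb([(("sunny", "hot"), "no"), (("rainy", "mild"), "yes"),
                           (("rainy", "hot"), "yes"), (("sunny", "mild"), "no")])
print(classify_nb(("rainy", "hot"), prior, cond, n))   # "yes"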

66
The independence hypothesis
  • makes computation possible
  • yields optimal classifiers when satisfied
  • but is seldom satisfied in practice, as
    attributes (variables) are often correlated.
  • Attempts to overcome this limitation:
  • Bayesian networks, which combine Bayesian
    reasoning with causal relationships between
    attributes
  • Decision trees, which reason on one attribute at
    a time, considering the most important attributes
    first

67
Bayesian Belief Networks (I)
(Diagram: a belief network in which FamilyHistory (FH) and Smoker (S) are parents of LungCancer (LC) and Emphysema, and LungCancer is a parent of PositiveXRay and Dyspnea.)

The conditional probability table for the variable
LungCancer, one column per combination of the
parent values:

        (FH, S)   (FH, ¬S)   (¬FH, S)   (¬FH, ¬S)
LC        0.8       0.5        0.7        0.1
¬LC       0.2       0.5        0.3        0.9
68
Bayesian Belief Networks (II)
  • A Bayesian belief network allows a subset of the
    variables to be conditionally independent
  • A graphical model of causal relationships
  • Several cases of learning Bayesian belief
    networks:
  • given both the network structure and all the
    variables: easy (a sketch follows below)
  • given the network structure but only some variables
  • when the network structure is not known in advance
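
For the easy case, a toy sketch of how the LungCancer CPT from the previous slide would be used; the priors for FamilyHistory and Smoker are made up for illustration:

# P(LungCancer = yes | FamilyHistory, Smoker), keyed by the parent values;
# the four entries mirror the CPT on the previous slide.
cpt_lc = {
    (True, True): 0.8, (True, False): 0.5,
    (False, True): 0.7, (False, False): 0.1,
}

def p_lungcancer(fh, smoker, lc=True):
    p = cpt_lc[(fh, smoker)]
    return p if lc else 1.0 - p

# Joint probabilities factorize along the network structure:
# P(FH, S, LC) = P(FH) * P(S) * P(LC | FH, S).
# The priors below are made up for illustration.
p_fh, p_s = 0.1, 0.3
print(p_fh * p_s * p_lungcancer(True, True))   # P(FH, S, LC) = 0.024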

69
Classification and Prediction
  • What is classification? What is prediction?
  • Issues regarding classification and prediction
  • Classification by decision tree induction
  • Bayesian Classification
  • Classification by backpropagation
  • Classification based on concepts from association
    rule mining
  • Other Classification Methods
  • Prediction
  • Classification accuracy
  • Summary

70
Classification by Backpropagation
  • Backpropagation has been considered an effective
    mechanism in the field of classification. The
    backpropagation algorithm was presented by
    Rumelhart, Hinton, and Williams [RHW86]. The most
    popular use of backpropagation is as a neural
    network learning algorithm.
  • In Freud's theory of psychodynamics, the human
    brain (around 10^11 neurons) was described as a
    neural network, and recent investigations have
    corroborated this view.
  • This analogy therefore offers an interesting
    model for the creation of more complex learning
    machines, and has led to the creation of ANNs.
  • Neural networks, with their remarkable ability to
    derive meaning from complicated or imprecise
    data, can be used to extract patterns and detect
    trends that are too complex to be noticed by
    either humans or other computer techniques.
  • A trained neural network can be thought of as an
    "expert" in the category of information it has
    been given to analyze. This expert can then be
    used to provide projections given new situations
    of interest and answer "what if" questions.

72
Neural Networks
  • An Artificial Neural Network (ANN) is an
    information processing paradigm that is inspired
    by the way biological nervous systems, such as
    the brain, process information. The key element
    of this paradigm is the novel structure of the
    information processing system.
  • It is composed of a large number of highly
    interconnected processing elements (neurones)
    working in unison to solve specific problems.
    ANNs, like people, learn by example.
  • An ANN is configured for a specific application,
    such as pattern recognition or data
    classification, through a learning process.
    Learning in biological systems involves
    adjustments to the synaptic connections that
    exist between the neurones. This is true of ANNs
    as well.

73
ANN Advantages
  • Adaptive learning An ability to learn how to do
    tasks based on the data given for training or
    initial experience.
  • Self-Organisation An ANN can create its own
    organization or representation of the information
    it receives during learning time.
  • Real Time Operation ANN computations may be
    carried out in parallel, and special hardware
    devices are being designed and manufactured which
    take advantage of this capability.
  • Fault Tolerance via Redundant Information Coding
    Partial destruction of a network leads to the
    corresponding degradation of performance.
    However, some network capabilities may be
    retained even with major network damage.

74
ANN vs. Conventional Computing Approach
  • Neural networks take a different approach to
    problem solving than that of conventional
    computers. Conventional computers use an
    algorithmic approach i.e. the computer follows a
    set of instructions in order to solve a problem.
    Unless the specific steps that the computer needs
    to follow are known the computer cannot solve the
    problem. That restricts the problem solving
    capability of conventional computers to problems
    that we already understand and know how to solve.
    But computers would be so much more useful if
    they could do things that we don't exactly know
    how to do.
  • Neural networks process information in a similar
    way to the human brain. The network is composed
    of a large number of highly interconnected
    processing elements (neurones) working in
    parallel to solve a specific problem. Neural
    networks learn by example. They cannot be
    programmed to perform a specific task.
  • The examples must be selected carefully otherwise
    useful time is wasted or even worse, the network
    might be functioning incorrectly. The
    disadvantage is that because the network finds
    out how to solve the problem by itself, its
    operation can be unpredictable.

75
ANN Vs. Conventional
  • On the other hand, conventional computers use a
    cognitive approach to problem solving: the way
    the problem is to be solved must be known and
    stated in small unambiguous instructions. These
    instructions are then converted to a high-level
    language program and then into machine code that
    the computer can understand. These machines are
    totally predictable; if anything goes wrong, it
    is due to a software or hardware fault.
  • Neural networks and conventional algorithmic
    computers are not in competition but complement
    each other. There are tasks more suited to an
    algorithmic approach, like arithmetic operations,
    and tasks that are more suited to neural
    networks. Moreover, a large number of tasks
    require systems that use a combination of the two
    approaches (normally a conventional computer is
    used to supervise the neural network) in order to
    perform at maximum efficiency.
  • Neural networks do not perform miracles. But if
    used sensibly they can produce some amazing
    results.

76
ANN An engineering approach
  • A simple neuron
  • An artificial neuron is a device with many
    inputs and one output. The neuron has two modes
    of operation: the training mode and the using
    mode.
  • In the training mode, the neuron can be
    trained to fire (or not), for particular input
    patterns. In the using mode, when a taught input
    pattern is detected at the input, its associated
    output becomes the current output. If the input
    pattern does not belong in the taught list of
    input patterns, the firing rule is used to
    determine whether to fire or not.

77
A Neuron
  • The n-dimensional input vector x is mapped into
    the variable y by means of the scalar product and
    a nonlinear function mapping: y = f(w · x + bias)
    (a sketch follows below).
78
Network layers
  • The commonest type of artificial neural network
    consists of three groups, or layers, of units a
    layer of "input" units is connected to a layer of
    "hidden" units, which is connected to a layer of
    "output" units.
  • The activity of the input units represents the
    raw information that is fed into the network.
  • The activity of each hidden unit is determined by
    the activities of the input units and the weights
    on the connections between the input and the
    hidden units.
  • The behavior of the output units depends on the
    activity of the hidden units and the weights
    between the hidden and output units.
  • This simple type of network is interesting
    because the hidden units are free to construct
    their own representations of the input. The
    weights between the input and hidden units
    determine when each hidden unit is active, and so
    by modifying these weights, a hidden unit can
    choose what it represents.

79
Architecture of Neural Networks
Feed-forward networks: feed-forward ANNs allow
signals to travel one way only, from input to
output. There is no feedback (loops), i.e. the
output of any layer does not affect that same
layer. Feed-forward ANNs tend to be
straightforward networks that associate inputs
with outputs. They are extensively used in pattern
recognition. This type of organization is also
referred to as bottom-up or top-down.
80
ANN Architecture
Feedback networks: feedback networks can have
signals traveling in both directions, introduced
by loops in the network. Feedback networks are
very powerful and can get extremely complicated.
They are dynamic: their 'state' changes
continuously until they reach an equilibrium
point, and they remain at the equilibrium point
until the input changes and a new equilibrium
needs to be found. Feedback architectures are also
referred to as interactive or recurrent, although
the latter term is often used to denote feedback
connections in single-layer organizations.

81
ANN
  • There are different architectures for neural
    networks, and they each utilize different wiring
    and learning strategies (e.g. the backpropagation
    algorithm of the 1980s).
  • Advantages
  • prediction accuracy is generally high
  • robust, works when training examples contain
    errors
  • output may be discrete, real-valued, or a vector
    of several discrete or real-valued attributes
  • fast evaluation of the learned target function
  • Criticism
  • long training time
  • difficult to understand the learned function
    (weights)
  • not easy to incorporate domain knowledge

82
Network Training
  • The ultimate objective of training:
  • obtain a set of weights that makes almost all the
    tuples in the training data classified correctly
  • Steps (a sketch follows below):
  • initialize weights with random values
  • feed the input tuples into the network one by one
  • for each unit:
  • compute the net input to the unit as a linear
    combination of all the inputs to the unit
  • compute the output value using the activation
    function
  • compute the error
  • update the weights and the bias
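
These steps map directly onto a training loop. A sketch for a single sigmoid unit; a full multi-layer network would propagate the error term backwards layer by layer, and the learning rate and data here are illustrative:

import math, random

def train(data, epochs, lr=0.5, n_inputs=2, seed=0):
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]    # random initial weights
    b = rng.uniform(-0.5, 0.5)                               # random initial bias
    for _ in range(epochs):
        for x, target in data:                               # feed tuples one by one
            s = sum(wi * xi for wi, xi in zip(w, x)) + b     # net input
            y = 1.0 / (1.0 + math.exp(-s))                   # activation function
            err = (target - y) * y * (1.0 - y)               # error times sigmoid slope
            w = [wi + lr * err * xi for wi, xi in zip(w, x)] # update the weights
            b += lr * err                                    # update the bias
    return w, b

# Usage: learn the OR function.
w, b = train([([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)], epochs=2000)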

83
Applications of ANN
  • Classification: a neural network can discover
    the distinguishing features needed to perform a
    classification task. It can take an object and
    accordingly assign the specific class label to
    it. ANNs have been used in many classification
    tasks, including:
  • recognition of printed or handwritten characters
  • classification of SONAR and RADAR signals
  • speech recognition: a very significant area of
    interest, involving three modules: the front
    end, which samples the speech signals and
    extracts the data; the word processor, which
    finds the probability of words in the vocabulary
    that match the features of spoken words; and the
    sentence processor, which determines whether the
    recognized word makes sense in the sentence.

84
Multi-Layer Perceptron
(Diagram: a multi-layer perceptron. The input vector xi feeds the input nodes; weighted connections wij lead to the hidden nodes, and the output nodes produce the output vector.)
85
Classification and Prediction
  • What is classification? What is prediction?
  • Issues regarding classification and prediction
  • Classification by decision tree induction
  • Bayesian Classification
  • Classification by backpropagation
  • Classification based on concepts from association
    rule mining
  • Other Classification Methods
  • Prediction
  • Classification accuracy
  • Summary

86
What Is Prediction?
  • Prediction is similar to classification:
  • first, construct a model;
  • second, use the model to predict unknown values.
  • The major method for prediction is regression:
  • linear and multiple regression
  • non-linear regression
  • Prediction is different from classification:
  • classification refers to predicting a categorical
    class label;
  • prediction models continuous-valued functions
    (see the sketch below).
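
A sketch of the simplest regression case, least-squares fitting of y = a + b*x in closed form; the data points are illustrative:

def linear_fit(xs, ys):
    # Closed-form least squares for y = a + b*x.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return a, b

a, b = linear_fit([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
print(a + b * 5)   # predict a continuous value for x = 5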

87
Predictive Modeling in Databases
  • Predictive modeling: predict data values or
    construct generalized linear models based on
    the database data.
  • One can only predict value ranges or category
    distributions.
  • Method outline:
  • minimal generalization
  • attribute relevance analysis
  • generalized linear model construction
  • prediction
  • Determine the major factors which influence the
    prediction:
  • data relevance analysis: uncertainty measurement,
    entropy analysis, expert judgement, etc.
  • Multi-level prediction: drill-down and roll-up
    analysis.
  • www.sas.com, www.spss.com, www.mathsoft.com

88
Association-Based Classification
  • Can any ideas from association rule mining be
    applied to classification?
  • Several methods for association-based
    classification:
  • ARCS (Association Rule Clustering System):
    quantitative association mining and clustering of
    association rules (Lent et al. '97) (pp. 310, 254)
  • it beats C4.5 in (mainly) scalability and also
    accuracy
  • Associative classification (Liu et al. '98):
  • it mines high-support, high-confidence rules
    of the form cond_set => y, where y is a
    class label
  • CAEP (Classification by Aggregating Emerging
    Patterns) (Dong et al. '99):
  • emerging patterns (EPs): itemsets whose
    support increases significantly from one class to
    another
  • mine EPs based on minimum support and growth rate

89
Assignment 1
  • Suppose there are two classification rules, one
    that says people with salaries between 10,000
    and 20,000 have a credit rating of good, and
    another that says that people with salaries
    between 20,000 and 30,000 have a credit rating
    of good. Under what conditions can the rules be
    replaced without any loss of information, by a
    single rule that says that people with salaries
    between 10,000 and 30,000 have a credit rating
    of good.

No.   Rule                                                              Conf.
1.    For all persons P, 10000 < P.salary < 20000 ⇒ P.credit = good    60%
2.    For all persons P, 20000 < P.salary < 30000 ⇒ P.credit = good    90%
90
Assignment 1 (Solutions)
  • Suppose there are two classification rules, one
    that says people with salaries between 10,000
    and 20,000 have a credit rating of good, and
    another that says that people with salaries
    between 20,000 and 30,000 have a credit rating
    of good. Under what conditions can the rules be
    replaced without any loss of information, by a
    single rule that says that people with salaries
    between 10,000 and 30,000 have a credit rating
    of good.
  • Solution: consider the following pair of rules
    and their confidence levels.
  • The new rule has to be assigned a
    confidence level which is between the
    confidence levels of rules 1 and 2. Replacing
    the original rules by the new rule will result in
    a loss of confidence-level information for
    classifying persons, since we cannot distinguish
    the confidence levels of people earning between
    10000 and 20000 from those of people earning
    between 20000 and 30000. Therefore we can combine
    the two rules without loss of information only if
    their confidences are the same.

No.   Rule                                                              Conf.
1.    For all persons P, 10000 < P.salary < 20000 ⇒ P.credit = good    60%
2.    For all persons P, 20000 < P.salary < 30000 ⇒ P.credit = good    90%
91
Assignment 2
  • 2. Suppose half of all the transactions in a
    clothes shop purchase jeans, and one third of all
    transactions in the shop purchase T-shirts.
    Suppose also that half of the transactions that
    purchase jeans also purchase T-shirts. Write
    down all the non-trivial association rules you
    can deduce from the above information, giving the
    support and confidence of each rule.

92
Assignment 2 (Solutions)
  • 2. Suppose half of all the transactions in a
    clothes shop purchase jeans, and one third of all
    transactions in the shop purchase T-shirts.
    Suppose also that half of the transactions that
    purchase jeans also purchase T-shirts. Write
    down all the non-trivial association rules you
    can deduce from the above information, giving the
    support and confidence of each rule.
  • Solution: the rules are as follows; the last rule
    can be deduced from the previous ones (a checking
    sketch follows below).

Rule                                                          Support   Conf.
For all transactions T, true ⇒ buys(T, jeans)                 50%       50%
For all transactions T, true ⇒ buys(T, t-shirts)              33%       33%
For all transactions T, buys(T, jeans) ⇒ buys(T, t-shirts)    25%       50%
For all transactions T, buys(T, t-shirts) ⇒ buys(T, jeans)    25%       75%
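
The derivation can be checked mechanically from the three given facts:

# Given: P(jeans) = 1/2, P(t-shirts) = 1/3, P(t-shirts | jeans) = 1/2.
p_jeans, p_tshirts = 1 / 2, 1 / 3
conf_jeans_to_tshirts = 1 / 2

support_both = p_jeans * conf_jeans_to_tshirts      # 25%
conf_tshirts_to_jeans = support_both / p_tshirts    # 0.25 / 0.33... = 75%
print(support_both, conf_tshirts_to_jeans)          # 0.25 0.75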
97
Classification and Prediction
  • What is classification? What is prediction?
  • Issues regarding classification and prediction
  • Classification by decision tree induction
  • Bayesian Classification
  • Classification by backpropagation
  • Classification based on concepts from association
    rule mining
  • Other Classification Methods
  • Prediction
  • Classification accuracy
  • Summary

98
Other Classification Methods
  • k-nearest neighbor classifier
  • case-based reasoning
  • Genetic algorithm
  • Rough set approach
  • Fuzzy set approaches

99
Instance-Based Methods
  • Instance-based learning: less commonly used
    commercially
  • Store (all) training examples and delay the
    processing ("lazy evaluation") until a new
    instance must be classified.
  • Typical approaches:
  • k-nearest neighbor approach:
  • instances represented as points in a Euclidean
    space
  • locally weighted regression:
  • constructs a local approximation
  • case-based reasoning:
  • uses symbolic representations and knowledge-based
    inference

100
The k-Nearest Neighbor Algorithm
  • All instances correspond to points in the n-D
    space.
  • The nearest neighbors are defined in terms of
    Euclidean distance.
  • The target function could be discrete- or real-
    valued.
  • For discrete-valued functions, k-NN returns the
    most common value among the k training examples
    nearest to xq (a sketch follows below).
  • Voronoi diagram: the decision surface induced by
    1-NN for a typical set of training examples.
  • (pg. 314)
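
A minimal sketch of discrete-valued k-NN under Euclidean distance; the data layout is illustrative:

import math
from collections import Counter

def knn_classify(xq, training, k=3):
    # Return the most common label among the k nearest training examples.
    nearest = sorted(training, key=lambda p: math.dist(p[0], xq))[:k]   # Python 3.8+
    return Counter(label for _, label in nearest).most_common(1)[0][0]

data = [((0, 0), "-"), ((1, 0), "-"), ((5, 5), "+"), ((6, 5), "+"), ((5, 6), "+")]
print(knn_classify((4, 4), data, k=3))   # "+"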
