Database Management System: Recent Advances (PPT transcript)

1
Database Management System: Recent Advances
  • By
  • Prof. Dr. O.P. Vyas
  • M.Tech. (CS), Ph.D. (I.I.T. Kharagpur)
  • DAAD Fellow (Germany)
  • AOTS Fellow (Japan)
  • Professor & Head (Computer Science)
  • Pt. R.S. University, Raipur (CG)
  • Visiting Prof., Rostock University, Germany

2
Contents: ADBMS
  • Concepts of Association Rule Mining
  • ARM Basics
  • Problems with Apriori
  • Apriori vs. FP-tree
  • ARM Variants
  • Classification Rule Mining
  • Classification techniques
  • Classifiers
  • Various classifiers
  • Classification & Prediction
  • Classification accuracy
  • Mining Complex data types
  • Complex data types
  • Data mining process integration with existing
    technology

3
Time Line of Data Mining Development
Time        | Area | Contribution
Late 1700s  | Stat | Bayes theorem of probability
Early 1900s | Stat | Regression analysis
Early 1920s | Stat | Maximum likelihood estimate
Early 1940s | AI   | Neural networks
Early 1950s |      | Nearest neighbor
Early 1950s |      | Single link
Late 1950s  | AI   | Perceptron
Late 1950s  | Stat | Resampling, bias reduction, jackknife estimator
Early 1960s | AI   | ML started
Early 1960s | DB   | Batch reports
Mid 1960s   |      | Decision trees
Mid 1960s   | Stat | Linear models for classification
Mid 1960s   | IR   | Similarity measures
Mid 1960s   | IR   | Clustering
            | Stat | Exploratory data analysis (EDA)
Late 1960s  | DB   | Relational data model
Early 1970s | IR   | SMART IR systems
Mid 1970s   | AI   | Genetic algorithms
Late 1970s  | Stat | Estimation with incomplete data (EM algorithm)
Late 1970s  | Stat | K-means clustering
Early 1980s | AI   | Kohonen self-organizing map
Mid 1980s   | AI   | Decision tree algorithms
Early 1990s | DB   | Association rule algorithms
1990s       |      | Web and search engines
1990s       | DB   | Data warehousing
1990s       | DB   | Online analytic processing (OLAP)
4
Data Mining Functionalities
5
Association Rules
  • Retail shops are often interested in associations
    between different items that people buy.
  • Someone who buys bread is quite likely also to
    buy milk
  • A person who bought the book Database System
    Concepts is quite likely also to buy the book
    Operating System Concepts.
  • Association information can be used in several
    ways.
  • E.g. when a customer buys a particular book, an
    online shop may suggest associated books.
  • Association rules:
  • bread ⇒ milk
    DB-Concepts, OS-Concepts ⇒ Networks
  • Left-hand side: antecedent; right-hand side:
    consequent
  • An association rule is a pattern that states that when
    the antecedent occurs, the consequent occurs with a
    certain probability.

6
Association Rules (Cont.)
  • Rules have an associated support, as well as an
    associated confidence.
  • Support is a measure of what fraction of the
    population satisfies both the antecedent and the
    consequent of the rule.
  • E.g. suppose only 0.001 percent of all purchases
    include milk and screwdrivers. The support for
    the rule milk ⇒ screwdrivers is low.
  • We usually want rules with a reasonably high
    support
  • Rules with low support are usually not very
    useful
  • Confidence is a measure of how often the
    consequent is true when the antecedent is true.
  • E.g. the rule bread ⇒ milk has a confidence of
    80 percent if 80 percent of the purchases that
    include bread also include milk.
  • Usually want rules with reasonably large
    confidence.
  • A rule with a low confidence is not meaningful.
  • Note that the confidence of bread ⇒ milk may be
    very different from the confidence of milk ⇒
    bread, although both have the same support.

7
A.R.M. model: data
  • A.R.M. was initially applied to Market Basket
    Analysis on transaction data of supermarket
    sales.
  • I = {i1, i2, …, im}: a set of items.
  • Transaction t:
  • t: a set of items, such that t ⊆ I.
  • Transaction Database T: a set of transactions,
    T = {t1, t2, …, tn}.

8
Transaction data: supermarket data
  • Market basket transactions:
  • t1: {bread, cheese, milk}
  • t2: {apple, eggs, salt, yogurt}
  • …
  • tn: {biscuit, eggs, milk}
  • Concepts:
  • An item: an item/article in a basket
  • I: the set of all items sold in the store
  • A transaction: items purchased in a basket; it
    may have a TID (transaction ID)
  • A transactional dataset: a set of transactions

9
Transaction data: a set of documents
  • A text document data set. Each document is
    treated as a bag of keywords.
  • doc1: {Student, Teach, School}
  • doc2: {Student, School}
  • doc3: {Teach, School, City, Game}
  • doc4: {Baseball, Basketball}
  • doc5: {Basketball, Player, Spectator}
  • doc6: {Baseball, Coach, Game, Team}
  • doc7: {Basketball, Team, City, Game}

10
The model: rules
  • A transaction t contains X, a set of items
    (itemset) in I, if X ⊆ t.
  • An association rule is an implication of the
    form:
  • X ⇒ Y, where X, Y ⊂ I, and X ∩ Y = ∅
  • An itemset is a set of items.
  • E.g., X = {milk, bread, cereal} is an itemset.
  • A k-itemset is an itemset with k items.
  • E.g., {milk, bread, cereal} is a 3-itemset.

11
Rule strength measures
  • Support: the rule holds with support sup in T
    (the transaction data set) if sup% of the
    transactions contain X ∪ Y.
  • sup = Pr(X ∪ Y).
  • Confidence: the rule holds in T with confidence
    conf if conf% of the transactions that contain X also
    contain Y.
  • conf = Pr(Y | X)
  • An association rule is a pattern that states that when
    X occurs, Y occurs with a certain probability.

12
Mining Association Rules: An Example
Let us take min. support = 50% and min. confidence = 50%.
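The transaction database itself appears only as a figure in the
original slide; a four-transaction database consistent with the
support and confidence values below would be, for example:

    TID | Items
    1   | A, B, C
    2   | A, C
    3   | A, D
    4   | B, E, F

(Here A occurs in 3 of 4 transactions and {A, C} in 2 of 4, which
yields the numbers shown next.)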
  • For rule A ⇒ C:
  • support = support(A ∪ C) = 50%
  • confidence = support(A ∪ C) / support(A) = 66.6%
  • A ⇒ C (50%, 66.6%)
  • C ⇒ A (50%, 100%)
  • The Apriori principle:
  • Any subset of a frequent itemset must be frequent

13
The Apriori Algorithm
  • Join Step: Ck is generated by joining Lk-1 with
    itself
  • Prune Step: any (k-1)-itemset that is not
    frequent cannot be a subset of a frequent
    k-itemset
  • Pseudo-code:
  • Ck: candidate itemsets of size k
  • Lk: frequent itemsets of size k
  • L1 = {frequent items}
  • for (k = 1; Lk != ∅; k++) do begin
  •   Ck+1 = candidates generated from Lk
  •   for each transaction t in database do
  •     increment the count of all candidates in
        Ck+1 that are contained in t
  •   Lk+1 = candidates in Ck+1 with min_support
  • end
  • return ∪k Lk

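As a concrete illustration of the pseudo-code above, here is a
minimal, self-contained Python sketch of the level-wise search (the
function and variable names are ours, not from the slides; min_support
is an absolute count):

    from itertools import combinations

    def apriori(transactions, min_support):
        """Return {itemset: support_count} for all frequent itemsets.
        transactions: a list of sets of items."""
        # L1: frequent 1-itemsets
        counts = {}
        for t in transactions:
            for item in t:
                key = frozenset([item])
                counts[key] = counts.get(key, 0) + 1
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent = dict(current)
        k = 1
        while current:
            # Join step: merge frequent k-itemsets that share k-1 items
            items = list(current)
            candidates = set()
            for i in range(len(items)):
                for j in range(i + 1, len(items)):
                    union = items[i] | items[j]
                    if len(union) == k + 1:
                        # Prune step: all k-subsets must be frequent
                        if all(frozenset(sub) in current
                               for sub in combinations(union, k)):
                            candidates.add(union)
            # One full scan of the database counts all candidates
            counts = {c: 0 for c in candidates}
            for t in transactions:
                for c in candidates:
                    if c <= t:
                        counts[c] += 1
            current = {s: c for s, c in counts.items() if c >= min_support}
            frequent.update(current)
            k += 1
        return frequent

For the four-transaction example database above, apriori(..., 2)
returns {A}, {B}, {C} and {A, C} with their counts, matching L1 and L2
of the level-wise search.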
14
The Apriori Algorithm Example
[Figure: Apriori trace on an example database D. Each scan of D
produces candidate sets C1, C2, C3, which are filtered by min_support
into frequent sets L1, L2, L3.]
15
Generating rules from frequent itemsets
  • Frequent itemsets ≠ association rules
  • One more step is needed to generate association
    rules:
  • For each frequent itemset X,
  •   For each proper nonempty subset A of X,
  •     Let B = X − A;
  •     A ⇒ B is an association rule if
  •       confidence(A ⇒ B) ≥ minconf, where
  •       support(A ⇒ B) = support(A ∪ B) = support(X), and
  •       confidence(A ⇒ B) = support(A ∪ B) / support(A)

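A matching Python sketch of this rule-generation step, reusing the
output of the apriori() sketch above (minconf given as a fraction):

    from itertools import combinations

    def generate_rules(frequent, min_conf):
        """frequent: {frozenset: support_count} as from apriori().
        Yields (antecedent, consequent, confidence) triples."""
        for X, sup_X in frequent.items():
            if len(X) < 2:
                continue
            for r in range(1, len(X)):
                for A in combinations(X, r):   # proper nonempty subset
                    A = frozenset(A)
                    conf = sup_X / frequent[A] # support(X)/support(A)
                    if conf >= min_conf:
                        yield A, X - A, conf

Note that frequent[A] is guaranteed to exist: by the Apriori
principle, every subset of a frequent itemset is itself frequent, so
no extra pass over the data T is needed.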
16
Generating association rules..
  • Once the frequent itemsets from transactions in a
    database D have been found, it is straightforward
    to generate strong association rules from them (
    where strong association rules satisfy both
    minimum support and minimum confidence).
  • To recap, in order to obtain A ⇒ B, we need to
    have support(A ∪ B) and support(A).
  • All the required information for confidence
    computation has already been recorded in itemset
    generation. No need to see the data T any more.
  • This step is not as time-consuming as frequent
    itemset generation.

17
Goal and key features
  • Goal Find all rules that satisfy the
    user-specified minimum support (minsup) and
    minimum confidence (minconf).
  • Key Features
  • Completeness find all rules.
  • No target item(s) on the right-hand-side
  • Mining with data on hard disk (not in memory)

18
Mining Association Rules in Large Databases
  • Association rule mining Association rule can be
    classified into categories based on different
    criteria such as
  • 1. Based on the types of values handled in the rule,
    associations can be classified into Boolean vs.
    quantitative. A Boolean association shows
    relationships between discrete (categorical)
    objects. A quantitative association is a
    multidimensional association. An example of a
    quantitative association rule, where X is a
    variable representing a customer:
  • age(X, "30..39") ∧ income(X, "42K..48K") ⇒ buys(X,
    "high resolution TV")
  • Note that the quantitative attributes, age and
    income, have been discretized.
  • 2. Based on the dimensions of data involved in the
    rule.
  • Ex. Purchase(X, "computer") ⇒ Purchase(X,
    "financial software") is a single-dimensional
    association rule; if the date/time of purchase
    is added, it becomes multidimensional.
  • 3. Multilevel Association Rule Mining
  • 4. Multi Dimensional A.R.M.

19
Mining Multiple-Level Association Rules
  • Items often form hierarchies
  • Flexible support settings
  • Items at the lower level are expected to have
    lower support
  • Exploration of shared multi-level mining (Agrawal &
    Srikant @ VLDB'95, Han & Fu @ VLDB'95)

20
Multi-level Association Redundancy Filtering
  • Some rules may be redundant due to ancestor
    relationships between items.
  • Example:
  • milk ⇒ wheat bread [support = 8%, confidence =
    70%]
  • 2% milk ⇒ wheat bread [support = 2%, confidence =
    72%]
  • We say the first rule is an ancestor of the
    second rule.
  • A rule is redundant if its support is close to
    the expected value, based on the rule's
    ancestor.

21
Mining Multi-Dimensional Association
  • Single-dimensional rules:
  • buys(X, "milk") ⇒ buys(X, "bread")
  • Multi-dimensional rules: ≥ 2 dimensions or
    predicates
  • Inter-dimension assoc. rules (no repeated
    predicates):
  • age(X, "19-25") ∧ occupation(X, "student") ⇒
    buys(X, "coke")
  • Hybrid-dimension assoc. rules (repeated
    predicates):
  • age(X, "19-25") ∧ buys(X, "popcorn") ⇒ buys(X,
    "coke")
  • Categorical attributes: finite number of possible
    values, no ordering among values: data cube
    approach
  • Quantitative attributes: numeric, implicit
    ordering among values: discretization, clustering,
    and gradient approaches

22
Mining Association Rules in Large Databases
  • Mining single-dimensional Boolean association
    rules from transactional databases
  • The Apriori Algorithm: an influential algorithm
    for mining frequent itemsets for Boolean
    association rules; it uses prior knowledge of
    frequent itemset properties.
  • Apriori employs an iterative approach known as a
    level-wise search, where k-itemsets are used to
    explore (k+1)-itemsets.
  • First, the set of frequent 1-itemsets is found.
    This set is denoted L1. L1 is used to find
    the set of frequent 2-itemsets, L2, and so on,
    until no more frequent k-itemsets can be found.
  • The finding of each Lk requires one full scan of
    the database.

23
Many ARM algorithms
  • There are a large number of them!!
  • They use different strategies and data
    structures.
  • Their resulting sets of rules are all the same.
  • Given a transaction data set T, a minimum
    support, and a minimum confidence, the set of
    association rules existing in T is uniquely
    determined.
  • Any algorithm should find the same set of rules,
    although their computational efficiencies and
    memory requirements may differ. We study
    only one: the Apriori Algorithm.

24
On Apriori Algorithm
  • Seems to be very expensive
  • Level-wise search
  • K = the size of the largest itemset
  • It makes at most K passes over the data
  • In practice, K is bounded (around 10).
  • The algorithm is very fast. Under some
    conditions, all rules can be found in linear
    time.
  • Scale up to large data sets
  • Clearly the space of all association rules is
    exponential, O(2^m), where m is the number of
    items in I.
  • The mining exploits sparseness of data, and high
    minimum support and high minimum confidence
    values.
  • Still, it always produces a huge number of rules,
    thousands, tens of thousands, millions, ...

25
UCI KDD Archive: http://kdd.ics.uci.edu
  • This is an online repository of large data sets
    which encompasses a wide variety of data types,
    analysis tasks, and application areas.
  • The primary role of this repository is to enable
    researchers in knowledge discovery and data
    mining to scale existing and future data analysis
    algorithms to very large and complex data sets.
  • The archive is intended to serve as a permanent
    repository of publicly accessible data sets for
    research in KDD and data mining. It complements
    the original UCI Machine Learning Archive, which
    typically focuses on smaller classification-oriented
    data sets.

26
ARM Implementations
  • Many implementations of Apriori Algorithm are
    available
  • http://www.cs.bme.hu/~bodon/en/apriori/
  • (APRIORI
    implementation of Ferenc Bodon)
  • http://www.csc.liv.ac.uk/~frans/KDD/Software/Apriori-T_GUI/aprioriT_GUI.html
  • Apriori-T (Apriori Total) is an Association
    Rule Mining (ARM) algorithm developed by the
    LUCS-KDD research team. The code obtainable from
    this page is a GUI version that includes (for
    comparison purposes) implementations of Brin's
    DIC algorithm (Brin et al. 1997) and Toivonen's
    negative border ARM approach (Toivonen 1996).
  • http://www.csc.liv.ac.uk/~frans/KDD/Software/FPgrowth/fpGrowth.html
  • (Implementation of the FP-growth method)
  • DBMiner is a data mining system which runs on top
    of the Microsoft SQL Server 7.0 Plato system.

27
A.R.M. ImplementationsExample
  • In DBMiner, three kinds of associations can
    be mined:
  • 1. Inter-dimensional association: associations among
    or across two or more dimensions.
  • Customer-Country("Canada") => Product-SubCategory(
    "Coffee"), i.e. Canadian customers are likely to
    buy coffee.
  • 2. Intra-dimensional association: associations
    present within one dimension, grouped by another
    one or several dimensions. For example, if you
    want to find out which products customers in
    Canada are likely to purchase together:
  • Within Customer-Country("Canada"),
    Product-ProductName("CarryBags") =>
    Product-ProductName("Tents"), i.e. customers in
    Canada who buy carry-bags are also likely to
    buy tents.
  • 3. Hybrid association: associations combining
    elements of both inter- and intra-dimensional
    association mining. For example:
  • Within Customer-Country("Canada"),
    Product("Carry Bags") => Product("Tents"),
    Time("Q3"), i.e. customers in Canada who buy
    carry-bags also tend to buy tents, and do so most
    often in the 3rd quarter of the year (Jul, Aug,
    Sep).

28
Visualization of Association Rules: Plane Graph
29
Problems with the association mining
  • Single minsup: it assumes that all items in the
    data are of the same nature and/or have similar
    frequencies.
  • Not true: in many applications, some items appear
    very frequently in the data, while others rarely
    appear.
  • E.g., in a supermarket, people buy a food
    processor and a cooking pan much less frequently
    than they buy bread and milk.

30
Rare Item Problem
  • If the frequencies of items vary a great deal, we
    will encounter two problems
  • If minsup is set too high, those rules that
    involve rare items will not be found.
  • To find rules that involve both frequent and rare
    items, minsup has to be set very low. This may
    cause combinatorial explosion because those
    frequent items will be associated with one
    another in all possible ways.

31
Is Apriori Fast Enough? Performance Bottlenecks
  • The core of the Apriori algorithm
  • Use frequent (k − 1)-itemsets to generate
    candidate frequent k-itemsets
  • Use database scans and pattern matching to collect
    counts for the candidate itemsets
  • The bottleneck of Apriori: candidate generation
  • Huge candidate sets:
  • 10^4 frequent 1-itemsets will generate 10^7
    candidate 2-itemsets
  • To discover a frequent pattern of size 100, e.g.,
    {a1, a2, …, a100}, one needs to generate 2^100 ≈
    10^30 candidates.
  • Multiple scans of the database:
  • Needs (n + 1) scans, where n is the length of the
    longest pattern

32
Mining Frequent Patterns Without Candidate
Generation
  • FP-Tree (Frequent Pattern Tree) Algorithm
  • To break the two bottlenecks of the Apriori series of
    algorithms, some association rule mining approaches
    using a tree structure have been designed. FP-Tree
    [Han et al. 2000], frequent pattern mining, is
    another milestone in the development of
    association rule mining, which breaks the two
    bottlenecks of Apriori.
  • The frequent itemsets are generated with only two
    passes over the database and without any
    candidate generation process. FP-Tree was
    introduced by Han et al. in [Han et al. 2000].
  • By avoiding the candidate generation process and
    making fewer passes over the database, FP-Tree is an
    order of magnitude faster than the Apriori
    algorithm. The frequent pattern generation
    process includes two sub-processes: constructing
    the FP-Tree, and generating frequent patterns
    from the FP-Tree.

33
FP Tree
  • Compress a large database into a compact,
    Frequent-Pattern tree (FP-tree) structure
  • highly condensed, but complete for frequent
    pattern mining
  • avoid costly database scans
  • Develop an efficient, FP-tree-based frequent
    pattern mining method
  • A divide-and-conquer methodology: decompose
    mining tasks into smaller ones
  • Avoid candidate generation: sub-database tests
    only!
  • Some researchers have found that when the
    dataset is very sparse, the FP-Tree approach shows
    bottlenecks and Apriori gives comparatively
    better performance!

34
Construct FP-tree from a Transaction DB
TID | Items bought           | (ordered) frequent items
100 | f, a, c, d, g, i, m, p | f, c, a, m, p
200 | a, b, c, f, l, m, o    | f, c, a, b, m
300 | b, f, h, j, o          | f, b
400 | b, c, k, s, p          | c, b, p
500 | a, f, c, e, l, p, m, n | f, c, a, m, p

min_support = 0.5
  • Steps
  • Scan DB once, find frequent 1-itemset (single
    item pattern)
  • Order frequent items in frequency descending
    order
  • Scan DB again, construct FP-tree

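A minimal Python sketch of this two-pass construction (the class and
function names are our own; transactions are lists of items and
min_support is a fraction, as on the slide):

    from collections import Counter, defaultdict

    class FPNode:
        def __init__(self, item, parent):
            self.item, self.parent = item, parent
            self.count = 0
            self.children = {}          # item -> FPNode

    def build_fptree(transactions, min_support):
        """Two passes, mirroring the steps above."""
        n = len(transactions)
        # Pass 1: find frequent items and their counts
        freq = Counter(i for t in transactions for i in set(t))
        freq = {i: c for i, c in freq.items() if c / n >= min_support}
        # Order items frequency-descending (ties broken by name)
        order = {i: r for r, (i, _) in enumerate(
            sorted(freq.items(), key=lambda x: (-x[1], x[0])))}
        root = FPNode(None, None)
        header = defaultdict(list)      # item -> nodes, for mining
        # Pass 2: insert each transaction's ordered frequent items
        for t in transactions:
            items = sorted((i for i in set(t) if i in order),
                           key=order.get)
            node = root
            for i in items:
                if i not in node.children:
                    node.children[i] = FPNode(i, node)
                    header[i].append(node.children[i])
                node = node.children[i]
                node.count += 1         # shared prefixes share nodes
        return root, header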
35
Benefits of the FP-tree Structure
  • Completeness
  • never breaks a long pattern of any transaction
  • preserves complete information for frequent
    pattern mining
  • Compactness
  • reduces irrelevant information: infrequent items
    are gone
  • frequency-descending ordering: more frequent
    items are more likely to be shared
  • never larger than the original database (not
    counting node-links and counts)
  • Example: for the Connect-4 DB, the compression
    ratio could be over 100

36
Mining Frequent Patterns Using FP-tree
  • General idea (divide-and-conquer)
  • Recursively grow frequent pattern path using the
    FP-tree
  • Method
  • For each item, construct its conditional
    pattern-base, and then its conditional FP-tree
  • Repeat the process on each newly created
    conditional FP-tree
  • Until the resulting FP-tree is empty, or it
    contains only one path (single path will generate
    all the combinations of its sub-paths, each of
    which is a frequent pattern)

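Continuing the FP-tree sketch from slide 34, the conditional pattern
base of an item can be read off the header table by walking each
node's parents; this is an illustrative fragment, not the full
FP-growth recursion:

    def conditional_pattern_base(item, header):
        """For each node tagged with `item`, collect its prefix path
        together with the node's count."""
        patterns = []
        for node in header[item]:
            path, p = [], node.parent
            while p is not None and p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                patterns.append((list(reversed(path)), node.count))
        return patterns

The conditional FP-tree for the item is then built from these
(prefix-path, count) pairs, and the process repeats recursively.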
37
Market Basket Analysis Purpose
  • When the supermarket revolution first sparked off
    in the 1920s, one could not even dream of
    retailing as it exists today. By the 1950s it had
    won acclaim and acceptance almost globally. This
    is one retailing sector that is spreading very
    fast in India, but still the majority of the retailing
    sector, including this one, is not properly
    managed.
  • Retailing management has long been a focus for
    marketing strategists, as organized
    retailing is assuming significant attention.
    M.B.A. (Market Basket Analysis) is one such effort.
  • In supermarket retailing, MBA has endeavored to
    study and analyze the combination of various
    items accumulated in a shopping basket, and was
    intended to establish associations between the
    various items bought by the customer.
  • Market basket analysis is a generic term for
    methodologies that study the composition of a
    basket of products (i.e. a shopping basket)
    purchased by a household during a single shopping
    trip.
  • The idea is that market baskets reflect
    interdependencies between products or purchases
    made in different product categories, and that
    these interdependencies can be useful to support
    retail marketing decisions.

38
MBA
  • Our data mining approach to supermarket business
    data records all the supermarket transactions
    in tabular form, and an appropriate algorithm
    processes the transaction data to provide
    significant associations between various items.
  • From a marketing perspective, the research is
    motivated by the fact that some recent trends in
    retailing pose important challenges to retailers
    in order to stay competitive. In fact, on the
    level of the retailer, a number of trends can be
    identified, including concentration,
    internationalization, decreasing profit margins
    and an increase in discounting.
  • Recently, a number of advances in data mining
    (association rules) and statistics offer new
    opportunities to analyze such data.

39
Data Mining Functionalities
40
Data Mining
Clustering
Association Mining
Classification
Classification mining analyzes a set of training
data (i.e. a set of objects whose class labels
are known) and constructs a model for each class
based on the features in the data. A set of
classification rules is generated by the
classification process, and these can be used to
classify future data, as well as to develop a better
understanding of each class in the database.
Techniques
Associative Classification
Application domain
41
Data Mining
Classification
Clustering
Association Mining
  • Associative Classification
  • (Combines Association & Classification)
  • CBA, CMAR, CPAR & MCLP
  • Modifying the algorithms

Classification Techniques
Techniques
Application domain
42
Supervised vs. Unsupervised Learning
  • Learning: training data are analyzed by a
    classification algorithm.
  • Supervised learning (classification): learning of
    the model is supervised in that it is told to
    which class each training sample belongs.
  • Supervision: the training data (observations,
    measurements, etc.) are accompanied by labels
    indicating the class of the observations.
  • New data is classified based on the training set.
  • Unsupervised learning (clustering):
  • The class labels of the training data are unknown.
  • Given a set of measurements, observations, etc.
    with the aim of establishing the existence of
    classes or clusters in the data

43
Classification vs. Prediction
  • Classification
  • classifies data (constructs a model) based on the
    training set and the values (class labels) in a
    classifying attribute and uses it in classifying
    new data.
  • predicts categorical class labels.
  • Prediction - Prediction can be viewed as the
    construction and use of a model to assess the
    class of an unlabeled sample, or to assess the
    value or value ranges of an attribute that a
    given sample is likely to have.
  • CLASSIFICATION & REGRESSION are two prediction
    methods (discrete vs. continuous).
  • models continuous-valued functions, i.e.,
    predicts unknown or missing values.
  • Typical Applications-credit approval, target
    marketing, medical diagnosis, treatment
    effectiveness analysis.

44
Data Classification A Two-Step Process
  • Model construction describing a set of
    predetermined classes
  • Learning
  • Each tuple / sample is assumed to belong to a
    predefined class, as determined by the class
    label attribute.
  • The set of tuples used for model construction is
    the training set (given data).
  • The model is represented as classification rules,
    decision trees, or mathematical formulae.
  • Model usage for classifying future or unknown
    objects
  • Classification
  • Estimate accuracy of the model
  • The known label of test sample is compared with
    the classified result from the model
  • Accuracy rate is the percentage of test set
    samples that are correctly classified by the
    model
  • Test set is independent of training set,
    otherwise over-fitting will occur

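The two-step process above can be made concrete with a small sketch
(using scikit-learn purely as an illustration; the tiny dataset is
invented to mirror the rank/years/tenured example on the next slides):

    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    # Hypothetical tuples: (rank, years); 1 = professor. Label: tenured
    X = [[0, 2], [1, 7], [1, 3], [0, 7], [1, 8], [0, 1]]
    y = ['no', 'yes', 'yes', 'yes', 'yes', 'no']

    # Keep the test set independent of the training set, so that
    # over-fitting shows up as low test accuracy
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.33, random_state=0)

    # Step 1: model construction (learning) from the training set
    model = DecisionTreeClassifier().fit(X_train, y_train)

    # Step 2: model usage; accuracy = fraction of correctly classified
    # test samples
    print(accuracy_score(y_test, model.predict(X_test)))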
45
Illustrating Classification Task
46
Classification Process (1): Model Construction
(Learning)
Classification Algorithms
Classification rules: IF rank = 'professor' OR
years > 6 THEN tenured = 'yes'
47
Classification Process (2): Use the Model in
Prediction (classification)
(Jeff, Professor, 4)
Tenured?
48
Examples of Classification Task
  • Predicting tumor cells as benign or malignant
  • Classifying credit card transactions as
    legitimate or fraudulent
  • Classifying secondary structures of protein as
    alpha-helix, beta-sheet, or random coil
  • Categorizing news stories as finance, weather,
    entertainment, sports, etc

58
Classification and Prediction
  • What is classification? What is prediction?
  • Issues regarding classification and prediction
  • Classification by decision tree induction
  • Bayesian Classification
  • Classification by backpropagation
  • Classification based on concepts from association
    rule mining
  • Other Classification Methods
  • Prediction
  • Classification accuracy
  • Summary

59
Decision Tree Classifiers: Survey paper
60
Bayesian Classification
  • Bayesian classifiers are statistical classifiers
    which predict class membership probabilities such
    as the probability that a given sample belongs to
    a particular class.
  • Bayesian classification is based on Bayes'
    theorem, and it has been observed that a simple
    Bayesian classifier, known as the naïve Bayesian
    classifier, is comparable in performance with decision
    tree and neural network classifiers.
  • Naïve Bayesian classifiers assume that the effect
    of an attribute value on a given class is
    independent of the values of the other
    attributes (conditional independence).
  • Bayesian belief networks are graphical models
    which, unlike naïve Bayesian classifiers, allow the
    representation of dependencies among subsets of
    attributes. They can be used for classification.

61
Bayesian Classification Why?
  • Probabilistic learning: calculate explicit
    probabilities for hypotheses; among the most
    practical approaches to certain types of learning
    problems
  • Incremental: each training example can
    incrementally increase/decrease the probability
    that a hypothesis is correct. Prior knowledge
    can be combined with observed data.
  • Probabilistic prediction: predict multiple
    hypotheses, weighted by their probabilities
  • Standard: even when Bayesian methods are
    computationally intractable, they can provide a
    standard of optimal decision making against which
    other methods can be measured

62
Bayesian Theorem
  • Given training data D, the posterior probability of
    a hypothesis h, P(h|D), follows Bayes' theorem:
  • P(h|D) = P(D|h) P(h) / P(D)
  • MAP (maximum a posteriori) hypothesis:
  • h_MAP = argmax_h P(h|D) = argmax_h P(D|h) P(h)
  • Practical difficulty: requires initial knowledge
    of many probabilities, and significant computational
    cost

63
Naïve Bayes Classifier (I)
  • A simplified assumption: attributes are
    conditionally independent, so that
    P(C|x1,…,xk) ∝ P(C) · P(x1|C) · … · P(xk|C)
  • Greatly reduces the computation cost: only count
    the class distribution.

64
Naive Bayesian Classifier (II)
  • Given a training set, we can compute the
    probabilities

65
Naïve Bayesian Classification
  • Naïve assumption: attribute independence
  • P(x1,…,xk|C) = P(x1|C) · … · P(xk|C)
  • If the i-th attribute is categorical: P(xi|C) is
    estimated as the relative frequency of samples having
    value xi as the i-th attribute in class C
  • If the i-th attribute is continuous: P(xi|C) is
    estimated through a Gaussian density function
  • Computationally easy in both cases

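A minimal sketch of the categorical case in Python (class and method
names are our own; Laplace smoothing is added to avoid zero
probabilities, which the slide does not discuss):

    from collections import Counter, defaultdict

    class NaiveBayes:
        def fit(self, X, y):
            """X: list of categorical feature tuples, y: class labels."""
            self.class_counts = Counter(y)
            # attr_counts[class][attribute index][value] = frequency
            self.attr_counts = defaultdict(lambda: defaultdict(Counter))
            for row, c in zip(X, y):
                for i, v in enumerate(row):
                    self.attr_counts[c][i][v] += 1
            return self

        def predict(self, row):
            n = sum(self.class_counts.values())
            best, best_p = None, -1.0
            for c, cc in self.class_counts.items():
                p = cc / n                      # prior P(C)
                for i, v in enumerate(row):     # P(xi|C), smoothed
                    counts = self.attr_counts[c][i]
                    p *= (counts[v] + 1) / (cc + len(counts) + 1)
                if p > best_p:
                    best, best_p = c, p
            return best

For the continuous case one would instead plug each attribute value
into a per-class Gaussian density with the class's sample mean and
variance.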
66
The independence hypothesis
  • makes computation possible
  • yields optimal classifiers when satisfied
  • but is seldom satisfied in practice, as
    attributes (variables) are often correlated.
  • Attempts to overcome this limitation
  • Bayesian networks, that combine Bayesian
    reasoning with causal relationships between
    attributes
  • Decision trees, that reason on one attribute at a
    time, considering the most important attributes
    first

67
Bayesian Belief Networks (I)
[Figure: a belief network over the variables FamilyHistory, Smoker,
LungCancer, Emphysema, PositiveXRay and Dyspnea; FamilyHistory and
Smoker are parents of LungCancer, which in turn is a parent of
PositiveXRay and Dyspnea.]

The conditional probability table for the variable LungCancer (the
four column labels, identical in the extracted text, have been
reconstructed as the four parent-value combinations, with ~ denoting
negation):

      (FH, S) | (FH, ~S) | (~FH, S) | (~FH, ~S)
LC    0.7     | 0.8      | 0.5      | 0.1
~LC   0.3     | 0.2      | 0.5      | 0.9
68
Bayesian Belief Networks (II)
  • A Bayesian belief network allows a subset of the
    variables to be conditionally independent
  • A graphical model of causal relationships
  • Several cases of learning Bayesian belief
    networks:
  • Given both network structure and all the
    variables: easy
  • Given network structure but only some variables
  • When the network structure is not known in advance

69
Classification and Prediction
  • What is classification? What is prediction?
  • Issues regarding classification and prediction
  • Classification by decision tree induction
  • Bayesian Classification
  • Classification by backpropagation
  • Classification based on concepts from association
    rule mining
  • Other Classification Methods
  • Prediction
  • Classification accuracy
  • Summary

70
Classification by Backpropopagation
  • Backpropagation has been considered an effective
    mechanism in the field of classification. The
    backpropagation algorithm was presented by
    Rumelhart, Hinton, and Williams [RHW86]. It is one
    of the most popularly used neural network learning
    algorithms.
  • In Freud's theory of psychodynamics, the human
    brain (with its roughly 10^11 neurons) was described
    as a neural network, and recent investigations have
    corroborated this view.
  • This analogy therefore offers an interesting
    model for the creation of more complex learning
    machines, and has led to the creation of ANNs.
  • Neural networks, with their remarkable ability to
    derive meaning from complicated or imprecise
    data, can be used to extract patterns and detect
    trends that are too complex to be noticed by
    either humans or other computer techniques.
  • A trained neural network can be thought of as an
    "expert" in the category of information it has
    been given to analyze. This expert can then be
    used to provide projections given new situations
    of interest and answer "what if" questions.

72
Neural Networks
  • An Artificial Neural Network (ANN) is an
    information processing paradigm that is inspired
    by the way biological nervous systems, such as
    the brain, process information. The key element
    of this paradigm is the novel structure of the
    information processing system.
  • It is composed of a large number of highly
    interconnected processing elements (neurones)
    working in unison to solve specific problems.
    ANNs, like people, learn by example.
  • An ANN is configured for a specific application,
    such as pattern recognition or data
    classification, through a learning process.
    Learning in biological systems involves
    adjustments to the synaptic connections that
    exist between the neurones. This is true of ANNs
    as well.

73
ANN Advantages
  • Adaptive learning: an ability to learn how to do
    tasks based on the data given for training or
    initial experience.
  • Self-Organisation: an ANN can create its own
    organization or representation of the information
    it receives during learning time.
  • Real-Time Operation: ANN computations may be
    carried out in parallel, and special hardware
    devices are being designed and manufactured which
    take advantage of this capability.
  • Fault Tolerance via Redundant Information Coding:
    partial destruction of a network leads to the
    corresponding degradation of performance.
    However, some network capabilities may be
    retained even with major network damage.

74
ANN Vs Conventional Computing approach
  • Neural networks take a different approach to
    problem solving than that of conventional
    computers. Conventional computers use an
    algorithmic approach i.e. the computer follows a
    set of instructions in order to solve a problem.
    Unless the specific steps that the computer needs
    to follow are known the computer cannot solve the
    problem. That restricts the problem solving
    capability of conventional computers to problems
    that we already understand and know how to solve.
    But computers would be so much more useful if
    they could do things that we don't exactly know
    how to do.
  • Neural networks process information in a similar
    way to the human brain. The network is composed
    of a large number of highly interconnected
    processing elements ( neurones) working in
    parallel to solve a specific problem. Neural
    networks learn by example. They cannot be
    programmed to perform a specific task.
  • The examples must be selected carefully otherwise
    useful time is wasted or even worse, the network
    might be functioning incorrectly. The
    disadvantage is that because the network finds
    out how to solve the problem by itself, its
    operation can be unpredictable.

75
ANN Vs. Conventional
  • On the other hand, conventional computers use a
    cognitive approach to problem solving: the way
    the problem is to be solved must be known and stated
    in small, unambiguous instructions. These
    instructions are then converted to a high-level
    language program and then into machine code that
    the computer can understand. These machines are
    totally predictable; if anything goes wrong, it is
    due to a software or hardware fault.
  • Neural networks and conventional algorithmic
    computers are not in competition but complement
    each other. There are tasks more suited to an
    algorithmic approach, like arithmetic operations,
    and tasks that are more suited to neural
    networks. Moreover, a large number of tasks
    require systems that use a combination of the two
    approaches (normally a conventional computer is
    used to supervise the neural network) in order to
    perform at maximum efficiency.
  • Neural networks do not perform miracles. But if
    used sensibly they can produce some amazing
    results.

76
ANN An engineering approach
  • A simple neuron
  • An artificial neuron is a device with many
    inputs and one output. The neuron has two modes
    of operation: the training mode and the using
    mode.
  • In the training mode, the neuron can be
    trained to fire (or not), for particular input
    patterns. In the using mode, when a taught input
    pattern is detected at the input, its associated
    output becomes the current output. If the input
    pattern does not belong in the taught list of
    input patterns, the firing rule is used to
    determine whether to fire or not.

77
A Neuron
  • The n-dimensional input vector x is mapped into
    the variable y by means of the scalar product and a
    nonlinear function mapping.

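The figure omitted from this slide corresponds to the standard neuron
model: with weights w_i, threshold (bias) θ and a nonlinear activation
function f,

    y = f( Σ_i w_i x_i − θ )

where a common choice for f is the sigmoid 1 / (1 + e^(−z)).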
78
Network layers
  • The commonest type of artificial neural network
    consists of three groups, or layers, of units: a
    layer of "input" units is connected to a layer of
    "hidden" units, which is connected to a layer of
    "output" units.
  • The activity of the input units represents the
    raw information that is fed into the network.
  • The activity of each hidden unit is determined by
    the activities of the input units and the weights
    on the connections between the input and the
    hidden units.
  • The behavior of the output units depends on the
    activity of the hidden units and the weights
    between the hidden and output units.
  • This simple type of network is interesting
    because the hidden units are free to construct
    their own representations of the input. The
    weights between the input and hidden units
    determine when each hidden unit is active, and so
    by modifying these weights, a hidden unit can
    choose what it represents.

79
Architecture of Neural Networks
Feed-forward networks: Feed-forward ANNs (figure
below) allow signals to travel one way only from
input to output. There is no feedback (loops)
i.e. the output of any layer does not affect that
same layer. Feed-forward ANNs tend to be
straightforward networks that associate inputs with
outputs. They are extensively used in pattern
recognition. This type of organization is also
referred to as bottom-up or top-down.
80
ANN Architecture
Feedback networks: Feedback networks (figure
below) can have signals traveling in both
directions by introducing loops in the network.
Feedback networks are very powerful and can get
extremely complicated. Feedback networks are
dynamic their 'state' is changing continuously
until they reach an equilibrium point. They
remain at the equilibrium point until the input
changes and a new equilibrium needs to be found.
Feedback architectures are also referred to as
interactive or recurrent, although the latter
term is often used to denote feedback connections
in single-layer organizations.

81
ANN
  • There are different architectures for neural
    networks, and they each utilize different wiring
    and learning strategies (e.g., the backpropagation
    algorithm of the 1980s).
  • Advantages
  • prediction accuracy is generally high
  • robust, works when training examples contain
    errors
  • output may be discrete, real-valued, or a vector
    of several discrete or real-valued attributes
  • fast evaluation of the learned target function
  • Criticism
  • long training time
  • difficult to understand the learned function
    (weights)
  • not easy to incorporate domain knowledge

82
Network Training
  • The ultimate objective of training
  • obtain a set of weights that makes almost all the
    tuples in the training data classified correctly
  • Steps
  • Initialize weights with random values
  • Feed the input tuples into the network one by one
  • For each unit
  • Compute the net input to the unit as a linear
    combination of all the inputs to the unit
  • Compute the output value using the activation
    function
  • Compute the error
  • Update the weights and the bias

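A compact numpy sketch of these training steps for a single-hidden-
layer network trained on XOR (the architecture, learning rate, and
epoch count are arbitrary illustrative choices):

    import numpy as np

    rng = np.random.default_rng(0)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    # Toy data: XOR inputs and target outputs
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    # Initialize weights (and biases) with random values
    W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
    W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
    lr = 0.5

    for epoch in range(5000):
        # Feed the input tuples through the network: net input is a
        # linear combination of the inputs; output applies the
        # activation function
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        # Compute the error
        err = out - y
        # Backpropagate: update the weights and the biases
        d_out = err * out * (1 - out)
        d_h = (d_out @ W2.T) * h * (1 - h)
        W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

    print(out.round(2))   # approaches [0, 1, 1, 0]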
83
Applications of ANN
  • Classification: a neural network can discover
    the distinguishing features needed to perform a
    classification task. It can take an object and
    assign the appropriate class label to
    it. ANNs have been used in many classification
    tasks, including:
  • Recognition of printed or handwritten characters.
  • Classification of SONAR & RADAR signals.
  • Speech recognition: a very significant area of
    interest. It involves three modules: the front
    end, which samples the speech signals and extracts
    the data; the word processor, which is used for
    finding the probability of words in the
    vocabulary that match the features of spoken
    words; and the sentence processor, which determines
    whether the recognized word makes sense in the
    sentence.

84
Multi-Layer Perceptron
[Figure: multi-layer perceptron. The input vector xi feeds the input
nodes, which are connected by weights wij to the hidden nodes, which
in turn feed the output nodes producing the output vector.]
85
Classification and Prediction
  • What is classification? What is prediction?
  • Issues regarding classification and prediction
  • Classification by decision tree induction
  • Bayesian Classification
  • Classification by backpropagation
  • Classification based on concepts from association
    rule mining
  • Other Classification Methods
  • Prediction
  • Classification accuracy
  • Summary

86
What Is Prediction?
  • Prediction is similar to classification:
  • First, construct a model
  • Second, use the model to predict unknown values
  • The major method for prediction is regression:
  • Linear and multiple regression
  • Non-linear regression
  • Prediction is different from classification:
  • Classification refers to predicting categorical
    class labels
  • Prediction models continuous-valued functions

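For instance, a least-squares linear regression sketch with numpy
(the data are invented for illustration):

    import numpy as np

    # Hypothetical data: years of experience -> salary (in thousands)
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([30.0, 35.0, 42.0, 48.0, 55.0])

    # Fit y ≈ a*x + b by least squares
    A = np.column_stack([x, np.ones_like(x)])
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)

    print(a * 6.0 + b)   # predict the (continuous) value for x = 6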
87
Predictive Modeling in Databases
  • Predictive modeling: predict data values or
    construct generalized linear models based on
    the database data.
  • One can only predict value ranges or category
    distributions.
  • Method outline
  • Minimal generalization
  • Attribute relevance analysis
  • Generalized linear model construction
  • Prediction
  • Determine the major factors which influence the
    prediction
  • Data relevance analysis uncertainty measurement,
    entropy analysis, expert judgement, etc.
  • Multi-level prediction drill-down and roll-up
    analysis.
  • www.sas.com, www.spss.com, www.mathsoft.com

88
Association-Based Classification
  • Can any ideas from association rule mining be
    applied to classification?
  • Several methods for association-based
    classification
  • ARCS (Association Rule Clustering System):
    quantitative association mining and clustering of
    association rules (Lent et al. '97) (pg. 310, 254)
  • It beats C4.5 in (mainly) scalability and also in
    accuracy
  • Associative classification (Liu et al. '98)
  • It mines high-support and high-confidence rules
    in the form of cond_set => y, where y is a
    class label
  • CAEP (Classification by aggregating emerging
    patterns) (Dong et al. '99)
  • Emerging patterns (EPs): itemsets whose
    support increases significantly from one class to
    another
  • Mine EPs based on minimum support and growth rate

89
Assignment 1
  • Suppose there are two classification rules: one
    that says people with salaries between 10,000
    and 20,000 have a credit rating of good, and
    another that says that people with salaries
    between 20,000 and 30,000 have a credit rating
    of good. Under what conditions can the rules be
    replaced, without any loss of information, by a
    single rule that says that people with salaries
    between 10,000 and 30,000 have a credit rating
    of good?

No. | Rule                                                            | Conf.
1   | For all persons P, 10000 < P.salary < 20000 => P.credit = good | 60%
2   | For all persons P, 20000 < P.salary < 30000 => P.credit = good | 90%
90
Assignment 1 (Solutions)
  • Suppose there are two classification rules: one
    that says people with salaries between 10,000
    and 20,000 have a credit rating of good, and
    another that says that people with salaries
    between 20,000 and 30,000 have a credit rating
    of good. Under what conditions can the rules be
    replaced, without any loss of information, by a
    single rule that says that people with salaries
    between 10,000 and 30,000 have a credit rating
    of good?
  • Solution: Consider the following pair of rules
    and their confidence levels:
  • The new rule has to be assigned a
    confidence-level which is between the
    confidence-levels for rules 1 and 2. Replacing
    the original rules by the new rule will result in
    a loss of confidence-level information for
    classifying persons, since we cannot distinguish
    the confidence levels of people earning between
    10000 and 20000 from those of people earning
    between 20000 and 30000. Therefore we can combine
    the two rules without loss of information only if
    their confidences are the same.

No. | Rule                                                            | Conf.
1   | For all persons P, 10000 < P.salary < 20000 => P.credit = good | 60%
2   | For all persons P, 20000 < P.salary < 30000 => P.credit = good | 90%
91
Assignment 2
  • 2. Suppose half of all the transactions in a
    clothes shop purchase jeans, and one third of all
    transactions in the shop purchase T-shirts.
    Suppose also that half of the transactions that
    purchase jeans also purchase T-shirts. Write
    down all the non-trivial association rules you
    can deduce from the above information, giving the
    support and confidence of each rule.

92
Assignment 2 (Solutions)
  • 2. Suppose half of all the transactions in a
    clothes shop purchase jeans, and one third of all
    transactions in the shop purchase T-shirts.
    Suppose also that half of the transactions that
    purchase jeans also purchase T-shirts. Write
    down all the non-trivial association rules you
    can deduce from the above information, giving the
    support and confidence of each rule.
  • Solution: The rules are as follows; the last rule
    can be deduced from the previous ones.

Rule                                                        | Support | Conf.
For all transactions T, true => buys(T, jeans)              | 50%     | 50%
For all transactions T, true => buys(T, t-shirts)           | 33%     | 33%
For all transactions T, buys(T, jeans) => buys(T, t-shirts) | 25%     | 50%
For all transactions T, buys(T, t-shirts) => buys(T, jeans) | 25%     | 75%
97
Classification and Prediction
  • What is classification? What is prediction?
  • Issues regarding classification and prediction
  • Classification by decision tree induction
  • Bayesian Classification
  • Classification by backpropagation
  • Classification based on concepts from association
    rule mining
  • Other Classification Methods
  • Prediction
  • Classification accuracy
  • Summary

98
Other Classification Methods
  • k-nearest neighbor classifier
  • case-based reasoning
  • Genetic algorithm
  • Rough set approach
  • Fuzzy set approaches

99
Instance-Based Methods
  • Instance-based learning: less commonly used
    commercially
  • Store (all) training examples and delay the
    processing ("lazy evaluation") until a new
    instance must be classified.
  • Typical approaches
  • k-nearest neighbor approach
  • Instances represented as points in a Euclidean
    space.
  • Locally weighted regression
  • Constructs local approximation
  • Case-based reasoning
  • Uses symbolic representations and knowledge-based
    inference

100
The k-Nearest Neighbor Algorithm
  • All instances correspond to points in the n-D
    space.
  • The nearest neighbors are defined in terms of
    Euclidean distance.
  • The target function could be discrete- or real-
    valued.
  • For discrete-valued targets, k-NN returns the most
    common value among the k training examples
    nearest to xq.
  • Voronoi diagram: the decision surface induced by
    1-NN for a typical set of training examples.

  • (pg. 314)

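A tiny pure-Python sketch of the discrete-valued case described
above (the function name and the sample data are invented for
illustration):

    import math
    from collections import Counter

    def knn_classify(xq, examples, k=3):
        """examples: list of (point, label) pairs; xq: query point.
        Returns the most common label among the k nearest neighbors."""
        dist = lambda a, b: math.sqrt(sum((u - v) ** 2
                                          for u, v in zip(a, b)))
        nearest = sorted(examples, key=lambda e: dist(e[0], xq))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

    # Example: two clusters in 2-D
    data = [((0, 0), '-'), ((1, 0), '-'), ((0, 1), '-'),
            ((5, 5), '+'), ((6, 5), '+'), ((5, 6), '+')]
    print(knn_classify((4.5, 5.0), data))   # '+'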