Knowledge Discovery - PowerPoint PPT Presentation

About This Presentation
Title:

Knowledge Discovery

Slides: 29
Provided by: mgmt1

Transcript and Presenter's Notes

1
Knowledge Discovery and Data Mining
  • process of extracting previously unknown, valid,
    and actionable (understandable) information from
    large databases
  • Data mining is a step in the KDD process of
    applying data analysis and discovery algorithms
  • Machine learning, pattern recognition,
    statistics, databases, data visualization.
  • Traditional techniques may be inadequate
  • large data

2
Why Mine Data?
  • Huge amounts of data being collected and
    warehoused
  • Walmart records 20 million transactions per day
  • health care transactions: multi-gigabyte
    databases
  • Mobil Oil: geological data of over 100 terabytes
  • Affordable computing
  • Competitive pressure
  • gain an edge by providing improved, customized
    services
  • information as a product in its own right

3
  • Knowledge discovery in databases (KDD) is the
    non-trivial process of identifying valid,
    potentially useful and ultimately understandable
    patterns in data

[Figure: KDD process diagram. Labels: Operational Databases; Clean, Collect, Summarize; Data Warehouse; Data Preparation; Training Data; Data Mining; Model, Patterns; Verification, Evaluation]
4
Data mining
  • Pattern
  • 1212121?
  • The pattern "12" is found often enough, so with
    some confidence we can say ? is 2
  • If 1, then 2 follows
  • Pattern → Model
  • Confidence
  • 12121?
  • 12121231212123121212?
  • 1211212? 3
  • Models are created using historical data by
    detecting patterns. It is a calculated guess
    about likelihood of repetition of pattern.
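
The "calculated guess" described above can be sketched in a few lines. This is only an illustration, not a method from the slides: the function name `predict_next` is hypothetical, and the model is assumed to look only at the single last symbol of the history.

```python
from collections import Counter

def predict_next(history: str) -> tuple[str, float]:
    """Guess the symbol after `history` from what followed its last symbol before."""
    last = history[-1]
    # Count every symbol that historically followed an occurrence of `last`.
    followers = Counter(
        nxt for cur, nxt in zip(history, history[1:]) if cur == last
    )
    if not followers:
        raise ValueError("last symbol was never seen with a successor")
    symbol, count = followers.most_common(1)[0]
    # Confidence = share of past occurrences that were followed by `symbol`.
    return symbol, count / sum(followers.values())

print(predict_next("1212121"))               # ('2', 1.0)
print(predict_next("12121231212123121212"))  # ('1', 0.75)
```

For the second sequence this one-symbol model answers 1 with 75% confidence, because 3 followed 2 in two of eight past cases. A model that looks at a longer context could reach a different guess, which is why the choice of model representation matters.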

5
Data mining algorithm components
  • Model representation
  • descriptions of discovered patterns
  • overly limited representation -- unable to
    capture data patterns
  • (decision trees, rules, linear/non-linear
    regression and classification, nearest neighbor
    and case-based reasoning methods, graphical
    dependency models)
  • Model evaluation criteria
  • how well a pattern (model) meets goals (fit
    function)
  • e.g., accuracy
  • Search method
  • parameter search: optimization of parameters
    for a given model representation
  • model search: considers a family of models
  • Different methods suit different problems.
    Proper problem formulation crucial.

6
  • Note: Models and patterns. A pattern can be
    thought of as an instantiation of a model. E.g.,
    f(x) = 3x^2 + x is a pattern, whereas
    f(x) = ax^2 + bx is considered a model.
  • Data mining involves fitting models to and
    determining patterns from observed data.
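
To make the model/pattern distinction concrete, here is a minimal sketch of fitting the model f(x) = ax^2 + bx to observed data, which yields a pattern (specific values of a and b). Least squares is assumed as the fit criterion; the slides do not prescribe one, and `fit_quadratic` is a hypothetical helper name.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of the model f(x) = a*x**2 + b*x; returns (a, b)."""
    # Sums that appear in the two normal equations.
    s4 = sum(x**4 for x in xs)
    s3 = sum(x**3 for x in xs)
    s2 = sum(x**2 for x in xs)
    t2 = sum(x**2 * y for x, y in zip(xs, ys))
    t1 = sum(x * y for x, y in zip(xs, ys))
    det = s4 * s2 - s3 * s3
    if det == 0:
        raise ValueError("need at least two distinct non-zero x values")
    # Solve the 2x2 system with Cramer's rule.
    a = (t2 * s2 - t1 * s3) / det
    b = (s4 * t1 - s3 * t2) / det
    return a, b

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3 * x**2 + x for x in xs]  # observations generated by the pattern 3x^2 + x
a, b = fit_quadratic(xs, ys)
print(f"pattern: f(x) = {a:.2f}*x^2 + {b:.2f}*x")
```

With noise-free data the fit recovers a = 3 and b = 1; with real data the recovered pattern is only the best instantiation of the chosen model.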

7
  • Where are Models Used?
  • Selection
  • Business trying to select prospective customers
    (Profitability)
  • A model that predicts the LD usage based on
    credit history.
  • Acquisition
  • Selection is deciding whom you would like to
    invite to a party. Acquisition is about getting
    them to agree: putting together a plan that will
    make them say yes. Again, a model.
  • Retention
  • Keeping your flock together! Sensing it before
    they jump ship.
  • Extension
  • Extending services to existing customers.
    Cross-selling

8
Knowledge Discovery Process
  • Goal
  • understanding the application domain, and goals
    of KDD effort
  • Data selection, acquisition, integration
  • Data cleaning
  • noise, missing data, outliers, etc.
  • Exploratory data analysis
  • dimensionality modeling, transformations
  • selection of appropriate model for analysis,
    hypotheses to test
  • Data mining
  • selecting an appropriate method that matches the
    set goals (classification, regression,
    clustering, etc.)
  • selecting algorithm
  • Testing and verification
  • Interpretation
  • Consolidation and use

9
Issues and challenges
  • large data
  • number of variables (features), number of cases
    (examples)
  • multi gigabyte, terabyte databases
  • efficient algorithms, parallel processing
  • high dimensionality
  • large number of features: exponential increase
    in search space
  • potential for spurious patterns
  • dimensionality reduction
  • Overfitting
  • models noise in training data, rather than just
    the general patterns
  • Changing data, missing and noisy data
  • Use of domain knowledge
  • utilizing knowledge on complex data
    relationships, known facts
  • Understandability of patterns

10
Data Mining
  • Prediction Methods
  • using some variables to predict unknown or future
    values of other variables
  • It uses database fields (predictors) for
    prediction model, using the field values we can
    make predictions
  • Descriptive Methods
  • finding human-interpretable patterns describing
    the data

11
Data Mining Techniques
  • Classification
  • Clustering
  • Association Rule Discovery
  • Sequential Pattern Discovery
  • Regression
  • Deviation Detection

12
Classification
  • Data defined in terms of attributes, one of which
    is the class
  • Find a model for the class attribute as a function
    of the values of the other (predictor) attributes,
    such that previously unseen records can be
    assigned a class as accurately as possible.
  • Training Data used to build the model
  • Test data used to validate the model (determine
    accuracy of the model)
  • Given data is usually divided into training and
    test sets.
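
A minimal sketch of the train/test workflow, under stated assumptions: the records are hypothetical, the classifier is a 1-nearest-neighbor rule chosen only for brevity, and the split is a fixed slice so the result is reproducible (in practice the split is usually random).

```python
import math

def nearest_neighbor_predict(train, point):
    """Return the class of the training record closest to `point`."""
    _, label = min(train, key=lambda rec: math.dist(rec[0], point))
    return label

def accuracy(train, test):
    """Fraction of test records whose predicted class matches the true class."""
    hits = sum(nearest_neighbor_predict(train, x) == y for x, y in test)
    return hits / len(test)

# Hypothetical records: ((age, years_as_customer), class)
records = [
    ((25, 1), "buy"), ((30, 2), "buy"), ((28, 1), "buy"), ((35, 3), "buy"),
    ((60, 10), "no"), ((55, 12), "no"), ((65, 9), "no"), ((58, 11), "no"),
    ((27, 2), "buy"), ((62, 10), "no"),
]
train, test = records[:8], records[8:]  # build on 8 records, validate on 2
print(accuracy(train, test))
```

The test records are never shown to the model while it is built, so the accuracy estimates how well unseen records would be classified.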

13
Classification Example
14
Classification Direct Marketing
50 C, 50 NC
    New Tech: 20 C
    Old Tech: 30 C, 50 NC
        < 2 years: 25 C, 10 NC
            Age < 55: 20 C, 0 NC
            Age > 55: 5 C, 10 NC
        > 2 years: 5 C, 40 NC
15
Classification Decision Tree
  • It divides up the data at each branch point
    without losing any of the data
  • The number of C and NC records is conserved
  • Easy and intuitive to build
  • It builds the tree by asking all possible
    questions; at each stage it picks the best one
    that splits the data into two segments, and it
    applies this recursively at all levels.
  • The tree stops growing when:
  • The segment contains only one record or a
    predefined minimum number of records.
  • All records in the segment share a single
    prediction value.
  • The improvement is not sufficient to warrant a
    split, e.g. the question only changes the
    segment from 90 C to 89 C.
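
The building procedure and stopping rules above can be sketched as follows. This is an illustration under assumptions, not the slides' algorithm: Gini impurity is assumed as the measure of split quality, questions are assumed to be simple "feature == value" tests, and the data, function names and thresholds (`min_records`, `min_gain`) are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0 for a pure segment, higher for mixed segments."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_question(rows, target):
    """Try every (feature, value) question; return (gain, question) for the best one."""
    parent = gini([r[target] for r in rows])
    best = (0.0, None)
    for feature in rows[0]:
        if feature == target:
            continue
        for value in {r[feature] for r in rows}:
            yes = [r[target] for r in rows if r[feature] == value]
            no = [r[target] for r in rows if r[feature] != value]
            if not yes or not no:
                continue  # the question does not separate anything
            weighted = (len(yes) * gini(yes) + len(no) * gini(no)) / len(rows)
            if parent - weighted > best[0]:
                best = (parent - weighted, (feature, value))
    return best

def build_tree(rows, target, min_records=1, min_gain=0.01):
    """Recursively split rows; a leaf is a Counter of class labels."""
    counts = Counter(r[target] for r in rows)
    gain, question = best_question(rows, target)
    # Stop: segment too small, segment pure, or no question improves enough.
    if len(rows) <= min_records or len(counts) == 1 or question is None or gain < min_gain:
        return counts
    feature, value = question
    yes = [r for r in rows if r[feature] == value]
    no = [r for r in rows if r[feature] != value]
    return {"question": question,
            "yes": build_tree(yes, target, min_records, min_gain),
            "no": build_tree(no, target, min_records, min_gain)}

def leaf_counts(tree):
    """Sum class counts over all leaves; this should equal the counts at the root."""
    if isinstance(tree, Counter):
        return tree
    return leaf_counts(tree["yes"]) + leaf_counts(tree["no"])

# Hypothetical customers: 7 churners (C) and 5 non-churners (NC)
rows = (
    [{"tech": "new", "tenure": "short", "churn": "C"}] * 2
    + [{"tech": "new", "tenure": "long", "churn": "C"}] * 2
    + [{"tech": "old", "tenure": "short", "churn": "C"}] * 3
    + [{"tech": "old", "tenure": "short", "churn": "NC"}] * 1
    + [{"tech": "old", "tenure": "long", "churn": "NC"}] * 4
)
tree = build_tree(rows, "churn")
print(tree["question"][0], dict(leaf_counts(tree)))
```

The leaf counts add up to the 7 C and 5 NC at the root, which is the conservation property noted above. One leaf stays mixed (3 C, 1 NC) because no remaining question separates those records.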

16
Classification Decision Tree
  • The decision tree algorithm requires sufficient
    discriminating data for tree to grow

Name   Age  Eyes  Salary  Churned?
Steve  27   Blue  80,000  Yes
Alex   27   Blue  80,000  No

Here name is the only predictor that distinguishes
the two records. Decision trees continue to work as
more data accumulates.
17
Classification Decision Tree
  • How to choose a good predictor?
  • Usually choose a numeric measure of goodness
  • The best predictor decreases the disorder of the
    data set

Split on salary > 300,000 (each segment has a 50% split):
Segment 1: A N, B Y, F Y, G N
Segment 2: C Y, D Y, E N, H N

Split on age < 50 (100% predictor):
Segment 1: A Y, B Y, C Y, D Y
Segment 2: E N, F N, G N, H N

ID3 and CART are good algorithms for decision tree
building.
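
One common numeric measure of "disorder" is Shannon entropy, and the drop in entropy after a split is the information gain used by ID3. The sketch below applies it to the two splits tabulated above; the choice of entropy is an assumption, since the slides do not name the measure.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits: 0 for a pure segment, 1 for a 50/50 two-class mix."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, segments):
    """Drop in entropy from splitting `parent` into `segments`."""
    n = len(parent)
    after = sum(len(s) / n * entropy(s) for s in segments)
    return entropy(parent) - after

parent = list("YYYYNNNN")
perfect = [list("YYYY"), list("NNNN")]  # 100% predictor: every segment is pure
useless = [list("NYYN"), list("YYNN")]  # each segment still has a 50% split
print(information_gain(parent, perfect))  # 1.0
print(information_gain(parent, useless))  # 0.0
```

The perfect predictor removes all disorder (gain of 1 bit), while the 50/50 split removes none, so a tree builder would pick the first question.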
18
Classification Direct Marketing
  • Goal: Reduce cost of soliciting (mailing) by
    targeting a set of consumers likely to buy a new
    product.
  • Data
  • for similar product introduced earlier
  • we know which customers decided to buy and which
    did not; this (buy, not buy) decision forms the
    class attribute
  • collect various demographic, lifestyle, and
    company related information about all such
    customers - as possible predictor variables.
  • Learn classifier model

19
Classification Fraud detection
  • Goal: Predict fraudulent cases in credit card
    transactions.
  • Data
  • Use credit card transactions and information on
    its account-holder as input variables
  • label past transactions as fraud or fair.
  • Learn a model for the class of transactions
  • Use the model to detect fraud by observing credit
    card transactions on a given account.

20
Clustering
  • Given a set of data points, each having a set of
    attributes, and a similarity measure among them,
    find clusters such that
  • data points in one cluster are more similar to
    one another
  • data points in separate clusters are less similar
    to one another.
  • Similarity measures
  • Euclidean distance if attributes are continuous
  • Problem specific measures
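
As one possible illustration of clustering with Euclidean distance, here is a small k-means sketch. k-means is a swapped-in technique that the slides do not name, and the points and starting centers are hypothetical.

```python
import math

def kmeans(points, centers, iterations=10):
    """Plain k-means: assign each point to its nearest center, then re-center."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Euclidean distance decides which cluster a point joins.
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # New center = mean of assigned points; keep the old one if a cluster is empty.
        centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
    return clusters, centers

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
clusters, centers = kmeans(points, centers=[points[0], points[3]])
print(clusters)
```

Points within a resulting cluster are closer to each other than to points in the other cluster, which is the property stated above. A problem-specific similarity measure would replace `math.dist`.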

21
Clustering Market Segmentation
  • Goal: subdivide a market into distinct subsets of
    customers where any subset may conceivably be
    selected as a market target to be reached with a
    distinct marketing mix.
  • Approach
  • collect different attributes on customers based
    on geographical, and lifestyle related
    information
  • identify clusters of similar customers
  • measure the clustering quality by observing
    buying patterns of customers in same cluster vs.
    those from different clusters.

22
Association Rule Discovery
  • Given a set of records, each of which contains
    some number of items from a given collection
  • produce dependency rules which will predict
    occurrence of an item based on occurrences of
    other items
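
Dependency rules of this kind are commonly scored with support and confidence. The slides do not define these measures, so the sketch below uses the standard definitions as an assumption, on hypothetical market baskets.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the share that also contain the consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)

# Hypothetical market baskets
transactions = [
    {"diapers", "milk", "beer"},
    {"diapers", "milk", "beer", "bread"},
    {"diapers", "milk"},
    {"milk", "bread"},
    {"beer", "chips"},
]
rule_conf = confidence(transactions, {"diapers", "milk"}, {"beer"})
print(support(transactions, {"diapers", "milk"}), round(rule_conf, 2))
```

Here three of five baskets contain diapers and milk, and two of those three also contain beer. Note that `confidence` divides by the antecedent's support, so it assumes the antecedent occurs at least once.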

23
Association Rules Application
  • Marketing and Sales Promotion
  • Consider discovered rule
  • Bagels, ... --> Potato Chips
  • Potato Chips as consequent can be used to
    determine what may be done to boost sales
  • Bagels as an antecedent can be used to see which
    products may be affected if bagels are
    discontinued
  • Can be used to see which products should be sold
    with Bagels to promote sale of Potato Chips

24
Association Rules Application
  • Supermarket shelf management
  • Goal: to identify items which are bought together
    (by sufficiently many customers)
  • Approach: process point-of-sale data (collected
    with barcode scanners) to find dependencies among
    items.
  • Example
  • If a customer buys Diapers and Milk, then he is
    very likely to buy Beer
  • so stack six-packs next to diapers?

25
Sequential Pattern Discovery
  • Given set of objects, each associated with its
    own timeline of events, find rules that predict
    strong sequential dependencies among different
    events, of the form (A B) (C) (D E) --> (F)
  • xg: max allowed time between consecutive
    event-sets
  • ng: min required time between consecutive
    event-sets
  • ws: window-size, max time difference between
    earliest and latest events in an event-set
    (events within an event-set may occur in any
    order)
  • ms: max allowed time between earliest and
    latest events of the sequence.
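
The four timing constraints can be checked for one candidate occurrence of a sequence as sketched below. Two things are assumptions here: the data layout (a list of event-sets, each a list of (event, time) pairs) and the definition of the gap between consecutive event-sets as the time from the latest event of one set to the earliest event of the next.

```python
def satisfies_constraints(event_sets, xg, ng, ws, ms):
    """Check one candidate occurrence against the xg, ng, ws and ms constraints.

    event_sets: event-sets in sequence order; each is a list of (event, time) pairs.
    """
    # (earliest, latest) time of each event-set.
    spans = [(min(t for _, t in es), max(t for _, t in es)) for es in event_sets]
    # ws: events inside one event-set must fall within the window.
    if any(end - start > ws for start, end in spans):
        return False
    # xg / ng: gap between consecutive event-sets (assumed: end of one to start of next).
    for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
        gap = next_start - prev_end
        if gap > xg or gap < ng:
            return False
    # ms: the whole sequence must fit within the maximum span.
    return spans[-1][1] - spans[0][0] <= ms

# One occurrence of (A B) (C) (D E) --> (F) with hypothetical timestamps
occurrence = [
    [("A", 0), ("B", 1)],
    [("C", 3)],
    [("D", 5), ("E", 6)],
    [("F", 8)],
]
print(satisfies_constraints(occurrence, xg=3, ng=1, ws=2, ms=10))
```

Tightening any one constraint, for example `ms=7` or `ws=0`, makes the same occurrence fail the check.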

26
Sequential Pattern Discovery Examples
  • sequences in which customers purchase
    goods/services
  • understanding long term customer behavior --
    timely promotions.
  • In point-of-sale transaction sequences
  • Computer bookstore
  • (Intro to Visual C) (C Primer) --> (Perl
    for Dummies, TCL/TK)
  • Athletic Apparel Store
  • (Shoes) (Racket, Racketball) --> (Sports Jacket)

27
Regression
  • Predict a value of a given continuous valued
    variable (dependent variable) based on values of
    other variables (independent variables)
  • Statistics, Neural networks, Genetic algorithms
  • Examples
  • predicting sales volumes of new product based on
    advertising expenditure
  • Time series prediction of stock market indices.
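
As a minimal sketch of the first example, the code below fits a straight line by ordinary least squares and uses it to predict sales from advertising expenditure. The figures are hypothetical and the linear model is an assumption; the slides also list neural networks and genetic algorithms as alternatives.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: advertising spend (independent) vs. units sold (dependent)
ad_spend = [10.0, 20.0, 30.0, 40.0]
sales = [25.0, 45.0, 65.0, 85.0]  # constructed as 2 * spend + 5
slope, intercept = fit_line(ad_spend, sales)
predicted = slope * 50.0 + intercept  # forecast for a spend of 50
print(slope, intercept, predicted)
```

The dependent variable is continuous, which is what separates regression from classification, where the predicted value is a class label.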

28
Visualization
  • complement to other DM techniques like
    Segmentation, etc.