Transcript: Data Mining
1
Data Mining / Machine Learning: Introduction
  • Intelligent Systems Lab.
  • Soongsil University

Thanks to Raymond J. Mooney at the University of
Texas at Austin, and Isabelle Guyon
2
Artificial Intelligence (AI) Research Areas
(Diagram: Artificial Intelligence shown at the center of three layers)
  • Research: Learning Algorithms, Inference Mechanisms, Knowledge Representation, Intelligent System Architecture
  • Application: Intelligent Agents, Information Retrieval, Electronic Commerce, Data Mining, Bioinformatics, Natural Language Processing, Expert Systems
  • Paradigm: Rationalism (Logical), Empiricism (Statistical), Connectionism (Neural), Evolutionary (Genetic), Biological (Molecular)
3
Artificial Intelligence (AI) Paradigms
4
What is Machine Learning?
(Diagram: a learning algorithm consumes TRAINING DATA and produces a trained machine, which maps a query to an answer.)
5
Definition of learning
  • Definition: A computer program is said to learn
    from experience E with respect to some class of
    tasks T and performance measure P, if its
    performance at tasks in T, as measured by P,
    improves with experience E.

(Diagram: a Learning Program is given a Task T and Experience E and produces a Learned Program; the Learned Program's Performance P on the task is then measured.)
6
What is Learning?
  • Herbert Simon: Learning is any process by which
    a system improves performance from experience.

7
Machine Learning
  • Supervised Learning
  • Estimate an unknown mapping from known input-output pairs
  • Learn fw from a training set D = {(x, y)} such that fw(x) ≈ y
  • Classification: y is discrete
  • Regression: y is continuous
  • Unsupervised Learning
  • Only input values are provided
  • Learn fw from D = {x} such that fw captures the structure of the inputs
  • Clustering
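A minimal sketch of these settings, assuming NumPy and scikit-learn are available; the synthetic data and model choices are illustrative, not from the slides:

    # Supervised vs. unsupervised learning (illustrative models and data).
    import numpy as np
    from sklearn.linear_model import LogisticRegression, LinearRegression
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))                 # inputs x

    # Supervised, classification: y is discrete.
    y_cls = (X[:, 0] + X[:, 1] > 0).astype(int)
    clf = LogisticRegression().fit(X, y_cls)

    # Supervised, regression: y is continuous.
    y_reg = 3.0 * X[:, 0] - 2.0 * X[:, 1]
    reg = LinearRegression().fit(X, y_reg)

    # Unsupervised, clustering: only the inputs x are given.
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)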

8
Why Machine Learning?
  • Recent progress in algorithms and theory
  • Growing flood of online data
  • Computational power is available
  • Knowledge engineering bottleneck: some systems
    are too difficult or expensive to construct
    manually because they require detailed skills or
    knowledge tuned to a specific task
  • Budding industry

9
Niches using machine learning
  • Data mining from large databases
  • Market basket analysis (e.g., diapers and beer)
  • Medical records → medical knowledge
  • Software applications we can't program by hand
  • Autonomous driving
  • Speech recognition
  • Self-customizing programs for individual users
  • Spam mail filter
  • Personalized tutoring
  • Newsreader that learns user interests

10
Trends leading to Data Flood
  • More data is generated
  • Bank, telecom, other business transactions ...
  • Scientific data: astronomy, biology, etc.
  • Web, text, and e-commerce

11
Big Data Examples
  • Europe's Very Long Baseline Interferometry (VLBI)
    has 16 telescopes, each of which produces 1
    Gigabit/second of astronomical data over a 25-day
    observation session
  • Storage and analysis are a big problem
  • AT&T handles billions of calls per day
  • So much data that it cannot all be stored; analysis
    has to be done on the fly, on streaming data

12
Largest databases in 2007
  • Commercial databases
  • AT&T: 312 TB
  • World Data Centre for Climate: 220 TB
  • YouTube: 45 TB of videos
  • Amazon: 42 TB (250,000 full textbooks)
  • Central Intelligence Agency (CIA): ?

13
Data Growth
In 2 years, the size of the largest database
TRIPLED!
14
Machine Learning / Data Mining Application areas
  • Science
  • astronomy, bioinformatics, drug discovery,
  • Business
  • CRM (Customer Relationship management), fraud
    detection, e-commerce, manufacturing,
    sports/entertainment, telecom, targeted
    marketing, health care,
  • Web
  • search engines, advertising, web and text mining,
  • Government
  • surveillance, crime detection, profiling tax
    cheaters,

15
Data Mining for Customer Modeling
  • Customer Tasks
  • attrition prediction
  • targeted marketing
  • cross-sell, customer acquisition
  • credit-risk
  • fraud detection
  • Industries
  • banking, telecom, retail sales,

16
Customer Attrition Case Study
  • Situation: The attrition rate for mobile phone
    customers is around 25-30% a year!
  • With this in mind, what is our task?
  • Assume we have customer information for the past
    N months.

17
Customer Attrition Case Study
  • Task
  • Predict who is likely to attrite next month.
  • Estimate customer value, and determine the most
    cost-effective offer to make to this customer.

18
Customer Attrition Results
  • Verizon Wireless built a customer data warehouse
  • Identified potential attriters
  • Developed multiple, regional models
  • Targeted customers with a high propensity to accept
    the offer
  • Reduced attrition rate from over 2%/month to
    under 1.5%/month (huge impact, with >30 M
    subscribers)
  • (Reported in 2003)

19
Assessing Credit Risk Case Study
  • Situation: A person applies for a loan
  • Task: Should the bank approve the loan?
  • Note: People with the best credit don't need
    loans, and people with the worst credit are not
    likely to repay. A bank's best customers are in
    the middle.

20
Credit Risk - Results
  • Banks develop credit models using a variety of
    machine learning methods.
  • Mortgage and credit card proliferation are the
    result of being able to successfully predict if
    a person is likely to default on a loan
  • Widely deployed in many countries

21
Successful e-commerce Case Study
  • Task: Recommend other books (products) this
    person is likely to buy
  • Amazon does clustering based on books bought
  • Customers who bought "Advances in Knowledge
    Discovery and Data Mining" also bought "Data
    Mining: Practical Machine Learning Tools and
    Techniques with Java Implementations"
  • The recommendation program is quite successful

22
Security and Fraud Detection - Case Study
  • Credit Card Fraud Detection
  • Detection of Money laundering
  • FAIS (US Treasury)
  • Securities Fraud
  • NASDAQ KDD system
  • Phone fraud
  • AT&T, Bell Atlantic, British Telecom/MCI
  • Bio-terrorism detection at Salt Lake Olympics 2002

23
Example ProblemHandwritten Digit Recognition
Handcrafted rules would result in a large number
of rules and exceptions; it is better to have a
machine that learns from a large training set.
The same numeral shows wide variability across writers.
24
Chess Game
In 1997, Deep Blue (IBM) beat Garry Kasparov.
This raised IBM's stock value by $18 billion
that year.
25
Some Successful Applications ofMachine Learning
  • Learning to drive an autonomous vehicle
  • Train computer-controlled vehicles to steer correctly
  • Drive at 70 mph for 90 miles on public highways
  • Associate steering commands with image sequences
  • 1200 computer-generated images as training examples
  • Half-hour training

Additional information from the previous image
indicates the darkness or lightness of the road.
26
Some Successful Applications ofMachine Learning
  • Learning to recognize spoken words
  • Speech recognition/synthesis
  • Natural language understanding/generation
  • Machine translation

27
Example 1 visual object categorization
  • A classification problem: predict category y
    based on image x.
  • Little chance to hand-craft a solution without
    learning.
  • Applications: robotics, HCI, web search (a true
    "image Google")

28
Face Recognition - 1
Given multiple angles/views of a person, learn
to identify them. Learn to distinguish male from
female faces.
29
Face Recognition - 2
Learn to recognize emotions and gestures (Li, Ye,
Kambhamettu, 2003)
30
Robot
Sony AIBO robot: available June 1, 1999;
weight 1.6 kg. Adaptive learning and growth
capabilities; simulates emotions such as
happiness and anger.
31
Robot
Honda ASIMO (Advanced Step in Innovative
Mobility): born on 31 October 2001;
height 120 cm, weight 52 kg.
http://blog.makezine.com/archive/2009/08/asimo_avoids_moving_obstacles.html?CMP=OTC-0D6B48984890
32
Biomedical / Biometrics
  • Medicine
  • Screening
  • Diagnosis and prognosis
  • Drug discovery
  • Security
  • Face recognition
  • Signature / fingerprint
  • DNA fingerprinting

33
Computer / Internet
  • Computer interfaces
  • Troubleshooting wizards
  • Handwriting and speech
  • Brain waves
  • Internet
  • Spam filtering
  • Text categorization
  • Text translation
  • Recommendation

34
Classification
  • Assign object/event to one of a given finite set
    of categories.
  • Medical diagnosis
  • Credit card applications or transactions
  • Fraud detection in e-commerce
  • Worm detection in network packets
  • Spam filtering in email
  • Recommended articles in a newspaper
  • Recommended books, movies, music, or jokes
  • Financial investments
  • DNA sequences
  • Spoken words
  • Handwritten letters
  • Astronomical images

35
Problem Solving / Planning / Control
  • Performing actions in an environment in order to
    achieve a goal.
  • Solving calculus problems
  • Playing checkers, chess, or backgammon
  • Driving a car or a jeep
  • Flying a plane, helicopter, or rocket
  • Controlling an elevator
  • Controlling a character in a video game
  • Controlling a mobile robot

36
Applications
37
Disciplines Related with Machine Learning
  • Artificial intelligence
  • Learning symbolic representations of concepts;
    machine learning as a search problem; using prior
    knowledge together with training data to guide learning
  • Bayesian methods
  • Bayes' theorem as the basis for calculating
    probabilities of hypotheses; the naïve Bayes
    classifier; algorithms for estimating values of
    unobserved variables
  • Computational complexity theory
  • Theoretical bounds on the inherent complexity of
    learning tasks, measured in terms of computational
    effort, number of training examples, number of
    mistakes, etc.
  • Control theory
  • Procedures that learn to control processes in
    order to optimize predefined objectives

38
Disciplines Related with Machine Learning (2)
  • Information theory
  • Measures of entropy and information content;
    Minimum Description Length approaches; the
    relationship between optimal codes and optimal
    training sequences
  • Philosophy
  • Occam's razor; the justification for generalizing
    beyond observed data
  • Psychology and neurobiology
  • Neural network models
  • Statistics
  • Characterization of the errors that arise when
    estimating the accuracy of a hypothesis from a
    limited sample of data; confidence intervals;
    statistical tests

39
Definition of learning
  • Definition: A computer program is said to learn
    from experience E with respect to some class of
    tasks T and performance measure P, if its
    performance at tasks in T, as measured by P,
    improves with experience E.

40
Example checkers
Task T: playing checkers. Performance measure P:
percent of games won. Training experience E:
practice games by playing against itself.
41
Example Recognizing handwritten letters
Task T: recognizing and classifying handwritten
words within images. Performance measure P:
percent of words correctly classified. Training
experience E: a database of handwritten words
with given classifications.
42
Example Robot driving
Task T: driving on a public four-lane highway
using vision sensors. Performance measure P:
average distance traveled before an error (as
judged by a human overseer). Training experience
E: a sequence of images and steering commands
recorded while observing a human driver.
43
Designing a learning system
Task T: playing checkers. Performance measure P:
percent of games won. Training experience E:
practice games by playing against itself.
What does this mean, and what can we learn
from it?
44
Measuring Performance
  • Classification Accuracy
  • Solution correctness
  • Solution quality (length, efficiency)
  • Speed of performance

45
Designing a Learning System
  • 1. Choose the training experience
  • 2. Choose exactly what is to be learned, i.e. the
    target function.
  • 3. Choose how to represent the target function.
  • 4. Choose a learning algorithm to infer the
    target function from the experience.

(Diagram: Environment/Experience → Learner → Knowledge → Performance Element)
46
Designing a Learning System
1. Choosing the Training Experience
  • Key Attributes
  • Whether the experience provides direct or indirect feedback
  • Direct feedback: checkers states and correct moves
  • Indirect feedback: move sequences and final outcomes
  • Credit assignment problem
  • Degree to which the learner controls the sequence of training examples
  • The learner may select informative board states
    itself, or may depend on the teacher to provide them
  • Distribution of examples
  • How well the training examples represent the
    distribution of examples over which final
    performance will be measured
  • Training on one distribution and testing on
    another is problematic (e.g., practice games
    against itself may not represent the board states
    that would be faced against the Checkers World Champion)

47
Training vs. Test Distribution
  • Generally assume that the training and test
    examples are independently drawn from the same
    overall distribution of data.
  • IID: independently and identically distributed
  • If examples are not independent, collective
    classification is required
  • (e.g., data from communication networks,
    financial transaction networks, or social networks)
  • If the test distribution is different, transfer
    learning is required; that is, cumulative
    learning must be achieved
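A minimal sketch of the IID assumption in practice (synthetic data; scikit-learn assumed available): train and test examples are drawn from one pool, so accuracy on the held-out split estimates performance on future data from the same distribution.

    # IID setting: train and test examples come from the same distribution,
    # so held-out accuracy estimates future performance.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3))
    y = (X.sum(axis=1) > 0).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)
    model = LogisticRegression().fit(X_tr, y_tr)
    print("held-out accuracy:", model.score(X_te, y_te))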

48
Aside: Transfer learning
  • Transfer learning is what happens when someone
    finds it much easier to learn to play chess
    having already learned to play checkers
  • or to recognize tables having already learned
    to recognize chairs
  • or to learn Spanish having already learned
    Italian.
  • Achieving significant levels of transfer learning
    across tasks -- that is, achieving cumulative
    learning -- is perhaps the central problem facing
    machine learning.

49
Training Experience
  • Direct experience Given sample input and output
    pairs for a useful target function.
  • Checker boards labeled with the correct move,
    e.g. extracted from record of expert play
  • Indirect experience Given feedback which is not
    direct I/O pairs for a useful target function.
  • Potentially arbitrary sequences of game moves and
    their final game results.
  • Credit/Blame Assignment Problem: How to assign
    credit or blame to individual moves, given only
    indirect feedback?

50
Source of Training Data
  • Provided random examples outside of the learner's control
  • Negative examples available, or only positive?
  • Good training examples selected by a benevolent teacher
  • "Near miss" examples
  • Learner can query an oracle about the class of an
    unlabeled example in the environment
  • Learner can construct an arbitrary example and
    query an oracle for its label
  • Learner can design and run experiments directly
    in the environment, without any human guidance

51
Designing a Learning System
  • 1. Choose the training experience
  • 2. Choose exactly what is to be learned, i.e. the
    target function.
  • 3. Choose how to represent the target function.
  • 4. Choose a learning algorithm to infer the
    target function from the experience.

(Diagram: Environment/Experience → Learner → Knowledge → Performance Element)
52
Designing a Learning System
2. Choosing a Target Function
  • Determine exactly what type of knowledge will be
    learned and how the performance program will use it
  • For checkers, the program must choose the best
    move from among the legal moves in any given board state
  • Could learn a function
  • 1. ChooseMove: B → M (choose the best move)
  • or
  • 2. Evaluation function, V: B → R
  • The second option is easier to learn from the
    available training experience
  • V assigns a numerical score to each board state;
    the best move can then be selected by evaluating
    the scores of the successor states
  • The problem of improving play thus reduces to
    learning this one function

53
Designing a Learning System
2. Choosing the Target Function
  • A function that chooses the best move M for any board state B
  • ChooseMove: B → M
  • Difficult to learn
  • It is useful to reduce the problem of improving
    performance P at task T to the problem of
    learning some particular target function.
  • An evaluation function that assigns a numerical
    score to any board state B
  • V: B → R

54
The start of the learning work
  • Instead of learning ChooseMove, we establish a
    value function
  • Target function: V: B → R
  • which maps any legal board state in B to some real
    value in R.
  • For each position, V assigns a score indicating
    how favorable that position is: the better the
    position, the higher the score.
  • 1. If b is a final board state that is won, then
    V(b) = 100.
  • 2. If b is a final board state that is lost, then
    V(b) = -100.
  • 3. If b is a final board state that is drawn,
    then V(b) = 0.
  • 4. If b is not a final board state, then V(b) = ...
55
The start of the learning work
  • Instead of learning ChooseMove, we establish a
    value function
  • Target function: V: B → R
  • which maps any legal board state in B to some real
    value in R.
  • For each position, V assigns a score indicating
    how favorable that position is: the better the
    position, the higher the score.
  • 1. If b is a final board state that is won, then
    V(b) = 100.
  • 2. If b is a final board state that is lost, then
    V(b) = -100.
  • 3. If b is a final board state that is drawn,
    then V(b) = 0.
  • 4. If b is not a final board state, then
    V(b) = V(b'),
  • where b' is the best final board state that can
    be reached from b
  • (assuming both players play optimally until the
    end of the game)
  • Unfortunately, this definition does not take us
    any further!
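The last clause is what makes the definition non-operational, and this is easy to see in code: evaluating a non-final state requires searching the complete game tree to its end. A sketch, using assumed hypothetical helpers is_final(b), outcome(b) (returning "won"/"lost"/"drawn" from the learner's perspective), and legal_successors(b):

    # Ideal target function V: requires searching the full game tree,
    # hence exponential time -- exactly why this definition is non-operational.
    def V(b, our_move=True):
        if is_final(b):                     # hypothetical helper
            return {"won": 100, "lost": -100, "drawn": 0}[outcome(b)]
        values = [V(b2, not our_move) for b2 in legal_successors(b)]
        # Optimal play to the end: we maximize, the opponent minimizes.
        return max(values) if our_move else min(values)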

56
Approximating V(b)
  • Computing V(b) is intractable since it involves
    searching the complete exponential game tree.
  • Therefore, this definition is said to be
    non-operational.
  • An operational definition can be computed in
    reasonable (polynomial) time.
  • Need to learn an operational approximation to the
    ideal evaluation function.

57
Designing a Learning System
  • 1. Choose the training experience
  • 2. Choose exactly what is to be learned, i.e. the
    target function.
  • 3. Choose how to represent the target function.
  • 4. Choose a learning algorithm to infer the
    target function from the experience.

(Diagram: Environment/Experience → Learner → Knowledge → Performance Element)
58
3. Choosing a Representation for the Target
Function
  • Describing the function
  • Tables
  • Rules
  • Polynomial functions
  • Neural nets
  • Trade-off in choice
  • Expressive power
  • Size of training data
  • The more expressive the representation, the more
    closely it can approximate the ideal target
    function, but the more training data it requires
    to choose among the larger number of representable hypotheses.

59
(Figure: V is approximated as a weighted linear combination of board features, with weights w1-w6 to be learned.)
60
Linear Function for Representing V(b)
  • Use a linear approximation of the evaluation
    function.

V̂(b) = w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6

where x1, ..., x6 are numerical features of board state b and w0, ..., w6 are weights to be learned.
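A minimal sketch of this linear evaluation function (NumPy assumed; the weight values are placeholders, and in Mitchell's checkers formulation x1-x6 are board-feature counts such as numbers of pieces, kings, and threatened pieces):

    # Linear evaluation: V_hat(b) = w0 + w1*x1 + ... + w6*x6.
    import numpy as np

    w = np.array([0.0, 1.0, -1.0, 2.0, -2.0, 0.5, -0.5])   # w0..w6, illustrative

    def v_hat(x):
        """x: feature vector (x1..x6) extracted from board state b."""
        return w[0] + float(np.dot(w[1:], x))

    print(v_hat(np.array([3, 0, 1, 0, 0, 0])))   # e.g., the feature vector from slide 62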
61
Designing a Learning System
  • 1. Choose the training experience
  • 2. Choose exactly what is to be learned, i.e. the
    target function.
  • 3. Choose how to represent the target function.
  • 4. Choose a learning algorithm to infer the
    target function from the experience.

(Diagram: Environment/Experience → Learner → Knowledge → Performance Element)
62
4. Choosing a Function Approximation Algorithm
  • A training example is represented as an ordered
    pair ⟨b, Vtrain(b)⟩
  • b: board state
  • Vtrain(b): training value for b
  • Example: black has won the game
  • ⟨⟨x1=3, x2=0, x3=1, x4=0, x5=0, x6=0⟩, +100⟩
  • (x2 = 0 indicates that white has no remaining pieces.)
  • Estimating training values for intermediate board
    states:
  • Vtrain(b) ← V̂(Successor(b))
  • V̂: the current approximation to V (i.e., the
    learned function, or hypothesis)
  • Successor(b): the next board state following b at
    which it is again the program's turn to move
  • That is, the training value for an intermediate
    state b is estimated from the current
    approximation's value for its successor state.
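A sketch of this bootstrapping rule (all helpers here are hypothetical parameters; v_hat is a current approximation such as the linear sketch above): terminal boards keep their true outcome score, while intermediate boards borrow the current estimate of their successor.

    # Vtrain(b) <- V_hat(Successor(b)): bootstrap training values from the
    # current approximation; terminal boards keep their true outcome score.
    def training_value(b, is_final, final_score, successor, features, v_hat):
        if is_final(b):
            return final_score(b)             # +100 won, -100 lost, 0 drawn
        return v_hat(features(successor(b)))  # current estimate of the successor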

63
DESIGNING A LEARNING SYSTEM
Estimating Training Values
64
Temporal Difference Learning
  • Estimate training values for intermediate
    (non-terminal) board positions by the estimated
    value of their successor in an actual game trace:
  • Vtrain(b) ← V̂(Successor(b))
  • where Successor(b) is the next board position at
    which it is the program's move in actual play.
  • Values toward the end of the game are initially
    more accurate, and continued training slowly
    "backs up" accurate values to earlier board
    positions.
65
How to learn?
66
How to learn?
67
How to change the weights?
68
How to change the weights?
69
Obtaining Training Values
  • Direct supervision may be available for the
    target function.
  • With indirect feedback, training values can be
    estimated using temporal difference learning
    (used in reinforcement learning, where
    supervision is a delayed reward).

70
Learning Algorithm
  • Uses training values for the target function to
    induce a hypothesized definition that fits these
    examples and hopefully generalizes to unseen
    examples.
  • In statistics, learning to approximate a
    continuous function is called regression.
  • Attempts to minimize some measure of error (loss
    function) such as mean squared error
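For the checkers learner the usual choice is the squared error over the training examples, E ≡ Σ (Vtrain(b) - V̂(b))². A one-function sketch (v_hat as in the linear sketch above):

    # Squared error over the training set:
    # E = sum over <b, Vtrain(b)> of (Vtrain(b) - V_hat(b))^2
    def squared_error(examples, v_hat):
        """examples: iterable of (features, v_train) pairs."""
        return sum((v_train - v_hat(x)) ** 2 for x, v_train in examples)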

71
The LMS (Least Mean Squares) weight update rule
  • On mathematical grounds (it performs gradient
    descent on the squared error), the following
    update rule is sensible: for each training
    example ⟨b, Vtrain(b)⟩, compute
    error(b) = Vtrain(b) - V̂(b), then update each
    weight wi ← wi + η · xi · error(b), where η is a
    small constant (e.g., 0.1) that moderates the
    size of the update.
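A minimal sketch of one LMS update (NumPy assumed; a constant feature x0 = 1 is prepended so the bias weight w0 is updated by the same rule):

    # One LMS update: w_i <- w_i + eta * x_i * error(b),
    # with error(b) = Vtrain(b) - V_hat(b).
    import numpy as np

    def lms_update(w, x, v_train, eta=0.1):
        """w: weights w0..w6; x: board features x1..x6."""
        x_full = np.concatenate(([1.0], x))   # x0 = 1 pairs with the bias w0
        error = v_train - np.dot(w, x_full)   # Vtrain(b) - V_hat(b)
        return w + eta * error * x_full

    w = np.zeros(7)
    w = lms_update(w, np.array([3.0, 0, 1, 0, 0, 0]), 100.0)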

72
LMS Discussion
  • Intuitively, LMS executes the following rules:
  • If the output for an example is correct (the
    error is zero), no weights are changed.
  • If the output is too low, the weights of features
    with positive values are increased, raising V̂(b)
    and reducing the error.
  • If the output is too high, those weights are
    decreased, lowering V̂(b) and reducing the error.
  • Under the proper weak assumptions, LMS can be
    proven to eventually converge to a set of weights
    that minimizes the mean squared error.

73
Lessons Learned about Learning
  • What learning is:
  • Approximating a chosen target function through
    direct or indirect experience.
  • What function approximation is:
  • Searching a space of hypotheses for the
    hypothesis that best fits the available training data.
  • Different learning methods assume different
    hypothesis spaces (representation languages)
    and/or employ different search techniques.

74
Various Function Representations
  • Numerical functions
  • Linear regression
  • Neural networks
  • Support vector machines
  • Symbolic functions
  • Decision trees
  • Rules in propositional logic
  • Rules in first-order predicate logic
  • Instance-based functions
  • Nearest-neighbor
  • Case-based
  • Probabilistic Graphical Models
  • NaĂŻve Bayes
  • Bayesian networks
  • Hidden-Markov Models (HMMs)
  • Probabilistic Context Free Grammars (PCFGs)
  • Markov networks

75
Various Search Algorithms
  • Gradient descent
  • Perceptron
  • Backpropagation
  • Dynamic Programming
  • HMM Learning
  • Probabilistic Context Free Grammars (PCFGs)
    Learning
  • Divide and Conquer
  • Decision tree induction
  • Rule learning
  • Evolutionary Computation
  • Genetic Algorithms (GAs)
  • Genetic Programming (GP)
  • Neuro-evolution

76
Evaluation of Learning Systems
  • Experimental
  • Conduct controlled cross-validation experiments
    to compare various methods on a variety of
    benchmark datasets.
  • Gather data on their performance, e.g. test
    accuracy, training-time, testing-time.
  • Analyze differences for statistical significance.
  • Theoretical
  • Analyze algorithms mathematically and prove
    theorems about their:
  • Computational complexity
  • Ability to fit training data
  • Sample complexity (number of training examples
    needed to learn an accurate function)
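A minimal sketch of such a controlled experimental comparison (scikit-learn assumed; the data and the two compared methods are illustrative): both methods are scored on identical cross-validation folds, and the per-fold scores are what a significance test would operate on.

    # Compare two methods on identical cross-validation folds; the per-fold
    # scores feed a statistical significance test (e.g., a paired t-test).
    import numpy as np
    from sklearn.model_selection import KFold, cross_val_score
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(2)
    X = rng.normal(size=(300, 5))
    y = (X[:, 0] - X[:, 1] > 0).astype(int)

    cv = KFold(n_splits=10, shuffle=True, random_state=2)
    for name, model in [("logistic regression", LogisticRegression()),
                        ("decision tree", DecisionTreeClassifier(random_state=2))]:
        scores = cross_val_score(model, X, y, cv=cv)
        print(name, "accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))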

77
Core parts of a machine learning system
(Figure: the four generic modules in a loop: the Experiment Generator poses a new problem (initial game board); the Performance System solves it, producing a solution trace (game history); the Critic turns the trace into training examples; the Generalizer outputs a hypothesis.)
Many machine learning systems can be usefully
characterized in terms of these four generic
modules.
78
Four Components of a Learning System (1)
  • Performance system
  • Solves the given performance task
  • Uses the learned target function
  • New problem → trace of its solution
  • Critic
  • Outputs a set of training examples of the
    target function

79
Four Components of a Learning System (2)
  • Generalizer
  • Input: training examples
  • Output: hypothesis (estimate of the target
    function)
  • Generalizes from the specific training examples
  • Hypothesizes a general function
  • Experiment generator
  • Input: current hypothesis
  • Output: a new problem
  • Picks a new practice problem that maximizes the
    learning rate

80
History of Machine Learning
  • 1950s
  • Samuel's checkers player
  • Selfridge's Pandemonium
  • 1960s
  • Neural networks: Perceptron
  • Pattern recognition
  • Learning in the limit theory
  • Minsky and Papert prove limitations of Perceptron
  • 1970s
  • Symbolic concept induction
  • Winston's arch learner
  • Expert systems and the knowledge acquisition
    bottleneck
  • Quinlan's ID3
  • Michalski's AQ and soybean diagnosis
  • Scientific discovery with BACON
  • Mathematical discovery with AM

81
History of Machine Learning (cont.)
  • 1980s
  • Advanced decision tree and rule learning
  • Explanation-based Learning (EBL)
  • Learning and planning and problem solving
  • Utility problem
  • Analogy
  • Cognitive architectures
  • Resurgence of neural networks (connectionism,
    backpropagation)
  • Valiant's PAC Learning Theory
  • Focus on experimental methodology
  • 1990s
  • Data mining
  • Adaptive software agents and web applications
  • Text learning
  • Reinforcement learning (RL)
  • Inductive Logic Programming (ILP)
  • Ensembles: Bagging, Boosting, and Stacking
  • Bayes Net learning

82
History of Machine Learning (cont.)
  • 2000s
  • Support vector machines
  • Kernel methods
  • Graphical models
  • Statistical relational learning
  • Transfer learning
  • Sequence labeling
  • Collective classification and structured outputs
  • Computer Systems Applications
  • Compilers
  • Debugging
  • Graphics
  • Security (intrusion, virus, and worm detection)
  • E-mail management
  • Personalized assistants that learn
  • Learning in robotics and vision

83
Remind
  • Learning as search in a space of possible
    hypotheses
  • Learning methods are characterized by their
    search strategies and by the underlying structure
    of the search spaces.

84
Issues in Machine Learning
  • What algorithms exist for learning general target
    functions from specific training examples?
  • How much training data is sufficient?
  • When and how can prior knowledge held by the
    learner guide the process of generalizing from examples?
  • What is the best strategy for choosing the next
    training experience, and how does this choice
    alter the complexity of the learning problem?
  • What is the best way to reduce the learning task
    to one or more function approximation problems?
  • How can the learner automatically alter its
    representation to improve its ability to
    represent and learn the target function?