Transcript and Presenter's Notes

Title: CS364 Artificial Intelligence: Machine Learning


1
CS364 Artificial Intelligence: Machine Learning
  • Matthew Casey

portal.surrey.ac.uk/computing/resources/l3/cs364
2
Learning Outcomes
  • Describe methods for acquiring human knowledge
  • Through experience
  • Evaluate which of the acquisition methods would
    be most appropriate in a given situation
  • Limited data available through example

3
Learning Outcomes
  • Describe techniques for representing acquired
    knowledge in a way that facilitates automated
    reasoning over the knowledge
  • Generalise experience to novel situations
  • Categorise and evaluate AI techniques according
    to different criteria such as applicability and
    ease of use, and intelligently participate in the
    selection of the appropriate techniques and
    tools, to solve simple problems
  • Strategies to overcome the knowledge engineering
    bottleneck

4
What is Learning?
  • The action of receiving instruction or acquiring
    knowledge
  • A process which leads to the modification of
    behaviour or the acquisition of new abilities or
    responses, and which is additional to natural
    development by growth or maturation

Source: Oxford English Dictionary Online,
http://www.oed.com/, accessed October 2003
5
Machine Learning
  • Negnevitsky
  • 'In general, machine learning involves adaptive
    mechanisms that enable computers to learn from
    experience, learn by example and learn by
    analogy' (2005:165)
  • Callan
  • 'A machine or software tool would not be viewed
    as intelligent if it could not adapt to changes
    in its environment' (2003:225)
  • Luger
  • 'Intelligent agents must be able to change
    through the course of their interactions with the
    world' (2002:351)

6
Types of Learning
  • Inductive learning
  • Learning from examples
  • Evolutionary/genetic learning
  • Shaping a population of individual solutions
    through survival of the fittest
  • Emergent learning
  • Learning through social interaction: game of life

7
Inductive Learning
  • Supervised learning
  • Training examples with a known classification
    from a teacher
  • Unsupervised learning
  • No pre-classification of training examples
  • Competitive learning: learning through
    competition on training examples

8
Key Concepts
  • Learn from experience
  • Through examples, analogy or discovery
  • To adapt
  • Changes in response to interaction
  • Generalisation
  • To use experience to form a response to novel
    situations

9
Generalisation
10
Machine Learning
  • Techniques and algorithms that adapt through
    experience
  • Used for:
  • Interpretation / visualisation: summarise data
  • Prediction: time series / stock data
  • Classification: malignant or benign tumours
  • Regression: curve fitting
  • Discovery: data mining / pattern discovery

11
Why?
  • Complexity of task / amount of data
  • Other techniques fail or are computationally
    expensive
  • Problems that cannot be defined
  • Discovery of patterns / data mining
  • Knowledge Engineering Bottleneck
  • Cost and difficulty of building expert systems
    using traditional techniques (Luger 2002:351)

12
Common Techniques
  • Least squares
  • Decision trees
  • Support vector machines
  • Boosting
  • Neural networks
  • K-means
  • Genetic algorithms

Hastie, T., Tibshirani, R. & Friedman, J.H.
(2001). The Elements of Statistical Learning:
Data Mining, Inference, and Prediction. New
York: Springer-Verlag.
13
Neural Networks
  • Developed from models of the biology of behaviour
  • Human brain contains of the order of 10^10
    neurons, each connecting to 10^4 others
  • Parallel processing
  • Inhibitory and excitatory
  • Supervised
  • Perceptron, radial basis function
  • Unsupervised
  • Self-organising map

http://cwabacon.pearsoned.com/bookbind/pubbooks/pinel_ab/chapter3/deluxe.html
14
Neural Networks
  • Output is a complex (often non-linear)
    combination of the inputs

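To make this concrete, here is a minimal Python sketch of a single
artificial neuron; the weights, bias and sigmoid activation are
illustrative assumptions, not values from the lecture.

import math

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs, squashed by a non-linear
    # (sigmoid) activation
    s = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-s))

# Two inputs combined non-linearly into one output
print(neuron([0.5, 0.2], [0.8, -0.4], bias=0.1))  # approx. 0.60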
15
Decision Trees
  • A map of the reasoning process, good at solving
    classification problems (Negnevitsky, 2005)
  • A decision tree represents a number of different
    attributes and values
  • Nodes represent attributes
  • Branches represent values of the attributes
  • Path through a tree represents a decision
  • Tree can be associated with rules

16
Example 1
  • Consider one rule for an ice-cream seller (Callan
    2003:241)
  • IF Outlook = Sunny
  • AND Temperature = Hot
  • THEN Sell

17
Example 1
(Diagram: the tree for this rule, with its root node,
internal nodes, branches and a leaf labelled.)
18
Construction
  • Concept learning
  • Inducing concepts from examples
  • We can intuitively construct a decision tree for
    a small set of examples
  • Different algorithms used to construct a tree
    based upon the examples
  • Most popular: ID3 (Quinlan, 1986)

19
Example 2
Draw a tree based upon the following data (table
reconstructed from the two trees that follow):

Example  X      Y      Class
1        False  True   Positive
2        True   True   Positive
3        False  False  Negative
4        True   False  Negative
20
Example 2: Tree 1
(Diagram: a single test on Y. Y = True leads to
examples 1,2: Positive; Y = False leads to examples
3,4: Negative.)
21
Example 2: Tree 2
(Diagram: a root test on X. X = True leads to examples
2,4 and a further test on Y: True gives 2 Positive,
False gives 4 Negative. X = False leads to examples
1,3 and a further test on Y: True gives 1 Positive,
False gives 3 Negative.)
22
Which Tree?
  • Different trees can be constructed from the same
    set of examples
  • Which tree is the best?
  • Based upon choice of attributes at each node in
    the tree
  • A split in the tree (branches) should correspond
    to the predictor with the maximum separating
    power
  • Examples can be contradictory
  • Real life is noisy

23
Example 3
  • Callan (2003:242-247)
  • Locating a new bar

24
Choosing Attributes
  • Entropy
  • Measure of disorder (high is bad)
  • For c classification categories
  • Attribute a that has value v
  • Probability of v being in category i is p_i
  • Entropy E is (sketched in code below):
    $E(a{=}v) = -\sum_{i=1}^{c} p_i \log_2 p_i$

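For the two-class case used in the worked examples that follow, the
entropy measure could be sketched in Python as below; the function
name and interface are our own, not from the slides.

import math

def entropy(pos, neg):
    # E = -sum_i p_i * log2(p_i) over the c = 2 categories
    total = pos + neg
    e = 0.0
    for count in (pos, neg):
        p = count / total         # p_i: probability of category i
        if p > 0:                 # treat 0 * log2(0) as 0
            e -= p * math.log2(p)
    return e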
25
Entropy Example
  • Choice of attributes
  • City/Town, University, Housing Estate, Industrial
    Estate, Transport and Schools
  • City/Town is either Y or N
  • For Y: 7 positive examples, 3 negative
  • For N: 4 positive examples, 6 negative

26
Entropy Example
  • City/Town as root node
  • For c = 2 (positive and negative) classification
    categories
  • Attribute a = City/Town that has value v = Y
  • Probability of v = Y being in category positive:
    7/10 = 0.7
  • Probability of v = Y being in category negative:
    3/10 = 0.3

27
Entropy Example
  • City/Town as root node
  • For c = 2 (positive and negative) classification
    categories
  • Attribute a = City/Town that has value v = Y
  • Entropy E is:
    $E(a{=}Y) = -0.7\log_2 0.7 - 0.3\log_2 0.3 \approx 0.881$

28
Entropy Example
  • City/Town as root node
  • For c = 2 (positive and negative) classification
    categories
  • Attribute a = City/Town that has value v = N
  • Probability of v = N being in category positive:
    4/10 = 0.4
  • Probability of v = N being in category negative:
    6/10 = 0.6

29
Entropy Example
  • City/Town as root node
  • For c = 2 (positive and negative) classification
    categories
  • Attribute a = City/Town that has value v = N
  • Entropy E is (both values are checked below):
    $E(a{=}N) = -0.4\log_2 0.4 - 0.6\log_2 0.6 \approx 0.971$

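Both entropy values can be checked with the entropy() sketch above:

print(round(entropy(7, 3), 3))   # City/Town = Y: 0.881
print(round(entropy(4, 6), 3))   # City/Town = N: 0.971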
30
Choosing Attributes
  • Information gain
  • Expected reduction in entropy (high is good)
  • Entropy of whole example set T is E(T)
  • Examples with a = v, where v is the jth value, are
    T_j
  • Entropy E(a = v) = E(T_j)
  • Gain is (sketched in code below):
    $\mathit{Gain}(T, a) = E(T) - \sum_j \frac{|T_j|}{|T|} E(T_j)$

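The gain calculation could be sketched as follows, building on the
entropy() function above (the interface is again our own):

def gain(example_counts, subset_counts):
    # example_counts: (pos, neg) over the whole set T
    # subset_counts:  one (pos, neg) pair per value v, i.e. the T_j
    total = sum(example_counts)
    g = entropy(*example_counts)          # E(T)
    for pos, neg in subset_counts:
        weight = (pos + neg) / total      # |T_j| / |T|
        g -= weight * entropy(pos, neg)   # subtract weighted E(T_j)
    return g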
31
Information Gain Example
  • For the root of the tree there are 20 examples
  • For c = 2 (positive and negative) classification
    categories
  • Probability of being positive, with 11 examples:
    11/20 = 0.55
  • Probability of being negative, with 9 examples:
    9/20 = 0.45

32
Information Gain Example
  • For the root of the tree there are 20 examples
  • For c = 2 (positive and negative) classification
    categories
  • Entropy of all training examples E(T) is:
    $E(T) = -0.55\log_2 0.55 - 0.45\log_2 0.45 \approx 0.993$

33
Information Gain Example
  • City/Town as root node
  • 10 examples for a = City/Town and value v = Y
  • 10 examples for a = City/Town and value v = N
  • Gain is (checked in code below):
    $\mathit{Gain}(T,\text{City/Town}) = 0.993 - (\tfrac{10}{20} \times 0.881 + \tfrac{10}{20} \times 0.971) \approx 0.067$

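These figures can be checked with the entropy() and gain() sketches
from the earlier slides:

print(round(entropy(11, 9), 3))                    # E(T): 0.993
print(round(gain((11, 9), [(7, 3), (4, 6)]), 3))   # City/Town: 0.067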
34
Example 4
  • Calculate the information gain for the Transport
    attribute

35
Information Gain Example
(Table: the information gain for each attribute;
Transport scores highest at 0.266, against 0.067 for
City/Town.)
36
Choosing Attributes
  • Choose as root node the attribute that gives the
    highest Information Gain
  • In this case attribute Transport with a gain of
    0.266
  • Branches from root node then become the values
    associated with the attribute
  • Recursive calculation of attributes/nodes
  • Filter examples by attribute value

37
Recursive Example
  • With Transport as the root node
  • Select examples where Transport is Average
  • (1, 3, 6, 8, 11, 15, 17)
  • Use only these examples to construct this branch
    of the tree
  • Repeat for each of the other values of Transport
    (Poor, Good)

38
Final Tree
(Diagram, Callan 2003:243: Transport at the root.
Transport = Good leads straight to examples
7,12,16,19,20: Positive; the Average and Poor branches
continue through further tests, ending in the leaves
8 Negative, 6 Negative, 5,9,14 Positive and
2,4,10,13,18 Negative. The full tree appears on the
Rules Example slide below.)
39
ID3
  • Procedure Extend(Tree d, Examples T)
  • Choose best attribute a for root of d
  • Calculate E(a = v) and Gain(T, a) for each
    attribute
  • Attribute with highest Gain(T, a) is selected as
    best
  • Assign best attribute a to root of d
  • For each value v of attribute a
  • Create branch for a = v resulting in sub-tree dj
  • Assign to Tj training examples from T where a = v
  • Recurse sub-tree with Extend(dj, Tj); a runnable
    sketch follows

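The procedure can be turned into a compact, runnable sketch using the
gain() function above. The dictionary tree representation and the
'class'/'Positive' labels are our own assumptions, not Quinlan's
original formulation.

from collections import Counter

def id3(examples, attributes):
    classes = [e['class'] for e in examples]
    if len(set(classes)) == 1:      # pure node: make a leaf
        return classes[0]
    if not attributes:              # nothing left to test: majority leaf
        return Counter(classes).most_common(1)[0][0]

    def split_gain(a):
        # One (pos, neg) pair per value v of attribute a
        subsets = []
        for v in set(e[a] for e in examples):
            sub = [e for e in examples if e[a] == v]
            pos = sum(1 for e in sub if e['class'] == 'Positive')
            subsets.append((pos, len(sub) - pos))
        pos = sum(1 for e in examples if e['class'] == 'Positive')
        return gain((pos, len(examples) - pos), subsets)

    best = max(attributes, key=split_gain)   # highest Gain(T, a)
    rest = [a for a in attributes if a != best]
    # One branch (sub-tree dj) per value v, built from Tj where a = v
    return {best: {v: id3([e for e in examples if e[best] == v], rest)
                   for v in set(e[best] for e in examples)}}

Run on the Example 2 data as reconstructed earlier, this reproduces
Tree 1:

data = [
    {'X': False, 'Y': True,  'class': 'Positive'},   # example 1
    {'X': True,  'Y': True,  'class': 'Positive'},   # example 2
    {'X': False, 'Y': False, 'class': 'Negative'},   # example 3
    {'X': True,  'Y': False, 'class': 'Negative'},   # example 4
]
print(id3(data, ['X', 'Y']))
# {'Y': {True: 'Positive', False: 'Negative'}}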
40
Issues
  • Use prior knowledge where available
  • Not all the examples may be needed to construct a
    tree
  • Test generalisation of tree during training and
    stop when desired performance is achieved
  • Prune the tree once constructed
  • Examples may be noisy
  • Examples may contain irrelevant attributes

41
Extracting Rules
  • We can extract rules from decision trees
  • Create one rule for each root-to-leaf path (see
    the sketch below)
  • Simplify by combining rules
  • Other techniques are not so transparent
  • Neural networks are often described as black
    boxes: it is difficult to understand what the
    network is doing
  • Extraction of rules from trees can help us to
    understand the decision process

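A sketch of the root-to-leaf extraction, for trees in the dictionary
form produced by the id3() sketch above:

def extract_rules(tree, conditions=()):
    if not isinstance(tree, dict):   # leaf: emit the finished rule
        ifs = ' AND '.join(f'{a} is {v}' for a, v in conditions)
        return [f'IF {ifs} THEN {tree}']
    (attribute, branches), = tree.items()   # one attribute test per node
    rules = []
    for value, subtree in branches.items():
        rules += extract_rules(subtree, conditions + ((attribute, value),))
    return rules

# For the Example 2 tree this yields, in some order:
# ['IF Y is True THEN Positive', 'IF Y is False THEN Negative']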
42
Rules Example
(Diagram, Callan 2003:243: the full tree.
Transport = Good: examples 7,12,16,19,20, Positive.
Transport = Average: examples 1,3,6,8,11,15,17, test
Housing Estate. Housing Estate = Large leads to
examples 11,17 and a test on Industrial Estate
(Yes: 11 Positive; No: 17 Negative); another branch
leads to examples 1,3,15 and a test on University
(separating 1,3 Positive from 15 Negative); the
remaining branches end in the leaves 8 Negative and
6 Negative.
Transport = Poor: examples 2,4,5,9,10,13,14,18, test
Industrial Estate (Yes: 5,9,14 Positive; No:
2,4,10,13,18 Negative).)
43
Rules Example
  • IF Transport is Average
    AND Housing Estate is Large
    AND Industrial Estate is Yes
    THEN Positive
  • IF Transport is Good
    THEN Positive

44
Summary
  • What are the benefits/drawbacks of machine
    learning?
  • Are the techniques simple?
  • Are they simple to implement?
  • Are they computationally cheap?
  • Do they learn from experience?
  • Do they generalise well?
  • Can we understand how knowledge is represented?
  • Do they provide perfect solutions?

45
Source Texts
  • Negnevitsky, M. (2005). Artificial Intelligence:
    A Guide to Intelligent Systems. 2nd Edition.
    Essex, UK: Pearson Education Limited.
  • Chapter 6, pp. 165-168; chapter 9, pp. 349-360.
  • Callan, R. (2003). Artificial Intelligence.
    Basingstoke, UK: Palgrave Macmillan.
  • Part 5, chapters 11-17, pp. 225-346.
  • Luger, G.F. (2002). Artificial Intelligence:
    Structures and Strategies for Complex Problem
    Solving. 4th Edition. London, UK: Addison-Wesley.
  • Part IV, chapters 9-11, pp. 349-506.

46
Journals
  • Artificial Intelligence
  • http://www.elsevier.com/locate/issn/00043702
  • http://www.sciencedirect.com/science/journal/00043702

47
Articles
  • Quinlan, J.R. (1986). Induction of Decision
    Trees. Machine Learning, vol. 1, pp. 81-106.
  • Quinlan, J.R. (1993). C4.5: Programs for Machine
    Learning. San Mateo, CA: Morgan Kaufmann
    Publishers.

48
Websites
  • UCI Machine Learning Repository
  • Example data sets for benchmarking
  • http://www.ics.uci.edu/~mlearn/MLRepository.html
  • Genetic Algorithms Archive
  • Programs, data, bibliographies, etc.
  • http://www.aic.nrl.navy.mil/galist/
  • Wonders of Math Game of Life
  • Game of life applet and details
  • http://www.math.com/students/wonders/life/life.html