CSCI 548/B480: Introduction to Bioinformatics, Fall 2002 (presentation transcript)
1
CSCI 548/B480: Introduction to Bioinformatics
Fall 2002
Topic 5: Machine Intelligence - Learning and
Evolution
  • Dr. Jeffrey Huang, Assistant Professor
  • Department of Computer and Information Science,
    IUPUI
  • E-mail: huang@cs.iupui.edu

2
Machine Intelligence
  • Machine Learning
  • The subfield of AI concerned with intelligent
    systems that learn.
  • The computational study of algorithms that
    improve performance based on experience.
  • The attempt to build intelligent entities
  • We must understand intelligent entities first
  • Computational Brain
  • Mathematics
  • Philosophy staked out most of the ideas of AI, but
    making it a formal science required mathematical
    formalization in
  • Computation
  • Logic
  • Probability

3
Behavior-Based AI vs. Knowledge-Based AI
  • Definitions of Machine Learning
  • Reasoning
  • The effort to make computers think and solve
    problems
  • The study of mental faculties through the use of
    computational models
  • Behavior
  • Make machines perform human actions that require
    intelligence
  • Seeks to explain intelligent behavior in terms of
    computational processes
  • Agents

4
Operational Agents
  • Operational Views of Intelligence
  • The ability to perform intellectual tasks
  • Prove theorems, play chess, solve puzzles
  • Focus on what goes on between the ears
  • Emphasize the ability to build and effectively
    use mental models
  • The ability to perform intellectually challenging
    real world tasks
  • Medical diagnosis, tax advising, financial
    investing
  • Introduce new issues such as critical
    interactions with the world, model grounding,
    uncertainty
  • The ability to survive, adapt, and function in a
    constantly changing world
  • Autonomous agents
  • Vision, locomotion, and manipulation, many I/O
    issues
  • Self-assessment, learning, curiosity, etc.

5
Building Intelligent Artifacts
  • Symbolic Approaches
  • Construct goal-oriented symbol manipulation
    systems
  • Focus on high end abstract thinking
  • Non-symbolic approaches
  • Build performance-oriented systems
  • Focus on behavior
  • Need both in a tightly coupled form
  • Building such systems is difficult
  • Growing need to automate this process
  • Good approach: Evolutionary Algorithms

6
  • Behavior-Based AI
  • Behavior-Based AI vs. Knowledge-Based
  • "Situated" in environment
  • Multiple competencies ('routines')
  • Autonomy
  • Adaptation and Competition
  • Artificial Life (A-Life)
  • Agents: Reactive Behavior
  • Abstracting the logical principles of living
    organisms
  • Collective Behavior: Competition and Cooperation

7
Classification vs. Prediction
  • Classification
  • predicts categorical class labels
  • classifies data (constructs a model) based on the
    training set and the values (class labels) of a
    classifying attribute, and uses the model to
    classify new data
  • Prediction
  • models continuous-valued functions, i.e.,
    predicts unknown or missing values

8
Classification: A Two-Step Process
  • Model construction: describing a set of
    predetermined classes
  • Each tuple/sample is assumed to belong to a
    predefined class, as determined by the class
    label attribute
  • The set of tuples used for model construction is
    the training set
  • The model is represented as classification rules,
    decision trees, or mathematical formulae
  • Model usage: classifying future or unknown
    objects
  • Estimate the accuracy of the model (see the
    sketch below)
  • The known label of a test sample is compared with
    the classified result from the model
  • Accuracy rate is the percentage of test set
    samples that are correctly classified by the
    model
  • The test set is independent of the training set;
    otherwise over-fitting will occur
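A minimal sketch of this two-step process, using scikit-learn and a toy dataset (my choices, not part of the slides): construct a model on the training set, then estimate its accuracy rate on an independent test set.

# Sketch: construct a classifier on a training set, then estimate its
# accuracy on an independent test set (assumed toy data, scikit-learn).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

model = DecisionTreeClassifier()          # step 1: model construction
model.fit(X_train, y_train)

y_pred = model.predict(X_test)            # step 2: model usage
print("accuracy rate:", accuracy_score(y_test, y_pred))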

9
Classification Process
  • Model construction: a classification algorithm
    derives a model from the training data, e.g. the
    rule
    IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
  • Use the model in prediction: apply the model to a
    new tuple, e.g. (Jeff, Professor, 2) - Tenured?
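A tiny illustrative sketch (the function name and tuple encoding are assumptions) of applying the learned rule to the unseen tuple:

# Sketch: apply the learned classification rule to an unseen tuple.
def predict_tenured(rank, years):
    # IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
    return "yes" if rank.lower() == "professor" or years > 6 else "no"

name, rank, years = ("Jeff", "Professor", 2)
print(name, "tenured?", predict_tenured(rank, years))  # -> yes (rank matches)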
10
Supervised vs. Unsupervised Learning
  • Supervised learning (classification)
  • Supervision: the training data (observations,
    measurements, etc.) are accompanied by labels
    indicating the class of the observations
  • New data is classified based on the training set
  • Unsupervised learning (clustering)
  • The class labels of the training data are unknown
  • Given a set of measurements, observations, etc.,
    the aim is to establish the existence of
    classes or clusters in the data

11
Classification and Prediction
  • Data Preparation
  • Data cleaning
  • Preprocess data in order to reduce noise and
    handle missing values
  • Relevance analysis (feature selection)
  • Remove the irrelevant or redundant attributes
  • Data transformation
  • Generalize and/or normalize data
  • Evaluating Classification Methods
  • Predictive accuracy
  • Speed and scalability
  • time to construct the model
  • time to use the model
  • Robustness: handling noise and missing values
  • Scalability: efficiency in disk-resident
    databases
  • Interpretability: understanding and insight
    provided by the model
  • Goodness of rules
  • decision tree size
  • compactness of classification rules

12
From Learning to Evolutionary Optimization
  • Accomplishing an abstract task = solving a problem =
  • searching through a space of potential
    solutions
  • finding the best solution
  • => an optimization process
  • Classical exhaustive methods?
  • Large space? => special machine learning techniques
  • Evolutionary Algorithms
  • Stochastic algorithms whose search methods model
    natural phenomena:
  • genetic inheritance
  • Darwinian strife for survival

13
  • "... the metaphor underlying genetic algorithms is
    that of natural evolution. In evolution, the
    problem each species faces is one of searching
    for beneficial adaptations to a complicated and
    changing environment. The knowledge that each
    species has gained is embodied in the makeup of
    the chromosomes of its members."
  • - L. Davis and M. Steenstrup, Genetic Algorithms
    and Simulated Annealing, pp. 1-11, Morgan
    Kaufmann, 1987

14
The Essential Components
  • A genetic representation for potential solutions to
    the problem
  • A way to create an initial population of
    potential solutions
  • An evaluation function that plays the role of the
    environment, rating solutions in terms of their
    fitness
  • i.e. the use of fitness to determine survival
    and reproductive rates
  • Genetic operators that alter the composition of
    children

15
Evolutionary Algorithm Search Procedure
16
Historical Background
  • Three paradigms emerged in the 1960s
  • Genetic Algorithms
  • Introduced by Holland (U. Michigan), extended by
    De Jong (GMU)
  • Envisioned for a broad range of adaptive systems
  • Evolution Strategies
  • Introduced by Rechenberg
  • Focused on real-valued parameter optimization
  • Evolutionary Programming
  • Introduced by Fogel and Koza
  • Applied to AI and machine learning problems
  • Today
  • Wide variety of evolutionary algorithms
  • Applied to many areas of science and engineering

17
Examples of Evolutionary AI
  • Parameter Tuning
  • Pervasiveness of parameterized models
  • Complex behavioral changes due to non-linear
    interactions
  • Example
  • Weights of an artificial neural network
  • Parameters of a heuristic evaluation function
  • Parameters of a rule induction system
  • Parameters of membership functions
  • Goal: evolve over time a useful set of discrete/
    continuous parameters

18
  • Evolving Structure
  • Effect behavioral change via more complex
    structures
  • Example
  • Selecting/constructing the topology of ANNs
  • Selecting/constructing the feature sets
  • Selecting/constructing plans/scenarios
  • Selecting/constructing membership functions
  • Goal: evolve useful structures over time
  • Evolving Programs
  • Goal: acquire new behaviors and adapt existing
    ones
  • Example
  • Acquire/adapt behavioral rule sets
  • Acquire/adapt arm/joint control programs
  • Acquire/adapt task-oriented programming code

19
How Does a Genetic Algorithm Work?
  • A simple example of function optimization
  • Find max f(x) = x^2, for x in [0, 4]
  • Representation
  • Genotype (chromosome): internally, points in the
    search space are represented as (binary) strings
    over some alphabet
  • Phenotype: the expressed traits of an individual
  • With a precision of about 4x10^-4 for x in [0, 4]
    (roughly 10,000 points), 14 bits are needed:
    2^13 = 8,192 < 10,000 < 16,384 = 2^14
  • Simple fixed-length binary encoding
  • The string 00 0000 0000 0000 is assigned 0.0
  • The string 00 0000 0000 0001 is assigned
    0.0 + bin2dec(binary string) x 4/(2^14 - 1),
    and so on
  • Phenotype 4.0 corresponds to genotype
    11 1111 1111 1111 (see the decoding sketch below)
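A minimal decoding sketch for this representation (pure Python, not from the slides): map a 14-bit genotype to its phenotype x in [0, 4] and evaluate f(x) = x^2.

# Sketch: decode a 14-bit genotype into a phenotype x in [0, 4]
# and evaluate the fitness f(x) = x^2.
N_BITS = 14

def decode(genotype):
    """Map a binary string to a real value in [0, 4]."""
    return int(genotype, 2) * 4.0 / (2**N_BITS - 1)

def fitness(genotype):
    x = decode(genotype)
    return x * x

print(decode("0" * 14))             # 0.0
print(decode("1" * 14))             # 4.0
print(fitness("11111111111111"))    # 16.0, reached at x = 4.0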

20
genotype 00000000000000 -> phenotype 0.0
genotype 00000000000001 -> phenotype 4/(2^14 - 1)
genotype 11111111111111 -> phenotype 4.0
  • Initial population
  • Create a population (pop_size) of chromosomes,
    where each chromosome is a binary vector of 14
    bits
  • All 14 bits for each chromosome are initialized
    randomly
  • Evaluation function
  • Evaluation function eval for binary vectors v is
    equal to the function f
  • eval(v) = f(x)
  • e.g. eval(v1) = f(x1) = fitness_1

21
  • Parameters
  • pop_size = 24,
  • Prob. of crossover (Xover), pc = 0.6,
  • Prob. of mutation, pm = 0.01
  • Recombination using genetic operators (see the
    sketch below)
  • Crossover (pc)
  • v1 = 01111100010011 -> v1' = 01110101011100
  • v2 = 00010101011100 -> v2' = 00011100010011
  • Mutation (pm)
  • v2' = 00011100010011 -> v2'' = 00011110010011
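A sketch of one-point crossover and bit-flip mutation (pure Python; the cut point and mutated position are fixed here so the output reproduces the example above, whereas a real GA would draw them at random with probabilities pc and pm):

# Sketch: one-point crossover and bit-flip mutation on binary chromosomes.
def crossover(v1, v2, point):
    """Swap the tails of two parent strings after the cut point."""
    return v1[:point] + v2[point:], v2[:point] + v1[point:]

def mutate(v, position):
    """Flip a single bit of the chromosome."""
    flipped = "1" if v[position] == "0" else "0"
    return v[:position] + flipped + v[position + 1:]

c1, c2 = crossover("01111100010011", "00010101011100", point=4)
print(c1, c2)          # 01110101011100 00011100010011
print(mutate(c2, 6))   # 00011110010011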

22
  • Selection of M(t) from M(t-1) using the roulette
    wheel (see the sketch below)
  • Total fitness of the population:
    F = sum of fitness_i, i = 1..pop_size
  • Probability of selection prob_i for each
    chromosome v_i: prob_i = fitness_i / F
  • Cumulative probability: q_i = prob_1 + ... + prob_i
  • Generate random numbers r_j from [0, 1], where j =
    1..pop_size
  • Select chromosome v_i such that q_(i-1) < r_j <= q_i
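A minimal roulette-wheel selection sketch following the steps above (pure Python; the toy population and fitness values are assumptions):

# Sketch: roulette-wheel (fitness-proportional) selection.
import bisect
import random

def roulette_select(population, fitnesses):
    """Select len(population) chromosomes, each with probability
    proportional to its fitness."""
    total = sum(fitnesses)                        # total fitness F
    probs = [f / total for f in fitnesses]        # prob_i = fitness_i / F
    q, running = [], 0.0
    for p in probs:                               # cumulative probabilities q_i
        running += p
        q.append(running)
    selected = []
    for _ in population:                          # one wheel spin per slot
        r = random.random()
        i = min(bisect.bisect_left(q, r), len(q) - 1)
        selected.append(population[i])
    return selected

pop = ["01110101011100", "00011100010011", "11111111111111"]
fit = [3.4, 1.2, 16.0]                            # assumed toy fitness values
print(roulette_select(pop, fit))                  # fitter strings recur more often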

23
(No Transcript)
24
Homing to the Optimal Solution
25
Best-so-far Curve
26
Optimal Feature Subset
  • Search for subsets of discriminatory features
  • A combinatorial optimization problem
  • Two general approaches to identifying optimal
    subsets of features
  • Abstract measurement of important properties of
    good feature sets
  • Orthogonality (e.g. PCA), information content, low
    variance
  • Less expensive process
  • Falls into suboptimal performance if the abstract
    measures do not correlate well with actual
    performance
  • Building a classifier from the feature subset and
    evaluating its performance on actual
    classification tasks (see the sketch below)
  • Better classification performance
  • The cost of building and testing classifiers
    prohibits any kind of systematic evaluation of
    feature subsets
  • Suboptimal in practice: large numbers of
    candidate features cannot be handled by any form
    of systematic search
  • 2^N possible candidate subsets of N features
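A minimal "wrapper" sketch of the second approach (scikit-learn and a toy dataset are my assumptions): a candidate feature subset is encoded as a bit mask, the kind of chromosome a GA could evolve, and scored by the cross-validated accuracy of a classifier restricted to the selected features.

# Sketch: evaluate a candidate feature subset (a bit mask over the
# N features) by the accuracy of a classifier trained on it.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)           # assumed toy data, N = 4 features

def subset_fitness(mask):
    """mask: string like '1010' selecting a subset of the N features."""
    cols = [i for i, bit in enumerate(mask) if bit == "1"]
    if not cols:
        return 0.0                           # empty subset gets zero fitness
    scores = cross_val_score(DecisionTreeClassifier(), X[:, cols], y, cv=5)
    return scores.mean()

print(subset_fitness("1010"), subset_fitness("1111"))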

27
Inductive Learning
  • Learning From Examples
  • Decision Tree (DT)
  • Information Theory (IT)
  • Question: what are the BEST attributes
    (features) for building the decision tree?
  • Answer: the BEST attribute is the one that is
    MOST informative, i.e. for which
    ambiguity/uncertainty is least
  • Solution: measure information content using
    the expected amount of information provided by
    the attribute

28
Classification by Decision Tree Induction
  • Decision tree
  • A flow-chart-like tree structure
  • Internal node denotes a test on an attribute
  • Branch represents an outcome of the test
  • Leaf nodes represent class labels or class
  • distribution
  • Decision tree generation consists of two phases
  • Tree construction
  • At start, all the training examples are at the
    root
  • Partition examples recursively based on selected
    attributes
  • Tree pruning
  • Identify and remove branches that reflect noise
    or outliers
  • Use of decision tree: classifying an unknown
    sample
  • Test the attribute values of the sample against
    the decision tree

Exs. Class Size Color Surface
1 A Small Yellow Smooth
2 A Medium Red Smooth
3 A Medium Red Smooth
4 A Big Red Rough
5 B Medium Yellow Smooth
6 B Medium Yellow Smooth
29
  • Entropy
  • Define an entropy function H such that
    H = - sum_i p_i log2(p_i),
    where p_i is the probability associated with the
    ith class
  • For a feature, the entropy is calculated for each
    value.
  • The sum of the entropy weighted by the
    probability of each value is the entropy for that
    feature
  • Example: toss a fair coin
    H = -(0.5 log2 0.5 + 0.5 log2 0.5) = 1 bit
  • If the coin is not fair, e.g. P(heads) = 0.99,
    then H = -(0.99 log2 0.99 + 0.01 log2 0.01) ~ 0.08
  • So, by tossing the coin you get very little
    (extra) information (that you didn't expect)
    (see the sketch below)
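A small sketch computing these entropies (pure Python; only the coin example is taken from the slide):

# Sketch: entropy of a discrete distribution, applied to the coin example.
from math import log2

def entropy(probs):
    """H = -sum_i p_i * log2(p_i), ignoring zero-probability classes."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))     # fair coin   -> 1.0 bit
print(entropy([0.99, 0.01]))   # biased coin -> ~0.08 bit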

30
  • In general, if you have p positive examples and
    n negative examples,
    H(p/(p+n), n/(p+n)) = -(p/(p+n)) log2(p/(p+n))
                          - (n/(p+n)) log2(n/(p+n))
  • For p = n => H = 1
  • i.e. originally there is the most uncertainty
    about the eventual outcome (picking up an example)
    and the most to gain by picking the example.

31
Decision Tree Induction
  • Basic algorithm (a greedy algorithm)
  • Tree is constructed in a top-down recursive
    divide-and-conquer manner
  • At start, all the training examples are at the
    root
  • Attributes are categorical (if continuous-valued,
    they are discretized in advance)
  • Examples are partitioned recursively based on
    selected attributes
  • Test attributes are selected on the basis of a
    heuristic or statistical measure (e.g.,
    information gain)
  • Conditions for stopping partitioning
  • All samples for a given node belong to the same
    class
  • There are no remaining attributes for further
    partitioning
  • Majority voting is employed for classifying the
    leaf
  • There are no samples left
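A compact recursive sketch of this greedy, top-down induction (pure Python, not the slides' exact algorithm; the data is the Size/Color/Surface example table shown earlier):

# Sketch: greedy top-down decision-tree induction with information gain.
from math import log2
from collections import Counter

def entropy(labels):
    counts, total = Counter(labels), len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def gain(examples, attr):
    """Information gain = H(parent) - weighted sum of child entropies."""
    labels = [ex["Class"] for ex in examples]
    remainder = 0.0
    for value in {ex[attr] for ex in examples}:
        subset = [ex["Class"] for ex in examples if ex[attr] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(labels) - remainder

def build_tree(examples, attrs):
    labels = [ex["Class"] for ex in examples]
    if len(set(labels)) == 1:                  # all samples in one class
        return labels[0]
    if not attrs:                              # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(examples, a))
    branches = {}
    for value in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == value]
        branches[value] = build_tree(subset, [a for a in attrs if a != best])
    return {best: branches}

data = [                                       # the six examples from the table
    {"Class": "A", "Size": "Small",  "Color": "Yellow", "Surface": "Smooth"},
    {"Class": "A", "Size": "Medium", "Color": "Red",    "Surface": "Smooth"},
    {"Class": "A", "Size": "Medium", "Color": "Red",    "Surface": "Smooth"},
    {"Class": "A", "Size": "Big",    "Color": "Red",    "Surface": "Rough"},
    {"Class": "B", "Size": "Medium", "Color": "Yellow", "Surface": "Smooth"},
    {"Class": "B", "Size": "Medium", "Color": "Yellow", "Surface": "Smooth"},
]
print(build_tree(data, ["Size", "Color", "Surface"]))
# e.g. {'Color': {'Red': 'A', 'Yellow': {'Size': {'Small': 'A', 'Medium': 'B'}}}}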

32
Algorithm
  • Select a random subset W (called the window) from
    the training set T
  • Build a DT for the current W
  • Select the best feature which minimizes the
    entropy H (or max. gain)
  • Categorize training instances (examples) into
    subsets by this feature
  • Repeat this process recursively until each subset
    contains instances of one kind (class) or some
    statistical criterion is satisfied
  • Scan the entire training set for exceptions to
    the DT
  • If exceptions are found, insert some of them into
    W and repeat from step 2
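A rough sketch of this windowing loop (scikit-learn's DecisionTreeClassifier stands in for "build a DT"; the dataset, window size, and number of exceptions inserted per pass are assumptions):

# Sketch: windowing - grow a tree on a small window W, scan the full
# training set T for exceptions, add some of them to W, and repeat.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)                   # the training set T
rng = np.random.default_rng(0)
window = rng.choice(len(X), size=20, replace=False).tolist()  # random subset W

while True:
    tree = DecisionTreeClassifier().fit(X[window], y[window])  # DT for current W
    wrong = np.flatnonzero(tree.predict(X) != y)                # exceptions in T
    exceptions = [i for i in wrong if i not in window]
    if not exceptions:                              # no new exceptions: done
        break
    window += exceptions[:10]                       # insert some exceptions into W
print("final window size:", len(window))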

33
  • Information Gain
  • The information gain from the test on attribute A
    is defined as the difference between the original
    information requirement and the new requirement:
    Gain(A) = H(p/(p+n), n/(p+n)) - Remainder(A),
    where Remainder(A) = sum_i (p_i + n_i)/(p + n) x
    H(p_i/(p_i + n_i), n_i/(p_i + n_i))
  • Note that Remainder(A) is a weighted (by
    attribute values) entropy function
  • Maximizing Gain(A) <=> minimizing Remainder(A);
    then A is the most informative attribute
    (question)

34
The ID3 Algorithm and Quinlan's C4.5
  • C4.5
  • Tutorial: http://yoda.cis.temple.edu:8080/UGAIWWW/lectures/C45/
  • Matlab program: http://www.cs.wisc.edu/olvi/uwmp/msmt.html
  • See5 / C5.0
  • Tutorial: http://borba.ncc.up.pt/niaad/Software/c50/c50manual.html
  • Software for Win2000: http://www.rulequest.com/download.html

35
Exs. Class Size Color Surface
1 A Small Yellow Smooth
2 A Medium Red Smooth
3 A Medium Red Smooth
4 A Big Red Rough
5 B Medium Yellow Smooth
6 B Medium Yellow Smooth
  • Example (worked out in the sketch below)
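A worked computation of the information gain of each attribute for the six examples above (pure Python; the counts are read off the table, and the gain values are computed here rather than quoted from the slides):

# Sketch: information gain of Size, Color, and Surface for the six
# examples above (4 in class A, 2 in class B).
from math import log2

def H(p, n):
    """Entropy of a two-class split with p positive and n negative examples."""
    if p == 0 or n == 0:
        return 0.0
    pp, pn = p / (p + n), n / (p + n)
    return -(pp * log2(pp) + pn * log2(pn))

# (class A count, class B count) per attribute value, read off the table
splits = {
    "Size":    [(1, 0), (2, 2), (1, 0)],   # Small, Medium, Big
    "Color":   [(1, 2), (3, 0)],           # Yellow, Red
    "Surface": [(3, 2), (1, 0)],           # Smooth, Rough
}

total = H(4, 2)                            # entropy before any split
for attr, counts in splits.items():
    remainder = sum((a + b) / 6 * H(a, b) for a, b in counts)
    print(attr, "gain =", round(total - remainder, 3))
# Color has the largest gain, so it is chosen as the root test.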

36
  • Noise and Overfitting
  • Question: what about two or more examples with
    the same description but different
    classifications?
  • Answer: each leaf node reports either the MAJORITY
    classification or relative frequencies
  • Question: what about irrelevant attributes (noise
    and overfitting)?
  • Answer: tree pruning
  • Solution: an information gain close to zero is a
    good clue to irrelevance; compare the actual
    numbers of positive (+) and negative (-) examples
    in each subset i, p_i and n_i, with the expected
    numbers p^_i and n^_i assuming true irrelevance:
    p^_i = p x (p_i + n_i)/(p + n),
    n^_i = n x (p_i + n_i)/(p + n),
    where p and n are the total numbers of positive
    and negative examples to start with.
  • Total deviation (for statistical significance):
    D = sum_i [ (p_i - p^_i)^2 / p^_i
              + (n_i - n^_i)^2 / n^_i ]
  • Under the null hypothesis of irrelevance, D follows
    a chi-squared distribution (see the sketch below)
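A small sketch of this chi-squared relevance test (Python with scipy for the critical value; the counts are invented for illustration):

# Sketch: chi-squared test for attribute irrelevance. p_i, n_i are the
# observed positive/negative counts in each subset produced by the split.
from scipy.stats import chi2

p, n = 6, 6                                  # assumed totals
subsets = [(2, 2), (1, 3), (3, 1)]           # assumed (p_i, n_i) per value

D = 0.0
for p_i, n_i in subsets:
    expected_p = p * (p_i + n_i) / (p + n)   # p^_i under irrelevance
    expected_n = n * (p_i + n_i) / (p + n)   # n^_i under irrelevance
    D += ((p_i - expected_p) ** 2 / expected_p
          + (n_i - expected_n) ** 2 / expected_n)

dof = len(subsets) - 1                       # degrees of freedom
print("D =", D, " critical value:", chi2.ppf(0.95, dof))
# If D is below the critical value, the attribute looks irrelevant and
# the split is a candidate for pruning.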

37
Extracting Classification Rules from Trees
  • Represent the knowledge in the form of IF-THEN
    rules
  • One rule is created for each path from the root
    to a leaf
  • Each attribute-value pair along a path forms a
    conjunction
  • The leaf node holds the class prediction
  • Rules are easier for humans to understand
  • Example
  • IF age < 30 AND student = 'no' THEN
    buys_computer = 'no'
  • IF age < 30 AND student = 'yes' THEN
    buys_computer = 'yes'
  • IF age = 31..40 THEN buys_computer = 'yes'
  • IF age > 40 AND credit_rating = 'excellent'
    THEN buys_computer = 'yes'
  • IF age > 40 AND credit_rating = 'fair' THEN
    buys_computer = 'no'
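A minimal sketch of extracting one rule per root-to-leaf path (pure Python; the nested-dictionary tree encoding is an assumption, with leaves matching the buys_computer rules above):

# Sketch: walk every root-to-leaf path of a tree and print an IF-THEN rule.
# Internal nodes are dicts {attribute: {value: subtree}}; leaves are labels.
tree = {"age": {
    "<30":    {"student": {"no": "no", "yes": "yes"}},
    "31..40": "yes",
    ">40":    {"credit_rating": {"excellent": "yes", "fair": "no"}},
}}

def extract_rules(node, conditions=()):
    if not isinstance(node, dict):                 # leaf: emit one rule
        print("IF", " AND ".join(conditions) or "TRUE",
              "THEN buys_computer =", node)
        return
    (attr, branches), = node.items()
    for value, subtree in branches.items():        # one conjunct per branch
        extract_rules(subtree, conditions + (f"{attr} = {value}",))

extract_rules(tree)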

38
Decision Tree
  • Avoid Overfitting in Classification
  • The generated tree may overfit the training data
  • Too many branches, some may reflect anomalies due
    to noise or outliers
  • The result is poor accuracy for unseen samples
  • Two approaches to avoid overfitting
  • Prepruning: halt tree construction early - do not
    split a node if this would result in the goodness
    measure falling below a threshold
  • Difficult to choose an appropriate threshold
  • Postpruning: remove branches from a fully grown
    tree to get a sequence of progressively pruned trees
  • Use a set of data different from the training
    data to decide which is the best pruned tree

39
  • Approaches to Determine the Final Tree Size
  • Separate training (2/3) and testing (1/3) sets
  • Use cross validation, e.g., 10-fold cross
    validation
  • Use all the data for training
  • but apply a statistical test (e.g., chi-square)
    to estimate whether expanding or pruning a node
    may improve the entire distribution
  • Use minimum description length (MDL) principle
  • halting growth of the tree when the encoding is
    minimized
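A brief sketch of the cross-validation option above (scikit-learn; the dataset and the use of max_depth as the tree-size knob are assumptions): candidate tree sizes are compared by 10-fold cross-validated accuracy.

# Sketch: use 10-fold cross-validation to pick a tree size
# (here controlled by max_depth) instead of trusting training accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
for depth in (1, 2, 3, 5, None):             # None = fully grown tree
    scores = cross_val_score(DecisionTreeClassifier(max_depth=depth),
                             X, y, cv=10)
    print("max_depth =", depth, "mean CV accuracy =", round(scores.mean(), 3))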

40
Decision Tree
  • Enhancements to basic decision tree induction
  • Allow for continuous-valued attributes
  • Dynamically define new discrete-valued attributes
    that partition the continuous attribute value
    into a discrete set of intervals
  • Handle missing attribute values
  • Assign the most common value of the attribute
  • Assign probability to each of the possible values
  • Attribute construction
  • Create new attributes based on existing ones that
    are sparsely represented
  • This reduces fragmentation, repetition, and
    replication