Title: Evolutionary Search
1 Evolutionary Search
- Artificial Intelligence
- CSPP 56553
- January 28, 2004
2 Agenda
- Motivation
- Evolving a solution
- Genetic Algorithms
- Modeling search as evolution
- Mutation
- Crossover
- Survival of the fittest
- Survival of the most diverse
- Conclusions
3 Genetic Algorithms Applications
- Search parameter space for optimal assignment
- Not guaranteed to find the optimum, but can approach it
- Classic optimization problems
- E.g. Traveling Salesman Problem
- Program design (Genetic Programming)
- Aircraft carrier landings
4 Genetic Algorithms Procedure
- Create an initial population (1 or more chromosomes)
- Mutate 1 or more genes in 1 or more chromosomes
- Produce one offspring for each chromosome
- Mate 1 or more pairs of chromosomes with crossover
- Add mutated offspring chromosomes to the population
- Create new population
- Best randomly selected (biased by fitness); a code sketch of the full loop follows below
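A minimal sketch of this loop, assuming the two-gene cookie chromosomes used later in these slides (gene values 1-9, population capped at 4); the quality function here is only a stand-in, not the slides' actual quality landscape:

```python
import random

GENES = 2            # e.g. flour, sugar in the cookie example
LOW, HIGH = 1, 9     # gene value range
MAX_POP = 4          # population cap from the cookie design

def quality(chrom):
    # Stand-in quality function (1-9 scale); the real landscape comes from the problem
    return min(chrom)

def mutate(chrom):
    # Randomly re-value one randomly selected gene
    i = random.randrange(GENES)
    new = list(chrom)
    new[i] = random.randint(LOW, HIGH)
    return tuple(new)

def crossover(a, b):
    # Cross at the middle: first gene from one parent, second from the other
    return (a[0], b[1])

def select(candidates, k):
    # Standard method: survivors drawn one at a time with probability proportional to quality
    survivors, pool = [], list(candidates)
    while pool and len(survivors) < k:
        weights = [max(quality(c), 1e-9) for c in pool]
        pick = random.choices(pool, weights=weights, k=1)[0]
        survivors.append(pick)
        pool.remove(pick)
    return survivors

def step(population):
    offspring = [mutate(c) for c in population]                      # one offspring per chromosome
    mated = ([crossover(*random.sample(population, 2)) for _ in population]
             if len(population) > 1 else [])                         # mate random pairs
    candidates = list(set(population + offspring + mated))           # no duplicates
    return select(candidates, MAX_POP)

population = [(1, 1)]                                                # 1 initial chromosome
for _ in range(20):
    population = step(population)
print(population)
```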
5 Fitness
- Natural selection: Most fit survive
- Fitness = Probability of survival to next generation
- Question: How do we measure fitness?
- Standard method: Relate fitness to quality (see the sketch below)

Chromosome   Quality (1-9)   Fitness (0-1)
1 4          4               0.4
3 1          3               0.3
1 2          2               0.2
1 1          1               0.1
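A small sketch of the standard method shown in the table: fitness is simply quality divided by total quality.

```python
def standard_fitness(qualities):
    # Standard method: each candidate's fitness = its quality / total quality
    total = sum(qualities)
    return [q / total for q in qualities]

# Qualities of the four chromosomes in the table above
print(standard_fitness([4, 3, 2, 1]))   # [0.4, 0.3, 0.2, 0.1]
```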
6 Crossover
- Genetic design
- Identify sets of features: 2 genes (flour, sugar; values 1-9)
- Population: How many chromosomes?
- 1 initial, 4 max
- Mutation: How frequent?
- 1 gene randomly selected, randomly mutated
- Crossover: Allowed? Yes, select random mates; cross at the middle
- Duplicates? No
- Survival: Standard method
7 Basic Cookie GA: Crossover Results
- Results are for 1000 random trials
- Initial state: 1 chromosome, genes 1-1, quality 1
- On average, reaches max quality (9) in 14 generations
- Conclusion
- Faster with crossover: combines good values in each gene
- Key: Global max achievable by maximizing each dimension independently - reduces dimensionality
8 Solving the Moat Problem
- Problem
- No single-step mutation can reach optimal values using standard fitness (quality = 0 -> probability = 0)
- Solution A
- Crossover can combine fit parents in EACH gene
- However, still slow: 155 generations on average
9 Questions
- How can we avoid the 0 quality problem?
- How can we avoid local maxima?
10 Rethinking Fitness
- Goal: Explicit bias to best
- Remove implicit biases based on quality scale
- Solution: Rank method
- Ignore actual quality values except for ranking
- Step 1: Rank candidates by quality
- Step 2: Probability of selecting the i-th candidate, given that candidates 1..(i-1) were not selected, is a constant p
- Step 2b: The last candidate is selected if no other has been
- Step 3: Select candidates using these probabilities
11 Rank Method

Chromosome   Quality   Rank   Std. Fitness   Rank Fitness
1 4          4         1      0.4            0.667
1 3          3         2      0.3            0.222
1 2          2         3      0.2            0.074
5 2          1         4      0.1            0.025
7 5          0         5      0.0            0.012

- Results: Average over 1000 random runs on the Moat problem: 75 generations (vs 155 for the standard method)
- No 0-probability entries
- Based on rank, not absolute quality (see the sketch below)
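A sketch of the rank-method probabilities; the constant p is not stated on the slide, but p = 2/3 reproduces the Rank Fitness column above:

```python
def rank_fitness(n, p=2/3):
    # Step 1: candidates are assumed already ranked by quality (index 0 = best).
    # Step 2: candidate i is selected with probability p, given that none of the
    #         higher-ranked candidates was selected.
    # Step 2b: the last candidate takes whatever probability remains.
    probs, remaining = [], 1.0
    for _ in range(n - 1):
        probs.append(remaining * p)
        remaining *= (1 - p)
    probs.append(remaining)
    return probs

print([round(x, 3) for x in rank_fitness(5)])
# [0.667, 0.222, 0.074, 0.025, 0.012]
```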
12 Diversity
- Diversity
- Degree to which chromosomes exhibit different genes
- Rank and Standard methods look only at quality
- Need diversity: escape local maxima, provide variety for crossover
- As good to be different as to be fit
13 Rank-Space Method
- Combines diversity and quality in fitness
- Diversity measure
- Sum of inverse squared distances in genes
- Diversity rank: Avoids inadvertent bias
- Rank-space
- Sort on sum of diversity AND quality ranks
- Best: lower left (high diversity AND quality)
14 Rank-Space Method
- Diversity measured w.r.t. the highest-ranked (already selected) chromosome, (5 1)

Chromosome   Q   Q Rank   D       D Rank   Comb Rank   R-S Fitness
1 4          4   1        0.04    1        1           0.667
3 1          3   2        0.25    5        4           0.025
1 2          2   3        0.059   3        2           0.222
1 1          1   4        0.062   4        5           0.012
7 5          0   5        0.05    2        3           0.074

- Diversity rank breaks ties (see the sketch below)
- After others are selected, diversity sums the (inverse squared) distances to all of them
- Results: Average (Moat) 15 generations
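A sketch that reproduces the table: diversity is taken as the inverse squared distance to the already-selected chromosome (5 1), the quality and diversity ranks are summed, diversity rank breaks ties, and the rank-method probabilities (again p = 2/3) are assigned in combined-rank order:

```python
def rank_space_fitness(chromosomes, qualities, selected, p=2/3):
    n = len(chromosomes)

    def diversity(c):
        # Sum of inverse squared distances to already-selected chromosomes (smaller = more diverse)
        return sum(1.0 / sum((ci - si) ** 2 for ci, si in zip(c, s)) for s in selected)

    div = [diversity(c) for c in chromosomes]
    # Rank 1 = best: highest quality, lowest (most diverse) diversity measure
    q_rank = {i: r + 1 for r, i in enumerate(sorted(range(n), key=lambda i: -qualities[i]))}
    d_rank = {i: r + 1 for r, i in enumerate(sorted(range(n), key=lambda i: div[i]))}
    # Combined rank: sort on the sum of quality and diversity ranks; diversity rank breaks ties
    order = sorted(range(n), key=lambda i: (q_rank[i] + d_rank[i], d_rank[i]))
    # Assign the rank-method probabilities in combined-rank order
    probs, remaining = [], 1.0
    for _ in range(n - 1):
        probs.append(remaining * p)
        remaining *= (1 - p)
    probs.append(remaining)
    fitness = [0.0] * n
    for slot, i in enumerate(order):
        fitness[i] = probs[slot]
    return div, fitness

chroms = [(1, 4), (3, 1), (1, 2), (1, 1), (7, 5)]
quals = [4, 3, 2, 1, 0]
div, fit = rank_space_fitness(chroms, quals, selected=[(5, 1)])
print([round(d, 3) for d in div])   # [0.04, 0.25, 0.059, 0.062, 0.05]
print([round(f, 3) for f in fit])   # [0.667, 0.025, 0.222, 0.012, 0.074]
```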
15 Genetic Algorithms
- Evolution mechanisms as search technique
- Produce offspring with variation
- Mutation, Crossover
- Select fittest to continue to next generation
- Fitness: Probability of survival
- Standard: Quality values only
- Rank: Quality rank only
- Rank-space: Rank of sum of quality and diversity ranks
- Large population can be robust to local maxima
16 Machine Learning: Nearest Neighbor & Information Retrieval Search
- Artificial Intelligence
- CSPP 56553
- January 28, 2004
17 Agenda
- Machine learning: Introduction
- Nearest neighbor techniques
- Applications: Robotic motion, Credit rating
- Information retrieval search
- Efficient implementations
- k-d trees, parallelism
- Extensions: K-nearest neighbor
- Limitations
- Distance, dimensions, irrelevant attributes
18 Machine Learning
- Learning: Acquiring a function from inputs to values, based on past inputs and their values
- Learn concepts, classifications, values
- Identify regularities in data
19 Machine Learning Examples
- Pronunciation
- Spelling of word -> sounds
- Speech recognition
- Acoustic signals -> sentences
- Robot arm manipulation
- Target -> torques
- Credit rating
- Financial data -> loan qualification
20 Machine Learning Characterization
- Distinctions
- Are output values known for any inputs?
- Supervised vs unsupervised learning
- Supervised: training consists of inputs plus true output values
- E.g. letters plus pronunciation
- Unsupervised: training consists only of inputs
- E.g. letters only
- Course studies supervised methods
21 Machine Learning Characterization
- Distinctions
- Are output values discrete or continuous?
- Discrete: Classification
- E.g. Qualified/Unqualified for a loan application
- Continuous: Regression
- E.g. Torques for robot arm motion
- Characteristic of task
22 Machine Learning Characterization
- Distinctions
- What form of function is learned?
- Also called inductive bias
- Graphically, decision boundary
- E.g. Single, linear separator
- Rectangular boundaries - ID trees
- Voronoi spaces, etc.
23 Machine Learning Functions
- Problem: Can the representation effectively model the class to be learned?
- Motivates selection of learning algorithm
- For this function, a linear discriminant is GREAT! Rectangular boundaries (e.g. ID trees) are TERRIBLE!
- Pick the right representation!
24 Machine Learning Features
- Inputs
- E.g. words, acoustic measurements, financial data
- Vectors of features
- E.g. word as letters
- cat: L1 = c, L2 = a, L3 = t
- Financial data: F1 = late payments/yr (Integer)
- F2 = Ratio of income to expenses (Real)
25 Machine Learning Features
- Question
- Which features should be used?
- How should they relate to each other?
- Issue 1: How do we define relations in feature space if features have different scales?
- Solution: Scaling/normalization
- Issue 2: Which ones are important?
- If instances differ only in an irrelevant feature, it should be ignored
26 Complexity & Generalization
- Goal: Predict values accurately on new inputs
- Problem
- Train on sample data
- Can make arbitrarily complex model to fit
- BUT, will probably perform badly on NEW data
- Strategy
- Limit complexity of model (e.g. degree of equation)
- Split training and validation sets
- Hold out data to check for overfitting
27 Nearest Neighbor
- Memory- or case-based learning
- Supervised method: Training
- Record labeled instances and feature-value vectors
- For each new, unlabeled instance
- Identify nearest labeled instance
- Assign same label
- Consistency heuristic: Assume that a property is the same as that of the nearest reference case (a code sketch follows below)
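A minimal sketch of the procedure just described, using plain (unscaled) Euclidean distance and a couple of made-up cases; scaling and metric choices are discussed later:

```python
import math

def train(labeled_instances):
    # Training: just record the labeled (feature_vector, label) cases
    return list(labeled_instances)

def predict(memory, query):
    # Consistency heuristic: return the label of the nearest recorded case
    def distance(x, y):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    _, label = min(memory, key=lambda case: distance(case[0], query))
    return label

memory = train([((0.0, 1.2), "G"), ((25.0, 0.4), "P"), ((5.0, 0.7), "G")])
print(predict(memory, (6.0, 1.15)))   # label of the closest stored case
```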
28 Nearest Neighbor Example
- Problem: Robot arm motion
- Difficult to model analytically
- Kinematic equations
- Relate joint angles and manipulator positions
- Dynamics equations
- Relate motor torques to joint angles
- Difficult to achieve good results modeling robotic arms or the human arm
- Many factors and measurements
29 Nearest Neighbor Example
- Solution
- Move robot arm around
- Record parameters and trajectory segment
- Table: torques, positions, velocities, squared velocities, velocity products, accelerations
- To follow a new path
- Break into segments
- Find closest segments in table
- Get those torques (interpolate as necessary)
30 Nearest Neighbor Example
- Issue: Big table
- First time with a new trajectory
- Closest isn't close
- Table is sparse - few entries
- Solution: Practice
- As the trajectory is attempted, fill in more of the table
- After a few attempts, very close
31 Roadmap
- Problem
- Matching Topics and Documents
- Methods
- Classic: Vector Space Model
- Challenge I: Beyond literal matching
- Expansion Strategies
- Challenge II: Authoritative sources
- Page Rank
- Hubs & Authorities
32 Matching Topics and Documents
- Two main perspectives
- Pre-defined, fixed, finite topics
- Text Classification
- Arbitrary topics, typically defined by a statement of information need (aka query)
- Information Retrieval
33 Three Steps to IR
- Three phases
- Indexing: Build collection of document representations
- Query construction
- Convert query text to vector
- Retrieval
- Compute similarity between query and document representations
- Return closest matches
34 Matching Topics and Documents
- Documents are about some topic(s)
- Question: Evidence of aboutness?
- Words !!
- Possibly also meta-data in documents
- Tags, etc
- Model encodes how words capture topic
- E.g. Bag of words model, Boolean matching
- What information is captured?
- How is similarity computed?
35 Models for Retrieval and Classification
- Plethora of models are used
- Here
- Vector Space Model
36 Vector Space Information Retrieval
- Task
- Document collection
- Query specifies information need: free text
- Relevance judgments: 0/1 for all docs
- Word evidence: Bag of words
- No ordering information
37 Vector Space Model
- (Figure: documents plotted on axes Tv, Program, Computer)
- Two documents: "computer program", "tv program"
- Query "computer program" matches the 1st doc exactly (distances 2 vs 0)
- Query "educational program" matches both equally (distance 1)
38 Vector Space Model
- Represent documents and queries as
- Vectors of term-based features
- Features tied to occurrence of terms in the collection
- E.g.
- Solution 1: Binary features: t = 1 if term present, 0 otherwise
- Similarity: number of terms in common (see the sketch below)
- Dot product
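A small sketch of the binary-feature version: each document becomes a 0/1 vector over the collection vocabulary, and similarity is the dot product, i.e. the number of shared terms:

```python
def binary_vector(text, vocabulary):
    # 1 if the term is present in the text, 0 otherwise
    words = set(text.lower().split())
    return [1 if term in words else 0 for term in vocabulary]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

docs = ["computer program", "tv program"]
vocabulary = sorted({w for d in docs for w in d.lower().split()})
query = binary_vector("computer program", vocabulary)
print([dot(query, binary_vector(d, vocabulary)) for d in docs])   # [2, 1]
```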
39 Question
40 Vector Space Model II
- Problem: Not all terms are equally interesting
- E.g. "the" vs "dog" vs "Levow"
- Solution: Replace binary term features with weights
- Document collection: term-by-document matrix
- View as vector in multidimensional space
- Nearby vectors are related
- Normalize for vector length
41 Vector Similarity Computation
- Similarity: Dot product
- Normalization
- Normalize weights in advance
- Normalize post-hoc
42 Term Weighting
- "Aboutness"
- To what degree is this term what the document is about?
- Within-document measure
- Term frequency (tf): occurrences of term t in doc j
- "Specificity"
- How surprised are you to see this term?
- Collection frequency
- Inverse document frequency (idf); see the sketch below
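A sketch combining the two measures into tf-idf weights with length normalization; this particular variant (idf = log(N / df)) is one common choice, not necessarily the one assumed in the lecture:

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    # docs: list of token lists; returns one {term: weight} vector per document
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))      # document frequency
    vectors = []
    for doc in docs:
        tf = Counter(doc)                                        # term frequency within the doc
        vec = {t: tf[t] * math.log(n / df[t]) for t in tf}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})    # normalize for vector length
    return vectors

def cosine(u, v):
    # Dot product of length-normalized vectors
    return sum(w * v.get(t, 0.0) for t, w in u.items())

docs = [["computer", "program"], ["tv", "program"], ["educational", "program"]]
vecs = tf_idf_vectors(docs)
print(round(cosine(vecs[0], vecs[1]), 3))
# 0.0: "program" occurs in every document, so idf removes its contribution
```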
43 Term Selection & Formation
- Selection
- Some terms are truly useless
- Too frequent, no content
- E.g. the, a, and, ...
- Stop words: ignore such terms altogether
- Creation
- Too many surface forms for the same concepts
- E.g. inflections of words: verb conjugations, plurals
- Stem terms: treat all forms as the same underlying term
44 Key Issue
- All approaches operate on term matching
- If a synonym, rather than the original term, is used, the approach fails
- Develop more robust techniques
- Match concept rather than term
- Expansion approaches
- Add in related terms to enhance matching
- Mapping techniques
- Associate terms to concepts
- Aspect models, stemming
45 Expansion Techniques
- Can apply to query or document
- Thesaurus expansion
- Use linguistic resources (thesaurus, WordNet) to add synonyms/related terms
- Feedback expansion
- Add terms that "should have appeared"
- User interaction
- Direct or relevance feedback
- Automatic pseudo-relevance feedback
46 Query Refinement
- Typical queries are very short, ambiguous
- "Cat": animal / Unix command
- Add more terms to disambiguate, improve
- Relevance feedback
- Retrieve with original queries
- Present results
- Ask user to tag relevant/non-relevant
- Push query toward relevant vectors, away from non-relevant ones
- Rocchio expansion formula (see the sketch below): Q' = alpha*Q + (beta/r)*sum(relevant docs) - (gamma/s)*sum(non-relevant docs), typically (beta, gamma) = (0.75, 0.25); r relevant docs, s non-relevant docs
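A sketch of the Rocchio update written out; alpha = 1, beta = 0.75, gamma = 0.25 follow the values hinted at above, and the toy vectors are made up:

```python
def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.25):
    # query, relevant[i], non_relevant[i] are {term: weight} vectors
    terms = set(query)
    for d in relevant + non_relevant:
        terms |= set(d)
    r, s = len(relevant), len(non_relevant)
    new_query = {}
    for t in terms:
        rel = sum(d.get(t, 0.0) for d in relevant) / r if r else 0.0
        non = sum(d.get(t, 0.0) for d in non_relevant) / s if s else 0.0
        w = alpha * query.get(t, 0.0) + beta * rel - gamma * non
        new_query[t] = max(w, 0.0)          # negative weights are usually clipped to 0
    return new_query

q = {"cat": 1.0}
rel = [{"cat": 0.8, "animal": 0.6}]
non = [{"cat": 0.5, "unix": 0.9}]
print(rocchio(q, rel, non))   # boosts "animal", suppresses "unix"
```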
47 Compression Techniques
- Reduce surface term variation to concepts
- Stemming
- Map inflectional variants to a root
- E.g. see, sees, seen, saw -> see
- Crucial for highly inflected languages: Czech, Arabic
- Aspect models
- Matrix representations typically very sparse
- Reduce dimensionality to a small number of key aspects
- Map contextually similar terms together
- Latent semantic analysis
48 Authoritative Sources
- Based on the vector space model alone, what would you expect to get searching for "search engine"?
- Would you expect to get Google?
49 Issue
- Text isn't always the best indicator of content
- Example
- "search engine"
- Text search -> reviews of search engines
- Term doesn't appear on search engine pages
- Term probably appears on many pages that point to many search engines
50 Hubs & Authorities
- Not all sites are created equal
- Finding better sites
- Question: What defines a good site?
- Authoritative
- Not just content, but connections!
- One that many other sites think is good
- Site that is pointed to by many other sites
- Authority
51 Conferring Authority
- Authorities rarely link to each other
- Competition
- Hubs
- Relevant sites point to prominent sites on topic
- Often not prominent themselves
- Professional or amateur
- Good Hubs -> Good Authorities
52 Computing HITS
- Finding Hubs and Authorities
- Two steps
- Sampling
- Find potential authorities
- Weight-propagation
- Iteratively estimate best hubs and authorities
53 Sampling
- Identify potential hubs and authorities
- Connected subsections of the web
- Select root set with standard text query
- Construct base set
- All nodes pointed to by root set
- All nodes that point to root set
- Drop within-domain links
- 1000-5000 pages
54 Weight-propagation
- Weights
- Authority weight x
- Hub weight y
- All weights are relative
- Updating (see the sketch below)
- Converges
- Pages with high x are good authorities; pages with high y are good hubs
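A sketch of the weight-propagation loop on a made-up link graph: the authority weight x sums the hub weights of pages pointing in, the hub weight y sums the authority weights of pages pointed to, and both are renormalized each round:

```python
def hits(links, iterations=50):
    # links: {page: [pages it points to]}
    pages = set(links) | {p for targets in links.values() for p in targets}
    x = {p: 1.0 for p in pages}   # authority weights
    y = {p: 1.0 for p in pages}   # hub weights
    for _ in range(iterations):
        # Authority weight: sum of hub weights of pages pointing to the page
        x = {p: sum(y[q] for q, targets in links.items() if p in targets) for p in pages}
        # Hub weight: sum of authority weights of pages pointed to
        y = {p: sum(x[q] for q in links.get(p, [])) for p in pages}
        # Weights are only relative, so renormalize each round
        for w in (x, y):
            norm = sum(v * v for v in w.values()) ** 0.5 or 1.0
            for p in w:
                w[p] /= norm
    return x, y

toy = {"hub1": ["auth1", "auth2"], "hub2": ["auth1"], "auth1": [], "auth2": []}
authorities, hubs = hits(toy)
print(authorities, hubs)
```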
55 Google's PageRank
- Identifies authorities
- Important pages are those pointed to by many other pages
- Better pointers, higher rank
- Ranks search results
- PR(A) = (1 - d) + d * sum over pages t pointing to A of PR(t) / C(t)
- t: a page pointing to A; C(t): number of outbound links of t; d: damping factor
- Actual ranking on a logarithmic scale
- Iterate (see the sketch below)
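A sketch of the PageRank iteration on a made-up toy graph, using the formula above:

```python
def pagerank(links, d=0.85, iterations=50):
    # links: {page: [pages it points to]}
    pages = set(links) | {p for targets in links.values() for p in targets}
    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new_pr = {}
        for a in pages:
            # PR(A) = (1 - d) + d * sum over pages t linking to A of PR(t) / C(t)
            incoming = sum(pr[t] / len(targets)
                           for t, targets in links.items() if a in targets)
            new_pr[a] = (1 - d) + d * incoming
        pr = new_pr
    return pr

toy = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
print(pagerank(toy))
```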
56 Contrasts
- Internal links
- Large sites carry more weight
- If well-designed
- Hubs & Authorities ignores site-internal links
- Outbound links explicitly penalized
- Lots of tweaks.
57 Web Search
- Search by content
- Vector space model
- Word-based representation
- Aboutness and Surprise
- Enhancing matches
- Simple learning model
- Search by structure
- Authorities identified by link structure of web
- Hubs confer authority
58 Nearest Neighbor Example II
- Credit Rating
- Classifier: Good / Poor
- Features
- L: late payments/yr
- R: Income/Expenses
Name L R G/P
A 0 1.2 G
B 25 0.4 P
C 5 0.7 G
D 20 0.8 P
E 30 0.85 P
F 11 1.2 G
G 7 1.15 G
H 15 0.8 P
59 Nearest Neighbor Example II

Name  L   R     G/P
A     0   1.2   G
B     25  0.4   P
C     5   0.7   G
D     20  0.8   P
E     30  0.85  P
F     11  1.2   G
G     7   1.15  G
H     15  0.8   P

(Figure: instances A-H plotted in the L-R plane; L axis 0-30, R axis up to about 1.2)
60 Nearest Neighbor Example II

Name  L   R     G/P
I     6   1.15  ?
J     22  0.45  ?
K     15  1.2   ?

(Figure: new instances I, J, K plotted among A-H in the L-R plane)
- Distance Measure: sqrt((L1 - L2)^2 + (sqrt(10) * (R1 - R2))^2)
- Scaled distance (see the sketch below)
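A sketch applying the scaled distance to the table from the previous slides to label the new applicants I, J and K; the nearest stored case supplies the prediction:

```python
import math

# Training cases from the credit-rating table: (name, late payments/yr, income/expense ratio, label)
cases = [("A", 0, 1.2, "G"), ("B", 25, 0.4, "P"), ("C", 5, 0.7, "G"), ("D", 20, 0.8, "P"),
         ("E", 30, 0.85, "P"), ("F", 11, 1.2, "G"), ("G", 7, 1.15, "G"), ("H", 15, 0.8, "P")]

def scaled_distance(l1, r1, l2, r2):
    # sqrt((L1 - L2)^2 + (sqrt(10) * (R1 - R2))^2): R is rescaled so both features matter
    return math.sqrt((l1 - l2) ** 2 + (math.sqrt(10) * (r1 - r2)) ** 2)

def classify(l, r):
    nearest = min(cases, key=lambda c: scaled_distance(l, r, c[1], c[2]))
    return nearest[0], nearest[3]          # (nearest neighbor's name, its label)

for name, l, r in [("I", 6, 1.15), ("J", 22, 0.45), ("K", 15, 1.2)]:
    print(name, classify(l, r))
```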
61 Efficient Implementations
- Classification cost
- Find nearest neighbor: O(n)
- Compute distance between unknown and all instances
- Compare distances
- Problematic for large data sets
- Alternative
- Use binary search to reduce to O(log n)
62 Efficient Implementation: K-D Trees
- Divide instances into sets based on features
- Binary branching: E.g. > value
- 2^d leaves with d splits; path length d = O(log n)
- To split cases into sets (see the sketch below):
- If there is one element in the set, stop
- Otherwise pick a feature to split on
- Find average position of the two middle objects on that dimension
- Split remaining objects based on average position
- Recursively split subsets
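A sketch of this construction (the feature to split on is chosen round-robin here, which is an assumption; the slide leaves the choice open), plus the classification walk illustrated on the next slide:

```python
def build_kd_tree(instances, depth=0):
    # instances: list of (feature_vector, label) pairs
    if len(instances) <= 1:
        return {"leaf": instances}                       # one element (or none): stop
    feature = depth % len(instances[0][0])               # pick a feature to split on (round-robin)
    ordered = sorted(instances, key=lambda inst: inst[0][feature])
    mid = len(ordered) // 2
    # Split at the average position of the two middle objects on that dimension
    threshold = (ordered[mid - 1][0][feature] + ordered[mid][0][feature]) / 2.0
    left = [inst for inst in ordered if inst[0][feature] <= threshold]
    right = [inst for inst in ordered if inst[0][feature] > threshold]
    if not left or not right:                            # all values equal: stop splitting
        return {"leaf": instances}
    return {"feature": feature, "threshold": threshold,
            "left": build_kd_tree(left, depth + 1),
            "right": build_kd_tree(right, depth + 1)}

def classify(tree, vector):
    # Walk the Yes/No tests down to a leaf and return its label(s)
    while "leaf" not in tree:
        branch = "left" if vector[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return [label for _, label in tree["leaf"]]

data = [((0, 1.2), "G"), ((25, 0.4), "P"), ((5, 0.7), "G"), ((20, 0.8), "P"),
        ((30, 0.85), "P"), ((11, 1.2), "G"), ((7, 1.15), "G"), ((15, 0.8), "P")]
tree = build_kd_tree(data)
print(classify(tree, (6, 1.15)))
```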
63 K-D Trees: Classification
(Figure: a decision tree of Yes/No feature tests whose leaves are labeled Good or Poor)
64 Efficient Implementation: Parallel Hardware
- Classification cost
- Distance computations
- Constant time with O(n) processors
- Cost of finding closest
- Compute pairwise minimums, successively
- O(log n) time
65 Nearest Neighbor Issues
- Prediction can be expensive if many features
- Affected by classification, feature noise
- One entry can change prediction
- Definition of distance metric
- How to combine different features
- Different types, ranges of values
- Sensitive to feature selection
66 Nearest Neighbor Analysis
- Problem
- Ambiguous labeling, training noise
- Solution
- K-nearest neighbors
- Not just the single nearest instance
- Compare to K nearest neighbors
- Label according to majority of the K
- What should K be?
- Often 3; can also be tuned by training
67 Nearest Neighbor Analysis
- Issue
- What is a good distance metric?
- How should features be combined?
- Strategy
- (Typically weighted) Euclidean distance
- Feature scaling: Normalization
- Good starting point: (Feature - Feature_mean) / Feature_standard_deviation
- Rescales all values: centered on 0 with std dev 1 (see the sketch below)
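A tiny sketch of that starting point, rescaling each feature column to mean 0 and standard deviation 1:

```python
import statistics

def normalize_columns(rows):
    # rows: list of feature vectors; each column is rescaled to mean 0, std dev 1
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(value - m) / s for value, m, s in zip(row, means, stds)] for row in rows]

print(normalize_columns([[0, 1.2], [25, 0.4], [5, 0.7]]))
```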
68 Nearest Neighbor Analysis
- Issue
- What features should we use?
- E.g. Credit rating: Many possible features
- Tax bracket, debt burden, retirement savings, etc.
- Nearest neighbor uses ALL of them
- Irrelevant feature(s) could mislead
- Fundamental problem with nearest neighbor
69 Nearest Neighbor Advantages
- Fast training
- Just record feature vector - output value pairs
- Can model a wide variety of functions
- Complex decision boundaries
- Weak inductive bias
- Very generally applicable
70 Summary
- Machine learning
- Acquire function from input features to value
- Based on prior training instances
- Supervised vs Unsupervised learning
- Classification and Regression
- Inductive bias
- Representation of function to learn
- Complexity, Generalization, Validation
71 Summary: Nearest Neighbor
- Nearest neighbor
- Training: record input vectors and output values
- Prediction: closest training instance to the new data
- Efficient implementations
- Pros: fast training, very general, little bias
- Cons: distance metric (scaling), sensitivity to noise and extraneous features