Title: Collective Specialization for Evolutionary Design of Agent Controllers
1 Collective Specialization for Evolutionary Design of Agent Controllers
- A.E. Eiben, G.S. Nitschke, M.C. Schut
- Computational Intelligence Group, Department of Computer Science, Faculty of Sciences
- De Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands
- gusz_at_cs.vu.nl, nitschke_at_cs.vu.nl, schut_at_cs.vu.nl
2 Introduction
- Challenge: Developing agent controllers for a group of simulated aerial explorers.
- Task: Search-and-find (collective gathering) missions.
- 1st goal: Demonstrate that specialization (behavior exhibited at the agent group level) is beneficial for this task.
- 2nd goal: Use neuro-evolution as an adaptive mechanism to derive specialization (to increase task performance).
- Compare: Heuristic benchmark methods and other adaptive methods with respect to specialized behavior in the given task.
3 Background
- Collective behavior systems: Systems in which behavior or morphology (or both) is specialized gain efficiency and performance in many collective behavior tasks.
- Neuro-evolution: An approach that combines neural networks and evolutionary computation techniques.
- Neuro-evolution collective behavior applications: RoboCup, multi-agent computer games, and multi-robot system controllers.
4 Hypotheses
- Hypothesis 1: There exist particular types of environments where specialization increases task performance.
- Hypothesis 2: The collective specialization neuro-evolution method is appropriate for deriving specialized groups that yield a high task performance.
5 Adaptive Methods: Conventional Neuro-Evolution
- One genotype population (evolved offline).
- Genotype population: 1000 neurons.
- Each neuron tested in a complete controller (with 9 other neurons).
- All neurons evaluated before recombination.
- Best 20% of neurons selected (each cloned 5 times) to replace the population.
- End of 100 test scenarios: Best 20% of neurons selected as controllers (cloned 5 times) to run in the full simulation.
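The offline selection-and-cloning step above can be sketched as follows. This is a minimal illustration, not the authors' implementation; genotypes and fitness values are placeholders.

```python
import random

# Sketch of offline elitist selection: keep the fittest 20% of a
# 1000-genotype population and clone each survivor 5 times, so the
# elite exactly rebuilds the population (0.2 x 5 = 1.0).
POP_SIZE = 1000
ELITE_FRACTION = 0.2
CLONES_PER_ELITE = 5

def select_and_clone(population, fitness):
    """Rank by fitness, keep the elite 20%, clone each 5 times."""
    ranked = sorted(population, key=fitness, reverse=True)
    elite = ranked[: int(len(population) * ELITE_FRACTION)]
    return [genotype for genotype in elite for _ in range(CLONES_PER_ELITE)]

population = [random.random() for _ in range(POP_SIZE)]  # toy genotypes
new_population = select_and_clone(population, fitness=lambda g: g)
```

Because the elite fraction times the clone count equals 1, the replacement population has exactly the original size.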
6 Adaptive Methods: Collective Neuro-Evolution
- N genotype populations (evolved online).
- 100 genotype populations of 100 neurons each.
- 10 neurons randomly selected from the best 20% of a population → controller.
- Mutation (Cauchy distribution) applied to the best 20%, each cloned 5 times to replace the current population.
- Agent controllers updated when an agent receives a fitness reward.
7 Experimental Setup
- Agent group: 100 explorers, 1 Lander (recharge station).
- Environment: Discrete, 200 x 200 x 200 voxels, 40000 red rocks.
- Task: Maximize the number of features of interest (red rocks) discovered, given U energy units and T time steps to complete the task.
8 Experimental Setup (cont.)
- Non-adaptive experiment set:
  - Specialized agent groups
  - Non-specialized agent groups
- Adaptive experiment set:
  - Conventional neuro-evolution
  - Collective neuro-evolution
9 Experimental Setup (cont.)
- Test environments: 10 red rock distributions (ranging from low to high structure).
- 4 clusters (fixed positions) with variable radius r, generated with a Gaussian mixture model data generator (Paalanen and Kälviäinen, 2006).
- Degree of structure scale: 1 = unstructured, 10 = structured.
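A rough sketch of such a clustered rock distribution is shown below. This is an illustrative stand-in, not the cited Paalanen and Kälviäinen generator; the cluster centers and the use of radius r as a Gaussian standard deviation are assumptions.

```python
import random

# Illustrative 4-cluster Gaussian mixture placement of red rocks on the
# 200 x 200 ground plane. Smaller radius -> tighter clusters -> higher
# degree of structure. Center positions are assumed, not from the paper.
GRID = 200
N_ROCKS = 40000
CENTERS = [(50, 50), (50, 150), (150, 50), (150, 150)]  # fixed positions

def place_rocks(radius, n=N_ROCKS):
    """Sample each rock from a uniformly chosen cluster, clipped to grid."""
    rocks = []
    for _ in range(n):
        cx, cy = random.choice(CENTERS)
        x = min(GRID - 1, max(0, int(random.gauss(cx, radius))))
        y = min(GRID - 1, max(0, int(random.gauss(cy, radius))))
        rocks.append((x, y))
    return rocks

structured = place_rocks(radius=10)    # tight clusters: high structure
unstructured = place_rocks(radius=80)  # wide spread: low structure
```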
10 Agent Design: Sensors and Actuators
- Agent types are classified according to their probabilities of selecting one of four actions: a) Detect (sensor), b) Evaluate (sensor), c) Move (actuator), d) Communicate (actuator), where P(a) + P(b) + P(c) + P(d) = 1.0.
- Battery: 1000 units. Cost: 1 unit per sensor or actuator use.
- Energy rewards (adaptive and non-adaptive experiments): Equal to the number of red rocks discovered x a scalar (given as a recharge at the Lander).
- Fitness rewards (adaptive experiments): Equal to the number of red rocks discovered (used by the evolutionary selection process).
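The agent-type scheme above can be sketched as a probability vector over the four actions, sampled each step, with every action costing one battery unit. This is a minimal sketch; the probability values shown are illustrative, not a type from the paper.

```python
import random

# Sketch of an agent type as an action-probability vector that sums to
# 1.0 over (detect, evaluate, move, communicate); each sensor or
# actuator use costs 1 of the 1000 battery units.
ACTIONS = ("detect", "evaluate", "move", "communicate")

class Agent:
    def __init__(self, action_probs, battery=1000):
        assert abs(sum(action_probs) - 1.0) < 1e-9  # must sum to 1.0
        self.action_probs = action_probs
        self.battery = battery

    def step(self):
        """Pick one action by its probability; spend one energy unit."""
        action = random.choices(ACTIONS, weights=self.action_probs)[0]
        self.battery -= 1
        return action

explorer = Agent(action_probs=(0.4, 0.2, 0.3, 0.1))  # detect-biased type
chosen = explorer.step()
```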
11 Adaptive Methods: Agent Neural Controller
- Inputs 0..6, non-visual neurons: Nodes 1-4: previous motor output values (MO0 - MO3). Node 5: ambient wind. Node 6: ambient light. Node 7: previous red rock evaluation.
- Inputs 7..55, visual neurons: Detection sensor resolution is encoded in the genotype. Each visual neuron represents a voxel on the ground (z) plane. Higher resolution → lower probability of detection success.
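The resolution/accuracy trade-off for the visual neurons can be sketched as below. The inverse relation used here is an assumption for illustration; the slides state only that higher resolution lowers the probability of detection success, without giving the exact function.

```python
# Sketch of the visual-sensor trade-off: inputs 7..55 give up to 49
# visual neurons, one per ground-plane voxel. The 1/resolution form
# is an assumed placeholder for the (unspecified) decreasing function.
MAX_VISUAL_NEURONS = 49

def detection_probability(resolution, base=1.0):
    """Illustrative: per-voxel success probability falls as resolution rises."""
    assert 1 <= resolution <= MAX_VISUAL_NEURONS
    return base / resolution
```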
12 Non-Adaptive Methods: Defining Agent Types
- At the individual level: Specialized agent types.
- Action preferences were set a priori and remained static throughout a simulation run.
13 Non-Adaptive Methods: Non-Specialized Groups
- At the group level: Specialized group types.
- At the group level: Non-specialized group types.
14 Results
- Adaptive methods: Neuro-evolution comparison.
- Non-adaptive methods: Benchmarks.
15 Results (cont.)
- Results from heuristic and neuro-evolution methods conformed to normal distributions (Kolmogorov-Smirnov test applied).
- T-test applied to specialized versus non-specialized heuristic results: The null hypothesis (the two data sets are not significantly different) was rejected. This supported the 1st hypothesis: specialization is advantageous (increases task performance) in certain environment types.
- T-test applied to comparative neuro-evolution results: The null hypothesis was rejected. This partially supported the 2nd hypothesis: collective neuro-evolution yields a (relatively) higher task performance.
- Further supporting the 2nd hypothesis: Compared the best performing specialized heuristic method with the specialization (composition of agent types in the group) derived by the collective neuro-evolution method.
16 Results (cont.)
- Group composition: best performing specialized non-adaptive group.
- Group composition: best performing evolved groups.
- RRVG: Red Rock Value Gathered.
- DoS: Degree of Structure (test environment type).
17 Conclusions
- Supporting the 1st hypothesis: Heuristic pre-designed specialization methods showed that specialization was beneficial in a range of test environments (defined by different resource distributions).
- Supporting the 2nd hypothesis: The collective neuro-evolution method yielded a higher performance (in all test environments) compared to a conventional neuro-evolution method.
- Collective neuro-evolution: The best performing group converged to a specialized group composition (the majority of agents assumed one role). This resembled the group composition of the highest performing pre-designed (heuristic) specialized group.
19 Experimental Setup (Adaptive Experiments)
- Collective (online) neuro-evolution:
  - Generations: An evolutionary operator is applied to an agent controller each time a fitness reward is received.
- Conventional (offline) neuro-evolution:
  - Offline testing phase: n test scenarios (250 iterations each) testing n genotypes.
  - End of testing phase: The best 20% of genotypes are selected, 5 clones of each made, and the corresponding phenotypes placed into the task environment.
  - Generations: At the end of a test scenario, the fittest 20% (elite portion) of genotypes were recombined and mutated, producing 5 offspring each, so as to replace the entire genotype population. The next test scenario then began with the next generation of genotypes.