Title: Further Cognitive Systems
1. Further Cognitive Systems
- Learning
- Environmental interaction
- Artificial cognition?
- Current cognitive systems
- Science fiction vs. fact
- Architectures
- Perception, Representation, Reasoning, Learning, Action
- Learning Cognitive Systems
- Problems in LCS
- Advances in LCS
5. Environmental Interaction
- Perceive: receive input from the environment
- Represent: environment, agent
- Reason: about environment and self
- Learn: about environment and self
- Action: act within the environment
6. Coupled Interaction
- Agents actively interact in a closed loop with the environment and other agents
7. Cybernetics
- First order
- (Wiener 1948)
- Inspired by control theory and dynamical systems
- Simple feedback control systems are the prime theoretical tool
- (e.g. thermostat controls room temperature)
- Second order
- (revival: Port and van Gelder)
- The agent and environment that constitute the meta-cybernetic system are inseparable; second-order cybernetics concerns itself with the results of their interaction
- (e.g. room affects thermostat)
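The thermostat example can be sketched as a small closed loop. This is only an illustrative toy, not a model from the slides: the function names, the dead band, and the heating and leak constants are all assumptions. The controller reads the room (room affects thermostat) and switches the heater (thermostat affects room).

```python
def thermostat_step(room_temp, setpoint, heater_on, band=0.5):
    """On/off feedback rule with a small dead band (band is an assumed value)."""
    if room_temp < setpoint - band:
        return True           # too cold: switch the heater on
    if room_temp > setpoint + band:
        return False          # too warm: switch the heater off
    return heater_on          # inside the band: keep the current state


def simulate(setpoint=20.0, start_temp=15.0, outside=10.0, steps=200):
    """Toy room: the heater adds heat, the room leaks heat to the outside."""
    temp, heater_on, history = start_temp, False, []
    for _ in range(steps):
        # Room affects thermostat: the controller senses the temperature.
        heater_on = thermostat_step(temp, setpoint, heater_on)
        # Thermostat affects room: heating minus leakage (assumed constants).
        heating = 1.0 if heater_on else 0.0
        leak = 0.05 * (temp - outside)
        temp += heating - leak
        history.append(temp)
    return history
```

Calling `simulate()` returns the temperature trace; under these assumed constants the room climbs from the start temperature and then oscillates in a narrow range around the setpoint.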
8. Blind Search
- No single optimum search algorithm exists for blind search
- Uninformed or blind search is performed in state spaces where operators have no costs; informed search is performed in search spaces where operators have costs and it makes sense to talk about the optimality of a search algorithm
- (Ici 2001)
9. No Free Lunch
- The vast majority of industrial cases fall outside the No Free Lunch (NFL) theorem
- Inclusion of domain knowledge (therefore not blind search)
- Co-adaptation algorithms that do not search for optimum populations
- Domain-specific algorithms
- Non-infinite populations
- Repetition (resampling) is ignored in NFL, but it is an important consideration in industry.
- Representation style is important for specific domains, e.g., Gray can outperform binary encoding.
- (Whitley 97)
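A minimal sketch of the representation point, assuming the standard reflected Gray code (the helper names are illustrative). Adjacent integers always differ by one bit in Gray code, whereas in plain binary a step such as 7 to 8 flips four bits; this is one commonly cited reason a bit-flipping search can behave differently under the two encodings.

```python
def binary_to_gray(n: int) -> int:
    """Reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def gray_to_binary(g: int) -> int:
    """Invert binary_to_gray by XOR-ing together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


# Stepping from 7 to 8: four bit flips in binary, one in Gray code.
binary_flips = hamming(7, 8)
gray_flips = hamming(binary_to_gray(7), binary_to_gray(8))
```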
10. Perception
- Perception through sensors (senses in humans) is
required in order to interact with the
environment.
11. Perception
- Classify sensors into two important functional areas
- Proprioceptive
- Sensor measures values internal to the system
- e.g. motor speed, wheel load, joint angles, battery voltage
- Exteroceptive
- Sensor acquires information from the environment
- e.g. distance measurements, light intensity, sound amplitude
- Exteroceptive sensor measurements can be used to extract meaningful environmental features.
12. Perception
- Robot sensors
- Adapted from Siegwart and Nourbakhsh, Introduction to Autonomous Mobile Robots, MIT Press, 2004
13. Perception
- What about smell, taste and ESP?
- Olfaction (smell) may be used to sense molecular stimuli (machine olfaction).
- Develop neuronal models of the olfactory pathway that are driven by real-world chemosensors as a test-bed for biologically inspired signal processing architectures.
- Works with conducting polymer, optical, and metal oxide semiconductor chemosensor devices.
- Dr Tim Pearce, University of Leicester.
14. System Response
- Response of the system to an agent's action
- Agent will be updated.
- Environment will be updated.
- Reward may back up the agent's action
- (Stimulus, Ultimate or Continuous).
15. System Response
- Stimulus response: the learning system responds immediately to the input with an output.
Kaz Kawamura: "Hand me a yellow object"
16. System Response
- Ultimate response: the learning system may require more than one input before an output is reached.
Jeff Krichmar, The Neurosciences Institute
17. System Response
- Continuous response: the learning system responds with an output to each input in order to reach an ultimate goal.
18. Supervision
- 3 types of supervision (Barto 90)
- Supervised learning: the environment contains a teacher that (directly or indirectly) provides the correct response for certain environmental states as a training signal for the learning system.
- Unsupervised learning: the learning system has an internally defined teacher with a prescribed goal that does not need utility feedback of any kind.
- Reinforcement learning: the environment does not directly indicate what the correct response should have been. Instead, it only provides reward or punishment to indicate the utility of actions that were actually taken by the system.
19. Bootstrapping: update estimates based on other estimates
- Backup means to back up the action made
- Update may mean no change in value
- Reinforcement can be positive or negative
Adapted from Sutton and Barto 98
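One common instance of bootstrapping is the one-step temporal-difference, TD(0), update, sketched below. The slide does not name a specific algorithm, so the choice of TD(0), the function name, the state labels, and the step-size and discount values are all assumptions for illustration.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """Move values[state] toward reward + gamma * values[next_state].

    The target is built from another estimate (values[next_state]),
    which is what makes the update a bootstrap.
    """
    target = reward + gamma * values[next_state]
    values[state] += alpha * (target - values[state])
    return values[state]


# Hypothetical two-state example.
values = {"A": 0.0, "B": 1.0}
raised = td0_update(values, "A", reward=0.0, next_state="B")

# An update may leave the value unchanged when the estimate already matches the target.
settled = {"A": 0.9, "B": 1.0}
unchanged = td0_update(settled, "A", reward=0.0, next_state="B")

# Negative reinforcement pulls the estimate down.
punished = {"A": 0.0, "B": 0.0}
lowered = td0_update(punished, "A", reward=-1.0, next_state="B")
```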
20. Further Cognitive Systems
- Definition of return: episodic, continuing, discounted, etc.
- Action values vs. state values vs. afterstate values
- Action selection/exploration: ε-greedy, softmax, more sophisticated methods
- Synchronous vs. asynchronous
- Replacing vs. accumulating traces
- Real vs. simulated experience
- Location of backups (search control)
- Timing of backups: part of selecting actions or only afterward?
- Memory for backups: how long should backed-up values be retained?
Adapted from Sutton and Barto
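The two named action-selection rules can be sketched as follows. This is a minimal illustration, assuming action values are held in a plain list indexed by action; the function names, `epsilon`, and `temperature` defaults are assumptions, not values from the slides.

```python
import math
import random


def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """With probability epsilon explore uniformly, otherwise pick the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def softmax_probabilities(q_values, temperature=1.0):
    """Boltzmann distribution over actions; higher temperature means more exploration."""
    top = max(q_values)  # subtract the maximum for numerical stability
    weights = [math.exp((q - top) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def softmax_select(q_values, temperature=1.0, rng=random):
    """Sample one action index according to the softmax probabilities."""
    probs = softmax_probabilities(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

With `epsilon=0.0` the first rule is purely greedy. Softmax differs in that exploratory choices are weighted by estimated value rather than drawn uniformly.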