Autonomous Mobile Robots CPE 470/670

1
Autonomous Mobile Robots CPE 470/670
  • Lecture 13
  • Instructor: Monica Nicolescu

2
Review
  • Hybrid control
  • Selection, Advising, Adaptation, Postponing
  • AuRA, Atlantis, Planner-Reactor, PRS, many others
  • Adaptive behavior
  • Adaptation vs. learning
  • Challenges
  • Types of learning algorithms

3
Learning Methods
  • Reinforcement learning
  • Neural network (connectionist) learning
  • Evolutionary learning
  • Learning from experience
  • Memory-based
  • Case-based
  • Learning from demonstration
  • Inductive learning
  • Explanation-based learning
  • Multistrategy learning

4
Reinforcement Learning (RL)
  • Motivated by psychology (the Law of Effect,
    Thorndike 1911)
  • Applying a reward immediately after the
    occurrence of a response increases its
    probability of reoccurring, while providing
    punishment after the response will decrease the
    probability
  • One of the most widely used methods for
    adaptation in robotics

5
Reinforcement Learning
  • Combinations of stimuli (i.e., sensory readings
    and/or state) and responses (i.e.,
    actions/behaviors) are given positive/negative
    reward in order to increase/decrease their
    probability of future use
  • Desirable outcomes are strengthened and
    undesirable outcomes are weakened
  • A critic evaluates the system's response and
    applies reinforcement
  • External: the user provides the reinforcement
  • Internal: the system itself provides the
    reinforcement (reward function)

6
Decision Policy
  • The robot can observe the state of the
    environment
  • The robot has a set of actions it can perform
  • Policy: a state/action mapping that determines
    which actions to take
  • Reinforcement is applied based on the results of
    the actions taken
  • Utility: the function that gives a utility value
    to each state
  • Goal: learn an optimal policy that chooses the
    best action for every set of possible inputs
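The policy and utility notions above fit in a few lines of Python. The states, actions, utility values, and transition model below are invented for illustration (they are not from the lecture):

```python
# Utility: a value assigned to each state (invented example values).
utility = {"at-wall": -1.0, "open-space": 0.5, "at-goal": 10.0}

# Hypothetical one-step model of where each action leads from each state.
transitions = {
    "open-space": {"move-forward": "at-goal", "turn-left": "at-wall"},
    "at-wall":    {"move-forward": "at-wall", "turn-left": "open-space"},
}

def greedy_action(state):
    """A policy: map the observed state to the action whose successor
    state has the highest utility."""
    return max(transitions[state], key=lambda a: utility[transitions[state][a]])
```

Learning an optimal policy amounts to filling in utility values like these from experience rather than by hand.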

7
Unsupervised Learning
  • RL is an unsupervised learning method
  • No target goal state
  • Feedback only provides information on the
    quality of the system's response
  • Simple: binary fail/pass
  • Complex: numerical evaluation
  • Through RL a robot learns on its own, using its
    own experiences and the feedback received
  • The robot is never told what to do

8
Challenges of RL
  • Credit assignment problem
  • When something good or bad happens, what exact
    state/condition-action/behavior should be
    rewarded or punished?
  • Learning from delayed rewards
  • It may take a long sequence of actions that
    receive insignificant reinforcement to finally
    arrive at a state with high reinforcement
  • How can the robot learn from reward received at
    some time in the future?
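The standard way to make such future rewards comparable to immediate ones is a discounted sum; a minimal sketch (the discount factor and the reward sequences are illustrative):

```python
def discounted_return(rewards, gamma=0.9):
    """Value of a reward sequence: each reward is discounted by gamma
    once for every step it lies in the future."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))
```

With a discount factor below 1, a reward arriving three steps from now is worth less than the same reward arriving immediately, which is exactly what lets the robot trade off delayed against immediate reinforcement.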

9
Challenges of RL
  • Exploration vs. exploitation
  • Explore unknown states/actions or exploit
    states/actions already known to yield high
    rewards
  • Partially observable states
  • In practice, sensors provide only partial
    information about the state
  • Choose actions that improve observability of
    environment
  • Life-long learning
  • In many situations it may be required that robots
    learn several tasks within the same environment

10
Types of RL Algorithms
  • Adaptive Heuristic Critic (AHC)
  • Learning the policy is separate from learning
    the utility function the critic uses for
    evaluation
  • Idea: try different actions in different states
    and observe the outcomes over time

11
Q-Learning
  • Watkins, 1980s
  • A single utility function, the Q-function, is
    learned to evaluate both actions and states
  • Q values are stored in a table
  • Updated at each step, using the following rule
  • Q(x,a) ← Q(x,a) + α (r + γ E(y) − Q(x,a))
  • x: state, a: action, α: learning rate, r:
    reward, γ: discount factor in (0,1)
  • E(y) is the utility of the successor state y:
    E(y) = max(Q(y,a)) over actions a
  • Guaranteed to converge to the optimal solution,
    given infinite trials
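The update rule above can be exercised on a toy problem. The sketch below assumes a hypothetical 4-state corridor with a goal at one end; the state space, actions, and parameter values are invented for illustration, not taken from the lecture:

```python
import random

def step(s, a):
    """Hypothetical 4-state corridor: moving right from state 2 reaches
    the goal (reward 1.0, episode ends); all other moves give no reward."""
    s2 = min(s + 1, 3) if a == "right" else max(s - 1, 0)
    if s2 == 3:
        return None, 1.0   # terminal state reached
    return s2, 0.0

def q_learning(n_states, actions, step, episodes=500, alpha=0.5, gamma=0.9, eps=0.1):
    """Tabular Q-learning: Q(x,a) <- Q(x,a) + alpha*(r + gamma*E(y) - Q(x,a)),
    where E(y) = max over actions a of Q(y,a)."""
    Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    for _ in range(episodes):
        s = 0
        while s is not None:
            # epsilon-greedy: mostly exploit the table, sometimes explore
            if random.random() < eps:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda act: Q[(s, act)])
            y, r = step(s, a)
            e_y = 0.0 if y is None else max(Q[(y, b)] for b in actions)
            Q[(s, a)] += alpha * (r + gamma * e_y - Q[(s, a)])
            s = y
    return Q

random.seed(0)
Q = q_learning(4, ["left", "right"], step)
```

After training, the table prefers "right" everywhere, i.e., the greedy policy extracted from Q drives straight to the goal.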

12
Learning to Walk
  • Maes, Brooks (1990)
  • Genghis hexapod robot
  • Learned stable tripod stance and tripod gait
  • Rule-based subsumption controller
  • Two sensor modalities for feedback
  • Two touch sensors to detect hitting the floor
    (negative feedback)
  • Trailing wheel to measure progress (positive
    feedback)

13
Learning to Walk
  • Nate Kohl & Peter Stone (2004)

14
Learning to Push
  • Mahadevan & Connell (1991)
  • Obelix: 8 ultrasonic sensors, 1 IR sensor,
    motor current
  • Learned how to push a box (Q-learning)
  • Motor outputs grouped into 5 choices: move
    forward, turn left or right (22 degrees), sharp
    turn left/right (45 degrees)
  • 250,000 states

15
Supervised Learning
  • Supervised learning requires the user to give the
    exact solution to the robot in the form of the
    error direction and magnitude
  • The user must know the exact desired behavior for
    each situation
  • Supervised learning involves training, which can
    be very slow: the user must supervise the system
    with numerous examples

16
Neural Networks
  • One of the most widely used supervised learning
    methods
  • Used for approximating real-valued and
    vector-valued target functions
  • Inspired by biology: learning systems are built
    from complex networks of interconnected neurons
  • The goal is to minimize the error between the
    network output and the desired output
  • This is achieved by adjusting the weights on the
    network connections

17
Training Neural Networks
  • Hebbian learning
  • Increases synaptic strength along neural pathways
    associated with a stimulus and a correct response
  • Perceptron learning
  • Delta Rule for networks without hidden layers
  • Back-propagation for multi-layer networks

18
Perceptron Learning
  • Repeat
  • Present an example from a set of positive and
    negative learning experiences
  • Verify the output of the network as to whether it
    is correct or incorrect
  • If it is incorrect, supply the correct output at
    the output unit
  • Adjust the synaptic weights of the perceptrons in
    a manner that reduces the error between the
    observed output and the correct output
  • Until satisfactory performance (convergence or
    stopping condition is met)
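The loop above translates almost directly into code. A minimal sketch, assuming a two-input perceptron with a step activation trained on the logical AND function (an invented training set, not from the lecture):

```python
def train_perceptron(examples, epochs=20, lr=0.1):
    """Repeatedly present each example, compare the step-function output
    with the correct label, and nudge the weights to reduce the error."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in examples:
            out = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
            err = target - out                       # zero when correct
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Learn logical AND, a linearly separable example.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND)
```

For linearly separable data like this, the perceptron convergence theorem guarantees the loop reaches zero error in finitely many updates; without hidden layers it cannot learn non-separable functions such as XOR.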

19
ALVINN
  • ALVINN (Autonomous Land Vehicle in a Neural
    Network)
  • Dean Pomerleau (1991)
  • Pittsburgh to San Diego: 98.2% autonomous

20
Learning from Demonstration RL
  • S. Schaal (1997)
  • Pole balancing, pendulum-swing-up

21
Learning from Demonstration
  • Inspiration: human-like teaching by
    demonstration

Demonstration
Robot performance
22
Learning to Slalom
Demonstration
Robot performance
23
Learning from Robot Teachers
  • Transfer of task knowledge from humans to robots

Human demonstration
Robot performance
24
Classical Conditioning
  • Pavlov 1927
  • Assumes that unconditioned stimuli (e.g. food)
    automatically generate an unconditioned response
    (e.g., salivation)
  • Conditioned stimulus (e.g., ringing a bell) can,
    over time, become associated with the
    unconditioned response
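One common way to model this associative strengthening is a Rescorla-Wagner-style update, which is not mentioned in the lecture but illustrates the idea: each bell+food pairing moves the association strength a fixed fraction of the way toward its ceiling (the rate and ceiling values below are illustrative):

```python
def condition(pairings, rate=0.3, peak=1.0):
    """Association strength after repeated bell+food pairings: each
    pairing closes a fraction (rate) of the remaining gap to the
    ceiling (peak), so learning is fast early and slows as it saturates."""
    strength, history = 0.0, []
    for _ in range(pairings):
        strength += rate * (peak - strength)
        history.append(strength)
    return history
```

The resulting curve rises steeply at first and flattens out, matching the negatively accelerated acquisition curves observed in conditioning experiments.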

25
Darwin VII
  • G. Edelman et al.
  • Darwin VII sensors
  • CCD camera
  • Gripper that senses conductivity
  • IR sensors
  • Darwin VII actuators
  • PTZ camera
  • Wheels
  • Gripper
  • Low-reflectivity walls, floor
  • Two types of stimulus blocks (6 cm metallic
    cubes)
  • Blobs: low conductivity (bad taste)
  • Stripes: high conductivity (good taste)

26
Darwin's Perceptual Categorization
  • (Figure panels: early training vs. after the
    10th stimulus)
  • Instead of hard-wiring stimulus-response rules,
    develop these associations over time

27
Genetic Algorithms
  • Inspired by evolutionary biology
  • Individuals in a population have a particular
    fitness with respect to a task
  • Individuals with the highest fitness are kept as
    survivors
  • Individuals with poor performance are discarded:
    the process of natural selection
  • Evolutionary process: search through the space
    of solutions to find the one with the highest
    fitness

28
Genetic Operators
  • Knowledge is encoded as bit strings (chromosomes)
  • Each bit represents a gene
  • Biologically inspired operators are applied to
    yield better generations
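A minimal sketch of these operators on a "count the 1 bits" fitness function; the task, population size, rates, and chromosome length are invented for illustration:

```python
import random

def crossover(a, b):
    """Single-point crossover: splice the head of one bit-string
    chromosome onto the tail of another."""
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(bits, p=0.05):
    """Flip each gene (bit) independently with small probability p."""
    return "".join("10"[int(b)] if random.random() < p else b for b in bits)

def evolve(popsize=30, length=12, generations=40):
    """Natural selection on bit strings: fitness = number of 1 bits; the
    fitter half survives, the rest are replaced by mutated offspring of
    randomly paired survivors."""
    pop = ["".join(random.choice("01") for _ in range(length))
           for _ in range(popsize)]
    for _ in range(generations):
        pop.sort(key=lambda s: s.count("1"), reverse=True)
        survivors = pop[: popsize // 2]
        children = [mutate(crossover(*random.sample(survivors, 2)))
                    for _ in range(popsize - len(survivors))]
        pop = survivors + children
    return max(pop, key=lambda s: s.count("1"))

random.seed(1)
best = evolve()
```

Because the fittest half is carried over unchanged, the best fitness never decreases; crossover and mutation supply the variation that lets it climb.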

29
Classifier Systems
  • ALECSYS system
  • Learns new behaviors and coordination
  • Genetic operators act upon a set of rules encoded
    by bit strings
  • Demonstrated tasks
  • Phototaxis
  • Coordination of approaching, chasing and escaping
    behaviors by combination, suppression and
    sequencing

30
Evolving Structure and Control
  • Karl Sims (1994)
  • Evolved morphology and control for virtual
    creatures performing swimming, walking, jumping,
    and following
  • Genotypes encoded as directed graphs are used to
    produce 3D kinematic structures
  • Genotypes encode points of attachment
  • Sensors used: contact, joint-angle and
    photosensors

31
Evolving Structure and Control
  • Jordan Pollack
  • Real structures

32
Fuzzy Control
  • Fuzzy control produces actions using a set of
    fuzzy rules based on fuzzy logic
  • In fuzzy logic, variables take values based on
    how much they belong to a particular fuzzy set
  • Fast, slow, far, near: not crisp values!
  • A fuzzy logic control system consists of
  • Fuzzifier: maps sensor readings to fuzzy input
    sets
  • Fuzzy rule base: a collection of IF-THEN rules
  • Fuzzy inference: maps fuzzy sets to other fuzzy
    sets according to the rule base
  • Defuzzifier: maps fuzzy outputs to crisp
    actuator commands
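The four stages above can be sketched for a single input and a two-rule base; the membership functions, rule outputs, and numeric values below are invented for illustration, not Flakey's actual rules:

```python
def ramp_down(x, lo, hi):
    """Membership function that is 1 below lo, 0 above hi, and linear
    in between (used here for 'near')."""
    return max(0.0, min(1.0, (hi - x) / (hi - lo)))

def fuzzy_speed(distance):
    """Tiny fuzzy controller:
    IF obstacle near THEN slow; IF obstacle far THEN fast.
    Defuzzify with a weighted average of the rules' crisp outputs."""
    near = ramp_down(distance, 0.2, 1.0)   # fuzzifier: distance -> 'near' degree
    far = 1.0 - near                       # complement: 'far' degree
    SLOW, FAST = 0.1, 1.0                  # crisp speed each rule recommends
    return (near * SLOW + far * FAST) / (near + far)   # defuzzifier
```

Because both rules are always active to some degree, the commanded speed blends smoothly between the two behaviors instead of switching abruptly at a threshold.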

33
Examples of Fuzzy Control
  • Flakey the robot
  • Behaviors are encoded as collections of fuzzy
    rules
  • IF obstacle-close-in-front AND NOT
    obstacle-close-on-left
  • THEN turn sharp-left
  • Each behavior may be active to a varying degree
  • Behavior responses are blended smoothly
  • Multiple goals can be pursued
  • Systems for learning fuzzy rules have also been
    developed

34
Where Next?
35
Fringe Robotics: Beyond Behavior
  • Questions for the future
  • Human-like intelligence
  • Robot consciousness
  • Complete autonomy of complex thought and action
  • Emotions and imagination in artificial systems
  • Nanorobotics
  • Successor to human beings

36
A Robot Mind
  • The goal of AI is to build artificial minds
  • What is the mind?
  • "The mind is what the brain does." (M. Minsky)
  • The mind includes
  • thinking
  • feeling

37
Computational Thought
  • What does it mean for a machine to think?
  • Bellman:
  • Thought is not well defined, so we cannot
    ascribe/judge it
  • Computers can perform processes representative
    of human thought: decision making/learning
  • Albus:
  • For robots to understand humans, they must be
    indistinguishable from humans in bodily
    appearance, physical and mental development
  • Brooks:
  • Thought and consciousness need not be programmed
    in; they will emerge

38
The Turing Test
  • Developed by the mathematician Alan Turing
  • Original version of Turing Test
  • Two people (a man and a woman) are put in
    separate closed rooms. A third person can
    interact with each of the two through writing (no
    voices).
  • Can the 3rd person tell the difference between
    the man and the woman?

39
The Turing Test
  • AI version of the Turing Test
  • A person sits in front of two terminals: at one
    end is a human, at the other end a computer.
    The questioner is free to ask any questions of
    the respondents at the other end of the
    terminals
  • If the questioner cannot tell the difference
    between the computer and the human subject, the
    computer has passed the Turing Test!

40
The Turing Test
  • The Turing Test contest is performed annually,
    and it carries a $100,000 award for anybody who
    passes it
  • No computer so far has truly passed the Turing
    Test
  • Is this a good test of intelligence?
  • Thought is defined based on human fallibility
    rather than on machine consciousness
  • Many researchers oppose using this test as a
    proof of intelligence

41
Penrose's Critique
  • Roger Penrose (The Emperor's New Mind, Shadows
    of the Mind), a British physicist, is a famous
    critic of AI
  • Intelligence is a consequence of neural activity
    and interactions in the brain
  • Computers can only simulate this activity, but
    this is not sufficient for true intelligence
  • Intelligence requires understanding, and
    understanding requires awareness, an aspect of
    consciousness
  • Many refuting arguments have been given

42
They're Made Out Of Meat
Terry Bisson
  • "They're made out of meat."
  • "Meat?"
  • "Meat. They're made out of meat."
  • "Meat?"
  • "There's no doubt about it. We picked several
    from different parts of the planet, took them
    aboard our recon vessels, probed them all the way
    through. They're completely meat."
  • "That's impossible. What about the radio
    signals? The messages to the stars."
  • "They use the radio waves to talk, but the
    signals don't come from them. The signals come
    from machines."
  • "So who made the machines? That's who we want to
    contact."

43
They're Made Out Of Meat
Terry Bisson
  • "They made the machines. That's what I'm trying
    to tell you. Meat made the machines."
  • "That's ridiculous. How can meat make a machine?
    You're asking me to believe in sentient meat."
  • "I'm not asking you, I'm telling you. These
    creatures are the only sentient race in the
    sector and they're made out of meat."
  • "Maybe they're like the Orfolei. You know, a
    carbon-based intelligence that goes through a
    meat stage."
  • "Nope. They're born meat and they die meat. We
    studied them for several of their life spans,
    which didn't take too long. Do you have any idea
    what's the life span of meat?"
  • "Spare me. Okay, maybe they're only part meat.
    You know, like the Weddilei. A meat head with an
    electron plasma brain inside."

44
They're Made Out Of Meat
Terry Bisson
  • "Nope. We thought of that, since they do have
    meat heads like the Weddilei. But I told you, we
    probed them. They're meat all the way through."
  • "No brain?"
  • "Oh, there is a brain all right. It's just that
    the brain is made out of meat!"
  • "So... what does the thinking?"
  • "You're not understanding, are you? The brain
    does the thinking. The meat."
  • "Thinking meat! You're asking me to believe in
    thinking meat!"
  • "Yes, thinking meat! Conscious meat! Loving
    meat. Dreaming meat. The meat is the whole deal!
    Are you getting the picture?"

45
Conclusion
  • Lots of remaining interesting problems to
    explore!
  • Get involved!

46
Readings
  • Lecture notes