Title: Muhammad Al-Nasser
1. Stochastic Optimization of Bipedal Walking using Gyro Feedback and Phase Resetting
King Fahd University of Petroleum and Minerals
COE 584/484 Robotics
- Muhammad Al-Nasser
- Mohammad Shahab
March 2008 COE584 Robotics
2. Outline
- Problem Definition
- Physical Description
- Humanoid Walking System
- Feedback
- Gyroscope
- Phase Resetting
- Stochastic Optimization
- PGRL
- Experimentation
- Comments
3. Problem Definition
- Authors
- Felix Faber and Sven Behnke, University of Freiburg, Germany
- Problem Statement
- To optimize the walking pattern of a humanoid robot for forward speed using suitable metaheuristics
4. First Humanoid Robot!
- 1206 AD
- Ibn Ismail Ibn al-Razzaz Al-Jazari
- A boat with four programmable automatic musicians
that floated on a lake to entertain guests at
royal drinking parties!!
5. Problem Definition
- Sensor noise: camera, gyroscope, ultrasonic, force
- Inaccurate actuators: motors
- Nonlinear dynamics, i.e. a complex system to control
- Environment disturbances: unknown surface
6. Physical Description
- Jupp, team NimbRo
- 60 cm, 2.3 kg
- Pocket PC
7. Physical Description
- Pitch joint to bend trunk
- Each leg
- 3DOF hip
- Knee
- 2DOF ankle
- Each arm
- 2DOF shoulders
- elbow
8. Humanoid Walking System
- One Approach
- Model-Based (Geometric Model)
- Accurate Model
- Solving motion equations for all joints (offline)
- 19 Degrees of Freedom
- Nonlinear model equations
- Computational complexity
[Diagram: leg motion trajectory → controller → joint motor positions (θs) → robot walks]
9. Humanoid Walking System
[Diagram: controller, joint motor positions (θs)]
- Central Pattern Generators (CPG)
- Sinusoid joint trajectory generated
- Bio-Inspired
- no need for model
10. Humanoid Walking System
- Open-loop (no feedback) gait
- Mechanism
- Shifting weight from one leg to the other
- Shortening the leg not needed for support
- Leg motion in the forward direction
11. Humanoid Walking System
- Open-loop gait
- Clock-driven, with the trunk phase as the central clock
- Trunk phase $\phi_{Trunk}$ advances at the footstep frequency
- Right leg motion phase: $\phi_{Trunk} + \pi/2$
- Left leg motion phase: $\phi_{Trunk} - \pi/2$
[Plot: phase over time, ranging from $-\pi$ to $\pi$]
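The clock-driven scheme above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the control-cycle length `dt`, and the assumption that the trunk phase completes one full cycle per period of the given frequency are all hypothetical.

```python
import math


def wrap_phase(phi):
    """Wrap an angle in radians into the interval [-pi, pi)."""
    return (phi + math.pi) % (2.0 * math.pi) - math.pi


def advance_trunk_phase(trunk_phase, step_frequency_hz, dt):
    """Advance the central clock by one control cycle of dt seconds.

    Assumption: one clock cycle (2*pi) elapses per period 1/step_frequency_hz.
    """
    return wrap_phase(trunk_phase + 2.0 * math.pi * step_frequency_hz * dt)


def leg_phases(trunk_phase):
    """Return (right, left) leg phases, offset by +pi/2 and -pi/2."""
    right = wrap_phase(trunk_phase + math.pi / 2.0)
    left = wrap_phase(trunk_phase - math.pi / 2.0)
    return right, left
```

Because the two offsets differ by $\pi$, the legs always stay half a cycle apart, whatever the trunk phase is.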
12. Humanoid Walking System
[Diagram: kinematic mapping for the left and right legs, with leg, swing, and foot quantities]
Source: "Human-Like Walking using Toes Joint and Straight Stance Leg" by Behnke
- Legend: leg extension; leg swing amplitude (subscript Swing); r = roll, p = pitch, y = yaw
13. Feedback
[Diagram: controller, mapping, joint motor positions (θs)]
- Gyroscope reading: inclination (balance), angular velocity
- Force sensing resistors: foot-touches-ground trigger (high or low)
14. Feedback
- Gyroscope
- A device for measuring orientation, based on the principle of conservation of angular momentum
- Remember Physics 101!
15. Feedback
- P-Control
- An increasing gyro reading means the robot is falling
- Proportional control
- Reactive action proportional to the error (error = sensor value - desired value)
- Desired value is zero (i.e. no inclination)
- Alternative: proportional-integral control
- Action proportional to the error and to the accumulated error
[Diagram: gyro reading, joint motor positions (θs)]
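The two control laws described on this slide can be sketched as follows. This is only an illustration under stated assumptions: the gains `kp` and `ki`, the cycle length `dt`, and the sign convention (the action opposes the error) are hypothetical and are not taken from the paper.

```python
def p_control(sensor_value, kp, desired_value=0.0):
    """Proportional control: corrective action opposing the error."""
    error = sensor_value - desired_value
    return -kp * error


class PIController:
    """Proportional-integral control with a running sum of the error."""

    def __init__(self, kp, ki, desired_value=0.0):
        self.kp = kp
        self.ki = ki
        self.desired_value = desired_value
        self.accumulated_error = 0.0

    def update(self, sensor_value, dt):
        """Return the corrective action for one control cycle of dt seconds."""
        error = sensor_value - self.desired_value
        self.accumulated_error += error * dt
        return -(self.kp * error + self.ki * self.accumulated_error)
```

With a desired value of zero, `p_control` returns no correction while the gyro reads zero, and a correction that grows with the reading otherwise. The PI variant keeps pushing back while a constant offset persists, because the accumulated error keeps growing.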
16. Feedback
[Diagram: P-control, mapping, joint motor positions (θs)]
17. Feedback
[Diagram: controller, joint motor positions (θs), online adaptation (stochastic optimization)]
- Adaptive control
- Online tuning of the controller parameters
18. Stochastic Optimization Approach
- Goal
- Adjust parameters to achieve a faster and more stable walk.
- A fitness function (cost function) is used to express the optimization goals (i.e. speed and robustness)
- $f(\cdot): \mathbb{R}^N \to \mathbb{R}$
- $N$: number of parameters of interest
19. Stochastic Optimization Approach
Kinematic Mapping (Behnke paper)
20. Stochastic Optimization Approach
- We evaluate $f$ at a given set of parameters
- $x = (x_1, x_2, \ldots, x_N)$ (Table 1)
- Now, how do we find the parameter values that result in the highest fitness value?
- Use a metaheuristic method called PGRL
[Equation: piecewise fitness definition, with a case for $d < d_{exp}$]
21. Policy Gradient Reinforcement Learning (PGRL)
- An optimization method to maximize the walking speed
- It automatically searches a set of possible parameters, aiming to find the fastest walk that can be achieved
22. Policy Gradient Reinforcement Learning
- How does PGRL work?
- First, it randomly generates $B$ test policies $x^1, x^2, \ldots, x^B$
- around an initially given parameter vector $x^p$
- (where $x = (x_1, x_2, \ldots, x_N)$)
- Each parameter $x^i_j$ of a given test policy $x^i$ is randomly set to one of $x^p_j - \epsilon$, $x^p_j$, $x^p_j + \epsilon$
- where $1 \le i \le B$ and $1 \le j \le N$
- $\epsilon$ is a small constant value
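The first step can be sketched as below. The function name, its arguments, and the choice to return the perturbations alongside the policies are illustrative assumptions. The sketch also assumes the three offsets are drawn with equal probability and that a single epsilon is shared by all parameters.

```python
import random


def generate_test_policies(x_p, epsilon, num_policies, rng=None):
    """Generate test policies around x_p.

    Each parameter is shifted by -epsilon, 0, or +epsilon, chosen uniformly
    at random (an assumption). Returns (policies, perturbations).
    """
    if rng is None:
        rng = random.Random()
    policies = []
    perturbations = []
    for _ in range(num_policies):
        deltas = [rng.choice((-epsilon, 0.0, epsilon)) for _ in x_p]
        perturbations.append(deltas)
        policies.append([x + d for x, d in zip(x_p, deltas)])
    return policies, perturbations
```

Keeping the perturbations makes the later grouping step simple, since it only needs the sign of each parameter's offset.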
23. Policy Gradient Reinforcement Learning
- Second
- Each test policy is evaluated by the fitness function.
- For each parameter $j$, the test policies are grouped into 3 categories
- depending on whether the $j$th parameter was modified by $-\epsilon$, $0$, or $+\epsilon$
24. Policy Gradient Reinforcement Learning
- Third, construct the vector $a = (a_1, a_2, \ldots, a_N)$
- Each $a_j$ is based on the average fitness of each category
25. Policy Gradient Reinforcement Learning
- Fourth (finally), adjust $x^p$ along $a$
- scaled by a scalar step size
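Steps two to four can be sketched as below. The slides do not preserve the exact formulas, so two rules here are assumptions borrowed from the commonly cited policy gradient formulation for quadruped gait tuning (Kohl and Stone): $a_j$ is zero when the unmodified category has the best average, otherwise it is the difference between the "+" and "-" averages; and the update moves along the normalised $a$. All function names are hypothetical.

```python
import math


def estimate_direction(perturbations, fitnesses):
    """Build the vector a from per-category average fitness (assumed rule)."""
    num_params = len(perturbations[0])
    direction = []
    for j in range(num_params):
        # Group fitness values by the sign of the j-th parameter's offset.
        groups = {-1: [], 0: [], 1: []}
        for deltas, fitness in zip(perturbations, fitnesses):
            sign = (deltas[j] > 0) - (deltas[j] < 0)
            groups[sign].append(fitness)
        if not groups[-1] or not groups[1]:
            direction.append(0.0)  # not enough data for this parameter
            continue
        avg_minus = sum(groups[-1]) / len(groups[-1])
        avg_plus = sum(groups[1]) / len(groups[1])
        if groups[0]:
            avg_zero = sum(groups[0]) / len(groups[0])
            if avg_zero > avg_plus and avg_zero > avg_minus:
                direction.append(0.0)  # leaving the parameter alone is best
                continue
        direction.append(avg_plus - avg_minus)
    return direction


def update_policy(x_p, direction, step_size):
    """Move x_p by step_size along the normalised direction (assumed rule)."""
    norm = math.sqrt(sum(a * a for a in direction))
    if norm == 0.0:
        return list(x_p)
    return [x + step_size * a / norm for x, a in zip(x_p, direction)]
```

Normalising the direction means the scalar step size alone decides how far the parameter vector moves in each iteration.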
26. Extension to PGRL
- Adaptive step size
- after $g$ steps
- where
- $s$ is the number of fitness function evaluations
- $S$ is the maximum allowed value of $s$
27. Overall
[Diagram: PGRL, $x^p$, controller, joint motor positions (θs)]
28. Experiment
29. Results
30. Results
- After 1000 iterations
- Speed is 34.0 cm/s
- Fitness is 1.52
- Initial
- Speed is 21.3 cm/s
- Fitness is 1.36
- Speed increase: about 60%
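As a quick check on the reported numbers, assuming the 21.3 cm/s and 1.36 figures are the initial values and the 34.0 cm/s and 1.52 figures are the values after optimization, the relative gains work out as:

```latex
\[
\frac{34.0 - 21.3}{21.3} = \frac{12.7}{21.3} \approx 0.60,
\qquad
\frac{1.52 - 1.36}{1.36} = \frac{0.16}{1.36} \approx 0.12
\]
```

That is roughly a 60% gain in speed and a 12% gain in fitness.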
31. Parameters
32. Glossary
- Stance leg
- The leg that is on the floor during the walk.
- Swing leg
- The leg that is moving during the walk.
- Single support
- The case where the robot is touching the floor with one leg.
- Double support
- The case where the robot is touching the floor with both legs.