Automated Heuristic Refinement Applied to Sokoban

Transcript and Presenter's Notes
1
Automated Heuristic Refinement Applied to Sokoban
  • Doug Demyen (Grad)
  • Andrew McDonald (Undergrad)
  • Sami Wagiaalla (Undergrad)
  • Stephen Walsh (Undergrad)

2
Outline
  • (Re)introduction
  • Automated Heuristic Refinement
  • Sokoban
  • Challenges
  • Other Approaches
  • Our Goal
  • Enhancements
  • Features Used
  • Our Approaches
  • Regression
  • Gradient Descent
  • Offline Learning
  • Online Learning
  • Results
  • Conclusion

3
(Re)introduction: Automated Heuristic Refinement
  • For a given problem, one might have a number of
    features of any one state
  • Want to combine them to get an estimate of the
    distance to the goal
  • One way to do this is by weighting each such
    feature to get a linear combination
  • Want to use machine learning to find these weights
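
The weighted linear combination described above can be sketched in a few lines (function and variable names here are illustrative, not from the slides):

```python
# Minimal sketch: heuristic value as a weighted sum of state features,
# h(s) = sum_i w_i * f_i(s).

def linear_heuristic(feature_values, weights):
    """Estimate distance-to-goal from per-state feature values."""
    return sum(w * f for w, f in zip(weights, feature_values))

# Example with three made-up feature values and learned weights:
features = [4.0, 2.0, 1.0]   # e.g. a distance, degrees of freedom, intercept
weights  = [1.5, 0.5, 2.0]
estimate = linear_heuristic(features, weights)   # 1.5*4 + 0.5*2 + 2*1 = 9.0
```

Machine learning then only has to choose the `weights` vector; the features themselves stay fixed.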

4
(Re)introduction: Sokoban
  • Puzzle where a man must move a number of rocks
    onto goal positions
  • The man cannot move through either rocks or walls
  • He can also only push rocks, and one at a time

5
Challenges
  • Sokoban is very difficult
  • PSPACE complete
  • Irreversible states
  • Deadlocks: states from which the puzzle can't
    be solved
  • Huge branching factor (up to 4 × the number of
    rocks)
  • Long solutions (can be hundreds of pushes)
  • Heuristics are hard to determine and can be
    misleading
  • Often cannot generalize between puzzles

6
Other Approaches
  • Most solvers have had limited success, only
    solving a few problems in the standard set
  • Rolling Stone is the most successful
  • Solves over 50 of the 90 puzzles in the
    standard set
  • Many domain-specific enhancements
  • Took a PhD student, a professor, and a summer
    student over 2 years to develop (the first
    enhancement alone took several months)

7
Our Goal
  • Create features of a state of a Sokoban puzzle,
    and combine them to get a heuristic for running
    IDA*
  • Find a heuristic which would lead to a more
    efficient search to the goal (in number of nodes)
  • Determine heuristics on smaller puzzles and
    extend them to larger ones
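
The slides do not include the search code; the sketch below is a generic, textbook-style IDA* skeleton (not the project's implementation) showing where a learned heuristic `h(state)` plugs in. `is_goal` and `successors` are assumed interfaces.

```python
# Generic IDA* skeleton (illustrative, not the project's code).
# `successors(state)` yields (next_state, step_cost) pairs;
# `h(state)` is the learned distance-to-goal estimate.

def ida_star(start, is_goal, successors, h):
    path = [start]

    def search(g, bound):
        state = path[-1]
        f = g + h(state)
        if f > bound:
            return f                    # cut off: report the exceeded f-value
        if is_goal(state):
            return True
        minimum = float('inf')
        for nxt, cost in successors(state):
            if nxt in path:             # avoid cycles along the current path
                continue
            path.append(nxt)
            t = search(g + cost, bound)
            if t is True:
                return True
            minimum = min(minimum, t)
            path.pop()
        return minimum

    bound = h(start)
    while True:
        t = search(0, bound)
        if t is True:
            return path                 # solution path from start to goal
        if t == float('inf'):
            return None                 # search space exhausted, no solution
        bound = t                       # deepen to the smallest exceeded f

# Toy 1-D example: states 0..3, goal 3, unit-cost moves to the right.
```

A better heuristic raises the f-values of unpromising nodes, so fewer nodes fall under each bound and the iterations expand fewer nodes overall, which is the efficiency measure used on the previous slide.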

8
Our Enhancements
  • Used rock-pushes instead of man-moves for
    actions
  • Increases branching factor but decreases solution
    depth much more
  • Extensive deadlock detection
  • Detects any configuration involving adjacent
    rocks and walls
  • Also rocks on a wall that can't be taken off
    (when the goal isn't along this wall)
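
The simplest case of the deadlock detection above is a rock in a corner. A hedged sketch, assuming a grid of strings with '#' for walls and (row, col) tuples for positions:

```python
# Corner-deadlock test (simplified sketch of one deadlock pattern):
# a rock with walls on two orthogonally adjacent sides can never be
# pushed again, so unless it sits on a goal the state is dead.

def is_corner_deadlock(grid, goals, rock):
    r, c = rock
    if (r, c) in goals:
        return False
    wall = lambda r, c: grid[r][c] == '#'
    # A vertical wall plus a horizontal wall traps the rock;
    # walls only above and below still allow horizontal pushes.
    return ((wall(r - 1, c) or wall(r + 1, c)) and
            (wall(r, c - 1) or wall(r, c + 1)))

grid = ["#####",
        "#   #",
        "#####"]
# Rock at (1, 1): wall above and wall to the left -> deadlock.
```

The full detector on this slide covers more configurations (adjacent rocks acting as walls, rocks stuck along a goal-free wall), but each follows the same "no push can ever free it" reasoning.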

9
Features
  • Average Manhattan Distance
  • For each rock, calculate the summed horizontal
    and vertical moves to each goal
  • Take the average for each rock and sum these
    averages
  • Degrees of Freedom
  • Count how many possible moves the man can make
  • Subtract this from the maximum possible (the
    number of rocks × 4)
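
The first feature above reduces to a short computation; a sketch, assuming rocks and goals are (row, col) tuples:

```python
# Average Manhattan Distance feature: for each rock, average its
# Manhattan distance to every goal, then sum over all rocks.

def avg_manhattan(rocks, goals):
    total = 0.0
    for (rr, rc) in rocks:
        dists = [abs(rr - gr) + abs(rc - gc) for (gr, gc) in goals]
        total += sum(dists) / len(dists)
    return total

# Two rocks, two goals: rock (0,0) averages 2, rock (1,1) averages 2,
# so the feature value is 4.0.
```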

10
Features (cont'd)
  • Individual Rock Distances
  • Find the number of pushes to get each rock on a
    goal tile, ignoring other rocks
  • Sum these numbers over all rocks
  • Single-Rock Subproblems
  • Convert all but one rock into wall tiles and solve
    this sub-problem
  • Sum the solution lengths for all rocks (adding a
    large number for no solution)
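
The per-rock push distance can be computed with a breadth-first search over rock positions. A simplified sketch (it ignores the other rocks, as the slide says, and also ignores whether the man can actually reach the pushing side):

```python
from collections import deque

# Minimum number of pushes to get one rock onto any goal, ignoring
# the other rocks. The man must stand on the tile opposite the push
# direction, so that tile must not be a wall.

def pushes_to_goal(grid, start, goals):
    wall = lambda r, c: grid[r][c] == '#'
    frontier, seen = deque([(start, 0)]), {start}
    while frontier:
        (r, c), d = frontier.popleft()
        if (r, c) in goals:
            return d
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (not wall(nr, nc) and not wall(r - dr, c - dc)
                    and (nr, nc) not in seen):
                seen.add((nr, nc))
                frontier.append(((nr, nc), d + 1))
    return 10**6   # large penalty for "no solution", as on the slide

grid = ["######",
        "#    #",
        "######"]
# Rock at (1, 2), goal at (1, 4): two pushes to the right.
```

Summing this value over all rocks gives the Individual Rock Distances feature; running it with the other rocks converted to walls gives the Single-Rock Subproblems feature.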

11
Features (cont'd)
  • Turnaround points
  • For each rock on a wall, determine the distance
    to where it can be taken off
  • Take the average if there is more than one such
    point, and sum over each of these rocks
  • Clumping
  • For each rock, sum the vertical and horizontal
    distances to each other rock
  • Sum these values together
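
The Clumping feature is a pairwise-distance sum; a sketch, reading the slide's "for each rock, sum distances to each other rock" as counting each ordered pair:

```python
# Clumping feature: for each rock, sum the Manhattan distances to
# every other rock, then add these per-rock sums together.

def clumping(rocks):
    total = 0
    for i, (r1, c1) in enumerate(rocks):
        for j, (r2, c2) in enumerate(rocks):
            if i != j:
                total += abs(r1 - r2) + abs(c1 - c2)
    return total
```

High values mean the rocks are spread out; tightly clumped rocks often block each other's pushing paths, which is presumably what this feature is meant to capture.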

12
Features (cont'd)
  • Random feature
  • Simply returned a random number
  • Included for insight into the problem
  • Ended up with some interesting effects (more on
    this later)
  • Intercept
  • Always returned 1
  • Used for an intercept in regression

13
Our Approaches
  • For any puzzle state, we have a path of states
    from the start to that state
  • Then for each of these states, we can extract the
    values of each feature
  • For offline learning we use the fact that the
    heuristic for the goal should be 0
  • And add one for each move before on the solution
    path
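
The labelling scheme above can be sketched directly: walk the solution path and pair each state's feature vector with its distance to the goal (0 at the goal, one more for each step earlier). `feature_fns` is an illustrative name:

```python
# Build (feature_vector, distance_to_goal) training pairs from a
# solved path: the goal gets label 0, each earlier state one more.

def build_training_set(solution_path, feature_fns):
    data = []
    n = len(solution_path)
    for i, state in enumerate(solution_path):
        target = n - 1 - i                      # goal state gets label 0
        features = [f(state) for f in feature_fns]
        data.append((features, target))
    return data

# Toy example: states are integers, one feature returns the state itself.
path = [3, 2, 1, 0]
dataset = build_training_set(path, [lambda s: s])
# -> [([3], 3), ([2], 2), ([1], 1), ([0], 0)]
```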

14
Offline Learning
  • For offline learning, we use brute force to
    find solutions for simple puzzles
  • Given this data, try to find weights for each
    feature such that their combination results in
    the correct distance to the goal
  • This was done using two methods
  • Gradient Descent
  • Regression
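
A minimal gradient-descent sketch for the fitting step, minimising squared error on the labelled pairs; the learning rate and epoch count are illustrative choices, not from the slides:

```python
# Fit feature weights to (features, distance) pairs by stochastic
# gradient descent on squared error.

def fit_weights(data, n_features, lr=0.01, epochs=2000):
    w = [0.0] * n_features               # start from zero weights
    for _ in range(epochs):
        for features, target in data:
            pred = sum(wi * fi for wi, fi in zip(w, features))
            err = pred - target
            # Gradient of 0.5 * err^2 w.r.t. each weight is err * feature.
            w = [wi - lr * err * fi for wi, fi in zip(w, features)]
    return w

# Labels generated by the true weights [1, 2], so the fit should
# recover approximately those values:
data = [([1.0, 1.0], 3.0), ([2.0, 1.0], 4.0), ([1.0, 2.0], 5.0)]
```

Regression (e.g. ordinary least squares on the same pairs) solves for the weights in closed form instead of iterating; both target the same linear model.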

15
An Early (Discouraging) Result
  • Using the above technique with a few features and
    data sets from several simple problems, ran
    stepwise regression
  • This was to find which of these features, their
    squares, and their interactions were relevant,
    and to find their coefficients
  • The only feature found to be relevant for these
    datasets was random (and random²)

16
Can't generalize... now what?
  • So one set of weights was not going to make all
    puzzles crumble at our feet
  • Still, combinations of features can be useful in
    guiding search
  • Considered only one puzzle at a time to train
    weights for others
  • Only linear combinations, not squares or
    interactions, to avoid overfitting

17
Our More Modest Approach
  • Start with all feature weights set to zero
  • Solve a simple problem by brute force
  • Obtain the distances to goal and feature values
    along the solution path
  • Use regression or gradient descent to find
    weights based on that data set
  • Continue applying to harder puzzles

18
Mixed Results
  • Including weighted features improved the search
    by providing some guidance
  • Improvements ranged from very little to searching
    hundreds of times fewer nodes
  • Results varied with how well our features could
    describe the puzzle
  • The value of training for a puzzle varied with
    its similarity to the training puzzle

19
Online Learning
  • Also attempted to use online learning to improve
    the heuristic during the search
  • Whenever search reaches the IDA* depth bound,
    assume the nodes that are cut off are still c
    steps from the goal
  • Use the feature values of the states from the
    start to this state to improve the weights for
    the next iteration
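
The online step can be sketched as follows; the SGD-style update rule, step size, and function name are illustrative, not taken from the slides:

```python
# Online weight update at an IDA* cut-off: pretend the frontier node
# is still `c` steps from the goal, label every state on the current
# path accordingly, and nudge the weights for the next iteration.

def online_update(weights, path_features, c, lr=0.001):
    """path_features[i] is the feature vector of the i-th state on the
    path from the start to the cut-off node (inclusive)."""
    n = len(path_features)
    for i, features in enumerate(path_features):
        target = c + (n - 1 - i)   # cut-off node assumed c from the goal
        pred = sum(w * f for w, f in zip(weights, features))
        err = pred - target
        weights = [w - lr * err * f for w, f in zip(weights, features)]
    return weights
```

Because the labels come from assumed rather than known distances, a small step size keeps a single bad iteration from derailing the weights.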

20
More Results
  • Online learning was quite successful
  • This allowed us to tune weights to the current
    puzzle without having to finish the puzzle, which
    could take a long time
  • A final result worth mentioning
  • Online learning of weights allowed us to solve
    the first puzzle of the standard set!

21
Conclusion
  • Sokoban puzzles are designed by humans
    specifically to be challenging
  • Each puzzle has its own tricks: what works on
    one seldom works on another
  • Using features of a puzzle state as heuristics
    helps guide search, but their relative importance
    varies by the puzzle

22
References
  • Gordon S. Novak Jr. (2004). Artificial
    Intelligence: Lecture Notes (http://www.cs.utexas.
    edu/users/novak/cs381k110.html)
  • R.C. Holte and Istvan Hernadvolgyi (2000).
    Experiments with Automatically Created
    Memory-based Heuristics. In Proc. of the
    Symposium on Abstraction, Reformulation and
    Approximation (SARA-2000), Lecture Notes in AI,
    volume 1864, pp. 281-290, Springer-Verlag.
  • F. Schmiedle, D. Grosse, R. Drechsler, B. Becker
    (2001). Too Much Knowledge Hurts: Acceleration of
    Genetic Programs for Learning Heuristics.
    Computational Intelligence: Theory and
    Applications.
  • Andreas Junghanns, Jonathan Schaeffer (1997).
    Sokoban: A Challenging Single-Agent Search
    Problem. Workshop on Using Games as an
    Experimental Testbed for AI Research, Proceedings
    IJCAI-97, Nagoya, Japan, August 1997.

23
Questions?