Heuristic Search in Artificial Intelligence: Recent Enhancements and Applications


1
Heuristic Search in Artificial Intelligence
Recent Enhancements and Applications

Ariel Felner
ISE Department
Ben-Gurion University.
felner_at_bgu.ac.il

2
Optimal path search algorithms

For small graphs provided explicitly: algorithms such as Dijkstra's shortest path, Bellman-Ford or Floyd-Warshall. Complexity O(n^2).
For very large graphs, which are implicitly defined: the A* algorithm, which is a best-first search algorithm.

3
Best-first search schema

Sorts all generated nodes in an OPEN-LIST and chooses the node with the best cost value for expansion.
generate(x): insert x into the OPEN-LIST.
expand(x): delete x from the OPEN-LIST and generate its children.
Best-first search (BFS) depends on its cost (heuristic) function. Different functions cause BFS to expand different nodes.
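The schema can be sketched in a few lines of Python. This is a minimal illustration rather than the presenter's code: the names `best_first_search` and `grid_successors`, the `f(g, x)` signature and the 5x5 grid demo are all assumptions introduced here. The cost function is passed in as a parameter, so the same loop expands different nodes for different functions.

```python
import heapq
from itertools import count


def best_first_search(start, is_goal, successors, f):
    """Generic best-first search.

    successors(x) yields (child, edge_cost) pairs and f(g, x) is the cost
    function that orders the open list. Returns (path_cost, expansions).
    """
    tie = count()  # tie-breaker so that nodes themselves are never compared
    open_list = [(f(0, start), next(tie), 0, start)]
    best_g = {start: 0}
    expansions = 0
    while open_list:
        # expand(x): delete the best node from the open list
        _, _, g, x = heapq.heappop(open_list)
        if g > best_g.get(x, float("inf")):
            continue  # stale entry: a cheaper copy of x was already handled
        if is_goal(x):
            return g, expansions
        expansions += 1
        for child, cost in successors(x):
            g_child = g + cost
            if g_child < best_g.get(child, float("inf")):
                best_g[child] = g_child
                # generate(child): insert the child into the open list
                heapq.heappush(open_list, (f(g_child, child), next(tie), g_child, child))
    return None, expansions


def grid_successors(cell, size=5):
    """Hypothetical demo domain: a 4-connected grid with unit edge costs."""
    row, col = cell
    for d_row, d_col in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        r, c = row + d_row, col + d_col
        if 0 <= r < size and 0 <= c < size:
            yield (r, c), 1


GOAL = (2, 2)
dijkstra = best_first_search((0, 0), lambda x: x == GOAL, grid_successors,
                             lambda g, x: g)
a_star = best_first_search((0, 0), lambda x: x == GOAL, grid_successors,
                           lambda g, x: g + abs(x[0] - GOAL[0]) + abs(x[1] - GOAL[1]))
```

Both calls return the same path cost, but the second one, which adds a Manhattan-distance estimate to g, expands fewer nodes on this grid.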

[Figure: search tree with node costs from 20 to 40 and the corresponding Open-List]
4
Best-first search: cost functions

g(x): the real distance from the initial state to x.
h(x): the estimated remaining distance from x to the goal state.
Examples: air distance, Manhattan distance.
Different cost combinations of g and h:
f(x) = level(x): Breadth-First Search.
f(x) = g(x): Dijkstra's algorithm.
f(x) = h(x): Pure Heuristic Search (PHS).
f(x) = g(x) + h(x): the A* algorithm (1968).
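As a small illustration of the two example heuristics and of the g/h combinations listed above, the following sketch may help. The function names, the `COST_FUNCTIONS` table and the goal convention for the tile puzzle (blank is 0, tile t belongs at index t) are assumptions made for this example only.

```python
import math


def manhattan_distance(a, b):
    """Grid distance between two (row, col) cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def air_distance(a, b):
    """Straight-line (Euclidean) distance between two points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def puzzle_manhattan(state, width):
    """Sum of the Manhattan distances of all tiles to their goal cells.

    state[i] is the tile at cell i; the blank (0) is not counted.
    Assumed goal: tile t sits at cell t.
    """
    total = 0
    for cell, tile in enumerate(state):
        if tile != 0:
            total += abs(cell // width - tile // width) + abs(cell % width - tile % width)
    return total


# The cost combinations from the slide, written as functions of g and h.
COST_FUNCTIONS = {
    "dijkstra": lambda g, h: g,
    "phs": lambda g, h: h,
    "a_star": lambda g, h: g + h,
}
```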

5
A*

A* is a best-first search algorithm that uses f(n) = g(n) + h(n) as its cost function.
f(x) in A* is an estimation of the cost of the shortest path to the goal via x.
A* is admissible, complete and optimally effective (Pearl 84).
Result: any other optimal search algorithm will expand at least all the nodes expanded by A*.

[Figure: nodes expanded by Breadth-First Search compared with A*]
6
How to improve search?

Enhanced algorithms: Perimeter-search, RBFS, Frontier-search, etc. They all try to better explore the search tree.
Better heuristics: more parts of the search tree will be pruned.
In the 3rd millennium we have very large memories. We can build large tables.
For enhanced algorithms: large open-lists or transposition tables. They store nodes explicitly.
A more intelligent way is to store general knowledge. We can do this with heuristics.

7
Pattern databases

Many problems can be decomposed into subproblems (patterns) that must also be solved.
The cost of a solution to a subproblem is a lower bound on the cost of the complete solution.
Instead of calculating the lower bounds on the fly, we expand the whole pattern space and store the solution to each pattern configuration in a pattern database.
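One way to picture this is a breadth-first search backwards from the goal pattern. The sketch below uses the 3x3 8-puzzle and a pattern of three tiles plus the blank; the choice of domain, of `PATTERN`, and every function name are assumptions made for illustration and are not taken from the presentation. All non-pattern tiles are treated as indistinguishable, and every move costs 1.

```python
from collections import deque

WIDTH = 3
PATTERN = (1, 2, 3)  # hypothetical choice of pattern tiles


def neighbors(cell):
    """Cells adjacent to `cell` on the WIDTH x WIDTH board."""
    row, col = divmod(cell, WIDTH)
    if row > 0:
        yield cell - WIDTH
    if row < WIDTH - 1:
        yield cell + WIDTH
    if col > 0:
        yield cell - 1
    if col < WIDTH - 1:
        yield cell + 1


def build_pdb():
    """Expand the whole pattern space and store each configuration's cost."""
    goal = (0,) + PATTERN  # (blank cell, cell of tile 1, of tile 2, of tile 3)
    pdb = {goal: 0}
    queue = deque([goal])
    while queue:
        state = queue.popleft()
        blank, tiles = state[0], state[1:]
        for target in neighbors(blank):
            # A pattern tile on the target cell slides into the blank's cell;
            # any other tile is indistinguishable, so only the blank moves.
            moved = tuple(blank if pos == target else pos for pos in tiles)
            child = (target,) + moved
            if child not in pdb:
                pdb[child] = pdb[state] + 1
                queue.append(child)
    return pdb


def pdb_heuristic(pdb, state):
    """Mapping function: project a full state (state[i] = tile at cell i)
    onto the pattern space and look up its stored cost."""
    key = (state.index(0),) + tuple(state.index(t) for t in PATTERN)
    return pdb[key]


PDB = build_pdb()
```

During search, `pdb_heuristic(PDB, state)` replaces an on-the-fly computation with a single table lookup.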

[Figure: a mapping function from the search space to the pattern space]
8
Non-additive pattern databases

15 puzzle: 10^13 states.
Fringe pattern database (Culberson & Schaeffer 1996): has only 259 million states.
Improvement of a factor of 100 over Manhattan Distance.

9
Rubik's Cube (Korf 1997)

Has 10^19 states.
The PDB of the corner cubies has only 88 million states.
Korf (AAAI-97) built 2 other pattern databases for this domain.
The best way to combine different non-additive pattern databases is to take their maximum!

10
Disjoint Additive Databases

15 and 24 puzzles (Korf & Felner, AIJ-02; Felner, Korf & Hanan, JAIR-04)

[Figure: the 15 puzzle partitioned into patterns of 7 and 8 tiles]

Values of disjoint databases can be added for the
heuristic
Better than maxing heuristics
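The two ways of combining lookups can be contrasted in a short sketch. The function name `combine` and the toy tables are hypothetical. Summation keeps the heuristic admissible only when the databases are disjoint and each one counts only the moves of its own tiles.

```python
def combine(pdbs, keys, additive):
    """Combine one lookup per pattern database into a heuristic value.

    pdbs: list of tables; keys: the state's projection onto each pattern.
    additive=True sums the values (disjoint additive databases);
    additive=False takes their maximum (non-additive databases).
    """
    values = [pdb[key] for pdb, key in zip(pdbs, keys)]
    return sum(values) if additive else max(values)


# Hypothetical tables with a single entry each, for illustration only.
pdb_left = {"pattern-a": 20}
pdb_right = {"pattern-b": 25}
h_max = combine([pdb_left, pdb_right], ["pattern-a", "pattern-b"], additive=False)
h_sum = combine([pdb_left, pdb_right], ["pattern-a", "pattern-b"], additive=True)
```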

[Figure: the 24 puzzle partitioned into four patterns of 6 tiles]
11
Dynamically-partitioned additive databases

Statically partitioned databases do not capture conflicts between tiles from different patterns.
We want to store as many pattern databases as possible and partition them into disjoint subproblems on the fly, such that the chosen partition yields the best heuristic.
These are called dynamically partitioned PDBs.

12
Experimental results: 15 puzzle
[Figure: the fives, sixes and seven-eight partitionings]
13
Results 24 puzzle.

For the 24 puzzle we compared the static database (SDB) of sixes with the dynamic database (DDB) of pairs and triples on 10 random instances.

The relative advantage of the SDB decreases when the problem scales up.
What will happen for the 6x6 35 puzzle?

14
35 puzzle
We sampled 10 billion random states and calculated their heuristic values. The table was created by the method presented by Korf, Reid and Edelkamp (AIJ 129, 2001).

15
Tile puzzles Summary

The relative advantage of the SDB over DDB
decreases over time.
For the 15 puzzle 1/2 of the domain is stored.
For the 24 puzzle 1/4 of the domain is stored.
For the 35 puzzle 1/7 of the domain is stored.
The memory needed by the DDB was 100 times
smaller than that of the SDB!!

16
4-peg Towers of Hanoi (TOH4)

There is a conjecture about the length of the optimal path, but it has not been proven.
Systematic search is the only way to solve this problem or to verify the conjecture.
There are too many cycles. IDA*, as a DFS, will not prune these cycles. Therefore, A* (actually frontier A*, Korf & Zhang 2000) was used.

17
Heuristics for the TOH

Infinite peg heuristic (INP): each disk moves to its own temporary peg.
Additive pattern databases (Felner, Korf & Hanan, JAIR-04).

18
Additive PDBs for TOH4

Partition the disks into disjoint sets.
Store the cost of the complete pattern space of each set in a pattern database.
Add values from these PDBs for the heuristic value.
The n-disk problem contains 4^n states.
The largest database that we stored was of 14 disks, which needed 4^14 = 256MB.
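The arithmetic behind these numbers can be checked with a short sketch. The helper names are hypothetical, and one byte per entry is an assumption chosen because it reproduces the 256MB figure. Each disk sits on one of 4 pegs, so a pattern of n disks can be packed into a base-4 index.

```python
def toh4_index(pegs):
    """Pack a disk pattern into a table index.

    pegs[i] in 0..3 is the peg holding disk i; the result is the base-4
    number whose digits are the peg numbers.
    """
    index = 0
    for peg in pegs:
        index = index * 4 + peg
    return index


def pdb_size_mb(num_disks, bytes_per_entry=1):
    """Size in MB of a table with one entry per pattern configuration
    (bytes_per_entry=1 is an assumption, not stated in the slides)."""
    return 4 ** num_disks * bytes_per_entry / 2 ** 20


size_14 = pdb_size_mb(14)
last_index = toh4_index([3] * 14)
```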

[Figure: disks partitioned into disjoint sets of 10 and 6]
19
TOH4 results

The difference between static and dynamic is covered in Felner, Korf & Hanan (JAIR-04).

20
Vertex-Cover (VC) (Felner et al., JAIR-04)

Given a graph, we want the minimal set of vertices such that they cover all the edges.
VC was one of the first problems proved to be NP-complete.
Search tree:
At each level, either include or exclude a vertex.
Improvements:
If a vertex is excluded, all its neighbors must be included.
Dealing with degree-0 and degree-1 vertices.

[Figure: a graph on vertices 0 to 3 and its search tree from the root R, with included sets marked V and excluded vertices marked X]
21
Depth-First Branch and Bound

DFBnB: searches the tree from left to right. Expands only subtrees with costs smaller than the best solution found so far.
We also used Iterative Deepening A* (IDA*) (Korf 85).
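A minimal DFBnB sketch for vertex cover is given below. It assumes the include/exclude search tree and the rule that excluding a vertex forces all its neighbors into the cover. The function name and the graph encoding are assumptions, and no heuristic is added to the bound, so this is far simpler than the solver the presentation reports on.

```python
def min_vertex_cover(num_vertices, edges):
    """Size of a minimum vertex cover, found by depth-first branch and bound."""
    adj = [set() for _ in range(num_vertices)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    best = [num_vertices]  # trivial upper bound: take every vertex

    def search(v, included, excluded):
        if len(included) >= best[0]:
            return  # prune: this subtree cannot beat the best solution so far
        if v == num_vertices:
            best[0] = len(included)  # a complete, cheaper cover
            return
        if v in included:
            search(v + 1, included, excluded)  # v was forced in earlier
            return
        # Left branch: include v in the cover.
        search(v + 1, included | {v}, excluded)
        # Right branch: exclude v, which forces all its neighbors in.
        # It is legal only if no neighbor has already been excluded.
        if not (adj[v] & excluded):
            search(v + 1, included | adj[v], excluded | {v})

    search(0, set(), set())
    return best[0]


triangle = min_vertex_cover(3, [(0, 1), (1, 2), (0, 2)])
star = min_vertex_cover(5, [(0, 1), (0, 2), (0, 3), (0, 4)])
```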

[Figure: search tree annotated with node costs between 6 and 8]
22
Heuristics for VC

The included vertices form the g part of f = g + h.
We want an admissible heuristic for the free vertices.
Pairwise heuristic: a maximum matching of the free graph.
For a triangle we can add two to the heuristic.
In general, a clique of size k contributes k-1 to h.
So partition the free graph into disjoint cliques and sum up their heuristics.
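The clique rule can be sketched with a simple greedy partition. The greedy strategy and the names below are assumptions for illustration; the presentation does not say how the partition is chosen at this point. Any partition into vertex-disjoint cliques gives an admissible value, because a clique of size k needs at least k-1 of its vertices in every cover.

```python
def clique_partition_heuristic(free_vertices, adj):
    """Lower bound on the vertices still needed to cover the free graph.

    free_vertices: iterable of free vertices; adj: dict vertex -> set of
    neighbors. Greedily grows vertex-disjoint cliques and sums (size - 1).
    """
    remaining = set(free_vertices)
    h = 0
    for v in sorted(free_vertices):
        if v not in remaining:
            continue  # v already belongs to an earlier clique
        clique = [v]
        remaining.discard(v)
        for u in sorted(adj[v] & remaining):
            # u joins only if it is adjacent to every current member
            if all(u in adj[w] for w in clique):
                clique.append(u)
                remaining.discard(u)
        h += len(clique) - 1
    return h


# Hypothetical free graph: a triangle {0, 1, 2} plus a separate edge 3-4.
free_adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4}, 4: {3}}
h_value = clique_partition_heuristic(range(5), free_adj)
```

When every clique found has size 2, the value reduces to the size of a matching, which corresponds to the pairwise heuristic.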

[Figure: vertex-cover example with free vertices 1 to 4]
23
Additive pattern databases

Clique is NP-complete. However, in random graphs, cliques of size 5 and larger are rare. Thus, it is easy to find small cliques.
Pattern databases: instead of finding the cliques on the fly, we identify them before the search and store them in a pattern database. We stored cliques of size 4 or smaller.
During the search we need to retrieve disjoint cliques from the pattern database.

24
VC results

The results are on random graphs of size 150 with an average degree of 16.
When we added our dynamic database to the best proven tree search algorithm, we further improved the running time by a factor of more than 10.

25
Conclusions and Summary

In general: additivity can be applied whenever a problem can be decomposed into disjoint subproblems such that the sum of their costs is a lower bound on the cost of the complete problem.
Additive databases are a special case of additive heuristics, where we save the heuristic values in a table.

26
The Graph Partitioning Problem (Felner, AMAI-2004)

Given a graph G(E,V), the problem is to partition the graph into two equal-sized subsets of vertices.
The number of edges that cross the partition should be minimized.
The partition in the graph on the right is of cost 2.

27
GPP as a search problem

A subproblem in GPP is to assign a vertex to one of the subsets of the partition.
Each level of the search tree corresponds to a specific vertex of the graph.
Each branch assigns the vertex to a different subset of the partition.

Each node of the tree is a partial partition including some of the vertices.
Size of the tree: 2^n.

[Figure: search tree of partial partitions of vertices 1, 2 and 3]

Leaves of the tree are the complete partitions.
One of them is the optimal.

28
Definitions

A node of the search tree is denoted by k, while a vertex of the graph is denoted by x.
A vertex that is already assigned to one of the subsets is called an assigned vertex.
Each of the other vertices is a free vertex. Free vertices are unsolved subgoals.
Given a node k of the search tree we define:
g(k): the number of edges that already cross the partial partition due to assigned vertices.
h(k): a lower bound on the number of edges that will cross the given partition due to free vertices.

29
A heuristic from the free vertices

The free vertices have many edges connected to them.
Can we estimate the number of such edges that must cross the partition?

[Figure: free vertices 1 to 4]
30

More definitions
The subsets of the partial partition are A and B.
Each of the following heuristics completes the partition with A' and B'.
We can guess about A' and B'.
Types of the edges:
I: Edges within A and A'
II: Edges from A to B
III: Edges from A to B'
IV: Edges from A' to B'

[Figure: assigned subsets A = {1,2} and B = {3,4}, free subsets A' = {5,6} and B' = {7,8}, with the edge types I to IV marked]
31
f0: Uniform Cost Search

f0(k) = g(k).
Edges that already cross the partition: edges of type II.
Mainly for comparison reasons.

[Figure: assigned and free vertices of the subsets A and B, with the type II edges marked]
32
f1: Adding edges of type III

For each free vertex x we define d(x,A) as the number of edges from x to A and d(x,B) as the number of edges from x to B.

An admissible heuristic for a vertex x will be
h1(x) = min(d(x,A), d(x,B))
h1(k) = the sum of h1(x) over all free vertices x.
f1(k) = g(k) + h1(k)
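The definitions of g, h1 and f1 translate directly into code. In this sketch the function name `gpp_f1`, the dictionary encoding of the graph and the small example are assumptions introduced for illustration.

```python
def gpp_f1(adj, side):
    """Return (g, h1, f1) for a partial partition.

    adj: dict vertex -> set of neighbors (symmetric).
    side: dict mapping each assigned vertex to 'A' or 'B'; vertices that
    are missing from `side` are free.
    """
    # g: edges between assigned vertices on opposite sides. Each such edge
    # is seen from both endpoints, hence the division by 2.
    g = sum(1 for u in side for v in adj[u]
            if v in side and side[u] != side[v]) // 2
    h1 = 0
    for x in adj:
        if x not in side:
            d_a = sum(1 for v in adj[x] if side.get(v) == "A")
            d_b = sum(1 for v in adj[x] if side.get(v) == "B")
            h1 += min(d_a, d_b)  # whichever side x joins, this many edges cross
    return g, h1, g + h1


# Hypothetical example: A = {1, 2}, B = {3, 4}, and vertex 5 is free.
example_adj = {1: {2, 3, 5}, 2: {1, 5}, 3: {1, 5}, 4: set(), 5: {1, 2, 3}}
example_side = {1: "A", 2: "A", 3: "B", 4: "B"}
g_value, h1_value, f1_value = gpp_f1(example_adj, example_side)
```

In the example, the edge 1-3 already crosses the partition, and vertex 5 has two edges to A and one to B, so at least one more edge must cross.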

[Figure: a free vertex x with edges to the assigned vertices 1 to 4]
33

Results for other graphs, as well as when using IDA*, were very similar.
A better heuristic solves the problem faster.

f3 is faster than f0 by a factor of almost 10,000 for graphs with density of 6.
f3 is faster than f1 by a factor of 100 for a graph with density of 20.

Graphs of size 100: solved by f3 only.
Once again, as the density of the graph increases, the optimal cut increases linearly and the time to solve the problem increases exponentially.

36
Other domains

Traveling salesman problem (Korf 1996)
Number partitioning (Korf 1998)
Bin-packing (Korf 2002)
Rectangle packing (Korf 2004)
Multiple sequence alignment (Hansen & Zhou 2004)

37
Best Usage of Memory

Given 1 gigabyte of memory, how do we best use it with pattern databases?
Holte, Newton, Felner, Meshulam and Furcy (ICAPS-2004) showed that it is better to use many small databases and take their maximum instead of one large database.
We will present a different (orthogonal) method (Felner, Meshulam & Holte, AAAI-04).

38
Compressing pattern databases (Felner et al., AAAI-04)

Traditionally, each configuration of the pattern had a unique entry in the PDB.
Our main claim: nearby entries in PDBs are highly correlated!
We propose to compress nearby entries by storing their minimum in one entry.
We show that most of the knowledge is preserved.
Consequences: memory is saved and larger patterns can be used, so a speedup in search is obtained.

39
Cliques in the pattern space

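The values in a PDB for a clique are d or d+1.
In permutation puzzles, cliques exist when only one object moves to another location.

[Figure: a clique in the pattern space whose nodes are at distance d or d+1 from the goal G]

Usually they have nearby entries in the PDB:
A[4][4][4][4][4]

[Figure: a clique in TOH4]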
40
Compressing cliques

Assume a clique of size K with values d or d+1.
Store only one entry (instead of K) for the clique, with the minimum d. We lose at most 1.
A[4][4][4][4][4] becomes A[4][4][4][4][1].
Instead of 4^p we need only 4^(p-1) entries.
This can be generalized to a set of nodes with diameter D (for cliques, D = 1).
A[4][4][4][4][4] becomes A[4][4][4][1][1].
In general, compressing by k disks reduces memory requirements from 4^p to 4^(p-k).
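A minimal sketch of this compression, assuming the PDB is a flat list in which the entries that share all but the last index are stored consecutively. The function names and the toy values are hypothetical. Storing the minimum of each group keeps the heuristic admissible.

```python
def compress(pdb, factor):
    """Replace every run of `factor` consecutive entries by their minimum."""
    return [min(pdb[i:i + factor]) for i in range(0, len(pdb), factor)]


def lookup(compressed, index, factor):
    """Look up an original index in the compressed table."""
    return compressed[index // factor]


# Toy table: two cliques of size 4 whose values differ by at most 1.
toy_pdb = [3, 4, 4, 3, 5, 6, 5, 5]
toy_compressed = compress(toy_pdb, 4)
```

With 4 pegs, `factor = 4 ** k` corresponds to compressing by k disks, which shrinks a table of 4^p entries to 4^(p-k).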

41
TOH4 results: 16 disks (14+2)

Memory was reduced by a factor of 1000!!! at a
cost of only a factor of 2 in the search effort.

42
TOH4 larger versions
Lossless compression is not efficient in this domain.

For the 17 disks problem a speed up of 3 orders
of magnitude is obtained!!!
The 18 disks problem can be solved in 5 minutes!!

43
Tile Puzzles
[Figure: goal state and a clique]

Storing PDBs for the tile puzzle
(Simple mapping) A multi-dimensional array: A[16][16][16][16][16], size = 1.04Mb.
(Packed mapping) A one-dimensional array: A[16*15*14*13*12], size = 0.52Mb.
Time versus memory tradeoff!
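The two mappings can be sketched as index functions over the locations of the five pattern tiles. The names `simple_index` and `packed_index` are assumptions, and the packed ranking shown here is one standard mixed-radix scheme rather than necessarily the one used in the experiments.

```python
from itertools import permutations


def simple_index(locations, cells=16):
    """Simple mapping: treat the locations as digits of a base-`cells` number."""
    index = 0
    for loc in locations:
        index = index * cells + loc
    return index


def packed_index(locations, cells=16):
    """Packed mapping: rank the locations as a partial permutation.

    Tile i can only use cells not taken by earlier tiles, so its digit is
    its location minus the number of earlier locations below it, and its
    radix is cells - i.
    """
    index = 0
    for i, loc in enumerate(locations):
        digit = loc - sum(1 for earlier in locations[:i] if earlier < loc)
        index = index * (cells - i) + digit
    return index


simple_size = 16 ** 5                  # entries in the multi-dimensional array
packed_size = 16 * 15 * 14 * 13 * 12   # entries in the one-dimensional array

# Check on a small case: 3 tiles on 4 cells map onto 0..23 with no gaps.
small_ranks = sorted(packed_index(p, cells=4) for p in permutations(range(4), 3))
```

The packed index needs extra counting work on every lookup, which is the time cost paid for halving the table.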

44
15 puzzle results

A clique in the tile puzzle is of size 2.
We compressed the last index by two: A[16][16][16][16][8].

45
24 puzzle

The same tendencies were obtained for the 24
puzzle.
The 6-6-6-6 partitioning is so good that adding another set of 6-6-6-6 did not speed up the search.
We have also tried a 7-7-5-5 partitioning, but it did not speed up the search.

46
Ongoing and future work

An item for the PDB of tiles (a,b,c,d) is of the form <La, Lb, Lc, Ld> = d.
Store the PDBs in a trie.
A PDB of 5 tiles will have a level in the trie for each tile. The values will be in the leaves of the trie.
This data structure will enable flexibility and will save memory, as subtrees of the trie can be pruned.

47
Trie pruning
Simple (lossless) pruning: fold leaves with exactly the same values.
No data will be lost.
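A small sketch of a PDB stored as a trie with lossless folding, using nested dictionaries. The representation and the names `build_trie`, `fold` and `trie_lookup` are assumptions for illustration, since this was described as ongoing work.

```python
def build_trie(pdb):
    """pdb maps location tuples such as (La, Lb) to a value d.
    Each tile gets one level of the trie; values sit in the leaves."""
    root = {}
    for key, value in pdb.items():
        node = root
        for loc in key[:-1]:
            node = node.setdefault(loc, {})
        node[key[-1]] = value
    return root


def fold(node):
    """Lossless pruning: collapse a subtree whose leaves all hold one value."""
    if not isinstance(node, dict):
        return node
    children = {loc: fold(child) for loc, child in node.items()}
    values = list(children.values())
    if all(not isinstance(v, dict) for v in values) and len(set(values)) == 1:
        return values[0]  # every leaf below agrees, keep a single leaf
    return children


def trie_lookup(trie, key):
    """Walk down the trie; stop early when a folded leaf is reached."""
    node = trie
    for loc in key:
        if not isinstance(node, dict):
            break
        node = node[loc]
    return node


# Hypothetical two-tile PDB for illustration.
toy = {(0, 0): 2, (0, 1): 2, (1, 0): 2, (1, 1): 3}
folded = fold(build_trie(toy))
```

Every key of the original table still returns its exact value after folding, so nothing is lost for valid lookups.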
[Figure: sibling leaves that all hold the value 2]
48
Trie pruning

Intelligent (lossy) pruning:
Fold leaves/subtrees which are correlated to each other (many options for this!).
Some data will be lost.
Admissibility is still kept.

[Figure: sibling leaves holding the values 2, 2, 2, 2 and 4]
49
Trie Initial Results
A 5-5-5 partitioning stored in a trie with simple
folding
50
Neural Networks (NN)

We can feed a PDB into a neural network engine.
In particular, the addition above MD.
For each tile we focus on its dx and dy from its goal position (i.e. MD).
Linear conflict:
dx1 = dx2 = 0
dy1 > dy2 + 1
A NN can learn these rules.

[Figure: tiles 2 and 1 in a linear conflict, with dy1 = 2 and dy2 = 0]
51
Neural network

We train the NN by feeding the entire (or part of
the) pattern space.
For example for a pattern of 5 tiles we have 10
features, 2 for each tile.
During the search, given the locations of the
tiles we look them up in the NN.

52
Neural network example
Layout for the pattern of the tiles 4, 5 and 6.
[Figure: neural network with the input features dx4, dy4, dx5, dy5, dx6 and dy6]
53
Neural Network problems

We face the problem of overestimating and will
have to bias the results towards underestimating.
We keep the overestimating values in a separate
hash table
Results are encouraging!!

54
Ongoing and Future Work

Dual pattern Databases
VARIABLES versus VALUES
In the tile puzzles, locations are variables and tiles are the values.
We ask: who is located in location X?
Swap their roles and ask: where is tile X located?

55
Dual pattern Databases

In regular PDBs we asked:
Where are tiles <1,2,3,4> located, and what does it take to move them to their goal positions?
We now ask:
Who is located in positions <1,2,3,4>, and what does it take to distribute them to their goals?
The same table answers both questions!

56
Search in Artificial Intelligence course