Title: Search Algorithms for Agents
Search Algorithms for Agents
- Problems that have been addressed by search algorithms can be divided into three classes:
  - path-finding problems
  - constraint satisfaction problems (CSPs)
  - two-player games
Two-player games
- Studies of two-player games are obviously related to DAI/multiagent systems in which agents are competitive.
CSPs and path-finding
- Most algorithms for these two classes were originally developed for a single agent.
- Among them, what kinds of algorithms would be useful for cooperative problem solving by multiple agents?
Search algorithms: graph representation
- A search problem can be represented by a graph.
- Some search problems can be solved by accumulating local computations for each node in the graph.
Asynchronous search algorithms: definition
- An asynchronous search algorithm solves a search problem by accumulating local computations.
- The execution order of these local computations can be arbitrary or highly flexible, and they can be executed asynchronously and concurrently.
CSP: a quick reminder
- A CSP consists of n variables x1, ..., xn, whose values are taken from finite, discrete domains D1, ..., Dn, respectively, and a set of constraints on their values.
- The constraint pk(xk1, ..., xkj) is a predicate defined on the Cartesian product Dk1 × ... × Dkj. This predicate is true iff the value assignment of these variables satisfies the constraint.
CSP
- Since constraint satisfaction is NP-complete in general, a trial-and-error exploration of alternatives is inevitable.
- For simplicity, we will focus our attention on binary CSPs, i.e., CSPs in which all constraints are between two variables.
Example: binary CSP graph
- The figure shows three variables x1, x2, x3 and the constraints x1 ≠ x3 and x1 ≠ x2.
- (Figure: constraint graph with nodes x1, x2, x3 and inequality constraints on the edges x1-x2 and x1-x3.)
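To make the example concrete, here is a minimal Python sketch of the same binary CSP. The domains {1, 2} are an assumption (the slide only specifies the constraints); the sketch simply enumerates the Cartesian product of the domains and keeps the consistent assignments.

    from itertools import product

    # Assumed domains; the slide only gives the constraints x1 != x2, x1 != x3.
    domains = {"x1": {1, 2}, "x2": {1, 2}, "x3": {1, 2}}

    # Each binary constraint is a predicate over an ordered pair of variables.
    constraints = {
        ("x1", "x2"): lambda a, b: a != b,
        ("x1", "x3"): lambda a, b: a != b,
    }

    def satisfies_all(assignment):
        """True iff the complete assignment satisfies every constraint."""
        return all(pred(assignment[u], assignment[v])
                   for (u, v), pred in constraints.items())

    # Brute-force enumeration of D1 x D2 x D3 (fine for a 3-variable example).
    names = list(domains)
    solutions = [dict(zip(names, values))
                 for values in product(*(domains[n] for n in names))
                 if satisfies_all(dict(zip(names, values)))]
    print(solutions)   # e.g. [{'x1': 1, 'x2': 2, 'x3': 2}, ...]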
Distributed CSP
- Assuming that the variables of a CSP are distributed among agents, solving the problem consists of achieving coherence among the agents.
- Problems such as multiagent truth maintenance tasks, interpretation problems, and assignment problems can be formalized as distributed CSPs.
CSPs and asynchronous algorithms
- Each process corresponds to a variable.
- We assume the following communication model:
  - Processes communicate by sending messages.
  - The delay in delivering a message is finite.
  - Between two processes, messages are received in the order in which they were sent.
- Processes that have links to xi are called neighbors of xi.
Filtering Algorithm
- A process xi performs the following procedure revise(xi, xj) for each neighboring process xj:

  procedure revise(xi, xj)
    for all vi in Di do
      if there is no value vj in Dj such that vj is consistent with vi
        then delete vi from Di end if
    end do

- When a value is deleted, the process sends its new domain to its neighboring processes.
- When xi receives a new domain from a neighbor xj, the procedure revise(xi, xj) is performed again.
- The execution order of these processes is arbitrary.
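Below is a sequential Python sketch of the filtering idea. The slides describe an asynchronous, message-passing version; here the revise calls are simply repeated until no domain changes, and the data structures and toy domains are assumptions for illustration.

    def revise(xi, xj, domains, constraints):
        """Remove from Di every value with no consistent value in Dj.
        Returns True if Di changed."""
        pred = constraints.get((xi, xj))
        if pred is None:
            return False
        removed = {vi for vi in domains[xi]
                   if not any(pred(vi, vj) for vj in domains[xj])}
        if removed:
            domains[xi] -= removed
        return bool(removed)

    def filtering(domains, constraints):
        """Repeat revise over all constrained pairs until no domain changes
        (a sequential stand-in for the asynchronous message-passing version)."""
        changed = True
        while changed:
            changed = False
            for (xi, xj) in constraints:
                if revise(xi, xj, domains, constraints):
                    changed = True
        return domains

    # Toy run with assumed domains: x2 is already fixed to 2.
    domains = {"x1": {1, 2}, "x2": {2}, "x3": {1, 2}}
    constraints = {("x1", "x2"): lambda a, b: a != b,
                   ("x2", "x1"): lambda a, b: a != b,
                   ("x1", "x3"): lambda a, b: a != b,
                   ("x3", "x1"): lambda a, b: a != b}
    print(filtering(domains, constraints))
    # -> {'x1': {1}, 'x2': {2}, 'x3': {2}}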
Filtering example: 3-Queens
- (Figure: the domains of x1, x2, x3 are filtered by applying revise(x1,x2), revise(x2,x3), and revise(x3,x2).)
3-Queens example, continued
- (Figure: filtering continues with revise(x1,x3); the domains of x1, x2, x3 are further reduced.)
Filtering Algorithm
- If the domain of some variable becomes an empty set, the problem is over-constrained and has no solution.
- If each domain has a unique value, the remaining values are a solution.
- If multiple values remain for some variables, we cannot tell whether the problem has a solution or not, and further search is required.
- Filtering should be considered a preprocessing procedure that is invoked before the application of other search methods.
K-Consistency
- A CSP is k-consistent iff, given any instantiation of any k-1 variables satisfying all the constraints among them, it is possible to find an instantiation of any kth variable such that these k variable values satisfy all the constraints among them.
- If the problem is k-consistent and j-consistent for all j < k, the problem is called strongly k-consistent.
- Next, we'll see an algorithm that transforms a given problem into an equivalent strongly k-consistent problem.
Hyper-Resolution-Based Consistency Algorithm
- The hyper-resolution rule is described as follows (Ai is a proposition such as x1 = 1, and each Si is a conjunction of such propositions):

  A1 ∨ A2 ∨ ... ∨ Am
  ¬(A1 ∧ S1)
  ¬(A2 ∧ S2)
  ...
  ¬(Am ∧ Sm)
  --------------------------
  ¬(S1 ∧ S2 ∧ ... ∧ Sm)

- In this algorithm, all constraints are represented as nogoods, where a nogood is a prohibited combination of variable values (example on the next slide).
Graph coloring example
- The constraints between x1 and x2 can be represented as two nogoods: {x1 = red, x2 = red} and {x1 = blue, x2 = blue}.
- By using the hyper-resolution rule, we can obtain from {x1 = red, x2 = red} and {x1 = blue, x3 = blue} a new nogood {x2 = red, x3 = blue}.
- (Figure: three variables x1, x2, x3, each with domain {red, blue}.)
Hyper-Resolution-Based Consistency Algorithm
- Each process represents its constraints as nogoods.
- Each process generates new nogoods by combining the information about its domain and the existing nogoods using the hyper-resolution rule.
- A newly obtained nogood is communicated to the related processes.
- When a new nogood is communicated, the process tries to generate further new nogoods using the communicated nogood.
Hyper-Resolution-Based Consistency Algorithm
- A nogood is a combination of variable values that is prohibited; therefore, a superset of a nogood cannot be a solution.
- If the empty set becomes a nogood, the problem is over-constrained and has no solution.
- The hyper-resolution rule can generate a very large number of nogoods. If we restrict the application of the rule so that only nogoods whose length is less than k are produced, the problem becomes strongly k-consistent.
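To make the rule concrete, here is a small Python sketch of a single hyper-resolution step. It assumes nogoods are stored as frozensets of (variable, value) pairs and keeps only nogoods shorter than k, as described above; it reproduces the graph-coloring derivation from the earlier slide. This is only the local rule, not the full distributed consistency algorithm.

    from itertools import product

    def hyper_resolve(variable, domain, nogoods, k):
        """Combine one nogood per domain value of `variable` (each must
        mention that value) and return the new nogoods of length < k."""
        # For every value, collect the nogoods that mention (variable, value).
        per_value = [[ng for ng in nogoods if (variable, v) in ng]
                     for v in domain]
        new_nogoods = set()
        for combo in product(*per_value):
            # Drop the resolved-upon literals and merge what remains.
            rest = frozenset().union(*(ng - {(variable, v)}
                                       for ng, v in zip(combo, domain)))
            if len(rest) < k:
                new_nogoods.add(rest)
        return new_nogoods

    # Graph-coloring example from the earlier slide:
    nogoods = {frozenset({("x1", "red"), ("x2", "red")}),
               frozenset({("x1", "blue"), ("x3", "blue")})}
    print(hyper_resolve("x1", ["red", "blue"], nogoods, k=3))
    # -> {frozenset({('x2', 'red'), ('x3', 'blue')})}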
Asynchronous Backtracking
- An asynchronous version of the backtracking algorithm, which is a standard method for solving CSPs.
- The completeness of the algorithm is guaranteed.
- The processes are ordered by the alphabetical order of the variable identifiers. Each process chooses an assignment.
- Each process maintains the current values of the other processes from its viewpoint (its local view). A process changes its assignment if its current value is not consistent with the assignments of the higher-priority processes.
- If there exists no value that is consistent with the higher-priority processes, the process generates a new nogood and communicates the nogood to a higher-priority process.
Asynchronous Backtracking
- The local view may contain obsolete information. Therefore, the receiver of a new nogood must check whether the nogood is actually violated in its own local view.
- The main message types communicated among processes are ok?, to communicate the current value, and nogood, to communicate a new nogood.
Asynchronous Backtracking example
- (Figure: x1 with domain {1, 2} and x2 with value 2 are connected to x3 with domain {1, 2} by inequality constraints. x1 and x2 send (ok?, (x1, 1)) and (ok?, (x2, 2)) to x3, whose local view becomes {(x1, 1), (x2, 2)}.)
Asynchronous Backtracking example, continued (1)
- (Figure: x3 has no value consistent with its local view, so it sends (nogood, {(x1, 1), (x2, 2)}) to x2. Since x1 is not a neighbor of x2, x2 sends an add-neighbor request and a new link from x1 to x2 is created; x2's local view becomes {(x1, 1)}.)
Asynchronous Backtracking example, continued (2)
- (Figure: x2 has no value consistent with the assignment (x1, 1), so it sends (nogood, {(x1, 1)}) to x1.)
Asynchronous Backtracking

  when received (ok?, (xj, dj)) do
    add (xj, dj) to local_view;
    check_local_view;
  end do;

  when received (nogood, nogood) do
    record nogood as a new constraint;
    when (xk, dk), where xk is not a neighbor, is contained in nogood do
      request xk to add xi to its neighbors;
      add xk to neighbors;
      add (xk, dk) to local_view;
    end do;
    check_local_view;
  end do;
Asynchronous Backtracking

  procedure check_local_view
    when local_view and current_value are not consistent do
      if no value in Di is consistent with local_view then
        resolve a new nogood using the hyper-resolution rule and
        send the nogood to the lowest-priority process in the nogood;
        when an empty nogood is found do
          broadcast to the other processes that there is no solution,
          and terminate this algorithm;
        end do;
      else
        select d in Di where local_view and d are consistent;
        current_value ← d;
        send (ok?, (xi, d)) to neighbors;
      end if;
    end do;
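As a rough, single-agent illustration of check_local_view, the Python sketch below replaces message sending with return values and uses the whole local view as the new nogood instead of a hyper-resolution step; all names are assumptions for illustration, not the chapter's code.

    def consistent(me, value, local_view, constraints):
        """True iff `value` for `me` violates no constraint with the
        higher-priority assignments recorded in local_view."""
        return all(pred(value, local_view[other])
                   for (a, other), pred in constraints.items()
                   if a == me and other in local_view)

    def check_local_view(me, current, domain, local_view, constraints):
        if consistent(me, current, local_view, constraints):
            return ("keep", current)
        candidates = [d for d in domain
                      if consistent(me, d, local_view, constraints)]
        if candidates:
            # In the real algorithm, (ok?, (me, d)) is then sent to the neighbors.
            return ("ok?", candidates[0])
        # No consistent value: here the whole local_view becomes the nogood,
        # to be sent to the lowest-priority process appearing in it.
        return ("nogood", dict(local_view))

    # The situation of x3 from the example slides above.
    constraints = {("x3", "x1"): lambda a, b: a != b,
                   ("x3", "x2"): lambda a, b: a != b}
    print(check_local_view("x3", 1, [1, 2], {"x1": 1, "x2": 2}, constraints))
    # -> ('nogood', {'x1': 1, 'x2': 2})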
Asynchronous Weak-Commitment Search
- This algorithm introduces a method for dynamically ordering processes so that a bad decision can be revised without an exhaustive search.
- For each process, the initial priority value is 0.
- If there exists no consistent value for xi, the priority value of xi is changed to k + 1, where k is the largest priority value of the related processes.
- The order is defined such that any process with a larger priority value has higher priority. If the priority values of processes are the same, the order is determined by the alphabetical order of the variable identifiers.
Asynchronous Weak-Commitment Search
- As in asynchronous backtracking, each process concurrently assigns a value to its variable and sends the variable value to the other processes.
- The priority value, as well as the current assignment, is communicated through the ok? message.
- If the current value is not consistent with the local view, the agent changes its value using the min-conflict heuristic, i.e., it selects a value that is consistent with the local view and minimizes the number of constraint violations with the variables of lower-priority processes (see the sketch below).
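A brief Python sketch of this min-conflict value choice. The violates(value, other_var, other_val) predicate is a hypothetical stand-in for the agent's constraint checks; this is only the heuristic, not the full agent loop.

    def min_conflict_value(domain, higher_view, lower_view, violates):
        """Pick a value consistent with the higher-priority assignments that
        minimizes conflicts with the lower-priority ones.

        violates(value, other_var, other_val) is a hypothetical predicate
        returning True when `value` conflicts with the other assignment."""
        consistent = [d for d in domain
                      if not any(violates(d, v, val)
                                 for v, val in higher_view.items())]
        if not consistent:
            return None   # no consistent value: triggers nogood generation
        return min(consistent,
                   key=lambda d: sum(violates(d, v, val)
                                     for v, val in lower_view.items()))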
Asynchronous Weak-Commitment Search
- Each process records the nogoods that have been resolved.
- When xi cannot find a value consistent with its local view, xi sends nogood messages to the other processes, and increments its priority value only if it has created a new nogood.
Asynchronous Weak-Commitment Search example
- (Figure: a 4-Queens example. In state (a), all agents x1-x4 have priority value 0; in state (b), the priority value of x4 has been raised to 1.)
Asynchronous Weak-Commitment Search example, continued
- (Figure: states (c) and (d) of the 4-Queens example; the priority values are x1: 0, x2: 0, x3: 2, x4: 1.)
Asynchronous Weak-Commitment Search: completeness
- The completeness of the algorithm is guaranteed by the fact that the processes record all the nogoods found so far.
- Handling a large number of nogoods is time/space consuming. We can restrict the number of recorded nogoods so that each process records only the most recently found nogoods. In this case, theoretical completeness is not guaranteed; yet, when the number of recorded nogoods is reasonably large, an infinite processing loop rarely occurs.
Path-Finding Problem
- A path-finding problem consists of the following components:
  - a set of nodes N, each representing a state;
  - a set of directed links L, each representing an operator available to the problem-solving agent;
  - a unique node s called the start node;
  - a set of nodes G, each representing a goal state.
Path-Finding Problem
- More definitions:
  - h*(i) is the shortest distance from node i to the goal nodes.
  - If j is a neighbor of i, the shortest distance via j is given by f*(j) = k(i,j) + h*(j), where k(i,j) is the cost of the link between i and j.
  - If i is not a goal node, then h*(i) = min_j f*(j) holds.
Asynchronous Dynamic Programming Algorithm
- Let us assume the following situation:
  - For each node i there exists a process corresponding to it.
  - Each process records h(i), which is the estimated value of h*(i). The initial value of h(i) is ∞, except for goal nodes.
  - For each goal node g, h(g) is 0.
  - Each process can refer to the h values of its neighboring nodes.
- The algorithm: each process updates h(i) by the following procedure. For each neighboring node j, compute f(j) = k(i,j) + h(j), and update h(i) as follows: h(i) ← min_j f(j).
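A sequential Python stand-in for this per-node update (real executions are asynchronous and concurrent); the small graph and link costs are made up for illustration, not taken from the figure on the next slide.

    import math

    def adp_sweeps(cost, goals, sweeps=10):
        """cost[i][j] is the link cost from i to j; returns the h estimates."""
        h = {i: (0.0 if i in goals else math.inf) for i in cost}
        for _ in range(sweeps):          # the real updates run asynchronously
            for i in cost:
                if i in goals:
                    continue
                # h(i) <- min_j ( k(i, j) + h(j) ) over the neighbors j of i
                h[i] = min(cost[i][j] + h[j] for j in cost[i])
        return h

    # Assumed graph: start s, goal g, intermediate nodes a, b, c, d.
    cost = {"s": {"a": 2, "b": 1}, "a": {"c": 3, "s": 2},
            "b": {"d": 1, "s": 1}, "c": {"g": 4, "a": 3},
            "d": {"g": 2, "b": 1}, "g": {}}
    print(adp_sweeps(cost, goals={"g"}))
    # With positive costs the estimates converge to the true distances,
    # e.g. h['s'] == 4.0 for this assumed graph.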
Asynchronous Dynamic Programming example
- (Figure: a small example graph with start node s, goal node g, and intermediate nodes a, b, c, d; the numbers on the links are costs, and the numbers at the nodes are the h values being updated.)
Asynchronous Dynamic Programming
- If the costs of all links are positive, it can be proved that, for each node i, h(i) converges to the true value h*(i).
- In reality, the number of nodes can be huge, and we cannot afford to have processes for all nodes.
Learning Real-Time A* Algorithm (LRTA*)
- As with asynchronous dynamic programming, each agent records the estimated distance h(i).
- Each agent repeats the following procedure:
  - Lookahead: calculate f(j) = k(i,j) + h(j) for each neighbor j of the current node i.
  - Update: h(i) ← min_j f(j).
  - Action selection: move to the neighbor j that has the minimum f(j) value.
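A compact Python sketch of this loop; the graph, costs, and zero-initialized heuristic are assumptions for illustration.

    def lrta_star(cost, h, start, goals, max_steps=100):
        """cost[i][j]: link cost from i to j; h: initial heuristic estimates."""
        path, current = [start], start
        for _ in range(max_steps):
            if current in goals:
                return path
            # Lookahead: f(j) = k(i, j) + h(j) for every neighbor j.
            f = {j: cost[current][j] + h[j] for j in cost[current]}
            best = min(f, key=f.get)
            h[current] = f[best]        # update: h(i) <- min_j f(j)
            current = best              # action selection: move to that neighbor
            path.append(current)
        return path

    # Assumed toy graph with a trivially admissible all-zero heuristic.
    cost = {"s": {"a": 1, "b": 2}, "a": {"g": 3, "s": 1},
            "b": {"g": 1, "s": 2}, "g": {}}
    h = {"s": 0, "a": 0, "b": 0, "g": 0}
    print(lrta_star(cost, h, "s", {"g"}))   # -> ['s', 'a', 's', 'b', 'g']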
LRTA*
- The initial values of h are determined using an admissible heuristic function.
- By using an admissible heuristic function on a problem with a finite number of nodes, in which all link costs are positive and there exists a path from every node to a goal node, completeness is guaranteed.
- Since LRTA* never overestimates, it learns the optimal solutions through repeated trials.
Real-Time A* Algorithm (RTA*)
- Similar to LRTA*, except that the update phase is different:
  - Instead of setting h(i) to the smallest value of f(j), the second smallest value is assigned to h(i).
  - As a result, RTA* learns more efficiently than LRTA*, but can overestimate heuristic costs.
- In a finite problem space with positive edge costs, in which there exists a path from every state to a goal, and with non-negative admissible initial heuristic values, RTA* is complete.
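Relative to the LRTA* sketch above, the only change is the value written back into h(i); a minimal helper, with the same assumed setup:

    def rta_update(f_values):
        """Value RTA* writes into h(i): the second smallest f(j), or the
        smallest when the current node has a single neighbor."""
        ordered = sorted(f_values)
        return ordered[1] if len(ordered) > 1 else ordered[0]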
Moving Target Search (MTS)
- The MTS algorithm is a generalization of LRTA* to the case where the target can move.
- We assume that the problem solver and the target move alternately, and each can traverse at most one edge in a single move.
- The task is accomplished when the problem solver and the target occupy the same node.
- MTS maintains a matrix of heuristic values, representing the function h(x, y) for all pairs of states x and y.
- The matrix is initialized to the values returned by the static evaluation function.
MTS
- To simplify the following discussion, we assume that all edges in the graph have unit cost.
- When the problem solver moves:
  1. Calculate h(xj, yi) for each neighbor xj of xi.
  2. Update the value of h(xi, yi) as follows:
     h(xi, yi) ← max{ h(xi, yi), min_xj [ h(xj, yi) + 1 ] }
  3. Move to the neighbor xj with the minimum h(xj, yi).
MTS
- When the target moves:
  1. Calculate h(xi, yj) for the target's new position yj.
  2. Update the value of h(xi, yi) as follows:
     h(xi, yi) ← max{ h(xi, yi), h(xi, yj) - 1 }
  3. Assign yj to yi (yj is the target's new position).
- MTS completeness: in a finite problem space with positive edge costs, in which there exists a path from every state to the goal state, starting with non-negative admissible initial heuristic values, and with the other assumptions we mentioned, the problem solver will eventually reach the target.
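Putting the two update rules together, here is a small Python sketch under the unit-edge-cost assumption, with h stored as a dictionary keyed by (problem-solver state, target state) pairs; the representation and helper names are assumptions for illustration.

    def solver_move(h, x, y, neighbors):
        """Problem solver's turn: update h(x, y) and return the node it moves to."""
        best = min(neighbors[x], key=lambda xj: h[(xj, y)])
        h[(x, y)] = max(h[(x, y)], h[(best, y)] + 1)
        return best

    def target_moved(h, x, y_old, y_new):
        """Target's turn: update h(x, y_old) after observing the target at y_new."""
        h[(x, y_old)] = max(h[(x, y_old)], h[(x, y_new)] - 1)
        return y_new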
Real-Time Bidirectional Search Algorithm (RTBS)
- Two problem solvers, starting from the initial and goal states, move toward each other.
- Each of them knows its current location and can communicate with the other.
- The following steps are executed until the solvers meet:
  - Control strategy: select a forward or backward move.
  - Forward move: the forward solver moves toward the other.
  - Backward move: the backward solver moves toward the other.
RTBS
- There are two categories of RTBS:
  - Centralized RTBS, where the best action is selected from among all possible moves of the two solvers.
  - Decoupled RTBS, where the two solvers independently make their own decisions.
- The evaluation results show that when the heuristic function returns accurate values, decoupled RTBS performs better than centralized RTBS; otherwise, centralized RTBS is better.
Is RTBS better than unidirectional search?
- The number of moves for centralized RTBS is around 1/2 (in 15-puzzles) to 1/6 (in 24-puzzles) of that for real-time unidirectional search.
- In mazes, the number of moves for RTBS is double that for unidirectional search.
- The key to understanding these results is to view the difference between RTBS and unidirectional search as a difference in their problem spaces.
RTBS
- We call a pair of locations (x, y) a p-state.
- We call the problem space consisting of p-states a combined problem space.
- A heuristic depression is a set of connected states whose heuristic values are less than or equal to those of the immediately surrounding states.
- The performance of real-time search is sensitive to the topography of the problem space, especially to heuristic depressions.
RTBS
- Heuristic depressions of the original problem space have been observed to become large and shallow in the combined problem space:
  - If the original heuristic depressions are deep, they become large, and that makes the problem harder to solve.
  - If the original depressions are shallow, they become very shallow, and this makes the problem easier to solve.