Title: Cross-Domain Action-Model Acquisition for Planning via Web Search
1. Cross-Domain Action-Model Acquisition for Planning via Web Search
- Hankz Hankui Zhuo (a), Qiang Yang (b), Rong Pan (a) and Lei Li (a)
- (a) Sun Yat-sen University, China
- (b) Hong Kong University of Science and Technology, Hong Kong
2. Motivation
- There are many domains that share knowledge with each other, e.g.,
  - walking in the driverlog domain
  - navigating in the rovers domain
  - moving in the elevator domain
  - etc.
- Image credits: http://www.superstock.com/stock-photos-images/1778R-4701, http://www.pixelparadox.com/mars.htm, http://www.venusengineers.com/goods-lift.html
6. Motivation
- The actions in these domains all share common knowledge about location change; thus,
  - it may be possible to borrow knowledge from each other,
  - specifically, as on the next slide.
7. Motivation
walk(?d - driver ?l1 - loc ?l2 - loc)
  precondition: (and (at ?d ?l1) (path ?l1 ?l2))
  effect: (and (not (at ?d ?l1)) (at ?d ?l2))
8. Motivation
walk(?d - driver ?l1 - loc ?l2 - loc)
  precondition: (and (at ?d ?l1) (path ?l1 ?l2))
  effect: (and (not (at ?d ?l1)) (at ?d ?l2))
navigate(?d - rover ?x - waypoint ?y - waypoint)
  precondition: ??
  effect: ??
guess?
9. Motivation
walk(?d - driver ?l1 - loc ?l2 - loc)
  precondition: (and (at ?d ?l1) (path ?l1 ?l2))
  effect: (and (not (at ?d ?l1)) (at ?d ?l2))
navigate(?d - rover ?x - waypoint ?y - waypoint)
  precondition: (at ?x ?y) (visible ?y ?z)
  effect: (not (at ?x ?y)) (at ?x ?z)
guess?
11. Motivation
walk(?d - driver ?l1 - loc ?l2 - loc)
  precondition: (and (at ?d ?l1) (path ?l1 ?l2))
  effect: (and (not (at ?d ?l1)) (at ?d ?l2))
navigate(?d - rover ?x - waypoint ?y - waypoint)
  precondition: (at ?d ?x) (visible ?x ?y)
  effect: (not (at ?d ?x)) (at ?d ?y)
guess?
12. Motivation
- In this work, we aim at learning action models for a target domain,
  - e.g., learning the model of navigate in rovers,
- by transferring knowledge from another domain, called a source domain,
  - e.g., the knowledge of the model walk in driverlog.
13. Problem Formulation
- Formally, our learning problem can be stated as follows.
- Given as inputs:
  - action models As from a source domain,
  - a few plan traces from the target domain, ⟨s0, a1, s1, ..., an, sn⟩, where si is a partial state and ai is an action,
  - action schemas A from the target domain,
  - predicates P from the target domain.
- Output:
  - action models At in the target domain.
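As plain data structures, the inputs and output above might look like the following sketch (hypothetical names, not the authors' code; `PlanTrace`, `ActionModel`, and the toy trace are all illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class PlanTrace:
    """A trace <s0, a1, s1, ..., an, sn>; each state is a set of observed
    ground literals and may be partial or even empty."""
    states: list
    actions: list

@dataclass
class ActionModel:
    """An output action model in the target domain."""
    name: str
    parameters: list
    preconditions: set = field(default_factory=set)
    add_effects: set = field(default_factory=set)
    del_effects: set = field(default_factory=set)

# A tiny target-domain trace with an unobserved intermediate state.
trace = PlanTrace(
    states=[{"(at r1 w1)", "(visible w1 w2)"}, set(), {"(at r1 w3)"}],
    actions=["(navigate r1 w1 w2)", "(navigate r1 w2 w3)"],
)
```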
15. Problem Formulation
- Our assumptions are:
  - domains are STRIPS domains;
  - people do not write action names randomly, e.g., they do not use eat to express move;
  - we do not need to observe full intermediate states in plan traces, i.e., intermediate states can be partial or empty;
  - action sequences in plan traces are correct;
  - actions in plan traces are totally ordered, i.e., there are no concurrent actions;
  - there is information available on the Web related to the actions.
16. Our Algorithm: LAWS
- Build constraints from web searching.
- Build constraints from states between actions.
- Impose constraints on action models.
- Build constraints to ensure causal links in traces.
- Solve all constraints using a weighted MAX-SAT solver.
21. Web constraints
- Used to measure the similarity between two actions.
- To do this, we search for the two actions on the Web.
- Specifically, we build predicate-action pairs PAt = {⟨p, a⟩} from the target domain, where
  - p is a predicate,
  - a is an action schema, and
  - p's parameters are included in a's.
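Reading the "parameters are included" condition as a check on parameter types, pair construction can be sketched as follows (`build_pairs` and the toy signatures are hypothetical names, not the paper's notation):

```python
def build_pairs(predicates, action_schemas):
    """Pair <p, a> whenever every parameter type of predicate p also
    appears among the parameter types of action schema a."""
    pairs = []
    for p_name, p_types in predicates.items():
        for a_name, a_types in action_schemas.items():
            if all(t in a_types for t in p_types):
                pairs.append((p_name, a_name))
    return pairs

# Toy rovers-style signatures.
predicates = {"at": ["rover", "waypoint"], "visible": ["waypoint", "waypoint"]}
schemas = {"navigate": ["rover", "waypoint", "waypoint"]}
pa_t = build_pairs(predicates, schemas)
```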
22. Web constraints
- Similarly, we build predicate-action pairs from the source domain, where
  - PAs_pre, PAs_add, and PAs_del denote the sets of precondition-action pairs, add-action pairs, and del-action pairs.
- Note that here we require p ∈ PRE(a) (respectively ADD(a), DEL(a)), which is different from PAt.
23. Web constraints
- Next, we collect a set of web documents D = {di} by searching for each keyword w = ⟨p, a⟩ ∈ PAt.
- We process each page di into a vector yi by calculating tf-idf (Jones 1972).
- As a result, we have a set of real-valued vectors Y = {yi}.
- Likewise, we get a set of vectors X = {xi} by searching for each keyword w′ = ⟨p′, a′⟩ ∈ PAs_pre.
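A minimal tf-idf computation over a toy stand-in corpus (the real system vectorizes retrieved web pages; this sketch uses whitespace tokenization and the classic log idf, which may differ from the authors' exact weighting scheme):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn each document into a tf-idf vector over the shared vocabulary."""
    vocab = sorted({w for d in docs for w in d.split()})
    df = Counter(w for d in docs for w in set(d.split()))
    n = len(docs)
    return vocab, [
        [Counter(d.split())[w] * math.log(n / df[w]) for w in vocab]
        for d in docs
    ]

# Stand-ins for two retrieved pages.
vocab, Y = tfidf_vectors(["rover navigate waypoint mars",
                          "driver walk path road truck"])
```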
24. Web constraints
- We define the similarity function between two keywords w and w′ as
  similarity(w, w′) = MMD²(F, Y, X),
- where MMD is the Maximum Mean Discrepancy of Borgwardt et al. (2006):
  MMD²(F, Y, X) = ‖ (1/|Y|) Σ_{y ∈ Y} φ(y) − (1/|X|) Σ_{x ∈ X} φ(x) ‖²,
- where φ ∈ F is a feature mapping function of a Gaussian kernel.
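Via the kernel trick, the empirical MMD² can be computed without explicit feature maps; the sketch below is a general-purpose illustration after Borgwardt et al. (2006), not the authors' implementation, and the bandwidth `sigma` is an arbitrary choice:

```python
import math

def gaussian(u, v, sigma=1.0):
    """Gaussian kernel k(u, v) = exp(-||u - v||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq / (2 * sigma ** 2))

def mmd2(X, Y, sigma=1.0):
    """Biased empirical estimate of MMD^2 between samples X and Y:
    mean k(x, x') + mean k(y, y') - 2 mean k(x, y)."""
    m, n = len(X), len(Y)
    kxx = sum(gaussian(a, b, sigma) for a in X for b in X) / (m * m)
    kyy = sum(gaussian(a, b, sigma) for a in Y for b in Y) / (n * n)
    kxy = sum(gaussian(a, b, sigma) for a in X for b in Y) / (m * n)
    return kxx + kyy - 2 * kxy
```

Identical samples yield 0; samples from clearly different distributions yield a positive value.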
27. Web constraints
- Finally, we generate weighted web constraints by the following steps:
  - for each w = ⟨p, a⟩ ∈ PAt and w′ = ⟨p′, a′⟩ ∈ PAs_pre, we calculate similarity(w, w′);
  - generate a constraint p ∈ PRE(a), and associate it with similarity(w, w′) as its weight;
  - likewise for ADD(a) and DEL(a).
28. State constraints (given by Yang et al. 2007)
- Generally, if p frequently appears before a, it is probably a precondition of a.
- The weights of these constraints are calculated by counting their occurrences in all the plan traces.
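The occurrence counting can be sketched like this (hypothetical helper names; `states[i]` is the partial state observed just before `actions[i]`):

```python
from collections import Counter

def literal_schema(literal):
    """'(at r1 w1)' -> 'at'."""
    return literal.strip("()").split()[0]

def precondition_counts(traces):
    """Count how often a predicate p is observed immediately before an
    action schema a; the count becomes the weight of 'p in PRE(a)'."""
    counts = Counter()
    for states, actions in traces:
        for i, act in enumerate(actions):
            a_schema = literal_schema(act)
            for p in states[i]:
                counts[(literal_schema(p), a_schema)] += 1
    return counts

traces = [([{"(at r1 w1)", "(visible w1 w2)"}, set()],
           ["(navigate r1 w1 w2)"])]
weights = precondition_counts(traces)
```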
29. Action constraints (given by Yang et al. 2007)
- Action constraints are imposed to ensure that the learned action models are succinct.
- These constraints are associated with the maximal weight of all the state constraints, to ensure they are maximally satisfied.
30. Plan constraints (given by Yang et al. 2007)
- We require that causal links in plan traces are not broken. Thus, we build constraints as follows:
  - for each precondition p of an action aj in a plan trace, either p is in the initial state, or there is an ai prior to aj that adds p, and no ak between ai and aj (i < k < j) deletes p;
  - for each literal q in the goal, either q is in the initial state s0, or there is an ai that adds q and no later ak that deletes q.
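The first causal-link condition can be sketched as a direct check (a simplified illustration; in LAWS these conditions become weighted clauses for the solver rather than a runtime check):

```python
def causal_link_ok(initial, steps, j, p):
    """Check the causal link for precondition p of action a_j (1-based):
    p holds initially or some earlier a_i adds it, and no a_k with
    i < k < j deletes it afterwards."""
    supported = p in initial
    for adds, dels in steps[: j - 1]:   # effects of a_1 .. a_{j-1}
        if p in adds:
            supported = True
        if p in dels:
            supported = False           # a deleter after the last adder breaks the link
    return supported

# a_1 = navigate r1 w1 w2: adds (at r1 w2), deletes (at r1 w1).
steps = [({"(at r1 w2)"}, {"(at r1 w1)"})]
```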
31. Plan constraints (given by Yang et al. 2007)
- To ensure these constraints are maximally satisfied, we assign them the maximal weight of the state constraints.
32. Solve constraints
- Before solving all these constraints, we adjust the weights of the web constraints by replacing each original weight wo with a new weight wo′, where wm is the maximal weight of the state constraints and λ ∈ [0, 1).
- We can adjust wo′ from 0 to ∞ by varying λ from 0 to 1.
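The exact rescaling formula did not survive extraction. One mapping consistent with the slide's description (wo′ spans 0 to ∞ as λ goes from 0 to 1, anchored at wm) is λ/(1 − λ); treat the function below purely as an assumption, not the paper's formula:

```python
def adjust_web_weight(w_o, w_m, lam):
    """ASSUMED form: scale the original web weight w_o by w_m * lam/(1-lam),
    which is 0 at lam = 0 and grows without bound as lam -> 1."""
    assert 0.0 <= lam < 1.0
    return w_o * w_m * lam / (1.0 - lam)
```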
33. Solve constraints
- We solve these weighted constraints by running a weighted MAX-SAT solver.
- The resulting assignment is converted to action models.
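For intuition, here is a toy brute-force weighted MAX-SAT (LAWS uses an off-the-shelf solver; the clause encoding below is illustrative): each clause is a (weight, literals) pair, a literal a (variable, polarity) pair, and we maximize the total weight of satisfied clauses.

```python
from itertools import product

def weighted_maxsat(variables, clauses):
    """Exhaustively find the assignment maximizing the summed weight of
    satisfied clauses; exponential, for toy instances only."""
    best, best_w = None, float("-inf")
    for bits in product([False, True], repeat=len(variables)):
        assign = dict(zip(variables, bits))
        w = sum(wt for wt, lits in clauses
                if any(assign[v] == pol for v, pol in lits))
        if w > best_w:
            best, best_w = assign, w
    return best, best_w

# Toy encoding: evidence that p is in PRE(a) and in ADD(a), plus an
# action constraint saying it should not be both at once.
clauses = [(2.0, [("pre", True)]),
           (1.0, [("add", True)]),
           (3.0, [("pre", False), ("add", False)])]
assign, total = weighted_maxsat(["pre", "add"], clauses)
```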
34. Experimental Result
- Example result:
  walk(?d - rover ?x - waypoint ?y - waypoint)
    precondition: (and (at ?d ?x) (visible ?x ?y))
    effect: (and (not (at ?d ?x)) (at ?d ?y) (not (visible ?x ?y)))
- By comparing to hand-written action models, we can identify each missing condition and each extra condition in a learned model.
- We calculate the error rate by counting all the missing and extra conditions, and from it obtain the accuracy.
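The metric can be sketched as follows (the paper's exact normalization may differ; here error is normalized by the size of the hand-written model, and conditions are represented as literal strings):

```python
def accuracy(learned, reference):
    """1 - (missing + extra) / |reference|, comparing a learned model's
    conditions against the hand-written (reference) ones."""
    missing = len(reference - learned)   # in reference, absent from learned
    extra = len(learned - reference)     # in learned, absent from reference
    return 1.0 - (missing + extra) / max(len(reference), 1)

reference = {"(at ?d ?x)", "(visible ?x ?y)"}                         # hand-written
learned = {"(at ?d ?x)", "(visible ?x ?y)", "(not (visible ?x ?y))"}  # one extra
```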
35. Experimental Result
- We compared LAWS to t-LAMP (Zhuo et al. 2009) and ARMS (Yang et al. 2007), where
  - t-LAMP borrows knowledge by building syntax mappings, and
  - ARMS learns without borrowing knowledge.
- The results are shown below.
36. Experimental Result
- We can see that:
  - LAWS > t-LAMP > ARMS: the accuracies of LAWS are higher than those of t-LAMP and ARMS, which empirically shows the advantage of LAWS;
  - accuracies increase as the number of plan traces increases, which is consistent with our intuition, since more information helps learning.
37. Experimental Result
- We also test the following three cases:
  - Case I (λ = 0): not borrowing knowledge;
  - Case II (λ = 0.5 and wo′ = 1): all web constraints get the same weight, i.e., the similarity function is not used;
  - Case III (λ = 0.5): the similarity function is used.
- The results are shown below.
38. Experimental Result
- We can see that:
  - Case III > the other two, suggesting the similarity function really helps improve the learning result;
  - Case II > Case I, suggesting that web constraints are helpful.
39. Experimental Result
- Next, we test different ratios of observed states:
  - accuracy generally increases as the ratio increases;
  - this is consistent with our intuition, since the additional information helps improve the learning result.
40. Experimental Result
- We also test different values of λ:
  - when λ increases from 0 to 0.5, the accuracy increases, showing that as the influence of web knowledge grows, the accuracy gets higher;
  - however, when λ is larger than 0.5, the accuracy decreases as λ increases, because the impact of the plan traces is relatively reduced. This suggests that knowledge from plan traces is also important for learning high-quality action models.
41. CPU Time
- The CPU time is less than 1,000 seconds on a typical 2 GHz PC with 1 GB of memory, which is quite reasonable for learning.
- It does not include web-searching time, since that mainly depends on the specific network quality.
42. Conclusion
- In this paper, we propose an algorithmic framework to borrow knowledge from another domain via web search, and empirically show the improvement in learning quality.
- Our work can be extended to more complex action models, e.g., full PDDL models.
- It can also be extended to multi-task action-model acquisition.
43. Thank You