Title: Approaches to Modeling and Learning User Preferences
1 Approaches to Modeling and Learning User Preferences
- Marie desJardins
- University of Maryland Baltimore County
- Presented at SRI International AI Center
- March 10, 2008
- Joint work with Fusun Yaman, Michael Littman, and
Kiri Wagstaff
2 Overview
- Representing Preferences
- Learning Planning Preferences
- Preferences over Sets
- Directions / Conclusions
4 What is a Preference?
- (Partial) ordering over outcomes
- Feature vector representation of outcomes (aka objects)
- Example: Taking a vacation. Features:
  - Who (alone / family)
  - Where (Orlando / Paris)
  - Flight type (nonstop / one-stop / multi-stop)
  - Cost (low / medium / high)
- Languages:
  - Weighted utility function
  - CP-net
  - Lexicographic ordering
5 Weighted Utility Functions
- Each value v_ij of feature f_i has an associated utility u_ij
- Utility U_j of object o_j = <v_1j, v_2j, ..., v_kj> is U_j = Σ_i w_i u_ij
- Commonly used in preference elicitation
- Easy to model
- Independence of features is convenient
- Flight example (a code sketch follows):
  - U(flight) = .8 u(Who) + .8 u(Cost) + .6 u(Where) + .4 u(Flight Type)
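A minimal sketch of how such an additive utility function can be computed; the weights roughly follow the flight example above, but the per-value utilities (and the feature/value names) are illustrative assumptions.

    # Additive (weighted) utility: U(o) = sum_i w_i * u_i(v_i).
    # Weights follow the flight example; per-value utilities are illustrative.
    FEATURE_WEIGHTS = {"who": 0.8, "cost": 0.8, "where": 0.6, "flight_type": 0.4}
    VALUE_UTILITIES = {
        "who": {"family": 1.0, "alone": 0.4},
        "cost": {"low": 1.0, "medium": 0.6, "high": 0.2},
        "where": {"orlando": 0.9, "paris": 0.7},
        "flight_type": {"nonstop": 1.0, "one-stop": 0.5, "multi-stop": 0.2},
    }

    def additive_utility(outcome):
        """Sum the weighted utility of each feature value in the outcome."""
        return sum(FEATURE_WEIGHTS[f] * VALUE_UTILITIES[f][v] for f, v in outcome.items())

    vacation = {"who": "family", "where": "orlando", "cost": "medium", "flight_type": "nonstop"}
    print(additive_utility(vacation))   # 0.8*1.0 + 0.8*0.6 + 0.6*0.9 + 0.4*1.0 = 2.22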
6 CP-Nets
- Conditional Preference Network
- Intuitive, graphical representation of conditional preferences under a ceteris paribus (all else being equal) assumption
- "I prefer to take a vacation with my family rather than going alone. If I am with my family, I prefer Orlando to Paris. If I am alone, I prefer Paris to Orlando."
[CP-net figure: node "who" with table family > alone; node "where" (parent: who) with table family: Orlando > Paris, alone: Paris > Orlando]
7 Induced Preference Graph
- Every CP-net induces a preference graph on outcomes
- The partial ordering of outcomes is given by the transitive closure of the preference graph (a sketch of constructing the graph follows the figure)
[Figure: the example CP-net (who: family > alone; where: family: Orlando > Paris, alone: Paris > Orlando) and its induced preference graph over the four outcomes family ∧ Orlando, family ∧ Paris, alone ∧ Paris, alone ∧ Orlando]
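A minimal sketch, assuming a dictionary encoding of the example CP-net, of how the induced preference graph can be built by enumerating single-feature (ceteris paribus) flips; the encoding and function names are illustrative, not the original implementation.

    from itertools import product

    # Illustrative encoding of the example CP-net (who unconditioned, where conditioned on who).
    DOMAINS = {"who": ["family", "alone"], "where": ["orlando", "paris"]}

    def value_order(feature, outcome):
        """Preferred-to-least-preferred values of a feature, given the outcome's other values."""
        if feature == "who":
            return ["family", "alone"]
        return ["orlando", "paris"] if outcome["who"] == "family" else ["paris", "orlando"]

    def induced_preference_graph():
        """Directed edges (better, worse) between outcomes differing in exactly one feature."""
        outcomes = [dict(zip(DOMAINS, vals)) for vals in product(*DOMAINS.values())]
        edges = []
        for o in outcomes:
            for f, dom in DOMAINS.items():
                order = value_order(f, o)
                for alt in dom:
                    if alt != o[f] and order.index(o[f]) < order.index(alt):
                        worse = dict(o)
                        worse[f] = alt
                        edges.append((o, worse))   # o is preferred, all else being equal
        return edges

    for better, worse in induced_preference_graph():
        print(better, ">", worse)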
8 Lexicographic Orderings
- Features are prioritized with a total ordering f1, ..., fk
- Each value of each feature is prioritized with a total ordering, vi1 > ... > vim
- To compare o1 and o2 (see the sketch below):
  - Find the first feature in the feature ordering on which o1 and o2 differ
  - Choose the outcome with the preferred value for that feature
- Travel example:
  - Who > Where > Cost > Flight-Type
  - Family > Alone
  - Orlando > Paris
  - Cheap > Expensive
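A minimal sketch of the comparison procedure just described, using assumed feature and value orderings modeled on the travel example.

    # Lexicographic comparison: decide on the first feature (in priority order)
    # where the two outcomes differ. Orderings below are illustrative.
    FEATURE_ORDER = ["who", "where", "cost", "flight_type"]
    VALUE_ORDER = {
        "who": ["family", "alone"],
        "where": ["orlando", "paris"],
        "cost": ["cheap", "expensive"],
        "flight_type": ["nonstop", "one-stop", "multi-stop"],
    }

    def lex_compare(o1, o2):
        """Return -1 if o1 is preferred, 1 if o2 is preferred, 0 if they tie."""
        for f in FEATURE_ORDER:
            if o1[f] != o2[f]:
                order = VALUE_ORDER[f]
                return -1 if order.index(o1[f]) < order.index(o2[f]) else 1
        return 0

    a = {"who": "family", "where": "paris", "cost": "cheap", "flight_type": "nonstop"}
    b = {"who": "family", "where": "orlando", "cost": "expensive", "flight_type": "multi-stop"}
    print(lex_compare(a, b))   # 1: b wins on "where", the first feature on which they differ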
9 Representation Tradeoffs
- Each representation has some limitations
  - Additive utility functions can't capture conditional preferences, and can't easily represent hard constraints or preferences
  - CP-nets, in general, only give a partial ordering, can't model integer/real features easily, and can't capture tradeoffs
  - Lexicographic preferences can't capture tradeoffs, and can't represent conditional preferences
10 Learning Planning Preferences
11 Planning Algorithms
- Domain-independent
  - Inputs: initial state, goal state, possible actions
  - Domain-independent, but not efficient
- Domain-specific
  - Works for only one domain
  - (Near-)optimal reasoning
  - Very fast
- Domain-configurable
  - Use additional planning knowledge to customize the search automatically
  - Broadly applicable and efficient
12 Domain Knowledge for Planning
- Provide search control information
- Hierarchy of abstract actions (HTN operators)
- Logical formulas (e.g., temporal logic)
- Experts must provide planning knowledge
- May not be readily available
- Difficult to express knowledge declaratively
13 Learning Planning Knowledge
- Alternative: Learn planning knowledge by observation (i.e., from example plans)
  - Possibly even learn from a single complex example
  - DARPA's Integrated Learning Program
- Our focus: Learn preferences at various decision points
  - Charming Hybrid Adaptive Ranking Model (CHARM)
  - Currently: Learns preferences over variable bindings
  - Future: Learn goal and operator preferences
14 HTN: Hierarchical Task Network
- Objectives are specified as high-level tasks to be accomplished
- Methods describe how high-level tasks are decomposed into primitive tasks (a small encoding sketch follows the figure)
[Figure: HTN for the high-level task travel(X,Y). A short-distance-travel method decomposes it into the primitive actions getTaxi(X), rideTaxi(X,Y), payDriver; a long-distance-travel method decomposes it into travel(X,Ax), buyTicket(Ax,Ay), fly(Ax,Ay), travel(Ay,Y)]
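A minimal sketch of how the two methods in the figure might be written down as data: each method maps the high-level task to an ordered list of subtasks. The task names follow the figure; the encoding itself is an illustrative assumption.

    # Illustrative encoding of the HTN methods for travel(X,Y).
    METHODS = {
        "travel(X,Y)": [
            {"name": "short-distance-travel",
             "subtasks": ["getTaxi(X)", "rideTaxi(X,Y)", "payDriver"]},
            {"name": "long-distance-travel",
             "subtasks": ["travel(X,Ax)", "buyTicket(Ax,Ay)", "fly(Ax,Ay)", "travel(Ay,Y)"]},
        ]
    }

    for method in METHODS["travel(X,Y)"]:
        print(method["name"], "->", ", ".join(method["subtasks"]))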
15 CHARM: Charming Hybrid Adaptive Ranking Model
- Learns preferences in HTN methods
  - Which objects to choose when using a particular method?
    - Which flight to take? Which airport to choose?
  - Which goal to select next during planning?
  - Which method to choose to achieve a task?
    - By plane or by train?
- Preferences are expressed as lexicographic orderings
  - A natural choice for many (not all) planning domains
16 Summary of CHARM
- CHARM learns a preference rule for each method.
- Given an HTN, initial state, and the plan tree:
  - Find an ordering on variable values for each decision point (planning context)
- CHARM has two modes:
  - Gather training data for each method
    - Orlando (tropical, family-oriented, expensive) is preferred to Boise (cold, outdoors-oriented, cheap)
  - Learn a preference rule for each method
17 Preference Rules
- A preference rule is a function that returns <, =, or >, given two objects represented as vectors of attributes.
- Assumption: Preference rules are lexicographic
  - For every attribute there is a preferred value
  - There is a total order on the attributes representing the order of importance
- Example (see the sketch below): A warm destination is preferred to a cold one. Among destinations of the same climate, an inexpensive one is better than an expensive one.
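A minimal sketch of the destination rule just stated, written as a preference function that returns "<", "=", or ">" for two attribute vectors; the attribute and value names are illustrative.

    # Lexicographic rule: climate is most important (warm preferred), then cost (inexpensive preferred).
    ATTRIBUTE_ORDER = [("climate", ["warm", "cold"]), ("cost", ["inexpensive", "expensive"])]

    def prefer(a, b):
        """Return '>' if a is preferred to b, '<' if b is preferred, '=' if they tie."""
        for attr, best_first in ATTRIBUTE_ORDER:
            if a[attr] != b[attr]:
                return ">" if best_first.index(a[attr]) < best_first.index(b[attr]) else "<"
        return "="

    orlando = {"climate": "warm", "cost": "expensive"}
    boise = {"climate": "cold", "cost": "inexpensive"}
    print(prefer(orlando, boise))   # '>': climate decides before cost is even considered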
18 Learning Lexicographic Preference Models
- Existing algorithms return one of many models consistent with the data
  - The worst-case performance of such algorithms is worse than random selection
  - Higher probability of poor performance if there are fewer training observations
- A novel democratic approach: Variable Voting
  - Sample the possible consistent models
  - Implicit sampling: models that satisfy certain properties are permitted to vote
  - Preference decision is based on the majority of votes
19 Variable Voting
- Given a partial order, <, on the attributes and two objects, A and B:
  - D: the attributes on which A and B differ
  - D*: the most salient attributes in D with respect to <
  - The object with the largest number of preferred values for the attributes in D* is the preferred object (see the sketch below)
- Example objects over attributes X1-X5:

       X1  X2  X3  X4  X5
   A    1   0   1   0   0
   B    0   0   1   1   1
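A minimal sketch of the voting step, assuming the partial order is represented by integer ranks (lower = more salient) and that each attribute has a known preferred value; the representation and tie handling are assumptions, not the exact published formulation.

    # Variable Voting: among the attributes where A and B differ (D), keep only the
    # most salient ones (D*) and let each vote for the object holding its preferred value.
    def variable_vote(a, b, ranks, preferred):
        diff = [x for x in a if a[x] != b[x]]            # D: attributes where A and B differ
        if not diff:
            return "tie"
        top = min(ranks[x] for x in diff)
        salient = [x for x in diff if ranks[x] == top]   # D*: most salient attributes in D
        votes_a = sum(a[x] == preferred[x] for x in salient)
        votes_b = sum(b[x] == preferred[x] for x in salient)
        return "A" if votes_a > votes_b else "B" if votes_b > votes_a else "tie"

    A = {"X1": 1, "X2": 0, "X3": 1, "X4": 0, "X5": 0}
    B = {"X1": 0, "X2": 0, "X3": 1, "X4": 1, "X5": 1}
    ranks = {x: 1 for x in A}          # all attributes equally salient (illustrative)
    preferred = {x: 1 for x in A}      # assume 1 is the preferred value for every attribute
    print(variable_vote(A, B, ranks, preferred))   # 'B': 2 of the 3 differing attributes favor B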
20 Learning Variable Ranks
- Initially, all attributes are equally important
- Loop until ranks converge (see the sketch below):
  - Given two objects, predict a winner using the current beliefs
  - If the prediction was wrong, decrease the importance of the attribute values that led to the wrong prediction
  - The importance of an attribute never goes beyond its actual place in the order of attributes
- Mistake-bound algorithm: learns from its mistakes
- Mistake bound is O(n²), where n is the number of attributes
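A hedged sketch of the update loop described above: start with every attribute maximally salient and demote the attributes responsible for each wrong prediction, capped at the number of attributes. This is an approximation of the idea on the slide, not the exact published algorithm.

    # Mistake-driven rank learning (sketch). Ranks start at 1 (most important);
    # a demoted attribute's rank grows toward n, reducing its future influence.
    def predict(a, b, ranks, preferred):
        diff = [x for x in a if a[x] != b[x]]
        if not diff:
            return "tie", []
        top = min(ranks[x] for x in diff)
        salient = [x for x in diff if ranks[x] == top]
        votes_a = sum(a[x] == preferred[x] for x in salient)
        votes_b = sum(b[x] == preferred[x] for x in salient)
        winner = "A" if votes_a > votes_b else "B" if votes_b > votes_a else "tie"
        return winner, salient

    def learn_ranks(examples, attributes, preferred, epochs=20):
        n = len(attributes)
        ranks = {x: 1 for x in attributes}              # initially all equally important
        for _ in range(epochs):
            mistakes = 0
            for a, b, winner in examples:               # winner is "A" or "B"
                guess, salient = predict(a, b, ranks, preferred)
                if guess != winner:
                    mistakes += 1
                    loser = b if winner == "A" else a
                    for x in salient:
                        # demote the salient attributes whose preferred value backed the loser
                        if loser[x] == preferred[x]:
                            ranks[x] = min(ranks[x] + 1, n)
            if mistakes == 0:                           # converged on the training data
                break
        return ranks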
21 Democracy vs. Autocracy
[Chart: Variable Voting results]
23 Preferences over Sets
- Subset selection applications
- Remote sensing, sports teams, music playlists,
planning - Ranking, like a search engine?
- Doesn't capture dependencies between items
- Encode, apply, learn set-based preferences
24 User Preferences
- Depth: utility function (desirable values)
- Diversity: variety and coverage
- Geologist: near and far views (context)
25 Encoding User Preferences
- DD-PREF: a language for expressing preferred depth and diversity for sets
[Figure: example sets illustrating depth vs. diversity]
26 Finding the Best Subset
[Equation: the valuation of a subset s combines the depth (utility) of s with the diversity value of s; a greedy selection sketch follows]
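A minimal sketch of greedy selection against a subset valuation; the simple mix of average item utility and spread used here is an illustrative stand-in for the depth/diversity combination, not DD-PREF's actual valuation, and the helper names are assumptions.

    # Greedy subset selection: repeatedly add the item that most improves the valuation.
    def valuation(subset, utility, alpha=0.5):
        if not subset:
            return 0.0
        depth = sum(utility(x) for x in subset) / len(subset)   # how desirable the items are
        spread = max(subset) - min(subset)
        diversity = spread / (1.0 + spread)                     # crude stand-in for coverage
        return alpha * depth + (1 - alpha) * diversity

    def greedy_select(items, k, utility):
        chosen, remaining = [], list(items)
        for _ in range(k):
            best = max(remaining, key=lambda x: valuation(chosen + [x], utility))
            chosen.append(best)
            remaining.remove(best)
        return chosen

    print(greedy_select(range(10), 3, utility=lambda x: x / 9.0))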
27 Learning Preferences from Examples
- Hard for users to specify quantitative values (especially with more general quality functions)
- Instead, adopt a machine learning approach
- Users provide example sets with high valuation
- System infers
- Utility functions
- Desired diversity
- Feature weights
- Once trained, the system can select subsets of
new data (blocks, images, songs, food)
28 Learning a Preference Model
- Depth: utility functions
  - Probability density estimation: KDE (kernel density estimation) [Duda et al., '01]
- Diversity: average of observed diversities
- Feature weights: minimize difference between computed valuation and true valuation
  - BFGS bounded optimization [Gill et al., '81] (both estimation steps are sketched below)
29 Results: Blocks World
- Compute valuation of sets chosen by true preference, learned preference, and random selection
- As more training sets are available, performance increases (learned approximates true)
[Plots: results for the Mosaic and Tower tasks]
30 Rover Image Experiments
- Methodology
  - Six users: 2 geologists, 4 computer scientists
  - Five sets of 20 images each
  - Each user selects a subset of 5 images from each set
- Evaluation
  - Learn preferences on (up to 4) examples, select a new subset from a held-out set
  - Metrics:
    - Valuation of the selected subset
    - Functional similarity between learned preferences
31 Learned Preferences
[Image: a subset of 5 images, chosen by a geologist, from 20 total]

   Feature   Learned weight   Learned diversity
   Rock      0.3              0.8
   Soil      0.1              0.9
   Sky       1.0              0.5
32 Subset Selection
[Images: a subset of 5 images chosen by a geologist from 20 total; 5 images chosen from 20 images using greedy DD-Select and learned preferences; 5 images chosen by the same geologist from the same 20 new images]
33 Current Work
- Extending to document data
  - Text (discrete) features
  - Menu World: Chinese restaurant dish selection
- How do you combine multiple preferences with different priorities?
  - Rover: dust devils, carbonate rocks, cross-bedding
  - Priorities that can change over time
35 Future Directions
- Hybrid preference representation
  - Decision tree with lexicographic orderings at the leaves
  - Permits conditional preferences
  - How to learn the splits in the tree?
- Support operator and goal orderings for planning
- Incorporate the concept of set-based preferences into planning domains
Questions?