Title: Approaches to Modeling and Learning User Preferences
1 Approaches to Modeling and Learning User Preferences
- Marie desJardins
- University of Maryland Baltimore County
- Presented at SRI International AI Center
- March 10, 2008
- Joint work with Fusun Yaman, Michael Littman, and
Kiri Wagstaff
2 Overview
- Representing Preferences
- Learning Planning Preferences
- Preferences over Sets
- Directions / Conclusions
4 What is a Preference?
- (Partial) ordering over outcomes
- Feature vector representation of outcomes (aka objects)
- Example: Taking a vacation. Features:
  - Who (alone / family)
  - Where (Orlando / Paris)
  - Flight type (nonstop / one-stop / multi-stop)
  - Cost (low / medium / high)
- Languages:
  - Weighted utility function
  - CP-net
  - Lexicographic ordering
5 Weighted Utility Functions
- Each value v_ij of feature f_i has an associated utility u_ij
- Utility U_j of object o_j = <v_1j, v_2j, ..., v_kj> is U_j = Σ_i w_i u_ij
- Commonly used in preference elicitation
- Easy to model
- Independence of features is convenient
- Flight example (a code sketch follows):
  - U(flight) = .8 u(Who) + .8 u(Cost) + .6 u(Where) + .4 u(Flight Type)
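A minimal sketch of how such an additive utility function can be computed; the weights roughly follow the flight example above, but the per-value utilities (and the feature/value names) are illustrative assumptions.

    # Additive (weighted) utility: U(o) = sum_i w_i * u_i(v_i).
    # Weights follow the flight example; per-value utilities are illustrative.
    FEATURE_WEIGHTS = {"who": 0.8, "cost": 0.8, "where": 0.6, "flight_type": 0.4}
    VALUE_UTILITIES = {
        "who": {"family": 1.0, "alone": 0.4},
        "cost": {"low": 1.0, "medium": 0.6, "high": 0.2},
        "where": {"orlando": 0.9, "paris": 0.7},
        "flight_type": {"nonstop": 1.0, "one-stop": 0.5, "multi-stop": 0.2},
    }

    def additive_utility(outcome):
        """Sum the weighted utility of each feature value in the outcome."""
        return sum(FEATURE_WEIGHTS[f] * VALUE_UTILITIES[f][v] for f, v in outcome.items())

    vacation = {"who": "family", "where": "orlando", "cost": "medium", "flight_type": "nonstop"}
    print(additive_utility(vacation))   # 0.8*1.0 + 0.8*0.6 + 0.6*0.9 + 0.4*1.0 = 2.22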
6 CP-Nets
- Conditional Preference Network
- Intuitive, graphical representation of conditional preferences under a ceteris paribus (all else being equal) assumption
- "I prefer to take a vacation with my family rather than going alone. If I am with my family, I prefer Orlando to Paris. If I am alone, I prefer Paris to Orlando."
[CP-net figure: node "who" with table family > alone; node "where" (parent: who) with table family: Orlando > Paris, alone: Paris > Orlando]
7 Induced Preference Graph
- Every CP-net induces a preference graph on outcomes
- The partial ordering of outcomes is given by the transitive closure of the preference graph (a sketch of constructing the graph follows the figure)
[Figure: the example CP-net (who: family > alone; where: family: Orlando > Paris, alone: Paris > Orlando) and its induced preference graph over the four outcomes family ∧ Orlando, family ∧ Paris, alone ∧ Paris, alone ∧ Orlando]
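A minimal sketch, assuming a dictionary encoding of the example CP-net, of how the induced preference graph can be built by enumerating single-feature (ceteris paribus) flips; the encoding and function names are illustrative, not the original implementation.

    from itertools import product

    # Illustrative encoding of the example CP-net (who unconditioned, where conditioned on who).
    DOMAINS = {"who": ["family", "alone"], "where": ["orlando", "paris"]}

    def value_order(feature, outcome):
        """Preferred-to-least-preferred values of a feature, given the outcome's other values."""
        if feature == "who":
            return ["family", "alone"]
        return ["orlando", "paris"] if outcome["who"] == "family" else ["paris", "orlando"]

    def induced_preference_graph():
        """Directed edges (better, worse) between outcomes differing in exactly one feature."""
        outcomes = [dict(zip(DOMAINS, vals)) for vals in product(*DOMAINS.values())]
        edges = []
        for o in outcomes:
            for f, dom in DOMAINS.items():
                order = value_order(f, o)
                for alt in dom:
                    if alt != o[f] and order.index(o[f]) < order.index(alt):
                        worse = dict(o)
                        worse[f] = alt
                        edges.append((o, worse))   # o is preferred, all else being equal
        return edges

    for better, worse in induced_preference_graph():
        print(better, ">", worse)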
8 Lexicographic Orderings
- Features are prioritized with a total ordering f1, ..., fk
- Each value of each feature is prioritized with a total ordering, vi1 > ... > vim
- To compare o1 and o2 (see the sketch below):
  - Find the first feature in the feature ordering on which o1 and o2 differ
  - Choose the outcome with the preferred value for that feature
- Travel example:
  - Who > Where > Cost > Flight-Type
  - Family > Alone
  - Orlando > Paris
  - Cheap > Expensive
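A minimal sketch of the comparison procedure just described, using assumed feature and value orderings modeled on the travel example.

    # Lexicographic comparison: decide on the first feature (in priority order)
    # where the two outcomes differ. Orderings below are illustrative.
    FEATURE_ORDER = ["who", "where", "cost", "flight_type"]
    VALUE_ORDER = {
        "who": ["family", "alone"],
        "where": ["orlando", "paris"],
        "cost": ["cheap", "expensive"],
        "flight_type": ["nonstop", "one-stop", "multi-stop"],
    }

    def lex_compare(o1, o2):
        """Return -1 if o1 is preferred, 1 if o2 is preferred, 0 if they tie."""
        for f in FEATURE_ORDER:
            if o1[f] != o2[f]:
                order = VALUE_ORDER[f]
                return -1 if order.index(o1[f]) < order.index(o2[f]) else 1
        return 0

    a = {"who": "family", "where": "paris", "cost": "cheap", "flight_type": "nonstop"}
    b = {"who": "family", "where": "orlando", "cost": "expensive", "flight_type": "multi-stop"}
    print(lex_compare(a, b))   # 1: b wins on "where", the first feature on which they differ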
9 Representation Tradeoffs
- Each representation has some limitations
  - Additive utility functions can't capture conditional preferences, and can't easily represent hard constraints or preferences
  - CP-nets, in general, only give a partial ordering, can't model integer/real features easily, and can't capture tradeoffs
  - Lexicographic preferences can't capture tradeoffs, and can't represent conditional preferences
10 Learning Planning Preferences
11 Planning Algorithms
- Domain-independent
  - Inputs: initial state, goal state, possible actions
  - Domain-independent, but not efficient
- Domain-specific
  - Works for only one domain
  - (Near-)optimal reasoning
  - Very fast
- Domain-configurable
  - Use additional planning knowledge to customize the search automatically
  - Broadly applicable and efficient
12 Domain Knowledge for Planning
- Provide search control information
- Hierarchy of abstract actions (HTN operators)
- Logical formulas (e.g., temporal logic)
- Experts must provide planning knowledge
- May not be readily available
- Difficult to express knowledge declaratively
13 Learning Planning Knowledge
- Alternative: Learn planning knowledge by observation (i.e., from example plans)
  - Possibly even learn from a single complex example
  - DARPA's Integrated Learning Program
- Our focus: Learn preferences at various decision points
  - Charming Hybrid Adaptive Ranking Model (CHARM)
  - Currently: Learns preferences over variable bindings
  - Future: Learn goal and operator preferences
14 HTN: Hierarchical Task Network
- Objectives are specified as high-level tasks to be accomplished
- Methods describe how high-level tasks are decomposed into primitive tasks (a small encoding sketch follows the figure)
[Figure: HTN for the high-level task travel(X,Y). A short-distance-travel method decomposes it into the primitive actions getTaxi(X), rideTaxi(X,Y), payDriver; a long-distance-travel method decomposes it into travel(X,Ax), buyTicket(Ax,Ay), fly(Ax,Ay), travel(Ay,Y)]
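A minimal sketch of how the two methods in the figure might be written down as data: each method maps the high-level task to an ordered list of subtasks. The task names follow the figure; the encoding itself is an illustrative assumption.

    # Illustrative encoding of the HTN methods for travel(X,Y).
    METHODS = {
        "travel(X,Y)": [
            {"name": "short-distance-travel",
             "subtasks": ["getTaxi(X)", "rideTaxi(X,Y)", "payDriver"]},
            {"name": "long-distance-travel",
             "subtasks": ["travel(X,Ax)", "buyTicket(Ax,Ay)", "fly(Ax,Ay)", "travel(Ay,Y)"]},
        ]
    }

    for method in METHODS["travel(X,Y)"]:
        print(method["name"], "->", ", ".join(method["subtasks"]))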
15 CHARM: Charming Hybrid Adaptive Ranking Model
- Learns preferences in HTN methods
  - Which objects to choose when using a particular method?
    - Which flight to take? Which airport to choose?
  - Which goal to select next during planning?
  - Which method to choose to achieve a task?
    - By plane or by train?
- Preferences are expressed as lexicographic orderings
  - A natural choice for many (not all) planning domains
16 Summary of CHARM
- CHARM learns a preference rule for each method.
- Given an HTN, initial state, and the plan tree:
  - Find an ordering on variable values for each decision point (planning context)
- CHARM has two modes:
  - Gather training data for each method
    - Orlando (tropical, family-oriented, expensive) is preferred to Boise (cold, outdoors-oriented, cheap)
  - Learn a preference rule for each method
17 Preference Rules
- A preference rule is a function that returns <, =, or >, given two objects represented as vectors of attributes.
- Assumption: Preference rules are lexicographic
  - For every attribute there is a preferred value
  - There is a total order on the attributes representing the order of importance
- Example (see the sketch below): A warm destination is preferred to a cold one. Among destinations of the same climate, an inexpensive one is better than an expensive one.
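A minimal sketch of the destination rule just stated, written as a preference function that returns "<", "=", or ">" for two attribute vectors; the attribute and value names are illustrative.

    # Lexicographic rule: climate is most important (warm preferred), then cost (inexpensive preferred).
    ATTRIBUTE_ORDER = [("climate", ["warm", "cold"]), ("cost", ["inexpensive", "expensive"])]

    def prefer(a, b):
        """Return '>' if a is preferred to b, '<' if b is preferred, '=' if they tie."""
        for attr, best_first in ATTRIBUTE_ORDER:
            if a[attr] != b[attr]:
                return ">" if best_first.index(a[attr]) < best_first.index(b[attr]) else "<"
        return "="

    orlando = {"climate": "warm", "cost": "expensive"}
    boise = {"climate": "cold", "cost": "inexpensive"}
    print(prefer(orlando, boise))   # '>': climate decides before cost is even considered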
18 Learning Lexicographic Preference Models
- Existing algorithms return one of many models consistent with the data
  - The worst-case performance of such algorithms is worse than random selection
  - Higher probability of poor performance if there are fewer training observations
- A novel democratic approach: Variable Voting
  - Sample the possible consistent models
  - Implicit sampling: models that satisfy certain properties are permitted to vote
  - Preference decision is based on the majority of votes
19 Variable Voting
- Given a partial order, <, on the attributes and two objects, A and B:
  - D: the attributes on which A and B differ
  - D*: the most salient attributes in D with respect to <
  - The object with the largest number of preferred values for the attributes in D* is the preferred object (see the sketch below)
- Example objects over attributes X1-X5:

       X1  X2  X3  X4  X5
   A    1   0   1   0   0
   B    0   0   1   1   1
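A minimal sketch of the voting step, assuming the partial order is represented by integer ranks (lower = more salient) and that each attribute has a known preferred value; the representation and tie handling are assumptions, not the exact published formulation.

    # Variable Voting: among the attributes where A and B differ (D), keep only the
    # most salient ones (D*) and let each vote for the object holding its preferred value.
    def variable_vote(a, b, ranks, preferred):
        diff = [x for x in a if a[x] != b[x]]            # D: attributes where A and B differ
        if not diff:
            return "tie"
        top = min(ranks[x] for x in diff)
        salient = [x for x in diff if ranks[x] == top]   # D*: most salient attributes in D
        votes_a = sum(a[x] == preferred[x] for x in salient)
        votes_b = sum(b[x] == preferred[x] for x in salient)
        return "A" if votes_a > votes_b else "B" if votes_b > votes_a else "tie"

    A = {"X1": 1, "X2": 0, "X3": 1, "X4": 0, "X5": 0}
    B = {"X1": 0, "X2": 0, "X3": 1, "X4": 1, "X5": 1}
    ranks = {x: 1 for x in A}          # all attributes equally salient (illustrative)
    preferred = {x: 1 for x in A}      # assume 1 is the preferred value for every attribute
    print(variable_vote(A, B, ranks, preferred))   # 'B': 2 of the 3 differing attributes favor B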
20 Learning Variable Ranks
- Initially, all attributes are equally important
- Loop until ranks converge (see the sketch below):
  - Given two objects, predict a winner using the current beliefs
  - If the prediction was wrong, decrease the importance of the attribute values that led to the wrong prediction
  - The importance of an attribute never goes beyond its actual place in the order of attributes
- Mistake-bound algorithm: learns from its mistakes
- Mistake bound is O(n²), where n is the number of attributes
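A hedged sketch of the update loop described above: start with every attribute maximally salient and demote the attributes responsible for each wrong prediction, capped at the number of attributes. This is an approximation of the idea on the slide, not the exact published algorithm.

    # Mistake-driven rank learning (sketch). Ranks start at 1 (most important);
    # a demoted attribute's rank grows toward n, reducing its future influence.
    def predict(a, b, ranks, preferred):
        diff = [x for x in a if a[x] != b[x]]
        if not diff:
            return "tie", []
        top = min(ranks[x] for x in diff)
        salient = [x for x in diff if ranks[x] == top]
        votes_a = sum(a[x] == preferred[x] for x in salient)
        votes_b = sum(b[x] == preferred[x] for x in salient)
        winner = "A" if votes_a > votes_b else "B" if votes_b > votes_a else "tie"
        return winner, salient

    def learn_ranks(examples, attributes, preferred, epochs=20):
        n = len(attributes)
        ranks = {x: 1 for x in attributes}              # initially all equally important
        for _ in range(epochs):
            mistakes = 0
            for a, b, winner in examples:               # winner is "A" or "B"
                guess, salient = predict(a, b, ranks, preferred)
                if guess != winner:
                    mistakes += 1
                    loser = b if winner == "A" else a
                    for x in salient:
                        # demote the salient attributes whose preferred value backed the loser
                        if loser[x] == preferred[x]:
                            ranks[x] = min(ranks[x] + 1, n)
            if mistakes == 0:                           # converged on the training data
                break
        return ranks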
21 Democracy vs. Autocracy
[Chart: Variable Voting results]
23 Preferences over Sets
- Subset selection applications
- Remote sensing, sports teams, music playlists,
planning - Ranking, like a search engine?
- Doesn't capture dependencies between items
- Encode, apply, learn set-based preferences
24 User Preferences
- Depth: utility function (desirable values)
- Diversity: variety and coverage
- Geologist: near and far views (context)
25 Encoding User Preferences
- DD-PREF: a language for expressing preferred depth and diversity for sets
[Figure: example sets illustrating depth vs. diversity]
26 Finding the Best Subset
[Equation: the valuation of a subset s combines the depth (utility) of s with the diversity value of s; a greedy selection sketch follows]
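A minimal sketch of greedy selection against a subset valuation; the simple mix of average item utility and spread used here is an illustrative stand-in for the depth/diversity combination, not DD-PREF's actual valuation, and the helper names are assumptions.

    # Greedy subset selection: repeatedly add the item that most improves the valuation.
    def valuation(subset, utility, alpha=0.5):
        if not subset:
            return 0.0
        depth = sum(utility(x) for x in subset) / len(subset)   # how desirable the items are
        spread = max(subset) - min(subset)
        diversity = spread / (1.0 + spread)                     # crude stand-in for coverage
        return alpha * depth + (1 - alpha) * diversity

    def greedy_select(items, k, utility):
        chosen, remaining = [], list(items)
        for _ in range(k):
            best = max(remaining, key=lambda x: valuation(chosen + [x], utility))
            chosen.append(best)
            remaining.remove(best)
        return chosen

    print(greedy_select(range(10), 3, utility=lambda x: x / 9.0))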
27 Learning Preferences from Examples
- Hard for users to specify quantitative values (especially with more general quality functions)
- Instead, adopt a machine learning approach
- Users provide example sets with high valuation
- System infers
- Utility functions
- Desired diversity
- Feature weights
- Once trained, the system can select subsets of
new data (blocks, images, songs, food)
28 Learning a Preference Model
- Depth: utility functions
  - Probability density estimation: KDE (kernel density estimation) [Duda et al., '01]
- Diversity: average of observed diversities
- Feature weights: minimize difference between computed valuation and true valuation
  - BFGS bounded optimization [Gill et al., '81] (both estimation steps are sketched below)
29 Results: Blocks World
- Compute valuation of sets chosen by true preference, learned preference, and random selection
- As more training sets are available, performance increases (learned approximates true)
[Plots: results for the Mosaic and Tower tasks]
30 Rover Image Experiments
- Methodology
  - Six users: 2 geologists, 4 computer scientists
  - Five sets of 20 images each
  - Each user selects a subset of 5 images from each set
- Evaluation
  - Learn preferences on (up to 4) examples, select a new subset from a held-out set
  - Metrics:
    - Valuation of the selected subset
    - Functional similarity between learned preferences
31 Learned Preferences
[Image: a subset of 5 images, chosen by a geologist, from 20 total]

   Feature   Learned weight   Learned diversity
   Rock      0.3              0.8
   Soil      0.1              0.9
   Sky       1.0              0.5
32 Subset Selection
[Images: a subset of 5 images chosen by a geologist from 20 total; 5 images chosen from 20 images using greedy DD-Select and learned preferences; 5 images chosen by the same geologist from the same 20 new images]
33 Current Work
- Extending to document data
  - Text (discrete) features
  - Menu World: Chinese restaurant dish selection
- How do you combine multiple preferences with different priorities?
  - Rover: dust devils, carbonate rocks, cross-bedding
  - Priorities that can change over time
35 Future Directions
- Hybrid preference representation
  - Decision tree with lexicographic orderings at the leaves
  - Permits conditional preferences
  - How to learn the splits in the tree?
- Support operator and goal orderings for planning
- Incorporate the concept of set-based preferences into planning domains
Questions?