1
Approaches to Modeling and Learning User
Preferences
  • Marie desJardins
  • University of Maryland Baltimore County
  • Presented at SRI International AI Center
  • March 10, 2008
  • Joint work with Fusun Yaman, Michael Littman, and
    Kiri Wagstaff

2
Overview
  • Representing Preferences
  • Learning Planning Preferences
  • Preferences over Sets
  • Directions / Conclusions

3
  • Representing Preferences

4
What is a Preference?
  • (Partial) ordering over outcomes
  • Feature vector representation of outcomes (aka
    objects)
  • Example: Taking a vacation. Features:
  • Who (alone / family)
  • Where (Orlando / Paris)
  • Flight type (nonstop / one-stop / multi-stop)
  • Cost (low / medium / high)
  • Languages
  • Weighted utility function
  • CP-net
  • Lexicographic ordering

5
Weighted Utility Functions
  • Each value vij of feature fi has an associated
    utility uij
  • Utility Uj of object oj = <v1j, v2j, ..., vkj>
  • Uj = Σi wi uij
  • Commonly used in preference elicitation
  • Easy to model
  • Independence of features is convenient
  • Flight example
  • U(flight) = 0.8 u(Who) + 0.8 u(Cost) + 0.6 u(Where)
    + 0.4 u(Flight Type)
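A minimal sketch of this additive model in Python (the weights and per-value utilities below are illustrative assumptions, not values from the talk):

```python
# Additive weighted utility: U(o) = sum_i w_i * u_i(o's value for feature i).
# Weights and per-value utilities are illustrative assumptions.
weights = {"who": 0.8, "cost": 0.8, "where": 0.6, "flight_type": 0.4}
utilities = {
    "who":         {"family": 1.0, "alone": 0.3},
    "cost":        {"low": 1.0, "medium": 0.5, "high": 0.1},
    "where":       {"Orlando": 0.9, "Paris": 0.7},
    "flight_type": {"nonstop": 1.0, "one-stop": 0.6, "multi-stop": 0.2},
}

def utility(outcome):
    """Sum of weighted per-feature utilities for a feature-vector outcome."""
    return sum(weights[f] * utilities[f][v] for f, v in outcome.items())

flight = {"who": "family", "cost": "low", "where": "Orlando", "flight_type": "nonstop"}
print(utility(flight))  # 0.8*1.0 + 0.8*1.0 + 0.6*0.9 + 0.4*1.0 = 2.54
```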

6
CP-Nets
  • Conditional Preference Network
  • Intuitive, graphical representation of
    conditional preferences under a ceteris paribus
    (all else being equal) assumption

I prefer to take a vacation with my family rather
than going alone. If I am with my family, I prefer
Orlando to Paris; if I am alone, I prefer Paris to
Orlando.

[CP-net diagram: node "who" with table (family > alone), and a child node "where" with table (family: Orlando > Paris; alone: Paris > Orlando)]
7
Induced Preference Graph
  • Every CP-net induces a preference graph on
    outcomes
  • The partial ordering of outcomes is given by the
    transitive closure of the preference graph

[Diagram: the CP-net (who: family > alone; where: family: Orlando > Paris, alone: Paris > Orlando) and its induced preference graph over the outcomes (family ∧ Orlando), (family ∧ Paris), (alone ∧ Paris), (alone ∧ Orlando), with edges pointing toward preferred outcomes]
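As a rough sketch (the data structures and helper names are assumptions, not from the talk), the induced ordering can be checked by chaining single-feature "improving flips" under the conditional preference tables:

```python
# Conditional preference tables for the vacation CP-net above:
# for each feature, a function mapping an outcome to its preference order (best first).
cpt = {
    "who":   lambda o: ["family", "alone"],
    "where": lambda o: ["Orlando", "Paris"] if o["who"] == "family" else ["Paris", "Orlando"],
}
domains = {"who": ["family", "alone"], "where": ["Orlando", "Paris"]}

def improving_flips(outcome):
    """Outcomes reachable by improving a single feature, all else being equal."""
    for f, values in domains.items():
        order = cpt[f](outcome)
        for v in values:
            if order.index(v) < order.index(outcome[f]):  # v is preferred to the current value
                yield {**outcome, f: v}

def preferred(a, b, seen=None):
    """True if a > b in the induced preference graph (a reachable from b by improving flips)."""
    seen = set() if seen is None else seen
    key = tuple(sorted(b.items()))
    if key in seen:
        return False
    seen.add(key)
    return any(n == a or preferred(a, n, seen) for n in improving_flips(b))

print(preferred({"who": "family", "where": "Orlando"},
                {"who": "alone", "where": "Orlando"}))   # True
```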
8
Lexicographic Orderings
  • Features are prioritized with a total ordering
    f1 > ... > fk
  • Each value of each feature is prioritized with a
    total ordering, vi1 > ... > vim
  • To compare o1 and o2
  • Find the first feature in the feature ordering on
    which o1 and o2 differ
  • Choose the outcome with the preferred value for
    that feature
  • Travel example
  • Who > Where > Cost > Flight-Type
  • Family > Alone
  • Orlando > Paris
  • Cheap > Expensive
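A minimal sketch of the comparison rule on this slide (the feature priority and value orderings are illustrative):

```python
feature_order = ["who", "where", "cost", "flight_type"]        # most important first
value_order = {
    "who":         ["family", "alone"],                        # best value first
    "where":       ["Orlando", "Paris"],
    "cost":        ["low", "medium", "high"],
    "flight_type": ["nonstop", "one-stop", "multi-stop"],
}

def lex_preferred(o1, o2):
    """Return the lexicographically preferred outcome (None if they are identical)."""
    for f in feature_order:
        if o1[f] != o2[f]:
            # The first differing feature decides: prefer the value ranked earlier.
            return o1 if value_order[f].index(o1[f]) < value_order[f].index(o2[f]) else o2
    return None

a = {"who": "family", "where": "Paris",   "cost": "high", "flight_type": "multi-stop"}
b = {"who": "alone",  "where": "Orlando", "cost": "low",  "flight_type": "nonstop"}
print(lex_preferred(a, b))  # a wins: 'who' is the first differing feature and family > alone
```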

9
Representation Tradeoffs
  • Each representation has some limitations
  • Additive utility functions can't capture
    conditional preferences, and can't easily
    represent hard constraints or preferences
  • CP-nets, in general, only give a partial
    ordering, can't model integer/real features
    easily, and can't capture tradeoffs
  • Lexicographic preferences can't capture
    tradeoffs, and can't represent conditional
    preferences

10
  • Learning Planning Preferences

11
Planning Algorithms
  • Domain-independent
  • Inputs: initial state, goal state, possible
    actions
  • Domain-independent but not efficient
  • Domain-specific
  • Works for only one domain
  • (Near-) optimal reasoning
  • Very fast
  • Domain-configurable
  • Use additional planning knowledge to customize
    the search automatically
  • Broadly applicable and efficient

12
Domain Knowledge for Planning
  • Provide search control information
  • Hierarchy of abstract actions (HTN operators)
  • Logical formulas (e.g., temporal logic)
  • Experts must provide planning knowledge
  • May not be readily available
  • Difficult to express knowledge declaratively

13
Learning Planning Knowledge
  • Alternative: Learn planning knowledge by
    observation (i.e., from example plans)
  • Possibly even learn from a single complex example
  • DARPA's Integrated Learning Program
  • Our focus: Learn preferences at various decision
    points
  • Charming Hybrid Adaptive Ranking Model
  • Currently: Learns preferences over variable
    bindings
  • Future: Learn goal and operator preferences

14
HTN: Hierarchical Task Network
  • Objectives are specified as high-level tasks to
    be accomplished
  • Methods describe how high-level tasks are
    decomposed down to primitive tasks

[HTN diagram: the high-level task travel(X,Y) is decomposed either by a long-distance-travel method into travel(X,Ax), buyTicket(Ax,Ay), fly(Ax,Ay), travel(Ay,Y), or by a short-distance-travel method into getTaxi(X), rideTaxi(X,Y), payDriver; HTN operators expand high-level tasks down to primitive actions]
15
CHARM: Charming Hybrid Adaptive Ranking Model
  • Learns preferences in HTN methods
  • Which objects to choose when using a particular
    method?
  • Which flight to take? Which airport to choose?
  • Which goal to select next during planning?
  • Which method to choose to achieve a task?
  • By plane or by train?
  • Preferences are expressed as lexicographic
    orderings
  • A natural choice for many (not all) planning
    domains

16
Summary of CHARM
  • CHARM learns a preference rule for each method.
  • Given an HTN, initial state, and the plan tree
  • Find an ordering on variable values for each
    decision point (planning context)
  • CHARM has two modes
  • Gather training data for each method
  • Orlando (tropical, family-oriented, expensive)
    is preferred to Boise (cold,
    outdoors-oriented, cheap)
  • Learn preference rule in each method

17
Preference Rules
  • A preference rule is a function that returns <,
    =, or >, given two objects represented as vectors
    of attributes.
  • Assumption: Preference rules are lexicographic
  • For every attribute there is a preferred value
  • There is a total order on the attributes
    representing the order of importance
  • A warm destination is preferred to a cold one.
    Among destinations of the same climate, an
    inexpensive one is better than an expensive one.

18
Learning Lexicographic Preference Models
  • Existing algorithms return one of many models
    consistent with the data
  • The worst-case performance of such algorithms is
    worse than random selection
  • Higher probability of poor performance if there
    are fewer training observations
  • A novel democratic approach: Variable Voting
  • Sample the possible consistent models
  • Implicit sampling: models that satisfy certain
    properties are permitted to vote
  • Preference decision is based on the majority of
    votes

19
Variable Voting
  • Given a partial order, <, on the attributes and
    two objects, A and B
  • D = the attributes on which A and B differ
  • D* = the most salient attributes in D with respect
    to <
  • The object with the larger number of preferred
    values for the attributes in D* is the preferred
    object

X1 X2 X3 X4 X5
A 1 0 1 0 0
B 0 0 1 1 1
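A sketch of the voting rule described above, assuming binary attributes where 1 is the preferred value and ranks encode the importance order (the function name and encoding are assumptions):

```python
def variable_voting(a, b, rank):
    """Compare objects a and b (lists of 0/1, where 1 is assumed to be the preferred value).

    rank[i] is the importance rank of attribute i (smaller = more important).
    D is the set of attributes on which a and b differ; D* is the most salient
    subset of D (minimum rank); a majority vote on D* decides the winner.
    """
    D = [i for i in range(len(a)) if a[i] != b[i]]
    if not D:
        return None                                    # indistinguishable
    best_rank = min(rank[i] for i in D)
    D_star = [i for i in D if rank[i] == best_rank]
    votes_a = sum(a[i] for i in D_star)
    votes_b = sum(b[i] for i in D_star)
    return "A" if votes_a > votes_b else "B" if votes_b > votes_a else None

# The table above, all attributes equally ranked: D = {X1, X4, X5}, B wins 2 votes to 1.
A = [1, 0, 1, 0, 0]
B = [0, 0, 1, 1, 1]
print(variable_voting(A, B, rank=[1, 1, 1, 1, 1]))     # "B"
```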
20
Learning Variable Ranks
  • Initially, all attributes are equally important
  • Loop until ranks converge
  • Given two objects, predict a winner using the
    current beliefs
  • If the prediction was wrong, decrease the
    importance of the attribute values that led to
    the wrong prediction
  • The importance of an attribute never goes beyond
    its actual place in the order of attributes
  • Mistake-bound algorithm: learns from its
    mistakes
  • Mistake bound is O(n²), where n is the number
    of attributes
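A rough sketch of the mistake-driven update loop described above, reusing the variable_voting sketch from slide 19 (the exact update rule in CHARM may differ; this only illustrates demoting the attributes that caused a wrong prediction):

```python
def learn_ranks(training_pairs, n_attrs, n_passes=20):
    """training_pairs: (a, b, winner) triples with a, b binary vectors and winner in {"A", "B"}.

    All attributes start equally important; an attribute's rank is only ever
    increased (demoted), so it never moves past its true place in the ordering.
    """
    rank = [1] * n_attrs
    for _ in range(n_passes):
        mistakes = 0
        for a, b, winner in training_pairs:
            prediction = variable_voting(a, b, rank)           # predict with current beliefs
            if prediction is not None and prediction != winner:
                mistakes += 1
                wrong = a if prediction == "A" else b          # object wrongly predicted to win
                D = [i for i in range(n_attrs) if a[i] != b[i]]
                salient = min(rank[i] for i in D)
                # Demote the salient attributes whose preferred values voted for the wrong object.
                for i in D:
                    if rank[i] == salient and wrong[i] == 1:
                        rank[i] += 1
        if mistakes == 0:
            break
    return rank
```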

21
Democracy vs. Autocracy
[Chart: Variable Voting]
22
  • Preferences Over Sets

23
Preferences over Sets
  • Subset selection applications
  • Remote sensing, sports teams, music playlists,
    planning
  • Ranking, like a search engine?
  • Doesn't capture dependencies between items
  • Encode, apply, learn set-based preferences

24
User Preferences
  • Depth: utility function (desirable values)
  • Diversity: variety and coverage
  • Geologist: near / far views (context)

25
Encoding User Preferences
  • DD-PREF: a language for expressing preferred
    depth and diversity for sets

[Figure: example sets illustrating depth and diversity]
26
Finding the Best Subset
  • Maximize the subset valuation of s: a combination
    of the depth (utility) of subset s and the
    diversity value of s
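Since the exact DD-PREF valuation is not reproduced on this slide, the sketch below uses a stand-in valuation that combines a depth term and a diversity term, and shows how a greedy selector (in the spirit of the DD-Select procedure mentioned later) could maximize it:

```python
def greedy_select(items, k, valuation):
    """Greedily grow a size-k subset, each step adding the item that most improves the valuation."""
    subset, remaining = [], list(items)
    for _ in range(k):
        best = max(remaining, key=lambda x: valuation(subset + [x]))
        subset.append(best)
        remaining.remove(best)
    return subset

# Toy valuation over a single numeric feature: reward high values (depth) and spread (diversity).
def toy_valuation(s, alpha=0.5):
    depth = sum(s) / len(s)
    diversity = (max(s) - min(s)) if len(s) > 1 else 0.0
    return alpha * depth + (1 - alpha) * diversity

print(greedy_select([0.1, 0.4, 0.5, 0.9, 0.95], k=3, valuation=toy_valuation))
```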
27
Learning Preferences from Examples
  • Hard for users to specify quantitative values
    (especially with more general quality functions)
  • Instead, adopt a machine learning approach
  • Users provide example sets with high valuation
  • System infers
  • Utility functions
  • Desired diversity
  • Feature weights
  • Once trained, the system can select subsets of
    new data (blocks, images, songs, food)

28
Learning a Preference Model
  • Depth: utility functions
  • Probability density estimation: KDE (kernel
    density estimation) [Duda et al., 01]
  • Diversity: average of observed diversities
  • Feature weights: minimize difference between
    computed valuation and true valuation
  • BFGS bounded optimization [Gill et al., 81]
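A hedged sketch of the two learning pieces named above, using SciPy's gaussian_kde and the L-BFGS-B optimizer as stand-ins for the KDE and bounded-BFGS steps on the slide; the data and the simplified weight-fitting objective are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gaussian_kde

# Depth: estimate a utility function for a feature from values seen in the example sets.
observed_rock_fraction = np.array([0.55, 0.58, 0.60, 0.62, 0.70])   # illustrative data
depth_rock = gaussian_kde(observed_rock_fraction)                   # utility ~ estimated density
print(depth_rock(0.6), depth_rock(0.1))                             # high near observed values

# Feature weights: minimize the gap between computed and target set valuations,
# with weights bounded to [0, 1] (L-BFGS-B handles the bounds).
per_feature_scores = np.array([[0.8, 0.2, 0.9],                     # rows: example sets
                               [0.7, 0.3, 0.8]])                    # cols: features
target_valuations = np.array([0.85, 0.75])

def loss(w):
    return np.sum((per_feature_scores @ w - target_valuations) ** 2)

result = minimize(loss, x0=np.full(3, 0.5), method="L-BFGS-B", bounds=[(0, 1)] * 3)
print(result.x)                                                     # learned feature weights
```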

29
Results: Blocks World
  • Compute valuation of sets chosen by true
    preference, learned preference, and random
    selection
  • As more training sets are available, performance
    increases (learned approximates true)

[Charts: Mosaic and Tower tasks]
30
Rover Image Experiments
  • Methodology
  • Six users: 2 geologists, 4 computer scientists
  • Five sets of 20 images each
  • Each user selects a subset of 5 images from each
    set
  • Evaluation
  • Learn preferences on (up to 4) examples, select a
    new subset from a held-out set
  • Metrics
  • Valuation of the selected subset
  • Functional similarity between learned preferences

31
Learned Preferences
Subset of 5 images, chosen by a geologist, from
20 total
Learned feature weights:
  Rock 0.3
  Soil 0.1
  Sky  1.0
Learned diversities:
  Rock 0.8
  Soil 0.9
  Sky  0.5
32
Subset Selection
[Images: the 5 images chosen from 20 new images by greedy DD-Select with learned preferences, alongside the 5 images chosen by the same geologist from the same 20 images]
33
Current Work
  • Extending to document data
  • Text (discrete) features
  • Menu World: Chinese restaurant dish selection
  • How do you combine multiple preferences with
    different priorities?
  • Rover: dust devils, carbonate rocks,
    cross-bedding
  • Priorities that can change over time

34
  • Future Directions

35
Future Directions
  • Hybrid preference representation
  • Decision tree with lexicographic orderings at the
    leaves
  • Permits conditional preferences
  • How to learn the splits in the tree?
  • Support operator, goal orderings for planning
  • Incorporate concept of set-based preferences into
    planning domains

Questions?