Title: Personalizing Information Search: Understanding Users and their Interests
1. Personalizing Information Search: Understanding Users and their Interests
Diane Kelly, School of Information and Library Science, University of North Carolina
dianek_at_email.unc.edu
IPAM, 04 October 2007
2. Background: IR and TREC
- What is IR?
- Who works on problems in IR?
- Where can I find the most recent work in IR?
- A TREC primer
3. Background: Personalization
- Personalization is a process where retrieval is customized to the individual (not one-size-fits-all searching)
- Hans Peter Luhn was one of the first people to personalize IR through selective dissemination of information (SDI), now called filtering
- Profiles and user models are often employed to house data about users and represent their interests
- Figuring out how to populate and maintain the profile or user model is a hard problem
4. Major Approaches
- Explicit Feedback
- Implicit Feedback
- User's desktop
5. Explicit Feedback
6. Explicit Feedback
- Term relevance feedback is one of the most widely used and studied explicit feedback techniques
- Typical relevance feedback scenarios (examples)
- Systems-centered research has found that relevance feedback works (including pseudo-relevance feedback)
- User-centered research has found mixed results about its effectiveness
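As a concrete illustration of how a system can use explicit term relevance feedback, the sketch below applies the classic Rocchio re-weighting formula to a sparse query vector. The parameter values, vectors and example documents are illustrative assumptions, not material from these slides.

```python
from collections import defaultdict

def rocchio(query_vec, relevant_docs, nonrelevant_docs,
            alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio relevance feedback: move the query toward the centroid of
    judged-relevant documents and away from non-relevant ones.
    All vectors are sparse dicts mapping term -> weight."""
    new_query = defaultdict(float)
    for term, w in query_vec.items():
        new_query[term] += alpha * w
    if relevant_docs:
        for doc in relevant_docs:
            for term, w in doc.items():
                new_query[term] += beta * w / len(relevant_docs)
    if nonrelevant_docs:
        for doc in nonrelevant_docs:
            for term, w in doc.items():
                new_query[term] -= gamma * w / len(nonrelevant_docs)
    # Negative weights are usually dropped before the query is re-run.
    return {t: w for t, w in new_query.items() if w > 0}

# Hypothetical example: one relevant and one non-relevant judgment
query = {"personalization": 1.0, "search": 1.0}
rel = [{"personalization": 0.8, "profile": 0.6, "user": 0.5}]
nonrel = [{"search": 0.4, "engine": 0.9}]
print(rocchio(query, rel, nonrel))
```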
7. Explicit Feedback
- Terms are not presented in context, so it may be hard for users to understand how they can help
- Quality of terms suggested is not always good
- Users don't have the additional cognitive resources to engage in explicit feedback
- Users are too lazy to provide feedback
- Questions about the sustainability of explicit feedback for long-term modeling
8. Examples
9. Examples
10. Query Elicitation Study
- Users typically pose very short queries
- This may be because
  - users have a difficult time articulating their information needs
  - traditional search interfaces encourage short queries
- Polyrepresentative extraction of information needs suggests obtaining multiple representations of a single information need (reference interview)
11. Motivation
- Research has demonstrated that a positive relationship exists between query length and performance in batch-mode experimental IR
- Query expansion is an effective technique for increasing query length, but research has demonstrated that users have some difficulty with traditional term relevance feedback features
12. Elicitation Form
[Screenshot of the elicitation form with three open-ended fields: Already Know, Why Know, Keywords]
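A minimal sketch of how responses from a form like this could be folded into a single, longer query for expansion; the field names and the simple concatenate-and-filter scheme are assumptions for illustration, not the study's exact procedure.

```python
def build_expanded_query(already_know, why_know, keywords):
    """Combine free-text answers from an elicitation form into one long
    query string; terms that recur across fields simply appear more often."""
    stopwords = {"the", "a", "an", "of", "and", "to", "i", "about", "is"}
    terms = []
    for text in (keywords, already_know, why_know):
        terms.extend(t.lower().strip(".,;:") for t in text.split())
    return " ".join(t for t in terms if t and t not in stopwords)

# Hypothetical form responses
print(build_expanded_query(
    already_know="I know antibiotics treat bacterial infections",
    why_know="To write a review about antibiotic resistance",
    keywords="antibiotic resistance treatment"))
```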
13. Results: Number of Terms
[Chart: mean number of terms by question (Already Know, Why, Keywords); values shown: 16.18, 10.67, 9.33 and 2.33; N = 45]
14. Experimental Runs
15. Overall Performance
[Chart comparing overall performance of the runs: 0.3685 vs. 0.2843]
16. Query Length and Performance
[Scatter plot with fitted regression line: y = 0.263 + 0.000265(x), p < .001]
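The fitted line above is an ordinary least-squares regression of retrieval performance on query length. A sketch of how such a fit can be produced is below; the (length, average precision) pairs are made-up placeholders, not the study's data.

```python
from scipy.stats import linregress

# Hypothetical (query length in terms, average precision) pairs,
# standing in for the study's per-query observations.
lengths = [3, 8, 15, 22, 30, 41, 55, 70]
avg_precision = [0.26, 0.27, 0.27, 0.28, 0.27, 0.28, 0.29, 0.30]

fit = linregress(lengths, avg_precision)
print(f"AP = {fit.intercept:.3f} + {fit.slope:.6f} * length, p = {fit.pvalue:.4f}")
```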
17. Major Findings
- Users provided lengthy responses to some of the questions
- There were large differences in the length of users' responses to each question
- In most cases responses significantly improved retrieval
- Query length and performance were significantly related
18. Implicit Feedback
19. Implicit Feedback
- What is it?
  - Information about users, their needs and document preferences that can be obtained unobtrusively, by watching users' interactions and behaviors with systems
- What are some examples?
  - Examine: Select, View, Listen, Scroll, Find, Query, Cumulative measures
  - Retain: Print, Save, Bookmark, Purchase, Email
  - Reference: Link, Cite
  - Annotate/Create: Mark up, Type, Edit, Organize, Label
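To make these behavior categories concrete, here is a small sketch of a client-side event record a logger might emit; the field names and the Examine/Retain/Reference/Annotate grouping follow the list above, but the exact schema is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class BehaviorEvent:
    """One logged implicit-feedback event for a user and a document."""
    user_id: str
    url: str
    behavior: str                  # e.g. "view", "scroll", "print", "save", "cite"
    category: str                  # "examine", "retain", "reference", "annotate"
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    display_time_secs: Optional[float] = None  # filled in for examine events

# Hypothetical log entries
log = [
    BehaviorEvent("u01", "http://example.org/paper.pdf", "view", "examine",
                  display_time_secs=42.5),
    BehaviorEvent("u01", "http://example.org/paper.pdf", "print", "retain"),
]
total_display = sum(e.display_time_secs or 0.0 for e in log if e.category == "examine")
print(f"{len(log)} events, {total_display:.1f}s total display time")
```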
20. Implicit Feedback
- Why is it important?
  - It is generally believed that users are unwilling to engage in explicit relevance feedback
  - It is unlikely that users can maintain their profiles over time
  - Users generate large amounts of data each time they engage in online information-seeking activities, and the things in which they are interested are somewhere in these data
21. Implicit Feedback
- What do we know about it?
  - There seems to be a positive correlation between selection (click-through) and relevance
  - There seems to be a positive correlation between display time and relevance
- What is problematic about it?
  - Much of the research has been based on incomplete data and general behavior
  - It has not considered the impact on behaviors of contextual variables such as task and a user's familiarity with a topic
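The associations referred to above are typically checked with a rank correlation between an observed behavior and the user's relevance rating for the same page; a minimal sketch follows, with placeholder display times and 1-7 usefulness ratings.

```python
from scipy.stats import spearmanr

# Hypothetical per-page observations: display time (seconds) and the
# usefulness rating (1-7) the user later assigned to the same page.
display_times = [5, 12, 48, 7, 95, 30, 3, 60]
usefulness = [2, 4, 6, 3, 7, 5, 1, 6]

rho, p = spearmanr(display_times, usefulness)
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
```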
22. Implicit Feedback Study
- To investigate
  - the relationship between behaviors and relevance
  - the relationship between behaviors and context
- To develop a method for studying and measuring behaviors, context and relevance in a natural setting, over time
23. Method
- Approach: naturalistic and longitudinal, but with some control
- Subjects/Cases: 7 Ph.D. students
- Study period: 14 weeks
- Compensation: new laptops and printers
24. Data Collection
- Context
  - Tasks: Endurance, Frequency, Stage, Persistence
  - Topics: Familiarity
- Relevance
  - Document: Usefulness
- Behaviors
  - Display Time, Printing, Saving
25. Protocol
[Timeline: client- and server-side logging ran continuously across the 14-week study period; context evaluations and document evaluations were administered near the start (Week 1) and again near the end (Week 13)]
27. Results: Description of Data
28. Relevance: Usefulness
[Chart: mean (SD) usefulness ratings for the seven subjects: 6.1 (2.00), 6.0 (0.80), 5.3 (2.40), 5.3 (2.20), 5.0 (2.40), 4.8 (1.65), 4.6 (0.80)]
29. Relevance: Usefulness
30. Display Time
31. Display Time and Usefulness
32. Display Time and Task
33. Major Findings
- Behaviors differed for each subject, but in general
  - most display times were low
  - most usefulness ratings were high
  - not much printing or saving
- No direct relationship between display time and usefulness
34. Major Findings
- Main effects for display time and all contextual variables
  - Task (5 subjects)
  - Topic (6 subjects)
  - Familiarity (5 subjects)
  - Lower levels of familiarity associated with higher display times
- No clear interaction effects among behaviors, context and relevance
35. Personalizing Search
- Using the display time, task and relevance information from the study, we evaluated the effectiveness of a set of personalized retrieval algorithms
- Four algorithms for using display time as implicit feedback were tested
  - User
  - Task
  - User + Task
  - General
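A sketch of the thresholding idea behind these four variants: a page view is treated as implicit positive feedback when its display time exceeds a threshold, and the threshold is computed over different groupings of the logged data (all users, one user, one task, or one user-task pair). The median-based threshold and the data layout here are illustrative assumptions, not the algorithms' exact definitions.

```python
import statistics
from typing import Iterable

def display_time_threshold(times: Iterable[float]) -> float:
    """One simple threshold choice: the median display time observed
    in the group (general, per-user, per-task, or per user+task)."""
    return statistics.median(times)

def implicitly_relevant(display_time, observations, key):
    """Treat a page view as implicit positive feedback when its display
    time exceeds the threshold computed over observations[key]."""
    return display_time > display_time_threshold(observations[key])

# Hypothetical logged display times (seconds), grouped four ways
general = {"all": [5, 9, 14, 30, 52, 70]}
per_user = {"u01": [4, 6, 9, 12], "u02": [20, 45, 60]}
per_task = {"writing": [30, 40, 55], "browsing": [3, 5, 8]}
per_user_task = {("u01", "writing"): [25, 35, 50]}

print(implicitly_relevant(18, general, "all"))                      # General
print(implicitly_relevant(18, per_user, "u01"))                     # User
print(implicitly_relevant(18, per_task, "writing"))                 # Task
print(implicitly_relevant(18, per_user_task, ("u01", "writing")))   # User + Task
```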
36. Results
[Chart: MAP (y-axis) by iteration (x-axis)]
37. Major Findings
- Tailoring display time thresholds based on task information improved performance, but doing so based on user information did not
- There was a lot of variability between subjects, with the user-centered algorithms performing well for some and poorly for others
- The effectiveness of most of the algorithms increased with time (and more data)
38. Some Problems
39. Relevance
- What are we modeling? Does click = relevance?
- Relevance is multi-dimensional and dynamic
- A single measure does not adequately reflect relevance
- Most pages are likely to be rated as useful, even if the value or importance of the information differs
40. Definition
Recipe
41. Weather Forecast
Information about Rocky Mountain Spotted Fever
42. Paper about Personalization
43. Page Structure
- Some behaviors are more likely to occur on some types of pages
- A more intelligent modeling function would know when and what to observe and expect
- The structure of pages encourages/inhibits certain behaviors
- Not all pages are equally useful for modeling a user's interests
44. What types of behaviors do you expect here?
And here?
45. And here?
And here?
46. The Future
47. Future
- New interaction styles and systems create new opportunities for explicit and implicit feedback
  - Collaborative search features and query recommendation
  - Features/systems that support the entire search process (e.g., saving, organizing, etc.)
  - QA systems
- New types of feedback
  - Negative
  - Physiological
48. Thank You
- Diane Kelly (dianek_at_email.unc.edu)
- Web: http://ils.unc.edu/dianek/research.html
- Collaborators: Nick Belkin, Xin Fu, Vijay Dollu, Ryen White
49. TREC: Text REtrieval Conference
50. What is TREC?
- TREC is a workshop series sponsored by the National Institute of Standards and Technology (NIST) and the US Department of Defense.
- Its purpose is to build infrastructure for large-scale evaluation of text retrieval technology.
- TREC collections and evaluation measures are the de facto standard for evaluation in IR.
- TREC comprises different tracks, each of which focuses on different issues (e.g., question answering, filtering).
52. TREC Collections
- Central to each TREC Track is a collection, which consists of three major components
  - A corpus of documents (typically newswire)
  - A set of information needs (called topics)
  - A set of relevance judgments
- Each Track also adopts particular evaluation measures
  - Precision and Recall; F-measure
  - Average Precision (AP) and Mean AP (MAP)
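For reference, the two rank-based measures named above can be computed as in the sketch below; the ranked lists and relevance judgment sets are made-up placeholders.

```python
def average_precision(ranked_doc_ids, relevant_ids):
    """AP: mean of the precision values at the rank of each relevant
    document retrieved; unretrieved relevant documents contribute zero."""
    hits, precisions = 0, []
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant_ids) if relevant_ids else 0.0

def mean_average_precision(runs):
    """MAP: the mean of AP over all topics in a run."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Hypothetical two-topic run: (ranked results, set of relevant doc ids)
runs = [
    (["d3", "d1", "d7", "d2"], {"d1", "d2"}),   # topic 1
    (["d5", "d9", "d4"], {"d9"}),               # topic 2
]
print(f"MAP = {mean_average_precision(runs):.4f}")
```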
53. Comparison of Measures
54. Learn more about TREC
- http://trec.nist.gov
- Voorhees, E. M., & Harman, D. K. (2005). TREC: Experiment and Evaluation in Information Retrieval. Cambridge, MA: MIT Press.
55. Example Topic
56. Learn more about IR
- ACM SIGIR Conference
- Sparck Jones, K., & Willett, P. (1997). Readings in Information Retrieval. Morgan Kaufmann Publishers.
- Baeza-Yates, R., & Ribeiro-Neto, B. (1999). Modern Information Retrieval. New York, NY: ACM Press.
- Grossman, D. A., & Frieder, O. (2004). Information Retrieval: Algorithms and Heuristics. The Netherlands: Springer.