Title: Susan Dumais
1. Personal Information Management: Helping Finders Become Keepers
- Susan Dumais
- Microsoft Research
- http://research.microsoft.com/sdumais
2. Outline
- Personal information management today
- Stuff I've Seen (SIS)
- Research prototype system
- Deployment experiences, usage data
- (MSN Toolbar Desktop Search, MS Vista)
- Future directions
- Contextualized search
- Personalized search
3. Personal Information Management
- Information acquisition vs. management/access
- Easy to acquire and create information of many types
- E.g., email, docs, web pages, calendars, pictures, music, etc.
- 100 GB drives
- Hard to organize
- And, even harder to re-find
- Information discovery vs. re-use
- Many tools for finding information
- Fewer tools for keeping information
- Yet, many tasks involve re-using information
4. Search Today
- Information silos
- Many locations, interfaces for finding things (e.g., web, mail, contacts, docs, photos, notes)
5. Search With SIS
- Unified index of stuff you've seen
- All types of info (e.g., files, email, calendar, contacts, web pages, RSS, IM)
- Index full content plus metadata (e.g., time, author, title, size, usage)
- Automatic and immediate update of index
- Rich UI possibilities, since it's your content (e.g., consider usage)
- Get back to information you've seen
- Re-use vs. initial discovery
6. Related Work
- Several systems for improving access for specific sources (e.g., web, mail, files, photos, music)
- Some integration across sources
- Keeping Found Things Found [Jones et al., 2001]
- Lifestreams/Scopeware [Fertig, Freeman, Gelernter, 1996]
- Haystack [Adar et al., 1999; Huynh et al., 2002]
- MyLifeBits [Gemmell, Bell et al., 2002]
- Commercial systems
- OS: Mac Spotlight/Sherlock, MS Windows Indexing Service
- DS apps: Enfish, 80-20, dtSearch, X1, Copernic, G-DS, MSN-DS, etc.
- What's new with SIS
- Full content and metadata for many different sources
- Extensible architecture (gather, filter, word break)
- Focus on user interface and user experience
- Design guided by usage experiences and experimental data
7. SIS Design Principles
- Indexing experience
- No additional work is required
- User sees something, and it gets indexed
- Retrieval experience
- Fast, flexible
- Interactive refinement
- Sort and filter on metadata
- Note: sort/filter automatically triggers query
- UI innovations
- Previews, Top/Side, Previews
- Richer visualizations
8. SIS Demo
9. MS Search Architecture
10. SIS Architecture
- Indexing infrastructure uses MS Search components (note: IR platform)
- Gatherer: interface to content sources, e.g., files, http, MAPI
- Filters: decode different file types, e.g., Word, PowerPoint, HTML, PDF, Journal notes
- Tokenizer: break into words, including date normalization, stemming, etc.
- Indexer: standard inverted index
- Retriever: Boolean, best match (Okapi), fielded
- User interface
- Client side indexing and storage
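The tokenizer, indexer, and retriever stages listed on this slide can be illustrated with a toy example. This is only a minimal sketch, not the MS Search components: the class `InvertedIndex`, its methods, and the sample items are all hypothetical, the tokenizer does no stemming or date normalization, and retrieval is plain Boolean AND with exact-match metadata filters rather than Okapi best match.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Break text into lowercase word tokens (no stemming or date normalization)."""
    return re.findall(r"[a-z0-9]+", text.lower())

class InvertedIndex:
    """Toy inverted index: term -> set of item ids, plus per-item metadata."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.metadata = {}

    def add(self, item_id, text, **metadata):
        for term in tokenize(text):
            self.postings[term].add(item_id)
        self.metadata[item_id] = metadata

    def search(self, query, **field_filters):
        """Boolean AND over query terms, then exact-match filters on metadata."""
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(
            item_id for item_id in hits
            if all(self.metadata[item_id].get(k) == v for k, v in field_filters.items())
        )

# Hypothetical sample content from three different sources.
index = InvertedIndex()
index.add("m1", "Budget review for the search project", type="email", author="alice")
index.add("f1", "Search project design notes", type="file", author="bob")
index.add("w1", "Desktop search product review", type="web", author="unknown")

print(index.search("search project"))
print(index.search("search project", type="email"))
```

Keeping content terms and metadata in one structure is what lets a single query box cover mail, files, and web pages, with type or author acting as a filter on the same result set.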
11. Evaluating SIS
- Internal deployment
- 2500 users
- Users include program management, test, sales, development, administrative, executives, etc.
- Research techniques
- Free-form feedback
- Questionnaires, structured interviews
- Usage patterns from log data
- UI experiments (randomly deploy different versions)
- Lab studies for richer UI (e.g., timeline, trends)
12. SIS Usage Data
- Internal deployment and evaluation
- 3000 people, including program management, test, sales, development, administrative, executives, etc.
- Methods: free-form feedback, questionnaires, interviews, usage logs, laboratory experiments, etc.
- Personal store characteristics
- 5k to 250k items
- Query characteristics
- Short queries (1.6 words)
- Few advanced operators or fielded search in query box (7%)
- Many advanced operators and query iteration in UI (48%)
- Filters (type, date); modify query; re-sort results
13. SIS Usage Data, cont'd
- Characteristics of items opened
- File types opened
- 76% Email
- 14% Web pages
- 10% Files
- Age of items opened
- 5% today
- 21% within the last week
- 47% within the last month
- 50% of the cases -> 36 days
- Web: 11 days
- Mail: 36 days
- Files: 55 days
14. SIS Usage Data, cont'd
15. User Interface (UI) Alternatives
16. SIS Usage Data, cont'd
- UI Usage
- Small effects of Top/Side, Previews/NoPreviews
- Large effect of Sort Order
- Date by far the most common sort field, even for people who had Okapi Rank as default
- Importance of time
- Few searches for best match; many other criteria
17. Metadata vs. Best-match list
18. SIS Usage Data, cont'd
- Observations about unified access
- Metadata quality is variable
- Email: rich, pretty clean
- Web: little, not very useful for retrieval
- Files: some, but often wrong
- Need abstractions, e.g., Useful date, People, Picture
- Initially, used date seen
- But
- Appointment, when it happens
- File, when it is changed
- Email and Web, when it is seen
- Useful date abstraction
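One way to picture the "useful date" abstraction is as a per-type choice of which timestamp to sort and filter on. The sketch below assumes hypothetical field names (`start_time`, `modified_time`, `seen_time`) and a fallback to date seen for unknown types; the slide itself only states which event matters for each item type.

```python
from datetime import datetime

# Hypothetical field names; the slide only states which event matters per type.
USEFUL_DATE_FIELD = {
    "appointment": "start_time",  # when it happens
    "file": "modified_time",      # when it is changed
    "email": "seen_time",         # when it is seen
    "web": "seen_time",           # when it is seen
}

def useful_date(item):
    """Return the date most useful for sorting or filtering this item."""
    # Assumption: unknown types fall back to the date the item was seen.
    field = USEFUL_DATE_FIELD.get(item["type"], "seen_time")
    return item[field]

items = [
    {"id": "appt", "type": "appointment",
     "start_time": datetime(2005, 6, 20), "seen_time": datetime(2005, 6, 1)},
    {"id": "doc", "type": "file",
     "modified_time": datetime(2005, 6, 10), "seen_time": datetime(2005, 6, 15)},
    {"id": "mail", "type": "email", "seen_time": datetime(2005, 6, 12)},
]

newest_first = sorted(items, key=useful_date, reverse=True)
print([item["id"] for item in newest_first])
```

With a single date-seen field, the file would sort ahead of the email here; the per-type rule places it by its last change instead.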
19. SIS Usage Data, cont'd
- Ease of finding information
- Easier after SIS for web, email, files
- Non-SIS search decreases for web, email, files
- Additional benefits
- "The ability to find misfiled documents and email has been extremely helpful." -- A sales executive in Washington, D.C.
- "Thanks again for the MARVELOUS tool! I find myself unable to live without it! It saves me at least 10-15 minutes a day looking for information; saves even more time not having to file things. It makes me more effective, as more time goes to thinking and deciding, and less to overhead." -- An executive in Redmond.
20. SIS, Timeline w/ Landmarks
- SIS: time as important access cue
- Importance of landmarks in human memory
- Identify and use landmarks to facilitate information management and search
- Timeline interface, augmented with landmarks
- General landmarks: holidays, world events
- Personal landmarks: important photos, appointments
- Heuristics or Bayesian models to identify memorable events
21. SIS, Timeline w/ Landmarks
22. SIS, Timeline Experiment
With Landmarks
Without Landmarks
23. Landmarks, key dependencies (from learned graphical model)
24. SIS, Visualizing Patterns
- Summarize the results of a search
- Abstraction beyond individual results
- Grid-based design
- Axes represent topic, time, people, etc.
- Cells encode frequency, recency
- Supports activities like
- What newsgroups are active (on topic x)?
- What people are active, authoritative (on topic x)?
- When did I last interact w/ people?
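The grid summary described here can be sketched as a simple aggregation over search results. This is an illustrative assumption about the data layout, not the SIS implementation: `build_grid`, `last_seen_by_row`, and the sample records are hypothetical, with people on one axis, months on the other, and each cell holding a count (frequency) and the latest date (recency).

```python
from collections import defaultdict
from datetime import date

def build_grid(results, row_key, col_key):
    """Aggregate results into cells keyed by (row, column)."""
    grid = defaultdict(lambda: {"count": 0, "latest": None})
    for item in results:
        cell = grid[(row_key(item), col_key(item))]
        cell["count"] += 1  # frequency
        if cell["latest"] is None or item["date"] > cell["latest"]:
            cell["latest"] = item["date"]  # recency
    return dict(grid)

def last_seen_by_row(grid):
    """Answer 'when did I last interact with each person?' from the grid."""
    latest = {}
    for (row, _col), cell in grid.items():
        if row not in latest or cell["latest"] > latest[row]:
            latest[row] = cell["latest"]
    return latest

# Hypothetical search results: one record per matching item.
results = [
    {"person": "alice", "date": date(2005, 5, 3)},
    {"person": "alice", "date": date(2005, 5, 20)},
    {"person": "bob", "date": date(2005, 4, 11)},
    {"person": "alice", "date": date(2005, 6, 2)},
]

grid = build_grid(
    results,
    row_key=lambda r: r["person"],
    col_key=lambda r: r["date"].strftime("%Y-%m"),
)
print(grid[("alice", "2005-05")])
print(last_seen_by_row(grid))
```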
25. SIS, Visualizing Patterns
26. SIS, Grid vs. List Experiment
Grid View
List View
27. Contextualized Search
- Search is not the end goal
- Need to support information management in the context of ongoing work activities
28. Context, Implicit Queries
Quick searches for people associated with the message and subject.
Background search on top k interesting terms from message, based on user's index: $\mathrm{Score} = tf_{doc} / \log(tf_{corpus} + 1)$
Go to SIS for detailed search. Query autofills with IQ terms.
Top N hits for this Implicit Query (IQ). Open items directly.
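The term selection behind an implicit query can be sketched as follows, reading the slide's score as tf_doc / log(tf_corpus + 1). The function name `top_k_terms`, the sample corpus counts, and the choice to skip terms that never occur in the user's index (where the logarithm would be zero) are assumptions made for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase alphabetic tokens; no stemming."""
    return re.findall(r"[a-z]+", text.lower())

def top_k_terms(message, corpus_tf, k=3):
    """Rank message terms by tf_doc / log(tf_corpus + 1) and return the top k."""
    doc_tf = Counter(tokenize(message))
    scores = {}
    for term, tf_doc in doc_tf.items():
        tf_corpus = corpus_tf.get(term, 0)
        if tf_corpus < 1:
            # Assumption: skip terms absent from the index, since log(0 + 1) = 0.
            continue
        scores[term] = tf_doc / math.log(tf_corpus + 1)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

# Hypothetical term frequencies across the user's own index.
corpus_tf = {
    "the": 5000, "and": 4000, "about": 900,
    "meeting": 120, "timeline": 9, "landmarks": 4,
}
message = "Meeting about the timeline landmarks and timeline demo"

iq_terms = top_k_terms(message, corpus_tf, k=3)
print(iq_terms)
```

Terms that are frequent in the message but rare in the user's own index rise to the top, which is what makes the resulting query specific to the message being read.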
29. Personalized Search
- Today
- All users get the same results, independent of previous search history, current context, etc.
- Personalized Search Prototype
- Use rich client-side info to personalize search results
- Personal content (e.g., SIS index), Activities
- No profile setup or maintenance required
- All profile storage and processing client-side
30. Summary
- SIS: unified index of stuff you've seen
- Heterogeneous content: files, email, web, etc.
- Fast access to full-text and metadata
- Automatic and immediate update of index
- Studied usage with several techniques
- Supports many new capabilities for personal information management
- Search/metadata vs. single hierarchy
- Landmarks, patterns, implicit queries, etc.
- New directions
- Contextualized search
- Personalized search
- Sharing, Alerting, Metadata generation
31. Vannevar Bush's Vision
V. Bush (1945). As we may think. Atlantic Monthly, 176, July 1945, 101-108.
- Consider a future device for individual use,
which is a sort of mechanized private file and
library. It needs a name, and, to coin one at
random, "memex" will do. A memex is a device in
which an individual stores all his books,
records, and communications, and which is
mechanized so that it may be consulted with
exceeding speed and flexibility. It is an
enlarged intimate supplement to his memory.
32. Thank You
- Questions/Comments
- More info: http://research.microsoft.com/sdumais
- MSN Desktop Search: http://toolbar.msn.com
34. Information Access in Context
35. Keeping Found Things Found (w/ Jones and Bruce)
- Study 1: Observe keeping activities as participants complete work-related, web-intensive tasks in their workplace
- 23 participants: researchers, information professionals, managers; detailed observations and think-aloud
- Functional analysis
- Survey 1: Methods for keeping web information; 250 participants
- Study 2: Observe efforts to re-find web information for a subset of these same participants
- Survey 2: Strategies of personal information management; 250 participants
36. Keeping Found Things Found (w/ Jones and Bruce)
- Many methods observed for keeping (and re-finding) web content
- Bookmark the page
- Do nothing (and later search, or auto-complete, or enter URL, or access from another web site)
- Send email to others
- Print out the web page
- Send email to self
- Save the web page as a file
- Paste URL into a document
- Add hyperlink to a web site
- Write down the URL on paper
- Etc.
37. Keeping Found Things Found: A Functional Analysis
38. Next Steps
- Continued explorations of
- Rich UI, including usage data
- Personalization
- Retrieval in context
- Improved support for
- Sharing with others
- Discovery/alerting, i.e., Stuff I Should See
- User-generated metadata, i.e., Phlat
- Good search makes filing less important
- Attributes rather than directory locations
40. SIS, Visualizing Trends
- Summarize the results of a search
- Abstraction beyond individual results
- Grid-based design
- Axes represent topic, time, people
- Cells encode frequency, recency
- Supports activities like
- What newsgroups are active (on topic x)?
- What people are active, authoritative (on topic x)?
- When did I last interact w/ people?
41. SIS, Visualizing Trends
42. SIS, Grid vs. List Experiment
Grid View
List View
43. Search Your Way
- Stuff I've Seen (SIS)
- Unified search over your content (mail, files, web, calendar, contacts, music, notes, RSS, etc.)
- Try (something like) it: MSN Desktop Search (http://toolbar.msn.com)
- Memory landmarks
- NewsJunkie
- Monitoring ongoing news events
- Identify stories that are novel, given what you've already read
44. Personalized Web Search (PS) (w/ Jaime Teevan and Eric Horvitz)
- Web Search
- All users get the same results, independent of previous search history, current context, etc.
- Personalized Web Search
- Personalize search results, using rich client-side information
- Personal content (e.g., MSN-DS index), activities
- No profile setup or maintenance required
- All profile storage and processing client-side, for improved privacy
- User control over amount of personalization
45. Personalized Search Demo
46. Query: bush
47. Query: traffic
48. Personalized Search (PS) Overview
Step 1: Retrieve web search results, n >> 10
User Model: Content, Activity; rich and unstructured
Client-Side: All storage and processing
49. PS Theoretical Framework
$\mathrm{Score} = \sum_i tf_i \cdot w_i$
World
Client
Where $N' = N + R$, $n_i' = n_i + r_i$
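A rough sketch of how such a score could be computed on the client follows. It reads the slide's "Where" line as N' = N + R and n_i' = n_i + r_i, and it assumes the term weight w_i is the standard relevance-feedback weight from the Okapi/BM25 family with those substitutions applied; the slide does not spell out w_i, so that formula, the function names, and all counts below are assumptions. Here N and n_i are "world" counts (documents overall and documents containing term i), and R and r_i are the corresponding counts in the user's client-side index.

```python
import math

def term_weight(N, n_i, R, r_i):
    """Relevance-feedback weight with N' = N + R and n_i' = n_i + r_i substituted.

    Assumed form: log[(r_i + 0.5)(N - n_i + 0.5) / ((n_i + 0.5)(R - r_i + 0.5))].
    """
    return math.log(
        ((r_i + 0.5) * (N - n_i + 0.5)) / ((n_i + 0.5) * (R - r_i + 0.5))
    )

def ps_score(doc_tf, world, client):
    """Score = sum over terms of tf_i * w_i for one result's term frequencies."""
    N, R = world["N"], client["R"]
    score = 0.0
    for term, tf in doc_tf.items():
        n_i = world["df"].get(term, 0)
        r_i = client["df"].get(term, 0)
        score += tf * term_weight(N, n_i, R, r_i)
    return score

# Hypothetical counts: a user whose own content is mostly about gardening.
world = {"N": 1000, "df": {"bush": 100, "garden": 50, "president": 80}}
client = {"R": 100, "df": {"bush": 10, "garden": 30}}

gardening_result = {"bush": 1, "garden": 2}
politics_result = {"bush": 1, "president": 2}

print(ps_score(gardening_result, world, client))
print(ps_score(politics_result, world, client))
```

For the query "bush", the result that shares vocabulary with the user's own content receives the higher score, which would then be used to re-rank the retrieved web results on the client.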
50. PS Evaluation
- Results
- User Repr: All SIS > Recent SIS > Web SIS > Query history > None
- Doc Repr: Snippets in results set > Full document in results set
- PS score + Web rank, even better
51. PS Evaluation
- How well does it work?
- Rich space of algorithmic and UI possibilities
- Experiment
- 15 participants, judge top 50 results, 137 queries
- User Profile
- No Profile < Query history < Web SIS < Recent SIS < All SIS
- Document Representation
- Full document in results set < Snippets in results set
- PS score + Web rank, even better
- Internal deployment ongoing