Susan Dumais - PowerPoint PPT Presentation

Transcript and Presenter's Notes


1
Personal Information Management: Helping Finders
Become Keepers
  • Susan Dumais
  • Microsoft Research
  • http://research.microsoft.com/~sdumais

2
Outline
  • Personal information management today
  • Stuff I've Seen (SIS)
  • Research prototype system
  • Deployment experiences, usage data
  • (MSN Toolbar Desktop Search, MS Vista)
  • Future directions
  • Contextualized search
  • Personalized search

3
Personal Information Mgt.
  • Information acquisition vs. management/access
  • Easy to acquire and create information of many
    types
  • E.g., email, docs, web pages, calendars,
    pictures, music, etc.
  • 100 gig drives
  • Hard to organize
  • And, even harder to re-find
  • Information discovery vs. re-use
  • Many tools for finding information
  • Fewer tools for keeping information
  • Yet, many tasks involve re-using information

4
Search Today
  • Information silos
  • Many locations, interfaces for finding things
    (e.g., web, mail, contacts, docs, photos, notes)
  • Often slow

5
Search With SIS
  • Unified index of stuff you've seen
  • All types of info (e.g., files, email, calendar,
    contacts, web pages, rss, im)
  • Index full-content plus metadata (e.g., time,
    author, title, size, usage)
  • Automatic and immediate update of index
  • Rich UI possibilities, since it's your content
    (e.g., consider usage)
  • Get back to information you've seen
  • Re-use vs. initial discovery

6
Related Work
  • Several systems for improving access for specific
    sources (e.g., web, mail, files, photos, music)
  • Some integration across sources
  • Keeping Found Things Found (Jones et al., 2001)
  • Lifestreams/Scopeware (Fertig, Freeman,
    Gelernter, 1996)
  • Haystack (Adar et al., 1999; Huynh et al., 2002)
  • MyLifeBits (Gemmell, Bell et al., 2002)
  • Commercial systems
  • OS: Mac Spotlight/Sherlock, MS Windows Indexing
    Service
  • DS apps: Enfish, 80-20, dtSearch, X1, Copernic,
    G-DS, MSN-DS, etc.
  • What's new with SIS
  • Full content and metadata for many different
    sources
  • Extensible architecture (gather, filter, word
    break)
  • Focus on user interface and user experience
  • Design guided by usage experiences and
    experimental data

7
SIS Design Principles
  • Indexing experience
  • No additional work is required
  • User sees something, and it gets indexed
  • Retrieval experience
  • Fast, flexible
  • Interactive refinement
  • Sort and filter on metadata
  • Note: Sort/filter automatically triggers query
  • UI innovations
  • Previews, Top/Side
  • Richer visualizations

8
SIS Demo
9
MS Search Architecture
10
SIS Architecture
  • Indexing infrastructure uses MS Search components
    (note IR platform)
  • Gatherer: interface to content sources, e.g.,
    files, http, MAPI
  • Filters: decode different file types, e.g.,
    word, powerpoint, html, pdf, journal notes
  • Tokenizer: break into words, including date
    normalization, stemming, etc.
  • Indexer: standard inverted index
  • Retriever: Boolean, best match (Okapi), fielded
  • User interface
  • Client side indexing and storage
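
A minimal Python sketch of the tokenizer, indexer, and retriever stages listed on this slide, assuming the gatherer and filters have already turned each item into plain text. The class `TinyIndex`, its methods, and the sample item ids are hypothetical names for illustration, not MS Search components; stemming, date normalization, and Okapi best-match ranking are left out.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Tokenizer: lowercase the text and break it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Indexer and retriever: an inverted index with Boolean AND lookup."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of item ids
        self.metadata = {}                # item id -> metadata dict

    def add(self, item_id, text, **metadata):
        # Gatherer and filter stages are assumed to have produced `text`.
        self.metadata[item_id] = metadata
        for term in tokenize(text):
            self.postings[term].add(item_id)

    def search(self, query, sort_by=None):
        """Return ids of items containing every query term.

        With sort_by set, results are ordered by that metadata field,
        largest value first (e.g., most recent date first).
        """
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        if sort_by is None:
            return sorted(hits)
        return sorted(hits, key=lambda i: self.metadata[i][sort_by], reverse=True)


index = TinyIndex()
index.add("mail-1", "Budget review meeting notes", type="email", date="2004-03-02")
index.add("file-7", "Budget spreadsheet for review", type="file", date="2004-01-15")
index.add("web-3", "Search engine review", type="web", date="2004-02-20")
print(index.search("budget review", sort_by="date"))
```

Keeping metadata next to the postings is what lets a sort or filter simply re-run the query, as the design principles slide describes.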

11
Evaluating SIS
  • Internal deployment
  • 2500 users
  • Users include program management, test, sales,
    development, administrative, executives, etc.
  • Research techniques
  • Free-form feedback
  • Questionnaires, structured interviews
  • Usage patterns from log data
  • UI experiments (randomly deploy different
    versions)
  • Lab studies for richer UI (e.g., timeline, trends)

12
SIS Usage Data
  • Internal deployment and evaluation
  • 3000 people, including program management,
    test, sales, development, administrative,
    executives, etc.
  • Methods: free-form feedback, questionnaires,
    interviews, usage logs, laboratory experiments,
    etc.
  • Personal store characteristics
  • 5k to 250k items
  • Query characteristics
  • Short queries (1.6 words)
  • Few advanced operators or fielded search in query
    box (7%)
  • Many advanced operators and query iteration in UI
    (48%)
  • Filters (type, date); modify query; re-sort
    results

13
SIS Usage Data, cont'd
  • Characteristics of items opened
  • File types opened
  • 76% Email
  • 14% Web pages
  • 10% Files
  • Age of items opened
  • 5% today
  • 21% within the last week
  • 47% within the last month
  • 50% of the cases -> 36 days
  • Web: 11 days
  • Mail: 36 days
  • Files: 55 days

14
SIS Usage Data, cont'd
15
User Interface (UI) Alternatives
16
SIS Usage Data, cont'd
  • UI Usage
  • Small effects of Top/Side, Previews/NoPreviews
  • Large effect of Sort Order
  • Date by far the most common sort field, even for
    people who had Okapi Rank as default
  • Importance of time
  • Few searches for best match; many other
    criteria

17
Metadata vs. Best-match list
18
SIS Usage Data, cont'd
  • Observations about unified access
  • Metadata quality is variable
  • Email: rich, pretty clean
  • Web: little, not very useful for retrieval
  • Files: some, but often wrong
  • Need abstractions, e.g., Useful date, People,
    Picture
  • Initially, used date seen
  • But
  • Appointment, when it happens
  • File, when it is changed
  • Email and Web, when it is seen
  • Useful date abstraction
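
A minimal Python sketch of the "useful date" abstraction described on this slide: each item type maps to the date that matters for it. The dictionary layout and the field names (`type`, `start`, `modified`, `seen`) are assumptions made for illustration, not the SIS schema.

```python
from datetime import datetime


def useful_date(item):
    """Pick the date that is most useful for retrieval, by item type."""
    kind = item["type"]
    if kind == "appointment":
        return item["start"]      # appointment: when it happens
    if kind == "file":
        return item["modified"]   # file: when it is changed
    if kind in ("email", "web"):
        return item["seen"]       # email and web: when it is seen
    raise ValueError(f"no useful-date rule for item type {kind!r}")


items = [
    {"type": "appointment", "start": datetime(2004, 5, 1, 9, 0),
     "seen": datetime(2004, 4, 2, 8, 0)},
    {"type": "file", "modified": datetime(2004, 3, 15, 17, 30),
     "seen": datetime(2004, 4, 20, 10, 0)},
    {"type": "email", "seen": datetime(2004, 4, 10, 12, 0)},
]

# Sorting by the useful date, most recent first, rather than by date seen.
by_useful_date = sorted(items, key=useful_date, reverse=True)
```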

19
SIS Usage Data, cont'd
  • Ease of finding information
  • Easier after SIS for web, email, files
  • Non-SIS search decreases for web, email, files
  • Additional benefits
  • "The ability to find misfiled documents and email
    has been extremely helpful." -- A sales
    executive in Washington D.C.
  • "Thanks again for the MARVELOUS tool! I find
    myself unable to live without it! It saves me at
    least 10-15 minutes a day looking for
    information; saves even more time not having to
    file things. It makes me more effective, as more
    time goes to thinking and deciding, and less to
    overhead." -- An executive in Redmond.

20
SIS, Timeline w/ Landmarks
  • SIS: time as important access cue
  • Importance of landmarks in human memory
  • Identify and use landmarks to facilitate
    information management and search
  • Timeline interface, augmented with landmarks
  • General landmarks: holidays, world events
  • Personal landmarks: important photos,
    appointments
  • Heuristics or Bayesian models to identify
    memorable events

21
SIS, Timeline w/ Landmarks
22
SIS, Timeline Experiment
[Figure panels: With Landmarks; Without Landmarks]
23
Landmarks, key dependencies (from learned
graphical model)
24
SIS, Visualizing Patterns
  • Summarize the results of a search
  • Abstraction beyond individual results
  • Grid-based design
  • Axes represent topic, time, people, etc.
  • Cells encode frequency, recency
  • Supports activities like
  • What newsgroups are active (on topic x)?
  • What people are active, authoritative (on topic
    x)?
  • When did I last interact w/ people?
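
A minimal Python sketch of the grid summary described on this slide, assuming each search result carries the fields used as axes plus a date. The function name `build_grid` and the field names are hypothetical; each cell records frequency (item count) and recency (latest date).

```python
from datetime import date


def build_grid(results, row_key, col_key):
    """Group search results into cells keyed by (row value, column value).

    Each cell stores (frequency, most recent date) for its items.
    """
    grid = {}
    for item in results:
        cell = (item[row_key], item[col_key])
        count, latest = grid.get(cell, (0, None))
        if latest is None or item["date"] > latest:
            latest = item["date"]
        grid[cell] = (count + 1, latest)
    return grid


results = [
    {"person": "alice", "topic": "search", "date": date(2004, 3, 1)},
    {"person": "alice", "topic": "search", "date": date(2004, 3, 9)},
    {"person": "bob", "topic": "budget", "date": date(2004, 1, 20)},
]
grid = build_grid(results, row_key="person", col_key="topic")
```

A question such as "When did I last interact w/ people?" then reduces to reading the recency value along one row.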

25
SIS, Visualizing Patterns
26
SIS, Grid vs. List Experiment
[Figure panels: Grid View; List View]
27
Contextualized Search
  • Search is not the end goal
  • Need to support information management in the
    context of ongoing work activities

28
Context, Implicit Queries
  • Quick searches for people associated with the
    message and Subject.
  • Background search on top k interesting terms from
    message, based on user's index:
    $\mathrm{Score} = tf_{doc} / \log(tf_{corpus} + 1)$
  • Go to SIS for detailed search. Query autofills
    with IQ terms.
  • Top N hits for this Implicit Query (IQ). Open
    items directly.
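
A minimal Python sketch of the implicit-query term selection on this slide, which scores each term by its frequency in the message divided by the log of its frequency in the user's index plus one. The function name `top_k_terms`, the natural-log base, and the assumption that the index already counts the message itself (so a term's corpus frequency is never below its message frequency) are illustrative choices, not details from the slide.

```python
import math
from collections import Counter


def top_k_terms(message_tokens, corpus_tf, k=3):
    """Return the k highest-scoring terms of a message.

    score(term) = tf_doc / log(tf_corpus + 1)
    """
    doc_tf = Counter(message_tokens)
    scores = {}
    for term, tf_doc in doc_tf.items():
        # Assumption: the message is part of the index, so tf_corpus >= tf_doc.
        # This also keeps the denominator strictly positive.
        tf_corpus = max(corpus_tf.get(term, 0), tf_doc)
        scores[term] = tf_doc / math.log(tf_corpus + 1)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


message = ["memex", "memex", "the", "the", "the", "index"]
corpus_tf = {"the": 5000, "memex": 3, "index": 40}
query_terms = top_k_terms(message, corpus_tf, k=2)
```

Terms that are frequent in the message but rare in the index rise to the top, while very common words are pushed down by the log of their corpus frequency.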
29
Personalized Search
  • Today
  • All users get the same results, independent of
    previous search history, current context, etc.
  • Personalized Search Prototype
  • Use rich client-side info to personalize search
    results
  • Personal content (e.g., SIS index), Activities
  • No profile setup or maintenance required
  • All profile storage and processing client-side

30
Summary
  • SIS: Unified index of stuff you've seen
  • Heterogeneous content: files, email, web, etc.
  • Fast access to full-text and metadata
  • Automatic and immediate update of index
  • Studied usage with several techniques
  • Supports many new capabilities for personal
    information management
  • Search/metadata vs. single hierarchy
  • Landmarks, patterns, implicit queries, etc.
  • New directions
  • Contextualized search
  • Personalized search
  • Sharing, Alerting, Metadata generation

31
Vannevar Bush's Vision
V. Bush (1945). As we may think. Atlantic Monthly,
176, July 1945, 101-108.
  • Consider a future device for individual use,
    which is a sort of mechanized private file and
    library. It needs a name, and, to coin one at
    random, "memex" will do. A memex is a device in
    which an individual stores all his books,
    records, and communications, and which is
    mechanized so that it may be consulted with
    exceeding speed and flexibility. It is an
    enlarged intimate supplement to his memory.

32
Thank You
  • Questions/Comments
  • More info, http://research.microsoft.com/~sdumais
  • MSN-Desktop Search,
    http://toolbar.msn.com

33
(No Transcript)
34
Information Access in Context
35
Keeping Found Things Found (w/ Jones and Bruce)
  • Study 1: Observe keeping activities as
    participants complete work-related, web-intensive
    tasks in their workplace
  • 23 participants (researchers, information
    professionals, managers); detailed observations
    and think-aloud
  • Functional analysis
  • Survey 1: Methods for keeping web information;
    250 participants
  • Study 2: Observe efforts to re-find web
    information for a subset of these same
    participants
  • Survey 2: Strategies of personal information
    management; 250 participants

36
Keeping Found Things Found (w/ Jones and Bruce)
  • Many methods observed for keeping (and
    re-finding) web content
  • Bookmark the page
  • Do nothing (and later search, or auto-complete,
    or enter URL, or access from another web site)
  • Send email to others
  • Print out the web page
  • Send email to self
  • Save the web page as a file
  • Paste URL into a document
  • Add hyperlink to a web site
  • Write down the URL on paper
  • Etc.

37
Keeping Found Things Found: A Functional Analysis
38
Next Steps
  • Continued explorations of
  • Rich UI, including usage data
  • Personalization
  • Retrieval in context
  • Improved support for
  • Sharing with others
  • Discovery/alerting, i.e., Stuff I Should See
  • User-generated metadata, i.e., Phlat
  • Good search makes filing less important
  • Attributes rather than directory locations

39
(No Transcript)
40
SIS, Visualizing Trends
  • Summarize the results of a search
  • Abstraction beyond individual results
  • Grid-based design
  • Axes represent topic, time, people
  • Cells encode frequency, recency
  • Supports activities like
  • What newsgroups are active (on topic x)?
  • What people are active, authoritative (on topic
    x)?
  • When did I last interact w/ people?

41
SIS, Visualizing Trends
42
SIS, Grid vs. List Experiment
[Figure panels: Grid View; List View]
43
Search Your Way
  • Stuff I've Seen (SIS)
  • Unified search over your content (mail, files,
    web, calendar, contacts, music, notes, rss, etc.)
  • Try (something like) it, MSN Desktop Search
    (http://toolbar.msn.com)
  • Memory landmarks
  • NewsJunkie
  • Monitoring ongoing news events
  • Identify stories that are novel, given what
    you've already read

44
Personalized Web Search (PS) (w/
Jaime Teevan and Eric Horvitz)
  • Web Search
  • All users get the same results, independent of
    previous search history, current context, etc.
  • Personalized Web Search
  • Personalize search results, using rich
    client-side information
  • Personal content (e.g., MSN-DS index), activities
  • No profile setup or maintenance required
  • All profile storage and processing client-side,
    for improved privacy
  • User control over amount of personalization

45
Personalized Search Demo
46
Query: bush
47
Query: traffic
48
Personalized Search (PS) Overview
Step 1: Retrieve web search results, n >> 10
User Model
  • Content
  • Activity
  • Rich and unstructured
Client-Side
  • All storage and processing
49
PS Theoretical Framework
$\mathrm{Score} = \sum_i tf_i \cdot w_i$
World
Client
Where $N' = N + R$, $n_i' = n_i + r_i$
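
A minimal Python sketch of the scoring framework on this slide: a result's score is the sum over its terms of term frequency times a term weight, with world counts (N documents, n_i containing term i) combined with client counts (R documents, r_i containing term i). The transcript does not give the formula for the weight itself; the sketch assumes the standard BM25 relevance-feedback weight, and all function and field names are hypothetical.

```python
import math


def term_weight(N, n_i, R, r_i):
    """Assumed BM25-style relevance weight using N' = N + R, n_i' = n_i + r_i."""
    N_prime = N + R
    n_prime = n_i + r_i
    numerator = (r_i + 0.5) * (N_prime - n_prime - R + r_i + 0.5)
    denominator = (n_prime - r_i + 0.5) * (R - r_i + 0.5)
    return math.log(numerator / denominator)


def personalized_score(term_freqs, N, world_df, R, client_df):
    """Score = sum over terms of tf_i * w_i."""
    return sum(
        tf * term_weight(N, world_df.get(term, 0), R, client_df.get(term, 0))
        for term, tf in term_freqs.items()
    )


def rerank(results, N, world_df, R, client_df):
    """Order web results by personalized score, highest first."""
    return sorted(
        results,
        key=lambda r: personalized_score(r["tf"], N, world_df, R, client_df),
        reverse=True,
    )


# World side: corpus statistics. Client side: statistics from the user's index.
N, world_df = 1000, {"bush": 200, "memex": 10, "garden": 150}
R, client_df = 100, {"bush": 20, "memex": 30}
results = [
    {"url": "a", "tf": {"bush": 1, "garden": 2}},
    {"url": "b", "tf": {"bush": 1, "memex": 1}},
]
ranked = rerank(results, N, world_df, R, client_df)
```

In this made-up example the result mentioning "memex" moves ahead because that term is far more common in the client-side counts than in the world counts.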
50
PS Evaluation
  • Results
  • User Repr: All SIS > Recent SIS > Web SIS > Query
    history > None
  • Doc Repr: Snippets in results set > Full document
    in results set
  • PS score + Web rank, even better

51
PS Evaluation
  • How well does it work?
  • Rich space of algorithmic and UI possibilities
  • Experiment
  • 15 participants, judge top 50 results, 137
    queries
  • User Profile
  • No Profile < Query history < Web SIS < Recent SIS
    < All SIS
  • Document Representation
  • Full document in results set < Snippets in
    results set
  • PS score + Web rank, even better
  • Internal deployment ongoing

52
Vannevar Bush's Vision
  • Consider a future device for individual use,
    which is a sort of mechanized private file and
    library. It needs a name, and, to coin one at
    random, "memex" will do. A memex is a device in
    which an individual stores all his books,
    records, and communications, and which is
    mechanized so that it may be consulted with
    exceeding speed and flexibility. It is an
    enlarged intimate supplement to his memory.
  • V. Bush (1945). As we may think. Atlantic
    Monthly, 176, July 1945, 101-108.