Stuff I - PowerPoint PPT Presentation

About This Presentation
Title:

Stuff I

Description:

Search with Stuff I've Seen (SIS) ... Unified index of stuff you've seen ... Identify Stuff I Should See. Flat-land. Good search makes filing less important ... – PowerPoint PPT presentation

Slides: 24
Provided by: susand9

Transcript and Presenter's Notes


1
Stuff I've Seen
A System for Personal Information Retrieval and Re-Use
  • Susan Dumais
  • Microsoft Research
  • http://research.microsoft.com/sdumais

2
Outline
  • Search today
  • Search with Stuff I've Seen (SIS)
  • With Edward Cutrell, JJ Cadiz, Gavin Jancke,
    Raman Sarin, Daniel Robbins
  • Experiences with SIS
  • Deployment
  • Usage data
  • UI innovations
  • Next steps for SIS

3
Search Today
  • Many locations, interfaces for finding things
    (e.g., web, mail, local files, help, history,
    notes)

"The No. 1 question we're trying to solve in
Longhorn is 'Where's my stuff?' Right now,
file space on any PC is a cesspool."
(Bill Gates, FORTUNE interview, June 23, 2002)
  • Often slow

4
Search With SIS
  • Unified index of stuff you've seen
  • All types of information, e.g., files of all
    types, email, calendar, contacts, web pages, etc.
  • Full-text index of content plus metadata
    attributes (e.g., creation time, author, title,
    size)
  • Automatic and immediate update of index
  • Rich UI possibilities, since it's your content
  • Get back to information you've seen
  • Re-use vs. initial discovery
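
The idea of one index over everything the user has seen can be sketched roughly as follows. This is a minimal illustration, not the SIS implementation: the `Item` schema, the `UnifiedIndex` class, and the `on_item_seen` hook are all hypothetical names, and the search is a plain substring scan rather than a real full-text index.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """One thing the user has seen, whatever its source (hypothetical schema)."""
    item_id: str
    source: str                                   # e.g. "email", "file", "web"
    text: str                                     # full-text content
    metadata: dict = field(default_factory=dict)  # e.g. author, title, size


class UnifiedIndex:
    """A single store covering all sources, updated as soon as an item is seen."""

    def __init__(self):
        self.items = {}

    def on_item_seen(self, item):
        # Assumed to be called by a per-source hook, with no explicit user action.
        self.items[item.item_id] = item

    def search(self, term):
        # Match the term against content and against every metadata value.
        term = term.lower()
        return [
            it for it in self.items.values()
            if term in it.text.lower()
            or any(term in str(v).lower() for v in it.metadata.values())
        ]


index = UnifiedIndex()
index.on_item_seen(Item("m1", "email", "Budget review notes", {"author": "Pat"}))
index.on_item_seen(Item("f1", "file", "Quarterly budget spreadsheet", {"title": "Q3"}))
index.on_item_seen(Item("w1", "web", "Weather forecast", {"title": "Forecast"}))
print(sorted(hit.source for hit in index.search("budget")))  # ['email', 'file']
```

One query returns hits from mail and files together, which is the point of a unified index: the user does not have to pick a source first.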

5
Related Work
  • Several systems for improving access for specific
    sources (e.g., web, mail, files, photos, music)
  • Some integration across sources
  • KFTF (Jones et al., 2002)
  • Lifestreams/Scopeware (Fertig, Freeman,
    Gelernter, 1996)
  • MyLifeBits (Gemmell et al., 2002)
  • Haystack (Adar et al., 1999; Huynh et al., 2002)
  • Commercial products
  • OS: Mac Sherlock, Windows Indexing Service
  • Apps: Enfish, 80-20 Retriever, dtSearch, X1, etc.
  • What's new with SIS
  • Full content and metadata for many different
    sources
  • Extensible architecture
  • Usage experiences and experimental data
  • UI focus

6
SIS Architecture
  • Indexing infrastructure uses MS Search components
    (note: IR platform)
  • Gatherer: interface to content sources, e.g.,
    files, http, MAPI
  • Filters: decode different file types, e.g.,
    Word, PowerPoint, HTML, PDF, Journal
  • Tokenizer: break into words, including date
    normalization, stemming, etc.
  • Indexer: standard inverted index
  • Retriever: Boolean, best match (Okapi)
  • User interface
  • Client side indexing and storage
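
The tokenizer, indexer, and retriever stages above can be illustrated with a small sketch. It assumes a simple regex tokenizer (no stemming or date normalization) and uses the widely known Okapi BM25 weighting with common default parameters for the best-match mode; the slides do not say which Okapi variant or parameters SIS used, and all names here are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase word split; a real tokenizer would also stem and normalize dates."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.doc_len = {}                  # doc_id -> number of tokens

    def add(self, doc_id, text):
        tokens = tokenize(text)
        self.doc_len[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            self.postings[term][doc_id] = tf

    def boolean_and(self, query):
        """Boolean retrieval: documents containing every query term."""
        terms = tokenize(query)
        if not terms:
            return set()
        return set.intersection(*(set(self.postings.get(t, {})) for t in terms))

    def bm25(self, query, k1=1.2, b=0.75):
        """Best-match retrieval: doc ids ordered by BM25 score, highest first."""
        n_docs = len(self.doc_len)
        if n_docs == 0:
            return []
        avg_len = sum(self.doc_len.values()) / n_docs
        scores = defaultdict(float)
        for term in tokenize(query):
            docs = self.postings.get(term, {})
            if not docs:
                continue
            idf = math.log(1 + (n_docs - len(docs) + 0.5) / (len(docs) + 0.5))
            for doc_id, tf in docs.items():
                norm = tf + k1 * (1 - b + b * self.doc_len[doc_id] / avg_len)
                scores[doc_id] += idf * tf * (k1 + 1) / norm
        # Ties are broken by doc id so the ordering is deterministic.
        return sorted(scores, key=lambda d: (-scores[d], d))


idx = InvertedIndex()
idx.add("a", "budget review for the budget team")
idx.add("b", "team lunch")
idx.add("c", "budget")
print(idx.boolean_and("budget team"))  # {'a'}
print(idx.bm25("budget"))              # ['c', 'a']
```

In the best-match example the one-word document ranks above the longer one even though the longer one mentions the term twice, because BM25 normalizes term frequency by document length.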

7
SIS Design Principles
  • Indexing
  • No additional work is required
  • User sees something, and it gets indexed
  • Retrieval
  • Fast, flexible
  • Interactive refinement
  • Sort and filter on metadata
  • Note: Sort/filter automatically triggers query
  • UI experiments
  • Top/Side, Previews
  • Richer visualizations
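
The interactive-refinement principle, where changing a sort or filter immediately re-runs the query, might look like the sketch below. The `ResultView` class, its method names, and the item fields are assumptions made for illustration; matching is a simple substring test on the title.

```python
from datetime import date


class ResultView:
    """Hypothetical result list in which any refinement re-runs the query."""

    def __init__(self, items):
        self.items = items      # dicts with "type", "date", and "title" keys
        self.query = ""         # an empty (null) query matches everything
        self.filters = {}
        self.sort_key = "date"
        self.results = []

    def _run(self):
        hits = [
            it for it in self.items
            if self.query.lower() in it["title"].lower()
            and all(it.get(k) == v for k, v in self.filters.items())
        ]
        # Descending order, so the default date sort shows newest items first.
        self.results = sorted(hits, key=lambda it: it[self.sort_key], reverse=True)
        return self.results

    def set_query(self, query):
        self.query = query
        return self._run()

    def set_filter(self, field, value):
        self.filters[field] = value
        return self._run()      # no separate "search" step is needed

    def set_sort(self, field):
        self.sort_key = field
        return self._run()


items = [
    {"type": "email", "date": date(2003, 5, 1), "title": "Budget plan"},
    {"type": "file", "date": date(2003, 6, 1), "title": "Budget sheet"},
    {"type": "email", "date": date(2003, 7, 1), "title": "Lunch"},
]
view = ResultView(items)
print([it["title"] for it in view.set_query("budget")])         # ['Budget sheet', 'Budget plan']
print([it["title"] for it in view.set_filter("type", "email")])  # ['Budget plan']
```

Routing every refinement through the same `_run` method keeps the displayed results consistent with the current query, filters, and sort order at all times.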

8
SIS Demo

9
SIS Demo Points
  • Search
  • Fast
  • Integrates content from many places
  • Search by full-text or properties, including null
    queries
  • Sort and filter results
  • Update index in real time, with no explicit user
    action
  • Right-click and other advanced functionality
  • Saved queries, queries from other apps, IQ
  • UI alternatives
  • Top/Side
  • Preview/Not
  • Default sort order

10
Evaluating SIS
  • Internal deployment
  • 1500 downloads
  • Users include program management, test, sales,
    development, administrative, executives, etc.
  • Research techniques
  • Free-form feedback
  • Questionnaires; structured interviews
  • Usage patterns from log data
  • UI experiments (randomly deploy different
    versions)
  • Lab studies for richer UI (e.g., timeline,
    trends)
  • But even here must work with users' own content

11
(No Transcript)
12
SIS Usage Data
  • Detailed analysis for 234 people, 6 weeks of usage
  • Personal store characteristics
  • 5k-100k items; index <150 MB
  • Query characteristics
  • Short queries (1.59 words)
  • Few advanced operators or fielded search in query
    box (7.5%)
  • Frequent use of query iteration (48%)
  • 50% of refined queries involve filters; type, date
    most common
  • 35% of refined queries involve changes to query
  • 13% of refined queries involve re-sort
  • Query content
  • Vs. Spink et al.'s analysis of web queries
  • Importance of people
  • 29% of the queries involve people's names

13
SIS Usage Data, cont'd
  • Characteristics of items opened
  • File types opened
  • 76% Email
  • 14% Web pages
  • 10% Files
  • Age of items opened
  • 7% today
  • 22% within the last week
  • 46% within the last month
  • Ease of finding information
  • Easier after SIS for web, email, files
  • Non-SIS search decreases for web, email, files

14
SIS Usage, cont'd
  • UI Usage
  • Small effects of Top/Side, Previews
  • Sort order
  • Date by far the most common sort field, even for
    people who had Okapi Rank as default
  • Importance of time
  • Few searches for best match; many other criteria

15
SIS Usage, cont'd
  • Observations about unified access
  • Metadata quality is variable
  • Email: rich, pretty clean
  • Web: little, not very useful for retrieval
  • Files: some, but often wrong
  • Human annotation: don't depend on it
  • Need abstractions, e.g., "useful date"
  • Initially, used date seen
  • But
  • Appointment: when it happens
  • Email and Web: when seen
  • Files: when changed
  • What do people remember about time?
  • Memory landmarks
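
The "useful date" abstraction described above can be sketched as a per-type rule table. The field names (`starts`, `seen`, `changed`) and the item layout are hypothetical; only the mapping of item type to the kind of date comes from the slide.

```python
from datetime import datetime

# Which timestamp is most useful for each item type, following the slide:
# appointments by when they happen, email and web by when seen, files by when changed.
USEFUL_DATE_FIELD = {
    "appointment": "starts",
    "email": "seen",
    "web": "seen",
    "file": "changed",
}


def useful_date(item):
    """Return the timestamp to sort and filter this item by.

    Falls back to the date seen (the initial SIS behaviour per the slide)
    when the type is unknown or the preferred field is missing.
    """
    field = USEFUL_DATE_FIELD.get(item["type"], "seen")
    return item.get(field) or item.get("seen")


meeting = {
    "type": "appointment",
    "seen": datetime(2003, 6, 1, 9, 0),
    "starts": datetime(2003, 6, 20, 14, 0),
}
print(useful_date(meeting))  # 2003-06-20 14:00:00
```

Keeping the rule in one table means a single "date" column in the UI can sort heterogeneous items sensibly without the user choosing among several date fields.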

16
SIS, Timeline w/ Landmarks
  • Timeline interface
  • Augmented with landmarks as pointers into human
    memory
  • General: holidays, world events
  • Personal: important photos, appointments
  • Heuristics or Bayesian models to identify
    memorable events
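
One way to attach landmarks to a timeline is to look up the landmarks that bracket each result's date. The sketch below assumes a pre-built, date-sorted landmark list with made-up labels; how SIS selected memorable events (heuristics or Bayesian models) is not shown here.

```python
from bisect import bisect_right
from datetime import date


def surrounding_landmarks(landmarks, when):
    """Return (previous_label, next_label) for the landmarks around `when`.

    `landmarks` is a list of (date, label) pairs sorted by date. A landmark
    falling exactly on `when` counts as the previous one.
    """
    dates = [d for d, _ in landmarks]
    i = bisect_right(dates, when)
    previous = landmarks[i - 1][1] if i > 0 else None
    following = landmarks[i][1] if i < len(landmarks) else None
    return previous, following


# Illustrative landmarks only: one general holiday list plus a personal event.
landmarks = [
    (date(2003, 1, 1), "New Year's Day"),
    (date(2003, 7, 4), "Independence Day"),
    (date(2003, 9, 12), "Team offsite (personal)"),
]
print(surrounding_landmarks(landmarks, date(2003, 8, 15)))
# ('Independence Day', 'Team offsite (personal)')
```

A timeline view could print these labels next to the date axis, so a user who remembers "just after the July holiday" can jump to the right region.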

17
SIS, Timeline w/ Landmarks

18
SIS, Timeline Experiment
With Landmarks
Without Landmarks

19
SIS, Visualizing Trends
  • Summarize the results of a search
  • Abstraction beyond individual results
  • Grid-based design
  • Axes represent topic, time, people
  • Cells encode frequency, recency
  • Supports activities like
  • What newsgroups are active (on topic x)?
  • What people are active, authoritative (on topic
    x)?
  • When did I last interact w/ people?
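
The grid summary can be sketched as a simple aggregation in which each cell records frequency (a count) and recency (the latest date). The choice of people and months as the two axes, and the function names, are assumptions for illustration; the slide also allows topic as an axis.

```python
from datetime import date


def build_grid(results):
    """Aggregate (person, date) search results into person-by-month cells.

    Each cell stores a count (frequency) and the latest date (recency).
    """
    grid = {}
    for person, when in results:
        key = (person, when.strftime("%Y-%m"))
        cell = grid.setdefault(key, {"count": 0, "latest": when})
        cell["count"] += 1
        cell["latest"] = max(cell["latest"], when)
    return grid


def last_interaction(grid, person):
    """Answer 'When did I last interact with this person?' from the grid."""
    dates = [cell["latest"] for (p, _), cell in grid.items() if p == person]
    return max(dates) if dates else None


results = [
    ("Pat", date(2003, 5, 2)),
    ("Pat", date(2003, 5, 20)),
    ("Pat", date(2003, 6, 3)),
    ("Lee", date(2003, 5, 9)),
]
grid = build_grid(results)
print(grid[("Pat", "2003-05")]["count"])  # 2
print(last_interaction(grid, "Pat"))      # 2003-06-03
```

A renderer could then map the count to cell size or shade and the latest date to color, giving an overview that a flat result list does not provide.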

20
SIS, Visualizing Trends

21
SIS, Grid vs. List Experiment
Grid View
List View

22
Next Steps
  • Continue explorations of rich UI
  • Augment index with usage data
  • SIS as service, with many entry points
  • Contextualize retrieval
  • Retrieve using Implicit Queries
  • Identify Stuff I Should See
  • Flat-land
  • Good search makes filing less important
  • Attributes rather than directory locations
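
An implicit query, as listed in the next steps above, is a query built from what the user is currently doing rather than from typed keywords. The slides do not describe how such queries would be formed, so the sketch below is purely an assumption: it takes the most frequent non-stopword terms of the item being viewed. The stopword list and function name are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "and", "for", "with", "before", "please", "send", "this", "that"}


def implicit_query(current_text, k=3):
    """Build a query from the k most frequent content words of the current item."""
    words = [
        w for w in re.findall(r"[a-z]+", current_text.lower())
        if len(w) > 2 and w not in STOPWORDS
    ]
    counts = Counter(words)
    # Sort by descending frequency, then alphabetically for a stable result.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


email_being_read = (
    "Budget meeting moved. Please send the budget slides before the meeting; "
    "budget owners attend."
)
print(implicit_query(email_being_read, k=2))  # ['budget', 'meeting']
```

The resulting terms could be passed to the regular search in the background, surfacing related items ("Stuff I Should See") without the user typing anything.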

23
SIS Summary
  • Unified index of stuff you've seen
  • Fast access to full-text and metadata
  • Heterogeneous content: files, email, web, etc.
  • Automatic and immediate update of index
  • Studied usage with several techniques
  • Ease of finding improves with SIS
  • Importance of people and time
  • Short queries, quick iteration
  • Novel UI to leverage personal memories
  • New capabilities for personal information
    management
  • More info: http://research.microsoft.com/sdumais

24
Vannevar Bush's Vision
  • Consider a future device for individual use,
    which is a sort of mechanized private file and
    library. It needs a name, and, to coin one at
    random, "memex" will do. A memex is a device in
    which an individual stores all his books,
    records, and communications, and which is
    mechanized so that it may be consulted with
    exceeding speed and flexibility. It is an
    enlarged intimate supplement to his memory.
  • V. Bush (1945). As we may think. Atlantic
    Monthly, 176, July 1945, 101-108.