Title: Haystack: an Adaptive Personalized Information Retrieval System
David Karger, Lynn Stein, Eytan Adar, Mark Asdoorian, Aidan Low, Jing Qian, Orion Richardson
MIT Laboratory for Computer Science and AI Laboratory, Cambridge MA 02139, USA
What is Haystack?
- An information retrieval system focused on exploiting interaction with individuals
  - complements large search engines
  - treats different people differently
- Interesting research issues
  - Heterogeneous Data: Deal with the variety of content individuals tend to collect
  - User Interface: offer ubiquitous access
  - Big Brother: Develop user interface tools to gather all possible information about users
  - Learning: Develop mechanisms for letting past user reactions influence future system actions
  - Collaboration: Share data and metadata among a large community of users
The Bookshelf Metaphor
- Search Engines are Like Libraries
  - Massive corpora: Mostly irrelevant, often out of date
  - Anonymous: Treat all users exactly the same
  - Rigid: Use librarian's one-for-all ontology
- People prefer to start with bookshelves
  - My data: Information gathered personally. High quality, easy to understand. Annotated.
  - My organization: owner-chosen subject partition. Best items near desk.
- Then they turn to colleagues
  - Trust: Colleagues are authorities on other topics; can recommend good data
  - Leverage: Colleagues have organized their data; makes searching easy
Integration
- Haystack archives all user content, adds metadata
  - Plug-in components extract data from content
  - Textifiers: ascii, html, postscript, ocr, ...
  - Field finders: author, title, date, summary, ...
- Haystack mediates user-selected search tools
  - Text search: mg, verity, isearch, grep, ...
  - Database: LORE
- Haystack can be used without thinking
  - access during all standard activities (mail, web, edit)
  - application-specific stubs talk to kernel
  - keyword search for file to edit
  - archive and annotate current web page
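The plug-in idea above (textifiers turn content into text, field finders pull metadata out of that text) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `ArchivedItem`, `TEXTIFIERS`, `FIELD_FINDERS`, and `archive` are hypothetical, the HTML handling is a crude regular expression, and none of it reflects the actual Haystack kernel interfaces.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ArchivedItem:
    """One archived document plus the metadata extracted from it (hypothetical)."""
    raw: str
    content_type: str
    text: str = ""
    metadata: Dict[str, str] = field(default_factory=dict)


# Textifiers: content type -> function that turns raw content into plain text.
TEXTIFIERS: Dict[str, Callable[[str], str]] = {
    "ascii": lambda raw: raw,
    "html": lambda raw: re.sub(r"<[^>]+>", " ", raw),  # crude tag stripping
}


def find_title(text: str) -> Dict[str, str]:
    """Toy field finder: treat the first non-empty line as the title."""
    for line in text.splitlines():
        if line.strip():
            return {"title": line.strip()}
    return {}


def find_author(text: str) -> Dict[str, str]:
    """Toy field finder: look for a line of the form 'Author: <name>'."""
    match = re.search(r"^\s*Author:\s*(.+)$", text, flags=re.MULTILINE)
    return {"author": match.group(1).strip()} if match else {}


FIELD_FINDERS: List[Callable[[str], Dict[str, str]]] = [find_title, find_author]


def archive(raw: str, content_type: str) -> ArchivedItem:
    """Textify the content, then let every field finder add metadata."""
    item = ArchivedItem(raw=raw, content_type=content_type)
    textify = TEXTIFIERS.get(content_type, lambda r: r)  # unknown types pass through
    item.text = textify(raw)
    for finder in FIELD_FINDERS:
        item.metadata.update(finder(item.text))
    return item


item = archive("<h1>Multicast notes</h1>\nAuthor: Lynn\n<p>body</p>", "html")
print(item.metadata)
```

Keeping textifiers and field finders in plain registries means a new format or a new metadata field is one more entry, which is one plausible reading of "plug-in components".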
Target Queries
What research is being done on multicast?
Where is the email about Haystack that I sent Lynn last month?
Adaptation
- Goal: improved performance over time
- Annotations by user
  - user can add to/change all data/metadata
  - requires active intervention, so undesirable
- Observation of query process
  - user performs query; Haystack returns results
  - user selects relevant result
  - Haystack records connection for future queries
  - adapt using machine learning techniques
- Observation of general activity (proposed)
  - items that are used a lot
  - items that are used together
  - items used after a search
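The query-observation loop above (query, results, user selection, recorded connection) can be sketched as a small re-ranker. This is a minimal illustration, not the learning method used in Haystack: a simple selection count stands in for "machine learning techniques", and the class name `FeedbackRanker`, its methods, and the `boost` parameter are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class FeedbackRanker:
    """Hypothetical re-ranker that boosts items the user picked for similar queries."""

    def __init__(self, boost: float = 1.0) -> None:
        self.boost = boost
        # query term -> item id -> how often that item was selected for the term
        self.selections: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))

    def record_selection(self, query: str, item_id: str) -> None:
        """Remember that the user chose item_id as relevant to this query."""
        for term in set(query.lower().split()):
            self.selections[term][item_id] += 1

    def rerank(self, query: str, results: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
        """Add a bonus to each base score for every past selection sharing a query term."""
        terms = set(query.lower().split())
        adjusted = []
        for item_id, base_score in results:
            picks = sum(
                self.selections[t].get(item_id, 0) for t in terms if t in self.selections
            )
            adjusted.append((item_id, base_score + self.boost * picks))
        return sorted(adjusted, key=lambda pair: pair[1], reverse=True)


ranker = FeedbackRanker(boost=0.5)
baseline = [("doc_a", 0.9), ("doc_b", 0.8)]        # scores from some search tool
ranker.record_selection("haystack email", "doc_b")  # user picked doc_b earlier
reranked = ranker.rerank("email from lynn", baseline)
print([item_id for item_id, _ in reranked])
```

Because the connection is recorded per query term, a later query that merely overlaps with an earlier one still benefits, and no active annotation by the user is needed.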
Collaboration
- Leverage individual users' Haystacks
  - simple RPC to query other Haystacks' data
  - gather data from several; combine evidence
- Exploits self interest
  - individuals seek/organize info for own gain
  - organization provides benefit to others
- Need to identify expert colleagues
  - Haystacks of people I contact often
  - Haystacks with much overlapping data
  - Haystacks that gave good answers in past
  - referrals
- Collaborative Filtering model techniques
  - CF finds stuff you'll like
  - Haystack finds stuff you'll like for this query
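One way to picture "gather data from several, combine evidence" together with "Haystacks that gave good answers in past" is a trust-weighted merge of peer results. The sketch below makes several assumptions: peers are plain in-process functions standing in for the RPC calls, scores lie in [0, 1], and the names `combine_evidence`, `update_trust`, and `DEFAULT_TRUST` are hypothetical rather than part of Haystack.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# A peer Haystack is modelled as a function: query -> [(item id, score in [0, 1])].
Peer = Callable[[str], List[Tuple[str, float]]]

DEFAULT_TRUST = 0.5  # assumed starting trust for a colleague we know nothing about


def combine_evidence(
    query: str, peers: Dict[str, Peer], trust: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Ask every peer and sum their scores, weighted by how much each is trusted."""
    totals: Dict[str, float] = defaultdict(float)
    for name, ask in peers.items():
        weight = trust.get(name, DEFAULT_TRUST)
        for item_id, score in ask(query):
            totals[item_id] += weight * score
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


def update_trust(
    trust: Dict[str, float], peer_name: str, was_helpful: bool, rate: float = 0.2
) -> None:
    """Move trust toward 1 after a good answer and toward 0 after a bad one."""
    target = 1.0 if was_helpful else 0.0
    current = trust.get(peer_name, DEFAULT_TRUST)
    trust[peer_name] = current + rate * (target - current)


peers: Dict[str, Peer] = {
    "colleague_a": lambda q: [("doc_x", 0.9), ("doc_y", 0.4)],
    "colleague_b": lambda q: [("doc_y", 0.8)],
}
trust = {"colleague_a": 0.5, "colleague_b": 0.5}

ranked = combine_evidence("multicast research", peers, trust)
print(ranked[0][0])  # the item with the most combined support

update_trust(trust, "colleague_a", was_helpful=True)
print(trust["colleague_a"])
```

An item recommended by two moderately trusted colleagues can outrank one recommended strongly by a single colleague, and the trust update gives a simple handle on identifying expert colleagues over time.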
Which DARPA BAAs should I read?
Current Status
- Prototype completed Fall 1997
  - some functionality in all categories; limited extensibility
- Kernel reimplemented from scratch, due Summer 1998
Check out the Web site at http://www.ai.mit.edu/projects/haystack