Title: SIMS 202 Information Organization and Retrieval
1SIMS 202: Information Organization and Retrieval
- Prof. Marti Hearst and Prof. Ray Larson
- UC Berkeley SIMS
- Tues/Thurs 9:30-11:00am
- Fall 2000
- Modern IR textbook topics
- The Information Seeking Process
3Information Retrieval (IR)
- Information
- Representation
- Storage
- Organization
- Access
4Information vs. Data Retrieval
- IR is concerned more with retrieving information
about a subject than with retrieving data that
satisfies a given query
- IR usually deals with natural language text, which
is not always well structured and could be
semantically ambiguous
5Textbook Topics
6More Detailed View
7What We'll Cover
A Lot
A Little
8Search and Retrieval: Outline of Part I of SIMS 202
- The Search Process
- Information Retrieval Models
- Content Analysis/Zipf Distributions
- Evaluation of IR Systems
- Precision/Recall
- Relevance
- User Studies
- System and Implementation Issues
- Web-Specific Issues
- User Interface Issues
- Special Kinds of Search
9What is an Information Need?
10The Standard Retrieval Interaction Model
11Standard Model
- Assumptions
- Maximizing precision and recall simultaneously
- The information need remains static
- The value is in the resulting document set
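The first assumption refers to precision and recall, which the slides do not define at this point. As a minimal sketch using the usual set-based definitions (precision is the fraction of retrieved documents that are relevant, recall is the fraction of relevant documents that were retrieved), and assuming relevance judgments are known in advance, the two measures can be computed as follows. The function name and document ids are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document ids the system returned
    relevant:  document ids judged relevant (assumed to be known)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document ids, for illustration only.
p, r = precision_recall(retrieved=["d1", "d2", "d3", "d4"],
                        relevant=["d2", "d4", "d7"])
print(f"precision={p:.2f} recall={r:.2f}")
```

Retrieving more documents tends to raise recall while lowering precision, which is why maximizing both at once is stated as an assumption rather than a given.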
12Problem with Standard Model
- Users learn during the search process
- Scanning titles of retrieved documents
- Reading retrieved documents
- Viewing lists of related topics/thesaurus terms
- Navigating hyperlinks
- Some users don't like long disorganized lists of
13Search is an Iterative Process
14Berry-Picking as an Information Seeking
Strategy (Bates 90)
- Standard IR model
- assumes the information need remains the same
throughout the search process
- Berry-picking model
- interesting information is scattered like berries
among bushes
- the query is continually shifting
15A sketch of a searcher moving through many
actions towards a general goal of satisfactory
completion of research related to an information
need. (after Bates 89)
16Berry-picking model (cont.)
- The query is continually shifting
- New information may yield new ideas and new
directions
- The information need
- is not satisfied by a single, final retrieved set
- is satisfied by a series of selections and bits
of information found along the way.
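One way to picture the contrast with the standard model is a loop in which the query changes on every pass and the useful results from each pass accumulate. The sketch below is only an illustration under stated assumptions: the toy corpus, the all-words matcher, and the `keep` predicate standing in for the searcher's judgment are all hypothetical.

```python
# Toy corpus: hypothetical titles, for illustration only.
CORPUS = {
    "d1": "zipf law word frequency",
    "d2": "word frequency in web search logs",
    "d3": "web search user interfaces",
    "d4": "user studies of search interfaces",
}


def search(query):
    """Return ids of documents containing every query word (toy matcher)."""
    words = query.split()
    return [doc_id for doc_id, text in CORPUS.items()
            if all(w in text.split() for w in words)]


def berry_pick(queries, keep):
    """Run a sequence of shifting queries, keeping selections from each.

    queries: the successive query formulations
    keep:    predicate standing in for the searcher's judgment
    """
    basket = []  # accumulated selections ("berries")
    for query in queries:
        for doc_id in search(query):
            if keep(doc_id) and doc_id not in basket:
                basket.append(doc_id)
    return basket


basket = berry_pick(
    queries=["word frequency", "web search", "search interfaces"],
    keep=lambda doc_id: doc_id != "d1",  # assumed judgment: d1 is not useful
)
print(basket)
```

No single query in this run returns everything in the basket; the value lies in what was collected along the way.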
17Information Seeking Behavior
- Two parts of a process
- search and retrieval
- analysis and synthesis of search results
- This is a fuzzy area; we will look at several
different working theories.
18Search Tactics and Strategies
- Search Tactics
- Bates 79
- Search Strategies
- Bates 89
- O'Day and Jeffries 93
19Tactics vs. Strategies
- Tactic: short-term goals and maneuvers
- operators, actions
- Strategy: overall planning
- link a sequence of operators together to achieve
some end
20Information Search Tactics (after Bates 79)
- Monitoring tactics
- keep search on track
- Source-level tactics
- navigate to and within sources
- Term and Search Formulation tactics
- designing search formulation
- selection and revision of specific terms within
search formulation
21Term Tactics
- Move around the thesaurus
- superordinate, subordinate, coordinate
- neighbor (semantic or alphabetic)
- trace -- pull out terms from information already
seen as part of search (titles, etc.)
- morphological and other spelling variants
- antonyms (contrary)
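The three thesaurus moves can be sketched over a small broader-term table. This is a minimal illustration, assuming a strict hierarchy in which each term has at most one broader term; the terms and function names are hypothetical.

```python
# Hypothetical mini-thesaurus: term -> broader term (None for the top term).
BROADER = {
    "animal": None,
    "bird": "animal",
    "mammal": "animal",
    "dog": "mammal",
    "cat": "mammal",
}


def superordinate(term):
    """Move up: the broader term, or None at the top."""
    return BROADER.get(term)


def subordinate(term):
    """Move down: all narrower terms, in alphabetical order."""
    return sorted(t for t, parent in BROADER.items() if parent == term)


def coordinate(term):
    """Move sideways: other terms sharing the same broader term."""
    parent = BROADER.get(term)
    if parent is None:
        return []
    return [t for t in subordinate(parent) if t != term]


print(superordinate("dog"))
print(subordinate("animal"))
print(coordinate("dog"))
```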
22Source-level Tactics
- Bibble
- look for a pre-defined result set
- e.g., a good link page on web
- Survey
- look ahead, review available options
- e.g., don't simply use the first term or first
source that comes to mind
- Cut
- eliminate large proportion of search domain
- e.g., search on rarest term first
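The "rarest term first" example has a system-side analog: when several terms must all match, starting with the term that occurs in the fewest documents eliminates most of the collection at once. The sketch below assumes a hypothetical inverted index mapping each term to a set of document ids; it illustrates the idea and is not a method described in the slides.

```python
# Hypothetical inverted index: term -> set of document ids.
INDEX = {
    "information": {1, 2, 3, 4, 5, 6, 7, 8},
    "retrieval": {2, 3, 5, 8},
    "orienteering": {5, 9},
}


def conjunctive_search(terms, index):
    """AND together the postings for the given terms, rarest term first."""
    # Sort by postings size so the candidate set starts as small as possible.
    ordered = sorted(terms, key=lambda t: len(index.get(t, ())))
    candidates = None
    for term in ordered:
        postings = index.get(term, set())
        candidates = set(postings) if candidates is None else candidates & postings
        if not candidates:
            break  # nothing can match; skip the remaining terms
    return candidates or set()


result = conjunctive_search(["information", "retrieval", "orienteering"], INDEX)
print(result)
```

Here the search starts from the two documents containing "orienteering" instead of the eight containing "information".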
23Source-level Tactics (cont.)
- Stretch
- use source in unintended way
- e.g., use patents to find addresses
- Scaffold
- take an indirect route to goal
- e.g., when looking for references to obscure
poet, look up contemporaries
- Cleave
- binary search in an ordered file
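Cleave corresponds to ordinary binary search: each probe halves the part of the ordered file still under consideration. A minimal sketch using Python's standard `bisect` module follows; the function name and the sample file of surnames are hypothetical.

```python
from bisect import bisect_left


def cleave(sorted_entries, target):
    """Return the index of target in sorted_entries, or -1 if absent."""
    i = bisect_left(sorted_entries, target)  # leftmost position for target
    if i < len(sorted_entries) and sorted_entries[i] == target:
        return i
    return -1


# Hypothetical alphabetically ordered file of surnames.
names = ["Bates", "Hearst", "Jeffries", "Larson", "Russell"]
print(cleave(names, "Larson"))
print(cleave(names, "Salton"))
```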
24Monitoring Tactics (strategy-level)
- Check
- compare original goal with current state
- Weigh
- make a cost/benefit analysis of current or
anticipated actions
- Pattern
- recognize common strategies
- Correct Errors
- Record
- keep track of (incomplete) paths
25Additional Considerations (Bates 79)
- Add a Sort tactic!
- More detail is needed about short-term
cost/benefit decision rule strategies
- When to stop?
- How to judge when enough information has been
gathered?
- How to decide when to give up an unsuccessful
search?
- When to stop searching in one source and move to
26Lexis-Nexis Interface
- What tactics did you use?
- What strategies did you use?
- Interfaces should make it easy to store
intermediate results
- Interfaces should make it easy to follow trails
with unanticipated results
- Makes evaluation more difficult.
28Orienteering (O'Day and Jeffries 93)
- Interconnected but diverse searches on a single,
problem-based theme
- Focus on information delivery rather than search
performance
- Classifications resulting from an extended
observational study
- 15 clients of professional intermediaries
- financial analyst, venture capitalist, product
marketing engineer, statistician, etc.
29Orienteering (O'Day and Jeffries 93)
- Identified three main search types
- Monitoring
- Following a plan
- Exploratory
- A series of interconnected but diverse searches
on one problem-based theme
- Changes in direction caused by triggers
- Each stage followed by reading, assimilation, and
analysis of resulting material.
30Orienteering (O'Day and Jeffries 93)
- Defined three main search types
- monitoring
- a well-known topic over time
- e.g., research four competitors every quarter
- following a plan
- a typical approach to the task at hand
- e.g., improve business process X
- exploratory
- explore topic in an undirected fashion
- get to know an unfamiliar industry
31Orienteering (O'Day and Jeffries 93)
- Trends
- A series of interconnected but diverse searches
on one problem-based theme
- This happened in all three search modes
- Each analyst did at least two search types
- Each stage followed by reading, assimilation, and
analysis of resulting material
32Orienteering (O'Day and Jeffries 93)
- Searches tended to trigger new directions
- Overview, then detail, repeat
- Information need shifted between search requests
- Context of problem and previous searches were
carried to next stage of search
- The value was contained in the accumulation of
search results, not the final result set
- These observations verified Bates' predictions.
33Orienteering (O'Day and Jeffries 93)
- Triggers: motivation to switch from one strategy
to another
- next logical step in a plan
- encountering something interesting
- explaining change
- finding missing pieces
34Stop Conditions (O'Day and Jeffries 93)
- Stopping conditions not as clear as for triggers
- People stopped searching when
- no more compelling triggers
- finished an appropriate amount of searching for
the task
- specific inhibiting factor
- e.g., learning market was too small
- lack of increasing returns
- 80/20 rule
- Missing information/inferences ok
- business world different than scholarship
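The "lack of increasing returns" condition can be read as a simple decision rule: keep searching while each round still turns up enough new useful material, and stop once it does not. The sketch below is one hypothetical formalization, not a rule taken from the study; the `min_gain` threshold and the per-round counts are assumptions made for illustration.

```python
def stopping_round(new_items_per_round, min_gain=1):
    """Return the number of the round after which the searcher stops.

    new_items_per_round: useful new items found in each successive round
    min_gain: assumed threshold; stop after the first round that yields
              fewer new items than this
    """
    for round_number, gain in enumerate(new_items_per_round, start=1):
        if gain < min_gain:
            return round_number
    # Gains never dropped below the threshold, so every round was run.
    return len(new_items_per_round)


# Hypothetical counts of new useful items per search round.
print(stopping_round([8, 5, 2, 0, 3]))
print(stopping_round([8, 5, 2, 0, 3], min_gain=3))
```

A stricter threshold ends the search earlier, which fits the note that missing information is acceptable in a business setting.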
35After the Search: Analyzing and Synthesizing
Search Results
- Orienteering Post-Search Behaviors
- Read and Annotate
- Analyze: 80% fell into six main types
36Post-Search Analysis Types (O'Day and Jeffries 93)
- Trends
- Comparisons
- Aggregation and Scaling
- Identifying a Critical Subset
- Assessing
- Interpreting
- The rest
- cross-reference
- summarize
- find evocative visualizations
- miscellaneous
37Sensemaking (Russell et al. 93)
- The process of encoding retrieved information to
answer task-specific questions
- Combine
- internal cognitive resources
- external retrieved resources
- Create a good representation
- an iterative process
- contend with a cost/benefit tradeoff
38Sensemaking (Russell et al. 93)
- Most of the effort is in the synthesis of a good
representation
- covers the data
- increase usability
- decrease cost-of-use
- The information access process
- Berry picking/orienteering offer an alternative
to the standard IR model
- More difficult to assess results
- Interactive search behavior can be analyzed in
terms of tactics and strategies
- Sensemaking
- Combining searching with the use of the results
of search.
40Next Time
- IR Systems Overview
- Query Languages
- Boolean Model
- Boolean Queries