WIRED Week 3 - PowerPoint PPT Presentation

Provided by: bert189
1
WIRED Week 3
  • Key Concepts in IR
  • Mozilla Firefox
  • Projects & Papers

2
Key Concepts in IR
  • Understanding the System
  • Can't read users' minds
  • Can't know about documents
  • Evaluation is key
  • Information Needs
  • More like this
  • Starting points, guides
  • Topics, Subjects
  • Documents
  • Images
  • Text, Natural Languages
  • A query as a text
  • Not just (simple) question answering

3
Aboutness & Subject Indexing
  • What is aboutness?
  • Meaning of a document
  • Abstract or Topic(s) of a document
  • How you (or someone else) uses the document
  • Kinds of questions the document can answer
  • How can we uncover aboutness?
  • Author(s), Time, Date, Location, Format
  • Relationships, Structures, Markup, Metadata
  • Use, Recall, Popularity

4
Subtleties of Aboutness
  • Can we characterize a whole document
  • Parts of a document
  • Each part, different descriptions (& uses)
  • Do you need the document if you've got a good
    summary?
  • Not just text summaries
  • Use origination data
  • How do you extract key information?
  • Understand the context
  • Frequency & Rarity
  • NLP, Genres, Keyword indicators
  • Sentence diagrams to the extreme?
  • Novelty of information, expectations for
    education
  • Politics of description
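
The "Frequency & Rarity" idea for extracting key information can be illustrated with a small TF-IDF sketch: a term scores high when it is frequent in one document and rare across the collection. This is one common weighting, chosen here as an assumption; the function name and the three toy documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Score each term in each document by frequency times rarity.

    docs: list of token lists. Returns one {term: score} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return scores

docs = [
    "the index maps terms to documents".split(),
    "the thesaurus relates terms to terms".split(),
    "the user asks a question".split(),
]
scores = tf_idf(docs)
```

A word such as "the" appears in every document, so its rarity factor is log(1) = 0 and it never counts as key information, while "thesaurus" outranks the more frequent but less rare "terms" in the second document.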

5
Aboutness & the Web
  • Rapid broad analysis
  • Let users define aboutness
  • Different users, more descriptions
  • Lots of users, lots to select from
  • A system to average & rank aboutness
    descriptions?
  • World Wide - means different cultures
  • More older documents with many more very new
    documents
  • Differences and "it's like that one"
  • Internal consistency vs. flexibility & context
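
One way to picture "a system to average & rank aboutness descriptions" is to count how many users applied each description to a document. The sketch below assumes simple free-text tags; the function name, user names, and tags are hypothetical.

```python
from collections import Counter

def rank_descriptions(user_tags):
    """Rank tags for one document by the share of users who applied them.

    user_tags: {user: iterable of tags}. Each user counts once per tag,
    and tags are compared case-insensitively.
    """
    n_users = len(user_tags)
    counts = Counter(
        tag for tags in user_tags.values() for tag in {t.lower() for t in tags}
    )
    ranked = [(tag, count / n_users) for tag, count in counts.items()]
    # Highest share first; ties broken alphabetically for a stable order.
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))

user_tags = {
    "ana": ["IR", "indexing"],
    "ben": ["ir", "thesaurus"],
    "cy": ["Indexing", "ir", "IR"],
}
ranked = rank_descriptions(user_tags)
```

Counting each user once per tag keeps one enthusiastic tagger from dominating the ranking. It does nothing about cultural or vocabulary differences between users, which the slide raises as a separate problem.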

6
Testing Index Language Devices
  • What are the different ways to represent
    documents?
  • Systems are faster, but designs differ
  • Can you represent them in more than one way?
  • At once?
  • By audience?
  • Not just terms, but relationships between terms
  • What language do you use to represent docs?
  • Structured & Flexible
  • Consistent & Understandable (human & computer)
  • Dewey, LoC, Dublin Core
  • Data structures, XML, Situational-Temporal
  • What if you indexed documents by terms & queries?
  • Can you get too complex?
  • Good for the user vs. good for the system

7
Indexers Issues
  • Staff for evaluation
  • How is the system used?
  • Card catalogs
  • Search engine results pages
  • Natural language queries & NL answers
  • Vocabulary of document, index or user impacts?
  • Syntactic indexing
  • "use of headings which display the relationship
    between the various elements, as distinct from
    those which merely show existence of several
    attributes relevant to the subject indexed." (p 98)

8
Preparation of an Index
  • Assess document subject
  • Related to users
  • Concepts & keywords
  • Translate assessment into index language
  • Add to index
  • Make concept analysis for answering questions
  • Will users understand & find the document?
  • How helpful (ranking)
  • Match concepts to index (to document)
  • Rebuild & enable updated index
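
The steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `VOCAB` mapping, the function names, and the two sample documents are all hypothetical, and real subject assessment is far richer than a keyword lookup.

```python
from collections import defaultdict

# Hypothetical controlled vocabulary: free-text keyword -> preferred index term.
VOCAB = {
    "car": "automobiles",
    "cars": "automobiles",
    "automobile": "automobiles",
    "engine": "engines",
    "motor": "engines",
}

def assess(text):
    """Pick out known keywords and translate them into index language."""
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return {VOCAB[w] for w in words if w in VOCAB}

def add_to_index(index, doc_id, text):
    """Post the document under each index term assigned to it."""
    for term in assess(text):
        index[term].add(doc_id)

def search(index, query):
    """Match query concepts to the index; return documents posted under all of them."""
    terms = assess(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))

index = defaultdict(set)
add_to_index(index, "d1", "Cars need a working motor.")
add_to_index(index, "d2", "The automobile changed cities.")
```

Here `search(index, "car engine")` finds only `d1`, while `search(index, "automobile")` finds both documents, because the query is translated through the same vocabulary as the documents.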

9
Index Language parts
  • Controlled vocabulary (p 99)
  • Specific terms for relevance (p100)
  • Measuring for performance
  • Precision
  • Recall
  • (Relevance)
  • With the Web, we don't know how many total
    documents for a subject or how many are correct
  • With the Web, we don't know how documents are
    described or indexed
  • Metatags
  • Keywords
  • Indexing databases
  • Crawling & updating
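
As a worked illustration of the two measures named above, here is a minimal sketch using the standard set-based definitions. The function name and the sample document ids are made up for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant retrieved / all retrieved
    recall    = relevant retrieved / all relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical result list and relevance judgments for one query.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d7"}
precision, recall = precision_recall(retrieved, relevant)
```

Note that recall needs the full set of relevant documents, which is exactly what the slide says is unknown on the Web; precision only needs judgments on what was retrieved.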

10
Thesaurus
  • Theory of Clumps
  • Treasury of words
  • How deep are the relationships?
  • Can relationships & relevance be measured?
  • How specific can one be?
  • Not just alphabetical, topical
  • Purposes of a Thesaurus (p 112)
  • Which are most important?
  • What's missing?

11
Variety of Thesaurus formats
  • Roget's
  • Alphabetical with cross indexing
  • Subject categories (as numbers)
  • Ordering
  • Sub-ordering
  • Relationships
  • Language issues, syntax & completeness (phrases)
  • Shifted, inverted & rotated
  • complications -- IR systems
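
One possible reading of the "rotated" format is an index that lists each multi-word term under every one of its words. The sketch below assumes that reading; the slide does not define the format, and the function name and sample terms are hypothetical.

```python
def rotated_index(terms):
    """List every multi-word term under each of its words.

    Returns sorted (entry_word, rotated_form, original_term) tuples.
    """
    entries = []
    for term in terms:
        words = term.split()
        for i, word in enumerate(words):
            # Rotate so the entry word leads; earlier words follow after a comma.
            rotated = " ".join(words[i:])
            if i:
                rotated += ", " + " ".join(words[:i])
            entries.append((word.lower(), rotated, term))
    return sorted(entries)

entries = rotated_index(["information retrieval", "retrieval systems"])
```

With these two terms, a user browsing under "retrieval" sees both "retrieval systems" and "retrieval, information", which is the point of rotating: the term is findable from any of its words.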

12
Terms
  • Number of terms
  • Singular, plural
  • Phrases, quotes, cliches
  • Descriptive, contextual
  • Symbols
  • Homographs & Thesaurofacets
  • Just a few ways to impose formats & structure
  • What are some other methods?

13
Layouts & Display of Thesauri
  • Most dynamic area
  • Making it easier to build thesauri
  • Get whole or specific picture
  • Expose structure to users
  • For understanding
  • For approval
  • Graphical displays
  • Browsing
  • Trees, Flowcharts, Maps
  • Colors, shapes, sizes

14
Revising, Adding Relations
  • Most issues in the reading are minor in systems now
  • New problems in issues of scale
  • Generate new vs. add to existing?
  • Where do the experts fit in?
  • Building a set of rules
  • Beyond formats
  • Testing for internal consistency
  • How do you link or merge two thesauri?
  • Little merges into larger?
  • More detailed encompasses less?
  • Can you ever get agreement?
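
A rough sketch of what "testing for internal consistency" and merging could look like, assuming a thesaurus stored as a dictionary of broader (BT), narrower (NT), and related (RT) term sets. The data layout, function names, and sample terms are assumptions for illustration, not a method from the reading.

```python
# Every link should have a reciprocal: BT <-> NT, and RT is symmetric.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT"}

def inconsistencies(thesaurus):
    """Return (term, relation, target) links that lack their reciprocal link."""
    missing = []
    for term, relations in thesaurus.items():
        for rel, targets in relations.items():
            for target in targets:
                back = thesaurus.get(target, {}).get(RECIPROCAL[rel], set())
                if term not in back:
                    missing.append((term, rel, target))
    return sorted(missing)

def merge(a, b):
    """Union two thesauri term by term; problems are left for the checker."""
    merged = {}
    for source in (a, b):
        for term, relations in source.items():
            entry = merged.setdefault(term, {})
            for rel, targets in relations.items():
                entry.setdefault(rel, set()).update(targets)
    return merged

small = {"dogs": {"BT": {"pets"}}, "pets": {"NT": {"dogs"}}}
large = {
    "pets": {"BT": {"animals"}, "NT": {"cats"}},
    "cats": {"BT": {"pets"}},
    "animals": {},
}
merged = merge(small, large)
problems = inconsistencies(merged)
```

The small thesaurus is consistent on its own, but the merged result inherits a one-way link from the larger one (`pets` points up to `animals`, which does not point back). A mechanical check like this can flag such gaps; whether two thesauri agree on meaning still needs people.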

15
Problem Structures & HCI
  • A call to make IR systems more usable
  • Let users search systems themselves
  • Make systems work more like users think they
    should (for what year?)
  • Is a search like a dialogue?
  • Person to person
  • Person to machine
  • Multiple questions & answers to get to the point
  • Understanding language behavior
  • Do what I mean, not what I say
  • Identifying the problem
  • Focusing the question (related to the available
    documents)
  • User familiarity with system

16
Interaction, step 1 for Evaluation
  • Benchmarks for evaluation
  • How would a person ask this question?
  • What kind of answers are received?
  • How are subtle expectations met?
  • How long or comprehensive is the question or the
    answer?
  • How is this different for Web IR?
  • What advantages do both physical & virtual search
    systems have?

17
Relevance Review & Framework
  • Finding the needle in a haystack
  • A few documents in a collection
  • Possible that no documents are perfectly relevant
  • Not just a content match
  • Dependent on the user situation

18
Relevance & the system
  • Relevance as a point of measurement
  • Different fields gauge relevance differently
  • Scientific communication
  • Communication (Theory)
  • Psychology
  • Information Systems
  • False Drops vs. Completeness
  • Rarity & value of information
  • Precision & Recall, probabilities of finding
    relevance
  • Tests were numerical, binary & structured

19
Relevance is no good?
  • Very hard to define, should be ignored?
  • Too human centered
  • A gradual process moving towards the correct
    information
  • Cooper Utility
  • Quality, novelty, importance, credibility
  • Wilson's Situational Relevance
  • Psychological & Logical relevance
  • Matching vs. Satisfying
  • Situational
  • Relevance numbers

20
Relevance & Future Work
  • Knowledge and (the) knower
  • Selection
  • Inference
  • Mapping
  • Dynamics
  • Association
  • Redundancy
  • p161

21
Projects and/or Papers Overview
  • How can (Web) IR be better?
  • Better IR models
  • Better User Interfaces
  • More to find vs. easier to find
  • Scriptable applications
  • New interfaces for applications
  • New datasets for applications