Information retrieval: overview

1
Information retrieval: overview
2
Information Retrieval and Text Processing
  • Huge literature dating back to the 1950s!
  • SIGIR/TREC -- home for much of this
  • Readings
  • Salton, Wong, Yang, "A Vector Space Model for
    Automatic Indexing," CACM Nov. 75, V18 N11
  • Turtle, Croft, "Inference Networks for Document
    Retrieval," SIGIR '90, OPTIONAL

3
IR/TP applications
  • Search
  • Filtering
  • Summarization
  • Classification
  • Clustering
  • Information extraction
  • Knowledge management
  • Author identification
  • and more...

4
Types of search
  • Recall -- finding documents one knows exist,
    e.g., an old e-mail message or RFC
  • Discovery -- finding interesting documents
    given a high-level goal
  • Classic IR search is focused on discovery

5
Classic discovery problem
  • Corpus: fixed collection of documents, typically
    "nice" docs (e.g., NYT articles)
  • Problem: retrieve documents relevant to the user's
    information need

6
Classical search
  • Flow diagram: Task -> (Conception) -> Info Need ->
    (Formulation) -> Query -> (Search, over Corpus) ->
    Results; (Refinement) loops from Results back to Query
7
Definitions
  • Task: e.g., write a Web crawler
  • Information need: perception of documents needed
    to accomplish the task, e.g., Web specs
  • Query: sequence of characters given to a search
    engine that one hopes will return desired documents

8
Conception
  • Translating task into information need
  • Mis-conception: identifying too little (tips on
    high-bandwidth DNS lookups) and/or too much (TCP
    spec) as relevant to the task
  • Sometimes a little extra breadth in results can
    tip the user off to the need to refine the info need,
    but there's not much research into handling this
    automatically

9
Translation
  • Translating info need into the query syntax of a
    particular search engine
  • Mis-translation: getting this wrong
  • Operator error (does the query "a b" mean
    "a AND b" or "a OR b"?)
  • Polysemy -- same word, different meanings
  • Synonymy -- different words, same meaning
  • Automation: NLP, easy syntax, query
    expansion, QA

10
Refinement
  • Modification of the query, typically in light of
    particular results, to better meet the info need
  • Lots of work on refining queries automatically
    (often with some input from the user, e.g.,
    relevance feedback)

11
Precision and recall
  • Classic metrics of search-result goodness
  • Recall: fraction of all good docs retrieved
    = relevant results / all relevant docs in corpus
  • Precision: fraction of results that are good
    = relevant results / result-set size
    (sketch of both below)
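
A minimal Python sketch of both metrics (the doc ids are made
up for illustration):

    def precision_recall(results: set, relevant: set):
        """Precision and recall for one query's result set."""
        hits = results & relevant              # the relevant results
        precision = len(hits) / len(results)   # fraction of results that are good
        recall = len(hits) / len(relevant)     # fraction of all good docs retrieved
        return precision, recall

    # 3 of 4 returned docs are relevant; the corpus holds 6 relevant docs.
    p, r = precision_recall({1, 2, 3, 4}, {2, 3, 4, 7, 8, 9})
    print(p, r)  # 0.75 0.5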

12
Precision and recall
  • Recall/precision trade-off
  • Return everything => great recall, bad precision
  • Return nothing => great precision, bad recall
  • Precision curves
  • Search engine produces a total ranking
  • Plot precision at 10%, 20%, ..., 100% recall
    (curve sketch below)
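
A sketch of computing those curve points from a total ranking,
assuming relevance judgments are given (ranking and judgments
here are made up):

    def precision_curve(ranking, relevant):
        """Precision each time recall first reaches 10%, 20%, ..., 100%."""
        points, hits = [], 0
        targets = [i / 10 for i in range(1, 11)]   # recall levels 0.1 .. 1.0
        for rank, doc in enumerate(ranking, start=1):
            if doc in relevant:
                hits += 1
            recall, precision = hits / len(relevant), hits / rank
            while targets and recall >= targets[0]:
                points.append((targets.pop(0), precision))
        return points

    print(precision_curve([7, 1, 8, 2, 9], {7, 8, 9}))
    # [(0.1, 1.0), (0.2, 1.0), (0.3, 1.0), (0.4, 0.666...), ..., (1.0, 0.6)]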

13
Other metrics
  • Novelty / anti-redundancy
  • Information content of result set is disjoint
  • Comprehensible
  • Returned documents can be understood by user
  • Accurate / authoritative
  • Citation ranking!!
  • Freshness

14
Classic search techniques
  • Boolean
  • Ranked Boolean (toy sketch of Boolean retrieval
    after this list)
  • Vector space
  • Probabilistic / Bayesian
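
To make the first technique concrete, a toy sketch of Boolean
retrieval over an inverted index (postings and doc ids are
invented for illustration):

    # postings: term -> set of ids of docs containing the term
    postings = {
        "crawler": {1, 3},
        "web":     {1, 2, 3},
        "spec":    {2},
    }

    def boolean_and(*terms):
        """Docs containing every query term (Boolean AND)."""
        sets = [postings.get(t, set()) for t in terms]
        return set.intersection(*sets) if sets else set()

    print(boolean_and("web", "crawler"))  # {1, 3}

Ranked Boolean then orders the matching docs, e.g., by how many
of the query terms each one contains.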

15
Term vector basics
  • Basic abstraction for information retrieval
  • Useful for measuring semantic similarity of text
  • A row in the example table (a doc-by-term matrix
    shown on the slide, not in this transcript) is a term vector
  • Columns are word stems and phrases
  • Trying to capture meaning

16
Everything's a vector!
  • Documents are vectors
  • Document collections are vectors
  • Queries are vectors
  • Topics are vectors

17
Cosine measurement of similarity
  • cos(E1, E2) = (E1 . E2) / (|E1| |E2|)
  • Rank docs against Qs, measure similarity of
    docs, etc.
  • In the example (term-vector table on the slide;
    code sketch below):
  • cos(doc1, doc2) = 1/3
  • cos(doc1, doc3) = 2/3
  • cos(doc2, doc3) = 1/2
  • So docs 1 and 3 are closest
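
A small Python sketch of the measure; the vectors are
illustrative, since the slide's table is not in the transcript:

    import math

    def cosine(e1, e2):
        """cos(E1, E2) = (E1 . E2) / (|E1| |E2|)."""
        dot = sum(a * b for a, b in zip(e1, e2))
        return dot / (math.sqrt(sum(a * a for a in e1)) *
                      math.sqrt(sum(b * b for b in e2)))

    # Two binary term vectors of three terms each, sharing one term:
    print(cosine([1, 1, 1, 0, 0], [0, 0, 1, 1, 1]))  # 1/3 = 0.333...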

18
Weighting of terms in vectors
  • Salton's TF x IDF
  • TF = term frequency in document
  • DF = doc frequency of term (# docs with term)
  • IDF = inverse doc freq. = 1/DF
  • Weight of term = TF x IDF
  • Importance of term determined by
  • Count of term in doc (high => important)
  • Number of docs with term (low => important)
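
A minimal sketch of this weighting, taking the slide's IDF
literally as 1/DF (log(N/DF) is a common variant); the corpus
is made up:

    docs = [
        ["web", "crawler", "web", "spec"],
        ["web", "search"],
        ["crawler", "robots"],
    ]

    def tfidf(term, doc, docs):
        tf = doc.count(term)               # count of term in doc
        df = sum(term in d for d in docs)  # # docs with term
        return tf * (1 / df)               # weight = TF x IDF

    print(tfidf("web", docs[0], docs))      # 2 * 1/2 = 1.0
    print(tfidf("crawler", docs[0], docs))  # 1 * 1/2 = 0.5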

19
Relevance-feedback in VSM
  • Rocchio formula
  • Q' = F(Q, Relevant, Irrelevant)
  • where F is a weighted sum, such as
  • Q't = a*Qt + b*sum_i R_i,t - c*sum_i I_i,t
    (sketch below)
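
A sketch of that update in Python (the a, b, c weights and the
vectors are illustrative; irrelevant-doc terms are subtracted,
per standard Rocchio, and negative weights clipped to zero):

    def rocchio(q, rel, irrel, a=1.0, b=0.75, c=0.25):
        """Q't = a*Qt + b*sum_i R_i,t - c*sum_i I_i,t, clipped at zero."""
        terms = set(q)
        for d in rel + irrel:
            terms |= set(d)
        return {t: max(0.0, a * q.get(t, 0.0)
                            + b * sum(d.get(t, 0.0) for d in rel)
                            - c * sum(d.get(t, 0.0) for d in irrel))
                for t in terms}

    q = {"web": 1.0, "crawler": 1.0}
    print(rocchio(q, [{"web": 1.0, "spec": 1.0}],
                     [{"crawler": 1.0, "dns": 1.0}]))
    # web 1.75, crawler 0.75, spec 0.75, dns 0.0 (key order may vary)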

20
Remarks on VSM
  • Principled way of solving many IR/text processing
    problems, not just search
  • Tons of variations on VSM
  • Different term weighting schemes
  • Different similarity formulas
  • Normalization itself is a huge sub-industry

21
All of this goes out the window on the Web
  • Very small, unrefined queries
  • Recall not an issue
  • Quality is the issue (want most relevant)
  • Precision-at-ten matters (how many total losers)
  • Scale precludes heavy VSM techniques
  • Corpus assumptions (e.g., unchanging, uniform
    quality) do not hold
  • Adversarial IR -- new challenge on the Web
  • Still, VSM is an important tool for Web Archeology