Title: Under The Hood Part I: Web-Based Information Architectures
1. Under The Hood Part I: Web-Based Information Architectures
- MSEC 20-760 Mini II
- 28-October-2003
- Jaime Carbonell
2. Topics Covered
- The Vector Space Model for IR (VSM)
- Evaluation Metrics for IR
- Query Expansion (the Rocchio Method)
- Inverted Indexing for Efficiency
- A Glimpse into Harder Problems
3. The Vector Space Model
- Definitions of document and query vectors, where wj is the jth word and c(wj, di) counts the occurrences of wj in document di
4. Computing the Similarity
- Dot-product similarity: sim(q, di) = q · di
- Cosine similarity: sim(q, di) = (q · di) / (||q|| ||di||)
5. Computing Norms and Products
- Dot product: q · d = Σj c(wj, q) · c(wj, d)
- Euclidean vector norm (aka 2-norm): ||d|| = sqrt(Σj c(wj, d)²)
6. Similarity in Retrieval
- Similarity ranking
- If sim(q, di) > sim(q, dj), di ranks higher
- Retrieving top k documents
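The pipeline on the last three slides can be sketched in a few lines of Python. This is illustrative code, not from the lecture: `vectorize`, `cosine`, and `top_k` are our own names, and the vectors here use raw term counts (term weighting comes later in the deck).

```python
from collections import Counter
from math import sqrt

def vectorize(text):
    """Bag-of-words vector: term -> occurrence count c(w, d)."""
    return Counter(text.lower().split())

def cosine(q, d):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(q[t] * d[t] for t in q if t in d)
    nq = sqrt(sum(v * v for v in q.values()))
    nd = sqrt(sum(v * v for v in d.values()))
    return dot / (nq * nd) if nq and nd else 0.0

def top_k(query, docs, k):
    """Rank documents by cosine similarity to the query; return the top k."""
    q = vectorize(query)
    return sorted(docs, key=lambda d: cosine(q, vectorize(d)), reverse=True)[:k]
```

Because cosine divides out the norms, a long document does not outrank a short one merely by repeating terms.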
7. Refinements to VSM (1)
- Word normalization
- Words in morphological root form
- countries -> country
- interesting -> interest
- Stemming as a fast approximation
- countries, country -> countr
- moped -> mop
- Reduces vocabulary (always good)
- Generalizes matching (usually good)
- More useful for non-English IR
- (Arabic has > 100 variants per verb)
8. Refinements to VSM (2)
- Stop-Word Elimination
- Discard articles, auxiliaries, prepositions, ... (typically the 100-300 most frequent small words)
- Reduces document length by 30-40%
- Retrieval accuracy improves slightly (5-10%)
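A minimal sketch of both refinements, assuming a toy stop list and crude suffix stripping. Real systems use a 100-300 word stop list and a proper stemmer such as Porter's; the tiny lists below are illustrative only, chosen to reproduce the slides' examples (including the "moped -> mop" overstemming).

```python
# Illustrative stop list and suffix list -- far smaller than a real system's.
STOPWORDS = {"the", "a", "an", "of", "in", "to", "is", "and", "are"}
SUFFIXES = ("ies", "ing", "ed", "s")  # checked longest-first

def stem(word):
    """Strip the first matching suffix: a fast approximation to morphology."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

def normalize(text):
    """Lowercase, drop stop words, stem the rest."""
    return [stem(w) for w in text.lower().split() if w not in STOPWORDS]
```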
9. Refinements to VSM (3)
- Proximity Phrases
- E.g. "air force" -> airforce
- Found by high mutual information
- p(w1 w2) >> p(w1) p(w2)
- p(w1, w2 in k-window) >> p(w1 in k-window) p(w2 in same k-window)
- Retrieval accuracy improves slightly (5-10%)
- Too many phrases -> inefficiency
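The mutual-information test above can be sketched for adjacent word pairs (window k = 2). The threshold and minimum count are illustrative knobs, not values from the lecture; the count filter keeps one-off pairs, which always score high PMI, out of the phrase list.

```python
from collections import Counter
from math import log2

def phrase_candidates(tokens, threshold=2.0, min_count=2):
    """Adjacent pairs with pointwise mutual information
    log2( p(w1 w2) / (p(w1) p(w2)) ) above a threshold, i.e. pairs
    co-occurring far more often than chance predicts."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n, nb = len(tokens), len(tokens) - 1
    phrases = []
    for (w1, w2), c in bigrams.items():
        if c < min_count:
            continue  # rare pairs give unreliable, inflated PMI
        pmi = log2((c / nb) / ((unigrams[w1] / n) * (unigrams[w2] / n)))
        if pmi >= threshold:
            phrases.append((w1, w2))
    return phrases
```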
10. Refinements to VSM (4)
- Words -> Terms
- term = word | stemmed word | phrase
- Use exactly the same VSM method on terms (vs. words)
11. Evaluating Information Retrieval (1)
- With a = relevant and retrieved, b = irrelevant and retrieved, c = relevant but not retrieved, d = irrelevant and not retrieved:
- Recall = a/(a+c): fraction of relevant documents that are retrieved
- Precision = a/(a+b): fraction of retrieved documents that are relevant
12. Evaluating Information Retrieval (2)
- P = a/(a+b), R = a/(a+c)
- Accuracy = (a+d)/(a+b+c+d)
- F1 = 2PR/(P+R)
- Miss = c/(a+c) = 1 - R
- (false negatives)
- F/A = b/(a+b+c+d)
- (false positives)
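The metrics on these two slides follow directly from the 2x2 contingency counts; a small helper (illustrative code, our own naming) makes the definitions concrete:

```python
def ir_metrics(a, b, c, d):
    """Metrics from the 2x2 contingency table:
    a = relevant & retrieved,  b = irrelevant & retrieved,
    c = relevant & missed,     d = irrelevant & not retrieved."""
    precision = a / (a + b)
    recall = a / (a + c)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": (a + d) / (a + b + c + d),
        "F1": 2 * precision * recall / (precision + recall),
        "miss": c / (a + c),                  # = 1 - recall (false negatives)
        "false_alarm": b / (a + b + c + d),   # false positives
    }
```

Note that accuracy is usually uninformative in IR: d (irrelevant, not retrieved) dwarfs the other cells, so accuracy stays near 1 even for poor rankings.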
13. Evaluating Information Retrieval (3)
- 11-point precision curves
- IR system generates a total ranking
- Plot precision at 10%, 20%, 30%, ..., 100% recall
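One common way to compute such a curve, sketched below, uses interpolated precision: precision at recall level r is taken as the maximum precision achieved at any recall >= r. This is the standard TREC-style convention; the slide does not specify interpolation, so treat this as one reasonable reading.

```python
def eleven_point_precision(ranking, relevant):
    """Interpolated precision at recall 0%, 10%, ..., 100%,
    given a total ranking and the set of relevant documents."""
    hits, points = 0, []
    for rank, doc in enumerate(ranking, 1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))  # (recall, precision)
    return [max((p for r, p in points if r >= level), default=0.0)
            for level in (i / 10 for i in range(11))]
```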
14. Query Expansion (1)
- Observations
- Longer queries often yield better results
- User's vocabulary may differ from document vocabulary
- Q: how to avoid heart disease
- D: "Factors in minimizing stroke and cardiac arrest: Recommended dietary and exercise regimens"
- Maybe longer queries have more chances to help recall.
15. Query Expansion (2)
- Bridging the Gap
- Human query expansion (user or expert)
- Thesaurus-based expansion
- Seldom works in practice (unfocused)
- Relevance feedback
- Widen a thin bridge over vocabulary gap
- Adds words from document space to query
- Pseudo-Relevance feedback
- Local Context analysis
16. Relevance Feedback: Rocchio's Method
- Idea: update the query via user feedback
- Exact method (vector sums): Qnew = α·Qold + β·Σ Drel - γ·Σ Dirr
17. Relevance Feedback (2)
- For example, if
- Q = (heart attack medicine)
- W(heart,Q) = W(attack,Q) = W(medicine,Q) = 1
- Drel = (cardiac arrest prevention medicine nitroglycerine heart disease ...)
- W(nitroglycerine,D) = 2, W(medicine,D) = 1
- Dirr = (terrorist attack explosive semtex attack nitroglycerine proximity fuse ...)
- W(attack,D) = 1, W(nitroglycerine,D) = 2, W(explosive,D) = 1
- and α = 1, β = 2, γ = 0.5
18. Relevance Feedback (3)
- Then:
- W(attack,Q) = 1·1 - 0.5·1 = 0.5
- W(nitroglycerine,Q) = 2·2 - 0.5·2 = 3
- W(medicine,Q) = 1·1 + 2·1 = 3
- W(explosive,Q) = -0.5·1 = -0.5
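The worked example on the last two slides can be reproduced with a short implementation of the unnormalized vector-sum form of Rocchio (illustrative code; sparse vectors are plain `{term: weight}` dicts):

```python
from collections import Counter

def rocchio(query, rel_docs, irr_docs, alpha=1.0, beta=2.0, gamma=0.5):
    """Rocchio update: Qnew = alpha*Q + beta*sum(rel docs) - gamma*sum(irr docs),
    over sparse {term: weight} vectors."""
    new_q = Counter()
    for term, w in query.items():
        new_q[term] += alpha * w
    for doc in rel_docs:
        for term, w in doc.items():
            new_q[term] += beta * w     # pull toward relevant documents
    for doc in irr_docs:
        for term, w in doc.items():
            new_q[term] -= gamma * w    # push away from irrelevant ones
    return dict(new_q)
```

Note that terms absent from the original query (nitroglycerine, explosive) enter the new query, which is exactly how relevance feedback widens the vocabulary bridge.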
19. Term Weighting Methods (1)
- Salton's Tf-IDf
- Tf = term frequency in a document
- Df = document frequency of the term (number of documents in the collection containing the term)
- IDf = 1/Df
20. Term Weighting Methods (2)
- Salton's Tf-IDf
- TfIDf = f1(Tf) · f2(IDf)
- E.g. f1(Tf) = Tf · ave(|Dj|) / |D|
- E.g. f2(IDf) = log2(IDf)
- f1 and f2 can differ for Q and D
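A minimal sketch of Tf-IDf weighting, using one common instantiation of f1 and f2: raw term frequency and log2 of the inverse document fraction. This is an assumption for illustration, not necessarily the exact variant on the slide.

```python
from collections import Counter
from math import log2

def tfidf_weights(docs):
    """Per-document Tf-IDf weights: weight(t, d) = Tf(t, d) * log2(N / Df(t)),
    with Tf = raw count in the document and Df = number of docs containing t."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # count each term once per document
    return [{t: c * log2(n / df[t]) for t, c in Counter(toks).items()}
            for toks in tokenized]
```

A term that occurs in every document gets weight 0, which matches the intuition behind IDf: ubiquitous terms carry no discriminating power.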
21. Efficient Implementations of VSM (1)
- Exploit sparseness
- Only compute non-zero multiplies in dot-products
- Do not even look at zero elements (how?)
- => Use non-stop terms to index documents
22. Efficient Implementations of VSM (2)
- Inverted Indexing
- Find all unique stemmed terms in the document collection
- Remove stopwords from the word list
- If the collection is large (over 100,000 documents), optionally remove singletons (usually spelling errors or obscure names)
- Alphabetize or use a hash table to store the list
- For each term, create a data structure like:
23. Efficient Implementations of VSM (3)
- termi: [IDF(termi),
  <doci, freq(termi, doci)>,
  <docj, freq(termi, docj)>,
  ...]
- or, with term positions:
- termi: [IDF(termi),
  <doci, freq(termi, doci), pos1,i, pos2,i, ...>,
  <docj, freq(termi, docj), pos1,j, pos2,j, ...>,
  ...]
- pos1,j indicates the first position of termi in docj, and so on.
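The frequency-only variant of this data structure can be built in a few lines (illustrative code; the stop list is a toy, and positions are omitted for brevity):

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "of", "in"}  # toy list; real systems use 100-300 words

def build_inverted_index(docs):
    """Map each non-stop term to its postings list:
    term -> [(doc_id, frequency), ...]."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        counts = Counter(w for w in text.lower().split() if w not in STOPWORDS)
        for term, freq in counts.items():
            index[term].append((doc_id, freq))
    return index
```

With this index, a query's dot products touch only documents that share at least one term with the query, which is how zero elements are never even looked at.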
24. Open Research Problems in IR (1)
- Beyond VSM
- Vectors in different spaces
- Generalized VSM, Latent Semantic Indexing, ...
- Probabilistic IR (Language Modeling)
- P(D|Q) = P(Q|D) P(D) / P(Q)
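One concrete instance of the language-modeling view scores documents by P(Q|D) under a unigram model of each document. The sketch below uses simple linear (Jelinek-Mercer) smoothing against the whole collection; the smoothing scheme and the `mu` value are our assumptions for illustration, not details from the slide.

```python
from collections import Counter

def query_likelihood(query, doc, collection, mu=0.5):
    """Score P(Q|D) under a smoothed unigram language model:
    P(w|D) = (1 - mu) * count(w, D)/|D| + mu * count(w, C)/|C|.
    query: list of terms; doc and collection: lists of tokens."""
    d, c = Counter(doc), Counter(collection)
    score = 1.0
    for w in query:
        score *= (1 - mu) * d[w] / len(doc) + mu * c[w] / len(collection)
    return score
```

Smoothing keeps a single unseen query term from zeroing out an otherwise good document, the standard fix in probabilistic IR.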
25. Open Research Problems in IR (2)
- Beyond Relevance
- Appropriateness of doc to user (comprehension level, etc.)
- Novelty of information in doc to user
- Anti-redundancy as an approximation to novelty
26. Open Research Problems in IR (3)
- Beyond one Language
- Translingual IR
- Transmedia IR
27. Open Research Problems in IR (4)
- Beyond Content Queries
- "What's new today?"
- "What sort of things do you know about?"
- "Build me a Yahoo-style index for X"
- "Track the event in this news-story"