Title: Databases and Information Retrieval


1
Databases and Information Retrieval, Lecture 1:
Basics of Databases and Information Retrieval
Instructor: Mr. Gautam Das, University of Texas at Arlington
Email: gdas_at_cse.uta.edu
2
Database
  • Data
  • Consists of a schema
  • Relational model
  • Data stored in the form of tables
  • Follows the typical query model, including joins
  • Output is a set of tuples, produced by joining
    one or more tables
IR
  • Collection of documents: unstructured pieces of
    information
  • Follows the ranked / relevance query model
  • Output is a list of matching documents
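To make the database side of this contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. The two tables and their rows are invented for illustration. The point is that the query has exact semantics: each tuple either satisfies the predicate or it does not, and the answer is built by a join with no notion of relevance ranking.

```python
import sqlite3

# Hypothetical schema and data, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cars(car_id INTEGER PRIMARY KEY, make TEXT);
    CREATE TABLE accidents(acc_id INTEGER PRIMARY KEY, car_id INTEGER, city TEXT);
    INSERT INTO cars VALUES (1, 'Toyota'), (2, 'Ford');
    INSERT INTO accidents VALUES (10, 1, 'Arlington'), (11, 2, 'Dallas');
""")

# Exact-match query: the result is a set of tuples produced by joining
# the two tables; no tuple is "more relevant" than another.
rows = conn.execute("""
    SELECT c.make, a.city
    FROM cars AS c JOIN accidents AS a ON c.car_id = a.car_id
    WHERE a.city = 'Arlington'
""").fetchall()
print(rows)  # [('Toyota', 'Arlington')]
conn.close()
```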

3
Types of Queries
  • Conjunctive Queries
  • Car, Accident
  • Will search for documents containing both Car and
    Accident.
  • General Boolean Queries
  • Car AND Accident AND NOT Arlington
  • Will search for documents containing the words Car
    and Accident but not the word Arlington.
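Both query types can be evaluated with set operations over an inverted index, i.e. a map from each term to the set of documents that contain it. The sketch below assumes a tiny invented collection and simple whitespace tokenization; the comma in the conjunctive query is treated as AND, and the helper name `postings` is hypothetical.

```python
# Toy document collection (invented for illustration).
docs = {
    1: "car accident reported in arlington",
    2: "car accident on the highway near dallas",
    3: "new car models announced",
    4: "accident insurance explained",
}

# Inverted index: term -> set of IDs of the documents containing it.
index = {}
for doc_id, text in docs.items():
    for term in text.split():
        index.setdefault(term, set()).add(doc_id)

def postings(term):
    """Documents containing term (empty set if the term never occurs)."""
    return index.get(term, set())

# Conjunctive query "car, accident": documents containing BOTH terms.
conjunctive = postings("car") & postings("accident")

# General Boolean query "car AND accident AND NOT arlington".
boolean = (postings("car") & postings("accident")) - postings("arlington")

print(sorted(conjunctive))  # [1, 2]
print(sorted(boolean))      # [2]
```

Note that both answers are plain sets: a Boolean query decides membership but does not order the results, which is what the ranked model adds.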

4
Retrieval Models of IR
  • Boolean Retrieval Model
  • Ranked / Relevance Retrieval Model
  • The ranked model is the one missing from
    traditional databases.

5
Parameters Used for Ranking in a Typical
Information Retrieval System
  • Parameter 1
  • Occurrence and Frequency
  • The number of times the query word occurs in
    the document influences its rank.
  • The position where it occurs also matters, e.g.
    in the title or a subtitle.
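Both ideas (how often a word occurs and where it occurs) can be combined into a single score by counting occurrences per field and weighting each field. This is a minimal sketch: the field names and the 3/2/1 weights in `FIELD_WEIGHTS` are hypothetical values chosen for illustration, not taken from any particular system.

```python
from collections import Counter

# Hypothetical weights: an occurrence in the title counts more than one in the body.
FIELD_WEIGHTS = {"title": 3.0, "subtitle": 2.0, "body": 1.0}

def occurrence_score(doc, term):
    """Sum of the term's occurrences in each field, weighted by the field."""
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        counts = Counter(doc.get(field, "").lower().split())
        score += weight * counts[term]  # Counter returns 0 for absent terms
    return score

doc_a = {"title": "Car Accident", "body": "a car hit a wall"}
doc_b = {"title": "Traffic report", "body": "car car car delays"}

print(occurrence_score(doc_a, "car"))  # 4.0
print(occurrence_score(doc_b, "car"))  # 3.0
```

Here doc_a ranks above doc_b even though "car" appears fewer times in it, because one of its occurrences is in the title.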

6
Parameters Used for Ranking in a Typical
Information Retrieval System
  • Parameter 2
  • Proximity
  • If two or more words are specified in the search
    string, documents in which those words appear
    close to each other should be ranked higher.
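One simple proximity measure is the smallest word distance between any occurrence of one query word and any occurrence of the other. The sketch below computes it with a two-pointer walk over the sorted position lists; the function names are hypothetical, and real systems typically read positions from a positional index rather than re-tokenizing the text.

```python
def positions(text, term):
    """Word offsets at which term occurs in text."""
    return [i for i, word in enumerate(text.lower().split()) if word == term]

def min_distance(text, term1, term2):
    """Smallest gap, in words, between an occurrence of term1 and one of term2.

    Returns None if either term is missing. A smaller distance means the
    words appear closer together, so the document should rank higher.
    """
    p1, p2 = positions(text, term1), positions(text, term2)
    if not p1 or not p2:
        return None
    # Two-pointer walk over the sorted lists: O(len(p1) + len(p2)).
    i = j = 0
    best = float("inf")
    while i < len(p1) and j < len(p2):
        best = min(best, abs(p1[i] - p2[j]))
        if p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return best

near = "car accident on interstate 30"
far = "a car was parked downtown while an accident happened elsewhere"
print(min_distance(near, "car", "accident"))  # 1
print(min_distance(far, "car", "accident"))   # 6
```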

7
Parameters Used for Ranking in a Typical
Information Retrieval System
  • Parameter 3
  • Stemming
  • Uses the various forms of a word for searching.
  • E.g. Run → Ran, Run over, Running
  • An exact match of the word should be ranked higher.
  • E.g. if the word info is searched, a document
    containing the word infotech should be ranked
    after a document containing the exact match
    info.
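The sketch below illustrates both points: a deliberately tiny suffix-stripping stemmer (a stand-in for a real algorithm such as Porter's), and a scoring function in which an exact match beats a same-stem match, which in turn beats a prefix-only match like infotech for info. The `IRREGULAR` table, the stripping rules, and the 3/2/1 weights are all hypothetical choices for illustration. Irregular forms such as ran cannot be produced by stripping suffixes, which is why they need an explicit lookup.

```python
# Hypothetical exception list: irregular forms mapped to their stem.
IRREGULAR = {"ran": "run"}

def stem(word):
    """Toy suffix-stripping stemmer (illustrative, not the Porter algorithm)."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
        # Undo consonant doubling ("runn" -> "run") but keep ll/ss/zz ("fall").
        if word[-1] == word[-2] and word[-1] not in "aeioulsz":
            word = word[:-1]
    elif word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        word = word[:-1]
    return word

def match_score(text, query):
    """Exact matches score 3, same-stem matches 2, prefix-only matches 1
    (arbitrary illustrative weights)."""
    q = query.lower()
    score = 0
    for w in text.lower().split():
        if w == q:
            score += 3  # exact match
        elif stem(w) == stem(q):
            score += 2  # same stem, e.g. "running" or "ran" for "run"
        elif w.startswith(q):
            score += 1  # prefix only, e.g. "infotech" for "info"
    return score

print(stem("running"), stem("ran"), stem("runs"))  # run run run
print(match_score("info desk", "info"), match_score("infotech desk", "info"))  # 3 1
```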

8
Parameters Used for Ranking in a Typical
Information Retrieval System
  • Parameter 4
  • Frequency across Documents
  • Very common words such as a, an, and the should
    be suppressed, since they are most likely
    irrelevant to the search criteria.
  • If we are searching for Microsoft Corporation,
    the specific word Microsoft is more important
    than the general word Corporation, because
    Microsoft appears in far fewer documents.
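The standard way to capture this is inverse document frequency (IDF): a term that appears in fewer documents gets a larger weight. The sketch below uses the plain variant log(N / df), where N is the number of documents and df the number containing the term; other smoothed variants exist. The stop-word list and the five-document collection are invented for illustration.

```python
import math

# Illustrative stop-word list; real systems use longer, language-specific lists.
STOP_WORDS = {"a", "an", "the"}

# Invented five-document collection.
docs = [
    "microsoft corporation announced a new product",
    "the corporation reported earnings",
    "an annual report of the corporation",
    "microsoft released an update",
    "corporation tax rules in the uk",
]

def query_terms(query):
    """Lower-case the query and drop stop words."""
    return [t for t in query.lower().split() if t not in STOP_WORDS]

def idf(term, docs):
    """Inverse document frequency, log(N / df). Returns 0.0 for unseen terms."""
    df = sum(1 for d in docs if term in d.split())
    return math.log(len(docs) / df) if df else 0.0

for term in query_terms("the Microsoft Corporation"):
    print(term, round(idf(term, docs), 3))
# microsoft 0.916
# corporation 0.223
```

Microsoft occurs in 2 of the 5 documents and Corporation in 4, so Microsoft receives roughly four times the weight.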

9
Parameters Used for Ranking in a Typical
Information Retrieval System
  • Parameter 5
  • Page Access Frequency
  • If a page is accessed more often, i.e. if the
    page is popular, it should be ranked higher.
  • This kind of ranking requires maintaining a log
    of page accesses.
  • Useful in systems that store news, stories, or
    other articles meant for reading.
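A minimal sketch of this idea: count visits from an access log and add a log-scaled popularity bonus to each page's text-relevance score. The log format, the page IDs, and `popularity_weight` are hypothetical; the logarithm simply keeps a very popular page from drowning out relevance entirely.

```python
import math
from collections import Counter

# Hypothetical access log: one page ID per recorded visit.
access_log = ["story-17", "story-42", "story-17", "story-17", "story-99", "story-42"]
visits = Counter(access_log)

def blended_score(page_id, relevance, popularity_weight=0.5):
    """Text relevance plus a log-scaled bonus for frequently accessed pages.
    popularity_weight is an arbitrary tuning knob, not a standard value."""
    return relevance + popularity_weight * math.log1p(visits[page_id])

# Three pages that match the query equally well on text alone.
candidates = {"story-42": 1.0, "story-17": 1.0, "story-99": 1.0}
ranked = sorted(candidates, key=lambda p: blended_score(p, candidates[p]), reverse=True)
print(ranked)  # ['story-17', 'story-42', 'story-99']
```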

10
Parameters Used for Ranking in a Typical
Information Retrieval System
  • Parameter 6
  • Number of In-Links to the Page
  • The number of other pages on the web that link
    to the page being ranked.
  • Again a parameter for deciding the popularity of
    a page.
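Counting in-links only requires the link graph as a list of (source, target) pairs. The sketch below assumes a small invented graph and counts distinct linking pages, ignoring duplicate links and self-links; both choices are assumptions for illustration. Link-analysis algorithms such as PageRank refine this idea by also weighting each link by the importance of the page it comes from; this sketch only counts.

```python
from collections import Counter

# Hypothetical link graph: (source page, target page) pairs.
links = [
    ("a.html", "c.html"),
    ("b.html", "c.html"),
    ("d.html", "c.html"),
    ("a.html", "b.html"),
    ("c.html", "a.html"),
    ("a.html", "c.html"),  # duplicate link from the same page
]

def in_link_counts(links):
    """Number of distinct other pages linking to each page (self-links ignored)."""
    unique = {(src, dst) for src, dst in links if src != dst}
    return Counter(dst for _, dst in unique)

counts = in_link_counts(links)
print(counts["c.html"], counts["b.html"], counts["a.html"])  # 3 1 1
```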