LAST WEEK - PowerPoint PPT Presentation

Provided by: fsktm

1
LAST WEEK
  • Retrieval evaluation
  • Why?
  • How?
  • Recall and precision: Venn diagram and
    contingency table

2
WMES3103 INFORMATION RETRIEVAL
  • WEEK 5
  • QUERY LANGUAGES AND OPERATIONS

3
QUERY LANGUAGES
  • Will cover the different kinds of queries sent to
    text retrieval systems.
  • Will show the different types of query that a
    user can formulate.
  • Normally, the most popular type of user query is
    the keyword-based query.

4
  • Different queries are continuously sent to an
    IRS (information retrieval system).
  • Most query languages use the content (semantics)
    and the structure of the text (text syntax) to
    find the relevant documents.
  • At times, the IRS may fail to find and retrieve
    the relevant documents.
  • Therefore, we use a number of techniques to
    enhance the query so that an acceptable level of
    relevant documents is retrieved.

5
QUERY LANGUAGES
  • e.g. use of a thesaurus, synonyms, stemming,
    stopword removal, etc.
  • A keyword is a word that an IRS can search for.
  • The retrieval unit (also known as a document) is
    the basic element which can be retrieved by the
    system as an answer to a query.
  • A retrieval unit can be a file, document, Web
    page, paragraph, or some other structural unit
    which contains the answer to the query.

6
Example Keyword
  • Keyword used: artificial intelligence

7
Example Retrieval unit
  • website

8
Example Retrieval unit
  • document

9
TYPES OF QUERY LANGUAGES
  • Keyword-based querying
      • Single-word
      • Context
      • Boolean
      • Natural language
  • Pattern matching
  • Structural queries
      • Form-like fixed
      • Hypertext
      • Hierarchical

10
KEYWORD-BASED QUERYING
  • Query: the formulation of a user's information
    need.
  • A basic query consists of a keyword or a number
    of keywords.
  • The IRS searches for documents containing such
    keywords.
  • Keyword-based queries are popular because they
      • are intuitive
      • are easy to express
      • allow for fast ranking.

11
Single-word query
  • Simplest form of query that can be formulated in
    an IRS.
  • A text document is a long sequence of words.
  • The IRS will look at the text and search for the
    word.
  • The result of a word query is the set of
    documents containing at least one of the words of
    the query.
  • The documents are ranked according to their
    degree of similarity to the query.

12
(No Transcript)
13
(No Transcript)
14
(No Transcript)
15
Single-word query
  • Ranking is based on word occurrences inside the
    text.
  • The most popular measure is term frequency: the
    number of times a word appears inside a document.
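The term-frequency ranking described above can be sketched in Python. This is a minimal illustration, not how any particular IRS works. It assumes whitespace tokenisation and scores each document by the summed raw term frequency of the query words. The function name `rank_by_term_frequency` and the toy collection are hypothetical. Real systems usually combine term frequency with other factors, such as inverse document frequency.

```python
from collections import Counter

def rank_by_term_frequency(query, docs):
    """Rank documents containing at least one query word by the summed
    term frequency of the query words (highest first)."""
    query_words = query.lower().split()
    results = []
    for doc_id, text in docs.items():
        tf = Counter(text.lower().split())  # word -> number of occurrences in this document
        score = sum(tf[w] for w in query_words)
        if score > 0:  # keep only documents with at least one query word
            results.append((doc_id, score))
    # Highest score first; ties broken by document id so the order is stable
    results.sort(key=lambda pair: (-pair[1], pair[0]))
    return results

# Hypothetical toy collection
docs = {
    "d1": "artificial intelligence and machine intelligence",
    "d2": "intelligence tests for children",
    "d3": "gardening for beginners",
}
print(rank_by_term_frequency("artificial intelligence", docs))
# [('d1', 3), ('d2', 1)]
```

Document d3 contains neither query word, so it is not part of the result set at all.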

16
Context query
  • Single-word queries are complemented with the
    ability to search for words in a given context,
    i.e. near other words.
  • Words which appear near each other may indicate
    a higher possibility of relevance than if they
    appear apart.
  • 2 types of queries:
      • phrase
      • proximity

17
  • Phrase: a sequence of single-word queries.
  • Proximity: a more relaxed version of the phrase
    query.
  • A sequence of words or phrases is given together
    with a maximum allowed distance between them.
  • Distance is measured in characters or words,
    depending on the system.

18
Example ABI-INFORM (CD)
  • PRE/n: the first keyword must precede the second
    keyword by up to n words.
      • European pre/1 community: the word European
        must precede the word community by up to 1
        word, e.g. European economic community,
        European flavoured community
  • W/n: the first keyword must be within n words of
    the second keyword.
      • computer w/1 data: the word computer must be
        within 1 word of the word data, e.g. computer
        generated data, computer simulated data, data
        mining computer
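The PRE/n and W/n operators above can be sketched over word positions. This is not ABI-INFORM's implementation. The sketch assumes whitespace tokenisation and case-insensitive matching. It reads "up to n words" as at most n intervening words, which agrees with the examples: European economic community has one word between the two keywords. The function names are hypothetical. Under this reading, a phrase query is simply PRE/0.

```python
def positions(words, term):
    """0-based positions at which `term` occurs in the list `words`."""
    return [i for i, w in enumerate(words) if w == term]

def pre(text, first, second, n):
    """PRE/n: `first` precedes `second` with at most n words in between."""
    words = text.lower().split()
    return any(0 < j - i <= n + 1
               for i in positions(words, first.lower())
               for j in positions(words, second.lower()))

def within(text, first, second, n):
    """W/n: the two keywords occur in either order with at most n words in between."""
    return pre(text, first, second, n) or pre(text, second, first, n)

print(pre("European economic community", "European", "community", 1))   # True
print(pre("community of the European", "European", "community", 1))     # False
print(within("data mining computer", "computer", "data", 1))            # True
print(pre("artificial intelligence", "artificial", "intelligence", 0))  # True (phrase)
```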

19
Example COMPENDEX
  • The desired proximity of keywords is specified
    with full stops between the keywords
      • back..basics: back to basics, back to the
        basics
  • Keywords that must appear in the same sentence:
    type in the keywords separated by an underscore
      • computer_medicine
  • Search for a phrase: type in the keywords
    separated by a space. This searches for the
    phrase with the 2 keywords next to each other and
    in the specified order
      • artificial intelligence

20
Boolean query
  • Oldest form of keyword query: use of Boolean
    operators.
  • A typical Boolean query combines words with
    operators.
  • Given 2 basic keyword queries A and B:
      • A OR B selects all documents containing the
        word A or the word B (or both).
      • A AND B selects all documents containing both
        A and B.
      • A NOT B selects all documents containing the
        word A but not the word B.
  • Can be represented by a Venn diagram.
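Each operator corresponds to a set operation on the documents that contain each word. The three Boolean queries therefore map directly onto Python's set operators. The toy collection and the helper `docs_with` are hypothetical. A real IRS would look the words up in an index rather than scan every document.

```python
# Hypothetical toy collection: document id -> text
docs = {
    1: "cats and dogs",
    2: "dogs only",
    3: "cats only",
    4: "birds",
}

def docs_with(word):
    """Ids of documents containing `word` (simple whitespace tokenisation)."""
    return {doc_id for doc_id, text in docs.items() if word in text.split()}

A, B = docs_with("cats"), docs_with("dogs")
print(sorted(A | B))  # A OR B  -> [1, 2, 3]
print(sorted(A & B))  # A AND B -> [1]
print(sorted(A - B))  # A NOT B -> [3]
```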

21
Boolean operator AND
22
Boolean operator OR
23
Boolean operator NOT
24
Natural Language
  • The user determines the keywords that are not
    useful for searching and should be excluded.
  • Documents containing these keywords are ranked
    very low.
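Here is a rough sketch of this idea. It assumes something the slide does not state: documents are otherwise scored by how many of the wanted keywords they contain. Any unwanted keyword subtracts a large fixed penalty, so the document sinks to the bottom of the ranking. The function name and the penalty value are hypothetical.

```python
def natural_language_rank(wanted, unwanted, docs, penalty=100):
    """Return document ids ranked by the number of wanted keywords present;
    documents containing any unwanted keyword are pushed down by `penalty`."""
    scored = []
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        score = sum(1 for w in wanted if w in words)
        if any(u in words for u in unwanted):
            score -= penalty  # penalise rather than exclude the document
        scored.append((doc_id, score))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # best first, ties by id
    return [doc_id for doc_id, _ in scored]

# Hypothetical toy collection
docs = {
    "d1": "komodo dragon diet in the wild",
    "d2": "komodo dragon diet in zoos",
    "d3": "dragon boat festival",
}
print(natural_language_rank(["komodo", "dragon", "diet"], ["zoos"], docs))
# ['d1', 'd3', 'd2']
```

Unlike Boolean NOT, d2 is still returned; it is simply ranked last.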

25
TARGET - Dialog
  • ? target
  • Input search terms separated by spaces (e.g. DOG
    CAT FOOD). You can enhance your TARGET search
    with the following options:
      - PHRASES are enclosed in single quotes
        (e.g. 'DOG FOOD')
      - SYNONYMS are enclosed in parentheses
        (e.g. (DOG CANINE))
      - SPELLING variations are indicated with
        a ? (e.g. DOG? to search for DOG, DOGS)
      - Terms that MUST be present are flagged
        with an asterisk (e.g. DOG FOOD)
  • Q QUIT H HELP
  • ? komodo dragon food diet nutrition
  • Your TARGET search request will retrieve up to 50
    of the statistically relevant records.
  • Searching 1997 1998 records only
  • Processing Complete
  • Your search retrieved 50 records
  • Press ENTER to browse results C Customize
    display Q Quit H Help

26
Pattern Matching
  • More specific query formulation
  • Retrieve pieces of text that have some property.
  • Used in the retrieval of text statistics, data
    extraction, etc.
  • A pattern is a set of syntactic features that
    must occur in a text segment.
  • Segments that fulfil the pattern specification
    are said to match the pattern.

27
Pattern Matching
  • We are interested in documents containing
    segments which match the given search pattern.
  • Each IRS supports its own set of search patterns,
    ranging from very simple to very complex.
  • The more powerful the set of patterns allowed,
    the more involved are the queries that can be
    formulated by the user, and the more complex is
    the implementation of the search.

28
Pattern Matching
  • Words: a word in the text; the most basic
    pattern.
  • Prefixes: the beginning of a text word, e.g. the
    prefix comput will retrieve all documents
    containing words such as computers, computing,
    computation, computational, etc.
  • Suffixes: the termination of a text word, e.g.
    the suffix ters will retrieve all documents
    containing words such as monsters, posters,
    potters, painters, etc.
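Prefix and suffix patterns can be sketched as a scan over the vocabulary, that is, the set of distinct words in the collection. The documents containing the matching words would then be retrieved. The vocabulary and function names are hypothetical. Note that a suffix pattern can also match words the user may not expect, such as computers for ters.

```python
def words_with_prefix(vocabulary, prefix):
    """Vocabulary words beginning with `prefix`."""
    return sorted(w for w in vocabulary if w.startswith(prefix))

def words_with_suffix(vocabulary, suffix):
    """Vocabulary words ending with `suffix`."""
    return sorted(w for w in vocabulary if w.endswith(suffix))

# Hypothetical vocabulary of a small collection
vocabulary = {"compute", "computers", "computing", "computation",
              "monsters", "posters", "painters", "potter"}

print(words_with_prefix(vocabulary, "comput"))
# ['computation', 'compute', 'computers', 'computing']
print(words_with_suffix(vocabulary, "ters"))
# ['computers', 'monsters', 'painters', 'posters']
```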

29
Pattern Matching
  • Substrings: can appear within a text word, e.g.
    tal will retrieve all documents containing words
    such as coastal, talk, metallic, pedestal, etc.
  • Ranges: a pair of strings which matches any word
    lying between them in lexicographical order, e.g.
    the range between the words held and hold will
    retrieve words such as hoax, hissing, helm, help,
    etc.
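Substring and range patterns can be sketched the same way. The sketch assumes a hypothetical all-lowercase vocabulary, so that Python's built-in string comparison (by character code) coincides with lexicographical order. It also treats the range as inclusive of its endpoints, which the slide does not specify.

```python
def words_with_substring(vocabulary, sub):
    """Vocabulary words containing `sub` anywhere."""
    return sorted(w for w in vocabulary if sub in w)

def words_in_range(vocabulary, low, high):
    """Vocabulary words between `low` and `high` in lexicographical order (inclusive)."""
    return sorted(w for w in vocabulary if low <= w <= high)

# Hypothetical all-lowercase vocabulary
vocabulary = {"coastal", "talk", "metallic", "pedestal", "table",
              "hat", "help", "helm", "hissing", "hoax", "house"}

print(words_with_substring(vocabulary, "tal"))
# ['coastal', 'metallic', 'pedestal', 'talk']
print(words_in_range(vocabulary, "held", "hold"))
# ['helm', 'help', 'hissing', 'hoax']
```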

30
Pattern Matching
  • Allowing errors: a word together with an error
    threshold
  • will retrieve all text words which are similar to
    the given word.
  • Errors are caused by typing, spelling, etc.
  • The most accepted model is the Levenshtein
    distance, or edit distance: the minimum number of
    character insertions, deletions, and
    substitutions needed to make two strings equal.

31
Pattern Matching
  • Example: the edit distance between COLOR and
    COLOUR is 1, and between SURVEY and SURGERY is 2.
    Therefore, in the query, we must specify the
    maximum number of allowed errors for a word to
    match the pattern.
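The edit distance can be computed with the standard dynamic-programming recurrence, keeping only one previous row of the table. The function names and the small vocabulary are hypothetical. The two distances printed are the ones given on the slide.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character insertions,
    deletions and substitutions needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters are equal)
            ))
        prev = curr
    return prev[-1]

def approximate_matches(vocabulary, word, max_errors):
    """Vocabulary words within `max_errors` edits of `word`."""
    return sorted(w for w in vocabulary if edit_distance(w, word) <= max_errors)

print(edit_distance("COLOR", "COLOUR"))    # 1
print(edit_distance("SURVEY", "SURGERY"))  # 2
print(approximate_matches({"COLOR", "COLOUR", "COLLAR", "SURGERY"}, "COLOR", 1))
# ['COLOR', 'COLOUR']
```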

32
Structural Queries
  • Based on the structure of the text
  • 3 structures: fixed, hypertext, hierarchical
  • The user queries the text based on its
    structure.
  • Query languages nowadays integrate both content
    and structural queries.

33
  • Example: UM Library OPAC records
  • Example of a query: fi au ali and subject
    malaysia
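In a fixed (form-like) structure, every record has the same fields, so a query can restrict each keyword to a field. The sketch below reads the example query as author = ali AND subject = malaysia. The records, field names and the function `find` are hypothetical. This is not how the UM Library OPAC is implemented.

```python
# Hypothetical bibliographic records, all with the same fixed fields
records = [
    {"title": "Economic history of Malaysia", "author": "Ali, Ahmad",
     "subject": ["Malaysia", "Economic history"]},
    {"title": "Rivers of Borneo", "author": "Ali, Ahmad",
     "subject": ["Borneo", "Rivers"]},
    {"title": "Malaysian cookery", "author": "Tan, Mei Ling",
     "subject": ["Malaysia", "Cooking"]},
]

def find(records, author=None, subject=None):
    """Titles of records matching every given field (case-insensitive).
    `author` must be one of the words of the author field; `subject` must
    equal one of the subject headings. None means the field is not restricted."""
    hits = []
    for r in records:
        author_words = r["author"].lower().replace(",", " ").split()
        if author is not None and author.lower() not in author_words:
            continue
        headings = [s.lower() for s in r["subject"]]
        if subject is not None and subject.lower() not in headings:
            continue
        hits.append(r["title"])
    return hits

print(find(records, author="ali", subject="malaysia"))
# ['Economic history of Malaysia']
```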

34
(No Transcript)
35
(No Transcript)
36
Query Protocols
  • Protocol: a strict set of rules that govern the
    exchange of information between computer devices
  • Query languages used automatically by software
    applications to query text databases.
  • Some are standards for querying CD-ROMs or serve
    as intermediate languages to query library
    systems.
  • Because they are not intended for human use, they
    are referred to as protocols and not languages.

37
Query Protocols
  • Z39.50: queries bibliographical information using
    a standard interface between the client and the
    host database manager, which is independent of
    the client user interface and of the database
    query language at the host. Originally used for
    bibliographical information based on the MARC
    format.
  • WAIS (Wide Area Information Service): a network
    publishing protocol, popular before the Web, that
    can query databases through the Internet.

38
www.ukoln.ac.uk/dlis/z3950/
39
(No Transcript)
40
Protocols for CD-ROM
  • Allows for flexibility in data communication
    between primary information providers and end
    users.
  • Significant cost savings - allows access to a
    variety of information without the need to buy,
    install, and train users for different data
    retrieval applications.
  • 3 protocols have been recommended:
      • CCL (Common Command Language)
      • CD-RDx (Compact Disk Read only Data exchange)
      • SFQL (Structured Full-text Query Language)

41
QUERY OPERATIONS
  • Users find it difficult to formulate queries
    which are well designed for retrieval purposes,
    because they do not know the make-up of the
    collection or the retrieval environment.
  • Users of Web search engines spend a lot of time
    reformulating their queries to get effective
    retrieval.
  • The first query formulation retrieves documents,
    which are examined for relevance. New, improved
    query formulations are then constructed, and
    documents are again retrieved and examined for
    relevance. The process is repeated until the user
    is satisfied.

42
QUERY OPERATIONS
  • 2 processes involved:
      • expanding the original query with new terms
      • reweighting the terms in the expanded query
  • 2 ways of improving the initial query formulation:
      • approaches based on feedback information from
        the user
      • approaches based on information derived from
        the set of documents initially retrieved
        (called the local set of documents)

43
User Relevance Feedback
  • Most popular query reformulation strategy.
  • The user is presented with a list of retrieved
    documents, examines them, and marks those which
    are relevant.
  • Only the top 10 or 20 ranked documents need to be
    examined.
  • The examined documents are separated into
    relevant and non-relevant ones.
  • Important terms attached to the retrieved
    relevant documents are selected, and their
    importance is enhanced in the new query
    formulation.

44
User Relevance Feedback
  • The new query is expected to move towards the
    relevant documents and away from the non-relevant
    ones.
  • Advantages:
      • Shields the user from the details of the query
        reformulation process, because all the user
        has to provide is a relevance judgement on the
        retrieved documents.
      • Breaks down the entire search process into a
        sequence of small steps which are easier to
        grasp.
      • Provides a controlled process designed to
        emphasize some terms and de-emphasize others.
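Rocchio's formula from the vector space model is one widely used way to move the new query towards the relevant documents and away from the non-relevant ones. The slides do not name a specific formula. In words: new query = α times the original query, plus β times the centroid of the relevant documents, minus γ times the centroid of the non-relevant documents. This both expands the query with new terms and reweights existing ones. The sketch assumes queries and documents are dictionaries of term weights. The values α = 1, β = 0.75, γ = 0.15 are common textbook defaults, not values from the slides. Terms whose weight becomes zero or negative are dropped.

```python
from collections import defaultdict

def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a reformulated query (term -> weight).

    `query` and each document in `relevant` / `non_relevant` is a dict term -> weight.
    new = alpha * query + beta * mean(relevant) - gamma * mean(non_relevant)
    """
    new_query = defaultdict(float)
    for term, weight in query.items():
        new_query[term] += alpha * weight
    for docs, coeff in ((relevant, beta), (non_relevant, -gamma)):
        for doc in docs:
            for term, weight in doc.items():
                # dividing by len(docs) turns the sum over documents into the centroid
                new_query[term] += coeff * weight / len(docs)
    # Terms with zero or negative weight are dropped
    return {term: w for term, w in new_query.items() if w > 0}

# Hypothetical judgements on three retrieved documents
query = {"jaguar": 1.0}
relevant = [{"jaguar": 1.0, "cat": 1.0}, {"jaguar": 1.0, "habitat": 1.0}]
non_relevant = [{"jaguar": 1.0, "car": 1.0}]
print(rocchio(query, relevant, non_relevant))
```

In this example, the new query gains the terms cat and habitat from the relevant documents. The term car, which occurs only in the non-relevant document, receives a negative weight and is dropped.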

45
Automatic Local Analysis
  • Documents retrieved for a given query are
    examined immediately to determine terms for query
    expansion.
  • Similar to the relevance feedback cycle, but done
    automatically, without the assistance of the user.
  • Local feedback strategies are based on expanding
    the query with terms correlated with the query
    terms. These correlated terms are found in local
    clusters built from the local document set.
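One simple local clustering strategy is the association cluster: two terms are considered correlated when they co-occur frequently in the documents of the local set. The sketch scores each candidate term v by the unnormalised correlation c(u, v) = sum over local documents d of f(u, d) × f(v, d), summed over the query terms u. It then adds the k best-scoring terms to the query. The following are assumptions for illustration: whitespace tokenisation, no stopword removal, the value of k, and the function name. In practice, stopwords would need to be removed first, or they would dominate the correlations.

```python
from collections import Counter

def expand_with_association_clusters(query_terms, local_docs, k=2):
    """Expand `query_terms` with the k terms most correlated with them in the
    local document set, using c(u, v) = sum over docs of f(u, d) * f(v, d)."""
    scores = Counter()
    for doc in local_docs:
        freq = Counter(doc.lower().split())  # f(term, d) for this document
        for v, f_v in freq.items():
            if v in query_terms:
                continue  # only new terms are candidates for expansion
            for u in query_terms:
                scores[v] += freq[u] * f_v  # freq[u] is 0 if u is absent
    ranked = sorted((t for t, s in scores.items() if s > 0),
                    key=lambda t: (-scores[t], t))  # best first, ties alphabetical
    return list(query_terms) + ranked[:k]

# Hypothetical local set: the documents initially retrieved for "jaguar"
local_docs = [
    "jaguar cat habitat",
    "jaguar cat rainforest habitat",
    "jaguar car",
]
print(expand_with_association_clusters(["jaguar"], local_docs))
# ['jaguar', 'cat', 'habitat']
```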