Query Structures - PowerPoint PPT Presentation

Slides: 34
Provided by: hjs60
Transcript and Presenter's Notes


1
Query Structures
2
  • Boolean Queries
  • Vector Queries
  • Extended Boolean Queries
  • Fuzzy Queries
  • Probabilistic Queries
  • cf. Natural Language Queries, DB Queries

3
Query Structures
  • Query? ?? (Document?? ??)
  • ??, ??(syntax) ???? ??
  • ??? ??? ? ??
  • Parallel Process for Matching
  • Document Side
  • Data? ?? -> document? ?? (ectosystem)
  • Document -> internal representation -> format for
    matching (endosystem)
  • Query Side
  • Information need -> query? ?? (endosystem)
  • Query -> internal representation -> format for
    matching (endosystem)

4
3.1. Matching Criteria
  • Exact Match
  • Numerical or business DB
  • Range Match
  • Exact match? ??
  • Natural order(numeric or alphabetic)? ?? ??
  • ??, ??? ??
  • Approximate Match
  • Text? image DB
  • Document? query? ???? ??? ???? ??(measure)? ??
    evaluation function

5
  • Combining exact and approximate match
  • Ex) federal funding for energy development
    projects, but the funding must be at least
    1,000,000
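
The funding example mixes a hard numeric condition with a graded text match. The following is a minimal sketch of that combination; the record fields, the sample records, and the term-overlap score are assumptions made for illustration and do not come from the slides.

```python
def overlap_score(query_terms, text):
    """Approximate match: fraction of query terms that appear in the text."""
    words = set(text.lower().split())
    return sum(t in words for t in query_terms) / len(query_terms)


def search(records, query_terms, min_funding, min_score=0.5):
    """Apply the range condition as a filter, then rank by text score."""
    hits = []
    for rec in records:
        if rec["funding"] < min_funding:       # range match: pass or fail
            continue
        score = overlap_score(query_terms, rec["text"])
        if score >= min_score:                 # approximate match: graded
            hits.append((score, rec["id"]))
    hits.sort(key=lambda h: (-h[0], h[1]))     # best score first
    return [doc_id for _, doc_id in hits]


# Hypothetical records, invented for this sketch.
records = [
    {"id": "p1", "text": "federal funding for energy development", "funding": 2_500_000},
    {"id": "p2", "text": "federal energy development project", "funding": 400_000},
    {"id": "p3", "text": "state funding for road repair", "funding": 5_000_000},
    {"id": "p4", "text": "energy development funding", "funding": 1_000_000},
]
result = search(records, ["federal", "funding", "energy", "development"], 1_000_000)
print(result)  # ['p1', 'p4']
```

Here p2 fails the range condition however well its text matches, while p3 passes the range condition but scores too low on the text.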

6
3.2 Boolean Queries
  • Boolean Query
  • Based on concepts from logic, or Boolean algebra
  • (list of) Terms joined by logical
    connectives(AND, OR, NOT)
  • Example of a Boolean Query
  • restaurants AND (Mideastern OR vegetarian) AND
    inexpensive
  • Expansion
  • Stemming: restaurant AND (Mideast OR Veget) AND
    inexpens
  • Thesaurus: Mideastern -> list of specific countries

7
3.2 Boolean Queries
  • Proximity Operator
  • icing within three words of chocolate
  • if icing then chocolate
  • ?? ???? ??
  • 2 OF (A, B, C)
  • (A AND B) OR (A AND C) OR (B AND C)
  • 4 OF (peony, daisy, dahlia, lily, hosta, zinnia,
    marigold)
  • ?
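
The "k OF (...)" shorthand above expands into an OR over every k-sized subset of the terms, which grows quickly. A small sketch of that expansion follows; the function names are hypothetical and chosen only for this illustration.

```python
from itertools import combinations


def k_of(k, terms):
    """Expand 'k OF (terms)' into an OR of AND-groups, one per k-subset."""
    groups = ["(" + " AND ".join(c) + ")" for c in combinations(terms, k)]
    return " OR ".join(groups)


def matches_k_of(k, terms, doc_terms):
    """Equivalent direct test: the document contains at least k of the terms."""
    return sum(t in doc_terms for t in terms) >= k


expanded = k_of(2, ["A", "B", "C"])
print(expanded)  # (A AND B) OR (A AND C) OR (B AND C)

flowers = ["peony", "daisy", "dahlia", "lily", "hosta", "zinnia", "marigold"]
n_groups = len(list(combinations(flowers, 4)))
print(n_groups)  # 35
```

Written out in full, the 4 OF 7 flower query would need 35 conjuncts of four terms each, which is why the shorthand is useful.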

8
3.2 Boolean Queries
  • Query? ??(query? ??)
  • ex) A AND B: the intersection of the set of documents containing term A
    and the set of documents containing term B
  • ????? ??
  • ???? ? ??? ?? ??? ? ??
  • ex) find the documents containing both information and retrieval
  • ??? ??
  • 1. Find the set D1 of documents containing information
  • 2. Find the set D2 of documents containing retrieval
  • 3. Find the set D3 of documents belonging to both D1 and D2

9
3.2 Boolean Queries
  • ????(?? ??)
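  • U: the set of all documents
  • D1, D2: the sets of documents satisfying P1 and P2, respectively
  • 1. U - D1: the set of documents that do not satisfy P1 (not)
  • 2. D1 ∩ D2: the set of documents that satisfy both P1 and P2 (and)
  • 3. D1 ∪ D2: the set of documents that satisfy P1 or P2 (or)
  • 4. (D1 ∪ D2) - (D1 ∩ D2): the set of documents that satisfy exactly
    one of P1 and P2 (xor)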

10
3.2 Boolean Queries
  • ex)
  • Query: (information and retrieval) or
  • not (retrieval and science)
  • Evaluation:
  • ({doc1, doc3} ∩ {doc1, doc2, doc4}) ∪
  • ({doc1, doc2, doc3, doc4, doc5} -
  • ({doc1, doc2, doc4} ∩ {doc2, doc3, doc4, doc5}))
  • = {doc1} ∪ {doc1, doc3, doc5} = {doc1, doc3,
    doc5}
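
The example above can be checked by treating each term as a set of document identifiers and each connective as a set operation. This sketch uses the document sets from the slide; the grouping of the query as (information AND retrieval) OR NOT (retrieval AND science) is read off the slide's set expression, and the helper names are assumptions.

```python
# U is the whole collection; each term maps to the documents that contain it.
U = {"doc1", "doc2", "doc3", "doc4", "doc5"}
postings = {
    "information": {"doc1", "doc3"},
    "retrieval": {"doc1", "doc2", "doc4"},
    "science": {"doc2", "doc3", "doc4", "doc5"},
}


def NOT(d1):
    return U - d1                    # complement within the collection


def AND(d1, d2):
    return d1 & d2                   # intersection


def OR(d1, d2):
    return d1 | d2                   # union


def XOR(d1, d2):
    return (d1 | d2) - (d1 & d2)     # in exactly one of the two sets


info, retr, sci = (postings[t] for t in ("information", "retrieval", "science"))
result = OR(AND(info, retr), NOT(AND(retr, sci)))
print(sorted(result))  # ['doc1', 'doc3', 'doc5']
```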

11
3.2 Boolean Queries
  • Problem 1: Lack of weighting mechanism
  • music by Beethoven, preferably a sonata
  • Beethoven AND Sonata ???? ?? ?? ??
  • Beethoven OR Sonata ?? ??? ?? ??
  • (Beethoven AND Sonata) OR Beethoven
  • ???? ????? Beethoven? ?? ??? ??? ??
  • Problem 2: Misstated Query (and? ??)

12
3.2 Boolean Queries
  • Problem 3: ?? ??
  • AND, OR
  • A OR A AND C
  • NOT > AND > OR, or strict left-to-right order
  • ??? ????
  • ??? ?? ??(semantic) ??? ??
  • coffee AND croissant OR muffin
  • raincoat AND umbrella OR sunglasses
  • NOT
  • ?? ?? ? ??? ????
  • (NOT A) AND B AND C
  • ??? B AND C? ?? ??(B AND C? ??)
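
The two evaluation orders named above can give different answers for the same query text. The sketch below compares them for the slide's "A OR A AND C", and separately compares the two possible groupings of "coffee AND croissant OR muffin"; the function names are hypothetical.

```python
from itertools import product


def by_precedence(a, c):
    # NOT > AND > OR: A OR (A AND C)
    return a or (a and c)


def left_to_right(a, c):
    # strict left-to-right: (A OR A) AND C
    return (a or a) and c


order_conflicts = [v for v in product([False, True], repeat=2)
                   if by_precedence(*v) != left_to_right(*v)]
print(order_conflicts)  # [(True, False)]


def and_grouped(coffee, croissant, muffin):
    # (coffee AND croissant) OR muffin
    return (coffee and croissant) or muffin


def or_grouped(coffee, croissant, muffin):
    # coffee AND (croissant OR muffin)
    return coffee and (croissant or muffin)


grouping_conflicts = [v for v in product([False, True], repeat=3)
                      if and_grouped(*v) != or_grouped(*v)]
print(grouping_conflicts)  # [(False, False, True), (False, True, True)]
```

The groupings of the coffee query disagree exactly when there is a muffin but no coffee, so explicit parentheses are the safe way to state the intent.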

13
3.2 Boolean Queries
  • Problem 4: Highly Complex Query
  • ??? DNF, CNF?? recast
  • Disjunctive Normal Form (DNF)
  • Terms ??? ??, ?? ?? ? ???
  • Conjuncts: terms joined by AND
  • Disjuncts: conjuncts joined by OR
  • e.g. (concert AND dinner AND NOT play) OR
  • (swimming AND tennis) OR
  • (baseball AND NOT football)
  • ?? ??? ?? query?? ?? ??? ? ??? ??? ??

14
3.2 Boolean Queries
  • Full Disjunctive Normal Form
  • Each conjunct contains all of the possible terms
  • (A AND B) OR (A AND NOT C)
  • -> (A AND B AND C) OR (A AND B AND NOT C) OR
  • (A AND NOT B AND NOT C)
  • Conjunctive Normal Form(CNF)
  • e.g. (concert OR dinner OR NOT play) AND
  • (swimming OR tennis) AND
  • (baseball OR NOT football)
  • Normalization
  • Transforming a query into DNF or CNF
  • Use a truth table
  • True rows -> Full DNF

15
3.2 Boolean Queries
  • Example of normalization
  • (A OR B) AND (C OR NOT D) AND (D OR B)

16
3.2 Boolean Queries
  • True rows of the table
  • Full DNF
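
The truth-table step for the query (A OR B) AND (C OR NOT D) AND (D OR B) can be sketched as follows: enumerate all assignments, keep the rows where the query is true, and turn each true row into one conjunct of the full DNF. The helper names are assumptions made for this illustration.

```python
from itertools import product

NAMES = ("A", "B", "C", "D")


def query(a, b, c, d):
    return (a or b) and (c or not d) and (d or b)


# Every assignment of truth values for which the query holds.
true_rows = [row for row in product([True, False], repeat=4) if query(*row)]


def conjunct(row):
    """One full conjunct: every term appears, negated where the row is False."""
    literals = [n if v else "NOT " + n for n, v in zip(NAMES, row)]
    return "(" + " AND ".join(literals) + ")"


full_dnf = " OR ".join(conjunct(row) for row in true_rows)
print(len(true_rows))  # 7
print(full_dnf)
```

Seven of the sixteen rows are true, so the full DNF has seven conjuncts of four literals each.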

17
3.2 Boolean Queries
  • Minimizing to simplest possible form
  • ?? ??? ?? ? ?? (A AND B AND C)? ?? ??
  • ?? ? ?? ??? ???? ?? ?? ???? ??
  • (A AND C AND D) OR (B AND C) OR (B AND (NOT D))
  • Full CNF
  • ???? false row??? full DNF? ???
  • De Morgan's Law
  • NOT (A AND B) = (NOT A) OR (NOT B),
  • NOT (A OR B) = (NOT A) AND (NOT B).
  • Law of Double Negation
  • NOT (NOT A) = A

18
3.2 Boolean Queries
  • e.g. negation of query? DNF?
  • (A AND B AND NOT C) OR (NOT A AND C) OR
  • (B AND C) ??,
  • Negation? ?? ? ???? ??? ??
  • ???? (NOT A OR NOT B OR C) AND (A OR NOT C)
  • AND (NOT B OR NOT C)
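
Negating a DNF with De Morgan's law flips every literal and swaps AND with OR, which yields a CNF directly. The sketch below applies this to the query above and confirms the result with a truth table; the list-of-literals representation and the helper names are assumptions.

```python
from itertools import product


def negate_dnf(dnf):
    """NOT (c1 OR c2 ...) = (NOT c1) AND ...; NOT (l1 AND l2) = (NOT l1) OR ..."""
    return [[(name, not positive) for name, positive in conjunct]
            for conjunct in dnf]


def show(clauses, inner, outer):
    def lit(name, positive):
        return name if positive else "NOT " + name
    return outer.join("(" + inner.join(lit(n, p) for n, p in clause) + ")"
                      for clause in clauses)


def holds(clause, env, combine):
    """combine=all evaluates a conjunct, combine=any evaluates a disjunct."""
    return combine(env[name] == positive for name, positive in clause)


# (A AND B AND NOT C) OR (NOT A AND C) OR (B AND C)
dnf = [
    [("A", True), ("B", True), ("C", False)],
    [("A", False), ("C", True)],
    [("B", True), ("C", True)],
]
cnf = negate_dnf(dnf)
cnf_text = show(cnf, " OR ", " AND ")
print(cnf_text)  # (NOT A OR NOT B OR C) AND (A OR NOT C) AND (NOT B OR NOT C)

# Truth-table check: the CNF is true exactly where the DNF is false.
envs = [dict(zip("ABC", values)) for values in product([False, True], repeat=3)]
equivalent = all(
    all(holds(c, env, any) for c in cnf) == (not any(holds(c, env, all) for c in dnf))
    for env in envs
)
print(equivalent)  # True
```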

19
3.2 Boolean Queries
  • ?? ??? ?? ???
  • ? conjunction? ??? ??? ??? ??? ???
  • A AND B? ??, ?? ??? ??? A? ???? ??? ???? ?? ???
    B? ???? ??? ???? ??
  • ???, query? ? term? ???? ??? ??? ?? ? ? ???, ??
    ???? ?? ???? ? ???? ??? ???? ?? ??? ?????

20
3.2 Boolean Queries
  • Problem 5: ??? ?? ??
  • Query? ????? ?? document? ??? ???? ??? ??? ???
    ??? ? ??
  • Solution 1: more restrictive query
  • Solution 2: ?? ???? ?? (??/??? ??)
  • ???? ??? sort?? ??? ??? ? ? ??
  • ??? ??? ??? ?? ??
  • (???? ?? sorting? ?? ??? ??? ???? ???? ???)

21
3.2 Boolean Queries
  • ??
  • ??? ????
  • ???? ???? ???? 23 ??? ??
  • ? ??? ??? ?? ?? ??? ?? ??? ???? ??? ??? ??(manual
    search??? ??)? ???? ??
  • ? ??? ??? ???? ???? ??? ??
  • ???? ???? ?????? ??

22
3.3 Vector Queries
  • Vector Model
  • Each document is represented by a vector, or
    ordered list of terms, rather than by a set of
    terms
  • Boolean model?? ??
  • Term representations (weights)
  • Methods of determining the similarity between a
    document and the query
  • Boolean model??? query? document ???? ??? ??????
    ??? ???? ???? ??

23
3.3 Vector Queries
  • Similarity Evaluation: 0-1 vector, weight vector
  • Assigning weights to document terms in a vector
  • frequency count (?? a, an, the, of, ...)
  • user assigning
  • judging dilemma: freely assigned weights
  • normalization

24
3.3 Vector Queries
  • Retrieval Determination
  • fixed number(by decreasing similarity) or
    threshold
  • Impractical
  • Most components of a document vector are 0. (Even if a vector has
    10,000 components (terms, vocabulary), a single document
    uses only a few of those terms.)
  • ???
  • ?? document? ??? ??? component? ??.
  • ??? ???? ???? ????.
  • dimensional compatibility - the comparison of
    two documents is always based on comparing the
    same terms in each document.
  • Expansion of the compact representation is needed.
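
The points above suggest storing each document compactly (only its nonzero terms) and aligning terms at comparison time. The sketch below does that with dictionaries; the slides do not name a similarity measure, so cosine similarity is an assumption, as are the sample vectors and function names.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight}."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def expand(sparse, vocabulary):
    """Expand the compact form to a full vector over a shared vocabulary."""
    return [sparse.get(term, 0.0) for term in vocabulary]


def retrieve(query, docs, top_k=None, threshold=None):
    """Rank by decreasing similarity, then cut by threshold and/or fixed number."""
    ranked = sorted(((cosine(query, vec), doc_id) for doc_id, vec in docs.items()),
                    key=lambda pair: (-pair[0], pair[1]))
    if threshold is not None:
        ranked = [pair for pair in ranked if pair[0] >= threshold]
    if top_k is not None:
        ranked = ranked[:top_k]
    return [doc_id for _, doc_id in ranked]


docs = {
    "d1": {"information": 1.0, "retrieval": 1.0},
    "d2": {"information": 1.0, "science": 1.0},
    "d3": {"music": 1.0},
}
query = {"information": 1.0, "retrieval": 1.0}
print(retrieve(query, docs, top_k=2))        # ['d1', 'd2']
print(retrieve(query, docs, threshold=0.9))  # ['d1']
print(expand(docs["d3"], ["information", "music", "retrieval"]))  # [0.0, 1.0, 0.0]
```

Looking terms up by name gives the dimensional compatibility the slide asks for without ever materializing the full-length vectors.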

25
Extended Boolean Queries: Boolean Query + Vector
Query
  • ? ??? ?? ??
  • Logical connectives, weights
  • Weighted Boolean query
  • Boolean operation + weights (0.0 to 1.0)
  • A_w1 * B_w2 (* is a Boolean operator)
  • Query? ?? term A? ??? ??? ?? A, term B? ??? ???
    ?? B? ???, ?? ???, ???

26
Extended Boolean Queries: Boolean Query + Vector
Query
  • Distance
  • Distance between the document sets A and B
    corresponding to the terms A and B
  • Minimum of the distances between a pair of
    elements
  • Element: a document represented by a term vector
  • If A contains m documents and B contains n
    documents, m × n computations are needed
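
The set distance described above can be sketched as a double loop over both document sets that keeps the smallest pairwise distance. Euclidean distance between term vectors is an assumption here (the slides do not fix the metric), and the sample vectors are invented.

```python
import math


def set_distance(docs_a, docs_b):
    """Minimum pairwise distance between two sets of term vectors.

    Returns the distance and the number of pairwise comparisons made.
    """
    best = math.inf
    comparisons = 0
    for a in docs_a:
        for b in docs_b:
            comparisons += 1
            best = min(best, math.dist(a, b))  # Euclidean distance (assumed)
    return best, comparisons


A = [(1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]   # m = 3 documents
B = [(1.0, 1.0), (4.0, 0.0)]               # n = 2 documents
distance, comparisons = set_distance(A, B)
print(distance, comparisons)  # 1.0 6
```

With m = 3 and n = 2 the loop makes 6 comparisons, matching the m × n cost stated on the slide.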

27
Extended Boolean Queries ??? ??
  • A_w1 * B_w2 (w1 = 1, w2 = 0 to 1)
  • S weight? ?? ???? ??

28
Extended Boolean Queries ??? ?
A = {1, 4, 5, 10, 11, 15, 17, 18, 19, 23}
B = {1, 2, 4, 5, 7, 8, 10, 13, 17, 23}
A 0.8 OR B 0.4
w1 = 1
w2 = 1
2, 22
2, 8, 13, 22
29
Extended Boolean Queries ??? ??
  • ???
  • ??? ???? ??? ?? ?? 1.7, 1.4,
  • ?? ??? round up or down
  • ?? ??? ??? ?? ?? ??
  • A? weight? 0.6?? S? ???? 4??? ?? ??? 1??? 1??
    ????? ??
  • Random?? ???? ?? query? ?? ?? ??
  • ????? ??? query? ?? ?? ??? ?? ? ??
  • (A AND B) OR (A AND C) vs. A AND (B OR C)
  • Exercise 6

30
Fuzzy Queries
  • Ordinary Set vs. Fuzzy Set
  • Ordinary Set: sharp edge (e.g. 6 feet or more is
    tall)
  • Fuzzy Set: membership grade
  • e.g. membership grade for degree of tallness
  • 4'5": 0.1, 5'8": 0.45, 6'2": 0.52, 6'10":
    0.9
  • Boolean operator in fuzzy set S
  • query? ?? fuzzy function? ??? ?, ? document? ??
    ? ?? ??? ??.

31
Probabilistic Queries
  • Fuzzy Queries
  • membership grade function? 01??? ?? ???? ?? ???
    ??? ??? ? ??.
  • Probabilistic Query
  • the set returned from any query is supposed to
    consist of documents which satisfy that query
    with a probability higher than a specified
    threshold
  • ????
  • ?? ?????? ??? ??? ??? ???????

Prob(Document Satisfy) +
Prob(Document not Satisfy) = 1
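
A minimal sketch of the threshold rule described above: return the documents whose estimated probability of satisfying the query exceeds a chosen threshold. How the probabilities are estimated is outside this sketch; the values and the function name below are hypothetical.

```python
def probabilistic_retrieve(probabilities, threshold):
    """Return documents with Prob(satisfy) above the threshold, best first."""
    selected = [doc for doc, p in probabilities.items() if p > threshold]
    return sorted(selected, key=lambda doc: (-probabilities[doc], doc))


# Hypothetical estimates of Prob(document satisfies the query).
estimates = {"d1": 0.92, "d2": 0.35, "d3": 0.61, "d4": 0.60}
hits = probabilistic_retrieve(estimates, 0.6)
print(hits)  # ['d1', 'd3']

# Prob(not satisfy) is the complement of Prob(satisfy).
not_satisfy = {doc: 1.0 - p for doc, p in estimates.items()}
```
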
32
Natural Language Queries
  • User friendly
  • Ungrammatical
  • Hard to understand for computers

33
IR and DB
  • Full Text retrieval System needs to combine,
  • Imprecise textual element
  • Precise numerical or other limit
  • ?? ?? ?? ???? DB ???? ?? ??
  • ? ???? ?? ??? ???? ???? ??? ??? ??? ?? ??
  • One Solution: OODB Model
  • Object: set of properties
  • textual portions, numeric or fixed field
    portions
  • image components
  • Can be Commercial???