Title: Query Structures
Query Structures
- Boolean Queries
- Vector Queries
- Extended Boolean Queries
- Fuzzy Queries
- Probabilistic Queries
- cf. Natural Language Queries, DB Queries
Query Structures
- Query? ?? (Document?? ??)
- ??, ??(syntax) ???? ??
- ??? ??? ? ??
- Parallel Process for Matching
- Document Side
- Data? ?? -> document? ?? (ectosystem)
- Document -> internal representation -> format for matching (endosystem)
- Query Side
- Information need -> query? ?? (endosystem)
- Query -> internal representation -> format for matching (endosystem)
3.1 Matching Criteria
- Exact Match
- Numerical or business DB
- Range Match
- Exact match? ??
- Natural order(numeric or alphabetic)? ?? ??
- ??, ??? ??
- Approximate Match
- Text or image DB
- Document? query? ???? ??? ???? ??(measure)? ??
evaluation function
- Combining exact and approximate match
- Ex) federal funding for energy development project, but the funding must be at least 1,000,000
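The funding example mixes an approximate text condition with an exact numeric limit. A minimal sketch of that combination follows; the documents, the field names, and the term-overlap score are all assumptions made for illustration, not part of the original slides.

```python
# Hypothetical toy collection; ids, texts, and funding values are illustrative only.
DOCS = [
    {"id": "d1", "text": "federal funding for energy development project", "funding": 2_500_000},
    {"id": "d2", "text": "federal funding for highway repair", "funding": 5_000_000},
    {"id": "d3", "text": "energy development project funded by federal grant", "funding": 400_000},
]

def overlap_score(query, text):
    """Approximate match: fraction of query terms that occur in the text."""
    q_terms = set(query.lower().split())
    d_terms = set(text.lower().split())
    return len(q_terms & d_terms) / len(q_terms)

def search(query, min_funding):
    """Apply the exact range condition first, then rank survivors by the approximate score."""
    hits = [d for d in DOCS if d["funding"] >= min_funding]
    return sorted(((overlap_score(query, d["text"]), d["id"]) for d in hits), reverse=True)

results = search("federal funding for energy development project", 1_000_000)
```

Here d3 is dropped by the exact condition even though its text is relevant, while d1 and d2 are ranked by how well their text matches.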
3.2 Boolean Queries
- Boolean Query
- Based on concepts from logic, or Boolean algebra
- (list of) Terms joined by logical connectives (AND, OR, NOT)
- Boolean query example
- restaurants AND (Mideastern OR vegetarian) AND inexpensive
- Expansion
- Stemming: restaurant AND (Mideast OR veget) AND inexpens
- Thesaurus: Mideastern -> list of specific countries
3.2 Boolean Queries
- Proximity Operator
- icing within three words of chocolate
- if icing then chocolate
- ?? ???? ??
- 2 OF (A, B, C)
- (A AND B) OR (A AND C) OR (B AND C)
- 4 OF (peony, daisy, dahlia, lily, hosta, zinnia, marigold)
- ?
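The "m OF n" shorthand stands for the OR of every m-term AND group, as the 2 OF (A, B, C) line shows. A small sketch of that expansion, with a hypothetical helper name `expand_m_of_n`, makes it easy to see how quickly the 4 OF 7 case grows.

```python
from itertools import combinations

def expand_m_of_n(m, terms):
    """Rewrite 'm OF (t1, ..., tn)' as an OR of all m-term AND groups."""
    conjuncts = ["(" + " AND ".join(group) + ")" for group in combinations(terms, m)]
    return " OR ".join(conjuncts)

two_of_three = expand_m_of_n(2, ["A", "B", "C"])

flowers = ["peony", "daisy", "dahlia", "lily", "hosta", "zinnia", "marigold"]
four_of_seven = expand_m_of_n(4, flowers)
number_of_conjuncts = four_of_seven.count(" OR ") + 1
```

The flower query expands to 35 conjuncts of four terms each, which is why the compact "m OF n" form is worth having.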
3.2 Boolean Queries
- Query? ??(query? ??)
- ex) A AND B term A? ??? ?? document? ???, term
B? ??? ?? document? ??? ??? - ????? ??
- ???? ? ??? ?? ??? ? ??
- ex) information? retrieval? ?? ???? ???
- ??? ??
- 1. Let D1 be the set of documents that contain information
- 2. Let D2 be the set of documents that contain retrieval
- 3. Let D3 be the set of documents that appear in both D1 and D2
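The three steps for D1, D2, and D3 can be sketched with an inverted index that maps each term to the set of documents containing it. The documents below are hypothetical, and reading step 3 as a set intersection is an assumption based on the A AND B context.

```python
from collections import defaultdict

# Hypothetical documents, used only to illustrate the three steps.
DOCS = {
    "doc1": "information retrieval systems",
    "doc2": "retrieval of chemical data",
    "doc3": "information theory",
}

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

index = build_index(DOCS)
d1 = index["information"]  # step 1: documents containing "information"
d2 = index["retrieval"]    # step 2: documents containing "retrieval"
d3 = d1 & d2               # step 3: documents in both sets (assumed AND semantics)
```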
3.2 Boolean Queries
- ????(?? ??)
- U: the universal set (all documents)
- D1, D2: the sets of documents that satisfy P1 and P2, respectively
- 1. U - D1: documents that do not satisfy P1 (not)
- 2. D1 ∩ D2: documents that satisfy both P1 and P2 (and)
- 3. D1 ∪ D2: documents that satisfy P1 or P2 (or)
- 4. (D1 ∪ D2) - (D1 ∩ D2): documents that satisfy exactly one of P1 and P2 (xor)
3.2 Boolean Queries
- ex)
- Query: information and retrieval or not retrieval and science
- Set operations
- ({doc1, doc3} ∩ {doc1, doc2, doc4}) ∪ ({doc1, doc2, doc3, doc4, doc5} - ({doc1, doc2, doc4} ∩ {doc2, doc3, doc4, doc5}))
- = {doc1} ∪ {doc1, doc3, doc5} = {doc1, doc3, doc5}
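The example can be checked with ordinary Python sets. The mapping of the three document lists to the terms information, retrieval, and science is inferred from the order in which they appear, so treat it as an assumption. The sketch evaluates two readings of the query: one where NOT covers the whole phrase "retrieval and science", and one where NOT binds only to "retrieval".

```python
U = {"doc1", "doc2", "doc3", "doc4", "doc5"}    # all documents
information = {"doc1", "doc3"}                   # assumed posting set
retrieval = {"doc1", "doc2", "doc4"}             # assumed posting set
science = {"doc2", "doc3", "doc4", "doc5"}       # assumed posting set

# Reading 1: (information AND retrieval) OR NOT (retrieval AND science)
result_not_over_phrase = (information & retrieval) | (U - (retrieval & science))

# Reading 2: (information AND retrieval) OR ((NOT retrieval) AND science)
result_not_over_term = (information & retrieval) | ((U - retrieval) & science)
```

For this particular data both readings return doc1, doc3, and doc5; in general the two readings are different queries and can return different sets.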
3.2 Boolean Queries
- Problem 1: Lack of a weighting mechanism
- music by Beethoven, preferably a sonata
- Beethoven AND Sonata ???? ?? ?? ??
- Beethoven OR Sonata ?? ??? ?? ??
- (Beethoven AND Sonata) OR Beethoven
- ???? ????? Beethoven? ?? ??? ??? ??
- Problem 2: Misstated query (and? ??)
3.2 Boolean Queries
- Problem 3: Order of operations
- AND, OR
- A OR A AND C
- NOT > AND > OR, or strict left-to-right order
- ??? ????
- ??? ?? ??(semantic) ??? ??
- coffee AND croissant OR muffin
- raincoat AND umbrella OR sunglasses
- NOT
- ?? ?? ? ??? ????
- (NOT A) AND B AND C
- ??? B AND C? ?? ??(B AND C? ??)
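The two evaluation rules named above, NOT > AND > OR precedence versus strict left-to-right order, can disagree on the same query text. The sketch below uses a three-term query A OR B AND C, chosen for illustration rather than taken from the slides, and lists the truth assignments on which the two rules differ.

```python
from itertools import product

def by_precedence(a, b, c):
    """A OR B AND C read with AND binding tighter than OR: A OR (B AND C)."""
    return a or (b and c)

def left_to_right(a, b, c):
    """A OR B AND C read strictly left to right: (A OR B) AND C."""
    return (a or b) and c

# Every (A, B, C) assignment where the two readings give different answers.
disagreements = [
    row for row in product([False, True], repeat=3)
    if by_precedence(*row) != left_to_right(*row)
]
```

A document that contains A but not C is retrieved under one rule and rejected under the other, so a system has to state which rule it applies, or the user has to add parentheses.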
3.2 Boolean Queries
- Problem 4: Highly Complex Query
- Solution: recast in DNF or CNF
- Disjunctive Normal Form (DNF)
- Terms ??? ??, ?? ?? ? ???
- Conjuncts: terms joined by AND
- Disjuncts: conjuncts joined by OR
- e.g. (concert AND dinner AND NOT play) OR
(swimming AND tennis) OR
(baseball AND NOT football)
- ?? ??? ?? query?? ?? ??? ? ??? ??? ??
3.2 Boolean Queries
- Full Disjunctive Normal Form
- Each conjunct contains all of the possible terms
- (A AND B) OR (A AND NOT C)
- -> (A AND B AND C) OR (A AND B AND NOT C) OR
(A AND NOT B AND NOT C)
- Conjunctive Normal Form (CNF)
- e.g. (concert OR dinner OR NOT play) AND
(swimming OR tennis) AND
(baseball OR NOT football)
- Normalization
- Transforming a query into DNF or CNF
- Uses a truth table
- True rows -> Full DNF
3.2 Boolean Queries
- Normalization example
- (A OR B) AND (C OR NOT D) AND (D OR B)
3.2 Boolean Queries
- True rows of the table
- Full DNF
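The truth-table route to the full DNF can be sketched by enumerating all 16 assignments of A, B, C, D and keeping one conjunct per true row. The function names are hypothetical. The sketch also checks the query against the simplified form (A AND C AND D) OR (B AND C) OR (B AND (NOT D)) that these slides give for it.

```python
from itertools import product

TERMS = ["A", "B", "C", "D"]

def query(a, b, c, d):
    """(A OR B) AND (C OR NOT D) AND (D OR B)"""
    return (a or b) and (c or not d) and (d or b)

def full_dnf(func, terms):
    """Return one conjunct, containing every term, for each true row of the truth table."""
    conjuncts = []
    for row in product([True, False], repeat=len(terms)):
        if func(*row):
            literals = [t if value else f"NOT {t}" for t, value in zip(terms, row)]
            conjuncts.append("(" + " AND ".join(literals) + ")")
    return conjuncts

def simplified(a, b, c, d):
    """(A AND C AND D) OR (B AND C) OR (B AND (NOT D))"""
    return (a and c and d) or (b and c) or (b and not d)

true_rows = full_dnf(query, TERMS)
full_dnf_text = " OR ".join(true_rows)
forms_agree = all(
    query(*row) == simplified(*row) for row in product([True, False], repeat=4)
)
```

Seven of the sixteen rows are true, so the full DNF has seven conjuncts of four literals each.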
3.2 Boolean Queries
- Minimizing to simplest possible form
- ?? ??? ?? ? ?? (A AND B AND C)? ?? ??
- ?? ? ?? ??? ???? ?? ?? ???? ??
- (A AND C AND D) OR (B AND C) OR (B AND (NOT D))
- Full CNF
- ???? false row??? full DNF? ???
- De Morgan's Laws
- NOT (A AND B) = (NOT A) OR (NOT B),
- NOT (A OR B) = (NOT A) AND (NOT B).
- Law of Double Negation
- NOT (NOT A) = A
3.2 Boolean Queries
- e.g. negation of a query in DNF
- (A AND B AND NOT C) OR (NOT A AND C) OR (B AND C)
- Negation? ?? ? ???? ??? ??
- ???? (NOT A OR NOT B OR C) AND (A OR NOT C) AND (NOT B OR NOT C)
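Negating a DNF query with De Morgan's laws turns every conjunct into a clause of flipped literals, which yields a CNF. The sketch below uses a hypothetical representation (a literal is a pair of term and polarity) and checks the result against the negated query on all eight truth assignments.

```python
from itertools import product

# A literal is (term, is_positive); a DNF is a list of conjuncts (lists of literals).
DNF = [
    [("A", True), ("B", True), ("C", False)],   # A AND B AND NOT C
    [("A", False), ("C", True)],                # NOT A AND C
    [("B", True), ("C", True)],                 # B AND C
]

def negate_dnf(dnf):
    """NOT (c1 OR c2 ...) = (NOT c1) AND (NOT c2) ...; each NOT ci is an OR of flipped literals."""
    return [[(term, not positive) for term, positive in conjunct] for conjunct in dnf]

def eval_dnf(dnf, env):
    return any(all(env[t] == pos for t, pos in conjunct) for conjunct in dnf)

def eval_cnf(cnf, env):
    return all(any(env[t] == pos for t, pos in clause) for clause in cnf)

CNF = negate_dnf(DNF)

# The CNF must be true exactly where the original DNF is false.
negation_is_correct = all(
    eval_cnf(CNF, dict(zip("ABC", row))) == (not eval_dnf(DNF, dict(zip("ABC", row))))
    for row in product([True, False], repeat=3)
)
```

The resulting clauses read (NOT A OR NOT B OR C) AND (A OR NOT C) AND (NOT B OR NOT C).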
3.2 Boolean Queries
- ?? ??? ?? ???
- ? conjunction? ??? ??? ??? ??? ???
- A AND B? ??, ?? ??? ??? A? ???? ??? ???? ?? ???
B? ???? ??? ???? ?? - ???, query? ? term? ???? ??? ??? ?? ? ? ???, ??
???? ?? ???? ? ???? ??? ???? ?? ??? ?????
3.2 Boolean Queries
- ??? 5 ??? ?? ??
- Query? ????? ?? document? ??? ???? ??? ??? ???
??? ? ?? - ???1 more restrictive query
- ???2 ?? ???? ?? (??/??? ??)
- ???? ??? sort?? ??? ??? ? ? ??
- ??? ??? ??? ?? ??
- (???? ?? sorting? ?? ??? ??? ???? ???? ???)
3.2 Boolean Queries
- ??
- ??? ????
- ???? ???? ???? 23 ??? ??
- ? ??? ??? ?? ?? ??? ?? ??? ???? ??? ??? ??(manual
search??? ??)? ???? ?? - ? ??? ??? ???? ???? ??? ??
- ???? ???? ?????? ??
3.3 Vector Queries
- Vector Model
- Each document is represented by a vector, or ordered list of terms, rather than by a set of terms
- Differences from the Boolean model
- Term representations (weights)
- Methods of determining the similarity between a document and the query
- Boolean model??? query? document ???? ??? ?????? ??? ???? ???? ??
3.3 Vector Queries
- Similarity Evaluation: 0-1 vector, weight vector
- Assigning weights to document terms in a vector
- frequency count (excluding a, an, the, of, ...)
- user assigning
- judging dilemma: freely assigned weights
- normalization
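Frequency-count weighting and normalization can be sketched as follows. The stop-word list reuses the words named on the slide. The cosine measure is an assumption: the slides ask for a similarity method without naming one, and cosine is simply a common choice.

```python
import math
from collections import Counter

STOP_WORDS = {"a", "an", "the", "of"}  # the words the slide excludes from counting

def tf_vector(text):
    """Weight each remaining term by its frequency count."""
    return dict(Counter(w for w in text.lower().split() if w not in STOP_WORDS))

def normalize(vec):
    """Scale a weight vector to unit length; an all-zero vector stays empty."""
    length = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / length for t, w in vec.items()} if length else {}

def cosine(u, v):
    """Cosine similarity (assumed measure) between two {term: weight} vectors."""
    u, v = normalize(u), normalize(v)
    return sum(w * v.get(t, 0.0) for t, w in u.items())

doc = tf_vector("the retrieval of information and the storage of information")
qry = tf_vector("information retrieval")
score = cosine(doc, qry)
```

Normalization keeps a long document from scoring higher only because its raw counts are larger.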
3.3 Vector Queries
- Retrieval Determination
- fixed number (by decreasing similarity) or threshold
- Impractical
- document?? components? ???? 0??. (vector?
10000?? component(term, vocabulary)? ???? ???,
???? ? ? ? ??? term? ??? ? ??) - ???
- ?? document? ??? ??? component? ??.
- ??? ???? ???? ????.
- dimensional compatibility: the comparison of two documents is always based on comparing the same terms in each document.
- Expansion of the compact representation is needed.
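A compact representation stores only the terms a document actually uses, and it is expanded over a shared vocabulary before comparison so that position i always refers to the same term. The sketch below also shows the two retrieval rules, a fixed number and a threshold. The vocabulary, weights, and dot-product score are assumptions chosen for illustration.

```python
VOCAB = ["boolean", "fuzzy", "query", "retrieval", "vector"]  # hypothetical vocabulary

def expand(sparse, vocab=VOCAB):
    """Expand a compact {term: weight} vector to one weight per vocabulary term."""
    return [sparse.get(term, 0.0) for term in vocab]

def dot(u, v):
    """Inner product of two expanded vectors of equal length."""
    return sum(x * y for x, y in zip(u, v))

docs = {
    "d1": {"boolean": 1.0, "query": 0.5},
    "d2": {"vector": 1.0, "query": 1.0, "retrieval": 0.5},
    "d3": {"fuzzy": 1.0},
}
query = {"vector": 1.0, "query": 1.0}

scores = {doc_id: dot(expand(vec), expand(query)) for doc_id, vec in docs.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

top_2 = ranked[:2]                                            # fixed number
above_threshold = [d for d in ranked if scores[d] >= 1.0]     # threshold
```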
Extended Boolean Queries: Boolean Query + Vector Query
- ? ??? ?? ??
- Logical connectives + weights
- Weighted Boolean query
- Boolean operation + weights (0.0 to 1.0)
- A_w1 * B_w2
- Query? ?? term A? ??? ??? ?? A, term B? ??? ???
?? B? ???, ?? ???, ???
Extended Boolean Queries: Boolean Query + Vector Query
- Distance
- Distance between the document sets A and B corresponding to the terms A and B
- Minimum of the distances between a pair of elements
- Element: a document represented by a term vector
- If A contains m documents and B contains n documents, m × n computations are needed
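The set distance defined above, the minimum over all pairs of elements, can be sketched with a double loop that also counts the comparisons. Euclidean distance between term vectors is an assumption here, since the slides do not name the element-level distance, and the vectors are made up.

```python
import math

def euclidean(u, v):
    """Distance between two term vectors of equal length (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def set_distance(set_a, set_b):
    """Minimum pairwise distance between two document sets, plus the number of comparisons."""
    comparisons = 0
    best = math.inf
    for a in set_a:
        for b in set_b:
            comparisons += 1
            best = min(best, euclidean(a, b))
    return best, comparisons

A = [(1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]   # m = 3 hypothetical documents
B = [(1.0, 1.0), (4.0, 4.0)]               # n = 2 hypothetical documents
distance, comparisons = set_distance(A, B)
```

With m = 3 and n = 2 the loop performs 6 distance computations, matching the m × n cost.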
Extended Boolean Queries ??? ??
- A_w1 * B_w2 (w1 = 1, w2 = 0 to 1)
- S weight? ?? ???? ??
Extended Boolean Queries ??? ?
- A = {1, 4, 5, 10, 11, 15, 17, 18, 19, 23}
- B = {1, 2, 4, 5, 7, 8, 10, 13, 17, 23}
- A_0.8 OR B_0.4
- (figure labels: w1 1; w2 1; 2, 22; 2, 8, 13, 22)
Extended Boolean Queries ??? ??
- ???
- ??? ???? ??? ?? ?? 1.7, 1.4,
- ?? ??? round up or down
- ?? ??? ??? ?? ?? ??
- A? weight? 0.6?? S? ???? 4??? ?? ??? 1??? 1??
????? ?? - Random?? ???? ?? query? ?? ?? ??
- ????? ??? query? ?? ?? ??? ?? ? ??
- (A AND B) OR (A AND C) vs. A AND (B OR C)
- Exercise 6
Fuzzy Queries
- Ordinary Set vs. Fuzzy Set
- Ordinary Set: sharp edge (e.g. 6 feet or more is tall)
- Fuzzy Set: membership grade
- e.g. membership grade for the degree of tallness
- 4'5": 0.1, 5'8": 0.45, 6'2": 0.52, 6'10": 0.9
- Boolean operator in fuzzy set S
- query? ?? fuzzy function? ??? ?, ? document? ??
? ?? ??? ??.
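The contrast between a sharp-edged set and a fuzzy set can be sketched with the tallness grades from the slide, reading the four heights as feet and inches (an assumption about the garbled figures). The min, max, and complement operators below are the standard Zadeh definitions, used here as an assumption because the slides do not spell out which fuzzy operators they intend.

```python
# Heights in inches, read as 4'5", 5'8", 6'2", 6'10" (assumed), with the slide's grades.
TALL = {53: 0.1, 68: 0.45, 74: 0.52, 82: 0.9}

def crisp_tall(inches):
    """Ordinary set with a sharp edge: 6 feet or more is tall."""
    return 1.0 if inches >= 72 else 0.0

def fuzzy_and(x, y):
    return min(x, y)

def fuzzy_or(x, y):
    return max(x, y)

def fuzzy_not(x):
    return 1.0 - x

# Hypothetical membership grades of one document in the fuzzy sets for terms A and B.
grade_a, grade_b = 0.8, 0.4
a_and_b = fuzzy_and(grade_a, grade_b)
a_or_b = fuzzy_or(grade_a, grade_b)
not_a = fuzzy_not(grade_a)
```

A person of 6'2" is fully tall in the ordinary set but has grade 0.52 in the fuzzy set; likewise a document gets a grade for the whole query instead of a yes or no.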
Probabilistic Queries
- Fuzzy Queries
- membership grade function? 01??? ?? ???? ?? ??? ??? ??? ? ??.
- Probabilistic Query
- the set returned from any query is supposed to consist of documents which satisfy that query with a probability higher than a specified threshold
- ????
- ?? ?????? ??? ??? ??? ???????
Prob(Document Satisfy) / Prob(Document not Satisfy) > 1
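Threshold-based probabilistic retrieval can be sketched as a filter over estimated probabilities. The probabilities below are made up, and reading the ratio of the two probabilities as a "greater than 1" test is an assumption about the slide's formula. Under that reading the ratio test is the same as a probability threshold of 0.5.

```python
def odds(p):
    """Ratio Prob(satisfy) / Prob(not satisfy) for a probability p < 1."""
    return p / (1.0 - p)

def retrieve(probabilities, threshold):
    """Return the documents whose probability of satisfying the query exceeds the threshold."""
    return [doc_id for doc_id, p in probabilities.items() if p > threshold]

# Hypothetical probability estimates for four documents.
probs = {"d1": 0.9, "d2": 0.5, "d3": 0.2, "d4": 0.65}

by_threshold = retrieve(probs, 0.5)
by_odds = [doc_id for doc_id, p in probs.items() if odds(p) > 1.0]
```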
Natural Language Queries
- User friendly
- Ungrammatical
- Hard to understand for computers
IR and DB
- Full Text retrieval System needs to combine:
- Imprecise textual element
- Precise numerical or other limit
- ?? ?? ?? ???? DB ???? ?? ??
- ? ???? ?? ??? ???? ???? ??? ??? ??? ?? ??
- One Solution: OODB Model
- Object: set of properties
- textual portions
- numeric or fixed field portions
- image components
- Can be Commercial???