Title: Fuzzy Database
1. Fuzzy Database and Information Retrieval
3. Similarity relation defined for the domain Opinion
Query: Which sociologists are in considerable agreement with Kass concerning policy Y?
- Fuzzy relational database (Buckles and Petry)
- Elements of the tuples contained in the relations may be subsets of the domain's universal set.
- A similarity relation is defined on each domain's universal set.
4. Fuzzy Data Base
- (Project (select Assessment where Name = Kass and Option = Y) over Opinion) giving R1
Relation R1 (retrieve the opinion of Kass concerning option Y):
Opinion
favorable
- (Project (select Expert where Field = Sociologist) over Name) giving R2
Relation R2 (select all sociologists from the table of experts):
Name
Osborn
Schreiber
Cohen
Specterman
5.
- (Project (select (Join R2 and Assessment over Name) where Option = Y) over Name, Opinion) giving R3
Relation R3 (list the opinions of the sociologists):
Name          Opinion
Osborn        slightly favorable
Schreiber     favorable
Cohen         slightly negative
Specterman    highly favorable
- (Join R3 and R1 over Opinion) with THRES(Opinion) $\ge$ 0.75 and THRES(Name) $\ge$ 0
Result:
Name: {Osborn, Schreiber, Specterman}
Opinion: {slightly favorable, favorable, highly favorable}
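As an illustration of the final thresholded join, the sketch below keeps the R3 tuples whose opinion is similar to Kass's opinion at or above the threshold 0.75 and merges them into one result. The similarity grades are assumed values invented for this example (the similarity relation itself is not reproduced in this transcript), and `similarity` and `threshold_join` are hypothetical helper names. It is also a simplification: each opinion is compared with Kass's opinion only, whereas the full model requires every pair of values merged into one tuple to meet the threshold.

```python
# Hypothetical similarity grades on the Opinion domain (assumed values).
SIMILARITY = {
    ("favorable", "slightly favorable"): 0.8,
    ("favorable", "highly favorable"): 0.8,
    ("favorable", "slightly negative"): 0.4,
}


def similarity(a, b):
    """Symmetric lookup: identical values are fully similar, unknown pairs get 0."""
    if a == b:
        return 1.0
    return SIMILARITY.get((a, b), SIMILARITY.get((b, a), 0.0))


r1 = ["favorable"]  # relation R1: Kass's opinion on option Y
r3 = [              # relation R3: opinions of the sociologists
    ("Osborn", "slightly favorable"),
    ("Schreiber", "favorable"),
    ("Cohen", "slightly negative"),
    ("Specterman", "highly favorable"),
]


def threshold_join(r3, r1, thres_opinion):
    """Merge the R3 tuples whose opinion is similar enough to an opinion in R1."""
    names, opinions = [], []
    for name, opinion in r3:
        if any(similarity(opinion, target) >= thres_opinion for target in r1):
            names.append(name)
            opinions.append(opinion)
    return names, opinions


names, opinions = threshold_join(r3, r1, 0.75)
print(names)
print(opinions)
```

With these assumed grades, Cohen is excluded because "slightly negative" is similar to "favorable" only to degree 0.4.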
6. Information Retrieval
- The IR process can be viewed as a knowledge communication process, which involves learning and problem-solving strategies.
- An IR system controls the knowledge flow between documents and the user.
- An IR system is a computer system that allows users to retrieve information from a document collection stored in a database.
- Classical IR is built on a Boolean algebra framework.
7. Fuzzy Information Retrieval
- Fuzzy relation for the grade of relevance between index terms and documents
- $R: X \times Y \to [0, 1]$
- Determined subjectively or objectively
- Number of occurrences, publication dates, document types
- Fuzzy thesaurus
- $T: X \times X \to [0, 1]$
8. Fuzzy Information Retrieval
- Information retrieval model
- Advantages
- $R$ and $T$ are more expressive, and their construction is more realistic.
- Fuzzy inquiry provides greater flexibility.
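One common way to use $R$ and $T$ together, assumed here rather than taken from the slide, is max-min composition: an inquiry $A$ (a fuzzy set of index terms) is first expanded through the thesaurus, $B = A \circ T$, and the documents are then graded by $D = B \circ R$. The terms, documents, and membership grades below are invented for illustration, and `max_min_compose` is a hypothetical helper name.

```python
def max_min_compose(fuzzy_set, relation):
    """Compose a fuzzy set {x: grade} with a relation {(x, y): grade} by max-min."""
    result = {}
    for (x, y), grade in relation.items():
        value = min(fuzzy_set.get(x, 0.0), grade)
        result[y] = max(result.get(y, 0.0), value)
    return result


# Fuzzy thesaurus T on index terms (assumed grades).
T = {
    ("fuzzy", "fuzzy"): 1.0,
    ("fuzzy", "vague"): 0.7,
    ("vague", "vague"): 1.0,
    ("vague", "fuzzy"): 0.7,
}

# Relevance relation R between index terms and documents (assumed grades).
R = {
    ("fuzzy", "doc1"): 0.9,
    ("fuzzy", "doc2"): 0.3,
    ("vague", "doc2"): 0.8,
}

A = {"fuzzy": 1.0}          # the user's inquiry
B = max_min_compose(A, T)   # inquiry augmented by associated terms
D = max_min_compose(B, R)   # graded set of retrieved documents
print(B)
print(D)
```

Here doc2 is reached mainly through the associated term "vague", which is what the thesaurus adds over a purely Boolean lookup.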
9. Information retrieval based on fuzzy associations
- Introduction
- Three components in information retrieval
- Fuzziness in a thesaurus (first component)
- Fuzziness in retrieval (second component)
- Fuzziness on output (third component)
- Classification of output
- Conclusion
12.
- Three components in information retrieval
- Let $D = \{d_1, d_2, \ldots, d_n\}$ be a finite set of documents for retrieval.
- Let $W = \{w_1, w_2, \ldots, w_m\}$ denote a set of descriptors.
- $T: D \to \{0,1\}^W$. $T(d)$ is the subset of descriptors in $W$ indexed to the document $d$.
- $U$ ($U = T^{-1}$). $U(w)$ is the set of documents that have keyword $w$.
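The relationship $U = T^{-1}$ is the familiar inverted index. A minimal sketch, assuming the index $T$ is held as a dictionary from document identifiers to sets of descriptors (the sample data and the name `invert_index` are made up for illustration):

```python
from collections import defaultdict


def invert_index(T):
    """Build U = T^{-1}: map each descriptor w to the set of documents indexed by w."""
    U = defaultdict(set)
    for doc, descriptors in T.items():
        for w in descriptors:
            U[w].add(doc)
    return dict(U)


# T(d): descriptors indexed to each document (assumed sample data).
T = {
    "d1": {"fuzzy", "retrieval"},
    "d2": {"retrieval"},
    "d3": {"fuzzy", "thesaurus"},
}

U = invert_index(T)
print(sorted(U["fuzzy"]))
```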
[Figure: Information retrieval based on fuzzy associations. Diagram with the labels F, U, P, q, and r.]
13.
- Fuzziness in a thesaurus (first component)
- Three types of thesaurus relation (each represented as a binary relation)
- RT: related terms
- NT: narrower terms
- BT: broader terms
- $B(v, w) = N(w, v)$, $R(v, w) = R(w, v)$
- Method of automatic generation of thesauri
- Typical: counting frequencies of simultaneous occurrences of pairs of keywords in a set of documents.
- Fuzzy set model
- Let $C = \{c_1, c_2, \ldots, c_p\}$ be a finite set of concepts, where each $c_i$, $i = 1, \ldots, p$, represents a unit of concept.
- $h: W \to [0, 1]^p$ is a fuzzy-set-valued function which maps each keyword to its corresponding concepts as a fuzzy set in $C$.
- $h(w)$ is the concept of the word $w$.
15. Even with present computers, it is difficult to calculate the values of the fuzzy relation above using arrays in a straightforward way, since the numbers of elements in $W$ and $D$ are very large ($10^3 \times 10^5$). Although techniques for handling sparse matrices may be applied, there is another method for generating $R$ and $N$, based on the manipulation of sequential files. The principal tool for this is sorting.
Notation: $(a, b, c)$ means a record whose fields are $a$, $b$, and $c$; $\{(a, b, c)\}$ means a set of such records.
Input: a set $D$ of documents. Each document $d \in D$ has a number of keywords in $W$. A keyword may occur twice or more in a document. The frequency of occurrence of $w_i$ in $d_k$ is denoted by $h_{ik}$.
Output: a set of records $\{(w_i, w_j, R(w_i, w_j))\}$ for all pairs with $R(w_i, w_j) \ne 0$.
16. Algorithm GFT (generation of a fuzzy thesaurus).
// Find pairs of keywords in every document. //
for all $d_k \in D$ do
    find all keywords $w_i \in W$ and calculate $h_{ik}$
    for all $(w_i, w_j)$, $w_i < w_j$, that are found in $d_k$ do
        make record $(w_i, w_j, \min(h_{ik}, h_{jk}))$
        output $(w_i, w_j, \min(h_{ik}, h_{jk}))$ to WORK1
    repeat
    for all $w_i$ that are found in $d_k$ do
        make record $(w_i, h_{ik})$
        output $(w_i, h_{ik})$ to WORK2
    repeat
repeat
// Sort WORK1 and WORK2. //
sort WORK1 into increasing order of the key $(w_i, w_j)$
sort WORK2 into increasing order of the key $w_i$
17. // Calculate R. Scan WORK1 and WORK2. //
for all $(w_i, w_j)$ in WORK1 do
    find all records for $(w_i, w_j)$ in WORK1 and all records for $w_i$ and $w_j$ in WORK2
    $R(w_i, w_j) \leftarrow \dfrac{\sum_k \min(h_{ik}, h_{jk})}{\sum_k h_{ik} + \sum_k h_{jk} - \sum_k \min(h_{ik}, h_{jk})}$
    output $(w_i, w_j, R(w_i, w_j))$ to an output file
repeat
end of algorithm GFT
In an earlier paper, an experimental calculation on three thousand documents and thirty thousand keywords, carried out using the sorting-based GFT, required a reasonable 800 s of CPU time.
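Algorithm GFT can be sketched in Python as follows. This is an in-memory illustration under stated assumptions, not the sequential-file implementation the slides describe: the lists `work1` and `work2` stand in for the work files WORK1 and WORK2, documents are given as plain lists of keywords, and the function name and sample documents are made up for the example.

```python
from collections import Counter
from itertools import combinations, groupby
from operator import itemgetter


def generate_fuzzy_thesaurus(documents):
    """Return {(wi, wj): R(wi, wj)} for wi < wj, following the steps of GFT."""
    work1 = []  # records (wi, wj, min(h_ik, h_jk))
    work2 = []  # records (wi, h_ik)
    for keywords in documents:
        h = Counter(keywords)  # h[w] is the frequency of w in this document
        for wi, wj in combinations(sorted(h), 2):
            work1.append((wi, wj, min(h[wi], h[wj])))
        for wi, count in h.items():
            work2.append((wi, count))

    # Sort both work lists so that equal keys become adjacent.
    work1.sort(key=itemgetter(0, 1))
    work2.sort(key=itemgetter(0))

    # Scan WORK2: total frequency of each keyword over all documents.
    totals = {
        w: sum(count for _, count in group)
        for w, group in groupby(work2, key=itemgetter(0))
    }

    # Scan WORK1: sum of minima for each pair, then the ratio R.
    thesaurus = {}
    for (wi, wj), group in groupby(work1, key=itemgetter(0, 1)):
        overlap = sum(m for _, _, m in group)
        thesaurus[(wi, wj)] = overlap / (totals[wi] + totals[wj] - overlap)
    return thesaurus


documents = [
    ["fuzzy", "fuzzy", "set"],
    ["fuzzy", "logic"],
    ["set", "logic", "logic"],
]
R = generate_fuzzy_thesaurus(documents)
print(R)
```

Only pairs that co-occur in at least one document produce a record, so every value in the result is nonzero, matching the stated output of the algorithm.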
18. // record $(d_i, p_i)$ that comes before another record $(d_j, p_j)$ satisfies either $d_i < d_j$, or $d_i = d_j$ and $p_i \ge p_j$. //
take the first record $(d_1, p_1)$ in WORK
$(D, P) \leftarrow (d_1, p_1)$
for all $d_j$ in WORK do    // the $d_j$'s are examined sequentially //
    if $D \ne d_j$ then
        output $(D, P)$ to an output file OUT
        $(D, P) \leftarrow (d_j, p_j)$
    endif
repeat
output $(D, P)$ to OUT
// OUT contains exactly those records $(D, P)$ with $P = U_f(d, w)$, as defined by the max-min formula. //
19. // Third step: if necessary, sort again. //
sort OUT into decreasing order of the key $p$ and print OUT
20.
- Fuzziness in retrieval (second component)
- For the crisp case, a retrieval through a thesaurus, given a keyword $w$, proceeds as follows.
- Examine the thesaurus $F$ and find all associated terms $v_{11}, v_{12}, \ldots, v_{1p}$.
- Find the subsets $U(v_{11}), U(v_{12}), \ldots, U(v_{1p})$.
- Establish the retrieved set of documents as the union of $U(v_{11}), U(v_{12}), \ldots, U(v_{1p})$: $\bigcup_{1 \le i \le p} U(v_{1i})$.
- $U_f(d, w) = 1$ iff $d \in U(v_1)$ for some $v_1$ such that $F(v_1, w) = 1$, and $U_f(d, w) = 0$ otherwise.
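The crisp procedure above amounts to a union over the posting sets of the associated terms. A minimal sketch, assuming the crisp thesaurus $F$ is stored as a set of pairs $(v, w)$ with $F(v, w) = 1$ and $U$ as a dictionary of document sets (the sample data and the name `crisp_retrieve` are invented for illustration):

```python
def crisp_retrieve(w, F, U):
    """Return all documents indexed by some term v with F(v, w) = 1."""
    associated = [v for (v, target) in F if target == w]
    retrieved = set()
    for v in associated:
        retrieved |= U.get(v, set())
    return retrieved


# Crisp thesaurus: pairs (v, w) for which F(v, w) = 1 (assumed sample data).
F = {
    ("database", "database"),
    ("data base", "database"),
    ("thesaurus", "thesaurus"),
}

# U(v): documents that have keyword v (assumed sample data).
U = {
    "database": {"d1"},
    "data base": {"d2", "d3"},
    "thesaurus": {"d4"},
}

print(sorted(crisp_retrieve("database", F, U)))
```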
21. When the thesaurus $F$ is fuzzy and $U$ is crisp:
$U_f(d, w) = \max_{v \in W} \min[U(d, v), F(v, w)]$
This equation is also valid for a fuzzy relation $U(d, v)$.
Algorithm FR (fuzzy retrieval).
// First step: find all records. //
for all $v$ such that $F(v, w) \ne 0$ in FT do
    for all $d \in U(v)$ do
        $p(d, v) \leftarrow \min[U(d, v), F(v, w)]$
        output record $(d, p(d, v))$ to a work file WORK
    repeat
repeat
// Second step: find the values of $U_f$. //
sort WORK into increasing order of the first key $d$ and into decreasing order of the second key $p$
// The above sorting means that, in the resulting sequence, a //
22. End of algorithm FR
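The three steps of algorithm FR can be sketched in Python as follows. This is an in-memory illustration under stated assumptions: the list `work` plays the role of the work file WORK, the fuzzy thesaurus is a dictionary keyed by $(v, w)$, and $U$ maps each keyword to a dictionary of document grades. The sample grades and the name `fuzzy_retrieve` are made up for the example.

```python
from itertools import groupby
from operator import itemgetter


def fuzzy_retrieve(w, F, U):
    """Return [(d, U_f(d, w))] in decreasing order of grade, following FR."""
    # First step: one record (d, min(U(d, v), F(v, w))) per associated term v.
    work = []
    for (v, target), f in F.items():
        if target != w or f == 0:
            continue
        for d, u in U.get(v, {}).items():
            work.append((d, min(u, f)))

    # Second step: sort by d ascending and p descending, so the first record
    # of each document carries the maximum over v.
    work.sort(key=lambda record: (record[0], -record[1]))
    out = [next(group) for _, group in groupby(work, key=itemgetter(0))]

    # Third step: present the result in decreasing order of the grade p.
    out.sort(key=itemgetter(1), reverse=True)
    return out


# Fuzzy thesaurus F(v, w) (assumed grades).
F = {
    ("fuzzy", "fuzzy"): 1.0,
    ("vague", "fuzzy"): 0.6,
    ("crisp", "fuzzy"): 0.0,
}

# U(d, v) stored per keyword v (assumed grades).
U = {
    "fuzzy": {"d1": 1.0, "d2": 0.5},
    "vague": {"d2": 1.0, "d3": 1.0},
    "crisp": {"d4": 1.0},
}

result = fuzzy_retrieve("fuzzy", F, U)
print(result)
```

Sorting replaces the explicit maximum: after the second-step sort, taking the first record per document is equivalent to computing $\max_v \min[U(d, v), F(v, w)]$.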
- Fuzziness on output (third component)
- Fuzzy filter. Examples:
- (a) Find recent documents that have keyword $w$.
- (b) Find documents that have keyword $w$ and are relevant to one's field of interest.
- r r n g
- Classification of output
- Decreasing order of membership
- Divide into layers
- Conclusion
- Problems for further study
- Discussion of crisp techniques of advanced indexing and retrieval using a fuzzy set model.
23.
- Studies of efficient algorithms for large-scale databases. In particular, the development of hardware for information retrieval should be taken into account.
- Application of methods in fuzzy information retrieval to related areas.
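The third component (a fuzzy filter on the output, followed by classification into layers) could be sketched as below. Combining the retrieval grade with the filter grade by taking the minimum is an assumption made for this example, as are the cut levels, the sample grades, and the helper names `apply_filter` and `layer`.

```python
def apply_filter(retrieved, filter_grades):
    """Combine retrieval grades with filter grades by min (assumed combination)."""
    return {d: min(p, filter_grades.get(d, 0.0)) for d, p in retrieved.items()}


def layer(result, cuts):
    """Divide documents into layers of decreasing membership at the given cuts.

    Documents whose grade is below the lowest cut are left out.
    """
    remaining = sorted(result.items(), key=lambda item: item[1], reverse=True)
    layers = []
    for cut in sorted(cuts, reverse=True):
        layers.append([d for d, p in remaining if p >= cut])
        remaining = [(d, p) for d, p in remaining if p < cut]
    return layers


# Grades from retrieval for a keyword w (assumed values).
retrieved = {"d1": 1.0, "d2": 0.6, "d3": 0.6}

# Fuzzy filter "recent document" (assumed values).
recent = {"d1": 0.9, "d2": 1.0, "d3": 0.2}

filtered = apply_filter(retrieved, recent)
layers = layer(filtered, cuts=[0.8, 0.5])
print(filtered)
print(layers)
```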