Fuzzy Database - PowerPoint PPT Presentation


About This Presentation

Title:

Fuzzy Database



Slides: 29

Provided by: lee72


Transcript and Presenter's Notes

Title: Fuzzy Database

1
Fuzzy Database and Information Retrieval
2
(No Transcript)
3
Similarity relation defined for the domain Opinion
Query: Which sociologists are in considerable
agreement with Kass concerning policy Y?

Fuzzy Relational Database (Buckles and Petry)
Elements of the tuples contained in the relations
may be subsets of the domain universal set.
A similarity relation is defined on each domain
universal set.

4
Fuzzy Data Base

(Project (Select Assessment where Name = Kass and
Option = Y) over Opinion) giving R1

Relation R1
  Opinion
  favorable
Retrieve the opinion of Kass concerning option Y.

(Project (Select Expert where Field =
Sociologist) over Name) giving R2

Relation R2
  Name
  Osborn
  Schreiber
  Cohen
  Specterman
Select all sociologists from the table of experts.
5

(Project (Select (Join R2 and Assessment over
Name) where Option = Y) over Name, Opinion)
giving R3

Relation R3
  Name         Opinion
  Osborn       slightly favorable
  Schreiber    favorable
  Cohen        slightly negative
  Specterman   highly favorable
List the opinions of the sociologists.

(Join R3 and R1 over Opinion) with THRES
(Opinion) ≥ 0.75 and THRES (Name) ≥ 0

  Name                               Opinion
  {Osborn, Schreiber, Specterman}    {slightly favorable, favorable, highly favorable}
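
The threshold join in the last step can be sketched in Python. This is a simplified illustration, not the Buckles and Petry algorithm itself: it only compares each sociologist's opinion with Kass's opinion, and the similarity grades in SIM are hypothetical values chosen for the example (the similarity relation from the presentation is not in the transcript). The names similarity and threshold_join are likewise illustrative.

```python
# Hypothetical similarity grades on the Opinion domain (assumed values,
# not the ones used in the presentation). Unlisted pairs default to 0.0.
SIM = {
    ("slightly favorable", "favorable"): 0.8,
    ("favorable", "highly favorable"): 0.8,
    ("slightly favorable", "highly favorable"): 0.8,
}


def similarity(a, b):
    """Reflexive, symmetric lookup into the assumed similarity relation."""
    if a == b:
        return 1.0
    return SIM.get((a, b), SIM.get((b, a), 0.0))


# R1: the opinion of Kass concerning option Y.
R1 = ["favorable"]

# R3: the opinions of the sociologists concerning option Y.
R3 = [
    ("Osborn", "slightly favorable"),
    ("Schreiber", "favorable"),
    ("Cohen", "slightly negative"),
    ("Specterman", "highly favorable"),
]


def threshold_join(r3, r1, thres_opinion):
    """Keep tuples whose opinion is similar enough to some opinion in r1,
    then merge the survivors into one tuple with set-valued elements."""
    names, opinions = [], []
    for name, opinion in r3:
        if any(similarity(opinion, target) >= thres_opinion for target in r1):
            names.append(name)
            opinions.append(opinion)
    return names, opinions


names, opinions = threshold_join(R3, R1, 0.75)
print(names)     # ['Osborn', 'Schreiber', 'Specterman']
print(opinions)  # ['slightly favorable', 'favorable', 'highly favorable']
```

With THRES(Name) ≥ 0 every name may be merged into the same tuple, so only the Opinion threshold filters the result; Cohen is excluded because "slightly negative" has similarity 0 with "favorable" under the assumed grades.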
6
Information Retrieval

The IR process can be viewed as a knowledge
communication process, which involves learning
and problem-solving strategies.
An IR system controls the knowledge flow between
documents and the user.
An IR system is a computer system that allows
users to retrieve information from a document
collection stored in a database.
Classical IR is built on a Boolean algebra
framework.

7
Fuzzy Information Retrieval

Fuzzy relation for the grade of relevance between
index terms and documents:
$R: X \times Y \to [0, 1]$
Determined subjectively or objectively
Number of occurrences, publication dates,
document types
Fuzzy thesaurus:
$T: X \times X \to [0, 1]$
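
One common way to use the two relations together is max-min composition: a query (a fuzzy set of index terms) is first expanded through the thesaurus T and then matched against the relevance relation R. The sketch below assumes that model; the transcript does not spell it out, and the function name max_min_compose and all terms, documents, and grades are hypothetical.

```python
def max_min_compose(fuzzy_set, relation):
    """Compose a fuzzy set {x: grade} with a fuzzy relation {(x, y): grade}
    using max-min composition; returns a fuzzy set {y: grade}."""
    result = {}
    for (x, y), grade in relation.items():
        value = min(fuzzy_set.get(x, 0.0), grade)
        if value > result.get(y, 0.0):
            result[y] = value
    return result


# Hypothetical fuzzy thesaurus T on index terms (X x X).
T = {
    ("fuzzy", "fuzzy"): 1.0,
    ("fuzzy", "vague"): 0.7,
    ("retrieval", "retrieval"): 1.0,
    ("retrieval", "search"): 0.6,
}

# Hypothetical relevance relation R between index terms and documents (X x Y).
R = {
    ("fuzzy", "doc1"): 0.9,
    ("retrieval", "doc1"): 0.4,
    ("vague", "doc2"): 0.8,
    ("search", "doc3"): 0.5,
}

query = {"fuzzy": 1.0, "retrieval": 1.0}

augmented = max_min_compose(query, T)      # query expanded through T
relevance = max_min_compose(augmented, R)  # grades of the documents

print(augmented)
print(relevance)
```

Documents reached only through thesaurus terms (doc2, doc3) still receive a grade, but one bounded by the thesaurus association, which is the flexibility a crisp Boolean query lacks.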

8
Fuzzy Information Retrieval

Information retrieval model
Advantages
R and T are more expressive and their
construction is more realistic
Fuzzy inquiry provides greater flexibility

9
Information retrieval based on fuzzy associations

Introduction
Three components in information retrieval
Fuzziness in a thesaurus (first component)
Fuzziness in retrieval (second component)
Fuzziness on output (third component)
Classification of output
Conclusion

10
(No Transcript)
11
(No Transcript)
12

Three components in information retrieval
Let $D = \{d_1, d_2, \ldots, d_n\}$ be a finite set of
documents for retrieval.
Let $W = \{w_1, w_2, \ldots, w_m\}$ denote a set of descriptors.
$T: D \to \{0, 1\}^W$. $T(d)$ is the subset of
descriptors in $W$ indexed to the document $d$.
$U$ ($U = T^{-1}$). $U(w)$ is the set of documents that have keyword $w$.

Information retrieval based on fuzzy associations
(Diagram not transcribed; surviving labels: r, F, U, P, r, q)
13

Fuzziness in a thesaurus (first component)
Three types of thesaurus relation (each represented as a binary
relation):
RT: related terms
NT: narrower terms
BT: broader terms
$B(v, w) = N(w, v)$, $R(v, w) = R(w, v)$
Method of automatic generation of thesauri
Typical: counting frequencies of simultaneous
occurrences of pairs of keywords in a set of
documents.
Fuzzy set model
Let $C = \{c_1, c_2, \ldots, c_p\}$ be a finite set of concepts,
where each $c_i$, $i = 1, \ldots, p$, represents a unit of
concept.
$H: W \to [0, 1]^p$ is a fuzzy-set-valued function which
maps each keyword to its corresponding concepts
as a fuzzy set in $C$.

$H(w)$ is the concept of the word $w$.
14
(No Transcript)
15
Even with present-day computers, it is difficult to
calculate the values of the fuzzy relation above
using arrays in a straightforward way, since the
numbers of elements in $W$ and $D$ are very large
($10^3 \times 10^5$). Although techniques to handle sparse
matrices may be applied, there is another method
for generating $R$ and $N$ based on manipulation of
sequential files. The principal tool for this is
sorting.
$(a, b, c)$ denotes a record whose fields
are $a$, $b$, and $c$. $\{(a, b, c)\}$ denotes a set of
such records.
Input: a set $D$ of documents.
Each document $d \in D$ has a number of keywords in
$W$. A keyword may occur twice or more in a
document. The frequency of occurrence of $w_i$ in $d_k$
is denoted by $h_{ik}$.
Output: a set of records
$\{(w_i, w_j, R(w_i, w_j))\}$ for all pairs with $R(w_i, w_j) \ne 0$.
16
Algorithm GFT (generation of a fuzzy thesaurus).
// Find pairs of keywords in every document. //
for all $d_k \in D$ do
    find all keywords $w_i \in W$ and calculate $h_{ik}$
    for all $(w_i, w_j)$, $w_i < w_j$, that are found in $d_k$ do
        make record $(w_i, w_j, \min(h_{ik}, h_{jk}))$
        output $(w_i, w_j, \min(h_{ik}, h_{jk}))$ to WORK1
    repeat
    for all $w_i$ that are found in $d_k$ do
        make record $(w_i, h_{ik})$
        output $(w_i, h_{ik})$ to WORK2
    repeat
repeat
// Sort WORK1 and WORK2. //
sort WORK1 into increasing order of the key $(w_i, w_j)$
sort WORK2 into increasing order of the key $w_i$
17
// Calculate R. Scan WORK1 and WORK2. //
for all $(w_i, w_j)$ in WORK1 do
    find all records for $(w_i, w_j)$ in WORK1 and all records for $w_i$ and $w_j$ in WORK2
    $R(w_i, w_j) = \dfrac{\sum_k \min(h_{ik}, h_{jk})}{\sum_k h_{ik} + \sum_k h_{jk} - \sum_k \min(h_{ik}, h_{jk})}$
    output $(w_i, w_j, R(w_i, w_j))$ to an output file
repeat
end of algorithm GFT
In a foregoing paper, an experimental calculation on
three thousand documents and thirty thousand
keywords was carried out using GFT. The sort-based
method required a reasonable 800 s of
CPU time.
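
Algorithm GFT can be sketched in Python as follows. This is a small in-memory illustration under stated assumptions: the slides describe sequential work files, which are replaced here by sorted lists, and the function name generate_fuzzy_thesaurus and the sample documents are hypothetical.

```python
from collections import Counter
from itertools import combinations, groupby


def generate_fuzzy_thesaurus(documents):
    """documents: list of keyword lists (a keyword may repeat in a document).
    Returns {(w_i, w_j): R(w_i, w_j)} for co-occurring pairs with w_i < w_j."""
    work1 = []  # records (w_i, w_j, min(h_ik, h_jk))
    work2 = []  # records (w_i, h_ik)
    for keywords in documents:
        freq = Counter(keywords)  # h_ik for the current document d_k
        for wi, wj in combinations(sorted(freq), 2):
            work1.append((wi, wj, min(freq[wi], freq[wj])))
        for wi, h in freq.items():
            work2.append((wi, h))

    # Sorting stands in for sorting the sequential work files.
    work1.sort()
    work2.sort()

    # Total frequency sum_k h_ik for every keyword, from WORK2.
    totals = {
        w: sum(h for _, h in group)
        for w, group in groupby(work2, key=lambda rec: rec[0])
    }

    thesaurus = {}
    for (wi, wj), group in groupby(work1, key=lambda rec: (rec[0], rec[1])):
        shared = sum(m for _, _, m in group)  # sum_k min(h_ik, h_jk)
        thesaurus[(wi, wj)] = shared / (totals[wi] + totals[wj] - shared)
    return thesaurus


docs = [
    ["fuzzy", "fuzzy", "retrieval"],
    ["fuzzy", "thesaurus"],
    ["retrieval", "thesaurus", "thesaurus"],
]
thesaurus = generate_fuzzy_thesaurus(docs)
for pair, grade in sorted(thesaurus.items()):
    print(pair, grade)
```

Only pairs that co-occur in at least one document ever reach WORK1, so the output contains exactly the nonzero entries of R and no dense keyword-by-keyword array is needed.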
18
// record $(d_i, p_i)$ before another record $(d_j, p_j)$ satisfies either $d_i < d_j$ or //
// $d_i = d_j$, $p_i > p_j$ //
take the first record $(d_1, p_1)$ in WORK
$(D, P) \leftarrow (d_1, p_1)$
for all $d_j$ in WORK do   // the $d_j$'s are sequentially examined //
    if $D \ne d_j$ then
        output $(D, P)$ to an output file OUT
        $(D, P) \leftarrow (d_j, p_j)$
    endif
repeat
output $(D, P)$ to OUT
// OUT contains exactly those records that represent $P = U_F(d, w)$ defined above //
19
// Third step: if necessary, sort again. //
sort OUT into decreasing order of the key $p$
print OUT
20

Fuzziness in retrieval (second component)
For the crisp case, a retrieval through a
thesaurus given a keyword $w$ is as follows.
Examine the thesaurus $F$ and find all associated
terms $v_{11}, v_{12}, \ldots, v_{1p}$.
Find the subsets $U(v_{11}), U(v_{12}), \ldots, U(v_{1p})$.
Establish the retrieved set of documents as the
union $\bigcup_{1 \le i \le p} U(v_{1i})$.
$U_F(d, w) = 1$ iff $d \in U(v_1)$ for some $v_1$ such that
$F(v_1, w) = 1$,
and $U_F(d, w) = 0$ otherwise.
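
The crisp procedure above can be sketched in a few lines of Python. The representation is an assumption made for illustration: the thesaurus F is a set of (v, w) pairs with F(v, w) = 1, the index U maps each keyword to its set of documents, and the name crisp_retrieve and the sample data are hypothetical.

```python
def crisp_retrieve(keyword, thesaurus, index):
    """Return the union of U(v) over all terms v associated with keyword.

    thesaurus: set of (v, w) pairs meaning F(v, w) = 1
    index:     {v: set of documents that have keyword v}, i.e. U(v)
    """
    associated = [v for (v, w) in thesaurus if w == keyword]
    retrieved = set()
    for v in associated:
        retrieved |= index.get(v, set())
    return retrieved


F = {
    ("database", "database"),
    ("relation", "database"),
    ("retrieval", "retrieval"),
}
U = {
    "database": {"d1", "d2"},
    "relation": {"d3"},
    "retrieval": {"d4"},
}

result = crisp_retrieve("database", F, U)
print(sorted(result))  # ['d1', 'd2', 'd3']
```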

21
When the thesaurus $F$ is fuzzy and $U$ is crisp,
$U_F(d, w) = \max_{v \in W} \min[U(d, v), F(v, w)]$.
This equation is valid also for a fuzzy relation
$U(d, v)$.
Algorithm FR (Fuzzy Retrieval).
// First step: find all records. //
for all $v$ such that $F(v, w) \ne 0$ in FT do
    for all $d \in U(v)$ do
        $p(d, v) \leftarrow \min[U(d, v), F(v, w)]$
        output record $(d, p(d, v))$ to a work file WORK
    repeat
repeat
// Second step: find values of $U_F$. //
sort WORK into increasing order of the
first key $d$ and into decreasing order of the
second key $p$
// the above sorting means that in
the resulting sequence, a //
22

End-of FR
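
Algorithm FR, whose steps are spread over several slides, can be sketched in Python as one function. This is an in-memory illustration under stated assumptions: the work file WORK becomes a list, the thesaurus is a dictionary {(v, w): F(v, w)}, the index is {v: {d: U(d, v)}}, and the name fuzzy_retrieve and the sample data are hypothetical.

```python
def fuzzy_retrieve(keyword, thesaurus, index):
    """Return [(d, U_F(d, w))] sorted by decreasing grade, where
    U_F(d, w) = max over v of min(U(d, v), F(v, w))."""
    # First step: one record (d, p) per associated term v and document d.
    work = []
    for (v, w), f in thesaurus.items():
        if w != keyword or f == 0:
            continue
        for d, u in index.get(v, {}).items():
            work.append((d, min(u, f)))

    # Second step: sort by d ascending, then p descending, and keep the
    # first record of each document, which carries the maximum p.
    work.sort(key=lambda rec: (rec[0], -rec[1]))
    out = []
    for d, p in work:
        if not out or out[-1][0] != d:
            out.append((d, p))

    # Third step: order the output by decreasing grade.
    out.sort(key=lambda rec: -rec[1])
    return out


F = {
    ("fuzzy", "fuzzy"): 1.0,
    ("vague", "fuzzy"): 0.7,
    ("imprecise", "fuzzy"): 0.4,
}
# Crisp index: every listed document has grade 1.0 for its keyword.
U = {
    "fuzzy": {"d1": 1.0},
    "vague": {"d1": 1.0, "d2": 1.0},
    "imprecise": {"d3": 1.0},
}

ranked = fuzzy_retrieve("fuzzy", F, U)
print(ranked)  # [('d1', 1.0), ('d2', 0.7), ('d3', 0.4)]
```

Taking the first record per document after the two-key sort is what replaces the explicit max in the formula for U_F, so the whole computation needs only sorting and one sequential scan.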
Fuzziness on output (third component)
Fuzzy filter. Examples:
(a) Find recent documents that have keyword w.
(b) Find documents that have keyword w and are
relevant to one's field of interest.
Classification of output
Decreasing order of membership
Divide into layers
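
Both ways of classifying the output can be illustrated with a short Python sketch. The layering rule used here (cut levels chosen by the user, each document placed in the first layer whose level it reaches) is an assumption, since the transcript does not say how layers are formed; divide_into_layers and the sample grades are hypothetical.

```python
def divide_into_layers(results, thresholds):
    """results: [(document, grade)]; thresholds: cut levels in decreasing order.
    Returns one list of documents per threshold, each in decreasing grade order.
    Documents below the last threshold are dropped."""
    ordered = sorted(results, key=lambda rec: -rec[1])
    layers = [[] for _ in thresholds]
    for doc, grade in ordered:
        for i, level in enumerate(thresholds):
            if grade >= level:
                layers[i].append(doc)
                break
    return layers


results = [("d3", 0.4), ("d1", 1.0), ("d2", 0.7), ("d4", 0.9)]
layers = divide_into_layers(results, [0.8, 0.5, 0.0])
print(layers)  # [['d1', 'd4'], ['d2'], ['d3']]
```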
Conclusion
Problems for further studies
Discussion of crisp techniques of advanced
indexing and retrieval using a fuzzy set model.
Studies of efficient algorithms for large-scale
databases. In particular, development of hardware
for information retrieval should be taken into
account.
Application of methods in fuzzy information
retrieval to related areas.
