Title: Extending databases to precision-controlled retrieval of qualitative information
1Extending databases to precision-controlled
retrieval of qualitative information
- Victor Polo de Gyves,1 Adolfo Guzman1,2, and
Serguei Levachkine2 - (1) SoftwarePro International,
- (2) Centro de Investigación en Computación,
- a.guzman_at_acm.org
2Qualitative variable
- A variable able to take a symbolic value
- A symbolic value is a set if can be considered E
its name or description - Palewhite, yellow, orange, beige
3Hierarchy
- For an element set E, a hierarchy H is another
set of elements where any element ei is a
symbolic value representing - An element of E or
- A Partition. Partition K is a partition of set S
if it is both a covering for S and an exclusive
set. The members of K are mutually exclusive and
collectively exhaust S. Each element of S is in
exactly one Kj. - And the union of every elements by represented by
ei is E. Example
4Confusion
- If I ask for an animal and a snake is given, Is
there a mistake? - I ask for a plant and an animal is given, which
is the error value? can this error be measured? - Measuring the error in using a qualitative value
r instead of another qualitative value s is
defined as follows1 - conf (r, r) conf (r, parent of r) 0
- conf (r, s) 1 conf (r, parent of s))
- The confusion in using r instead of s (the
desired value) is the number of descending links
from r ? s. - Conf(r,s) is not a distance, nor ultradistance
and it is not symmetric.
1 A. Guzman-Arenas and S. Levachkine. Hierarchies
Measuring Qualitative Variables. Lecture Notes
in Computer Science LNCS 2945 (Computational
Linguistics and Intelligent Text Processing),
(Springer-Verlag 2004). 262-274. ISSN 0372-9743.
5live_being
animal
plant
mammal
bird
snake
citric
pine
cat
dog
lemon
orange
conf (cat, mammal) 0 (If I am using a cat
instead of a mammal) conf (cat, animal) 0
conf (mammal, cat) 1 conf (cat,
dog) 1 conf (cat, bird) 1 conf (cat, lemon)
3
6Predicates with controlled confusion
- P is true? for x if1
- The object x satisfies predicate P with confusion
? if and only if - P is true for x when P does not contains
hierarchical variables - When pr is a hierarchical variable and P is of
the form (pr c), if and only if for the value v
of the property pr for the object x, v? c (if
the value v can be used instead of c with
confusion ?) - When P P1? P2, if and only if P1 is satisfied?
by x, or P2 is satisfied? by x. - When P P1 ? P2, if and only if P1 is satisfied?
by x and P2 is satisfied? by x. - When P P1, if and only if P1 is not satisfied?
by x.
1A. Guzman-Arenas and S. Levachkine. Graduated
errors in approximate queries using hierarchies
and ordered sets. Lecture Notes in Artificial
Intelligence LNAI 2972, (Springer-Verlag 2004).
139-148. ISSN 0302-9743
7(Ann (lives_in USA) (pet snake)), (Bill (lives_in
English_Speaking_Island) (pet citric)), (Fred
(lives_in USA) (pet cat)), (Tom (lives_in
Mexico)(pet cat)), (Sam (lives_in Cuba) (pet
pine)), (Pedro (lives_in Mexico)(pet dog)).
Predicate P (lives_inUSA) ?(petcat) P is
true? for E0 Fred E1 Fred, Tom , Pedro E2
Fred, Tom, Pedro, Ann E3 Fred, Tom, Pedro, Ann,
Sam, Bill
H1
North_America
Caribbean_Islands
Central_America
live_being
Canada
USA
Mexico
Guatemala
Costa_Rica
animal
plant
Honduras
English_speaking_islands
Spanish_speaking_islands
mammal
bird
snake
citric
pine
Cuba
Puerto Rico
Jamaica
cat
dog
lemon
orange
8Implementing conf(r, s) for databases
- Looking for the author of an assasination
- Witness 1 Witness 2 Witness 3
- - Red car Elegant car - Small car
- - Tall normal height
- - Scar in nose - Scar in the face
- Confusion can sort a database with suspects
first, the most probably ones and at the end the
least probably authors of the murdering. - An enterprise manager wishes to make a special
discount to their customers interested in office
products. - Given a document d on a digital library, sort
any documents in function of the subject
similarity with d. The most similar documents
will be closer. - Confusion delimitates objects from a database in
agreement with the closeness or similarity of
their properties.
9Example 1. (address california)1
confusion_at_/tesis/jdbc/jdbc/jars java
-classpath CLASSPATHpostgresql.jar Conexion
"select customers.name, customers.address,
conf(customers.address)from customers where
conf(customers.address,'california')lt1 order by
conf(customers.address)" ------------- SQL
traducido select customers.name,
customers.address, confusion.customers_address_nor
m."california"from customers_address_norm,
customers where ( confusion.customers_address_norm
.nombre customers.address AND
confusion.customers_address_norm."california"lt1
) order by confusion.customers_address_norm."calif
ornia" ------------- NAME ADDRESS
CALIFORNIA Tom's Hamburgers pasadena
0 Microsol silicon valley 0 East coast
meat florida 1 Media Tools new york
1 Texas fruits texas
1 confusion_at_/tesis/jdbc/jdbc/jars
Extended SQL (user)
Pure SQL (machine)
Result
10Example 2. Update (industrial branch, food)0
confusion_at_/tesis/jdbc/jdbc/jars java
-classpath CLASSPATHpostgresql.jar Conexion
"update customers set discount0.07 where
customers.name in conf(customers.industrial_branch
,'food')lt0" ------------- SQL traducido
update customers set discount0.07 where
customers.name in (select customers.name from
customers_industrial_branch_norm, customers where
( confusion.customers_industrial_branch_norm.nombr
e customers.industrial_branch AND
confusion.customers_industrial_branch_norm."food"lt
0 ) ) confusion_at_/tesis/jdbc/jdbc/jarspsql
conf conf select from customers
name industrial_branch address
discount ---------------------------------------
------------------------ Media Tools
computers new york 0
Garcia Productores tequila mexico
city 0 Microsol software
silicon valley 0 Tom's
Hamburgers food pasadena
0.07 East coast meat meat
florida 0.07 Luigi's italian food
italian food north america 0.07 Mole
Doña Rosa mexican food mexico
0.07 Texas fruits fruits
texas 0.07 Canada seeds
food canada 0.07
Extended SQL (user)
Pure SQL (machine)
Result
11Example 3 Options for a bachelor application. (
(profesion, informatica)2 ?(sistema, abierto
semestral)2 )3
confusion_at_dlguzman1 jars java -classpath
CLASSPATHpostgresql.jar Conexion "select
profesiones., conf(profesiones.profesion),conf(pr
ofesiones.sistema)from profesiones where
conf(profesiones.profesion,'informatica')lt2 AND
conf(profesiones.sistema,'abierto semestral')lt2
and conf(profesiones.sistema)conf(profesiones.pro
fesion)lt3 order by conf(profesiones.profesion)co
nf(profesiones.sistema)" ------------- SQL
traducido select profesiones.,
confusion.profesiones_profesion_norm."informatica"
,confusion.profesiones_sistema_norm."abierto
semestral"from profesiones_sistema_norm,
profesiones_profesion_norm, profesiones where (
confusion.profesiones_profesion_norm.nombre
profesiones.profesion AND confusion.profesiones_pr
ofesion_norm."informatica"lt2 ) AND (
confusion.profesiones_sistema_norm.nombre
profesiones.sistema AND confusion.profesiones_sist
ema_norm."abierto semestral"lt2 ) and
confusion.profesiones_sistema_norm."abierto
semestral"confusion.profesiones_profesion_norm."i
nformatica"lt3 order by confusion.profesiones_prof
esion_norm."informatica"confusion.profesiones_sis
tema_norm."abierto semestral"
Extended SQL
Pure SQL
12ESCUELA PROFESION DISPONIBLE SISTEMA
INFORMATICA ABIERTO SEMESTRAL uam
informatica 5 abierto trimestral 0
1 unam informatica 30
escolarizado anual 0 2 ipn
informatica 20 escolarizado semestral 0
2 monterrey informatica 20
escolarizado semestral 0 2 ipn
computacion 30 escolarizado anual 1
2 unam computacion 30
escolarizado semestral 1 2 monterrey
computacion 10 escolarizado semestral 1
2 uam quimica 5 abierto
trimestral 2 1 Finalizando... Fin. co
nfusion_at_dlguzman1 jars
Result
13The method to create confusion tables
A
c) Calculate and create the confusion table
associated to each attribute of the entity
A alimento, nutritivo, basura, fruta,
carne AxA A2 a,a,a,n,a,b,a,f,a,c
, n,a,n,n,n,b,n,f,n,c,
b,a,b,n,b,b,b,f,b,c,
f,a,f,n,f,b,f,f,f,c,
c,a,c,n,c,b,c,f,c,c
conf(A2) conf(a,a), conf(a,n),
conf(a,b), ..., conf(c,c)
conf(A2) 0, 1, 1, 2, 2, 0, 0, 1, 1, 1, 0,
1, 0, 2, 2, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0,
conf select from mascota_alimento_norm
nombre alimentonutritivobasurafrutaverdura -
------------------------------------------- a
limento 0 1 1 2
2 nutritivo 0 0 1 1
1 basura 0 1 0 2
2 fruta 0 0 1 0
1 verdura 0 0 1 1 0
14Steps to write predicates
- Step 1 Create predicate
- Step 2 Write the XSQL sentence
- Step 3 Execute the program
15Step 1. Writing the predicate
- I whish to find every baseball players living in
Xochimilco with confusion 1. - The predicate is
- (sport baseball) ? (address xochimilco)1
16Step 2. Writing XSQL
- The user looks the predicate and writes it in
XSQL - (sport baseball) ? (address xochimilco)1
- It is rewritten by the user as
- conf(sport, baseball)lt1 AND conf(address,
xochimilco)lt1
17Step 3. Converting to Pure SQL
- A program converts the expression in XSQL to pure
SQL, able to be executed in any database. - (sport baseball) ? (address xochimilco)1
- conf(sport, baseball)lt1 AND conf (address,
xochimilco)lt1 - The program converts it to
- (confusion.friends_sport_norm.nombre
friends.sport AND confusion.friends_sport_norm.ba
seball"lt1) - AND
- (confusion.friends_address_norm.nombre
friends.address AND confusion.friends_address_norm
."xochimilco"lt1)
18Confusion tables
- The previous expression refers to the tables
confusion.friends_address_norm - and confusion.friends_sport_norm
- Our program creates the tables just the first
time - Using the hierarchical tree and the method to
create confusion tables - Using these tables the query is more efficient
because there is no need to recalculate the
confusion values
19Conclusions
- Using hierarchies our program extends any
database to make it able to recover objects with
controlled precision - Allows to delete and update objects with
controlled precision - The controlled precision organizes the objects in
a smart way