Title: SOFT COMPUTING TECHNIQUES FOR STATISTICAL DATABASES
1SOFT COMPUTING TECHNIQUES FOR STATISTICAL
DATABASES
- Miroslav Hudec
- INFOSTAT Bratislava
- MSIS 2009
2Introduction
- Soft computing (by fuzzy logic)
- Database query (SQL - fuzzy)
- case study
- Data classification (usual - fuzzy)
- case study
- Conclusion
3Soft computing
The essential property of soft computing (SC) is that it softens hard computing (HC) techniques in order to cope with imprecision, ambiguity and uncertainty.
- HC uses two-valued logic (e.g. an element either satisfies the criterion or it does not).
- Fuzzy logic, as a part of SC, uses many-valued logic (e.g. an element can partly satisfy the criterion).
- Computing with words is inspired by the human capability to perform a wide variety of tasks without exact measurements and computations.
- Application: flexible database queries. Are they interesting for statistical information systems?
4Database queries (SQL)
two-valued logic
select * from Table where attribute_p > P and attribute_r < R
5SQL and fuzzy queries
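- Two-valued logic (SQL): logical operators and, or; 1 and 1 = 1, 0 and 1 = 0; one function each for the and and or operators.
- Many-valued logic (fuzzy): linguistic terms such as big, small, about; 0.7 and 0.358 = ? The minimum gives 0.358; the product gives 0.2506.
- For {0, 1} logic, the minimum and the product both reduce to the ordinary and operator.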
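The two candidate "and" functions named on this slide can be sketched in a few lines of Python. The function names and_min and and_product are illustrative choices, not taken from the presentation; the numbers are the ones shown on the slide.

```python
def and_min(a: float, b: float) -> float:
    """Fuzzy 'and' computed as the minimum of the two membership degrees."""
    return min(a, b)


def and_product(a: float, b: float) -> float:
    """Fuzzy 'and' computed as the product of the two membership degrees."""
    return a * b


# The example from the slide: 0.7 and 0.358
print(and_min(0.7, 0.358))                # 0.358
print(round(and_product(0.7, 0.358), 4))  # 0.2506

# On the crisp values {0, 1} both functions coincide with the ordinary 'and'.
for a in (0, 1):
    for b in (0, 1):
        assert and_min(a, b) == and_product(a, b) == (a and b)
```

The final loop checks the closing remark of the slide: restricted to 0 and 1, either function behaves exactly like the two-valued operator.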
6Case study
select district, roads, area from T where roads
is Big and area is Small
The length-of-roads indicator is represented by the
Big fuzzy set with the parameters Ld = 200 km and
Lp = 300 km. The Small fuzzy set with the parameters
Lp = 450 km² and Lg = 650 km² describes the
area-of-district attribute.
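A minimal Python sketch of how this query could be evaluated follows. It rests on several assumptions that the slide does not state: the fuzzy sets are taken to be linear between their two parameters, the and operator is taken to be the minimum, and the four districts are invented sample data.

```python
def big(x: float, ld: float, lp: float) -> float:
    """Degree of 'Big': 0 up to ld, 1 from lp on, linear in between (assumed shape)."""
    if x <= ld:
        return 0.0
    if x >= lp:
        return 1.0
    return (x - ld) / (lp - ld)


def small(x: float, lp: float, lg: float) -> float:
    """Degree of 'Small': 1 up to lp, 0 from lg on, linear in between (assumed shape)."""
    if x <= lp:
        return 1.0
    if x >= lg:
        return 0.0
    return (lg - x) / (lg - lp)


# Hypothetical districts: (name, roads in km, area in km²)
districts = [("D1", 320, 400), ("D2", 250, 500), ("D3", 280, 700), ("D4", 150, 300)]


def fuzzy_select(rows):
    """Return (district, degree) for rows that satisfy the query at least partly."""
    result = []
    for name, roads, area in rows:
        degree = min(big(roads, 200, 300), small(area, 450, 650))  # 'and' as minimum
        if degree > 0:
            result.append((name, degree))
    return sorted(result, key=lambda item: item[1], reverse=True)


print(fuzzy_select(districts))  # [('D1', 1.0), ('D2', 0.5)]
```

With these sample values, a crisp condition such as roads >= 300 and area <= 450 would return only D1; the fuzzy query also reports D2, which meets the criteria to degree 0.5.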
7Solution
If classical SQL were used, this additional valuable
information would remain hidden.
8Discussion
To obtain a very fine gradation, an infinite number
of SQL queries would have to be used. With fuzzy
queries, one query is sufficient.
The advantages of this approach for users are as follows:
- the connection to the database (connection string) and the data access (SQL command) do not have to be modified
- users do not need to learn a new query language
- the interface supports (quasi) natural language
- the obtained data are presented in a similar way as data from SQL, but with additional valuable information
- users see data "around the corner" (colored areas in the table) and can take potentially interesting data into account.
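The first advantage in the list above (an unmodified connection and ordinary SQL commands) can be illustrated with a small client-side sketch. This is not the author's implementation: it assumes an in-memory SQLite database, invented district names and values, linear fuzzy sets with the parameters from the case study, and the minimum as the and operator.

```python
import sqlite3

# Ordinary relational table; nothing fuzzy is stored in the database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (district TEXT, roads REAL, area REAL)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [("North", 310, 430), ("East", 260, 600), ("South", 190, 400), ("West", 290, 660)],
)

LD, LP_ROADS = 200.0, 300.0  # parameters of Big (roads, km)
LP_AREA, LG = 450.0, 650.0   # parameters of Small (area, km²)

# Step 1: a plain SQL command fetches every row with a non-zero degree.
rows = conn.execute(
    "SELECT district, roads, area FROM T WHERE roads > ? AND area < ?",
    (LD, LG),
).fetchall()


# Step 2: the membership degrees are computed on the client side.
def degree(roads: float, area: float) -> float:
    big = min(1.0, (roads - LD) / (LP_ROADS - LD))
    small = min(1.0, (LG - area) / (LG - LP_AREA))
    return min(big, small)  # 'and' as minimum (assumption)


result = sorted(((d, degree(r, a)) for d, r, a in rows), key=lambda t: t[1], reverse=True)
print(result)  # [('North', 1.0), ('East', 0.25)]
```

The database sees only a standard SELECT statement; the linguistic terms live entirely in the client layer.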
9Data classification
two-valued logic
How to solve this problem without additional
calculation?
Approximate reasoning and fuzzy logic
10Data classification
many-valued logic
classify_into classCx select attributes from
tables, views
The same GLC
11Case study
In this case study, municipalities are
classified according to the percentage of need
for winter road maintenance.
This example contains the following fuzzy rules:
- If Road is Small and Snow is Small Then Maintenance is Small
- If Road is Small and Snow is Big Then Maintenance is Medium
- If Road is Big and Snow is Small Then Maintenance is Medium
- If Road is Big and Snow is Big Then Maintenance is Big
(0.1)
(0.5)
(0.9)
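One possible reading of the four rules and the three values listed after them is sketched below. It assumes that 0.1, 0.5 and 0.9 are the representative outputs of Small, Medium and Big maintenance, that and is the minimum, and that the rule outputs are combined by a weighted average. None of these choices is confirmed by the slide, and the input degrees are invented.

```python
# Assumed mapping of the three values from the slide to the output classes.
OUTPUT = {"Small": 0.1, "Medium": 0.5, "Big": 0.9}

# (Road term, Snow term, Maintenance term), in the order given on the slide.
RULES = [
    ("Small", "Small", "Small"),
    ("Small", "Big", "Medium"),
    ("Big", "Small", "Medium"),
    ("Big", "Big", "Big"),
]


def maintenance(road: dict, snow: dict) -> float:
    """Weighted average of the rule outputs, weighted by rule firing strength."""
    numerator = 0.0
    denominator = 0.0
    for road_term, snow_term, out_term in RULES:
        strength = min(road[road_term], snow[snow_term])  # 'and' as minimum
        numerator += strength * OUTPUT[out_term]
        denominator += strength
    return numerator / denominator if denominator else 0.0


# Hypothetical municipality: roads mostly Big, snow halfway between Small and Big.
road = {"Small": 0.25, "Big": 0.75}
snow = {"Small": 0.5, "Big": 0.5}
print(round(maintenance(road, snow), 3))  # 0.567
```

A municipality that is fully Big in both attributes gets 0.9 under this reading, and one that is fully Small in both gets 0.1.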
12Case study
classify_into S select * from Table where roads is Small and snow is Small
classify_into M select * from Table where (roads is Small and snow is Big) or (roads is Big and snow is Small)
classify_into B select * from Table where roads is Big and snow is Big
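The three classification queries above can be sketched as a single Python function that returns the degree of membership in each class. The sketch assumes the minimum for and and the maximum for or, a common pairing that the slides do not confirm; the input degrees for the two municipalities are invented.

```python
def classify(road_small: float, road_big: float,
             snow_small: float, snow_big: float) -> dict:
    """Degree of membership in the classes S, M and B.

    'and' is the minimum and 'or' is the maximum (both assumptions).
    """
    return {
        "S": min(road_small, snow_small),
        "M": max(min(road_small, snow_big), min(road_big, snow_small)),
        "B": min(road_big, snow_big),
    }


# Hypothetical municipality that is clearly Small in both attributes.
print(classify(1.0, 0.0, 1.0, 0.0))    # {'S': 1.0, 'M': 0.0, 'B': 0.0}

# Hypothetical municipality between the classes: roads mostly Big, snow fully Big.
print(classify(0.25, 0.75, 0.0, 1.0))  # {'S': 0.0, 'M': 0.25, 'B': 0.75}
```

The second municipality belongs to class B with degree 0.75 and to class M with degree 0.25, which a crisp classification would collapse into a single class.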
13Case study
If classical classification were used, this
additional valuable information would remain
hidden (Softer classification between objects
T1-T4).
14Implementation
15SQL and fuzzy approach
SQL queries are useful when a clean and exact
boundary between selected and non-selected data
is required (faster, fewer calculations). Fuzzy
queries provide flexibility in the definition
of the query and include records that almost
meet the query criterion (more operations, more
information). The user decides which type of query
is better for each task.
16Conclusion
This approach allows users of statistical
information systems to use their approximate
reasoning when working with data. When users work
with conventional software tools, they have to translate
their many-valued logical thinking (approximate
reasoning) into two-valued computer logic.
The fuzzy approach supports work with
linguistic expressions on the client side,
yet it does not require any modification of
the relational database.
17Thank you for your attention