Title: Quality Driven Information Searching: A Tale of Two Projects
1. Quality Driven Information Searching: A Tale of Two Projects
Mikhaila Burgess, Department of Computer Science, Cardiff University
Supervisors: Prof. W.A. Gray, Prof. N.J. Fiddian
CONOISE, Tuesday 8th April 2003
2. Presentation Outline
- Overview of our work
- Quality Framework
- Experimental domain
- Dr Felix Naumann
- Quality driven query answering
- The HiQIQ MetaSearch Engine project
- Comparing our two projects
- References
3. Information Searching
- Knowledge of database query language
- Specificity of database queries
- Preference SQL still needs language knowledge
- Information overload in distributed information systems
- Different levels of quality
4. PhD Hypothesis
- It is possible to develop a fuller, flexible
quality framework, which can be used to
dynamically create user-defined quality filters
of varying granularity, incorporating multiple
quality metrics, which can then be used to focus
information searches onto relevant search
domains.
5. Quality Framework
- Three main categories: Cost, Utility, Time
- 60 metrics
- Stored with related information
- Too small to see here; a clearer version is at the end of the handout
6. Domain-Specific Quality Frameworks
- Using generic model as basis
- Contains:
- relevant generic quality metrics
- domain-specific quality metrics
- mapping information between available data and quality metrics
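As a rough illustration of the three ingredients listed above, a domain-specific framework could be held in a small data structure. This is only a sketch under assumptions: the class names, the example metrics, and the data field names are hypothetical and are not taken from the actual framework or toolkit.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Metric:
    name: str
    category: str                 # generic top-level category: Cost, Utility or Time
    domain_specific: bool = False


@dataclass
class DomainFramework:
    domain: str
    metrics: dict = field(default_factory=dict)   # metric name -> Metric
    # Mapping information: metric name -> data field that supplies its value.
    mapping: dict = field(default_factory=dict)

    def add_metric(self, metric, data_field):
        self.metrics[metric.name] = metric
        self.mapping[metric.name] = data_field

    def evaluate(self, record):
        """Read each metric's value from a data record via the mapping."""
        return {
            name: record[data_field]
            for name, data_field in self.mapping.items()
            if data_field in record
        }


# Hypothetical metrics and field names, for illustration only.
framework = DomainFramework("university")
framework.add_metric(Metric("timeliness", "Time"), "last_updated_score")
framework.add_metric(
    Metric("teaching_quality", "Utility", domain_specific=True), "teaching_score"
)

values = framework.evaluate({"teaching_score": 21.5, "last_updated_score": 0.8})
print(values)
```

Keeping the mapping separate from the metric definitions means the same generic metric can be reused in another domain with a different data field behind it.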
7. University Framework Sample
Suitability sub-category of Utility
8. Selecting Desired Metrics
9. Current Experimental Domain
- University Data
- Teaching
- Research
- Funding
- Etc
- Data from Times and Guardian
- Find best institution based on weighted qualities
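One way to picture "best institution based on weighted qualities" is Simple Additive Weighting (SAW), which a later slide names as a ranking method. The sketch below is not the experimental system itself: the function name, the three criteria, the weights, and all figures are made up for illustration and are not Times or Guardian data.

```python
def saw_rank(scores, weights, lower_is_better=()):
    """Rank alternatives by Simple Additive Weighting.

    scores:  {alternative: {criterion: raw value}}
    weights: {criterion: weight}; normalised here so they sum to 1
    lower_is_better: criteria where a smaller raw value is preferable
    """
    total_weight = sum(weights.values())
    norm_w = {c: w / total_weight for c, w in weights.items()}

    # Min-max scale every criterion to [0, 1] so that units do not matter.
    scaled = {alt: {} for alt in scores}
    for c in weights:
        column = [scores[alt][c] for alt in scores]
        lo, hi = min(column), max(column)
        span = hi - lo
        for alt in scores:
            if span == 0:
                value = 1.0  # every alternative ties on this criterion
            elif c in lower_is_better:
                value = (hi - scores[alt][c]) / span
            else:
                value = (scores[alt][c] - lo) / span
            scaled[alt][c] = value

    # Weighted sum of the scaled values, best alternative first.
    totals = {
        alt: sum(norm_w[c] * scaled[alt][c] for c in weights) for alt in scores
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Fictional institutions and figures, for illustration only.
universities = {
    "Univ A": {"teaching": 22, "research": 5.0, "staff_student_ratio": 18},
    "Univ B": {"teaching": 20, "research": 6.0, "staff_student_ratio": 12},
    "Univ C": {"teaching": 24, "research": 4.0, "staff_student_ratio": 15},
}
user_weights = {"teaching": 5, "research": 3, "staff_student_ratio": 2}

ranking = saw_rank(
    universities, user_weights, lower_is_better={"staff_student_ratio"}
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because every criterion is scaled before weighting, only the relative values of the raw scores influence the ranking.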
10. Quality-Driven Query Answering for Integrated Information Systems
- PhD Thesis of Dr Felix Naumann
- satisfyingly answering user queries against a
given global schema of an integrated information
system.
11. Problem Addressed in Thesis
- Given a set of data sources, plus a user demand
- Identify best sources to answer the query
- Efficiently and effectively combine those sources into a best plan
- Integrate plan results in the best possible way
12. Information Quality
- an aggregated value of multiple IQ-criteria
- Set of IQ criteria: an assimilation of previous research in this area
- R. Wang and D. Strong (TDQM)
- T. Redman (information management)
- Chen et al (web query processing)
- plus others
13. IQ Criteria
- Content-related: properties relating to the actual data retrieved
- Accuracy, Completeness, Customer Support, Documentation, Interpretability, Relevancy, Value Added
- Technical: measure aspects determined by the software and hardware of source, network, user, etc.
- Availability, Latency, Price, Response Time, Security, Timeliness
- Intellectual: subjective aspects of the data source
- Believability, Objectivity, Reputation
- Instantiation-related: concerning presentation of the current instance of data
- Amount of Data, Representational Conciseness, Representational Consistency, Understandability, Verifiability
14. IQ-Criteria Selection
- Depends on four factors
- Application Domain
- Users
- Provider of Integrated System
- Assessment of Criteria: assigning numerical values (IQ-scores) to IQ-criteria
15. Information Quality Assessment
- Scores determined by user experience, personal views, etc.
- Scores determined by the process of querying, so they are not fixed; they vary between queries
- Scores determined by analysing the data itself
16. IQ Assessment Methods
17. Information Quality Assessment
- Absolute IQ-scores are not important, but rather their relative values
- Problems:
- Many IQ-criteria are subjective in nature
- Sources don't often publish useful quality metadata
- Difficult with large amounts of data
- Autonomous sources are subject to large changes in quality
18. Ranking Algorithms
- Simple Additive Weighting (SAW)
- Technique for Order Preference by Similarity to Ideal Solution (TOPSIS)
- Analytical Hierarchy Process (AHP)
- Elimination et Choice Translating Reality (ELECTRE)
- Data Envelopment Analysis (DEA)
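To give a feel for one of the methods listed, here is a minimal sketch of TOPSIS in its common textbook form: vector-normalise each criterion, apply weights, then score each alternative by its relative closeness to an ideal solution. The function name, the two criteria, the source names, and the numbers are assumptions for illustration; none of them come from either project.

```python
import math


def topsis(matrix, weights, benefit):
    """Return a closeness score in [0, 1] per row of `matrix` (higher is better).

    matrix:  list of rows, one per alternative, one column per criterion
    weights: one weight per criterion
    benefit: one bool per criterion; True if larger values are better
    """
    n_crit = len(weights)
    w_sum = sum(weights)

    # 1. Vector-normalise each column, then apply the normalised weights.
    norms = [
        math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
        for j in range(n_crit)
    ]
    weighted = [
        [(weights[j] / w_sum) * row[j] / norms[j] for j in range(n_crit)]
        for row in matrix
    ]

    # 2. Ideal and negative-ideal solutions, criterion by criterion.
    columns = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]

    # 3. Relative closeness to the ideal: d_worst / (d_ideal + d_worst).
    closeness = []
    for row in weighted:
        d_ideal = math.dist(row, ideal)
        d_worst = math.dist(row, worst)
        total = d_ideal + d_worst
        closeness.append(d_worst / total if total else 0.0)
    return closeness


# Hypothetical sources rated on completeness (higher is better)
# and response time in seconds (lower is better).
sources = ["source_1", "source_2", "source_3"]
ratings = [
    [0.9, 2.0],
    [0.6, 0.5],
    [0.9, 0.5],
]
closeness = topsis(ratings, weights=[2, 1], benefit=[True, False])
for name, score in sorted(zip(sources, closeness), key=lambda p: p[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Unlike a plain weighted sum, this scoring also takes into account how far an alternative lies from the worst observed values, not only how well it does overall.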
19. Potential Areas of Application
- Stock Information Systems
- Metasearch Engine
- Travelling Guides
- Distributed Molecular Biology Databases
- Current situation: only the first two suggestions have been developed
20. The HiQIQ Project
- HiQIQ - High Quality Information Querying
- Developing a MetaSearch Engine
- A mediator based architecture
- Each source is modelled as a view against the global schema of the mediator
- Each view is rated by a set of IQ criteria
- Aim is to find the optimal query result, based on
the IQ criteria
21. HiQIQ MetaSearch Engine Architecture
- Important features
- Wrappers
- Mediator
- User Interface
22. Main Differences
23. Current and Future Work
- Further development of Quality Toolkit
- Further development of Experimental System
- Ranking method: SAW ? TOPSIS
- Comparing metric weightings, metric values, and weightings and values combined
- Internet data: many sources
24. Summary
- Both projects born from similar problem
- Looking at problem from different perspective
- Complementary
25. Felix Naumann References
- Naumann, F. (2002) Quality-Driven Query Answering for Integrated Information Systems. PhD Thesis, Lecture Notes in Computer Science 2261, Springer-Verlag, Berlin.
- Naumann, F. (2001) From Databases to Information Systems: Information Quality Makes The Difference. Proc. Int'l Conf. on Information Quality (IQ 2001), MIT, Cambridge, USA.
- Naumann, F. (1998) Data Fusion and Data Quality. Proc. New Techniques and Technologies for Statistics Seminar (NTTS '98), Sorrento, Italy.
- Project home page: http://www.hiqiq.de/
26. Other References
- Burgess, M., Gray, W.A., Fiddian, N.J. (2002) Establishing a Taxonomy of Quality for Use in Information Filtering. Proc. 19th British National Conference on Databases (BNCOD 19), Sheffield, UK: 103-113, Springer LNCS 2405.
- Chen, Y., Zhu, Q., Wang, N. (1998) Query Processing with Quality Control in the World Wide Web. World Wide Web, 1(4):241-255.
- Redman, T.C. (1996) Data Quality for the Information Age. Artech House, Boston.
- Wang, R.Y., Strong, D.M. (1996) Beyond Accuracy: What Data Quality Means to Data Consumers. Journal of Management Information Systems, 12(4):5-34.
- Project URL: http://www.cs.cf.ac.uk/user/M.Burgess/phd/index.html