Quality Driven Information Searching: A Tale of Two Projects




1
Quality Driven Information Searching: A Tale of
Two Projects
Mikhaila Burgess, Department of Computer
Science, Cardiff University. Supervisors:
Prof. W.A. Gray, Prof. N.J. Fiddian
CONOISE, Tuesday 8th April 2003
2
Presentation Outline
  • Overview of our work
  • Quality Framework
  • Experimental domain
  • Dr Felix Naumann
  • Quality driven query answering
  • The HiQIQ MetaSearch Engine project
  • Comparing our two projects
  • References

3
Information Searching
  • Knowledge of database query language
  • Specificity of database queries
  • Preference SQL still needs language knowledge
  • Information overload in distributed information
    systems
  • Different levels of quality

4
PhD Hypothesis
  • It is possible to develop a fuller, flexible
    quality framework, which can be used to
    dynamically create user-defined quality filters
    of varying granularity, incorporating multiple
    quality metrics, which can then be used to focus
    information searches onto relevant search
    domains.

5
Quality Framework
  • Three main categories: Cost, Utility, Time
  • 60 metrics
  • Stored with related information
  • Too small to see here; clearer version at end of
    handout

6
Domain-Specific Quality Frameworks
  • Using generic model as basis
  • Contains:
      • relevant generic quality metrics
      • domain-specific quality metrics
      • mapping information between available data
        and quality metrics
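
As an illustration only, the three ingredients listed above could be held in a small structure like the Python sketch below. The class name, field names, metric names and data field names are all hypothetical; none of them is taken from the project.

```python
from dataclasses import dataclass, field


@dataclass
class DomainQualityFramework:
    """Hypothetical container for a domain-specific quality framework."""
    generic_metrics: set[str]   # metrics reused from the generic model
    domain_metrics: set[str]    # metrics that only make sense in this domain
    mapping: dict[str, str] = field(default_factory=dict)  # data field -> metric

    def metrics(self) -> set[str]:
        """All metrics available in this domain."""
        return self.generic_metrics | self.domain_metrics

    def map_field(self, data_field: str, metric: str) -> None:
        """Record that a field of the available data feeds a quality metric."""
        if metric not in self.metrics():
            raise ValueError(f"unknown metric: {metric}")
        self.mapping[data_field] = metric

    def unmapped_metrics(self) -> set[str]:
        """Metrics that no available data field currently supports."""
        return self.metrics() - set(self.mapping.values())


# Invented example values for a university domain.
university = DomainQualityFramework(
    generic_metrics={"timeliness", "cost"},
    domain_metrics={"teaching_quality", "research_rating"},
)
university.map_field("league_table_teaching_score", "teaching_quality")
university.map_field("publication_year", "timeliness")
```

Calling `university.unmapped_metrics()` then shows which metrics still lack supporting data, which is one way the mapping information might be used.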

7
University Framework Sample
Suitability sub-category of Utility
8
Selecting Desired Metrics
9
Current Experimental Domain
  • University Data
      • Teaching
      • Research
      • Funding
      • Etc
  • Data from Times and Guardian
  • Find best institution based on weighted qualities
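
Finding the best institution from weighted qualities can be sketched as a weighted average per institution. This is a minimal sketch under stated assumptions: the function name, institution names, quality names and all figures are invented, and the quality values are assumed to be already scaled to [0, 1].

```python
def rank_by_weighted_quality(scores, weights):
    """Return institution names ordered from best to worst weighted score.

    scores:  {institution: {quality: value in [0, 1]}}
    weights: {quality: non-negative user weight}
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")

    def weighted(name):
        # Weighted average of this institution's quality values.
        return sum(weights[q] * scores[name][q] for q in weights) / total

    return sorted(scores, key=weighted, reverse=True)


# Invented figures, not real league-table data.
scores = {
    "University A": {"teaching": 0.9, "research": 0.6, "funding": 0.5},
    "University B": {"teaching": 0.7, "research": 0.9, "funding": 0.6},
    "University C": {"teaching": 0.6, "research": 0.7, "funding": 0.9},
}
teaching_focused = {"teaching": 3, "research": 1, "funding": 1}
research_focused = {"teaching": 1, "research": 3, "funding": 1}

print(rank_by_weighted_quality(scores, teaching_focused))
print(rank_by_weighted_quality(scores, research_focused))
```

With the same data, the two weight profiles produce different orderings, which is the point of letting the user choose the weights.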

10
Quality-Driven Query Answering for Integrated
Information Systems
  • PhD Thesis of Dr Felix Naumann
  • satisfyingly answering user queries against a
    given global schema of an integrated information
    system.

11
Problem Addressed in Thesis
  • Given a set of data sources, plus a user demand:
      • Identify best sources to answer the query
      • Efficiently and effectively combine those
        sources into a best plan
      • Integrate plan results in best possible way

12
Information Quality
  • An aggregated value of multiple IQ-criteria
  • Set of IQ-criteria: an assimilation of previous
    research in this area
      • R. Wang and D. Strong (TDQM)
      • T. Redman (information management)
      • Chen et al (web query processing)
      • plus others

13
IQ Criteria
  • Content-related: properties relating to actual
    data retrieved
      • Accuracy, Completeness, Customer Support,
        Documentation, Interpretability, Relevancy,
        Value Added
  • Technical: measure aspects determined by software
    and hardware of source, network, user, etc
      • Availability, Latency, Price, Response Time,
        Security, Timeliness
  • Intellectual: subjective aspects of data source
      • Believability, Objectivity, Reputation
  • Instantiation-related: concerning presentation of
    current instance of data
      • Amount of Data, Representational Conciseness,
        Representational Consistency,
        Understandability, Verifiability

14
IQ-Criteria Selection
  • Depends on four factors
  • Application Domain
  • Users
  • Provider of Integrated System
  • Assessment of Criteria - assigning numerical
    values (IQ-scores) to IQ-criteria

15
Information Quality Assessment
  • Where scores are determined by user experience,
    personal views, etc
  • Determined by process of querying, so scores are
    not fixed; they vary between queries
  • Analysing the data itself
16
IQ Assessment Methods
17
Information Quality Assessment
  • Absolute IQ-scores not important, but rather
    their relative values
  • Problems:
      • Many IQ-criteria are subjective in nature
      • Sources don't often publish useful quality
        metadata
      • Difficult with large amounts of data
      • Autonomous sources are subject to large
        changes in quality
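
Since only relative values matter, raw scores in different units can be rescaled onto a common range before sources are compared. The sketch below uses min-max scaling, which is one common choice and is not stated in the slides to be the method used by either project; the function name, source names and figures are invented.

```python
def scale_scores(raw, lower_is_better=False):
    """Min-max scale raw scores for one criterion to [0, 1], where 1 is best.

    raw: {source: raw value}, in whatever unit the criterion uses.
    """
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:
        # All sources are equal on this criterion, so it cannot discriminate.
        return {source: 1.0 for source in raw}
    scaled = {source: (value - lo) / (hi - lo) for source, value in raw.items()}
    if lower_is_better:
        # Flip the scale for cost-like criteria such as response time.
        scaled = {source: 1.0 - s for source, s in scaled.items()}
    return scaled


# Invented response times, once in seconds and once in milliseconds.
in_seconds = {"source_1": 0.5, "source_2": 2.0, "source_3": 3.5}
in_millis = {"source_1": 500, "source_2": 2000, "source_3": 3500}

relative_s = scale_scores(in_seconds, lower_is_better=True)
relative_ms = scale_scores(in_millis, lower_is_better=True)
```

Both calls give the same relative scores, because changing the unit changes the absolute values but not how the sources compare with each other.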

18
Ranking Algorithms
  • Simple Additive Weighting (SAW)
  • Technique for Order Preference by Similarity to
    Ideal Solution (TOPSIS)
  • Analytic Hierarchy Process (AHP)
  • Elimination et Choice Translating Reality
    (ELECTRE)
  • Data Envelopment Analysis (DEA)
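
To make one of the listed methods concrete, here is a minimal pure-Python sketch of the standard TOPSIS steps: normalise and weight the scores, find the ideal and anti-ideal values per criterion, then score each alternative by its relative closeness to the ideal. It is a textbook-style illustration, not the implementation used in either project; the function signature, criteria and figures are assumptions.

```python
import math


def topsis(matrix, weights, benefit):
    """Return one closeness score in [0, 1] per alternative (higher is better).

    matrix:  one row per alternative, one column per criterion
    weights: one weight per criterion
    benefit: True where higher is better, False where lower is better
    """
    n_criteria = len(weights)
    # 1. Vector-normalise each column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
             for j in range(n_criteria)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_criteria)]
         for row in matrix]
    # 2. Ideal and anti-ideal value for each criterion.
    columns = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    # 3. Closeness = distance to worst / (distance to ideal + distance to worst).
    scores = []
    for row in v:
        d_ideal = math.dist(row, ideal)
        d_worst = math.dist(row, worst)
        total = d_ideal + d_worst
        scores.append(d_worst / total if total else 0.0)
    return scores


# Invented data: columns are completeness (higher is better)
# and response time in seconds (lower is better).
sources = [[0.9, 1.0], [0.9, 3.0], [0.5, 3.0]]
closeness = topsis(sources, weights=[0.5, 0.5], benefit=[True, False])
ranking = sorted(range(len(sources)), key=lambda i: closeness[i], reverse=True)
```

Here `ranking` holds the row indices of the sources from best to worst. SAW, by contrast, would simply sum the weighted, scaled scores per source.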

19
Potential Areas of Application
  • Stock Information Systems
  • Metasearch Engine
  • Travelling Guides
  • Distributed Molecular Biology Databases
  • Current situation: only the first two suggestions
    have been developed

20
The HiQIQ Project
  • HiQIQ - High Quality Information Querying
  • Developing a MetaSearch Engine
  • A mediator-based architecture
  • Each source is modelled as a view against the
    global schema of the mediator
  • Each view is rated by a set of IQ criteria
  • Aim is to find the optimal query result, based on
    the IQ criteria

21
HiQIQ MetaSearch Engine Architecture
  • Important features:
      • Wrappers
      • Mediator
      • User Interface

22
Main Differences
23
Current & Future Work
  • Further development of Quality Toolkit
  • Further development of Experimental System
  • Ranking Method: SAW ? TOPSIS
  • Comparing metric weightings, metric values, and
    weightings & values combined
  • Internet data: many sources

24
Summary
  • Both projects born from similar problem
  • Looking at problem from different perspective
  • Complementary

25
Felix Naumann References
  • Naumann, F. (2002) Quality-Driven Query
    Answering for Integrated Information Systems.
    PhD Thesis, Lecture Notes in Computer Science
    2261, Springer-Verlag, Berlin.
  • Naumann, F. (2001) From Databases to
    Information Systems: Information Quality Makes the
    Difference. Proc. Int'l Conf. on Information
    Quality (IQ 2001), MIT, Cambridge, USA.
  • Naumann, F. (1998) Data Fusion and Data
    Quality. Proc. New Techniques and Technologies
    for Statistics Seminar (NTTS '98), Sorrento,
    Italy.
  • Project home page: http://www.hiqiq.de/

26
Other References
  • Burgess, M., Gray, W.A., Fiddian, N.J. (2002)
    Establishing a Taxonomy of Quality for Use in
    Information Filtering. Proc. 19th British
    National Conference on Databases (BNCOD 19),
    Sheffield, UK: 103-113, Springer LNCS 2405.
  • Chen, Y., Zhu, Q., Wang, N. (1998) Query
    Processing with Quality Control in the World Wide
    Web. World Wide Web, 1(4):241-255.
  • Redman, T.C. (1996) Data Quality for the
    Information Age. Artech House, Boston.
  • Wang, R.Y., Strong, D.M. (1996) Beyond
    Accuracy: What Data Quality Means to Data
    Consumers. Journal of Management Information
    Systems, 12(4):5-34.
  • Project URL: http://www.cs.cf.ac.uk/user/M.Burgess/phd/index.html