Christoph F. Eick

Title: Database Clustering and Summary Generation Author: eick Last modified by: Christoph Eick Created Date: 11/6/1998 8:08:29 PM Document presentation format – PowerPoint PPT presentation

Slides: 25

Provided by: eic91

Learn more at: https://www2.cs.uh.edu

Transcript and Presenter's Notes

1
Christoph F. Eick's Areas of Interest

Knowledge Discovery in Data and Data Mining (KDD)
Expertise in developing and using data mining
techniques and tools --- mostly for structured
data collections (also started some work
concerning images)
Database Clustering / Generalizing Data Mining
Techniques for Databases
Preprocessing in KDD
Constructive Induction, Symbolic Regression, and
Genetic Programming
Agent-based Technologies
Ontologies and Semantic Brokering
The InfoSleuth Information Gathering System
Integration of Agent-based Technologies and
Knowledge Discovery/Data Mining
Knowledge-based Systems, Expert Systems, and
Knowledge Acquisition
Using Bayesian Technology to Assist Decision
Making (in Medicine and other domains)
Computerization of Medical Practice Guidelines
Genetic Programming and Evolutionary Techniques
Sound background in Data Models, Databases, and AI

2
Data Mining for the Health Sciences

Christoph F. Eick
www.cs.uh.edu/ceick/eick-uw.html
ceick_at_u.washington.edu
University of Houston
Organization
1. Health Care and Computer Science
2. Promising Technologies
2.1 KDD / Data Mining
2.2 Agent-based Systems
2.3 Shared Ontologies and Knowledge
Brokering
3. Summary and Conclusion

3
1. Health Care and Computer Science

Not too long ago (e.g. 1989):
Offline data / missing data / handwritten
reports
Computers that cannot talk to each other
Lack of standardization (Tower of Babel, too many
languages)
Human is frequently the gold standard
Today: faster computers, cheaper computers,
better computer networks, electronic scanners,
better connectivity, the internet, ...
We have a lot of computerized knowledge on almost
any aspect of human health (a well of knowledge)
We have much more computing power to conduct
complex data analysis tasks
New Problems
How can we find anything?
How do we gather information that is distributed
over various computer systems and represented
using different formats?
If we find something, how do we know that it is
complete?
How can this large amount of information be
analyzed?
What information can we trust?

4
Promising Newer Technologies to Cope with the
Information Flood

Knowledge Discovery and Data Mining (KDD)
Agent-based Technologies
Shared Ontologies and Knowledge Brokering
Non-traditional data analysis techniques
Structural Search and Indexing Techniques

5
Knowledge Discovery in Data and Data Mining
(KDD)
Let us find something interesting!

Definition: KDD is the non-trivial process of
identifying valid, novel, potentially useful, and
ultimately understandable patterns in data
(Fayyad)
Frequently, the term data mining is used to refer
to KDD.
Many commercial and experimental tools and tool
suites are available (see
http://www.kdnuggets.com/siftware.html)
Field is more dominated by industry than by
research institutions

6
What is KDD?

Definition: KDD is the non-trivial process of
identifying valid, novel, potentially useful, and
ultimately understandable patterns in data
(Fayyad)
The identified knowledge is used to:
make predictions
classify new examples
summarize the content of data collections and
documents to facilitate understanding and decision
making, and to support search and indexing
support graphical visualization to aid humans in
discovering deeper patterns
Example applications:
learn to classify brain tissue from examples
predict a patient's life expectancy from his
medical history
summarize/cluster/mine clinical trial reports

7
General KDD Steps
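[Figure: the KDD process as a pipeline]
Data sources -> (Select/preprocess) -> Selected/Preprocessed data
-> (Transform) -> Transformed data
-> (Data mine) -> Extracted information
-> (Interpret/Evaluate/Assimilate) -> Knowledge
Additional label in the figure: Data preparation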
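The steps named on this slide can be sketched as a chain of small functions. This is only an illustration and not part of the original slides: the patient records, the field names, and the age threshold of 65 are all made up for the example.

```python
from collections import Counter

# Hypothetical raw records (the "data sources"); None marks a missing value.
RAW = [
    {"age": 72, "hours_icu": 40, "outcome": "readmitted"},
    {"age": 35, "hours_icu": 5, "outcome": "ok"},
    {"age": None, "hours_icu": 12, "outcome": "ok"},
    {"age": 81, "hours_icu": 60, "outcome": "readmitted"},
    {"age": 29, "hours_icu": 3, "outcome": "ok"},
]


def select_preprocess(records):
    """Select/preprocess: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def transform(records):
    """Transform: discretize age into a coarse group (threshold is an assumption)."""
    return [
        {"age_group": "65+" if r["age"] >= 65 else "<65", "outcome": r["outcome"]}
        for r in records
    ]


def data_mine(records):
    """Data mine: count how often each outcome occurs per age group."""
    counts = {}
    for r in records:
        counts.setdefault(r["age_group"], Counter())[r["outcome"]] += 1
    return counts


def interpret(counts):
    """Interpret/evaluate: keep the most frequent outcome per group."""
    return {group: c.most_common(1)[0][0] for group, c in counts.items()}


knowledge = interpret(data_mine(transform(select_preprocess(RAW))))
print(knowledge)  # {'65+': 'readmitted', '<65': 'ok'}
```

Each function corresponds to one arrow of the pipeline, so a step can be swapped out (for example, a different mining technique) without touching the others.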
8
KDD and Classical Data Analysis

KDD is less focused than data analysis in that it
looks for interesting patterns in data; classical
data analysis centers on analyzing particular
relationships in data. The notion of
interestingness is a key concept in KDD.
Classical data analysis centers more on
generating and testing pre-structured hypotheses
with respect to a given sample set.
KDD is more centered on analyzing large volumes
of data (many fields, many tuples, many tables,
...).
In a nutshell, the KDD process consists of
preprocessing (generating a target data set),
data mining (finding something interesting in the
data set), and post-processing (representing the
found patterns in understandable form and
evaluating their usefulness in a particular
domain); classical data analysis is less
concerned with the preprocessing step.
KDD involves collaboration between multiple
disciplines, namely statistics, AI,
visualization, and databases.
KDD employs non-traditional data analysis
techniques (neural networks, decision trees,
fuzzy logic, evolutionary computing, ...).

9
Key Ideas: Agent-based Technologies

Agents operate independently and anticipate user
needs (P. Maes)
Agents help users suffering from information
overload (O. Etzioni) rather than mimicking human
intelligence
Agents are important because they allow users to
interoperate with modern applications such as
electronic commerce and information retrieval.
Most of these applications assume that components
are added dynamically and that they will be
autonomous (serve different users and providers
to fill different goals) and heterogeneous. (M.
Singh)
Essentially, agent-based architectures are
characterized by three key features: autonomy,
adaptation, and cooperation. Agent-based systems
are computational systems in which several agents
interact for their own good and for the good of
the overall system.
In an agent-based architecture services are
provided in the context of a community of loosely
coupled agents of various types in a distributed
environment.
Agents are aware of their environment and
capable of communicating with other agents that
belong to the same agent community.

10
Simplified View of Agent-based Systems
[Figure: three kinds of agents communicating through two layers]
End User Agents: agents that act on behalf of end
users that look for services
Mediator Agents: agents that act as a matchmaker
between service providers and end users
Service Provider Agents: agents that act on behalf
of service providers
Layers: Conversation Layer, Message Layer
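The matchmaker role of a mediator agent can be illustrated with a small registry. This is a hedged sketch, not the architecture from the slides: the class name `Mediator`, its methods, and the agent and service names are hypothetical, and the conversation and message layers are left out entirely.

```python
class Mediator:
    """Matchmaker between service provider agents and end user agents."""

    def __init__(self):
        # provider name -> set of services that provider has advertised
        self._adverts = {}

    def advertise(self, provider, services):
        """Called by a service provider agent to announce what it offers."""
        self._adverts.setdefault(provider, set()).update(services)

    def find_providers(self, needed):
        """Called by an end user agent; returns providers offering every needed service."""
        needed = set(needed)
        return sorted(p for p, offered in self._adverts.items() if needed <= offered)


mediator = Mediator()
mediator.advertise("trial-report-agent", {"clinical-trial-reports", "summaries"})
mediator.advertise("imaging-agent", {"brain-tissue-classification"})

print(mediator.find_providers(["summaries"]))          # ['trial-report-agent']
print(mediator.find_providers(["genome-sequencing"]))  # []
```

Because providers register themselves at run time, new agents can join the community without any change to the end user agents.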
11
A few more things on Agents

Why do agent-based systems show promise for
health care?
Scalability
Tasks to be solved involve the collaboration
between different groups
Well suited for the world-wide web
Health care is a dynamically changing environment
Establish standards (as a by-product)
Third International Conference on AUTONOMOUS
AGENTS (Agents '99), Seattle, Washington, May
1-5, 1999
(http://www.cs.washington.edu/research/agents99/)

12
Generating Models

The goal of model generation (sometimes also
called predictive data mining) is the creation,
evaluation, and use of models to make predictions
and to understand the relationships between
various variables that are described in a data
collection. Typical example applications include:
generate a model that predicts a student's
academic performance based on the applicant's data,
such as the applicant's past grades, test scores,
past degree, ...
generate a model that predicts (based on economic
data) which stocks to sell, hold, and buy.
generate a model to predict if a patient suffers
from a particular disease based on a patient's
medical and other data.
Neural networks, decision trees, naïve Bayesian
classifiers and networks, many other statistical
techniques, fuzzy logic and neuro-fuzzy systems
are the most popular model generation tools in
the KDD area.
All model generation tools and environments
employ the basic train/evaluate/predict cycle.
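The train/evaluate/predict cycle can be shown with a deliberately tiny model. The sketch below uses a one-feature threshold rule (a one-level decision tree) purely as a stand-in for the tools named above; the data, the function names, and the student-score scenario are invented for illustration.

```python
def train(examples):
    """Train: pick the threshold that misclassifies the fewest training examples.

    The model predicts True when x >= threshold.
    """
    candidates = sorted({x for x, _ in examples})

    def errors(threshold):
        return sum((x >= threshold) != label for x, label in examples)

    return min(candidates, key=errors)


def predict(threshold, x):
    """Predict: apply the learned threshold to a new example."""
    return x >= threshold


def evaluate(threshold, examples):
    """Evaluate: fraction of examples the model classifies correctly."""
    correct = sum(predict(threshold, x) == label for x, label in examples)
    return correct / len(examples)


# Hypothetical data: (test score, student passed the first year)
training_set = [(45, False), (52, False), (58, False), (63, True), (70, True), (88, True)]
held_out_set = [(50, False), (65, True), (75, True), (66, False)]

model = train(training_set)
accuracy = evaluate(model, held_out_set)
print(model, accuracy, predict(model, 80))  # 63 0.75 True
```

Evaluating on examples that were not used for training is the important part of the cycle: accuracy on the training set alone (here 100%) says little about how the model behaves on new cases.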

13
Participants in an Agent-based Data Analysis /
KDD Society
Data Analysts
Data Collection Providers
Tool Builders
End Users (Managers, Doctors, Decision Makers,
Gamblers,...)
14
Problems of Model Generation

It is difficult to find appropriate data
collections.
Sharing of models is not supported.
Model generation is mostly performed in a
centralized environment, not taking advantage of
distributed computing technology.
Degree of tool standardization is low, which
makes it more difficult to use different tools for
the same data analysis problems.
Evaluation of claims with respect to the
performance of models is very difficult. Problem:
the model itself, as well as the tools and data
collections that were used to generate the model,
are not accessible online.

15
Agent-based Model Generation

Model generation services are provided in the
context of a community of loosely coupled agents
of various types in a distributed environment.
Model generation tools are accessed using a
unified interface.
Tool providers and data collection providers
offer their services to data analysts and
end-users via the internet. New forms of
collaboration can easily be supported in this
environment:
data analysts no longer run the tools on their
own computing environment
brokering techniques can be used to find
interesting data collections, suitable tools,
useful models, and available ontologies.
tool developers offer tool services on the
internet, charging a one-time tool-use fee.

16
Model Generation Agent Communities
[Figure: Agent-based Model Generation Community]
Participants: Data Collection Provider, Data
Analyst, End User, Tool Developer
Brokers: Data Collection Broker, Model Broker,
Tool Broker
Other components: Resource Agent, Model Generation
Browser, Model Generation Tool, Resource
Generation Tool, Tool Integration Tool
Resources: Data Collection, Model
17
Shared Ontologies

Ontologies are content theories about sorts of
objects, properties of objects, and relationships
between objects that are possible in a specified
domain of knowledge (Chandrasekaran)
We consider ontologies to be domain theories
that specify a domain-specific vocabulary of
entities, classes, properties, predicates, and
functions, and a set of relationships that
necessarily hold among those vocabulary items
(Fikes)
Shared ontologies form the basis for
domain-specific knowledge representation languages
(Chandrasekaran)
If we could develop ontologies that could be
used as the basis of multiple systems, they would
share a common terminology that would facilitate
sharing and reuse (W. Swartout)
Ontologies play an important role for the
standardization of terminology in medicine (e.g.
UMLS) and other domains
Ontologies can serve as the glue between
knowledge that is represented at different,
usually heterogeneous information sources.

18
What are Ontologies good for?

As a shared conceptual model of a particular
application domain that describes the semantics
of the objects that are part of the domain, and
captures knowledge that is inherent to the
particular domain --- idea: knowledge base
Ontologies provide a vocabulary for representing
knowledge about a domain and for describing
specific situations in a domain (tool for
defining and describing domain-specific
vocabularies) --- idea: language for
communication
For data/knowledge translation and transformation
(provide a solution to the translation problem
between different terminologies), and for fusion
and refinement of existing knowledge --- idea:
interoperation
For matchmaking between users, agents, and
information resources in agent-based systems ---
idea: collaboration, brokering (focus of the
next slides)
As reusable building blocks to build systems that
solve particular problems in the application
domain --- idea: model reuse
Summary: Ontologies can be used as building
block components of knowledge bases, object
schema for object-oriented systems, conceptual
schema for data bases, structured glossaries for
human collaborations, vocabularies for
communication between agents, class definitions
for conventional software systems, etc. (Fikes)

19
Ontologies and Brokering

Service providers describe their capabilities in
terms of a domain (or task) ontology
Agents that seek services describe their needs in
terms of a domain (or task) ontology
Broker agents serve as matchmakers between
service providers and service seekers by finding
suitable agents and by evaluating the extent to
which they can provide those services, relying on
a semantic brokering approach.
Various languages have been advocated in
recent years to specify ontologies: OKBC,
CKML/OML, ONTOLINGUA, XML, UMLS, ...

20
[Figure: two panels comparing how End User Agents
reach Service Provider Agents]
A Traditional Approach: End User Agents specify
keywords with respect to the documents they are
looking for; a Search Engine sits between End User
Agents and Service Provider Agents. Labels:
Clinical Trial Report, Abstract Clinical Trial
Report, Summary.
Semantic Brokering Approach: End User Agents
specify a subset of an ontology; Semantic Brokering
(matchmaking) sits between End User Agents and
Service Provider Agents. Labels: Clinical Trial
Report, Subset of an Ontology, Summary.
21
Example: Semantic Brokering
Data Analyst's Information Requirement:
Patient (Age > 40, weight)
Intensive-Care-Patient (Hours-in-intensive-care)
Data collection descriptions (reconstructed from
the flattened figure labels):
Data Collection1: Patient (Age < 15);
Intensive-Care-Patient (Hours-in-intensive-care)
Data Collection2: Patient (age, weight);
Intensive-Care-Patient (Hours-in-intensive-care)
Data Collection3: Patient (Age > 60, Weight > 300);
Intensive-Care-Patient (Hours-in-intensive-care)
Result of Semantic Brokering:
((DataCollection1 nil
  ((missing slot weight)
   (contradictory (lt age 15) (gt age 40))))
 (DataCollection2 t)
 (DataCollection3 t ((gt age 60) (gt weight 300))))
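The brokering result on this slide can be reproduced with a small matcher. The slides do not show the matching algorithm, so the sketch below is an assumption: it only checks that required slots are present and that `gt`/`lt` bounds do not contradict each other. The dictionary layout and the function names `contradicts` and `broker` are hypothetical.

```python
def contradicts(a, b):
    """True if two (op, slot, value) bounds on the same slot cannot both hold."""
    (op_a, slot_a, val_a), (op_b, slot_b, val_b) = a, b
    if slot_a != slot_b or op_a == op_b:
        return False
    # One bound is "gt lower", the other "lt upper".
    lower, upper = (val_a, val_b) if op_a == "gt" else (val_b, val_a)
    return lower >= upper  # no x satisfies x > lower and x < upper


def broker(requirement, collection):
    """Return (suitable, notes) for one data collection description."""
    problems = [
        ("missing slot", slot)
        for slot in requirement["slots"]
        if slot not in collection["slots"]
    ]
    for c in collection["constraints"]:
        for r in requirement["constraints"]:
            if contradicts(c, r):
                problems.append(("contradictory", c, r))
    if problems:
        return False, problems
    # Suitable; report the collection's own restrictions to the analyst.
    return True, list(collection["constraints"])


requirement = {
    "slots": ["age", "weight", "hours-in-intensive-care"],
    "constraints": [("gt", "age", 40)],
}
collections = {
    "DataCollection1": {
        "slots": {"age", "hours-in-intensive-care"},
        "constraints": [("lt", "age", 15)],
    },
    "DataCollection2": {
        "slots": {"age", "weight", "hours-in-intensive-care"},
        "constraints": [],
    },
    "DataCollection3": {
        "slots": {"age", "weight", "hours-in-intensive-care"},
        "constraints": [("gt", "age", 60), ("gt", "weight", 300)],
    },
}

results = {name: broker(requirement, desc) for name, desc in collections.items()}
for name, result in results.items():
    print(name, result)
```

DataCollection1 is rejected for the same two reasons listed on the slide, DataCollection2 matches without restrictions, and DataCollection3 matches but only covers patients older than 60 who weigh more than 300.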
22
Critical Problems with Respect to Shared
Ontologies

Scientific communities have to agree on
ontologies; otherwise, the whole approach is
flawed.
Development of ontologies for a particular domain
is a difficult task (see Digital Anatomist
project at UW, development of UMLS). The
development of user-friendly and intelligent
knowledge acquisition tools is very important for
the successful development of shared ontologies.
Expressiveness of languages that are used to
define ontologies limits what can be done with
domain ontologies.
Reasoning capabilities are important for systems
that use shared ontologies (we need a language to
specify ontologies and an inference engine that
can reason with the given ontologies), for:
finding inconsistencies in knowledge bases and
finding errors at data entry
semantic brokering
more intelligent mappings between terms
...

23
Promising Technologies to Use the Flood of Data
for Providing Better Health Care
[Figure labels:]
Agent-based Systems
Structural Indexing Techniques
Software Development Environments
Knowledge Acquisition Tools
KDD
Visualization
Traditional Data Analysis Techniques
The Well of Knowledge
Database Technology
Shared Ontologies
Semantic Brokering
24
References

WWW-Links
http://www.nlm.nih.gov/pubs/cbm/umlscbm.html
(UMLS)
http://ksl-web.stanford.edu/Reusable-ontol/P001.html
(Richard Fikes (Stanford University), Slide
Show on Reusable Ontologies)
http://www.kdnuggets.com/index.html (KDD Nuggets
Directory: Data Mining and Knowledge Discovery
Resources)
http://www.mcc.com/projects/infosleuth/
(InfoSleuth (MCC) --- an Agent-based System for
Information Gathering)
http://www.cs.cmu.edu/softagents/ (CMU
Intelligent Software Agents Page)
Papers
Special Issue of IEEE Intelligent Systems on Coming
to Terms with Ontologies, Jan./Feb. 1999.
Special Issue of IEEE Intelligent Systems on
Unmasking Intelligent Agents, March/April 1999.
Special Issue of Communications of the ACM on Data
Mining, vol. 39, no. 11, November 1996.