Title: Santiago Eibe, M. Angel Hidalgo, Ernestina Menasalvas
1- Santiago Eibe, M. Angel Hidalgo, Ernestina
Menasalvas
- Facultad de Informatica
- Universidad Politecnica de Madrid
- UKDU- Berlin 2006
2Agenda
- Motivation and introduction
- Previous work
- The approach
3Motivation Data Mining Scenarios
- Mobile Health Monitoring: mobile phones are being
used to help manage certain health conditions
- Web Mining Applications
- Automotive: continuous on-board monitoring and
mining of vehicle data streams (H. Kargupta et al.
(2004). VEDAS: A Mobile and Distributed Data Stream
Mining System for Real-Time Vehicle Monitoring.
SIAM04)
- Network Intrusion: use data mining techniques to
discover consistent and useful patterns to detect
security anomalies
- Autonomous Space Applications
- remote control stations
- on-board platforms
- mining of data collected from satellite sensors
- And more ...
4Evaluation and deployment in the new environment
Who is evaluating? Is the data miner available?
What results should be deployed?
5Data Mining evaluation in ubiquitous scenarios
CRISP-DM (ASSESS MODEL): The data mining
engineer interprets the models according to his
domain knowledge, the data mining success
criteria and the desired test design
- Is the data miner going to be available?
- Multiple viewpoints, integrated context
- Visualization as a catalyst
- Models design
- Requirements extraction
- Usability and reusability
- Autonomy
6Related work
- J. Branch, B. Szymanski, R. Wolff, C. Giannella,
H. Kargupta. (2006). In-Network Outlier Detection
in Wireless Sensor Networks. (ICDCS)
- S. Datta, K. Bhaduri, C. Giannella, R. Wolff, H.
Kargupta. (2006). Distributed Data Mining in
Peer-to-Peer Networks. (Invited submission to the
IEEE Internet Computing special issue on
Distributed Data Mining)
- K. Liu, K. Bhaduri, K. Das, P. Nguyen, H. Kargupta.
(2006). Client-side Web Mining for Community
Formation in Peer-to-Peer Environments. SIGKDD
workshop on web usage and analysis (WebKDD).
- Accessing and analyzing data from a ubiquitous
computing environment offers many challenges. One
of these is related to human-computer interaction.
From a Knowledge Discovery point of view,
important human-computer interaction issues are
collaborative problem .
7Related work
- Charu C. Aggarwal, Jiawei Han, Jianyong Wang,
Philip S. Yu. A Framework for On-Demand
Classification of Evolving Data Streams. IEEE
Trans. Knowl. Data Eng. 18(5): 577-589 (2006)
- When the data streams are fast and continuous, it
becomes important to analyze and predict the
trends quickly in an online fashion
- Dietrich Wettschereck, Alípio Jorge, Steve Moyle.
Visualization and Evaluation Support of Knowledge
Discovery through the Predictive Model Markup
Language. KES 2003: 493-501
- Charu Aggarwal. Towards Effective and
Interpretable Data Mining by Visual Interaction.
ACM02
- Therefore, a natural strategy would be to devise
a system which is centered around a
human-computer interactive process. In such a
system, the particular data mining task can be
divided between the human and the computer in
such a way that each entity performs the task
that it is most well suited to. The active
participation of the user has the additional
advantage that he has a better understanding of
the final results
8Previous results: SolEuNet
- Steve Moyle. Collaborative Data Mining. The Data
Mining and Knowledge Discovery Handbook 2005:
1043-1056
- RAMSYS is a web-based infrastructure for
collaborative data mining. It is being developed
in the SolEuNet European Project for virtual
enterprise services in data mining and decision
support. Central to RAMSYS is the idea of sharing
the current best understanding to foster
efficient collaboration.
9What we propose: InfoVis, the catalyst for the
UDM process
- Simplify evaluation in ubiquitous environments
through visualization
- Collaborative evaluation awareness model
- Fill the gap between the data miner and the domain
expert
- Domain expert can participate in the
collaborative process
- Capture semantics of the underlying process
(mining and business)
- Task descriptions to minimize the number and
importance of user (analyst or business expert)
errors in the process
- User descriptions to cover the diversity of
interpretation
- Device descriptions
- Systematic approach to forming and reproducing
visual data mining model evaluation in ubiquitous
environments
10Visualization and ubiquity: the challenges
- Outputs to different devices/users (diversity,
accessibility)
- Inputs from different devices/users
- Data and knowledge presentation must respect the
limits imposed by the combination of ubiquitous
devices and general human perceptual and
cognitive limitations (e.g., display resolution),
and the specific requirements on accessibility
posed by people's diversity
- Location transparency
- Context sensitive
- Track changes
- Support feedback and iteration
- Support data stream mining
- Visualizing information about pattern extraction
together with the patterns themselves is a must to
understand and track changes
11Visual Evaluation Framework: Designing Tasks
- Visualization Requirements Extraction
- Requirements engineering
- Prototype-based (use cases)
- Output Requirement Extraction
- Format
- Location
- User expertise
- Scope
- Context
- Visual Metaphors Design
- From a visual evaluation components library of
previous designs - reusability
- Defining new (ad-hoc) metaphors
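The design tasks above can be sketched in code. This is only a minimal illustration, assuming that an output requirement is recorded with the five attributes listed on the slide and that the components library is a simple lookup table. The names OutputRequirement, choose_metaphor and all values in the library are hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputRequirement:
    """One extracted output requirement (fields follow the slide's list)."""
    format: str
    location: str
    user_expertise: str
    scope: str
    context: str


def choose_metaphor(req, library):
    """Reuse a stored design if one matches, otherwise signal an ad-hoc design.

    Assumption: the library is keyed by (format, user_expertise) only.
    """
    return library.get((req.format, req.user_expertise), "ad-hoc")


# Hypothetical library of previous designs.
library = {
    ("chart", "expert"): "roc-curve",
    ("chart", "novice"): "traffic-light",
}

novice_req = OutputRequirement("chart", "mobile", "novice", "single model", "on the move")
table_req = OutputRequirement("table", "desktop", "expert", "all models", "office")

reused = choose_metaphor(novice_req, library)  # found in the library
fallback = choose_metaphor(table_req, library)  # no match, new metaphor needed
```

Keying the lookup on only two attributes is a simplification; a real matching step would probably also weigh location, scope and context.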
12Visual Evaluation Framework
13Scenes
- Data Store
- Centralized or distributed
- Continuous load and management of the data
- Dealing with stream, windows, data aging
- Modeling Scene
- Mining Scene
- At least one data mining model
- Optionally
- User model
- Recommendation model
- Obedience model
- Ubiquitous Scene
- Descriptions related to ubiquitous environment
- Awareness model
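The "stream, windows, data aging" item of the Data Store can be illustrated with a time-based sliding window. This is a minimal sketch, assuming items carry a numeric timestamp and age out once they are older than a fixed limit. The class name SlidingWindow and the parameter max_age are hypothetical; the presentation does not say how its data store handles aging.

```python
from collections import deque


class SlidingWindow:
    """Keep only the stream items whose age does not exceed max_age."""

    def __init__(self, max_age):
        self.max_age = max_age
        self.items = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        """Append a new item, then drop everything that has aged out."""
        self.items.append((timestamp, value))
        self._expire(timestamp)

    def _expire(self, now):
        # Items arrive in time order, so expired ones are always at the left.
        while self.items and now - self.items[0][0] > self.max_age:
            self.items.popleft()

    def values(self):
        return [value for _, value in self.items]


window = SlidingWindow(max_age=10)
for timestamp, value in [(0, "a"), (5, "b"), (12, "c"), (20, "d")]:
    window.add(timestamp, value)

current = window.values()  # only items at most 10 time units old remain
```

A mining model attached to such a window would see only recent data, which is one simple way to let old observations lose influence.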
14Scenes (II)
- Transformation Scene
- Two main types
- Visualization-related
- from rules to graphs, filters, plug-ins, ...
- Ubiquity-related
- integration, adaptability, awareness, ...
- Visualization Scene
- Visual controls
- Visualization algorithms
- Controls and mechanisms to aid user interaction
- etc
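A visualization-related transformation such as "from rules to graphs" could look like the sketch below. It assumes association rules are given as (antecedent items, consequent, confidence) triples and produces an adjacency map that a graph-drawing component could consume. The function name rules_to_graph and the sample rules are hypothetical and are not taken from the presentation.

```python
def rules_to_graph(rules):
    """Turn (antecedent, consequent, confidence) rules into an adjacency map.

    Each antecedent item becomes a node with an edge to the consequent,
    labelled with the rule's confidence.
    """
    graph = {}
    for antecedent, consequent, confidence in rules:
        for item in antecedent:
            graph.setdefault(item, []).append((consequent, confidence))
    return graph


# Hypothetical rules, for illustration only.
rules = [
    (("bread", "butter"), "milk", 0.8),
    (("bread",), "jam", 0.6),
]
graph = rules_to_graph(rules)
```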
15Actors and channels
- Actors: different user profiles
- Human
- Machine (services)
- Channels
- Constraints to limit interactions in the model
- Support the underlying semantics for
visualization
- Example
- If a user wants to transform a tree into a graph
and that is not possible then there will not be a
channel
- We have templates to allow definitions of new
interaction
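The idea of channels as constraints can be sketched as a registry of allowed transformations: an interaction is possible only if a channel for it has been defined. This is a minimal illustration under that assumption. ChannelRegistry, its methods and the registered transformation are hypothetical names, not the framework's actual interface.

```python
class ChannelRegistry:
    """Allow an interaction only when a channel has been registered for it."""

    def __init__(self):
        self._channels = {}  # (source, target) -> transformation function

    def register(self, source, target, transform):
        """Define a new channel between two representations."""
        self._channels[(source, target)] = transform

    def has_channel(self, source, target):
        return (source, target) in self._channels

    def send(self, source, target, payload):
        """Apply the channel's transformation, or fail if no channel exists."""
        if not self.has_channel(source, target):
            raise LookupError(f"no channel from {source} to {target}")
        return self._channels[(source, target)](payload)


registry = ChannelRegistry()
# Hypothetical channel: rules can be shown as graph edges.
registry.register("rules", "graph", lambda rules: [(a, c) for a, c in rules])

edges = registry.send("rules", "graph", [("bread", "milk")])
tree_to_graph_allowed = registry.has_channel("tree", "graph")  # never registered
```

In this sketch, the slide's example of an impossible tree-to-graph transformation corresponds to a pair that was never registered, so the request is rejected before any work is done.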
16Advantages of the approach so far
- Models user behaviour in the evaluation
- Models semantics of the context
- Models interactions
- Usability
- Reusability
- Interaction of the user will be minimized
- A step towards collaborative mining
- Some degree of autonomy
17Thanks!