Information Management and Data mining - PowerPoint PPT Presentation

Slides: 23
Provided by: siteUo

1
Information Management and Data mining
  • Presented by Dr. Herna L Viktor
  • Others Dr. Iluju Kiringa
  • Dr. Thomas Tran
  • Dr. Liam Peyton

2
  • Information overload: the amount of knowledge in
    the world has doubled in the past ten (10) years
    and is doubling every 18 months (American
    Society for Training and Development, ASTD)
  • Massive petabyte (2^50 bytes) data repositories:
    e.g., it is estimated that Google maintains
    4 petabytes of RAM.
  • E-Commerce and the Web: a digital marketplace,
    eHealth
  • Data sharing: data must be available anywhere,
    any time, and in almost any form
  • The Digital Rosetta Stone: our digital heritage
    is in danger of being lost due to the silent
    obsolescence of current technology
  • OUR RESEARCH
  • How do we share/store/preserve this data?
  • What information can we use to improve our
    decision making?
  • How do we obtain/extract and explore the hidden
    knowledge?

3
Information Management and Data Mining Research
Five Themes
  • Data/Information Management
  • (T1) Dr. Iluju Kiringa: Data sharing
  • (T2) Dr. Herna L Viktor: Relational and
    multimedia data mining
  • (T3) Dr. Thomas Tran: Software agents for
    e-Commerce
  • (T4) Dr. Herna L Viktor: Long-term preservation
    of data
  • (T5) Dr. Liam Peyton: Accessible data warehousing
    for e-health

4
(T1) Data Sharing Dr. Iluju Kiringa
  • Data must be available anywhere, any time, and
    in almost any form; thus we must cope with:
      • very large networks of data sources
      • complex heterogeneity among the sources
      • inconsistent data across the sources
      • data sharing and exchange between the sources
      • etc.
  • Several applications illustrate this need:
      • Genomic data
      • E-health
      • Enterprise alliances

5
Background and Goals Dr. Iluju Kiringa
  • Background: data sharing on peer-to-peer (P2P)
    networks
      • P2P networks are open-ended networks of
        distributed computational nodes (peers)
      • Each peer can directly exchange data and/or
        services with a set of other peers
      • Peers act autonomously, including when
        joining/leaving
      • Peers are not subject to global control in the
        form of global registries, global services,
        global resource management, or global schema
        and data repository
      • Mostly used for sharing files (plain text,
        songs, movies, video, etc.); some examples are:
          • Napster, Gnutella, Kazaa: file sharing
            applications
          • SETI@home: distributed computing application
  • Research Goal
      • Enhance data sharing on P2P networks to offer
        the same high quality access to data that
        classical distributed relational DBMSs offer

6
Data Sharing Research Issues Dr. Iluju Kiringa
  • Heterogeneity management
      • Interoperability of peer databases
      • Syntactic and semantic heterogeneity
  • Dynamics and scale management
      • Protocols for peer databases to join/leave
        networks
  • Query processing via propagation
      • Query propagation through the network
      • Query optimization
  • Data coordination using
      • update propagation
      • distributed triggers
  • Transaction processing
      • Design non-classical transaction models and
        correctness criteria
      • Implement the models
  • Service-oriented architecture
      • Design and compare several possible
        architectures for a peer DBMS
      • Implement some of these architectures
      • Deploy a real network
  • Applications
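
As an illustration of "query propagation through the network" and of peers joining and leaving without global control, here is a minimal sketch. It is not the group's peer DBMS: the `Peer` class, the `propagate_query` function, and the hop limit `ttl` are hypothetical names introduced only for this example, and breadth-first flooding is just one simple propagation strategy.

```python
from collections import deque


class Peer:
    """A peer with local rows and links to acquainted peers (hypothetical model)."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = list(rows)
        self.neighbours = []

    def connect(self, other):
        # Joining needs no global registry: a peer simply links to another peer.
        if other not in self.neighbours:
            self.neighbours.append(other)
            other.neighbours.append(self)

    def leave(self):
        # Leaving is autonomous: the peer removes only its own links.
        for neighbour in self.neighbours:
            neighbour.neighbours.remove(self)
        self.neighbours = []


def propagate_query(start, predicate, ttl):
    """Evaluate `predicate` at every peer within `ttl` hops of `start`."""
    seen = {start.name}           # avoid answering the same query twice
    queue = deque([(start, 0)])   # breadth-first: (peer, hops travelled)
    results = []
    while queue:
        peer, hops = queue.popleft()
        results.extend((peer.name, row) for row in peer.rows if predicate(row))
        if hops == ttl:
            continue              # hop limit reached, do not forward further
        for neighbour in peer.neighbours:
            if neighbour.name not in seen:
                seen.add(neighbour.name)
                queue.append((neighbour, hops + 1))
    return results
```

With three peers chained a, b, c, a query issued at a with `ttl=1` reaches only a and b; if b then leaves, the same query is answered from a's local data alone.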

7
(T2) Data Mining Dr. Herna L Viktor
  • Multi-relational data mining and link mining
  • Aim to directly mine a relational database,
    without extensive preprocessing or flattening
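
To make the cost of "flattening" concrete, the sketch below contrasts a naive join with features computed by following a foreign key. The schema (`customer`, `orders`), the `churned` label, and the data are invented for illustration. The second query is simple aggregation-based propositionalization, a common stand-in technique, and not necessarily the approach taken in this research.

```python
import sqlite3


def build_db():
    """Create a tiny in-memory database with two linked tables (made-up data)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, churned INTEGER);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customer(id),
            amount REAL
        );
        INSERT INTO customer VALUES (1, 0), (2, 1);
        INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 30.0), (3, 1, 20.0), (4, 2, 5.0);
    """)
    return con


def flattened_rows(con):
    # Naive flattening: one row per (customer, order) pair, so a customer
    # with many orders is repeated and the class label is over-counted.
    return con.execute(
        "SELECT c.id, c.churned, o.amount "
        "FROM customer c JOIN orders o ON o.customer_id = c.id "
        "ORDER BY o.id"
    ).fetchall()


def relational_features(con):
    # One row per customer: order count and mean amount are computed
    # by following the foreign key instead of duplicating customers.
    return con.execute(
        "SELECT c.id, c.churned, COUNT(o.id), AVG(o.amount) "
        "FROM customer c LEFT JOIN orders o ON o.customer_id = c.id "
        "GROUP BY c.id ORDER BY c.id"
    ).fetchall()
```

In this toy data the flattened view has four rows for two customers, while the relational view keeps exactly one row per customer.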

8
Data Mining Dr. Herna L Viktor
  • Multimedia (2D and 3D) data mining
  • Searching for similarities in multimedia
    databases
  • Locating clusters of images, 3D objects
  • Classifying images, 3D objects within a cluster
  • Application
  • Anthropometry (poster)
  • Health care
  • Cultural Heritage
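
"Searching for similarities in multimedia databases" is often approached by describing each image or 3D object with a numeric feature vector and ranking stored objects by their similarity to a query. The sketch below assumes such vectors already exist; the function names, the use of cosine similarity, and the example vectors are assumptions for illustration, not the descriptors used in this research.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(query, database, k=2):
    """Return the names of the k stored objects most similar to `query`."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in database.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Made-up three-dimensional descriptors for three objects.
database = {
    "vase": [1.0, 0.0, 0.2],
    "bowl": [0.9, 0.1, 0.3],
    "chair": [0.0, 1.0, 0.0],
}
nearest = most_similar([1.0, 0.0, 0.25], database, k=2)
```

Here `nearest` lists "vase" first and "bowl" second, since the query vector points in almost the same direction as those two descriptors and is orthogonal to "chair".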

9
(T3) Software Agents in E-Commerce Dr. Thomas
Tran
  • The concept of an agent provides a convenient and
    powerful way to describe a complex software
    entity that is capable of acting with a certain
    degree of autonomy in order to accomplish tasks
    on behalf of its user.
  • An agent is defined in terms of its behavior.

10
Supporting Decision Making Dr. Thomas Tran
  • Designing Intelligent Business Software Agents
    for E-Commerce
  • Modeling Trust and Reputation in E-Commerce
  • Developing Agent-Based Frameworks for Mobile
    Business
  • Designing Recommender Systems for E-Commerce
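
As a hedged illustration of "modeling trust and reputation", the sketch below lets a buying agent score sellers from past outcomes. It uses the expected value of a beta distribution, (positive + 1) / (positive + negative + 2), which is a well-known simple reputation estimate and not necessarily the model developed by this group. The class and function names are hypothetical.

```python
class SellerReputation:
    """Tracks satisfied and unsatisfied transactions with one seller."""

    def __init__(self):
        self.positive = 0
        self.negative = 0

    def record(self, satisfied):
        # Update the counts after each completed transaction.
        if satisfied:
            self.positive += 1
        else:
            self.negative += 1

    def score(self):
        # Beta-expectation estimate: 0.5 with no evidence, moving
        # towards 1.0 (or 0.0) as good (or bad) outcomes accumulate.
        return (self.positive + 1) / (self.positive + self.negative + 2)


def choose_seller(reputations):
    """Pick the seller name with the highest current reputation score."""
    return max(reputations, key=lambda name: reputations[name].score())
```

A seller with no history scores 0.5; after three satisfactory and one unsatisfactory transaction the score is 4/6, so an agent using `choose_seller` would prefer that seller over an unknown one.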

11
(T4) Long-term preservation of data Dr. Herna L
Viktor
  • The Digital Rosetta Stone
  • The life-time of a digital file is only a few
    decades
  • We might need the digital file in 50 years
  • Our repositories may become data morgues,
    containing data which are in formats that cannot
    be interpreted by present and future generations.
  • Towards a solution

12
Long-term preservation of data Dr. Herna L Viktor
  • Research issues
      • scalability of information and infrastructure
      • managing heterogeneous data sources
      • handling updating of hardware and software
      • transparent storage, management and retrieval

  • To investigate effective ways to store, maintain
    and analyze digital objects over a very long
    period of time (50 years)
  • Approach
      • Detachment from original media
      • Transparent migration to new technologies
      • Emulate old software on new technologies
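
One way to picture "transparent migration to new technologies" is a stored object that carries its format label and a provenance trail, and is converted by a registered converter when its format becomes obsolete. This is only a sketch under stated assumptions: `PreservedObject`, `migrate`, the converter registry keyed by (source, target) format, and the format names are all hypothetical and do not describe the architectural framework in these slides.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class PreservedObject:
    """A digital object detached from its original media (hypothetical model)."""
    content: bytes
    fmt: str
    history: list = field(default_factory=list)

    def checksum(self):
        # SHA-256 digest used to verify that stored bytes are unchanged.
        return hashlib.sha256(self.content).hexdigest()


def migrate(obj, converters, target_fmt):
    """Return a new object in `target_fmt`, recording where it came from."""
    convert = converters[(obj.fmt, target_fmt)]  # KeyError if no converter exists
    step = (obj.fmt, target_fmt, obj.checksum())  # provenance of this migration
    return PreservedObject(
        content=convert(obj.content),
        fmt=target_fmt,
        history=obj.history + [step],
    )


# Example registry: re-encode Latin-1 text as UTF-8.
converters = {
    ("latin1-text", "utf8-text"): lambda data: data.decode("latin-1").encode("utf-8"),
}
original = PreservedObject(content=b"caf\xe9", fmt="latin1-text")
migrated = migrate(original, converters, "utf8-text")
```

The original object is left untouched, and the migrated copy records the source format, the target format, and the checksum of the bytes it was derived from.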
13
Long-term preservation of data Dr. Herna L Viktor
  • Architectural framework

14
(T5) Evolving E-Health Business Processes Around
Accessible Data Warehouses Dr. Liam Peyton
  • Goals
      • Process improvement to take advantage of
        e-technologies and data warehouse (DW)
      • Methodology to specify, automate, manage, and
        analyze DW-oriented e-health processes
      • Addresses privacy, confidentiality, quality,
        and consent, as well as heavy legacy (and often
        manual) processes and regulatory environments
  • Activities
      • Simulation of Ottawa Hospital Data Warehouse
        and environment
      • Business Intelligence prototype: infection
        control data mart, discharge process data mart
      • Quality Assurance Framework and Portal

15
Assessment Framework Tied to Operational Systems,
Performance Mgt Data Warehouse Strategy
[Diagram labels: Stakeholders, Use Case Maps, Goals,
Reports, PIQ, Tasks, Business Systems Processes,
Performance Mgt Systems Processes, Data Warehouse]
  • PIQ measures the effectiveness of Reports in
    measuring the effectiveness of the Organization
    in meeting its goals.
16
In Summary: Vast, evolving repositories
17
  • Google in 2003 had between 2 and 5 petabytes of
    hard-disk storage. A more recent calculation,
    dated June 27, 2006, suggests that the Google
    cluster may now have 4 petabytes of RAM, on the
    same order of magnitude as the quantity of hard
    disk space that was estimated only three years
    earlier.
  • As of October 15, 2005, all the files being
    shared on Kazaa totaled around 54 petabytes.
  • 15 petabytes of data will be generated each year
    in particle physics experiments using CERN's
    Large Hadron Collider, due to be launched in 2007.
  • In 2007, NOAA maintains approximately 1 petabyte
    of climate data. NOAA expects that its
    Comprehensive Large Array-data Stewardship System
    (CLASS) library will hold 20 petabytes of data by
    2011 and 140 petabytes by 2020.
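
The NOAA figures above imply a compound annual growth rate, which can be worked out as (end / start)^(1 / years) - 1. The short sketch below does that arithmetic; the function name is an assumption for this example, and the inputs are just the projections quoted on the slide.

```python
def annual_growth_rate(start_value, end_value, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# NOAA projections from the slide, in petabytes.
rate_2007_2011 = annual_growth_rate(1, 20, 2011 - 2007)
rate_2011_2020 = annual_growth_rate(20, 140, 2020 - 2011)

print(f"2007-2011: {rate_2007_2011:.1%} per year")
print(f"2011-2020: {rate_2011_2020:.1%} per year")
```

Under these projections the archive would slightly more than double every year up to 2011 (about 111% annual growth), then grow by roughly 24% per year up to 2020.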

18
In Summary: Vast, evolving repositories
  • Our research aims to develop new, efficient ways
    to manage, share and analyze such data

19
Graduate students Dr. Thomas Tran
  • Grad Students
  • Richong Zhang (PhD)
  • Zhiyong Weng (MCS)
  • Vikas Kumar (MCS)
  • Xiaoguang Ma (MCS)
  • Tapu Kumar Ghose (MCS)
  • Catherine Cormier (MSc)
  • Hong Chen (MSc)
  • Bo Zhan (MCS, co-supervised with Prof. Liam
    Peyton)
  • Yao Gu (MCS, part time)

20
Graduate students and their projects Dr. Herna L
Viktor
  • Hongyu Guo (PhD): Multi-view learning
  • Rana Awada (PhD): XML database mining (prelim)
  • Nadia Azam (M.Sc.): Link-based clustering
  • Bo Wang (M.Sc.): A storage resource broker agent
    for long-term preservation
  • Divine Muhivu (M.Sc.): Data integration through
    link mining
  • Isis Pena Sanchez (M.Sc.): Interestingness
    measurements for data mining
  • Minjie Shao (M.Sc.): Mining the adverse effects
    of medication
  • Xiaomei Xia (M.Sc.): Distributed data warehouse
    query processing
  • Joining us: Julie Doyle (PhD), Long-term
    preservation of data
  • Collaborations: NRC, Faculty of Management

21
Graduate students Dr. Liam Peyton
  • Masters Students
  • Sepideh Ghanavati
  • Pierre Seguin
  • Bo Zhan
  • Collaboration with
  • Prof. Daniel Amyot (Ottawa)
  • Prof. Greg Richards (Ottawa)
  • Prof. Michael Weiss (Carleton)
  • Dr. Alan Forster (Ottawa Hospital)

22
Graduate students and collaborations Dr. Iluju
Kiringa
  • Have implemented an experimental peer DBMS
  • This is joint work with
      • Renee Miller (Toronto)
      • John Mylopoulos (Toronto, Trento)
      • Vasiliki Kantere (Athens -- NTUA)
      • Anastasios Kementsietsidis (Edinburgh)
  • Several students in Toronto
      • Lei Jiang
      • Dan Zhao
      • Patricia Rodriguez
  • and Ottawa
      • Mehedi Masud
      • Anisur Rahman
      • Irfan Maki
  • Several alumni
  • More (strong) students are needed!
  • Here is a link to visit:
    http://www.cs.toronto.edu/db/hyperion