Data mining: methods and applications

1
Data mining: methods and applications
  • Heikki Mannila
  • FDK Scientific Advisory Board Meeting Nov. 28,
    2002

2
Why?
  • Lots of data
  • Novel types of data
  • Computational techniques gaining in importance
  • Algorithmic and probabilistic methods are needed

3
Approach and goals
  • Application area → concept formation →
    algorithmic question → algorithm → analysis →
    back to practice
  • Basic algorithmic research in areas where the
    results can be put into practice
  • Understanding the fundamental properties of data
    summarization and analysis

4
Computational methods
  • Pattern discovery
  • Finding recurrent patterns in large data sets
  • Succinct representations of data
  • Combinatorial algorithms
  • Dynamic programming etc.
  • Probabilistic modeling and analysis (Markov chain
    Monte Carlo)
  • Combining pattern discovery and combinatorial
    algorithms

5
Application areas
  • Genome structure
  • Understanding the types of variation in genomes
    (within and between species)
  • Gene mapping
  • Gene expression
  • Document data
  • Telecommunications data
  • Spatial data (onomastics)

6
Research topics: methods
  • Finding structure in large 0-1 data sets
  • Similarity between complex data objects and
    descriptions
  • Foundations of data analysis
  • Condensed representations for classes of queries
  • Probabilistic logic and its uses
  • Pattern discovery and Bayesian analysis
  • Combining different types of data (sequence,
    spatial, temporal)

7
Structure in 0-1 data sets
  • E.g., document data
  • Find collections of words that tend to co-occur:
    topics / m-out-of-n concepts
  • Patterns and probabilistic modeling
  • Leads to questions in generalizing Bonferroni's
    inequality
  • Distances between representations
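The slide names the task but not a specific algorithm. Here is a minimal sketch, assuming each 0-1 row (document) is stored as a Python set of the words it contains. It uses brute-force pair counting rather than any particular mining algorithm, and it finds word pairs that co-occur in at least a given fraction of documents. It also shows one way the Bonferroni bullet connects: singleton and pair frequencies alone bound the fraction of documents containing at least one word of a set, via S1 - S2 <= P(union) <= S1. The function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def support(rows: List[Set[str]], itemset: Iterable[str]) -> float:
    """Fraction of rows (0-1 records stored as sets of present words) containing every item."""
    items = frozenset(itemset)
    return sum(1 for row in rows if items <= row) / len(rows)


def frequent_pairs(rows: List[Set[str]], min_support: float) -> Dict[Tuple[str, str], float]:
    """Word pairs that co-occur in at least `min_support` of the rows (brute force over all pairs)."""
    vocabulary = sorted(set().union(*rows))
    pairs = {}
    for a, b in combinations(vocabulary, 2):
        s = support(rows, (a, b))
        if s >= min_support:
            pairs[(a, b)] = s
    return pairs


def bonferroni_bounds(rows: List[Set[str]], words: List[str]) -> Tuple[float, float]:
    """Degree-2 Bonferroni bounds S1 - S2 <= P(at least one word) <= S1,
    computed from singleton (S1) and pairwise (S2) supports only."""
    s1 = sum(support(rows, [w]) for w in words)
    s2 = sum(support(rows, pair) for pair in combinations(words, 2))
    return max(0.0, s1 - s2), min(1.0, s1)


docs = [{"data", "mining"}, {"data", "genome"}, {"genome", "haplotype"}, {"data", "mining", "genome"}]
print(frequent_pairs(docs, 0.5))                            # {('data', 'genome'): 0.5, ('data', 'mining'): 0.5}
print(bonferroni_bounds(docs, ["data", "genome", "mining"]))  # (0.75, 1.0)
```

On large vocabularies, a level-wise search that only extends already-frequent sets would typically replace the all-pairs scan.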

8
Research topics: applications
  • Genome structure
  • Haplotype blocks: how is the variation in the
    human genome structured?
  • Finding recurrent sources in sequences

9
Haplotype blocks
  • Is the variation between the genomes of
    individuals distributed uniformly?
  • Data: aligned 0-1 sequences
  • Task: find whether the data have a block
    structure
  • Concept: block structure, described by
    segmentation, centers, and variation

10
Blocks, cont.
  • Quality of structure: description length
  • Algorithm: dynamic programming and clustering →
    block boundaries
  • Dynamic programming → estimates for the strength
    of block boundaries
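Below is a minimal sketch of the dynamic-programming step. It rests on simplifying assumptions that do not come from the slides:

- Each row is one individual's aligned 0-1 sequence, stored as a string.
- Every distinct haplotype inside a block is treated as its own center, so there is no clustering step and no variation term.
- A block's description length is a hypothetical fixed `boundary_bits` charge, plus one bit per site for each center, plus log2(number of centers) bits per individual to say which center it follows.

The recursion then chooses the block boundaries with the smallest total. The sketch covers only the boundary search, not the boundary-strength estimates. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def block_cost(rows: Sequence[str], start: int, end: int, boundary_bits: float) -> float:
    """Toy description length (bits) of sites [start, end): write out each distinct
    haplotype in the block, then give every individual an index into that list."""
    centers = {row[start:end] for row in rows}
    center_bits = len(centers) * (end - start)        # 1 bit per site per center
    label_bits = len(rows) * math.log2(len(centers))  # which center each row follows
    return boundary_bits + center_bits + label_bits


def segment_blocks(rows: Sequence[str], boundary_bits: float = 4.0) -> Tuple[float, List[Tuple[int, int]]]:
    """Block boundaries minimizing the total toy description length, by dynamic programming."""
    m = len(rows[0])
    best = [0.0] + [math.inf] * m  # best[j]: cheapest description of sites [0, j)
    back = [0] * (m + 1)           # back[j]: start of the last block in that description
    for j in range(1, m + 1):
        for i in range(j):
            cost = best[i] + block_cost(rows, i, j, boundary_bits)
            if cost < best[j]:
                best[j], back[j] = cost, i
    blocks, j = [], m
    while j > 0:                   # walk back through the chosen boundaries
        blocks.append((back[j], j))
        j = back[j]
    return best[m], blocks[::-1]


rows = ["00001111", "00000000", "11111111", "11110000"]
print(segment_blocks(rows))  # (32.0, [(0, 4), (4, 8)])
```

In the example, each half of the sequence has only two haplotypes, while the full rows have four, so splitting at site 4 gives the shorter description.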

11
(No Transcript)

12
Finding recurrent sources for sequences
  • Data: long sequences (genome, telecom,
    paleoecology)
  • Problem: do some parts of the sequence come from
    the same source?
  • Probabilistic model underneath
  • Concept: (k,h)-segmentation
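The slide mentions an underlying probabilistic model without spelling it out. As an illustrative assumption only, the sketch below generates a 1-D sequence from h hidden sources, each modeled as a Gaussian with its own mean. It concatenates k pieces, each drawn from one source, so pieces from the same source can recur along the sequence. The Gaussian sources and the function name are hypothetical.

```python
import random
from typing import List, Sequence


def sample_from_sources(lengths: Sequence[int], source_of_piece: Sequence[int],
                        source_means: Sequence[float], noise_sd: float = 0.1,
                        seed: int = 0) -> List[float]:
    """Concatenate len(lengths) pieces; piece i has lengths[i] points drawn from a
    Gaussian centred at source_means[source_of_piece[i]] (assumed source model)."""
    rng = random.Random(seed)
    seq: List[float] = []
    for length, source in zip(lengths, source_of_piece):
        seq.extend(rng.gauss(source_means[source], noise_sd) for _ in range(length))
    return seq


# k = 3 pieces drawn from h = 2 sources; source 0 recurs in the first and last piece.
seq = sample_from_sources([3, 2, 3], [0, 1, 0], [0.0, 5.0])
print([round(x, 1) for x in seq])
```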

13
(k,h)-segmentations
  • A (multidimensional) time series
  • A segmentation of the sequence into k pieces, but
    with at most h different levels
  • Normal segmentation into homogeneous pieces:
    (k,k)-segmentation
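To make the definition concrete, the sketch below searches exhaustively over all (k,h)-segmentations of a short 1-D sequence, assuming squared (L2) error. A candidate places k-1 cut points and labels each of the k pieces with one of h levels. Each level's value is the mean of all points that share it. With h = k every piece can get its own level, so the result matches an ordinary k-segmentation. The search is exponential and only illustrates the definition; the function names are hypothetical.

```python
from itertools import combinations, product
from typing import Dict, List, Sequence, Tuple


def sq_error(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean (L2 error of one level)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def kh_segmentation_bruteforce(seq: Sequence[float], k: int,
                               h: int) -> Tuple[float, Tuple[int, ...], Tuple[int, ...]]:
    """Exhaustive (k,h)-segmentation: k contiguous pieces sharing at most h levels."""
    n = len(seq)
    best: Tuple[float, Tuple[int, ...], Tuple[int, ...]] = (float("inf"), (), ())
    for cuts in combinations(range(1, n), k - 1):
        edges = (0, *cuts, n)
        pieces = [seq[a:b] for a, b in zip(edges, edges[1:])]
        for labels in product(range(h), repeat=k):  # level index of each piece
            groups: Dict[int, List[float]] = {}
            for label, piece in zip(labels, pieces):
                groups.setdefault(label, []).extend(piece)
            error = sum(sq_error(g) for g in groups.values())
            if error < best[0]:
                best = (error, edges, labels)
    return best


print(kh_segmentation_bruteforce([1, 1, 5, 5, 1, 1], k=3, h=2))  # (0.0, (0, 2, 4, 6), (0, 1, 0))
```

In the example, three pieces need only two levels because the first and last pieces come from the same level.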

14
(k,h)-segmentation, cont.
  • NP-hard in the general case
  • Status unknown for dimension 1
  • -approximation algorithm for L2
  • 3-approximation for L1
  • Simple and practical algorithms
  • Good performance
  • Results in applications
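The slide reports simple, practical algorithms but does not describe them. The sketch below shows one simple two-step heuristic. It is assumed here for illustration, is not claimed to be the presenter's algorithm, and carries no claimed approximation guarantee. The two steps:

1. Compute an optimal ordinary k-segmentation by dynamic programming.
2. Group the k pieces into at most h levels. Under L2 error this is a weighted 1-D clustering of the piece means, so it suffices to try contiguous runs of the sorted means.

Function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def sq_error(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def k_segmentation(seq: Sequence[float], k: int) -> List[int]:
    """Optimal L2 k-segmentation by dynamic programming; returns edges [0, ..., len(seq)]."""
    n = len(seq)
    cost = [[math.inf] * (n + 1) for _ in range(k + 1)]  # cost[p][j]: p pieces covering seq[:j]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for p in range(1, k + 1):
        for j in range(p, n + 1):
            for i in range(p - 1, j):
                c = cost[p - 1][i] + sq_error(seq[i:j])
                if c < cost[p][j]:
                    cost[p][j], back[p][j] = c, i
    edges = [n]
    for p in range(k, 0, -1):  # follow back-pointers from the last piece
        edges.append(back[p][edges[-1]])
    return edges[::-1]


def pieces_to_levels(seq: Sequence[float], edges: List[int], h: int) -> Tuple[float, List[int]]:
    """Assign each piece one of at most h levels, minimizing total L2 error.
    Optimal 1-D clusters are contiguous runs of the sorted piece means, so try every split."""
    pieces = [list(seq[a:b]) for a, b in zip(edges, edges[1:])]
    order = sorted(range(len(pieces)), key=lambda i: sum(pieces[i]) / len(pieces[i]))
    h = min(h, len(pieces))
    best_error, best_labels = math.inf, []
    for cuts in combinations(range(1, len(order)), h - 1):
        bounds = (0, *cuts, len(order))
        labels = [0] * len(pieces)
        error = 0.0
        for level, (a, b) in enumerate(zip(bounds, bounds[1:])):
            members = order[a:b]
            error += sq_error([x for i in members for x in pieces[i]])
            for i in members:
                labels[i] = level
        if error < best_error:
            best_error, best_labels = error, labels
    return best_error, best_labels


def kh_segment(seq: Sequence[float], k: int, h: int) -> Tuple[float, List[int], List[int]]:
    """Heuristic (k,h)-segmentation: optimal k pieces first, then at most h shared levels."""
    edges = k_segmentation(seq, k)
    error, labels = pieces_to_levels(seq, edges, h)
    return error, edges, labels


print(kh_segment([1, 1, 5, 5, 1, 1, 5, 5], k=4, h=2))  # (0.0, [0, 2, 4, 6, 8], [0, 1, 0, 1])
```

Fixing the boundaries before choosing levels keeps both steps polynomial in the sequence length. The price is that the boundaries are never revisited once the levels are known.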

15
(No Transcript)

16
Future topics
  • Theory and applications of data mining
  • Combinatorial algorithms with probabilistic
    notions / statistics
  • Foundations: condensed representations, machine
    learning, and probabilistic logic
  • Spatial and spatio-temporal data
  • Genome structure
  • Linguistic data, process data, etc.

17
(No Transcript)