Data Mining general ideas - PowerPoint PPT Presentation

Provided by: josignaci

Transcript and Presenter's Notes


1
Data Mining general ideas
2
Data Mining: a definition
Art/Science of uncovering non-trivial,
valuable information from a large database
  • Emphasis on
      • non-obvious (difficult)
      • useful (cost vs. benefit)
      • large (automatic)
  • Yet there are no fixed rules, provided that the
    process is efficient in time, space and human
    resources.

3
Three big steps
  • Data preparation
  • Data analysis (neural networks)
  • Decision making
4
Data preparation
  • Extract / integrate data, transform, select,
    cleanse
  • Data warehouse
  • 50-80% of project time
5
Scrubbing, selecting, cleansing, preprocessing, ...
  • Eliminate redundancy
  • Eliminate irrelevant data
  • Deal with missing data
      • mean, clever substitute, interpolate, ignore, ...?
  • Correct errors
  • Outliers
  • Check consistency
  • Reserve relevant preprocessing for the data
    analysis
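
The cleansing steps listed on this slide can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the function names are hypothetical, missing values are assumed to be encoded as None, and the outlier test (distance from the median measured in median absolute deviations, with an arbitrary threshold k) is one possible choice, not a method named in the slides.

```python
from statistics import mean, median


def drop_duplicates(rows):
    """Eliminate redundancy: keep the first copy of each identical record."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def fill_with_mean(values):
    """Missing data, option 'mean': replace None with the mean of known values."""
    known = [v for v in values if v is not None]
    m = mean(known)
    return [m if v is None else v for v in values]


def interpolate(values):
    """Missing data, option 'interpolate': fill interior gaps linearly.

    Gaps before the first or after the last known value are left as None.
    """
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = values[left] + step * (i - left)
    return out


def flag_outliers(values, k=5.0):
    """Flag values farther than k median absolute deviations from the median.

    The threshold k is an assumption chosen for illustration.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return [False] * len(values)
    return [abs(v - med) > k * mad for v in values]


readings = [10, 11, 12, 11, 10, 95]
flags = flag_outliers(readings)
```

Whether a flagged value is an error to correct or a genuine extreme to keep remains a judgment call for the analyst.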

6
Data analysis
  • Techniques
      • Decision trees
      • Association rules
      • Polynomial regressions
      • Genetic algorithms
      • Neural networks
  • Conceptual tasks
      • Classification
      • Optimization
      • Interpolation
      • Modeling
      • Prediction
  • Goals
      • Target marketing
      • Market segmentation
      • Process control
      • Sales forecasting
      • Market laws
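
As a concrete instance of one technique from the list above, association rules are usually scored by support and confidence. The sketch below assumes transactions are stored as Python sets; the basket data and function names are invented for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) over the transactions."""
    both = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return both / base if base else 0.0


# Hypothetical market baskets, made up for this example.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]

rule_support = support(baskets, {"bread", "butter"})
rule_confidence = confidence(baskets, {"bread"}, {"butter"})
```

Here the rule "bread implies butter" holds in two of the three baskets that contain bread, which is the kind of pattern a target-marketing analysis would then examine.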

Neural networks are a general-purpose analysis
tool based on machine learning from patterns. As
a mathematical tool, they approximate Bayesian
inference. They build non-linear models from
examples.
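
To make "non-linear models from examples" concrete, here is a minimal sketch of a network with one hidden layer trained by plain gradient descent on the XOR pattern, which no linear model can fit. Everything in it is an assumption made for illustration: the layer size, learning rate, epoch count, random seed and squared-error loss are arbitrary choices, and convergence to a good fit is not guaranteed for every seed.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Return hidden activations and the network output for a 2-input example."""
    # Each hidden row is [weight_1, weight_2, bias].
    hidden = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_hidden]
    # The last entry of w_out is the output bias; zip stops before it.
    out = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + w_out[-1])
    return hidden, out


def train(examples, n_hidden=3, epochs=5000, lr=0.5, seed=0):
    """Fit the weights by per-example gradient descent on squared error."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, target in examples:
            hidden, out = forward(x, w_hidden, w_out)
            d_out = (out - target) * out * (1 - out)
            for j, h in enumerate(hidden):
                # Use the output weight before it is updated.
                d_h = d_out * w_out[j] * h * (1 - h)
                w_out[j] -= lr * d_out * h
                w_hidden[j][0] -= lr * d_h * x[0]
                w_hidden[j][1] -= lr * d_h * x[1]
                w_hidden[j][2] -= lr * d_h
            w_out[-1] -= lr * d_out
    return w_hidden, w_out


def mean_squared_error(examples, w_hidden, w_out):
    errors = [(forward(x, w_hidden, w_out)[1] - t) ** 2 for x, t in examples]
    return sum(errors) / len(errors)


xor = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
loss_before = mean_squared_error(xor, *train(xor, epochs=0))
loss_after = mean_squared_error(xor, *train(xor))
```

Comparing loss_before with loss_after shows how much the model has learned from the four examples alone; no rule about XOR was ever written into the code.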
7
Decision making
(Sometimes underestimated in neural network culture)
Data analysis may seem inscrutable
  • The data analyst must deeply understand the
    problem
  • Results must be fairly presented
  • Post-processing and merging with subjective
    factors is often necessary
  • Strict validation is necessary
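
One common way to make validation strict is k-fold cross-validation, where every record is held out exactly once and the model is always scored on data it did not see during fitting. The sketch below is an assumption about how that could look, not a procedure taken from the slides; the function names and the trivial "predict the mean" baseline model are hypothetical.

```python
import random


def k_fold_splits(n, k, seed=0):
    """Yield (train_indices, test_indices) for k shuffled folds over n records.

    Assumes n >= k so that no fold is empty.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(xs, ys, fit, predict, k=4):
    """Average held-out squared error of a model over k folds."""
    errors = []
    for train, test in k_fold_splits(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        fold_error = sum((predict(model, xs[i]) - ys[i]) ** 2 for i in test)
        errors.append(fold_error / len(test))
    return sum(errors) / len(errors)


# Baseline model: always predict the mean of the training targets.
def fit_mean(train_xs, train_ys):
    return sum(train_ys) / len(train_ys)


def predict_mean(model, x):
    return model


xs = list(range(8))
ys = [2.0] * 8
baseline_error = cross_validate(xs, ys, fit_mean, predict_mean)
```

Any real model can be passed in through fit and predict; a model that cannot beat the mean baseline on held-out data has not yet earned a place in the decision.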

8
Why is Data Mining not widely used?
  • Why do commercial suites (Office, Lotus,
    StarOffice, ...) not include neural networks
    (or other advanced tools)?
  • Bold exploitation is often meaningless or
    non-competitive.
  • Common sense and good training are needed to
    work out a valuable neural net model: high cost.
  • We are living through the early stages of the
    Information Era.