I: Introduction to Data Mining

About This Presentation

Slides: 14

Provided by: Compu265

Learn more at: https://www2.cs.uh.edu

Transcript and Presenter's Notes

1
I: Introduction to Data Mining

2
Teaching Plan for the Next 5 Weeks

3
Knowledge Discovery in Data and Data Mining (KDD)
Let us find something interesting!

Definition: KDD is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data (Fayyad).
Frequently, the term data mining is used to refer to KDD.
Many commercial and experimental tools and tool suites are available (see http://www.kdnuggets.com/siftware.html).
The field is dominated more by industry than by research institutions.

4
Why Mine Data? Commercial Viewpoint

Lots of data is being collected and warehoused:
Web data, e-commerce
Purchases at department/grocery stores
Bank/credit card transactions
Computers have become cheaper and more powerful (→ machine learning techniques become applicable)
Competitive pressure is strong
Provide better, customized services for an edge (e.g., in Customer Relationship Management)

5
Why Mine Data? Scientific Viewpoint

6
Mining Large Data Sets - Motivation

[Figure: The Data Gap. Total new disk (TB) since 1995 compared with the number of analysts.]

7
Data Mining Tasks

8
Classification Example
[Figure: a training set with two categorical attributes, one continuous attribute, and a class attribute is used to learn a classifier.]

9
Classifying Galaxies
Courtesy: http://aps.umn.edu

[Figure panels: Early, Intermediate, Late]

10
What is Clustering?

Given a set of objects, each having a set of attributes, and a similarity measure among them, find clusters such that:
Objects in one cluster are more similar to one another.
Objects in separate clusters are less similar to one another.
Similarity Measures:
Euclidean distance if attributes are continuous.
Other problem-specific measures.
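
The clustering definition above can be illustrated with a minimal sketch that uses Euclidean distance on continuous attributes. The slide does not name an algorithm; k-means is assumed here as one common choice, and the sample points, the starting centroids, and the function names are made up for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points with continuous attributes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, centroids, iterations=10):
    """Plain k-means (assumed algorithm): assign each point to its nearest
    centroid, then move each centroid to the mean of its cluster."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute centroids; an empty cluster keeps its old centroid.
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return clusters, centroids


# Hypothetical 2-D objects forming two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
clusters, centroids = kmeans(points, centroids=[points[0], points[3]])
for members, center in zip(clusters, centroids):
    print(center, members)
```

Objects that end up in the same cluster are closer to each other than to objects in the other cluster, which is the property the definition asks for. A problem-specific similarity measure could be swapped in by replacing `euclidean`.
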

11
Clustering of S&P 500 Stock Data

Observe stock movements every day.
Clustering points: Stock-{UP/DOWN}
Similarity Measure: Two points are more similar if the events described by them frequently happen together on the same day.
We used association rules to quantify a similarity measure.
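
To make the idea of "events that frequently happen together on the same day" concrete, here is a rough sketch. The slide says association rules were used to quantify the similarity; the exact measure is not given, so this example substitutes a simple co-occurrence ratio (shared days divided by days on which either event occurred). The tickers `AAA`, `BBB`, `CCC` and the daily data are hypothetical.

```python
def cooccurrence_similarity(days, event_a, event_b):
    """Fraction of days with either event on which both events occurred.

    This is a stand-in measure (a Jaccard ratio over days), not the
    association-rule-based measure mentioned on the slide.
    """
    days_a = {i for i, events in enumerate(days) if event_a in events}
    days_b = {i for i, events in enumerate(days) if event_b in events}
    either = days_a | days_b
    return len(days_a & days_b) / len(either) if either else 0.0


# Each set holds the Stock-UP/DOWN events observed on one trading day.
days = [
    {"AAA-UP", "BBB-UP", "CCC-DOWN"},
    {"AAA-UP", "BBB-UP"},
    {"AAA-DOWN", "BBB-DOWN", "CCC-UP"},
    {"AAA-UP", "BBB-UP", "CCC-UP"},
]

print(cooccurrence_similarity(days, "AAA-UP", "BBB-UP"))
print(cooccurrence_similarity(days, "AAA-UP", "CCC-UP"))
```

In this toy data, `AAA-UP` and `BBB-UP` always occur on the same days, so they score higher than `AAA-UP` and `CCC-UP`, which share only one day.
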

12
Association Rule Discovery: Definition

Given a set of records, each of which contains some number of items from a given collection:
Produce dependency rules which will predict the occurrence of an item based on occurrences of other items.

Rules Discovered:
{Milk} --> {Coke}
{Diaper, Milk} --> {Beer}
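
Rules like the two above are usually judged by how often they hold in the records. The sketch below computes two standard measures, support and confidence, for both rules. The transaction list is illustrative data chosen for this example, not taken from the slide, and the helper names are assumptions.

```python
def count(transactions, itemset):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(transactions, antecedent, consequent):
    """Fraction of all transactions that contain both sides of the rule."""
    both = set(antecedent) | set(consequent)
    return count(transactions, both) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Among transactions containing the antecedent, the fraction that
    also contain the consequent."""
    matched = count(transactions, antecedent)
    if matched == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return count(transactions, both) / matched


# Illustrative market-basket records (hypothetical data).
transactions = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

for lhs, rhs in [({"Milk"}, {"Coke"}), ({"Diaper", "Milk"}, {"Beer"})]:
    print(sorted(lhs), "-->", sorted(rhs),
          "support:", support(transactions, lhs, rhs),
          "confidence:", confidence(transactions, lhs, rhs))
```

In this data, three of the four transactions that contain Milk also contain Coke, and two of the three that contain both Diaper and Milk also contain Beer. A rule discovery algorithm would search for all rules whose support and confidence exceed chosen thresholds.
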
13
Sequential Pattern Discovery: Definition

Given a set of objects, with each object associated with its own timeline of events, find rules that predict strong sequential dependencies among different events.
Rules are formed by first discovering patterns.
Event occurrences in the patterns are governed by timing constraints.
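
As a small illustration of a pattern governed by a timing constraint, the sketch below checks whether an ordered sequence of events occurs in an object's timeline with at most `max_gap` time units between consecutive events, and then counts the fraction of objects that contain the pattern. The slide does not specify which timing constraints apply; a maximum gap is assumed here, and the customer timelines and event names are hypothetical.

```python
def contains_pattern(timeline, pattern, max_gap):
    """True if the events in `pattern` occur in order in `timeline`, each
    strictly later than the previous one and at most `max_gap` after it.

    `timeline` is a list of (time, event) pairs.
    """
    events = sorted(timeline)

    def match(start, pat_idx, prev_time):
        if pat_idx == len(pattern):
            return True
        for i in range(start, len(events)):
            t, name = events[i]
            if prev_time is not None:
                if t <= prev_time:
                    continue  # must happen strictly after the previous event
                if t - prev_time > max_gap:
                    break  # events are sorted, so later ones are too late
            if name == pattern[pat_idx] and match(i + 1, pat_idx + 1, t):
                return True
        return False

    return match(0, 0, None)


def pattern_support(timelines, pattern, max_gap):
    """Fraction of objects whose timeline contains the pattern."""
    hits = sum(contains_pattern(t, pattern, max_gap)
               for t in timelines.values())
    return hits / len(timelines)


# Hypothetical customers, each with a timeline of (day, purchase) events.
timelines = {
    "c1": [(1, "book"), (3, "dvd")],
    "c2": [(1, "book"), (10, "dvd")],   # gap too long
    "c3": [(2, "dvd"), (4, "book")],    # wrong order
}

print(pattern_support(timelines, ("book", "dvd"), max_gap=5))
```

Only `c1` satisfies both the order and the gap constraint here. Patterns with high support across objects are the candidates from which sequential rules would then be formed.
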
