Title: Data Mining in Database
1. Data Mining in Database
- Hongzhi Li
- Department of Computing and Information Science
- Queen's University
- Kingston, Ontario
- hongzhi_at_cs.queensu.ca
- www.ace.uwaterloo.ca/liho/project/832
2. Introduction
- Why data mining in databases?
- What are the requirements for data mining?
- What kind of knowledge can be mined?
3. Why Data Mining?
- Knowledge Discovery in Databases (KDD)
- -- The overall process of discovering useful knowledge
- Data Mining (computer-driven exploration)
- -- Query formulation problem
- -- Visualizing and understanding a large data set
- -- Data growth rate too high to be handled manually
- Data Warehouses (human-driven exploration)
- -- Querying summaries of transactions, etc.; decision support
- Traditional Databases (transactions)
- -- Querying data in well-defined processes; reliable storage
4. Requirements of Data Mining
- Handling of different types of data
- Efficiency and scalability of algorithms
- Usefulness, certainty and expressiveness of results
- Expression of various kinds of mining results
- Interactive mining of knowledge at multiple levels of abstraction
- Mining information from different sources of data
- Protection of privacy and data security
5. Data Mining Techniques
- What kinds of databases to work on?
- -- relational, object-oriented, and Internet information miners
- What kinds of techniques to use?
- -- data-driven and query-driven miners
- -- generalization-based and pattern-based mining
- What kinds of knowledge to mine?
- -- association rules, characteristic rules, classification rules, discriminant rules, clustering
6. Mining Different Kinds of Knowledge
- Association rules
- Data generalization and summarization
- Data classification
- Data clustering
- Pattern-based similarity search
- Path traversal patterns
7. Mining Association Rules
- Example: if a customer buys (one brand of) milk, s/he usually also buys (another brand of) bread
- Discover only strong association rules (those meeting both a minimum support and a minimum confidence threshold)!
- Interestingness of discovered association rules
- Often mining generalized and multiple-level association rules
- Efficiency of algorithms for counting the large itemsets
- -- Apriori and DHP (Direct Hashing and Pruning)
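To make "strong rules" and "counting the large itemsets" concrete, here is a minimal sketch of the level-wise Apriori idea followed by rule generation. It assumes an absolute support count (number of baskets) and a fractional confidence threshold; the basket data is invented for illustration. DHP's hash-based candidate pruning is not shown.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every large itemset.
    min_support is an absolute number of transactions (an assumption)."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One scan over the data: count how many transactions contain each candidate.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    large = count({frozenset([i]) for t in transactions for i in t})
    result = dict(large)
    k = 2
    while large:
        # Join step: combine large (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a, b in combinations(list(large), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be large.
        candidates = {c for c in candidates
                      if all(frozenset(s) in large for s in combinations(c, k - 1))}
        large = count(candidates)
        result.update(large)
        k += 1
    return result

def rules(large, min_confidence):
    """Strong rules X -> Y with confidence = support(X u Y) / support(X)."""
    out = []
    for itemset, sup in large.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                conf = sup / large[lhs]  # every subset of a large itemset is large
                if conf >= min_confidence:
                    out.append((set(lhs), set(itemset - lhs), sup, conf))
    return out

# Hypothetical market baskets.
baskets = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"milk", "beer"},
    {"bread", "eggs"},
    {"milk", "bread", "beer"},
]
large = apriori(baskets, min_support=3)
strong = rules(large, min_confidence=0.7)
for lhs, rhs, sup, conf in strong:
    print(lhs, "->", rhs, "support:", sup, "confidence:", conf)
```

With these baskets, only {milk, bread} is large at the pair level (3 of 5 baskets), so the strong rules are milk -> bread and bread -> milk, each with confidence 0.75.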
8. Multi-level Data Generalization, Summarization, and Characterization
- How many customers buy milk in the Kingston area?
(2, 1, , BasicFood at Bath, at Montreal )
- (Data warehousing)
- Two approaches
- -- data cube approach
- -- attribute-oriented induction approach
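A minimal sketch of the attribute-oriented induction idea: each attribute value is replaced by a higher-level concept from a concept hierarchy, and then identical generalized tuples are merged while their counts are accumulated. The hierarchies, the "Montreal area" concept, and the sales records (each standing for one customer's purchase) are all hypothetical. The data cube approach, which would precompute such aggregates, is not shown.

```python
from collections import Counter

# Hypothetical concept hierarchies: each value maps to its parent concept.
HIERARCHIES = {
    "item": {"2% milk": "milk", "skim milk": "milk", "white bread": "bread",
             "milk": "BasicFood", "bread": "BasicFood"},
    "location": {"Bath": "Kingston area", "Kingston": "Kingston area",
                 "Montreal": "Montreal area"},
}

def climb(value, hierarchy, steps):
    """Replace value by its ancestor `steps` levels up, stopping at the root."""
    for _ in range(steps):
        if value not in hierarchy:
            break
        value = hierarchy[value]
    return value

def generalize(rows, attributes, steps):
    """Generalize each chosen attribute, then merge identical tuples and count them."""
    counts = Counter()
    for row in rows:
        key = tuple(climb(row[a], HIERARCHIES[a], steps[a]) for a in attributes)
        counts[key] += 1
    return counts

# Hypothetical purchase records.
sales = [
    {"item": "2% milk", "location": "Bath"},
    {"item": "skim milk", "location": "Kingston"},
    {"item": "2% milk", "location": "Montreal"},
    {"item": "white bread", "location": "Kingston"},
]

summary = generalize(sales, ["item", "location"], {"item": 1, "location": 1})
for (item, location), n in summary.items():
    print(item, location, n)
```

Climbing one level answers the slide's question directly: the generalized tuple ("milk", "Kingston area") has a count of 2. Climbing the item attribute two levels collapses everything into "BasicFood".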
9. Data Classification
- A customer spends 7 hours on average looking for clothes and buys 2 pieces each week
- -- What is the gender of this customer?
- Classification Methods
- -- Decision Tree
- -- Nearest Neighbor
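The nearest-neighbor method can be sketched in a few lines: label a new customer by a majority vote among the k most similar training customers. This assumes Euclidean distance on two features (hours spent looking for clothes per week, pieces bought per week) and uses a tiny invented training set, so the labels carry no real-world meaning. Decision trees are not shown.

```python
import math
from collections import Counter

def knn_classify(training, query, k=3):
    """Majority vote among the k training examples nearest to `query`.
    `training` is a list of (feature_tuple, label) pairs."""
    nearest = sorted(training, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training data: (hours per week, pieces per week) -> label.
training = [
    ((8.0, 3), "female"),
    ((6.5, 2), "female"),
    ((7.5, 1), "female"),
    ((1.0, 1), "male"),
    ((2.0, 0), "male"),
    ((1.5, 2), "male"),
]

# The customer from the slide: 7 hours, 2 pieces per week.
print(knn_classify(training, (7.0, 2)))
```

An odd k avoids ties between two classes. In practice the features would usually be scaled first, since raw hours and piece counts have different ranges.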
10. Clustering Analysis
- A group of customers
- Some like to buy lots of beer
- Some like to buy lots of cigarettes
- They can be grouped as hard-working people (or Mental-Homeless??)
- Distance-based clustering
- Statistical-based clustering
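As one example of distance-based clustering, here is a minimal k-means sketch (the slide does not name a specific algorithm; k-means is a swapped-in, commonly used choice). The customer data (beer and cigarette purchases per week) is invented, and the initial centroids are simply the first k points to keep the sketch deterministic.

```python
import math

def kmeans(points, k, iterations=100):
    """Lloyd's k-means. Returns the final centroids and the clusters
    from the last assignment step."""
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customers: (beer purchases, cigarette purchases) per week.
customers = [(10, 0), (0, 12), (9, 1), (1, 10), (11, 1), (0, 11)]
centroids, clusters = kmeans(customers, k=2)
print(centroids)
print(clusters)
```

The result separates the beer buyers from the cigarette buyers. k-means depends on the initial centroids, so real use would try several initializations.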
11. Pattern-based Similarity Search
- Search the database for customers who spend about 7 hours shopping each week and buy about 100 grocery items in the Kingston area
- -- Object-relative similarity query / all-pair similarity query
- -- Whole matching / subsequence matching
- -- Similarity measure: comparison in the spatial / frequency domain
- -- Subsequences of arbitrary length, scaling and translation
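Subsequence matching that tolerates scaling and translation can be sketched by sliding a query window over a sequence and comparing z-normalized windows with Euclidean distance. Z-normalization (subtract the mean, divide by the standard deviation) is an assumed choice for removing offset and amplitude differences. This works in the spatial (time) domain only; frequency-domain comparison is not shown, and the weekly series is invented.

```python
import math

def z_normalize(seq):
    """Remove translation (offset) and scaling (amplitude) differences."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in seq]

def best_subsequence_match(series, query):
    """Slide a window of len(query) over `series` and return the
    (offset, distance) of the window closest to the query."""
    q = z_normalize(query)
    m = len(q)
    best = (None, math.inf)
    for start in range(len(series) - m + 1):
        window = z_normalize(series[start:start + m])
        d = math.dist(window, q)
        if d < best[1]:
            best = (start, d)
    return best

# Hypothetical weekly shopping hours for one customer.
series = [3, 4, 3, 5, 7, 6, 7, 3, 4]
# Query pattern: the shape 5, 7, 6, 7 scaled by 2 and shifted by 2.
query = [12, 16, 14, 16]
offset, distance = best_subsequence_match(series, query)
print(offset, distance)
```

Whole matching is the special case where the query and the stored sequence have the same length, so there is only one window to compare.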
12. Mining Path Traversal Patterns
- Customers surfing your company's Web pages: what are their preferred routes?
- -- For distributed information environments such as the WWW and on-line services
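One approach from the path traversal literature first breaks each user's click stream into maximal forward references: a backward move (returning to a page already on the current path) ends the current forward path. The sketch below does that and then, as a simplified stand-in for finding large reference sequences, counts consecutive page pairs across all references. The page names and the session are hypothetical.

```python
from collections import Counter

def maximal_forward_references(pages):
    """Split one traversal into maximal forward references."""
    path, refs = [], []
    moving_forward = True
    for page in pages:
        if page in path:
            # Backward move: emit the path only if we were moving forward.
            if moving_forward:
                refs.append(tuple(path))
            path = path[:path.index(page) + 1]  # back up to the revisited page
            moving_forward = False
        else:
            path.append(page)
            moving_forward = True
    if moving_forward and path:
        refs.append(tuple(path))
    return refs

def frequent_hops(references, min_count):
    """Count consecutive page pairs across references; keep the frequent ones."""
    counts = Counter(
        (ref[i], ref[i + 1]) for ref in references for i in range(len(ref) - 1)
    )
    return {pair: n for pair, n in counts.items() if n >= min_count}

# Hypothetical session: each letter is a page on the company's site.
session = list("ABCDCBEGHGWAOUOV")
refs = maximal_forward_references(session)
print(refs)
print(frequent_hops(refs, min_count=2))
```

The session yields five forward paths (ABCD, ABEGH, ABEGW, AOU, AOV), and A -> B is the most frequent hop. Counting longer consecutive sequences across many users' sessions would follow the same pattern.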
13. Summary
- Methods
- -- Diversity / rich functionality
- Applications
- -- Quest from IBM
- Challenges
- -- Data mining in advanced database systems
- -- Mining multiple kinds of knowledge at multiple abstraction levels
Take a minute to enjoy my Knowledge Discovery and have fun!
14. Knowledge Discovery (Case to learn from)
It was so cold that the bird froze and fell to the ground in a large field.
15. Knowledge I Discovered
1. Not everyone who drops shit on you is your enemy.
2. Not everyone who gets you out of shit is your friend.
3. And when you are in deep shit, keep your mouth shut!
So, I'll shut up and listen to you sing!