Title: CSC5120 Tutorial 1
COMP537
Knowledge Discovery in Databases Overview
Prepared by Raymond Wong
Presented by Raymond Wong
raywong_at_cse
Course Details
- Reference books/materials
- Papers
- Data Mining: Concepts and Techniques. Jiawei Han and Micheline Kamber. Morgan Kaufmann Publishers (2nd edition)
- Introduction to Data Mining. Pang-Ning Tan, Michael Steinbach, Vipin Kumar. Boston: Pearson Addison Wesley (2006)
Area
- DB or AI
- This course can count towards one of the areas
ONLY and cannot be double counted towards the
required credits
Course Details
- Grading Scheme
- Assignment 30%
- Project 30%
- Final Exam 40%
Assignment
- If a student answers a selected question in class correctly, I will give him/her a coupon for each correct answer.
- This coupon can be used to waive one question in an assignment, which means that s/he can get full marks for that question without answering it.
Assignment
- Guideline
- For each assignment, each student can waive at most one question.
- The student can waive any question s/he wants and obtain full marks for it, whether or not s/he answers it.
- The student may still answer the waived question. We will mark it, but full marks will be given for it regardless.
- When the student submits the assignment,
- please staple the coupon to the submitted assignment
- please write the number of the question to be waived on the coupon
Project
- Each project is completed by a group.
- The number of students in a group depends on the class size.
- The duration of each presentation depends on the class size.
- It will be announced soon.
Project
- Project Type (One of the following)
- Survey: your group only needs to read about 2-3 papers
- Implementation-oriented Project: your group only needs to read about 1-2 papers
- Research-oriented Project: you can read some papers and conduct research
Project
- Project Type (One of the following)
- Survey: Proposal, Presentation, Final report
- Implementation-oriented Project: Proposal, Presentation, Final report, Coding
- Research-oriented Project: Proposal, Presentation, Final report (containing your proposed methodology), Coding (if any)
Project
- Project Topic
- Some pre-selected topics/papers
- Your own choice
- For fairness, please do not choose a topic that is closely related to your own research
Exam
- You are allowed to bring a calculator with you.
- Please remember to prepare a calculator for the
exam
Major Topics
- Association
- Clustering
- Classification
- Data Warehouse
- Data Mining over Data Streams
- Web Databases
- Multi-criteria Decision Making
1. Association
We are interested in the items/itemsets with frequency > 2
Frequent Pattern (or Frequent Item)
Frequent Pattern (or Frequent Itemset)
1. Association
We are interested in the items/itemsets with frequency > 2
Association Rule
1. Apple → Orange (customers who buy apples will probably buy oranges.)
2. Orange → Apple (customers who buy oranges will probably buy apples.)
[Figure: numeric annotations on the two rules (100, 2, 3, 67, 2, 2)]
Problem: to find all frequent patterns and association rules
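The problem statement above can be illustrated with a small brute-force sketch. The transactions below are hypothetical (the slide's own data table is not reproduced here), and the names `support`, `frequent_itemsets`, `confidence` and `min_support`, as well as the use of `>=` for the threshold, are assumptions made for illustration. This is not an efficient mining algorithm, only a direct reading of the definitions.

```python
from itertools import combinations

# Hypothetical transactions (assumption: not taken from the slides).
transactions = [
    {"apple", "orange"},
    {"apple", "orange", "milk"},
    {"orange", "milk"},
    {"milk"},
]

def support(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def frequent_itemsets(transactions, min_support):
    """Enumerate every itemset and keep those with support >= min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            count = support(set(combo), transactions)
            if count >= min_support:
                result[frozenset(combo)] = count
    return result

def confidence(lhs, rhs, transactions):
    """confidence(lhs -> rhs) = support(lhs and rhs together) / support(lhs)."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)

frequent = frequent_itemsets(transactions, min_support=2)
apple_to_orange = confidence({"apple"}, {"orange"}, transactions)
orange_to_apple = confidence({"orange"}, {"apple"}, transactions)
```

With this toy data, the rule Apple → Orange holds in every transaction containing apple, while Orange → Apple holds in only two of the three transactions containing orange, so the two directions of a rule can have different confidence.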
2. Clustering
Cluster 1 (e.g. High Score in Computer and Low Score in History)
Cluster 2 (e.g. High Score in History and Low Score in Computer)
Problem: to find all clusters
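As a rough illustration of the clustering problem, the sketch below groups hypothetical (Computer score, History score) pairs into two clusters. The slide does not name an algorithm; k-means is used here only as one common choice, and the data, the helper names and the naive "first k points" initialisation are all assumptions.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))

def kmeans(points, k, iterations=10):
    """Plain k-means with a naive initialisation (first k points)."""
    centroids = list(points[:k])
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return clusters, centroids

# Hypothetical (Computer score, History score) pairs.
scores = [(90, 30), (20, 85), (85, 25), (25, 90), (95, 35), (30, 80)]
clusters, centroids = kmeans(scores, k=2)
```

On this toy data one cluster collects the students with a high Computer score and a low History score, and the other collects the opposite pattern.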
3. Classification
Suppose there is a person.
Decision tree
4. Data Warehouse
Without a data warehouse: users send a query directly to the databases and need to wait for a long time (e.g., 1 day to 1 week).
With a data warehouse: users query pre-computed results that are built from the databases.
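The idea of answering queries from pre-computed results can be sketched as follows. The sales records, the per-region total and the function names are hypothetical, chosen only to contrast scanning the raw data at query time with looking up a summary built ahead of time.

```python
from collections import defaultdict

# Hypothetical raw records: (region, amount).
sales = [("east", 100), ("west", 250), ("east", 50), ("north", 75), ("west", 25)]

def total_by_scan(records, region):
    """Answer the query by scanning every raw record (slow on large databases)."""
    return sum(amount for r, amount in records if r == region)

def build_summary(records):
    """Pre-compute the total per region once, before any query arrives."""
    summary = defaultdict(int)
    for region, amount in records:
        summary[region] += amount
    return dict(summary)

def total_from_summary(summary, region):
    """Answer the same query with a single lookup in the pre-computed results."""
    return summary.get(region, 0)

summary = build_summary(sales)
```

Both functions return the same answer; the difference is that the scan repeats the work for every query, while the summary pays the cost once up front.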
5. Data Mining over Static Data
- Association
- Clustering
- Classification
Output (Data Mining Results)
Static Data
5. Data Mining over Data Streams
- Association
- Clustering
- Classification
Output (Data Mining Results)
Unbounded Data
Real-time Processing
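Because a stream is unbounded, it cannot be stored and re-read like static data; each item has to be processed once, as it arrives, using bounded memory. The sketch below shows one well-known one-pass summary for finding frequent items, the Misra-Gries algorithm. It is chosen here only for illustration and is not necessarily the method covered in this course; the function name and the parameter `k` are assumptions.

```python
def misra_gries(stream, k):
    """One pass over the stream, keeping at most k - 1 counters.

    Any item that occurs more than n / k times in a stream of length n
    is guaranteed to remain among the counters at the end. The stored
    counts are lower bounds, not exact frequencies.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop the zeros.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

# The stream is consumed as an iterator, so it is read exactly once.
candidates = misra_gries(iter("abacabda"), k=3)
```

Here "a" occurs 4 times in a stream of 8 items, which is more than 8 / 3, so it survives as a candidate frequent item even though only two counters are kept.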
6. Web Databases
Raymond Wong
How to rank the webpages?
7. Multi-criteria Decision Making
Suppose we want to look for a hotel which is close to a beach.
3 hotels
Suppose we compare hotel a and hotel b
- We know that hotel a is better than hotel b because
- Price of hotel a is smaller
- Distance of hotel a is smaller
We have two attributes. Which hotel should we select?
7. Multi-criteria Decision Making
Suppose we want to look for a hotel which is close to a beach.
3 hotels
Suppose we compare hotel a and hotel c
- We cannot determine that hotel a is better than hotel c (w.r.t. the two attributes).
- We cannot determine that hotel c is better than hotel a (w.r.t. the two attributes).
- This is because
- Price of hotel a is smaller
- Distance of hotel c is smaller
We have two attributes. Which hotel should we select?
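The two comparisons above can be written as a dominance test. In the sketch below, the prices and distances are hypothetical values chosen only to match the relationships stated on the slides (a beats b on both attributes; a is cheaper than c but c is closer). The function names are assumptions, and the set of hotels that no other hotel dominates is what is often called the skyline.

```python
def dominates(x, y):
    """x dominates y if x is no worse on every attribute and strictly
    better on at least one (smaller is better for both attributes)."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))

def skyline(hotels):
    """Names of the hotels that are not dominated by any other hotel."""
    return {
        name
        for name, attrs in hotels.items()
        if not any(
            dominates(other_attrs, attrs)
            for other, other_attrs in hotels.items()
            if other != name
        )
    }

# Hypothetical (price, distance to beach) values.
hotels = {"a": (100, 3.0), "b": (150, 5.0), "c": (200, 1.0)}
candidates = skyline(hotels)
```

Hotel b can be discarded because hotel a dominates it, while a and c both remain as candidates: neither dominates the other, so the final choice depends on how the user weighs price against distance.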