CSC5120 Tutorial 1



1
COMP537
Knowledge Discovery in Databases Overview
Prepared by Raymond Wong, presented by Raymond Wong
(raywong_at_cse)
2
Course Details
  • Reference books/materials
    • Papers
    • Data Mining: Concepts and Techniques. Jiawei Han
      and Micheline Kamber. Morgan Kaufmann Publishers
      (2nd edition)
    • Introduction to Data Mining. Pang-Ning Tan,
      Michael Steinbach, Vipin Kumar. Boston: Pearson
      Addison Wesley (2006)

3
Area
  • DB or AI
  • This course can count towards only one of these
    areas; it cannot be double-counted towards the
    required credits

4
Course Details
  • Grading Scheme
    • Assignment: 30%
    • Project: 30%
    • Final Exam: 40%

5
Assignment
  • If a student answers a selected question in class
    correctly, s/he receives one coupon for each
    correct answer
  • A coupon can be used to waive one question in an
    assignment,
  • which means that s/he gets full marks for that
    question without answering it

6
Assignment
  • Guideline
    • For each assignment, each student can waive at
      most one question.
    • S/he can waive any question s/he wants and obtain
      full marks for it (whether or not s/he answers
      it).
    • S/he may still answer the waived question. We
      will mark it, but full marks will be given for
      it regardless.
  • When submitting the assignment, the student should
    • staple the coupon to the submitted assignment
    • write the number of the question to be waived on
      the coupon

7
Project
  • Each project is completed by a group.
  • The number of students in a group depends on the
    class size.
  • The duration of each presentation also depends on
    the class size.
  • Both will be announced soon.

8
Project
  • Project Type (one of the following)
    • Survey: your group only needs to read about
      2-3 papers
    • Implementation-oriented Project: your group only
      needs to read about 1-2 papers
    • Research-oriented Project: you can read some
      papers and conduct research
9
Project
  • Project Type (one of the following) and its
    deliverables
    • Survey: Proposal, Presentation, Final report
    • Implementation-oriented Project: Proposal,
      Presentation, Final report, Coding
    • Research-oriented Project: Proposal,
      Presentation, Final report (containing your
      proposed methodology), Coding (if any)

10
Project
  • Project Topic
  • Some pre-selected topics/papers
  • Your own choice
  • For fairness, please do not choose a topic that
    is closely related to your own research

11
Exam
  • You are allowed to bring a calculator to the exam.
  • Please remember to prepare one.

12
Major Topics
  • Association
  • Clustering
  • Classification
  • Data Warehouse
  • Data Mining over Data Streams
  • Web Databases
  • Multi-criteria Decision Making

13
1. Association
We are interested in the items/itemsets with
frequency ≥ 2
Frequent Pattern (or Frequent Item): a single item
with frequency ≥ 2
Frequent Pattern (or Frequent Itemset): a set of
items with frequency ≥ 2
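The transaction table from this slide did not survive extraction. The sketch below therefore uses a hypothetical list `TRANSACTIONS`. It counts every itemset by brute force and keeps those with frequency at least 2. It does not implement Apriori or any other algorithm from the course, and the names `frequent_itemsets` and `min_freq` are illustrative only.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions (the slide's original table is not reproduced).
TRANSACTIONS = [
    {"apple", "orange"},
    {"apple", "orange", "milk"},
    {"orange", "bread"},
    {"milk"},
]


def frequent_itemsets(transactions, min_freq=2):
    """Brute force: count every non-empty subset of every transaction,
    then keep the itemsets whose frequency is at least min_freq.
    Exponential in transaction size, so only suitable for toy data."""
    counts = Counter()
    for t in transactions:
        items = sorted(t)
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                counts[frozenset(subset)] += 1
    return {s: c for s, c in counts.items() if c >= min_freq}


if __name__ == "__main__":
    result = frequent_itemsets(TRANSACTIONS)
    for itemset, freq in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), freq)
```

On this toy data, apple, milk and orange are frequent items, and {apple, orange} is a frequent itemset.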
14
1. Association
We are interested in the items/itemsets with
frequency ≥ 2
Association Rule
  1. Apple → Orange (customers who buy apple will
     probably buy orange)
  2. Orange → Apple (customers who buy orange will
     probably buy apple)
(Figure values: 100% = 2/2 for rule 1; 67% ≈ 2/3 for
rule 2)
Problem: find all frequent patterns and association
rules
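The percentages on this slide can be read as the standard rule confidence, freq(X ∪ Y) / freq(X). That reading is an assumption, since the slide does not name the measure. The minimal sketch below builds rules only from frequent itemsets and computes confidence in that way. The transactions are hypothetical: apple occurs twice, orange three times and {apple, orange} twice, so the two rules come out at 100% and about 67%. `association_rules` is an illustrative name, not a library function.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: apple x2, orange x3, {apple, orange} x2.
TRANSACTIONS = [
    {"apple", "orange"},
    {"apple", "orange", "milk"},
    {"orange", "bread"},
    {"milk"},
]


def itemset_counts(transactions):
    """Frequency of every non-empty itemset occurring in some transaction."""
    counts = Counter()
    for t in transactions:
        items = sorted(t)
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                counts[frozenset(subset)] += 1
    return counts


def association_rules(transactions, min_freq=2):
    """Return (lhs, rhs, confidence) for every split of every frequent
    itemset of size >= 2, where confidence = freq(lhs ∪ rhs) / freq(lhs)."""
    counts = itemset_counts(transactions)
    rules = []
    for itemset, freq in counts.items():
        if freq < min_freq or len(itemset) < 2:
            continue
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), k):
                lhs = frozenset(lhs)
                rules.append((lhs, itemset - lhs, freq / counts[lhs]))
    return rules


if __name__ == "__main__":
    for lhs, rhs, conf in association_rules(TRANSACTIONS):
        print(f"{sorted(lhs)} -> {sorted(rhs)}: {conf:.0%}")
```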
16
2. Clustering
Cluster 1 (e.g., high score in Computer and low
score in History)
Cluster 2 (e.g., high score in History and low
score in Computer)
Problem: find all clusters
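The slide does not say which clustering algorithm is used. As one common choice, the sketch below runs k-means with k = 2 on hypothetical (computer score, history score) pairs in `STUDENTS`. It uses a deterministic farthest-first initialisation so that the result is reproducible. Both the data and that initialisation are assumptions made for illustration.

```python
import math

# Hypothetical (computer score, history score) pairs for six students.
STUDENTS = [(90, 30), (85, 25), (95, 35), (20, 88), (30, 92), (25, 80)]


def farthest_first(points, k):
    """Start from the first point, then repeatedly add the point farthest
    from all centroids chosen so far (deterministic initialisation)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its cluster, until nothing changes."""
    centroids = farthest_first(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return clusters


if __name__ == "__main__":
    for cluster in kmeans(STUDENTS, 2):
        print(cluster)
```

k-means requires k to be chosen in advance and can land in a poor local optimum with bad starting centroids. The slide's "find all clusters" leaves both of these issues open.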
18
3. Classification
Suppose there is a person.
(Figure: decision tree)
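The decision tree figure on this slide was lost in extraction. The sketch below shows only how a decision tree classifies a person: start at the root, test one attribute per node, and follow the matching branch down to a leaf label. The tree itself (`age`, `student`, `credit`, and "buys / does not buy" a computer) is entirely hypothetical, as are the names `Node`, `TREE` and `classify`.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Node:
    """Internal node: test one attribute, then follow the matching branch."""
    attribute: str
    branches: Dict[str, "Tree"]


# A tree is either an internal Node or a leaf holding the class label.
Tree = Union[Node, str]

# Hypothetical tree: does a person buy a computer?
TREE: Tree = Node("age", {
    "young": Node("student", {"yes": "buys", "no": "does not buy"}),
    "middle-aged": "buys",
    "senior": Node("credit", {"fair": "buys", "excellent": "does not buy"}),
})


def classify(tree: Tree, person: Dict[str, str]) -> str:
    """Walk from the root to a leaf, choosing branches by the person's attributes."""
    while isinstance(tree, Node):
        tree = tree.branches[person[tree.attribute]]
    return tree


if __name__ == "__main__":
    print(classify(TREE, {"age": "young", "student": "yes"}))
```

This example builds the tree by hand. Learning the tree from labelled training data is the actual classification task.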
20
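4. Data Warehouse
Without a data warehouse: users send queries
directly to the databases and may need to wait for
a long time (e.g., 1 day to 1 week).
With a data warehouse: results are pre-computed
from the databases, and users' queries are answered
from the pre-computed results.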
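The contrast on this slide can be sketched in a few lines. The sketch assumes hypothetical sales records in `SALES` and a single pre-computed aggregate (total sales per region). `query_raw` rescans all records on every query. The warehouse-style version pays the aggregation cost once in `precompute_totals`, and each later query becomes a lookup. All names here are illustrative.

```python
from collections import defaultdict

# Hypothetical raw sales records: (region, product, amount).
SALES = [
    ("east", "tv", 300),
    ("east", "phone", 200),
    ("west", "tv", 150),
    ("west", "phone", 400),
    ("east", "tv", 100),
]


def query_raw(records, region):
    """Without a warehouse: scan every record each time the query is asked."""
    return sum(amount for r, _, amount in records if r == region)


def precompute_totals(records):
    """Warehouse-style: aggregate once, ahead of time."""
    totals = defaultdict(int)
    for region, _, amount in records:
        totals[region] += amount
    return dict(totals)


TOTALS_BY_REGION = precompute_totals(SALES)


def query_warehouse(region):
    """Answer from the pre-computed results (a dictionary lookup)."""
    return TOTALS_BY_REGION.get(region, 0)


if __name__ == "__main__":
    print(query_raw(SALES, "east"), query_warehouse("east"))
```

The trade-off is freshness. The pre-computed totals must be refreshed when the underlying databases change.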
22
5. Data Mining over Static Data
  • Association
  • Clustering
  • Classification

Input: Static Data
Output: Data Mining Results
23
5. Data Mining over Data Streams
  • Association
  • Clustering
  • Classification

Input: Unbounded Data, processed in real time
Output: Data Mining Results
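Because a stream is unbounded, a stream algorithm cannot store all the data and re-scan it the way the static setting can. One well-known example, chosen here for illustration and not taken from the slides, is the Misra-Gries summary for frequent items. It reads each item once and keeps at most k - 1 counters. Any item occurring more than n/k times in a stream of length n is guaranteed to remain. The kept counts are underestimates, each low by at most n/k.

```python
from typing import Dict, Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """One pass over the stream using at most k - 1 counters."""
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all counters and drop those at zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    # A generator stands in for data that arrives one item at a time.
    stream = (x for x in ["a", "b", "a", "c", "a", "d", "a", "b"])
    print(misra_gries(stream, k=3))
```

In this example, "a" occurs 4 times out of 8, which is more than 8/3, so it survives. Its kept count of 2 underestimates the true frequency, as the guarantee allows.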
25
6. Web Databases
Raymond Wong
26
How to rank the webpages?
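The slide leaves the ranking question open. One well-known answer is PageRank, sketched below with power iteration over a tiny hypothetical link graph `LINKS`. The damping factor 0.85 is the commonly quoted default, used here as an assumption, and every link target must also appear as a key of `LINKS`.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 100) -> Dict[str, float]:
    """Power iteration: each page passes damping * its rank evenly to the
    pages it links to; every page also gets (1 - damping) / n."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # Dangling page (no out-links): spread its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical web graph: page -> pages it links to.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

if __name__ == "__main__":
    for page, score in sorted(pagerank(LINKS).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

In this graph, C receives links from three pages and ends up with the highest rank. D has no incoming links and keeps only the baseline (1 - 0.85) / 4.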
28
7. Multi-criteria Decision Making
Suppose we want to look for a hotel which is
close to a beach.
There are 3 hotels: a, b and c.
Suppose we compare hotel a and hotel b
  • We know that hotel a is better than hotel b
    because
    • the price of hotel a is smaller
    • the distance of hotel a is smaller

We have two attributes (price and distance). Which
hotel should we select?
29
7. Multi-criteria Decision Making
Suppose we want to look for a hotel which is
close to a beach.
There are 3 hotels: a, b and c.
Suppose we compare hotel a and hotel c
  • We cannot determine that hotel a is better than
    hotel c (w.r.t. the two attributes).
  • We cannot determine that hotel c is better than
    hotel a (w.r.t. the two attributes).
  • This is because
    • the price of hotel a is smaller
    • the distance of hotel c is smaller

We have two attributes (price and distance). Which
hotel should we select?
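The comparisons on the last two slides follow the usual notion of dominance, with smaller values better for both price and distance. Hotel x dominates hotel y if x is no worse on every attribute and strictly better on at least one. The hotels that no other hotel dominates form what is often called the skyline. The prices and distances in `HOTELS` are hypothetical. They were chosen to match the slides: a beats b on both attributes, while a is cheaper and c is closer.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical (price, distance to beach in km); smaller is better for both.
HOTELS: Dict[str, Tuple[float, float]] = {
    "a": (300, 2.0),
    "b": (400, 3.0),
    "c": (500, 1.0),
}


def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    """x dominates y: no worse on every attribute, strictly better on one."""
    return all(xi <= yi for xi, yi in zip(x, y)) and any(xi < yi for xi, yi in zip(x, y))


def skyline(points: Dict[str, Tuple[float, float]]) -> list:
    """Names of the points that are not dominated by any other point."""
    return sorted(
        name for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    )


if __name__ == "__main__":
    print(dominates(HOTELS["a"], HOTELS["b"]))  # a is better than b
    print(dominates(HOTELS["a"], HOTELS["c"]), dominates(HOTELS["c"], HOTELS["a"]))
    print(skyline(HOTELS))
```

Hotel b drops out because a dominates it. Hotels a and c both remain because neither dominates the other. Choosing between them requires further preference information, which is what the slide's final question is asking about.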