Data Intelligence and Mining

1
Data Intelligence and Mining
  • CSCI 317A

2
Announcements
  • Teams
  • Topics
  • Make sure you have a specific time every cycle
    that you can meet in
  • Quiz 1 on Thursday
  • First 20 minutes or so
  • HW 1 on Monday (Individual)
  • Clearly typed and organized report
  • Summer Research Opportunity
  • http://www.bsi.umn.edu/
  • June 4 through August 10, 2007
  • Stipend of $5,500
  • Junior or senior undergraduate student
  • Deadline: February 28 (2 recommendation letters)

3
Recap
  • What is classification?
  • Purpose?
  • How is it done?
  • Application
  • What is outlier/deviation/anomaly detection?
  • Application

4
Association Rule Mining
  • Given a set of records each containing a number
    of items
  • Different types of data
  • Produce dependency rules which will predict
    occurrence of an item based on occurrences of
    other items
  • MBR
  • Supermarket shelf management or Marketing and
    Sales Promotions
  • Wal-Mart!
  • Efficiency
  • 50 items
  • (250 1 Peta or a million billion)
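
The efficiency figure above can be checked with a little arithmetic. The sketch below assumes the slide is counting every non-empty subset of the 50 items as a candidate itemset; the function name count_candidate_itemsets is made up for illustration.

```python
from math import comb

def count_candidate_itemsets(n_items):
    """Count the non-empty subsets of n_items distinct items."""
    # Sum the number of itemsets of each size k = 1..n_items.
    return sum(comb(n_items, k) for k in range(1, n_items + 1))

n = count_candidate_itemsets(50)
print(n)            # one less than 2**50
print(n / 10**15)   # roughly 1.13, i.e. about a million billion
```

A brute-force search that tests every candidate is therefore impractical even for a small store, which is why efficiency is listed as a concern.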

Rules discovered:
  Milk --> Coke
  Diaper, Milk --> Beer

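
One common way to score rules such as Milk --> Coke is with support and confidence. The slides do not name these measures, so treat the sketch below as an assumption about how the rules might be evaluated; the five transactions are invented for illustration.

```python
# Hypothetical market-basket records; each set is one customer's transaction.
transactions = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)

def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Among transactions with the antecedent, the fraction that also has the consequent."""
    n_antecedent = count(antecedent, transactions)
    if n_antecedent == 0:
        return 0.0  # the rule never applies in this data
    n_both = count(set(antecedent) | set(consequent), transactions)
    return n_both / n_antecedent

print(confidence({"Milk"}, {"Coke"}, transactions))
print(confidence({"Diaper", "Milk"}, {"Beer"}, transactions))
```

In this made-up data, 3 of the 4 transactions with Milk also contain Coke, and 2 of the 3 transactions with Diaper and Milk also contain Beer.
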
5
Association Rule Discovery Application 1
  • Marketing and Sales Promotion
  • Let the rule discovered be
  • Dip --> Tortilla Chips
  • Chips as consequent => Can be used to determine
    what should be done to boost their sales
  • Sales on Dip!
  • Could be brand-specific
  • Dip in the antecedent => Can be used to check
    which products would be affected if the store
    discontinues selling a certain Dip brand
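
The two uses above amount to filtering a rule list by consequent or by antecedent. A minimal sketch, assuming rules are stored as (antecedent, consequent) pairs; only the Dip and Tortilla Chips rule comes from the slide, and the other rules and both function names are hypothetical.

```python
# Each rule is an (antecedent, consequent) pair of frozensets.
# Only the first rule is from the slide; the rest are made up.
rules = [
    (frozenset({"Dip"}), frozenset({"Tortilla Chips"})),
    (frozenset({"Salsa"}), frozenset({"Tortilla Chips"})),
    (frozenset({"Dip"}), frozenset({"Soda"})),
    (frozenset({"Milk"}), frozenset({"Coke"})),
]

def promotion_candidates(rules, item):
    """Items in the antecedent of any rule whose consequent contains item.

    Promoting these items may boost sales of item.
    """
    found = set()
    for antecedent, consequent in rules:
        if item in consequent:
            found |= antecedent
    return sorted(found)

def affected_products(rules, item):
    """Items in the consequent of any rule whose antecedent contains item.

    These may lose sales if item is discontinued.
    """
    found = set()
    for antecedent, consequent in rules:
        if item in antecedent:
            found |= consequent
    return sorted(found)

print(promotion_candidates(rules, "Tortilla Chips"))
print(affected_products(rules, "Dip"))
```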

6
Association Rule Discovery Application 2
  • Supermarket shelf management
  • Goal: To identify items that are bought together
    by sufficiently many customers.
  • Approach: Process the point-of-sale data
    collected with barcode scanners to find
    dependencies among items
  • Dip --> Tortilla Chips
  • Expensive brands
  • Splenda vs. Sweet'N Low
  • WebPage X --> WebPage Y
  • Web Structure Mining
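
The goal of finding items bought together by sufficiently many customers can be sketched as pair counting with a minimum-count threshold. The baskets, the threshold, and the function name frequent_pairs are all assumptions for illustration; real point-of-sale data would be far larger and would need a more efficient algorithm.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(baskets, min_count):
    """Return item pairs that occur together in at least min_count baskets."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so each pair has one canonical form.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: c for pair, c in counts.items() if c >= min_count}

# Hypothetical scanner data: one list of items per checkout.
baskets = [
    ["Dip", "Tortilla Chips", "Soda"],
    ["Dip", "Tortilla Chips"],
    ["Soda", "Bread"],
    ["Tortilla Chips", "Dip", "Bread"],
]

print(frequent_pairs(baskets, min_count=2))
```

With this data only the Dip and Tortilla Chips pair reaches the threshold, which would suggest shelving the two items with each other in mind.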

7
Association Rule Discovery
  • Among genes: G1 --> G2
  • G1's expression depends upon G2's
  • Might be involved in the same biological systems
    or pathways
  • Finer clusters with directionality
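
The directionality point can be illustrated with rule confidence, which is generally different for G1 --> G2 and G2 --> G1. The slides do not say how gene rules are scored, so this is only a sketch: it assumes each sample has already been reduced to the set of genes called "expressed", and the samples are invented.

```python
# Hypothetical samples: the set of genes called "expressed" in each one.
samples = [
    {"G1", "G2"},
    {"G1", "G2"},
    {"G2"},
    {"G2"},
    set(),
]

def rule_confidence(antecedent, consequent, samples):
    """Among samples expressing antecedent, the fraction also expressing consequent."""
    with_antecedent = [s for s in samples if antecedent in s]
    if not with_antecedent:
        return 0.0  # the antecedent gene is never expressed
    hits = sum(1 for s in with_antecedent if consequent in s)
    return hits / len(with_antecedent)

print(rule_confidence("G1", "G2", samples))  # every G1 sample also has G2
print(rule_confidence("G2", "G1", samples))  # only half of the G2 samples have G1
```

A symmetric similarity measure, as used in plain clustering, would treat the two genes alike; the asymmetric confidences are what give the rule a direction.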

8
Conferences and Journals on Data Mining
  • KDD Conferences
  • ACM SIGKDD Int. Conf. on Knowledge Discovery in
    Databases and Data Mining (KDD)
  • SIAM Data Mining Conf. (SDM)
  • (IEEE) Int. Conf. on Data Mining (ICDM)
  • Conf. on Principles and Practice of Knowledge
    Discovery in Databases (PKDD)
  • Pacific-Asia Conf. on Knowledge Discovery and
    Data Mining (PAKDD)
  • ACM Symposium on Applied Computing (SAC): Bio
    and DM tracks
  • Other related conferences
  • ACM SIGMOD
  • VLDB
  • (IEEE) ICDE
  • WWW, SIGIR
  • ICML, CVPR, NIPS
  • Journals
  • Data Mining and Knowledge Discovery (DAMI or
    DMKD)
  • IEEE Trans. on Knowledge and Data Eng. (TKDE)
  • KDD Explorations
  • ACM Trans. on KDD

9
Recommended Reference Books
  • S. Chakrabarti. Mining the Web: Statistical
    Analysis of Hypertext and Semi-Structured Data.
    Morgan Kaufmann, 2002
  • R. O. Duda, P. E. Hart, and D. G. Stork, Pattern
    Classification, 2nd ed., Wiley-Interscience, 2000
  • T. Dasu and T. Johnson. Exploratory Data Mining
    and Data Cleaning. John Wiley & Sons, 2003
  • U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and
    R. Uthurusamy. Advances in Knowledge Discovery
    and Data Mining. AAAI/MIT Press, 1996
  • U. Fayyad, G. Grinstein, and A. Wierse,
    Information Visualization in Data Mining and
    Knowledge Discovery, Morgan Kaufmann, 2001
  • J. Han and M. Kamber. Data Mining: Concepts and
    Techniques. Morgan Kaufmann, 2nd ed., 2006
  • D. J. Hand, H. Mannila, and P. Smyth, Principles
    of Data Mining, MIT Press, 2001
  • T. Hastie, R. Tibshirani, and J. Friedman, The
    Elements of Statistical Learning: Data Mining,
    Inference, and Prediction, Springer-Verlag, 2001
  • T. M. Mitchell, Machine Learning, McGraw Hill,
    1997
  • G. Piatetsky-Shapiro and W. J. Frawley. Knowledge
    Discovery in Databases. AAAI/MIT Press, 1991
  • P.-N. Tan, M. Steinbach and V. Kumar,
    Introduction to Data Mining, Wiley, 2005
  • S. M. Weiss and N. Indurkhya, Predictive Data
    Mining, Morgan Kaufmann, 1998
  • I. H. Witten and E. Frank, Data Mining:
    Practical Machine Learning Tools and Techniques
    with Java Implementations, Morgan Kaufmann, 2nd
    ed., 2005

10
Additional References
  • Barnett, V., and Lewis, T. (1994), Outliers in
    Statistical Data, Wiley, New York.
  • Afifi, A.A., and Azen, S.P. (1972), Statistical
    Analysis: A Computer Oriented Approach, Academic
    Press, New York.
  • Huber, P.J. (1985), Projection pursuit, The
    Annals of Statistics, 13(2), 435-475.
  • Marchette, D.J., and Solka, J.L. (2003), Using
    data images for outlier detection, Computational
    Statistics & Data Analysis, 43(4), 541-552.
  • Jolliffe, I.T. (1986), Principal Component
    Analysis, Springer-Verlag, New York.
  • Rousseeuw, P.J., and Leroy, A.M. (2003), Robust
    Regression and Outlier Detection, Wiley Series in
    Probability and Statistics, Wiley-Interscience.
  • J. Han and M. Kamber. Data Mining: Concepts and
    Techniques. Morgan Kaufmann, 2nd ed., 2006
  • P.-N. Tan, M. Steinbach and V. Kumar, Introduction
    to Data Mining, Wiley, 2005