CS 590M Fall 2001: Security Issues in Data Mining - PowerPoint PPT Presentation

About This Presentation
Title:

CS 590M Fall 2001: Security Issues in Data Mining

Description:

Auto. Hardware. I. T. Types of associations. Machine-learning base: classification ... Known items to protect. What if we don't know what we want to protect? ... – PowerPoint PPT presentation

Number of Views:35
Avg rating:3.0/5.0
Slides: 15
Provided by: clif8
Category:

less

Transcript and Presenter's Notes

Title: CS 590M Fall 2001: Security Issues in Data Mining


1
CS 590M Fall 2001 Security Issues in Data Mining
  • Lecture 5 Association Rules,
  • Sequential Associations

2
Why Association Rules?
  • Understand attributes, not entities
  • Discover relationships that
  • Show some dependency between attributes
  • Are interesting
  • Give an understanding of the data space

3
Formal Definition
  • Data
  • Itemsi1,,in
  • Transactions Tt1,,tm where ti ij1, , ijk
  • Support Given A?I, supp(A) t ? T t ? A
    / T
  • Goal Find rules A?B with support s and
    confidence c where
  • A, B ? I, A ? B ?
  • s supp(A ? B), c supp(A ? B) / supp(A)

4
Sample Market Basket
5
Types of associations
  • Machine-learning base classification / decision
    rules
  • Entities independent, unordered
  • Find rules leading to target class
  • To get rule sets, re-run for all classes as
    targets
  • Market-basket
  • Collection of related entities with same key
  • Can be modeled as independent entities, sparse
    data
  • Sequential
  • Like market basket, but group by distance rather
    than same key

6
Historical Association Rule Learning
  • Decision tree converted to rules
  • ID3, as discussed in previous lecture
  • Direct production of decision rules
  • CN2, others
  • Problem Algorithms dont scale well to many
    practical problems

7
Database community contribution Market Basket
Association Rules
  • Rakesh Agrawal, Tomasz Imielinski, and Arun
    Swami. Mining association rules between sets of
    items in large databases. In Proc. of the ACM
    SIGMOD Conference on Management of Data, pages
    207--216, Washington, D.C., May 1993.
  • Rakesh Agrawal, Ramakrishnan Srikant Fast
    Algorithms for Mining Association Rules in Large
    Databases. VLDB 1994 487-499

8
Database community contribution Market Basket
Association Rules
  • Practical problems often have sparse data
  • Many attributes, few items per transaction
  • Goal is typically search for high support
  • High support broad impact
  • High confidence not crucial (as opposed to
    classification)
  • Very Large data sets(main-memory algorithms
    impractical)

9
A-Priori Algorithm
  • Observation if A has support s, then
  • ?i ? A, supp(i) s
  • Gives bottom-up algorithm
  • Find single items with support s
  • Just look at transaction subsets with those items
    for pairs
  • Recurse

10
A-Priori Algorithm
  • First, generate all large itemsets
  • Sets X ? I such that supp(X) s (threshold)
  • Captures supp(A ? B) s part of problem
  • Second, find high-confidence rules that are
    subsets of X
  • B Xi , A X-B
  • To find confidence, need supp(A)But A will be in
    all large itemsets dont need to go back to the
    database!

11
A-Priori Algorithm
  • L1 large 1-itemsets
  • for ( k 2 Lk-1 ? ? k )
  • Ck select p.i1, p.iY, , p.ik-1, q.ik-1 from
    Lk-1 p, Lk-1 q where p.i1 q.i1, , p.ik-2
    q.ik-2
  • ? transactions t ? T
  • Ct subset(Ck, t) // Candidates contained in
    t
  • ? candidates c ? Ct c.count
  • Lk c ? Ck c.count ? minsup
  • Answer ?k Lk

12
Frequent episodes for sequential associations
  • Heikki Mannila, Hannu Toivonen, and A. Inkeri
    Verkamo Discovering Frequent Episodes in
    Sequences. In First International Conference on
    Knowledge Discovery and Data Mining (KDD'95), 210
    - 215, Montreal, Canada, August 1995. AAAI Press.
  • Instead of transaction, items grouped by sliding
    window in time
  • Same basic idea as A-Priori

13
Frequent Episodes Definition
  • Event types E
  • Event (A,t) where A in E
  • Sequence S((A1,t1),,(An,tn))
  • Frequent episode F (Ai, , Aj) where
  • ? tl, tm such that t1?tlltlttm ? tn tm-tl ?
    window
  • count( ((Ai,tl), , (Aj, tm)) ) ? support

14
Applications/Issues in Security
  • Frequent episodes in intrusion detection data
  • What does this tell us?
  • Preventing the discovery of associations
  • Known items to protect
  • What if we dont know what we want to protect?
Write a Comment
User Comments (0)
About PowerShow.com