AI Week 15 Machine Learning: Data Mining : Association Rule Mining, Associative Classification, Applications - PowerPoint PPT Presentation

1
AI Week 15
Machine Learning: Data Mining
Association Rule Mining, Associative Classification, Applications
  • Lee McCluskey, room 3/10
  • Email lee_at_hud.ac.uk
  • http://scom.hud.ac.uk/scomtlm/cha2555/

2
Last Week
Data Mining -- as inducing rule classifiers from
classified training examples.
3
Association Rule Mining(ARM)
  • This is an unsupervised learning activity -
    briefly, looking for strong associations between
    features in data.
  • Definitions: a transactional database is a set of
    transactions, e.g. the details of individual
    sales.
  • A transaction can be thought of as an item-set,
    where each item is an attribute-value pair, e.g.
  • {height = 6, temp = 20, weather = warm}
  • As a special case we could have nominal item-sets, e.g.
  • {bread, cheese, milk}
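Both kinds of transaction can be handled by the same code if every item is a plain string. A minimal sketch, assuming (for illustration only) that an attribute-value pair is encoded as the string "attribute=value"; the helper name `to_itemset` is hypothetical.

```python
def to_itemset(record: dict) -> frozenset:
    """Turn an attribute-value record into an item-set of 'attr=value' items.

    The 'attr=value' string encoding is an assumption made for this sketch.
    """
    return frozenset(f"{attr}={value}" for attr, value in record.items())

# Attribute-value transaction from the slide
t1 = to_itemset({"height": 6, "temp": 20, "weather": "warm"})

# Nominal transaction: each item is just a name
t2 = frozenset({"bread", "cheese", "milk"})

print(sorted(t1))  # ['height=6', 'temp=20', 'weather=warm']
print(sorted(t2))  # ['bread', 'cheese', 'milk']
```

Using `frozenset` makes "transaction T contains item-set X" a simple subset test, `X <= T`.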

4
Association Rule Mining (ARM): Important
Definitions
  • An association rule is an expression
  • X ⇒ Y
  • where X and Y are item-sets and X ∩ Y = ∅ (they
    share no items).
  • The support of an association rule is defined as
    the proportion of transactions in the database
    that contain X ∪ Y:
  • supp(X ⇒ Y) = (no. of transactions containing X ∪ Y) /
    (total no. of transactions)
  • The confidence of an association rule is defined
    as the probability that a transaction contains Y
    given that it contains X, that is
  • conf(X ⇒ Y) = (no. of transactions containing X ∪ Y) /
    (no. of transactions containing X)
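The two definitions translate directly into code. A minimal sketch, assuming transactions are stored as frozensets of items; the function names and the four-transaction database are made up for illustration.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]

def support_count(itemset: Itemset, db: List[Itemset]) -> int:
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in db if itemset <= t)

def support(x: Itemset, y: Itemset, db: List[Itemset]) -> float:
    """Support of X => Y: proportion of transactions containing X U Y."""
    return support_count(x | y, db) / len(db)

def confidence(x: Itemset, y: Itemset, db: List[Itemset]) -> float:
    """Confidence of X => Y: count(X U Y) / count(X); 0.0 if X never occurs."""
    n_x = support_count(x, db)
    return support_count(x | y, db) / n_x if n_x else 0.0

# Hypothetical shopping-basket database
db = [frozenset(t) for t in (
    {"bread", "cheese", "milk"},
    {"bread", "milk"},
    {"cheese", "wine"},
    {"bread", "cheese"},
)]
x, y = frozenset({"bread"}), frozenset({"milk"})
print(support(x, y, db))     # 0.5
print(confidence(x, y, db))  # 0.6666666666666666
```

Note that confidence is always at least as large as support, because count(X) can never exceed the total number of transactions.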

5
Aims of ARM
  • Given a transactional database D, the association
    rule problem is to find all rules that have
    supports and confidences greater than certain
    user-specified thresholds, denoted by minimum
    support (MinSupp) and minimum confidence
    (MinConf), respectively.
  • The aim is the discovery of the most significant
    associations between the items in a transactional
    data set. This process primarily involves the
    discovery of so-called frequent item-sets, i.e.
    item-sets whose support in the transactional data
    set is above MinSupp; rules are then generated
    from the frequent item-sets and kept only if their
    confidence is above MinConf.
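The two-stage process (find the frequent item-sets, then build rules from them and filter by confidence) can be sketched as below. The slides do not prescribe an algorithm; this is an Apriori-style level-wise search, one common approach, which relies on the fact that every subset of a frequent item-set is itself frequent. The thresholds are treated as inclusive (>=), and the function names and data are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(db, min_supp):
    """Return {itemset: support count} for every item-set with
    support >= min_supp (a proportion), found level by level."""
    n = len(db)
    level = {frozenset([i]) for t in db for i in t}  # candidate 1-item-sets
    result, k = {}, 1
    while level:
        # Count every size-k candidate with one pass over the database
        counts = {c: sum(1 for t in db if c <= t) for c in level}
        frequent = {c: cnt for c, cnt in counts.items() if cnt / n >= min_supp}
        result.update(frequent)
        # Size-(k+1) candidates: unions of two frequent size-k sets, pruned
        # unless every size-k subset is frequent (Apriori property)
        k += 1
        unions = {a | b for a in frequent for b in frequent if len(a | b) == k}
        level = {c for c in unions
                 if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
    return result

def association_rules(db, min_supp, min_conf):
    """Return (X, Y, support, confidence) for every rule X => Y that meets
    both thresholds; rules are built only from frequent item-sets."""
    n = len(db)
    counts = frequent_itemsets(db, min_supp)
    rules = []
    for itemset, cnt in counts.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                x = frozenset(lhs)
                conf = cnt / counts[x]  # X is frequent because its superset is
                if conf >= min_conf:
                    rules.append((x, itemset - x, cnt / n, conf))
    return rules

# Hypothetical database
db = [frozenset(t) for t in (
    {"bread", "cheese", "milk"},
    {"bread", "milk"},
    {"cheese", "wine"},
    {"bread", "cheese"},
)]
for x, y, supp, conf in association_rules(db, min_supp=0.5, min_conf=0.6):
    print(sorted(x), "=>", sorted(y), f"supp={supp:.2f} conf={conf:.2f}")
```

With these thresholds the sketch finds four rules (bread => cheese, cheese => bread, bread => milk, milk => bread); the order in which they print may vary between runs because Python set iteration order is not fixed.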

6
Example
  • A trader deals in the following currencies in a
    series of 8 transactions
  • 1 Sterling Yen Dollar Euro
  • 2 Dollar Euro Rand Sterling Ruble
  • 3 Pesos Euro Ruble Rupee Yen
  • 4 Rupee Sterling Ruble Euro Dollar
  • 5 Sterling Dinars Rand Yen
  • 6 Pesos Kroner Sterling Dollar
  • 7 Ruble Rupee Kroner Sterling Pesos
  • 8 Dollar Euro Sterling
  • What are the SUPPORT and CONFIDENCE of the
    following rules?
  • Ruble ⇒ Rupee
  • Sterling, Euro ⇒ Ruble
  • Sterling, Euro ⇒ Ruble, Pesos
  • Find an association rule from the set of
    transactions that has
  • - at least 2 items in its antecedent,
  • - better support and better confidence than the
    rules above.
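The exercise can be checked mechanically. A minimal sketch, assuming "better" means strictly greater than every rule in the question, and (for brevity) searching only rules with a 2-item antecedent and a 1-item consequent; the helper names are hypothetical.

```python
from itertools import combinations

# The eight transactions from the slide
transactions = [
    {"Sterling", "Yen", "Dollar", "Euro"},
    {"Dollar", "Euro", "Rand", "Sterling", "Ruble"},
    {"Pesos", "Euro", "Ruble", "Rupee", "Yen"},
    {"Rupee", "Sterling", "Ruble", "Euro", "Dollar"},
    {"Sterling", "Dinars", "Rand", "Yen"},
    {"Pesos", "Kroner", "Sterling", "Dollar"},
    {"Ruble", "Rupee", "Kroner", "Sterling", "Pesos"},
    {"Dollar", "Euro", "Sterling"},
]

def count(items):
    """Number of transactions containing every item in `items`."""
    return sum(1 for t in transactions if set(items) <= t)

def supp_conf(x, y):
    """(support, confidence) of the rule X => Y."""
    n_xy, n_x = count(set(x) | set(y)), count(x)
    return n_xy / len(transactions), (n_xy / n_x if n_x else 0.0)

questions = [({"Ruble"}, {"Rupee"}),
             ({"Sterling", "Euro"}, {"Ruble"}),
             ({"Sterling", "Euro"}, {"Ruble", "Pesos"})]
for x, y in questions:
    print(sorted(x), "=>", sorted(y), supp_conf(x, y))
# ['Ruble'] => ['Rupee'] (0.375, 0.75)
# ['Euro', 'Sterling'] => ['Ruble'] (0.25, 0.5)
# ['Euro', 'Sterling'] => ['Pesos', 'Ruble'] (0.0, 0.0)

# Exhaustive search over 2-item antecedents and 1-item consequents
best_supp = max(supp_conf(x, y)[0] for x, y in questions)
best_conf = max(supp_conf(x, y)[1] for x, y in questions)
items = sorted(set().union(*transactions))
better = []
for x in combinations(items, 2):
    for y in items:
        if y not in x:
            s, c = supp_conf(x, {y})
            if s > best_supp and c > best_conf:
                better.append((x, y, s, c))
print(better)
# [(('Dollar', 'Euro'), 'Sterling', 0.5, 1.0),
#  (('Dollar', 'Sterling'), 'Euro', 0.5, 0.8),
#  (('Euro', 'Sterling'), 'Dollar', 0.5, 1.0)]
```

All three answers come from the item-set {Sterling, Euro, Dollar}, which occurs in 4 of the 8 transactions.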

7
Example
[Figure: the eight transactions arranged for the rule X ⇒ Y, with X = Ruble and Y = Rupee; regions mark the transactions containing X and those containing X ∪ Y.]
8
Associative Classification
  • If we fuse ARM and classification rule mining we
    get Associative Classification: we use the
    association technique, but learn about
    particular items or item-sets.
  • Associative Classification is a branch of data
    mining that combines classification and
    association rule mining. In other words, it
    utilises association rule discovery methods on
    classification data sets.
  • Typically:
  • Find association rules using ARM.
  • Sift out the Class Association Rules (CARs): the
    ones that have the class of interest on their
    right-hand side.
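The "sift out" step, and one simple way to use the surviving rules as a classifier, can be sketched as follows. Assumptions, all hypothetical: a rule is a tuple (antecedent, consequent, support, confidence); items are "attribute=value" strings; and CARs are ranked by confidence, then support, with the first matching rule making the prediction. That ranking is a common choice, not something the slides specify.

```python
def class_association_rules(rules, class_attr):
    """Sift out the CARs: rules whose right-hand side is one class item."""
    prefix = class_attr + "="
    return [r for r in rules
            if len(r[1]) == 1 and next(iter(r[1])).startswith(prefix)]

def classify(cars, case, default=None):
    """Predict with the highest-ranked CAR whose antecedent the case satisfies.
    Ranking by confidence, then support, is an assumption for this sketch."""
    for x, y, _supp, _conf in sorted(cars, key=lambda r: (r[3], r[2]), reverse=True):
        if x <= case:
            return next(iter(y))
    return default

# Made-up rules, as if produced by an ARM run over a weather data set
rules = [
    (frozenset({"outlook=sunny"}), frozenset({"play=no"}), 0.30, 0.80),
    (frozenset({"outlook=sunny", "wind=weak"}), frozenset({"play=yes"}), 0.20, 0.90),
    (frozenset({"wind=weak"}), frozenset({"outlook=sunny"}), 0.20, 0.50),
    (frozenset({"outlook=rain"}), frozenset({"play=yes"}), 0.25, 0.60),
]
cars = class_association_rules(rules, "play")
print(len(cars))                                                   # 3
print(classify(cars, frozenset({"outlook=sunny", "wind=weak"})))    # play=yes
print(classify(cars, frozenset({"outlook=sunny", "wind=strong"})))  # play=no
```

The third rule is discarded because its right-hand side (`outlook=sunny`) is not the class attribute.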

9
Validation in Rule Discovery
  • Multi-stage Data Mining pipelines are fraught
    with various kinds of errors and bias.
  • The integrity of the data at each stage of the DM
    process and the reliability of the results are
    particularly important.
  • DM usually uses cross validation, where the
    data is split into a training set and a testing
    set, and the results of the data miner applied to
    the training set are checked against the testing
    set. This is not really applicable to rule
    discovery.
  • Key idea: look among the trends/associations
    output by the process for ones that represent
    known associations in the application domain.
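The key idea can be turned into a simple sanity check on a mining run. A minimal sketch, assuming mined rules are (X, Y, support, confidence) tuples and known domain associations are (X, Y) pairs; the function name and the placeholder items A to F are hypothetical, and exact matching of rules is a simplification.

```python
def rediscovered(mined_rules, known):
    """Check which known domain associations (X, Y) appear among the mined
    rules X => Y. Returns (fraction rediscovered, list of missed ones)."""
    mined = {(x, y) for x, y, *_ in mined_rules}
    missed = [k for k in known if k not in mined]
    fraction = 1 - len(missed) / len(known) if known else 1.0
    return fraction, missed

# Placeholder items: A => B is "known" in the domain and was mined; E => F was not
mined = [(frozenset({"A"}), frozenset({"B"}), 0.4, 0.9),
         (frozenset({"C"}), frozenset({"D"}), 0.3, 0.7)]
known = [(frozenset({"A"}), frozenset({"B"})),
         (frozenset({"E"}), frozenset({"F"}))]
fraction, missed = rediscovered(mined, known)
print(fraction)  # 0.5
```

A low fraction suggests a problem somewhere in the pipeline (cleaning, discretisation, thresholds) before any new associations are trusted.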

10
DM Application 1: Discovering trends from patient
data in the area of Diabetic Retinopathy
  • Diabetic Retinopathy: basically, damage to the
    eyes caused by diabetes, sometimes leading to
    blindness.
  • A HUGE problem, as diabetes is on the increase. If
    you are a long-term diabetic then you are very
    likely to suffer some retinal damage.
  • Clinics keep large amounts of data on patients
    who are treated in various ways, over long
    periods of time.

11
Diabetic Retinopathy Application
  • Data of 20,000 patients over 18 years
  • Much data cleaning and inference precedes mining:
    replacing missing values, handling noise and
    anomalies, etc.
  • Focus on a smaller number of patients with a
    yearly screening (the timestamp) over a period of
    4 years.
  • Attribute examples (there are several hundred):
  • Age_at_Exam,
  • Present_Treatment,
  • calculated_age_at_diagnosis,
  • Retinopathy_in_R_Eye (RE_RET),
  • Retinopathy_in_L_Eye (RE_RET),
  • calculated_diabetes_type,
  • calculated_diabetes_duration

12
Trend Mining
  • Item-sets that have an increasing support over a
    series of time-stamped instances (events) are
    called emerging patterns
  • The changing support for sets of items during
    each event can indicate trends in the data. For
    example, the presence of a particular treatment
    over a period of time may lead to the alleviation
    of a symptom.
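Checking whether an item-set is an emerging pattern amounts to computing its support separately for each time-stamped event and comparing. A simplified sketch, assuming each epoch (e.g. one yearly screening) is a list of frozenset transactions and that "increasing" means strictly rising from one epoch to the next; the function names and the data are made up.

```python
def support_by_epoch(itemset, epochs):
    """Support of `itemset` in each time-stamped epoch (a list of transactions)."""
    return [sum(1 for t in db if itemset <= t) / len(db) for db in epochs]

def is_increasing(itemset, epochs):
    """True if the support rises strictly from each epoch to the next."""
    s = support_by_epoch(itemset, epochs)
    return all(a < b for a, b in zip(s, s[1:]))

# Made-up yearly screenings: items are "drugX" and "low_RET"
both = frozenset({"drugX", "low_RET"})
drug, low, empty = frozenset({"drugX"}), frozenset({"low_RET"}), frozenset()
epochs = [
    [drug, both, low, empty],  # year 1
    [both, both, drug, empty], # year 2
    [both, both, both, drug],  # year 3
]
pattern = frozenset({"drugX", "low_RET"})
print(support_by_epoch(pattern, epochs))  # [0.25, 0.5, 0.75]
print(is_increasing(pattern, epochs))      # True
```

By contrast, `{"low_RET"}` alone has support 0.5, 0.5, 0.75 in this data, so it is not flagged as a strictly increasing trend.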

13
Diabetic Retinopathy Application
  • Aim: to find trends in the data, e.g. (fictitious
    example)
  • calculated_diabetes_duration > Y,
  • Age_at_Exam in [60,70],
  • Present_Treatment = drugX,
  • calculated_age_at_diagnosis in [50,60] ⇒
  • Retinopathy_in_R_Eye (RE_RET) = low,
  • Retinopathy_in_L_Eye (RE_RET) = low
  • Increasing trend ...
  • i.e. people who have had diabetes for a certain
    length of time, whose age is in their 60s, who
    were diagnosed in their 50s, and who have been
    taking drugX often have low DR levels.
  • Increasing trend adds support for the
    association.
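Before ARM can find a rule like this, numeric attributes such as Age_at_Exam have to be turned into range items. A minimal sketch, assuming decade-wide, half-open ranges (so that adjacent ranges do not overlap) and "attr=value" items for non-numeric attributes; the helper names and the patient record are hypothetical.

```python
def range_item(attr, value, width=10):
    """Map a number to a half-open range item, e.g. 'Age_at_Exam in [60,70)'."""
    lo = int(value // width) * width
    return f"{attr} in [{lo},{lo + width})"

def record_to_items(record, numeric):
    """Encode one screening record as an item-set for ARM.

    Numeric attributes become range items; others become 'attr=value' items.
    Missing values (None) are simply skipped in this sketch; in the slides,
    missing values are dealt with during data cleaning, before mining.
    """
    return frozenset(
        range_item(attr, value) if attr in numeric else f"{attr}={value}"
        for attr, value in record.items()
        if value is not None
    )

# Hypothetical patient record
record = {"Age_at_Exam": 64, "calculated_age_at_diagnosis": 55,
          "Present_Treatment": "drugX", "Retinopathy_in_R_Eye": None}
items = record_to_items(record, numeric={"Age_at_Exam", "calculated_age_at_diagnosis"})
print(sorted(items))
# ['Age_at_Exam in [60,70)', 'Present_Treatment=drugX', 'calculated_age_at_diagnosis in [50,60)']
```

The choice of bin width matters: bins that are too narrow spread the support thinly, while bins that are too wide can hide the trend.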

14
DM Application 2: Road Traffic Control
15
Example in Road Traffic Control
16
Example in Road Traffic Control
  • Data:
  • Numeric data records from individual CARS
  • (date, time, position, actual speed, expected
    speed)
  • Textual Data of INCIDENTS
  • (date, time start, time cleared, position,
    severity, road type, area, incident category,
    cause, road-effect, traffic-effect, reporter ..)
  • Data sources:
  • ANPR, Mobile Phones, Road (Vehicle) Sensors,
    Environment Sensors

17
Applications in Road Traffic Control
  • associations between variations in speeds and
    near-future incidents
  • effect of a particular type of incident (e.g.
    roadworks) on average speeds on nearby trunk
    roads
  • looking for predictors of "heavy/slow traffic"
    incidents: look for associations with speed
    variations or accidents on roads downstream of
    the incident position (which could cause the
    incident)
  • looking for associations between speeds around a
    bypass and a later "heavy traffic" incident
    within the town bypassed
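The first application, associating speed variations with near-future incidents, can be framed as ARM by turning each time window into a transaction. A rough sketch under several assumptions, all hypothetical: one road section, readings as (minute, actual_speed, expected_speed) tuples, a "speed_drop" item when actual speed falls below 70% of expected, and an "incident_soon" item when an incident starts within 20 minutes after the window ends.

```python
def window_transactions(readings, incidents, window=15, horizon=20, drop_ratio=0.7):
    """One item-set per time window for a single road section.

    readings:  (minute, actual_speed, expected_speed) tuples
    incidents: start minutes of incidents on the same section
    Windows with no readings are not represented.
    """
    windows = {}
    for minute, actual, expected in readings:
        items = windows.setdefault(minute // window, set())
        if actual < drop_ratio * expected:
            items.add("speed_drop")
    for w, items in windows.items():
        end = (w + 1) * window
        if any(end <= t < end + horizon for t in incidents):
            items.add("incident_soon")
    return [frozenset(items) for _, items in sorted(windows.items())]

def rule_stats(transactions, x, y):
    """(support, confidence) of X => Y over the given transactions."""
    n_x = sum(1 for t in transactions if x <= t)
    n_xy = sum(1 for t in transactions if (x | y) <= t)
    return n_xy / len(transactions), (n_xy / n_x if n_x else 0.0)

# Made-up readings and incident start times (minutes)
readings = [(0, 68, 70), (20, 40, 70), (35, 65, 70),
            (50, 30, 70), (65, 69, 70), (80, 35, 70)]
incidents = [40, 70]
tx = window_transactions(readings, incidents)
s, c = rule_stats(tx, frozenset({"speed_drop"}), frozenset({"incident_soon"}))
print(round(s, 3), round(c, 3))  # 0.333 0.667
```

Window length, horizon and drop ratio are all tuning choices that would need domain input; the same framing extends to the other questions by adding items such as incident type or upstream/downstream position.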

18
Conclusions
  • Data Mining is a powerful set of techniques to
    help discover hidden knowledge
  • It can be supervised or unsupervised.
  • Association Rule Mining and Associative
    Classification are important classes of technique
    used in DM.