Title: AI Week 15 Machine Learning: Data Mining : Association Rule Mining, Associative Classification, Applications
1 AI Week 15
Machine Learning
Data Mining: Association Rule Mining, Associative
Classification, Applications
- Lee McCluskey, room 3/10
- Email lee_at_hud.ac.uk
- http://scom.hud.ac.uk/scomtlm/cha2555/
2 Last Week
Data Mining -- as inducing rule classifiers from
classified training examples.
3 Association Rule Mining (ARM)
- This is an unsupervised learning activity:
briefly, looking for strong associations between
features in data.
- Definitions: A transactional database is a set of
transactions, e.g. the details of individual
sales.
- A transaction can be thought of as an item-set
where each item is an attribute-value pair:
- height = 6, temp = 20, weather = warm
- As a special case we could have nominal item sets:
- bread, cheese, milk
4 Association Rule Mining (ARM): Important
Definitions
- An association rule is an expression
- X => Y
- where X, Y are item-sets, and X and Y have no
items in common.
- The support of an association rule is defined as
the proportion of transactions in the database
that contain X U Y (all the items of X and of Y).
- The confidence of an association rule is defined
as the probability that a transaction contains Y
given that it contains X, that is:
- no. of transactions containing (X U Y) / no. of
transactions containing X
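The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names and the four toy baskets are assumptions made up for the example.

```python
def support(itemset, transactions):
    """Proportion of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(x, y, transactions):
    """(no. of transactions containing X U Y) / (no. containing X)."""
    x = frozenset(x)
    xy = x | frozenset(y)
    n_x = sum(1 for t in transactions if x <= t)
    if n_x == 0:
        return 0.0  # assumption: a rule whose antecedent never occurs gets 0
    n_xy = sum(1 for t in transactions if xy <= t)
    return n_xy / n_x


# Toy nominal item-sets (invented for illustration only).
baskets = [
    frozenset({"bread", "cheese", "milk"}),
    frozenset({"bread", "milk"}),
    frozenset({"cheese"}),
    frozenset({"bread", "cheese"}),
]

# Rule: bread => milk
rule_support = support({"bread", "milk"}, baskets)          # 2 of 4 baskets
rule_confidence = confidence({"bread"}, {"milk"}, baskets)  # 2 of the 3 bread baskets
```

Note that "contains X U Y" is implemented as a subset test: the transaction must hold all the items of X and all the items of Y.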
5 Aims of ARM
- Given a transactional database D, the association
rule problem is to find all rules that have
supports and confidences greater than certain
user-specified thresholds, denoted by minimum
support (MinSupp) and minimum confidence
(MinConf), respectively.
- The aim is the discovery of the most significant
associations between the items in a transactional
data set. This process involves primarily the
discovery of so-called frequent item-sets, i.e.
item-sets whose support in the transactional data
set is above MinSupp; rules whose confidence is
above MinConf are then generated from them.
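The two-stage process (frequent item-sets first, then rules) can be sketched as a brute-force search. This is only an illustration under stated assumptions: it is not an efficient algorithm such as Apriori, the name `mine_rules` and the toy baskets are invented, the thresholds are treated as inclusive (`>=`), and `min_supp` is assumed to be greater than 0.

```python
from itertools import combinations


def mine_rules(transactions, min_supp, min_conf):
    """Return (X, Y, support, confidence) for every rule X => Y that
    meets both thresholds. Exhaustive search: small data sets only."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def supp(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Stage 1: collect frequent item-sets, one size at a time.
    frequent = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for combo in combinations(items, size):
            s = supp(frozenset(combo))
            if s >= min_supp:  # assumption: inclusive threshold
                frequent[frozenset(combo)] = s
                found_any = True
        if not found_any:
            break  # no frequent set of this size, so none of a larger size

    # Stage 2: split each frequent item-set into antecedent and consequent.
    rules = []
    for itemset, s in frequent.items():
        for k in range(1, len(itemset)):
            for x in combinations(sorted(itemset), k):
                x = frozenset(x)
                conf = s / frequent[x]  # subsets of a frequent set are frequent
                if conf >= min_conf:
                    rules.append((x, itemset - x, s, conf))
    return rules


# Toy baskets (invented for illustration only).
baskets = [{"bread", "milk"}, {"bread", "milk", "cheese"}, {"bread"}, {"cheese"}]
found = mine_rules(baskets, min_supp=0.5, min_conf=0.6)
```

On these baskets the only frequent item-set with more than one item is {bread, milk}, so the search yields the two rules bread => milk and milk => bread.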
6 Example
- A trader deals in the following currencies in a
series of 8 transactions:
- 1 Sterling Yen Dollar Euro
- 2 Dollar Euro Rand Sterling Ruble
- 3 Pesos Euro Ruble Rupee Yen
- 4 Rupee Sterling Ruble Euro Dollar
- 5 Sterling Dinars Rand Yen
- 6 Pesos Kroner Sterling Dollar
- 7 Ruble Rupee Kroner Sterling Pesos
- 8 Dollar Euro Sterling
- What is the SUPPORT and CONFIDENCE of the
following rules?
- Ruble => Rupee
- Sterling, Euro => Ruble
- Sterling, Euro => Ruble, Pesos
- Find an association rule from the set of
transactions that has
- at least 2 items in its antecedents,
- better support and better confidence than both
rules above.
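One way to check answers to this exercise is to count directly over the 8 transactions. The sketch below uses the transaction data from the slide; the helper names and the candidate rule in the last entry (Sterling, Euro => Dollar) are illustrative choices, not taken from the slides.

```python
# The 8 currency transactions from the example.
transactions = [
    {"Sterling", "Yen", "Dollar", "Euro"},
    {"Dollar", "Euro", "Rand", "Sterling", "Ruble"},
    {"Pesos", "Euro", "Ruble", "Rupee", "Yen"},
    {"Rupee", "Sterling", "Ruble", "Euro", "Dollar"},
    {"Sterling", "Dinars", "Rand", "Yen"},
    {"Pesos", "Kroner", "Sterling", "Dollar"},
    {"Ruble", "Rupee", "Kroner", "Sterling", "Pesos"},
    {"Dollar", "Euro", "Sterling"},
]


def count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


def support_and_confidence(x, y):
    n_x = count(x)
    n_xy = count(set(x) | set(y))
    supp = n_xy / len(transactions)
    conf = n_xy / n_x if n_x else 0.0
    return supp, conf


results = {
    "Ruble => Rupee": support_and_confidence({"Ruble"}, {"Rupee"}),
    "Sterling, Euro => Ruble": support_and_confidence({"Sterling", "Euro"}, {"Ruble"}),
    "Sterling, Euro => Ruble, Pesos": support_and_confidence(
        {"Sterling", "Euro"}, {"Ruble", "Pesos"}
    ),
    # One possible answer to the last part (an illustrative choice):
    "Sterling, Euro => Dollar": support_and_confidence({"Sterling", "Euro"}, {"Dollar"}),
}

for rule, (supp, conf) in results.items():
    print(f"{rule}: support={supp}, confidence={conf}")
# Ruble => Rupee: support=0.375, confidence=0.75
# Sterling, Euro => Ruble: support=0.25, confidence=0.5
# Sterling, Euro => Ruble, Pesos: support=0.0, confidence=0.0
# Sterling, Euro => Dollar: support=0.5, confidence=1.0
```

Ruble appears in 4 transactions and Ruble with Rupee in 3 of them, giving support 3/8 and confidence 3/4. Sterling and Euro appear together in 4 transactions, all of which also contain Dollar, so that candidate rule beats the listed rules on both measures.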
7 Example
[Diagram: the 8 transactions grouped for the rule
X => Y, Ruble => Rupee]
- Outside X (no Ruble):
- Sterling Yen Dollar Euro
- Sterling Dinars Rand Yen
- Pesos Kroner Sterling Dollar
- Dollar Euro Sterling
- In X (Ruble) but not in X U Y:
- Dollar Euro Rand Sterling Ruble
- In X U Y (Ruble and Rupee):
- Pesos Euro Ruble Rupee Yen
- Rupee Sterling Ruble Euro Dollar
- Ruble Rupee Kroner Sterling Pesos
8 Associative Classification
- If we fuse ARM and classification rule mining we
get Associative Classification: use the
association technique, but learning about
particular items or item sets.
- Associative Classification is a branch of data
mining that combines classification and
association rule mining. In other words, it
utilises association rule discovery methods on
classification data sets.
- Typically:
- Find Association Rules using ARM
- Sift out the Class Association Rules: the ones
that have the class of interest on their Right
Hand Sides
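The "sift out" step can be sketched as a filter over already-mined rules. Everything here is an assumption for illustration: the rule tuples `(X, Y, support, confidence)`, the item names, and the support and confidence figures are invented, and the `classify` helper (pick the matching rule with the highest confidence) is just one simple way of using the resulting rules, not a method stated in the slides.

```python
def class_association_rules(rules, class_items):
    """Keep only rules whose right-hand side is a single class item."""
    return [r for r in rules if len(r[1]) == 1 and r[1] <= class_items]


def classify(instance, cars, default=None):
    """Predict with the highest-confidence rule whose antecedent matches
    (ties broken by support). An illustrative strategy only."""
    matching = [r for r in cars if r[0] <= instance]
    if not matching:
        return default
    best = max(matching, key=lambda r: (r[3], r[2]))
    return next(iter(best[1]))


# Invented rules: (antecedent, consequent, support, confidence).
rules = [
    (frozenset({"weather=warm"}), frozenset({"class=play"}), 0.4, 0.9),
    (frozenset({"weather=warm"}), frozenset({"temp=20"}), 0.5, 0.8),
    (frozenset({"temp=20", "weather=warm"}), frozenset({"class=play"}), 0.3, 0.95),
    (frozenset({"class=play"}), frozenset({"weather=warm"}), 0.4, 0.7),
]
class_items = frozenset({"class=play", "class=stay"})

cars = class_association_rules(rules, class_items)
prediction = classify(frozenset({"weather=warm", "temp=20"}), cars)
```

Only the first and third rules survive the filter, since the other two have a non-class item on the right-hand side.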
9 Validation in Rule Discovery
- Multi-stage Data Mining pipelines are fraught
with various kinds of errors / bias: the
integrity of the data at each stage of the DM
process and the reliability of the results are
particularly important.
- DM usually uses cross validation, where the
data is split into a training set and a testing
set, and the results of the data miner applied to
the training set are compared to the testing set.
Not really applicable to rule discovery.
- Key idea: Look for trends/associations in the
data that are output from the process and that
represent known associations in the application
domain.
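The key idea can be sketched as a simple sanity check: see how many associations already known in the domain reappear among the mined rules. The function name and both rule lists below are invented for illustration; a real check would use associations supplied by domain experts.

```python
def recovered_known_rules(known_rules, discovered_rules):
    """Return the known (X, Y) associations that also appear in the output."""
    discovered = {(frozenset(x), frozenset(y)) for x, y in discovered_rules}
    return [
        (x, y)
        for x, y in known_rules
        if (frozenset(x), frozenset(y)) in discovered
    ]


# Invented examples of "known" and "discovered" associations.
known = [({"bread"}, {"butter"}), ({"tea"}, {"milk"})]
discovered = [({"bread"}, {"butter"}), ({"cheese"}, {"wine"})]

hits = recovered_known_rules(known, discovered)
recovery_rate = len(hits) / len(known)
```

A low recovery rate does not prove the pipeline is wrong, but it is a prompt to inspect the data cleaning and mining stages before trusting any new rules.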
10 DM Application 1: Discovering trends from patient
data in the area of Diabetic Retinopathy
- Diabetic Retinopathy: basically damage to the
eyes caused by diabetes, sometimes leading to
blindness.
- HUGE problem as diabetes is on the increase. If you
are a long-term diabetic then you are very
likely to suffer some retina damage.
- Clinics keep large amounts of data on patients
who are treated in various ways, over long
periods of time.
11 Diabetic Retinopathy Application
- Data of 20,000 patients over 18 years
- Much data cleaning and inference precedes mining:
replacing missing values, noise, anomalies etc.
- Focus is on a smaller number of patients with a
yearly screening (= timestamp) over a period of
4 years
- Attribute Examples (there are several hundred):
- Age_at_Exam,
- Present_Treatment,
- calculated_age_at_diagnosis,
- Retinopathy_in_R_Eye (RE_RET),
- Retinopathy_in_L_Eye (RE_RET),
- calculated_diabetes_type,
- calculated_diabetes_duration
12 Trend Mining
- Item-sets that have an increasing support over a
series of time-stamped instances (events) are
called emerging patterns.
- The changing support for sets of items during
each event can indicate trends in the data. For
example, the presence of a particular treatment
over a period of time may lead to the alleviation
of a symptom.
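A minimal sketch of this idea: compute the support of one item-set separately for each time-stamped event and test whether the series keeps rising. The item names and the three yearly record sets are invented, and "strictly increasing at every step" is an assumed reading of "increasing support"; a looser test might tolerate flat or noisy steps.

```python
def support(itemset, transactions):
    """Proportion of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def support_series(itemset, events):
    """Support of `itemset` in each time-stamped event, in time order."""
    return [support(itemset, records) for records in events]


def is_emerging(series):
    """Assumption: 'emerging' means support rises at every step."""
    return all(a < b for a, b in zip(series, series[1:]))


# Invented yearly screenings: each event is a list of patient records.
pattern = {"treatment=drugX", "retinopathy=low"}
year1 = [pattern, {"treatment=drugX"}, {"retinopathy=high"}, {"treatment=none"}]
year2 = [pattern, pattern, {"treatment=drugX"}, {"retinopathy=high"}]
year3 = [pattern, pattern, pattern, {"treatment=none"}]

series = support_series(pattern, [year1, year2, year3])
emerging = is_emerging(series)
```

Here the support of the pattern rises from 1 in 4 records to 3 in 4 records across the three events, so it would be flagged as an emerging pattern.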
13 Diabetic Retinopathy Application
- Aim: to find trends in the data, e.g. (fictitious
example)
- calculated_diabetes_duration > Y
- Age_at_Exam in [60,70]
- Present_Treatment = drugX
- calculated_age_at_diagnosis in [50,60] =>
- Retinopathy_in_R_Eye (RE_RET) = low
- Retinopathy_in_L_Eye (RE_RET) = low
- Increasing trend ..
- people who have had diabetes for a certain
length of time, whose age is in their 60s, who
were diagnosed in their 50s, who have been
taking treatmentX, often have low DR levels
- Increasing trend adds support for the
association.
14 DM Application 2: Road Traffic Control
15 Example in Road Traffic Control
16 Example in Road Traffic Control
- Data ..
- Numeric Data Record from individual CARS
- (date, time, position, actual speed, expected
speed)
- Textual Data of INCIDENTS
- (date, time start, time cleared, position,
severity, road type, area, incident category,
cause, road-effect, traffic-effect, reporter ..)
- Data Sources ..
- ANPR, Mobile Phones, Road (Vehicle) Sensors,
Environment Sensors
17 Applications in Road Traffic Control
- associations between variations in speeds and
near-future incidents
- effect of a particular type of incident (e.g.
roadworks) on average speeds on nearby trunk
roads
- looking for predictors of "heavy/slow traffic"
incidents: look for associations with speed
variations or accidents on roads downstream from
the incident position (hence causing the
incident)
- looking for associations between speeds around a
bypass and a later "heavy traffic" incident
within the town bypassed
18 Conclusions
- Data Mining is a powerful set of techniques to
help discover hidden knowledge
- It can be supervised or unsupervised.
- Association Rule Mining and Associative
Classification are important classes of technique
used in DM