Title: Data mining with Association rules in law
1- Data mining with Association rules in law
- Andrew Stranieri and John Zeleznikow
- Donald Berman Laboratory for Information Technology and Law,
  La Trobe University, Australia
- http://www.cs.latrobe.edu.au/research/dbc/dbc.html
- stranier_at_cs.latrobe.edu.au
2- Overview
- Data mining, or knowledge discovery from databases (KDD), has not
  been appropriately exploited in law to date.
- Association rules are useful in that they suggest hypotheses for
  future research.
- Association rules integrated into the generic/actual argument model
  can assist in identifying the most plausible claim from given data
  items (forward inference) or the likelihood of missing data values
  (backward inference).
3- What is data mining? What is knowledge discovery
from databases (KDD)?
- Frawley et al (1991) define knowledge discovery in databases (KDD)
  as 'the nontrivial extraction of implicit, previously unknown, and
  potentially useful information from data'.
- KDD encompasses a number of different technical approaches, such as
  clustering, data summarization, learning classification rules,
  finding dependency networks, analyzing changes, and detecting
  anomalies.
- KDD has emerged only recently because vast quantities of data have
  only recently been gathered.
- Good introductory texts: Thornton (1992); Weiss and Indurkhya (1998);
  Weiss and Kulikowski (1992).
4- Mangasarian et al (1997): breast cancer diagnosis. A sample from a
breast lump mass is assessed by
  - mammography (not sensitive, 68-79%)
  - data mining from FNA test results and visual inspection (65-98%)
  - surgery (100%, but invasive and expensive)
- Basket analysis: people who buy nappies also buy beer.
- NBA (National Basketball Association of America): player pattern
  profiles, Bhandari et al (1997).
- Credit card fraud detection.
- Stranieri and Zeleznikow (1997) predict family law property outcomes.
- Rissland and Friedman (1997) discover a change in the concept of good
  faith in US bankruptcy cases.
- Pannu (1995) discovers a prototypical case from a library of cases.
- Wilkins and Pillaipakkamnatt (1997) predict the time a case takes to
  be heard.
- Veliev et al (1999): association rules for economic analysis.
5- Overview of the process of knowledge discovery in
databases
6- Finding patterns in data or fitting models to data
- Categories of techniques
  - Predictive (classification, neural networks, rule induction,
    linear and multiple regression)
  - Segmentation (clustering, k-means, k-median)
  - Summarisation (associations, visualisation)
  - Change detection/modelling
7- Association rules are a data mining technique
- An association rule tells us something about the association between
  two attributes.
- Agrawal et al (1993) introduced association rule mining; Apriori, the
  best-known algorithm, followed in 1994.
- A famous (but unsubstantiated) association rule from a hypothetical
  supermarket transaction database is: if nappies then beer (80%).
  Read this as: when nappies are bought, beer is also bought 80% of
  the time.
- Association rules have only recently been applied to law, with
  promising results.
- Association rules can automatically discover rules that may prompt
  an analyst to think of hypotheses they would not otherwise have
  considered.
8- Confidence and support of an association rule
- 80% is the confidence of the rule if nappies then beer (80%). This is
  calculated as n2/n1, where
  - n1 = no. of records where nappies were bought
  - n2 = no. of records where nappies were bought and beer was also
    bought.
- If there are 1000 transactions for nappies and, of those, 800 also
  had beer, then the confidence is 80%.
- A rule may have a high confidence but not be interesting because it
  doesn't apply to many records in the database. Support captures
  this, i.e. no. of records where nappies were bought with beer /
  total records.
- Rules that may be interesting have a confidence level and a support
  level above a user-set threshold.
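The two ratios above can be worked through in a few lines of Python. This is a minimal sketch: the 1000 and 800 figures come from the slide, while `total_records` is a hypothetical database size chosen only to show how a high-confidence rule can still have low support.

```python
def confidence(n_antecedent: int, n_both: int) -> float:
    """Confidence of 'if A then B' = n2 / n1."""
    if n_antecedent <= 0:
        raise ValueError("the antecedent must occur in at least one record")
    return n_both / n_antecedent


def support(n_both: int, n_total: int) -> float:
    """Support = records containing both A and B / total records."""
    if n_total <= 0:
        raise ValueError("the database must contain at least one record")
    return n_both / n_total


# Figures from the slide: 1000 nappy transactions, 800 of them with beer.
n1, n2 = 1000, 800
conf = confidence(n1, n2)

# Hypothetical database size (not from the source).
total_records = 50_000
sup = support(n2, total_records)

print(f"confidence = {conf:.0%}, support = {sup:.1%}")
```

With these assumed figures the rule holds 80% of the time nappies are bought, yet covers only 1.6% of all records.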
9- Interesting rules: confidence and support of an
association rule
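The thresholding idea can be sketched as a brute-force search over single-item rules. This is not Apriori and not A-Miner; it simply counts every ordered pair of items and keeps the rules whose support and confidence meet the user-set minimums. The function name, the transactions and the threshold values are all hypothetical.

```python
from itertools import permutations


def mine_pair_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, confidence, support) for every
    single-item rule that meets both user-set thresholds."""
    n_total = len(transactions)
    items = sorted({item for t in transactions for item in t})
    rules = []
    for a, b in permutations(items, 2):
        n_a = sum(1 for t in transactions if a in t)
        n_ab = sum(1 for t in transactions if a in t and b in t)
        if n_a == 0:
            continue
        sup = n_ab / n_total
        conf = n_ab / n_a
        if sup >= min_support and conf >= min_confidence:
            rules.append((a, b, conf, sup))
    return rules


# Hypothetical toy transactions (not from the source).
transactions = [
    {"nappies", "beer"},
    {"nappies", "beer", "bread"},
    {"nappies", "beer"},
    {"nappies", "milk"},
    {"bread", "milk"},
    {"beer"},
]

rules = mine_pair_rules(transactions, min_support=0.4, min_confidence=0.75)
for a, b, conf, sup in rules:
    print(f"if {a} then {b} (confidence {conf:.0%}, support {sup:.0%})")
```

Counting every pair is quadratic in the number of items, which is acceptable for a toy example but is the cost that algorithms such as Apriori are designed to reduce.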
10- Association rule screenshot with A-Miner from the Split Up data set
- In 73.4% of cases where the wife's needs are some to high, the
  husband's future needs are few to some.
- This prompts an analyst to posit plausible hypotheses, e.g. the rule
  may reflect the fact that more women than men remain custodial
  parents of the children following divorce. The women who have some
  to high needs may do so because of their obligations to children.
11- Association rules in law
- Association rule generators are typically packaged with very
  expensive data mining suites. We developed A-Miner (available from
  the authors) for a PC platform.
- Typically, too many association rules are generated for feasible
  analysis, so our current research explores interestingness metrics
  to restrict the number of rules presented to those that might be
  interesting.
- In general, structured data is not collected in law as it is in
  other domains, so very large databases are rare.
- Our current research involves 380,000 records from a Legal Aid
  organization database that contains data on client features.
- ArgumentDeveloper: a shell that can be used by judges to structure
  their reasoning in a way that will facilitate data collection and
  reasoning.
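The slides do not say which interestingness metrics are being explored. As an illustration only, one widely used measure is lift, which compares how often the two items occur together with how often they would co-occur if they were independent. The counts below are hypothetical.

```python
def lift(n_a: int, n_b: int, n_ab: int, n_total: int) -> float:
    """lift = P(A and B) / (P(A) * P(B)).

    A value near 1 suggests A and B are independent; above 1, that they
    occur together more often than chance; below 1, less often.
    """
    if n_a <= 0 or n_b <= 0 or n_total <= 0:
        raise ValueError("counts must be positive")
    return (n_ab * n_total) / (n_a * n_b)


# Hypothetical counts: 10,000 baskets, 1000 with nappies, 800 with both.
# In both cases the confidence of 'if nappies then beer' is 80%.
beer_in_half = lift(n_a=1000, n_b=5000, n_ab=800, n_total=10_000)
beer_in_most = lift(n_a=1000, n_b=9000, n_ab=800, n_total=10_000)

print(f"beer in 50% of baskets: lift = {beer_in_half:.2f}")
print(f"beer in 90% of baskets: lift = {beer_in_most:.2f}")
```

The second case shows why confidence alone can mislead: if beer already appears in 90% of baskets, an 80% confidence rule has lift below 1 and tells the analyst little.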
12- Generic/actual argument model for sentencing
armed robbery
13- Association rules can be used for forward and
backward inferences in the generic/actual
argument model for sentencing armed robbery
14- Forward inference: confidence
- In the sentence actual argument database, the following outcomes
  were noted for the inputs suggested:
57 0.1 0 12 2 10 16 0 0 0
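Forward inference of this kind can be sketched as a conditional frequency count: select the stored actual arguments whose data items match the inputs supplied, then report how often each outcome occurs among them. The record layout, attribute names and records below are hypothetical and are not drawn from the sentencing database.

```python
from collections import Counter


def forward_inference(records, known_inputs, claim_key):
    """Share of each outcome among records matching all known inputs."""
    matching = [
        r for r in records
        if all(r.get(k) == v for k, v in known_inputs.items())
    ]
    if not matching:
        return {}
    counts = Counter(r[claim_key] for r in matching)
    return {outcome: n / len(matching) for outcome, n in counts.items()}


# Hypothetical actual arguments (not from the source).
records = [
    {"priors": "serious", "weapon": "firearm", "sentence": "imprisonment"},
    {"priors": "serious", "weapon": "firearm", "sentence": "imprisonment"},
    {"priors": "serious", "weapon": "firearm", "sentence": "suspended sentence"},
    {"priors": "serious", "weapon": "knife", "sentence": "imprisonment"},
    {"priors": "none", "weapon": "firearm", "sentence": "community order"},
    {"priors": "serious", "weapon": "firearm", "sentence": "imprisonment"},
]

outcomes = forward_inference(
    records, {"priors": "serious", "weapon": "firearm"}, "sentence"
)
for outcome, share in sorted(outcomes.items(), key=lambda kv: -kv[1]):
    print(f"{outcome}: {share:.0%}")
```

Each share is the confidence of the rule "if the supplied inputs then this outcome", so the most plausible claim is the outcome with the highest share.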
15- Backward inference: constructing the strongest argument
If all the items you suggest AND
  If extremely serious pattern of priors then imprisonment    90    2
  If very serious pattern of priors then imprisonment         75    7
  If serious pattern of priors then imprisonment              68   17
  If not so serious pattern of priors then imprisonment       78   17
  If no prior convictions then imprisonment                    2    3
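One way to read the backward inference above is as follows: for each candidate value of a missing data item, form the rule "known items AND candidate value then claim" and compute its confidence and support. The sketch below follows that reading. It is an assumption about the procedure, not a description of A-Miner, and the records and attribute names are hypothetical.

```python
def backward_inference(records, known_inputs, missing_key, claim_key, claim_value):
    """Map each candidate value of the missing item to the
    (confidence, support) of 'known inputs AND value then claim'."""
    n_total = len(records)
    candidates = sorted({r[missing_key] for r in records if missing_key in r})
    results = {}
    for value in candidates:
        antecedent = [
            r for r in records
            if r.get(missing_key) == value
            and all(r.get(k) == v for k, v in known_inputs.items())
        ]
        if not antecedent:
            continue
        n_both = sum(1 for r in antecedent if r.get(claim_key) == claim_value)
        results[value] = (n_both / len(antecedent), n_both / n_total)
    return results


# Hypothetical actual arguments (not from the source).
records = [
    {"weapon": "firearm", "priors": "serious", "sentence": "imprisonment"},
    {"weapon": "firearm", "priors": "serious", "sentence": "imprisonment"},
    {"weapon": "firearm", "priors": "serious", "sentence": "community order"},
    {"weapon": "firearm", "priors": "none", "sentence": "imprisonment"},
    {"weapon": "firearm", "priors": "none", "sentence": "community order"},
    {"weapon": "firearm", "priors": "none", "sentence": "community order"},
    {"weapon": "firearm", "priors": "none", "sentence": "community order"},
    {"weapon": "knife", "priors": "serious", "sentence": "imprisonment"},
]

results = backward_inference(
    records, {"weapon": "firearm"}, "priors", "sentence", "imprisonment"
)
# The strongest argument for the claim uses the highest-confidence value.
strongest = max(results, key=lambda value: results[value][0])
for value, (conf, sup) in results.items():
    print(f"priors = {value}: confidence {conf:.0%}, support {sup:.1%}")
print("strongest argument uses priors =", strongest)
```

Ranking by confidence alone is a design choice made for this sketch; a rule with very low support might reasonably be discounted even if its confidence is high.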
16- Conclusion
- Data mining, or knowledge discovery from databases (KDD), has not
  been appropriately exploited in law to date.
- Association rules are useful in that they suggest hypotheses for
  future research.
- Association rules integrated into the generic/actual argument model
  can assist in identifying the most plausible claim from given data
  items (forward inference) or the likelihood of missing data values
  (backward inference).
17- The process of constructing an argument