Data mining with Association rules in law

1
  • Data mining with Association rules in law
  • Andrew Stranieri and John Zeleznikow
  • Donald Berman Laboratory for Information
    Technology and Law, La Trobe University,
  • Australia
  • http://www.cs.latrobe.edu.au/research/dbc/dbc.html
  • stranier@cs.latrobe.edu.au

2
Overview
  • Data mining or Knowledge discovery from databases
    has not been appropriately exploited in law to
    date.
  • Association rules are useful in that they
    suggest hypotheses for future research
  • Association rules integrated into the generic/actual
    argument model can help identify the most plausible
    claim from given data items (forward inference) or
    the likelihood of missing data values (backward
    inference)

3
  • What is data mining? What is knowledge discovery
    from databases (KDD)?
  • Frawley et al (1991) define knowledge discovery
    in databases (KDD) as the 'nontrivial extraction
    of implicit, previously unknown, and potentially
    useful information from data'
  • KDD encompasses a number of different technical
    approaches, such as clustering, data
    summarization, learning classification rules,
    finding dependency networks, analyzing changes,
    and detecting anomalies
  • KDD has only recently emerged because we only
    recently have been gathering vast quantities of
    data
  • Good introductory texts: Thornton (1992),
    Weiss and Indurkhya (1998), Weiss and
    Kulikowski (1992)

4
  • Examples of KDD studies
  • Mangasarian et al (1997) Breast Cancer
    diagnosis. A sample from a breast lump mass is
    assessed by
  • mammography (not sensitive, 68-79%)
  • data mining from FNA test results and visual
    inspection (65-98%)
  • surgery (100% but invasive, expensive)
  • Basket analysis. People who buy nappies also buy
    beer
  • NBA. National Basketball Association of America.
    Player pattern profile. Bhandari et al (1997)
  • Credit card fraud detection
  • Stranieri/Zeleznikow (1997) predict family law
    property outcomes
  • Rissland and Friedman (1997) discover a change
    in the concept of good faith in US Bankruptcy
    cases
  • Pannu (1995) discovers a prototypical case from a
    library of cases
  • Wilkins and Pillaipakkamnatt (1997) predict the
    time a case takes to be heard
  • Veliev et al (1999) association rules for
    economic analysis

5
  • Overview of the process of knowledge discovery in
    databases

6
  • Phase 4. Data mining
  • Finding patterns in data or fitting models to
    data
  • Categories of techniques
  • Predictive (classification: neural networks, rule
    induction; linear and multiple regression)
  • Segmentation (clustering, k-means, k-median)
  • Summarisation (associations, visualisation)
  • Change detection/modelling

7
  • Association rules are a data mining technique
  • An association rule tells us something about the
    association between two attributes
  • Agrawal et al (1993) introduced association rule
    mining; the widely used Apriori algorithm followed
    in Agrawal and Srikant (1994)
  • A famous (but unsubstantiated) AR from a
    hypothetical supermarket transaction database is:
    if nappies then beer (80%). Read this as: nappies
    being bought implies beer is bought 80% of the
    time
  • Association rules have only recently been
    applied to law with promising results
  • Association rule mining can automatically discover
    rules that may prompt an analyst to think of
    hypotheses they would not otherwise have considered

8
  • Confidence and support of an association rule
  • 80% is the confidence of the rule if nappies
    then beer (80%). This is calculated as n2/n1
    where
  • n1 = number of records where nappies were bought
  • n2 = number of records where nappies were bought
    and beer was also bought.
  • If there are 1000 transactions with nappies, and
    of those 800 also had beer, then the confidence
    is 80%.
  • A rule may have a high confidence but not be
    interesting because it doesn't apply to many
    records in the database. This is measured by the
    support of the rule, i.e. number of records where
    nappies were bought with beer / total records.
  • Rules that may be interesting have a confidence
    level and support level above a user-set
    threshold
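
The two measures on this slide can be sketched in a few lines of Python. This is only an illustration: the basket data, the function names and the thresholds are hypothetical, and the brute-force enumeration of one-item rules is not Apriori or the A-Miner implementation.

```python
from itertools import permutations

def support(transactions, items):
    """Fraction of all transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """n2 / n1: of the n1 transactions containing the antecedent,
    the fraction (n2) that also contain the consequent."""
    a = set(antecedent)
    both = a | set(consequent)
    n1 = sum(a <= t for t in transactions)
    n2 = sum(both <= t for t in transactions)
    return n2 / n1 if n1 else 0.0

def interesting_pair_rules(transactions, min_support, min_confidence):
    """One-item -> one-item rules whose support and confidence
    both reach the user-set thresholds."""
    all_items = sorted(set().union(*transactions))
    rules = []
    for a, c in permutations(all_items, 2):
        s = support(transactions, {a, c})
        conf = confidence(transactions, {a}, {c})
        if s >= min_support and conf >= min_confidence:
            rules.append((a, c, s, conf))
    return rules

# Hypothetical data: 10 baskets, 5 with nappies, 4 of those also with beer.
baskets = (
    [{"nappies", "beer"}] * 4
    + [{"nappies"}]
    + [{"beer"}] * 2
    + [{"bread"}] * 3
)

print(confidence(baskets, {"nappies"}, {"beer"}))   # 4 of 5 nappy baskets
print(support(baskets, {"nappies", "beer"}))        # 4 of 10 baskets
print(interesting_pair_rules(baskets, 0.3, 0.75))
```

In this toy data the reverse rule, if beer then nappies, has the same support but a lower confidence (4 of 6 beer baskets), so it falls below the 0.75 confidence threshold.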

10
Association rule screen shot with A-Miner from
Split Up data set
  • In 73.4% of cases where the wife's needs are
    some to high, the husband's future needs are
    few to some.
  • Prompts an analyst to posit plausible hypotheses,
    e.g. it may be the case that the rule reflects
    the fact that more women than men remain custodial
    parents of the children following divorce.
    The women who have some to high needs may do so
    because of their obligation to children.

11
Association rules in law
  • Association rule generators are typically
    packaged with very expensive data mining suites.
    We developed A-Miner (available from authors) for
    a PC platform.
  • Typically, too many association rules are
    generated for feasible analysis. So, our current
    research involves exploring metrics of
    interestingness to restrict the number of rules
    presented to those that might be interesting
  • In general, structured data is not collected in
    law as it is in other domains, so very large
    databases are rare
  • Our current research involves 380,000 records
    from a Legal Aid organization database that
    contains data on client features.
  • ArgumentDeveloper is a shell that can be used by
    judges to structure their reasoning in a way that
    will facilitate data collection and reasoning
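
One way to restrict the number of rules is to rank or filter them by an interestingness metric. The sketch below uses lift, a common metric from the association rule literature, purely as an example; the slides do not say which metrics the authors explored, and the record attributes and the 1.1 threshold are hypothetical.

```python
def lift(records, antecedent, consequent):
    """confidence(antecedent -> consequent) divided by the overall
    frequency of the consequent; values near 1 suggest no association."""
    n = len(records)
    n_a = sum(antecedent <= r for r in records)
    n_c = sum(consequent <= r for r in records)
    n_ac = sum((antecedent | consequent) <= r for r in records)
    if n_a == 0 or n_c == 0:
        return 0.0
    return (n_ac / n_a) / (n_c / n)

# Hypothetical client records, each a set of "attribute=value" items.
records = (
    [{"matter=family", "aid=granted"}] * 3
    + [{"matter=criminal", "aid=granted"}] * 3
    + [{"matter=criminal", "aid=refused"}] * 2
)

candidate_rules = [
    ({"matter=family"}, {"aid=granted"}),
    ({"matter=criminal"}, {"aid=granted"}),
]

# Keep only rules whose lift exceeds a user-set threshold (here 1.1).
kept = [(a, c) for a, c in candidate_rules if lift(records, a, c) > 1.1]
print(kept)
```

Here the second rule has a confidence of 60%, but because 75% of all records have aid granted anyway, its lift is below 1 and it is filtered out.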

12
Generic/actual argument model for sentencing
armed robbery
13
Association rules can be used for forward and
backward inferences in the generic/actual
argument model for sentencing armed robbery
14
Forward inference confidence
  • In the sentence actual argument database the
    following outcomes were noted for the inputs
    suggested
  • Figures for the possible outcomes (outcome labels
    not preserved): 57, 0.1, 0, 12, 2, 10, 16, 0, 0, 0
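
A minimal sketch of forward inference over a database of actual arguments: select the records that match the suggested inputs and report the confidence of each outcome. The attribute names, values and records are hypothetical and do not come from the sentencing database.

```python
from collections import Counter

def forward_inference(arguments, inputs, claim):
    """Confidence of each value of `claim` among the records
    that match every (attribute, value) pair in `inputs`."""
    matching = [
        a for a in arguments
        if all(a.get(k) == v for k, v in inputs.items())
    ]
    if not matching:
        return {}
    counts = Counter(a[claim] for a in matching)
    return {value: n / len(matching) for value, n in counts.items()}

# Hypothetical actual arguments (one dict per past case).
arguments = (
    [{"priors": "serious", "weapon": "firearm", "sentence": "imprisonment"}] * 3
    + [{"priors": "serious", "weapon": "firearm", "sentence": "suspended sentence"}]
    + [{"priors": "none", "weapon": "firearm", "sentence": "community order"}]
)

outcomes = forward_inference(
    arguments, {"priors": "serious", "weapon": "firearm"}, "sentence"
)
most_plausible = max(outcomes, key=outcomes.get)
print(outcomes, most_plausible)
```

The most plausible claim is simply the outcome with the highest confidence for the given inputs.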
15
Backward inference constructing the strongest
argument
If all the items you suggest AND
  • If extremely serious pattern of priors then
    imprisonment: 90, 2
  • If very serious pattern of priors then
    imprisonment: 75, 7
  • If serious pattern of priors then
    imprisonment: 68, 17
  • If not so serious pattern of priors then
    imprisonment: 78, 17
  • If no prior convictions then
    imprisonment: 2, 3
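
Backward inference can be sketched the same way: fix the claim, then compare the rules obtained for each possible value of the missing data item. The slide does not label its pairs of figures; this sketch assumes they are confidence and support, and all attribute names and records are hypothetical.

```python
def backward_inference(arguments, known, missing, claim, claim_value):
    """For each observed value of the `missing` attribute, return
    (confidence, support) of the rule
    known AND missing=value -> claim=claim_value."""
    total = len(arguments)
    matches_known = [
        a for a in arguments
        if all(a.get(k) == v for k, v in known.items())
    ]
    result = {}
    for value in sorted({a[missing] for a in matches_known}):
        with_value = [a for a in matches_known if a[missing] == value]
        with_claim = [a for a in with_value if a[claim] == claim_value]
        result[value] = (
            len(with_claim) / len(with_value),  # confidence
            len(with_claim) / total,            # support
        )
    return result

# Hypothetical actual arguments (one dict per past case).
arguments = (
    [{"weapon": "firearm", "priors": "very serious", "sentence": "imprisonment"}] * 3
    + [{"weapon": "firearm", "priors": "very serious", "sentence": "other"}]
    + [{"weapon": "firearm", "priors": "none", "sentence": "imprisonment"}]
    + [{"weapon": "firearm", "priors": "none", "sentence": "other"}] * 3
)

rules = backward_inference(
    arguments, {"weapon": "firearm"}, "priors", "sentence", "imprisonment"
)
strongest = max(rules, key=lambda value: rules[value][0])
print(rules, strongest)
```

The value of the missing item with the highest confidence gives the strongest argument for the chosen claim.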
16
Conclusion
  • Data mining or Knowledge discovery from databases
    has not been appropriately exploited in law to
    date.
  • Association rules are useful in that they
    suggest hypotheses for future research
  • Association rules integrated into the generic/actual
    argument model can help identify the most plausible
    claim from given data items (forward inference) or
    the likelihood of missing data values (backward
    inference)

17
The process of constructing an argument