Decision Making with Uncertainty and Data Mining

Desheng Wu, University of Science & Technology of China. ADMA'05: Wuhan, China, 22-24 July 2005 ... A means to reflect uncertainty. Grey related analysis ...

1
Decision Making with Uncertainty and Data Mining
  • David L. Olson, University of Nebraska
  • Desheng Wu, University of Science & Technology
    of China
  • ADMA'05: Wuhan, China, 22-24 July 2005

2
Decision Making under Uncertainty
  • Uncertainty exists in data
      • Imprecise data
      • Missing data
      • Human subjectivity
  • Fuzzy set theory
      • A means to reflect uncertainty
  • Grey related analysis (interval, vague)
      • A type of fuzzy set data

3
Monte Carlo Simulation
  • Analytic models preferred
  • But simulation needed if
      • High levels of uncertainty make analytic models
        too messy to calculate
      • High levels of complexity make analytic models
        intractable

4
Fuzzy Simulation
  • Fuzzy input often expressed in trapezoidal form
      • Minimum, range of most likely, maximum
      • Triangular and interval forms are special cases
  • Can be analyzed through Monte Carlo

5
Fuzzy Distribution Forms
  • Trapezoidal
  • Triangular
  • Interval
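
The three forms can be sketched as one membership function. This is a minimal illustration, not taken from the slides: the function name and the (a, b, c, d) parameter order (minimum, start and end of the most likely range, maximum) are assumptions. A triangular number is the case b = c, and an interval is the case a = b and c = d.

```python
def trapezoid_membership(x, a, b, c, d):
    """Membership of x in the trapezoidal fuzzy number (a, b, c, d).

    Assumes a <= b <= c <= d: a is the minimum, [b, c] the range of
    most likely values, d the maximum.
    """
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0                  # plateau: the most likely range
    if x < b:
        return (x - a) / (b - a)    # rising edge between a and b
    return (d - x) / (d - c)        # falling edge between c and d


# Trapezoidal, triangular (b == c), and interval (a == b, c == d) cases.
print(trapezoid_membership(3, 2, 4, 6, 10))    # on the rising edge
print(trapezoid_membership(3.5, 2, 5, 5, 9))   # triangular
print(trapezoid_membership(3, 2, 2, 4, 4))     # interval
```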

6
Grey Related Analysis
  • Deng 1982
  • Means to incorporate uncertainty
  • Incomplete or unknown elements
  • Interval numbers
  • Standardize through norms
  • Transform index values through product operations
  • Minimize distance to ideal, maximize distance from nadir
  • Simple, practical
  • Doesn't require large sample sizes; nonparametric
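
As a rough illustration of the idea, the sketch below computes grey relational grades in the classic crisp form usually attributed to Deng, using only the distance to the ideal. It is not the interval-number procedure outlined on this slide: the function name, the pre-scaled [0, 1] scores, and the distinguishing coefficient rho = 0.5 are assumptions.

```python
def grey_relational_grades(scores, weights, rho=0.5):
    """Weighted grey relational grade of each alternative versus the ideal.

    scores[i][k] is alternative i on criterion k, assumed already scaled
    to [0, 1] with larger values better. rho is the distinguishing
    coefficient (0.5 is a common choice, assumed here).
    """
    n_criteria = len(weights)
    # Ideal reference point: best observed score on each criterion.
    ideal = [max(row[k] for row in scores) for k in range(n_criteria)]
    deltas = [[abs(ideal[k] - row[k]) for k in range(n_criteria)]
              for row in scores]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0:
        return [1.0 for _ in scores]    # all alternatives are identical
    grades = []
    for row in deltas:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(w * c for w, c in zip(weights, coeffs)) / sum(weights))
    return grades


# Hypothetical data: three alternatives, two equally weighted criteria.
grades = grey_relational_grades([[1.0, 1.0], [0.5, 0.5], [0.0, 0.0]], [0.5, 0.5])
print(grades)
```

A higher grade means the alternative sits closer to the ideal point across the weighted criteria.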

7
Demonstration: MCDM
  • Multi-Criteria Decision Making
  • Modern decision making is complex
      • Need to balance tradeoffs among conflicting
        criteria (attributes, objectives, goals)
  • Fuzzy MCDM
      • Alternative scores on each criterion are uncertain
      • Measures of weights vary across group members

8
Implementations of Fuzzy Multiattribute Idea
  • Fuzzy theory
  • Dubois & Prade 1980
  • Rough sets
  • Pawlak 1982
  • Grey sets
  • Interval analysis: Moore 1966, 1979
  • Deng 1982
  • Vague sets: Gau & Buehrer 1993
  • Probability theory
  • Pearl 1988

9
PROMETHEE
  • J.P. Brans, P. Vincke, B. Mareschal (Belgium)
  • Basically a workable ELECTRE
  • PROMETHEE I: partial order
  • PROMETHEE II: full ranking
  • GAIA: graphical (concordance analysis)

10
Criteria Scales
  • I: 0 if indifferent or worse, 1 if better
  • II: 0 if not better by parameter q, 1 if better by more than q
  • III: d is the degree by which one alternative is better than another
      • 0 if not better by parameter q
      • d/p if between q and p, 1 if d > p
  • IV (step): 0 if d < q, 0.5 if q < d < p, 1 if d > p
  • V: slope
  • VI: normal
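
The six scales can be sketched as preference functions of the difference d between two alternatives on one criterion. Types III and IV follow the wording of this slide. Types II, V, and VI use the usual forms from the PROMETHEE literature, which is an assumption wherever the slide is terse. The function names, the handling of exact ties at q and p, and the sigma parameter are also assumptions.

```python
import math


def pref_type1(d):
    """I: 0 if indifferent or worse, 1 if better."""
    return 1.0 if d > 0 else 0.0


def pref_type2(d, q):
    """II: 0 unless better by more than the indifference threshold q."""
    return 1.0 if d > q else 0.0


def pref_type3(d, q, p):
    """III as worded on the slide: 0 up to q, d/p between q and p, then 1."""
    if d <= q:
        return 0.0
    if d > p:
        return 1.0
    return d / p


def pref_type4(d, q, p):
    """IV (step): 0 up to q, 0.5 between q and p, 1 beyond p."""
    if d <= q:
        return 0.0
    if d > p:
        return 1.0
    return 0.5


def pref_type5(d, q, p):
    """V (slope): linear rise from 0 at q to 1 at p (assumed usual form)."""
    if d <= q:
        return 0.0
    if d > p:
        return 1.0
    return (d - q) / (p - q)


def pref_type6(d, sigma):
    """VI (normal): Gaussian-shaped preference (assumed usual form)."""
    if d <= 0:
        return 0.0
    return 1.0 - math.exp(-(d * d) / (2.0 * sigma * sigma))


print(pref_type3(2, 1, 4), pref_type4(2, 1, 4), pref_type5(2.5, 1, 4))
```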

11
PROMETHEE Criteria
  • II: INTERVAL
  • III: TRIANGULAR
  • V: TRAPEZOIDAL
  • PROMETHEE doesn't use a value function
  • But demonstrates the incorporation of fuzzy input
    into MCDM

12
Demo Model
  • Group Decision
      • Conservative, Liberal, Business
  • Energy Options
      • S1: Nuclear
      • S2: Coal
      • S3: Conservation
      • S4: Import
  • Criteria
      • C1: Cost (minimize)
      • C2: Pollution (minimize)
      • C3: Risk of catastrophe (minimize)
      • C4: Energy Independence (maximize)

13
Weights for each group member: Trapezoidal (grey related)

14
Cost Scores for each group member: Trapezoidal

15
Method: Wu, Olson, Liang
  • Use grey related analysis
  • Inputs are uncertain
  • Use alpha-cut method to convert trapezoidal into
    interval
  • Simulate
  • Very complex preference model
  • Know distribution of uncertainty
  • Possibility that different alternatives may turn
    out to be preferred
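
The alpha-cut step can be sketched as follows. This is an illustrative helper, not the authors' code: the function name and the (a, b, c, d) parameterization of the trapezoid (minimum, most likely range, maximum) are assumptions.

```python
def alpha_cut(a, b, c, d, alpha):
    """Interval of values whose membership in trapezoid (a, b, c, d) is >= alpha.

    Assumes a <= b <= c <= d and 0 <= alpha <= 1.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    lower = a + alpha * (b - a)    # move up the rising edge
    upper = d - alpha * (d - c)    # move up the falling edge
    return lower, upper


print(alpha_cut(2, 4, 6, 10, 0.0))   # full support of the trapezoid
print(alpha_cut(2, 4, 6, 10, 0.5))
print(alpha_cut(2, 4, 6, 10, 1.0))   # the most likely range only
```

Raising alpha narrows the interval from the full support toward the most likely range, so the choice of alpha controls how much of the stated uncertainty is carried into the simulation.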

16
Simulation Output

17
Monte Carlo Simulation of Grey Related Data
  • Given interval data
  • Draw uniform random number
  • Assume value that proportion from minimum to
    maximum
  • Do this for every interval number
  • These become crisp numbers for this sample
  • Calculate outcomes
  • Value
  • Get probabilistic picture of outcomes in complex
    system involving uncertainty (grey related
    intervals)
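
The sampling steps above can be sketched as below. The function names, the example intervals, and the product used as the outcome are hypothetical; only the uniform draw between minimum and maximum comes from the slide.

```python
import random


def sample_crisp(intervals, rng):
    """Turn each (low, high) interval into one crisp value for a single run."""
    crisp = []
    for low, high in intervals:
        u = rng.random()                       # uniform draw in [0, 1)
        crisp.append(low + u * (high - low))   # that proportion from min to max
    return crisp


def simulate(intervals, outcome, runs=1000, seed=1):
    """Repeat the sampling and collect the outcome of every run."""
    rng = random.Random(seed)
    return [outcome(sample_crisp(intervals, rng)) for _ in range(runs)]


# Hypothetical example: the outcome is the product of two interval numbers.
results = simulate([(0.2, 0.4), (10.0, 15.0)], outcome=lambda v: v[0] * v[1])
print(min(results), sum(results) / len(results), max(results))
```

The list of outcomes is the probabilistic picture: it can be summarized by its range, its mean, or a histogram.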

18
Demonstration: Olson & Wu
  • Hiring decision
  • Multiple criteria, six applicants
  • Criteria
  • C1 Experience in business
  • C2 Experience in function
  • C3 Education
  • C4 Leadership
  • C5 Adaptability
  • C6 Age
  • C7 Aptitude for Teamwork

19
Alternative Performance Matrix

20
Grey Related Weights

21
Grey Related Data
  • Weights: interval
  • Scores: interval
  • Used Grey Related model to identify best for each
    simulation run
  • Best judged by average weighted distance to reference point
  • Reflects both minimum distance to ideal and maximum
    distance from nadir
  • Ran 1,000 replications for each of 10 seeds
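
The replication scheme can be sketched as below. For brevity the sketch ranks alternatives by a plain weighted sum, a stand-in for the grey related distance model, and the alternatives "A", "B", "C" with their interval data are hypothetical. Only the 1,000 replications for each of 10 seeds come from the slide.

```python
import random
from collections import Counter


def draw(interval, rng):
    """One uniform draw from a (low, high) interval."""
    low, high = interval
    return low + rng.random() * (high - low)


def probability_of_best(weight_intervals, score_intervals, seeds=range(10), runs=1000):
    """Share of replications in which each alternative scores highest."""
    wins = Counter()
    total = 0
    for seed in seeds:
        rng = random.Random(seed)
        for _ in range(runs):
            # Weights are drawn once per replication and shared by all alternatives.
            weights = [draw(w, rng) for w in weight_intervals]
            totals = {}
            for name, row in score_intervals.items():
                scores = [draw(s, rng) for s in row]
                totals[name] = sum(w * s for w, s in zip(weights, scores))
            wins[max(totals, key=totals.get)] += 1
            total += 1
    return {name: wins[name] / total for name in score_intervals}


# Hypothetical interval weights and scores for three alternatives.
weights = [(0.4, 0.6), (0.4, 0.6)]
scores = {
    "A": [(0.6, 0.9), (0.5, 0.8)],
    "B": [(0.5, 0.8), (0.6, 0.9)],
    "C": [(0.3, 0.7), (0.3, 0.7)],
}
probs = probability_of_best(weights, scores)
print(probs)
```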

22
Probabilities of Best

23
Implications
  • Crisp Grey Related
      • Isabel is the best choice
      • Antonio very close
      • Alberto, Fernando not far back
  • SIMULATION
      • Isabel's probability of being best is 0.41
      • Antonio 0.35, Alberto 0.19, Fernando 0.05
      • Fabio, Rafaela never won
  • Simulation provides better picture

24
Simulation of Grey Related Data in Data Mining
  • Decision tree analysis (PolyAnalyst)
  • Real credit card data
      • 1,000 observations (900 train, 100 test)
      • 140 default, 860 no problem
      • 65 available explanatory variables (used 26)
  • Due to imbalance, initial models degenerate
      • Called all test cases OK
  • Differential cost models also degenerate
      • Called all test cases default
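
A quick calculation shows why the degenerate models are tempting for a learner. The class counts are the ones on the slide. The slide does not give the make-up of the 100-case test set, so the figures below are computed on all 1,000 observations, which is an assumption.

```python
defaults, ok = 140, 860      # class counts from the slide
total = defaults + ok

# A model that calls every case OK is right on every non-default case.
accuracy_all_ok = ok / total
# A model that calls every case default is right only on the defaults.
accuracy_all_default = defaults / total

print(f"all OK:      accuracy {accuracy_all_ok:.2f}, defaults caught 0 of {defaults}")
print(f"all default: accuracy {accuracy_all_default:.2f}, defaults caught {defaults} of {defaults}")
```

The all-OK model scores high on accuracy while catching no defaults, which is why accuracy alone is a poor guide on imbalanced data.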

25
Fuzzified Data
  • Of 26 explanatory variables
      • 5 binary
      • 1 categorical
      • 20 continuous
  • Fuzzified into 3 categories each
      • Case by case, roughly equally sized categories
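
One simple way to get three roughly equal-sized categories is to split each variable by rank, sketched below. The slide says the categories were set case by case, so the rank-based cut, the function name, and the low/mid/high labels are assumptions, not the authors' procedure.

```python
def fuzzify_three_ways(values, labels=("low", "mid", "high")):
    """Assign each value to one of three roughly equal-sized categories by rank.

    Tied values may land in different categories, a simplification
    of this sketch.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])   # indices from smallest to largest
    result = [None] * n
    for rank, i in enumerate(order):
        result[i] = labels[rank * 3 // n]               # thirds of the ranking
    return result


# Hypothetical values for one continuous variable.
categories = fuzzify_three_ways([5, 1, 9, 3, 7, 2, 8, 4, 6])
print(categories)
```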

26
Decision Tree Models
  • Minimum support set to 1, the minimum
    PolyAnalyst allowed
  • Optimistic split of criteria
  • Pessimistic split of criteria
  • Different decision tree model each run

27
Continuous Data Output
  • Varied degree of perturbation (uncertainty)
  • Continuous Data
      • Many models overlapping
      • Three unique decision trees
      • Used a total of 8 explanatory variables
  • Categorical Data
      • Four unique decision trees
      • Used a total of 7 explanatory variables

28
Continuous Model 1
  • Bal/Pay ratio < 6.44: NO
  • Bal/Pay ratio ≥ 6.44
      • Utilization < 1.54: Default
      • Utilization ≥ 1.54
          • AvgPayment < 3.91: NO
          • AvgPayment ≥ 3.91: Default
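
Read as rules, this tree is a short chain of threshold tests. The sketch below assumes each threshold splits into a "less than" branch and a complementary "greater than or equal" branch. The function and argument names are hypothetical, and the units of the variables are not stated on the slide.

```python
def continuous_model_1(bal_pay_ratio, utilization, avg_payment):
    """Classify one account as "NO" (no problem) or "Default"."""
    if bal_pay_ratio < 6.44:
        return "NO"
    if utilization < 1.54:
        return "Default"
    if avg_payment < 3.91:
        return "NO"
    return "Default"


print(continuous_model_1(5.0, 0.5, 1.0))
print(continuous_model_1(7.0, 1.0, 1.0))
print(continuous_model_1(7.0, 2.0, 3.0))
print(continuous_model_1(7.0, 2.0, 4.0))
```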

29
Continuous Model 2
  • Bal/Pay ratio < 6.44: NO
  • Bal/Pay ratio ≥ 6.44: Default

30
Continuous Model 3
  • Bal/Pay ratio < 6.44: NO
  • Bal/Pay ratio ≥ 6.44
      • Utilization < 1.54: Default
      • Utilization ≥ 1.54
          • AvgRevolvePay < 2.28: Default
          • AvgRevolvePay ≥ 2.28: NO

31
Categorical Model 1
  • Bal/Pay ratio high
      • CreditLine high
          • CalcIntRate I mid: NO
          • CalcIntRate I NOT mid: Default
      • CreditLine NOT high: Default
  • Bal/Pay ratio NOT high: NO

32
Categorical Model 2
  • Bal/Pay ratio high
      • CreditLine low
          • ChangeLine mid
              • PurchBal low: Default
              • PurchBal NOT low: NO
          • ChangeLine low: NO
          • ChangeLine high: Default
      • CreditLine high
          • CalcIntRate I mid: NO
          • CalcIntRate I NOT mid: Default
      • CreditLine mid: Default
  • Bal/Pay ratio NOT high: NO

33
Categorical Model 3
  • Bal/Pay ratio high: Default
  • Bal/Pay ratio NOT high: NO

34
Categorical Model 4
  • Bal/Pay ratio high
  • CreditLine low
  • ChangeLine mid
  • PurchBal low Default
  • PurchBal NOT low NO
  • ChangeLine low NO
  • Residence 0 Default
  • Residence 1 or 2 NO
  • ChangeLine high Default
  • CreditLine high
  • CalcIntRate I mid NO
  • CalcIntRate I NOT mid Default
  • CreditLine mid Default
  • Bal/Pay ratio NOT high NO

35
Continuous 1: Coincidence matrix

36
Simulation Output: Continuous 1 (Crystal Ball test set accuracy)

37
Continuous 1
  • Simulation accuracy of 100 observations, 1000
    simulation runs
  • perturbation [-0.25, 0.25]: 0.67-0.73
  • perturbation [-0.50, 0.50]: 0.65-0.74
  • perturbation [-1, 1]: 0.62-0.75
  • perturbation [-2, 2]: 0.58-0.74
  • perturbation [-3, 3]: 0.57-0.74
  • perturbation [-4, 4]: 0.56-0.75
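
The general shape of such an experiment can be sketched as below. This is not the Crystal Ball model behind the slide: the observations are randomly generated, their labels come from the tree itself (so accuracy is 1.0 with no perturbation), the perturbation is assumed to be additive and uniform, and the `tree` rules assume "less than" and "greater than or equal" branches at each threshold. The printed ranges will therefore not match the figures above.

```python
import random


def tree(bal_pay_ratio, utilization, avg_payment):
    """Continuous Model 1 thresholds, written as nested rules."""
    if bal_pay_ratio < 6.44:
        return "NO"
    if utilization < 1.54:
        return "Default"
    return "NO" if avg_payment < 3.91 else "Default"


def accuracy_range(observations, labels, width, runs=1000, seed=1):
    """Lowest and highest accuracy over runs with inputs perturbed in [-width, width]."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(runs):
        hits = 0
        for obs, label in zip(observations, labels):
            noisy = [x + rng.uniform(-width, width) for x in obs]
            hits += tree(*noisy) == label
        accuracies.append(hits / len(labels))
    return min(accuracies), max(accuracies)


# Hypothetical test set of 100 observations, labeled by the unperturbed tree.
data_rng = random.Random(0)
observations = [(data_rng.uniform(0, 12), data_rng.uniform(0, 3), data_rng.uniform(0, 8))
                for _ in range(100)]
labels = [tree(*obs) for obs in observations]

for width in (0.25, 0.5, 1, 2, 3, 4):
    low, high = accuracy_range(observations, labels, width, runs=200)
    print(f"perturbation [-{width}, {width}]: {low:.2f}-{high:.2f}")
```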

38
Mean Model Accuracy Measured on Test Set

39
Inferences
  • Continuous models declined in accuracy more than
    categorical
  • Categorizing data is one basic form of fuzziness

40
Applying to Data Mining
  • Easiest way to apply fuzzy concepts to data
    mining
      • CATEGORIZE DATA
  • Simulation is a way to deal with fuzzy data
  • Application of simulation to fuzzy data mining
    is not as simple
      • Large scale data sets
      • Create additional columns
  • Still a very promising research area

41
Conclusions
  • Interesting research directions
  • Simulation in data mining
      • Fuzzy data is probabilistic, so simulation seems
        appropriate
      • Simulation involves a lot more work than closed
        form (crisp) simplifications
  • Group preference aggregation
      • Fuzzy data may be fuzzy due to different group
        member opinions
      • Interesting ways to aggregate