Decision Making with Uncertainty and Data Mining

1
Decision Making with Uncertainty and Data Mining
  • David L. Olson, University of Nebraska
  • Desheng Wu, University of Science and Technology
    of China
  • ADMA'05, Wuhan, China, 22-24 July 2005

2
Decision Making under Uncertainty
  • Uncertainty exists in data
    • Imprecise data
    • Missing data
    • Human subjectivity
  • Fuzzy set theory
    • A means to reflect uncertainty
  • Grey related analysis (interval, vague)
    • A type of fuzzy set data

3
Monte Carlo Simulation
  • Analytic models preferred
  • But simulation needed if
    • High levels of uncertainty make analytic models
      too messy to calculate
    • High levels of complexity make analytic models
      intractable

4
Fuzzy Simulation
  • Fuzzy input often expressed in trapezoidal form
    • Minimum, range of most likely values, maximum
    • Triangular and interval forms are special cases
  • Can be analyzed through Monte Carlo simulation
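
As a sketch of how trapezoidal input could feed a Monte Carlo run, the function below draws one value from a trapezoid (a = minimum, [b, c] = most-likely range, d = maximum). It rests on an assumed modelling choice that the slides do not spell out: the membership function is normalized, treated as a probability density, and sampled by inverting its cumulative distribution. The function name is hypothetical. Triangular (b == c) and interval (a == b, c == d) inputs are handled as special cases.

```python
import math
import random


def sample_trapezoidal(a, b, c, d, rng=random):
    """Draw one value from trapezoidal fuzzy number (a, b, c, d).

    Assumption: the membership function, scaled to unit area, is used as a
    probability density and sampled by inverting its CDF.
    """
    if not a <= b <= c <= d:
        raise ValueError("need a <= b <= c <= d")
    if d == a:                       # degenerate case: a crisp number
        return a
    h = 2.0 / (d + c - b - a)        # plateau height that makes total area 1
    f1 = h * (b - a) / 2.0           # probability mass of the rising edge a..b
    f2 = f1 + h * (c - b)            # ... plus the flat top b..c
    u = rng.random()
    if u < f1:                       # rising edge
        return a + math.sqrt(2.0 * u * (b - a) / h)
    if u <= f2:                      # flat top: uniform within [b, c]
        return b + (u - f1) / h
    return d - math.sqrt(2.0 * (1.0 - u) * (d - c) / h)  # falling edge c..d
```

For example, `sample_trapezoidal(0.4, 0.5, 0.7, 0.8, random.Random(1))` draws one possible value of the Conservative group's cost weight from slide 13. Passing (0.2, 0.2, 0.6, 0.6) reduces to a uniform draw on [0.2, 0.6].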

5
Fuzzy Distribution Forms
  • Trapezoidal
  • Triangular
  • Interval

6
Grey Related Analysis
  • Deng 1982
  • Means to incorporate uncertainty
    • Incomplete or unknown elements
    • Interval numbers
  • Standardize through norms
  • Transform index values through product operations
  • Minimize distance to ideal, maximize distance from nadir
  • Simple, practical
    • Does not require large sample sizes; nonparametric
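
The steps above can be illustrated with the classic Deng grey relational grade for crisp data. Each criterion is standardized so that 1 is ideal. Deviations from the ideal are turned into coefficients scaled by the smallest and largest deviation (the largest plays the nadir role), and the coefficients are combined with weights. The slides do not give their exact formula, so treat this as one standard variant rather than the authors' model. The function name is hypothetical, and rho = 0.5 is the conventional distinguishing coefficient.

```python
def grey_relational_grades(matrix, benefit, weights, rho=0.5):
    """Weighted grey relational grade of each alternative (classic Deng form, crisp data).

    matrix[i][k]: score of alternative i on criterion k
    benefit[k]:   True if larger is better, False if smaller is better
    weights[k]:   criterion weights, assumed to sum to 1
    rho:          distinguishing coefficient, conventionally 0.5
    """
    m, n = len(matrix), len(matrix[0])
    # 1. Standardize each criterion to [0, 1] so that 1 is the ideal value.
    norm = [[0.0] * n for _ in range(m)]
    for k in range(n):
        col = [row[k] for row in matrix]
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0  # guard against a constant column
        for i in range(m):
            x = (col[i] - lo) / span
            norm[i][k] = x if benefit[k] else 1.0 - x
    # 2. Deviation of every standardized value from the ideal reference value 1.
    delta = [[1.0 - v for v in row] for row in norm]
    d_min = min(min(row) for row in delta)
    d_max = max(max(row) for row in delta)
    # 3. Grey relational coefficients, combined into one weighted grade per alternative.
    return [
        sum(w * (d_min + rho * d_max) / (d + rho * d_max) for w, d in zip(weights, row))
        for row in delta
    ]
```

With hypothetical data, `grey_relational_grades([[3, 10], [5, 8], [4, 7]], [True, False], [0.5, 0.5])` returns approximately [0.333, 0.8, 0.75]. The second alternative has the highest grade and so ranks best.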

7
Demonstration MCDM
  • Multicriteria Decision Making
    • Modern decision making is complex
    • Need to balance tradeoffs among conflicting
      criteria (attributes, objectives, goals)
  • Fuzzy MCDM
    • Alternative scores on each criterion are uncertain
    • Weights vary across group members

8
Implementations of Fuzzy Multiattribute Idea
  • Fuzzy theory
    • Dubois and Prade 1980
  • Rough sets
    • Pawlak 1982
  • Grey sets
    • Interval analysis: Moore 1966, 1979
    • Deng 1982
  • Vague sets: Gau and Buehrer 1993
  • Probability theory
    • Pearl 1988

9
PROMETHEE
  • J.P. Brans, P. Vincke, B. Mareschal (Belgium)
  • Basically a workable ELECTRE
  • PROMETHEE I: partial order
  • PROMETHEE II: full ranking
  • GAIA: graphical (concordance analysis)
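
PROMETHEE works through outranking flows. φ+ measures how strongly an alternative outranks the others, and φ− measures how strongly it is outranked. The net flow is φ = φ+ − φ−. PROMETHEE I builds its partial order from φ+ and φ− together, while PROMETHEE II gives a full ranking by φ alone. The sketch below computes all three flows with the simplest ("usual", type I) preference function. The function name and the toy data in the usage line are hypothetical.

```python
def promethee_ii(scores, weights, maximize):
    """PROMETHEE outranking flows using the 'usual' (type I) criterion.

    scores[i][k]: evaluation of alternative i on criterion k
    weights[k]:   criterion weights summing to 1
    maximize[k]:  True if larger is better on criterion k
    Returns (phi_plus, phi_minus, phi_net), one entry per alternative.
    """
    n = len(scores)

    def pi(a, b):
        # Aggregated preference index: weighted share of criteria on which a beats b.
        total = 0.0
        for k, w in enumerate(weights):
            d = scores[a][k] - scores[b][k]
            if not maximize[k]:
                d = -d
            total += w * (1.0 if d > 0 else 0.0)  # type I: any advantage counts fully
        return total

    phi_plus = [sum(pi(a, b) for b in range(n) if b != a) / (n - 1) for a in range(n)]
    phi_minus = [sum(pi(b, a) for b in range(n) if b != a) / (n - 1) for a in range(n)]
    phi_net = [p - m for p, m in zip(phi_plus, phi_minus)]
    return phi_plus, phi_minus, phi_net
```

For `promethee_ii([[1, 5], [2, 4], [3, 6]], [0.6, 0.4], [True, False])` the net flows are approximately [-0.6, 0.4, 0.2], so PROMETHEE II ranks the second alternative first. Net flows always sum to zero.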

10
Criteria scales
  • I: 0 if indifferent or worse, 1 if better
  • II: 0 if not better by more than parameter q, 1 if better by more than q
  • III: d is the degree by which one alternative is better than another
    • 0 if d ≤ 0, d/p if 0 < d ≤ p, 1 if d > p
  • IV: step; 0 if d < q, 0.5 if q < d < p, 1 if d > p
  • V: slope (linear between q and p)
  • VI: normal (Gaussian)
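
The six criterion types can be written as a single dispatch function that returns a preference degree between 0 and 1. This sketch follows the standard Brans-Vincke definitions: q is the indifference threshold, p the strict-preference threshold, and s the Gaussian spread. The function name and the default parameter values are illustrative assumptions.

```python
import math


def preference(kind, d, q=0.0, p=1.0, s=1.0):
    """PROMETHEE preference degree P(d) in [0, 1] for a score difference d (assumes q < p)."""
    if kind == "I":                  # usual criterion
        return 1.0 if d > 0 else 0.0
    if kind == "II":                 # quasi-criterion (U-shape), threshold q
        return 1.0 if d > q else 0.0
    if kind == "III":                # linear preference (V-shape) up to p
        return 0.0 if d <= 0 else min(d / p, 1.0)
    if kind == "IV":                 # level (step) criterion
        return 0.0 if d <= q else (0.5 if d <= p else 1.0)
    if kind == "V":                  # linear preference with indifference area
        return 0.0 if d <= q else min((d - q) / (p - q), 1.0)
    if kind == "VI":                 # Gaussian criterion
        return 0.0 if d <= 0 else 1.0 - math.exp(-d * d / (2.0 * s * s))
    raise ValueError(f"unknown criterion type: {kind}")
```

For instance, `preference("IV", 0.5, q=0.3, p=1.0)` gives 0.5, and `preference("III", 0.5, p=2.0)` gives 0.25.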

11
PROMETHEE Criteria
  • Type II: interval
  • Type III: triangular
  • Type V: trapezoidal
  • PROMETHEE doesn't use a value function
    • But it demonstrates the incorporation of fuzzy
      input into MCDM

12
Demo Model
  • Group Decision
    • Conservative, Liberal, Business
  • Energy Options
    • S1 Nuclear
    • S2 Coal
    • S3 Conservation
    • S4 Import
  • Criteria
    • C1 Cost (minimize)
    • C2 Pollution (minimize)
    • C3 Risk of catastrophe (minimize)
    • C4 Energy Independence (maximize)

13
Weights for each group member: trapezoidal (grey related)
               C1 Cost                  C2 Pollution             C3 Risk                  C4 Independence
Conservative   (0.4, 0.5, 0.7, 0.8)     (0, 0, 0.05, 0.1)        (0, 0.03, 0.05, 0.15)    (0.05, 0.1, 0.15, 0.25)
Liberal        (0.05, 0.1, 0.15, 0.2)   (0.2, 0.4, 0.5, 0.6)     (0.2, 0.3, 0.4, 0.6)     (0.03, 0.05, 0.1, 0.15)
Business       (0.25, 0.27, 0.29, 0.3)  (0.12, 0.15, 0.2, 0.25)  (0.16, 0.2, 0.25, 0.3)   (0.25, 0.3, 0.35, 0.4)
14
Cost scores for each group member: trapezoidal
               S11 Nuclear             S12 Coal                S13 Conserve            S14 Import
Conservative   (0, 0.05, 0.1, 0.2)     (0.3, 0.4, 0.5, 0.7)    (0.6, 0.75, 0.85, 0.9)  (0.6, 0.7, 0.75, 0.8)
Liberal        (0.3, 0.5, 0.6, 0.8)    (0.5, 0.6, 0.7, 0.9)    (0.6, 0.7, 0.85, 0.95)  (0.6, 0.7, 0.8, 0.9)
Business       (0.4, 0.5, 0.6, 0.7)    (0.7, 0.75, 0.85, 0.9)  (0.8, 0.9, 0.95, 1.0)   (0.75, 0.8, 0.85, 0.9)
15
Method (Wu, Olson, Liang)
  • Use grey related analysis
    • Inputs are uncertain
    • Use the alpha-cut method to convert trapezoidal
      numbers into intervals
  • Simulate
    • Very complex preference model
    • Know the distribution of uncertainty
    • Possibility that different alternatives may turn
      out to be preferred
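
An alpha-cut keeps the values whose membership is at least alpha. For a trapezoid (a, b, c, d) that is the interval [a + alpha(b - a), d - alpha(d - c)]. The slides do not state which alpha level was used, so alpha is a free parameter in this sketch. The function name is hypothetical.

```python
def alpha_cut(a, b, c, d, alpha):
    """Interval of values whose membership in trapezoid (a, b, c, d) is at least alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Move in from the support [a, d] toward the core [b, c] as alpha rises.
    return a + alpha * (b - a), d - alpha * (d - c)
```

At alpha = 0 the function returns the full support (a, d), and at alpha = 1 it returns the most-likely range (b, c). For the Liberal group's pollution weight (0.2, 0.4, 0.5, 0.6), alpha = 0.5 gives approximately (0.3, 0.55).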

16
Simulation Output
               Nuclear   Coal      Conserve  Import
Conservative   0         0         0.99      0.01
Liberal        0.07      0.34      0.39      0.20
Business       0.24      0.36      0.19      0.21
Consensus      0.02      0.18      0.47      0.07
17
Monte Carlo Simulation of Grey Related Data
  • Given interval data
    • Draw a uniform random number
    • Take the value at that proportion of the way from
      minimum to maximum
    • Do this for every interval number
    • These become crisp numbers for this sample
  • Calculate outcomes (value)
  • Get a probabilistic picture of outcomes in a complex
    system involving uncertainty (grey related
    intervals)
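
The sampling procedure above translates almost line for line into code. In this sketch the function names are hypothetical. A weighted sum is used only as a placeholder value model so that the loop can report how often each alternative comes out best.

```python
import random
from collections import Counter


def sample_interval(lo, hi, rng):
    """Draw u ~ U(0, 1) and return the point that fraction of the way from lo to hi."""
    return lo + rng.random() * (hi - lo)


def prob_best(scores, weights, runs=1000, seed=0):
    """Estimate how often each alternative is best when all inputs are intervals.

    scores[i][k] and weights[k] are (low, high) intervals. Each run turns every
    interval into a crisp number, then ranks alternatives by a weighted sum
    (placeholder value model; rescaling the weights would not change the winner).
    """
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(runs):
        w = [sample_interval(lo, hi, rng) for lo, hi in weights]
        values = [sum(wk * sample_interval(lo, hi, rng) for wk, (lo, hi) in zip(w, row))
                  for row in scores]
        wins[max(range(len(scores)), key=values.__getitem__)] += 1
    return [wins[i] / runs for i in range(len(scores))]
```

Any other scoring rule, such as a grey related grade, can replace the weighted sum. Only the line that computes `values` would change.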

18
Demonstration (Olson, Wu)
  • Hiring decision
    • Multiple criteria, six applicants
  • Criteria
    • C1 Experience in business
    • C2 Experience in function
    • C3 Education
    • C4 Leadership
    • C5 Adaptability
    • C6 Age
    • C7 Aptitude for teamwork

19
Alternative Performance Matrix
          C1-bus   C2-funct C3-educ  C4-lead  C5-adapt C6-age   C7-team
Antonio   .65-.85  .75-.95  .25-.45  .45-.85  .05-.45  .45-.75  .75-1.0
Fabio     .25-.45  .05-.25  .65-.85  .30-.65  .30-.75  .05-.25  .05-.45
Alberto   .45-.65  .20-.80  .65-.85  .50-.80  .35-.90  .20-.45  .75-1.0
Fernando  .85-1.0  .35-.75  .65-.85  .15-.65  .30-.70  .45-.80  .35-.70
Isabel    .50-.95  .65-.95  .45-.65  .65-.95  .05-.50  .45-.80  .50-.90
Rafaela   .65-.85  .15-.35  .45-.65  .25-.75  .05-.45  .45-.80  .10-.55
20
Grey Related Weights
Criteria Weights
C1 Experience-Business 0.20-0.35
C2 Experience-Job Function 0.30-0.55
C3 Educational Background 0.05-0.30
C4 Leadership Capacity 0.25-0.50
C5 Adaptability 0.15-0.45
C6 Age 0.05-0.30
C7 Aptitude for Teamwork 0.25-0.55
21
Grey Related Data
  • Weights are intervals
  • Scores are intervals
  • Used the grey related model to identify the best
    applicant in each simulation run
    • Best average weighted distance to reference point
    • Reflects both minimum distance to ideal and
      maximum distance from nadir
  • Ran 1,000 replications for each of 10 seeds
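
The replication design above can be sketched using the interval scores and weights from slides 19 and 20. Two assumptions here are ours, not the authors'. First, every criterion, age included, is treated as larger-is-better. Second, the grey related model is replaced by a Deng-style weighted grey relational grade measured against the best sampled value on each criterion, with the largest deviation loosely playing the nadir role. Sampled weights are renormalized, and all function names are hypothetical. Because the scoring rule is a stand-in, the printed probabilities are not expected to match the published table.

```python
import random

# Interval scores (low, high) from the alternative performance matrix, criteria C1..C7.
SCORES = {
    "Antonio":  [(0.65, 0.85), (0.75, 0.95), (0.25, 0.45), (0.45, 0.85), (0.05, 0.45), (0.45, 0.75), (0.75, 1.0)],
    "Fabio":    [(0.25, 0.45), (0.05, 0.25), (0.65, 0.85), (0.30, 0.65), (0.30, 0.75), (0.05, 0.25), (0.05, 0.45)],
    "Alberto":  [(0.45, 0.65), (0.20, 0.80), (0.65, 0.85), (0.50, 0.80), (0.35, 0.90), (0.20, 0.45), (0.75, 1.0)],
    "Fernando": [(0.85, 1.0), (0.35, 0.75), (0.65, 0.85), (0.15, 0.65), (0.30, 0.70), (0.45, 0.80), (0.35, 0.70)],
    "Isabel":   [(0.50, 0.95), (0.65, 0.95), (0.45, 0.65), (0.65, 0.95), (0.05, 0.50), (0.45, 0.80), (0.50, 0.90)],
    "Rafaela":  [(0.65, 0.85), (0.15, 0.35), (0.45, 0.65), (0.25, 0.75), (0.05, 0.45), (0.45, 0.80), (0.10, 0.55)],
}
# Interval weights for C1..C7 from the grey related weights table.
WEIGHTS = [(0.20, 0.35), (0.30, 0.55), (0.05, 0.30), (0.25, 0.50), (0.15, 0.45), (0.05, 0.30), (0.25, 0.55)]


def draw(lo, hi, rng):
    """Uniform draw: the point a random fraction of the way from lo to hi."""
    return lo + rng.random() * (hi - lo)


def grey_grades(matrix, weights, rho=0.5):
    """Deng-style weighted grey relational grade against the best value per criterion.

    Assumption: every criterion (including age) is larger-is-better.
    """
    ideal = [max(col) for col in zip(*matrix)]
    delta = [[best - x for best, x in zip(ideal, row)] for row in matrix]
    d_min = min(map(min, delta))
    d_max = max(map(max, delta))
    total_w = sum(weights)  # renormalize the sampled weights
    return [
        sum(w * (d_min + rho * d_max) / (d + rho * d_max) for w, d in zip(weights, row)) / total_w
        for row in delta
    ]


def replicate(seed, runs=1000):
    """Share of runs in which each applicant has the highest grade, for one seed."""
    rng = random.Random(seed)
    names = list(SCORES)
    wins = dict.fromkeys(names, 0)
    for _ in range(runs):
        w = [draw(lo, hi, rng) for lo, hi in WEIGHTS]
        crisp = [[draw(lo, hi, rng) for lo, hi in SCORES[n]] for n in names]
        grades = grey_grades(crisp, w)
        wins[names[grades.index(max(grades))]] += 1
    return {n: wins[n] / runs for n in names}


results = [replicate(seed) for seed in range(10)]  # 1,000 replications for each of 10 seeds
for name in SCORES:
    p = [r[name] for r in results]
    print(f"{name:<9} avg {sum(p) / len(p):.3f}  min {min(p):.3f}  max {max(p):.3f}")
```

Reporting the average, minimum and maximum across seeds mirrors the layout of the probabilities-of-best table.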

22
Probabilities of Best
                    Antonio  Fabio    Alberto  Fernando Isabel   Rafaela
Crisp grey related  -        -        -        -        X        -
Interval avg        0.358    0        0.189    0.047    0.410    0
Interval min        0.336    0        0.168    0.040    0.384    0
Interval max        0.393    0        0.210    0.053    0.429    0
Trapezoidal avg     0.354    0        0.189    0.044    0.409    0
Trapezoidal min     0.328    0        0.171    0.035    0.382    0
Trapezoidal max     0.381    0        0.206    0.051    0.424    0
23
Implications
  • Crisp grey related
    • Isabel is the best choice
    • Antonio very close
    • Alberto, Fernando not far back
  • Simulation
    • Isabel's probability of being best is 0.41
    • Antonio 0.35, Alberto 0.19, Fernando 0.05
    • Fabio and Rafaela never won
  • Simulation provides a better picture

24
Simulation of Grey Related Data in Data Mining
  • Decision tree analysis (PolyAnalyst)
  • Real credit card data
    • 1,000 observations (900 training, 100 test)
    • 140 default, 860 no problem
    • 65 available explanatory variables (26 used)
  • Due to class imbalance, initial models were degenerate
    • Classified all test cases as OK
  • Differential-cost models were also degenerate
    • Classified all test cases as default

25
Fuzzified Data
  • Of 26 explanatory variables
    • 5 binary
    • 1 categorical
    • 20 continuous
      • Fuzzified into 3 categories each
      • Case by case, roughly equally sized categories
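
One simple way to get roughly equally sized categories is to cut each continuous variable at its sample tertiles. This sketch assumes that rule and uses the labels low, mid and high (the category names that appear in the categorical trees). The slides describe the cuts only as "case by case", so the authors' actual cut points may differ. The function name is hypothetical.

```python
import statistics


def fuzzify_tertiles(values, labels=("low", "mid", "high")):
    """Map each continuous value to one of three roughly equal-sized categories.

    Cut points are the sample tertiles; a value equal to a cut point goes to the lower category.
    """
    cut1, cut2 = statistics.quantiles(values, n=3)  # two cut points (Python 3.8+)
    return [labels[0] if v <= cut1 else labels[1] if v <= cut2 else labels[2]
            for v in values]
```

For the values 1 to 9, this gives three "low", three "mid" and three "high" labels.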

26
Decision Tree Models
  • Minimum support: minimum of 1
  • PolyAnalyst allowed
    • Optimistic split of criteria
    • Pessimistic split of criteria
  • Different decision tree model each run

27
Continuous Data Output
  • Varied degree of perturbation (uncertainty)
  • Continuous data
    • Many models overlapping
    • Three unique decision trees
    • Used a total of 8 explanatory variables
  • Categorical data
    • Four unique decision trees
    • Used a total of 7 explanatory variables

28
Continuous Model 1
  • Bal/Pay ratio < 6.44 NO
  • Bal/Pay ratio ≥ 6.44
    • Utilization < 1.54 Default
    • Utilization ≥ 1.54
      • AvgPayment < 3.91 NO
      • AvgPayment ≥ 3.91 Default
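
Written as code, Continuous Model 1 is a short chain of threshold tests. The argument names are hypothetical stand-ins for the Bal/Pay ratio, Utilization and AvgPayment variables. The right-hand branches are assumed to use ≥ as the complement of <.

```python
def continuous_model_1(bal_pay_ratio, utilization, avg_payment):
    """Continuous Model 1 as nested rules; returns "Default" or "NO" (no default)."""
    if bal_pay_ratio < 6.44:
        return "NO"
    if utilization < 1.54:
        return "Default"
    return "NO" if avg_payment < 3.91 else "Default"
```

Each `if` corresponds to one split in the tree. The first test that decides the outcome returns immediately, just as a path through the tree stops at a leaf.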

29
Continuous Model 2
  • Bal/Pay ratio < 6.44 NO
  • Bal/Pay ratio ≥ 6.44 Default

30
Continuous Model 3
  • Bal/Pay ratio < 6.44 NO
  • Bal/Pay ratio ≥ 6.44
    • Utilization < 1.54 Default
    • Utilization ≥ 1.54
      • AvgRevolvePay < 2.28 Default
      • AvgRevolvePay ≥ 2.28 NO

31
Categorical Model 1
  • Bal/Pay ratio high
    • CreditLine high
      • CalcIntRate I mid NO
      • CalcIntRate I NOT mid Default
    • CreditLine NOT high Default
  • Bal/Pay ratio NOT high NO

32
Categorical Model 2
  • Bal/Pay ratio high
    • CreditLine low
      • ChangeLine mid
        • PurchBal low Default
        • PurchBal NOT low NO
      • ChangeLine low NO
      • ChangeLine high Default
    • CreditLine high
      • CalcIntRate I mid NO
      • CalcIntRate I NOT mid Default
    • CreditLine mid Default
  • Bal/Pay ratio NOT high NO

33
Categorical Model 3
  • Bal/Pay ratio high Default
  • Bal/Pay ratio NOT high NO

34
Categorical Model 4
  • Bal/Pay ratio high
  • CreditLine low
  • ChangeLine mid
  • PurchBal low Default
  • PurchBal NOT low NO
  • ChangeLine low NO
  • Residence 0 Default
  • Residence 1 or 2 NO
  • ChangeLine high Default
  • CreditLine high
  • CalcIntRate I mid NO
  • CalcIntRate I NOT mid Default
  • CreditLine mid Default
  • Bal/Pay ratio NOT high NO

35
Continuous 1 Coincidence matrix
            Model 0   Model 1   Total
Actual 0    43        16        59
Actual 1    14        27        41
Total       57        43        100
Accuracy = (43 + 27) / 100 = 0.70
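
Accuracy in a coincidence matrix is the share of cases on the diagonal. The small helper below (hypothetical name) derives the row totals, column totals and accuracy from the raw counts.

```python
def coincidence_summary(matrix):
    """matrix[actual][predicted] counts -> (row totals, column totals, accuracy)."""
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    correct = sum(matrix[i][i] for i in range(len(matrix)))  # diagonal = correct calls
    return row_totals, col_totals, correct / sum(row_totals)


rows, cols, acc = coincidence_summary([[43, 16], [14, 27]])
# rows -> [59, 41], cols -> [57, 43], acc -> 0.7
```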
36
Simulation Output: Continuous 1 (Crystal Ball, test set accuracy)
37
Continuous 1
  • Simulation accuracy over 100 test observations, 1,000
    simulation runs
    • Perturbation [-0.25, 0.25]: 0.67-0.73
    • Perturbation [-0.50, 0.50]: 0.65-0.74
    • Perturbation [-1, 1]: 0.62-0.75
    • Perturbation [-2, 2]: 0.58-0.74
    • Perturbation [-3, 3]: 0.57-0.74
    • Perturbation [-4, 4]: 0.56-0.75
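
The slides report accuracy ranges from Crystal Ball runs but do not describe the perturbation mechanics. The following is a minimal sketch that assumes additive uniform noise U(-k, k) on each continuous input of Continuous Model 1. The real credit card test set is not available, so the records below are hypothetical synthetic data, and their labels come from the crisp model itself. Unperturbed accuracy is therefore 1.0 by construction. Only the widening, downward-drifting range as k grows is meaningful, not the numbers themselves.

```python
import random


def continuous_model_1(bal_pay_ratio, utilization, avg_payment):
    """Continuous Model 1 rules (thresholds from slide 28)."""
    if bal_pay_ratio < 6.44:
        return "NO"
    if utilization < 1.54:
        return "Default"
    return "NO" if avg_payment < 3.91 else "Default"


def accuracy_range(records, labels, spread, runs=1000, seed=0):
    """Lowest and highest accuracy over `runs` perturbed copies of the test set.

    Assumption: each input of each record gets independent additive U(-spread, +spread) noise.
    """
    rng = random.Random(seed)
    accuracies = []
    for _ in range(runs):
        hits = 0
        for record, label in zip(records, labels):
            noisy = [x + rng.uniform(-spread, spread) for x in record]
            if continuous_model_1(*noisy) == label:
                hits += 1
        accuracies.append(hits / len(records))
    return min(accuracies), max(accuracies)


# Hypothetical synthetic stand-in for the 100-record test set.
gen = random.Random(42)
records = [(gen.uniform(0, 12), gen.uniform(0, 3), gen.uniform(0, 8)) for _ in range(100)]
labels = [continuous_model_1(*r) for r in records]  # crisp-model labels: crisp accuracy = 1.0
for spread in (0.25, 0.5, 1, 2, 3, 4):
    low, high = accuracy_range(records, labels, spread)
    print(f"perturbation [-{spread}, {spread}]: {low:.2f}-{high:.2f}")
```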

38
Mean Model Accuracy Measured on Test Set
Model  Crisp  ±0.25  ±0.50  ±1.00  ±2.00  ±3.00  ±4.00
Con1   0.70   0.70   0.70   0.68   0.67   0.66   0.65
Con2   0.67   0.67   0.67   0.67   0.67   0.66   0.66
Con3   0.71   0.71   0.70   0.69   0.67   0.67   0.66
CON    0.693  0.693  0.690  0.680  0.670  0.667  0.657
Cat1   0.70   0.70   0.68   0.67   0.66   0.66   0.65
Cat2   0.70   0.70   0.70   0.69   0.68   0.67   0.67
Cat3   0.70   0.70   0.70   0.69   0.69   0.68   0.67
Cat4   0.70   0.70   0.70   0.69   0.68   0.67   0.67
CAT    0.700  0.700  0.700  0.688  0.678  0.670  0.665
39
Inferences
  • Continuous models declined in accuracy more than
    categorical models
  • Categorizing data is one basic form of fuzziness

40
Applying to Data Mining
  • Easiest way to apply fuzzy concepts to data
    mining
    • CATEGORIZE DATA
  • Simulation is a way to deal with fuzzy data
    • Applying simulation to fuzzy data mining is
      not as simple
      • Large-scale data sets
      • Creates additional columns
    • Still a very promising research area

41
Conclusions
  • Interesting research directions
    • Simulation in data mining
      • Fuzzy data is probabilistic, so simulation
        seems appropriate
      • Simulation involves a lot more work than
        closed-form (crisp) simplifications
    • Group preference aggregation
      • Data may be fuzzy because group members
        hold different opinions
      • Interesting ways to aggregate