1
Shannon entropy as a measure for making decisions
  • Dra. Josefina López Herrera
  • Departamento de Lenguajes y Sistemas
  • Universitat Politècnica de Catalunya
  • C/Colom 11, 08222 Terrassa
  • jlopez@lsi.upc.edu

2
Index
  • Introduction
  • Recommendation method
  • Case Study
  • Diagnostic Method
  • Conclusions and Future work

3
Introduction
  • The goal: Shannon entropy, which acts as an aid for making decisions.
  • The tool: a measurement based on Shannon entropy that characterizes the tastes of consumers, or a diagnosis according to the symptoms of a patient over a variety of illnesses.
  • The methodology: the new product is recommended if the variation of that measurement is less than a certain value. The correct diagnosis is the one that minimizes the variation of this measurement.

4
Index
  • Introduction
  • Recommendation method
  • Case Study
  • Diagnostic Method
  • Conclusions and Future work

5
Recommendation Method
Stage 1: Dynamic allocation of weights to the attributes that characterize the products.
Stage 2: Calculation of the Shannon entropy of the customer's product attributes before and after adding the new product to recommend.
Stage 3: Recommendation (or not).
6
Recommendation Method - Stage 1: Dynamic weights allocation
  • An analysis by type of service will be made (food, household-electric products, movies, etc.), a type of service being a set of homogeneous services.
  • All the attributes have the same importance for a particular service.
  • The product or service is characterized by its attributes.

Table 1. Products perfectly identified by the weights of their attributes, as given by an expert.
7
Recommendation Method - Stage 1: Dynamic weights allocation
  • Each attribute must be assigned a weight or value selected from a discrete range of allowed values.
  • The historical information of the customer is available.
  • The weights of each attribute of the service are calculated from the opinions of the users of the service.

Table 2. Transactions of the purchases registered for a customer for the period 01.01.2005 to 05.03.2005.
8
Recommendation Method - Stage 1: Dynamic weights allocation
  • We define Pjn as the weight (opinion given by the users) of attribute j of service n.
  • In order to obtain the optimal weights that define each of the attributes (j) of the service (n), we use the value Xs (with min Pjn ≤ Xs ≤ max Pjn), Xs being the value, among all possible ones, that minimizes equation (1).

(1)
where k is the number of opinions for that service and Xm is a constant, approximately 0.3679, which is the value of pi that corresponds to the maximum of function (2).
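The equation itself is not preserved in this transcript. A plausible reconstruction of equation (1), based on the summand described on slide 10 and stated here as an assumption rather than the original formula, is

    H_{\min,jn} = \min_{\min P_{jn} \le X_s \le \max P_{jn}} \; \sum_{i=1}^{k} -\,\lvert P_i - X_s\rvert\, X_m \,\log_2\!\bigl(\lvert P_i - X_s\rvert\, X_m\bigr)

where the Pi are the k recorded opinions for attribute j of service n, and the leading minus sign is assumed so that the sum is minimized when Xs lies close to the opinions.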
9
Recommendation Method - Stage 1: Dynamic weights allocation
(2)
  • If pi is a probability, then H is the Shannon entropy.
  • When the values of pi are in the interval [0, 1], H has the shape shown in Fig. 1, with a maximum at pi = Xm.

Figure 1. Representation of the function -p log2 p on [0, 1].
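A minimal reconstruction of equation (2), following the slide's own description (the original equation image is not preserved):

    H = -\sum_i p_i \log_2 p_i

Each term -p \log_2 p, for p in [0, 1], attains its maximum at p = 1/e ≈ 0.3679, which is the constant Xm mentioned on the previous slide.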
10
Recommendation Method - Stage 1: Dynamic weights allocation
  • Multiplying the value by Xm is necessary to limit the value of the variable that determines Hmin,jn to the region bounded by Xm, i.e., to eliminate the decreasing part of the function.
  • Using equation (1) to obtain the optimal opinion is justified by analogy with the calculation of the arithmetic average (M) of a series of values in the range 0..1. We can calculate M (defined in the range min Sj < M < max Sj) as the number that corresponds to the minimum value of the series defined by equation (3):

(3)
  • Instead of using the value (Sj - M) as the variable in equation (3), we use the value |Pi - Xs| Xm log2(|Pi - Xs| Xm). As we can see, this expression has the form of the Shannon entropy.
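A sketch of the analogy, assuming that equation (3) is the usual sum of squared deviations, which the arithmetic average minimizes (the original equation image is not preserved):

    M = \arg\min_{\min S_j < M < \max S_j} \sum_j (S_j - M)^2

Replacing the deviation term with \lvert P_i - X_s\rvert\, X_m \log_2(\lvert P_i - X_s\rvert\, X_m) yields the entropy-shaped criterion minimized in equation (1).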

11
Recommendation Method - Stage 1: Dynamic weights allocation
  • In the case study, we will see that the first premise enunciated in this section is satisfied: the optimal value calculated according to the method is one of the allowed discrete values.
  • This is the main reason for using this expression instead of the arithmetic average.
  • This stage must be applied when we do not have the weights of the service attributes (expert opinion) and must calculate them from the opinions of its users.

12
Recommendation Method - Stage 2: Calculation of Shannon Entropy
  • Step 1. We must obtain the relative frequency distribution of the weights of each attribute, starting from the purchases made by the users. The analysis is made by type of product (service), so we must consider all the products of the same type bought by the user during a certain period of time.
  • In order to calculate the distribution of the relative frequencies of the different values of each attribute j per user and type of product, the following equation (4) is used:

(4)
13
Recommendation Method - Stage 2: Calculation of Shannon Entropy
(4)
  • Here b is the number of different weights of attribute j over all products of the same type bought by the user. We identify pij as the relative frequency of purchase of a product that is characterized by a particular value of attribute j, as in equation (5).
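Equation (4) is not preserved in the transcript; from the definitions above, a plausible reading (an assumption) is the Shannon entropy of the weight frequencies of attribute j:

    H_j = -\sum_{i=1}^{b} p_{ij} \log_2 p_{ij}

with p_{ij} given by equation (5) on the next slide.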

14
Recommendation Method - Stage 2: Calculation of Shannon Entropy
(5)
  • Here "n" is the total number of units bought of all products of the same type and "a" is the total number of units bought of those products that have a certain value of attribute j. Table 2 shows the purchases made by a user; it is assumed that the products belong to the same group.
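A minimal Python sketch of this step, assuming equations (4) and (5) as read above (p_ij = a/n and H_j = -Σ p_ij log2 p_ij); the purchase records and attribute names are hypothetical, since Table 2 is not reproduced in this transcript.

```python
from collections import Counter
from math import log2

def attribute_entropies(purchases):
    """purchases: one dict per bought unit, mapping attribute name -> weight.
    Returns {attribute: H_j}, the entropy of the weight frequencies (eq. 4),
    with relative frequencies p_ij = a / n (eq. 5)."""
    n = len(purchases)                                   # total bought units of this product type
    entropies = {}
    for j in purchases[0]:
        counts = Counter(p[j] for p in purchases)        # "a" for each distinct weight of attribute j
        probs = [a / n for a in counts.values()]         # eq. (5)
        entropies[j] = -sum(p * log2(p) for p in probs)  # eq. (4)
    return entropies

# Hypothetical purchase history of one customer, single product type:
history = [
    {"price": 0.6, "brand": 0.8},
    {"price": 0.6, "brand": 0.4},
    {"price": 0.2, "brand": 0.8},
]
print(attribute_entropies(history))
```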

15
Recommendation Method - Stage 2: Calculation of Shannon Entropy
Step 1: The entropy H of all the attributes is calculated before including the new product to recommend (P4).
16
Recommendation Method - Stage 2: Calculation of Shannon Entropy
Step 2: The entropy of all the attributes is calculated after including the new product. In the example, we include the P4 product in this step and calculate H'.
17
Recommendation Method - Stage 3: Recommendation
  • We calculate the difference Abs(H - H') for each attribute and select the one whose value is maximum. In the example, it is the price attribute, with a value of 0.05793.
  • If this value is lower than a predetermined ε, the product is recommended; otherwise, it is not recommended.

18
Recommendation Method - Stage 3: Recommendation
  • If this value is lower than a predetermined ε, the product is recommended; otherwise, it is not recommended.
  • The value of the parameter ε depends on the size of the sample and on the type of values. In general, we can establish that it oscillates between 1/n and 1/(n log2(n)), where "n" is the total number of cases (bought units) at the moment of carrying out the recommendation.
  • In the example, we used the expression 1/(n log2(n)) = 0.05089 as the value of ε, in which case we would not recommend the P4 product, because 0.05793 > 0.05089.
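A minimal sketch of Stage 3, assuming the quantities defined above (H and H' per attribute, ε taken between 1/n and 1/(n·log2(n))). The entropy values below are placeholders chosen so that their difference matches the example's 0.05793, and n = 7 is a hypothetical sample size that reproduces the slide's ε of 0.05089.

```python
from math import log2

def recommend(h_before, h_after, n, use_log_bound=True):
    """h_before / h_after: dicts attribute -> entropy before/after adding the candidate product.
    n: total number of bought units available when recommending.
    Recommend only if the largest |H - H'| stays below epsilon."""
    max_diff = max(abs(h_before[j] - h_after[j]) for j in h_before)
    epsilon = 1 / (n * log2(n)) if use_log_bound else 1 / n
    return max_diff < epsilon, max_diff, epsilon

ok, diff, eps = recommend({"price": 1.50}, {"price": 1.55793}, n=7)
print(ok, round(diff, 5), round(eps, 5))   # False 0.05793 0.05089 -> P4 not recommended
```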

19
Index
  • Introduction
  • Recommendation method
  • Case Study
  • Diagnostic Method
  • Conclusions and Future work

20
Case Study
  • A case study will be presented, using the data of opinions on movies published by the GroupLens Research Project at the University of Minnesota. In this case, we have binary attributes with quantitative opinions. In this paper, it will be demonstrated that, with the same methodology, we can consider binary attributes to characterize the products (films in this case). A movie belongs to one or several genres and has a score depending on the satisfaction of the user.

21
Case Study: opinions of movies published by the GroupLens Research Project at the University of Minnesota
  • The study has been made on a sample of 2,234 opinions expressed by different users who have seen the films. The number of opinions varies from user to user. The answers of a total of 45 users have been analyzed. For each user, 80% of the films have been used to learn the behavior of the user and the remaining 20% to test and evaluate the methodology as a recommendation tool.

22
Case Study: opinions of movies published by the GroupLens Research Project
  • 1. Normalization of the values of the opinions of the users on the films to the range 0..1.
  • 2. Use of equation (1) to find a value for each movie, starting from the opinions of the users who have seen the film. This calculated opinion simulates the opinion of the expert.

(1)
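A sketch of step 2, assuming the reconstructed form of equation (1) given earlier (the original equation is not preserved): a grid search over min ≤ Xs ≤ max of the opinions for the value minimizing the entropy-shaped sum. The search step and the sample normalized opinions are hypothetical.

```python
from math import log2, e

XM = 1 / e  # ≈ 0.3679, the maximizer of -p*log2(p)

def entropy_term(x):
    """-x*log2(x), with the convention 0*log(0) = 0."""
    return 0.0 if x <= 0 else -x * log2(x)

def optimal_opinion(opinions, step=0.001):
    """Search min(opinions) <= Xs <= max(opinions) for the Xs minimizing
    sum_i -|p_i - Xs|*Xm*log2(|p_i - Xs|*Xm)  (assumed form of eq. (1))."""
    lo, hi = min(opinions), max(opinions)
    candidates = [lo + i * step for i in range(int(round((hi - lo) / step)) + 1)]
    return min(candidates,
               key=lambda xs: sum(entropy_term(abs(p - xs) * XM) for p in opinions))

# Hypothetical opinions for one film, normalized from scores 1..5 to 0..1:
opinions = [1.0, 1.0, 0.75, 1.0, 0.5, 1.0]
print(round(optimal_opinion(opinions), 3))
```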
23
Case Study: opinions of movies published by the GroupLens Research Project
  • The column "Opinion calculated" shows the result of the calculation according to equation (1). The column "No. of cases" indicates the number of times that the film has been scored with the value of the column "Opinion", according to the consulted users.
  • It is interesting to observe that the calculated value is always one of the allowed values, to a precision of at least 0.001, and that the most frequently voted value is not necessarily the one that is obtained (e.g., Copycat).

Table 3. Summary of the results after applying equation (1).
24
Case Study: opinions of movies published by the GroupLens Research Project
  • 3. We calculate H for each user and attribute. For each attribute, in this case the genre, we calculate the pi of each score of each attribute for each customer, pi being the frequencies of the scores of that attribute. For example, consider a customer who has seen 10 movies, of which 4 are westerns and 6 are comedies; if, among the westerns, two have a score of 3, one a score of 4 and the rest a score of 5, we calculate pi as p3 = 2/10, p4 = 1/10, p5 = 1/10, using equation (4).

(4)
We associate a value Hj with every genre of movie seen by the customer.
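As a worked check (not taken from the slides), applying the reconstructed equation (4) to the stated frequencies of the western genre gives

    H_{western} = -\tfrac{2}{10}\log_2\tfrac{2}{10} - \tfrac{1}{10}\log_2\tfrac{1}{10} - \tfrac{1}{10}\log_2\tfrac{1}{10} \approx 0.464 + 0.332 + 0.332 \approx 1.13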
25
Case Study: opinions of movies published by the GroupLens Research Project
  • 4. In this case we establish the value of epsilon as 1/n, where n is the total number of films seen by the customer before adding the movie to recommend. For every Hj calculated before incorporating the movie (H) and after incorporating it (H'), we consider the expression HDIFj = Abs(Hj - H'j).
  • 5. Rules of recommendation: we recommend if (HDIFj - epsilon)/HDIFj < 0.15 for every j, that is, if the relative difference between HDIFj and epsilon is below 15%. We can be more or less rigorous in the recommendation criterion by varying this percentage.
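A minimal sketch of steps 4 and 5, with ε = 1/n and the 15% relative-difference threshold; HDIFj is assumed to be Abs(Hj - H'j), as in Stage 3 of the recommendation method.

```python
def recommend_movie(h_before, h_after, n_films, threshold=0.15):
    """h_before / h_after: dicts genre -> entropy before/after adding the candidate movie.
    n_films: films seen by the customer before the recommendation (epsilon = 1/n)."""
    epsilon = 1 / n_films
    for j in h_before:
        hdif = abs(h_before[j] - h_after[j])                    # HDIF_j
        if hdif > 0 and (hdif - epsilon) / hdif >= threshold:   # relative difference >= 15%
            return False
    return True
```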

26
Case Study: opinions of movies published by the GroupLens Research Project
  • In Table 4, the column "ok" summarizes the number of films that have been recommended, and the column "number" is the total number of films seen by the user registered in the test file. The percentage of success can be seen in the column "pct.ok".

27
Case Study: opinions of movies published by the GroupLens Research Project
Figure 2 shows the percentage of successful recommendations for the users analyzed in Table 4.
28
Index
  • Introduction
  • Recommendation method
  • Case Study
  • Diagnostic Method
  • Conclusions and Future work

29
Diagnostic Method
  • We present a proposal for the application of the methodology in the field of medical diagnosis, founded on the results obtained with the recommendation method.
  • The premises and selection algorithm defined here are subject to later adjustment, depending on the results obtained in real case studies.

30
Diagnostic Method
  • Premises/Definitions
  • 1. We have access to the previous correct
    diagnoses (di).
  • 2. Each diagnosis di is characterized by a determined number of attributes j, which represent the symptoms of the patient. We will call this group of values the Diagnosis Matrix (Dij).

31
Diagnostic Method
  • Premises/Definitions
  • 3. Due to the existence of attribute values whose medical significance is equivalent (for example, body temperature between 36.5 °C and 37.0 °C), a previous discretization step must be established to account for these equivalences.

32
Diagnostic Method
  • Premises/Definitions
  • 4. Each one of the attributes has an associated value within a range of allowed discrete values. For convenience, we normalize these values to the range 0..1.
  • 5. Not all of the attributes are defined for a given diagnosis. This means that the matrix Dij contains null elements with undefined values.

33
Diagnostic Method
  • Premises/Definitions
  • 6. For a given illness, we have e associated diagnoses, each one characterized by a group of attributes (symptoms) j. We will call this group of illnesses/diagnoses the Diagnoses Matrix (Dej). Not all elements of this matrix necessarily have defined values.

34
Diagnostic Method
  • Selection Algorithm
  • Step 1. Obtain the relative frequencies fejk of
    each value k, attribute j, and illness e
    from the matrix Dej of correct diagnoses.
    Suppose, for example, the following matrix
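A sketch of Step 1 under the premises above: each row of Dej is a previous correct diagnosis of illness e, each column an attribute j, and None marks an undefined value. The toy matrix is hypothetical, since the slide's example matrix is not reproduced in this transcript.

```python
from collections import Counter

def relative_frequencies(D_e):
    """D_e: list of past diagnoses (rows), each a list of normalized attribute values or None.
    Returns f[j][k]: relative frequency of value k for attribute j, ignoring undefined entries."""
    freqs = []
    for j in range(len(D_e[0])):
        column = [row[j] for row in D_e if row[j] is not None]
        counts = Counter(column)
        freqs.append({k: c / len(column) for k, c in counts.items()} if column else {})
    return freqs

# Hypothetical diagnosis matrix for one illness (3 past diagnoses, 4 symptoms):
D_e = [
    [0.5, 1.0, None, 0.0],
    [0.5, 0.5, 0.5,  None],
    [1.0, 0.5, 0.5,  0.0],
]
print(relative_frequencies(D_e))
```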

35
Diagnostic Method
  • Step 2. Calculate the Shannon entropy of illness e (He) from the diagnoses matrix Dej, according to the expression:
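The expression itself is not preserved in the transcript; by analogy with equation (4) of the recommendation method, a plausible reading (stated as an assumption) is

    H_e = -\sum_{j} \sum_{k} f_{ejk} \log_2 f_{ejk}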

36
Diagnostic Method
  • Step 3. Given a new diagnosis dj to classify, we calculate the Shannon entropy He supposing that the diagnosis corresponds to illness e. From all the sampled diagnoses, we consider the correct diagnosis to be the one that minimizes the following expression, whenever it is less than 1:

37
Diagnostic Method
  • where ε = f(d)/(n log(n)), n being the number of correct diagnoses of illness e (i.e., the number of rows of matrix Dej) and f(d) a function whose value depends on the number of attributes of the new diagnosis that have a defined value. In principle, we can suppose f(d) = d.
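The expression minimized on the previous slide is not preserved. Given that ε plays the role of a threshold and that the minimum is accepted only when it is less than 1, a plausible reconstruction, stated as an assumption and mirroring Stage 3 of the recommendation method, is

    \hat{e} = \arg\min_{e} \frac{\lvert H_e - H'_e\rvert}{\varepsilon_e}, \qquad \varepsilon_e = \frac{f(d)}{n \log(n)}

where H'_e denotes the entropy of illness e recomputed with the new diagnosis included.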

38
Index
  • Introduction
  • Recommendation method
  • Case Study
  • Diagnostic Method
  • Conclusions and Future work

39
Conclusion and Future work
  • We have presented a methodology to extract knowledge of the tastes of the users based on their opinions, with no need to use any model.
  • As we have shown in the case study, this can be done through the analysis of the distribution of the opinions and the use of given rules based on the Shannon entropy.

40
Conclusion and Future work
  • Our objective is to homogenize the criteria of
    prediction and recommendation of services in
    dynamic and heterogeneous environments.

41
Conclusion and Future work
  • The next step would be to test this methodology in a real-time e-business application.
  • It can be integrated into other domains, using the knowledge of each domain by means of its ontologies, and into any type of architecture, such as an agent-based recommender system.

42
Future work
  • In the field of diagnosis, verification and
    adjustments to the behavior of the method in real
    cases of illness diagnosis have yet to be done.

43
  • Thanks