Title: Shannon entropy as a measure for making decisions
1Shannon entropy as a measure for making decisions
- Dra. Josefina López Herrera
- Departamento de Lenguajes y Sistemas
- Universitat Politècnica de Catalunya
- C/Colom 11, 08222 Terrassa
- jlopez_at_lsi.upc.edu
2Index
- Introduction
- Recommendation method
- Case Study
- Diagnostic Method
- Conclusions and Future work
3Introduction
- The goal: to use Shannon entropy as an aid for
making decisions.
- The tool: a measurement based on Shannon entropy
that characterizes the tastes of consumers, or a
diagnosis according to the symptoms of a patient
over a variety of illnesses.
- The methodology: a new product is recommended
if the variation of that measurement is less than
a certain value. The correct diagnosis is the one
that minimizes the variation of this measurement.
5Recommendation Method
Stage 1. Dynamic allocation of weights to the
attributes that characterize the products.
Stage 2. Calculation of the Shannon entropy of the
attributes of the customer's products before and
after adding the new product to recommend.
Stage 3. Recommendation (or not).
6Recommendation Method - Stage 1 Dynamic weights
allocation
- An analysis is made by type of service (food,
household electrical products, movies, etc.), a
type of service being a set of homogeneous
services.
- All the attributes have the same importance for a
particular service.
- Each product or service is characterized by its
attributes.
Table 1. Products perfectly identified by the
weights of their attributes, as given by an expert.
7Recommendation Method - Stage 1 Dynamic weights
allocation
- Each attribute must be assigned a weight or
value selected from a discrete range of allowed
values.
- The historical information of the customer is
available.
- The weights of each attribute of the service are
calculated from the opinions of the users of the
service.
Table 2. Transactions of the registered purchases
of a customer for the period 01.01.2005 to
05.03.2005.
8Recommendation Method - Stage 1 Dynamic weights
allocation
- We define Pjn as the weight (the opinion given by
the users) of attribute j of service n.
- To obtain the optimal weight that defines each
attribute j of service n, we use the value Xs
(with min Pjn ≤ Xs ≤ max Pjn), where Xs is the
value, among all the possible ones, that
minimizes equation (1).
(1)
k is the number of opinions of that service. Xm
is a constant, approximately 0.3679, and is the
value of pi that corresponds to the maximum of
function (2).
9Recommendation Method - Stage 1 Dynamic weights
allocation
(2)
- If pi is a probability, then H is the Shannon
entropy.
- When the values of pi lie in the interval between
0 and 1, H has the shape shown in Fig. 1, with a
maximum at pi = Xm.
Figure 1. Representation of the function
-p log2 p on the interval [0, 1]
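The value of Xm can be checked numerically. The sketch below assumes that function (2) is the single-term entropy function h(p) = -p log2 p plotted in Fig. 1. Setting its derivative, -log2 p - 1/ln 2, to zero gives p = 1/e, which is about 0.3679. The function name `h` and the grid resolution are illustrative choices, not part of the original slides.

```python
import math

def h(p: float) -> float:
    """Single-term entropy function -p*log2(p), with h(0) defined as 0."""
    return 0.0 if p == 0.0 else -p * math.log2(p)

# Evaluate h on a fine grid over [0, 1] and locate the maximum.
grid = [i / 100_000 for i in range(100_001)]
p_max = max(grid, key=h)

x_m = 1 / math.e  # closed-form maximizer of -p*log2(p)
print(round(p_max, 4), round(x_m, 4))
```

Both the grid search and the closed form agree with the constant 0.3679 quoted for Xm.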
10Recommendation Method - Stage 1 Dynamic weights
allocation
- The value (Pi - Xs) is multiplied by Xm in order
to limit the variable that determines Hminjn to
the values that lie in the region bounded by Xm,
which eliminates the decreasing part of the
function.
- The use of equation (1) to obtain the optimal
opinion is justified by analogy with the
calculation of the arithmetic mean (M) of a
series of values Sj in the range 0..1. We can
calculate M (with min Sj < M < max Sj) as the
number that corresponds to the minimum value of
the series defined by equation (3).
(3)
- Instead of using the value (Sj - M) as the
variable in equation (3), we use the value
(Pi - Xs) Xm log2((Pi - Xs) Xm). As we can see,
this expression has the form of the Shannon
entropy.
11Recommendation Method - Stage 1 Dynamic weights
allocation
- In the case study, we will see that the first
premise stated in this section is satisfied: the
optimal value calculated according to the method
is one of the discrete allowed values.
- This is the main reason to use this expression
instead of the arithmetic mean.
- This stage must be applied when we do not have
the weights of the service attributes (the
opinion of an expert) and must calculate them
from the opinions of the users.
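Stage 1 can be illustrated with a small search over candidate values of Xs. Equation (1) is not reproduced in this text, so the objective below is only an assumed reading built from the summand described on the slides: the sum over the k opinions of -(d Xm) log2(d Xm), with d = |Pi - Xs|. The absolute value, the sign convention, the grid search and the sample opinions are all assumptions made for illustration.

```python
import math

X_M = 1 / math.e  # constant Xm from the slides, approx. 0.3679

def term(p: float, x_s: float) -> float:
    """Assumed summand: -(d*Xm)*log2(d*Xm) with d = |p - x_s|; 0 when d == 0."""
    d = abs(p - x_s) * X_M
    return 0.0 if d == 0.0 else -d * math.log2(d)

def objective(opinions: list[float], x_s: float) -> float:
    """Assumed form of equation (1): sum of the summand over the k opinions."""
    return sum(term(p, x_s) for p in opinions)

def optimal_weight(opinions: list[float], steps: int = 1000) -> float:
    """Search min(P) <= Xs <= max(P) for the value that minimizes the objective."""
    lo, hi = min(opinions), max(opinions)
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    candidates = grid + list(opinions)  # make sure the observed opinions are tested
    return min(candidates, key=lambda x: objective(opinions, x))

# Hypothetical normalized opinions for one attribute of one service.
opinions = [0.2, 0.4, 0.4, 0.6, 0.8, 0.8, 1.0]
best = optimal_weight(opinions)
print(best)
```

Under this assumed objective each term is concave in the distance |Pi - Xs|, so the minimum falls on one of the observed opinions. That behavior is consistent with the premise that the optimal value is one of the discrete allowed values, whereas an arithmetic mean generally is not.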
12Recommendation Method - Stage 2 Calculation of
Shannon Entropy
- Step 1. We must obtain the relative frequency
distribution of the weights of each attribute,
starting from the purchases made by the users.
The analysis is made by type of product (service),
so we must consider all the products of the same
type bought by the user during a certain period
of time.
- To calculate the distribution of the relative
frequencies of the different values of each
attribute j by user and type of product,
equation (4) is used:
(4)
13Recommendation Method - Stage 2 Calculation of
Shannon Entropy
(4)
- where b is the number of different weights of
attribute j over all the products of the same
type bought by the user. We identify pij as the
relative frequency of purchase of a product that
is characterized by a particular value of
attribute j, as in equation (5).
14Recommendation Method - Stage 2 Calculation of
Shannon Entropy
pij = a/n (5)
- where "n" is the total number of units bought of
all the products of the same type and "a" is the
total number of units bought of those products
that have a certain value of attribute j. Table 2
shows the purchases made by a user. It is assumed
that the products belong to the same group.
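The relative frequencies and the entropy of each attribute can be sketched as follows. Equation (4) is not reproduced in this text, so the sketch assumes the standard Shannon form Hj = -sum over i of pij log2 pij, taken over the b distinct values of attribute j. The purchase records, attribute names and function names are invented for illustration and are not the data of Table 2.

```python
import math
from collections import Counter

def relative_frequencies(purchases, attribute):
    """p_ij = a / n: units bought with a given attribute value over all units bought."""
    units = Counter()
    for item in purchases:
        units[item[attribute]] += item["units"]
    n = sum(units.values())
    return {value: a / n for value, a in units.items()}

def attribute_entropy(purchases, attribute):
    """Assumed form of equation (4): H_j = -sum(p_ij * log2(p_ij))."""
    freqs = relative_frequencies(purchases, attribute)
    return -sum(p * math.log2(p) for p in freqs.values())

# Hypothetical purchases of one product type; weights lie in the range 0..1.
purchases = [
    {"product": "P1", "units": 3, "price": 0.25, "quality": 0.75},
    {"product": "P2", "units": 2, "price": 0.50, "quality": 0.75},
    {"product": "P3", "units": 3, "price": 0.25, "quality": 1.00},
]

print(relative_frequencies(purchases, "price"))
print(attribute_entropy(purchases, "price"))
```

In this invented sample, 6 of the 8 units carry the price weight 0.25, so the two frequencies are 0.75 and 0.25.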
15Recommendation Method - Stage 2 Calculation of
Shannon Entropy
Step 1. The entropy (H) of all the attributes is
calculated before including the new product to
recommend (P4).
16Recommendation Method - Stage 2 Calculation of
Shannon Entropy
Step 2. The entropy of all the attributes is
calculated including the new product. In the
example, we included the P4 product in this step
and calculated H'.
17Recommendation Method Stage 3 Recommendation
- We calculate the difference Abs(H - H') for each
attribute and select the one whose value is
maximum. In the example, it is the price
attribute, with a value of 0.05793.
- If this value is lower than a predetermined
epsilon, the product is recommended; otherwise,
it is not recommended.
18Recommendation Method Stage 3 Recommendation
- The value of the parameter epsilon depends on the
size of the sample and on the type of values. In
general, we can establish that its value lies
between 1/n and 1/(n log2(n)), where "n" is the
total number of cases (units bought) at the
moment of carrying out the recommendation.
- In the example, we used the expression
1/(n log2(n)) = 0.05089 as the value of epsilon,
in which case we would not recommend the P4
product because 0.05793 > 0.05089.
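The decision of Stage 3 can be sketched in a few lines. The function names and the entropy values are hypothetical. The sample size n = 7 is also an assumption: the slides do not state it, but it is the value for which 1/(n log2(n)) reproduces the quoted 0.05089.

```python
import math

def threshold(n: int) -> float:
    """epsilon = 1 / (n * log2(n)), one of the two bounds quoted for epsilon."""
    return 1.0 / (n * math.log2(n))

def recommend(h_before: dict, h_after: dict, epsilon: float) -> bool:
    """Recommend when the largest per-attribute change |H - H'| is below epsilon."""
    max_change = max(abs(h_before[a] - h_after[a]) for a in h_before)
    return max_change < epsilon

# n = 7 is an assumption: it is the value that reproduces the slide's 0.05089.
eps = threshold(7)
print(round(eps, 5))

# Hypothetical entropies, chosen so that the price attribute changes by 0.05793.
h_before = {"price": 1.00000, "quality": 0.90000}
h_after = {"price": 1.05793, "quality": 0.91000}
print(recommend(h_before, h_after, eps))
```

With these numbers the largest change, 0.05793, exceeds epsilon, so the product is not recommended, matching the outcome described for P4.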
20Case Study
- A case study is presented, using the movie
opinion data published by the GroupLens Research
Project at the University of Minnesota. In this
case, we have binary attributes with quantitative
opinions. In this paper, we show that the same
methodology can use binary attributes to
characterize the products (films in this case).
A movie belongs to one or several genres and has
a score that depends on the satisfaction of the
user.
21Case Study opinions of movies published by the
GroupLens Research Project at the University of
Minnesota
- The study has been made on a sample of 2,234
opinions expressed by different users who have
seen the films. The number of opinions varies
from user to user. The answers of a total of 45
users have been analyzed. For each user, 80% of
the films have been used to find the behavior of
the user and the remaining 20% to test and
evaluate the methodology as a recommendation
tool.
22Case Study opinions of movies published by the
GroupLens Research Project
- 1. Normalize the values of the users' opinions on
the films to the range 0..1.
- 2. Use equation (1) to find a value for each
movie, starting from the opinions of the users
who have seen the film. This calculated opinion
simulates the opinion of the expert.
(1)
23Case Study opinions of movies published by the
GroupLens Research Project
- The column "Opinion calculated" shows the result
of the calculation according to equation (1). The
column "no. of cases" indicates the number of
times that the film has been scored with the
value of the column "Opinion" by the consulted
users.
- It is interesting to observe that the calculated
value is always one of the allowed values, to
within 0.001, and that the value obtained is not
necessarily the most frequently voted one
(e.g., Copycat).
Table 3. Summary of the results after applying
equation (1).
24Case Study opinions of movies published by the
GroupLens Research Project
- 3. We calculate H for each user and attribute.
For each attribute, in this case the genre, we
calculate pi for each score of each attribute and
each customer, pi being the relative frequency of
the scores of that attribute. For example, take a
customer who has seen 10 movies, of which 4 are
westerns and 6 are comedies. If, among the
westerns, two have a score of 3, one a score of 4
and the remaining one a score of 5, we calculate
pi using equation (4): p3 = 2/10, p4 = 1/10 and
p5 = 1/10.
(4)
We associate a value Hj with every genre of movie
seen by the customer.
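The western example can be worked through numerically. Since equation (4) is not reproduced in this text, the sketch assumes the Shannon sum Hj = -sum over i of pi log2 pi. It follows the slide in dividing each count by the 10 movies seen in total, not by the 4 westerns. The function name `genre_entropy` is illustrative.

```python
import math
from collections import Counter

def genre_entropy(scores, total_seen):
    """H_j for one genre: frequencies are score counts over all movies seen."""
    counts = Counter(scores)
    freqs = {score: c / total_seen for score, c in counts.items()}
    h = -sum(p * math.log2(p) for p in freqs.values())
    return freqs, h

# Scores of the 4 westerns out of 10 movies seen (example from the slide).
western_scores = [3, 3, 4, 5]
freqs, h_western = genre_entropy(western_scores, total_seen=10)
print(freqs)
print(round(h_western, 4))
```

Under this assumption the western genre gets Hj of about 1.1288. The same call with the comedy scores would give the second Hj of this customer.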
25Case Study opinions of movies published by the
GroupLens Research Project
- 4. In this case, we set the value of epsilon to
1/n, where n is the total number of films seen by
the customer before adding the movie to
recommend. For every Hj calculated before
incorporating the movie (H) and after
incorporating it (H'), we consider the expression
HDIFj = Abs(H - H').
- 5. Rules of recommendation: we recommend if
(HDIFj - epsilon)/HDIFj < 0.15 for every j,
that is, if the relative difference between HDIFj
and epsilon is less than 15%. We can make the
recommendation criterion more or less rigorous by
varying this percentage.
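The recommendation rule of step 5 can be sketched as below. The sketch assumes HDIFj = |Hj - H'j|, as in Stage 3 of the method. The treatment of an unchanged entropy (HDIFj = 0, where the ratio is undefined) is an added assumption, and the entropy values are hypothetical.

```python
def hdif(h_before: float, h_after: float) -> float:
    """Assumed definition: HDIF_j = |H_j - H'_j|."""
    return abs(h_before - h_after)

def recommend(h_before: dict, h_after: dict, n_seen: int, tolerance: float = 0.15) -> bool:
    """Recommend if (HDIF_j - epsilon) / HDIF_j < tolerance for every genre j."""
    epsilon = 1.0 / n_seen  # epsilon = 1/n, n = films seen before the candidate
    for genre in h_before:
        d = hdif(h_before[genre], h_after[genre])
        if d == 0.0:
            continue  # assumption: an unchanged entropy never blocks a recommendation
        if (d - epsilon) / d >= tolerance:
            return False
    return True

# Hypothetical entropies for a customer who has seen 10 films (epsilon = 0.1).
before = {"western": 1.10, "comedy": 1.40}
print(recommend(before, {"western": 1.18, "comedy": 1.45}, n_seen=10))
print(recommend(before, {"western": 1.35, "comedy": 1.45}, n_seen=10))
```

In the first call every change stays below epsilon, so the film is recommended. In the second call the western entropy changes by 0.25, the ratio is 0.6, and the film is rejected. Raising `tolerance` makes the criterion less strict.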
26Case Study opinions of movies published by the
GroupLens Research Project
- In Table 4, the column "ok" gives the number of
films that have been recommended, and the column
"number" is the total number of films seen by the
user that are registered in the test file. The
percentage of success is shown in the column
"pct.ok".
27Case Study opinions of movies published by the
GroupLens Research Project
Figure 2 shows the percentage of successful
recommendations for the users analyzed in Table 4.
29Diagnostic Method
- We present a proposal for applying the
methodology in the field of medical diagnosis,
founded on the results obtained with the
recommendation method.
- The premises and the selection algorithm defined
here are subject to later adjustment, depending
on the results obtained in real case studies.
30Diagnostic Method
- Premises/Definitions
- 1. We have access to the previous correct
diagnoses (di).
- 2. Each diagnosis di is characterized by a
certain number of attributes j, which represent
the symptoms of the patient. We call this group
of values the Diagnosis Matrix (Dij).
31Diagnostic Method
- Premises/Definitions
- 3. Because some attribute values have equivalent
medical significance (for example, body
temperatures between 36.5ºC and 37.0ºC), a
previous step must be established to discretize
the values and account for these equivalences.
32Diagnostic Method
- Premises/Definitions
- 4. Each attribute has an associated value within
a range of discrete allowed values. For
convenience, we normalize these values to the
range 0..1.
- 5. Not all of the attributes are defined for a
given diagnosis. This means that the matrix Dij
contains null elements with undefined values.
33Diagnostic Method
- Premises/Definitions
- 6. For a given illness e, we have a set of
associated diagnoses, each one characterized by a
group of attributes (symptoms) j. We call this
group of diagnoses of an illness the Diagnoses
Matrix (Dej). Not all elements of this matrix
need to have defined values.
34Diagnostic Method
- Selection Algorithm
- Step 1. Obtain the relative frequencies fejk of
each value k of attribute j for illness e from
the matrix Dej of correct diagnoses. Suppose, for
example, the following matrix:
35Diagnostic Method
- Step 2. Calculate the Shannon entropy of illness
e (He) from the diagnoses matrix Dej according
to the expression
36Diagnostic Method
- Step 3. Given a new diagnosis dj to classify, we
calculate the Shannon entropy H'e, supposing that
the diagnosis corresponds to illness e. Among all
the sampled diagnoses, we consider the correct
diagnosis to be the one that minimizes the
following expression, provided that it is less
than 1
37Diagnostic Method
- where epsilon = f(d)/(n log(n)), n being the
number of correct diagnoses of illness e (i.e.,
the number of rows of matrix Dej) and f(d) a
function whose value depends on the number d of
attributes of the new diagnosis that have a
defined value. In principle, we can suppose
f(d) = d.
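The selection algorithm can be sketched end to end, under several stated assumptions, because the expressions of Steps 2 and 3 are not reproduced in this text. The sketch assumes that He is the sum over the attributes j of -sum over k of fejk log2 fejk, that undefined values (None) are skipped, that the expression to minimize is |He - H'e| / epsilon, and that the logarithm in epsilon is base 2. The illness names and matrices are invented.

```python
import math
from collections import Counter

def illness_entropy(rows):
    """Assumed H_e: sum over attributes j of -sum_k f_ejk*log2(f_ejk); None = undefined."""
    total = 0.0
    for column in zip(*rows):
        values = [v for v in column if v is not None]
        if not values:
            continue
        for count in Counter(values).values():
            f = count / len(values)
            total -= f * math.log2(f)
    return total

def classify(matrices, new_case):
    """Return the illness with the smallest |H_e - H'_e| / epsilon, if below 1."""
    d = sum(v is not None for v in new_case)  # f(d) = d, defined attributes
    best, best_ratio = None, 1.0
    for illness, rows in matrices.items():
        n = len(rows)  # number of correct diagnoses; needs n >= 2 for log2(n) > 0
        epsilon = d / (n * math.log2(n))
        change = abs(illness_entropy(rows) - illness_entropy(rows + [new_case]))
        ratio = change / epsilon
        if ratio < best_ratio:
            best, best_ratio = illness, ratio
    return best

# Hypothetical diagnoses matrices D_ej: one row per correct diagnosis,
# one column per symptom, values normalized to 0..1, None where undefined.
matrices = {
    "flu": [[1.0, 0.5, None], [1.0, 0.5, 0.0], [1.0, 1.0, 0.0], [0.5, 0.5, 0.0]],
    "cold": [[0.0, 0.5, 1.0], [0.0, 0.0, 1.0], [0.5, 0.0, 1.0], [0.0, 0.0, None]],
}
print(classify(matrices, [1.0, 0.5, None]))
```

A new case is assigned to the illness whose entropy it disturbs least. When no illness yields a ratio below 1, the function returns None, which reflects the condition that the expression must be less than 1.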
39Conclusion and Future work
- We have presented a methodology to extract
knowledge of the tastes of users from their
opinions, with no need to use any model.
- As we have shown in the case study, this can be
done through the analysis of the distribution of
the opinions and the use of a given set of rules
based on Shannon entropy.
40Conclusion and Future work
- Our objective is to homogenize the criteria of
prediction and recommendation of services in
dynamic and heterogeneous environments.
41Conclusion and Future work
- The next step would be to test this methodology
in a real-time e-business application.
- It can be extended to other domains, using the
knowledge of each domain by means of its
ontologies, and integrated into any type of
architecture, such as an agent-based recommender
system.
42Future work
- In the field of diagnosis, verification and
adjustments to the behavior of the method in real
cases of illness diagnosis have yet to be done.