Title: Reputation Network Analysis for Email Filtering
1. Reputation Network Analysis for Email Filtering
- Ravi Emani
- Ramesh Ravindran
2. This Talk Describes
- An e-mail scoring mechanism based on a social network augmented with reputation ratings
- An algorithm for inferring reputation ratings
- Integration into a mail application, TrustMail
3. Preventing Spam
- Trying to prevent spam from even reaching the user's mailbox
- Methods
- - Whitelist filters
- - Social networks
- - Connecting users
4. Whitelist Filters
- Messages are accepted according to a list of approved addresses created by the user
- Advantages
- - No spam in the user's inbox
- - Filters the spam into a low-priority folder
- Disadvantages
- - Extra burden on the user
- - Filters out even valid emails
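The whitelist behaviour described on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `route_message` and the folder labels are hypothetical, and address matching is assumed to be a case-insensitive exact match.

```python
from typing import Set


def route_message(sender: str, whitelist: Set[str]) -> str:
    """Return the folder for a message, given a set of approved lowercase addresses."""
    # Normalise the sender so "Bob@Example.org" matches "bob@example.org".
    if sender.strip().lower() in whitelist:
        return "inbox"
    # Everything else is set aside, including valid mail from new contacts.
    return "low-priority"


approved = {"bob@example.org", "carol@example.org"}
print(route_message("Bob@Example.org", approved))
print(route_message("stranger@example.net", approved))
```

The second call shows the disadvantage listed above: a legitimate first-time sender is treated exactly like spam until the user adds the address by hand.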
5. Social Networks
- Proposed by Boykin and Roychowdhury
- A social network is created from the messages received by the user
- Messages are identified as spam, valid or unknown based on clustering thresholds and structural properties like the propensity for local clustering
- Classifies about 50% of a user's email as spam or valid
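The "propensity for local clustering" is commonly measured with the local clustering coefficient: the fraction of pairs of a node's neighbors that are themselves connected. The sketch below shows only that measurement on an undirected graph stored as a dictionary of neighbor sets (an assumed layout). The thresholds and the rest of Boykin and Roychowdhury's procedure are not given in these slides and are not reproduced here.

```python
from itertools import combinations
from typing import Dict, Set


def local_clustering(graph: Dict[str, Set[str]], node: str) -> float:
    """Fraction of neighbor pairs of `node` that are linked to each other."""
    neighbors = graph.get(node, set())
    k = len(neighbors)
    if k < 2:
        return 0.0  # fewer than two neighbors: no pairs to compare
    # Count neighbor pairs (u, v) that share an edge.
    links = sum(1 for u, v in combinations(neighbors, 2) if v in graph.get(u, set()))
    return 2.0 * links / (k * (k - 1))


# A triangle of correspondents who all mail each other.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
# A hub whose contacts never mail each other.
star = {"hub": {"x", "y", "z"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"}}
print(local_clustering(triangle, "a"), local_clustering(star, "hub"))
```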
6. Optimization
- Extension of whitelisting and social-network-based filtering
- Uses a network that connects users
- A score of reputation or trust is assigned by the users to the people they know
- Results in a large reputation network connecting thousands of users
- Messages are sorted by the score shown next to the messages in the inbox
7. Optimization
- Overcomes the problem of the whitelists
- More reliable than whitelists, even though the user still carries the burden of creating an initial set of reputation ratings
- Comparatively less work
8. Creating the Reputation Network
- Uses a distributed, web-based social network
- Reputation rating inferred from one user to another
- Individuals are connected to each person they rated
- Results in a large interconnected network of users
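One way to picture this network is as a directed graph in which each edge carries the rating one person assigned to another. The class below is a minimal sketch, not the project's actual data model: the name `ReputationNetwork`, its methods, and the 1 to 10 rating scale are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Set


class ReputationNetwork:
    """Directed graph: an edge rater -> ratee carries the rating the rater assigned."""

    def __init__(self) -> None:
        self._ratings: Dict[str, Dict[str, int]] = defaultdict(dict)

    def rate(self, rater: str, ratee: str, score: int) -> None:
        """Record (or overwrite) the rating that `rater` gives `ratee`."""
        if rater == ratee:
            raise ValueError("a person cannot rate themselves")
        if not 1 <= score <= 10:  # assumed scale, not stated in the slides
            raise ValueError("score must be between 1 and 10")
        self._ratings[rater][ratee] = score

    def rated_by(self, rater: str) -> Dict[str, int]:
        """Return a copy of everyone `rater` is connected to, with the ratings."""
        return dict(self._ratings.get(rater, {}))

    def people(self) -> Set[str]:
        """Everyone who has rated someone or been rated."""
        everyone = set(self._ratings)
        for ratees in self._ratings.values():
            everyone.update(ratees)
        return everyone


net = ReputationNetwork()
net.rate("alice", "bob", 9)
net.rate("bob", "carol", 3)
print(net.rated_by("alice"), sorted(net.people()))
```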
9. How is it related to the Semantic Web?
- The only requirement is that individuals assert their reputation ratings for one another in the network
- Individuals control their own data
- Data is maintained in a distributed fashion
- Data can be stored anywhere and integrated through a common foundation
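To make the integration step concrete, the sketch below merges rating documents that different people publish separately into one network. It deliberately leaves out RDF parsing: each document is modelled as a plain `(owner, ratings)` pair, which is a hypothetical stand-in for whatever file a person actually publishes, and a later document from the same owner is assumed to override earlier values.

```python
from typing import Dict, Iterable, Tuple

Ratings = Dict[str, Dict[str, int]]


def merge_rating_documents(documents: Iterable[Tuple[str, Dict[str, int]]]) -> Ratings:
    """Combine per-person rating documents into one rater -> ratee -> score map."""
    merged: Ratings = {}
    for owner, ratings in documents:
        # Each document only contributes edges that start at its owner,
        # so every person stays in control of their own assertions.
        merged.setdefault(owner, {}).update(ratings)
    return merged


published = [
    ("alice", {"bob": 9}),
    ("bob", {"carol": 4}),
    ("alice", {"carol": 7, "bob": 8}),  # a newer document from alice
]
print(merge_rating_documents(published))
```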
10. Role of the Semantic Web
- The Semantic Web, along with its component languages RDF, RDFS and OWL, utilizes the web architecture
- Supports distributed data management
- Users create ontologies with classes and properties, and hence instances
- The instances of the classes help in describing the data on the web
11. FOAF Project
- Friend-Of-A-Friend project developed on the Semantic Web
- An ontological vocabulary for describing people and their relationships
- Extended by providing a mechanism for describing reputation relationships
- Allows people to rate the reputation or trustworthiness of another person
12. Fig.: The reputation network developed as part of the Semantic Web trust project at http://trust.mindswap.org.
13. Algorithms for Inferring Reputation between Individuals
- Recommendations are made to one person (the source) about the reputation of another person (the sink)
- The trust and reputation literature contains many different metrics
- These metrics are categorized according to the perspective used for making calculations
14. Perspective in Reputation Inference Algorithms
- Global metrics calculate a single value for each entity in the network
- Local metrics calculate a reputation rating for an individual in the network from the perspective of a particular node
- In a global system an entity will always have the same inferred rating
- In a local system an entity could be rated differently depending on the node the inference is made for
15. Perspective in Reputation Inference Algorithms
- Global metrics can be highly effective in situations where the experiences of users are similar
- Local metrics can be appropriate where users' opinions vary about the same topic
[Figure: example reputation network with nodes A, B, C, D and E and edge ratings 10, 10, 9 and 1]
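The difference between the two perspectives can be sketched with a small example. The network below uses nodes A to E with ratings 10, 10, 9 and 1, but the edge directions are an assumption, and both scoring functions are simplified stand-ins chosen for illustration (a plain mean of received ratings for the global view, a one-hop weighted average for the local view), not metrics taken from the literature.

```python
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]


def global_score(ratings: Ratings, target: str) -> Optional[float]:
    """One value per entity: the mean of every rating the target received."""
    received = [given[target] for given in ratings.values() if target in given]
    return sum(received) / len(received) if received else None


def local_score(ratings: Ratings, source: str, target: str) -> Optional[float]:
    """Rating of target as seen from source, using only source's own neighbors."""
    own = ratings.get(source, {})
    if target in own:
        return own[target]
    num = denom = 0.0
    for j, weight in own.items():
        if target in ratings.get(j, {}):
            num += weight * ratings[j][target]
            denom += weight
    return num / denom if denom else None


# Assumed edges: A rates D, B rates C, D and C both rate E.
ratings = {"A": {"D": 10}, "B": {"C": 10}, "D": {"E": 9}, "C": {"E": 1}}
print(global_score(ratings, "E"))      # 5.0 for everyone
print(local_score(ratings, "A", "E"))  # 9.0 from A's point of view
print(local_score(ratings, "B", "E"))  # 1.0 from B's point of view
```

The global value of 5.0 matches neither A's nor B's situation, which is the case where a local metric is the better fit.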
16. Accurate Metrics for Inferring Reputation
- The inferred rating from the source to the sink is given by a weighted average of the ratings that the source's neighbors hold for the sink
- The reputation rating t from source i to sink s is written as t_is
- No inference is needed if the source is directly connected to the sink
- If not, the reputation rating is calculated as a weighted average of the reputation ratings returned for the sink by each of the source's n neighbors
17.
getRating(source, sink)
  mark source as seen
  if source has no rating for sink
    denom = 0
    num = 0
    for each j in neighbors(source)
      if j has not been seen
        denom = denom + rating(source, j)
        j2sink = min(rating(source, j), getRating(j, sink))
        num = num + rating(source, j) * j2sink
        mark j unseen
    rating(source, sink) = num / denom
  return rating(source, sink)
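A minimal Python sketch of the recursion above, assuming ratings are kept in a nested dictionary (`ratings[i][j]` is the rating i gave j; this layout and the name `get_rating` are hypothetical). Three choices go beyond the slide and are assumptions: the denominator sums the source's ratings of the contributing neighbors so that the result is a weighted average, neighbors that cannot reach the sink are skipped, and `None` is returned when no rating can be inferred. Inferred values are not written back into the table.

```python
from typing import Dict, Optional, Set

Ratings = Dict[str, Dict[str, float]]


def get_rating(ratings: Ratings, source: str, sink: str,
               seen: Optional[Set[str]] = None) -> Optional[float]:
    """Infer source's rating of sink, or None if no chain of ratings reaches sink."""
    if seen is None:
        seen = set()
    direct = ratings.get(source, {})
    if sink in direct:
        return direct[sink]  # directly connected: no inference needed
    seen.add(source)  # mark source as seen to avoid walking in circles
    num = 0.0
    denom = 0.0
    for j, t_ij in direct.items():
        if j in seen:
            continue
        t_js = get_rating(ratings, j, sink, seen)
        seen.discard(j)  # mark j unseen so other paths may pass through it
        if t_js is None:
            continue  # assumption: neighbors with no opinion do not count
        # Cap the neighbor's report at the source's own rating of that neighbor.
        num += t_ij * min(t_ij, t_js)
        denom += t_ij
    return num / denom if denom else None


ratings = {"A": {"B": 10, "C": 10}, "B": {"D": 9}, "C": {"D": 1}}
print(get_rating(ratings, "A", "D"))  # (10*9 + 10*1) / (10 + 10) = 5.0
```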
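18. Accurate Metrics for Inferring Reputation
A concise representation of how t_is is weighted:

$$t_{is} = \frac{\sum_{j=1}^{n} \begin{cases} t_{ij}\, t_{js} & \text{if } t_{ij} \ge t_{js} \\ t_{ij}^{2} & \text{if } t_{ij} < t_{js} \end{cases}}{\sum_{j=1}^{n} t_{ij}}$$

The condition in this formula ensures that the source will never trust the sink more than it trusts any intermediate node.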
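As a worked illustration with made-up numbers (not taken from the slides): suppose the source i has two neighbors, rated t_i1 = 8 and t_i2 = 4, and they report t_1s = 6 and t_2s = 10 for the sink. Each neighbor contributes t_ij times the smaller of t_ij and t_js, so the second neighbor's report of 10 is capped at 4:

```latex
t_{is} = \frac{8 \cdot 6 + 4 \cdot 4}{8 + 4} = \frac{64}{12} \approx 5.33
```

Without the cap, the second term would be 4 · 10 = 40 and the result about 7.33, letting a neighbor the source rates at only 4 pull the inferred rating up.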
19. Reputation Metric Evaluation
- To determine the accuracy of this metric:
- The reputation rating t_ij is recorded for each neighbor j by iterating through each individual i in the network
- The connection from i to j is then removed and the inferred reputation rating t'_ij is recorded
- The accuracy is measured as |t_ij - t'_ij|
20. TrustMail: A Prototype
- Message scoring system
- Adds reputation ratings to the folder views of messages
- Helps the user sort messages once the reputation ratings are visible
- Highlights the important and relevant messages
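The slides do not describe TrustMail's internals, so the following is only a sketch of the idea of attaching a score to each message and sorting a folder view by it. The message fields (`from`, `subject`), the function name `score_inbox`, and the choice to list unrated senders last are all assumptions.

```python
from typing import Dict, List, Optional, Tuple

Message = Dict[str, str]


def score_inbox(messages: List[Message],
                sender_scores: Dict[str, float]) -> List[Tuple[Optional[float], Message]]:
    """Pair each message with its sender's score and sort highest score first."""
    scored = [(sender_scores.get(m["from"]), m) for m in messages]
    # Rated senders come first in descending score order; unrated senders
    # (score None) follow in their original arrival order, since sorted() is stable.
    return sorted(scored, key=lambda pair: (pair[0] is None, -(pair[0] or 0.0)))


inbox = [
    {"from": "unknown@example.net", "subject": "Hello"},
    {"from": "bob@example.org", "subject": "Draft"},
    {"from": "carol@example.org", "subject": "Meeting"},
]
scores = {"bob@example.org": 6.5, "carol@example.org": 9.0}
for score, message in score_inbox(inbox, scores):
    print(score, message["subject"])
```

Nothing is deleted or hidden here: the score is shown next to each message and the user decides what to do with it.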
21. Conclusion and Future Work
- Our algorithm infers reputation relationships in a network
- Benefit: valid emails from unknown people can receive high scores because of the connections within the social network
- Future work involves the refinement of the algorithm for inferring reputation ratings
22. Conclusion and Future Work
- May involve developing and studying the TrustMail interface
- The number of ratings received will change with the size of a network
- Important issues to be considered
- - Techniques that combine best with reputation filtering
- - Percentage of messages accurately scored
23. References
- Boykin, P. O., Roychowdhury, V. Personal email networks: an effective anti-spam tool. http://www.arxiv.org/abs/cond-mat/0402143 (2004).
- http://sites.wiwiss.fu-berlin.de/suhl/bizer/SWTSGuide/
- RDFWeb: FOAF: The Friend of a Friend Vocabulary, http://xmlns.com/foaf/0.1/
- Golbeck, Jennifer, Bijan Parsia, James Hendler. Trust Networks on the Semantic Web.
- Richardson, Matthew, Rakesh Agrawal, Pedro Domingos. Trust Management for the Semantic Web, Proceedings of the Second International Semantic Web Conference, Sanibel Island, Florida, 2003.