Title: Spam and Personal Privacy
1Spam and Personal Privacy
- Presented by Ashley Embry
3Outline
- What is Spam?
- A. Types of Spam
- Where Did the Word spam Originate?
- How Spam Begins: A General Explanation
- Who Has the Potential to be a Spammer?
- Statistics About Spam
- Getting Rid of Spam
- Breakdown of a Spam Filter
- Conclusions
- Questions for the class
4What is Spam?
- There are many definitions of spam in use:
- Electronic junk mail or junk newsgroup postings.
- Any unsolicited automated e-mail.
- E-mail advertising for some product sent to a
mailing list or newsgroup.
- Spam is simply flooding the Internet with many
copies of the same message in an attempt to force
the message on people who would not otherwise
choose to receive it.
5Types of Spam
- There are two main types of spam:
- 1. Usenet spam is aimed at people who read
newsgroups but rarely or never post and give
their information away.
- 2. E-mail spam targets individual users with
direct mail messages. E-mail spam lists are
created by scanning Usenet postings, stealing
Internet mailing lists, or searching the Web for
addresses.
6Where Did the Word spam Originate?
- The practice of calling large numbers of
inappropriate postings "spam" comes from a Monty
Python skit in which a couple goes into a
restaurant and the wife tries to get something
other than Spam. In the background, a group of
Vikings sings the praises of Spam. Pretty soon the
only thing that you can hear is the word "Spam."
- Like the song, spam is the endless repetition of
worthless text.
7- Another proposal is that the term was coined by
a computer lab group at the University of
Southern California, who gave it the name because
it has many of the same characteristics as the
lunch meat Spam:
- Nobody wants it or ever asks for it.
- No one ever eats it; it is the first item to be
pushed to the side when eating the entrée.
- Sometimes it is actually tasty, like the 1% of
junk mail that is really useful to some people.
8How Spam Begins: A General Explanation
- Spammers only need access to your address. After
that, it's just a matter of sending the e-mails.
- The primary sources that spammers use are
newsgroups and chat rooms.
- The second source is the Web itself.
Spammers can create search engines that look for
the @ sign, which indicates an e-mail address.
- The third source is sites created specifically to
attract e-mail recipients, with lures such as:
- "Win $1 million!!! Just Click Here!"
- "Would you like newsletters from our partners?"
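The Web-scanning source described above is essentially pattern matching on the at sign (@). A minimal Python sketch, assuming a deliberately simplified address pattern (`ADDRESS_PATTERN` and `find_addresses` are illustrative names, not from the original slides), suggests why an address published in plain form is easy to collect, while one written out in words is skipped by such a naive scan:

```python
import re

# Simplified pattern: local part, "@", then a dotted domain.
# Real address syntax is broader; this is only an illustration.
ADDRESS_PATTERN = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"
)

def find_addresses(page_text):
    """Return every substring of page_text that looks like an e-mail address."""
    return ADDRESS_PATTERN.findall(page_text)

# Hypothetical page text: one plain address, one spelled out in words.
page = "Contact jdoe@example.com or write to jsmith at example dot com."
print(find_addresses(page))
```

Only the plain address is found, which is one reason people sometimes avoid posting their address in its literal form.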
9- Finally, probably the most common source of
e-mail addresses is searching the e-mail
servers of large e-mail hosting companies like
Hotmail.
- The Hotmail article "A Spammer's Paradise" reads:
"A dictionary attack utilizes
software that opens a connection to the
mail server and rapidly submits millions of
random e-mail addresses. Many of these
addresses have slight variations, such as
'jdoe1abc@hotmail.com' and 'jdoe2def@hotmail.com.'
The software then records the address
locations and adds those
addresses to the spammer's list.
These lists are typically resold to
many other spammers."
10Who Has the Potential to be a Spammer?
- Anyone can be a spammer.
- Scenario:
- Let's say your grandmother bakes the best banana
nut bread ever created, and you want to sell the
recipe for $5.
- You have 100 people in your personal e-mail
address book. You send out an e-mail advertising
"Big Momma's Nana Nut Bread - only $5!!!"
- From your 100 e-mails you get 2 orders and make
$10.
- Imagine if you had sent out 1,000,000 e-mails.
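The arithmetic behind the scenario can be sketched in a few lines of Python. This assumes, optimistically, that the 2-in-100 response rate from the address book would hold for a mass mailing; `projected_orders` is a hypothetical helper written for this illustration.

```python
def projected_orders(emails_sent, sample_orders=2, sample_size=100):
    """Scale the observed response (2 orders per 100 e-mails) to a larger mailing."""
    return emails_sent * sample_orders // sample_size

PRICE = 5  # dollars per recipe, as in the scenario

for emails in (100, 1_000_000):
    orders = projected_orders(emails)
    print(emails, "e-mails ->", orders, "orders ->", orders * PRICE, "dollars")
```

Under that assumption, one million e-mails would bring 20,000 orders, or $100,000, at almost no extra cost to the sender.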
11Statistics About Spam
- In a single day in May, the No. 1 Internet
service provider, AOL Time Warner (AOL), blocked 2
billion spam messages (88 per subscriber) from
hitting its customers' e-mail accounts.
- Microsoft (MSFT), which operates the No. 2 service
provider, MSN, and Hotmail, says it blocks an
average of 2.4 billion spams per day.
12Getting Rid of Spam
- Avoid giving out your e-mail address to
unfamiliar or unknown recipients.
- Use your e-mail application's filtering features.
- Report the spammer to the spammer's ISP.
- Use spam filtering software.
13Breakdown of a spam filter
- Most spam blockers use filters that search for
commonly used phrases or overly aggressive
writing styles found in mass e-mail
marketing. Spammers try to fool the filters by
changing their writing styles and formats so that
their messages can sneak past the filters.
- The best technology currently available to stop
spam is spam filtering software.
- The simplest filters use keywords such as xxx,
viagra, etc., but they are also more likely to
block the e-mails that you do want to receive.
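A minimal Python sketch of such a keyword filter, using the two keywords named above (`BLOCKED_KEYWORDS` and `is_blocked` are illustrative names, and the sample messages are made up), shows both weaknesses at once: a legitimate message gets blocked, and a lightly disguised spam gets through.

```python
BLOCKED_KEYWORDS = {"xxx", "viagra"}  # keywords taken from the slide

def is_blocked(message):
    """Flag a message if any blocked keyword appears anywhere in it."""
    text = message.lower()
    return any(keyword in text for keyword in BLOCKED_KEYWORDS)

samples = [
    "Cheap VIAGRA, click now",                    # spam, caught
    "Pharmacy note: do not combine with Viagra",  # wanted mail, blocked anyway
    "V1agra at low prices",                       # spam, slips past the filter
]
for sample in samples:
    print(is_blocked(sample), "-", sample)
```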
14Example
- The more advanced filters, Bayesian filters for
example, take this approach further to
statistically identify spam based on word
frequency.
- An example of how this statistical filtering
works:
- Start with one collection of spam and one of
nonspam mail, each containing about 4000
messages.
- Scan the entire text of each message in each
collection.
- Consider alphanumeric characters, dashes,
apostrophes, and dollar signs to be part of
tokens (words) and everything else to be a token
separator (e.g., qt234abc, 75, utt).
- Count the number of times each token occurs in
each collection. You will end up with two large
tables, each one showing the different tokens
and how many times each appeared in the messages.
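The tokenizing and counting steps above can be sketched in Python as follows. The two tiny collections stand in for the roughly 4000-message collections, the names `tokenize` and `count_tokens` are illustrative, and keeping upper and lower case distinct is an assumption of this sketch rather than something the slides specify.

```python
import re
from collections import Counter

# Token characters: letters, digits, apostrophes, dollar signs, and dashes.
# Everything else acts as a separator.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9'$-]+")

def tokenize(text):
    """Split a message into tokens using the character rule above."""
    return TOKEN_PATTERN.findall(text)

def count_tokens(messages):
    """Build one table mapping each token to its total count in a collection."""
    counts = Counter()
    for message in messages:
        counts.update(tokenize(message))
    return counts

# Made-up stand-ins for the spam and nonspam collections.
spam_collection = ["Win $1000 now", "Win a free prize now"]
good_collection = ["Lunch at noon?", "Don't forget the meeting at noon"]

bad = count_tokens(spam_collection)
good = count_tokens(good_collection)
print(bad.most_common(3))
print(good.most_common(3))
```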
15- Finally, create a third table that relates each
token to the probability (ranging from .01 to
.99) that an e-mail containing it is spam.
- When new mail arrives, it is scanned into
tokens, and the fifteen tokens whose
probabilities are farthest from the neutral
probability of .5 are used to calculate the
probability that the e-mail is spam.
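Picking the tokens farthest from .5 can be sketched in Python like this. The function name `most_interesting` is hypothetical, and tokens missing from the probability table are simply skipped here, which is an assumption of this sketch.

```python
def most_interesting(tokens, spam_probability, limit=15):
    """Return up to `limit` (token, probability) pairs farthest from 0.5."""
    known = {t: spam_probability[t] for t in set(tokens) if t in spam_probability}
    ranked = sorted(known.items(), key=lambda item: abs(item[1] - 0.5), reverse=True)
    return ranked[:limit]

# Small made-up table; "the" is neutral and "hello" is not in the table.
table = {"madam": 0.99, "sorry": 0.0499, "valuable": 0.82347, "the": 0.5}
message_tokens = "madam sorry the valuable the hello".split()
print(most_interesting(message_tokens, table, limit=2))
```

Tokens with very low probabilities count as much as tokens with very high ones, so strong evidence of legitimate mail is used alongside strong evidence of spam.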
16- Algorithms/Program language
- To determine the probability that an e-mail
containing a given token is spam (good and bad
are the two count tables; ngood and nbad are the
numbers of nonspam and spam messages):
    (let ((g (* 2 (or (gethash token good) 0)))
          (b (or (gethash token bad) 0)))
      (unless (< (+ g b) 5)
        (max .01
             (min .99 (float (/ (min 1 (/ b nbad))
                                (+ (min 1 (/ g ngood))
                                   (min 1 (/ b nbad)))))))))
- To determine whether the e-mail is spam, using the
probabilities of the 15 chosen tokens:
    (let ((prod (apply #'* probs)))
      (/ prod (+ prod (apply #'* (mapcar #'(lambda (x) (- 1 x))
                                         probs)))))
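For readers less used to Lisp, the first expression above can be restated as a Python sketch. It follows the formula in Paul Graham's "A Plan for Spam": counts from nonspam mail are doubled, and tokens seen fewer than five times in total get no probability at all. The function name and the use of plain dictionaries are choices made for this illustration.

```python
def token_spam_probability(token, good, bad, ngood, nbad):
    """Probability that a message containing `token` is spam, or None if too rare.

    good, bad:   dicts mapping token -> count in the nonspam / spam collection
    ngood, nbad: number of messages in the nonspam / spam collection
    """
    g = 2 * good.get(token, 0)  # doubling biases the filter against false positives
    b = bad.get(token, 0)
    if g + b < 5:
        return None             # too little evidence to say anything
    bad_ratio = min(1.0, b / nbad)
    good_ratio = min(1.0, g / ngood)
    # Clamp to [.01, .99] so no single token is treated as absolute proof.
    return max(0.01, min(0.99, bad_ratio / (good_ratio + bad_ratio)))

# Made-up counts for a quick check.
print(token_spam_probability("sorry", {"sorry": 10}, {"sorry": 1}, 100, 100))
print(token_spam_probability("madam", {}, {"madam": 50}, 100, 100))
```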
17- Example token list with probabilities:
- madam 0.99
- promotion 0.99
- shortest 0.047225013
- sorry 0.0499
- valuable 0.82347
- information taken from www.paulgraham.com
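To see how such a list turns into a single score, the combining formula can be sketched in Python and applied to the five tokens above. A real run would use fifteen tokens, so this is only an illustration; `combined_spam_probability` is a name chosen here, and `math.prod` requires Python 3.8 or later.

```python
from math import prod

def combined_spam_probability(probs):
    """Combine per-token probabilities: prod(p) / (prod(p) + prod(1 - p))."""
    probs = list(probs)
    spam_evidence = prod(probs)
    good_evidence = prod(1 - p for p in probs)
    return spam_evidence / (spam_evidence + good_evidence)

example_tokens = {
    "madam": 0.99,
    "promotion": 0.99,
    "shortest": 0.047225013,
    "sorry": 0.0499,
    "valuable": 0.82347,
}
score = combined_spam_probability(example_tokens.values())
print(round(score, 4))
```

Even with two strongly "innocent" tokens in the list, the two .99 tokens dominate, and the combined score ends up well above the neutral .5.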
18Wrapping it Up
- Whether one is constructing a spam list or
implementing a spam filtering program, the work
relies on the concepts and techniques of computer
science.
19Questions for the Class
- By the end of this presentation you should be
able to answer the following question:
- Name 2 techniques we learned in CIS class that
are used by spammers or in spam filtering.
- Pattern matching, when searching for e-mail
addresses or when evaluating words for spam
tendencies.
- Writing algorithms that are eventually
implemented as programs.
20Bibliography
- "Before Spam Brings the Web to Its Knees." June
10, 2003.
- http://www.businessweek.com/technology/content/jun2003/tc20030610_1670_tc104.htm
- Brain, Marshall. "How Spam Works."
- http://computer.howstuffworks.com/spam.htm
- "Getting Rid of Spam."
- http://www.webopedia.com/DidYouKnow/Internet/2002/GettingRidofSpam.asp
- Graham, Paul. "A Plan for Spam." Aug. 2002.
- http://www.paulgraham.com/spam.html
21- Mueller, Scott H. "What is Spam?"
- http://spam.abuse.net/overview/whatisspam.shtml
- "Origins of Spam."
- http://digital.net/gandalf/spamfaq.html#item8c
- "Spam." July 20, 2004.
- http://www.webopedia.com/TERM/s/spam.html