Title: Online Data Collection
1Online Data Collection
Web Systems Architecture Protocols, UCSC Santa Cruz Extension
Don Krafft, don_at_donkrafft.com
Jing Luan, Ph.D., jing_at_cabrillo.cc.ca.us
2Overview
- Online Data Collection
- Introduction
- Voluntary Collection
- Online Forms
- Data Gathering
- Data Manipulation
- Data Usage
- Involuntary Collection
- Cookies
- Web Bugs
- Spyware
- Solutions
3Introduction
- The web is a source of information
- In order to exploit information, we sometimes have to provide it
- Information provided is often the cost of using the internet
- Information is used to provide services
- Information is used for marketing purposes to generate income
- Information collection is a major part of web activity and warrants special study
4Part I
5Voluntary Section
- Voluntary Collection
- Data Gathering
- Online Forms
- Data Manipulation
- Data Usage
- Scope of this presentation
6Modes of Voluntary Data Collection
- Types of online voluntary data collection
- e-commerce (Travel, books, eBay)
- Membership (AOL, Earthlink, Portal sites)
- Survey (product follow-up, CRM, freebies)
- Services (DMV, library)
- Web surfing/FTP
- others
11Form Technology
[Diagram: online form, Internet, data holding medium (db)]
- Web
- Design Tools
- - FrontPage
- - Dreamweaver
- - ColdFusion
- Web Server
- NT
- Unix
- CGI
- FrontPage Extensions
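The path from an online form to the data holding medium can be sketched in a few lines. This is only an illustration in Python, not one of the tools listed above: the field names, the `submissions` table, and the `store_submission` helper are hypothetical, and an in-memory SQLite database stands in for whatever database a real site would use.

```python
import sqlite3
from urllib.parse import parse_qs


def store_submission(conn, body):
    """Parse a URL-encoded form body and save it as one database row."""
    fields = parse_qs(body)  # each value is a list, e.g. {"name": ["Ann Lee"]}
    name = fields.get("name", [""])[0]
    email = fields.get("email", [""])[0]
    # Parameterized query: user input is never pasted into the SQL text
    conn.execute(
        "INSERT INTO submissions (name, email) VALUES (?, ?)", (name, email)
    )
    conn.commit()
    return name, email


# Hypothetical setup: an in-memory database with a single table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE submissions (name TEXT, email TEXT)")

# This string is what a browser would send when the form is posted
store_submission(conn, "name=Ann+Lee&email=ann%40example.com")
rows = conn.execute("SELECT name, email FROM submissions").fetchall()
print(rows)
```

Once the row is stored, everything listed under data manipulation and data usage becomes an ordinary database query.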
13Online Data Usage
- CRM (how did we do?)
- Profiling (what you are likely to buy)
- Marketing (what appears in your email and elsewhere)
- Info. Sharing (beyond mailing list)
- Tracking (related to profiling and marketing)
- Legal/Criminal (digital trail)
- Removal of Barriers (Hello, welcome back!)
14Part II
15Involuntary Data Collection
- Involuntary Data Collection
- Cookie files
- Web Bugs
- Spyware
- Protecting Yourself
- Easy
- More advanced
- Scope is intrusive but not malicious data collection
- This briefing deals primarily with MS Windows
16Dubious Connections
- The data collection game is getting aggressive
- Online advertising is not lucrative and personal information collection is
- The data collectors
- have access to your computer
- can have access to your system's registry
- can download and execute software on your computer
- can correlate data collected online with offline databases
- are aggressive
- Persistence and common sense will provide
reasonable protection from undue intrusion
17Involuntary Data Collection
If they do it without telling you about it, then
there's probably a privacy concern (there's a
reason they don't tell you)
18Cookie Files
- What is a cookie?
- A cookie (or persistent cookie) is a small data file created by a Web server that, with a browser's cooperation, is stored on a user's computer
- Cookies can contain personal information as well as users' Web site preferences
- Cookies contain a range of URLs for which they are valid
- How are cookies used?
- Cookies provide a way for the Web site to keep track of a user's patterns and preferences
- A cookie could, for example, save a person from typing in the same password and other information all over again on subsequent site visits
- When the browser encounters URLs for which it has cookies, it sends those specific cookies to the Web server
- Cookies can also be used to track a user's movements on the web and possibly to share personal information with third parties
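How a browser decides whether a URL falls in the range for which a cookie is valid can be sketched as below. This is a simplified assumption rather than any browser's actual code: it does a tail match on the domain and a plain prefix match on the path, and the `cookie_applies` helper and the example hosts are hypothetical.

```python
from urllib.parse import urlsplit


def cookie_applies(cookie_domain, cookie_path, url):
    """Return True if a cookie with this domain and path is sent for url."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    domain = cookie_domain.lower().lstrip(".")
    # Domain: the host is the cookie domain itself or a subdomain of it
    domain_ok = host == domain or host.endswith("." + domain)
    # Path: the requested path starts with the cookie's path
    path_ok = path.startswith(cookie_path)
    return domain_ok and path_ok


print(cookie_applies(".example.com", "/", "http://www.example.com/shop"))
print(cookie_applies("example.com", "/shop", "http://www.example.com/news"))
print(cookie_applies("example.com", "/", "http://ads.other.net/"))
```

Only the first call matches; the second fails on the path and the third on the domain.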
19Cookie Files
- How are cookies set?
- Set-Cookie: NAME=VALUE; expires=DATE; path=PATH; domain=DOMAIN_NAME; secure
- Example: setting and picking up a cookie
- Client requests a document, and receives in the response from the server
- Set-Cookie: CUSTOMER=WILE_E_COYOTE; path=/; expires=Wednesday, 09-Nov-99 23:12:40 GMT
- When client requests a URL in path "/" on this server, the client sends
- Cookie: CUSTOMER=WILE_E_COYOTE
- Example from www.netscape.com
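The exchange on this slide can be reproduced with a small hand-written parser. This is a minimal sketch, assuming the header value is split on semicolons and each piece on its first equals sign; the `parse_set_cookie` helper is hypothetical and skips the quoting and validation a real browser performs.

```python
def parse_set_cookie(header_value):
    """Split a Set-Cookie value into (name, value, attributes)."""
    first, *rest = [part.strip() for part in header_value.split(";")]
    name, _, value = first.partition("=")
    attrs = {}
    for part in rest:
        if not part:
            continue
        key, _, val = part.partition("=")
        # Flags such as "secure" carry no value; record them as True
        attrs[key.strip().lower()] = val.strip() if val else True
    return name, value, attrs


# Server response header from the example above
header = "CUSTOMER=WILE_E_COYOTE; path=/; expires=Wednesday, 09-Nov-99 23:12:40 GMT"
name, value, attrs = parse_set_cookie(header)

# What the client sends back on later requests under path "/"
request_header = f"Cookie: {name}={value}"
print(attrs)
print(request_header)
```

Note that only the name and value travel back to the server; the path and expiry stay with the browser and control when the cookie is sent.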
20Cookie Files
- Scenario 1
- Client visits Site A
- Marketing Site B is called to place an ad on a page and at that time places a cookie on the client for later retrieval
- Marketing Site B picks up the cookie whenever any of its partner sites are visited, thus tracking the client
- Scenario 2
- Site A collects client's personal data voluntarily
- Site A places a cookie on client addressed to Site B
- Called a 3rd party or illegal cookie (violation of cookie spec)
- Site B is called to place an ad and picks up the cookie containing personal information
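Scenario 1 can be modeled with a toy ad server. This is a sketch under stated assumptions, not a description of any real ad network: the `AdServer` class, the `user-N` identifier format, and the example URLs are all hypothetical, and the referring page is passed in directly instead of being read from a request header.

```python
import itertools


class AdServer:
    """Toy model of Marketing Site B serving ads to partner sites."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.log = {}  # cookie id -> list of pages where an ad was shown

    def serve_ad(self, cookie_id, referring_page):
        """Return the cookie id to store; record which page embedded the ad."""
        if cookie_id is None:
            # First contact: place a cookie on the client for later retrieval
            cookie_id = f"user-{next(self._ids)}"
        self.log.setdefault(cookie_id, []).append(referring_page)
        return cookie_id


site_b = AdServer()
browser_cookie = None  # the browser has no Site B cookie yet
for page in ["http://site-a.example/home", "http://partner.example/news"]:
    browser_cookie = site_b.serve_ad(browser_cookie, page)

print(site_b.log)
```

After two page views on different sites, Site B holds a single history keyed by one identifier, which is the tracking the scenario describes.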
21Index.dat
- Internet Explorer maintains a Browser Index file
- It's a binary file that contains the user's cookies (or a summary of them) and a history of URLs visited
- This file retains the data after the cookies have been deleted
- This file cannot be deleted from within Windows (can be deleted from DOS)
- Microsoft tech support pages contain no information about it
- I can't determine who has read-access to this file
22Browser Data
- What's available to any site from your web browser
- IP address (this may be dynamic)
- Time
- Last site visited
- Next site visited
- Browser type
- Operating system
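Most of the items above arrive with every request, so a server can record them without any cooperation from the user. The sketch below assumes the request headers are already available as a dictionary; the `browser_record` helper and the sample values are hypothetical, and "next site visited" is not modeled because it is not part of a single request.

```python
from datetime import datetime, timezone


def browser_record(client_ip, headers, now=None):
    """Collect what a single HTTP request reveals about the visitor."""
    now = now or datetime.now(timezone.utc)
    return {
        "ip": client_ip,                             # may be dynamic
        "time": now.isoformat(),
        "last_site": headers.get("Referer"),         # page that linked here
        "browser_and_os": headers.get("User-Agent"),
    }


# Hypothetical request: address and headers are made up for illustration
record = browser_record(
    "192.0.2.10",
    {
        "Referer": "http://www.example.com/search?q=loans",
        "User-Agent": "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)",
    },
    now=datetime(2001, 1, 1, tzinfo=timezone.utc),
)
print(record)
```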
23Web Bugs
- Web bugs are embedded objects that are loaded from a third-party source
- Web bugs can track your movements on the web
- Each time a graphic or other object is called from a web bug site by a web page, the browser information is retrieved
- A source, such as an advertiser, can keep track of each user or computer that loads the image
- Advertisers serving a significant number of the popular web sites will have access to your movements about the web
- Much of this information is available from Web server log files
- But advertisers would need the server's log files
- And this is easier and automated
- A site that collects voluntary information may share it with affiliates
24Web Bugs
- Examples of web bugs (HTML Source Code)
- Generic example
- <IMG SRC=http://ad.doubleclick.net/activity;src=328142;type=mmti;cat=invstr;ord=<Time>? WIDTH=1 HEIGHT=1 BORDER=0>
- www.investorplace.com
- <IMG SRC="http://ad.doubleclick.net/ad/investorplace.com/;sz=127x155;tile=3;ord=982372051" border=0 height="155" width="127"></A>
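A page can be scanned for likely web bugs of the 1x1 kind shown in the generic example. The sketch below is a heuristic, not a complete detector: it assumes a web bug is an image at most one pixel square served from a host other than the page's own, so it would miss a full-size banner such as the second example. The `WebBugFinder` class and the hosts are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class WebBugFinder(HTMLParser):
    """Collect <img> tags that are tiny and served from another host."""

    def __init__(self, page_host):
        super().__init__()
        self.page_host = page_host
        self.suspects = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us
        if tag != "img":
            return
        a = dict(attrs)
        src = a.get("src") or ""
        host = urlsplit(src).hostname  # None for relative URLs
        tiny = a.get("width") in ("0", "1") and a.get("height") in ("0", "1")
        third_party = host is not None and host != self.page_host
        if tiny and third_party:
            self.suspects.append(src)


html = (
    '<p>Quotes</p>'
    '<img src="/logo.gif" width="120" height="40">'
    '<IMG SRC="http://ads.example.net/pixel.gif?page=quotes" WIDTH=1 HEIGHT=1 BORDER=0>'
)
finder = WebBugFinder("www.example.com")
finder.feed(html)
print(finder.suspects)
```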
25Web Bugs
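[Diagram: data flows between the browser (B), web site WS-1, and third-party site WS-2]
- B to WS-1: request for service, personal data
- WS-1 to B: service, reference (SRC=WS-2), WS-1 cookie
- B to WS-2: request for graphic, browser data, tracking information, third-party cookie?
- WS-1 and WS-2 (?): WS-1 cookie data?, other data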
26Web Bugs
- Web Bugs can be in any type of document
- Web pages
- Email
- Any document that can have a third-party reference
27Spyware
- What is Spyware?
- Spyware is any software which employs a user's Internet connection in the background without the user's knowledge or explicit permission (Steve Gibson)
28Spyware
- It is classified as Spyware if it
- is not explained in advance to the user
- gathers information not expressly required for a requested service
- introduces insecurities (eg Aureate promotes their ability to secretly download and execute third party programs)
- uses data in ways that are not clearly defined in the Privacy Statement
- does not register with the Operating System and is not removable with the Add/Remove facility
29Spyware
Radiate's Privacy Disclosure:
Radiate has developed and distributed technology which allows Radiate to send advertising to computers which have downloaded Radiate's technology as part of larger programs. Radiate's technology exchanges information with a user's computer, and as part of this exchange, Radiate collects certain, nonspecific data about its users and aggregates its data. Radiate is very concerned with the privacy of its users and, therefore, never sells data it collects, never collects data uniquely identifiable without the user's knowledge, and never combines user-specific data with the generic data it collects. Radiate has taken steps to ensure that the owners and distributors of the shareware programs containing Radiate's codes provide full disclosure of the functioning of Radiate's codes.
30Juno's Service Agreement!
- You expressly permit and authorize Juno (ISP) to
- (i) download to your computer one or more pieces of software designed to perform computations, which may be unrelated to the operation of the Service
- (ii) run the Software on your computer to perform and store the results of such computations
- (iii) upload such results to Juno's computers during a subsequent connection, whether initiated by you in the course of using the Service or by the Software
- ... you agree not to take any action to disable or interfere with the operation of the Software
- you agree to run the computer continuously and to pay any phone charges associated with uploading results
31Privacy Statements
- Privacy Statements
- Explain a site's general data collection scenarios
- can be changed at will
- are non-binding
- can be abandoned
- When advertising revenue is down
- When a dot-com goes out of business
- do not reveal the use of web bugs
- never reveal that personal information is sold for profit
- Sometimes reveal that user data is collected and shared and is not secure
- TRUSTe is an industry group that recommends content of Privacy Statements but does not discourage the collection and sale of personal data
- Good privacy policy statements are short and don't start with
- "Your privacy is important to us"
32Purpose of Data Collection
- Involuntary personal data is collected in order to
- Profile individual users and track movements about the Internet
- Improve site content
- Provide customized services
- Billing and shipping
- Target advertising
- Develop a marketable personal information database
- Personal preferences
- Demographic data
- Profit in a competitive environment
- Primary Internet business is often not profitable
- Provide a free service and in the process
- collect and sell personal data
33Solutions
- The easy stuff
- Open a junk email account (eg Hotmail)
- Install and configure a Layer 7 firewall (eg ZoneAlarm)
- programs phoning home
- Manage your Cookies (eg 12Ghosts)
- and your Index.dat file
- Scan for Spyware (eg Ad-aware)
- Clean your registry (eg Jouni Vuorio's RegEdit)
- Don't read Spam
- Don't click on banner ads
- Visit newsletter sites
- rather than receiving email newsletters
34Solutions
- Taking it a step further
- Surf through a proxy service (eg websafe)
- Install a router (eg Linksys, Netgear)
- Hardware Layer 3 firewall
- Anonymous IP address
- Also keeps your DSL or cable modem up