Title: Agenda
1 Agenda
- What is (Web) data mining? And what does it have to do with privacy? A simple view
- Examples of data mining and "privacy-preserving data mining"
- Association-rule mining (privacy-preserving AR mining)
- Collaborative filtering (privacy-preserving collaborative filtering)
- A second look at ... privacy
- A second look at ... Web / data mining
- The goal: more than modelling and hiding. Towards a comprehensive view of Web mining and privacy: threats, opportunities and solution approaches.
- An outlook: data mining for privacy
2 Privacy Problems: Example 1
- Technical background of the problem
- The dataset allows for Web mining (e.g., which search queries lead to which site choices),
- it violates k-anonymity (e.g., "Lilburn" → a likely k = number of inhabitants of Lilburn)
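A minimal sketch of what "violates k-anonymity" means in practice: k is the size of the smallest group of records that share the same values on the identifying attributes. The records, attribute names and the helper `k_anonymity` are hypothetical and not taken from the dataset discussed on the slide.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the same
    values on all quasi-identifier attributes (the k of the dataset)."""
    groups = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return min(groups.values())

# Hypothetical toy records; attribute names and values are illustrative only.
records = [
    {"town": "Lilburn", "age_band": "60-69", "query": "garden supplies"},
    {"town": "Lilburn", "age_band": "60-69", "query": "dog training"},
    {"town": "Lilburn", "age_band": "30-39", "query": "used cars"},
    {"town": "Atlanta", "age_band": "30-39", "query": "concert tickets"},
    {"town": "Atlanta", "age_band": "30-39", "query": "tax forms"},
]

# Using only the town, every record hides among at least two records.
k_town = k_anonymity(records, ["town"])
# Adding a second attribute leaves one record alone in its group.
k_town_age = k_anonymity(records, ["town", "age_band"])
```

Each additional attribute that can be inferred from the queries shrinks the groups, which is why a single place name can already narrow the candidates considerably.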
3 Where do people live who will buy the Koran soon?
Privacy Problems: Example 2
- Technical background of the problem
- A mashup of different data sources
- Amazon wishlists
- Yahoo! People (addresses)
- Google Maps
- each with insufficient k-anonymity, which allows for attribute matching and thereby inferences
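Attribute matching across sources can be sketched as a join on shared attributes. The two toy sources, their field names and the helper `match_on` are hypothetical; they stand in for a wishlist service and an address directory and do not reflect any real API.

```python
# Hypothetical toy sources; names, fields and values are made up.
wishlists = [
    {"name": "A. Example", "city": "Springfield", "item": "book X"},
    {"name": "B. Sample", "city": "Shelbyville", "item": "book Y"},
]
directory = [
    {"name": "A. Example", "city": "Springfield", "street": "1 Main St"},
    {"name": "A. Example", "city": "Shelbyville", "street": "9 Oak Ave"},
    {"name": "C. Other", "city": "Springfield", "street": "5 Elm St"},
]

def match_on(left, right, keys):
    """Join two record lists on the attributes named in `keys`."""
    index = {}
    for r in right:
        index.setdefault(tuple(r[k] for k in keys), []).append(r)
    return [
        {**l, **r}
        for l in left
        for r in index.get(tuple(l[k] for k in keys), [])
    ]

# Name alone is ambiguous here; name plus city pins down one street address.
linked = match_on(wishlists, directory, ["name", "city"])
```

Neither source reveals "who wants which item at which address" on its own; the inference only appears once the records are linked.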
4 Predicting political affiliation from Facebook profile and link data (1): Most Conservative Traits
Privacy Problems: Example 3
Trait Name  Trait Value                         Weight Conservative
Group       george w bush is my homeboy         45.88831329
Group       college republicans                 40.51122488
Group       texas conservatives                 32.23171423
Group       bears for bush                      30.86484689
Group       kerry is a fairy                    28.50250433
Group       aggie republicans                   27.64720818
Group       keep facebook clean                 23.653477
Group       i voted for bush                    23.43173116
Group       protect marriage one man one woman  21.60830487
(Lindamood et al. 09; Heatherly et al. 09)
5 Predicting political affiliation from Facebook profile and link data (2): Most Liberal Traits per Trait Name
Trait Name           Trait Value              Weight Liberal
activities           amnesty international    4.659100601
Employer             hot topic                2.753844959
favorite tv shows    queer as folk            9.762900035
grad school          computer science         1.698146579
hometown             mumbai                   3.566007713
Relationship Status  in an open relationship  1.617950632
religious views      agnostic                 3.15756412
looking for          whatever i can get       1.703651985
(Lindamood et al. 09; Heatherly et al. 09)
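The slides do not say how the weights were computed. As a rough illustration only, and not necessarily the cited authors' method, a trait weight could be a smoothed ratio of how often a trait occurs in one group versus the other. The function `trait_weights`, the smoothing constant and the toy profiles with neutral labels "A" and "B" are all assumptions.

```python
from collections import Counter

def trait_weights(profiles, label, smoothing=1.0):
    """Smoothed ratio P(trait | label) / P(trait | other labels) per trait.
    This is an assumed, illustrative definition of a "weight"."""
    in_counts, out_counts = Counter(), Counter()
    n_in = n_out = 0
    for traits, lab in profiles:
        if lab == label:
            in_counts.update(traits)
            n_in += 1
        else:
            out_counts.update(traits)
            n_out += 1
    return {
        t: ((in_counts[t] + smoothing) / (n_in + 2 * smoothing))
        / ((out_counts[t] + smoothing) / (n_out + 2 * smoothing))
        for t in set(in_counts) | set(out_counts)
    }

# Hypothetical toy profiles: (set of (trait name, trait value), label).
profiles = [
    ({("group", "g1"), ("hometown", "h1")}, "A"),
    ({("group", "g1")}, "A"),
    ({("group", "g2"), ("hometown", "h1")}, "B"),
    ({("group", "g2")}, "B"),
]

weights_a = trait_weights(profiles, "A")
```

Under this definition a weight well above 1 marks a trait as indicative of the label, a weight near 1 as uninformative.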
6 "Privacy-preserving Web mining" example: find patterns, unlink personal data
- Volvo S40 website targets people in their 20s
- Are visitors in their 20s or 40s?
- Which demographic groups like/dislike the website?
- An example of the "Randomization Approach" to PPDM
- R. Agrawal and R. Srikant, "Privacy Preserving Data Mining", SIGMOD 2000.
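7 Randomization Approach Overview
[Figure: original records (Age | Salary | ...: 50 | 40K | ..., 30 | 70K | ..., ...) each pass through a Randomizer, yielding randomized records (65 | 20K | ..., 25 | 60K | ..., ...). From these, the distributions of Age, Salary, ... are reconstructed and passed to the data mining algorithms, which produce the model.]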
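The pipeline in the figure can be sketched as follows: each value is perturbed with random noise before it is collected, and the miner then estimates the distribution of the original values from the noisy ones. This is a simplified, binned variant of an iterative Bayesian reconstruction, not the cited paper's exact procedure; the uniform noise, the bin layout, the sample sizes and all function names are assumptions.

```python
import random

def randomize(values, spread, rng):
    """Perturb each value with independent uniform noise in [-spread, spread]."""
    return [v + rng.uniform(-spread, spread) for v in values]

def likelihood(w, lo, hi, spread):
    """Density of observing randomized value w if the original value was
    uniformly distributed within the bin [lo, hi)."""
    overlap = max(0.0, min(hi, w + spread) - max(lo, w - spread))
    return overlap / ((hi - lo) * 2 * spread)

def reconstruct(randomized, bins, spread, iterations=100):
    """Iteratively re-estimate the share of original values in each bin."""
    est = [1.0 / len(bins)] * len(bins)  # start from a uniform guess
    for _ in range(iterations):
        new = [0.0] * len(bins)
        for w in randomized:
            # Posterior over bins for this single randomized value.
            post = [likelihood(w, lo, hi, spread) * p
                    for (lo, hi), p in zip(bins, est)]
            total = sum(post)
            if total > 0.0:
                for i, value in enumerate(post):
                    new[i] += value / total
        norm = sum(new)
        est = [x / norm for x in new]
    return est

# Hypothetical visitors: 700 in their 20s, 300 in their 40s.
rng = random.Random(0)
ages = ([rng.uniform(20, 30) for _ in range(700)]
        + [rng.uniform(40, 50) for _ in range(300)])
noisy = randomize(ages, spread=15, rng=rng)      # what the site would collect
bins = [(lo, lo + 10) for lo in range(0, 70, 10)]
shares = reconstruct(noisy, bins, spread=15)     # estimated share per decade
```

No individual age in `noisy` can be trusted, yet `shares` still answers the aggregate question of whether visitors are mostly in their 20s or their 40s.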
8 Seems to work well!
9 What is collaborative filtering?
- "People like what people like them like"
- regardless of support and confidence
10 User-based Collaborative Filtering
- Idea: People who agreed in the past are likely to agree again
- To predict a user's opinion for an item, use the opinion of similar users
- Similarity between users is decided by looking at their overlap in opinions for other items
- Next step: build a model of user types → "global model" rather than "local patterns" as mining result
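The first three bullets can be sketched as a similarity-weighted average. The toy ratings and the choice of cosine similarity over co-rated items are assumptions; other similarity measures (e.g., Pearson correlation) are equally common.

```python
from math import sqrt

# Hypothetical toy ratings (user -> item -> rating on a 1-5 scale).
ratings = {
    "ann":  {"a": 5, "b": 4, "c": 1},
    "bob":  {"a": 5, "b": 5, "c": 1, "d": 4},
    "cara": {"a": 1, "b": 2, "c": 5, "d": 1},
}

def similarity(u, v):
    """Cosine similarity over the items both users have rated."""
    shared = set(ratings[u]) & set(ratings[v])
    if not shared:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in shared)
    norm_u = sqrt(sum(ratings[u][i] ** 2 for i in shared))
    norm_v = sqrt(sum(ratings[v][i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)

def predict(user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    pairs = [
        (similarity(user, v), r[item])
        for v, r in ratings.items()
        if v != user and item in r
    ]
    total = sum(s for s, _ in pairs)
    return sum(s * r for s, r in pairs) / total if total else None

# ann has not rated "d"; bob (similar to ann) liked it, cara (dissimilar) did not.
prediction = predict("ann", "d")
```

The prediction is pulled towards bob's rating because bob's past opinions overlap more with ann's than cara's do.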
11 1. Privacy as confidentiality: "the right to be let alone" and to hide data
Data
Is this all there is to privacy?
12 2. Privacy as control: informational self-determination
Data
Don't do THIS!
- e.g. data privacy: "the right of the individual to decide what information about himself should be communicated to others and under what circumstances" (Westin, 1970)
- behind much of data-protection legislation (see Eleni Kosta's talk)
13 Discussion item: What is this an example of?
Tracing anonymous edits in Wikipedia
http://wikiscanner.virgil.gr/
14 Method: Attribute matching
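A minimal sketch of attribute matching in this setting, assuming that anonymous edits are logged with the editor's IP address and that a separate table maps IP ranges to organisations. The ranges below are documentation-only addresses, and the owners, articles and the helper `owner_of` are hypothetical.

```python
import ipaddress

# Hypothetical data: documentation-only IP ranges and made-up owners.
range_owners = {
    "192.0.2.0/24": "Organisation A",
    "198.51.100.0/24": "Organisation B",
}
anonymous_edits = [
    {"article": "Article 1", "ip": "192.0.2.17"},
    {"article": "Article 2", "ip": "203.0.113.5"},
]

def owner_of(ip):
    """Return the owner of the first range containing `ip`, or None."""
    addr = ipaddress.ip_address(ip)
    for cidr, owner in range_owners.items():
        if addr in ipaddress.ip_network(cidr):
            return owner
    return None

# The matching attribute is the IP address: it links an edit to an organisation.
attributed = [{**e, "owner": owner_of(e["ip"])} for e in anonymous_edits]
```

An edit is "anonymous" only with respect to the user name; the IP address acts as the shared attribute between the two sources.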
15 Results (an example)