Title: Privacy and anonymity
1Privacy and anonymity
- Claudia Diaz
- Katholieke Universiteit Leuven
- Dept. Electrical Engineering ESAT/COSIC
- UPC Seminar, Barcelona, November 24, 2008
2Introducing myself...
- 2000 I.T.S. de Telecomunicaciones, Universidad
de Vigo - Master thesis (PfC) partially done at KULeuven as
Erasmus student - 2000-2001 Pre-doctoral student at COSIC/KULeuven
- 2001-2005 PhD student at COSIC/KULeuven
- 2005-now Post-doc researcher at COSIC/KULeuven
- My work
- Anonymity metrics
- Anonymous communications analysis and attacks
- Traffic analysis
- Stego file systems
- Other PETs private search, negative databases,
... - http//www.esat.kuleuven.be/cdiaz
3Introducing COSIC (1/2)
- Group at the Dept. of Electrical Engineering of
the Katholieke Universiteit Leuven - COSIC COmputer Security and Industrial
Cryptography - Professors Bart Preneel, Ingrid Verbauwhede and
Vincent Rijmen - 5 post-docs (more arriving soon)
- 40 PhD students (more arriving soon)
- 5-15 visitors and external associated researchers
at any given time - Very international 15-25 nationalities, 60
non-Belgians
4Introducing COSIC (2/2)
- Research areas
- Symmetric Key Crypto
- Public Key Crypto
- Secure Hardware Implementations
- Identity Management
- Privacy Technologies
- Mobile Secure Networks
- Software Security
- Involved in 25-30 active projects
- 10 European Projects (FP6 and FP7)
- Coordinator of FP6 NoE E-CRYPT, FP7 NoE E-CRYPT
II, FP7 IP TAS3 - Biggest success Rijndael encryption algorithm
(AES)
5 a few words on the scope of the talk
- New field, in development
- Give an idea on the problems, concepts and
solutions for anonymity
6Outline
- Privacy and anonymity
- Anonymous communications
- Anonymity metrics
- Social networks
7Solove on I have nothing to hide
- "the problem with the nothing to hide argument
is its underlying assumption that privacy is
about hiding bad things. - "Society involves a great deal of friction, and
we are constantly clashing with each other. Part
of what makes a society a good place in which to
live is the extent to which it allows people
freedom from the intrusiveness of others. A
society without privacy protection would be
suffocation, and it might not be a place in which
most would want to live."
8Diffie and Landau on Internet eavesdropping
- Governments are expanding their surveillance
powers to protect against crime and terrorism. - BUT
- Secrecy, lack of transparency, and diminished
safeguards may easily lead to abuses - Will the governments monitoring tools be any
more secure than the network they are trying to
protect? If not, we run the risk that the
surveillance facilities will be subverted or
actually used against the U.S.A. - They conclude Communication is fundamental to
our species private communication is fundamental
to both our national security and our democracy.
Our challenge is to maintain this privacy in the
midst of new communications technologies and
serious national security threats. But it is
critical to make choices that preserve privacy,
communications security and the ability to
innovate. Otherwise, all hope of having a free
society will vanish.
9(No Transcript)
10Privacy properties from a technical point of view
- Anonymity
- Hiding link between identity and action
- Unlinkability
- Hiding link between two or more actions /
identities - Unobservability
- Hiding user activity
- Pseudonymity
- One-time pseudonyms / persistent pseudonyms
- Plausible deniability
- Not possible to prove user knows / has done
something
11The concept of Privacy Solove
- Privacy threats we are trying to protect against
(out of 16 identified by Solove) - Surveillance monitoring of electronic
transactions - Preventive properties anonymity, unobservability
- Interrogation forcing people to disclose
information - Preventive property plausible deniability
- Aggregation combining several sources of
information - Preventive property unlinkability
- Identification connecting data to individuals.
- Preventive properties anonymity and
unlinkability
12Anonymity Data and Communication Layers
App
App
Com
Com
IP
Alice
Bob
13Classical Security Model
- Confidentiality
- Integrity
- Authentication
- Non repudiation
- Availability
Alice
Bob
Eve
Passive / Active
14Anonymity Concept and Model
Set of Alices
Set of Bobs
15Anonymity Adversary
- Passive/Active
- Partial/Global
- Internal/External
Recipient? Third Parties?
16Anonymity Adversary
- The adversary will
- Try to find who is sending messages to whom.
- Observe
- All links (Global Passive Adversary)
- Some links
- Modify, delay, delete or inject messages.
- Control some nodes in the network.
- The adversary's limitations
- Cannot break cryptographic primitives.
- Cannot see inside nodes he does not control.
17Soft privacy enhancing technologies
- Hard privacy
- Focus on data minimization
- Adversarial data holder / service provider
- Soft privacy
- Policies, access control, liability, right to
correct information - Adversary 3rd parties, corrupt insider in honest
SP, errors - BUT user has already lost control of her data
18Other privacy-enhancing technologies
- Anonymous credentials / e-cash / ZK protocols
- Steganography / covert communication
- Censorship resistance techniques
- Anonymous publication
- Private information retrieval (PIR)
- Private search
- K-anonymity
- Location privacy
19Outline
- Privacy and anonymity
- Anonymous communications
- Anonymity metrics
- Social networks
20Concept of Mix collect inputs
Router that hides correspondence between inputs
and outputs
21Concept of Mix mix and flush
Router that hides correspondence between inputs
and outputs
22Functionality of Mixes
- Mixes modify
- The appearance of messages
- Encryption / Decryption
- Sender ? Mix1 Mix2, Rec, msgKMix2KMix1
- Padding / Compression
- Substitution of information (e.g., IP)
- The flow of messages
- Reordering
- Delaying - Real-time requirements!
- Dummy traffic - Cost of traffic!
23Pool Mixes
- Based on the mix proposed by Chaum in 1981
- Collect N inputs
- Shuffle
- Flush (Forward)
- Pool selection algorithm
- No pool / Static pool / Dynamic pool
- Influences the performance and anonymity provided
by the mix - Flushing condition
- Time / Threshold
- Deterministic / Random
Round
24Example of pool mix
Deterministic threshold static pool Mix
Pool 2 Threshold 4
25Stop-and-Go Mix
- Proposed by Kesdogan in 1998
- Reordering strategy based on delaying
- M/M/8
- Delays generated by the user from an Exponential
distribution - Timestamping to prevent active attacks
- Trusted Time Service
- Anonymity estimates based on the assumption of
Poisson incoming traffic
26Network topology
- Mixes are combined in networks in order to
- Distribute trust
- Improve availability
Cascade
Fully connected network
Restricted route network
27Cascades vs Free Route topologies
- Flexibility of routing
- Surface of attack
- Advantage free routes
- Availability
- Advantage free routes
- Intersection attacks
- Advantage cascades (anonymity set smaller but no
partitioning possible) - Trust
- Advantage free routes (more choices available to
user)
28Peer-to-peer vs client-server architectures
- Symmetric vs asymmetric systems
- Surface of attack
- Advantage peer-to-peer
- Liability issues
- Advantage client-server
- Resources / incentives / quality of service
- Advantage client-server
- Availability
- Advantage peer-to-peer
- Sybil attacks
- Advantage? Depending on admission controls (for
peers/servers)
29Deployed systems I
- Anon.penet.fi (Helsingius 1993)
- Simple proxy, substituted email headers
- Kept table of correspondences nym-email
- Brought down by legal attack in 1996
- Type I Cypherpunk remailers (Hughes, Finney1996)
- No tables (routing info in msgs themselves),
- PGP encryption (no attacks based on content)
attacks based on size are possible - Chains of mixes (distribution of trust)
- Reusable reply blocks (source of insecurity)
30Deployed systems II
- Mixmaster (Cottrell, evolving since 1995)
- Fixed size (padding / dividing large messages)
- Integrity protection measures
- Multiple paths for better reliability
- No replies
- Mixminion (Danezis, 2003)
- SURBs (Single-Use Reply Blocks)
- Packet format detection of tagging attacks
(all-or-nothing) - Forward security trail of keys, updated with
one-way functions
31Low-latency applications
- Stream-based instead of message-based
communication - Web browsing, interactive applications, voice
over IP, etc. - Bi-directional circuits
- Ephemeral session keys, onion-encrypted (forward
secrecy) - Real-time requirements
- Delaying not an option
- Proposed systems
- C-S Onion Routing, ISDN, Web Mixes
- P2P Crowds, P5 (broadcast), Herbivore (DC-nets)
- Implemented systems
- TOR, JAP
32Onion encryption
R2
R3
D
R1
R3
R2
R1
D
33Onion Routing
R2
R3
D
R1
D
R3
R2
R1
34TOR adversary model
35Anonymizing http traffic not trivial
- Difficult to conceal traffic pattern
- Difficult to pad
- Lots of padding scalability / cost problem
- Little padding not enough to conceal pattern
- Vulnerable to strong adversaries (entryexit)
- Fingerprinting attacks
- Adversary observes only user side
- Internet exchanges
36Dummy Traffic
- Fake messages/traffic introduced to confuse the
attacker - Undistinguishable from real traffic
- Dummies improve the anonymity by making more
difficult the traffic analysis - Neccessary for unobservability
- Expensive
37(Some) Attacks on mix-based systems
- Passive attacks
- Long-term intersection attacks (statistical
disclosure) - Persistent communication patterns
- Extract social network information
- Traffic correlation / confirmation
- Firgerprinting
- Source separation
- Epistemic attacks (route selection)
- Active attacks
- N-1 attacks
- Sybil
- Tagging
- Replay
- DoS
38Long-Term Intersection Attacks
- Family of attacks with many variants
- Disclosure attack (Agrawal, Kesdogan)
- Hitting set attack (Kesdogan)
- Statistical disclosure attack (Danezis,
Serjantov) - Extensions to SDA (Dingledine and Mathewson)
- Two-Sided SDA (Danezis, Diaz, Troncoso)
- Receiver-bound cover traffic (Mallesh, Wright)
- Assumption
- Alice has persistent communication relationships
(she communicates repeatedly with her friends) - Method
- Combine many observations (looking at who
receives when Alice sends) - Intuition
- If we observe rounds in which Alice sends, her
likely recipients will appear frequently - Result
- We can create a vector that expresses Alices
sending probabilities (a sending profile)
39Outline
- Privacy and anonymity
- Anonymous communications
- Anonymity metrics
- Social networks
40Definition PfiHan2000
- First clear definition of anonymity (2000)
- Anonymity is the state of being not identifiable
within a set of subjects, the anonymity set. - The anonymity set is the set of all possible
subjects who might cause an action or be
addressed. - Anonymity depends on
- The number of subjects in the anonymity set
- The probability distribution of each subject in
the anonymity set being the target
41Example computing anonymity metrics for a pool
mix
p12-1
Recipient R1
Potentially infinite subjects in the recipient
anonymity set
p22-2
Recipient R2
p32-3
Probability of recipient Ri pi2-i
Recipient R3
p42-4
Recipient R4
42Entropy information-theoretic anonymity metrics
DSCP02, SD02
- Entropy measure of the amount of information
required on average to describe the random
variable - Measure of the uncertainty of a random variable
- Increases with N and with uniformity of
distribution - Distribution with entropy H equivalent to uniform
distribution with 2H subjects - Other information theoretic metrics min-entropy,
max-entropy, Rényi entropy, relative entropy,
mutual information, ....
43Combining traffic analysis with social network
knowledge
- On the Impact of Social Network Profiling on
Anonymity DTS-PETS08 - Use of Bayesian inference to combine different
sources of information (SN knowledge, text
mining) - Results
- Combined anonymity decreases with network
growth (if only one source of information is
considered, then growing the network does not
decrease anonymity) - Anonymity degradation as more profiles become
known - ERRORS Bad quality SN profile information
- Small profile errors lead to large errors in the
results - If adversary not completely confident of SN info
? cannot have any confidence on results
44Combinatorial approaches
- Edman et al.
- Consider deanonymization for a system as a whole
(instead of individual users) - Find perfect matching inputs/outputs
- Perfect anonymity for t messages t! equiprobable
combinations - GTDPV-WPES08
- Edman et al.s metric overestimates anonymity if
users send/receive more than one message - Generalization to users sending/receiving more
than one message - Divide and conquer algorithm to compute the
metric - Upper and lower bounds on anonymity (easy to
compute)
45Outline
- Privacy and anonymity
- Anonymous communications
- Anonymity metrics
- Social networks
46A social network approach
- Most anonymity techniques focus on channels and
one-to-one relationships - Mailing lists, twikis, blogs, online social
networks - Many-to-many communication (communities)
- Shared spaces and collective content (group photo
albums, forums) - Moreover, information about third parties (not in
the community) is also leaked (pictures, stories,
references in posts) - This scenario presents new privacy challenges
47The Economics of Mass Surveillance DanWit06
- Assumptions
- If one of the club participants is under
surveillance all information shared in this club,
and the membership list of the club becomes known
to Eve - Eve has a limited budget (number of people it can
put under surveillance)
Clubs
People
48Target selection
- How best to choose those to be put under
surveillance to maximize returns in terms of
observed clubs, people and their relationships? - How does the lack of information, due to the use
of an anonymizing networks, affect the
effectiveness of such target selection? - Data mail archives of political network
2003-2006 - Clubs mailing lists (373)
- People email addresses posting more than 5
times to those mailing lists (2338) - Links number of posts of person over 3 years
(3879)
49With and without anonymizing network
- With
- Eve can only observe aggregated amount of traffic
generated by people - Eve puts under surveillance the nodes that
generate the most volume - Results 50 links known when surveying 5 of
people. To get 80 of links an adversary needs to
observe 30 of people (vs. 3)
- Without
- Eve can observe all traffic flow (not the
content) and thus construct the social graph
(links people-clubs) - Eve chooses to put under surveillance the nodes
with the highest degrees, excluding all spaces
already under surveillance - Result surveilling 8 of the people is enough to
control 100 of the clubs
50The Economics of Covert Community Detection and
Hiding Nag08
- Adversary tries to uncover communities and learn
community membership - Counter-surveillance techniques that can be
deployed by covert groups (e.g., topological
rewiring) - Our study confirms results from previous work
showing that invasion of privacy of most people
is cheap and easy - With certain counter-surveillance strategies
70 of the covert community going undetected
even when 99 of the network is under direct
surveillance.
51Conclusions
- Privacy / anonymity is an area of research
getting increasing attention - Technical, legal, sociological, psychological,
economic, political, philosophical aspects - Economics of privacy
- Crypto little overhead ? lots of security
- Anonymity lots of overhead ? a little bit of
security - Privacy invasion is
- Profitable for businesses
- Increases the power of Government
- High-latency applications (email)
- Problems with persistent user behavior
- Low-latency applications
- Insecure towards strong adversaries
- Anonymous communications are fragile if you want
to propose a new system - Check the literature
- Check known attacks
52(Some) Open Questions
- Do individuals care enough to pay the price of
privacy? Will they in the future? What is privacy
anyway? - Will privacy technology be implemented to
re-establish the tradeoffs and power balances of
the pre-information era? Or will society and
individuals redefine privacy and power tradeoffs
and adapt their behaviour to accomodate a reality
of privacy invasion and surveillance? - Anonymity metrics
- Do we really understand what is anonymity?
- Which is the best way to measure anonymity? Which
are the adequate levels? - Current metrics very limited in application
(e.g., how to measure privacy losses due to
personal data becoming public? - Real user behavioral models
- Models for adversary knowledge
- Solutions for Low-Latency Anonymous
Communication? - Solutions for Social Networks?
53Thank you!Recommended bibliography on the
subjecthttp//www.freehaven.net/anonbib/