Title: The Index Poisoning Attack in P2P File Sharing Systems
1The Index Poisoning Attack in P2P File Sharing
Systems
- Keith W. Ross
- Polytechnic University
2Joint work with
3Internet Traffic
CF CacheLogic
4File Distribution Systems 2005
5Attacks on P2P: Decoying
- Two types
  - File corruption: pollution
  - Index poisoning
- Investigated in two networks
  - FastTrack/Kazaa
    - Unstructured P2P network
  - Overnet
    - Structured (DHT) P2P network
    - Part of eDonkey
6File Pollution
[Figure labels: original content, polluted content, pollution company]
7File Pollution
[Figure labels: pollution company, pollution servers, file sharing network]
8File Pollution
Unsuspecting users spread pollution!
9File Pollution
Unsuspecting users spread pollution!
Yuck
10Index Poisoning
Index (title, location):
bigparty   123.12.7.98
smallfun   23.123.78.6
heyhey     234.8.89.20
[Figure: file sharing network with hosts 23.123.78.6, 123.12.7.98, 234.8.89.20]
11Index Poisoning
Index before (title, location):
bigparty   123.12.7.98
smallfun   23.123.78.6
heyhey     234.8.89.20
Index after poisoning (title, location):
bigparty   123.12.7.98
smallfun   23.123.78.6
heyhey     234.8.89.20
bighit     111.22.22.22
[Figure: hosts 23.123.78.6, 123.12.7.98, 234.8.89.20, 111.22.22.22]
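A minimal Python sketch of the idea on this slide, assuming the index is a plain title-to-location mapping. The helper names `publish` and `search` are hypothetical, and the addresses are the ones shown in the figure. The point it illustrates is that the index accepts any advertised record without checking that the file exists at the advertised location.

```python
# Sketch only: a toy index that maps titles to advertised locations.
index = {}

def publish(title, location):
    """Record an advertisement; nothing verifies that the file exists there."""
    index.setdefault(title, []).append(location)

def search(title):
    """Return every location advertised for a title."""
    return list(index.get(title, []))

# Ordinary advertisements.
publish("bigparty", "123.12.7.98")
publish("smallfun", "23.123.78.6")
publish("heyhey", "234.8.89.20")

# Poisoning (assumed behavior): advertise a title at a location that
# does not actually serve it. The index stores the record like any other.
publish("bighit", "111.22.22.22")
```

A later `search("bighit")` returns the bogus location, so a user who picks it never obtains the file.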
14Overnet DHT
- (version_id, location) stored in nodes with ids close to version_id
- (hash_title, version_id) stored in nodes with ids close to hash_title
- First search hash_title, get version_id and metadata
- Then search version_id, get location
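The two-step lookup can be sketched in Python as follows. This is an illustration under stated assumptions, not Overnet's implementation: closeness is taken to be XOR distance, each record is stored on only the single closest node, metadata is omitted, and the 4-bit keys `hash_title` and `version_id` are made up. The node ids are the 4-bit ids from the Overnet figure.

```python
# Sketch only: toy DHT with 4-bit node ids and XOR distance (assumed metric).
NODE_IDS = [0b0001, 0b0011, 0b0100, 0b0101, 0b1000, 0b1010, 0b1100, 0b1111]
storage = {node_id: {} for node_id in NODE_IDS}

def closest_node(key):
    """Return the node id with the smallest XOR distance to key."""
    return min(NODE_IDS, key=lambda node_id: node_id ^ key)

def publish(key, value):
    """Store value under key on the node closest to key."""
    storage[closest_node(key)].setdefault(key, []).append(value)

def lookup(key):
    """Ask the node closest to key for everything stored under key."""
    return storage[closest_node(key)].get(key, [])

# Hypothetical 4-bit keys for one title and one of its versions.
hash_title = 0b0110
version_id = 0b1011

publish(hash_title, version_id)       # record (hash_title, version_id)
publish(version_id, "123.12.7.98")    # record (version_id, location)

# Step 1: search hash_title to get version ids.
versions = lookup(hash_title)
# Step 2: search each version_id to get locations.
locations = [loc for v in versions for loc in lookup(v)]
```

Because `publish` accepts any (key, value) pair, the same sketch shows where a bogus version_id or location could be inserted.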
15Overnet
[Figure: nodes with 4-bit ids 0001, 0011, 0100, 0101, 1000, 1010, 1100, 1111; steps labeled Publish, Query, Download]
16FastTrack Overlay
ON = ordinary node, SN = super node
[Figure: overlay with nodes labeled SN, ON, ON, ON]
Each SN maintains a local index
17FastTrack Query
ON = ordinary node, SN = super node
[Figure: overlay with nodes labeled SN, ON, ON, ON]
18FastTrack Download
ON = ordinary node, SN = super node
HTTP request for hash value
[Figure: overlay with nodes labeled SN, ON, ON, ON]
19FastTrack Download
ON = ordinary node, SN = super node
P2P file transfer
[Figure: overlay with nodes labeled SN, ON, ON, ON]
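The overlay, query, and download slides can be summarized in a small Python sketch. It assumes that ordinary nodes advertise their shared files to a super node, which answers keyword queries from its local index. The class `SuperNode`, its methods, and the request-line format in `download_request` are hypothetical and are not the FastTrack wire protocol.

```python
# Sketch only: one super node (SN) indexing files held by ordinary nodes (ONs).
class SuperNode:
    def __init__(self):
        # Local index: records of (title, content_hash, on_address).
        self.local_index = []

    def register(self, title, content_hash, on_address):
        """An ON advertises a file it shares to its SN (assumed behavior)."""
        self.local_index.append((title, content_hash, on_address))

    def query(self, keyword):
        """Return (content_hash, on_address) pairs whose title matches."""
        return [(h, addr) for title, h, addr in self.local_index
                if keyword.lower() in title.lower()]

def download_request(content_hash):
    """Build an illustrative HTTP request line asking a peer for a hash."""
    return f"GET /hash={content_hash} HTTP/1.1"

sn = SuperNode()
sn.register("bigparty", "a1b2", "123.12.7.98")
sn.register("smallfun", "c3d4", "23.123.78.6")

results = sn.query("bigparty")              # step 1: query the SN
request = download_request(results[0][0])   # step 2: request the hash from the ON
```

The file itself is then transferred directly between the two ordinary nodes; the super node only serves the index.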
20Attacks: How Effective?
- For a given title, what fraction of the copies are
  - Clean?
  - Poisoned?
  - Polluted?
- Brute-force approach
  - Attempt to download all versions
  - For those versions that download, listen/watch each one
- How do we determine pollution levels without downloading?
21Titles, versions, hashes, and copies
- The title is the name of the song/movie/software
- A given title can have thousands of versions
- Each version has its own hash
- Each version can have thousands of copies
- A title can also have non-existent versions, each
identified by a hash
22Definition of Pollution and Poisoning Levels
- (t, t + Δ): investigation interval
- V: set of all versions of title T
- V1, V2, V3: sets of poisoned, polluted, clean versions
- Cv: number of advertised copies of version v
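The slide text defines the sets but not the formula for the levels. One natural reading, assumed here, is that each level is the share of advertised copies that belongs to the corresponding set, for example poisoning level = (sum of Cv over V1) / (sum of Cv over V). The function name `levels` and the sample numbers are hypothetical.

```python
def levels(copies, poisoned, polluted, clean):
    """Compute copy-weighted levels for one title.

    copies: dict mapping version hash v -> Cv (advertised copies).
    poisoned, polluted, clean: the sets V1, V2, V3 of version hashes.
    """
    total = sum(copies.values())
    if total == 0:
        raise ValueError("no advertised copies for this title")

    def share(versions):
        # Fraction of all advertised copies that belong to these versions.
        return sum(copies[v] for v in versions) / total

    return {
        "poisoned": share(poisoned),
        "polluted": share(polluted),
        "clean": share(clean),
    }

# Made-up example: three versions with 50, 30, and 20 advertised copies.
example = levels({"h1": 50, "h2": 30, "h3": 20}, {"h1"}, {"h2"}, {"h3"})
```

With these sample counts, half of the advertised copies belong to the poisoned version.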
23How to Estimate?
- Need Cv, v ∈ V
- Need V1, V2, V3
- Don't want to download and listen to files!
- Solution
  - Harvest Cv, v ∈ V, and copy locations
    - Overnet: insert node, receive publish msgs
    - FastTrack: crawl
  - Heuristic for V1, V2, V3
24Copies at Users
[Figure: panels for FastTrack and Overnet]
25Heuristic
- Identify heavy and light publishers
- Hh: set of hashes from heavy publishers
- Hl: set of hashes from light publishers
[Figure: sets Hh and Hl, with regions labeled polluted versions, clean versions, and poisoned versions]
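A hedged Python sketch of how such a classification could work. Two assumptions are not stated in the slide text. First, a publisher counts as heavy when it advertises at least `heavy_threshold` versions of the title. Second, the figure is read as: hashes only in Hh are poisoned, hashes in both Hh and Hl are polluted, and hashes only in Hl are clean. The function name and sample data are hypothetical.

```python
from collections import defaultdict

def classify_versions(advertisements, heavy_threshold):
    """Classify version hashes of one title without downloading anything.

    advertisements: iterable of (publisher, version_hash) pairs.
    heavy_threshold: assumed cutoff on versions advertised per publisher.
    """
    by_publisher = defaultdict(set)
    for publisher, version_hash in advertisements:
        by_publisher[publisher].add(version_hash)

    hh, hl = set(), set()  # hashes from heavy and from light publishers
    for hashes in by_publisher.values():
        if len(hashes) >= heavy_threshold:
            hh.update(hashes)
        else:
            hl.update(hashes)

    return {
        "poisoned": hh - hl,   # advertised only by heavy publishers
        "polluted": hh & hl,   # advertised by heavy and light publishers
        "clean": hl - hh,      # advertised only by light publishers
    }

ads = [("A", "h1"), ("A", "h2"), ("A", "h3"), ("B", "h2"), ("C", "h4")]
result = classify_versions(ads, heavy_threshold=3)
```

In this made-up example publisher A is heavy, so h1 and h3 are labeled poisoned, h2 polluted, and h4 clean.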
26Heuristic: More
Heuristic is accurate and does not involve any downloading!
27FastTrack Versions
28FastTrack Copies
29Overnet Copies
30Blacklisting
- Assign reputations to /n subnets
- Bad reputation to subnets with large number of advertised copies of any title
- Obtain reputations locally; share with distributed algorithm
- Locally blacklist /n subnets with bad reputations
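The local part of this scheme might look like the Python sketch below. It only covers the local reputation step; the distributed sharing of reputations is not modeled. The function name `blacklist_subnets`, the `max_copies` cutoff, and the sample records are assumptions made for illustration.

```python
import ipaddress
from collections import Counter

def blacklist_subnets(advertised, prefix_len, max_copies):
    """Return /prefix_len subnets with too many advertised copies of a title.

    advertised: iterable of (ip, title, copies) tuples seen locally.
    max_copies: assumed cutoff above which a subnet gets a bad reputation.
    """
    per_subnet_title = Counter()
    for ip, title, copies in advertised:
        # strict=False masks the host bits, giving the enclosing subnet.
        subnet = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        per_subnet_title[(subnet, title)] += copies
    return {subnet for (subnet, _), total in per_subnet_title.items()
            if total > max_copies}

records = [
    ("111.22.22.5", "bighit", 4000),
    ("111.22.22.9", "bighit", 3000),
    ("23.123.78.6", "smallfun", 2),
]
bad = blacklist_subnets(records, prefix_len=24, max_copies=5000)
```

Aggregating by subnet rather than by single address means an attacker cannot evade the blacklist simply by spreading advertisements across neighboring addresses.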
31Blacklisting: More
32The Inverse Attack
- Attacks on P2P systems
- But can also exploit P2P systems for DDoS attacks against an innocent host
33Summary
Thank You!