Title: Towards Street-Level Client-Independent IP Geolocation
1. Towards Street-Level Client-Independent IP Geolocation
- Yong Wang, UESTC/Northwestern
- Daniel Burgener, Northwestern
- Marcel Flores, Northwestern
- Aleksandar Kuzmanovic, Northwestern
- Cheng Huang, Microsoft Research
http://networks.cs.northwestern.edu
2. Problem and Motivation
- How to accurately locate IP addresses on the Internet?
- Host-dependent solutions
- GPS
- WiFi (e.g., Google My Location, Skyhook)
- Host-independent solutions
- A server cannot always expect clients' cooperation
- Security / access restrictions
- Online service access analytics
- Location-based online advertising
3. A Scenario of Street-Level Online Advertising
User's location
Local businesses
4. Prior Work
- Constraint-Based Geolocation (CBG) [ToN '06]
- Median error distance: 228 km
- Measures delays from active vantage points
- Topology-Based Geolocation (TBG) [IMC '06]
- Median error distance: 67 km
- CBG + network topology information
- Octant [NSDI '07]
- Median error distance: 35.2 km
- CBG + router locations, geographical and demographic information
5. Methodology Highlights
- Our methodology is based on two insights
- Websites often provide the actual geographical location of associated entities
- E.g., universities, businesses, government offices, etc.
- Develop methods to determine if web or e-mail servers reside at the corresponding locations
- Relative network delays highly correlate with geographical distances
- Absolute network delay measurements are fundamentally limited in their ability to achieve fine-grained geolocation results
6. Institutional Network Example
Web cloud-sourcing
[Figure labels: mail server, web server, router, IP subnet, to external network]
7. The Role of Relative Network Delays
Measured delays
[Figure: three pairwise "<" comparisons]
8. A Case Study
- Target IP address: 38.100.25.196
- Target postal address: 1850 K Street NW, Washington, DC 20006
9. Three-Tier Geolocation System
Tier 1
Goal: Find the coarse-grained region for the targeted IP
Create intersection
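The Tier 1 intersection can be sketched as a set of distance constraints, one per vantage point: each measured round-trip time bounds how far the target can be, and the coarse region is where all the resulting circles overlap. This is a minimal sketch, not the authors' implementation. The delay-to-distance factor (about 200 km per millisecond, roughly two thirds of the speed of light) and all function names, coordinates, and RTT values are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0
# Assumed conversion factor (not taken from the slides): signals in fiber
# travel at roughly 2/3 the speed of light, about 200 km per millisecond.
KM_PER_MS = 200.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def max_distance_km(rtt_ms, km_per_ms=KM_PER_MS):
    """Upper bound on distance implied by a round-trip time (one way = RTT / 2)."""
    return (rtt_ms / 2.0) * km_per_ms


def in_intersection(point, measurements):
    """True if point lies inside every vantage point's constraint circle.

    measurements: list of ((lat, lon), rtt_ms) pairs, one per vantage point.
    """
    return all(haversine_km(point, vp) <= max_distance_km(rtt)
               for vp, rtt in measurements)


# Hypothetical vantage points (Chicago, New York, Atlanta) and RTTs in ms.
measurements = [((41.88, -87.63), 12.0),
                ((40.71, -74.01), 5.0),
                ((33.75, -84.39), 10.0)]
washington_dc = (38.90, -77.04)
los_angeles = (34.05, -118.24)
print(in_intersection(washington_dc, measurements))
print(in_intersection(los_angeles, measurements))
```

A conservative (large) conversion factor keeps the true location inside the region at the cost of a bigger intersection, which is acceptable for a coarse first tier.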
10. Three-Tier Geolocation System
Tier 2
Goal: Use passive landmarks to determine a finer-grained region for the targeted IP
Populate the intersection with landmarks
Estimate the delay between landmarks and the target
D1 + D2 < D3 + D4
Create a new intersection
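The landmark-to-target delay estimate on this slide can be sketched as follows, assuming that D1 and D2 denote the delays from the closest common router to the landmark and to the target, as seen in two traceroutes issued from the same vantage point. The function names, the hop format, and the example addresses (taken from the documentation range 192.0.2.0/24) are hypothetical.

```python
def closest_common_router(path_to_landmark, path_to_target):
    """Return the last router on the landmark path that also appears on the target path.

    Each path is a list of (router_ip, rtt_ms) hops ordered from the vantage
    point outward. Returns None when the paths share no router.
    """
    target_hops = {ip for ip, _ in path_to_target}
    common = None
    for ip, _ in path_to_landmark:
        if ip in target_hops:
            common = ip  # keep overwriting so the farthest shared hop wins
    return common


def estimate_delay(path_to_landmark, path_to_target):
    """Estimate the landmark-to-target delay as D1 + D2.

    D1: delay from the closest common router to the landmark.
    D2: delay from the closest common router to the target.
    Assumes the final hop of each path is the destination itself.
    """
    router = closest_common_router(path_to_landmark, path_to_target)
    if router is None:
        return None
    d1 = path_to_landmark[-1][1] - dict(path_to_landmark)[router]
    d2 = path_to_target[-1][1] - dict(path_to_target)[router]
    # Measurement noise can make a difference negative; clamp at zero.
    return max(d1, 0.0) + max(d2, 0.0)


# Hypothetical traceroute results (RTTs in ms).
to_landmark = [("192.0.2.1", 1.0), ("192.0.2.2", 5.0),
               ("192.0.2.10", 5.6), ("192.0.2.100", 6.1)]
to_target = [("192.0.2.1", 1.1), ("192.0.2.2", 5.2),
             ("192.0.2.20", 5.9), ("192.0.2.200", 6.0)]
print(estimate_delay(to_landmark, to_target))
```

With several vantage points, one plausible choice (an assumption, not confirmed by the slides) is to keep the smallest estimate per landmark, since the closest common router gives the tightest bound.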
11. Three-Tier Geolocation System
Tier 3
Goal: Geolocate the target IP using passive landmarks
Select the landmark with the minimum delay to the target, and associate the target's location with it.
Measured distance ? Geographical distance
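The Tier 3 selection step reduces to picking the landmark with the smallest estimated delay, which depends only on the relative ordering of delays and not on converting any delay into an absolute distance. A minimal sketch, with a hypothetical `Landmark` record and made-up names, coordinates, and delays:

```python
from typing import List, NamedTuple, Optional, Tuple


class Landmark(NamedTuple):
    name: str
    lat: float
    lon: float
    delay_ms: float  # estimated delay between this landmark and the target


def geolocate(landmarks: List[Landmark]) -> Optional[Tuple[float, float]]:
    """Return the (lat, lon) of the landmark with the minimum delay to the target."""
    if not landmarks:
        return None
    best = min(landmarks, key=lambda lm: lm.delay_ms)
    return (best.lat, best.lon)


# Hypothetical landmarks near the case-study area.
candidates = [
    Landmark("office-a", 38.9020, -77.0430, 1.9),
    Landmark("school-b", 38.9100, -77.0300, 0.7),
    Landmark("shop-c", 38.8950, -77.0500, 1.2),
]
print(geolocate(candidates))
```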
12. Remaining Issues
- Verifying landmarks
- Sweep out most of the erroneous landmarks
- Errors are still possible!
- Resilience to errors
- The larger the error, the more resilient our method is
- We prove that the likelihood that an erroneous landmark will affect the accuracy is small
13. Evaluation
- Three datasets
- PlanetLab dataset (Academic)
- Collected dataset (Residential)
- Online Maps dataset (In the wild)
- Factors that impact the accuracy
- Landmark density
- Population density
- Access networks
14. Dataset Characteristics
Urban areas
Rural areas
The three datasets cover both urban areas and
rural areas.
15. Baseline Results
Error distance (km)   PlanetLab   Residential   Online Maps   Best previous result
Median                0.69        2.25          2.11          35.2
Maximum               5.24        8.1           13.2          276.8
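To put the table in perspective, the improvement over the best previous result can be computed directly from the median values above. This short sketch uses only the numbers from the table; the variable names are illustrative.

```python
# Median error distances in km, copied from the baseline results table.
medians_km = {"PlanetLab": 0.69, "Residential": 2.25, "Online Maps": 2.11}
best_previous_km = 35.2

# Improvement factor: how many times smaller the median error is.
factors = {name: best_previous_km / value for name, value in medians_km.items()}

for name, factor in factors.items():
    print(f"{name}: {factor:.1f}x smaller median error")
```

Every dataset shows a factor above 10, ranging from roughly 15x for the Residential dataset to roughly 51x for PlanetLab.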
16. Landmark Density
Density sequence: PlanetLab > Residential > Online Maps
The more landmarks we can discover in the vicinity of a target, the higher the probability that we can accurately geolocate the targeted IP.
17. The Role of Population Density
The error distance is smallest in densely
populated areas
The error grows as the population density
decreases
Middle of nowhere
18. The Role of Access Networks
Error distance (km)   AT&T   Comcast   Verizon
Median                1.68   2.38      1.48
Cable access networks (Comcast) have a much larger latency variance than DSL networks (AT&T and Verizon)
19. Conclusions
- A geolocation system able to geolocate IP addresses with more than an order of magnitude better precision than the best previous method
- Our methodology consists of two components
- Mining landmarks from the Web and using Web or e-mail servers as landmarks
- Using relative network distances as opposed to absolute network distances
20. Thank You
http://networks.cs.northwestern.edu