Title: Geolocation by IP address
1Geolocation by IP address
Sándor Laki lakis_at_inf.elte.hu http://lakis.web.elte.hu
2Outline
- RADAR: a wireless solution
- IP2Geo: on the Internet
- Constraint-Based Geolocation
- (GeoLim)
- OCTANT framework
- Topology-Based Geolocation
3RADAR
4RADAR, a wireless approach
- Focus on the indoor environment
- GPS does not work indoors
- Dedicated technologies
- Goals
- Leverage existing infrastructure
- Use wireless LAN
- Software solution
- Better scalability and lower cost than dedicated
technology
5RADAR
- Key idea: signal strength matching
- Offline calibration
- Construct radio map of <location, SStr> tuples
- Real-time location and tracking
- Extract SStr from beacons
- Find table entry that best matches the measured
SStr
6RADAR Determine location
- Find nearest neighbor in signal space (NNSS)
- 1st solution
- Physical position of NNSS gives the user location
- 2nd solution
- K-NNSS
- Averaging the coordinates of the k nearest neighbors gives the estimated position
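The NNSS and K-NNSS lookups can be sketched as follows. This is a minimal illustration, not RADAR's implementation: the radio map values, the three base stations, and the function name `knnss` are made up for the example, and Euclidean distance in signal space is assumed.

```python
import math

# Hypothetical radio map: (x, y) position in metres -> signal strength (dBm)
# heard from three base stations. The values are invented for illustration.
RADIO_MAP = [
    ((0.0, 0.0), (-40.0, -70.0, -75.0)),
    ((10.0, 0.0), (-55.0, -60.0, -72.0)),
    ((0.0, 10.0), (-58.0, -74.0, -57.0)),
    ((10.0, 10.0), (-65.0, -62.0, -55.0)),
]

def knnss(measured, radio_map, k=1):
    """Average the positions of the k entries closest to `measured` in signal space."""
    ranked = sorted(radio_map, key=lambda entry: math.dist(entry[1], measured))
    nearest = ranked[:k]
    x = sum(pos[0] for pos, _ in nearest) / len(nearest)
    y = sum(pos[1] for pos, _ in nearest) / len(nearest)
    return (x, y)

# k=1 is plain NNSS; larger k averages several neighbours.
print(knnss((-41.0, -69.0, -74.0), RADIO_MAP, k=1))
print(knnss((-41.0, -69.0, -74.0), RADIO_MAP, k=4))
```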
7Correlation between physical location and signal
strength
- Base system
- INFOCOM 2000 paper
- Enhanced system
- Microsoft Technical Report MSR-TR-2000-12
8IP2Geo
- Single-point localization
9IP2Geo - Motivation
- Much focus on location-aware services in wireless and mobile contexts
- Such services are relevant in the Internet context too
- targeted advertising
- event notification
- territorial rights management
- network diagnostics
- It is a challenging problem
- IP address does not inherently contain an
indication of location
10IP2Geo
- Multi-pronged approach that exploits various properties of the Internet
- DNS names of router interfaces often indicate location
- network delay tends to correlate with geographic distance
- hosts that are aggregated for the purposes of Internet routing also tend to be clustered geographically
- GeoTrack
- determine location of closest router with a recognizable DNS name
- GeoPing
- use delay measurements to estimate location
- GeoCluster
- extrapolate partial (and possibly inaccurate) IP-to-location mapping information using BGP prefix clusters
11GeoTrack main idea
- Extract geographical information from DNS names of routers on the path
- Localizes the target to the last router whose position is known
- Example
- ngcore1-serial8-0-0-0.Seattle.cw.net -> Seattle
- 184.atm6-0.xr2.ewr1.alter.net -> New York
- dnvr-scrm.abilene.ucaid.edu -> Denver
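A rough sketch of the name-parsing step, assuming a tiny hand-made code table (`CITY_CODES`) and a generic tokenizer. The real system relies on a partitioned city code database and ISP-specific rules, which are not reproduced here; the function names are hypothetical.

```python
import re

# Hypothetical code table covering only the three example names above.
CITY_CODES = {"seattle": "Seattle", "ewr": "New York", "dnvr": "Denver"}

def location_from_dns(name):
    """Return the city for the first recognizable token in a router name, else None."""
    for label in name.lower().split("."):
        for token in re.split(r"[-_]", label):
            token = token.rstrip("0123456789")   # "ewr1" -> "ewr"
            if token in CITY_CODES:
                return CITY_CODES[token]
    return None

def geotrack(hop_names):
    """Location of the last router on the path whose name is recognizable."""
    last = None
    for name in hop_names:
        loc = location_from_dns(name)
        if loc is not None:
            last = loc
    return last

path = ["ngcore1-serial8-0-0-0.Seattle.cw.net",
        "dnvr-scrm.abilene.ucaid.edu",
        "unnamed-router.example.net"]
print(geotrack(path))
```

A flat table like this is prone to false matches, which is why a generic tokenizer alone is not enough in practice.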
12GeoTrack
- GeoTrack operation
- do a traceroute to the target IP address
- determine location of last recognizable router along the path
- Key ideas in GeoTrack
- partitioned city code database to minimize chance of false match
- ISP-specific parsing rules
- delay-based correction
- Limitations
- routers may not respond to traceroute
- DNS name may not contain location information or lookup may fail
- target host may be behind a proxy or a firewall
13GeoPing - Delay based localization
- Delay-based triangulation is conceptually simple
- delay to distance
- distance from 3 or more non-collinear points -> target location
- But there are practical difficulties
- network path may be circuitous
- transmission and queuing delays may corrupt the delay estimate
- one-way delay (OWD) is hard to measure
- OWD ≠ RTT/2 because of routing asymmetry
14GeoPing - details
- Measure the network delay to the target host from several geographically distributed probes
- typically more than 3 probes are used
- round-trip delay measured using ping utility
- small-sized packets -> transmission delay is negligible
- pick minimum among several delay samples
- Nearest Neighbor in Delay Space (NNDS)
- akin to Nearest Neighbor in Signal Space (NNSS) in RADAR
- construct a delay map containing (delay vector, location) tuples
- given a vector of delay measurements, search through the delay map for the NNDS
- location of the NNDS is our estimate for the location of the target host
- More robust than directly trying to map from delay to distance
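The two steps above (take the minimum of several samples per probe, then find the NNDS) might look like this. The delay map, the three probes, and all numbers are invented; Euclidean distance in delay space is an assumption of this sketch.

```python
import math

# Hypothetical delay map: known location -> minimum RTT (ms) seen from 3 probes.
DELAY_MAP = {
    "Seattle": (5.0, 70.0, 45.0),
    "New York": (72.0, 6.0, 40.0),
    "Denver": (35.0, 45.0, 8.0),
}

def min_delay_vector(samples_per_probe):
    """Keep the smallest sample per probe to filter out queuing delay."""
    return tuple(min(samples) for samples in samples_per_probe)

def nnds(delay_vector, delay_map):
    """Location whose stored delay vector is closest to the measured one."""
    return min(delay_map, key=lambda loc: math.dist(delay_map[loc], delay_vector))

samples = [[9.1, 7.0, 30.2], [75.0, 71.5], [44.0, 48.3, 43.2]]
vector = min_delay_vector(samples)
print(vector, nnds(vector, DELAY_MAP))
```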
15GeoPing Delay tends to increase with geographic
distance
16GeoPing Estimation error
17GeoCluster
18GeoCluster
- A passive technique, unlike GeoTrack and GeoPing
- Basic idea
- break the IP address space into clusters
- assign a geographical location to each cluster based on third-party IP-to-location databases
- given a target IP address, first find the matching cluster using longest-prefix match
- location of matching cluster is our estimate of host location
19GeoCluster
- Example
- consider the cluster 128.95.0.0/16 (containing 65536 IP addresses)
- suppose we know that the location corresponding to a few IP addresses in this cluster is Seattle
- then given a new address, say 128.95.4.5, we deduce that it is likely to be in Seattle too
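The lookup in this example can be sketched with Python's standard `ipaddress` module. The cluster table is hypothetical (the /8 entry is added only to show that the longest prefix wins), and a linear scan stands in for the trie a real router table would use.

```python
import ipaddress

# Hypothetical cluster table: prefix -> location assigned to the cluster.
CLUSTERS = {
    ipaddress.ip_network("128.95.0.0/16"): "Seattle",
    ipaddress.ip_network("128.0.0.0/8"): "unknown (coarse)",
}

def lookup(ip, clusters):
    """Return the location of the longest prefix containing `ip`, or None."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in clusters if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return clusters[best]

print(lookup("128.95.4.5", CLUSTERS))
print(ipaddress.ip_network("128.95.0.0/16").num_addresses)
```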
20GeoCluster Clustering IP addresses
- Exploit the hierarchical nature of Internet routing
- inter-domain routing in the Internet uses the Border Gateway Protocol (BGP)
- BGP operates on address aggregates
- we treat these aggregates as clusters
- in all we had about 100,000 clusters of different sizes
21IP-to-location mapping
- Data sources
- e-mail service, business web-hosting companies, etc.
- requires a large, fine-grained and fresh database!
- Information
- partial information (i.e., only for a small subset of addresses)
- possibly inaccurate (e.g., manual input from user)
22Extrapolating IP-to-location mapping
- Determine location most likely to correspond to a cluster
- majority polling
- average location
- dispersion is an indicator of our confidence in the location estimate
- What if there is a large geographic spread in locations?
- some clusters correspond to large ISPs and the internal subdivisions are not visible at the BGP level
- sub-clustering algorithm: keep sub-dividing clusters until there is sufficient consensus in the individual sub-clusters
- some clients connect via proxies or firewalls (e.g., AOL clients)
- sub-clustering may help if there are local or regional proxies
- otherwise large dispersion -> no location estimate made
- many tools fail in this regard
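Majority polling with sub-clustering could be sketched as below. The consensus threshold, the minimum number of data points, the /24 stopping rule, and the halving strategy are all assumptions of this sketch, not parameters taken from GeoCluster.

```python
import ipaddress
from collections import Counter

def subcluster(prefix, known, consensus=0.7, min_points=2, max_prefixlen=24):
    """Split `prefix` until each piece has a clear majority location.

    known: dict mapping IP string -> location label (partial, possibly noisy).
    Returns {network: location}; pieces without consensus get no estimate.
    """
    net = ipaddress.ip_network(prefix)
    inside = [loc for ip, loc in known.items() if ipaddress.ip_address(ip) in net]
    if len(inside) < min_points:
        return {}                       # too little data: no estimate
    location, votes = Counter(inside).most_common(1)[0]
    if votes / len(inside) >= consensus:
        return {net: location}          # sufficient consensus
    if net.prefixlen >= max_prefixlen:
        return {}                       # still dispersed: no estimate
    result = {}
    for half in net.subnets(prefixlen_diff=1):
        result.update(subcluster(str(half), known, consensus, min_points, max_prefixlen))
    return result

# Invented data: one /16 whose two halves sit in different cities.
known = {"10.0.1.1": "Seattle", "10.0.2.2": "Seattle", "10.0.3.3": "Seattle",
         "10.0.129.1": "Boston", "10.0.130.1": "Boston", "10.0.200.9": "Boston"}
print(subcluster("10.0.0.0/16", known))
```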
23Performance of GeoCluster
Median errors: GeoCluster 30 km, GeoPing 300 km, GeoTrack 100 km
24Other database-oriented applications
- NetGeo and IP2LL
- based on WHOIS DB
- not closely regulated
- the address information often indicates the head office of the owner, which may be far from the actual target
- Quova
- Commercial service with their own database
- Gtrace
- using DNS LOC entries
25Octant framework
- A very impressive solution
26Octant overview
- Combine very different techniques
- Active and passive
- Constraint-based
- Weighted positive and negative constraints
- Constraint -> region
- Using Bézier-regions
- Efficient implementations of clipping and union
operations are available
27Octant - Notations
- $\beta_i$: the region in which the target node $i$ is located
- $\gamma_j$: a constraint
- It is a region where the node might reside, associated with a weight
- Set of nodes
- Landmarks: physical locations are at least partially known ($L_j$)
- Every $L_j$ has an estimated location $\beta_{L_j}$
28Octant Landmarks and constraints
- Primary landmark
- GPS, street address
- Low error
- Secondary landmark
- Position computed by Octant itself
- Positive constraints (set $\Omega$)
- Node A is within d miles of $L_k$
- $\gamma = \bigcup_{(x,y) \in \beta_k} c(x,y,d)$, where $c(x,y,d)$ is a disc.
- Negative constraints (set $\Phi$)
- Node A is further than d miles from $L_k$
- $\gamma = \bigcap_{(x,y) \in \beta_k} c(x,y,d)$
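To make the combination of positive and negative constraints concrete, here is a simplified sketch. It assumes each landmark location is a single point (so each constraint is one disc), ignores weights, and rasterizes the plane onto a grid instead of using Bézier regions; none of this reflects how Octant itself represents regions.

```python
import math

def estimate_region(positive, negative, grid):
    """Grid points inside every positive disc and outside every negative disc.

    positive / negative: lists of ((x, y), d) with point landmarks (assumption).
    """
    region = set()
    for p in grid:
        inside_all = all(math.dist(p, centre) <= d for centre, d in positive)
        outside_all = all(math.dist(p, centre) > d for centre, d in negative)
        if inside_all and outside_all:
            region.add(p)
    return region

grid = [(x, y) for x in range(11) for y in range(11)]
positive = [((0, 0), 5), ((6, 0), 5)]   # within 5 units of both landmarks
negative = [((3, 0), 2)]                # further than 2 units from this one
region = estimate_region(positive, negative, grid)
print(sorted(region))
```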
29Estimated location
30Mapping latencies to distances
- Latency between a target and a landmark
- bounds their maximum distance
- Calculate with speed of light
- delay × 2/3 c
- Low precision
- Octant's way
- Dynamic calibration
- For each landmark L compute two bounds $R_L(d)$ and $r_L(d)$
- where d is the ping time of node i
- $r_L(d) \le \lVert loc(L) - loc(i) \rVert \le R_L(d)$
- When queuing delays are dominant, $r_L(d) = 0$.
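As a worked example of the speed-of-light bound, the sketch below converts a ping time into a maximum distance. It assumes the measured delay is a round-trip time and halves it; since the distance is bounded by both the forward and the reverse one-way delay, RTT/2 still gives a valid upper bound even with asymmetric routing. The function name is made up.

```python
C_KM_PER_MS = 299.792458   # speed of light in vacuum, km per millisecond

def max_distance_km(rtt_ms, factor=2.0 / 3.0):
    """Upper bound on distance: one-way delay times 2/3 of the speed of light."""
    one_way_ms = rtt_ms / 2.0
    return one_way_ms * factor * C_KM_PER_MS

# A 10 ms round trip bounds the target to roughly 1000 km from the landmark.
print(round(max_distance_km(10.0), 1))
```

The bound is loose: a circle of about 1000 km radius for a 10 ms ping is why the slide calls this low precision.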
31Mapping latencies to distances
- Each landmark periodically pings all other landmarks -> creating a correlation table
- Determines the convex hull around the points -> R(d) and r(d)
- It is sufficient when the target has a direct and congestion-free path to the landmark
- Octant introduces a cutoff at latency p
- a tunable percentage of landmarks lie to the left of p
- discard the others
- (z is a fictitious datapoint, placed far away)
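The convex-hull calibration can be sketched as follows: the upper hull of the (latency, distance) points gives R(d) and the lower hull gives r(d). This is an illustration using a standard monotone-chain hull; the cutoff at p and the fictitious point z are not modeled, values outside the observed latency range are simply clamped, and the data points are invented.

```python
def _cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def _chain(pts, sign):
    chain = []
    for p in pts:
        while len(chain) >= 2 and sign * _cross(chain[-2], chain[-1], p) <= 0:
            chain.pop()
        chain.append(p)
    return chain

def hull_bounds(points):
    """points: (latency, distance) pairs. Returns (lower_chain, upper_chain)."""
    lo, hi = {}, {}
    for lat, dist in points:
        lo[lat] = min(dist, lo.get(lat, dist))
        hi[lat] = max(dist, hi.get(lat, dist))
    lower = _chain(sorted(lo.items()), +1)   # gives r(d)
    upper = _chain(sorted(hi.items()), -1)   # gives R(d)
    return lower, upper

def interpolate(chain, d):
    """Piecewise-linear value of a hull chain at latency d, clamped at the ends."""
    if d <= chain[0][0]:
        return chain[0][1]
    if d >= chain[-1][0]:
        return chain[-1][1]
    for (x0, y0), (x1, y1) in zip(chain, chain[1:]):
        if x0 <= d <= x1:
            return y0 + (y1 - y0) * (d - x0) / (x1 - x0)

# Invented inter-landmark measurements: (latency in ms, distance in km).
points = [(10, 500), (20, 1500), (20, 900), (30, 1600), (40, 3000), (40, 2000)]
lower, upper = hull_bounds(points)
print(interpolate(lower, 30), interpolate(upper, 30))
```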
32Mapping latencies to distances
33Last hop delays
- Mapping is further complicated by queuing and transmission delays associated with the last hop
- Cable and DSL connections
- Overloaded PlanetLab nodes
- Goal: isolate the delay components which artificially inflate latencies
- Detailed maps of the underlying physical network, as in network tomography (not in Octant)
- Octant introduces a simple metric called height
34Last hop delays in Octant
- Based on pair-wise latency measurements between landmarks
- Primary landmarks a, b, c
- Measure their latencies (RTT): [a,b], [a,c], [b,c]
- The positions of primary landmarks are known -> we can estimate the transmission delays (a,b), (a,c), (b,c)
- Last-hop delay(a,b) = [a,b] - (a,b)
- Landmark coordinates (alon, alat),
35Last hop delays in Octant
- How much of the delays can be attributed to each landmark?
- Denoted by a', b' and c' // height
- Similarly, for a target t, we can compute t' as an estimate
- We can solve for
- t', tlon, tlat
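For three primary landmarks the heights can be computed in closed form. The sketch assumes the excess delay of each pair (measured RTT minus estimated transmission delay) is simply the sum of the two endpoint heights; that additive split and the function name are assumptions of this example, and the numbers are invented.

```python
def heights(excess_ab, excess_ac, excess_bc):
    """Solve a+b = excess_ab, a+c = excess_ac, b+c = excess_bc for the heights."""
    a = (excess_ab + excess_ac - excess_bc) / 2.0
    b = (excess_ab + excess_bc - excess_ac) / 2.0
    c = (excess_ac + excess_bc - excess_ab) / 2.0
    return a, b, c

# Invented numbers (ms): measured RTT minus the distance-based delay estimate.
excess_ab = 30.0 - 20.0
excess_ac = 42.0 - 30.0
excess_bc = 29.0 - 15.0
print(heights(excess_ab, excess_ac, excess_bc))
```

With more than three landmarks the system is over-determined and would need a least-squares fit rather than this direct solution.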
36Last hop delays
- tlon and tlat have relatively high error
- not used in the later stages
- Given the target and landmark heights
- Each landmark can shift its
- RL up if the target height < heights of the other landmarks
- rL down if the target height > heights of the other landmarks
37Indirect routes
- The preceding assumption
- Route lengths are proportional to great circle distances
- not the case in practice, due to policy routing
- Example: a subscriber in Ithaca, NY -> Cornell Univ. (Ithaca)
- Syracuse, NY -> Brockport, IL -> New York City -> Cornell Univ.
- 1 mile physical distance vs. an 800-mile path
38Indirect routes discovery
- Landmarks' height can indicate
- Localizing routers on the network path
- Secondary landmarks
- Localization by latencies
- Extract location from router names
- Reverse DNS lookup: undns tool
- Using ZIP code to determine geographical location
39Handling uncertainty
- Filter out erroneous constraints
- Latency-based constraints
- Weight system that decreases exponentially with increasing latency
- Weight threshold
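One possible shape for such a weighting scheme is shown below. The slides do not give the actual formula, so the exponential form `exp(-latency / scale)`, the 50 ms scale, and the 0.1 threshold are purely illustrative assumptions.

```python
import math

def constraint_weight(latency_ms, scale_ms=50.0):
    """Weight that decays exponentially with latency (hypothetical scale)."""
    return math.exp(-latency_ms / scale_ms)

def keep_constraints(constraints, threshold=0.1, scale_ms=50.0):
    """Drop constraints whose weight falls below the threshold."""
    return [c for c in constraints
            if constraint_weight(c["latency_ms"], scale_ms) >= threshold]

constraints = [{"landmark": "L1", "latency_ms": 20.0},
               {"landmark": "L2", "latency_ms": 200.0}]
print(keep_constraints(constraints))
```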
40Iterative refinement
- Two phases
- First, we use accurate and mostly conservative constraints
- Second, less accurate and more aggressive constraints to obtain a better estimate (inside the initial estimated region)
41Results
42Results
43Topology-based Geolocation
44Motivations
- Problems with CBG
- Uses constraints that are less than the speed of light
- Risk of underestimates
- When an underestimate occurs, the final region does not contain the true location
- Topology-based geolocation
- using the speed of light to generate constraints
- inspired by Sensor Network Localization
45Summary of techniques
- Traceroute from landmarks
- Map topology
- Estimate hop latency
- Improve accuracy
- Cluster network interfaces
- Increase structuring
- Validate location hints
- Incorporate location hints
- Constraint optimization
- Geolocate targets
46Estimate hop latencies
- Using the traceroute tool to infer link latency
- Estimate hop latency from the difference in RTT to adjacent routers
- Accurate only if the link is traversed in both directions (symmetric routing)
- How can we discover this property?
- Three different techniques
47Estimate hop latencies
- First, observing the reverse TTL values
- Most routers initialize the TTL values for their packets from a small set.
- 30, 32, 64, 128, 150, 255
- If TTL values change significantly from one node to the next -> discard the link estimate
- Second, measuring paths in both directions between pairs of landmarks
- If both paths traverse a particular link -> taking the differences of measurements to the two endpoints
- This estimate has high confidence
- Third, increasing vantage points from which we probe a certain link
- For every link on a path from a landmark we send probes to both endpoints from all other landmarks
- If these probes pass over the link -> estimate for the link
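The first technique (reverse TTL) combined with the RTT-difference estimate might be sketched like this. Several details are assumptions of the sketch: the initial TTL is guessed as the smallest listed value not below the received one, a change of more than one reverse hop counts as "significant", and the RTT difference is halved to get a one-way link latency.

```python
INITIAL_TTLS = (30, 32, 64, 128, 150, 255)   # the small set listed above

def reverse_hops(received_ttl):
    """Reverse-path hop count, guessing the smallest initial TTL >= received."""
    initial = min(t for t in INITIAL_TTLS if t >= received_ttl)
    return initial - received_ttl

def link_estimates(hops, max_jump=1):
    """hops: (rtt_ms, received_ttl) per traceroute hop, in path order.

    Returns one-way latency estimates per link, or None where the reverse
    hop count jumps by more than `max_jump` (hypothetical threshold).
    """
    estimates = []
    for (rtt0, ttl0), (rtt1, ttl1) in zip(hops, hops[1:]):
        jump = abs(reverse_hops(ttl1) - reverse_hops(ttl0))
        if jump > max_jump:
            estimates.append(None)               # reverse path changed: discard
        else:
            estimates.append((rtt1 - rtt0) / 2.0)
    return estimates

hops = [(2.0, 254), (10.0, 253), (30.0, 245)]
print(link_estimates(hops))
```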
48Clustering interfaces
- Clustering interfaces that belong to the same
router (IP aliases)
49Clustering interfaces
- Two IP-alias techniques
- Mercator
- UDP probes are sent to high-numbered ports on a set of interfaces
- Routers send back a port-unreachable ICMP message with the source address
- If two different interfaces reply with the same source address -> aliases
- Ally
- Used on pairs of interfaces
- Sends probes to the two interfaces
- Examines the IP-ID
- Most routers generate the IP-ID using a single counter that is incremented after each packet has been created
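Both checks can be illustrated on already-collected probe results (no packets are sent here). The grouping function mirrors the Mercator idea; the IP-ID test is only in the spirit of Ally: the probe order A, B, A, the `max_gap` of 200, and the function names are assumptions of this sketch. The IP-ID field is 16 bits wide, so gaps are computed modulo 65536.

```python
from collections import defaultdict

def mercator_groups(replies):
    """replies: probed interface -> source address of the ICMP reply.

    Interfaces answered from the same source address are grouped as aliases.
    """
    groups = defaultdict(set)
    for probed, source in replies.items():
        groups[source].add(probed)
    return [g for g in groups.values() if len(g) > 1]

def ipid_gap(earlier, later):
    """Forward distance between two 16-bit IP-ID values, with wrap-around."""
    return (later - earlier) % 65536

def looks_like_alias(id_a1, id_b, id_a2, max_gap=200):
    """True if IP-IDs from probes A, B, A look drawn from one shared counter."""
    g1 = ipid_gap(id_a1, id_b)
    g2 = ipid_gap(id_b, id_a2)
    return 0 < g1 <= max_gap and 0 < g2 <= max_gap

replies = {"10.0.0.1": "10.0.0.1", "10.0.1.1": "10.0.0.1", "10.0.2.1": "10.0.2.1"}
print(mercator_groups(replies))
print(looks_like_alias(1000, 1003, 1007), looks_like_alias(1000, 40000, 1004))
```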
50Validating location hints
- DNS names -> locations
- Some names are incorrect
- Misnamed, reconfiguration, reassignment of IP addresses
- Topology constraints can be used to verify location hints
- RTT measurements -> upper bounds
- Clustering -> aliases
- Hop latencies
51Constraint optimization
- Targets: $X = \{x_i\}$; Landmarks: $L = \{l_i\}$
- Distance between i and j: $d(i,j)$
- Hard delay constraint
- $d(l_i, x_j) < c_{ij}$ // $c_{ij}$: speed of light
- Set of hard delay constraints: $C_d$
- Soft link latency constraints
- Hop latency between i and j: $h_{ij}$
- $d(x_i, x_j) = h_{ij} + e_{ij}$
- where $e_{ij}$ is some error
- Set of soft link latency constraints: $C_l$
52Constraint optimization
- Minimize $\sum_{(i,j) \in C_l} e_{ij}$
- Subject to $C_d$, $C_l$
- Not a convex optimization problem
- But we can recast it as a semidefinite program
- Using fast solvers
- SeDuMi
- Vivaldi
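The sketch below does not solve the optimization; it only evaluates a candidate placement, which may help in reading the formulation. It assumes planar coordinates, that delays have already been converted to the same distance units as the coordinates, and that the error term is taken as an absolute value. All names and numbers are hypothetical.

```python
import math

def evaluate(positions, landmarks, hard, soft):
    """Check hard constraints and sum soft-constraint errors for a placement.

    positions: target name -> (x, y); landmarks: landmark name -> (x, y)
    hard: (landmark, target, c_ij) triples; soft: (i, j, h_ij) triples.
    Returns (feasible, total_error).
    """
    where = {**landmarks, **positions}
    feasible = all(math.dist(where[l], where[x]) < c for l, x, c in hard)
    total = sum(abs(math.dist(where[i], where[j]) - h) for i, j, h in soft)
    return feasible, total

landmarks = {"l1": (0.0, 0.0)}
positions = {"x1": (3.0, 4.0), "x2": (6.0, 8.0)}
print(evaluate(positions, landmarks, [("l1", "x1", 6.0)], [("x1", "x2", 4.0)]))
```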
53Results
54Results
55References
- RADAR and IP2Geo
- http://eris.prakinf.tu-ilmenau.de/res/papers/coopStreaming/padmanabhan01Locating.pdf
- Bernard Wong, Ivan Stoyanov and Emin Gün Sirer. Octant: A Comprehensive Framework for the Geolocalization of Internet Hosts. In Proceedings of the Symposium on Networked Systems Design and Implementation, Cambridge, Massachusetts, April 2007.
- Katz-Bassett, E., John, J. P., Krishnamurthy, A., Wetherall, D., Anderson, T., and Chawathe, Y. Towards IP geolocation using delay and topology measurements. In Proceedings of the 6th ACM SIGCOMM Conference on Internet Measurement (Rio de Janeiro, Brazil, October 25 - 27, 2006). IMC '06. ACM Press, New York, NY, 71-84.